Class 18: Pertussis Mini-Project

Yane (PID A17670350)

Background

Pertussis (whooping cough) is a common lung infection cause by the bacteria B. Pertussis. It can infect everyone but is most deadly for infants (under 1 year of age)

#CDC tracking data The CDC tracks the number of Pertussis cases:

Q. I want a plot of year vs cases

library(ggplot2)

ggplot(cdc) +
  aes(year, cases) +
  geom_point() +
  geom_line() +
  geom_vline(xintercept = 1946, col="blue", lty=2) +
  geom_vline(xintercept = 1996, col="red", lty=2) +
  geom_vline(xintercept = 2020, col="gray", lty=2)

After 1946, the cases of Pertussis decrease as a result of the wP vaccine. In 1996, after the switch to the aP vaccine there was a slight increase in cases. The aP vaccine was created to have less side effects, and this, along with the increase of cases after 1996, suggests that the aP vaccine could be slightly weaker than the wP vaccine.

Q. Add annotation lines for the major milestones of wP vaccination roll-out (1946) and the switch to the aP vaccine (1996).

cdc <- data.frame( year = c(1922L,1923L,1924L,1925L,
                            1926L,1927L,1928L,1929L,1930L,1931L,
                            1932L,1933L,1934L,1935L,1936L,
                            1937L,1938L,1939L,1940L,1941L,1942L,
                            1943L,1944L,1945L,1946L,1947L,
                            1948L,1949L,1950L,1951L,1952L,
                            1953L,1954L,1955L,1956L,1957L,1958L,
                            1959L,1960L,1961L,1962L,1963L,
                            1964L,1965L,1966L,1967L,1968L,1969L,
                            1970L,1971L,1972L,1973L,1974L,
                            1975L,1976L,1977L,1978L,1979L,1980L,
                            1981L,1982L,1983L,1984L,1985L,
                            1986L,1987L,1988L,1989L,1990L,
                            1991L,1992L,1993L,1994L,1995L,1996L,
                            1997L,1998L,1999L,2000L,2001L,
                            2002L,2003L,2004L,2005L,2006L,2007L,
                            2008L,2009L,2010L,2011L,2012L,
                            2013L,2014L,2015L,2016L,2017L,2018L,
                            2019L,2020L,2021L,2022L,2023L, 2024L,
                            2025L),
                   cases = c(107473,164191,165418,152003,
                            202210,181411,161799,197371,
                            166914,172559,215343,179135,265269,
                            180518,147237,214652,227319,103188,
                            183866,222202,191383,191890,109873,
                            133792,109860,156517,74715,69479,
                            120718,68687,45030,37129,60886,
                            62786,31732,28295,32148,40005,
                            14809,11468,17749,17135,13005,6799,
                            7717,9718,4810,3285,4249,3036,
                            3287,1759,2402,1738,1010,2177,2063,
                            1623,1730,1248,1895,2463,2276,
                            3589,4195,2823,3450,4157,4570,
                            2719,4083,6586,4617,5137,7796,6564,
                            7405,7298,7867,7580,9771,11647,
                            25827,25616,15632,10454,13278,
                            16858,27550,18719,48277,28639,32971,
                            20762,17972,18975,15609,18617,
                            6124,2116,3044,7063, 22538,
                            21906)
       )

Exploring CMI-PB data

The CMI-PB project’s https://www.cmi-pb.org/ mission is to provide the scientific community with a comprehensive, high-quality and freely accessible resources of Pertussis booster vaccination.

Basically, make available a large dataset of the immune response to Pertussis. They use a “booster” vaccination as a proxy for Pertussis infection.

They make their data available as JSON format API. We can read this into R with the read_json() function from the jsonlite package:

library(jsonlite)
subject <- read_json("https://www.cmi-pb.org/api/v5_1/subject",
                     simplifyVector = TRUE)
head(subject)

  subject_id infancy_vac biological_sex              ethnicity  race
        1          wP         Female Not Hispanic or Latino White
        2          wP         Female Not Hispanic or Latino White
        3          wP         Female                Unknown White
        4          wP           Male Not Hispanic or Latino Asian
        5          wP           Male Not Hispanic or Latino Asian
        6          wP         Female Not Hispanic or Latino White
  year_of_birth date_of_boost      dataset
  1986-01-01    2016-09-12 2020_dataset
  1968-01-01    2019-01-28 2020_dataset
  1983-01-01    2016-10-10 2020_dataset
  1988-01-01    2016-08-29 2020_dataset
  1991-01-01    2016-08-29 2020_dataset
  1988-01-01    2016-10-10 2020_dataset

Q. How many aP and wP individuals are there in this subject table?

table(subject$infancy_vac)

aP wP 
87 85 

Q. How many male/female are there?

table(subject$biological_sex)

Female   Male 
   112     60 

Q. What is the break down of biological_sex and race?

Q. Is this representative of the US population?

table(subject$race, subject$biological_sex)

                                            Female Male
  American Indian/Alaska Native                  0    1
  Asian                                         32   12
  Black or African American                      2    3
  More Than One Race                            15    4
  Native Hawaiian or Other Pacific Islander      1    1
  Unknown or Not Reported                       14    7
  White                                         48   32

We can read more tables from the CMI-PB database

specimen <- read_json("https://www.cmi-pb.org/api/v5_1/specimen",
                      simplifyVector = TRUE)
ab_titer <- read_json("https://www.cmi-pb.org/api/v5_1/plasma_ab_titer",
                      simplifyVector = TRUE)

head(specimen)

  specimen_id subject_id actual_day_relative_to_boost
         1          1                           -3
         2          1                            1
         3          1                            3
         4          1                            7
         5          1                           11
         6          1                           32
  planned_day_relative_to_boost specimen_type visit
                           0         Blood     1
                           1         Blood     2
                           3         Blood     3
                           7         Blood     4
                          14         Blood     5
                          30         Blood     6

head(ab_titer)

  specimen_id isotype is_antigen_specific antigen        MFI MFI_normalised
         1     IgE               FALSE   Total 1110.21154       2.493425
         1     IgE               FALSE   Total 2708.91616       2.493425
         1     IgG                TRUE      PT   68.56614       3.736992
         1     IgG                TRUE     PRN  332.12718       2.602350
         1     IgG                TRUE     FHA 1887.12263      34.050956
         1     IgE                TRUE     ACT    0.10000       1.000000
   unit lower_limit_of_detection
UG/ML                 2.096133
IU/ML                29.170000
IU/ML                 0.530000
IU/ML                 6.205949
IU/ML                 4.679535
IU/ML                 2.816431

To make sense of all this data we need to “join” (a.k.a. “merge” or “link”) all these tables together. Only then will you know that a given Ab measurement (from the ab_titer table) was collected on a certain date (from the specimen table) from a certain wP or aP subject (from the subject table.

We can use dplyr and the *_join() family of functions to do this.

library(dplyr)

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

meta <- inner_join(subject, specimen)

Joining with `by = join_by(subject_id)`

head(meta)

  subject_id infancy_vac biological_sex              ethnicity  race
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
  year_of_birth date_of_boost      dataset specimen_id
  1986-01-01    2016-09-12 2020_dataset           1
  1986-01-01    2016-09-12 2020_dataset           2
  1986-01-01    2016-09-12 2020_dataset           3
  1986-01-01    2016-09-12 2020_dataset           4
  1986-01-01    2016-09-12 2020_dataset           5
  1986-01-01    2016-09-12 2020_dataset           6
  actual_day_relative_to_boost planned_day_relative_to_boost specimen_type
                         -3                             0         Blood
                          1                             1         Blood
                          3                             3         Blood
                          7                             7         Blood
                         11                            14         Blood
                         32                            30         Blood
  visit
   1
   2
   3
   4
   5
   6

Let’s do one more inner_join() to join the ab_titer with all our meta data

abdata <- inner_join(ab_titer, meta)

Joining with `by = join_by(specimen_id)`

head(abdata)

  specimen_id isotype is_antigen_specific antigen        MFI MFI_normalised
         1     IgE               FALSE   Total 1110.21154       2.493425
         1     IgE               FALSE   Total 2708.91616       2.493425
         1     IgG                TRUE      PT   68.56614       3.736992
         1     IgG                TRUE     PRN  332.12718       2.602350
         1     IgG                TRUE     FHA 1887.12263      34.050956
         1     IgE                TRUE     ACT    0.10000       1.000000
   unit lower_limit_of_detection subject_id infancy_vac biological_sex
UG/ML                 2.096133          1          wP         Female
IU/ML                29.170000          1          wP         Female
IU/ML                 0.530000          1          wP         Female
IU/ML                 6.205949          1          wP         Female
IU/ML                 4.679535          1          wP         Female
IU/ML                 2.816431          1          wP         Female
               ethnicity  race year_of_birth date_of_boost      dataset
Not Hispanic or Latino White    1986-01-01    2016-09-12 2020_dataset
Not Hispanic or Latino White    1986-01-01    2016-09-12 2020_dataset
Not Hispanic or Latino White    1986-01-01    2016-09-12 2020_dataset
Not Hispanic or Latino White    1986-01-01    2016-09-12 2020_dataset
Not Hispanic or Latino White    1986-01-01    2016-09-12 2020_dataset
Not Hispanic or Latino White    1986-01-01    2016-09-12 2020_dataset
  actual_day_relative_to_boost planned_day_relative_to_boost specimen_type
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
  visit
   1
   1
   1
   1
   1
   1

Q. How many different Ab “isotype” values are in this dataset?

table(abdata$isotype)

  IgE   IgG  IgG1  IgG2  IgG3  IgG4 
 6698  7265 11993 12000 12000 12000 

Q. How many different “antigen” values are measured?

table(abdata$antigen)

    ACT   BETV1      DT   FELD1     FHA  FIM2/3   LOLP1     LOS Measles     OVA 
   1970    1970    6318    1970    6712    6318    1970    1970    1970    6318 
    PD1     PRN      PT     PTM   Total      TT 
   1970    6712    6712    1970     788    6318 

Let’s focus on IgG isotype

igg <- abdata |>
  filter (isotype=="IgG")

head(igg)

  specimen_id isotype is_antigen_specific antigen        MFI MFI_normalised
         1     IgG                TRUE      PT   68.56614       3.736992
         1     IgG                TRUE     PRN  332.12718       2.602350
         1     IgG                TRUE     FHA 1887.12263      34.050956
        19     IgG                TRUE      PT   20.11607       1.096366
        19     IgG                TRUE     PRN  976.67419       7.652635
        19     IgG                TRUE     FHA   60.76626       1.096457
   unit lower_limit_of_detection subject_id infancy_vac biological_sex
IU/ML                 0.530000          1          wP         Female
IU/ML                 6.205949          1          wP         Female
IU/ML                 4.679535          1          wP         Female
IU/ML                 0.530000          3          wP         Female
IU/ML                 6.205949          3          wP         Female
IU/ML                 4.679535          3          wP         Female
               ethnicity  race year_of_birth date_of_boost      dataset
Not Hispanic or Latino White    1986-01-01    2016-09-12 2020_dataset
Not Hispanic or Latino White    1986-01-01    2016-09-12 2020_dataset
Not Hispanic or Latino White    1986-01-01    2016-09-12 2020_dataset
              Unknown White    1983-01-01    2016-10-10 2020_dataset
              Unknown White    1983-01-01    2016-10-10 2020_dataset
              Unknown White    1983-01-01    2016-10-10 2020_dataset
  actual_day_relative_to_boost planned_day_relative_to_boost specimen_type
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
  visit
   1
   1
   1
   1
   1
   1

Make a plot of MFI_normalised values for all antigen values.

ggplot(igg) +
  aes(MFI_normalised, antigen) +
  geom_boxplot()

The antigens “PT”, “FIM2/3” and “FHA” appear to have the widest range of values.

Q. Is there a difference for these responses between aP and wP individuals?

ggplot(igg) +
  aes(MFI_normalised, antigen, col=infancy_vac) +
  geom_boxplot()

ggplot(igg) +
  aes(MFI_normalised, antigen, col=infancy_vac) +
  geom_boxplot() +
  facet_wrap(~infancy_vac)

Q. Is there a difference with time (i.e. before booster shot vs after booster shot)

ggplot(igg) +
  aes(MFI_normalised, antigen, col=infancy_vac) +
  geom_boxplot() +
  facet_wrap(~visit)

## filter to 2021 dataset, IgG and PT only
abdata.PT.21 <- abdata |>
  filter(dataset == "2021_dataset",
         isotype == "IgG",
         antigen =="PT")

  ggplot(abdata.PT.21) +
    aes(x=planned_day_relative_to_boost,
        y=MFI_normalised,
        col=infancy_vac,
        group=subject_id) +
    geom_point() +
    geom_line() +
    geom_smooth(aes(group=infancy_vac), se = FALSE, method = "loess",
                span = 0.4, linewidth = 1.5) +
    geom_vline(xintercept=0, linetype="dashed") +
    geom_vline(xintercept=14, linetype="dashed") +
  labs(title="2021 dataset IgG PT",
       subtitle = "Dashed lines indicate day 0 (pre-boost) and 14 (apparent peak levels)")

`geom_smooth()` using formula = 'y ~ x'

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: pseudoinverse used at -0.6

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: neighborhood radius 3.6

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: reciprocal condition number 0

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: There are other near singularities as well. 11364

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: pseudoinverse used at -0.6

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: neighborhood radius 3.6

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: reciprocal condition number 0

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: There are other near singularities as well. 11364