4). We compared the performance of gene sets with that of their constituent genes in profiles from high versus low HAI responders to influenza vaccination. We found that the top-scoring gene sets in TIV responders were more strongly correlated with the high antibody response phenotype than any constituent gene in either gene set (Supporting Information Fig. 5A). Moreover,
although both complement and antibody genes were present in gene sets enriched in responders, the antibody genes were among those most upregulated (Supporting Information Fig. 5A and B). Thus, a gene set-based analytic approach identifies signatures of proliferation and immunoglobulin genes that are strongly correlated with high antibody response. We next sought
to determine whether enrichment of the immunoglobulin and/or proliferation gene sets could be used as a predictor of vaccine response, using high or low HAI titers as the outcome. To do this, we selected the most differentially enriched gene set from each of the two clusters and fitted each in a logistic regression model. Both models closely fit the data and yielded an AUC of ∼0.9 (Fig. 3A and B), suggesting that each gene set could independently provide a strongly predictive model of vaccine response. To integrate both biological processes into a single model, we applied Bayes’ rule and found that the integrated model achieved an AUC of 0.94 (Fig. 3C). To compare our integrated gene set-based model with the single-gene level model previously described for this dataset [16], we tested our model in a validation dataset composed of PBMC samples from an independent trial of TIV vaccination. We found that our predictive model yielded an accuracy of 88% in the test set, comparable
to the performance of the single-gene level predictor [16]. This indicates that gene set-based analysis of expression profiles provides accurate predictors of response to vaccination. An advantage of gene set enrichment analysis is that it can capture subtle changes in gene expression distributed across transcriptional networks. We therefore compared the degree of differential expression of genes in the predictive gene sets (proliferation and immunoglobulin gene sets) with that of the genes selected in the single-gene level predictor originally applied to this dataset (Fig. 4). Predictive genes selected in the study by Nakaya et al. [16] were all highly differentially expressed in day seven PBMC expression profiles from responders compared to nonresponders, as expected (mean fold change 3.36). In contrast, the gene sets identified in our analysis included many genes that were much less differentially expressed (mean fold change of proliferation cluster 2.13; mean fold change of immunoglobulin cluster 2.53) (Fig. 4).
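The text does not specify how Bayes’ rule was applied to integrate the two gene set models. As an illustration only, the following minimal Python sketch shows one common way to do it: the probabilities from two single-gene-set logistic models are combined under an assumed conditional independence of the two enrichment scores given the response class (a naive Bayes assumption), and the AUC is computed by pairwise ranking. The function names, the prior of 0.5, and the example probabilities are all hypothetical and are not taken from the study.

```python
def combine_posteriors(p1, p2, prior=0.5):
    """Combine two single-model probabilities P(high responder | score_i).

    Assumes (hypothetically) that the two scores are conditionally
    independent given the response class, so the posterior odds are
    odds_1 * odds_2 / prior_odds. Inputs must lie strictly in (0, 1).
    """
    prior_odds = prior / (1.0 - prior)
    odds = (p1 / (1.0 - p1)) * (p2 / (1.0 - p2)) / prior_odds
    return odds / (1.0 + odds)


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = high HAI responder, 0 = low HAI responder.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical outputs of the two single-gene-set logistic models.
p_proliferation = [0.9, 0.4, 0.7, 0.5, 0.2, 0.3]
p_immunoglobulin = [0.6, 0.8, 0.7, 0.3, 0.6, 0.2]

p_integrated = [
    combine_posteriors(a, b)
    for a, b in zip(p_proliferation, p_immunoglobulin)
]

print("AUC proliferation:  ", auc(p_proliferation, labels))
print("AUC immunoglobulin: ", auc(p_immunoglobulin, labels))
print("AUC integrated:     ", auc(p_integrated, labels))
```

In this toy example, each single model misranks at least one responder, whereas the combined posterior separates the two groups, which illustrates why integrating two partly complementary signatures can raise the AUC above that of either model alone.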