Despite a decade of scientific research, methodological differences between studies have hampered the ability to critically evaluate the clinical utility of polygenic risk prediction models of common diseases. Understanding the clinical settings in which such models will be utilised is essential to their success as a tool for identifying high risk individuals.
Polygenic risk scores are based on the selection of markers, or single nucleotide polymorphisms (SNPs), which individually do not achieve significance in large-scale genome-wide association studies (GWAS) but in combination are associated with a disease trait.(1) A polygenic risk score, defined by the combination of SNPs and the sum of their weighted or unweighted effects, can be used to predict individual trait values and to construct risk prediction models for common diseases.(1, 2) The ability to predict disease enables high-risk individuals to make clinical decisions that could reduce or prevent disease presentation. However, determining the clinical validity of these risk scores is not straightforward. What results support the conclusion that they predict new cases? What constitutes a high predictive value at the population level?
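The score construction described above can be sketched in a few lines: an individual's score is the sum of their risk-allele counts across the selected SNPs, either unweighted or weighted by per-SNP effect sizes (e.g. GWAS log odds ratios). All genotype values and weights below are invented for illustration; this is a minimal sketch, not a production PRS pipeline.

```python
# Minimal sketch of a polygenic risk score. Genotypes are coded 0/1/2
# (copies of the risk allele per SNP); weights are per-SNP effect sizes.
# All numbers here are illustrative, not real GWAS estimates.

def polygenic_risk_score(genotypes, weights=None):
    """Return the (optionally weighted) sum of risk-allele counts."""
    if weights is None:
        weights = [1.0] * len(genotypes)  # unweighted: every SNP counts equally
    return sum(g * w for g, w in zip(genotypes, weights))

# One individual's genotype at four hypothetical SNPs.
genotypes = [0, 1, 2, 1]            # risk-allele counts
weights = [0.12, 0.05, 0.20, 0.08]  # e.g. log odds ratios from a GWAS

print(polygenic_risk_score(genotypes))           # unweighted sum of allele counts
print(polygenic_risk_score(genotypes, weights))  # weighted: 0*0.12 + 1*0.05 + 2*0.20 + 1*0.08
```

In practice the score is then standardised and individuals in the upper tail of the population distribution are flagged as high risk; the choice between weighted and unweighted scores is one of the methodological differences between studies noted above.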
After a decade of polygenic risk research, questions like these are still unanswered and, as a result, researchers often base their conclusions on the statistical significance of the performance measures or on their own judgements. Evaluations of prediction models often do not clearly state the clinical implications of their model assessments. When researchers conclude that a model “predicts new cases”, they could imply that the model predicts cases better than random, irrespective of how much better, or alternatively they could propose that the model is appropriate for use in health care settings to identify people at risk. This ambiguity is not helpful and creates unrealistic expectations about the clinical utility of risk models, and the value of polygenic risk scores.
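To make the "better than random" reading concrete: one performance measure commonly behind that claim is the area under the ROC curve (AUC, or c-statistic), which equals the probability that a randomly chosen case scores higher than a randomly chosen control. An AUC of 0.5 means the model is no better than chance; values only modestly above 0.5 can be statistically significant in a large sample while remaining clinically unhelpful. The scores below are invented purely to illustrate the calculation.

```python
# Hedged sketch: AUC as a concordance probability. An AUC of 0.5 is
# random prediction; the case/control scores here are made up.

def auc(case_scores, control_scores):
    """P(case score > control score) over all case-control pairs; ties count half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

cases = [0.9, 0.7, 0.6, 0.4]     # polygenic scores in people who developed disease
controls = [0.8, 0.5, 0.3, 0.2]  # scores in people who did not

print(auc(cases, controls))  # → 0.75
```

A model with an AUC of 0.75 is demonstrably "better than random", yet that statistic alone says nothing about whether the model identifies enough true cases, at an acceptable false-positive rate, to be useful in a given health care setting, which is exactly the ambiguity discussed above.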
A polygenic risk model may predict disease, but its predictive performance for identifying at-risk individuals may vary between studies, populations, and settings. Researchers developing risk prediction models need to clearly state both a model's statistical performance and its health or health care relevance. The latter should not be a general statement but an informed evaluation: researchers should clearly present the intended use of their models. Specifying the intended use determines the (target) population in which the polygenic risk score needs to be investigated, the alternative clinical risk models currently in use or available (against which the polygenic risk model should be compared), and the level of predictive performance needed to make the model useful for predicting new cases in that population.
To move research forward and build the evidence base for precision medicine, we need a stronger focus on the right methodology, on the selection of a relevant target population, and on a clinically relevant (and realistic) comparator model. Methods matter.
1. Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013;9(3):e1003348.
2. Chatterjee N, Shi J, Garcia-Closas M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat Rev Genet. 2016;17(7):392-406.