Common approaches to prediction include using a significance-based
criterion to select the variables for a model, and evaluating variables
and models simultaneously using cross-validation or independent test
data.
Researchers at Princeton, Columbia and Harvard have created a new
method to analyze big data that better predicts outcomes in health care,
politics and other fields.
The study appears this week in the journal Proceedings of the National Academy of Sciences.
In previous studies, the researchers showed that significant
variables might not be predictive and that good predictors might not
appear statistically significant. This posed an important question: how
can we find highly predictive variables if not through a guideline of
statistical significance?
In an effort to reduce the error rate with currently used methods, the
researchers proposed a new measure called the influence score, or
I-score, to better measure a variable's ability to predict. They found
that the I-score is effective in differentiating between noisy and
predictive variables in big data and can significantly improve the
prediction rate. For example, the I-score improved the prediction rate
in breast cancer data from 70% to 92%. The I-score can be
applied in a variety of fields, including terrorism, civil war,
elections and financial markets.
"The practical implications are what drove the project, so they're quite broad," says lead author Adeline Lo,
a postdoctoral researcher in Princeton's Department of Politics.
"Essentially anytime you might be interested in predicting and
identifying highly predictive variables, you might have something to
gain by conducting variable selection through a statistic like the
I-score, which is related to variable predictivity. That the I-score
fares especially well in high dimensional data and with many complex
interactions between variables is an extra boon for the researcher or
policy expert interested in predicting something with large dimensional