Large-scale atlases of healthy organs will soon be available, in particular through the Human Cell Atlas. This is a significant step toward better understanding cells, tissues and organs in a healthy state, and it provides a reference for diagnosing, monitoring, and treating disease. However, because of the sheer number of possible combinations of treatment and disease conditions, expanding these data to characterize disease and disease treatment in traditional life science laboratories is labor-intensive and costly and, hence, not scalable.
‘scGen is a generative deep learning model that leverages ideas from image, sequence and language processing, and, for the first time, applies these ideas to model the behavior of a cell in silico.’
Accurately modeling cellular response to perturbations (e.g. disease, compounds, genetic interventions) is a central goal of computational biology. Although models based on statistical and mechanistic approaches exist, no machine-learning-based solution that works for unobserved, high-dimensional phenomena has been available so far.
This means that scGen, if trained on data that capture the effect of perturbations for a given system, is able to make reliable predictions for a different system. "For the first time, we have the opportunity to use data generated in one model system such as mouse and use the data to predict disease or therapy response in human patients," said Mohammad Lotfollahi, PhD student (Helmholtz Zentrum München and Technische Universität München).
The team's next step is to develop scGen into a fully data-driven formulation, increasing its predictive power to enable the study of combinations of perturbations. "We can now start optimizing scGen to answer more and more complex questions about diseases," said Alex Wolf, Team Leader, and Fabian Theis, Director of the Institute of Computational Biology and Chair of Mathematical Modeling of Biological Systems at Technische Universität München.