NIH Grant Makes Pitt a National Center of Big-Data Science

Issue Date: October 20, 2014

By Anita Srikameswaran


The National Institutes of Health has awarded the University of Pittsburgh an $11 million, four-year grant to lead a Big Data to Knowledge Center of Excellence, aimed at helping scientists learn to analyze vast amounts of data more effectively and so discover new biomedical knowledge.

“Individual biomedical researchers now have the technology to generate an enormous quantity and diversity of data. But adequately analyzing these data to discover new biomedical knowledge remains a major challenge,” said Gregory Cooper, professor and vice chair of the Department of Biomedical Informatics, Pitt School of Medicine, and director of the new Center for Causal Modeling and Discovery.

So-called “big data” refers to huge, complex data sets that can be analyzed to reveal patterns and associations but are so voluminous that they are difficult to process with traditional database and software technology.

“Our goal is to make it much easier for researchers to analyze big data to discover causal relationships in biomedicine,” Cooper said.

The new Pitt Center for Causal Modeling and Discovery, comprising researchers at Pitt, Carnegie Mellon University, the Pitt-CMU Pittsburgh Supercomputing Center, Yale University, and four other universities, will be part of an elite national team addressing the challenges of big data in biomedicine.

“As part of a national consortium, this Center of Excellence will put Pitt on the map as a home of big-data science,” said Arthur S. Levine, senior vice chancellor for the health sciences and John and Gertrude Petersen Dean of Pitt’s School of Medicine. “Our strengths in this field have stimulated collaborations with leading institutions, including Harvard and Stanford, and now we will be able to further develop such partnerships in many more meaningful ways.”

According to center codirector Jeremy Berg, researchers now have access to a tremendous amount of information from electronic health records, digital images, and molecular analyses of genes, proteins, and metabolites.

“The good news is that we have so much data. But the bad news is that we have so much data,” said Berg, Pitt associate senior vice chancellor for science strategy and planning in the health sciences and director of Pitt’s Institute for Personalized Medicine. “Our challenge is to find strategies that enable us to sort through all this collected information efficiently and effectively to find meaningful relationships that lead us to new insights in health and disease.”

The Pitt Center for Causal Modeling and Discovery will develop and disseminate tools that can find causal links in very large and complex biomedical data. Faculty in CMU’s Department of Philosophy, led by Clark Glymour, Alumni University Professor and the department’s founding chair, are key partners in this data-science effort. Nicholas Nystrom, director of strategic applications at the Pittsburgh Supercomputing Center, will work to optimize these tools for a high-performance computing environment.

The center includes a team that will develop and implement causal modeling and discovery algorithms to support the data analyses of three separate investigative groups, each focusing on a distinct biomedical problem whose answer lies in a sea of data: the cell signals that drive the development of cancer, the molecular basis of lung disease susceptibility and severity, and the functional connections within the human brain (the “connectome”).

Each project will act as a test bed for the development, rigorous testing, and refinement of analytic tools. If successful, these algorithms and software will likely be applicable to other biomedical research questions. The center will provide free, open-source software that scientists all over the world can use with their own datasets to uncover causal biomedical relationships. Feedback from those users will further enhance the algorithms and software.

“The center also will be a training ground for the next generation of data scientists who will advance and accelerate the development and broader use of big-data science models and methods,” said center codirector Ivet Bahar, Distinguished Professor and John K. Vries Chair, Department of Computational and Systems Biology, Pitt School of Medicine. She added that the center will create educational materials, workshops, and online tutorials to facilitate the use of causal modeling and discovery algorithms by the broader scientific community. 

Other collaborators in the new center include the California Institute of Technology, Rutgers University, the University of Crete, and the University of North Carolina.

“Data creation in today’s research is exponentially more rapid than anything we anticipated even a decade ago,” said NIH Director Francis S. Collins. “Mammoth data sets are emerging at an accelerated pace in today’s biomedical research and these funds will help us overcome the obstacles to maximizing their utility. The potential of these data, when used effectively, is quite astounding.”