Gregory Cooper: Herding Big Data

Issue Date: January 26, 2015

Data are everywhere in medicine. Little bits of information scurry through vast computer networks. Collecting those data and making sense of them, and then using them to make important new discoveries, is difficult. Kind of like herding cats.

Gregory Cooper is one of Pitt’s resident cat herders.


Cooper, professor and vice chair of the Department of Biomedical Informatics in the University of Pittsburgh School of Medicine, came to that role when he was named director of Pitt’s new Center for Causal Modeling and Discovery. The center was created in October when the University received an $11 million, four-year National Institutes of Health grant to lead a Big Data to Knowledge Center of Excellence. Carnegie Mellon University, the Pittsburgh Supercomputing Center, and Yale University are key partners in the new center.

“Much of science, including biomedical science, consists of discovering causal relationships in nature,” Cooper says. “In the Center for Causal Modeling and Discovery, we are developing new computational methods for discovering causal relationships from biomedical data.”

To uncover those relationships, he says, researchers must get better at teaching computers how to learn from data: how to separate the wheat from the chaff, the signal from the noise.

An example: “Is this new drug causing a given side effect in patients, or is there something about the kinds of patients who get the drug who might have gotten the side effect anyway?” Cooper says.

It would take eons for a person, or even a team of people, to sift through all the patients, their genomes, their clinical data, and scores of other factors to sort it out. “You can’t do it manually,” Cooper says. That is why it is important to develop computer systems that can.

The center’s team will develop and apply causal discovery algorithms in several biomedical areas, including algorithms that help discover the cell signaling pathways that drive cancer development, the molecular basis of lung disease susceptibility and progression, and the functional connections within the human brain. Those are big goals that can be achieved only by learning to analyze Big Data more effectively.

The algorithms and software the center develops can be applied to other biomedical research questions as well. To that end, the center will provide free, open-source software that biomedical scientists all over the world can use.

“We’re toolbuilders,” Cooper says. “We are making tools that biomedical scientists can use to analyze their data for causal relationships. With the vast amount of biomedical data now available for analysis, the opportunities are excellent for these tools to play a significant role in advancing biomedical science.”