Pitt Unlocks Trove of Public Health Data to Fight Deadly Contagious Diseases

Issue Date: December 9, 2013

In an unprecedented windfall for public access to health data, researchers at the University of Pittsburgh Graduate School of Public Health have collected and digitized more than 125 years of weekly surveillance records for reportable diseases in the United States.

Their work has created a searchable database—covering the years from 1888 to 2013—that reveals patterns of how infectious diseases spread and how interventions such as vaccines slowed or halted the spread. The database, described in the Nov. 28 issue of the New England Journal of Medicine, is free and available to the public. Supported by the Bill & Melinda Gates Foundation and the National Institutes of Health, the project’s goal is to aid scientists and public health officials in the eradication of deadly and devastating diseases.

“Using this database, we estimate that more than 100 million cases of serious childhood contagious diseases have been prevented, thanks to the introduction of vaccines,” said lead author Willem G. van Panhuis, assistant professor of epidemiology at Pitt’s Graduate School of Public Health. “But we also are able to see a resurgence of some of these diseases in the past several decades as people forget how devastating they can be and start refusing vaccines.”

Despite the availability of a pertussis vaccine since the 1920s, for example, the United States’ largest pertussis epidemic since 1959 occurred last year. Measles, mumps, and rubella outbreaks also have recurred since the early 1980s. 

“Analyzing historical epidemiological data can reveal patterns that help us understand how infectious diseases spread and what interventions have been most effective,” said Irene Eckstrand, the scientific director of the Models of Infectious Disease Agent Study consortium at the National Institutes of Health, which partially funded the research. “This new work shows the value of using computational methods to study historical data—in this case, to show the impact of vaccination in reducing the burden of infectious diseases over the past century.”

The digitized dataset is dubbed Project Tycho™, for 16th-century Danish nobleman Tycho Brahe, whose meticulous astronomical observations enabled Johannes Kepler to derive the laws of planetary motion.

“Tycho Brahe’s data were essential to Kepler’s discovery of the laws of planetary motion,” said study senior author Donald S. Burke, dean of Pitt’s Graduate School of Public Health and UPMC-Jonas Salk Chair in Global Health. “Similarly, we hope that our Project Tycho disease database will help spur new life-saving research on patterns of epidemic infectious disease and the effects of vaccines. Open access to disease surveillance records should be standard practice, and we are working to establish this as the norm worldwide.”

The researchers selected eight vaccine-preventable contagious diseases for a more detailed analysis: smallpox, polio, measles, rubella, mumps, hepatitis A, diphtheria, and pertussis. By overlaying the reported outbreaks with the year of vaccine licensure, the researchers are able to give a clear, visual representation of the effect that vaccines have in controlling communicable diseases.

“Infectious disease research is critically dependent on reliable historical data to understand underlying epidemic dynamics. However, my colleagues and I repeatedly find ourselves digging out historical datasets from various sources in different states of preservation,” van Panhuis said. “By digitizing and giving open access to the entire collection of U.S. notifiable disease data, we’ve made a bold move toward solving this problem.”

The U.S. Centers for Disease Control and Prevention operates a National Notifiable Diseases Surveillance System that helps public health officials monitor the occurrence and spread of diseases. Each state has laws mandating that health care providers report cases of certain diseases to state and/or local health departments for monitoring. The agency summarizes these data from 57 state, territorial, and local reporting jurisdictions and publishes them weekly and annually in its Morbidity and Mortality Weekly Report.

Pitt’s Graduate School of Public Health researchers obtained all weekly notifiable disease surveillance tables published between 1888 and 2013 (approximately 6,500 tables) in various historical reports, including the Morbidity and Mortality Weekly Report. Previously, those tables were available only in paper format or as PDF scans in online repositories that could not be read by computers. To digitize the records, the researchers had to hand-enter the data, including death counts, reporting locations, time periods, and diseases. A total of 56 diseases were reported for at least some period of time during the 125-year time span, with no single disease reported continuously.

“This work by the Tycho Team is remarkable and represents the next step in making government data accessible and useful,” said Bryan Sivak, U.S. Department of Health and Human Services chief technology officer and entrepreneur in residence.

The data can be explored and retrieved by accessing the Project Tycho Web site, www.tycho.pitt.edu. The open access release of these data has ignited a collaboration with the United States Open Government Initiative and, in the near future, the Project Tycho database will be available on the HealthData.gov Web pages. 

“Historical records are a precious yet undervalued resource. As Danish philosopher Soren Kierkegaard said, we live forward but understand backward,” Burke explained. “By ‘rescuing’ these historical disease data and combining them into a single, open-access, computable system, we now can better understand the devastating impact of epidemic diseases and the remarkable value of vaccines in preventing illness and death.”