Health care professionals have long theorized that medical conditions, like asthma or cancer, are a result of not only genetics but also environmental and lifestyle factors. These theories, however, have been difficult to prove conclusively without the ability to analyze large-scale population health data.
Now, thanks to a seed grant supporting Penn State strategic priorities, University researchers are hoping to realize the promise and potential of big data to advance biomedical research by creating a Digital Collaboratory for Precision Health Research.
In the project, collaborators are aiming to establish a secure infrastructure for studying health conditions through shared data. Ultimately, software built on that infrastructure could be used to analyze disparate data sets, helping researchers better understand why different health problems occur in different demographic groups.
“We are a product of our genes, behavior and environment,” said Vasant Honavar, professor and Edward Frymoyer Chair in the College of Information Sciences and Technology (IST) and the project lead. “In order to improve health, we need to look beyond treating individuals who are sick and need to understand the underlying genetic, environmental and behavioral factors so we can develop effective interventions.”
Rather than one-size-fits-all medical treatments, the group hopes to further advance personalized health by contributing new methods and tools that enhance individualized care.
“In this project, we want to focus on the environmental aspect to make progress towards realizing the grand vision of personalizing health.”
The project is a massive undertaking, and the team will bring expertise from across the University, including the Center for Big Data Analytics and Discovery Informatics, Penn State College of Medicine, the College of Information Sciences and Technology, the Institute for CyberScience, the Clinical and Translational Science Institute, the College of Engineering, the Eberly College of Science, and the Social Science Research Institute, among many others. The team will also leverage existing collaborations with colleagues around the nation.
Safeguarding patient information
While health care providers hold an immense repository of clinical and personally identifiable patient information, it is rarely used by the research community because of logistical and privacy concerns.
“The key challenge in working with that data is that electronic health records contain sensitive information,” Honavar said. “There are many barriers to sharing health data.”
Penn State’s solution aims to allow investigators to analyze health data to answer specific research questions while adhering to all applicable data access and use policies that safeguard sensitive information.
The infrastructure would allow researchers with approved projects to integrate and analyze the relevant data sets on the platform using reproducible and shareable analytic workflows.
“The personal health data never leaves the secure platform,” Honavar explained.
Since safeguarding patient information is of the utmost importance, Honavar hopes that other institutions will be able to implement their own systems and that cross-institutional analyses can be conducted.
“We plan to build this infrastructure and share it with other institutions,” he said.
Having a network of similar platforms could accelerate discovery in the biomedical and health sciences. For example, Honavar described how a group of researchers studying risk factors that contribute to breast cancer could examine data from multiple sources for a more comprehensive understanding of potential causes.
“They could receive similar data from different sites,” he explained. “While Institution A doesn’t have to access Institution B’s data, the researchers can conduct similar analyses across multiple sites.”
Harnessing the power of big data
Traditionally, medical research is conducted through clinical trials with results extrapolated to the population at large. But this new collaborative effort has the potential to upend that method.
Access to large data sets, combined with rapid advances in computing and analytics, allows researchers in diverse disciplines to gain new insights. The group aims to leverage recent advances in data sciences to derive actionable findings that could dramatically improve health care.
“Such understanding is essential for developing and adapting interventions that are effective for individuals with different characteristics and from different contexts,” noted Susan McHale, a collaborator on the project and distinguished professor of human development and family studies and demography, director of the Social Science Research Institute, and co-director of Penn State’s Clinical and Translational Science Institute.
For example, researchers have often theorized that the growing national rate of obesity could be linked to community factors, like the location of grocery stores or the availability of safe, walkable streets. But it’s difficult to prove these connections without integrating clinical measures of obesity with these other types of data. This infrastructure aims to enable such analyses.
“It’s important to understand the relationships between disparate risk factors,” Honavar said. “You can’t do that if you don’t tie them to the health information that sits in electronic health records.”
The team also hopes to leverage the infrastructure for training the next generation of graduate students, such as those participating in the NIH-funded Biomedical Data Sciences Training Program at Penn State.
Advancing strategic priorities
Honavar believes the project, which bolsters the University-wide strategic priorities of Driving Digital Innovation and Enhancing Health, is of critical importance to the University and the Commonwealth.
Through the seed funding provided by the University, the team will work toward realizing its vision.
“When this call came out, it seemed like the right opportunity,” Honavar said. “You can build these predictive models that can be used to personalize interventions and improve health. However, this is hard to do if each researcher has to overcome the challenges of accessing health data and integrating it with other relevant data for each such project.”
Concluded Honavar, “We will build this infrastructure once, and we’ll do it right, so as to enable interdisciplinary teams of researchers across the University to focus on using the data effectively to answer important research questions, and ultimately, improve population and individual health outcomes.”