Academic scientists devote their lives to research, often toiling away on problems that few people outside their discipline fully understand. Perhaps some are driven by pure curiosity or competition, while others have a personal interest in the topic at hand.
For Shirley Pepke, a genomics researcher based in Los Angeles, the urgency to find answers comes from her own instinct for survival. Since 2014, she has been working on a tool capable of tailoring ovarian cancer treatment to each patient using genomics data and a machine learning algorithm.
The first subject in this DIY precision medicine project was Pepke herself, who was diagnosed with stage IIIC ovarian cancer in September 2013.
“Some people get cancer and do fundraisers — I’m good at doing computational research on complex systems, so it seemed really natural for me to work on this,” she said. “Because I have really young children, I felt that I had to pursue every avenue to try and extend my life, and I owed it to them.”
She began her career as a physicist and data scientist, developing artificial intelligence software for NASA’s launch vehicles and algorithms to analyze high-throughput genomics data at the California Institute of Technology. But her research focus abruptly pivoted after her diagnosis of ovarian cancer, which had already spread to nearby organs in her pelvis.
Since then, Pepke has drawn on her computational know-how, experience with genomics, and broad network of research collaborators to fight the disease.
To start, a colleague at Caltech put her in touch with local researchers who could analyze her tumor with high-throughput genomic sequencing, a rapid and inexpensive technology that sequences many DNA or RNA molecules at once.
The researchers analyzed the DNA sequence of Pepke’s cancer for possible mutations that could steer her toward a personalized treatment option, but nothing too compelling turned up. They also gave her an enormous data set containing gene expression data, which can provide information about gene activity that the genome cannot.
“Gene expression methods measure how much of the protein is going to be made, how quickly the body is transcribing the DNA in that protein, and if there are epigenetic changes or chemical modifiers that affect the rate of transcription of DNA that gets made into protein,” said Pepke. “These fall into the category of mutations that are picked up in gene expression data, which can have an effect on cancer and its response to therapies.”
However, the gene expression data set was unwieldy and difficult to sift through, even for an experienced data scientist like Pepke.
At this point, a friend connected her with Greg Ver Steeg, an assistant professor at the University of Southern California who specialized in mining complex data. In 2014, Ver Steeg developed an advanced machine learning method called Correlation Explanation (CorEx) capable of teasing out hidden patterns in large, high-dimensional data sets.
“If you observe a bunch of things all related to each other, that relationship must come about due to some hidden factor you couldn’t see,” said Ver Steeg. “In human biology, there are many hidden factors — for instance, gene expression can tell us about how disease is progressing and which treatments could work.”
Ver Steeg has applied his machine learning algorithm to problems in neuroscience, psychology and finance, fields often overwhelmed by large amounts of data. A study published last year used CorEx to examine more than 200 potential biomarkers in 566 older adults, identifying those most predictive of cognitive decline and brain atrophy. Online dating website eHarmony recently recruited Ver Steeg to improve its matchmaking process, in hopes that CorEx can target the hidden factors that contribute to happy relationships.
The two began to collaborate in 2014, first modifying CorEx to analyze the publicly available gene expression data from ovarian cancer patients in the Cancer Genome Atlas. Their goal was to unearth the hidden factors in the data set that correlated with patient survival. For instance, they found patients whose immune systems became activated in certain ways — seen in the data as a particular gene expression profile — had better long-term survival.
“I have a great deal of hope for the future, seeing as things in cancer research are changing so quickly, and the field is learning so much,” said Pepke. “While cancer may not be cured in five years, the landscape of treatment will be very different. I just hope to be around to see what happens.”
This article was published online by The Washington Post.