Hypothesising disease mechanisms using publicly available literature and graph-based analytics

PhD project (3/4 yr research project leading to independent research at the doctorate level)

Dr Tom Gaunt, Dr Ben Elsworth



Understanding the mechanisms that relate important biological events is vital. For example, by identifying the pathway of events from a mutation in a gene to a given disease, we can begin to look at possible treatments based on each step in the pathway. The first step in this process is often an examination of the literature, followed by detailed investigation, either in the lab or in silico. However, most scientific subjects produce far more publications than anyone can read in detail. Tools that help us search and refine an entire body of literature are therefore becoming increasingly important, especially as the number of publications is rising exponentially. We have developed a tool called MELODI (www.melodi.biocompute.org.uk) that attempts to address this issue: it derives and explores intermediates between any two sets of PubMed articles via a simple-to-use web application.

Aims & objectives

1. Develop novel approaches/methods for graph database investigation

2. Expand the sources of information available via new information extraction methods

3. Improve the quality of the generated hypotheses

4. Identify examples that demonstrate the effectiveness of MELODI

5. Work with other departments to validate hypotheses


Currently, MELODI has limited analytical options, and there is tremendous scope to explore some of the fundamental concepts surrounding this area of research. These include exploring the ways the graph can be traversed from one article set to another, incorporating other sources of information, and improving the quality and accuracy of generated hypotheses.
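As a rough illustration of what traversing from one article set to another can mean, the sketch below assumes each article set has already been reduced to subject-predicate-object triples (semantic predications of the kind stored in SemMedDB) and looks for concepts that are the object of a triple in one set and the subject of a triple in the other. The triples, the concept names and the function `find_intermediates` are hypothetical and chosen purely for illustration; this is not MELODI's implementation.

```python
from collections import defaultdict

# Hypothetical toy predications (subject, predicate, object).
# These are illustrative only, not real MELODI or SemMedDB output.
SET_A = [
    ("GENE_X", "INHIBITS", "PROTEIN_P"),
    ("GENE_X", "STIMULATES", "PROTEIN_Q"),
]
SET_B = [
    ("PROTEIN_P", "CAUSES", "DISEASE_Y"),
    ("PROTEIN_R", "ASSOCIATED_WITH", "DISEASE_Y"),
]


def find_intermediates(set_a, set_b):
    """Map each intermediate concept to the (triple_a, triple_b) pairs it links.

    A concept counts as an intermediate when it is the object of a triple
    in set_a and the subject of a triple in set_b.
    """
    # Index set_b triples by their subject for fast lookup.
    by_subject = defaultdict(list)
    for triple in set_b:
        by_subject[triple[0]].append(triple)

    links = defaultdict(list)
    for triple in set_a:
        # .get avoids creating empty entries for unmatched concepts.
        for match in by_subject.get(triple[2], []):
            links[triple[2]].append((triple, match))
    return dict(links)


if __name__ == "__main__":
    for concept, pairs in find_intermediates(SET_A, SET_B).items():
        for a, b in pairs:
            print(f"{a[0]} -{a[1]}-> {concept} -{b[1]}-> {b[2]}")
```

Here only PROTEIN_P links the two toy sets. Longer paths, weighting by how often a predication occurs, or filtering by predicate type are the sort of traversal choices the project could explore.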

Experience with graph databases and the analysis of large heterogeneous data sets is a highly valuable skill. The student will develop and expand their interests and skill sets as they address the aims and objectives. This work might include developing natural language processing approaches for various types of text, optimising and exploring traversal options across the graph from one article set to another, creating recommendation engines, and exploring deep learning methods to improve the generation and filtering of hypotheses.



Kilicoglu H, Shin D, Fiszman M, Rosemblat G, Rindflesch TC. (2012) SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics, 28(23), 3158-60.

Created on Dec. 6, 2016, 3:44 p.m.