The use of genome-wide data for the design and undertaking of ‘recall by genotype’ experiments

PhD project (3/4 yr research project leading to independent research at the doctorate level)

Dr Nic Timpson, Prof George Davey Smith, Prof Marcus Munafo


Recall by genotype (RBG) is a study design in which a subset of participants from an existing study is recruited, their biosamples are analysed, or new data are collected on the basis of measured genotypic variation. Exhaustive collection of phenotypic measurements in large studies is often impractical and an inefficient use of finite resources. This work will assess and test the use of genotypes that are known to be correlated with features of interest, or that require further examination given unexplained correlations with disease, to define strata or subsamples for intensive or directed phenotypic data collection. This will allow detailed phenotypic information to be examined in sample sizes that are financially and practically feasible, with analytical power optimized for measurement depth and precision. The motivation for employing genotypic data in this way is the pursuit of causal relationships based on the Mendelian randomization paradigm.
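As a rough illustration of the recall step described above, the sketch below draws equal-sized groups of opposite homozygotes from a simulated cohort. The cohort size, the minor allele frequency of 0.2, the group size and the function name `recall_by_genotype` are all assumptions made for this example; they are not taken from the project.

```python
import random


def recall_by_genotype(genotypes, n_per_group, seed=0):
    """Pick equal-sized recall groups from the two homozygous classes.

    genotypes: dict mapping participant id -> minor-allele count (0, 1 or 2).
    Returns (major_homozygotes, minor_homozygotes), each a list of ids.
    """
    rng = random.Random(seed)
    major = sorted(pid for pid, g in genotypes.items() if g == 0)
    minor = sorted(pid for pid, g in genotypes.items() if g == 2)
    # The rarer genotype class caps the size of both groups.
    n = min(n_per_group, len(major), len(minor))
    return rng.sample(major, n), rng.sample(minor, n)


# Hypothetical cohort: 5000 participants, each carrying two alleles that are
# independently the minor allele with probability 0.2 (assumed value).
sim = random.Random(42)
maf = 0.2
cohort = {}
for i in range(5000):
    cohort[f"id{i}"] = int(sim.random() < maf) + int(sim.random() < maf)

major_group, minor_group = recall_by_genotype(cohort, n_per_group=100)
print(len(major_group), len(minor_group))
```

Balancing the two groups in this way concentrates the detailed phenotyping on the genotype classes expected to differ most, which is one possible reading of how a modest recall sample can retain power. A real study would also need to consider consent, relatedness and population structure, none of which are modelled here.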

Aims & objectives

The aim of this PhD is to trial various aspects of the RBG design and to develop, execute and evaluate a specific RBG study. The thesis will be built around this experiment, with the ultimate aim of testing the approach whilst contributing to the scientific area of interest.


Work will explore the properties of RBG methods as a means of applying Mendelian randomization (MR) and undertaking causal analyses. Using genome-wide genetic data obtained by imputing complete collections of genetic variants down to a minor allele frequency of ~1%, alongside comprehensive phenotype databases, the student will test the assumptions made about the properties of recall groups. The student will also extend previous work on the use of multiple genetic variants to construct predictive scores, evaluating whether such scores increase the variance explained whilst retaining the integrity of MR; in other words, whether grouping individuals by multiple genetic variants is more informative than grouping by a single variant. Finally, the project will aim to define the conditions most conducive to RBG designs and to apply them to worked examples aligned to a common research interest.
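The comparison between single-variant and multi-variant grouping can be sketched with a small simulation. Everything here is an assumption chosen for illustration: ten independent variants, allele frequencies of 0.3, per-allele effects of 0.15 on a standardised exposure, and recall of the top and bottom 5% of the score. The sketch only addresses variance explained; it says nothing about whether the MR assumptions hold for each variant in the score.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
n_people, n_variants = 4000, 10
mafs = [0.3] * n_variants      # assumed minor allele frequencies
effects = [0.15] * n_variants  # assumed per-allele effects on the exposure

# Simulate allele counts (0, 1 or 2) for every person and variant.
genos = []
for _ in range(n_people):
    row = [int(rng.random() < p) + int(rng.random() < p) for p in mafs]
    genos.append(row)

# Weighted allele score, and an exposure equal to the score plus unit noise.
score = [sum(b * g for b, g in zip(effects, row)) for row in genos]
exposure = [s + rng.gauss(0, 1) for s in score]
single = [row[0] for row in genos]

r2_single = pearson_r(single, exposure) ** 2
r2_score = pearson_r(score, exposure) ** 2
print(f"variance explained: single variant {r2_single:.3f}, score {r2_score:.3f}")

# Recall groups from the extremes of the score (ties are broken arbitrarily).
ranked = sorted(range(n_people), key=lambda i: score[i])
k = n_people // 20
low_group, high_group = ranked[:k], ranked[-k:]
mean_low = sum(exposure[i] for i in low_group) / k
mean_high = sum(exposure[i] for i in high_group) / k
print(f"mean exposure: low group {mean_low:.2f}, high group {mean_high:.2f}")
```

Under these assumed effect sizes the score explains more exposure variance than any single variant, so recall groups taken from its extremes differ more in exposure. Whether that gain survives in practice depends on the validity of each contributing variant as an instrument, which is the question the project sets out to examine.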


Lawlor DA et al. Statistics in medicine 27, 1133-1163 (2008).

McGuire SE. Genome Research 18, 1683-5 (2008).

Beskow LM. Genome Research 20, 705-9 (2010).

Created on Oct. 1, 2015, 9 a.m.