Puzzle of Genes

Improving statistical methods for whole-genome analysis

In the genomic era, large volumes of genotype and phenotype data are publicly available as ‘big genomic data’. Genome-wide genotype information has provided valuable insights into the genetic basis of complex human diseases. It is now increasingly recognised that a whole-genome approach, which uses all or most genetic variants across the genome simultaneously, is useful in complex disease analyses. The approach links pairs of individuals who are not related in the conventional sense, but who can be compared experimentally because they share part of their genome by descent over many generations. This is a paradigm-shifting approach: it enables design-free population genetic analyses that do not require pedigree-informative individuals or known relatives. Combined with advanced statistical methods, the whole-genome approach is a promising tool to dissect genetic architecture and maximise the accuracy of risk prediction for complex diseases, supporting effective precision medicine.
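To make the idea of comparing conventionally unrelated individuals concrete, the following is a minimal sketch of a genomic relationship matrix computed from genome-wide SNP genotypes. It uses one common standardised estimator (each SNP centred by twice its allele frequency and scaled by its expected variance); this is an illustrative assumption, not necessarily the estimator used in the group's own methods. The function name and the toy genotypes are hypothetical.

```python
import math


def genomic_relationship_matrix(genotypes):
    """Estimate pairwise genomic relationships from SNP allele counts.

    genotypes: list of individuals, each a list of allele counts (0, 1 or 2),
    one entry per SNP. Returns an n x n matrix as a list of lists.
    """
    n = len(genotypes)
    m = len(genotypes[0])

    # Frequency of the counted allele at each SNP, estimated from the sample.
    freqs = [sum(ind[i] for ind in genotypes) / (2 * n) for i in range(m)]

    # Monomorphic SNPs carry no information and would divide by zero.
    used = [i for i in range(m) if 0.0 < freqs[i] < 1.0]
    if not used:
        raise ValueError("no polymorphic SNPs available")

    # Standardise each genotype: centre by 2p, scale by sqrt(2p(1 - p)).
    z = [
        [(ind[i] - 2 * freqs[i]) / math.sqrt(2 * freqs[i] * (1 - freqs[i]))
         for i in used]
        for ind in genotypes
    ]

    # Relationship between j and k: average product of standardised genotypes.
    return [
        [sum(a * b for a, b in zip(z[j], z[k])) / len(used) for k in range(n)]
        for j in range(n)
    ]


# Toy data (hypothetical): 4 individuals genotyped at 5 SNPs.
example_genotypes = [
    [0, 1, 2, 1, 0],
    [0, 1, 2, 1, 1],
    [2, 1, 0, 0, 1],
    [1, 0, 1, 2, 2],
]
grm = genomic_relationship_matrix(example_genotypes)
```

In this toy example the first two individuals share similar genotypes, so their off-diagonal entry is larger than the entry between the first and third individuals. Real analyses use hundreds of thousands of SNPs and optimised software rather than nested Python lists.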

We are currently developing advanced whole-genome methods for causative variant detection, genotype-environment interaction analysis and dissection of the dynamic genetic architecture of complex traits, with the aim of maximising the accuracy of individual risk prediction.

Current research projects

Advanced whole-genome approaches for causative variant detection and individual risk prediction of complex traits in human populations

(PI: Associate Professor Sang Hong Lee)

The genomics era has demonstrated the true complexity of genetic traits, but it also brings the promise of personalised genomic medicine, in which diagnosis and treatment are tailored to individuals based on profiles recorded in their genome. More feasible and realistic is the opportunity of ‘stratified medicine’, in which individuals are classified into treatment-relevant sub-groups based on profiles that incorporate information from both genomic and environmental risk factors. This project aims to develop advanced statistical methods to better detect causative variants and to better predict an individual’s risk of disease. We have pioneered whole-genome methods and propose to improve upon them in several ways. These include a flexible Bayesian framework to elucidate the genetic architecture of complex traits and a linear mixed model to capture currently undetected genetic variance. We will apply our new methods to large data sets, including next-generation sequencing data. Our methods may lead to individual disease risk predictions that have clinical utility.
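As a point of reference for the linear mixed model mentioned above, a standard whole-genome formulation can be written as follows. This is the textbook form, given here as an assumption about the general model class rather than the specific model developed in this project; the symbols are introduced for illustration only.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon},
\qquad
\mathbf{g} \sim N(\mathbf{0},\, \mathbf{A}\sigma_g^2),
\qquad
\boldsymbol{\varepsilon} \sim N(\mathbf{0},\, \mathbf{I}\sigma_e^2), \\
\operatorname{Var}(\mathbf{y}) &= \mathbf{A}\sigma_g^2 + \mathbf{I}\sigma_e^2,
\qquad
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{aligned}
```

Here y is the vector of phenotypes, X and β are the fixed-effect design matrix and coefficients, g is the vector of total genetic effects captured by genome-wide variants, A is a genomic relationship matrix estimated from those variants, and h² is the proportion of phenotypic variance explained by them.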

Multivariate whole-genome estimation and prediction analysis of genomics data for complex diseases

(PI: Associate Professor Sang Hong Lee)

Complex diseases are caused by a combination of multiple genes and environmental effects, which may also affect other traits and diseases. The relative importance of pleiotropic effects is expressed by the genetic correlation, which is often high between diseases and traits. This implies that fitting multiple diseases and traits jointly in one model is important for shedding light on the etiology of complex diseases. Genomics data, combined with advanced statistical tools, provide a plausible strategy to identify the latent mechanisms underlying the multivariate nature of diseases and to increase the accuracy of genomic risk prediction. In this project, we develop multivariate whole-genome estimation and prediction analyses of genomics data for complex diseases, which may lead to improved and personalised treatments.
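The genetic correlation referred to above is the genetic covariance between two traits scaled by the square root of the product of their genetic variances. A minimal sketch of that calculation is shown below; the function name and the variance components passed to it are hypothetical illustrations, not estimates from any real data set.

```python
import math


def genetic_correlation(var_g1, var_g2, cov_g12):
    """Genetic correlation between two traits from their (co)variance components.

    var_g1, var_g2: genetic variances of trait 1 and trait 2 (must be positive).
    cov_g12: genetic covariance between the two traits.
    """
    if var_g1 <= 0 or var_g2 <= 0:
        raise ValueError("genetic variances must be positive")
    return cov_g12 / math.sqrt(var_g1 * var_g2)


# Hypothetical variance components, for illustration only.
r_g = genetic_correlation(var_g1=0.4, var_g2=0.9, cov_g12=0.3)
```

In practice the three components would be estimated jointly from a multivariate model rather than supplied by hand, and the estimate would be reported with its standard error.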