Linkage Disequilibrium: Definition and Significance in Genomics
Linkage disequilibrium (LD) is the nonrandom association of alleles at different loci in a given population: the observed frequencies of allele combinations deviate from those expected if the alleles were independent. 1
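The deviation from independence described above is usually written with the standard two-locus coefficient. The symbols are introduced here for illustration and are not taken from the cited source: p_A and p_B are the frequencies of one allele at each of two loci, and p_AB is the frequency of the haplotype carrying both.

```latex
D_{AB} = p_{AB} - p_A\,p_B,
\qquad
r^2 = \frac{D_{AB}^2}{p_A\,(1 - p_A)\,p_B\,(1 - p_B)}
```

Under independence p_AB = p_A p_B, so D_AB = 0 and r² = 0; r² reaches 1 only when the two loci carry the same information.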
Key Characteristics of Linkage Disequilibrium
- LD commonly arises when alleles at different loci are physically linked on a chromosome and are therefore coinherited non-randomly, so that their frequencies in a population are correlated 1
- LD reflects genetic distance, the history of mutation events, and changes in population dynamics 1
- The strength of LD between loci decreases with increasing physical and genetic distance between them 2
- LD patterns vary significantly across different populations due to their unique demographic histories 3
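The decay of LD with genetic distance can be illustrated with the textbook deterministic model, in which the coefficient D shrinks by a factor (1 - c) each generation, where c is the recombination fraction between the two loci. This is a minimal sketch that assumes a large, randomly mating population with no selection, drift, or migration; the function name and example values are hypothetical.

```python
def ld_after_generations(d0: float, c: float, t: int) -> float:
    """Expected D after t generations, given initial D (d0) and recombination fraction c.

    Assumes the idealised model D_t = D_0 * (1 - c)**t.
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if t < 0:
        raise ValueError("number of generations must be non-negative")
    return d0 * (1.0 - c) ** t


if __name__ == "__main__":
    # Illustrative values only: tightly linked loci keep LD far longer than unlinked ones.
    for c in (0.001, 0.01, 0.1, 0.5):
        remaining = ld_after_generations(0.25, c, 50)
        print(f"c = {c:<5}  D after 50 generations = {remaining:.6f}")
```

Under this model, loci separated by a larger recombination fraction lose their association faster, which is one reason LD weakens with distance.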
Measurement and Quantification of LD
- Common statistical measures of LD include the squared correlation coefficient (r²) for phased genotypes and a related measure for unphased genotypes 4
- At equilibrium, LD values are determined by sample size, recombination frequency, effective population size, and mating system 4
- Testing for LD when linkage phase is unknown requires statistical methods that account for the ambiguity of unobserved haplotypes 5
- The likelihood-ratio statistic is commonly used but can have biased type I error rates; composite statistics provide more reliable alternatives 5
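The measures listed above can be sketched for a pair of biallelic loci. The first function computes D, D' and r² from phased haplotypes coded 0/1. The second uses the squared Pearson correlation of allele dosages (0, 1, 2) as a stand-in when phase is unknown; treating that as the "related measure" for unphased genotypes is an assumption of this sketch, and the function names are hypothetical.

```python
def ld_from_haplotypes(haplotypes):
    """Return (D, D', r2) from phased haplotypes given as (a, b) tuples coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


def genotype_r2(dosage_a, dosage_b):
    """Squared Pearson correlation of allele dosages (0/1/2) for unphased genotypes."""
    if len(dosage_a) != len(dosage_b):
        raise ValueError("dosage vectors must have equal length")
    n = len(dosage_a)
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(dosage_a, dosage_b))
    var_a = sum((x - mean_a) ** 2 for x in dosage_a)
    var_b = sum((y - mean_b) ** 2 for y in dosage_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("both loci must be polymorphic")
    return cov * cov / (var_a * var_b)


if __name__ == "__main__":
    haps = [(1, 1)] * 5 + [(0, 0)] * 5  # toy data: perfectly associated loci
    print(ld_from_haplotypes(haps))
    print(genotype_r2([0, 1, 2, 1], [0, 1, 2, 1]))
```

The haplotype-based function needs phase; the dosage-based one does not, which is why genotype correlations are often used when haplotypes are unobserved.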
Applications in Genomics and Medicine
- LD is used as a tool for gene mapping and estimation of effective population size 4
- In genome-wide association studies (GWAS), LD is crucial for understanding the relationship between identified variants and potential causal variants 1
- LD patterns inform the multiple testing threshold used in GWAS (typically p < 5×10⁻⁸), which corresponds to a Bonferroni correction of α = 0.05 over an estimated one million independent regions across the genome 1
- LD is exploited in transcriptome-wide association studies (TWAS) to identify gene-trait associations by integrating GWAS and gene expression datasets 1
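The genome-wide significance threshold mentioned above follows from simple arithmetic: a family-wise error rate of 0.05 divided by roughly one million independent regions. The sketch below reproduces that calculation and, for contrast, shows what the threshold would be if every tested variant were counted as independent. The figure of ten million tested variants is a hypothetical value chosen only for illustration.

```python
def bonferroni_threshold(alpha: float, n_independent_tests: float) -> float:
    """Per-test significance threshold for a given family-wise error rate."""
    if n_independent_tests <= 0:
        raise ValueError("number of tests must be positive")
    return alpha / n_independent_tests


if __name__ == "__main__":
    # About one million independent regions, as stated in the text.
    ld_aware = bonferroni_threshold(0.05, 1_000_000)
    # Hypothetical: 10 million tested variants treated as if independent.
    naive = bonferroni_threshold(0.05, 10_000_000)
    print(f"LD-aware threshold: {ld_aware:.1e}")
    print(f"Naive threshold:    {naive:.1e}")
```

Counting correlated variants as separate tests lowers the threshold further than the effective number of independent tests requires.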
Importance in Population Genetics
- LD serves as a sensitive indicator of population genetic forces that structure a genome 6
- Evolutionary biologists and human geneticists use LD to understand past evolutionary and demographic events 6
- LD patterns differ between isolated populations and more general populations, with isolated populations often showing higher levels of LD, particularly on sex chromosomes 3
- LD is widely distributed in anonymous regions of the human genome and may allow more accurate measurement of small genetic distances than standard linkage analysis 2
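The link between LD and effective population size can be sketched with Sved's classical approximation, E[r²] ≈ 1 / (1 + 4·Ne·c), where Ne is the effective population size and c the recombination fraction. This approximation is swapped in here for illustration and is not stated in the cited sources; it ignores sampling error and mutation, and the function names are hypothetical.

```python
def expected_r2(ne: float, c: float) -> float:
    """Approximate equilibrium E[r2] for effective size ne and recombination fraction c."""
    if ne <= 0 or c < 0:
        raise ValueError("ne must be positive and c non-negative")
    return 1.0 / (1.0 + 4.0 * ne * c)


def ne_from_r2(r2: float, c: float) -> float:
    """Invert the approximation to estimate ne from an observed mean r2."""
    if not 0.0 < r2 <= 1.0 or c <= 0:
        raise ValueError("need 0 < r2 <= 1 and c > 0")
    return (1.0 / r2 - 1.0) / (4.0 * c)


if __name__ == "__main__":
    # Illustrative values only: smaller populations retain more LD at the same distance.
    for ne in (100, 1_000, 10_000):
        print(f"Ne = {ne:>6}  expected r2 at c = 0.001: {expected_r2(ne, 0.001):.3f}")
```

Under this approximation, smaller (for example, isolated) populations are expected to show higher LD at a given genetic distance.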
Limitations and Considerations
- LD patterns vary substantially across different ancestral populations, affecting the accuracy of imputation and genetic analyses 1
- Many genotyping platforms have been designed based on European LD patterns, potentially limiting their utility in non-European populations 1
- Because LD makes genetic variants non-independent, standard multiple testing corrections (such as Bonferroni) applied per variant may be overly conservative in genomic studies 1
- LD can lead to spurious prioritization of non-causal genes in methods like TWAS, especially when using expression data from non-trait-related tissues 1
Clinical and Research Implications
- Understanding LD is essential for fine-mapping procedures that aim to identify causal genetic variations affecting traits of interest 1
- LD patterns influence the design of genotyping arrays and the selection of tagging SNPs for genetic studies 1
- The extent of LD in different populations is an important consideration when selecting ascertainment strategies for studies of complex diseases 3
- Multiple testing adjustment in GWAS should account for LD; treating every variant as an independent test makes the correction overly conservative 1
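Tagging SNP selection, mentioned above, can be sketched as a greedy procedure over a pairwise r² matrix: repeatedly pick the SNP that captures the most not-yet-tagged SNPs at or above a threshold. This is a simplified illustration, not the method of any particular tool; the function name is hypothetical and the default threshold of 0.8 is an assumption.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Return indices of tag SNPs chosen greedily from a symmetric pairwise r2 matrix.

    r2 is a list of lists; r2[i][j] is the LD between SNP i and SNP j.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Choose the SNP covering the most untagged SNPs; ties go to the lowest index.
        best = max(
            sorted(untagged),
            key=lambda i: sum(
                1 for j in untagged if j != i and r2[i][j] >= threshold
            ),
        )
        tags.append(best)
        # Everything in strong LD with the chosen tag is now represented by it.
        untagged -= {j for j in untagged if j == best or r2[best][j] >= threshold}
    return tags


if __name__ == "__main__":
    # Toy matrix: SNPs 0 and 1 are in strong LD; SNP 2 is nearly independent of both.
    toy_r2 = [
        [1.0, 0.9, 0.1],
        [0.9, 1.0, 0.2],
        [0.1, 0.2, 1.0],
    ]
    print(greedy_tag_snps(toy_r2))
```

Because r² matrices differ between populations, the same procedure can return different tag sets for different ancestries, which is the concern raised for arrays designed around European LD patterns.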