Article

Fine-Scale Estimation of Location of Birth from Genome-Wide Single-Nucleotide Polymorphism Data

Journal

GENETICS
Volume 190, Issue 2, Pages 669-U583

Publisher

GENETICS SOC AM
DOI: 10.1534/genetics.111.135657

Funding

  1. Academy of Finland [104781, 120315, 129269, 1114194]
  2. University Hospital Oulu, Biocenter, University of Oulu, Finland [75617]
  3. European Commission [QLG1-CT-2000-01643]
  4. National Heart, Lung, and Blood Institute [5R01HL087679-02]
  5. SNP Typing for Association with Multiple Phenotypes from Existing Epidemiologic Data program [1RL1MH083268-01]
  6. National Institutes of Health/National Institute of Mental Health [5R01MH63706:02]
  7. European Network for Genetic and Genomic Epidemiology (ENGAGE) [HEALTH-F4-2007-201413]
  8. Medical Research Council (MRC), United Kingdom [G0500539, G0600705]
  9. Biocentrum Helsinki
  10. National Institute for Health Research (NIHR) Comprehensive Biomedical Research Centre Imperial College Healthcare National Health Service Trust
  11. British Heart Foundation [SP/04/002]
  12. MRC [G0700931]
  13. Wellcome Trust [084723/Z/08/Z]
  14. NIHR [RP-PG-0407-10371]
  15. European Union [HEALTH-2007-201550 HyperGenes]
  16. ENGAGE consortium [P12892_DFHM]
  17. Research Council UK
  18. MRC [G0600705, G0700931, G0601966] Funding Source: UKRI
  19. Medical Research Council [G0801056B, G0700931, G0600705, G0601966] Funding Source: researchfish

Systematic nonrandom mating in populations results in genetic stratification and is predominantly caused by geographic separation, providing the opportunity to infer individuals' birthplace from genetic data. Such inference has been demonstrated for individuals' country of birth, but here we use data from the Northern Finland Birth Cohort 1966 (NFBC1966) to investigate the characteristics of genetic structure within a population and subsequently develop a method for inferring location to a finer scale. Principal component analysis (PCA) shows that while the first PCs are particularly informative for location, there is also location information in the higher-order PCs, but it cannot be captured by a linear model. We introduce a new method, pcLOCATE, which is able to exploit this information to improve the accuracy of location inference. pcLOCATE uses individuals' PC values to estimate the probability of birth in each town and then averages over all towns to give an estimated longitude and latitude of birth using a fully Bayesian model. We apply pcLOCATE to the NFBC1966 data to estimate parental birthplace, testing with successively more PCs and finding the model with the top 23 PCs most accurate, with a median distance of 23 km between the estimated and the true location. pcLOCATE predicts the most recent residence of NFBC1966 individuals to a median distance of 47 km. We also apply pcLOCATE to Indian individuals from the London Life Sciences Prospective Population Study (LOLIPOP) data, and find that birthplace is predicted to a median distance of 54 km from the true location. A method with such accuracy is potentially valuable in population genetics and forensics.
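
The two-step idea in the abstract (a probability of birth for each town, then an average over towns) can be illustrated with a minimal Python sketch. This is not the authors' pcLOCATE implementation: the independent Gaussian model for PC values within each town, the fixed per-town means and standard deviations, and the names `town_posteriors`, `estimate_location` and `haversine_km` are all assumptions made for illustration, and the paper's fully Bayesian treatment is not reproduced here.

```python
import math


def town_posteriors(pcs, town_means, town_sds, town_priors):
    """Posterior probability of each town given one individual's PC values.

    Assumption (not from the paper): within a town, each PC is an
    independent Gaussian with a known mean and standard deviation.
    """
    log_post = []
    for means, sds, prior in zip(town_means, town_sds, town_priors):
        # Gaussian log-likelihood, dropping the constant shared by all towns.
        log_lik = sum(
            -0.5 * ((x - m) / s) ** 2 - math.log(s)
            for x, m, s in zip(pcs, means, sds)
        )
        log_post.append(math.log(prior) + log_lik)
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_location(posteriors, town_coords):
    """Posterior-weighted average of town (longitude, latitude) pairs."""
    lon = sum(p * c[0] for p, c in zip(posteriors, town_coords))
    lat = sum(p * c[1] for p, c in zip(posteriors, town_coords))
    return lon, lat


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical toy inputs: two towns described by two PCs each.
town_means = [[0.0, 0.0], [5.0, 5.0]]
town_sds = [[1.0, 1.0], [1.0, 1.0]]
town_priors = [0.5, 0.5]
town_coords = [(25.5, 65.0), (27.0, 66.5)]  # (longitude, latitude)

post = town_posteriors([2.5, 2.5], town_means, town_sds, town_priors)
est_lon, est_lat = estimate_location(post, town_coords)
error_km = haversine_km(est_lon, est_lat, *town_coords[0])
```

Averaging longitude and latitude directly is only reasonable over a small region such as the one studied; an accuracy summary like the median distances quoted above would come from applying `haversine_km` to each individual's estimated and true location and taking the median.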
