Deakin University

A Hybrid Supervised Approach to Human Population Identification Using Genomics Data

Version 2 2024-06-05, 11:51
Version 1 2019-07-31, 16:38
journal contribution
posted on 2024-06-05, 11:51 authored by S Araghi, T Nguyen
Single nucleotide polymorphisms (SNPs) are one type of genetic variation; each SNP represents a difference in a single DNA building block, namely a nucleotide. Previous research has demonstrated that SNPs can be used to identify the correct source population of an individual. In addition, variations in DNA sequences influence human diseases, so SNP studies are helpful for personalised medicine and treatment. In the literature, unsupervised clustering methods, especially principal component analysis (PCA), have been popular for studying population structure. In this study, we investigate supervised approaches, particularly the LASSO multinomial regression classification method, for recognising an individual's genetic population of origin. We then introduce PCA-LASSO, an extension of the LASSO method that benefits from the advantageous characteristics of both PCA and LASSO regression. Experimental results on the 1000 Genomes Project dataset show that PCA-LASSO predicts an individual's population of origin with high accuracy.
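
The abstract does not spell out how PCA and LASSO are combined, so the following is only a minimal sketch of one plausible reading: project genotypes onto a few principal components, then fit an L1-penalised (LASSO) multinomial logistic regression on those scores. Everything here is an assumption made for illustration: the function names (fit_pca, project, fit_l1_multinomial, predict), the hyperparameters (lam, lr, n_iter, five components), the proximal-gradient solver, and the synthetic genotype matrix, which stands in for real data and is not drawn from the 1000 Genomes Project. The paper's actual construction may differ.

```python
import numpy as np

def fit_pca(X, n_components):
    """Learn centring, principal axes and score scales from training genotypes."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = Vt[:n_components]
    scale = ((X - mean) @ axes.T).std(axis=0)
    return mean, axes, scale

def project(X, mean, axes, scale):
    """Map genotypes to unit-variance principal-component scores."""
    return (X - mean) @ axes.T / scale

def softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def fit_l1_multinomial(Z, y, n_classes, lam=0.01, lr=0.5, n_iter=500):
    """L1-penalised multinomial logistic regression via proximal gradient descent."""
    n, d = Z.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(n_iter):
        residual = softmax(Z @ W + b) - Y
        W = W - lr * (Z.T @ residual) / n
        # Soft-thresholding is the proximal step for the L1 (LASSO) penalty.
        W = np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)
        b = b - lr * residual.mean(axis=0)  # intercepts are left unpenalised
    return W, b

def predict(Z, W, b):
    return np.argmax(Z @ W + b, axis=1)

# Hypothetical synthetic data: 3 populations, each with its own allele
# frequencies; genotypes are counts (0, 1 or 2) of one allele per SNP.
rng = np.random.default_rng(0)
n_pops, n_per_pop, n_snps = 3, 60, 200
freqs = rng.uniform(0.1, 0.9, size=(n_pops, n_snps))
labels = np.repeat(np.arange(n_pops), n_per_pop)
X = rng.binomial(2, freqs[labels]).astype(float)

# Hold out a third of the individuals for evaluation.
order = rng.permutation(len(labels))
train, test = order[:120], order[120:]

# PCA is fitted on training individuals only, then applied to both splits.
mean, axes, scale = fit_pca(X[train], n_components=5)
W, b = fit_l1_multinomial(project(X[train], mean, axes, scale), labels[train], n_pops)
accuracy = np.mean(predict(project(X[test], mean, axes, scale), W, b) == labels[test])
print(f"held-out accuracy: {accuracy:.2f}")
```

In this sketch, PCA reduces the SNP matrix to a handful of uncorrelated scores, and the L1 penalty can then shrink the coefficients of uninformative components towards zero. The simulated populations are deliberately well separated, so the printed accuracy says nothing about performance on real genomic data.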

History

Journal

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Volume

18

Season

March-April

Pagination

443-454

Location

United States

ISSN

1545-5963

eISSN

1557-9964

Language

English

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2019, IEEE

Issue

2

Publisher

IEEE Computer Society