An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes

dc.contributor.author: Solis-Reyes, Stephen
dc.contributor.author: Avino, Mariano
dc.contributor.author: Poon, Art
dc.contributor.author: Kari, Lila
dc.date.accessioned: 2026-05-13T16:57:40Z
dc.date.available: 2026-05-13T16:57:40Z
dc.date.issued: 2018-11-14
dc.description: © 2018 Solis-Reyes et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
dc.description.abstract: For many disease-causing virus species, global diversity is clustered into a taxonomy of subtypes with clinical significance. In particular, the classification of infections among the subtypes of human immunodeficiency virus type 1 (HIV-1) is a routine component of clinical management, and there are now many classification algorithms available for this purpose. Although several of these algorithms are similar in accuracy and speed, the majority are proprietary and require laboratories to transmit HIV-1 sequence data over the network to remote servers. This potentially exposes sensitive patient data to unauthorized access, and makes it impossible to determine how classifications are made and to maintain the data provenance of clinical bioinformatic workflows. We propose an open-source supervised and alignment-free subtyping method (Kameris) that operates on k-mer frequencies in HIV-1 sequences. We performed a detailed study of the accuracy and performance of subtype classification in comparison to four state-of-the-art programs. Based on our testing data set of manually curated real-world HIV-1 sequences (n = 2,784), Kameris obtained an overall accuracy of 97%, which matches or exceeds all other tested software, with a processing rate of over 1,500 sequences per second. Furthermore, our fully standalone general-purpose software provides key advantages in terms of data security and privacy, transparency and reproducibility. Finally, we show that our method is readily adaptable to subtype classification of other viruses including dengue, influenza A, and hepatitis B and C virus.
dc.description.sponsorship: Natural Sciences and Engineering Research Council of Canada (NSERC), Discovery Grant R2824A01 || Canadian Institutes of Health Research (CIHR), PJT-155990 || CIHR, PJT-156178 || Government of Canada through Genome Canada and the Ontario Genomic Institute, OGI-131.
dc.identifier.uri: https://doi.org/10.1371/journal.pone.0206409
dc.identifier.uri: https://hdl.handle.net/10012/23307
dc.language.iso: en
dc.publisher: Public Library of Science
dc.relation.ispartofseries: PLoS ONE; 13(11); e0206409
dc.relation.uri: https://github.com/stephensolis/kameris-experiments
dc.rights: Attribution 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: HIV-1
dc.subject: genomics
dc.subject: sequence alignment
dc.subject: open source software
dc.subject: genomic databases
dc.subject: sequence databases
dc.subject: viral genomics
dc.subject: computer software
dc.title: An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes
dc.type: Article
dcterms.bibliographicCitation: Solis-Reyes S, Avino M, Poon A, Kari L (2018) An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes. PLoS ONE 13(11): e0206409. https://doi.org/10.1371/journal.pone.0206409
uws.contributor.affiliation1: Faculty of Mathematics
uws.contributor.affiliation2: David R. Cheriton School of Computer Science
uws.peerReviewStatus: Reviewed
uws.scholarLevel: Faculty
uws.typeOfResource: Text [en]

Files

Original bundle

Name: file (59).pdf
Size: 1.64 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 4.47 KB
Format: Item-specific license agreed upon to submission