A molecular barcode and online tool to identify and map imported infection with Plasmodium vivax
Imported cases present a considerable challenge to the elimination of malaria. Traditionally, patient travel history has been used to identify imported cases, but the long-latency liver stages of Plasmodium vivax confound this approach. Molecular tools to identify and map imported cases offer a more robust approach that can be combined with drug resistance and other surveillance markers in high-throughput, population-based genotyping frameworks. Using a machine learning approach incorporating hierarchical FST (HFST) and decision tree (DT) analysis applied to 831 P. vivax genomes from 20 countries, we identified a 28-single nucleotide polymorphism (SNP) barcode with high capacity to predict the country of origin. The Matthews correlation coefficient (MCC), which measures the quality of the classifications and ranges from −1 (total disagreement) to 1 (perfect prediction), exceeded 0.9 in 15 countries in cross-validation evaluations. When combined with an existing 37-SNP P. vivax barcode, the resulting 65-SNP panel exhibits MCC scores exceeding 0.9 in 17 countries with up to 30% missing data. As a secondary objective, several genes were identified with moderate MCC scores (median MCC ranging from 0.54 to 0.68), making them amenable as markers for rapid testing using low-throughput genotyping approaches. A likelihood-based classifier framework was established that supports analysis of missing data and polyclonal infections. To facilitate investigator-led analyses, the likelihood framework is provided as a web-based, open-access platform (vivaxGEN-geo) to support the analysis and interpretation of data produced with either the 28-SNP core or the full 65-SNP barcode. These tools can be used by malaria control programs to identify the main reservoirs of infection so that resources can be focused where they are needed most.
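As an illustration of the MCC described above, the following minimal Python sketch computes the standard binary MCC for one country treated as the positive class against all others. The one-vs-rest framing, the function name `mcc_one_vs_rest`, and the example labels are assumptions made for illustration; they are not taken from the study's pipeline or data.

```python
from math import sqrt


def mcc_one_vs_rest(true_labels, predicted_labels, country):
    """Binary MCC with `country` as the positive class (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == country and pred == country:
            tp += 1  # sample from the country, assigned to it
        elif truth != country and pred == country:
            fp += 1  # sample from elsewhere, assigned to the country
        elif truth == country and pred != country:
            fn += 1  # sample from the country, assigned elsewhere
        else:
            tn += 1  # sample from elsewhere, assigned elsewhere
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a marginal is empty and MCC is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up labels for two hypothetical countries, "A" and "B".
truth = ["A", "A", "B", "B"]
perfect = ["A", "A", "B", "B"]
inverted = ["B", "B", "A", "A"]

print(mcc_one_vs_rest(truth, perfect, "A"))   # 1.0 (perfect prediction)
print(mcc_one_vs_rest(truth, inverted, "A"))  # -1.0 (total disagreement)
```

Because the MCC uses all four cells of the confusion matrix, it remains informative when countries contribute very different numbers of samples, which plain accuracy does not.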