An Improved Plasmodium cynomolgi Genome Assembly Reveals an Unexpected Methyltransferase Gene Expansion

16 Jun 2017
Pasini EM, Böhme U, Rutledge GG, Voorberg-Van der Wel A, Sanders M, Berriman M, Kocken CH, Otto TD

BACKGROUND

Plasmodium cynomolgi, a non-human primate malaria parasite species, has been an important model parasite since its discovery in 1907. Similarities in the biology of P. cynomolgi to the closely related, but less tractable, human malaria parasite P. vivax make it the model parasite of choice for liver biology and vaccine studies pertinent to malaria. Molecular and genome-scale studies of P. cynomolgi have relied on the current reference genome sequence, which remains highly fragmented with 1,649 unassigned scaffolds and little representation of the subtelomeres.

METHODS

Using long-read sequence data (Pacific Biosciences SMRT technology), we assembled and annotated a new reference genome sequence, PcyM, sourced from an Indian rhesus monkey. We compared the newly assembled genome sequence with those of several other Plasmodium species, including a re-annotated P. coatneyi assembly.

RESULTS

The new PcyM genome assembly is of significantly higher quality than the existing reference, comprising only 56 pieces, no gaps and an improved average gene length. Detailed manual curation has ensured a comprehensive annotation of the genome with 6,632 genes, nearly 1,000 more than previously attributed to P. cynomolgi. The new assembly also has an improved representation of the subtelomeric regions, which account for nearly 40% of the sequence. Within the subtelomeres, we identified more than 1,300 Plasmodium interspersed repeat (pir) genes, as well as a striking expansion of 36 methyltransferase pseudogenes that originated from a single copy on chromosome 9.

CONCLUSIONS

The manually curated PcyM reference genome sequence is an important new resource for the malaria research community. The high quality and contiguity of the data have enabled the discovery of a novel expansion of methyltransferase genes in the subtelomeres and illustrate the new comparative genomics capabilities that are being unlocked by complete reference genomes.