Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads

dc.contributor.author Pearman, William S
dc.contributor.author Freed, Nikki E
dc.contributor.author Silander, Olin K
dc.coverage.spatial England
dc.date.accessioned 2023-11-05T22:23:27Z
dc.date.available 2023-11-05T22:23:27Z
dc.date.issued 2020-05
dc.identifier.citation (2020). BMC Bioinformatics, 21(1), 220-.
dc.identifier.issn 1471-2105
dc.identifier.uri https://hdl.handle.net/2292/66436
dc.description.abstract Background: The first step in understanding ecological community diversity and dynamics is quantifying community membership. An increasingly common method for doing so is metagenomics. Because of the rapidly increasing popularity of this approach, a large number of computational tools and pipelines are available for analysing metagenomic data. However, the majority of these tools have been designed and benchmarked using highly accurate short-read data (i.e. Illumina), with few studies benchmarking classification accuracy for long, error-prone reads (PacBio or Oxford Nanopore). In addition, few tools have been benchmarked for non-microbial communities. Results: Here we compare simulated long reads from Oxford Nanopore and Pacific Biosciences (PacBio) with high-accuracy Illumina read sets to systematically investigate the effects of sequence length and taxon type on classification accuracy for metagenomic data from both microbial and non-microbial communities. We show that, very generally, classification accuracy is far lower for non-microbial communities, even at low taxonomic resolution (e.g. family rather than genus). We then show that for two popular taxonomic classifiers, long reads can significantly increase classification accuracy, and this is most pronounced for non-microbial communities. Conclusions: This work provides insight into the expected accuracy of metagenomic analyses for different taxonomic groups, and establishes the point at which read length becomes more important than error rate for assigning the correct taxon.
dc.format.medium Electronic
dc.language eng
dc.publisher Springer Nature
dc.relation.ispartofseries BMC bioinformatics
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.subject Sequence Analysis, DNA
dc.subject Computer Simulation
dc.subject Metagenomics
dc.subject Eukaryota
dc.subject High-Throughput Nucleotide Sequencing
dc.subject Nanopore Sequencing
dc.subject Community composition
dc.subject Illumina
dc.subject Long read
dc.subject Nanopore
dc.subject 3107 Microbiology
dc.subject 31 Biological Sciences
dc.subject 3102 Bioinformatics and Computational Biology
dc.subject 3103 Ecology
dc.subject Human Genome
dc.subject Genetics
dc.subject Science & Technology
dc.subject Life Sciences & Biomedicine
dc.subject Biochemical Research Methods
dc.subject Biotechnology & Applied Microbiology
dc.subject Mathematical & Computational Biology
dc.subject Biochemistry & Molecular Biology
dc.subject SPECIES DEFINITION
dc.subject DNA
dc.subject CLASSIFICATION
dc.subject COMMUNITIES
dc.subject TAXONOMY
dc.subject 01 Mathematical Sciences
dc.subject 06 Biological Sciences
dc.subject 08 Information and Computing Sciences
dc.subject 46 Information and computing sciences
dc.subject 49 Mathematical sciences
dc.title Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads
dc.type Journal Article
dc.identifier.doi 10.1186/s12859-020-3528-4
pubs.issue 1
pubs.begin-page 220
pubs.volume 21
dc.date.updated 2023-10-28T02:43:57Z
dc.rights.holder Copyright: The authors
dc.identifier.pmid 32471343 (pubmed)
pubs.author-url https://www.ncbi.nlm.nih.gov/pubmed/32471343
pubs.publication-status Published
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess
pubs.subtype research-article
pubs.subtype Journal Article
pubs.elements-id 852969
dc.identifier.eissn 1471-2105
dc.identifier.pii 10.1186/s12859-020-3528-4
pubs.number 220
pubs.record-created-at-source-date 2023-10-28
pubs.online-publication-date 2020-05-29

