BIOCOM-PIPE: a new user-friendly metabarcoding pipeline for the characterization of microbial diversity from 16S, 18S and 23S rRNA gene amplicons
Abstract
Background: The ability to compare samples or studies easily using metabarcoding
so as to better interpret microbial ecology results is an upcoming challenge. A growing
number of metabarcoding pipelines are available, each with its own benefits and
limitations. However, very few have been developed to offer the opportunity to
characterize various microbial communities (e.g., archaea, bacteria, fungi,
photosynthetic microeukaryotes) with the same tool.
Results: BIOCOM-PIPE is a flexible and independent suite of tools for processing data
from high-throughput sequencing technologies (Roche 454 and Illumina platforms),
focused on the diversity of archaeal, bacterial, fungal, and photosynthetic
microeukaryote amplicons. Various original methods were implemented in BIOCOM-PIPE to
(1) remove chimeras based on read abundance, (2) align sequences with structure-based
alignments of RNA homologs using covariance models, and (3) improve OTU consistency
with a post-clustering tool (ReClustOR) that relies on a reference OTU database. A
comparison with two other pipelines (FROGS and mothur) and with an Amplicon Sequence
Variant definition showed that BIOCOM-PIPE was better at discriminating land use
groups.
Conclusions: The BIOCOM-PIPE pipeline makes it possible to analyze 16S, 18S and
23S rRNA genes in the same packaged tool. The new post-clustering approach defines
a biological database from previously analyzed samples and performs post-clustering
of reads against this reference database by using open-reference clustering. This makes
it easier to compare projects from various sequencing runs, and increases the
congruence among results. The pipeline was developed so that any user can easily add
or modify the components, the databases and the bioinformatics tools, giving
high modularity for each analysis.
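
The open-reference clustering idea mentioned in the Conclusions can be illustrated with a minimal sketch. This is not the ReClustOR implementation, whose details are not given here; it only shows the general principle, assuming a toy position-by-position identity on equal-length sequences and a greedy assignment. The names `identity`, `open_reference_cluster`, the `denovo_` prefix and the threshold value are all hypothetical.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions (equal-length sequences only)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference_cluster(reads, reference, threshold=0.95):
    """Assign each read to its best reference OTU; unmatched reads seed new OTUs.

    reads: list of (read_id, sequence) pairs.
    reference: dict mapping OTU id -> centroid sequence (assumed to come from
    previously analyzed samples).
    Returns (assignments, centroids), where centroids includes any new OTUs.
    """
    centroids = dict(reference)  # copy so the caller's reference is not modified
    assignments = {}
    new_count = 0
    for read_id, seq in reads:
        best_otu, best_score = None, 0.0
        for otu_id, centroid in centroids.items():
            score = identity(seq, centroid)
            if score > best_score:
                best_otu, best_score = otu_id, score
        if best_otu is not None and best_score >= threshold:
            # Closed-reference step: the read joins an existing OTU.
            assignments[read_id] = best_otu
        else:
            # Open step: the read becomes the centroid of a new OTU.
            new_count += 1
            new_otu = f"denovo_{new_count}"
            centroids[new_otu] = seq
            assignments[read_id] = new_otu
    return assignments, centroids


reference = {"ref_1": "ACGTACGTAC", "ref_2": "TTTTGGGGCC"}
reads = [
    ("r1", "ACGTACGTAC"),
    ("r2", "ACGTACGTAT"),
    ("r3", "GGGGGGGGGG"),
    ("r4", "GGGGGGGGGA"),
]
assignments, centroids = open_reference_cluster(reads, reference, threshold=0.9)
```

Because reads from a new sequencing run are first matched against the same reference OTUs, OTU identifiers stay comparable across projects, and only reads with no sufficiently similar reference create new clusters. A real pipeline would replace the toy identity with alignment-based similarity.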