Software

DADA2: Fast, accurate, single-nucleotide resolution for amplicon data

The DADA2 algorithm for the inference of exact amplicon sequence variants (ASVs) from amplicon data is implemented in the dada2 R package available in Bioconductor. The core algorithm replaces the traditional OTU picking step in 16S/18S/ITS marker-gene surveys with the inference of the exact sequences present in the sample after errors are removed. Accessory functions in the R package remove chimeras and assign taxonomy. In addition to the improvements in taxonomic resolution and accuracy, DADA2 improves on OTU methods by increasing the reproducibility, reusability and comprehensiveness of marker-gene analysis. The dada2 R package is actively supported and maintained. Please see the dada2 web site for details and current developments.
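
To make the difference in resolution concrete, here is a minimal Python sketch. It is an illustration only and not the dada2 algorithm, which is implemented in R and additionally models and removes sequencing errors. The two 100-nucleotide sequences, the function names, and the 97% identity threshold (a common choice for OTU clustering) are all assumptions made for this example.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def same_otu(a: str, b: str, threshold: float = 0.97) -> bool:
    """Hypothetical OTU rule: group sequences at or above an identity threshold."""
    return identity(a, b) >= threshold


def same_asv(a: str, b: str) -> bool:
    """Exact-variant rule: sequences are the same variant only if identical."""
    return a == b


# Two made-up 100-nt sequences that differ at a single position.
seq_a = "ACGT" * 25
seq_b = "T" + seq_a[1:]  # first base changed from A to T

print(same_otu(seq_a, seq_b))  # grouped together under the 97% threshold
print(same_asv(seq_a, seq_b))  # kept apart as distinct exact variants
```

Under the assumed threshold the two sequences collapse into one cluster, whereas an exact-variant comparison keeps the single-nucleotide difference visible.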

decontam: Statistical identification of contaminant sequences in marker-gene and metagenomics data

The decontam R package provides several simple statistical methods to identify and visualize contaminant DNA in marker-gene and metagenomic sequencing (MGS) data, allowing contaminants to be removed and a more accurate picture of the sampled communities to be constructed. Removal of contaminants by decontam also helps reduce technical batch effects arising from differences in MGS protocols between sites or labs. Please see the decontam web site for details and current developments.
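
One intuition behind contaminant identification is that a contaminant tends to appear more often in negative controls than in true samples. The following Python sketch illustrates that intuition with a deliberately crude rule. It is not the decontam implementation, which is an R package and relies on statistical tests rather than a raw comparison. The feature names, counts, and function names are hypothetical.

```python
def prevalence(counts: list[int]) -> float:
    """Fraction of samples in which the feature was observed (count > 0)."""
    if not counts:
        raise ValueError("need at least one sample")
    return sum(c > 0 for c in counts) / len(counts)


def flag_by_prevalence(sample_counts: list[int], control_counts: list[int]) -> bool:
    """Crude heuristic (an assumption for illustration): flag a feature when it
    is present in a larger fraction of negative controls than of true samples."""
    return prevalence(control_counts) > prevalence(sample_counts)


# Made-up read counts per feature, in true samples and in negative controls.
true_samples = {"seq1": [120, 98, 143, 110], "seq2": [0, 3, 0, 0]}
neg_controls = {"seq1": [0, 0, 1], "seq2": [5, 7, 4]}

flags = {
    feature: flag_by_prevalence(true_samples[feature], neg_controls[feature])
    for feature in true_samples
}
print(flags)
```

In this toy data, `seq2` is seen in every negative control but in only one of four true samples, so the heuristic flags it, while `seq1` is not flagged. A real analysis would account for sample sizes and uncertainty, which is what the statistical methods in the package are for.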

Bioconductor workflow for microbiome data analysis: from raw reads to community analyses

The R environment for statistical computation is the premier analysis platform for most biological sciences. To make the functionality available in R more accessible to the broader scientific community, we maintain an R workflow that walks through a complete set of analyses on the marker-gene data commonly used in microbiome studies. In addition to leveraging the dada2 R package, this workflow makes heavy use of the phyloseq R package created and maintained by Joey McMurdie and the lab of Susan Holmes at Stanford University.