For environments that lack cultured isolates or are relatively underexplored, researchers are often unable to find an appropriate training set to reveal the taxonomic identity of the extracted sequences [11–13]. However, if previous clone libraries have generated full-length, high-quality 16S rRNA gene sequences, then these sequences can be utilized in a training set and taxonomy framework, potentially increasing the precision of the classification provided by the RDP-NBC. Our primary goal in this study was to test the effect of training set on the RDP-NBC-based classification of Apis mellifera (European honey bee) gut-derived 16S rRNA gene sequences. Insect guts are relatively
underexplored and host novel bacterial groups for which there do not exist close, cultured relatives, making taxonomic assignments for 16S sequences and metatranscriptomic data difficult [14–16]. We also sought to improve the classification of sequences from the honey bee gut by the RDP-NBC through the creation of training sets
that include full-length sequences identified as core honey bee microbiota as part of a phylogenetic framework first put forward by Cox-Foster et al., 2006 and extended by Martinson et al., 2010 [17, 18]. Below we compare the precision and reproducibility of classification of the honey bee gut microbiota using six different training sets: RDP, Greengenes, arb-silva, and custom, honey bee-specific databases generated from each.

Methods

Generating a bee-specific seed alignment

Sequences that corresponded to accession numbers published in analyses of bee-associated microbiota and that were near full
length (at least 1,250 bp) were used to generate the seed alignment for our subsequent analyses (a total of 5,713 sequences were downloaded and 5,158 passed the length threshold) [18–22]. These sequences were clustered at 99% identity, reducing the dataset to 276 representatives. This set of sequences is referred to as the honey bee database (HBDB) throughout. The HBDB sequences were aligned to the arb-silva SSU database (SSURef_108_SILVA_NR_99_11_10_11_opt_v2.arb) using the SINA aligner (v 1.2.9) and visually inspected using ARB. We refer to this custom seed alignment as the arb-silva SSU + honey bee alignment (ASHB). To generate a phylogeny, we used the ASHB as input to RAxML (GTR + γ with 1,000 bootstrap replicates) using a maximum likelihood framework (Stamatakis 2006). This phylogeny was used to inform the taxonomic designations (see below). In addition, we used the RAxML evolutionary placement algorithm to identify the placement of short reads within this framework (raxmlHPC-SSE3 -f v -m GTRGAMMA -n Placement). Alignment (ASHB) and phylogeny are available in TreeBase at http://purl.