Metagenomes were also analyzed with a local BLASTN search against a database of N metabolism genes that we constructed from searches at the NCBI site. The database included the known genes for the enzymes involved in denitrification, DNRA, and anammox (using [12, 52] as guides for the genes to include), as these processes are nitrate reduction pathways. The highly profiled functional genes for nitrification (amoA, amoB, and amoC) and nitrogen
fixation (nifD, nifH, and nifK) were also included. The database contained a total of 111,502 sequences, and a complete list of the genes included in the database can be found in Additional file 2: Table S5. The searches for the genes to include in the database at the NCBI site were to the “Nucleotide” collection of the International Nucleotide Sequence Database Collaboration (DDBJ/EMBL/GenBank) with limits that excluded sequence tagged sites (STSs), third party annotation (TPA) sequences, high throughput genomic (HTG) sequences, patents, and whole genome shotgun (WGS) sequences. Additional limits
were that the search field was gene name and the molecule was genomic DNA/RNA. We also excluded hits that included “complete genome” in any field. (The search query was as follows: “xxxX [Gene Name] AND biol_genomic [PROP] NOT “complete genome” [All Fields]”, where “xxxX” corresponds to the gene that was being searched for, such as “nosZ”.) The local BLASTN was conducted at Case Western Reserve
University’s Genome and Transcriptome Analysis Core facility. A number of sequences in our database were complete chromosome sequences that included genes other than the N metabolism genes we were interested in. If sequences from the metagenomes matched these database entries, they were retained only if the gene region of the BLASTN match was to an N metabolism gene of interest (e.g., if the match between the metagenome sequence and the database entry was to the gene region coding for an N metabolism gene of interest, such as the napA gene, it was kept, but if the match was to a non-N metabolism gene, such as the trpS gene, it was removed). The BLASTN comparison included an e-value cutoff of 10^-5 or lower and a sequence similarity cutoff of 50 base pairs or greater.

Statistical analysis

The Statistical Analysis of Metagenomic Profiles (STAMP) program was used to compare the +NO3- and –N metagenomes by identifying the proportional representation of different metabolic or phylogenetic groups and determining whether they were statistically different between the two metagenomes with two-sided Fisher exact tests [53]. The MG-RAST functional matches at all levels and taxonomic matches at the class level and higher were compared with Fisher exact tests.
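
To illustrate the kind of comparison described above, the following is a minimal Python sketch of a two-sided Fisher exact test on a 2x2 table of sequence counts (sequences assigned to one category versus all other sequences, in each of two metagenomes). It is not the STAMP implementation. The function name and the example counts are hypothetical, and the two-sided p-value is computed under one common definition (summing the probabilities of all tables that are no more likely than the observed one), which is an assumption here rather than something stated in the text. The exact binomial arithmetic is practical only for small counts; real metagenome-scale counts would call for a statistics library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    a: sequences in the category of interest, metagenome 1
    b: sequences in the category of interest, metagenome 2
    c: all other sequences, metagenome 1
    d: all other sequences, metagenome 2
    """
    row1 = a + b          # total sequences in the category
    row2 = c + d          # total sequences outside the category
    col1 = a + c          # total sequences in metagenome 1
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_observed = prob(a)

    # Sum every table whose probability does not exceed the observed one;
    # the small relative tolerance guards against floating-point ties.
    cutoff = p_observed * (1 + 1e-7)
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= cutoff)
    return min(1.0, p_value)


# Hypothetical toy counts, for illustration only.
print(fisher_exact_two_sided(8, 2, 1, 5))
```

In a comparison like the one described, each functional or taxonomic category would give one such table, and the resulting p-values would be examined category by category.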