The lack of explicit sample site geographic location and time (x, y, z, t) is apparent (Figure 3), and for environmental isolates, this may be the most ‘value-added’ component of MIGS compliance. These elements allow genomes to be “put on the map” [20], thus reaping the benefits of, for example, comparisons using environmental data, either collected in situ or interpolated using, e.g., the megx.net GIS Tools [16]. Using the resources of megx.net, any sample site in the ocean where location, depth, and time (x, y, z, t) are known can be supplemented by interpolated environmental data, such as temperature, salinity, phosphate, silicate, nitrate, dissolved oxygen, Apparent Oxygen Utilization (AOU), oxygen saturation, and chlorophyll, at standard depth levels for various time periods [16].
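To illustrate why depth (z) matters for supplementing a sample site with environmental data, the following is a minimal sketch of linear interpolation between standard depth levels. It is not the megx.net implementation, whose interpolation method is not described here; the function name `interpolate_at_depth` and the example profile values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def interpolate_at_depth(depths, values, z):
    """Linearly interpolate a parameter at depth z.

    depths: standard depth levels in metres, sorted ascending (hypothetical).
    values: parameter value at each depth level (e.g. temperature).
    z:      sample depth in metres; must lie within the profile.
    """
    if not depths or len(depths) != len(values):
        raise ValueError("depths and values must be non-empty and equal in length")
    if z < depths[0] or z > depths[-1]:
        raise ValueError("sample depth lies outside the profile; no extrapolation")

    i = bisect_left(depths, z)      # first level at or below the sample depth
    if depths[i] == z:              # sample sits exactly on a standard level
        return values[i]

    z0, z1 = depths[i - 1], depths[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (z - z0) / (z1 - z0)


# Made-up temperature profile (degrees C) at four depth levels (m).
profile_depths = [0, 10, 20, 30]
profile_temps = [20.0, 18.0, 15.0, 14.0]
temp_at_15m = interpolate_at_depth(profile_depths, profile_temps, 15)
```

Without a recorded sample depth, no such lookup is possible, which is one reason the (x, y, z, t) fields add so much value to a genome report.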
Geo-referenced genomes can be viewed in their environmental context on a world map (Panel a of Figure 4), and can be overlaid on numerous map data layers, such as nitrate, phosphate, silicate, and chlorophyll, or the environmental stability (expressed as standard deviations) of a parameter. Having such environmental data easily accessible and integrated with sequenced entities via GCDML reports allows for a rapid, automated “first pass” evaluation of environmental/ecological clusters and outliers (Panels b and c of Figure 4). This process greatly facilitates hypothesis and research question generation, such as: “what are the functional implications of Cyanophage PSS2 being isolated from such a comparatively high nutrient, low oxygen site?” and “what genomic features might be shared among isolates from similar habitats, such as the Sargasso Sea cluster?” Having such data accessible narrows the search time and space as researchers design comparative genomic, and even laboratory, studies.
Discussion

We have manually curated MIGS-compliant GCDML reports for the 30 sequenced marine phage genomes currently available (Figure 1 and Figure 3). This study (i) is the first to publish a set of legacy MIGS reports for public genomes, (ii) is the first to publish MIGS reports for phage, and (iii) helps to establish ecogenomic trends within the sequenced marine phage genome collection using contextual data, with the end-goal of capturing richer descriptions of our public collection of genomes [8].

Towards consistency and persistence of contextual data

This work shows that MIGS-compliant fields are largely missing for legacy genomes.
This study found the most overlooked components to be sample site location (x, y, z), sample collection date (t), host range, and whether the organism exists in a culture collection (Figure 3). Likewise, nearly all of the ‘Sequencing’ components (Figure 3) are missing or filled with a ‘not available’ placeholder in the final MIGS reports, even following curation. Because sequencing technologies evolve rapidly and techniques change through time, this component is critical.
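The kind of gap described above can be audited automatically. The following is a minimal sketch, assuming a report has already been parsed into a flat dictionary; the field names in `REQUIRED_FIELDS`, the placeholder strings, and the example record are hypothetical and do not reproduce the actual GCDML schema or any curated report.

```python
# Hypothetical field names; the real GCDML/MIGS element names differ.
REQUIRED_FIELDS = (
    "latitude",
    "longitude",
    "depth",
    "collection_date",
    "host_range",
    "culture_collection",
)

# Strings treated as "no real value" (assumed placeholder spellings).
PLACEHOLDERS = {"", "not available", "n/a", "unknown"}


def missing_fields(report, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or hold only a placeholder."""
    missing = []
    for field in required:
        value = report.get(field)
        if value is None or str(value).strip().lower() in PLACEHOLDERS:
            missing.append(field)
    return missing


# Made-up record for illustration only.
example_report = {
    "latitude": 10.0,
    "longitude": -20.0,
    "depth": "Not available",
    "collection_date": None,
    "host_range": "example host",
}
gaps = missing_fields(example_report)
# gaps == ["depth", "collection_date", "culture_collection"]
```

Treating a ‘not available’ placeholder the same as an absent field reflects the observation above that a filled-in placeholder carries no more information than a missing value.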