Values of λij > 1 indicate an affinity of the family for the environment, whereas values of λij < 1 suggest a lack of affinity. In the second layer, the 'affinities' λij (on the log scale) are decomposed into taxon and environment main effects plus an interaction: log λij = α + θi + γj + νij. The main effects of taxa and environments can be interpreted as surrogates for the unobserved variables associated with each one. The interaction terms (or residuals) can be seen as an
adjusted affinity, that is, the part of the over- or under-presence that cannot be accounted for by the factors linked to the taxa or environment. Statistical inference was performed under the Bayesian paradigm, which requires assigning prior distributions to the parameters. We chose normal distributions for each of the main effects and a mixture of two normal distributions for the interactions. One component of the mixture is intended to pick up noise, whereas the other aims to pick up true departures from the main effects. We implemented the model in JAGS (http://mcmc-jags.sourceforge.net), a freely licensed software package for Bayesian inference. The outputs from this analysis
were samples from the posterior distribution of the model parameters. We then represented the posterior median of the affinities between taxa and environments using a heatmap with a dichromatic scale from purples to oranges. The former represent low affinity values (underpresence of the taxa in the environment), whereas the latter represent high affinity values (overpresence). We used standard hierarchical clustering with Euclidean distance to group the environment types according to the values of their taxa affinities (on the log scale). The resulting cluster dendrogram is displayed next to the heatmap to ease visualization and interpretation of the results.

Database creation

We have created envDB, a MySQL database containing all the data associated with this work. The user can perform queries on sequences, OTUs, samples and environments through a flexible and user-friendly interface. The
database will be updated regularly and its capabilities are described elsewhere. The database is available at http://metagenomics.uv.es/envDB

Acknowledgements

This work was supported by projects SAF2009-13032 and CGL2005-06549-C02-02/ANT from the Spanish Ministerio de Ciencia e Innovación (MICINN), and projects GV/2007/050, GVPRE/2008/010 and PROMETEO/2009/092 from the Generalitat Valenciana, Spain. JT is a recipient of a contract in the FIS Program from ISCIII, Spanish Ministry of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Electronic supplementary material

Additional file 1: Table S1. Dominant environments for taxonomic families. (XLS 56 KB)

Additional file 2: Figure S1.