Visualizing the links to all phenotypes at once produces a very large figure that is difficult to present and interpret (results not shown). Since each experiment category represents a related set of experiments, each category was analyzed separately. For four of the experiment categories (Table 2), strains were hierarchically clustered based on their phenotypes (see the phenotype clustering section of Additional file 2). Based on the hierarchical clustering results, strains isolated from the same source showed different levels of phenotype similarity: growth on sugar (high similarity), antibiotic resistance experiments (medium similarity), growth on milk and polysaccharides
(low similarity) and metal resistance (no similarity). Phenotype-based hierarchical clustering of these strains showed that phenotype differences among strains correspond better to their niche properties than to their subspecies-level differences. Clustering provided only limited information
and, thus, can only be used as an initial screening of phenotype data. As the focus of this study is to find relations between genes and phenotypes, we applied integrative analysis of phenotype and genotype data to reveal these associations.

Table 2 Experiments grouped based on experimental conditions

Group name                          Number of experiments   Description
Growth on sugar                     16                      Contains phenotypes based on 50CH API experiments
Antibiotic resistance               18                      Contains phenotypes based on antibiotic resistance experiments
Metal resistance                    17                      Contains phenotypes based on metal resistance experiments
Growth on milk or polysaccharides   11                      Contains phenotypes based on growth on milk or polysaccharides
Other experiments                   10                      Contains phenotypes based on all remaining experiments, which include growth tests on medium with nisin, arginine hydrolase, salt or different enzymes

These are experiments for which at least one phenotype was accurately classified; for the full list of experiments and their descriptions, see Additional file 1.

Genotype-phenotype matching

Integrated analysis using iterative gene selection allowed the identification of gene-phenotype relations that could not be found by studying genotype and phenotype data separately. In genotype-phenotype matching, we used as genotype data the presence/absence of 4026 ortholog groups (OGs; see Methods) in 38 L. lactis strains (Table 1), as determined by comparative genome hybridization (CGH). These 38 strains are a subset of a large representative collection of L. lactis strains that covers the genotype, niche and phenotype diversity of the L. lactis species [15]. For phenotype data, we used phenotypic measurements of these strains in 207 experiments that were previously assessed in separate studies (see Methods and Additional file 1). After pre-processing, phenotype data from 130 experiments were usable for genotype-phenotype matching (see Methods).
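
To make the shape of the input data concrete, the following is a minimal sketch of how an OG presence/absence table and a phenotype table could be represented and cross-tabulated. It is not the iterative gene selection used in this study. All strain, OG and experiment names are invented, and the filter in `usable_experiments` (at least two phenotype classes with a minimum number of strains each) is an assumed stand-in for the pre-processing described in Methods.

```python
from collections import Counter

# Toy genotype data: strain -> set of ortholog groups (OGs) present.
# All names here are hypothetical and only illustrate the data layout.
genotype = {
    "s1": {"OG1", "OG2"},
    "s2": {"OG1"},
    "s3": {"OG2", "OG3"},
    "s4": {"OG3"},
}

# Toy phenotype data: experiment -> {strain: class label}; None = not measured.
phenotype = {
    "growth_on_sugar_A": {"s1": "+", "s2": "+", "s3": "-", "s4": "-"},
    "metal_X": {"s1": "+", "s2": "+", "s3": "+", "s4": None},
}


def usable_experiments(phenotype, min_per_class=2):
    """Keep experiments with at least two classes of >= min_per_class strains.

    Assumed criterion for illustration; the real pre-processing is in Methods.
    """
    kept = []
    for name, labels in phenotype.items():
        counts = Counter(v for v in labels.values() if v is not None)
        large_classes = [c for c in counts.values() if c >= min_per_class]
        if len(large_classes) >= 2:
            kept.append(name)
    return kept


def contingency(og, experiment, positive="+"):
    """Count strains by (OG present?, phenotype positive?), skipping missing data."""
    table = Counter()
    for strain, label in phenotype[experiment].items():
        if label is None:
            continue
        table[(og in genotype[strain], label == positive)] += 1
    return table
```

In this toy data, `metal_X` is dropped by the filter because all measured strains share one phenotype class, and the table returned by `contingency("OG1", "growth_on_sugar_A")` shows an OG whose presence coincides exactly with the positive phenotype. A real analysis would apply a classifier or statistical test across all OGs rather than inspect single tables.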