Sunghwan Kim - Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis

      Publication Details (including relevant citation information):

      S. Kim, E.E. Bolton, and S.H. Bryant;

      Journal of Cheminformatics, 2012, 4, 28.

       

      Abstract:

      Background

      To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally-derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Therefore, the efficient use of PubChem3D resources requires an understanding of the statistical and biological meaning of computed 3-D molecular similarity scores between molecules.

      Results

      The present study investigated effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically-tested compounds (10-K set) and between non-inactive compounds in a given biological assay (156-K set). When the "best-conformer-pair" approach, in which a 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from a compound pair, was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for ST^{ST-opt}, CT^{ST-opt}, ComboT^{ST-opt}, ST^{CT-opt}, CT^{CT-opt}, and ComboT^{CT-opt}, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive-non-inactive (NN) pairs for a given assay, by comparable amounts to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores, compared to the average increase for the random compound pairs.
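
The best-conformer-pair approach described above can be illustrated with a minimal Python sketch. It is not the PubChem3D implementation: the function names are hypothetical, each "conformer" is reduced to a single number, and `toy_score` is an invented stand-in for a real 3-D similarity measure. The sketch only shows that the compound-level score is the maximum over all conformer pairs, and therefore can never be lower than the score from any one fixed conformer pair.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

Conformer = TypeVar("Conformer")


def best_conformer_pair_score(
    conformers_a: Sequence[Conformer],
    conformers_b: Sequence[Conformer],
    pair_score: Callable[[Conformer, Conformer], float],
) -> float:
    """Return the greatest score over all conformer pairs of two compounds."""
    if not conformers_a or not conformers_b:
        raise ValueError("each compound needs at least one conformer")
    return max(pair_score(a, b) for a, b in product(conformers_a, conformers_b))


def toy_score(a: float, b: float) -> float:
    # Hypothetical stand-in for a 3-D similarity score in (0, 1].
    # A real score would compare superimposed 3-D structures.
    return 1.0 / (1.0 + abs(a - b))


# Assumed toy data: three conformers for compound A, two for compound B.
compound_a = [0.0, 2.0, 5.0]
compound_b = [4.0, 1.5]

# Single-conformer comparison: only the first conformer of each compound.
single = toy_score(compound_a[0], compound_b[0])

# Best-conformer-pair comparison: maximum over all 3 x 2 = 6 pairs.
best = best_conformer_pair_score(compound_a, compound_b, toy_score)

print(f"single-conformer score:    {single:.3f}")
print(f"best-conformer-pair score: {best:.3f}")
```

Because the maximum is taken over a set that includes the single-conformer pair, the score can only stay the same or rise for every compound pair. This is consistent with the abstract's observation that average scores increased for random pairs and NN pairs alike.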

      Conclusion

      These results suggest that the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of non-inactive from random and inactive spaces "on average", although some assays show a noticeable separation between the non-inactive and random spaces when multiple conformers are used for each compound. The present study is a critical next step to understand effects of conformational diversity of the molecules upon the 3-D molecular similarity and its application to biological activity data analysis in PubChem. The results of this study may be helpful to build search and analysis tools that exploit 3-D molecular similarity between compounds archived in PubChem and other molecular libraries in a more efficient way.

       

      Address (URL): http://dx.doi.org/10.1186/1758-2946-4-28