Sunghwan Kim - Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis

      Publication Details (including relevant citation information):

      S. Kim, E.E. Bolton, and S.H. Bryant;

      Journal of Cheminformatics, 2012, 4, 28.




      To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Efficient use of PubChem3D resources therefore requires an understanding of the statistical and biological meaning of the computed 3-D molecular similarity scores between molecules.


      The present study investigated the effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically tested compounds (10-K set) and between non-inactive compounds in a given biological assay (156-K set). In the "best-conformer-pair" approach, the 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from the compound pair. When this approach was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for ST^{ST-opt}, CT^{ST-opt}, ComboT^{ST-opt}, ST^{CT-opt}, CT^{CT-opt}, and ComboT^{CT-opt}, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive-non-inactive (NN) pairs for a given assay by amounts comparable to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores compared to the average increase for the random compound pairs.
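
The best-conformer-pair approach described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the PubChem3D implementation: the function name `best_conformer_pair_score`, the stand-in similarity `toy_tanimoto`, and the toy "conformers" (sets of made-up feature IDs) are all assumptions introduced here. A real analysis would instead plug in a 3-D similarity measure computed on superimposed conformers.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

Conformer = TypeVar("Conformer")


def best_conformer_pair_score(
    conformers_a: Sequence[Conformer],
    conformers_b: Sequence[Conformer],
    similarity: Callable[[Conformer, Conformer], float],
) -> float:
    """Return the greatest similarity over all conformer pairs of two compounds."""
    if not conformers_a or not conformers_b:
        raise ValueError("each compound needs at least one conformer")
    # Score every (conformer of A, conformer of B) pair and keep the maximum.
    return max(similarity(a, b) for a, b in product(conformers_a, conformers_b))


def toy_tanimoto(a: frozenset, b: frozenset) -> float:
    """Hypothetical stand-in: Tanimoto coefficient of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Toy "conformers": sets of made-up feature IDs, not real 3-D data.
compound_x = [frozenset({1, 2, 3}), frozenset({1, 2, 4})]
compound_y = [frozenset({2, 4, 5}), frozenset({1, 2, 4, 6})]

# Single-conformer score (first conformer of each compound only).
single = toy_tanimoto(compound_x[0], compound_y[0])
# Best-conformer-pair score over all 2 x 2 conformer pairs.
best = best_conformer_pair_score(compound_x, compound_y, toy_tanimoto)
```

Because the single-conformer pair is one of the pairs the maximum runs over, the best-conformer-pair score can never be lower than the single-conformer score. This is consistent with the reported increase in average scores for random and non-inactive pairs alike.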


      These results suggest that the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of non-inactive from random and inactive spaces "on average", although some assays show a noticeable separation between the non-inactive and random spaces when multiple conformers are used for each compound. The present study is a critical next step toward understanding the effects of the conformational diversity of molecules upon 3-D molecular similarity and its application to biological activity data analysis in PubChem. The results of this study may help in building search and analysis tools that more efficiently exploit 3-D molecular similarity between compounds archived in PubChem and other molecular libraries.


      Address (URL):