Sunghwan Kim - PubChem3D: Similar conformers

Version 3

      Publication Details (including relevant citation information):

      E.E. Bolton, S. Kim, and S.H. Bryant;

      Journal of Cheminformatics, 2011, 3, 13.




      PubChem is a free and open public resource for the biological activities of small molecules. With many tens of millions of both chemical structures and biological test results, PubChem is a sizeable system with an uneven degree of available information. Some chemical structures in PubChem include a great deal of biological annotation, while others have little to none. To help users, PubChem pre-computes "neighboring" relationships to relate similar chemical structures, which may have similar biological function. In this work, we introduce a "Similar Conformers" neighboring relationship to identify compounds with similar 3-D shape and similar 3-D orientation of functional groups typically used to define pharmacophore features.


      The first two diverse 3-D conformers of 26.1 million PubChem Compound records were compared to each other, using a shape Tanimoto (ST) of 0.8 or greater and a color Tanimoto (CT) of 0.5 or greater, yielding 8.16 billion conformer neighbor pairs and 6.62 billion compound neighbor pairs, with an average of 253 "Similar Conformers" compound neighbors per compound. Comparing the 3-D neighboring relationship to the corresponding 2-D neighboring relationship ("Similar Compounds") for molecules such as caffeine, aspirin, and morphine, one finds unique sets of related chemical structures, providing significant additional biological annotation. The PubChem 3-D neighboring relationship is also shown to group a set of non-steroidal anti-inflammatory drugs (NSAIDs), despite limited PubChem 2-D similarity. In a study of 4,218 chemical structures of biomedical interest, consisting of many known drugs, using more diverse conformers per compound results in more 3-D compound neighbors per compound; however, the compound neighbor lists per conformer also increasingly resemble each other, being 38% identical at three conformers and 68% at ten conformers. Perhaps surprisingly, the average count of conformer neighbors per conformer increases rather slowly as a function of the number of diverse conformers considered, with only a 70% increase for a tenfold growth in conformers per compound (a 68-fold increase in the conformer pairs considered). Neighboring 3-D conformers on this scale, if implemented naively, is an intractable problem on a modest-sized compute cluster. The methodology developed in this work relies on a series of filters to avoid performing 3-D superposition optimization when it can be determined that two conformers cannot possibly be neighbors. Most filters are based on Tanimoto equation volume constraints, avoiding incompatible conformers; however, others consider preliminary superposition between conformers using reference shapes.
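
      To make the neighbor criterion and the idea of a volume-based filter concrete, here is a minimal sketch, not the implementation used in this work. It assumes the usual Tanimoto form, overlap / (self-overlap A + self-overlap B - overlap), and assumes that the overlap volume between two conformers cannot exceed the smaller of the two self-overlap volumes, so the volume ratio bounds the best ST any superposition could reach. All function and parameter names are hypothetical.

```python
def tanimoto(v_aa: float, v_bb: float, v_ab: float) -> float:
    """Tanimoto similarity from two self-overlaps and one mutual overlap."""
    return v_ab / (v_aa + v_bb - v_ab)


def max_possible_st(v_aa: float, v_bb: float) -> float:
    """Upper bound on ST, ASSUMING the overlap cannot exceed the smaller volume.

    Substituting v_ab = min(v_aa, v_bb) into the Tanimoto equation gives
    min(v_aa, v_bb) / max(v_aa, v_bb).
    """
    return min(v_aa, v_bb) / max(v_aa, v_bb)


def can_skip_superposition(v_aa: float, v_bb: float,
                           st_threshold: float = 0.8) -> bool:
    """True when the volumes alone rule the pair out as neighbors."""
    return max_possible_st(v_aa, v_bb) < st_threshold


def is_neighbor(st: float, ct: float,
                st_threshold: float = 0.8, ct_threshold: float = 0.5) -> bool:
    """Thresholds quoted in the text: ST >= 0.8 and CT >= 0.5."""
    return st >= st_threshold and ct >= ct_threshold


if __name__ == "__main__":
    # Illustrative volumes only, not data from the study.
    print(can_skip_superposition(100.0, 130.0))  # ratio is about 0.77
    print(can_skip_superposition(100.0, 120.0))  # ratio is about 0.83
```

      A filter of this kind needs only per-conformer volumes, which can be computed once, so incompatible pairs are rejected without any superposition optimization.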


      The "Similar Conformers" 3-D neighboring relationship locates similar small molecules of biological interest that may go unnoticed when using traditional 2-D chemical structure graph-based methods, making it complementary to such methodologies. The computational cost of 3-D similarity methodology at large scale, such as across the contents of PubChem, is a considerable issue to overcome. Using a series of efficient filters, an effective throughput rate of more than 150,000 conformers per second per processor core was achieved, more than two orders of magnitude faster than without filtering.
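
      A rough back-of-envelope estimate shows why the naive approach is impractical. The sketch below rests on assumptions that are not stated in the text: that the quoted throughput counts conformer-pair comparisons, that every compound contributes exactly two conformers (an upper bound), and that the unfiltered rate is exactly 100 times slower (the text says "more than two orders of magnitude"). The helper names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def all_pairs(n_items: int) -> int:
    """Number of unordered pairs among n_items."""
    return n_items * (n_items - 1) // 2


def core_years(n_pairs: int, pairs_per_second: float) -> float:
    """Single-core time, in years, to process n_pairs at the given rate."""
    return n_pairs / pairs_per_second / SECONDS_PER_YEAR


# ASSUMPTION: two conformers for each of the 26.1 million compounds.
n_conformers = 26_100_000 * 2
n_pairs = all_pairs(n_conformers)

# ASSUMPTION: rates are conformer-pair comparisons per second per core.
filtered_years = core_years(n_pairs, 150_000)
unfiltered_years = core_years(n_pairs, 1_500)

print(f"conformer pairs:       {n_pairs:.3e}")
print(f"with filtering:        {filtered_years:,.0f} core-years")
print(f"without filtering:     {unfiltered_years:,.0f} core-years")
```

      Under these assumptions the all-against-all comparison involves on the order of 10^15 conformer pairs, so a hundredfold change in per-core rate decides whether the job fits on a modest cluster.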


      Address (URL):