Sunghwan Kim - PubChem3D: Conformer ensemble accuracy

Document created by Sunghwan Kim on Jul 8, 2015. Last modified by Sunghwan Kim on Jul 9, 2015 (version 3).

  Publication details (including relevant citation information):

  S. Kim, E.E. Bolton, and S.H. Bryant;

  Journal of Cheminformatics, 2013, 5, 1.




  PubChem is a free, publicly available resource containing substance descriptions and their associated biological activity information. PubChem3D is an extension to PubChem containing computationally derived three-dimensional (3-D) structures of small molecules. All the tools and services that are part of PubChem3D rely on the quality of the 3-D conformer models. Construction of the conformer models currently available in PubChem3D involves a clustering stage to sample the conformational space spanned by the molecule. This stage reduces the conformer models to a more manageable size, but it may also reduce their ability to reproduce experimentally determined "bioactive" conformations, such as those found for PDB ligands. This study examines the extent of this accuracy loss and considers its effect on 3-D similarity analysis of molecules.
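  The abstract does not say which clustering method PubChem3D uses. One generic way to shrink a large conformer ensemble to a size cap is greedy "leader" selection. A conformer is kept only if it lies at least a threshold distance (for example, an RMSD) from every conformer already kept, and the threshold is raised until the cap is met. The sketch below is a minimal illustration under that assumption. It is not PubChem3D's actual procedure. The names leader_select and downsize, and the threshold/step parameters, are hypothetical.

```python
def leader_select(conformers, dist, threshold):
    """Greedy leader selection (hypothetical helper): keep a conformer only
    if it is at least `threshold` away from every conformer kept so far."""
    kept = []
    for conf in conformers:
        if all(dist(conf, k) >= threshold for k in kept):
            kept.append(conf)
    return kept


def downsize(conformers, dist, threshold, max_size, step):
    """Raise the distance threshold in increments of `step` until at most
    `max_size` conformers survive (an assumed scheme, for illustration).
    Returns the kept conformers and the threshold that was finally used."""
    if max_size < 1 or step <= 0:
        raise ValueError("max_size must be >= 1 and step must be > 0")
    k = 0
    while True:
        # Compute the threshold from k to avoid accumulating float error.
        t = threshold + k * step
        kept = leader_select(conformers, dist, t)
        if len(kept) <= max_size:
            return kept, t
        k += 1


if __name__ == "__main__":
    # Toy 1-D "conformers" with absolute difference as the distance.
    toy = [i / 10 for i in range(100)]
    reps, used = downsize(toy, lambda a, b: abs(a - b), 0.5, 10, 0.25)
    print(len(reps), used)
```

  In practice, dist would be a superposition-based RMSD between conformers and max_size would be 500, the cap stated in the study. The loop always terminates, because once the threshold exceeds the largest pairwise distance only one conformer is kept.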


  Conformer models consisting of up to 100,000 conformers per compound were generated for 47,123 small molecules whose structures were experimentally determined. The conformers in each model were then clustered to reduce the model to a maximum of 500 conformers per molecule. The accuracy of the conformer models before and after clustering was evaluated using five measures: root-mean-square distance (RMSD); shape-optimized shape-Tanimoto ($\text{ST}^{\text{ST-opt}}$) and combo-Tanimoto ($\text{ComboT}^{\text{ST-opt}}$); and color-optimized color-Tanimoto ($\text{CT}^{\text{CT-opt}}$) and combo-Tanimoto ($\text{ComboT}^{\text{CT-opt}}$). On average, clustering decreased conformer model accuracy. It increased the conformer ensemble's RMSD to the bioactive conformer by 0.18 ± 0.12 Å. It decreased the $\text{ST}^{\text{ST-opt}}$, $\text{ComboT}^{\text{ST-opt}}$, $\text{CT}^{\text{CT-opt}}$, and $\text{ComboT}^{\text{CT-opt}}$ scores by 0.04 ± 0.03, 0.16 ± 0.09, 0.09 ± 0.05, and 0.15 ± 0.09, respectively.
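  The sketch below illustrates how the measures above relate to an ensemble. It assumes that an ensemble's RMSD to the bioactive conformer is the smallest RMSD over its conformers after optimal rigid superposition (Kabsch algorithm). Under that assumption, a clustered subset can never score better than the full ensemble. The Tanimoto part takes precomputed overlap volumes as input, uses Tanimoto = V_AB / (V_AA + V_BB - V_AB), and treats ComboT as ST + CT, which gives it a 0 to 2 range. Computing Gaussian shape/color overlaps and optimizing the superposition (the "ST-opt"/"CT-opt" variants) is outside this sketch. All function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(probe, ref):
    """RMSD between two (N, 3) coordinate arrays with matched atom order,
    after optimal rigid superposition of probe onto ref (Kabsch)."""
    P = np.asarray(probe, dtype=float)
    Q = np.asarray(ref, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centering both structures on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def ensemble_best_rmsd(ensemble, ref):
    """Smallest RMSD from any conformer in the ensemble to the reference."""
    return min(kabsch_rmsd(conf, ref) for conf in ensemble)


def tanimoto(overlap_ab, self_aa, self_bb):
    """Tanimoto score from precomputed overlaps: V_AB / (V_AA + V_BB - V_AB).
    Applies to shape (ST) or color (CT) overlaps alike."""
    denom = self_aa + self_bb - overlap_ab
    if denom <= 0:
        raise ValueError("overlaps must satisfy V_AA + V_BB > V_AB")
    return overlap_ab / denom


def combo_tanimoto(st, ct):
    """Combo score taken as ST + CT (range 0 to 2)."""
    return st + ct
```

  For example, the accuracy loss from clustering for one molecule would be ensemble_best_rmsd(clustered, ref) - ensemble_best_rmsd(full, ref), which is never negative when clustered is a subset of full.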


  This study shows that, in terms of RMSD accuracy, the PubChem3D conformer models perform as designed. In addition, the effect of PubChem3D sampling on 3-D similarity measures shows that average accuracy degrades linearly as molecular size and flexibility increase. Generally speaking, one can likely expect 90% or more of the PubChem3D ensembles to have a worst-case minimum accuracy of 0.75, 1.09, 0.43, and 1.13 in terms of $\text{ST}^{\text{ST-opt}}$, $\text{ComboT}^{\text{ST-opt}}$, $\text{CT}^{\text{CT-opt}}$, and $\text{ComboT}^{\text{CT-opt}}$, respectively. This expected accuracy improves linearly as the molecule becomes smaller or less flexible.
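  One reading of "worst-case minimum accuracy of 90% or more of the ensembles" is as a lower percentile: the largest score v such that at least 90% of ensembles score v or higher. The minimal sketch below computes that bound from a list of per-molecule best scores (for example, each ensemble's best shape-Tanimoto to its bioactive conformer). The function name and this percentile interpretation are assumptions for illustration, not the authors' stated procedure.

```python
import math


def worst_case_for_fraction(scores, fraction=0.9):
    """Largest value v such that at least `fraction` of `scores` are >= v
    (a nearest-rank lower percentile; hypothetical helper)."""
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    s = sorted(scores)
    n = len(s)
    # Number of ensembles that must meet the bound; the small epsilon guards
    # against floating-point error in fraction * n.
    need = max(1, math.ceil(fraction * n - 1e-9))
    return s[n - need]
```

  Applied separately to each measure's per-molecule scores, this gives one worst-case bound per measure. Grouping molecules by size or by number of rotatable bonds before calling it would show how the bound shifts with molecular size and flexibility.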

