Publication Details (including relevant citation information):
J. Phys. Chem. A, 115(45), 12905-12918 (2011).
Discontinuous changes in molecular structure (resulting from continuous transformations of molecular coordinates) lead to changes in chemical properties and biological activities that chemists attempt to describe through structure-activity or structure-property relationships (QSAR/QSPR). Such relationships are commonly envisioned in a continuous high-dimensional space of numerical descriptors, referred to as chemistry space. The choice of descriptors defining coordinates within chemistry space and the choice of similarity metrics thus influence the partitioning of this space into regions corresponding to local structural similarity. These are the regions (known as domains of applicability) most likely to be successfully modeled by a structure-activity relationship. In this work the network topology and scaling relationships of chemistry spaces are first investigated independent of a specific biological activity. Chemistry spaces studied include the ZINC data set, a qHTS PubChem bioassay, as well as the space of protein binding sites from the PDB. The characteristics of these networks are compared and contrasted with those of the bioassay SALI subnetwork, which maps discontinuities or cliffs in the structure-activity landscape. Mapping the locations of activity cliffs and comparing the global characteristics of SALI subnetworks with those of the underlying chemistry space networks generated using different representations can guide the choice of a better representation. A higher local density of SALI edges with a particular representation indicates a more challenging structure-activity relationship using that fingerprint in that region of chemistry space.
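
The abstract does not spell out how SALI edges are obtained, so the following is only a minimal sketch under stated assumptions. It uses the commonly cited pairwise definition of the Structure-Activity Landscape Index, SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), with Tanimoto similarity over fingerprints represented as sets of on-bit indices. The function names, the toy fingerprints and activities, and the threshold value are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def sali(activity_i, activity_j, similarity):
    """Pairwise SALI value: |A_i - A_j| / (1 - sim)."""
    diff = abs(activity_i - activity_j)
    if similarity >= 1.0:
        # Identical fingerprints: the index diverges unless activities match.
        return float("inf") if diff > 0 else 0.0
    return diff / (1.0 - similarity)


def sali_edges(fingerprints, activities, threshold):
    """Return (i, j, value) for every pair whose SALI value meets the threshold."""
    edges = []
    for i, j in combinations(range(len(fingerprints)), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        value = sali(activities[i], activities[j], sim)
        if value >= threshold:
            edges.append((i, j, value))
    return edges


# Hypothetical toy data: compounds 0 and 1 are similar but differ in activity.
toy_fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
toy_activities = [5.0, 8.0, 5.5]
edges = sali_edges(toy_fingerprints, toy_activities, threshold=4.0)
print(edges)
```

Repeating the calculation with a different fingerprint for the same compounds and comparing how many edges survive the threshold in a given neighborhood is one way to read the abstract's point about local SALI edge density. The threshold itself is a free parameter in this sketch.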