Igor Baskin - The One-Class Classification Approach to Data Description and to Models Applicability Domain

Version 1

      Publication Details (including relevant citation information):

      Baskin, I. I.; Kireeva, N.; Varnek, A. Mol. Inf. 2010, 29 (8-9), 581-587.


      In this paper, we associate the applicability domain (AD) of QSAR/QSPR models with the region of the input (descriptor) space in which the density of training data points exceeds a certain threshold. It can be shown that the predictive performance of models built on the training set is higher for test compounds inside this high-density region than for those outside it. Instead of searching for a decision surface separating high- and low-density regions in the input space, the one-class classification 1-SVM approach looks for a hyperplane in the associated feature space. Unlike other AD definitions reported in the literature, this approach: (i) is purely “data-based”, i.e., it assigns the same AD to all models built on the same training set; (ii) provides results that depend only on the initial pool of descriptors generated for the training set; (iii) can be used with a very large number of descriptors, as well as in the framework of structured kernel-based approaches, e.g., chemical graph kernels. The developed approach has been applied to improve the performance of QSPR models for stability constants of the complexes of organic ligands with alkaline-earth metals in water.
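
      The density-threshold definition of the AD given in the abstract can be illustrated with a minimal sketch. It is not the 1-SVM method of the paper: it substitutes a plain Gaussian kernel density estimate for the 1-SVM and runs on synthetic two-dimensional "descriptors". The function names, the bandwidth, and the 5% outlier fraction are assumptions chosen only for illustration.

```python
import math
import random


def gaussian_kernel_density(train, bandwidth):
    """Return a function giving an unnormalized Gaussian kernel density estimate."""
    def density(x):
        total = 0.0
        for t in train:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return total / len(train)
    return density


def fit_applicability_domain(train, bandwidth=1.0, outlier_fraction=0.05):
    """Build an AD membership test: density(x) >= threshold.

    The threshold is set so that roughly `outlier_fraction` of the training
    points fall outside the domain (hypothetical choice for this sketch).
    """
    density = gaussian_kernel_density(train, bandwidth)
    scores = sorted(density(t) for t in train)
    threshold = scores[int(outlier_fraction * len(train))]

    def inside(x):
        return density(x) >= threshold
    return inside


# Synthetic training set: 200 points drawn around the origin (assumed data).
rng = random.Random(0)
train = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]

inside_ad = fit_applicability_domain(train)
print(inside_ad((0.0, 0.0)))  # point in the dense core of the training data
print(inside_ad((8.0, 8.0)))  # point far away from all training data
```

      Because the membership test depends only on the training descriptors and not on any fitted property model, the same AD applies to every model built on that training set, which is the "data-based" property listed as item (i) of the abstract.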

      Address (URL): http://dx.doi.org/10.1002/minf.201000063