Presented at the Second Indo-US Workshop on Mathematical Chemistry With Applications to Drug Discovery, Environmental Toxicology, Cheminformatics and Bioinformatics, held in Duluth, MN, May 30 - June 3, 2000.

Model Ensemble Approaches for Structure-Activity Relationships

Vincent C. Arena, Nancy B. Sussman, Kuo-Szo Chiang, Sati Mazumdar, and Orest T. Macina

ABSTRACT
 
Computational models are increasingly used to predict the biological activity of potentially health-threatening agents, and structure-activity relationship (SAR) models are important examples. We recently demonstrated new approaches for developing SAR models for computational toxicology from ensembles of random bootstrap models. Encouraging results were obtained by applying the bagging approach with logistic regression to data from the Chernoff/Kavlock developmental toxicity assay (CKA) (Sussman NB, Claycamp HG, Arena VC, Mazumdar S, Chiang KS, Macina OT. Structure-Activity Relationship Models: Using the Results of Model Ensembles. 6th International Symposium on Artificial Intelligence and Mathematics. Fort Lauderdale, FL. January 2000). The CKA database consists of 54 developmentally toxic and 53 developmentally nontoxic chemicals; for each chemical, a set of physicochemical features describing steric, electronic, and hydrophobic properties is calculated. In the present paper, we further investigate the utility of the bagging approach (Breiman L. Bagging predictors. Machine Learning 24:2, 123-140. 1996) using a replicated bootstrap simulation scheme with the CKA database under the logistic regression modeling platform. We emphasize misclassification rates and the optimum number of bootstrap replicates needed to achieve the maximum reduction in these rates.
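The bagging scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it does not use the CKA data: the descriptors are synthetic Gaussian values, and the number of features (3), the class separation, the gradient-descent settings, and all function names are assumptions made for illustration. Only the class sizes (54 toxic, 53 nontoxic), the use of logistic regression on bootstrap resamples, and the comparison of misclassification rates across numbers of replicates follow the text.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent; weights[0] is the bias."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def bagged_models(X, y, n_replicates, rng):
    """Fit one logistic model per bootstrap resample (sampling with replacement)."""
    n = len(X)
    models = []
    for _ in range(n_replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models


def misclassification_rate(models, X, y):
    """Majority vote over the ensemble (ties go to class 0), then error fraction."""
    wrong = 0
    for xi, yi in zip(X, y):
        votes = sum(predict_proba(w, xi) >= 0.5 for w in models)
        pred = 1 if 2 * votes > len(models) else 0
        wrong += pred != yi
    return wrong / len(y)


def make_synthetic(n_pos, n_neg, n_features, rng):
    """Synthetic stand-in for descriptor data (NOT the CKA database)."""
    X, y = [], []
    for label, count in ((1, n_pos), (0, n_neg)):
        shift = 0.8 if label == 1 else -0.8  # assumed class separation
        for _ in range(count):
            X.append([rng.gauss(shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_synthetic(54, 53, 3, rng)   # class sizes from the abstract
X_test, y_test = make_synthetic(100, 100, 3, rng)   # hypothetical hold-out set

models = bagged_models(X_train, y_train, 25, rng)
# Error as a function of ensemble size, using the first b bootstrap models.
rates = {b: misclassification_rate(models[:b], X_test, y_test) for b in (1, 5, 25)}
for b, rate in rates.items():
    print(f"replicates={b:2d}  misclassification rate={rate:.3f}")
```

Reusing prefixes of one list of bootstrap models keeps the comparison across ensemble sizes cheap. A replicated simulation of the kind the abstract mentions would repeat the whole procedure with different random seeds and average the resulting rates.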