This model predicts whether a molecule causes Drug-Induced Liver Injury (DILI). DILI is liver damage caused by certain drugs and can be fatal. Given a molecule's SMILES string, the model classifies it as either causing liver injury (1) or not (0). The dataset used for this model was aggregated by the U.S. FDA's National Center for Toxicological Research and obtained through TDC (Therapeutics Data Commons). It was created by analyzing the hepatotoxicity descriptions in FDA-approved drug labeling documents and by assessing causality evidence reported in the literature, drawing on four datasets based on human data.
The dataset is balanced and was split into a training set and a held-out test set for validation. Molecular descriptors computed with RDKit were used as input features to train a Support Vector Machine (SVM) classifier. The Domain of Applicability (DOA) was assessed with the leverage method, which flags query molecules whose descriptor vectors lie far from the bulk of the training data. The validation scores are as follows: