eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Analysis of Domain Knowledge for Machine Learning Prediction of Frequently Occurring Drug Side-Effects

Abstract

Drug development often fails because of toxicity and intolerable side effects. Recent advances have made it possible to use machine learning to predict individual side effects from domain knowledge features such as drug classification. Although several factors, including drug targets, pathways, and drug classes, can be used to anticipate drug effects, it is unclear which domain knowledge is most predictive and whether some domain knowledge matters more for certain side effects than for others. The goal of this project is to determine the predictive value of drug targets, drug classification (level 2 ATC codes), and protein-protein interaction networks (PathFX targets and network proteins) for predicting 30 frequently occurring side effects. We compared the prediction accuracy of models trained on five combinations of domain knowledge features for each side effect and found that level 2 ATC codes have the highest predictive value of the domain knowledge features examined. Logistic regression coefficient analyses further suggest that side effects are significantly influenced by drug targets and drug classes, but not by PathFX targets and network proteins. By identifying the domain knowledge features that underlie frequently occurring drug-induced side effects, these quantitative assessments may inform the development of safe and effective drugs.
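The comparison described above (train a model on each combination of domain knowledge features, compare held-out accuracy, then inspect logistic regression coefficients) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's method: it uses plain batch-gradient-descent logistic regression on multi-hot features, a single hypothetical side effect, and a tiny invented dataset. The gene symbols DRD2 and HTR2A and the ATC level 2 codes N05 and N06 are real identifiers, but their pairings and the labels are made up. The helper names build_vocab, encode, train_logreg, accuracy, and compare_feature_sets are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# A drug maps a feature type (e.g. "targets", "atc2") to the set of values it has.
Drug = Dict[str, Set[str]]
Feature = Tuple[str, str]  # (feature type, value), e.g. ("atc2", "N05")


def build_vocab(drugs: Sequence[Drug], feature_types: Sequence[str]) -> List[Feature]:
    """Collect every (feature type, value) pair present in the training drugs."""
    return sorted({(ft, v) for d in drugs for ft in feature_types for v in d.get(ft, set())})


def encode(drug: Drug, vocab: Sequence[Feature]) -> List[float]:
    """Multi-hot vector: 1.0 where the drug has the feature, else 0.0."""
    return [1.0 if v in drug.get(ft, set()) else 0.0 for ft, v in vocab]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 500):
    """Fit unregularized logistic regression by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w: List[float], b: float, X: List[List[float]], y: List[int]) -> float:
    """Fraction of drugs whose thresholded prediction (p >= 0.5) matches the label."""
    preds = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def compare_feature_sets(train, y_train, test, y_test, combos):
    """For each named feature combination, return held-out accuracy and the
    coefficients sorted by absolute value (largest first)."""
    results = {}
    for name, feature_types in combos.items():
        vocab = build_vocab(train, feature_types)
        X_tr = [encode(d, vocab) for d in train]
        X_te = [encode(d, vocab) for d in test]
        w, b = train_logreg(X_tr, y_train)
        coefs = sorted(zip(vocab, w), key=lambda fw: abs(fw[1]), reverse=True)
        results[name] = (accuracy(w, b, X_te, y_test), coefs)
    return results


# Invented toy data (hypothetical, not from the thesis): the side effect
# follows the ATC class, while each target appears equally often with and without it.
train = [
    {"targets": {"DRD2"}, "atc2": {"N05"}},
    {"targets": {"HTR2A"}, "atc2": {"N05"}},
    {"targets": {"DRD2"}, "atc2": {"N06"}},
    {"targets": {"HTR2A"}, "atc2": {"N06"}},
]
y_train = [1, 1, 0, 0]
test = [{"targets": {"DRD2"}, "atc2": {"N05"}}, {"targets": {"HTR2A"}, "atc2": {"N06"}}]
y_test = [1, 0]

combos = {
    "targets": ["targets"],
    "ATC level 2": ["atc2"],
    "targets + ATC level 2": ["targets", "atc2"],
}
results = compare_feature_sets(train, y_train, test, y_test, combos)
for name, (acc, coefs) in results.items():
    print(f"{name}: accuracy={acc:.2f}, top feature={coefs[0][0]}")
```

In a real analysis this loop would be repeated for each side effect, ideally with cross-validation rather than a single split, and PathFX targets and network proteins could be added as further feature types; the abstract does not specify those details.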
