Örebro University Publications (oru.se)

LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity–Application to the Tox21 and Mutagenicity Data Sets
Department of Chemistry, Umeå University, Umeå, Sweden.
Unit of Toxicology Sciences, Karolinska Institute, Södertälje, Sweden.
Unit of Toxicology Sciences, Karolinska Institute, Södertälje, Sweden; Department of Computer and System Sciences, Stockholm University, Kista, Sweden. ORCID iD: 0000-0003-3107-331X
Drug Discovery Institute, London, England.
2019 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 59, no 10, p. 4150-4158. Article in journal (Refereed), Published
Abstract [en]

Machine learning algorithms have attained widespread use in assessing the potential toxicities of pharmaceuticals and industrial chemicals because of their faster speed and lower cost compared to experimental bioassays. Gradient boosting is an effective algorithm that often achieves high predictivity, but historically its relatively long computational time limited its application to predicting large compound libraries or developing in silico predictive models that require frequent retraining. LightGBM, a recent improvement of the gradient boosting algorithm, inherited its high predictivity but resolved its scalability limitations and long computational time by adopting a leaf-wise tree growth strategy and introducing novel techniques. In this study, we compared the predictive performance and the computational time of LightGBM to those of deep neural networks, random forests, support vector machines, and XGBoost. All algorithms were rigorously evaluated on publicly available Tox21 and mutagenicity data sets using a nested 10-fold cross-validation scheme with integrated Bayesian optimization, which performs hyperparameter optimization while examining model generalizability and transferability to new data. The evaluation results demonstrated that LightGBM is an effective and highly scalable algorithm, offering the best predictive performance while requiring significantly less computational time than the other investigated algorithms across all Tox21 and mutagenicity data sets. We recommend LightGBM for applications of in silico safety assessment and also other areas of cheminformatics to fulfill the ever-growing demand for accurate and rapid prediction of various toxicity- or activity-related end points of large compound libraries present in the pharmaceutical and chemical industries.

Place, publisher, year, edition, pages
Washington: American Chemical Society (ACS), 2019. Vol. 59, no 10, p. 4150-4158
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:oru:diva-83142
DOI: 10.1021/acs.jcim.9b00633
ISI: 000503918200012
PubMedID: 31560206
Scopus ID: 2-s2.0-85073168945
OAI: oai:DiVA.org:oru-83142
DiVA, id: diva2:1440447
Note

Research funders:

Alzheimer's Research UK, Grant Number: 1077089, SC042474

Cancer Research UK, Grant Number: FC001002

UK Medical Research Council, Grant Number: FC001002

Wellcome Trust, Grant Number: FC001002

Available from: 2020-06-15 Created: 2020-06-15 Last updated: 2024-01-16 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Norinder, Ulf
