4.7 Article

ALipSol: An Attention-Driven Mixture-of-Experts Model for Lipophilicity and Solubility Prediction

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 62, Issue 23, Pages 5975-5987

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.2c01290

Funding

  1. Key R&D Program of China [2021YFF1201400]
  2. Natural Science Foundation of Zhejiang Province of China [LD22H300001]
  3. Key R&D Program of Zhejiang Province [2020C03010]
  4. Fundamental Research Funds for the Central Universities [2020QNA7003]

ALipSol is an attention-driven mixture-of-experts (MoE) model that predicts the lipophilicity and aqueous solubility of drugs. It breaks the complex endpoints down into simpler ones, assigns a dedicated expert network to each, and combines transfer learning with an attention mechanism, achieving significant performance improvements across several data sets.
Lipophilicity (logD) and aqueous solubility (logSw) play a central role in drug development. Accurate prediction of these properties remains an open problem, largely because of data scarcity. Current methodologies neglect the intrinsic relationships between physicochemical properties and usually ignore ionization effects. Here, we propose an attention-driven mixture-of-experts (MoE) model named ALipSol, which explicitly reproduces the hierarchy of task relationships. We adopt the principle of divide-and-conquer by breaking down the complex endpoint (logD or logSw) into simpler ones (acidic pKa, basic pKa, and logP) and allocating a specific expert network to each subproblem. We then apply transfer learning to extract knowledge from related tasks, thus alleviating the problem of limited data. Additionally, we replace the gating network with an attention mechanism to better capture the dynamic task relationships on a per-example basis. We adopt local fine-tuning and consensus prediction to further boost model performance. Extensive evaluation experiments verify the success of the ALipSol model, which achieves RMSE improvements of 8.04%, 2.49%, 8.57%, 12.8%, and 8.60% on the Lipop, ESOL, AqSolDB, external logD, and external logS data sets, respectively, compared with Attentive FP and the state-of-the-art in silico tools. In particular, our model yields more significant advantages (Welch's t-test) for small training data, implying high robustness and generalizability. The interpretability analysis shows that the atom contributions learned by ALipSol are more reasonable than those of the vanilla Attentive FP, and that the substitution effects in benzene derivatives agree well with empirical constants, revealing the potential of our model to extract useful patterns from data and provide guidance for lead optimization.
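To make the idea of replacing a gating network with attention more concrete, the following is a minimal sketch, not the ALipSol implementation. It assumes each expert (acidic pKa, basic pKa, logP) produces an embedding for a molecule, and that a per-molecule query vector scores those embeddings with scaled dot-product attention; the function names, the toy vectors, and the use of a simple mean for consensus prediction are all illustrative assumptions rather than details taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mix(query, expert_embeddings):
    """Weight expert embeddings per example with scaled dot-product attention.

    query: hypothetical per-molecule query vector (list of floats).
    expert_embeddings: dict mapping expert name -> embedding of the same length.
    Returns (weights, mixed), where weights sum to 1 over the experts.
    """
    names = list(expert_embeddings)
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, expert_embeddings[name])) / scale
        for name in names
    ]
    weights = softmax(scores)
    mixed = [
        sum(w * expert_embeddings[name][i] for w, name in zip(weights, names))
        for i in range(len(query))
    ]
    return dict(zip(names, weights)), mixed


def consensus(predictions):
    """Assumed consensus rule: arithmetic mean of several model predictions."""
    return sum(predictions) / len(predictions)


# Toy example: made-up two-dimensional embeddings for one molecule.
experts = {
    "acidic_pKa": [2.0, 0.0],
    "basic_pKa": [0.0, 2.0],
    "logP": [1.0, 1.0],
}
weights, mixed = attention_mix([1.0, 0.0], experts)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Because the weights are recomputed from each molecule's query, two molecules can rely on the experts in different proportions, which is the per-example behavior the abstract attributes to the attention mechanism. A fixed set of mixing weights would not capture this.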
