Reading notes on 《Artificial Intelligence in Drug Design》


Table of Contents

  • 1.Introduction
  • 2.Materials
    • 2.1.Chemical Libraries
    • 2.2.Protein Preparation
    • 2.3.Docking Protocol
  • 3.Methods
    • 3.1.Deep Learning for Protein-Ligand Docking
    • 3.2.Model Types and Featurization
    • 3.3.Analysis
  • 4.Notes

1.Introduction

  • Ultrahigh-throughput virtual screening (uHTVS) campaigns aim well beyond compound libraries of 100 million, with sights set on the 15.5-billion-compound Enamine REAL Space library and beyond.
  • Compound diversity is an essential aspect of a successful uHTVS campaign.
  • Humans have roughly 20 million proteins, give or take an order of magnitude for mutant and wild-type proteins, and the drug-like chemical space spans up to an estimated 10^60 unique molecules.

2.Materials

2.1.Chemical Libraries

  • Common chemical libraries for virtual screening are ZINC and Enamine REAL. These compound libraries are often obtained in SMILES or SDF format.
  • If a set of SDF files is obtained, use an open-source tool such as OpenBabel to convert ".sdf" to ".smi".
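The conversion step above can be scripted. The following is a minimal sketch, assuming the Open Babel command-line tool `obabel` is installed and on the PATH; the helper names `build_obabel_command` and `convert_sdf_to_smi` and the file name `library.sdf` are hypothetical and not from the chapter.

```python
import shutil
import subprocess
from pathlib import Path


def build_obabel_command(sdf_path, smi_path=None):
    """Return the argument list for converting an SDF file to SMILES with Open Babel."""
    sdf = Path(sdf_path)
    # By default, write next to the input and swap the extension to .smi.
    smi = Path(smi_path) if smi_path is not None else sdf.with_suffix(".smi")
    # `obabel <input> -O <output>`: formats are inferred from the file extensions.
    return ["obabel", str(sdf), "-O", str(smi)]


def convert_sdf_to_smi(sdf_path, smi_path=None):
    """Run the conversion; raises if obabel is missing or exits with an error."""
    if shutil.which("obabel") is None:
        raise RuntimeError("obabel executable not found on PATH")
    subprocess.run(build_obabel_command(sdf_path, smi_path), check=True)
```

Calling `convert_sdf_to_smi("library.sdf")` would write `library.smi` alongside the input. For a library split across many SDF files, the same function can be mapped over the file list.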

2.2.Protein Preparation

  • Proteins can be obtained from the Protein Data Bank (PDB). Structure preparation from this point will depend on the docking protocol used for generating scores.

2.3.Docking Protocol

  • A docking protocol is a program which optimizes a scoring function by sampling positions of a 3D ligand in a protein binding region.
  • An important but sometimes overlooked preparation for docking is enumerating 3D conformations of compounds in the library (see Fig. 1).
  • Consider a function $f_{receptor}$ to be a scoring function prepared with a particular protein pocket. We associate the score of the predicted pose, $\max_{\tau \in \text{valid poses}} f_{receptor}(\tau)$, as a stand-in property to rank the "goodness" of molecules in a database (some authors call this value the predicted binding affinity).
  • There are many scoring functions used, such as DOCK, GOLD, and FlexX.
  • There is research into using a combination of scoring functions, or consensus score, as a more reliable and interpretable metric, such as CScore.
  • A general overview of scoring functions, outlining how they differ in sampling speed, is available in the literature.
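The ranking idea in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the chapter's implementation: pose scores are taken to be "higher is better" (matching the max in the definition above; some docking programs report energies where lower is better), and the consensus shown is a generic rank average, not the actual CScore scheme. All function names are hypothetical.

```python
def docking_score(pose_scores):
    """Score of the predicted pose: the maximum of f_receptor over the sampled valid poses."""
    if not pose_scores:
        raise ValueError("at least one valid pose is required")
    return max(pose_scores)


def rank_library(pose_scores_by_molecule):
    """Sort molecule identifiers from best to worst docking score."""
    scored = {mol: docking_score(s) for mol, s in pose_scores_by_molecule.items()}
    return sorted(scored, key=scored.get, reverse=True)


def consensus_rank(rankings):
    """Average each molecule's rank position across several scoring functions."""
    molecules = rankings[0]
    mean_rank = {
        mol: sum(r.index(mol) for r in rankings) / len(rankings) for mol in molecules
    }
    # A lower mean rank means the scoring functions agree the molecule is good.
    return sorted(molecules, key=mean_rank.get)
```

Here `rank_library` takes a mapping from molecule identifier to the list of scores of its sampled poses, and `consensus_rank` takes one ranked list per scoring function, all over the same molecules.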

3.Methods

  • There is the option of ignoring the idea of scoring functions and using an ML/AI model to generate the pose directly, without the exhaustive search framework. Such models tend to suffer from (1) their specificity to the particular scoring function protocol used to generate the training data (and the imposition of a docking protocol), and (2) the way the docking objective is cast to nearly mimic the physical processes involved in protein–ligand binding, which is usually represented as a multiobjective optimization problem.

3.1.Deep Learning for Protein-Ligand Docking

  • With two goals in mind, accurate and informative virtual screens that extend deeply and diversely into chemical space, the tools we choose for uHTVS must be scalable for the problem.
  • As Table 1 outlines, as machines become heavier on GPU compute targets, and with the rise of hardware accelerators for deep learning, adjusting the tools of uHTVS toward deep learning addresses one of the biggest problems in this space: library diversity.

  • Given this viewpoint of ML models as filters that reduce a large library to a list of hits, the molecules that pass the filter should be considered for standard programmatic docking.
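The filter-then-dock workflow described above might look like the following sketch. The callables `surrogate_score` (a cheap ML model) and `dock` (the expensive docking program) are placeholders supplied by the caller, and `keep` is a hypothetical shortlist size; none of these names come from the chapter.

```python
import heapq


def prefilter_then_dock(library, surrogate_score, dock, keep):
    """Rank a library with a cheap model, then dock only the `keep` best candidates."""
    # Stage 1: the surrogate touches every molecule, so it must be cheap.
    shortlist = heapq.nlargest(keep, library, key=surrogate_score)
    # Stage 2: the expensive docking program only sees the shortlist.
    docked = {mol: dock(mol) for mol in shortlist}
    # Return (molecule, docking score) pairs, best score first.
    return sorted(docked.items(), key=lambda item: item[1], reverse=True)
```

The design point is that docking cost scales with `keep` rather than with the library size, which is what makes billion-compound libraries approachable.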

3.2.Model Types and Featurization

  • It should be noted that molecular descriptors are distinct from molecular fingerprints. Molecular fingerprints, another alternative for featurization, are bit vectors based on hashing different molecular neighborhoods together. Molecular fingerprints originated for use in databases as a surrogate for molecular similarity. Unlike fingerprints, molecular descriptors are explainable, whereas the individual bits of a fingerprint are relatively opaque.
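To make the "hash neighborhoods into bits" idea concrete, here is a deliberately simplified toy. It hashes short substrings of a SMILES string, which is not chemically meaningful; real fingerprints hash atom neighborhoods of the molecular graph and would normally come from a cheminformatics toolkit. The function names and parameters are hypothetical. The Tanimoto similarity at the end is the standard set-overlap measure used to compare fingerprints.

```python
import zlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Hash every substring of up to `max_len` characters into a fixed-size bit set."""
    on_bits = set()
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # crc32 is deterministic across runs, unlike the built-in hash() for str.
            on_bits.add(zlib.crc32(fragment.encode("utf-8")) % n_bits)
    return on_bits


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: shared on-bits divided by the union of on-bits."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0
```

Because many fragments can collide on the same bit, a single set bit cannot be traced back to one substructure, which is the opacity the note above refers to.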


3.3.Analysis

  • Score distributions in uHTVS are highly skewed and vary heavily among targets and VS campaigns (see Note 2). Docking score distributions are unlike normal distributions; in particular, one is interested in the high-scoring tail rather than the mean or central tendency of the data.
  • If the goal is to locate the 100 top-scoring hits in a molecular database of one billion molecules, the model must single out about 0.00001% of the library (100 out of 10^9), a level of predictive accuracy that, needless to say, is unheard of in the machine learning literature.
  • In the literature, one often encounters predictive models being evaluated based on their mean absolute error (MAE), mean squared error (MSE), or correlation coefficients (such as r^2 or Pearson). In this section, we will argue that these measures are uninformative at best and most often misleading, and propose adopting a pragmatic model evaluation scheme based on the specifics of uHTVS.
  • Scaffold splitting involves splitting molecular data into clusters based on their scaffold and holding out specific scaffolds from the training data to evaluate out-of-distribution model performance. In a similar light, time-dimension splitting involves hiding compounds that were recently discovered pharmaceuticals to see if the model could have discovered them without seeing that class of molecule before.
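One pragmatic measure in the spirit of this section is to ask how many of the true top-scoring molecules the model would have put on its shortlist. The sketch below is an assumed example of such a measure, not necessarily the scheme the chapter proposes; `hit_fraction`, `top_k_recall`, and the `budget` parameter are hypothetical names.

```python
def hit_fraction(n_hits, library_size):
    """Fraction of the library that the desired hits represent."""
    return n_hits / library_size


def top_k_recall(true_scores, predicted_scores, k, budget):
    """Share of the true top-k molecules found among the model's top-`budget` picks."""
    # Both arguments map molecule identifier -> score, higher is better.
    by_true = sorted(true_scores, key=true_scores.get, reverse=True)[:k]
    by_pred = sorted(predicted_scores, key=predicted_scores.get, reverse=True)[:budget]
    return len(set(by_true) & set(by_pred)) / k
```

For the example in the text, `hit_fraction(100, 10**9)` gives 1e-7, i.e. 0.00001% of the library. Unlike MAE or a correlation coefficient, `top_k_recall` ignores how well the model fits the bulk of the distribution and looks only at the tail that a screening campaign cares about.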

4.Notes

Source: Chapter 13, Ultrahigh Throughput Protein-Ligand Docking with Deep Learning.
