Learning to Extend Molecular Scaffolds with Structural Motifs

https://github.com/microsoft/molecule-generation

MoLeR: molecule generation that can be constrained to include a given scaffold



DATA REPRESENTATION

Motifs


Training our model relies on a set of fragments M – called the motif vocabulary – which we infer directly from data. For each training molecule, we decompose it into fragments by breaking some of the bonds; we only consider acyclic bonds (i.e. bonds that do not lie on a cycle), as breaking rings is chemically challenging.

We break all acyclic bonds adjacent to a cycle (i.e. at least one endpoint lies on a cycle), as that separates the molecule into cyclic substructures, such as ring systems, and acyclic substructures, such as functional groups. We then aggregate the resulting fragments over the entire training set, and define M as the n most common motifs, where n is a hyperparameter.
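
To make the decomposition concrete, below is a minimal RDKit-based sketch of the fragmentation and vocabulary-building steps; the function names and bookkeeping are illustrative and do not mirror the actual implementation in the repository.

```python
from collections import Counter

from rdkit import Chem


def fragment_molecule(smiles):
    """Break every acyclic bond with at least one endpoint on a ring,
    and return the SMILES of the resulting fragments."""
    mol = Chem.MolFromSmiles(smiles)
    bonds_to_break = [
        bond.GetIdx()
        for bond in mol.GetBonds()
        if not bond.IsInRing()  # only acyclic bonds are considered
        and (bond.GetBeginAtom().IsInRing() or bond.GetEndAtom().IsInRing())
    ]
    if not bonds_to_break:
        return [Chem.MolToSmiles(mol)]
    fragmented = Chem.FragmentOnBonds(mol, bonds_to_break, addDummies=False)
    return [
        Chem.MolToSmiles(frag)
        for frag in Chem.GetMolFrags(fragmented, asMols=True, sanitizeFrags=False)
    ]


def build_motif_vocabulary(training_smiles, n):
    """Aggregate fragments over the whole training set and keep the n most common."""
    counts = Counter()
    for smiles in training_smiles:
        counts.update(fragment_molecule(smiles))
    return [motif for motif, _ in counts.most_common(n)]
```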

Having selected M, we pre-process molecules (both for training and during inference) by noting which atoms are covered by motifs belonging to the vocabulary. This is done by applying the same bond-breaking procedure as used for motif vocabulary extraction. During generation, our model can either add an entire motif in one step, or generate atoms and bonds one-by-one. This means that it can generate arbitrary structures, such as an unusual ring, even if they do not appear in the training data.

Finally, note that in contrast to Jin et al. (2020), we do not decompose ring systems into individual rings. This means that our motifs are atom-disjoint, and we consequently do not need to model a motif-specific attachment point vocabulary, as attaching a motif to a partial graph requires adding only a single bond, and thus there is only one attachment point.

Molecule Representation

We represent a molecule as a graph G = (V, E), where vertices V are atoms and edges E are bonds. Edges may also be annotated with extra features such as the bond type. Each node (atom) v ∈ V is associated with an initial node feature vector, chosen to capture chemically relevant features describing both the atom itself (type, charge, mass, valence, and isotope information) and its local neighborhood (aromaticity and presence of rings). These features can be readily extracted using the RDKit library. Additionally, for atoms that are part of a motif, we concatenate the feature vector with the corresponding motif embedding; for the other atoms we use a special embedding vector to signify the lack of a motif. We show this at the top of Figure 1. Throughout this paper we use Graph Neural Networks to learn contextualized node representations h_v (see Appendix A for background information on GNNs). Motif embeddings are initialized randomly, and learned end-to-end with the rest of the model.
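
The per-atom information listed above can be extracted from RDKit roughly as follows; this is an illustrative sketch, and the exact feature set and encoding used by MoLeR may differ.

```python
from rdkit import Chem


def atom_features(atom):
    """Chemically relevant features of a single atom and its local neighborhood."""
    return {
        "symbol": atom.GetSymbol(),
        "formal_charge": atom.GetFormalCharge(),
        "mass": atom.GetMass(),
        "explicit_valence": atom.GetExplicitValence(),
        "isotope": atom.GetIsotope(),
        "is_aromatic": atom.GetIsAromatic(),
        "is_in_ring": atom.IsInRing(),
    }


mol = Chem.MolFromSmiles("c1ccccc1O")  # phenol
features = [atom_features(atom) for atom in mol.GetAtoms()]
```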

Abstract:

Recent advances in deep learning-based molecular modeling promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated molecules, and incorporating that constraint has only recently been explored. Here, we propose MoLeR, a graph-based model that supports scaffolds as the initial seed of the generative procedure, which is possible because it is not constrained by the generation history. Our experiments show that MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization tasks, outperforms them on scaffold-based tasks, and is an order of magnitude faster to train and sample from than existing approaches. Furthermore, we show the influence of a number of seemingly minor design choices on the overall performance.

Figure 1: Overview of our approach. We discover motifs from data (a) and use them to decompose an input molecule (b) into motifs and single atoms. In the encoder (c), atom features (bottom) are combined with motif embeddings (top), making the motif information available at the atom level. Decoder steps (d) are conditioned only on the encoder output and the partial graph (and hence are independent of the generation history), and have to select one of the valid options (shown below, correct choices marked in red).

Our generative procedure is shown in Algorithm 1 and example steps are shown at the bottom of Figure 1.

It takes as input a conditioning vector z, which can either be obtained by encoding a molecule (in our autoencoder training setting) or by sampling (at inference time), and optionally a partial molecule to start generation from. Our generator constructs a molecule piece by piece. In each step, it first selects a new atom or an entire motif to add to the current partial molecule, or decides to stop the generation. If generation continues, the new atom (or an atom picked from the added motif) is put in "focus" and connected to the partial molecule by adding one or several bonds.
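
The loop below is a high-level sketch of this procedure (cf. Algorithm 1). The `networks` and `graph_ops` arguments are placeholders for the learned decision functions described next and for graph-editing utilities; they are assumptions made for illustration, not the repository's API.

```python
def generate(z, networks, graph_ops, partial_molecule=None):
    """Sketch of the MoLeR generation loop: repeatedly add an atom or motif,
    pick an attachment point, and add bonds until END_GEN is chosen."""
    molecule = partial_molecule if partial_molecule is not None else graph_ops.empty_graph()
    while True:
        choice = networks.pick_atom_or_motif(z, molecule)           # atom type, motif, or END_GEN
        if choice == "END_GEN":
            return molecule
        new_atoms = graph_ops.add_atom_or_motif(molecule, choice)   # one atom, or a whole motif
        focus = networks.pick_attachment(z, molecule, new_atoms)    # trivial for a single atom
        while True:
            partner = networks.pick_bond(z, molecule, focus)        # partner atom or END_BONDS
            if partner == "END_BONDS":
                break
            graph_ops.add_bond(molecule, focus, partner)
```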

Our decoder relies on three neural networks to implement the functions PickAtomOrMotif, PickAttachment and PickBond. These share a common GNN to process the partial molecule M, yielding high-level features h_v for each atom v and an aggregated graph-level feature vector h_{mol}. We call our model MoLeR, as each step is conditioned on the Molecule-Level Representation h_{mol}.

PickAtomOrMotif uses h_{mol} as an input to an MLP that selects from the set of known atom types, motifs, and a special END_GEN class to signal the end of the generation.

PickAttachment is used to select which of the atoms in an added motif to connect to the partial molecule (this is trivial in the case of adding a single atom). This is implemented by another MLP that computes a score for each added atom v_{a} using its representation h_{v_{a}} and h_{mol}. As motifs are often highly symmetric, we determine the symmetries using RDKit and only consider one atom per equivalence class. An example of this is shown in step (3) at the bottom of Figure 1, where only three of the five atoms in the newly-added motif are available as choices, as there are only three equivalence classes.
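
The symmetry reduction can be reproduced with RDKit's canonical atom ranking, which assigns the same rank to symmetry-equivalent atoms when ties are not broken; the snippet below is a sketch under that assumption.

```python
from rdkit import Chem


def attachment_candidates(motif_smiles):
    """Return one representative atom index per symmetry equivalence class."""
    motif = Chem.MolFromSmiles(motif_smiles)
    ranks = Chem.CanonicalRankAtoms(motif, breakTies=False)
    representatives = {}
    for atom_idx, rank in enumerate(ranks):
        representatives.setdefault(rank, atom_idx)
    return sorted(representatives.values())


# All six carbons of benzene are equivalent, so only one attachment point is scored:
print(attachment_candidates("c1ccccc1"))  # [0]
```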

Finally, PickBond is used to predict which bonds to add, using another MLP that scores each candidate bond between the focus atom v^{\odot} and a potential partner v_{b} using their representations h_{v^{\odot}}, h_{v_{b}} and h_{mol}. We also consider a special, learned END_BONDS partner to allow the network to choose to stop adding bonds. Similarly to Liu et al. (2018), we employ valence checks to mask out bonds that would lead to chemically invalid molecules. Moreover, if v^{\odot} was selected as an attachment point in a motif, we mask out edges to other atoms in the same motif.
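
A simplified version of such a valence check is sketched below; it ignores formal charges and other corner cases, so the model's actual masking logic is likely more careful.

```python
from rdkit import Chem

_PERIODIC_TABLE = Chem.GetPeriodicTable()


def can_accept_bond(atom, bond_order=1):
    """Allow a new bond only if the atom stays within its maximum allowed valence."""
    max_valence = max(_PERIODIC_TABLE.GetValenceList(atom.GetAtomicNum()))
    return atom.GetExplicitValence() + bond_order <= max_valence
```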

The probability of a generation sequence is the product of the probabilities of its steps; we note that the probability of a molecule is the sum of the probabilities of all different generation sequences leading to it, which is infeasible to compute. However, note that steps are only conditioned on the input z and the current partial molecule M. Our decoding is therefore not fully auto-regressive, as each step depends only on the partial molecule rather than on the full generation history. During training, we use a softmax over the candidates considered by each subnetwork to obtain a probability distribution. As there are many steps with several correct next actions (e.g. many atoms could be added next), we use a multi-hot objective during training that encourages the model to learn a uniform distribution over all correct choices. For more details about the architecture see Appendix B.
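
Concretely, the per-step objective can be written as a cross-entropy against a target that is uniform over all correct choices; the following numpy sketch illustrates the idea, while the released implementation's exact bookkeeping may differ.

```python
import numpy as np


def multi_hot_step_loss(logits, correct_indices):
    """Cross-entropy between the softmax over candidates and a uniform
    distribution over all correct next actions."""
    logits = np.asarray(logits, dtype=np.float64)
    log_probs = logits - logits.max()
    log_probs -= np.log(np.exp(log_probs).sum())  # numerically stable log-softmax
    target = np.zeros_like(log_probs)
    target[list(correct_indices)] = 1.0 / len(correct_indices)
    return -(target * log_probs).sum()


# Example: two equally valid next actions among five candidates.
print(multi_hot_step_loss([2.0, 0.1, -1.0, 2.0, 0.3], correct_indices=[0, 3]))
```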

B ARCHITECTURE

The backbone of our architecture consists of two GNNs: one used to encode the input molecule, and the other used to encode the current partial graph. Both GNNs have the same architecture, but are otherwise completely separate and do not share any parameters.

To implement our GNNs, we employ the GNN-MLP layer (Brockschmidt, 2020). We use 12 layers with separate parameters, Leaky ReLU non-linearities (Maas et al., 2013), and LayerNorm (Ba et al., 2016) after every GNN layer. If using motifs, we concatenate the atom features with a motif embedding of size 64, and then linearly project the result back into 64 dimensions. We use 64 as the hidden dimension throughout all GNN layers, guided by early experiments showing that wider hidden representations were less beneficial than a deeper GNN. Moreover, to improve the flow of gradients in the GNNs, we produce the final node-level feature vectors by concatenating both initial and intermediate node representations across all layers, resulting in feature vectors of size 64 · 13 = 832. Intuitively, this concatenation serves as a skip connection that shortens the path from the node features to the final representation.
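
The following PyTorch-style sketch shows only the layer stacking and the concatenation skip connection described above; `gnn_layer_factory` stands in for the GNN-MLP layer of Brockschmidt (2020), which is not reproduced here, and the framework choice is purely for illustration.

```python
import torch
import torch.nn as nn


class NodeRepresentationStack(nn.Module):
    """12 message-passing layers of width 64 with LayerNorm and Leaky ReLU;
    the final node vectors concatenate the initial features with every layer's
    output, giving 64 * 13 = 832 dimensions."""

    def __init__(self, gnn_layer_factory, hidden_dim=64, num_layers=12):
        super().__init__()
        self.layers = nn.ModuleList([gnn_layer_factory(hidden_dim) for _ in range(num_layers)])
        self.norms = nn.ModuleList([nn.LayerNorm(hidden_dim) for _ in range(num_layers)])
        self.activation = nn.LeakyReLU()

    def forward(self, node_features, adjacency):
        all_states = [node_features]
        h = node_features
        for layer, norm in zip(self.layers, self.norms):
            h = self.activation(norm(layer(h, adjacency)))
            all_states.append(h)
        return torch.cat(all_states, dim=-1)  # [num_nodes, 64 * 13]
```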

To pool node-level representations into a graph-level representation, we use an expressive multi-headed aggregation scheme. The i-th aggregation head consists of two MLPs: s_i, which computes a scalar aggregation score, and t_i, which computes a transformed version of the node representation. These are then used to compute the i-th graph-level output according to

h_{mol}^{(i)} = \sum_{v \in V} normalize(s_i(h_v)) \cdot t_i(h_v).

Specifically, we compute the scores for all of the nodes, normalize them across the graph using the normalize function, and then use them to construct a weighted sum of the transformed representations. For the normalization function we consider either passing the scores through a softmax (which results in a head that implements a weighted mean) or a sigmoid (weighted sum). We use 32 heads for the encoder GNN, and 16 heads for the partial-graph GNN. In both cases, half of the heads use softmax normalization, while the other half use sigmoid. The outputs from all heads are concatenated to form the final graph-level vector; as different heads use different normalization functions (softmax or sigmoid), this is in spirit related to Principal Neighborhood Aggregation (Corso et al., 2020), but here used for graph-level readout instead of aggregating node-level messages.
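
A single aggregation head can be sketched as below (again in PyTorch purely for illustration, operating on one unbatched graph); the hidden sizes of the two MLPs are assumptions.

```python
import torch
import torch.nn as nn


class AggregationHead(nn.Module):
    """One readout head: an MLP producing a scalar score per node and an MLP
    transforming the node representation; scores are normalized across the graph
    with softmax (weighted mean) or sigmoid (weighted sum)."""

    def __init__(self, node_dim, out_dim, use_softmax=True):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(node_dim, node_dim), nn.ReLU(), nn.Linear(node_dim, 1))
        self.transform = nn.Sequential(nn.Linear(node_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))
        self.use_softmax = use_softmax

    def forward(self, node_states):  # node_states: [num_nodes, node_dim]
        scores = self.score(node_states)  # [num_nodes, 1]
        weights = torch.softmax(scores, dim=0) if self.use_softmax else torch.sigmoid(scores)
        return (weights * self.transform(node_states)).sum(dim=0)  # [out_dim]
```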

Our node aggregation layer allows us to construct a powerful graph-level representation; its dimensionality can be adjusted by varying the number of heads and the output dimension of the transformations t_i. For input graphs we use a 512-dimensional graph-level representation (which is then transformed to produce the mean and standard deviation of a 512-dimensional latent code z), and for partial graphs we use 256 dimensions.

To implement the functions used in our decoder procedure (i.e., the neural networks implementing PickAtomOrMotif, PickAttachment, and PickBond in Algorithm 1), we use simple multilayer perceptrons (MLPs).

The MLP for PickAtomOrMotif has to output a distribution over all atom and motif types and the special END_GEN option. As input, it receives the latent code z and the partial molecule representation h_{mol}. As the number of choices is large, we use hidden layers that maintain high dimensionality (two hidden layers with dimension 256). Predicting the type of the first node in an empty graph would require encoding an empty partial molecule to obtain h_{mol}; in practice, we side-step this technicality by using a separate MLP to predict the first node type, which takes as input only the latent encoding z.

In contrast, the networks for PickAttachment and PickBond are used as scorers (i.e. they need to output a single value), so we use MLPs with hidden layers that gradually reduce dimensionality (concretely, three hidden layers with dimensions 128, 64, and 32, respectively). The MLPs for PickAttachment and PickBond take the latent code z, the partial molecule representation h_{mol}, and the representation h_v of each scored candidate node v. Finally, PickBond not only needs to predict the partner of a bond, but also one of three bond types (single, double and triple); for that we use an additional MLP with the same architecture as the scoring network, but used for classification.
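
The decoder heads can thus be summarized by the following shapes (PyTorch used only for illustration); the vocabulary size, exact input dimensions, and activation choice are assumptions rather than the released model's configuration.

```python
import torch.nn as nn

latent_dim, h_mol_dim, node_dim = 512, 256, 832
num_atom_and_motif_types = 1000  # hypothetical vocabulary size

pick_atom_or_motif = nn.Sequential(  # classifier head: wide hidden layers
    nn.Linear(latent_dim + h_mol_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, num_atom_and_motif_types + 1),  # +1 for END_GEN
)


def scorer_mlp(input_dim):  # scorer heads: gradually shrinking hidden layers
    return nn.Sequential(
        nn.Linear(input_dim, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 32), nn.ReLU(),
        nn.Linear(32, 1),
    )


# PickAttachment scores a candidate atom from (z, h_mol, h_v):
pick_attachment = scorer_mlp(latent_dim + h_mol_dim + node_dim)
# PickBond scores a candidate partner from (z, h_mol, h_focus, h_partner):
pick_bond = scorer_mlp(latent_dim + h_mol_dim + 2 * node_dim)
```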
