Editor's note

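The paper "A Model for Uncertainty-Calibrated Chemical Reaction Prediction" estimates the uncertainty of its model's predictions and improves generalization. This article analyzes the paper in detail, covering the experimental background, parameter settings, results, and impact, to help readers understand it better.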

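Today we discuss a paper on small-molecule generation, in which the products of a reaction are predicted from the given reactants and reagents. Reaction prediction is treated as a machine translation problem:

- the input is the SMILES of the reactants and reagents;
- the output is the SMILES of the product.

The approach can also accurately estimate the uncertainty of whether each prediction is correct, framed as a classification. In addition, the model needs no handcrafted rules. It can handle inputs in which reactants and reagents are not separated, as well as data with stereochemistry, and it accurately predicts valid, plausible chemical reactions.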
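The sketch below makes the translation framing concrete. It is minimal Python (3.9+) that turns one reaction SMILES of the form `reactants>reagents>products` into a space-separated source line and target line. That is the usual input format for sequence-to-sequence toolkits such as OpenNMT.

Two things are assumptions:

- The tokenizer regex is modeled on the one distributed with the Molecular Transformer code.
- The function names are hypothetical.

```python
import re

# SMILES tokenizer regex. Assumption: modeled on the pattern released with the
# Molecular Transformer code; verify against the repository before relying on it.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom, bond, ring-closure, and bracket tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # findall silently skips unmatched characters, so check nothing was lost.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def to_translation_pair(reaction: str) -> tuple[str, str]:
    """Turn 'reactants>reagents>products' into (source, target) token lines.

    Hypothetical helper: the source keeps the '>' between reactants and
    reagents (the separated format described in section 1.1).
    """
    reactants, reagents, products = reaction.split(">")
    source = f"{reactants}>{reagents}" if reagents else reactants
    return " ".join(tokenize_smiles(source)), " ".join(tokenize_smiles(products))
```

For example, `to_translation_pair("CC(=O)O.OCC>[H+]>CC(=O)OCC")` gives the following pair:

- source: `C C ( = O ) O . O C C > [H+]`
- target: `C C ( = O ) O C C`

Bracket atoms such as `[H+]` stay single, chemically meaningful tokens.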

1.1 Data used

The authors use two datasets, USPTO_STEREO and USPTO_MIT, each processed in two ways: separated and mixed.

- Separated: the reactants and the reagents are divided by ">".
- Mixed: the input does not distinguish molecules that contribute atoms to the product from those that do not.

In the mixed setting the network has to learn this distinction by itself, so it must consider more molecules to locate the reaction center. That makes the task harder, and separating the reagents therefore improves accuracy.
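The two input formats can be sketched as follows. This assumes reactions are stored as `reactants>reagents>products` SMILES.

- The functions are hypothetical helpers.
- Shuffling the molecules in the mixed format is an assumption added here, so that position gives no hint about which molecule is a reactant.

```python
import random


def separated_source(reaction: str) -> str:
    """Separated input: reactants and reagents stay divided by '>'."""
    reactants, reagents, _products = reaction.split(">")
    return f"{reactants}>{reagents}"


def mixed_source(reaction: str, seed: int = 0) -> str:
    """Mixed input: all precursor molecules in one '.'-joined list.

    The model is no longer told which molecules contribute atoms to the product.
    Shuffling (an assumption for this sketch) removes positional hints as well.
    """
    reactants, reagents, _products = reaction.split(">")
    molecules = [m for m in reactants.split(".") + reagents.split(".") if m]
    random.Random(seed).shuffle(molecules)
    return ".".join(molecules)
```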

Figure 1 shows the format of the datasets.

Figure 1. Dataset splits

1.2 Parameter settings of the OpenNMT-based model

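The Transformer hyperparameters were tuned to the following values:

- beam search size: 5
- Transformer layers: 4 [1]
- embedding size: 256
- attention heads: 8

Training used the Adam optimizer with the batch size increased to 4096. Gradients were accumulated over four batches before each parameter update.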
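The settings above can be collected into a small configuration object. This is a sketch, not OpenNMT's actual config schema.

- The field names only loosely mirror OpenNMT-py options such as `beam_size`, `heads`, and `accum_count`.
- Treating the batch size of 4096 as a token count is an assumption.

The sketch also works out two derived numbers. Each head has 256 / 8 = 32 dimensions, and each optimizer step sees 4096 × 4 = 16,384 units of data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MTConfig:
    """Hyperparameters listed in section 1.2 (field names are hypothetical)."""

    beam_size: int = 5
    layers: int = 4
    d_model: int = 256       # embedding size
    heads: int = 8
    batch_size: int = 4096   # assumption: counted in tokens
    accum_count: int = 4     # batches whose gradients are summed before one update
    optimizer: str = "adam"

    @property
    def head_dim(self) -> int:
        # Each attention head works in d_model / heads dimensions.
        if self.d_model % self.heads:
            raise ValueError("d_model must be divisible by heads")
        return self.d_model // self.heads

    @property
    def effective_batch(self) -> int:
        # Gradients from accum_count batches feed a single optimizer step.
        return self.batch_size * self.accum_count


cfg = MTConfig()
print(cfg.head_dim, cfg.effective_batch)  # 32 16384
```

Gradient accumulation gives the large effective batch without having to fit it into GPU memory at once.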

1.3 Ablation experiments

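Although an ensemble of models can achieve higher accuracy and very good uncertainty estimates, it requires extra training or test time [2]. On the different datasets, the top-2 accuracy of the best single model is clearly higher than its top-1 accuracy, exceeding 93% (Figure 2).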
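The sketch below gives a rough illustration of why ensembles cost extra test time. It averages the candidate-product probabilities reported by several independently trained models and re-ranks the candidates, so every model has to be run on every input.

This sequence-level averaging is a simplification chosen for clarity. It is not necessarily how the paper or OpenNMT combine models during beam search, and `ensemble_rank` is a hypothetical name.

```python
from collections import defaultdict


def ensemble_rank(model_outputs: list[dict[str, float]], k: int = 5) -> list[tuple[str, float]]:
    """Average candidate probabilities over several models and return the top k.

    Each dict maps a (canonical) product SMILES to the probability one model
    assigns it. A candidate missing from a model's beam counts as 0 for that model.
    """
    totals: dict[str, float] = defaultdict(float)
    for output in model_outputs:
        for smiles, prob in output.items():
            totals[smiles] += prob
    n = len(model_outputs)
    averaged = [(smiles, total / n) for smiles, total in totals.items()]
    averaged.sort(key=lambda item: item[1], reverse=True)
    return averaged[:k]
```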

Figure 2. Ablation of reagent separation on the USPTO_MIT dataset

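The model was also compared with earlier single models by binning test reactions according to how common their reaction class is (Figure 3). This revealed a potential advantage of the Molecular Transformer [3]: for bins containing more than 2000 reactions, top-1 accuracy is above 90%. The model does not merely memorize the data. It can use information inferred from more common reactions to make predictions for rarer ones.

The MIT and STEREO datasets were also compared (Figure 4):

- On top-1 accuracy, the separated setting still outperforms the mixed one.
- On MIT, top-1 accuracy exceeds 90%, and top-2 through top-5 accuracies are all above 90%.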
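Top-k accuracy, and the per-bin accuracy analysis behind Figure 3, can be sketched as below.

- Predicted and reference products are assumed to be canonical SMILES already, so string equality means the same molecule. In practice this canonicalization is done with a cheminformatics toolkit such as RDKit.
- The bin edges are hypothetical, chosen only to include the 2000-reaction boundary mentioned above.

```python
from collections import defaultdict


def top_k_accuracy(predictions: list[list[str]], targets: list[str], k: int) -> float:
    """Fraction of reactions whose target product is among the first k beam outputs.

    Assumes canonical SMILES, so string equality means chemical identity.
    """
    hits = sum(target in preds[:k] for preds, target in zip(predictions, targets))
    return hits / len(targets)


def accuracy_by_frequency_bin(correct: list[bool], class_counts: list[int],
                              edges: list[int]) -> dict[str, float]:
    """Top-1 accuracy grouped by how many training reactions share each test reaction's class.

    `edges` are ascending, hypothetical bin boundaries, e.g. [10, 100, 2000].
    """
    buckets: dict[str, list[bool]] = defaultdict(list)
    for ok, count in zip(correct, class_counts):
        # First edge the count fits under, or the overflow bin above the last edge.
        label = next((f"<={e}" for e in edges if count <= e), f">{edges[-1]}")
        buckets[label].append(ok)
    return {label: sum(oks) / len(oks) for label, oks in buckets.items()}
```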

Figure 3. Top-1 accuracy of the USPTO_MIT single model compared with other models on the USPTO_MIT test set

Figure 4. Top-k accuracy of the Molecular Transformer

1.4 Uncertainty estimation and reaction pathway scoring

Because organic synthesis is a multi-step process, a reaction predictor is only useful if it can estimate its own uncertainty. The Molecular Transformer offers one way to do this:

1. The product of the probabilities of all predicted tokens serves as a confidence score.
2. A threshold on this score decides whether a reaction prediction is treated as correct or incorrect.
3. Sweeping the threshold traces the ROC curve.

The curve is built from true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). It plots the false positive rate FP/(FP + TN) against the true positive rate TP/(TP + FN).

Figure 5 shows that changing the label smoothing value moves the ROC curve up or down, while accuracy (ACC) changes little.
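The confidence score and ROC curve described above can be sketched in plain Python. The sketch rests on these assumptions:

- Per-token log-probabilities are available from the decoder.
- A "positive" is a prediction flagged as correct, meaning its score is at or above the threshold.
- The area under the curve is computed with the trapezoidal rule.
- The function names are hypothetical.

```python
import math


def confidence_score(token_log_probs: list[float]) -> float:
    """Product of the predicted tokens' probabilities, computed in log space."""
    return math.exp(sum(token_log_probs))


def roc_points(scores: list[float], is_correct: list[bool]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping a threshold over the confidence scores.

    A prediction is flagged positive (trusted as correct) when score >= threshold.
    TPR = TP / (TP + FN), FPR = FP / (FP + TN).
    """
    pos = sum(is_correct)
    neg = len(is_correct) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both correct and incorrect predictions")
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Walk from most to least confident; each distinct score acts as a threshold.
    order = sorted(zip(scores, is_correct), key=lambda p: p[0], reverse=True)
    for i, (score, correct) in enumerate(order):
        if correct:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all predictions tied at this score are counted.
        if i + 1 == len(order) or order[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Summing log-probabilities and exponentiating once avoids underflow on long SMILES. Multiplying many probabilities below 1 directly could round to zero.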

Figure 5. ROC curves at evaluation time for models trained on the MIT dataset with different label smoothing values

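Smoothing, then, has a relatively small effect on accuracy but a significant effect on uncertainty quantification. During training, label smoothing replaces the one-hot encoding of the target product token at each time step. It lowers the probability mass of the correct label in the target vector and spreads the removed mass over all other labels in the vocabulary. In human-language translation it helps produce higher accuracy and BLEU scores, and in reaction prediction it also helps reach higher top-1 accuracy.

The uncertainty estimate can also serve as a score for ranking reaction pathways. This score is the product of the probabilities of all predicted tokens, so one might expect SMILES length to introduce a large bias. Yet a large molecule should not by itself mean a "difficult" prediction, and in fact no correlation was found between the confidence score and SMILES length.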
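Label smoothing and pathway scoring are both short enough to sketch.

- `smoothed_target` builds the training target for one time step as described above: the correct token keeps 1 − ε and the other tokens share ε. Libraries differ in the exact split, so this is one common variant.
- `rank_routes` ranks multi-step routes by multiplying per-step confidence scores. That aggregation is an assumption for illustration, not a rule stated in the paper.

Both names are hypothetical.

```python
import math


def smoothed_target(vocab_size: int, target_index: int, epsilon: float) -> list[float]:
    """Label-smoothed target vector for one decoding time step.

    The correct token keeps 1 - epsilon; epsilon is spread evenly over the
    other vocab_size - 1 tokens (one common variant of label smoothing).
    """
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    off_value = epsilon / (vocab_size - 1)
    target = [off_value] * vocab_size
    target[target_index] = 1.0 - epsilon
    return target


def rank_routes(routes: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank multi-step routes by the product of their per-step confidence scores.

    Assumption: steps are treated as independent, so a route scores well only
    if every step is predicted confidently.
    """
    scored = [(name, math.prod(steps)) for name, steps in routes.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With ε = 0 the target is exactly one-hot, which recovers ordinary cross-entropy training.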

1.5 Conclusions and impact

First, the innovation of this paper is its use of multi-head attention, which can be regarded as an ensemble inside the model. The model reaches 90.4% top-1 accuracy on a public benchmark dataset (93.7% top-2). More importantly, it does so without any handcrafted rules. It accurately predicts selective chemical transformations, with correct chemoselectivity, regioselectivity, and stereoselectivity.

In addition, the model can estimate the uncertainty of whether its prediction for a reaction is correct. The ROC-AUC of this uncertainty score is 0.89.

The model has run in the back end of IBM RXN for Chemistry since August 2018. So far, thousands of organic chemists around the world have used it to make more than 40,000 predictions.
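The sketch below makes the "ensemble inside the model" view concrete. It splits each feature vector into `heads` slices and runs scaled dot-product attention on each slice independently [1]. It then concatenates the results, so each head can attend to different positions.

The sketch is deliberately simplified:

- The learned query, key, value, and output projections of a real Transformer are left out.
- The function names are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: list[list[float]], k: list[list[float]], v: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product attention for one head: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def multi_head_self_attention(x: list[list[float]], heads: int) -> list[list[float]]:
    """Split each d_model vector into `heads` slices, attend per slice, concatenate.

    Simplification: no learned projections, so each head sees a different
    slice of the features and forms its own attention pattern.
    """
    d_model = len(x[0])
    if d_model % heads:
        raise ValueError("d_model must be divisible by heads")
    dk = d_model // heads
    per_head = []
    for h in range(heads):
        xs = [row[h * dk:(h + 1) * dk] for row in x]
        per_head.append(attention(xs, xs, xs))
    # Concatenate the head outputs position by position.
    return [sum((head[i] for head in per_head), []) for i in range(len(x))]
```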

References

[1] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Advances in Neural Information Processing Systems 2017, 6000-6010.

[2] Coley, C. W.; Jin, W.; Rogers, L.; Jamison, T. F.; Jaakkola, T. S.; Green, W. H.; Barzilay, R.; Jensen, K. F. A Graph-Convolutional Neural Network Model for the Prediction of Chemical Reactivity. Chemical Science 2019, 10, 370-377.

[3] Schwaller, P.; Gaudin, T.; Lanyi, D.; Bekas, C.; Laino, T. "Found in Translation": Predicting Outcomes of Complex Organic Chemistry Reactions Using Neural Sequence-to-Sequence Models. Chemical Science.

[4] Segler, M. H.; Preuss, M.; Waller, M. P. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555, 604.
