
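The paper "A Model for Uncertainty-Calibrated Chemical Reaction Prediction" estimates the uncertainty of the model's predictions and improves generalization. This post walks through the experimental background, parameter settings, results, and impact in detail to help readers understand the paper.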

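Today I will discuss a paper on small-molecule generation, in which the products are predicted from the given reactants and reagents. Reaction prediction is framed as a machine translation problem: the input is the SMILES of the reactants and reagents, and the output is the SMILES of the products. The approach can also accurately estimate the uncertainty of whether a prediction is correct. In addition, the model requires no handcrafted rules, handles inputs in which reactants and reagents are not separated as well as data with stereochemistry, and accurately predicts valid, plausible chemical reactions.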

1.1 Data used

The authors use the USPTO_STEREO and USPTO_MIT datasets, each with two preprocessing methods: separated and mixed. Separated marks the boundary between reactants and reagents with a > token. Mixed does not distinguish molecules that contribute atoms to the product from molecules that do not, and leaves the network to learn the distinction on its own. The model then has to determine the reaction center from a larger number of molecules, which makes the mixed setting the harder of the two.
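To make the two preprocessing modes concrete, here is a minimal sketch, assuming each record is a reaction SMILES string of the form reactants>reagents>products. The function name `build_source` and the example reaction are hypothetical and are not taken from the paper's code.

```python
from typing import Tuple


def build_source(reaction_smiles: str, mode: str = "separated") -> Tuple[str, str]:
    """Split 'reactants>reagents>products' into (model input, target product).

    Hypothetical helper for illustration only.
    """
    reactants, reagents, products = reaction_smiles.split(">")
    if mode == "separated":
        # Keep the '>' token so the model is told which molecules are reagents.
        source = f"{reactants}>{reagents}"
    elif mode == "mixed":
        # Pool all molecules into one dot-separated input; the model has to
        # work out by itself which ones contribute atoms to the product.
        source = ".".join(part for part in (reactants, reagents) if part)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return source, products


# Illustrative esterification: acetic acid + ethanol, sulfuric acid as reagent.
example = "CC(=O)O.OCC>OS(=O)(=O)O>CC(=O)OCC"
separated_source, target = build_source(example, "separated")
mixed_source, _ = build_source(example, "mixed")
```

In the separated case the input keeps the reagent boundary, while in the mixed case the same molecules arrive as one undifferentiated set.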


Figure 1. Dataset splits

1.2 OpenNMT-based model parameter settings

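The transformer's hyperparameters were tuned: the beam search size is 5, the transformer has 4 layers, [1] the embedding size is 256, and there are 8 attention heads. Training uses the ADAM optimizer with the batch size increased to 4096, and gradients are accumulated over four batches before each update.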
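The settings above can be collected in one place to check how they relate to each other. This is only a sketch: the class and field names are hypothetical and are not OpenNMT option names, and the batch size is kept in whatever unit the paper uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    """Hypothetical container for the hyperparameters quoted in the text."""

    beam_size: int = 5
    num_layers: int = 4
    embedding_size: int = 256
    attention_heads: int = 8
    optimizer: str = "adam"
    batch_size: int = 4096
    accumulation_steps: int = 4

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the embedding.
        if self.embedding_size % self.attention_heads != 0:
            raise ValueError("embedding size must be divisible by head count")
        return self.embedding_size // self.attention_heads

    @property
    def effective_batch_size(self) -> int:
        # Accumulating gradients over several batches before one update
        # behaves like a single larger batch.
        return self.batch_size * self.accumulation_steps


settings = TrainingSettings()
```

With these values each head sees a 32-dimensional slice, and accumulating four batches gives an effective batch four times the nominal size.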

1.3 Ablation study

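Although an ensemble of models can achieve higher accuracy and very good uncertainty estimates, it requires additional training or test time. [2] Across the datasets, the top-2 accuracy of the best single model is clearly higher than its top-1 accuracy, reaching more than 93%, as shown in Figure 2.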

Figure 2. Ablation study on the USPTO_MIT dataset with separated reagents

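The single model was also compared with previous models by binning the reactions according to the popularity of their reaction type, as shown in Figure 3. This reveals a potential advantage of the Molecular Transformer [3]: for bins above 2000, the top-1 accuracy is always above 90%. The MIT and STEREO datasets are compared in Figure 4. The model does not merely memorize the data; it also uses information inferred from more common reactions to make predictions for rarer ones. On the top-1 metric, the separated datasets still perform better than the mixed ones: on MIT the accuracy exceeds 90%, and top-2 through top-5 are all above 90% as well.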
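For readers unfamiliar with the top-k metric used in these comparisons, a minimal sketch follows. It assumes each prediction is a beam of candidate product SMILES ordered from most to least likely, and that candidates and targets have already been canonicalized so that string equality is a fair comparison. The function name and the toy data are illustrative only.

```python
from typing import List, Sequence


def top_k_accuracy(beams: Sequence[Sequence[str]], targets: Sequence[str], k: int) -> float:
    """Fraction of reactions whose true product is among the first k candidates."""
    if len(beams) != len(targets) or not targets:
        raise ValueError("beams and targets must be non-empty and of equal length")
    hits = sum(1 for ranked, truth in zip(beams, targets) if truth in ranked[:k])
    return hits / len(targets)


# Toy data: three reactions, each with a beam of two candidates.
toy_beams: List[List[str]] = [["CCO", "CCN"], ["CC", "CO"], ["C", "N"]]
toy_targets = ["CCO", "CO", "O"]

top1 = top_k_accuracy(toy_beams, toy_targets, k=1)
top2 = top_k_accuracy(toy_beams, toy_targets, k=2)
```

Top-k accuracy can only grow with k, which is why the top-2 to top-5 figures sit above the top-1 figure.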

Figure 3. Top-1 accuracy of the USPTO_MIT single model compared with other models on the USPTO_MIT test set

Figure 4. Top-k accuracy of the Molecular Transformer

1.4 Accuracy strategy and reaction pathway scoring

Because organic synthesis is a multistep process, a reaction predictor is only useful if it can estimate its own uncertainty. The Molecular Transformer provides a way to do this: the product of the probabilities of all predicted tokens serves as a confidence score, and thresholding that score to decide whether a reaction is predicted incorrectly yields the ROC curve. The curve is built from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), through the false positive rate FP/(FP + TN) and the true positive rate TP/(TP + FN). As shown in Figure 5, the ROC curves differ noticeably between the settings compared, whereas the change in accuracy is not particularly significant.
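The confidence score and the ROC construction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: per-token log-probabilities are assumed to be available from the decoder, a "positive" is taken to be a correct prediction, and a prediction is flagged positive when its score is at or above the threshold. All function names and the toy numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def confidence_score(token_log_probs: Sequence[float]) -> float:
    """Product of the token probabilities, computed in log space for stability."""
    return math.exp(sum(token_log_probs))


def roc_points(scores: Sequence[float], is_correct: Sequence[bool]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs from a threshold sweep."""
    positives = sum(is_correct)
    negatives = len(is_correct) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one correct and one incorrect prediction")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, c in zip(scores, is_correct) if s >= threshold and c)
        fp = sum(1 for s, c in zip(scores, is_correct) if s >= threshold and not c)
        points.append((fp / negatives, tp / positives))  # FP/(FP+TN), TP/(TP+FN)
    return points


def roc_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: four predictions with their scores and correctness flags.
toy_scores = [0.9, 0.8, 0.4, 0.2]
toy_correct = [True, True, False, True]
toy_auc = roc_auc(roc_points(toy_scores, toy_correct))
two_token_score = confidence_score([math.log(0.9), math.log(0.5)])
```

Summing log-probabilities instead of multiplying raw probabilities avoids underflow for long SMILES strings while giving the same score.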

Figure 5. ROC curves at evaluation time for different label smoothing values, for the model trained on the MIT dataset

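It is easy to see that smoothing has a relatively small effect on accuracy but a significant effect on uncertainty quantification. During training, instead of giving the one-hot encoding of the target product at each time step, label smoothing reduces the probability mass of the correct label in the target vector and distributes the smoothing mass over all other labels in the vocabulary. It helps produce higher translation accuracy and BLEU scores for human languages, and it also helps obtain higher top-1 accuracy in reaction prediction. In addition, the uncertainty estimate can be used as a score for ranking reaction pathways. Because that score is the product of the probabilities of all predicted tokens, one might worry that SMILES length introduces a large bias, since a large molecule should not by itself mean a "difficult" prediction. However, no correlation is found between the confidence score and the SMILES length.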
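As a small illustration of the label smoothing step described above, the sketch below builds the training target for one time step. It assumes the smoothing mass is spread evenly over the other vocabulary entries; the function name, the vocabulary size, and the smoothing value are hypothetical.

```python
from typing import List


def smoothed_target(correct_index: int, vocab_size: int, smoothing: float) -> List[float]:
    """Target distribution for one time step under label smoothing."""
    if vocab_size < 2:
        raise ValueError("vocabulary must contain at least two tokens")
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    # Mass removed from the correct label is shared by all other labels.
    off_value = smoothing / (vocab_size - 1)
    target = [off_value] * vocab_size
    target[correct_index] = 1.0 - smoothing
    return target


one_hot = smoothed_target(correct_index=2, vocab_size=5, smoothing=0.0)
smoothed = smoothed_target(correct_index=2, vocab_size=5, smoothing=0.1)
```

With smoothing set to 0 the target is the usual one-hot vector. With a positive value the model is never asked to output probability 1 for any token, which is one plausible reason smoothing changes the confidence scores more than it changes accuracy.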

1.5 Conclusions and impact

First of all, the innovation of this paper is its use of a multi-head attention mechanism, which can be regarded as an ensemble inside the model. It achieves a top-1 accuracy of 90.4% on a public benchmark dataset (top-2 accuracy is 93.7%), and more importantly, the model uses no handcrafted rules. It accurately predicts subtle chemical transformations and gets the chemoselectivity, regioselectivity, and stereoselectivity right. In addition, the model can estimate the uncertainty of its own predictions, classifying whether a predicted reaction is correct. The ROC-AUC of the uncertainty score predicted by the model is 0.89. The model has been used in the back end of IBM RXN for Chemistry since August 2018. So far, thousands of organic chemists around the world have used it to make more than 40,000 predictions.


