



Posterior-and-Prior Knowledge Exploring-and-Distilling approach (PPKED)

  • first examine the abnormal regions

  • then assign the disease topic tags

  • comprises three modules:

    • Posterior Knowledge Explorer (PoKE)

      • explores the posterior knowledge
      • provides **explicit abnormal visual regions**
      • alleviates visual data bias
      • uses a bag of disease topics to explore the posterior knowledge, capturing rare, diverse, and important abnormal regions
    • Prior Knowledge Explorer (PrKE)
      • explores the prior knowledge from the prior medical knowledge graph (prior medical knowledge, PrMK, $G_{\text{Pr}}$) and prior radiology reports (prior working experience, PrWE, $W_{\text{Pr}}$)
      • alleviates textual data bias
    • Multi-domain Knowledge Distiller (MKD)
      • distills the explored knowledge to generate the final reports
      • adaptive distilling attention (ADA)
        • makes the model adaptively learn to distill correlated knowledge


directly applying image captioning approaches to radiology images has problems:

  • visual data deviation - unbalanced visual distribution
  • textual data deviation - too many descriptions of normal findings

Related Works

Image Captioning

encoder-decoder framework - translates the image to a single descriptive sentence

radiology report generation - aims to generate a long paragraph - consists of multiple structural sentences

  • each one focuses on a specific medical observation for a specific region in the radiology image

Image Paragraph Generation

  • in a natural image paragraph: each sentence has equal importance
  • in a radiology report: describing abnormalities should be emphasized more than describing normal findings

Radiology Report Generation

PPKED explores and distills the posterior and prior knowledge for accurate radiology report generation

  1. for the network structure: explore the posterior knowledge of the input radiology image by explicitly extracting the abnormal regions
  2. leverage the retrieved reports and the medical knowledge graph to model the prior working experience and prior medical knowledge
  3. retrieve a large number of similar reports
  4. treat the retrieved reports as latent guidance
    (rather than using fixed templates, which introduce inevitable errors)

Posterior-and-Prior Knowledge Exploring-and-Distilling (PPKED)

PoKE (Posterior Knowledge Explorer) + PrKE (Prior Knowledge Explorer) + MKD (Multi-domain Knowledge Distiller)

  • PoKE: explores the posterior knowledge by extracting the explicit abnormal regions
  • PrKE: explores the relevant prior knowledge for the input image
  • MKD: distills accurate posterior and prior knowledge and adaptively merges them to generate accurate reports


Problem Formulation

$$
\begin{aligned}
\text{PoKE}&:\{I,T\}\to I'; \\
\text{PrKE}&:\{I',W_{\text{Pr}}\}\to W'_{\text{Pr}};\ \{I',G_{\text{Pr}}\}\to G'_{\text{Pr}}; \\
\text{MKD}&:\{I',W'_{\text{Pr}},G'_{\text{Pr}}\}\to R
\end{aligned}
$$

Information Sources

  • $I$: adopt ResNet-152 to extract 2048 7$\times$7 image feature maps, which are further projected into 512 7$\times$7 feature maps, resulting in $I=\{i_1,i_2,...,i_{N_\text{I}}\}\in \mathbb{R}^{N_\text{I} \times d}$ ($N_\text{I}=49$, $d=512$)

  • $T$: topic bag (common abnormality topics or findings)

    • $T=\{t_1,t_2,...,t_{N_T}\}\in \mathbb{R}^{N_T \times d}$
    • $t_i\in\mathbb{R}^d$: the word embedding of the $i^{th}$ topic
  • $W_{\text{Pr}}$: the reports of the top-$N_K$ retrieved images are returned and encoded as $W_{\text{Pr}}=\{R_1,R_2,...,R_{N_K}\}\in\mathbb{R}^{N_K\times d}$

    • the report embedding module is a BERT encoder followed by a max-pooling layer over all output vectors; it yields the embedding $R_i\in\mathbb{R}^d$ of the $i^{th}$ retrieved report
    • prior working experience: an image embedding is extracted from the last average pooling layer of ResNet-152 for every image in the corpus; given an input image, the 100 corpus images with the highest cosine similarity to it are retrieved, and their reports are encoded with BERT followed by a max-pooling layer to obtain the working experience
  • $G_{\text{Pr}}$:

    1. build a universal graph $G_{\text{Uni}}=(V,E)$: models the domain-specific prior knowledge structure
    2. compose a graph that covers the most common abnormalities or findings
    3. connect nodes with bidirectional edges
      • nodes $V$: the $N_T$ common topics in $T$
    4. acquire a set of node embeddings $V'=\{v_1',v_2',...,v_{N_T}'\}\in \mathbb{R}^{N_T\times d}$ encoded by a graph embedding module
      • based on the graph convolution operation
    • prior medical knowledge: build a medical graph whose nodes are the topics in the topic bag, grouped by the organ or body part they relate to; topics in the same group are connected by edges, and a graph convolutional network extracts the prior medical knowledge
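
The retrieval step behind $W_{\text{Pr}}$ can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: random vectors stand in for the ResNet-152 image embeddings and for the BERT token outputs (assumed here to be already of size $d$), and the helper names `retrieve_top_k` and `max_pool_report` are hypothetical.

```python
import numpy as np

def retrieve_top_k(query_emb, corpus_embs, k):
    """Return indices of the k corpus images most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to every corpus image
    return np.argsort(-sims)[:k]      # highest similarity first

def max_pool_report(token_vectors):
    """Max-pool over all token output vectors to get one report embedding."""
    return token_vectors.max(axis=0)

rng = np.random.default_rng(0)
# Stand-in for ResNet-152 pooled image embeddings of a 500-image corpus.
corpus_embs = rng.normal(size=(500, 2048))
# Query that is a slightly perturbed copy of corpus image 42.
query_emb = corpus_embs[42] + 0.01 * rng.normal(size=2048)
top_idx = retrieve_top_k(query_emb, corpus_embs, k=100)

d = 512
# Stand-in for BERT outputs: one (num_tokens, d) array per retrieved report.
reports = [rng.normal(size=(rng.integers(5, 30), d)) for _ in top_idx]
W_pr = np.stack([max_pool_report(r) for r in reports])   # shape (N_K, d)
```

Normalizing both sides first turns the dot product into cosine similarity, so a single matrix-vector product ranks the whole corpus.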

Basic Module

Multi-Head Attention (MHA)

The MHA consists of n parallel heads and each head is defined as a scaled dot-product attention:
$$
\begin{aligned}
\text{Att}_i(X,Y)&=\text{softmax}\left(\frac{X\text{W}_i^\text{Q}(Y\text{W}_i^\text{K})^T}{\sqrt{d_n}}\right)Y\text{W}_i^\text{V} \\
\text{MHA}(X,Y)&=[\text{Att}_1(X,Y);...;\text{Att}_n(X,Y)]\text{W}^{\text{O}}
\end{aligned}
$$


  • $X\in\mathbb{R}^{l_x \times d}$: the Query matrix

  • $Y\in\mathbb{R}^{l_y \times d}$: the Key/Value matrix

  • $\text{W}_i^\text{Q},\text{W}_i^\text{K},\text{W}_i^\text{V}\in\mathbb{R}^{d\times d_n}$, $\text{W}^\text{O}\in \mathbb{R}^{d\times d}$: learnable parameters

    • $d_n=d/n$

    • $[\cdot;\cdot]$: concatenation operation
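
As a rough illustration of the MHA definition above, here is a minimal NumPy sketch. The head count `n = 8`, the topic count of 20, and the random initialization are assumptions made for the example, not values taken from these notes.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mha(X, Y, Wq, Wk, Wv, Wo):
    """X: (l_x, d) queries; Y: (l_y, d) keys/values.
    Wq, Wk, Wv: (n, d, d_n) per-head projections; Wo: (d, d)."""
    n, d, d_n = Wq.shape
    heads = []
    for i in range(n):
        scores = (X @ Wq[i]) @ (Y @ Wk[i]).T / np.sqrt(d_n)   # (l_x, l_y)
        heads.append(softmax(scores) @ (Y @ Wv[i]))            # (l_x, d_n)
    # Concatenate the n heads back to width d, then apply the output projection.
    return np.concatenate(heads, axis=-1) @ Wo                 # (l_x, d)

rng = np.random.default_rng(0)
d, n = 512, 8                    # n = 8 heads is an assumption for this sketch
d_n = d // n
Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(n, d, d_n)) for _ in range(3))
Wo = rng.normal(scale=d ** -0.5, size=(d, d))
I = rng.normal(size=(49, d))     # 49 image region features
T = rng.normal(size=(20, d))     # 20 topic embeddings (count chosen arbitrarily)
out = mha(I, T, Wq, Wk, Wv, Wo)  # one output row per query row
```

The output always has as many rows as the query matrix, which is why attending from image regions to topics yields one attended vector per region.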


Feed-Forward Network (FFN)

$$
\text{FFN}(x)=\text{max}(0,x\text{W}_\text{f}+\text{b}_\text{f})\text{W}_\text{ff}+\text{b}_\text{ff}
$$

  • $\text{max}(0,*)$: ReLU activation function
  • $\text{W}_\text{f} \in \mathbb{R}^{d\times4d}$ & $\text{W}_\text{ff} \in \mathbb{R}^{4d\times d}$: learnable matrices for linear transformation
  • $\text{b}_\text{f}$ & $\text{b}_\text{ff}$: bias terms
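
The feed-forward network is small enough to write out directly. The sketch below follows the formula above; the toy size `d = 8` and the random weights are assumptions for illustration only.

```python
import numpy as np

def ffn(x, W_f, b_f, W_ff, b_ff):
    hidden = np.maximum(0.0, x @ W_f + b_f)   # ReLU, shape (..., 4d)
    return hidden @ W_ff + b_ff               # projected back to shape (..., d)

rng = np.random.default_rng(0)
d = 8                                          # toy size for the example
W_f, b_f = rng.normal(size=(d, 4 * d)), np.zeros(4 * d)
W_ff, b_ff = rng.normal(size=(4 * d, d)), np.zeros(d)
x = rng.normal(size=(3, d))                    # three feature vectors
y = ffn(x, W_f, b_f, W_ff, b_ff)
```

The network is applied to each row independently, so it changes the features of a position without mixing positions.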


  • MHA computes the association weights between different features

    • allows probabilistic many-to-many relations

apply MHA to correlate the posterior and prior knowledge for the input radiology image, and to distill useful knowledge for generating accurate reports

Posterior Knowledge Explorer (PoKE)

extract the posterior knowledge (abnormal regions) from the input image
$$
\begin{aligned}
\hat{T}&=\text{FFN}(\text{MHA}(I,T)); \\
\hat{I}&=\text{FFN}(\text{MHA}(\hat{T},I));
\end{aligned}
$$

the image features $I\in\mathbb{R}^{N_\text{I}\times d}$ are first used to find the most relevant topics and filter out the irrelevant topics, resulting in $\hat{T}\in\mathbb{R}^{N_\text{I}\times d}$. Then the attended topics $\hat{T}$ are further used to mine topic-related image features $\hat{I}\in\mathbb{R}^{N_\text{I}\times d}$


align the attended abnormal regions with the relevant topics

  • need to filter out the irrelevant topics


since $\hat{I}$ and $\hat{T}$ are aligned, we directly add them up to acquire the posterior knowledge of the input image:
$$
I'=\text{LayerNorm}(\hat{I}+\hat{T})
$$

  • $\text{LayerNorm}$: Layer Normalization
  • $I'$: the first impression of radiologists after checking the abnormal regions
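
The PoKE data flow can be traced with a simplified NumPy sketch. To keep it short, a single-head attention without learned projections (`attend`) stands in for MHA, the FFN omits biases, and LayerNorm has no learned gain or bias; all of these simplifications, the toy sizes, and the function names are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(X, Y):
    """Single-head, projection-free stand-in for MHA(X, Y)."""
    return softmax(X @ Y.T / np.sqrt(X.shape[-1])) @ Y

def make_ffn(rng, d):
    W_f = rng.normal(scale=d ** -0.5, size=(d, 4 * d))
    W_ff = rng.normal(scale=(4 * d) ** -0.5, size=(4 * d, d))
    return lambda x: np.maximum(0.0, x @ W_f) @ W_ff   # biases omitted for brevity

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)    # no learned gain/bias in this sketch

def poke(I, T, ffn_t, ffn_i):
    T_hat = ffn_t(attend(I, T))        # topics attended by each image region, (N_I, d)
    I_hat = ffn_i(attend(T_hat, I))    # topic-related image features, (N_I, d)
    return layer_norm(I_hat + T_hat)   # posterior knowledge I'

rng = np.random.default_rng(0)
d = 64                                 # toy feature size
I = rng.normal(size=(49, d))           # image region features
T = rng.normal(size=(20, d))           # topic embeddings (count chosen arbitrarily)
I_prime = poke(I, T, make_ffn(rng, d), make_ffn(rng, d))
```

Because the image features act as the query in the first step, the attended topics have one row per image region, which is what allows the row-wise sum with the attended image features.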

Prior Knowledge Explorer (PrKE)

PrKE consists of a Prior Working Experience component and a Prior Medical Knowledge component

  • both obtain prior knowledge from the existing radiology report corpus and represent it as $W_{\text{Pr}}$ & $G_{\text{Pr}}$
  • $W'_{\text{Pr}}$ & $G'_{\text{Pr}}$: the prior working experience and the prior medical knowledge, respectively, that relate to the abnormal regions of the input image
  • $I'\in\mathbb{R}^{N_\text{I} \times d}$: Query of both attention operations
  • $W_{\text{Pr}} \in\mathbb{R}^{N_\text{K} \times d}$: Key/Value when computing $W'_{\text{Pr}}$
  • $G_{\text{Pr}} \in\mathbb{R}^{N_\text{T} \times d}$: Key/Value when computing $G'_{\text{Pr}}$

$$
\begin{aligned}
W'_{\text{Pr}}&=\text{FFN}(\text{MHA}(I',W_{\text{Pr}})) \\
G'_{\text{Pr}}&=\text{FFN}(\text{MHA}(I',G_{\text{Pr}}))
\end{aligned}
$$

  • $W'_{\text{Pr}} \in\mathbb{R}^{N_\text{I} \times d}$ & $G'_{\text{Pr}} \in\mathbb{R}^{N_\text{I} \times d}$: a set of attended prior knowledge related to the abnormalities of the input image

    • have potential to alleviate the textual data bias


Multi-domain Knowledge Distiller (MKD)

acts as a decoder that generates the final radiology report
at each decoding step $t$, takes the embedding of the current input word $x_t=w_t+e_t$ as input:

  • $w_t$: word embedding
  • $e_t$: fixed position embedding

$$
h_t = \text{MHA}(x_t,x_{1:t})
$$

Then employ the proposed Adaptive Distilling Attention (ADA) to distill the useful and correlated knowledge:
$$
h_t'=\text{ADA}(h_t,I',G'_{\text{Pr}},W'_{\text{Pr}})
$$
Finally, $h_t'$ is passed to an FFN and a linear layer to predict the next word:
$$
y_t\sim p_t=\text{softmax}(\text{FFN}(h'_t)\text{W}_p+\text{b}_p)
$$

  • $\text{W}_p$ & $\text{b}_p$: learnable parameters

train the PPKED by minimizing the cross-entropy loss:
$$
L_{\text{CE}}(\theta)=-\sum_{i=1}^{N_R}\text{log}(p_\theta(y_i^*|y_{1:i-1}^*))
$$

  • $R^*=\{y_1^*,y_2^*,...,y_{N_R}^*\}$: ground truth report
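
A small NumPy sketch of this loss follows. It assumes the per-step logits were computed with the ground-truth prefix as decoder input (teacher forcing); the toy vocabulary of 5 words and the function names are made up for the example.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(logits, targets):
    """logits: (N_R, vocab) scores at each step; targets: (N_R,) ground-truth word ids."""
    log_p = log_softmax(logits)
    # Pick the log-probability of the ground-truth word at every step and sum.
    return -log_p[np.arange(len(targets)), targets].sum()

vocab = 5
targets = np.array([1, 3, 0])                  # a 3-word "report"
# Uniform predictions: every word equally likely at every step.
uniform_loss = cross_entropy(np.zeros((3, vocab)), targets)
# Confident, correct predictions: large logit on the ground-truth word.
confident = np.full((3, vocab), -20.0)
confident[np.arange(3), targets] = 20.0
confident_loss = cross_entropy(confident, targets)
```

With uniform predictions the loss equals $N_R \log(\text{vocab})$, and it approaches zero as the model puts all probability on the ground-truth words.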

Adaptive Distilling Attention (ADA)

makes the model adaptively learn to distill correlated knowledge:
$$
\begin{aligned}
\text{ADA}(h_t,I',G'_{\text{Pr}},W'_{\text{Pr}})&=\text{MHA}(h_t,I'+\lambda_1\odot G'_{\text{Pr}}+\lambda_2\odot W'_{\text{Pr}}) \\
\lambda_1,\lambda_2 &= \sigma(h_t\text{W}_h\oplus(I'\text{W}_I+G'_{\text{Pr}}\text{W}_G+W'_{\text{Pr}}\text{W}_W))
\end{aligned}
$$

  • $\text{W}_h,\text{W}_I,\text{W}_G,\text{W}_W\in\mathbb{R}^{d\times 2}$: learnable parameters
  • $\odot$: element-wise multiplication (Hadamard product)
  • $\sigma$: sigmoid function
  • $\oplus$: matrix-vector addition
  • $\lambda_1,\lambda_2\in [0,1]$: weight the expected importance of $G'_{\text{Pr}}$ & $W'_{\text{Pr}}$ for each target word
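
One way to read the shapes in the ADA formula is sketched below in NumPy. It assumes $h_t$ is a single row vector, so that the matrix-vector addition yields one pair of gates per image region and each gate is broadcast over the feature dimension; this reading, the projection-free `attend` stand-in for MHA, and the argument names are assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(X, Y):
    """Single-head, projection-free stand-in for MHA(X, Y)."""
    return softmax(X @ Y.T / np.sqrt(X.shape[-1])) @ Y

def ada(h_t, I_post, G_prior, W_prior, W_h, W_I, W_G, W_W):
    """h_t: (1, d); I_post, G_prior, W_prior: (N_I, d); W_h, W_I, W_G, W_W: (d, 2)."""
    # (1, 2) broadcast-added to (N_I, 2): one pair of gates per region.
    gates = sigmoid(h_t @ W_h + (I_post @ W_I + G_prior @ W_G + W_prior @ W_W))
    lam1, lam2 = gates[:, :1], gates[:, 1:]               # each (N_I, 1)
    memory = I_post + lam1 * G_prior + lam2 * W_prior     # gates broadcast over d
    return attend(h_t, memory), lam1, lam2

rng = np.random.default_rng(0)
d, N_I = 64, 49                                           # toy sizes
h_t = rng.normal(size=(1, d))
I_post, G_prior, W_prior = (rng.normal(size=(N_I, d)) for _ in range(3))
W_h, W_I, W_G, W_W = (rng.normal(scale=d ** -0.5, size=(d, 2)) for _ in range(4))
h_t_new, lam1, lam2 = ada(h_t, I_post, G_prior, W_prior, W_h, W_I, W_G, W_W)
```

Since the gates depend on $h_t$, the mix of medical knowledge and working experience changes from one generated word to the next.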


datasets: IU-Xray and MIMIC-CXR

Quantitative Analysis

Posterior Knowledge Explorer

PoKE can better recognize abnormalities

Prior Knowledge Explorer

  • PrMK: helps the model learn enriched medical knowledge of the most common abnormalities or findings
  • PrWE: verifies the effectiveness of introducing existing similar reports

Multi-domain Knowledge Distiller

based on the Transformer Decoder equipped with the proposed Adaptive Distilling Attention

Qualitative Analysis

the qualitative results support the authors' arguments and verify the effectiveness of the proposed approach in alleviating the data bias problem by exploring and distilling posterior and prior knowledge


  • generates meaningful and robust radiology reports supported by accurate abnormal descriptions and regions
  • outperforms previous state-of-the-art models on the 2 public datasets

Paper notes on: Exploring and Distilling Posterior and Prior Knowledge for Radiology Report ... (CVPR 2021)
