Transformer Meets Tracker (TrDiMP)

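The transformer structure is applied to the classification branch.

First frame: self.init_classifier()

  • self.transformer_label: (15,1,22,22). Gaussian labels built from the target's position on the feature map.

  • self.transformer_memory: (7260,1,512), where 7260 = 15×22×22 and 512 is the feature dimension. It is obtained by transformer encode; the steps are shown below, after a few notes: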

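        1. There is no positional embedding.
        2. Normalization divides along the third dimension by that dimension's L2 norm, i.e. each token is normalized on its own.
        3. Query and Key share W_k.
        4. There is no projection matrix W_v.
        5. There is no multi-head attention: n_head=1.
        6. The Transformer Encode structure is applied only once and has no FFN.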
    

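Transformer encode flow (several reshapes are involved, and the reshape order is critical):

        1. Input feature: (15,512,22,22)
        2. Reshape to tokens: (15×22×22 tokens, 1 batch, 512 dim) = (7260,1,512)
        3. Query, Key, Value: each (7260,1,512)
        4. Query -> W_k -> normalization -> reshape: (1,7260,128)
        5. Key -> W_k -> normalization -> reshape: (1,128,7260)
        6. Query × Key: (1,7260,7260); softmax gives the attention matrix: (1,7260,7260)
        7. Attention matrix × Value: (1,7260,512), reshaped to (7260,1,512)
        8. Add (residual with the input tokens): (7260,1,512)
        9. Ins.Norm, which outputs the template feature of transformer encode, self.transformer_memory: (7260,1,512)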
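
The encode steps above can be sketched in NumPy as follows. This is a minimal sketch, not the TrDiMP code: the sizes are shrunk, `w_k` is a random stand-in for the learned W_k, and `ins_norm` is assumed to be a plain per-image, per-channel standardization over spatial positions (the notes only name the step Ins.Norm). All function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Divide by the L2 norm of the last axis: each token is normalized on its own.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(query, key, value, w_k):
    # query: (Lq, B, C); key, value: (Lk, B, C); w_k: (C, Ck).
    # Query and key share w_k, the value is used as-is, single head.
    q = l2_normalize(query @ w_k).transpose(1, 0, 2)  # (B, Lq, Ck)
    k = l2_normalize(key @ w_k).transpose(1, 2, 0)    # (B, Ck, Lk)
    attn = softmax(q @ k)                             # (B, Lq, Lk)
    out = attn @ value.transpose(1, 0, 2)             # (B, Lq, C)
    return out.transpose(1, 0, 2)                     # (Lq, B, C)

def ins_norm(tokens, n_img, eps=1e-5):
    # Assumed form of Ins.Norm: standardize each channel of each image over its positions.
    length, batch, dim = tokens.shape
    x = tokens.reshape(n_img, -1, batch, dim)
    x = (x - x.mean(axis=1, keepdims=True)) / (x.std(axis=1, keepdims=True) + eps)
    return x.reshape(length, batch, dim)

def encode(feat, w_k):
    n, c, h, w = feat.shape
    # Channels must move last before flattening; otherwise tokens mix positions and channels.
    tokens = feat.transpose(0, 2, 3, 1).reshape(n * h * w, 1, c)
    out = attention(tokens, tokens, tokens, w_k)
    return ins_norm(tokens + out, n)  # residual add, then Ins.Norm; no FFN

rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 8, 4, 4))   # stands in for (15, 512, 22, 22)
w_k = rng.standard_normal((8, 4))          # stands in for W_k (512 -> 128)
memory = encode(feat, w_k)
print(memory.shape)  # (48, 1, 8), i.e. (3*4*4, 1, 8)
```

The transpose before the reshape is the part that is easy to get wrong: flattening (N, C, H, W) directly would produce rows that are not per-position feature vectors.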
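  • The input feature (15,512,22,22) is traversed along its first dimension (the samples). Each sample goes through transformer decode separately, and the decoded features (1,512,22,22) are concatenated back into (15,512,22,22) to form the final sample features, which are then fed to the classification model in DiMP. The transformer decode steps for each sample are:

      1. The self-attention parameters are shared with encode.
      2. cross-attention1 (query = tgt (484,1,512), key = memory (7260,1,512), value = pos (7260,1,512)): pos is obtained by replicating self.transformer_label (15,1,22,22) 512 times, so the same mask is applied to every feature channel; **the mask plays a role similar to a positional embedding**.
      3. cross-attention2: here the input value is memory*pos.
      4. In every attention, Query and Key share W_k, there is no projection matrix W_v, and n_head=1.
      5. The Transformer Decode structure is applied only once and has no FFN.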
    

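Transformer decode flow for one sample (several reshapes are involved, and the reshape order is critical):

        1. Input feature: (1,512,22,22)
        2. Reshape to tokens: (1×22×22 tokens, 1 batch, 512 dim) = (484,1,512)
        3. Query, Key, Value: each (484,1,512)
        4. Query, Key and Value go through the same process as encode, with shared parameters. Output feature: (484,1,512)
        5. cross-attention1 (in essence also self-attention) gives the mask: (484,1,512)
        6. Output feature × mask -> Ins.Norm: (484,1,512)
        7. cross-attention2 (in essence also self-attention) gives: (484,1,512)
        8. Add -> Ins.Norm: (484,1,512)
        9. Reshape, which outputs the template feature of transformer decode: (1,512,22,22)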
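
The decode steps above, including the per-sample loop and the label-based mask, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the TrDiMP implementation: sizes are shrunk, `memory` is a random stand-in for self.transformer_memory, `ins_norm` is assumed to standardize each channel over positions, and the way the masked branch and the added branch are finally merged (sum, then Ins.Norm) is an assumption, since the notes do not spell it out. All function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(query, key, value, w_k):
    # Shared w_k for query and key, no W_v, single head (same unit as in encode).
    q = l2_normalize(query @ w_k).transpose(1, 0, 2)  # (B, Lq, Ck)
    k = l2_normalize(key @ w_k).transpose(1, 2, 0)    # (B, Ck, Lk)
    return (softmax(q @ k) @ value.transpose(1, 0, 2)).transpose(1, 0, 2)

def ins_norm(tokens, eps=1e-5):
    # Assumed form of Ins.Norm for one image: standardize each channel over positions.
    return (tokens - tokens.mean(axis=0, keepdims=True)) / (tokens.std(axis=0, keepdims=True) + eps)

def to_tokens(x):
    # (N, C, H, W) -> (N*H*W, 1, C); channels go last before flattening.
    n, c, h, w = x.shape
    return x.transpose(0, 2, 3, 1).reshape(n * h * w, 1, c)

def gaussian_label(h, w, center, sigma):
    # Gaussian bump centered on the target position (row, col) of the feature map.
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))

def decode(sample, memory, pos, w_k):
    # sample: (1, C, H, W); memory, pos: (Lm, 1, C)
    _, c, h, w = sample.shape
    tgt = to_tokens(sample)
    tgt = ins_norm(tgt + attention(tgt, tgt, tgt, w_k))  # self-attention, shared weights
    mask = attention(tgt, memory, pos, w_k)              # cross-attention1: value = pos
    masked = ins_norm(tgt * mask)                        # feature * mask -> Ins.Norm
    agg = attention(tgt, memory, memory * pos, w_k)      # cross-attention2: value = memory*pos
    added = ins_norm(tgt + agg)                          # add -> Ins.Norm
    out = ins_norm(masked + added)                       # assumed merge of the two branches
    return out.reshape(1, h, w, c).transpose(0, 3, 1, 2)

rng = np.random.default_rng(0)
n, c, h, w = 3, 8, 4, 4                                  # stands in for 15, 512, 22, 22
samples = rng.standard_normal((n, c, h, w))
w_k = rng.standard_normal((c, 4))
memory = rng.standard_normal((n * h * w, 1, c))          # stand-in for self.transformer_memory
label = np.stack([gaussian_label(h, w, (1.5, 1.5), 1.0)] * n)[:, None]  # (n, 1, h, w)
pos = to_tokens(np.repeat(label, c, axis=1))             # label copied to all c channels
decoded = np.concatenate([decode(samples[i:i + 1], memory, pos, w_k) for i in range(n)])
print(decoded.shape)  # (3, 8, 4, 4)
```

Because the attention weights of each query sum to 1 and the label values lie in (0, 1], the mask from cross-attention1 also stays within [0, 1], so it acts as a soft spatial weighting of the sample feature.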

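Subsequent frames: self.init_classifier()

Two main things happen:

  • Based on the tracking result, decide whether to update self.transformer_memory and pos, which are then used to update the filter W.
  • Feed the features of the subsequent frame into transformer decode to obtain text_x, then convolve it with W to obtain the score map.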
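
The last step, convolving the decoded feature with W to get a score map, can be illustrated with a small NumPy sketch. The filter size (3×3), the zero padding, and the function name `score_map` are assumptions for illustration only; the notes do not state the shape of W.

```python
import numpy as np

def score_map(feat, filt):
    # feat: (C, H, W) decoded feature of one frame; filt: (C, kh, kw) filter W.
    # Cross-correlation summed over channels, with zero padding of k // 2 per side.
    c, h, w = feat.shape
    _, kh, kw = filt.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(feat, ((0, 0), (ph, ph), (pw, pw)))
    oh, ow = h + 2 * ph - kh + 1, w + 2 * pw - kw + 1
    scores = np.zeros((oh, ow))
    for dy in range(kh):
        for dx in range(kw):
            # Contribution of filter tap (dy, dx), summed over all channels.
            window = padded[:, dy:dy + oh, dx:dx + ow]
            scores += np.einsum("chw,c->hw", window, filt[:, dy, dx])
    return scores

rng = np.random.default_rng(0)
test_feat = rng.standard_normal((8, 6, 6))   # decoded feature (C, H, W), shrunk sizes
filt = rng.standard_normal((8, 3, 3))        # hypothetical filter W
scores = score_map(test_feat, filt)
peak = np.unravel_index(scores.argmax(), scores.shape)  # most likely target position
print(scores.shape)  # (6, 6)
```

The peak of the score map is taken as the target location on the feature grid; mapping it back to image coordinates is outside this sketch.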

Transformer Tracker
