Reference code: LaneATT

1. Overview

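Introduction: This paper proposes an anchor-based lane detection algorithm whose design derives from Line-CNN. On top of Line-CNN it adds a global attention operation (applied to the "RoI pooling" features), so that the RoI feature pooled for each individual anchor can also perceive features from the whole image, and this global information is used to localize lanes more accurately. The paper also proposes filtering the predefined anchor set using statistics gathered on the training set. This reduces the number of proposals and therefore further reduces the final amount of computation, so the method keeps its accuracy while running fast. Note, however, that the regression targets are points on the lanes and the method does not explicitly distinguish lane instances; determining the concrete lane instances requires something like a heuristic algorithm.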

The method is largely derived from Line-CNN:
Line-CNN: End-to-End Traffic Line Detection With Line Proposal Unit

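Line-CNN frames lane detection as generating RoI proposals from three directions: left, right, and bottom. Each proposal is a set of points along a line at a given angle, and regression is then performed on these point sets, so in principle this formulation can also predict curves with a certain amount of bending, as the examples shown in that paper confirm. The network structure of Line-CNN is shown in the figure below: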

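It can be viewed as a "two-stage" detector: it first generates RoI proposals from the left, right, and bottom directions, then extracts these proposals for classification and lane regression. The following compares a traditional two-stage detector with the network proposed in that paper: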

2. Method Design

2.1 Network Architecture

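The algorithm improves on Line-CNN. The main difference is the global attention operation, which gives the RoI proposal features awareness of the whole image. The network structure is shown in the figure below: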

2.2 Anchor Mechanism

2.2.1 Determining the Number of Anchors

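The backbone produces a feature map $F \in \mathbb{R}^{C_F \times H_F \times W_F}$. Anchors are placed along the left, right, and bottom borders: the left and right borders each use 6 angles and 72 starting points along the y-axis, while the bottom border uses 15 angles and 128 starting points along the x-axis (the network input is $360 \times 640$, so the x-axis naturally gets more splits). The total number of proposals over the three borders is therefore:
$$N = 2 \times 6 \times 72 + 15 \times 128 = 2784$$
So many anchors naturally bring a large computational cost. The paper's solution is data-driven: count how often each anchor is used (matched) on the training set and keep the top-k most frequent ones. The size of the resulting anchor set is denoted $N_{anc}$.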
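The anchor count and the frequency-based filtering can be sketched as follows. This is a minimal illustration, not the LaneATT code: `count_anchors` and `select_topk_anchors` are hypothetical names, and the per-anchor match counts are assumed to have been collected beforehand on the training set.

```python
def count_anchors(n_side_angles=6, n_side_origins=72, n_bottom_angles=15, n_bottom_origins=128):
    # left and right borders share the same layout; the bottom border has its own
    return 2 * n_side_angles * n_side_origins + n_bottom_angles * n_bottom_origins


def select_topk_anchors(match_counts, k):
    # match_counts[i]: hypothetical number of times anchor i was matched to a
    # ground-truth lane on the training set; ties go to the lower index
    order = sorted(range(len(match_counts)), key=lambda i: (-match_counts[i], i))
    return order[:k]  # indices of the N_anc = k most frequently used anchors


print(count_anchors())  # 2784
```

The kept indices are returned in descending frequency order, which mirrors sorting the counts and slicing the first k entries.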

2.2.2 Anchor Types and Generation

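Comparing the paper with the actual code shows that the code defines two anchor sets. They differ in the number of sample points: one uses $N_{pts}$ (self.anchors in the code) and the other uses $H_F$ (self.anchors_cut). The two sets serve different purposes at different stages.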

The final detections are generated from self.anchors, which has $N_{pts}$ sample points. Their y-coordinates are $Y=\{y_i\}_{i=0}^{N_{pts}-1}$, with $y_i = i \cdot \frac{H_I}{N_{pts}-1}$, where $H_I$ is the input image height. The corresponding x-coordinates are $X=\{x_i\}_{i=0}^{N_{pts}-1}$, and relating $x_i$, $y_i$, and the angle $\theta$ gives the following expression (note: this expression is inconsistent with the computation in the code):
$$x_i = \left\lfloor \frac{1}{\tan\theta}\left(y_i - \frac{y_{orig}}{\delta_{back}}\right) + \frac{x_{orig}}{\delta_{back}} \right\rfloor$$
Here $O = (x_{orig}, y_{orig})$ is the starting point of the anchor and $\delta_{back}$ is the stride of the backbone features. The expression shows that an anchor is determined by two key attributes: its origin $O$ and its angle $\theta$.
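The expression above can be evaluated directly. A minimal sketch, assuming `ys` are given in feature-map rows (which is what dividing the origin by $\delta_{back}$ implies) and that the angle is in degrees; `anchor_xs` is a hypothetical helper, not part of the LaneATT code.

```python
import math


def anchor_xs(x_orig, y_orig, theta_deg, ys, stride):
    # x_i = floor((y_i - y_orig / stride) / tan(theta) + x_orig / stride)
    theta = math.radians(theta_deg)
    return [math.floor((y - y_orig / stride) / math.tan(theta) + x_orig / stride) for y in ys]


# origin at pixel (336, 352), stride 32, angle 60 degrees, two feature rows
print(anchor_xs(336, 352, 60, [11, 5], 32))
```

At row 11 the anchor sits at its origin column (10.5, floored to 10); five rows higher it has moved about 3.5 columns toward smaller x, as expected for $\tan 60^\circ \approx 1.73$.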

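self.anchors_cut is used to build the RoI proposal features. It is generated the same way as the anchors described above; only the number of sample points differs ($H_F$ instead of $N_{pts}$).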

Both anchor sets are generated by the following code:

# lib/models/laneatt.py#L241
# (methods of the LaneATT model class; they rely on these imports)
import math

import numpy as np
import torch


def generate_anchors(self, lateral_n, bottom_n):
    left_anchors, left_cut = self.generate_side_anchors(self.left_angles, x=0., nb_origins=lateral_n)
    right_anchors, right_cut = self.generate_side_anchors(self.right_angles, x=1., nb_origins=lateral_n)
    bottom_anchors, bottom_cut = self.generate_side_anchors(self.bottom_angles, y=1., nb_origins=bottom_n)

    return torch.cat([left_anchors, bottom_anchors, right_anchors]), torch.cat([left_cut, bottom_cut, right_cut])


def generate_side_anchors(self, angles, nb_origins, x=None, y=None):
    # split the border given by the fixed x or y into nb_origins starting points
    if x is None and y is not None:
        starts = [(x, y) for x in np.linspace(1., 0., num=nb_origins)]
    elif x is not None and y is None:
        starts = [(x, y) for y in np.linspace(1., 0., num=nb_origins)]
    else:
        raise Exception('Please define exactly one of `x` or `y` (not neither nor both)')

    n_anchors = nb_origins * len(angles)  # every starting point is paired with every angle of this border

    # each row, first for x and second for y:
    # 2 scores, 1 start_y, start_x, 1 length, S coordinates, score[0] = negative prob, score[1] = positive prob
    anchors = torch.zeros((n_anchors, 2 + 2 + 1 + self.n_offsets))
    anchors_cut = torch.zeros((n_anchors, 2 + 2 + 1 + self.fmap_h))
    for i, start in enumerate(starts):
        for j, angle in enumerate(angles):
            k = i * len(angles) + j
            anchors[k] = self.generate_anchor(start, angle)  # build the anchor points starting from this origin
            anchors_cut[k] = self.generate_anchor(start, angle, cut=True)

    return anchors, anchors_cut


def generate_anchor(self, start, angle, cut=False):
    if cut:
        anchor_ys = self.anchor_cut_ys  # H_F sample rows, used for feature pooling
        anchor = torch.zeros(2 + 2 + 1 + self.fmap_h)
    else:
        anchor_ys = self.anchor_ys  # N_pts sample rows, used for the final lane output
        anchor = torch.zeros(2 + 2 + 1 + self.n_offsets)
    angle = angle * math.pi / 180.  # degrees to radians
    start_x, start_y = start
    anchor[2] = 1 - start_y
    anchor[3] = start_x
    anchor[5:] = (start_x + (1 - anchor_ys - 1 + start_y) / math.tan(angle)) * self.img_w

    return anchor
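One way to see the mismatch the author notes is to port the last line of generate_anchor to plain Python. In the code, start and anchor_ys are normalized to [0, 1] (anchor_ys runs from 1 at the bottom row to 0 at the top), `1 - anchor_ys - 1 + start_y` simplifies to `start_y - anchor_ys`, and the normalized vertical difference is multiplied by img_w rather than by the image height, so the pixel-space slope equals $\theta$ only when the image is square. This is a hedged reading of the snippet; `code_style_anchor_xs` is a hypothetical name.

```python
import math


def code_style_anchor_xs(start, angle_deg, anchor_ys, img_w):
    # mirrors: anchor[5:] = (start_x + (start_y - anchor_ys) / tan(angle)) * img_w
    start_x, start_y = start
    t = math.tan(math.radians(angle_deg))
    return [(start_x + (start_y - y) / t) * img_w for y in anchor_ys]


# bottom-left corner, 45 degrees: x grows as y goes up (y decreases)
print(code_style_anchor_xs((0.0, 1.0), 45, [1.0, 0.5], 640))
```

Angles below 90 degrees (left border) move right as the anchor goes up, while angles above 90 degrees (right border) have a negative tangent and move left.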

2.3 Generating RoI Proposals

The detector also uses an operation similar to RoI Pooling, but implemented by indexing: the previously generated self.anchors_cut is used to index the feature map along its $(C, H, W)$ dimensions. See the function below:

# lib/models/laneatt.py#L208
def compute_anchor_cut_indices(self, n_fmaps, fmaps_w, fmaps_h):
    # definitions
    n_proposals = len(self.anchors_cut)  # number of anchors in the cut set

    # indexing
    # feature-map column of every anchor at every row; flipped so rows run top to bottom
    unclamped_xs = torch.flip((self.anchors_cut[:, 5:] / self.stride).round().long(), dims=(1,))
    unclamped_xs = unclamped_xs.unsqueeze(2)
    unclamped_xs = torch.repeat_interleave(unclamped_xs, n_fmaps, dim=0).reshape(-1, 1)
    cut_xs = torch.clamp(unclamped_xs, 0, fmaps_w - 1)  # clamped so indexing never goes out of bounds
    unclamped_xs = unclamped_xs.reshape(n_proposals, n_fmaps, fmaps_h, 1)
    # positions outside the feature map are zeroed later; note that `>` (not `>=`) lets
    # x == fmaps_w through, where it reads the clamped last column instead of being masked
    invalid_mask = (unclamped_xs < 0) | (unclamped_xs > fmaps_w)
    cut_ys = torch.arange(0, fmaps_h)
    cut_ys = cut_ys.repeat(n_fmaps * n_proposals)[:, None].reshape(n_proposals, n_fmaps, fmaps_h)
    cut_ys = cut_ys.reshape(-1, 1)
    cut_zs = torch.arange(n_fmaps).repeat_interleave(fmaps_h).repeat(n_proposals)[:, None]  # channel indices

    return cut_zs, cut_ys, cut_xs, invalid_mask

Once the indices for each anchor have been produced, the forward pass computes the backbone features and then the RoI-pooled features:

# lib/models/laneatt.py#L72
batch_features = self.conv1(batch_features)  # [B, 64, 20, 11]
batch_anchor_features = self.cut_anchor_features(batch_features)  # [B, n_proposals, 64, 11, 1]

The feature-cutting function used here is:

# lib/models/laneatt.py#L226
def cut_anchor_features(self, features):
    # definitions
    batch_size = features.shape[0]
    n_proposals = len(self.anchors)
    n_fmaps = features.shape[1]
    batch_anchor_features = torch.zeros((batch_size, n_proposals, n_fmaps, self.fmap_h, 1), device=features.device)

    # actual cutting: for every proposal, gather one feature vector per feature-map row
    for batch_idx, img_features in enumerate(features):
        rois = img_features[self.cut_zs, self.cut_ys, self.cut_xs].view(n_proposals, n_fmaps, self.fmap_h, 1)
        rois[self.invalid_mask] = 0  # zero out positions that fall outside the feature map
        batch_anchor_features[batch_idx] = rois

    return batch_anchor_features
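The indexing-based pooling above boils down to picking, for each feature-map row, the single column the anchor passes through and zeroing anchors that leave the map. A minimal pure-Python sketch for one anchor, assuming the feature map is a nested list `[C][H][W]` and the anchor's columns are already rounded and ordered top to bottom; `gather_anchor_features` is a hypothetical name, and unlike the snippet above it masks every column outside `0..W-1`.

```python
def gather_anchor_features(fmap, anchor_cols):
    # fmap: nested list [C][H][W]; anchor_cols: one column index per row (length H)
    n_channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for c in range(n_channels):
        row_feats = []
        for y in range(height):
            x = anchor_cols[y]
            # inside the map: take that cell; outside: contribute zero
            row_feats.append(fmap[c][y][x] if 0 <= x < width else 0.0)
        out.append(row_feats)
    return out  # nested list [C][H], i.e. one C x H_F feature per anchor


fmap = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]  # C=1, H=2, W=3
print(gather_anchor_features(fmap, [2, 5]))  # second row is off the map
```

Because each anchor reads exactly one cell per row, the pooled feature has fixed size $C_F \times H_F$ regardless of the anchor's angle, which is what lets all proposals share one FC layer afterwards.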

2.4 Attention Mechanism

The steps above produce a local feature $a_i^{loc} \in \mathbb{R}^{C_F \times H_F}$ for each RoI. The paper then applies global attention to these features so that every RoI can perceive global information. The attention is simply an FC layer $L_{att}$ that predicts the mixing weights: for anchor $i$ it outputs $N_{anc}-1$ scores, one for each of the other anchors, and the weights are:
$$w_{i,j} = \begin{cases} \mathrm{softmax}\left(L_{att}(a_i^{loc})\right)_j, & \text{if } j < i \\ 0, & \text{if } j = i \\ \mathrm{softmax}\left(L_{att}(a_i^{loc})\right)_{j-1}, & \text{if } j > i \end{cases}$$
The globally enhanced feature is then:
$$a_i^{glob} = \sum_j w_{i,j}\, a_j^{loc}$$
This computation can naturally be written more efficiently as a matrix multiplication; the implementation is as follows:

# lib/models/laneatt.py#L79
# Join proposals from all images into a single batch of flattened proposal features
batch_anchor_features = batch_anchor_features.view(-1, self.anchor_feat_channels * self.fmap_h)  # [B*n_proposals, 64*11]

# Add attention features
softmax = nn.Softmax(dim=1)
scores = self.attention_layer(batch_anchor_features)  # [B*n_proposals, n_proposals-1]
attention = softmax(scores).reshape(x.shape[0], len(self.anchors), -1)  # [B, n_proposals, n_proposals-1]
attention_matrix = torch.eye(attention.shape[1], device=x.device).repeat(x.shape[0], 1, 1)  # [B, n_proposals, n_proposals]
non_diag_inds = torch.nonzero(attention_matrix == 0., as_tuple=False)  # off-diagonal positions, in row-major order
attention_matrix[:] = 0
# row i receives its n_proposals-1 weights in columns 0..i-1, i+1..: the j-1 shift in w_{i,j}
attention_matrix[non_diag_inds[:, 0], non_diag_inds[:, 1], non_diag_inds[:, 2]] = attention.flatten()
batch_anchor_features = batch_anchor_features.reshape(x.shape[0], len(self.anchors), -1)  # [B, n_proposals, 64*11]
# (F^T A^T)^T = A F, i.e. a_i^glob = sum_j w_ij * a_j^loc for every proposal
attention_features = torch.bmm(torch.transpose(batch_anchor_features, 1, 2),
                               torch.transpose(attention_matrix, 1, 2)).transpose(1, 2)  # [B, n_proposals, 64*11]
attention_features = attention_features.reshape(-1, self.anchor_feat_channels * self.fmap_h)
batch_anchor_features = batch_anchor_features.reshape(-1, self.anchor_feat_channels * self.fmap_h)
batch_anchor_features = torch.cat((attention_features, batch_anchor_features), dim=1)  # [B*n_proposals, 2*64*11]
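To make the off-diagonal bookkeeping concrete, here is a minimal pure-Python sketch of $w_{i,j}$ and $a_i^{glob}$. It assumes the $N_{anc}-1$ scores per anchor are given directly (standing in for the FC layer output) and that each local feature is already flattened to a list; `build_attention_matrix` and `global_features` are hypothetical names.

```python
import math


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def build_attention_matrix(scores):
    # scores[i]: N-1 logits for anchor i, one per *other* anchor
    n = len(scores)
    w = [[0.0] * n for _ in range(n)]  # diagonal stays 0 (w_ii = 0)
    for i, s in enumerate(scores):
        p = softmax(s)
        for j in range(n):
            if j < i:
                w[i][j] = p[j]
            elif j > i:
                w[i][j] = p[j - 1]  # skip the slot that would belong to anchor i itself
    return w


def global_features(w, local_feats):
    # a_i^glob = sum_j w[i][j] * a_j^loc, i.e. the matrix product W @ A_loc
    d = len(local_feats[0])
    return [[sum(w[i][j] * local_feats[j][k] for j in range(len(w))) for k in range(d)]
            for i in range(len(w))]


w = build_attention_matrix([[0.0, 0.0]] * 3)
print(global_features(w, [[1.0], [2.0], [4.0]]))  # each anchor averages the other two
```

As in the code above, the global feature is then concatenated with the local one before the classification and regression heads.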

2.5 Loss Function

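The network's prediction for each proposal has three parts:
1) scores for $K+1$ classes, where $K$ is the number of lane classes;
2) offsets for the $N_{pts}$ sample points;
3) the number of valid sample points $l$, so the last valid point has index $e = s + \lfloor l \rceil - 1$, where $s$ is the index of the starting point and $\lfloor \cdot \rceil$ denotes rounding.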
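Turning these three outputs into a lane can be sketched as follows. This assumes each regressed x is the anchor's x plus the predicted offset, that $\lfloor l \rceil$ is round-half-up, and that an end index past the last sample is clamped; `decode_proposal` is a hypothetical helper, not the reference implementation.

```python
import math


def decode_proposal(anchor_xs, offsets, start_idx, length):
    # e = s + round(l) - 1, clamped to the last sample point (an assumption of this sketch)
    end_idx = start_idx + math.floor(length + 0.5) - 1
    end_idx = min(end_idx, len(anchor_xs) - 1)
    # keep only the valid points s..e, each shifted by its regressed offset
    xs = [anchor_xs[i] + offsets[i] for i in range(start_idx, end_idx + 1)]
    return end_idx, xs


print(decode_proposal([10.0, 11.0, 12.0, 13.0, 14.0], [0.5] * 5, 1, 2.6))
```

With $s = 1$ and $l = 2.6$, rounding gives 3 points, so $e = 3$ and only points 1 to 3 are returned.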

To compute the loss, predictions must be matched with the ground truth. Matching needs a measure between two lanes $X_a$ and $X_b$ (playing the role IoU plays in object detection), so the paper defines a distance suited to this setting. It is computed over the index range shared by both lanes, with $s' = \max(s_a, s_b)$ and $e' = \min(e_a, e_b)$:
$$D(X_a,X_b) = \begin{cases} \frac{1}{e'-s'+1}\sum_{i=s'}^{e'}\left|x_i^a-x_i^b\right|, & e' \ge s' \\ +\infty, & \text{otherwise} \end{cases}$$
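The loss is the typical detection loss, a classification term plus a regression term, summed over the $N_{p\&n}$ positive and negative proposals:
$$L(\{p_i,r_i\}_{i=0}^{N_{p\&n}-1}) = \lambda\sum_i L_{cls}(p_i,p_i^{*}) + \sum_i L_{reg}(r_i,r_i^{*})$$
In the paper, $L_{cls}$ is the Focal Loss and $L_{reg}$ the Smooth L1 loss, with $\lambda$ balancing the two terms; in practice the regression term only contributes for positive proposals.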
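The combination of the two terms can be sketched as below. This is a simplified illustration: it uses a binary focal loss (lane vs. background) instead of $K+1$ classes, assumes the common defaults $\alpha = 0.25$, $\gamma = 2$ and Smooth L1 with $\beta = 1$, and applies regression only to positives; all function names are hypothetical.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0):
    # binary focal loss for one proposal; p is the predicted lane probability
    eps = 1e-12
    if target == 1:
        return -alpha * (1 - p) ** gamma * math.log(p + eps)
    return -(1 - alpha) * p ** gamma * math.log(1 - p + eps)


def smooth_l1(pred, target, beta=1.0):
    # mean Smooth L1 over the regressed values of one proposal
    total = 0.0
    for a, b in zip(pred, target):
        d = abs(a - b)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def total_loss(probs, labels, reg_preds, reg_targets, lam=1.0):
    # lambda * classification over all proposals + regression over positives only
    cls = sum(focal_loss(p, t) for p, t in zip(probs, labels))
    reg = sum(smooth_l1(r, rt) for r, rt, t in zip(reg_preds, reg_targets, labels) if t == 1)
    return lam * cls + reg


print(total_loss([0.9, 0.2], [1, 0], [[1.0, 2.0], [0.0, 0.0]], [[1.5, 2.0], [5.0, 5.0]]))
```

The negative proposal's regression target is ignored, which is why its large offset error does not affect the result.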

3. Experimental Results

Performance comparison (TuSimple):

Performance comparison (CULane):

Detection examples (showing some ability to detect curved lanes):


Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection