《Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras》


This paper addresses 360-degree road scene semantic segmentation using surround view cameras.


First, to address the large distortion in fisheye images, Restricted Deformable Convolution (RDC) is proposed for semantic segmentation; it can effectively model geometric transformations by learning the shapes of convolutional filters conditioned on the input feature map. Second, to obtain a large-scale training set of surround view images, a novel method called zoom augmentation is proposed to transform conventional images into fisheye images. Finally, an RDC-based semantic segmentation model is built; it is trained for real-world surround view images through a multi-task learning architecture that combines real-world images with transformed images.
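To make "learning the shapes of convolutional filters" concrete, here is a minimal sketch of how one 3x3 filter could sample a feature map at shifted locations. It is not the paper's implementation. It assumes the restriction keeps the filter's centre tap fixed while the outer taps receive learned offsets, which the text above does not spell out. The helper names `bilinear` and `restricted_deformable_conv_at` are hypothetical, and in a real network a small convolution branch would predict the offsets from the input feature map.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly sample a 2D list at fractional (y, x); outside the map counts as 0."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value

def restricted_deformable_conv_at(feature, kernel, offsets, cy, cx):
    """Response of one 3x3 filter centred at (cy, cx).

    offsets[i][j] = (dy, dx) is the shift applied to kernel tap (i, j).
    Assumed restriction: the centre tap (1, 1) ignores its offset.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = (0.0, 0.0) if (i, j) == (1, 1) else offsets[i][j]
            out += kernel[i][j] * bilinear(feature, cy + (i - 1) + dy, cx + (j - 1) + dx)
    return out
```

With all offsets at zero this reduces to an ordinary 3x3 convolution response, so the offsets only change where the filter looks, not how many weights it has.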



First, Restricted Deformable Convolution (RDC) is proposed to enhance the transformation modeling capability of CNNs, so that the network can handle images with large distortions. Second, to enrich the scarce surround view training data, the zoom augmentation method is proposed to transform conventional images into fisheye images. Two existing complementary datasets are transformed using this method. Finally, an RDC-based semantic segmentation model is trained for real-world surround view images through a multi-task learning architecture, using the AdaBN and HLW approaches.
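The conventional-to-fisheye transformation can be sketched as a radial remapping. The sketch below is illustrative only: it assumes an equidistant fisheye model (radius proportional to the incidence angle) and a pinhole source image sharing the same focal length, neither of which is stated in these notes. The function name `to_fisheye` and its parameters are hypothetical. It uses nearest-neighbour sampling so that it also works on label maps.

```python
import math

def to_fisheye(image, focal, fill=0):
    """Warp a 2D list (image or label map) to a fisheye-like view.

    Assumed model: r_fisheye = focal * theta and r_pinhole = focal * tan(theta),
    so each output pixel reads from a source pixel farther from the centre.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = [[fill] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            dy, dx = v - cy, u - cx
            r_f = math.hypot(dy, dx)
            theta = r_f / focal
            if theta >= math.pi / 2:
                continue  # beyond a 180-degree field of view: keep the fill value
            scale = math.tan(theta) / theta if r_f > 0 else 1.0
            sy = int(round(cy + dy * scale))
            sx = int(round(cx + dx * scale))
            if 0 <= sy < h and 0 <= sx < w:
                out[v][u] = image[sy][sx]
    return out
```

Under these assumptions, a smaller `focal` gives stronger distortion, so drawing `focal` at random per training sample would yield a range of distortion levels; a very large `focal` leaves the image unchanged.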



We compare the proposed approach with fine-tuned FCNVGG16, ENet, and ERFNet on the SVScape test set. These baselines are fine-tuned from weights pretrained on the Cityscapes dataset. Table V shows the per-class accuracy results.
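For reference, a per-class score of this kind can be computed from a predicted label map and its ground truth as sketched below. This is a sketch under assumptions: it takes "accuracy" to mean the fraction of ground-truth pixels of each class that are predicted correctly, and it skips pixels carrying an ignore label of 255 (the Cityscapes convention). The paper's exact metric and label set are not given in these notes, and the function name `per_class_accuracy` is hypothetical.

```python
def per_class_accuracy(pred, truth, num_classes, ignore_label=255):
    """Return, per class, correct pixels / ground-truth pixels (None if the class is absent)."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if t == ignore_label:
                continue  # unlabeled pixels do not count for any class
            total[t] += 1
            if p == t:
                correct[t] += 1
    return [c / n if n else None for c, n in zip(correct, total)]
```

Classes that never appear in the ground truth return `None` rather than 0, so they can be left out when averaging over classes.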


RDC models geometric transformations well and is less prone to saturation. Deformable convolution models geometric transformations better when it is applied only to the last few convolutional layers. As future work, RDC and deformable convolution should be combined in one network to further enhance the CNNs’ transformation modeling ability. Future work also needs to incorporate weakly supervised or other domain adaptation methods to further improve performance on real surround view images.



