Scene understanding does not seem to have a very precise definition, so here are a few descriptions for reference.

LSUN Challenge (Large-scale Scene Understanding Challenge)

The PASCAL VOC and ImageNet ILSVRC challenges have enabled significant progress for object recognition in the past decade. Beginning with CVPR 2015, we borrowed this mechanism to speed up the progress for scene understanding via the LSUN workshop. Complementary to the object-centric ImageNet ILSVRC Challenge hosted at ICCV/ECCV every year, we propose to continue hosting this scene-centric challenge at CVPR every year. Our challenge will focus on major tasks in scene understanding, including scene object retrieval, outdoor scene segmentation, RGB-D 3D object detection and saliency prediction. Inspired by recent successes using big data, such as deep learning, we focus on providing benchmarks that are significantly bigger and more diverse than the existing ones, to support training these data-hungry algorithms. By providing a set of large-scale benchmarks in an annual challenge format, we expect significant progress to continue for scene understanding in the coming years. Given the experience of our previous workshops, we are updating all of our existing tasks and rolling out new tasks.

  • scene object retrieval
  • outdoor scene segmentation
  • RGB-D 3D object detection
  • saliency prediction

Survey: Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art

One of the basic requirements of autonomous driving is to fully understand its surrounding area such as a complex traffic scene. The complex task of outdoor scene understanding involves several sub-tasks such as depth estimation, scene categorization, object detection and tracking, event categorization, and more. Each of these tasks describes a particular aspect of a scene. It is beneficial to model some of these aspects jointly to exploit the relations between different elements of the scene and obtain a holistic understanding. The goal of most scene understanding models is to obtain a rich but compact representation of the scene including all its elements e.g., layout elements, traffic participants and the relations with respect to each other. Compared to reasoning in the 2D image domain, 3D reasoning plays a significant role in solving geometric scene understanding problems and results in a more informative representation of the scene in the form of 3D object models, layout elements and occlusion relationships. One specific challenge in scene understanding is the interpretation of urban and sub-urban traffic scenarios. Compared to highways and rural roads, urban scenarios comprise many independently moving traffic participants, more variability in the geometric layout of roads and crossroads, and an increased level of difficulty due to ambiguous visual features and illumination changes.

  • depth estimation
  • scene categorization
  • object detection and tracking
  • event categorization

MIT open course on autonomous driving

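Intuitively, scene understanding here can be read as answering the question "Where is someone else?"
- Object detection
- Segmentation of the full driving scene, for example SegNet
- Inferring road conditions from audio data, such as analyzing road surface texture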



  • object detection
  • semantic segmentation
  • scene parsing and labelling




