Recognizing an environment at a glance is one of the human brain's most impressive feats. Recent progress in object recognition stems from the availability of large datasets such as COCO and the rise of Convolutional Neural Networks (CNNs) that learn high-level features, yet scene recognition has not reached the same level of success.

In this blog post, we will see how classification models perform when classifying images of scenes. For this task, we train the model on the Places365-Standard dataset. It has 1,803,460 training images across 365 classes, with the number of images per class ranging from 3,068 to 5,000, and each image is 256×256 pixels.
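Before training, it can be worth checking that the downloaded labels match these figures. The snippet below is a minimal sketch, not part of Monk: the helper names are hypothetical, and it assumes the label file is a two-column CSV (image name, label) with a header row, which may not match your copy.

```python
import csv
from collections import Counter


def images_per_class(rows):
    """Count how many images carry each label.

    `rows` is any iterable of (image_name, label) pairs.
    """
    return Counter(label for _, label in rows)


def read_label_rows(path_to_csv):
    """Read (image_name, label) pairs from a CSV file.

    Assumption: two columns with a header row. Adjust the indices
    if your label file is laid out differently.
    """
    with open(path_to_csv, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return [(row[0], row[1]) for row in reader if len(row) >= 2]


def summarize(counts):
    """Return (number of classes, smallest class size, largest class size)."""
    sizes = list(counts.values())
    return len(sizes), min(sizes), max(sizes)
```

Calling `summarize(images_per_class(read_label_rows("labels.csv")))` on a complete download should report 365 classes, with the smallest and largest class sizes falling between 3,068 and 5,000.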

Installing and Downloading the Data

Let’s start by setting up Monk and its dependencies:


!git clone https://github.com/Tessellate-Imaging/monk_v1.git
!cd monk_v1/installation/Linux && pip install -r requirements_cu9.txt

After installing the dependencies, I downloaded the Places365-Standard dataset, which is publicly available for download.


Create an Experiment

I created an experiment and, for this task, used the MXNet Gluon back-end.

import os
import sys
sys.path.append("monk_v1/monk/")
from gluon_prototype import prototype
gtf = prototype(verbose=1)
gtf.Prototype("Places_365", "Experiment")

Model Selection and Training

I experimented with various models, including resnet, densenet, inception, and vgg16, among others. Of these, vgg16 gave the highest validation accuracy.


gtf.Default(dataset_path="train/",
            path_to_csv="labels.csv",
            model_name="vgg16",
            freeze_base_network=False,
            num_epochs=20)
gtf.Train()

After training for 20 epochs, I got a training accuracy of 65% and a validation accuracy of 53%.


Prediction

from IPython.display import Image
gtf = prototype(verbose=1)
gtf.Prototype("Places_365", "Experiment", eval_infer=True)
img_name = "test_256/Places365_test_00208427.jpg"
predictions = gtf.Infer(img_name=img_name)
Image(filename=img_name)

Prediction on test images

img_name = "test_256/Places365_test_00151496.jpg"
predictions = gtf.Infer(img_name=img_name)
Image(filename=img_name)

Prediction on test images

After this, I tried to find out why the accuracy did not improve beyond these figures. Some possible reasons are:

Incorrect Labels: While inspecting the training folder, I found images with incorrect labels. For example, the baseball_field class contains an image that does not show a baseball field. There are many more mislabeled images.

Wrong image in baseball_field

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
img = mpimg.imread("images/train/baseball_field2469.jpg")
imgplot = plt.imshow(img)
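Inspecting folders by hand does not scale to 1.8 million images. One possible way to shortlist candidates for review is to flag training images where the trained model confidently predicts a class other than the given label. The sketch below is an illustration under stated assumptions, not the method used in this post: `find_suspect_labels` is a hypothetical helper, the 0.9 threshold is arbitrary, and extracting the predicted class and confidence from the model's output is left to the reader.

```python
def find_suspect_labels(samples, threshold=0.9):
    """Return samples whose label disagrees with a confident prediction.

    `samples` is an iterable of tuples:
        (image_name, given_label, predicted_label, confidence)
    where confidence is a float in [0, 1].

    A flagged sample is only a candidate for manual review: the model
    may be wrong, so this does not prove the label is incorrect.
    """
    suspects = [
        sample
        for sample in samples
        if sample[1] != sample[2] and sample[3] >= threshold
    ]
    # Most confident disagreements first, since they are the most
    # likely to be genuine labeling errors.
    return sorted(suspects, key=lambda sample: sample[3], reverse=True)
```

The flagged images still need a human look, because a confident prediction can simply be a confident mistake.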

Unclear Scenes: Several classes are very similar and share the same objects, such as dining_room and dining_hall, or forest_road and field_road. Images from these classes can be ambiguous and very hard to classify.


Label: field_road
Label: forest_road

As we can see, these two images are very hard to tell apart.
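To see which classes the model mixes up most often, the misclassifications on a validation set can be tallied per pair of classes. This is a minimal sketch in plain Python; `most_confused_pairs` is a hypothetical helper, and it treats a pair as unordered, so field_road predicted as forest_road and the reverse count toward the same pair.

```python
from collections import Counter


def most_confused_pairs(true_labels, predicted_labels, top_n=5):
    """Return the class pairs that are most often confused.

    Both arguments are equal-length sequences of label strings.
    Each result is ((class_a, class_b), count), with the two class
    names in alphabetical order.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    confusions = Counter()
    for true, predicted in zip(true_labels, predicted_labels):
        if true != predicted:
            # Sort the names so (a, b) and (b, a) share one key.
            confusions[tuple(sorted((true, predicted)))] += 1
    return confusions.most_common(top_n)
```

Pairs with high counts are good candidates for a closer look at the images, or for merging classes if the distinction does not matter for the application.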


Multiple Scene Parts: Images that consist of multiple scene parts, such as buildings near the ocean, cannot be assigned to a single category. These scenes are hard to classify and require more than one ground-truth label to describe the environment.
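When only one ground-truth label is available, one common way to soften the single-label constraint during evaluation is top-k accuracy: a prediction counts as correct if the true label appears among the model's k most likely classes. The function below is a minimal sketch of that metric; it is not part of Monk, and it assumes you already have, for each image, a list of class names ordered from most to least likely.

```python
def top_k_accuracy(true_labels, ranked_predictions, k=5):
    """Fraction of images whose true label is among the top k predictions.

    `true_labels` is a sequence of label strings.
    `ranked_predictions` is a sequence of lists of label strings, one
    list per image, ordered from most to least likely.
    """
    if len(true_labels) != len(ranked_predictions):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")

    hits = sum(
        1
        for true, ranked in zip(true_labels, ranked_predictions)
        if true in ranked[:k]
    )
    return hits / len(true_labels)
```

With k=1 this reduces to ordinary accuracy, so comparing k=1 against a larger k gives a rough sense of how often the model's "mistakes" are plausible alternative descriptions of the scene.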

To summarize, this blog post has shown how deep learning networks can be used for natural scene classification, and why scene recognition has not achieved the same level of success as object recognition.


