Download and install the MSCOCO 2015 image_caption dataset; how to download it:

[Linux]

Steps:
1. $ git clone https://github.com/pdollar/coco.git
2. $ mkdir images && mkdir annotations

3. Download the splits you need from http://cocodataset.org/#download and unzip them into the two directories above.

Here we use the 2014 captioning data: 82,783 train, 40,504 val, and 40,775 test images. (Counting the files in a directory: http://www.jb51.net/article/56474.htm)
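If you prefer to script step 3, the following minimal Python sketch downloads and unpacks the 2014 val images and annotations; the images.cocodataset.org URLs are an assumption based on the public COCO mirrors, so substitute the exact links listed on the download page:

import os
import urllib.request
import zipfile

# (url, directory to unzip into); the annotations zip already contains an
# annotations/ folder, so it is extracted at the dataset root.
downloads = [
    ('http://images.cocodataset.org/zips/val2014.zip', 'images'),
    ('http://images.cocodataset.org/annotations/annotations_trainval2014.zip', '.'),
]

for url, dest in downloads:
    os.makedirs(dest, exist_ok=True)
    zip_name = os.path.basename(url)
    if not os.path.exists(zip_name):
        print('downloading', url)
        urllib.request.urlretrieve(url, zip_name)
    with zipfile.ZipFile(zip_name) as zf:
        zf.extractall(dest)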

Then install the Python COCO API:
4. $ cd PythonAPI
5. $ make
6. $ python, then >>> import pycocotools; if no error is raised, the installation succeeded.

Installation complete.

The COCO annotations are stored in JSON format, and COCO thoughtfully ships an API for reading them. The COCO API assists in loading, parsing, and visualizing the annotations. It supports object instance, object keypoint, and image caption annotations (for captions, not all functionality is defined).

7. Write an image_caption demo.py

The official usage examples are in the pycocoDemo.ipynb file in the downloaded repository, covering object detection, segmentation, person keypoints detection, stuff segmentation, and caption generation, each calling the API in a slightly different way. Here we only look at the image caption usage:

from pycocotools.coco import COCO
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
import pylab
pylab.rcParams['figure.figsize'] = (8.0, 10.0)
dataDir='/pytorch/image_caption/coco'
dataType='val2014'
# initialize COCO api for caption annotations
annFile = '{}/annotations/captions_{}.json'.format(dataDir,dataType)
coco=COCO(annFile)
catIds = coco.getCatIds(catNms=['person','dog','skateboard'])
imgIds = coco.getImgIds(catIds=catIds )
imgIds = coco.getImgIds(imgIds = [324158])
img = coco.loadImgs(imgIds[np.random.randint(0,len(imgIds))])[0]
# load and display caption annotations
annIds = coco.getAnnIds(imgIds=img['id'])
anns = coco.loadAnns(annIds)
coco.showAnns(anns)

Running it raises an error:

    catIds = coco.getCatIds(catNms=['person','dog','skateboard'])
  File "/us。。un/anaconda3/lib/python3.6/site-packages/pycocotools/coco.py", line 172, in getCatIds
    cats = self.dataset['categories']
KeyError: 'categories'

Note the following:

pycocotools/coco.py itself states (paraphrasing its docstring):

An alternative to using the API is to load the annotation file directly into a Python dictionary (a minimal sketch of this follows the list below).
Using the API provides additional utility functions.
Note that this API supports both *instance* and *caption* annotations.
In the case of captions, not all functions are defined (e.g., categories are undefined).
# getAnnIds - Get ann ids that satisfy given filter conditions.
# getCatIds - Get cat ids that satisfy given filter conditions.
# getImgIds - Get img ids that satisfy given filter conditions.
# loadAnns  - Load anns with the specified ids.
# loadCats  - Load cats with the specified ids.
# loadImgs  - Load imgs with the specified ids.
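As the docstring says, the caption file can also be read without the API at all. A minimal sketch, using the same dataDir as the demo above:

import json

# Load captions_val2014.json directly into a Python dict (the "other way"
# mentioned in the coco.py docstring); the path is an assumption.
with open('/pytorch/image_caption/coco/annotations/captions_val2014.json') as f:
    dataset = json.load(f)

# Caption files have only these top-level keys -- there is no 'categories',
# which is exactly why getCatIds() raised the KeyError above.
print(dataset.keys())
print(dataset['annotations'][0])  # {'image_id': ..., 'id': ..., 'caption': '...'}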

So for captions we should not use getCatIds or loadCats.

demo.py is therefore modified as follows:

from pycocotools.coco import COCO
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
import pylab
pylab.rcParams['figure.figsize'] = (8.0, 10.0)
dataDir='/pytorch/image_caption/coco'
dataType='val2014'
# initialize COCO api for caption annotations
annFile = '{}/annotations/captions_{}.json'.format(dataDir,dataType)
coco=COCO(annFile)
imgIds = coco.getImgIds(imgIds = [324158])
img = coco.loadImgs(imgIds[np.random.randint(0,len(imgIds))])[0]
# load and display caption annotations
annIds = coco.getAnnIds(imgIds=img['id'])
anns = coco.loadAnns(annIds)
coco.showAnns(anns)
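The official demo goes on to read and display the image itself; a short continuation of the script above, assuming the image zip was unpacked into coco/images/val2014/:

# continuation of demo.py: show the image alongside its captions
# (the images/val2014 sub-directory is an assumption about the unzip location)
I = io.imread('{}/images/{}/{}'.format(dataDir, dataType, img['file_name']))
plt.imshow(I)
plt.axis('off')
plt.show()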

Done.

Other functions, recorded here for later reference:

#  decodeMask - Decode binary mask M encoded via run-length encoding.
#  encodeMask - Encode binary mask M using run-length encoding.
#  annToMask  - Convert segmentation in an annotation to binary mask.
#  loadRes    - Load algorithm results and create API for accessing them.
#  download   - Download COCO images from mscoco.org server.
# Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
# Help on each functions can be accessed by: "help COCO>function".

8. Evaluation

https://github.com/tylin/coco-caption/blob/master/cocoEvalCapDemo.ipynb

Steps for Windows: https://www.jianshu.com/p/de455d653301
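A minimal sketch of the flow in cocoEvalCapDemo.ipynb, assuming the coco-caption repo (which bundles pycocotools and pycocoevalcap) is on the Python path; the annFile path and resFile name below are placeholders:

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

annFile = '/pytorch/image_caption/coco/annotations/captions_val2014.json'  # assumption
resFile = 'captions_val2014_fakecap_results.json'                          # your generated captions

coco = COCO(annFile)
cocoRes = coco.loadRes(resFile)

cocoEval = COCOEvalCap(coco, cocoRes)
# evaluate only on the images for which results were generated
cocoEval.params['image_id'] = cocoRes.getImgIds()
cocoEval.evaluate()

# aggregated scores (the "eval" struct); per-image scores are in cocoEval.evalImgs
for metric, score in cocoEval.eval.items():
    print('%s: %.3f' % (metric, score))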

[ref] Visit http://cocodataset.org/ for more information about COCO, including the data, papers, and tutorials;
see http://cocodataset.org/download for details on the API.
Downloading the MSCOCO 2015 image_caption data: http://blog.csdn.net/happyhorizion/article/details/77894205#coco
http://blog.csdn.net/daniaokuye/article/details/78699138 (with code)
http://blog.csdn.net/u012905422/article/details/52372755#reply
A very good article: https://zhuanlan.zhihu.com/p/22408033 (part 1)
https://zhuanlan.zhihu.com/p/22520434 (part 2) === the notes below come from that link

1. For image captioning, CNN-RNN pipelines are largely similar; the differences lie in better CNNs, LSTM variants, and so on.
2. The input image goes through the CNN, whose output is a vector Vatt(I) whose length equals the number of labels in the label set (i.e., the vocabulary size), each dimension holding the predicted probability of one label; this vector is fed into the RNN.
3. Should Vatt(I) be fed in only at the first RNN step, or at every time step t? Papers have verified that feeding it at every step introduces image noise and overfitting (a toy sketch of this setup follows this list).
4. Attribute prediction: to me the most valuable part of that paper is how its image-analysis stage goes from image to attributes; that is its core contribution.
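To make points 2 and 3 concrete, here is a toy PyTorch sketch in which the image vector Vatt(I) is projected into the word-embedding space and fed to the LSTM only at the first step; all layer sizes and names are illustrative, not taken from any particular paper:

import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Toy decoder: image feature injected only at t=0, word embeddings afterwards."""
    def __init__(self, feat_dim=1000, embed_dim=256, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)   # Vatt(I) -> embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, captions):
        # img_feat: (B, feat_dim), captions: (B, T) word indices
        v = self.img_proj(img_feat).unsqueeze(1)         # (B, 1, embed_dim)
        w = self.embed(captions)                         # (B, T, embed_dim)
        x = torch.cat([v, w], dim=1)                     # image appears only at step 0
        h, _ = self.lstm(x)
        return self.out(h)                               # (B, T+1, vocab_size)

# usage sketch with random data
decoder = CaptionDecoder()
logits = decoder(torch.randn(4, 1000), torch.randint(0, 10000, (4, 12)))
print(logits.shape)  # torch.Size([4, 13, 10000])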

9. Using torchvision

https://pypi.python.org/pypi/torchvision/0.1.8

Example:

import torchvision.datasets as dset
import torchvision.transforms as transforms

cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3]  # load 4th sample
print("Image Size: ", img.size())
print(target)

Output:

Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']

PyTorch source: https://github.com/pytorch/vision/blob/master/torchvision/datasets/coco.py
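Judging from the linked source, CocoCaptions is a thin wrapper over pycocotools; the following is a simplified re-sketch of its logic (not the exact torchvision code):

import os
from PIL import Image
from pycocotools.coco import COCO
import torch.utils.data as data

class SimpleCocoCaptions(data.Dataset):
    """Simplified sketch of torchvision's CocoCaptions (not the exact source)."""
    def __init__(self, root, annFile, transform=None):
        self.root = root
        self.coco = COCO(annFile)
        self.ids = list(self.coco.imgs.keys())
        self.transform = transform

    def __getitem__(self, index):
        img_id = self.ids[index]
        anns = self.coco.loadAnns(self.coco.getAnnIds(imgIds=img_id))
        target = [ann['caption'] for ann in anns]            # the 5+ captions
        file_name = self.coco.loadImgs(img_id)[0]['file_name']
        img = Image.open(os.path.join(self.root, file_name)).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, target

    def __len__(self):
        return len(self.ids)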

The following is copied from http://cocodataset.org/#download for offline reference (for when the site cannot be reached):

Tools

COCO API

Images

2014 Train images [83K/13GB]
2014 Val images [41K/6GB]
2014 Test images [41K/6GB]
2015 Test images [81K/12GB]
2017 Train images [118K/18GB]
2017 Val images [5K/1GB]
2017 Test images [41K/6GB]
2017 Unlabeled images [123K/19GB]

Annotations

2014 Train/Val annotations [241MB]
2014 Testing Image info [1MB]
2015 Testing Image info [2MB]
2017 Train/Val annotations [241MB]
2017 Stuff Train/Val annotations [401MB]
2017 Testing Image info [1MB]
2017 Unlabeled Image info [4MB]

1. Overview

Which dataset splits should you download? Each year's split is associated with different challenges. Specifically:

2014 Train/Val: Detection 2015, Captioning 2015, Detection 2016, Keypoints 2016
2014 Testing: Captioning 2015
2015 Testing: Detection 2015, Detection 2016, Keypoints 2016
2017 Train/Val: Detection 2017, Keypoints 2017, Stuff 2017
2017 Testing: Detection 2017, Keypoints 2017, Stuff 2017
2017 Unlabeled: [optional data for any competition]

If you are submitting to a 2017 challenge, you only need to download the 2017 data. You can disregard earlier data splits.

For efficiently downloading the images, we recommend using gsutil rsync to avoid the download of large zip files.

Please follow the instructions in the COCO API Readme to setup the downloaded COCO data (the images and annotations). By downloading this dataset, you agree to our Terms of Use.

2017 Update: The main change in 2017 is that instead of an 80K/40K train/val split, based on community feedback the split is now 115K/5K for train/val. The same exact images are used, and no new annotations for detection/keypoints are provided. However, new in 2017 are stuff annotations on 40K train images (subset of the full 115K train images from 2017) and 5K val images. Also, for testing, in 2017 the test set only has two splits (dev / challenge), instead of the four splits (dev / standard / reserve / challenge) used in previous years. Finally, new in 2017 we are releasing 120K unlabeled images from COCO that follow the same class distribution as the labeled images; this may be useful for semi-supervised learning on COCO.

Note: Annotations last updated 09/05/2017 (stuff annotations added). If you find any issues with the data please let us know!

2. COCO API

The COCO API assists in loading, parsing, and visualizing annotations in COCO. The API supports object instance, object keypoint, and image caption annotations (for captions not all functionality is defined). For additional details see: CocoApi.m, coco.py, and CocoApi.lua for Matlab, Python, and Lua code, respectively, and also the Python API demo.

Throughout the API "ann"=annotation, "cat"=category, and "img"=image.

download  - Download COCO images from mscoco.org server.
getAnnIds - Get ann ids that satisfy given filter conditions.
getCatIds - Get cat ids that satisfy given filter conditions.
getImgIds - Get img ids that satisfy given filter conditions.
loadAnns  - Load anns with the specified ids.
loadCats  - Load cats with the specified ids.
loadImgs  - Load imgs with the specified ids.
loadRes   - Load algorithm results and create API for accessing them.
showAnns  - Display the specified annotations.

3. MASK API

COCO provides segmentation masks for every object instance. This creates two challenges: storing masks compactly and performing mask computations efficiently. We solve both challenges using a custom Run Length Encoding (RLE) scheme. The size of the RLE representation is proportional to the number of boundaries pixels of a mask and operations such as area, union, or intersection can be computed efficiently directly on the RLE. Specifically, assuming fairly simple shapes, the RLE representation is O(√n) where n is number of pixels in the object, and common computations are likewise O(√n). Naively computing the same operations on the decoded masks (stored as an array) would be O(n).

The MASK API provides an interface for manipulating masks stored in RLE format. The API is defined below, for additional details see: MaskApi.m, mask.py, or MaskApi.lua. Finally, we note that a majority of ground truth masks are stored as polygons (which are quite compact), these polygons are converted to RLE when needed.

encode - Encode binary masks using RLE.
decode - Decode binary masks encoded via RLE.
merge  - Compute union or intersection of encoded masks.
iou    - Compute intersection over union between masks.
area   - Compute area of encoded masks.
toBbox - Get bounding boxes surrounding encoded masks.
frBbox - Convert bounding boxes to encoded masks.
frPoly - Convert polygon to encoded mask.
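A minimal usage sketch of the Python MASK API (pycocotools.mask); note that encode expects a Fortran-ordered uint8 array:

import numpy as np
from pycocotools import mask as maskUtils

# toy binary mask: a 20x30 rectangle inside a 100x100 image
m = np.zeros((100, 100), dtype=np.uint8)
m[10:30, 40:70] = 1

rle = maskUtils.encode(np.asfortranarray(m))   # encode: binary mask -> RLE
print(maskUtils.area(rle))                     # area computed directly on the RLE: 600
print(maskUtils.toBbox(rle))                   # [x, y, w, h] bounding box
m2 = maskUtils.decode(rle)                     # decode: RLE -> binary mask
print((m == m2).all())                         # True

# iou works on lists of RLEs; the last argument flags crowd ground-truth masks
print(maskUtils.iou([rle], [rle], [0]))        # [[1.]]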

4. Annotation format

COCO currently has three annotation types: object instances, object keypoints, and image captions. The annotations are stored using the JSON file format. All annotations share the basic data structure below:

{
    "info": info,
    "images": [image],
    "annotations": [annotation],
    "licenses": [license],
}

info{
    "year": int,
    "version": str,
    "description": str,
    "contributor": str,
    "url": str,
    "date_created": datetime,
}

image{
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
    "license": int,
    "flickr_url": str,
    "coco_url": str,
    "date_captured": datetime,
}

license{
    "id": int,
    "name": str,
    "url": str,
}

The data structures specific to the various annotation types are described below.

4.4. Image Caption Annotations

These annotations are used to store image captions. Each caption describes the specified image and each image has at least 5 captions (some images have more). See also the Captioning Challenge.

annotation{
    "id": int,
    "image_id": int,
    "caption": str,
}
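Given this structure, indexing the raw caption annotations by image_id takes only a few lines even without the API; a small sketch (the file path is an assumption):

import json
from collections import defaultdict

with open('annotations/captions_val2014.json') as f:
    data = json.load(f)

# group the caption annotations by the image they describe
captions_by_image = defaultdict(list)
for ann in data['annotations']:                    # each ann is {"id", "image_id", "caption"}
    captions_by_image[ann['image_id']].append(ann['caption'])

some_id = data['images'][0]['id']
print(some_id, captions_by_image[some_id])         # each image has at least 5 captions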

Content from http://cocodataset.org/#captions-eval:

1. Caption Evaluation

This page describes the caption evaluation code used by COCO and provides instructions for submitting results to the evaluation server. The evaluation code provided here can be used to obtain results on the publicly available COCO validation set. It computes multiple common metrics, including BLEU, METEOR, ROUGE-L, and CIDEr (the writeup below contains references and descriptions of each metric). If you use the captions, evaluation code, or server, we ask that you cite Microsoft COCO Captions: Data Collection and Evaluation Server:

@article{capeval2015,
  Author = {X. Chen and H. Fang and TY Lin and R. Vedantam and S. Gupta and P. Dollár and C. L. Zitnick},
  Journal = {arXiv:1504.00325},
  Title = {Microsoft COCO Captions: Data Collection and Evaluation Server},
  Year = {2015}
}

To obtain results on the COCO test set, for which ground truth annotations are hidden, generated results must be submitted to the evaluation server. The exact same evaluation code, described below, is used to evaluate generated captions on the test set.

2. Evaluation Code

Evaluation code can be obtained on the coco-caption github page. Unlike the general COCO API, the COCO caption evaluation code is only available under Python. Before running the evaluation code, please prepare your results in the format described on the results format page.

Running the evaluation code produces two data structures that summarize caption quality. The two structs are evalImgs and eval, which summarize caption quality per-image and aggregated across the entire test set, respectively. Details for the two data structures are given below. We recommend running the python caption evaluation demo for more details.

evalImgs[{
    "image_id": int,
    "BLEU_1": float,
    "BLEU_2": float,
    "BLEU_3": float,
    "BLEU_4": float,
    "METEOR": float,
    "ROUGE_L": float,
    "CIDEr": float,
}]

eval{
    "BLEU_1": float,
    "BLEU_2": float,
    "BLEU_3": float,
    "BLEU_4": float,
    "METEOR": float,
    "ROUGE_L": float,
    "CIDEr": float,
}

3. Upload Results

The rest of this page describes the upload instructions for submitting results to the caption evaluation server. Submitting results allows you to participate in the COCO Captioning Challenge 2015 and compare results to the state-of-the-art on the captioning leaderboard.

Training Data: The recommended training set for the captioning challenge is the COCO 2014 Training Set. The COCO 2014 Validation Set may also be used for training when submitting results on the test set. External data of any form is allowed (except any form of annotation on the COCO Testing set is forbidden). Please specify any and all external data used for training in the "method description" when uploading results to the evaluation server.

Please limit the number of entries to the captioning challenge to a reasonable number, e.g. one entry per paper. To avoid overfitting to the test data, the number of submissions per user is limited to 1 upload per day and a maximum of 5 submissions per user. It is not acceptable to create multiple accounts for a single project to circumvent this limit. The exception to this is if a group publishes two papers describing unrelated methods, in this case both sets of results can be submitted for evaluation.

First you need to create an account on CodaLab. From your account you will be able to participate in all COCO challenges.

Before uploading your results to the evaluation server, you will need to create two JSON files containing your captioning results in the correct results format. One file should correspond to your results on the 2014 validation dataset, and the other to the 2014 test dataset. Both sets of results are required for submission. Your files should be named as follows:

results.zip
  captions_val2014_[alg]_results.json
  captions_test2014_[alg]_results.json

Replace [alg] with your algorithm name and place both files into a single zip file named "results.zip".
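A hedged Python sketch of packaging a submission, assuming the standard caption results format of a JSON list of {"image_id": int, "caption": str} records (check the results format page before relying on this); the algorithm name and result lists are placeholders:

import json
import zipfile

alg = 'myalg'  # hypothetical algorithm name

# assumed results format: one {"image_id": int, "caption": str} dict per image
results_val = [{'image_id': 42, 'caption': 'a dummy caption'}]
results_test = [{'image_id': 43, 'caption': 'a dummy caption'}]

val_name = 'captions_val2014_{}_results.json'.format(alg)
test_name = 'captions_test2014_{}_results.json'.format(alg)
with open(val_name, 'w') as f:
    json.dump(results_val, f)
with open(test_name, 'w') as f:
    json.dump(results_test, f)

with zipfile.ZipFile('results.zip', 'w') as zf:
    zf.write(val_name)
    zf.write(test_name)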

To submit your zipped result file to the COCO Captioning Challenge click on the “Participate” tab on the CodaLab webpage. When you select “Submit / View Results” you will be given the option to submit new results. Please fill in the required fields and click “Submit”. A pop-up will prompt you to select the results zip file for upload. After the file is uploaded the evaluation server will begin processing. To view the status of your submission please select “Refresh Status”. Please be patient, the evaluation may take quite some time to complete. If the status of your submission is “Failed” please check to make sure your files are named correctly, they have the right format, and your zip file contains two files corresponding to the validation and testing datasets.

After you submit your results to the evaluation server, you can control whether your results are publicly posted to the CodaLab leaderboard. To toggle the public visibility of your results please select either “post to leaderboard” or “remove from leaderboard”. For now only one result can be published to the leaderboard at any time, we may change this in the future. After your results are posted to the CodaLab leaderboard, your captions on the validation dataset will be publicly available. Your captions on the test set will not be publicly released.

In addition to the CodaLab leaderboard, we also host our own more detailed leaderboard that includes additional results and method information (such as paper references). Note that the CodaLab leaderboard may contain results not yet migrated to our own leaderboard.

After evaluation is complete and the server shows a status of “Finished”, you will have the option to download your evaluation results by selecting “Download evaluation output from scoring step.” The zip file will contain five files:

captions_val2014_[alg]_evalimgs.json   % per image evaluation on val
captions_val2014_[alg]_eval.json       % aggregated evaluation on val
captions_test2014_[alg]_eval.json      % aggregated evaluation on test
metadata                               % auto generated (safe to ignore)
scores.txt                             % auto generated (safe to ignore)

The format of the json eval file is described earlier on this page. Please note that the *_evalImgs.json file is only available for download on the validation dataset, and not the test set.

Content from http://cocodataset.org/#external:

External Annotations on COCO and Related Datasets

Please contact us to add your dataset here! Do not release annotations on test set images under any circumstances to keep the integrity of the COCO challenges intact (contact us with any questions). Note: the following datasets may use COCO data but are independent efforts not directly affiliated with COCO.

COCO-Stuff

COCO-Stuff augments the COCO dataset with pixel-level stuff annotations for 10,000 images. The 91 stuff classes are carefully selected to have a similar level of granularity to the thing classes in COCO, allowing the study of stuff and things in context.

FOIL

Don't be foiled: Detect the mistake and correct it! FOIL augments COCO images with incorrect ('foil') captions which differ from the original ones by introducing one single error. The dataset contains ~300K datapoints and 98K unique images.

SPEECH-COCO

SPEECH-COCO augments COCO with speech captions generated using TTS synthesis. The corpus contains 600K+ spoken captions, allowing research of language acquisition, term discovery, keyword spotting, or semantic embedding using speech and vision.

VISUAL GENOME

Visual Genome is a dataset, a knowledge base, an ongoing effort to connect structured image concepts to language.

RefCOCO

RefCOCO dataset was collected using the Refer-it Game. Each expression aims to unambiguously indicate a particular person or object in an image.

COCO Attributes

COCO Attributes has over 3.5M attribute annotations for People, Animals, and Objects from the COCO training dataset.

G-Ref

Google referring expression dataset (G-Ref) is a dataset that focuses on unambiguous object text descriptions (i.e. referring expressions) that allow one to uniquely identify a single object or region within an image.

VerSe

VerSe annotates COCO images with OntoNotes senses for 90 verbs (actions) which have ambiguous visual usages. Along with the sense information we provide visualness labels for OntoNotes senses of 150 visual verbs.

COCO-Text

COCO-Text is for both text detection and recognition. The dataset annotates scene text with transcriptions along with attributes such as legibility, printed or handwritten text.

FM-IQA

The Freestyle Multilingual Image Question Answering (FM-IQA) dataset contains over 120,000 images and 250,000 freestyle Chinese question-answer pairs and their English translations.

VQA

VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.

VISUAL MADLIBS

Visual Madlibs is a new dataset consisting of focused natural language descriptions collected using automatically produced fill-in-the-blank templates. This dataset can be used for targeted generation or multiple-choice question-answering.

COCO-a

COCO-a annotates human actions and interactions with objects (or other people) with 140 visual actions (verbs with an unambiguous visual connotation), along with information such as emotional state and relative distance and position with the object.

SALICON

The SALICON dataset offers a large set of saliency annotations on the COCO dataset. This data complements the task-specific annotations to advance the ultimate goal of visual understanding.

PASCAL VOC

Annotations for PASCAL VOC 2007 and 2012 in COCO format. This allows use of the PASCAL detection data with the COCO API (including visualization and evaluation tools). JSON available here.

ImageNet Detection

Annotations for ImageNet 2014 train/val in COCO format. This allows use of the ImageNet detection data with the COCO API (including visualization and evaluation tools). JSON available here.

YOUR DATASET

Please contact us to add your dataset here! Do not release annotations on the test-set images under any circumstances to keep the integrity of the COCO challenges intact (please contact us with any questions if in doubt)
