UKBench image dataset, official site: http://www.vis.uky.edu/~stewe/ukbench/

Revised set!

The first set that went online contained some errors, most notably one subset that was included twice and some transposed images. Tests run on the old set are invalid.

Recognition Benchmark Images

Henrik Stewénius and David Nistér

The set consists of N groups of 4 images each. All the images are 640x480.

If you use the dataset, please refer to:

D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 2161-2168, June 2006.

Subsets

Users of subsets of the database should note that the difficulty depends on the chosen subset. Important factors are:
1. Difficulty of the objects themselves. CD covers are much easier than flowers. See performance curve below.
2. Sharpness of the images. Many of the indoor images are somewhat blurry, and this can affect some algorithms.
3. Similar or identical objects. All the pictures were taken by CS students, faculty, and staff, so keyboards and computer equipment are popular subjects. So is computer vision literature.

Download

Please note BEFORE starting your download that the file is almost 2GB. Please save a local copy in order to save bandwidth at our server.
Zipped File.
Visual Words. We extracted the visual words for each document and wrote them out one document per line. On each line, the text before the ":" is a header and the rest is the data. The vocabulary tree had 6 levels with a branching factor of 10 and was trained on unrelated data.
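
As a rough illustration of the line layout described above, the following Python sketch splits one line at the first ":" into a header and a list of visual word IDs. The page does not specify the data encoding, so treating the data as whitespace-separated integers is an assumption, and the function name and the sample header are hypothetical. The leaf count follows directly from the stated 6 levels and branching factor of 10.

```python
# Sketch only: the exact file layout is an assumption, not taken from the dataset page.
BRANCH_FACTOR = 10  # splitting factor stated on the page
LEVELS = 6          # vocabulary depth stated on the page
NUM_LEAVES = BRANCH_FACTOR ** LEVELS  # leaf nodes in a full tree of that shape


def parse_visual_words_line(line):
    """Split one line into (header, word_ids).

    Assumes (hypothetically) that the data part after the first ':'
    is a run of whitespace-separated integers.
    """
    header, sep, data = line.partition(":")
    if not sep:
        raise ValueError("expected ':' between header and data")
    return header.strip(), [int(token) for token in data.split()]


if __name__ == "__main__":
    # The header text below is made up for illustration.
    print(parse_visual_words_line("ukbench00000: 12 7 999999"))
    print(NUM_LEAVES)
```

If the real files use a different delimiter or encoding for the data, only the last line of the function would need to change.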

Performance

In the paper we give results for either a subset of 6376 images (all we had at the time) or a smaller subset of 1400 images. The smaller set was used where our implementation was not efficient enough to handle the larger one.

Performance Measures
Our simplest performance measure counts, for each query image, how many of the 4 images in its group appear among the top 4 results.
A MATLAB implementation that computes this measure: Download.

Scores under our measure on the full database of 10200 images, using different training sets and different scoring strategies:

	Scoring Strategy
Quantizer	Flat	10	100	1000
cd	2.895588	2.574118	3.139706	3.161275
moving	2.828529	2.161275	3.014216	3.083824
moving+cd	2.884412	2.551078	3.139902	3.157157
flip	3.014412	2.534902	3.135098	3.188333
test	3.166373	3.070098	3.294314	3.286863

Please see the Semiprocessed Data page for explanations.

Performance curve: how our performance varies when taking subsets 0:n from the set. The different curves represent different choices of scoring strategy. For extremely fast applications we use flat scoring, while for better performance we use hierarchical scoring.
The feature extractor was set to use relatively few features for these experiments.

How the score is computed

// find_rank_of_doc(d, q) is pseudocode: the 0-based rank of doc d when querying with doc q.
int nrblocks = nr_docs / 4;
int totaltopcount = 0;
for (int block = 0; block < nrblocks; block++) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int r = find_rank_of_doc(4 * block + j, 4 * block + i);
            if (r < 4) totaltopcount++;
        }
    }
}
double score = (double)totaltopcount / (nrblocks * 4);  // avoid integer division

What we are measuring is how many of the images are found on average.

Getting everything right gives a score of 4
Getting nothing right gives a score of 0
Getting only identical image right gives a score of 1
A score of 3 means that we find the identical image plus 2 of the 3 other images of the set.
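
The same measure can be sketched in Python. This is a minimal sketch, not the authors' MATLAB implementation: it assumes the retrieval system has already produced, for every query image, a list of document indices ordered from best to worst match, and that images are stored in consecutive groups of 4 as in the pseudocode above. The function and parameter names are hypothetical.

```python
def ukbench_score(rankings, group_size=4):
    """Average number of same-group images found in the top `group_size` results.

    rankings[q] is assumed to be a list of document indices, best match
    first, for query image q. Documents q and d belong to the same group
    when q // group_size == d // group_size.
    """
    n_docs = len(rankings)
    if n_docs == 0 or n_docs % group_size != 0:
        raise ValueError("number of documents must be a positive multiple of group_size")
    total_top_count = 0
    for query, ranked in enumerate(rankings):
        block = query // group_size
        # Count hits from the query's own group among the top results.
        total_top_count += sum(
            1 for doc in ranked[:group_size] if doc // group_size == block
        )
    # Float division, so the score ranges from 0.0 to group_size.
    return total_top_count / n_docs


if __name__ == "__main__":
    # Two groups of 4 images, every query ranks its own group first.
    perfect = [[0, 1, 2, 3, 4, 5, 6, 7]] * 4 + [[4, 5, 6, 7, 0, 1, 2, 3]] * 4
    print(ukbench_score(perfect))
```

Taking precomputed rankings as input keeps the sketch independent of any particular feature extractor or scoring strategy.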

Semiprocessed Data

We have computed lots of semiprocessed data along with SIFT vectors for training.
Semiprocessed Data
This page is maintained by Henrik Stewénius
