URL of the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/

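The repository has been updated continuously (as of 2010). It is a standard resource for people studying artificial intelligence, useful for reading articles, writing papers, and testing algorithms. Its data sets come from many areas of everyday life, engineering, and science, and they range from small to large, with the largest containing several hundred thousand records.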

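UCI data can be read in MATLAB with dlmread or textread. Note that non-numeric class labels should first be replaced with numbers (1, 2, 3, and so on); otherwise the values cannot be read in as numbers and are treated as characters.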
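The label replacement described above can be sketched in Python as follows. This is a minimal illustration, not part of the original post: the function name `encode_labels` is hypothetical, the three sample rows are copied from iris.data, and assigning codes in order of first appearance is an arbitrary choice.

```python
import csv
import io

# Three sample rows in the comma-separated layout used by iris.data.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def encode_labels(text):
    """Replace the last (class) column with integer codes 1, 2, 3, ...

    Codes are assigned in order of first appearance, which is an
    assumption of this sketch rather than a UCI convention.
    """
    codes = {}  # class name -> integer code
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        *features, label = record
        code = codes.setdefault(label, len(codes) + 1)
        rows.append([float(v) for v in features] + [code])
    return rows, codes


rows, codes = encode_labels(SAMPLE)

# Purely numeric text that a numeric-only reader can load.
numeric_text = "\n".join(",".join(str(v) for v in row) for row in rows)
print(numeric_text)
print(codes)
```

The `codes` dictionary keeps the mapping from class name to number, so predictions can be translated back to class names later.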

UCI Repository Usage Notes

Reposted from: http://www.aiseminar.cn/bbs/thread-37-1-1.html

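This directory contains data sets and domain theories (the latter are marked as such in the brief listing below) that have been or can be used to evaluate learning algorithms.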

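Each data file (*.data) contains individual records described in terms of attribute-value pairs. The corresponding *.info file contains voluminous documentation. (Some files _generate_ databases; they do not have *.data files.) In addition to the data sets and domain theories, the utilities directory contains utilities that may be useful when working with these data sets.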

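The contents of the repository can be viewed and remotely copied over the web at http://www.ics.uci.edu/~mlearn/MLRepository.html . Alternatively, the data can be obtained via ftp from ftp://ftp.ics.uci.edu using anonymous login; the files are in the pub/machine-learning-databases directory.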

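Note:
UCI is always looking for additional data sets, which can be written to the incoming subdirectory. Please contribute yours along with documentation; see the DOC-REQUIREMENTS file for the suggested documentation procedure. At present, most data sets use the following format: one instance per line, no spaces, attribute values separated by commas (","), and missing values denoted by a question mark ("?"). After making a contribution, please also notify the site librarian: ml-repository@ics.uci.edu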
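The record format described above (one instance per line, comma-separated, "?" for a missing value) can be parsed with a few lines of Python. This is a minimal sketch: the helper name `parse_record` is hypothetical, and the two sample records are invented for illustration and do not come from any UCI data set.

```python
def parse_record(line, missing="?"):
    """Split one comma-separated record and map the missing marker to None."""
    fields = line.strip().split(",")
    return [None if field == missing else field for field in fields]


# Hypothetical records, invented for illustration only.
lines = ["1,?,a,0.5", "2,7,?,?"]
records = [parse_record(line) for line in lines]

# Count how many values are missing in each column.
missing_per_column = [
    sum(record[i] is None for record in records)
    for i in range(len(records[0]))
]

print(records)
print(missing_per_column)
```

Values are kept as strings here because attribute types differ between data sets; conversion to numbers is left to the caller.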

The following uses the Iris data set from UCI as an example:

The ucidata/iris directory contains three files:
Index
iris.data
iris.names

Index is the directory listing; it lists every file in the folder. For iris, the contents of Index are:
Index of iris
18 Mar 1996      105 Index
08 Mar 1993     4551 iris.data
30 May 1989     2604 iris.names

iris.data is the Iris data file. Its contents look like this:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
……
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
……
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
……
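As shown above, attribute values are separated by commas with no spaces in between (5.1,3.5,1.4,0.2,), and the last column holds the value of the decision attribute for that row, that is, the class, such as Iris-setosa.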
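The layout described above, with attribute values first and the class in the last column, can be split into feature vectors and labels as in the sketch below. The nine rows are the ones shown in the listing; the names `split_row`, `X`, and `y` are illustrative choices, not part of the data set.

```python
from collections import Counter

# The nine rows shown in the iris.data excerpt (elided rows omitted).
ROWS = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica""".splitlines()


def split_row(line):
    """Return (list of float attribute values, class label) for one row."""
    *values, label = line.split(",")
    return [float(v) for v in values], label


X, y = [], []
for line in ROWS:
    features, label = split_row(line)
    X.append(features)
    y.append(label)

# Number of rows per class in this excerpt.
class_counts = Counter(y)
print(class_counts)
```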

iris.names describes the Iris data: title, sources, past usage, relevant information, number of instances, attributes of each instance, and so on. An excerpt:
……
7. Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class: 
      -- Iris Setosa
      -- Iris Versicolour
      -- Iris Virginica
……
9. Class Distribution: 33.3% for each of 3 classes.

For examples of using this data set, see other papers or later posts on this site.

The original English text follows:

============================================================================
  This is the UCI Repository Of Machine Learning Databases and Domain Theories
                             4 December 1995
              ftp.ics.uci.edu: pub/machine-learning-databases
      http://www.ics.uci.edu/~mlearn/MLRepository.html 
          Librarian: Patrick M. Murphy (ml-repository@ics.uci.edu )
                  111 databases and domain theories (36MB)
  ============================================================================
This directory contains data sets and domain theories (the latter have been
annotated as such in the following brief listing) that have been or can be
used to evaluate learning algorithms. Each data file (*.data) contains
individual records described in terms of attribute-value pairs.  The
corresponding *.info file contains voluminous documentation.  (Some files
_generate_ databases; they do not have *.data files.)
In addition to data sets and domain theories, the "utilities/" directory
contains utilities that you may find useful when using datasets in this
repository.
The contents of this repository can be viewed and remotely copied over
the web.  The address is http://www.ics.uci.edu/~mlearn/MLRepository.html.   
Alternatively, the contents of this repository can be remotely copied via 
ftp to ftp.ics.uci.edu.  Enter "anonymous" for user id, and e-mail address 
(user@host) for password.  These databases can be found by executing 
"cd pub/machine-learning-databases".
Notes:
1. We're always looking for additional databases, which can be
    written to the sub-directory named "/incoming". Please send yours, with 
    documentation.  Thanks -- See DOC-REQUIREMENTS for suggested documentation 
    procedures. Presently, most databases have the following format: 1 
    instance per line, no spaces, commas separate attribute values, and 
    missing values are denoted by "?".  Also, please notify the site librarian 
    (ml-repository@ics.uci.edu ) after making a donation.
2. Ivan Bratko requested that the databases he donated from the Ljubljana
    Oncology Institute (e.g., breast-cancer, lymphography, and primary-tumor)
    have restricted access. We are allowed to share them with academic
    institutions upon request. These databases (like several others) require
    that proper citations be made in published articles that use them.
    Citation requirements are in each database's corresponding *.doc file.
    To access any of these databases, send email to ml-repository@ics.uci.edu .
    To aid you in deciding if you want any of these databases, the 
    documentation files are available.
3. An archive server may now be used to receive via e-mail files in this
    repository.  Installed on ics, it provides email access to files in
    our anonymous ftp/uucp area (~ftp).  If people have no other access to
    our archives, then they can send mail to:
        archive-server@ics.uci.edu
    Commands to the server may be given in the body.  Some commands are:
        help
        send <archive> <file>
        find <archive> <string>
    The help command replies with a useful help message.
If you publish material based on databases obtained from this repository,
then, in your acknowledgements, please note the assistance you received by
using this repository.  Thanks -- this will help others to obtain the same
data sets and replicate your experiments.  We suggest the following pseudo-APA
reference format for referring to this repository (LaTeX'd):
  Murphy,~P.~M., \& Aha,~D.~W. (1994). {\it UCI Repository of machine
  learning databases} [http://www.ics.uci.edu/~mlearn/MLRepository.html ]. 
  Irvine, CA: University of California, Department of Information and Computer 
  Science.
Patrick M. Murphy (Repository Librarian)
     
----------------------------------------------------------------------
Brief Overview of Databases and Domain Theories:
Quick Listing:
1. annealing (David Sterling and Wray Buntine)
2. Artificial Characters Database & DT (donated by Attilio Giordana)
3-4. audiology (Ray Bareiss and Bruce Porter, used in Protos)
    1. Original Version
    2. Standardized-Attribute Version of the Original.
5. auto-mpg (from CMU StatLib library)
6. autos (Jeff Schlimmer)
7. badges (Haym Hirsh)
8. balance-scale (Tim Hume)
9. balloons (Michael Pazzani)
10. breast-cancer (Ljubljana Institute of Oncology, restricted access)
11. breast-cancer-wisconsin (Wisconsin Breast Cancer D'base, Olvi Mangasarian)
   1. Original version
   2. Diagnostic data set
   3. Prognostic data set
12. bridges (Yoram Reich)
13-21. chess
   1. Partial generator of Quinlan's chess-end-game data (kr-vs-kn) (Schlimmer)
   2. Shapiros' endgame database (kr-vs-kp) (Rob Holte)
   3. king-rook-vs-king (Michael Bain, Arthur van Hoff)
   4-9. Six domain theories (Nick Flann)
22. Bach Chorales (time-series) database (Darrell Conklin)
23. Connect-4 Database (John Tromp)
24-25. Credit Screening Database
   1. Japanese Credit Screening Data and domain theory (Chiharu Sano)
   2. Credit Card Application Approval Database (Ross Quinlan)
26. Ein-Dor and Feldmesser's cpu-performance database (David Aha)
27. Diabetes Data (Serdar Uckun, AIM-94)
28. dgp-2 data generation program (Powell Benedict)
29. Document Understanding (Donato Malerba)
30. Nine small EBL domain theories and examples in sub-directory ebl
31. Evlin Kinney's echocardiogram database (Steven Salzberg)
32. flags (Richard Forsyth)
33. function-finding (Cullen Schafer's 352 case studies)
34. glass (Vina Spiehler)
35. hayes-roth (from Hayes-Roth^2's paper)
36-39. heart-disease (Robert Detrano)
40. hepatitis (G. Gong)
41. horse colic database (Mary McLeish & Matt Cecile)
42. (Boston) Housing database (from CMU StatLib library)
43. ICU data (Serdar Uckun, AIM-94)
44. Image segmentation database (Carla Brodley)
45. ionosphere information (Vince Sigillito) 
46. iris (R.A. Fisher, 1936)
47. isolet (Ron Cole and Mark Fanty's database donated by Tom Dietterich)
48. kinship (J. Ross Quinlan)
49. labor-negotiations (Stan Matwin)
50-51. led-display-creator (from the CART book)
52. lenses (Cendrowska's database donated by Benoit Julien)
53. letter-recognition database (created and donated by David Slate)
54. liver-disorders (BUPA Medical's database donated by Richard Forsyth)
55. logic-theorist (Paul O'Rorke)
56. lung cancer (Stefan Aeberhard)
57. lymphography (Ljubljana Institute of Oncology, restricted access)
58-59. mechanical-analysis (Francesco Bergadano)
  1. Original Mechanical Analysis Data Set
  2. PUMPS DATA SET
60. mobile robots (donated by Klingspor, Morik and Rieger)
61-64. molecular-biology 
     1. promoter sequences (Towell, Shavlik, & Noordewier, domain theory also)
     2. splice-junction sequences (Towell, Noordewier, & Shavlik, 
        domain theory also)
     3. protein secondary structure database (Qian and Sejnowski)
     4. protein secondary structure domain theory (Jude Shavlik & Rich Maclin)
65. MONK's Problems (donated by Sebastian Thrun)
66. Moral Reasoner Database (donated by James Wogulis)
67. mushroom (Jeff Schlimmer)
68. MUSK databases (2) (donated by Tom Dietterich)
69. othello domain theory (Tom Fawcett)
70. Page Blocks Classification (Donato Malerba)
71. Pima Indians diabetes diagnoses (Vince Sigillito) 
72. Postoperative Patient data (Jerzy W. Grzymala-Busse)
73. Primary Tumor (Ljubljana Institute of Oncology, restricted access)
74. Qualitative Structure Activity Relationships (QSARs) (Ross King)
75. Quadraped Animals (John H. Gennari)
76. Servo data (Ross Quinlan)
77. shuttle-landing-control (Bojan Cestnik)
78. solar flare (Gary Bradshaw)
79-80. soybean (from Ryszard Michalski's groups)
81. space shuttle databases (David Draper)
82. spectrometer (Infra-Red Astronomy Satellite Project Database, John Stutz)
83. Sponge Database (Iosune Uriz and Marta Domingo)
84. Statlog Project databases (7) (from Ross King,...)
85. Student Loan relational database (from Michael Pazzani)
86. tic-tac-toe endgame database (Turing Institute, David W. Aha)
87-97. thyroid-disease (Garavan Institute, J. Ross Quinlan; Stefan Aeberhard)
98. trains database (David Aha & Eric Bloedorn)
99-104. Undocumented databases: sub-directory undocumented
   1. Economic sanctions database (domain theory included, Mike Pazzani)
   2. Cloud cover images (Philippe Collard)
   3. DNA secondary structure (Qian and Sejnowski, donated by Vince Sigillito) 
   4. Nettalk data (Sejnowski and Rosenberg, taken from connectionist-bench)
   5. Sonar data (Gorman and Sejnowski, taken from connectionist-bench)
   6. Vowel data (Qian, Sejnowski and Turney, taken from connectionist-bench)
105. university (Michael Lebowitz, donated by Steve Souders)
106. voting-records (Jeff Schlimmer)
107. water treatment plant data (donated by Javier Bejar and Ulises Cortes)
108-109. Waveform domain (taken from CART book)
110. Wine Recognition Database (donated by Stefan Aeberhard)
111. Zoological database (Richard Forsyth)
