R swirl Tutorial (R Programming) 11: vapply and tapply

| In the last lesson, you learned about the two most fundamental members of R’s *apply family of functions: lapply() and sapply(). Both take a list as input, apply a function to each element of the list, then combine and return the result. lapply() always returns a list, whereas sapply() attempts to simplify the result.
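As a quick refresher on that difference, here is a minimal sketch. The data frame `df` is a made-up toy example, not the flags dataset.

```r
# Toy data frame (hypothetical; stands in for any data frame)
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

as_list <- lapply(df, mean)  # always a list, one element per column
as_vec  <- sapply(df, mean)  # simplified to a named numeric vector

is.list(as_list)    # TRUE
is.numeric(as_vec)  # TRUE
as_vec[["b"]]       # 20
```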

| In this lesson, you’ll learn how to use vapply() and tapply(), each of which serves a very specific purpose within the Split-Apply-Combine methodology. For consistency, we’ll use the same dataset we used in the ‘lapply and sapply’ lesson.

| The Flags dataset from the UCI Machine Learning Repository contains details of various nations and their flags. More information may be found here: http://archive.ics.uci.edu/ml/datasets/Flags

| I’ve stored the data in a variable called flags. If it’s been a while since you completed the ‘lapply and sapply’ lesson, you may want to reacquaint yourself with the data by using functions like dim(), head(), str(), and summary() when you return to the prompt (>). You can also type viewinfo() at the prompt to bring up some documentation for the dataset. Let’s get started!

| As you saw in the last lesson, the unique() function returns a vector of the unique values contained in the object passed to it. Therefore, sapply(flags, unique) returns a list containing one vector of unique values for each column of the flags dataset. Try it again now.

sapply(flags, unique)
$name
[1] Afghanistan Albania Algeria American-Samoa
[5] Andorra Angola Anguilla Antigua-Barbuda
[9] Argentina Argentine Australia Austria
[13] Bahamas Bahrain Bangladesh Barbados
[17] Belgium Belize Benin Bermuda
[21] Bhutan Bolivia Botswana Brazil
[25] British-Virgin-Isles Brunei Bulgaria Burkina
[29] Burma Burundi Cameroon Canada
[33] Cape-Verde-Islands Cayman-Islands Central-African-Republic Chad
[37] Chile China Colombia Comorro-Islands
[41] Congo Cook-Islands Costa-Rica Cuba
[45] Cyprus Czechoslovakia Denmark Djibouti
[49] Dominica Dominican-Republic Ecuador Egypt
[53] El-Salvador Equatorial-Guinea Ethiopia Faeroes
[57] Falklands-Malvinas Fiji Finland France
[61] French-Guiana French-Polynesia Gabon Gambia
[65] Germany-DDR Germany-FRG Ghana Gibraltar
[69] Greece Greenland Grenada Guam
[73] Guatemala Guinea Guinea-Bissau Guyana
[77] Haiti Honduras Hong-Kong Hungary
[81] Iceland India Indonesia Iran
[85] Iraq Ireland Israel Italy
[89] Ivory-Coast Jamaica Japan Jordan
[93] Kampuchea Kenya Kiribati Kuwait
[97] Laos Lebanon Lesotho Liberia
[101] Libya Liechtenstein Luxembourg Malagasy
[105] Malawi Malaysia Maldive-Islands Mali
[109] Malta Marianas Mauritania Mauritius
[113] Mexico Micronesia Monaco Mongolia
[117] Montserrat Morocco Mozambique Nauru
[121] Nepal Netherlands Netherlands-Antilles New-Zealand
[125] Nicaragua Niger Nigeria Niue
[129] North-Korea North-Yemen Norway Oman
[133] Pakistan Panama Papua-New-Guinea Parguay
[137] Peru Philippines Poland Portugal
[141] Puerto-Rico Qatar Romania Rwanda
[145] San-Marino Sao-Tome Saudi-Arabia Senegal
[149] Seychelles Sierra-Leone Singapore Soloman-Islands
[153] Somalia South-Africa South-Korea South-Yemen
[157] Spain Sri-Lanka St-Helena St-Kitts-Nevis
[161] St-Lucia St-Vincent Sudan Surinam
[165] Swaziland Sweden Switzerland Syria
[169] Taiwan Tanzania Thailand Togo
[173] Tonga Trinidad-Tobago Tunisia Turkey
[177] Turks-Cocos-Islands Tuvalu UAE Uganda
[181] UK Uruguay US-Virgin-Isles USA
[185] USSR Vanuatu Vatican-City Venezuela
[189] Vietnam Western-Samoa Yugoslavia Zaire
[193] Zambia Zimbabwe
194 Levels: Afghanistan Albania Algeria American-Samoa Andorra Angola Anguilla Antigua-Barbuda … Zimbabwe
$landmass
[1] 5 3 4 6 1 2
$zone
[1] 1 3 2 4
$area
[1] 648 29 2388 0 1247 2777 7690 84 19 1 143 31 23 113 47 1099 600 8512
[19] 6 111 274 678 28 474 9976 4 623 1284 757 9561 1139 2 342 51 115 9
[37] 128 43 22 49 284 1001 21 1222 12 18 337 547 91 268 10 108 249 239
[55] 132 2176 109 246 36 215 112 93 103 3268 1904 1648 435 70 301 323 11 372
[73] 98 181 583 236 30 1760 3 587 118 333 1240 1031 1973 1566 447 783 140 41
[91] 1267 925 121 195 324 212 804 76 463 407 1285 300 313 92 237 26 2150 196
[109] 72 637 1221 99 288 505 66 2506 63 17 450 185 945 514 57 5 164 781
[127] 245 178 9363 22402 15 912 256 905 753 391
$population
[1] 16 3 20 0 7 28 15 8 90 10 1 6 119 9 35 4 24 2 11 1008 5 47
[23] 31 54 17 61 14 684 157 39 57 118 13 77 12 56 18 84 48 36 22 29 38 49
[45] 45 231 274 60
$language
[1] 10 6 8 1 2 4 3 5 7 9
$religion
[1] 2 6 1 0 5 3 4 7
$bars
[1] 0 2 3 1 5
$stripes
[1] 3 0 2 1 5 9 11 14 4 6 13 7
$colours
[1] 5 3 2 8 6 4 7 1
$red
[1] 1 0
$green
[1] 1 0
$blue
[1] 0 1
$gold
[1] 1 0
$white
[1] 1 0
$black
[1] 1 0
$orange
[1] 0 1
$mainhue
[1] green red blue gold white orange black brown
Levels: black blue brown gold green orange red white
$circles
[1] 0 1 4 2
$crosses
[1] 0 1 2
$saltires
[1] 0 1
$quarters
[1] 0 1 4
$sunstars
[1] 1 0 6 22 14 3 4 5 15 10 7 2 9 50
$crescent
[1] 0 1
$triangle
[1] 0 1
$icon
[1] 1 0
$animate
[1] 0 1
$text
[1] 0 1
$topleft
[1] black red green blue white orange gold
Levels: black blue gold green orange red white
$botright
[1] green red white black blue gold orange brown
Levels: black blue brown gold green orange red white

| What if you had forgotten how unique() works and mistakenly thought it returns the number of unique values contained in the object passed to it? Then you might have incorrectly expected sapply(flags, unique) to return a numeric vector, since each element of the list returned would contain a single number and sapply() could then simplify the result to a vector.
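The sketch below contrasts the two readings of unique() on a small made-up data frame (`toy` is hypothetical, not the flags data). Counting the unique values requires wrapping unique() in length(); only then can sapply() simplify the result to a vector.

```r
# Toy data frame (hypothetical); the columns have different numbers of unique values
toy <- data.frame(
  colour = c("red", "blue", "red", "green"),
  bars   = c(0L, 2L, 0L, 0L),
  stringsAsFactors = TRUE
)

# unique() returns the values themselves, so the per-column results differ in
# length and sapply() cannot simplify them: the result stays a list
unique_values <- sapply(toy, unique)

# length(unique(x)) returns one number per column, so sapply() simplifies
# the result to a named integer vector
unique_counts <- sapply(toy, function(x) length(unique(x)))

is.list(unique_values)  # TRUE
unique_counts           # colour = 3, bars = 2
```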

| When working interactively (at the prompt), this is not much of a problem, since you see the result immediately and will quickly recognize your mistake. However, when working non-interactively (e.g. writing your own functions), a misunderstanding may go undetected and cause incorrect results later on. Therefore, you may wish to be more careful and that’s where vapply() is useful.

| Whereas sapply() tries to ‘guess’ the correct format of the result, vapply() allows you to specify it explicitly. If the result doesn’t match the format you specify, vapply() will throw an error, causing the operation to stop. This can prevent significant problems in your code that might be caused by getting unexpected return values from sapply().
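A minimal sketch of that behaviour, using a made-up list `toy` rather than the flags data. The error is caught with tryCatch() here only so the example runs to completion; at the prompt you would simply see the error message.

```r
# Toy input (hypothetical): `a` has two unique values, `b` has one
toy <- list(a = c(1, 1, 2), b = c(5, 5, 5))

# Each call to length(unique(x)) yields a single integer, matching integer(1)
counts <- vapply(toy, function(x) length(unique(x)), integer(1))

# unique(toy$a) has length 2, which violates numeric(1), so vapply() stops.
# tryCatch() turns the error into its message instead of halting the script.
result <- tryCatch(
  vapply(toy, unique, numeric(1)),
  error = function(e) conditionMessage(e)
)

counts                # a = 2, b = 1
is.character(result)  # TRUE: we captured an error message, not a result
```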

| Try vapply(flags, unique, numeric(1)), which says that you expect each element of the result to be a numeric vector of length 1. Since this is NOT actually the case, YOU WILL GET AN ERROR. Once you get the error, type ok() to continue to the next question.

vapply(flags, unique, numeric(1))
Error in vapply(flags, unique, numeric(1)) : values must be length 1,
but FUN(X[[1]]) result is length 194
ok()

| Recall from the previous lesson that sapply(flags, class) will return a character vector containing the class of each column in the dataset. Try that again now to see the result.

sapply(flags, class)
      name   landmass       zone       area population   language   religion       bars    stripes    colours
  "factor"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"
       red      green       blue       gold      white      black     orange    mainhue    circles    crosses
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"   "factor"  "integer"  "integer"
  saltires   quarters   sunstars   crescent   triangle       icon    animate       text    topleft   botright
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"   "factor"   "factor"

| If we wish to be explicit about the format of the result we expect, we can use vapply(flags, class, character(1)). The ‘character(1)’ argument tells R that we expect the class function to return a character vector of length 1 when applied to EACH column of the flags dataset. Try it now.

vapply(flags, class, FUN.VALUE = character(1))
      name   landmass       zone       area population   language   religion       bars    stripes    colours
  "factor"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"
       red      green       blue       gold      white      black     orange    mainhue    circles    crosses
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"   "factor"  "integer"  "integer"
  saltires   quarters   sunstars   crescent   triangle       icon    animate       text    topleft   botright
 "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"  "integer"   "factor"   "factor"

| Note that since our expectation was correct (i.e. character(1)), the vapply() result is identical to the sapply() result – a character vector of column classes.

| You might think of vapply() as being ‘safer’ than sapply(), since it requires you to specify the format of the output in advance, instead of just allowing R to ‘guess’ what you wanted. In addition, vapply() may perform faster than sapply() for large datasets. However, when doing data analysis interactively (at the prompt), sapply() saves you some typing and will often be good enough.
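One concrete case where the 'safer' behaviour shows up is empty input, which is easy to hit inside your own functions. The sketch below uses an empty list as a stand-in: sapply() has nothing to simplify and returns an empty list, while vapply() still returns a vector of the type you asked for.

```r
empty <- list()

s <- sapply(empty, class)                # nothing to simplify: an empty list
v <- vapply(empty, class, character(1))  # still a character vector, of length 0

is.list(s)       # TRUE
is.character(v)  # TRUE
length(v)        # 0
```

Code that later treats the result as a character vector keeps working with `v`, but may behave unexpectedly with `s`.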

| As a data analyst, you’ll often wish to split your data up into groups based on the value of some variable, then apply a function to the members of each group. The next function we’ll look at, tapply(), does exactly that.
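To make the split, apply, and combine steps visible, the sketch below performs them by hand with split() and sapply(), then compares the result with a single tapply() call. The vectors `values` and `group` are made up for illustration.

```r
# Toy data (hypothetical): six values in two groups
values <- c(10, 20, 30, 40, 50, 60)
group  <- c("a", "b", "a", "b", "a", "b")

# Split: one vector of values per group
pieces <- split(values, group)

# Apply and combine: the mean of each piece, simplified to a named vector
manual <- sapply(pieces, mean)

# tapply() does all three steps in one call
direct <- tapply(values, group, mean)

manual  # a = 30, b = 40
direct  # a = 30, b = 40
```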

| Use ?tapply to pull up the documentation.

?tapply

| The ‘landmass’ variable in our dataset takes on integer values between 1 and 6, each of which represents a different part of the world. Use table(flags$landmass) to see how many flags/countries fall into each group.

table(flags$landmass)

1 2 3 4 5 6
31 17 35 52 39 20

| The ‘animate’ variable in our dataset takes the value 1 if a country’s flag contains an animate image (e.g. an eagle, a tree, a human hand) and 0 otherwise. Use table(flags$animate) to see how many flags contain an animate image.

table(flags$animate)

0 1
155 39

| This tells us that 39 flags contain an animate object (animate = 1) and 155 do not (animate = 0).

| If you take the arithmetic mean of a bunch of 0s and 1s, you get the proportion of 1s. Use tapply(flags$animate, flags$landmass, mean) to apply the mean function to the ‘animate’ variable separately for each of the six landmass groups, thus giving us the proportion of flags containing an animate image WITHIN each landmass group.

tapply(flags$animate, flags$landmass, mean)
1 2 3 4 5 6
0.4193548 0.1764706 0.1142857 0.1346154 0.1538462 0.3000000

| The first landmass group (landmass = 1) corresponds to North America and contains the highest proportion of flags with an animate image (0.4194).
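The 'mean of 0s and 1s is a proportion' idea can be checked on a tiny made-up example. The vectors `has_animal` and `region` are hypothetical stand-ins for flags$animate and flags$landmass.

```r
# Toy indicator (1 = yes, 0 = no) and a grouping variable with two groups
has_animal <- c(1, 0, 0, 1, 1, 0, 0, 0)
region     <- c(1, 1, 1, 1, 2, 2, 2, 2)

# Overall proportion of 1s: three of eight values are 1
overall <- mean(has_animal)

# Proportion of 1s within each region:
# region 1 has two 1s out of four, region 2 has one 1 out of four
by_region <- tapply(has_animal, region, mean)

overall    # 0.375
by_region  # 1: 0.50, 2: 0.25
```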

| Similarly, we can look at a summary of population values (in round millions) for countries with and without the color red on their flag with tapply(flags$population, flags$red, summary).

tapply(flags$population, flags$red, summary)
$0
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 3.00 27.63 9.00 684.00
$1
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 0.0 4.0 22.1 15.0 1008.0

| What is the median population (in millions) for countries without the color red on their flag?

1: 0.0
2: 3.0
3: 9.0
4: 22.1
5: 4.0
6: 27.6

Selection: 2
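Because summary() returns several numbers per group, tapply() cannot reduce the result to a simple vector and returns one summary per group instead. The sketch below, on made-up vectors `population` and `red`, shows how a single statistic could be pulled out of that result, and how passing median directly gives a plain vector.

```r
# Toy data (hypothetical): populations for flags without (0) and with (1) red
population <- c(0, 3, 9, 1, 4, 20)
red        <- c(0, 0, 0, 1, 1, 1)

# One summary object per group; groups are ordered 0, then 1
by_red <- tapply(population, red, summary)

# Median of the first group (red == 0), extracted by position and name
no_red_median <- by_red[[1]][["Median"]]

# If only one statistic is needed, apply it directly
medians <- tapply(population, red, median)

no_red_median  # 3
medians        # 0: 3, 1: 4
```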

| Lastly, use the same approach to look at a summary of population values for each of the six landmasses.

tapply(flags$population, flags$landmass, summary)
$1
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 0.00 12.29 4.50 231.00
$2
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 1.00 6.00 15.71 15.00 119.00
$3
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 8.00 13.86 16.00 61.00
$4
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 1.000 5.000 8.788 9.750 56.000
$5
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 2.00 10.00 69.18 39.00 1008.00
$6
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 0.00 11.30 1.25 157.00

| What is the maximum population (in millions) for the fourth landmass group (Africa)?

1: 157.00
2: 119.0
3: 56.00
4: 5.00
5: 1010.0

Selection: 3

| In this lesson, you learned how to use vapply() as a safer alternative to sapply(), which is most helpful when writing your own functions. You also learned how to use tapply() to split your data into groups based on the value of some variable, then apply a function to each group. These functions will come in handy on your quest to become a better data analyst.
