homework530

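Load the car.test.frame dataset from the rpart package and complete the following tasks:

Display the first six rows of the dataset, show its structure, and get its dimensions.

Experiment code: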

install.packages("rpart")

library(rpart)

data(car.test.frame)

View(car.test.frame)

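head(car.test.frame, 6) # first six rows

str(car.test.frame) # structure of the dataset

dim(car.test.frame) # dimensions of the whole dataset, not of the six-row preview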

Rename the variables to 价格(Price), 产地(Country), 可靠性(Reliability), 英里数(Mileage), 类型(Type), 车重(Weight), 发动机功率(Disp.), and 净马力(HP).

Experiment code:

colnames(car.test.frame)

names(car.test.frame) <- c("价格(Price)", "产地(Country)",
                           "可靠性(Reliability)", "英里数(Mileage)",
                           "类型(Type)", "车重(Weight)",
                           "发动机功率(Disp.)", "净马力(HP)")

View(car.test.frame)

Convert mileage to fuel consumption (油耗), where 油耗 = 100 * 4.546 / (1.6 * mileage), and add it as a new column named 油耗.

Experiment code:

car.test.frame$油耗 <- 100 * 4.546/(1.6 * car.test.frame$`英里数(Mileage)`)

View(car.test.frame)
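The constants in the formula read as unit conversions: 4.546 is roughly the number of liters in an imperial gallon and 1.6 is roughly the number of kilometers in a mile, so the result is in liters per 100 km. A minimal sketch of the same calculation as a reusable function follows; the name mpg_to_l_per_100km and the sample values are assumptions made for illustration and are not part of the exercise.

```r
# Hypothetical helper: convert miles per gallon to liters per 100 km,
# using the same rounded constants as the exercise (4.546 L/gal, 1.6 km/mile).
mpg_to_l_per_100km <- function(mpg) {
  100 * 4.546 / (1.6 * mpg)
}

mileage <- c(18, 25, 37)                    # illustrative values only
consumption <- mpg_to_l_per_100km(mileage)  # vectorized over the input
print(round(consumption, 2))
```

Higher mileage gives lower consumption, so the ordering of the cars reverses after the conversion.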

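Group the fuel consumption into A, B, and C: values of 油耗 from 11.6 to 15.8 are A, from 9 to 11.6 are B, and from 7.6 to 9 are C. Then count the samples in each group.

Experiment code: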

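car.test.frame$group油耗 <- NA # create the column first so the indexed assignments below are safe

car.test.frame$group油耗[car.test.frame$油耗 > 11.6] <- "A"

car.test.frame$group油耗[car.test.frame$油耗 <= 11.6] <- "B"

car.test.frame$group油耗[car.test.frame$油耗 <= 9] <- "C"

table(car.test.frame$group油耗) # sample count of each group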

View(car.test.frame)
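The same binning can also be written with cut(), which assigns every value to an interval in one call. This is a sketch on made-up values rather than on car.test.frame, and it assumes a boundary value belongs to the lower group (9 goes to C, 11.6 goes to B).

```r
# Illustrative fuel-consumption values, not taken from car.test.frame.
consumption <- c(7.7, 9.0, 10.2, 11.6, 12.5, 15.7)

# Intervals are (a, b] by default; include.lowest = TRUE also keeps 7.6 itself.
group <- cut(consumption,
             breaks = c(7.6, 9, 11.6, 15.8),
             labels = c("C", "B", "A"),
             include.lowest = TRUE)

counts <- table(group)   # sample count per group
print(counts)
```

Values outside 7.6 to 15.8 would become NA here, which makes out-of-range data easy to spot.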

Count the vehicles by country of origin and enter the counts in the table below.

France	Germany	Japan	Japan/USA	Korea	Mexico	Sweden	USA
1	2	19	7	3	1	1	26

Experiment code:

y <- table(car.test.frame$`产地(Country)`)

Add a new column named 名称 whose values are the dataset's row names.

Experiment code:

car.test.frame$名称 <- c(row.names(car.test.frame))

View(car.test.frame)

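Read in the grade sheets for the classes 信管2011-2 and 信管2012-2, then complete the following steps:

Convert the student ID field (学号) to character type, taking care that the IDs display correctly and in full, and set the gender field (性别) to factor type.

Experiment code: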

setwd("F://R语言//data")

a <- read.csv("信管2011-2.csv")

b <- read.csv("信管2012-2.csv")

a$学号 <- as.character(a$学号)

b$学号 <- as.character(b$学号)

class(a$学号)

class(b$学号)

a$性别 <- factor(a$性别)

b$性别 <- factor(b$性别)

class(a$性别)

class(b$性别)
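Converting with as.character() after reading works only if the IDs survived being parsed as numbers; leading zeros, for example, are already gone by then. One way to keep them intact is to declare the column as character at read time through colClasses. The sketch below uses an inline CSV with invented column names (id, gender, score) and invented IDs in place of the real class files.

```r
# Inline CSV standing in for a class file; IDs and column names are made up.
csv_text <- "id,gender,score
0120110201,female,88
0120110202,male,75"

# Read the ID column as character from the start so no digits are lost.
stu <- read.csv(text = csv_text, colClasses = c(id = "character"))
stu$gender <- factor(stu$gender)

str(stu)
```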

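Merge the two classes' grade sheets and create a new field, 班级 (class), set to 信管2011-2 and 信管2012-2 respectively; store the result in the student data frame.

Experiment code: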

a$班级 <- "信管2011-2"

b$班级 <- "信管2012-2"

student <- rbind(a,b)

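In the student data frame, find the missing values and set each one to the mean score of its class; compute the mean score of each class and enter it in the table below.

	信管2011-2	信管2012-2
Mean score	73.66667	73.38636

Experiment code: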

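sum(is.na(student$成绩)) # number of missing scores

mean(a$成绩, na.rm = TRUE)

mean(b$成绩, na.rm = TRUE) # without na.rm the missing score would make this NA

b$成绩[is.na(b$成绩)] <- mean(b$成绩, na.rm = TRUE) # fill missing scores with the class mean

student <- rbind(a, b)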
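The fill can also be done directly on the merged data frame, without going back to the per-class frames. A minimal sketch with ave() on toy data follows; the column names class and score and all values are invented for illustration.

```r
# Toy data; class labels and scores are invented for illustration.
student <- data.frame(
  class = c("c1", "c1", "c1", "c2", "c2", "c2"),
  score = c(70, NA, 80, 60, 80, NA)
)

# ave() returns, for each row, the mean of that row's class (NAs ignored).
class_mean <- ave(student$score, student$class,
                  FUN = function(x) mean(x, na.rm = TRUE))

# Replace only the missing scores with their class mean.
missing <- is.na(student$score)
student$score[missing] <- class_mean[missing]
print(student)
```

Because the mean is computed per class, a missing score in c1 is filled from c1 only and is not influenced by c2.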

Find the record with the highest score in each class and enter it in the table below.

	信管2011-2	信管2012-2
Highest score	93	95
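No code is recorded for this step. One possible way to pull the top record of each class is sketched here on toy data; the column names name, class, and score and all values are assumptions, not the real grade sheets.

```r
# Toy data; names, classes, and scores are invented for illustration.
student <- data.frame(
  name  = c("a", "b", "c", "d"),
  class = c("c1", "c1", "c2", "c2"),
  score = c(88, 81, 77, 91)
)

# Per-row maximum of the row's class, then keep the rows that reach it.
class_max <- ave(student$score, student$class, FUN = max)
top <- student[student$score == class_max, ]
print(top)
```

If several students tie for the highest score in a class, all of them are kept.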

Filter the scores of the male and female students in the student data frame and save them as boy.csv and girl.csv.

Experiment code:

boy <- subset(student, 性别 == "男")

girl <- subset(student, 性别 == "女")

write.table(boy, "boy.csv", row.names = FALSE, col.names = TRUE, sep = ",")

write.table(girl, "girl.csv", row.names = FALSE, col.names = TRUE, sep = ",")

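Use if-else branching to implement a function that converts a percentage score into a five-level grade: 90-100 is 优秀 (excellent), 80-89 is 良好 (good), 70-79 is 中等 (average), 60-69 is 及格 (pass), and 59 or below is 不及格 (fail). Read the score data from stu_score.txt and write the grading results to stu_grade.txt.

Experiment code: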

stu_score <- read.table("F://R语言//414米佳//data//stu_score.txt", header = TRUE)

# X88 is the score column; read.table most likely built this name from a first-row value of 88
stu_score$stu_grade <- ifelse(stu_score$X88 >= 90, "优秀",
                       ifelse(stu_score$X88 >= 80, "良好",
                       ifelse(stu_score$X88 >= 70, "中等",
                       ifelse(stu_score$X88 >= 60, "及格", "不及格"))))

write.table(stu_score, file = "stu_grade.txt")
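The task asks for a function built on if-else branches, whereas the code above uses the vectorized ifelse(). A sketch of the function form follows; the name to_grade is hypothetical, and the English labels stand in for the five Chinese grade names.

```r
# Hypothetical helper: map one percentage score to a five-level grade.
to_grade <- function(score) {
  if (score >= 90) {
    "excellent"
  } else if (score >= 80) {
    "good"
  } else if (score >= 70) {
    "average"
  } else if (score >= 60) {
    "pass"
  } else {
    "fail"
  }
}

scores <- c(95, 83, 71, 60, 42)      # illustrative scores only
grades <- sapply(scores, to_grade)   # if/else handles one value at a time
print(grades)
```

Plain if/else tests a single value, so sapply() is used to apply the function to each score in turn.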

Using the iris dataset, which has three Species (Setosa, Versicolor, Virginica), compute the mean and the maximum of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width for each species with by, aggregate, and tapply respectively.

Experiment code:

View(iris)

by(iris$Sepal.Length, iris$Species, mean)

by(iris$Sepal.Width, iris$Species, mean)

by(iris$Petal.Length, iris$Species, mean)

by(iris$Petal.Width, iris$Species, mean)

by(iris$Sepal.Length, iris$Species, max)

by(iris$Sepal.Width, iris$Species, max)

by(iris$Petal.Length, iris$Species, max)

by(iris$Petal.Width, iris$Species, max)

aggregate(. ~ Species, data = iris, mean)

aggregate(. ~ Species, data = iris, max)

tapply(iris$Sepal.Length, iris$Species, mean)

tapply(iris$Sepal.Width, iris$Species, mean)

tapply(iris$Petal.Length, iris$Species, mean)

tapply(iris$Petal.Width, iris$Species, mean)

tapply(iris$Sepal.Length, iris$Species, max)

tapply(iris$Sepal.Width, iris$Species, max)

tapply(iris$Petal.Length, iris$Species, max)

tapply(iris$Petal.Width, iris$Species, max)
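The per-column calls above can be collapsed so that all four measurements are handled at once. The following sketch shows one way to do that with tapply() wrapped in sapply(), and with by() applied to the four numeric columns together; the variable names means, maxes, and by_means are chosen here for illustration.

```r
# Mean and max of every numeric column, per species, in one call each.
# The result is a matrix: one row per species, one column per measurement.
means <- sapply(iris[1:4], function(col) tapply(col, iris$Species, mean))
maxes <- sapply(iris[1:4], function(col) tapply(col, iris$Species, max))
print(means)
print(maxes)

# by() can do the same when given the four columns as a data frame.
by_means <- by(iris[1:4], iris$Species, colMeans)
print(by_means)
```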

Read in bank.csv, convert it to a data frame df, view the first 10 rows of df, and write the data frame to the external files bank.txt and bank2.csv.

Experiment code:

setwd("F://R语言//data")

bank <- read.csv("bank.csv")

df <- data.frame(bank)

head(df,10)

write.table(df, file = "bank.txt")

write.csv(df, file = "bank2.csv") # comma-separated, as the .csv extension implies

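Use the scan() function to read the data file data.txt; fill the missing values with the mean of the data, then write the data to the external file data2.txt with 10 values per line.

Experiment code: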

setwd("F://R语言//data")

data<-scan("data.txt")

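sum(is.na(data)) # number of missing values

z <- mean(data, na.rm = TRUE) # mean of the non-missing values

data[is.na(data)] <- z

write(data, file = "data2.txt", ncolumns = 10) # 10 values per line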
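The whole task, including the final write step, can be tried end to end without the real data.txt. The sketch below uses temporary files and made-up numbers, and assumes missing entries appear in the file as the token NA.

```r
# Temporary files stand in for data.txt and data2.txt; the numbers are made up.
in_file  <- tempfile(fileext = ".txt")
out_file <- tempfile(fileext = ".txt")
writeLines("1 2 NA 4 5 6 7 8 9 10 11 NA", in_file)

values <- scan(in_file, quiet = TRUE)                # "NA" tokens are read as NA
values[is.na(values)] <- mean(values, na.rm = TRUE)  # fill with the overall mean

write(values, file = out_file, ncolumns = 10)        # 10 values per line
out_lines <- readLines(out_file)
print(out_lines)
```

With 12 values and ncolumns = 10, the output file has one full line of 10 values and a second line with the remaining 2.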
