
I. Bar chart:

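When I was learning this part I kept getting confused: a bar chart and a histogram look almost the same. Both count frequencies, and a histogram can also show relative frequencies. Put simply, a bar chart is about categories, while a histogram is mostly used for a continuous variable split into meaningful intervals. Concrete examples follow.

Both charts display a summary value of a continuous variable that has been split into groups.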

In bar charts, the groups are typically categorical variables.

In histograms the groups are typically intervals of another continuous variable.

# of people enrolled in each department of a college would suit a bar chart

# of people in each income quintile in your city would suit a histogram

One implication of this difference is that a natural order exists on the grouping axis of a histogram, but not a bar chart. In other words, it usually makes sense to sort a bar chart by value of the bar but a histogram should almost always remain sorted by the order of the groups. Using the examples above:

It would make sense to order the college departments from highest to lowest enrollment.

It wouldn't make as much sense to order the income quintiles by most to least people, you would end up with a counter-intuitive graph.

Bar charts usually have a space between the bars, histograms usually don't - reflecting the subtle differences in the relationships between adjacent groups.

(Excerpt from Quora)
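As a rough illustration of the ordering point in the excerpt, here is a small sketch on the built-in mtcars data. The dataset is my own stand-in, not one of the Quora examples: cylinder count is treated as a category and its bars are sorted by height, while mpg is binned into intervals whose order has to stay fixed.

```r
# Categorical grouping: counts per category, safe to reorder by value
cyl_counts <- table(mtcars$cyl)
sorted_counts <- sort(cyl_counts, decreasing = TRUE)
barplot(sorted_counts, xlab = "Cylinders", ylab = "Number of cars")

# Continuous grouping: counts per interval, order fixed by the axis
mpg_hist <- hist(mtcars$mpg, xlab = "Miles per gallon", main = "")
mpg_hist$breaks   # interval boundaries, always increasing
mpg_hist$counts   # number of cars in each interval
```

Sorting the histogram cells by count would scramble the mpg axis, which is the counter-intuitive graph the excerpt warns about.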


> library(vcd)


> head(Arthritis)

  ID Treatment  Sex Age Improved
1 57   Treated Male  27     Some
2 46   Treated Male  29     None
3 77   Treated Male  30     None
4 17   Treated Male  32   Marked
5 36   Treated Male  46   Marked
6 23   Treated Male  58   Marked


> library(ggplot2)

> ggplot(Arthritis,aes(x=Improved))+geom_bar()


> counts <- table(Arthritis$Improved)

> counts

  None   Some Marked
    42     14     28

> barplot(counts)

To keep things simple I am skipping the labels (title, xlab and so on). Ahem, that is a bad habit and I need to fix it!
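For completeness, here is a minimal sketch of the same bar chart with labels filled in. The counts are typed in from the console output above so the snippet runs without the vcd package, and the title wording is my own.

```r
# Counts copied from the console output shown above
counts <- c(None = 42, Some = 14, Marked = 28)

barplot(counts,
        main = "Improvement after treatment",  # title (wording is mine)
        xlab = "Improved",
        ylab = "Frequency")
```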

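By the way, saving a figure made with R's built-in plotting functions is less convenient than with ggplot2, where calling ggsave("blablabla.png") after ggplot() is all it takes. Here is how to save an ordinary base R plot. To save a plot, you need to do the following:

1. Open a device, using png(), bmp(), pdf() or similar

2. Plot your model

3. Close the device using dev.off()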


> png(file="d:/mybarplot.png")

> barplot(counts)

> dev.off()
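The same three steps work with any of the other devices. The sketch below uses pdf() and a temporary file instead of a fixed d:/ path. Both choices are mine, made so the snippet runs on any platform.

```r
out_file <- tempfile(fileext = ".pdf")  # hypothetical output location

pdf(out_file)                 # 1. open a device
barplot(table(mtcars$gear))   # 2. draw the plot
invisible(dev.off())          # 3. close the device so the file is written

file.exists(out_file)
```

Until dev.off() is called the file may be empty or incomplete, so the closing step is not optional.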

If you don't like the colors, you can pick something eye-searing with barplot(counts, col = "yellow"), or get creative and turn the chart on its side with barplot(counts, col = "yellow", horiz = TRUE). Go ahead and try it!


> barplot(counts,col=c("red","yellow","green"),legend = rownames(counts))

> barplot(counts,col=c("red","yellow","green"),legend = rownames(counts), beside = TRUE)
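The legend and beside arguments only pay off when counts is a two-way table. Here is a sketch of stacked versus grouped bars, assuming a cross-tabulation of two mtcars columns as a stand-in. The name counts2 and the choice of columns are mine.

```r
# Two-way table: rows become the stacked segments, columns the bars
counts2 <- table(mtcars$gear, mtcars$cyl)

# Stacked bars (the default)
barplot(counts2, col = c("red", "yellow", "green"),
        legend.text = rownames(counts2), xlab = "Cylinders")

# Grouped bars: beside = TRUE draws one bar per row within each column
barplot(counts2, col = c("red", "yellow", "green"),
        legend.text = rownames(counts2), beside = TRUE, xlab = "Cylinders")
```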



II. Histogram:

> head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

> png(file = "d:/histo.png")

> layout(matrix(c(1,2,3,3),2,2,byrow = TRUE))


> hist(mtcars$mpg)

> hist(mtcars$mpg,breaks = 12, col="red", xlab = "Miles Per Gallon",main="Colored histograms with 12 bins")

> hist(mtcars$mpg,freq=FALSE,breaks =12)

> lines(density(mtcars$mpg),col = "lightblue",lwd = 2)

> dev.off()
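The third histogram uses freq = FALSE so that the bars and the density curve share one scale. A small sketch of what that scale means, using plot = FALSE to inspect the numbers before drawing:

```r
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)

# On the density scale, bar height * bar width sums to 1 over all bars
area <- sum(h$density * diff(h$breaks))
area

# Same picture as above: density-scaled bars with the curve on top
hist(mtcars$mpg, freq = FALSE, breaks = 12)
lines(density(mtcars$mpg), col = "lightblue", lwd = 2)
```

With the default freq = TRUE the y axis shows raw counts, and the density curve would sit almost flat along the bottom of the plot.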

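breaks sets the number of bars (bins) in the histogram, but R does not necessarily obey: the number you ask for is not always the number you get. For example, if you request breaks = 13, it may draw 12. Here is what the R documentation says:

breaks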

one of:

a vector giving the breakpoints between histogram cells,

a function to compute the vector of breakpoints,

a single number giving the number of cells for the histogram,

a character string naming an algorithm to compute the number of cells (see ‘Details’),

a function to compute the number of cells.

In the last three cases the number is a suggestion only; the breakpoints will be set to pretty values. If breaks is a function, the x vector is supplied to it as the only argument.
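One way to see the "suggestion only" behavior is to compare the requested number of cells with the number hist() actually uses. This is a sketch on mtcars$mpg, and the requested values are arbitrary picks of mine.

```r
requested <- c(5, 12, 13, 20)
actual <- sapply(requested, function(n) {
  h <- hist(mtcars$mpg, breaks = n, plot = FALSE)
  length(h$counts)   # number of cells actually used
})
data.frame(requested, actual)
```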


The documentation above also lists the five ways to specify breaks. The most reliable one is to pass a vector of breakpoints, for example:

> se

> hist(height, breaks = se)
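As a runnable sketch of the vector form, here is the same idea on mtcars$mpg, used as a stand-in because the height data is not shown here. The name cut_points and the breakpoints themselves are my own choices.

```r
cut_points <- seq(10, 35, by = 5)   # must cover the full range of the data
h <- hist(mtcars$mpg, breaks = cut_points)
length(h$counts)                    # one cell per adjacent pair of breakpoints
```

With an explicit vector R uses the breakpoints exactly as given, so the number of bars is always one less than the number of breakpoints.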



> ggplot(data = mtcars, aes(x = mpg)) + geom_histogram(col = "black", fill = "lightblue", bins = 12) + labs(title = "Colored Histogram with 12 Bins")


III. Kernel density plot:

> plot(density(mtcars$mpg))

> polygon(density(mtcars$mpg),col = "lightgreen")

> rug(mtcars$mpg)
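How smooth the curve looks depends on the bandwidth. A minimal sketch comparing the default with a wider bandwidth through the adjust argument of density(), where the factor 2 is an arbitrary choice of mine:

```r
d_default <- density(mtcars$mpg)              # bandwidth chosen automatically
d_smooth  <- density(mtcars$mpg, adjust = 2)  # twice the default bandwidth

plot(d_default, main = "Kernel density of mpg")
lines(d_smooth, lty = 2)
rug(mtcars$mpg)
legend("topright", legend = c("default", "adjust = 2"), lty = c(1, 2))
```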


> library(sm)

Package 'sm', version 2.2-5.4: type help(sm) for summary information

> attach(mtcars)

The following object is masked from package:ggplot2:

    mpg


> str(mtcars)

'data.frame': 32 obs. of 11 variables:

$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...

$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...

$ disp: num 160 160 108 258 360 ...

$ hp : num 110 110 93 110 175 105 245 62 95 123 ...

$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...

$ wt : num 2.62 2.88 2.32 3.21 3.44 ...

$ qsec: num 16.5 17 18.6 19.4 17 ...

$ vs : num 0 0 1 1 0 1 0 1 1 1 ...

$ am : num 1 1 1 0 0 0 0 0 0 0 ...

$ gear: num 4 4 4 3 3 3 3 4 4 4 ...

$ carb: num 4 4 1 1 2 1 4 2 2 4 ...

> lab <- c("4 cylinder", "6 cylinder", "8 cylinder")

> lab

[1] "4 cylinder" "6 cylinder" "8 cylinder"

> cyl <- factor(cyl, levels = c(4, 6, 8), labels = lab)

> sm.density.compare(mpg,cyl)

> colfill <- 2:(1 + length(levels(cyl)))

> legend("topright",levels(cyl),fill= colfill)
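If the sm package is not available, a similar comparison can be put together in base R. This is a sketch of an alternative, not what sm.density.compare does internally, and the variable names are my own.

```r
groups <- split(mtcars$mpg, mtcars$cyl)   # mpg values for each cylinder count
dens   <- lapply(groups, density)         # one density estimate per group
cols   <- seq_along(dens) + 1             # colors 2, 3, 4

# Set up axes wide enough for every curve, then draw one line per group
x_lim <- range(sapply(dens, function(d) range(d$x)))
y_lim <- c(0, max(sapply(dens, function(d) max(d$y))))
plot(0, type = "n", xlim = x_lim, ylim = y_lim,
     xlab = "mpg", ylab = "Density")
for (i in seq_along(dens)) lines(dens[[i]], col = cols[i])
legend("topright", paste(names(dens), "cylinder"), fill = cols)
```

Each group here gets its own automatically chosen bandwidth, so the curves may differ slightly from the sm version.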


> library(ggplot2)

> ggplot(data = mtcars, aes(x = mpg, col = factor(cyl))) + geom_density() + geom_rug()




boxplot(formula, data= dataframe)

> boxplot(mpg~cyl, data=mtcars,notch = TRUE,varwidth = TRUE,col = "lightblue",xlab = "cylinder",ylab = "mpg")
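To see the numbers behind the picture, boxplot() can be called with plot = FALSE, which returns the statistics instead of drawing them. A short sketch:

```r
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

b$n      # group sizes; varwidth = TRUE scales box widths by sqrt(n)
b$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$conf   # lower and upper notch limits, drawn when notch = TRUE
```

If the notches of two boxes do not overlap, that is usually read as evidence that the two medians differ.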




> ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot(fill = "lightblue")



> boxplot(mpg~am*cyl, data=mtcars,main = "MPG Distribution by AutoType",xlab = "Auto Type",ylab = "mpg",col = c("green","darkgreen"))


> dotchart(mtcars$mpg, labels = row.names(mtcars),main = "Gas Mileage for Car Models",xlab= "mpg")

> ggplot(data=mtcars, aes(x=mpg,y=row.names(mtcars)))+geom_point()+labs(title="Gas Mileage for Car Models")
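A dot chart is usually easier to read once the rows are sorted by value. Here is a minimal sketch in base R, where the cex value is just a guess at a readable label size:

```r
ord <- order(mtcars$mpg)   # row indices from lowest to highest mpg

dotchart(mtcars$mpg[ord], labels = row.names(mtcars)[ord],
         cex = 0.7, main = "Gas Mileage for Car Models", xlab = "mpg")
```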

