NOTES

1 The basic problem of hypothesis testing

1.1 Example

year   mean weight of newborns   sample size        SD
1989   3190 g                    ∞ (population)     80 g
1990   3210 g                    100                NA

The mean weight of the 100 newborns sampled in 1990 is 20 g higher than the 1989 mean.

The question is what this 20 g difference shows. One explanation is that it is caused by the randomness of sampling; the other is that babies born in 1990 really are heavier than those born in 1989.

To settle this question, let $\mu_0$ denote the mean weight of babies in 1989 and $\mu$ that of 1990. We hypothesize that $\mu=\mu_0$ and use the 100 samples from 1990 to test whether the hypothesis holds.

Here:
Null hypothesis: $H_0: \mu = 3190\ \mathrm{g}$
If the null hypothesis does not hold, we need an alternative hypothesis:
Alternative hypothesis: $H_1: \mu \ne 3190\ \mathrm{g}$

1.2 Two types of error

$\alpha$ error (Type I): $H_0$ is true but is rejected.
$\beta$ error (Type II): $H_0$ is false but is accepted.

1.3 The procedure of hypothesis testing

  1. State the hypotheses:
    $H_0: \mu = 3190\ \mathrm{g}$
    $H_1: \mu \ne 3190\ \mathrm{g}$
  2. Compute the statistic ($\sigma$ is known, sample size > 30):
    $z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} = \frac{3210-3190}{80/\sqrt{100}} = 2.5$
  3. Since $2.5 > z_{\alpha/2} = 1.96$, $z$ falls in the critical region, so we reject $H_0$.
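The arithmetic of step 2 and the comparison of step 3 can be checked numerically. The exercises below use R; this quick sketch only needs the Python standard library:

```python
from math import sqrt
from statistics import NormalDist

# numbers from the example above
x_bar, mu0, sigma, n = 3210, 3190, 80, 100
z = (x_bar - mu0) / (sigma / sqrt(n))        # test statistic
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided 5% critical value, about 1.96
reject = abs(z) > z_crit                     # True -> reject H0
print(z, round(z_crit, 2), reject)           # 2.5 1.96 True
```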

1.4 Test regions

  1. two-tailed test:
    $H_0: \mu = \mu_0$
    acceptance region: $|z| < z_{\alpha/2}$
  2. one-tailed test:
    $H_0: \mu \ge \mu_0$ or $\mu \le \mu_0$
    acceptance region: $z > z_\alpha$ or $z < z_{1-\alpha}$ (here $z_\alpha$ denotes the $\alpha$-quantile of the standard normal distribution)

2 Tests for parameters

2.1 One sample test

large samples ($\sigma$ known):
$z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$

$\sigma$ unknown $\to$ estimate it with $s$:

  • small samples:
    $t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$
  • large samples:
    $z=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$
Which statistic to use:

  • SD known → z test
  • SD unknown, n > 30 → z test
  • SD unknown, n < 30 → t test
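As a small illustration of the σ-unknown, small-sample case, here is the t statistic computed for a made-up sample (the data are hypothetical, chosen only so the numbers are easy to follow):

```python
from math import sqrt
from statistics import mean, stdev

sample = [3.1, 3.3, 3.0, 3.4, 3.2, 3.3, 3.1, 3.2]  # hypothetical small sample (n < 30)
mu0 = 3.0                                          # reference value under H0
n = len(sample)
# sigma unknown and n < 30: use the t statistic with s in place of sigma
t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
print(round(t, 3))
```

The resulting t would then be compared against the t distribution with n - 1 degrees of freedom rather than the normal distribution.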

2.2 Two samples test

  • The test of the difference between two means
  1. $\sigma_1$, $\sigma_2$ are known:
    $z=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$
  2. $\sigma_1$, $\sigma_2$ are unknown:
    2.1 small samples, and $\sigma_1=\sigma_2$:
    $t=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$
    here:
    $s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$
    2.2 If it is clear that $\sigma_1\ne\sigma_2$, the t-test formula is quite different, so it is important to judge whether the standard deviations of the two samples are equal.
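For reference (this formula is not given in the notes above, but it is the standard unequal-variance test and what R's t.test computes by default with var.equal = FALSE), Welch's statistic replaces the pooled SD with the separate sample variances:

```latex
t=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}},
\qquad
\nu\approx\frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}
```

where $\nu$ is the Welch–Satterthwaite approximation to the degrees of freedom.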

2.3 Paired-sample test

The samples are related; test the difference between the pairs using the statistic appropriate to the sample size.

Exercise

One sample t-test

1 Data import and exploration

Import data

> mice = read.csv('mice.csv')
> str(mice)
'data.frame': 100 obs. of  2 variables:
 $ ID     : Factor w/ 100 levels "A1","B0","B1",..: 45 51 1 65 30 48 41 62 24 95 ...
 $ T.level: num  0.916 0.978 1.448 0.757 0.849 ...

Explore T-level data

x = mice$T.level
h = hist(x)
xfit = seq(min(x, na.rm = T), max(x, na.rm = T), length = 200)
yfit = dnorm(xfit, mean = mean(x, na.rm = T), sd = sd(x, na.rm = T))
yfit = yfit*max(h$counts)/max(yfit)
lines(xfit, yfit, col = 'blue', lwd = 2)

2 Check assumptions

Normality of data

qqnorm(x)
qqline(x) # looks good
> shapiro.test(x) # not significant

	Shapiro-Wilk normality test

data:  x
W = 0.98737, p-value = 0.4633

# so accept the data as being normal
> y = pnorm(summary(x), mean = mean(x, na.rm = T), sd = sd(x, na.rm = T))
> ks.test(x, y) # even if the ks-test returned a significant result

	Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.75333, p-value = 0.0008108
alternative hypothesis: two-sided


The QQ-plot looks pretty good, and the Shapiro-Wilk test is not significant, so we can accept the data as being normal, even though the Kolmogorov-Smirnov test returned a significant result.

3 Carry out one-sample t-test

> t.test(x, mu = 1.12)

	One Sample t-test

data:  x
t = 1.8215, df = 99, p-value = 0.07154
alternative hypothesis: true mean is not equal to 1.12
95 percent confidence interval:
 1.115409 1.227396
sample estimates:
mean of x 
 1.171403

Here $H_0$ is $\mu=\mu_0=1.12$. The p-value 0.07154 is greater than 0.05, so we can accept the null hypothesis, though the p-value is close to 0.05 and the support is not strong.

If we let mu = 1.17

> t.test(x, mu = 1.17)

	One Sample t-test

data:  x
t = 0.049708, df = 99, p-value = 0.9605
alternative hypothesis: true mean is not equal to 1.17
95 percent confidence interval:
 1.115409 1.227396
sample estimates:
mean of x 
 1.171403

The 95 percent confidence interval runs from 1.115 to 1.227. If we instead let mu = 1.1:

> t.test(x, mu = 1.1)

	One Sample t-test

data:  x
t = 2.5303, df = 99, p-value = 0.01297
alternative hypothesis: true mean is not equal to 1.1
95 percent confidence interval:
 1.115409 1.227396
sample estimates:
mean of x 
 1.171403

We have to reject the null hypothesis.

4 Illustrate the result

In this case we wanted to know if our sampled data was different from a particular reference value ($\mu_0$). If the 95% confidence interval of the mean includes the reference value, then there is no significant difference. A good way to illustrate this result is therefore a barplot of the mean value with error bars for the 95% confidence interval.

mean.T = mean(x)
result.T = t.test(x, mu = 1.12)
bp = barplot(mean.T, ylim = c(0, 1.4), col = 'red', asp = 1.5)
arrows(bp, result.T$conf.int[1], bp, result.T$conf.int[2], code = 3, angle = 90)
abline(h = 1.12, lty = 2)
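The decision rule just described (accept $H_0$ when the 95% CI covers $\mu_0$) can be sketched as a small function. This is a large-sample normal approximation in Python with made-up data, not the exact t-based interval that R's t.test reports:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci_covers(sample, mu0, alpha=0.05):
    """Large-sample (normal) CI for the mean; True if mu0 lies inside it."""
    n = len(sample)
    half = NormalDist().inv_cdf(1 - alpha / 2) * stdev(sample) / sqrt(n)
    m = mean(sample)
    return m - half <= mu0 <= m + half

data = [1.1, 1.2] * 50        # hypothetical sample with mean 1.15
print(ci_covers(data, 1.15))  # True: mu0 inside the interval, accept H0
print(ci_covers(data, 1.12))  # False: mu0 outside, reject H0
```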

Two-sample t-test

1 Data import and exploration

> tits = read.csv('tits.csv')
> str(tits)
'data.frame': 90 obs. of  3 variables:
 $ SPE: Factor w/ 2 levels "BM","TX": 2 2 2 2 2 2 1 2 2 2 ...
 $ wei: num  17.3 17.2 14.5 17.8 18 18.3 9.3 15.1 NA 15.8 ...
 $ egg: int  10 6 9 4 7 8 10 9 10 10 ...
> boxplot(egg~SPE, data = tits) # difference in egg number between species


A two-sample t-test is appropriate in this case since we want to compare two different (and unrelated) groups, Blue tits (BM) and Great tits (TX).

2 Check assumptions

Same procedure as before.

The results for both BM and TX do not pass formal normality tests.

Then, as mentioned before, we should consider whether the two SDs are different. This step has no counterpart in the one-sample t-test.

We use leveneTest() within the car package.

leveneTest(wei~SPE, data=tits)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  2.3485 0.1305
##       62

The null hypothesis is that the variances are equal. Here p = 0.13 > 0.05, so we accept the null hypothesis: the variances are not significantly different, and homogeneity of variance is fulfilled.
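To see what leveneTest() is computing, here is a from-scratch sketch of its default statistic, the Brown-Forsythe variant of Levene's test (center = median): absolute deviations from each group's median are compared with a one-way ANOVA F ratio. Python sketch with hypothetical groups:

```python
from statistics import mean, median

def levene_stat(groups):
    """Brown-Forsythe F statistic (center = median), as in car::leveneTest's default."""
    # absolute deviations of each observation from its group median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    zbars = [mean(zi) for zi in z]
    grand = mean([v for zi in z for v in zi])
    k = len(groups)
    N = sum(len(g) for g in groups)
    # between-group vs within-group variability of the deviations
    between = sum(len(zi) * (zb - grand) ** 2 for zi, zb in zip(z, zbars)) / (k - 1)
    within = sum((v - zb) ** 2 for zi, zb in zip(z, zbars) for v in zi) / (N - k)
    return between / within

# two groups with identical spread -> F is exactly 0
print(levene_stat([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]]))  # 0.0
```

A large F (small p) would indicate unequal variances, in which case Welch's test should be used instead of the pooled t-test.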

3 Carry out two-sample t-test

So we assume the two samples have the same SD; in t.test(), set var.equal = T.

> t.test(wei~SPE, data = tits, var.equal = T)

	Two Sample t-test

data:  wei by SPE
t = -16.402, df = 62, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.009003 -5.486166
sample estimates:
mean in group BM mean in group TX 
        11.16111         17.40870

Here the null hypothesis is $\mu_1-\mu_2=0$. The p-value is < 0.05, so the weight of the two species differs significantly.
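The pooled statistic that var.equal = T selects is exactly the formula from section 2.2 above; this Python sketch (hypothetical numbers, not the tits data) makes the computation explicit:

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x1, x2):
    """Equal-variance two-sample t statistic, i.e. t.test(..., var.equal = TRUE)."""
    n1, n2 = len(x1), len(x2)
    # pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

print(round(pooled_t([1, 2, 3], [2, 3, 4]), 4))  # -1.2247
```

The statistic is then referred to the t distribution with n1 + n2 - 2 degrees of freedom, which is where the df = 62 in the R output comes from.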

4 Illustrate result of t-test

mean.wei = aggregate(list(weight=tits$wei), by=list(species=tits$SPE), mean, na.rm = T)
SE = function(x) sd(x, na.rm = T)/(sqrt(sum(!is.na(x))))
se.wei = aggregate(list(SE = tits$wei), by = list(species=tits$SPE), FUN=SE)
wei.dat = cbind(mean.wei, SE=se.wei$SE)
bp = barplot(weight~species, data=wei.dat, ylim=c(0,20), col='purple')
arrows(bp, wei.dat$weight-wei.dat$SE, bp, wei.dat$weight+wei.dat$SE, code=3, angle=90)

Paired-samples t-test

This test is most appropriate when you want to compare two dependent values. An example could be a measurement made on the same individuals before and after an experimental manipulation.

1 Data import and exploration

We want to test whether one type of tree covers a larger proportion of the total area than the other.

forest = read.csv('forest.csv')
hist(forest$oak)
hist(forest$bir)

2 Check assumptions

Test the difference

> diff = forest$oak - forest$bir
> hist(diff)
> shapiro.test(diff)

	Shapiro-Wilk normality test

data:  diff
W = 0.92191, p-value = 0.2662

> x = diff
> y = pnorm(summary(x), mean = mean(x, na.rm = T), sd = sd(x, na.rm = T))
> ks.test(x, y)

	Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.83333, p-value = 0.002875
alternative hypothesis: two-sided

> qqnorm(x)
> qqline(x)

3 Carry out the paired-samples t-test

Add the argument paired = TRUE:

> t.test(forest$oak, forest$bir, paired = T)

	Paired t-test

data:  forest$oak and forest$bir
t = -1.4412, df = 12, p-value = 0.1751
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.14934135  0.03043097
sample estimates:
mean of the differences 
            -0.05945519

There is no significant difference.
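The paired test is mathematically the same as a one-sample t-test on the within-pair differences, which this Python sketch (made-up before/after numbers, not the forest data) makes explicit:

```python
from math import sqrt
from statistics import mean, stdev

# hypothetical before/after measurements on the same five sites
before = [2.0, 2.3, 1.9, 2.5, 2.1]
after  = [2.2, 2.4, 2.0, 2.8, 2.3]
d = [a - b for a, b in zip(after, before)]
t = mean(d) / (stdev(d) / sqrt(len(d)))  # one-sample t on the differences, mu0 = 0
print(round(t, 3))
```

This is why checking normality of the differences (as done above with shapiro.test(diff)) is the right assumption check for a paired test, rather than normality of each column separately.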

4 Illustrate the result

We could create a scatterplot of oak vs. birch with a 1:1 reference line, where sites falling above the 1:1 line have an excess of oak and sites below it have an excess of birch. In cases where there is a significant difference, most of the points will lie on one side of the line. The advantage of this approach is that it also illustrates the variation among sites.

plot(forest$oak~forest$bir, xlab='birch', ylab='oak', pch=16)
abline(0,1)
