Notes from Duke University's Data Analysis and Statistical Inference course

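anecdotal evidence: using an extreme individual case to draw a conclusion about the whole. For example, citing "my uncle smokes three cigarettes a day and is in great health" to argue that "smoking does no harm to the human body".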

type of data: before processing the data any further, first consider what type it is: qualitative (ordered or unordered) or quantitative (continuous or discrete).

Correlation does not imply causation

An observational study lets us establish correlation (more advanced methods can also get at causation).
An experiment lets us establish causation.

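Studies are either observational or experimental.
An observational study can usually establish only association (correlation), whereas an experiment can establish causation.
Example: determining whether working out affects energy level.
obs: sample one group of people who work out and one group who do not, then compare their energy levels. This reveals an association, but the difference in energy level is not necessarily caused by working out; other uncontrolled factors (called confounding variables) may be responsible.
exp: randomly assign people from the population to two groups, have one group work out and the other not, then measure their energy levels. In this respect it resembles the "control variables" method.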
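
The difference between the two study types can be illustrated with a small simulation. This is only a sketch on made-up data: the confounder (good sleep), the probabilities, and the function names are all assumptions for illustration, and working out is deliberately given no effect on energy level at all.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible
N = 10_000

def energy_level(good_sleep):
    # Assumption: energy depends only on sleep (plus noise);
    # working out has NO true effect in this simulated world.
    return 5 + 2 * good_sleep + random.gauss(0, 1)

def mean_difference(records):
    """Mean energy of those who work out minus mean energy of those who do not."""
    worked = [energy for works_out, energy in records if works_out]
    rested = [energy for works_out, energy in records if not works_out]
    return statistics.mean(worked) - statistics.mean(rested)

observational = []
experimental = []
for _ in range(N):
    good_sleep = random.random() < 0.5  # hypothetical confounding variable
    # Observational study: well-rested people choose to work out more often.
    chose_workout = random.random() < (0.8 if good_sleep else 0.2)
    observational.append((chose_workout, energy_level(good_sleep)))
    # Experiment: random assignment decides who works out, regardless of sleep.
    assigned_workout = random.random() < 0.5
    experimental.append((assigned_workout, energy_level(good_sleep)))

obs_diff = mean_difference(observational)
exp_diff = mean_difference(experimental)
print(f"observational difference: {obs_diff:.2f}")
print(f"experimental difference:  {exp_diff:.2f}")
```

In the observational data the workout group shows clearly higher energy even though working out does nothing here, because sleep drives both variables. Under random assignment the difference shrinks to roughly zero.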

sample bias
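- convenience sample: only easily accessible cases are included in the sample
- non-response: only a portion of the randomly sampled people respond, so the respondents may no longer be representative
- voluntary response: the sample consists of people who volunteer to respond, so the result depends on who chooses to take part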

sample methods
- simple random sample(SRS): each case is equally likely to be selected.
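- stratified sample: divide the population into homogeneous strata, then randomly sample from within each stratum
- cluster sample: divide the population into clusters, randomly sample a few clusters, then sample all observations within those clusters
- multistage sample: like cluster sampling, but randomly sample within the chosen clusters instead of taking every observation (for example, when surveying a city, divide it into districts; this avoids having to visit every district)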
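
The four sampling methods can be sketched in a few lines. This is a minimal illustration, assuming a made-up population of 100 people spread evenly over 5 districts; the helper names are hypothetical, and the districts serve as both strata and clusters purely for convenience.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 people, 20 in each of 5 districts.
population = [{"id": i, "district": i % 5} for i in range(100)]

def group_by_district(pop):
    """Split the population into groups keyed by district."""
    groups = {}
    for person in pop:
        groups.setdefault(person["district"], []).append(person)
    return groups

def simple_random_sample(pop, n):
    """SRS: every case is equally likely to be selected."""
    return random.sample(pop, n)

def stratified_sample(pop, n_per_stratum):
    """Randomly sample from within every stratum."""
    sample = []
    for members in group_by_district(pop).values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, n_clusters):
    """Randomly pick a few clusters, then take everyone in them."""
    groups = group_by_district(pop)
    chosen = random.sample(sorted(groups), n_clusters)
    return [person for d in chosen for person in groups[d]]

def multistage_sample(pop, n_clusters, n_per_cluster):
    """Randomly pick a few clusters, then randomly sample within each."""
    groups = group_by_district(pop)
    chosen = random.sample(sorted(groups), n_clusters)
    sample = []
    for d in chosen:
        sample.extend(random.sample(groups[d], n_per_cluster))
    return sample

srs = simple_random_sample(population, 10)
strat = stratified_sample(population, 2)
clus = cluster_sample(population, 2)
multi = multistage_sample(population, 2, 5)
print(len(srs), len(strat), len(clus), len(multi))  # prints: 10 10 40 10
```

Note that the stratified sample touches every district, while the cluster and multistage samples only touch the two districts that were drawn.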

principles of experimental design
1. control: compare the treatment of interest to a control group
2. randomize: randomly assign subjects to treatments
3. replicate: collect a sufficiently large sample, or replicate the entire study
4. block: block for variables known or suspected to affect the outcome

more on blocking
design an experiment investigating whether energy gels help you run faster
treatment: energy gel
control: no energy gel
block: energy gel might affect pro and amateur athletes differently
block for pro status:
1. divide the sample into pro and amateur athletes
2. within each block, randomly assign the athletes to the treatment and control groups
3. as a result, pro and amateur athletes are equally represented in both groups
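
The blocking steps above can be sketched as follows. This is only an illustration, assuming 20 made-up athletes (10 pro, 10 amateur); `blocked_assignment` is a hypothetical helper name, not something from the course.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical sample: 10 pro and 10 amateur athletes.
athletes = [{"name": f"athlete_{i}", "pro": i < 10} for i in range(20)]

def blocked_assignment(subjects, block_key):
    """Split subjects into blocks, then randomize to groups within each block."""
    blocks = {}
    for subject in subjects:
        blocks.setdefault(subject[block_key], []).append(subject)

    groups = {"treatment": [], "control": []}
    for members in blocks.values():
        shuffled = members[:]          # copy so the input list is not modified
        random.shuffle(shuffled)       # random order within the block
        half = len(shuffled) // 2
        groups["treatment"].extend(shuffled[:half])  # energy gel
        groups["control"].extend(shuffled[half:])    # no energy gel
    return groups

groups = blocked_assignment(athletes, "pro")
for name, members in groups.items():
    n_pro = sum(m["pro"] for m in members)
    print(name, "pros:", n_pro, "amateurs:", len(members) - n_pro)
```

Because the shuffle happens separately inside each block, both groups end up with 5 pros and 5 amateurs, so pro status cannot explain a difference between treatment and control.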

experimental terminology
1. placebo: fake treatment, often used as the control group in medical studies
2. placebo effect: experimental units show a change despite being on the placebo (a psychological effect of believing they are receiving the treatment)
3. blinding: experimental units don't know which group they're in
4. double-blind: both the experimental units and the researchers don't know the group assignment

random sampling and random assignment
1. random sampling: subjects are randomly selected from the population, so the results can be generalized to the population.
2. random assignment: in an experiment, subjects are randomly assigned to the treatment and control groups.
3. random sampling happens first, then random assignment.
4. only a study using both random sampling and random assignment supports conclusions that are both causal and generalizable.

modality
1. unimodal
2. bimodal
3. uniform
4. multimodal

robust statistics
center: median, not mean
spread: IQR, not SD or range
robust statistics are good at describing skewed data with extreme observations.
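
A quick check of why the median and IQR are called robust, using a small made-up data set and one extreme observation. The numbers and the `iqr` helper are assumptions for illustration; `statistics.quantiles` requires Python 3.8 or later, and its default quartile method may differ slightly from the one used in the course.

```python
import statistics

data = [2, 3, 3, 4, 4, 5, 5, 6, 7]   # made-up observations
with_outlier = data + [100]          # same data plus one extreme value

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

for label, values in [("original", data), ("with outlier", with_outlier)]:
    print(
        f"{label:>12}: "
        f"mean={statistics.mean(values):.2f} "
        f"median={statistics.median(values):.2f} "
        f"SD={statistics.stdev(values):.2f} "
        f"IQR={iqr(values):.2f}"
    )
```

The single extreme value pulls the mean and SD far from their original values, while the median and IQR barely move.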

transformation
1. (natural) log transformation: often applied when much of the data clusters near zero (relative to the larger values in the data set) and all observations are positive. For example, after taking the log of right-skewed data, the data are less skewed and have fewer extreme observations.
2. square root
3. inverse
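
The effect of a log transformation on skew can be checked numerically. This is a minimal sketch on a made-up right-skewed data set; the `skewness` helper uses the simple moment-based definition (third central moment divided by the cubed population SD), which is one of several common conventions.

```python
import math
import statistics

# Made-up, strictly positive, right-skewed data: most values near zero, a few large.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 150]

def skewness(values):
    """Moment-based skewness: positive means a long right tail."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

logged = [math.log(v) for v in raw]    # natural log transformation
rooted = [math.sqrt(v) for v in raw]   # square root transformation

print(f"skewness raw:  {skewness(raw):.2f}")
print(f"skewness sqrt: {skewness(rooted):.2f}")
print(f"skewness log:  {skewness(logged):.2f}")
```

On this data the log-transformed values are noticeably less skewed than the raw values, since the log compresses the large observations much more than the small ones.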

goals of transformations
1. see the data structure differently
2. reduce skew to assist in modeling
3. straighten a nonlinear relationship in a scatterplot
