Table of Contents

  • P_value
    • explain p-value to non-tech people
  • Power of a test / statistical power
  • Standard Error
  • What are covariance and correlation? How are they related?
  • What is the law of large numbers?
  • Q: What is the Central Limit Theorem? Explain it. Why is it important?
  • CTR / CTP
  • What's the major drawback of A/B testing?

P_value

  • https://towardsdatascience.com/120-data-scientist-interview-questions-and-answers-you-should-know-in-2021-b2faf7de8f3e
  • The p-value is the probability of obtaining results at least as extreme as the ones observed, assuming the null hypothesis is true.
  • The smaller the p-value, the less likely it is that we would see results this extreme if the null hypothesis were true.
  • In statistics, the hypothesis we start from is the null hypothesis. A p-value smaller than the chosen significance level (commonly 0.05) means that we reject the null hypothesis in favor of the alternative hypothesis. A larger p-value means we fail to reject the null hypothesis, which is not the same as proving it.
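To make the definition concrete, here is a minimal sketch that computes a one-sided p-value by hand for a coin-flip experiment. The scenario (60 heads in 100 flips, null hypothesis "the coin is fair") and the function name `binomial_p_value` are assumptions chosen for illustration, not part of the original notes.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical experiment: 60 heads in 100 flips, null hypothesis "the coin is fair".
p_value = binomial_p_value(60, 100)
print(f"p-value = {p_value:.4f}")
print("reject H0 at 0.05" if p_value < 0.05 else "fail to reject H0 at 0.05")
```

Under these assumptions the p-value comes out at roughly 0.028, so a one-sided test at the 0.05 level would reject the null hypothesis. A two-sided test would roughly double that value and lead to a different decision, which is why the alternative hypothesis should be fixed before looking at the data.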

explain p-value to non-tech people

Let’s say your p-value < 0.05, how would you explain p-value to someone who doesn’t understand statistics?

https://quantifyinghealth.com/p-value-explanation/

  • The p-value tells us how likely it is to see results at least this unusual if nothing but chance were at work.
  • The smaller the p-value, the harder it is to explain the results by chance alone.

(Technically: the p-value is the probability of obtaining results at least as extreme as the observed ones, assuming the null hypothesis is true.)

  • We typically use 0.05 as the threshold for deciding whether the results are unusual. If the p-value is smaller than 0.05, we consider it unlikely that the results appeared by chance alone.

(Technically: the "nothing but chance" assumption is the null hypothesis, and a p-value smaller than 0.05 means that we reject it in favor of the alternative hypothesis.)

Power of a test / statistical power

  • https://en.wikipedia.org/wiki/Power_of_a_test

  • The statistical power of a binary hypothesis test is the probability that the test correctly rejects the null hypothesis when a specific alternative hypothesis is true. It is commonly denoted by 1 − β.

  • Statistical power ranges from 0 to 1. As the power of a test increases, the probability β of making a type II error (wrongly failing to reject the null hypothesis) decreases.

  • Written as a conditional probability: power = P(reject the null hypothesis | the alternative hypothesis is true).
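As a rough illustration of 1 − β, the sketch below computes the power of a one-sided z-test for a mean with known standard deviation. The choice of test, the function name `z_test_power`, and the example numbers are all assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided z-test (H1: mean > mean under H0) with known sigma.

    effect is the true difference between the H1 mean and the H0 mean.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under H0
    shift = effect * sqrt(n) / sigma         # where the test statistic is centered under H1
    return 1 - std_normal.cdf(z_crit - shift)


# Hypothetical setting: true effect of half a standard deviation, 30 observations.
power = z_test_power(effect=0.5, sigma=1.0, n=30)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

With no true effect (`effect=0`) the function returns alpha, since the only way to reject is a false positive. Increasing the sample size or the effect size pushes the power toward 1.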

Standard Error

  • https://en.wikipedia.org/wiki/Standard_error
  • https://stats.stackexchange.com/questions/29641/standard-error-for-the-mean-of-a-sample-of-binomial-random-variables

The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the statistic is the sample mean, it is called the standard error of the mean (SEM).
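A small sketch of the two cases covered by the links above: the standard error of the mean, estimated as the sample standard deviation divided by the square root of n, and the standard error of a sample proportion. The function names and the data are made up for illustration.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(sample: list[float]) -> float:
    """Estimated SEM: sample standard deviation divided by sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


def standard_error_of_proportion(p_hat: float, n: int) -> float:
    """Estimated SE of a sample proportion from n Bernoulli trials."""
    return sqrt(p_hat * (1 - p_hat) / n)


sample = [2, 4, 4, 4, 5, 5, 7, 9]                 # hypothetical measurements
sem = standard_error_of_mean(sample)
se_prop = standard_error_of_proportion(0.5, 100)  # e.g. 50 conversions out of 100
print(f"SEM = {sem:.4f}, SE of proportion = {se_prop:.4f}")
```

Both quantities shrink with the square root of the sample size, so halving the standard error takes roughly four times as much data.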

What are covariance and correlation? How are they related?

  • Covariance is a quantitative measure of the extent to which the deviation of one variable from its mean matches the deviation of the other from its mean.
  • Correlation is a measurement of the strength and direction of the linear relationship between two variables. It is the covariance of the two variables, normalized by the product of their standard deviations, so it always lies between −1 and 1.
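The relationship between the two can be sketched in a few lines: compute the sample covariance, then divide by the product of the sample standard deviations to get the (Pearson) correlation. The function names and data are illustrative assumptions.

```python
from math import sqrt


def covariance(xs: list[float], ys: list[float]) -> float:
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    std_x = sqrt(covariance(xs, xs))  # the covariance of a variable with itself is its variance
    std_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # exactly 2 * xs, a perfect linear relationship
cov_xy = covariance(xs, ys)
corr_xy = correlation(xs, ys)
print(f"covariance = {cov_xy:.2f}, correlation = {corr_xy:.2f}")
```

Rescaling `ys` (say, multiplying by 100) changes the covariance but leaves the correlation unchanged, which is the practical reason to report correlation when units differ.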

What is the law of large numbers?

  • The law of large numbers is a theorem stating that as the number of independent trials increases, the average of the results gets closer to the expected value.
  • E.g., the proportion of heads in 100,000 flips of a fair coin is likely to be closer to 0.5 than the proportion in 100 flips.
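The coin-flip example can be checked with a quick simulation. This is only a sketch: the function name `heads_proportion`, the seed, and the flip counts are arbitrary choices.

```python
import random


def heads_proportion(n_flips: int, rng: random.Random) -> float:
    """Simulate n_flips fair coin flips and return the fraction that came up heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(0)  # fixed seed so the run is repeatable
for n_flips in (100, 10_000, 100_000):
    proportion = heads_proportion(n_flips, rng)
    print(f"{n_flips:>7} flips: proportion of heads = {proportion:.4f}")
```

The law is a statement about what happens as the number of flips grows, so a single short run can still land close to 0.5 by luck. What shrinks reliably is the typical size of the deviation, roughly in proportion to one over the square root of the number of flips.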

Q: What is the Central Limit Theorem? Explain it. Why is it important?

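  • The central limit theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution (provided the observations are independent and the population has finite variance).
  • The central limit theorem is important because it justifies using normal-based methods in hypothesis testing and in calculating confidence intervals, even when the underlying data are not normally distributed.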

https://towardsdatascience.com/120-data-scientist-interview-questions-and-answers-you-should-know-in-2021-b2faf7de8f3e
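A simulation sketch of the central limit theorem: draw many samples from a clearly non-normal population (an exponential distribution with mean 1 and standard deviation 1, chosen here only as an example) and look at how the sample means behave. The sample size, number of samples, and seed are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev


def sample_means(sample_size: int, n_samples: int, rng: random.Random) -> list[float]:
    """Means of n_samples independent samples drawn from an Exponential(1) population."""
    return [
        mean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


sample_size = 50
means = sample_means(sample_size, 2000, random.Random(0))

# The CLT predicts means centered on 1 with standard deviation about 1 / sqrt(sample_size).
predicted_sd = 1 / sqrt(sample_size)
within = sum(abs(m - 1.0) <= 1.96 * predicted_sd for m in means) / len(means)
print(f"mean of sample means = {mean(means):.3f}")
print(f"sd of sample means   = {stdev(means):.3f} (predicted {predicted_sd:.3f})")
print(f"share within 1.96 predicted sd of 1.0 = {within:.3f}")
```

If the normal approximation holds, about 95% of the sample means should fall within 1.96 predicted standard deviations of the population mean, even though individual exponential draws are strongly skewed.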

CTR / CTP

https://yokk.medium.com/differences-between-click-through-rate-ctr-and-click-through-probabilities-ctp-7f7d89d5526f

  • https://regularization.medium.com/udacity-a-b-testing-notes-lession-1-1e8ca8f8a704
  • The difference is that CTR counts clicks while CTP counts visitors. A visitor may click and view a page multiple times. In general, a rate is used to measure usability and a probability is used to measure impact. For example, use a rate to answer how often users find a specific button on a web page with many buttons; use a probability to answer how many users progress to the next page.
  • For CTR, engineers modify the website to capture a page view event and a click event.
  • For CTP, you further need to match each page view with all of its child clicks, so that you count at most one child click per page view.
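The counting rules above can be sketched as follows. The input format (one click count per page view, already matched to its parent view) and the function name `ctr_and_ctp` are assumptions made for illustration; a real pipeline would build this from logged page view and click events.

```python
def ctr_and_ctp(clicks_per_page_view: list[int]) -> tuple[float, float]:
    """Return (CTR, CTP) from the number of child clicks recorded for each page view.

    CTR counts every click; CTP counts at most one click per page view.
    """
    page_views = len(clicks_per_page_view)
    total_clicks = sum(clicks_per_page_view)
    views_with_click = sum(1 for clicks in clicks_per_page_view if clicks > 0)
    return total_clicks / page_views, views_with_click / page_views


# Hypothetical log: four page views, one of which produced three clicks.
ctr, ctp = ctr_and_ctp([0, 3, 1, 0])
print(f"CTR = {ctr:.2f}, CTP = {ctp:.2f}")
```

In this made-up log the CTR is 1.0 (four clicks over four page views) while the CTP is 0.5 (two of the four page views led to at least one click), which shows how repeated clicks inflate the rate but not the probability.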

What’s the major drawback of A/B testing?

  • A/B test results do not tell you in absolute terms which version is better. They tell you which version is better for your current user base, which is the data you test on.

  • Can take lots of time and resources

A/B testing can take a lot longer to set up than other forms of testing. Setting up the A/B system can be a resource and time hog, although third-party services can help. Depending on the company size, there may be endless meetings about which variables to include in the tests. Once a set of variables has been agreed on, designers and coders effectively need to work on double the amount of material. In addition, to get conclusive results, tests can take weeks or months on low-traffic sites.

https://www.experienceux.co.uk/ux-blog/the-pros-and-cons-of-ab-testing/
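To see why low-traffic sites can need weeks or months, here is a back-of-the-envelope sketch using a common normal-approximation formula for the sample size of a two-sided, two-proportion test. The formula choice, the function name `sample_size_per_variant`, and every number below (baseline rate, target rate, daily traffic) are assumptions for illustration, not figures from the linked article.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(p_base: float, p_target: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per variant to detect p_base vs. p_target."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    variance_sum = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p_base - p_target) ** 2)


# Hypothetical test: lift a 10% conversion rate to 12% on a site with 200 visitors a day.
n_per_variant = sample_size_per_variant(0.10, 0.12)
daily_visitors = 200
days_needed = ceil(2 * n_per_variant / daily_visitors)
print(f"{n_per_variant} visitors per variant, about {days_needed} days of traffic")
```

Under these assumptions each variant needs several thousand visitors, which at 200 visitors a day works out to more than a month of traffic. Smaller expected lifts increase the requirement quickly, because the sample size grows with the inverse square of the difference.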

  • A/B testing can make you forget about the big picture

https://medium.com/@madsbuchstage/the-limits-of-a-b-testing-9f96691c9a0c
