P_value
- explain p-value to non-tech people
Power of a test / statistical power
Standard Error
What are covariance and correlation? How are they related?
What is the law of large numbers?
Q: What is the Central Limit Theorem? Explain it. Why is it important?
CTR / CTP
What's the major drawback of A/B testing?

P_value

https://towardsdatascience.com/120-data-scientist-interview-questions-and-answers-you-should-know-in-2021-b2faf7de8f3e

The p-value is the probability of obtaining results at least as extreme as the observed ones, assuming the null hypothesis is true.
The smaller the p-value, the less likely we are to get the observed results if our current hypothesis holds.
In statistics, our current hypothesis is the null hypothesis, and by convention a p-value smaller than 0.05 means that we reject the null hypothesis in favor of the alternative hypothesis.
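
The definition above can be sketched with a coin-flip example. This is a minimal illustration, not taken from the linked article: the function name `binomial_two_sided_p_value` and the 60-heads-in-100-flips scenario are assumptions chosen for the example, and the null hypothesis is that the coin is fair.

```python
from math import comb


def binomial_two_sided_p_value(heads: int, flips: int) -> float:
    """Two-sided p-value for H0: the coin is fair (p = 0.5)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count every outcome at least as far from the expected number of heads
    # as the observed one; under H0 each sequence of flips is equally likely.
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme_outcomes / 2 ** flips


p_value = binomial_two_sided_p_value(heads=60, flips=100)
print(f"p-value = {p_value:.4f}")
print("reject H0 at 0.05" if p_value < 0.05 else "fail to reject H0 at 0.05")
```

The function sums the probability of all outcomes that are at least as extreme as the observed one, which is exactly the "at least as extreme, assuming the null hypothesis" part of the definition.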

explain p-value to non-tech people

Let’s say your p-value < 0.05, how would you explain p-value to someone who doesn’t understand statistics?

https://quantifyinghealth.com/p-value-explanation/

The p-value tells you how likely it is that results at least this unusual would appear just by chance.
The smaller the p-value, the less likely it is that results this extreme appeared by chance alone.

(The p-value is the probability of obtaining results at least as extreme as the observed ones, assuming our current hypothesis is true.
The smaller the p-value, the harder it is to explain the observed results under that hypothesis.)

We typically use 0.05 as the threshold for deciding whether the results are unusual. If the p-value is smaller than 0.05, we consider it unlikely that the results appeared by chance alone.

(In statistics, our current hypothesis is the null hypothesis, and a p-value smaller than 0.05 means that we reject the null hypothesis in favor of the alternative hypothesis.)

Power of a test / statistical power

https://en.wikipedia.org/wiki/Power_of_a_test
The statistical power of a binary hypothesis test is the probability that the test correctly rejects the null hypothesis when a specific alternative hypothesis is true. It is commonly denoted by 1 − β, where β is the probability of a type II error.
Statistical power ranges from 0 to 1, and as the power of a test increases, the probability β of making a type II error (wrongly failing to reject the null hypothesis) decreases.
In other words, ‘statistical power’ is the probability that a binary hypothesis test rejects the null hypothesis given that the alternative hypothesis is true.
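
As a rough illustration of 1 − β, the sketch below computes the power of a one-sided z-test with known standard deviation. The function name `z_test_power` and its parameters (`effect`, `sigma`, `n`, `alpha`) are assumptions made for this example, not something defined in the linked article.

```python
from statistics import NormalDist


def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided z-test of H0: mu = mu0 against H1: mu = mu0 + effect."""
    std_normal = NormalDist()
    # Rejection threshold for the z statistic when H0 is true.
    critical_z = std_normal.inv_cdf(1 - alpha)
    # Under H1 the z statistic is centered at effect * sqrt(n) / sigma.
    shift = effect * n ** 0.5 / sigma
    # Power = P(z statistic exceeds the threshold | H1 is true) = 1 - beta.
    return 1 - std_normal.cdf(critical_z - shift)


for n in (10, 25, 50):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

With the effect size held fixed, power grows with the sample size, and with an effect of zero it falls back to the significance level `alpha`.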

Standard Error

https://en.wikipedia.org/wiki/Standard_error
https://stats.stackexchange.com/questions/29641/standard-error-for-the-mean-of-a-sample-of-binomial-random-variables

The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the statistic is the sample mean, it is called the standard error of the mean (SEM).
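
A minimal sketch of the two most common cases, assuming independent observations: the standard error of the mean, estimated as s / sqrt(n), and the standard error of a sample proportion, sqrt(p(1 − p) / n). The function names and the sample data are illustrative assumptions.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(sample):
    """Estimated SEM: sample standard deviation divided by sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


def standard_error_of_proportion(successes, trials):
    """Estimated SE of a sample proportion from binomial data."""
    p_hat = successes / trials
    return sqrt(p_hat * (1 - p_hat) / trials)


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error_of_mean(sample))
print(standard_error_of_proportion(successes=50, trials=100))
```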

What are covariance and correlation? How are they related?

Covariance is a quantitative measure of the extent to which the deviation of one variable from its mean matches the deviation of the other from its mean. Its magnitude depends on the units of the two variables.
Correlation is a measurement of the strength of the linear relationship between two variables. It is the covariance of the two variables, normalized by the product of their standard deviations, so it is unitless and always lies between −1 and 1.
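
The relationship can be sketched in a few lines. This is a from-scratch illustration using the sample (n − 1) convention; the function names and the toy data are assumptions for the example.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    # covariance(v, v) is the sample variance of v.
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
ys_rescaled = [y * 10 for y in ys]

print(covariance(xs, ys), covariance(xs, ys_rescaled))    # changes with units
print(correlation(xs, ys), correlation(xs, ys_rescaled))  # stays the same
```

Rescaling one variable changes the covariance but leaves the correlation untouched, which is the practical reason for the normalization.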

What is the law of large numbers?

The Law of Large Numbers is a theorem that states that as the number of independent trials increases, the average of the results gets closer to the expected value.
E.g. the proportion of heads when flipping a fair coin 100,000 times should be closer to 0.5 than when flipping it 100 times.
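
The coin example can be simulated directly. This is a small sketch with an assumed helper `heads_fraction` and a fixed seed so the run is repeatable; individual runs still fluctuate, so the trend holds in general rather than for every single pair of sample sizes.

```python
import random


def heads_fraction(flips, rng):
    """Fraction of heads in `flips` simulated tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(flips)) / flips


rng = random.Random(0)  # fixed seed, chosen arbitrarily
for flips in (100, 10_000, 100_000):
    fraction = heads_fraction(flips, rng)
    print(f"{flips:>7} flips: heads fraction = {fraction:.4f}, "
          f"gap from 0.5 = {abs(fraction - 0.5):.4f}")
```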

Q: What is the Central Limit Theorem? Explain it. Why is it important?

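The central limit theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution, provided the observations are independent and the population has finite variance.
The central limit theorem is important because it justifies using normal-based methods in hypothesis testing and in calculating confidence intervals, even when the underlying data are not normally distributed.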
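
A quick simulation makes the statement concrete. The sketch below draws many samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here as an assumption for the example) and checks that the sample means behave as the theorem predicts: centered at 1 with spread close to 1 / sqrt(n).

```python
import random
from math import sqrt
from statistics import mean, stdev


def sample_means(sample_size, num_samples, rng):
    """Means of `num_samples` independent samples from an Exponential(1) population."""
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]


rng = random.Random(0)  # fixed seed, chosen arbitrarily
n = 50
means = sample_means(sample_size=n, num_samples=2000, rng=rng)

predicted_se = 1 / sqrt(n)  # population sd / sqrt(n)
within = sum(abs(m - 1.0) <= 1.96 * predicted_se for m in means) / len(means)

print(f"mean of sample means: {mean(means):.3f} (population mean is 1)")
print(f"sd of sample means:   {stdev(means):.3f} (predicted {predicted_se:.3f})")
print(f"share within 1.96 standard errors: {within:.3f} (normal theory: about 0.95)")
```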

https://towardsdatascience.com/120-data-scientist-interview-questions-and-answers-you-should-know-in-2021-b2faf7de8f3e

CTR / CTP

https://yokk.medium.com/differences-between-click-through-rate-ctr-and-click-through-probabilities-ctp-7f7d89d5526f

https://regularization.medium.com/udacity-a-b-testing-notes-lession-1-1e8ca8f8a704
The difference is that CTR (click-through rate) counts clicks, while CTP (click-through probability) counts visitors: CTR is the number of clicks divided by the number of page views, and CTP is the number of visitors who click at least once divided by the number of visitors. A visitor may click and view the page multiple times, which raises the CTR but not the CTP. In general, a rate is used to measure usability and a probability is used to measure impact. For example, use a rate to answer how often users find a specific button on a web page with many buttons; use a probability to answer how many users progress to the next page.
For CTR, engineers modify the website to capture a page view event and a click event.
For CTP, you additionally need to match each page view with all of its child clicks, so that you count at most one child click per page view.
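
The bookkeeping described above can be sketched as follows. The data layout is an assumption made for illustration: `page_views` is a list of page-view ids, and `click_parents` holds, for each click, the id of the page view it belongs to. Deduplicating by visitor instead works the same way with visitor ids as keys.

```python
def click_through_rate(page_views, click_parents):
    """CTR: total clicks divided by total page views."""
    return len(click_parents) / len(page_views)


def click_through_probability(page_views, click_parents):
    """CTP: share of page views with at least one child click."""
    # A set keeps each page view once, so repeated clicks count at most once.
    clicked_views = set(click_parents) & set(page_views)
    return len(clicked_views) / len(page_views)


page_views = ["pv1", "pv2", "pv3", "pv4"]
click_parents = ["pv1", "pv1", "pv1", "pv3"]  # three clicks on pv1, one on pv3

print(click_through_rate(page_views, click_parents))         # 4 clicks / 4 views = 1.0
print(click_through_probability(page_views, click_parents))  # 2 clicked views / 4 views = 0.5
```

The repeated clicks on one page view inflate the rate but not the probability, which is why the two metrics answer different questions.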

What’s the major drawback of A/B testing?

A/B test results do not tell you in absolute terms which version is better. They tell you which version is better for your current user base, which is the data you test on.
A/B testing can take a lot of time and resources

A/B testing can take a lot longer to set up than other forms of testing. Setting up the A/B system can be a resource and time hog, although third-party services can help. Depending on the company size, there may be endless meetings about which variables to include in the tests. Once a set of variables has been agreed on, designers and coders effectively need to work on double the amount of information. In addition, to get conclusive results, tests can take weeks or months on low-traffic sites.

https://www.experienceux.co.uk/ux-blog/the-pros-and-cons-of-ab-testing/

A/B testing can make you forget about the big picture

https://medium.com/@madsbuchstage/the-limits-of-a-b-testing-9f96691c9a0c


Summary of Statistics for Interview

Table of Contents