Statistics in the Workplace: Lessons from Harvard Alumni

By: Asher Noel and Alicia Wu

On May 6, the Group for Undergraduates in Statistics at Harvard College (GUSH) spoke with six alumni about their experiences since graduating from Harvard Statistics. Below are their thoughts.

We would like to thank our amazing panelists for their participation: Kathy Evans AB ’08 PhD ’17 now at the Toronto Raptors, David Robinson AB ’10 Princeton PhD ’15 now at Heap, Jason Rosenfeld AB ’12 now at NYU, Yannik Pitcan AB ’11 Berkeley PhD ’20 now at Y9 Solutions, Raj Bhuptani AB/AM ’13 now at Two Sigma, and Michele Zemplenyi AB ’13 PhD ’20 now at Bloomberg Harvard City Leadership Initiative. And a special thanks to Joe Blitzstein, GUSH’s faculty advisor and Professor of the Practice in Statistics, for connecting GUSH with the panelists.

How does Statistics impact your work?

David: As a Principal Data Scientist, I work a lot with web analytics. I ask questions like, “How do people arrive at websites? What makes them likely to convert into users? How does usage change over time?” I then develop product features for automated insights, which means testing many hypotheses for every client. Because I am testing so many hypotheses at once, I cannot just compute a p-value for each test and treat it in isolation; I have to account for multiple comparisons. As always, I also need to look out for confounding factors.
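
David does not say which correction he uses. As one common option, here is a minimal Python sketch of the Benjamini-Hochberg procedure (which controls the false discovery rate) next to the stricter Bonferroni correction. The function names and the p-values are hypothetical, chosen only for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of hypotheses rejected by Benjamini-Hochberg at the given FDR."""
    m = len(p_values)
    # Rank hypotheses from smallest to largest p-value, keeping original indices.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])


def bonferroni(p_values, alpha=0.05):
    """Indices rejected by Bonferroni: compare each p-value to alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Hypothetical p-values from ten tests run for a single client.
p = [0.041, 0.001, 0.216, 0.039, 0.008, 0.060, 0.205, 0.042, 0.074, 0.212]
naive = [i for i, x in enumerate(p) if x <= 0.05]
print(naive)                  # [0, 1, 3, 4, 7]
print(benjamini_hochberg(p))  # [1, 4]
print(bonferroni(p))          # [1]
```

Testing each hypothesis on its own at 0.05 flags five results; both corrections keep only the strongest. Benjamini-Hochberg tolerates a controlled share of false discoveries in exchange for more power, while Bonferroni guards against any false positive and is correspondingly more conservative.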

Kathy: When I worked in public health, I found that people often misuse data. I asked myself, “How do I communicate ideas to people who don’t understand statistics?” Talking about multiple hypothesis testing with physicians is one thing. Talking to basketball coaches is far different. I focused on how to visualize data and how to simplify ideas so they center on the core problems. A lot of audiences are not going to care about a p-value. They want a range of interesting values. Tools like ggplot did not exist when I was an undergrad, but libraries like it are great for expressing data. Now I do a lot of exploratory data analysis (EDA), where I look at distributions, skew, and other summary metrics.
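
Kathy mentions ggplot (an R library). As a language-neutral sketch, here is the kind of numeric summary that often accompanies EDA plots, written in Python with only the standard library. The `summarize` helper and the sample data are hypothetical.

```python
import statistics


def summarize(values):
    """A quick numeric EDA summary: center, spread, and skewness."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Moment-based (population) skewness: the average cubed z-score.
    # Positive means a long right tail; it is 0 for perfectly symmetric data.
    skew = 0.0 if sd == 0 else sum(((x - mean) / sd) ** 3 for x in values) / n
    return {"n": n, "mean": mean, "median": statistics.median(values),
            "sd": sd, "skewness": skew}


# Hypothetical right-skewed sample: one large value drags the mean above the median.
print(summarize([1, 2, 2, 3, 3, 3, 4, 20]))
```

A mean well above the median, together with positive skewness, is a cue to plot the distribution before reporting a single average.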

Jason: All the problems I work on fall into one of two buckets. 1) “Moneyball-type analytics”: How do we use statistics to evaluate players? How much should we pay them? Every year the NBA draft is held in June, and all 30 NBA teams have analytics groups that build models to forecast the future careers of young 18-year-old basketball players. Most teams still rely on scouts to go watch and see who “looks” like a good player, but teams are increasingly placing more weight on different types of models when evaluating draft prospects. Instead of talking in terms of regression, I have to put ideas in ways other people can understand. For example, “high floor” means that someone has a decent chance of spending time in the NBA but will likely not be a star. 2) “Statistics for the betterment of the NBA.” These problems are harder, and we need to be creative. It’s difficult to run experiments, since we can’t just create an experimental version of NBA games for a month. Instead, we use focus groups, which eventually help us tune questions like, “How long is the game? What are the rules? Where are the lines?”
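
One hypothetical way to turn a model's output into the “high floor” language Jason describes: simulate many possible career outcomes for a prospect, then report the chance of clearing an NBA-caliber bar and a star bar. The outcome metric, both thresholds, and the label cutoffs below are illustrative assumptions, not anything a team has said it uses.

```python
def describe_prospect(outcomes, nba_level=1.0, star_level=5.0):
    """Summarize simulated career outcomes in scout-friendly terms.

    `outcomes` are draws from some predictive model of a hypothetical
    career-value metric; both thresholds are illustrative assumptions.
    """
    n = len(outcomes)
    p_nba = sum(o >= nba_level for o in outcomes) / n    # chance of sticking in the league
    p_star = sum(o >= star_level for o in outcomes) / n  # chance of becoming a star
    # Hypothetical cutoffs for translating probabilities into labels.
    if p_nba >= 0.7 and p_star < 0.1:
        label = "high floor"
    elif p_star >= 0.2:
        label = "high ceiling"
    else:
        label = "uncertain"
    return label, p_nba, p_star


steady = [1.5] * 8 + [0.5] * 2        # usually an NBA player, never a star
boom_or_bust = [6.0] * 3 + [0.2] * 7  # rarely sticks, but sometimes a star
print(describe_prospect(steady))        # ('high floor', 0.8, 0.0)
print(describe_prospect(boom_or_bust))  # ('high ceiling', 0.3, 0.3)
```

The point of the sketch is the translation step: the model produces a distribution, and the communication layer reduces it to a couple of probabilities and a word a scout or executive already uses.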

Yannik: Statistics helps tell stories. With a statistics background, it is easier to interpret data and then communicate it to stakeholders and clients, translating rigor into something others can understand. Explaining concepts like time series or concept drift detection requires more statistical sophistication than you might expect. To explain concepts intuitively, you need to understand them intuitively yourself.
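
Concept drift detection covers many methods. As a deliberately simple sketch, this compares a recent window of a metric against a reference window using a Welch-style z statistic; dedicated streaming detectors such as Page-Hinkley or ADWIN are more careful. The function name, the data, and the threshold of 3 are hypothetical.

```python
import statistics


def mean_shift_detected(reference, recent, z_threshold=3.0):
    """Flag drift when the recent window's mean moves far from the reference mean.

    Assumes each window has at least two points and some variation.
    """
    m_ref, m_new = statistics.fmean(reference), statistics.fmean(recent)
    # Standard error of the difference in means (unequal variances allowed).
    se = (statistics.variance(reference) / len(reference)
          + statistics.variance(recent) / len(recent)) ** 0.5
    z = (m_new - m_ref) / se
    return abs(z) > z_threshold, z


reference = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10]
stable = [10, 9, 11, 10, 11, 10, 9, 10, 11, 10]
shifted = [14, 15, 13, 14, 16, 14, 15, 13, 14, 15]
print(mean_shift_detected(reference, stable)[0])   # False
print(mean_shift_detected(reference, shifted)[0])  # True
```

Explaining the result to a stakeholder then comes down to one sentence: the recent average sits many standard errors away from where it used to be.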

Raj: At Two Sigma, pretty much everyone is a statistician in some way. I have not directly used anything I learned in STAT 210 and above, but I would not be able to do anything I currently do without having taken those classes. Learning statistics at a very deep level, including learning to visualize randomness, helps you answer questions in a productive way. In industry, what matters is answering the question, not using the coolest model. Statistics can make methods complicated, but it is all about thinking: How do I capture what I want to capture and measure what I want to measure? What expresses what I need to express? It’s important to develop an ‘orthogonal practitioner’s knowledge’ to answer questions in practical, industrial settings. Oftentimes, by the time you get to part q of a problem set, it’s 3 a.m. and you hope your teaching fellows won’t read your response closely. But part q is the question that matters; everything else is just implementation: “When deploying this in real life, how will this thing behave?” These are the most important questions. Don’t ignore part q of your problem sets.
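
One way to practice asking “how will this thing behave?” is to simulate it. This hypothetical sketch (not a description of any Two Sigma practice) runs many A/B comparisons in which there is no true difference, and counts how often a two-sided z-test at the 5% level still declares a winner. The rate should land near 5%.

```python
import random
import statistics


def false_positive_rate(n_sims=2000, n=50, alpha=0.05, seed=0):
    """Simulate A/B tests with NO true effect and count how often they look 'significant'."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # Unit variance is known here, so the difference in means has sd sqrt(2 / n).
        z = (statistics.fmean(a) - statistics.fmean(b)) / (2 / n) ** 0.5
        if abs(z) > z_crit:
            hits += 1
    return hits / n_sims


print(false_positive_rate())
```

Changing the simulation (for example, peeking at results and stopping early, or testing twenty metrics at once) and re-measuring the rate is a cheap way to see how a procedure misbehaves before it is deployed.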

Michele: When working with people who do not have strong stats backgrounds, a statistician’s role is to be there for common sense. People without statistics backgrounds take things like a p-value threshold very seriously, but we know that results can be sensitive to outliers and assumptions. You should be a sounding board that helps guide their decisions. Experience analyzing data as an undergrad will help build up that sense and intuition.
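
A small illustration of how a result can hinge on a single outlier, using made-up data: one extreme value doubles the mean while the median does not move, and a leave-one-out check points straight at the influential observation. The helper name is hypothetical.

```python
import statistics


def leave_one_out_means(values):
    """Recompute the mean with each observation dropped in turn."""
    return [statistics.fmean(values[:i] + values[i + 1:]) for i in range(len(values))]


clean = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
with_outlier = clean + [50.0]  # hypothetical data-entry error

print(statistics.fmean(clean), statistics.median(clean))                # about 5.0, 5.0
print(statistics.fmean(with_outlier), statistics.median(with_outlier))  # about 10.0, 5.0

full = statistics.fmean(with_outlier)
loo = leave_one_out_means(with_outlier)
influence = [abs(m - full) for m in loo]  # how far the mean moves without each point
print(influence.index(max(influence)))    # 8, i.e. the 50.0
```

Checks like this are cheap to run and easy to explain to a non-statistician before anyone acts on a single summary number or p-value.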

How do you distinguish Statistics from Applied Math?

David: The most important skill that separates people with statistics backgrounds from people with applied math, physics, or economics backgrounds is programming with data. I have met many brilliant physicists, but they often still use Excel when they want to graph something. Stat undergrad programs have built more R and Python into their curricula. I learned R in STAT 111 and STAT 135.

Kathy: In statistics, people think of things in terms of dataframes and flat data; at Google, software engineers building statistics into production frameworks thought about data very differently. The way you picture data in your head is very different.
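
To make the contrast concrete, here is a hypothetical example: a production system might store event data as nested records per session, while a dataframe-minded statistician wants one flat row per observation. The field names and the `flatten` helper are invented for illustration.

```python
# Hypothetical nested records, one per user session, as a production system might store them.
sessions = [
    {"user": "a", "events": [{"page": "home", "secs": 12}, {"page": "pricing", "secs": 40}]},
    {"user": "b", "events": [{"page": "home", "secs": 5}]},
]


def flatten(sessions):
    """Return one flat row per (user, event): the tidy shape a dataframe expects."""
    return [
        {"user": s["user"], "page": e["page"], "secs": e["secs"]}
        for s in sessions
        for e in s["events"]
    ]


for row in flatten(sessions):
    print(row)
```

A list of flat dicts like this can be passed directly to a pandas `DataFrame`; the nested form is usually more convenient for the system that writes the data, and the flat form for the person who analyzes it.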

Yannik: A statistician’s “edge” over people from applied math or computer science is stark: it’s how we understand uncertainty. With statistics, you will stand out among data scientists, most of whom come straight from a computer science background. Beyond p-values, can you intuitively understand ordinary least squares (OLS) regression? Leverage? Can you derive the OLS normal equations? That said, an understanding of algorithms and data structures is also invaluable in industry.
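
For reference, the derivation Yannik mentions, in standard notation (an $n \times p$ design matrix $X$ with full column rank and a response vector $y$): setting the gradient of the squared error to zero gives the normal equations, and the diagonal of the resulting projection (“hat”) matrix $H$ gives each observation's leverage.

```latex
\begin{aligned}
\hat\beta &= \arg\min_{\beta} \lVert y - X\beta \rVert^2,
\qquad \nabla_{\beta} \lVert y - X\beta \rVert^2 \Big|_{\beta = \hat\beta} = -2X^\top (y - X\hat\beta) = 0 \\
&\Longrightarrow\; X^\top X \hat\beta = X^\top y
\;\Longrightarrow\; \hat\beta = (X^\top X)^{-1} X^\top y, \\
H &= X (X^\top X)^{-1} X^\top, \qquad \hat y = H y, \qquad
h_{ii} = H_{ii} \in [0, 1], \qquad \sum_{i=1}^{n} h_{ii} = \operatorname{tr}(H) = p.
\end{aligned}
```

An observation with $h_{ii}$ close to 1 pulls the fitted values toward itself; a common rule of thumb flags points with $h_{ii} > 2p/n$ for a closer look.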

Raj: I am generalizing a bit, but there are far more computer science people who think they can do some stats than there are stats people who know how to code. You should be a stats person who knows how to code: that’s much more powerful. When I interview people whose computer science background is much stronger than their statistics background, they tend not to do as well.

Michele: If you want to pursue graduate school, it’s still good to have a strong theoretical background by taking proof-based math courses.

How do you see and deal with Gender Bias?

Kathy: In my experience in academia, biostatistics has a lot more women than statistics, so there are women in strong leadership positions, including as teaching assistants. But there are still small microaggressions. One time, Natalie Dean, my former teaching assistant who is now a professor at the University of Florida, was not referred to as “doctor” or “professor” on CNN, while her male counterpart was. Similar things have happened to me a couple of times at conferences.

Michele: The places where I have worked have been pretty equal and fair, but people tend to look to women to “organize dinners and organize department picnics.” Be aware of this, and don’t fall into those roles just because people assume you will.

What classes do you recommend?

Jason: STAT 110, which “forced me to think in a way other classes did not,” and STAT 149.

Raj: STAT 139 was the “most applicable class.” People need to understand material at the STAT 110 level, not the STAT 104 level. Understanding linear modeling from a geometric perspective, by taking classes such as STAT 139, 149, and 244, is important.
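
A small numerical check of the geometric view of linear modeling, using made-up data: in a simple regression the fitted values are the orthogonal projection of y onto the span of the intercept column and x, so the residuals are orthogonal to both columns, and the leverages (the diagonal of that projection matrix) sum to the number of columns, here 2. The helper names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Solve the normal equations for y ~ b0 + b1 * x; return (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def leverages(x):
    """Diagonal of the hat matrix for simple regression: 1/n + (x_i - x_bar)^2 / Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


x = [1, 2, 3, 4, 10]
y = [2.1, 3.9, 6.2, 7.8, 21.0]
b0, b1 = fit_simple_ols(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Residuals are orthogonal to the intercept column (all ones) and to x (both sums ~0).
print(sum(resid), sum(r * xi for r, xi in zip(resid, x)))
h = leverages(x)
print([round(v, 2) for v in h], round(sum(h), 9))  # [0.38, 0.28, 0.22, 0.2, 0.92] 2.0
```

The point at x = 10 has leverage 0.92, so it largely determines the slope on its own; the projection picture makes that easy to see without inverting any matrices.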

Kathy: STAT 244 is the “best class I’ve taken in anything ever.”

Yannik: STAT 110, STAT 210, and stochastic processes if you are interested in quantitative finance.

David: STAT 135 (“Statistical Computing Processes”), CS 50, and CS 61.

Michele: STAT 139 and 149.

The Group for Undergraduates in Statistics at Harvard College (GUSH) is committed to creating a unique and open space on campus for students interested in statistics. Join our mailing list! Thanks to Rachel Li and Ginnie Ma for sourcing our panelists, and to Ben Chiu for his questions.

Source: https://medium.com/@harvardgush/statistics-in-the-workplace-lessons-from-harvard-alumni-c2824d52e1f2
