by David Venturi

Developing Data Scientists and Engineers

Free Code Camp asked 15,000 people who they are, and how they’re learning to code. I isolated those focused on data science and data engineering.

More than 15,000 people responded to Free Code Camp’s 2016 New Coder Survey, granting researchers (like me!) an unprecedented glimpse into how people are learning to code. They released the entire dataset on Kaggle.

646 respondents answered “Data Scientist/Data Engineer” to the question: “Which one of these roles are you most interested in?”
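A minimal sketch of how such a subset could be isolated from the survey CSV, using only the Python standard library. The column name `JobRoleInterest`, the exact role label, and the sample rows are assumptions made for illustration; they are not taken from the article, and the original analysis was done in R.

```python
import csv
import io

# Made-up rows for illustration only; the real input would be the survey CSV from Kaggle.
# The column name "JobRoleInterest" and the role label below are assumptions.
SAMPLE_CSV = """Age,JobRoleInterest
26,Data Scientist / Data Engineer
31,Full-Stack Web Developer
24,  Data Scientist / Data Engineer
29,
"""

DATA_ROLE = "Data Scientist / Data Engineer"


def data_focused(rows, role=DATA_ROLE):
    """Keep respondents whose role of interest equals `role`, ignoring stray whitespace."""
    return [row for row in rows if (row.get("JobRoleInterest") or "").strip() == role]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
subset = data_focused(rows)
print(len(subset), "of", len(rows), "sample rows are data-focused")
```

Stripping whitespace before comparing guards against labels that carry leading or trailing spaces; respondents who left the question blank are simply excluded.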

Here are a few high-level statistics from this data-focused subset, which complements Free Code Camp’s exploration of new coders in general.

I’ve borrowed the structure of Free Code Camp’s announcement article for ease of comparison. I’ve also included my comments where findings differ notably. And a few bonus plots, too!

We asked 15,000 people who they are, and how they’re learning to code (medium.freecodecamp.com)

Who participated?

Of the 646 developing data scientists and data engineers who responded to the survey:


  • 25% are women (4 percentage points more than new coders in general)
  • their median age is 26 years old (one year younger)
  • they started programming an average of 16 months ago (5 months earlier)

Learner goals and approaches

14 hours each week, on average, are spent learning.

This is one hour less than new coders in general.


0% want to freelance or start their own business.*

Compared to 40% for the full new coder survey, this is a bit shocking. I have a hunch these zero counts are caused by the survey’s design: every respondent who answered the job role of interest question has zero counts for “start your own business” and “freelance.”
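One way to check a hunch like this is to tally job preferences separately for respondents who did and did not answer the role question. The sketch below assumes fields named `JobPref` and `JobRoleInterest` and uses made-up records; both are illustrative assumptions, not the author's code or the real data.

```python
from collections import Counter

# Made-up records for illustration; the field names are assumptions about the dataset.
records = [
    {"JobPref": "freelance", "JobRoleInterest": ""},
    {"JobPref": "start your own business", "JobRoleInterest": ""},
    {"JobPref": "work for a startup", "JobRoleInterest": "Data Scientist / Data Engineer"},
    {"JobPref": "work for a multinational corporation", "JobRoleInterest": "Back-End Web Developer"},
]


def job_pref_by_role_answered(records):
    """Count job preferences, split by whether the role question was answered."""
    counts = {True: Counter(), False: Counter()}
    for rec in records:
        answered = bool((rec.get("JobRoleInterest") or "").strip())
        counts[answered][rec.get("JobPref") or "(no answer)"] += 1
    return counts


counts = job_pref_by_role_answered(records)
for answered, tally in counts.items():
    print("answered role question:", answered, dict(tally))
```

If "freelance" and "start your own business" appear only in the group that did not answer the role question, the zeros are more likely a structural artifact of the survey flow than a real preference of the data-focused subset.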

52% are already applying for jobs, or will start applying within the next year.

This is a longer time horizon than new coders in general, where 65% are applying within the next year.


Most of them want to work in an office, as opposed to remotely.

And a majority are willing to relocate.

Most of them have not yet attended any in-person coding events.

64% have used at least one of Coursera, edX, or Udacity.

Only 46% of new coders in general have used at least one of these resources. These companies cover a wider range of subject areas than some of the coding-specific resources listed.
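An "at least one of" share is a union across respondents, not a sum of the three individual usage rates, so someone who used two platforms is counted once. A minimal sketch, assuming one flag column per resource named `ResourceCoursera`, `ResourceEdX`, and `ResourceUdacity`, with "1" meaning used; the column names, the encoding, and the sample rows are all assumptions for illustration.

```python
# Assumed column names and encoding ("1" = used, blank = not used).
RESOURCE_COLUMNS = ("ResourceCoursera", "ResourceEdX", "ResourceUdacity")


def used_any(row, columns=RESOURCE_COLUMNS):
    """True if the respondent flagged at least one of the given resources."""
    return any(str(row.get(col) or "").strip() == "1" for col in columns)


def share_using_any(rows):
    """Fraction of respondents who used at least one resource (0.0 for no rows)."""
    if not rows:
        return 0.0
    return sum(used_any(row) for row in rows) / len(rows)


# Made-up rows for illustration only.
sample = [
    {"ResourceCoursera": "1", "ResourceEdX": "", "ResourceUdacity": ""},
    {"ResourceCoursera": "1", "ResourceEdX": "1", "ResourceUdacity": ""},
    {"ResourceCoursera": "", "ResourceEdX": "", "ResourceUdacity": ""},
    {"ResourceCoursera": "", "ResourceEdX": "", "ResourceUdacity": "1"},
]

share = share_using_any(sample)
print(f"{share:.0%} of sample rows used at least one of the three")
```

In the sample, the second respondent used two platforms but still contributes a single count to the numerator.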

Less than 20% listen to coding-related podcasts.

Of them, Partially Derivative, Becoming A Data Scientist, and Talking Machines are the only data-specific podcasts noted.

Only 1% have attended a bootcamp.

6% of new coders have attended a bootcamp.


Demographics and Socioeconomics

Data-focused respondents represent 166 countries.

More than 90% are from North America, Europe, and Asia.

The dominating percentage of North Americans should be expected because Free Code Camp is based in the United States.

Their cities span a wide range of urbanization levels.

Just under a quarter of respondents are ethnic minorities in their country.

And nearly half are non-native English speakers. They grew up speaking one of 148 languages.

67% have earned at least a bachelor’s degree.

Compared to 58% for new coders in general, the data-focused subset is more skewed towards post-secondary studies.


They studied 425 different majors. Computer Science and Mathematics were the two most popular majors, and an additional 16% studied some form of engineering.

Diversity amongst majors is greater compared to the full survey, where Computer Science and Information Technology checked in at #1 and #2 with 17% and 5%, respectively.


Just over one-half are currently working.

Two-thirds of the new coder population are currently working.


A quarter work in the tech industry.

There is a higher variety of employment fields compared to the full dataset, where 50% of respondents work in software development and IT.


Median current salary is $44k.

The median current salary for the full dataset is $37k.


And they expect to earn a median of $60k with their new data science/engineering skills.

The median for the full survey dataset is $50k. With data science/engineering being notoriously lucrative in 2016, some respondents might be seeking higher wages.

7% have served in their country’s military.

13% have children, and another 3% financially support an elderly or disabled relative. And one-fifth are doing this without the help of a spouse.

47% consider themselves underemployed (working a job that is below their education level).

This is 5% higher than new coders in general.


If they have a home mortgage, they owe an average of $194k.

If they have student loans, they owe an average of $37k.

This average is $3k more than the full survey dataset.


14% don’t yet have high-speed internet at home.

And 3% are currently receiving disability benefits from their government.

These are the people who are learning data science and engineering. Free, self-paced learning resources are definitely important.

What’s next?

You can find a more detailed version of this analysis on Kaggle, where I outline my exploratory data analysis (EDA) process.

Be sure to check out my initial exploration of Free Code Camp’s dataset, where I dive deeper into the characteristics of new coders:

New Coders: How Salary and Time Spent Learning Vary by Demographic (medium.freecodecamp.com)

The 6 most desirable coding jobs (and the types of people drawn to each) (medium.freecodecamp.com)

If you have questions or concerns about this series or the R code that generated it, don’t hesitate to let me know.

David Venturi (@venturidb) on Twitter (twitter.com)

Translated from: https://www.freecodecamp.org/news/developing-data-scientists-engineers-710f4ef5a773/
