How to Become a Data Scientist

Introduction

If you have been browsing job ads lately, you will have noticed a large number of Data Scientist positions. Demand seems to be much larger than supply, which means that there is a huge opportunity here. However, there appears to be a catch: most of these positions require some experience or knowledge in the field of Data Science. So if you are midway through your career, how can you skill up to become a Data Scientist?

Well, today I will attempt to answer this question.

What is Data Science

Before we jump into how one can become a Data Scientist, let’s first have a quick look at what exactly Data Science is.

We are all aware of the so-called “explosion of data”. More and more data is gathered through the web, mobile apps, fitness devices and the like. This is collectively known as Big Data. But big data refers not only to the volume of data, but also to its high velocity and high variety.

Data Science is the set of skills and techniques required to make sense of all this data, including advanced analytics, data mining, machine learning, data visualization and statistics. It is the ability to draw insights from large amounts of raw data to solve real-world problems.

According to the 2015 Gartner report “Critical Capabilities for Operational Database Management Systems”:

“By 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single DBMS platform.”

We can already see this in SQL Server 2016, which now includes:

  • R Services

    R Services allows data scientists and analysts to run statistical programming queries directly on their database. It supports extremely fast computations using multiple cores, processors and threads.

  • PolyBase

    PolyBase acts as a gateway between SQL Server and Hadoop or Azure Blob storage, so you can use Transact-SQL to query non-relational data in the same way you would query relational data in your database.

  • Power BI

    Power BI is tightly integrated with SQL Server, allowing for easy analysis and sharing of data insights and the creation of rich visualizations.

  • Cortana Intelligence Suite on Azure

    The Cortana Intelligence Suite combines big data and advanced analytics, allowing you to get actionable intelligence from your data. You can create models with Azure Machine Learning, and analyze data in Azure Data Lake or SQL Data Warehouse using Azure Data Lake Analytics or Azure Stream Analytics, to mention but a few of the powerful tools that can be used with Cortana.

Keeping this in mind, a SQL Server professional already has access to the tools required to become a Data Scientist.

Here is a look at what Azure Machine Learning Studio looks like. You can try it out for free by going to this link and clicking on the Start Studio button.

A myriad of helpful resources is available here to help you get started, including an interactive tutorial.

Figure 1: Microsoft Azure Machine Learning in Action

What do I need to know to be a Data Scientist

  1. You need to understand data: know how to explore it and how to use statistical and analytical techniques.

  2. You need to be able to query and manipulate data sets into the required formats using Transact-SQL.

  3. You need to be able to present data in a meaningful way by using tools such as Excel or Power BI.

  4. You need to understand statistics and its role in gaining insights from data.

  5. You need to know how to use a statistical programming language such as R or Python.

  6. You need to be able to perform data transformation, cleansing and some statistical analysis.

  7. You must understand data science concepts such as machine learning, algorithms, conditional probability, etc.

  8. You must be able to create machine learning models and know how to evaluate them.

  9. You must be able to use machine learning to generate predictions and solve problems.

  10. You must learn how to use tools such as Microsoft Azure HDInsight, Scala, Spark, etc.
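To make a few of these items a little more concrete, here is a minimal Python sketch touching on cleansing (item 6), conditional probability (item 7), and fitting and evaluating a simple model before predicting with it (items 8 and 9). Everything in it is an assumption made for illustration: the study-hours data set is invented, the function names are hypothetical, and a least-squares line stands in for a real machine learning model. It uses only the Python standard library.

```python
from statistics import mean

# Hypothetical raw records: (hours_studied, exam_score). None marks a missing value.
raw = [
    (1.0, 20.0), (2.0, 30.0), (None, 40.0), (3.0, 35.0), (4.0, 50.0),
    (2.5, None), (5.0, 55.0), (6.0, 70.0), (7.0, 75.0), (8.0, 90.0),
]
PASS_MARK = 50.0  # assumed pass mark, for illustration only


def cleanse(records):
    """Data cleansing: drop records that have a missing field."""
    return [(h, s) for h, s in records if h is not None and s is not None]


def conditional_probability(records, min_hours):
    """Estimate P(pass | hours >= min_hours) as a relative frequency."""
    scores = [s for h, s in records if h >= min_hours]
    return sum(s >= PASS_MARK for s in scores) / len(scores)


def fit_line(records):
    """Fit score = slope * hours + intercept by ordinary least squares."""
    xs = [h for h, _ in records]
    ys = [s for _, s in records]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def mean_absolute_error(model, records):
    """Evaluate the model on records it was not fitted on."""
    slope, intercept = model
    return mean(abs(s - (slope * h + intercept)) for h, s in records)


rows = cleanse(raw)
p_pass = conditional_probability(rows, min_hours=3.0)

# Hold out every fourth record for evaluation; fit on the rest.
test_rows = rows[3::4]
train_rows = [r for i, r in enumerate(rows) if i % 4 != 3]
model = fit_line(train_rows)
mae = mean_absolute_error(model, test_rows)

slope, intercept = model
print(f"clean records: {len(rows)}")
print(f"P(pass | hours >= 3): {p_pass:.2f}")
print(f"held-out mean absolute error: {mae:.2f}")
print(f"predicted score after 9 hours: {slope * 9 + intercept:.1f}")
```

Holding some records back for evaluation is the important habit here: a model should be judged on data it has not seen, whichever tool you end up using.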

I know this is quite daunting, but it is achievable with some hard work and dedication. Luckily, there are now multiple resources available to help you on your quest to become a Data Scientist.

So how do I prove to a prospective employer that I am now a Data Scientist?

Microsoft recognizes that there is an extreme shortage of data scientists and has therefore embarked on a mission to facilitate the study of Data Science for those who want to embrace this exciting new career opportunity.

To that end, they have launched the Microsoft Professional Degree in Data Science, which will run for the first time on the 22nd of August 2016.

These courses have been designed by employers in collaboration with top universities such as Columbia and Harvard, and will be available on edX.com.

The degree program consists of 4 units:

  • The Fundamentals

    This is where you will learn the basics, such as querying data and visualizing it. There are 3 compulsory courses in this unit and 1 elective, where you can choose between using Excel or Power BI.

  • Core Data Science

    In this unit you will learn how to use a statistical programming language. You can choose between Python and R.

  • Applied Data Science

    In this unit you will learn more advanced techniques using Python or R to extract meaningful insights from your data.

  • A Cortana Intelligence Competition

    Finally, you get to prove your newly acquired skills by completing a real-world project, which will be scored and graded and will ultimately earn you your degree in Data Science.

Conclusion

Microsoft estimates that there are in the region of 1.5 million jobs available for Data Scientists. Looking at the skills required to become a Data Scientist can take the wind out of your sails. But luckily, various universities and companies have recognized the shortage of skills and have started programs to bridge this gap.

Microsoft itself is offering a degree program developed by experts and academics in the industry, which will open the doors for many who aspire to become data scientists.

References:

  • Microsoft Data Science User Group Community Newsletter
  • Data Science
  • Microsoft Professional Degree in Data Science
  • Data Science Curriculum from Microsoft

Translated from: https://www.sqlshack.com/10-things-need-know-become-data-scientist/
