Data science or computer science

What is data science?

Well, if you have just woken up from a 10-year coma and have no idea what data science is, don’t worry, there’s still time. Many years ago, statisticians had some pretty good ideas for analysing data and getting insights from it, but they lacked the computational power to act on them, so their hands were tied. Then one day computers caught up with those guys and made all their dreams come true. All of a sudden, we not only had more data available than ever before, but we also had powerful machines to perform heavy calculations on this data, allowing statisticians to try out all these new algorithms. Data science is the hip daughter born from this marriage between statistics and computer science. In other words, it is the science of extracting useful patterns from data sets using computing power.

What is it used for?

One of the reasons data science is so popular nowadays is the growing number of applications that keep emerging.

Marketing and sales

A typical use case for data science in marketing is product recommendation. When you check out a product on Amazon and the site tells you there’s another product you might like, there is an algorithm behind that recommendation. It predicts that you will like those products based on what other customers who viewed the same product actually bought.
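
To make the recommendation idea concrete, here is a minimal sketch in Python. It is not how Amazon actually does it: the `sessions` log and the `recommend` function are made up for illustration, and the approach is a plain co-occurrence count (what did people who viewed this product end up buying?).

```python
from collections import Counter

# Hypothetical log, invented for this example:
# (customer, product they viewed, product they actually bought).
sessions = [
    ("ana", "tent", "sleeping bag"),
    ("bob", "tent", "sleeping bag"),
    ("cho", "tent", "camping stove"),
    ("dee", "novel", "bookmark"),
    ("eli", "tent", "tent"),
]


def recommend(viewed, sessions, top_n=2):
    """Rank what customers who viewed `viewed` bought, most frequent first."""
    bought = Counter(
        purchased
        for _, seen, purchased in sessions
        # Skip the viewed product itself: recommending it back is not useful.
        if seen == viewed and purchased != viewed
    )
    return [product for product, _ in bought.most_common(top_n)]


print(recommend("tent", sessions))
```

Real recommender systems are far more elaborate, but the core intuition of learning from other customers’ behavior is the same.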

Finance

The most common way banks use data science is credit risk analysis. Back in the day, when someone asked for a loan, the banker usually took a good look at their financial record to decide whether to grant it or not. Nowadays, there are sophisticated statistical models that are constantly updated and give a good estimate of the probability of default, making the whole process a lot faster and more reliable.
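
As a rough illustration of what “estimating the probability of default” can look like, the sketch below fits a tiny logistic regression with plain gradient descent. Everything in it is an assumption made for the example: the `history` data, the two features (debt-to-income ratio and number of late payments) and the choice of model. Real credit scoring models are much richer than this.

```python
import math

# Hypothetical toy history, invented for this example:
# (debt-to-income ratio, number of late payments) -> 1 if the loan defaulted, 0 if repaid.
history = [
    ((0.10, 0), 0), ((0.20, 0), 0), ((0.25, 1), 0), ((0.30, 0), 0),
    ((0.45, 2), 1), ((0.50, 3), 1), ((0.60, 2), 1), ((0.70, 4), 1),
]


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def score(features, weights, bias):
    """Linear score of an applicant before it is turned into a probability."""
    return bias + sum(w * x for w, x in zip(weights, features))


def fit_logistic(rows, learning_rate=0.1, epochs=5000):
    """Fit logistic regression weights with full-batch gradient descent."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in rows:
            error = sigmoid(score(features, weights, bias)) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def default_probability(features, weights, bias):
    """Estimated probability that an applicant with these features defaults."""
    return sigmoid(score(features, weights, bias))


weights, bias = fit_logistic(history)
p_safe = default_probability((0.15, 0), weights, bias)
p_risky = default_probability((0.65, 3), weights, bias)
print(f"low-debt applicant: {p_safe:.2f}, high-debt applicant: {p_risky:.2f}")
```

The point is only that the model returns a number between 0 and 1 for each applicant, which the bank can then compare against its own risk threshold.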

Healthcare

Healthcare is one of the most promising industries when it comes to data science. There is a lot of data being generated by connected wearables such as smartwatches, including calories burned, miles walked and heart rate. One possible application is tracking variables that can help explain some diseases, or even reminding you to go see a doctor if you show a behavior that might indicate a health issue.

What questions does it answer?

We can split data science tasks into two main groups: supervised and unsupervised learning.

(Image by author)

Supervised learning

Supervised learning comprises all tasks for which we have a target variable, that is, some feature in our data that we already know we want to predict. Examples include explaining house prices based on their characteristics (such as the number of rooms and floors) or predicting the likelihood that a customer will stop using our services.
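
Here is a minimal sketch of the house price example, simplified to a single characteristic (number of rooms) and an ordinary least-squares line. The `rooms` and `prices` values are invented for illustration; the price is the target variable.

```python
# Hypothetical toy listings, invented for this example:
# number of rooms and sale price (in thousands).
rooms = [2, 3, 4, 5, 6]
prices = [150, 200, 240, 310, 350]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(rooms, prices)

# Predict the target (price) for a house the model has never seen.
predicted_price = intercept + slope * 7
print(f"each extra room adds about {slope:.0f}k; a 7-room house: about {predicted_price:.0f}k")
```

Because we already know the prices of past sales, the algorithm can learn from them. That is what makes the task supervised.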

Unsupervised learning

These are the tasks for when we are not sure what question we are asking. A typical case is clustering, where we just want to find patterns in the data that are not necessarily related to one specific variable (customer segmentation, for instance).
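
Customer segmentation can be sketched with k-means, one common clustering algorithm (the article does not name a specific one, so this choice is an assumption). The `customers` data and the starting centroids are made up for the example. Note that there is no target variable anywhere in the code.

```python
import math

# Hypothetical customers, invented for this example:
# (visits per month, average basket value).
customers = [(1, 15), (2, 20), (1, 25), (9, 80), (10, 95), (8, 90)]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
        for point in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    moved = []
    for index, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == index]
        if members:
            moved.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            moved.append(old)  # keep an empty cluster's centroid where it was
    return moved


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        moved = update(points, assign(points, centroids), centroids)
        if moved == centroids:
            break
        centroids = moved
    return assign(points, centroids), centroids


# Start from two of the customers as initial guesses, to keep the run deterministic.
labels, centroids = k_means(customers, [customers[0], customers[3]])
print(labels)
```

The algorithm finds two groups on its own (occasional small spenders and frequent big spenders) without ever being told what to look for.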

Who does it?

Besides the knowledge required in statistics and computer science, data science also calls for business awareness: no matter how good your algorithms are, they will be useless if they are not applicable to the domain at hand. People who work with data usually fall into three categories, depending on which of those three areas of expertise they focus on most:

Data analyst

Sometimes also called a business analyst, this person knows how to talk to people who don’t work directly with data. They are usually in charge of translating business needs into data requirements (and data insights into business recommendations). They have an overall understanding of the main data science algorithms, and usually have really good data visualization skills.

Data engineer

This is the person who makes sure the data is collected from all its sources and integrated almost seamlessly into the company’s tech environment, and that all the algorithms developed run well and fast. They almost always come from a tech background, and sometimes have to create dedicated tools to display the data processes, especially if these are to be shared with other stakeholders in the company.

Data scientist

As you can guess from the name, this person has a deeper understanding of how most algorithms operate and which ones are best for each situation. They probably know more about statistics than the data analyst and the data engineer, but less about the ins and outs of the business or of process industrialisation. Some companies prefer to hire PhDs for this position, but that is not always the case.

Where is it going?

In the next few years, we will see much progress in many different domains. By using data, cities will be able to better manage their traffic, their energy consumption and even the allocation of their police units. With wearables, we’ll be able to exercise, eat and sleep better. And there might be many other possibilities that we haven’t even thought of.

However, we will also find out that not everything can be improved with data, and we will soon learn where this limit lies. There will always be an important random component in every human activity or natural phenomenon that no machine learning algorithm will ever capture, no matter how sophisticated it is.

This data-driven culture might also cause some important behavioural changes. People are starting to realize how much of their personal lives is being tracked by big companies and the government, and most do not seem to enjoy it. This might lead people to voluntarily downgrade their tech devices, use tools to prevent data collection, and even reduce their overall technology usage. Governments are already aware of these concerns, and regulation is getting stricter all over the world when it comes to people’s privacy. Let’s see in the years to come how this will shape society (the Black Mirror series offers interesting insights into these possibilities).

How to do it?

If you want to learn more about it, I recommend “Data Science” by John D. Kelleher and Brendan Tierney, from the MIT Press Essential Knowledge series. It is a very good introduction to the subject that doesn’t get too technical, and it will help you see if data science is really for you.

Next in line is “Data Science for Business” by Foster Provost and Tom Fawcett. This one is more focused on business applications and goes deeper into the details of the algorithms. It will give you a really good grasp of all the possibilities enabled by data-driven decision making.

Then, once you have the basics covered, it’s time to study for real: you will almost certainly need to learn to code (if you don’t know how already). The main languages you should focus on are SQL and either R or Python. The first one is used for querying databases to extract the data you need, in the right shape. The other two are used for applying the algorithms and creating plots. R was created with a focus on statistics, whereas Python is a more general programming language. To start with, just choose one of the two to concentrate your efforts on and, if needed, learn the other one later on.
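
To show how the two kinds of language divide the work, here is a small sketch that uses SQL to pull data in the right shape and Python to analyse it. The `orders` table and its contents are invented for the example, and it uses Python’s built-in `sqlite3` module with an in-memory database so it runs without any setup.

```python
import sqlite3
import statistics

# Build a throwaway in-memory database with a hypothetical `orders` table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ana", 30.0), ("bob", 10.0), ("cho", 40.0), ("cho", 60.0)],
)

# SQL does the extraction and shaping: one row per customer with their total spend.
rows = connection.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
connection.close()

# Python takes over for the analysis step.
totals = [total for _, total in rows]
typical_spend = statistics.median(totals)
print(rows)
print(f"median spend per customer: {typical_spend}")
```

In a real project the database would be a company server rather than an in-memory toy, and the Python side would typically use dedicated data analysis and plotting libraries.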

A good way to start practicing your skills is Kaggle.com, where you can play with toy datasets and take part in real competitions. It will help you put your knowledge to the test and also build a portfolio of your own. However, keep in mind that eventually you will need to work with real-life cases, and that’s a different beast.

Conclusion

Now that you know some of the data science lingo, you can go out there and do your own research. The amount of available resources is pretty much endless, and there’s new information coming out every day, so make sure you stay up to date on the new methods and possibilities.

Translated from: https://towardsdatascience.com/data-science-101-99e34bea86c
