AI and Data Science for Dummies: Chat with Classmates

These are my answers to questions about AI and its business practice, discussed among ~200 of my fellow classmates from IIT Bombay. They are modified slightly to protect privacy, to remove specific references and for better narration. This is the first part of a series of these posts. The second part discusses insights about ‘Why Doesn’t AI Work?’ and the third about ‘AI Hacks That Do Work.’ I will keep editing this header to include links to other parts.

Data Science

At the simplest level data science is just that — a scientific analysis of data. In the fourth grade, when we all learned how to make simple graphs, we had become data scientists already.

You would think that I am exaggerating to make a point. Well, look up Microsoft’s corporate strategy and its focus on a new product called Power BI; they are making a massive push on it as a way to cement Windows-based systems in enterprises. Then look up the demos they have for Power BI. There are plenty available on YouTube. These demos talk about dashboarding and how the extremely powerful software can visualize your darkest, deepest data to make excellent plots. And then tell me if a fourth grader can’t make those dashboards.

Of course, there is a lot more to Power BI than making bar graphs, but the point is that even at that simplest level data science can be very powerful. Add mean and standard deviation to it, and you have covered almost everything in the world of business analytics. Sure, the size of data has bloated recently, particularly because of the takeoff in the deployment of sensors and embedded devices (IoT). Still, your biggest intellectual problem as a data analyst is how to clean the various formats of data, rather than how to process it.
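
To make that last point concrete, here is a minimal sketch in Python. The readings are made up for illustration, and the `clean` helper is my own hypothetical name, not part of any tool mentioned above. Notice that almost all of the code goes into cleaning; the "analytics" is two library calls.

```python
import statistics

# Hypothetical raw readings, made up for illustration: the same quantity
# arrives as numbers, padded strings, strings with thousands separators,
# and missing-value markers.
raw_readings = [120, " 135", "1,280", "n/a", 150.0, "", "142"]

def clean(value):
    """Return the reading as a float, or None if it cannot be parsed."""
    if isinstance(value, (int, float)):
        return float(value)
    text = value.strip().replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None

# Keep only the readings that could be parsed.
readings = [x for x in map(clean, raw_readings) if x is not None]

print("clean readings:", readings)
print("mean:", statistics.mean(readings))
print("standard deviation:", statistics.stdev(readings))  # sample std dev
```

Real data is messier than this (units, dates, duplicate rows), so treat the helper as a starting point rather than a recipe.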

Artificial Intelligence

There is a small portion of the data science world that focuses on using data to write better programs. Here is the intuition behind it. The simplest programs are ‘Do X’. They are very powerful and make up the foundation of the programming world.

Smarter programs say ‘If A do X else do Y.’ I don’t have to explain this, except to say that almost all programming in the last century, and most programming in this one, is as simple as that. Rules engines and the so-called expert systems are but a set of chained, nested and looped if-else statements.

The breakthrough behind the field of artificial intelligence started with a simple question: can a machine automatically figure out the condition A in that statement and write these rules itself? We can convert ‘If A do X else do Y’ to ‘c = Cx if A else Cy’ and then, depending on the value of c, perform X or Y. Suddenly this is as simple as a classification problem. If we are given a set of pre-labelled data points, can we find a model, A, which can classify a new data point as Cx or Cy (or one of a number of classes in the generalized case)?
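
The rewrite from an if-else into a label lookup can be sketched in a few lines of Python. The function names and the `ACTIONS` table are hypothetical, chosen only for this illustration.

```python
def rule_based(a: bool) -> str:
    """Classic program: the condition A is written by hand."""
    if a:
        return "X"
    return "Y"

def classify(a: bool) -> str:
    """The same decision expressed as a class label: c = Cx if A else Cy."""
    return "Cx" if a else "Cy"

# The action is looked up from the label, so the only thing left to
# figure out is how to produce the label.
ACTIONS = {"Cx": "X", "Cy": "Y"}

def label_based(a: bool) -> str:
    return ACTIONS[classify(a)]

# Both versions make the same decision for every input.
for a in (True, False):
    assert rule_based(a) == label_based(a)
```

Once the program has this shape, `classify` is the only part that needs intelligence, and that is the part we can try to learn from data.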

If we can do that, then we don’t have to worry about the if-else statements. All we need to do is get that set of pre-labelled data points, also called training data, run the machine, and go home. We have learnt so many techniques for classification from the fields of algebra and statistics: Naïve Bayes, logistic regression, decision trees, and what not.
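
As a rough sketch of "run the machine", here is about the simplest learner I can think of: a decision tree with a single split. The training data is invented for illustration, and `learn_threshold` is a hypothetical helper, not a library function.

```python
# Hypothetical training data, made up for illustration: (value, label).
# The "rule" we hope the machine recovers is a simple threshold.
training_data = [(2.0, "Cy"), (3.5, "Cy"), (4.0, "Cy"),
                 (6.0, "Cx"), (7.5, "Cx"), (9.0, "Cx")]

def learn_threshold(data):
    """Pick the cut point that misclassifies the fewest training examples.

    The learned model is: label = "Cx" if value > threshold else "Cy".
    """
    values = sorted(v for v, _ in data)
    # Candidate cut points: midpoints between neighbouring values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]

    def errors(threshold):
        return sum(("Cx" if v > threshold else "Cy") != label
                   for v, label in data)

    return min(candidates, key=errors)

threshold = learn_threshold(training_data)

def model(value):
    """The condition A, found by the machine instead of written by hand."""
    return "Cx" if value > threshold else "Cy"

print("learned threshold:", threshold)
print("new point 8.0 ->", model(8.0))
```

Nobody typed the condition into the program; it came out of the labelled examples. Real decision-tree learners do the same thing over many features and many splits.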

Congratulations! If you have ever fit a line to some data, you have programmed an artificially intelligent system.

Why is this important?

So, what’s the big deal? Three things. One, this is a big deal by itself. You have no idea how many artificially intelligent systems use little more than probabilities. If you want more complexity, a popular machine learning algorithm is called Random Forest. It involves making decision trees based on multiple samples of the data, hence the forest, and then taking the mode (for classification) or the mean (for regression) of the decisions made by the trees. It’s pure statistics, nothing fancy. However, this is now empowering almost every aspect of human life. Turn anywhere, and it is likely that an intelligent machine like this is helping you along.
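
Here is a deliberately stripped-down sketch of that idea, assuming one feature and one-split trees. A real Random Forest also grows deeper trees and samples the features, which I leave out. The data and the function names are made up for illustration.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-split tree: predict `above` if x > t, else `below`."""
    best = None
    xs = sorted({x for x, _ in sample})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        below = Counter(y for x, y in sample if x <= t).most_common(1)[0][0]
        above = Counter(y for x, y in sample if x > t).most_common(1)[0][0]
        errs = sum((above if x > t else below) != y for x, y in sample)
        if best is None or errs < best[0]:
            best = (errs, t, below, above)
    if best is None:
        # Only one distinct x in the sample: always predict the majority label.
        majority = Counter(y for _, y in sample).most_common(1)[0][0]
        return (float("inf"), majority, majority)
    return best[1:]

def fit_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw with replacement, same size as the data.
        sample = [rng.choice(data) for _ in data]
        forest.append(fit_stump(sample))
    return forest

def predict(forest, x):
    votes = [above if x > t else below for t, below, above in forest]
    return Counter(votes).most_common(1)[0][0]  # mode of the trees' votes

# Hypothetical labelled data: small values are "no", large values are "yes".
data = [(1, "no"), (2, "no"), (3, "no"), (4, "no"),
        (7, "yes"), (8, "yes"), (9, "yes"), (10, "yes")]

forest = fit_forest(data)
print(predict(forest, 2.5), predict(forest, 8.5))
```

Each tree sees a slightly different sample and so learns a slightly different cut point; the vote smooths out their individual quirks.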

Neural Networks

Second, they figured out something called a neural network. Each node in this network is essentially a weighted sum. You take a set of inputs, you weigh each of them and you sum them up. Simple.

Let’s make it real. In the fourth year at college one of my friends John (name changed) was really trying to impress this girl, Jane (name changed), who was a co-volunteer at a non-profit called Magic Bus. Magic Bus works for under-privileged children and organizes various camps and events in its efforts. John’s decision tree to go or not to go to an event was simple — if she was coming, John would brave everything and go. Otherwise if the event was a party (vs. a hike or a camp) and it was not raining, John would go.

Let’s say a bright-eyed data scientist had plotted John’s behavior over the year. He or she could have taken three binary variables: a = whether Jane was going to attend, b = whether the event was a party, and c = whether it was going to rain. It would be very simple to write an equation p = w1·a + w2·b + w3·c, and set a threshold on p to predict if John was going to that event or not. That is the simple neuron in data science that everyone seems so crazy about. With the right set of weights, it would have predicted John’s behavior accurately.
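
To check that a single weighted sum really can stand in for John’s decision tree, here is a small sketch. The weights and the threshold are my own hand-picked assumptions (the story does not give any); the loop verifies them against the rule for all eight combinations.

```python
from itertools import product

def john_rule(a, b, c):
    """John's decision tree from the story: go if Jane attends,
    otherwise go only to a party when it is not raining."""
    return a == 1 or (b == 1 and c == 0)

# Assumed weights and threshold (not from the original story),
# chosen by hand so that the neuron reproduces the rule.
W1, W2, W3 = 2.0, 1.0, -1.0
THRESHOLD = 0.5

def john_neuron(a, b, c):
    p = W1 * a + W2 * b + W3 * c  # the weighted sum
    return p > THRESHOLD

# Try every combination of the three binary inputs.
for a, b, c in product((0, 1), repeat=3):
    assert john_neuron(a, b, c) == john_rule(a, b, c)
    print(a, b, c, "->", "goes" if john_neuron(a, b, c) else "stays home")
```

Jane gets the biggest weight, a party helps a little, and rain counts against going. That is all the "intelligence" there is in one neuron.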

Let’s say Jane was also deciding based on the weather forecast and the type of the event. Then there are two independent inputs, one hidden layer with a node for her decision (plus two to pass the original inputs through) and then one node for the final decision. How about whether John was going to wear his new jeans or not? Now we are talking about two nodes in the output layer. You can see how quickly it becomes a network of neurons.
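
That little network can be written out as a sketch. Everything numeric here is an assumption of mine: the story does not say how Jane decides or when John wears the jeans, so I invented a rule for each (Jane skips only rainy non-parties; John wears the jeans when Jane attends and it is dry) and hand-picked weights to match.

```python
def step(value, threshold=0.0):
    """A neuron's output: 1 if the weighted sum clears the threshold."""
    return 1 if value > threshold else 0

def network(b, c):
    """b = the event is a party, c = rain is forecast (both 0 or 1)."""
    # Hidden layer: one node for Jane's decision, two pass-through nodes.
    jane = step(1.0 * b - 1.0 * c + 0.5)  # assumed rule and weights
    party, rain = b, c

    # Output layer: two nodes, both reading the hidden layer.
    john_goes = step(2.0 * jane + 1.0 * party - 1.0 * rain, threshold=0.5)
    new_jeans = step(1.0 * jane - 1.0 * rain, threshold=0.5)  # assumed rule
    return john_goes, new_jeans

for b in (0, 1):
    for c in (0, 1):
        print("party" if b else "hike", "rain" if c else "dry", "->", network(b, c))
```

Jane’s attendance is no longer an input; the network works it out on the way to the answer. That intermediate value is what makes the layer "hidden".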

The important thing is that we need to find the right set of weights. There are multiple algorithms to automatically detect these weights based on a given set of inputs and corresponding outputs. Something called Gradient Descent rules the roost.
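
Here is a minimal sketch of gradient descent finding the weights for John’s neuron. Two assumptions to flag: I swap the hard threshold for a sigmoid so that the error has a slope to follow (this makes it logistic regression), and the learning rate and step count are arbitrary choices of mine.

```python
import math
from itertools import product

# Training data: every combination of (a, b, c), labelled with John's rule
# from the story (go if Jane attends, else only to a party when it is dry).
inputs = list(product((0, 1), repeat=3))
labels = [1 if a == 1 or (b == 1 and c == 0) else 0 for a, b, c in inputs]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(inputs, labels, learning_rate=0.5, steps=10_000):
    weights = [0.0, 0.0, 0.0]
    bias = 0.0
    n = len(inputs)
    for _ in range(steps):
        grad_w = [0.0, 0.0, 0.0]
        grad_b = 0.0
        for x, y in zip(inputs, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # slope of the log-loss w.r.t. the weighted sum
            for i, xi in enumerate(x):
                grad_w[i] += error * xi / n
            grad_b += error / n
        # Move each weight a small step against its gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

weights, bias = train(inputs, labels)

def predict(x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

print("learned weights:", [round(w, 2) for w in weights], "bias:", round(bias, 2))
print("all predictions correct:", [predict(x) for x in inputs] == labels)
```

Nobody picked the weights this time. The loop started from zeros and kept nudging them in whichever direction reduced the error.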

It turns out that neural networks can transparently replace most statistical classification algorithms. This is very powerful, because now you can focus on one technique for a wide variety of problems. We should be teaching neural networks in seventh grade instead of linear regression. With one hidden layer between input and output, a neural network can also emulate any polynomial relationship, given sufficient data and enough hidden nodes. This is called a Multi-Layer Perceptron with one hidden layer, or MLP1.

Deep Learning

Does anyone remember Newton and his iterative method of finding answers? For complex equations of the type x = f(x), with x on both sides, you would assume a value of x for the RHS, compute x on the LHS and then use that value for the RHS, and so on. You would continue till the difference between the values of x in successive iterations was near zero. (Strictly speaking, this procedure is called fixed-point iteration; Newton’s method is one particular case of it.)
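
As a quick sketch of that loop, here it is solving x = cos(x). The equation, the starting guess and the tolerance are my own choices for illustration. Note that the iteration only settles down when f does not stretch distances near the answer (roughly, |f′(x)| < 1 there).

```python
import math

def fixed_point(f, x0, tolerance=1e-10, max_iterations=1000):
    """Repeat x <- f(x) until two successive values almost agree."""
    x = x0
    for _ in range(max_iterations):
        x_next = f(x)  # plug the current guess into the RHS
        if abs(x_next - x) < tolerance:
            return x_next
        x = x_next
    raise RuntimeError("did not converge")

root = fixed_point(math.cos, x0=1.0)
print("x =", root, " cos(x) =", math.cos(root))
```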

Same deal here. Why do we have to decide directly on the inputs? We will find interim values, then use those values to find the next set of interim values, and only after doing that 100 times will we decide on the output. In other words, you are adding more and more layers of neurons between the input and the output layer. This is called a Deep Neural Network, and the process of training it is called Deep Learning. It is very useful for non-linear classifications, like predicting whether a set of pixels represents a nose.
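
Structurally, "more layers" is just a loop. The sketch below pushes an input through a stack of untrained layers with random weights; the layer sizes, the tanh squashing function and the helper names are all assumptions made for illustration, and no training happens here.

```python
import math
import random

def make_layer(n_inputs, n_outputs, rng):
    """One layer = one (weights, bias) pair per neuron. Random, untrained."""
    return [([rng.uniform(-1, 1) for _ in range(n_inputs)], rng.uniform(-1, 1))
            for _ in range(n_outputs)]

def forward(layers, values):
    for layer in layers:
        # Each neuron: weighted sum of the previous layer's values,
        # squashed by a non-linearity (tanh, an assumed choice).
        values = [math.tanh(sum(w * v for w, v in zip(weights, values)) + bias)
                  for weights, bias in layer]
    return values  # the interim values of the last layer are the output

rng = random.Random(0)
sizes = [4, 8, 8, 8, 1]  # 4 inputs, three hidden layers of 8 nodes, 1 output
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

output = forward(layers, [0.2, 0.7, 0.1, 0.9])
print("output:", output)
```

The squashing step matters: without a non-linearity between layers, a stack of weighted sums collapses into a single weighted sum, and the depth buys nothing.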

Complex AI Models

Here is the third big deal with AI, and it’s not that intuitive. To make any neural network work we must train it and get the right set of weights in the network. It turns out that the weights themselves contain a lot of value.

There is a very popular model in NLP called Word2Vec. It comes up with a set of numbers (a vector) for each word. Vectors for words with similar meanings will have numbers very close to each other. You can also do things like [King] - [Man] + [Woman] and get the vector for [Queen]. These vectors are in fact the weights from certain neural networks built for some task like predicting a word from the words around it.
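
The arithmetic is easy to try on toy data. The vectors below are hand-made by me, NOT real Word2Vec output, with three dimensions that loosely stand for "royalty", "male" and "female". Real vectors have hundreds of dimensions and no such tidy meaning, but the lookup works the same way.

```python
import math

# Hand-made toy vectors for illustration only (not trained embeddings).
vectors = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the three input words, as is usual for this kind of query.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy("king", "man", "woman")
print("king - man + woman is closest to:", result)
```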

Once scientists figured out how the weights in neural networks carry so much value, they went crazy. Many of the most advanced models are a stack of neural networks where the weights are passed from one to another to get very sophisticated things done.

The Promise

The promise is insane. Now, as long as you have sufficient data, you can teach a machine to program itself and learn the most sophisticated, convoluted, non-linear relationships. The beauty is that you don’t have to understand those relationships yourself, let alone articulate them. You can now afford to be completely ignorant. It’s not hard to imagine that in the near future machines will be collecting all the data and making all the predictions, while humans will be focused on making smarter machines. Take any problem, select some [hyper-]parameters of a neural network, go to bed. Now, in fact, they have begun automating the process of selecting these hyper-parameters as well.

That is the promise. The reality? Coming up.

Next part in the series: ‘Why Doesn’t AI Work?’

Translated from: https://medium.com/ai-in-plain-english/ai-and-data-science-for-dummies-chat-with-classmates-359e18dcc529
