Why You Should Avoid Commercial Data Science Platforms

Let’s start with an analogy

Stick with me, I promise it’s relevant.

If you’re selling vegetables in a grocery store, your business value lies in your loyal customers and your position on a high street with high footfall. You probably don’t have a fancy shop front; it’s just boxes of veg. It’s those boxes, and your quality sales staff, that sell the veg to passers-by.

One day a salesman from High Tech Veg Retail Solutions Inc comes into your shop. He tells you that “cardboard boxes are inefficient and unmanageable”. He has a product that will keep your veg in a locked fridge at the back of the shop, and passers-by can simply ask for a cauliflower and have it whizzed to them at top speed by conveyor belt.

It does almost everything. The only downside is that, due to the complexity of the machine, you will only be able to stock half your current range of veg. And by the way, all the veg will still be stored in cardboard boxes inside the fridge.

On the upside, you can get rid of your quality staff and employ cheaper staff with fewer skills.

I’m sure you would send him on his way to find another victim.

Your business value is Intellectual Property

If you’re reading this article, you are either considering AI and ML, or are already using them and have heard that a much better commercial data science platform is available.

In the remainder of this article, I’m going to explain why you would be making a big mistake by investing in a commercial data science solution.

Open source cardboard boxes

Those free cardboard boxes on the shop front are your open source AI and ML toolsets: freely available and easily accessible.

They don’t hide anything. You can see everything you put in, and you can stand by the output, even for safety-critical applications, because you can describe how you got your results.

Every option for squeezing out that last 20% of your model, the part that produces 80% of its value, is available to you.

Any training you need is free, or at least very low cost, and is accessible 24 hours a day on many different websites.

The most common language adopted by open source tools is Python, a language taught at high school, college, and university.

Expensive cardboard boxes with a shiny sticker

This is what commercial AI and ML platforms offer.

Under the hood, they employ the same open source tools you can access for free. Yes, they have a fancy wrapper around them, a built-in conveyor belt, and a shiny sticker to boot.

The only way to access those free tools, though, is through the interface the platform provides. It’s a really pretty interface, but it only gives you access to a fraction of what the underlying open source tools are capable of.

I can’t think of any commercial data science platform that does not have open source tools at its heart.

The 80/20 rule

The data scientists who could get that last 20% out of a model for you are now reduced to dragging, dropping, and clicking a mouse, and you’re losing 80% of your business value. I hear you say, “But the results are much faster on this vendor’s platform.” OK, so you’re losing 80% of your business value faster!

Also, ask yourself why this vendor’s platform is faster. It’s because that last 20%, which gets 80% of the value, is not the low-hanging fruit. It’s complex. It’s why data scientists dedicate their careers to the subject, and it’s why they are invaluable as data scientists and not mouse clickers.

Where is your business value now?

Let’s assume that this commercial platform, by some miracle, could get 100% of the value you can get from unrestricted open source tools. Where is your business value now? It’s locked into this vendor’s platform, a platform you’re spending a huge amount of money on.

You can’t extract your IP; it’s been converted into a proprietary format. Even if you could reverse engineer their generated code (see you in court), the best you would get is a result that is missing that last 20%. And how long did the reverse engineering take you?

The tail wagging the dog

AI and ML are improving all the time. Every few months a new feature comes out that wows the community and offers your business even more potential revenue.

Your vendor’s commercial application and UI are so tightly integrated with older versions of the open source software that you won’t see that update for another 6 to 12 months. Forget it: six months is a lifetime in AI and ML, and you just missed that opportunity.

Recruitment, retention, and training

Every data scientist you recruit will, for the most part, come fully trained on the open source tools they have been working with for years. Those just out of university will be full of enthusiasm and fresh ideas. The one thing they all have in common is that they are experts in the open source toolsets that let them turn their enthusiasm and ideas into reality.

Of course, you’re going to tell them in the interview to forget all that knowledge they have worked hard to accrue: you have just invested a lot of money in a proprietary system that has half the data science capability they are used to, and which they have never heard of before.

The long and short of it is that you will find it hard to recruit staff and impossible to recruit talented staff. Any talented staff you currently have will soon be leaving as well.

Trust the grassroots

You will very rarely hear a data scientist raving about a commercial data science platform. For that reason, most of the vendors offering these products don’t target the grassroots. They go directly to senior managers and even the CEO, looking for a top-down decision. Most CEOs understand the value of data science, but the details are complex and overwhelming. So when a well-trained salesman scares the living shit out of them with horror stories of open source woes, they tend to believe them.

Talk to your own loyal staff before forcing something on them. Find out what open source tools they currently use and what could be done better if a small investment were made, or if they were given the time to design and implement a more suitable stack. After all, they work in your business, they know your requirements, and I guarantee the costs will be orders of magnitude less than paying for a commercial platform.

In summary

If you have a data science requirement and money to invest, invest it wisely. Invest in talented individuals. Look at how you can make a small investment in infrastructure to get a big payback from the tools they already use. Your skilled staff will make your company more valuable, and you will retain 100% of your business IP. You don’t need a high-tech cardboard box; the free open source ones you already have are the best you can get.

Translated from: https://medium.com/swlh/why-you-should-avoid-commercial-data-science-platforms-6e9c4b5f3596
