
Let’s start with an analogy

Stick with me, I promise it’s relevant.


If you’re selling vegetables in a grocery store, your business value lies in your loyal customers and your position on a high street with high footfall. You probably don’t have a fancy shop front; it’s just boxes of veg. It’s that, and your quality sales staff, that sell the veg to passers-by.

One day a salesman from High Tech Veg Retail Solutions Inc comes into your shop. He tells you that “cardboard boxes are inefficient and unmanageable”. He has a product that will keep your veg in a locked fridge at the back of the shop, while passers-by can simply ask for a cauliflower and have it whizzed to them at top speed on a conveyor belt.

It does almost everything. The only downside is that, due to the complexity of the machine, you will only be able to stock half your current range of veg. And by the way, all the veg will still be stored in cardboard boxes inside the fridge.


On the upside, you can get rid of your quality staff and employ cheaper staff with fewer skills.


I’m sure you would send him on his way to find another victim.


Your business value is Intellectual Property

If you’re reading this article, then you are either considering AI and ML, or are already using them and have heard that there is a much better commercial data science platform available.


In the remainder of this article, I’m going to explain why you would be making a big mistake investing in a commercial data science solution.


Open source cardboard boxes

Those free cardboard boxes that are easily accessible on the shop front are your Open Source AI and ML toolsets, freely available and easily accessible.


They don’t hide anything. You can see everything you put in, and you can stand by the output, even for safety-critical applications, because you can describe how you got your results.
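To make “you can see everything” a little more concrete, here is a minimal sketch in plain Python with no libraries beyond the standard `json` module. The footfall data, the `fit_line` helper, and the variable names are all made up for illustration; they are not taken from any particular toolset. The sketch fits a straight line by ordinary least squares, then exports the fitted parameters as plain JSON that anyone can read, audit, and reload.

```python
import json


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}


# Hypothetical data: daily footfall vs. kilograms of veg sold.
footfall = [100, 200, 300, 400]
kg_sold = [25, 45, 65, 85]

model = fit_line(footfall, kg_sold)

# Every fitted parameter is a plain number you can inspect and explain.
exported = json.dumps(model, sort_keys=True)

# The exported model is an open format: reload it anywhere and predict.
restored = json.loads(exported)
prediction = restored["slope"] * 500 + restored["intercept"]
print(exported)
print(prediction)
```

Nothing here is hidden behind an interface: the inputs, the arithmetic, and the stored result can all be read line by line, which is the property the paragraph above is describing.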


Every option for squeezing that last 20% out of your model, the part that produces 80% of its value, is available to you.


Any training you need is free, or at least very low cost, and is easily accessible 24 hours a day on many different websites.


The most common language adopted by open source tools is Python, a language taught at high school, college, and university.

Expensive cardboard boxes with a shiny sticker

This is what commercial AI and ML platforms offer.


Under the hood, they are employing the same open source tools you can access for free. Yes, they have a fancy wrapper around them, a conveyor belt built in, and a shiny sticker to boot.

The only way to access those free tools, though, is through the interface the platform provides. It’s a really pretty interface, but it only gives you access to a fraction of what the underlying open source tools are capable of.

I can’t think of any commercial data science platform that does not have open source tools at its heart.


The 80/20 rule

The data scientists who could get that last 20% out of a model for you are now reduced to dragging, dropping, and clicking a mouse, and you’re losing 80% of your business value. I hear you say, “But the results are much faster on this vendor’s platform.” OK, so you’re losing 80% of your business value faster!

Also, ask yourself why this vendor’s platform is faster. It’s because the platform skips that last 20% that gets 80% of the value, which is not the low-hanging fruit. It’s complex; it’s why data scientists dedicate their careers to the subject, and it’s why they are invaluable as data scientists and not mouse clickers.

Where is your business value now?

Let’s assume that this commercial platform, by some miracle, could get 100% of the value you can get from unrestricted open source tools. Where is your business value now? It’s locked into this vendor’s platform, a platform you’re spending a huge amount of money on.

You can’t extract your IP; it’s been converted into a proprietary format. Even if you could reverse engineer their generated code (see you in court), the best you would get is a result that is missing that last 20%, and how long did the reverse engineering take you?

The tail wagging the dog

AI and ML are improving all the time. Every few months a new feature comes out that wows the community and offers your business even more potential revenue.

Your vendor’s commercial application and UI are so tightly integrated with older versions of the open source software that you won’t see that update for another 6 to 12 months. Forget it: six months is a lifetime in AI and ML, and you just missed that opportunity.

Recruitment, retention, and training

Every data scientist you recruit will, for the most part, come fully trained on the open source tools they have been working with for years. Those just out of university will be full of enthusiasm and fresh ideas. The one thing they all have in common is that they are experts in the open source toolsets that let them turn their enthusiasm and ideas into reality.

Of course, you’re going to tell them in the interview to forget all that knowledge they have worked hard to accrue, because you have just invested a lot of money in a proprietary system that they have never heard of and that has half the data science capability they are used to.


The long and short of it is that you will find it hard to recruit staff and impossible to recruit talented staff. Any talented staff you currently have will soon be leaving as well.

Trust the grassroots

You will very rarely hear a data scientist raving about a commercial data science platform. For that reason, most of the vendors offering these products don’t target the grassroots. They go directly to the senior managers, and even the CEO, looking for a top-down decision. Most CEOs understand the value of data science, but the details are complex and overwhelming. So when a well-trained salesman scares the living shit out of them with horror stories of open source woes, they tend to believe him.

Talk to your own loyal staff before forcing something on them. Find out what open source tools they currently use, and what could be done better if a small investment were made or if they were given the time to design and implement a more suitable stack. After all, they work in your business, they know your requirements, and I guarantee the costs will be orders of magnitude less than paying for a commercial platform.

In summary

If you have a data science requirement and money to invest, invest it wisely. Invest in talented individuals. Look at how you can make a small investment in infrastructure to get a big payback from the tools they already use. Your skilled staff will make your company more valuable, and you will retain 100% of your business IP. You don’t need a high-tech cardboard box; the free open source ones you already have are the best you can get.

Original article: https://medium.com/swlh/why-you-should-avoid-commercial-data-science-platforms-6e9c4b5f3596



