Why Scrum Is Awful for Data Science

Scrum is a popular project management methodology in software engineering, and recently the trend has carried over to data science. While the utility of Scrum in standard software engineering may remain up for debate, here I will detail why it unquestionably has no place in data science (or in data engineering). This is not to say that “Agile” as a whole is bad for data science, but rather that the specific principles of Scrum (sprints, a single product owner, a scrum master, daily stand-ups, and the litany of other meetings) are a poor fit for data science teams and ultimately result in poorer products.

The Sprint/Estimation

Scrum prioritizes creating “deliverables,” often in two-week sprints. While this might arguably work well for certain areas of software engineering, it fails spectacularly in the data science world. Data science is by its very nature a scientific process that involves research, experimentation, and analysis. Data science projects are very difficult to estimate because they often ask the team to do something that hasn’t been done before. While it is true that data scientists may have designed similar models before, they likely haven’t worked with the dataset or used the specific technique required. This means that there is a lot of uncertainty in the process. Things like poorer-than-expected data quality, problems with hyperparameter tuning, or a technique simply not working can cause a failure to “deliver” by the end of the sprint. As a result, point estimates are often less than worthless, as they are based on prior projects that often bear little resemblance to the project in progress.

Moreover, even if data scientists “deliver” the required items for a sprint, in many cases they have likely sacrificed code quality, model robustness, or documentation in order to meet the arbitrary end of the sprint. I have often heard management speak positively about the “greater urgency” of a two-week sprint. But remember that this urgency also has drawbacks, chiefly that data scientists are more likely to make mistakes and overlook things.

On the opposite end of the spectrum, I’ve seen data scientists who finished their work early but were hesitant to pull in new stories out of fear of not being able to complete them by the end of the sprint. So they just sat idle for several days until the next sprint.

But couldn’t we break up these big tasks? Proponents of Scrum will argue that the issue here is not Scrum, but just the need to better break up big tasks (likely with additional time-consuming grooming sessions). However, even breaking up big tasks does not remove the uncertainty inherent in data science. For instance, a task like “train an XGBoost model and report results” might take much longer than a single sprint because of missing values in the data that need to be encoded, or because needed data is not present at all. Yes, this could be addressed by having a prior story, “explore the dataset and fill missing values,” but as I will describe shortly, most POs lack the expertise to prioritize these types of stories because they don’t fulfill an immediate deliverable.
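
To make the estimation argument above concrete, here is a minimal simulation sketch. It is not from the original article: the lognormal model of task durations, the `sigma` spread parameter, the example estimates, and the function name `fraction_of_sprints_on_time` are all assumptions chosen for illustration. It shows how a set of point estimates that fits a sprint on paper can still miss the sprint once each task's real duration is uncertain.

```python
import random


def fraction_of_sprints_on_time(point_estimates_days, sprint_days=10,
                                sigma=0.6, trials=10_000, seed=0):
    """Simulate how often a set of estimated tasks fits inside one sprint.

    Assumption (illustrative only): each task's real duration equals its
    point estimate multiplied by a lognormal factor whose median is 1.
    `sigma` controls the spread; sigma=0 means estimates are always exact.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    on_time = 0
    for _ in range(trials):
        # Draw one possible "real" sprint: every task gets its own factor.
        total = sum(est * rng.lognormvariate(0.0, sigma)
                    for est in point_estimates_days)
        if total <= sprint_days:
            on_time += 1
    return on_time / trials


if __name__ == "__main__":
    # Hypothetical stories estimated at 3, 3 and 2 days (8 days of a 10-day sprint).
    estimates = [3, 3, 2]
    for spread in (0.0, 0.3, 0.6, 0.9):
        share = fraction_of_sprints_on_time(estimates, sigma=spread)
        print(f"sigma={spread}: {share:.0%} of simulated sprints delivered on time")
```

With `sigma=0` every simulated sprint is delivered, since the estimates sum to less than the sprint length. As the spread grows, the share of on-time sprints tends to fall even though the point estimates never change, which is the sense in which a point estimate alone says little about whether the work will land.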

Constant Pivoting

Related to the above point, Scrum often results in constant pivoting from one project to another. True, many consider pivoting a “desirable” trait. However, this constant change in direction often results in nothing getting done and in promising projects being shelved simply because they aren’t producing immediate deliverables. This is particularly true in data science, where many projects require a long-term investment of both employee time and resources. Many times I have seen promising projects discontinued because they didn’t deliver performance improvements fast enough, or because the product owner saw something flashier that they wanted to focus on.

Lack of cross-team pollination

Scrum often creates a horribly narrow focus on one’s own team’s sprint tickets to the exclusion of everything else. It discourages data scientists (or really anyone) from contributing to other initiatives around their company where their skills could be of help. It also tends to push off important issues that affect other teams unless those teams are in direct contact with the product owner.

The role of the product owner (PO)

Another key problem with Scrum is that it places too much power in the hands of the PO. The PO is generally in charge of the backlog and determines which issues are prioritized. However, product owners generally have a poor understanding of the technical nuances of data science projects. As a result, needed work such as refactoring code or further analysis of model performance often gets pushed to the back. Additionally, a lack of immediate “progress” might result in the product owner moving away from a project entirely. This isn’t to say that data scientists shouldn’t regularly communicate with stakeholders to determine the priority of tickets, but rather that having a dedicated product owner at all the meetings deciding the priority of tasks is counterproductive, both to the team and, in the long term, to the product itself.

Daily Standups, grooming and other wastes of time

I’ve seen very few, if any, teams that need to meet on a daily basis. Communication between teammates is important, but meeting two or three times per week is usually more than enough. Likewise, teammates should be encouraged to reach out if they get blocked or need help. A daily standup, however, often does nothing but micromanage employees.

Grooming (or refinement) is a meeting of the Scrum team in which the product backlog items are discussed, and the next sprint planning is prepared.

Grooming is another session that needlessly wastes time. As I mentioned above, technical complexity in data science means that sprint goals will often be missed or met with subpar results. This in turn becomes the justification for even more grooming meetings (or pre-grooming, as we used to call them) in order to “break down those big issues.” In a never-ending cycle, these meetings eat up more and more of the data scientists’ time.

The retro is one of the few scrum meetings that I like. However, suggestions at these meetings are often not taken seriously. For instance, at several sprint retros I’ve attended during my career, the majority of teammates recommended not having a daily stand-up, but the scrum master and management dismissed these suggestions because “that would not be scrum.” In contrast, suggestions for adding more grooming sessions are almost always enacted without question.

The Scrum Master

Another role that is essentially useless is the scrum master. The formal definition of a scrum master is:

“The scrum master is the team role responsible for ensuring the team lives agile values and principles and follows the processes and practices that the team agreed they would use.” - Agile Alliance

What…? In practice, the scrum master acts as a non-technical busybody who coerces team members into attending the aforementioned pointless meetings and drags around JIRA cards, all while preaching the canon of how Scrum will lead your team to salvation (e.g. more points delivered per sprint).

False dichotomy and the “no true Scrum” argument

Finally, proponents of Scrum often create a straw man by comparing Scrum only to waterfall and other older project management methods. Moreover, in many companies, management takes an all-or-nothing approach. It is possible to adopt aspects of Scrum, Agile, or other forms of project management without adhering to them completely. For instance, you could use ideas like stories and epics without a product owner, sprints, or a scrum master.

Another common trend I see is for people to say, “what you experienced was not true Scrum, blah blah is actually waterfall. If only you had a better product owner…” What this argument fails to recognize is that Scrum as a system breeds these types of problems. Delegating a singular role to the product owner is bound to cause problems. Sure, you could have an exceptionally good PO who has years of DS experience or understands the team, but that likely won’t be the case. Moreover, the sprint at its core encourages frantic rushing at the end of whatever arbitrarily decided duration in order to meet the “commitment.” It also fundamentally assumes that all work can be concretely estimated. Scrum could possibly work in areas of software engineering where there are very well-defined problems that are only slight variations of one another (though even then you have the issues with POs and scrum masters). However, when there is any uncertainty (as there is in data science, data engineering, and DevOps), Scrum breaks down and results in wasted time and resources.

What should you use instead?

This leads to the central question of how you should manage a data science team. There is no single answer. What I’ve found to work well is a Kanban-based approach without a product owner but with regular discussions with stakeholders (weekly or every other week). Additionally, work-in-progress limits seem to help streamline the process. As I mentioned above, meeting twice per week (Tuesday/Friday), or some similar cadence, often works well.
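
As a rough illustration of what a work-in-progress limit does, here is a minimal sketch of a Kanban board that refuses to accept a card once a column is full. The class names `KanbanBoard` and `WipLimitExceeded`, the column names, and the limit of two are hypothetical, chosen only for this example. They do not come from any particular tool or from the original article.

```python
class WipLimitExceeded(Exception):
    """Raised when a card would push a column past its work-in-progress limit."""


class KanbanBoard:
    def __init__(self, wip_limits):
        # wip_limits maps column name -> max number of cards (None = unlimited).
        self.wip_limits = dict(wip_limits)
        self.columns = {name: [] for name in self.wip_limits}

    def _check_capacity(self, column):
        limit = self.wip_limits[column]
        if limit is not None and len(self.columns[column]) >= limit:
            raise WipLimitExceeded(
                f"'{column}' already holds {limit} card(s); finish something first"
            )

    def add(self, card, column):
        """Place a new card in a column, respecting that column's limit."""
        self._check_capacity(column)
        self.columns[column].append(card)

    def move(self, card, src, dst):
        """Move a card between columns; the card stays put if dst is full."""
        if card not in self.columns[src]:
            raise ValueError(f"{card!r} is not in column '{src}'")
        self._check_capacity(dst)  # checked before removing from src
        self.columns[src].remove(card)
        self.columns[dst].append(card)


if __name__ == "__main__":
    board = KanbanBoard({"todo": None, "in_progress": 2, "done": None})
    for story in ("explore dataset", "train baseline", "write report"):
        board.add(story, "todo")
    board.move("explore dataset", "todo", "in_progress")
    board.move("train baseline", "todo", "in_progress")
    try:
        board.move("write report", "todo", "in_progress")
    except WipLimitExceeded as err:
        print(err)
```

The point of the limit is that a third card cannot be started until one of the two in progress is finished, so work is pulled as capacity frees up rather than committed to up front for a fixed sprint.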

However, this approach may not work well for all teams. That is why, particularly for data science, I’d recommend trying out several different approaches to determine what works for your team. The key is to find a system that works well for your team and stakeholders rather than one that merely placates upper management’s ideas of how a data science team should operate.

Additional Links

Reddit Discussions on Scrum for Data Science

Quora Question and Answers

Translated from: https://towardsdatascience.com/why-scrum-is-awful-for-data-science-db3e5c1bb3b4
