Becoming a Machine Learning Engineer

Machine Learning Engineering

The title of “Machine Learning Engineer” is quickly becoming more popular, and with that there is significant interest from people trying to enter the Data Science field. What kind of career path is this, and what skill set does a Machine Learning Engineer need to have? Is it possible to define the steps to take in order to become an ML Engineer? Can you follow online training and get certified? I figured I’d write up my ideas on the state of the field and how viable it is for those looking to pursue a career in it.

The Machine Learning Engineer

Let’s get one point out of the way first. Some might look at the job title and expect it to be a Data Scientist who purely focuses on model building, and that’s it. This is a big no-no, if only because most ML Engineering work starts after the initial model is built. While it’s often part of the job, a Machine Learning Engineer does not purely build models. And honestly, that part will only take up 5 to 10% of the job.

Look at this image of all the components that are involved in the model ecosystem. The black square at the center? That’s the actual ML code.

So what kind of creature is the Machine Learning Engineer then and where does it fit into the grand scheme of things? I prefer to fall back on a part of Tomasz Dudek’s definition from 2018:

…A person called a machine learning engineer asserts that all production tasks are working properly in terms of actual execution and scheduling, abuses machine learning libraries to their extremes, often adding new functionalities. (They) ensure that data science code is maintainable, scalable and debuggable, automating and abstracting away different repeatable routines that are present in most machine learning tasks. They bring the best software development practices to the data science team and help them speed up their work…

— Tomasz Dudek in But what is this “machine learning engineer” actually doing?

Essentially, an ML Engineer is some kind of wizard who brings models to production in a sensible way, is able to improve the Data Scientist’s models, and is also partly an architect who paves the road for the Data Science team. This sounds a lot like some kind of senior engineering role, and yet it doesn’t have to be.

Common ML Engineering backgrounds

Most of the other ML Engineers I’ve met fall into one of two categories. The first group is highly educated, with most having a master’s or even a PhD in Computer Science, Artificial Intelligence, Data Science or Software Engineering. Surprisingly many are relatively new grads, with 1–3 years of experience under their belt when they became ML Engineers. There’s also a second group that consists of more experienced developers that transitioned into this role from neighboring fields such as Software Engineering or Data Engineering, and of course Data Science.

This indicates that being an ML Engineer requires a level of proficiency that can come from either of the two directions that make up the role. You could be a great software engineer, or a fantastic machine learning virtuoso. Maybe both! If you are one already, this might be the field for you. If you are not, it might be a viable direction to develop yourself towards.

But do not make the mistake of assuming that Software Engineers or Data Scientists automatically make good ML Engineers. I come from a software background myself and I can vouch that most ML concepts and APIs are absolutely alien to Software Engineers. I remember the intense struggles I had getting to know TensorFlow and Theano years ago. Even though I started coding in my teens, I had never seen anything like it. The experience was humbling.

A beginner-level ML Engineer is not a beginner programmer. This is a journey that is almost always traveled with at least some experience. Is it then impossible to land an ML Engineering job without experience or training?

Of course not. However, the odds are against you. It is far easier to get into this niche when you have a similar background. There is some light on the horizon, however.

Remember that back when Data Science started becoming popular, we said the same thing about Data Scientists, because the people doing Data Science at that time were some of the brightest and most highly educated people in the world. Since then Data Science has become more accessible, and in truth you can nowadays be a great Data Scientist without needing a PhD. Whether the same will fully apply to ML Engineering I am not sure, but I hope that as our field matures the barriers to entry will become lower.

Data Science. Software Engineering. Probably some linear algebra too. These were the ingredients chosen to create the perfect ML Engineer. Whiteboard creation by author.

The toolbelt of the ML Engineer is not simply the lovechild of an intense affair between a Software Engineer’s IDE and a Data Scientist’s Jupyter Lab. It has many tools and techniques that are intrinsic to the field. Which brings me to the next section…

The Machine Learning Engineer’s Skills

Skills lists become outdated soon after being written, and often take on a life of their own. And yet I am here to draft up a non-exhaustive list of skills and topics to study! The tool landscape is so broad that it’s unlikely any ML Engineer will have proficiency with every language, tool and concept out there. Please don’t look upon this as some kind of list of items you need to cross off on your ML Engineering journey like so many online resources will instruct you to. Rather, take note and look at these as themes within the ML Engineering field.

I’ll try to discuss concepts more than specific tools. That way most of this will remain relevant in a couple of months or years.

Data Science

  • Python. Look into coding standards and some of the cool stuff in the recent versions of Python. Having a basic understanding of R is also useful and your Data Scientists will thank you for it.
  • Statistics.
  • Model optimization.
  • Model validation.
  • ML frameworks such as scikit-learn.
  • Deep learning frameworks such as TensorFlow and PyTorch.
  • ML applications such as NLP, computer vision and time series analysis.
  • Mathematics. Implicitly, you’ll use a lot of linear algebra and calculus.

The reason why I would take Python over R or any other language is mainly the production aspect. While you can do a lot with R, it is often not supported as well as Python is. There’s also a time aspect at play here: it is often far faster to productionize code in Python than in R.

Software engineering

  • Experience outside of Python in a second programming language, such as Java, C++, or JavaScript.
  • Cloud offerings. More on that later.
  • Distributed computing
  • System design and software architecture
  • Data Structures and Algorithms.
  • Databases and the query languages that come with them.
  • Containerization (e.g. Docker, Kubeflow)
  • Functional programming concepts
  • Design patterns
  • Big O
  • API development
  • Version control: git
  • Testing
  • Project management. Probably the most underrated element in any SE curriculum.
  • CI/CD
  • MLOps

So how do you learn about all of these if not on the job? Courses and online training can be great, but they won’t teach you how to apply what you learn in a real-life setting. For things like statistics it doesn’t matter, but for technical subjects knowing “about” it is only half of the mastery. It doesn’t take more than a quick glance at Reddit’s r/learnprogramming to see that there are many people struggling to make the jump from coding in the protected IDE of an online course to coding their own projects on their own machine.

My experience is that it might be better to get started on a project on your own to learn a new skill, and supplement your knowledge with online training when you already have some applied knowledge. Instead of all-in-one training programs, there are many tutorials online to help you with that, from building your own clock or calculator to a complete web app. Be wary of any course that promises you can go from zero to hero in a couple of weeks or months.

Certifications are a similar beast. A certification can be particularly valuable if you’re in consulting and want to signal to clients that your skills meet certain standards. Having a certification that corresponds to a client’s tech stack immediately puts you at the front of the pack. However, a certification is worthless without the skills to back this up in the first place. Consider now that you can obtain many certifications without having to code for them and you’ll see where I’m headed. Often, the time spent getting a certification would be better spent just building applications.

That said, there are some certifications that do carry some merit for ML Engineers, particularly those from cloud vendors. Often these require a couple of years of experience deploying applications on their respective platforms, but anyone can pay $100–300 and register for a certification examination. As of 2020, there are three cloud vendors worth mentioning: Azure (Microsoft), GCP (Google), and AWS (Amazon). Here’s a list of certifications they offer that are in the sphere of interest of the ML Engineer.

Microsoft Azure:

Microsoft offers associate-level certification for both Data Scientists and AI Engineers, as well as about a dozen other certifications. Some certifications actually require multiple exams, but this is not (yet?) the case for either the Data Scientist or the AI Engineer cert. The certification topics are a little bit superficial, but the exam should not be underestimated.

  • Microsoft Certified: Azure AI Fundamentals
  • Azure Data Scientist Associate
  • Azure AI Engineer Associate

Google Cloud Platform:

Google is the challenger when it comes to cloud services and the state of their certification reflects that. At the moment the ML Engineer exam is in beta and no certifications have been awarded yet. The exam takes four (!) hours but covers an incredibly comprehensive list of what an ML Engineer’s job is all about. Prior to this certification being introduced, some ML topics fell under the Data Engineer certification, so many ML Engineers, myself included, actually took the Data Engineering certification track.

You could also look at the Google Cloud Architect, Developer or DevOps certification, but these barely touch upon ML and might add a little bit of noise to your resume that lines you up for different gigs. I say that as a certified Cloud Architect myself who learned this from experience. On the other hand, it could make your profile a little bit more appealing.

  • Google Cloud Certified Professional Data Engineer
  • Google Cloud Certified Professional Machine Learning Engineer (currently in beta)

AWS:

Amazon has specific paths for both analytics roles and ML roles. Given that the data analytics certification is almost entirely focused on data processing and reporting, I would propose that only the ML Specialty is of interest to ML Engineers. The Machine Learning Specialty’s syllabus covers a lot of ML Engineering topics, though it is not as exhaustive as the Google certification.

  • AWS Certified Machine Learning - Specialty
  • AWS Certified Developer Associate

So which ones should you get?

At the moment Amazon is the market leader, with around 60% market share. Azure sits at 30% and GCP at 10%. While the overall market is growing a lot, AWS is slowly losing market share to Google and Microsoft. Google might look like an underdog, but they have a pretty strong track record with AI innovations and their ownership of TensorFlow. Speaking of which, there’s also a certificate for TF. If you’re not forced by an employer to use one cloud vendor over the others, I would advise testing out all three with trial accounts and deploying a pet project. Figure out which one you like and also look at what kind of companies use these cloud vendors.

Why do you need cloud tech at all? Well, eventually Data Science work makes it to production and most of the time it is deployed on a cloud platform. You don’t need to rival the skills of a Cloud Engineer but you should know how to implement ML projects on your chosen platform. Formal places like a university don’t often teach how to navigate {vendor} cloud consoles.

There is a downside to learning a cloud platform: literally not a week goes by without a new cloud product being announced by one of these giants. Keeping up to date with cloud offerings is hard. Having a wide range of certifications also brings into question whether you’re actually up to snuff with all of them.

You might have noticed that, so far, I did not link any courses or tutorials. There are already many resources out there for that. Furthermore, my main point is that the road to becoming an ML Engineer is traveled by doing projects and getting experience in the field, as it is not an entry-level job.

Variation In What Constitutes ML Work

Once you’ve done all that, you should know that there is a world of difference between doing ML at a small company and doing ML at a FAANG company. Likewise, there is a lot of variation between working at ‘product’ companies and at consultancies. Similarly, a bank and a start-up are worlds apart in terms of technology adoption.

You’re much more likely to be a jack of all trades at small companies, e.g. being asked to do data engineering, visualization and data science-y work as part of your day-to-day activities. Larger companies are more likely to hire staff that focus on specific parts of the ML chain, and might even have different types of ML Engineers running around. If you’re at a company that does many different projects you might never go deep into any framework, but get to experience many kinds of tools and domains. These are specific considerations to keep in mind when you’re going to look for your first ML job.

Be Prepared To Fail

There’s a slightly toxic notion in computer science that good developers hardly ever make mistakes and good code is without bugs. This is complete nonsense and it has led to an epidemic of imposter syndrome. Code is almost never written correctly the first time, and writing it is a process that highly depends on the time and money thrown at it. You grow a lot as an engineer in this field, but at the same time the field will grow faster than you. Being an ML Engineer means continuously having to learn new things on the job, and you give up some proficiency in coding by virtue of being in an interdisciplinary field.

I regularly look back at code written a while ago and find that I made mistakes. Sometimes I rewrite it with the knowledge I have right now, or see if I can update an old model to a new version of an API. Some developers swear by completely tossing out dense code and rewriting it from scratch. Over time you develop a feel for retroactively detecting code smells, and this is what engineering is about.

Don’t be afraid to make mistakes. Failing is natural, particularly with something as new as ML.

A Reading List for the Machine Learning Engineer

A Machine Learning Engineer’s bookcase. Notice the worn-down copy of Linear Algebra and its applications. Photo by author.

Although this is by no means meant as a complete list, here are some resources that I feel would benefit those who want to break into this field.

Books:

  • Clean Code by Robert C. Martin
  • Machine Learning Yearning by Andrew Ng
  • Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville
  • The Pragmatic Programmer by David Thomas and Andrew Hunt
  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

Posts / blogs:

  • How to read scientific papers by Christoph Schmidl
  • Everything you need to know about becoming a self-taught ML Engineer by Jason Benn has some excellent motives.
  • Become a Data Scientist in 2020 with these 10 resources by Rahul Agarwal will appeal to those looking for lists of online resources.
  • OpenAI’s blog

Papers:

  • Hidden Technical Debt in Machine Learning Systems
  • Software Engineering for Machine Learning: A Case Study

Conferences, meetups and the ML Engineering Scene

Some people hate them, others love them. Some use them as a way to promote themselves to the max while other people share absolutely brilliant ideas. I like meetups for the simple reason that they allow you to cherry-pick what kind of topics you want to learn about. Do you like Scala? AI in Healthcare? Meetups. More of a fan of Bayesian optimization? There’s probably a meetup for that. Most of the meetups have gone fully online due to Corona and I expect this to continue through to the first half of 2021 at least.

Protip: They’re great for networking if you’re looking to get an internship, job, or a mentor.

Other Careers

There are a couple of other careers that should be mentioned to those considering moving into this field.

  • Data Engineering: Anything that touches data needs to be able to handle scale and complex transforms. You’re the one specialized in connecting various elements of the data pipeline. There is often a significantly higher demand for your services than there is for those of the Machine Learning Engineer.
  • Data Scientist: Analysis, Storytelling, Statistics, Machine Learning and presenting it to the CEO. You got it all. Usually this job is more diverse and involves less programming, but it really depends on where you end up. There are so many flavors of data science that it is hard to define it as a single role, and some data scientists run a complete Data & Analytics department by themselves.
  • Cloud Engineering: Specialized in integrating different applications and moving workflows to the cloud, you’re pretty good friends with the Data and ML Engineers.

Conclusion

A Machine Learning Engineer has a broad range of topics to understand from both Machine Learning and Software Development. As of 2020, courses and certifications alone won’t get you there. Formal training or experience in the field is still desirable, but I expect that the role will become more accessible over time, similar to how Data Science became more open to newcomers. With that in mind, I feel that the best path for those looking to become ML Engineers without formal training would be to enter Data Science or Software Engineering, and transfer from there while picking up the elements that make up ML Engineering.

Closing words

As the field moves fast, I would like to focus on the ML Engineering skillset and tool landscape in a future post by scraping job postings and doing some magic on that data in order to come up with a more statistically sound analysis of what an ML Engineer could know. Wondering whether you should learn TensorFlow over PyTorch? Stay tuned for that :)
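
As a rough illustration of what that kind of analysis might look like, here is a minimal Python sketch that counts how many postings mention each skill. Everything in it is an assumption made for illustration: the postings are made up, the skill list is arbitrary, and `count_skill_mentions` is a hypothetical helper. Real postings would first need to be scraped and deduplicated, and matching would have to handle multi-word skills and spelling variants.

```python
import re
from collections import Counter

# Hypothetical skill keywords to look for; not taken from any real dataset.
SKILLS = ["python", "tensorflow", "pytorch", "docker", "sql"]

# Made-up job posting snippets, for illustration only.
POSTINGS = [
    "ML Engineer with Python, TensorFlow and Docker experience.",
    "Looking for PyTorch and Python skills; SQL is a plus.",
    "Data platform role: SQL, Docker, Python.",
]


def count_skill_mentions(postings, skills):
    """Count how many postings mention each skill at least once."""
    counts = Counter()
    for text in postings:
        # Lowercase and split into word-like tokens so "Python," matches "python".
        tokens = set(re.findall(r"[a-z0-9+#]+", text.lower()))
        for skill in skills:
            if skill in tokens:
                counts[skill] += 1
    return counts


if __name__ == "__main__":
    counts = count_skill_mentions(POSTINGS, SKILLS)
    for skill, n in counts.most_common():
        print(f"{skill}: {n}/{len(POSTINGS)} postings")
```

Counting postings rather than raw occurrences keeps a single verbose posting from dominating the tally, which seems like a reasonable starting point before any more careful statistics.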

With this post, I added to the growing body of Data Science articles. I hope you found it useful. I wanted to write something accessible to those not currently in the field and I hope my two cents will help you in figuring out whether this niche of ML and Engineering is for you.

Translated from: https://towardsdatascience.com/how-to-become-a-machine-learning-engineer-in-2020-1161aa29261e
