How to cut through the AI hype to become a machine learning engineer

I’m sure you’ve heard of the incredible artificial intelligence applications out there — from programs that can beat the world’s best Go players to self-driving cars.

The problem is that most people get caught up in the AI hype and mix technical discussions with philosophical ones.

If you’re looking to cut through the AI hype and work with practically implemented data models, train towards a data engineer or machine learning engineer position.

Don’t look for interesting AI applications within AI articles. Look for them in data engineering or machine learning tutorials.

These are the steps I took to build a fun little scraper that analyzes gender diversity in different coding bootcamps. It’s also the path I took when researching Springboard’s new AI/ML online bootcamp with a job guarantee.

Here’s a step-by-step guide to getting into the machine learning space, with a critical set of resources attached to each step.

1. Start brushing up on your Python and software development practices

You’ll want to start off by embracing Python, the language of choice for most machine learning engineers.

This handy scripting language is also the tool of choice for most data engineers and data scientists. Most data tools are either written in Python or expose an API that is easy to call from Python.

Thankfully, Python’s syntax is relatively easy to pick up. The language has tons of documentation and training resources. It also includes support for all sorts of programming paradigms from functional programming to object-oriented programming.

The one thing that might be a bit hard to pick up is the indentation Python uses to structure your code. In Python, whitespace really matters: the indentation level, not braces or keywords, decides which statements belong to a loop, a condition, or a function.
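
To make the point about whitespace concrete, here is a minimal sketch. The function name and the pass mark of 50 are made up for illustration; what matters is how the indentation alone marks where each block starts and ends.

```python
def label_scores(scores):
    """Return 'pass' or 'fail' for each score; indentation defines the blocks."""
    labels = []
    for score in scores:
        # Everything indented under the `for` runs once per score.
        if score >= 50:
            labels.append("pass")
        else:
            labels.append("fail")
    # Dedenting back to this level ends the loop, so this runs only once.
    return labels


print(label_scores([72, 41, 50]))
```

If the `return` line were indented one level deeper, it would sit inside the loop and the function would return after the first score, which is exactly the kind of mistake newcomers tend to hit.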

As a machine learning engineer, you’d be working in a team to build complex, often mission-critical applications. So, now is a good time to refresh on software engineering best practices as well.

Learn to use collaborative tools such as GitHub. Get into the habit of writing thorough unit tests for your code using testing frameworks such as nose or pytest. Test your APIs using tools such as Postman. Use CI systems such as Jenkins to catch changes that break your code. Develop good code review skills so you can work better with your future technical colleagues.
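
As a rough idea of what a unit test looks like, here is a small sketch that uses Python’s built-in `unittest` module rather than nose, so that it runs without installing anything. The `normalize` function is a hypothetical helper invented for this example.

```python
import unittest


def normalize(values):
    """Scale numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(normalize([10, 15, 20]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])


# Build and run the suite explicitly so the script does not exit afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the tests would usually live in their own file and be run by a test runner or by the CI system on every commit.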

One thing to read: What is the best Python IDE for data science? Take a quick read-through so you understand which toolset you want to use when working with datasets in Python.

I use the Jupyter Notebook myself, since distributions such as Anaconda ship it together with most of the important data science libraries you’ll use. It comes with an easy, clean interactive interface that allows you to edit your code on the fly.

Jupyter Notebook also comes with extensions that allow you to easily share your results with the world at large. The files generated are also super easy to work with on GitHub.

One thing to do: Pandas Cookbook lets you fork live examples of pandas, one of the most powerful data manipulation libraries. You can quickly work through an example of how to explore a dataset with it.
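
To give a flavor of what working with pandas feels like, here is a tiny sketch that assumes pandas is installed. The enrollment figures are invented for illustration and do not come from the cookbook or from any real bootcamp.

```python
import pandas as pd

# Hypothetical enrollment figures, invented for illustration only.
df = pd.DataFrame(
    {
        "bootcamp": ["A", "A", "B", "B"],
        "year": [2017, 2018, 2017, 2018],
        "students": [40, 60, 30, 50],
    }
)

# Aggregate: total students per bootcamp across all years.
totals = df.groupby("bootcamp")["students"].sum()

# Filter: keep only the rows for 2018.
recent = df[df["year"] == 2018]

print(totals)
print(recent)
```

Grouping and boolean filtering like this cover a surprising share of day-to-day data exploration.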

2. Look into machine learning frameworks and theory

Once you’re playing around with Python and practicing with it, it’s time to start looking at machine learning theory.

You’ll learn what algorithms to use. Having a baseline knowledge of the theory behind machine learning will let you implement models with ease.

One thing to read: A Tour of The Top Ten Algorithms For Machine Learning Newbies will help you get started with the basics. You’ll learn that there isn’t a “free lunch”: no single algorithm gives the best result on every problem, so you’ll have to dive into each algorithm and learn where it works well.
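
The “no free lunch” idea can be illustrated with a toy experiment. The sketch below is not taken from the linked article: it compares a least-squares line with a 1-nearest-neighbour predictor on two small, made-up problems, and each method wins on one of them.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns a predict function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """1-nearest-neighbour: predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


xs = [0, 1, 2, 3, 4, 5]
trend_ys = [2 * x + 1 for x in xs]  # a straight-line relationship
step_ys = [0, 0, 0, 1, 1, 1]        # a sudden jump at x = 3

# Problem 1: extrapolate the trend to x = 10 (true value 21).
line_err_trend = abs(fit_line(xs, trend_ys)(10) - 21)
near_err_trend = abs(fit_nearest(xs, trend_ys)(10) - 21)

# Problem 2: predict the step function at x = 2.2 (true value 0).
line_err_step = abs(fit_line(xs, step_ys)(2.2) - 0)
near_err_step = abs(fit_nearest(xs, step_ys)(2.2) - 0)

print("trend:", line_err_trend, near_err_trend)
print("step: ", line_err_step, near_err_step)
```

The line extrapolates the trend well but smears out the step, while the nearest-neighbour predictor does the opposite. Which one is “better” depends entirely on the problem.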

One thing to do: Play around with the interactive Free Machine Learning in Python Course — develop your Python skills and start implementing algorithms.

3. Start working with datasets and experimenting

You’ve got the tools and theory under your belt. You should think about doing little mini-projects that can help you refine your skills.

One thing to read: Take a look at 19 Free Public Data Sets for Your First Data Science Project and start looking at where you can find different datasets on the web to play around with.

One thing to do: Kaggle Datasets will let you work with lots of publicly available datasets. What’s cool about this collection is you can see how popular certain datasets are. You can also see what other projects have been built with the same dataset.

4. Scale your data skills with Hadoop or Spark

Now that you’re practicing on smaller datasets, you’ll want to learn how to work with Hadoop or Spark. Data engineers work with streaming, real-time production-level data at the terabyte and sometimes petabyte scale. Skill up by learning your way through a big data framework.
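
Big data frameworks are built around the idea of processing partitions of data independently and then merging the partial results. The sketch below imitates that map-and-reduce pattern in plain Python. It does not use the Hadoop or Spark APIs, and the word lists are stand-ins for data that would normally be spread across machines.

```python
from collections import Counter
from functools import reduce

# Stand-ins for data partitions that would live on different machines.
partitions = [
    ["spark", "hadoop", "spark"],
    ["hadoop", "hive"],
    ["spark"],
]


def map_partition(words):
    """Map step: count words within one partition, independently of the others."""
    return Counter(words)


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


partial_counts = [map_partition(p) for p in partitions]
word_counts = reduce(merge_counts, partial_counts, Counter())
print(word_counts)
```

Because each map step only touches its own partition, a real framework can run those steps in parallel on many machines and only move the much smaller partial counts over the network.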

One thing to read: This short article How do Hadoop and Spark Stack Up? will help you walk through both Hadoop and Spark and how they compare and contrast with one another.

One thing to do: If you want to start working with a big data framework right away, the Spark Jupyter notebooks hosted on Databricks offer a tutorial-level introduction to the framework and let you practice with production-level code examples.

5. Work with a deep learning framework like TensorFlow

You’re done exploring machine learning algorithms and working with the different big data tools out there.

Now it’s time to take on the sort of powerful deep learning that has been the focus of new advances. Learn the TensorFlow framework and you’ll be on the cutting edge of machine learning work.

One thing to read: Read What is TensorFlow? to understand what’s going on under the hood of this powerful deep learning framework.
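
One way to build intuition for what happens under the hood is to write the core training loop by hand once. The sketch below is plain Python, not TensorFlow code: it fits a single weight with gradient descent on made-up data, which is the kind of update a deep learning framework automates (along with computing the gradients for you) across millions of weights.

```python
# Toy data generated by y = 2x, so the ideal weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

weight = 0.0
learning_rate = 0.01  # assumed value, chosen small enough to converge here

for step in range(200):
    # Gradient of the mean squared error with respect to the weight.
    gradient = sum(2 * (weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Move the weight a small step against the gradient.
    weight -= learning_rate * gradient

print(round(weight, 4))
```

Here the gradient is derived by hand for one weight. A framework’s main job is to do that derivation automatically for arbitrary models and to run the arithmetic efficiently on GPUs.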

One thing to do: TensorFlow and Deep Learning without a PhD is an interactive course built by Google that pairs theory, presented in slides, with practical coding labs.

6. Start working with big production-level datasets

Now that you’ve worked with deep learning frameworks, you can start working towards large production-level datasets.

As a machine learning engineer, you’ll be making complex engineering decisions on managing large amounts of data and deploying your systems.

That would include collecting data from APIs and through web scraping, knowing when to use SQL and when to use NoSQL databases, and using pipeline frameworks such as Luigi or Airflow.
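
The pieces mentioned above usually come together as an extract, transform, load pipeline. The sketch below shows that shape using only the standard library: a hard-coded JSON string stands in for an API response, and an in-memory SQLite database stands in for a production SQL store. It does not use Luigi or Airflow, and the table and field names are invented for illustration.

```python
import json
import sqlite3

# Hypothetical API response, hard-coded so the sketch runs offline.
raw_response = json.dumps([
    {"id": 1, "city": " Berlin ", "temp_c": "21.5"},
    {"id": 2, "city": "Lagos", "temp_c": "30.0"},
    {"id": 3, "city": "Oslo", "temp_c": None},
])


def extract(payload):
    """Extract step: parse the raw payload into Python records."""
    return json.loads(payload)


def transform(records):
    """Transform step: drop missing readings, tidy whitespace, fix types."""
    return [
        (r["id"], r["city"].strip(), float(r["temp_c"]))
        for r in records
        if r["temp_c"] is not None
    ]


def load(rows, conn):
    """Load step: write the cleaned rows into a SQL table."""
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, city TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(raw_response)), conn)
warm = conn.execute(
    "SELECT city FROM readings WHERE temp_c > 25 ORDER BY city"
).fetchall()
print(warm)
```

A pipeline framework adds scheduling, retries, and dependency tracking around steps like these, but the steps themselves tend to look much like this.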

When you deploy your applications, you might use container-based systems such as Docker for scalability and reliability, and tools such as Flask to create APIs for your application.

One thing to read: 7 Ways to Handle Large Data Files for Machine Learning is a nice theoretical exercise into how you would handle big datasets, and can serve as a handy checklist of tactics to use.

One thing to do: Publicly Available Big Data Sets is a list of places where you can get very large datasets — ready to practice your newfound data engineering skills on.

7. Practice, practice, practice, build towards a portfolio and then a job

Finally, you’ve gotten to a point where you can build working machine learning models. The next step to advance your machine learning career is to find a job with a company that holds those large datasets so you can apply your skills every day to a cutting-edge machine learning problem.

One thing to read: 41 Essential Machine Learning Interview Questions (with answers) will help you practice the knowledge you need to ace a machine learning interview.

One thing to do: Go out and find meetups that are dedicated to machine learning or data engineering on Meetup — it’s a great way to meet peers in the space and potential hiring managers.

Hopefully, this tutorial has helped cut through the hype around AI to something practical and tailored that you can use. If you feel like you need a little bit more, the company I work with, Springboard, offers a career track bootcamp dedicated to AI and machine learning with a job guarantee, and 1:1 mentorship from machine learning experts.

Translated from: https://www.freecodecamp.org/news/how-to-cut-through-the-ai-hype-to-become-a-machine-learning-engineer-b0d2c5e4ae02/