What I’ve Learned in Two Years as a Data Scientist

It has been two years since I started my data science journey. Boy, that was one heck of a roller coaster ride!

There were many highs and lows, and of course, countless cups of coffee and sleepless nights.

I failed a lot, learned a lot, and of course, grew a lot as a data scientist along the journey.

Throughout these two years, from writing on Medium, speaking at meetups and workshops, sharing my experience on LinkedIn, and consulting for clients on data science projects, to my current stage of teaching data science, I have found joy and fulfilment in sharing and teaching to help others in data science and make an impact.

At the end of the day, it all boils down to one simple fact: I’m moving towards my mission of making data science accessible to everyone.

If you’re interested, feel free to check out my previous LinkedIn post on why I decided to transition from being a data scientist to becoming a data science instructor (a.k.a. teacher).

In this article, for the first time, I’ll consolidate everything I’ve learned in two years as a data scientist and condense it into 5 lessons.

If you’re just starting out in data science and wondering what to learn…

Or you’re looking for a job in data science…

Or you’re already working in the data science space…

I hope you’ll find these 5 lessons helpful to you as a data scientist!

Enough talking… Let’s get started!

5 Lessons I’ve Learned in 2 Years as a Data Scientist

1. Storytelling, NOT Presentation.

One of the most profound questions I’ve been asked in my data science career came from a great senior data scientist:

“Admond, what’s the story that we are gonna tell in the meeting later?”

The first time I heard this question, I was stunned for a second.

He didn’t ask what slides I’d prepared.

He didn’t ask what I was gonna share.

He didn’t ask what results I was gonna present.

NONE.

To be honest with you, I didn’t even understand why he put so much emphasis on telling stories instead of stating the facts we already had.

Before I began to appreciate the importance of telling stories, I made tons of mistakes.

Either stakeholders didn’t understand what I was saying, or the insights failed to convince and motivate them to take action.

Once I decided to improve my storytelling skills…

Once I started focusing on telling stories…

Things changed, for real.

Stakeholders and non-technical bosses began to understand what I was delivering, because I was no longer bombarding them with technical jargon and results. They took action.

Facts tell, but stories sell.

If you want to be a good data scientist, focus on technical skills.

If you want to be a great data scientist, focus on storytelling skills.

So… How To Learn Storytelling Skills?

Want to learn storytelling skills? Learn from Vox.

Because they are masters of storytelling. Seriously.

They have always been able to explain complex issues or ideas in an engaging and understandable way.

If this is the first time you’ve heard of Vox, check out their YouTube video below.

Just observe how they explain societal phenomena and issues through storytelling in the most intuitive way possible.

This matters a lot when you present insights or deliver a core message to your audience.

Video: Vox, “How wildlife trade is linked to coronavirus”

2. Data Is Messy, Embrace It.

Forget about having Kaggle-like data in your real working environment, because most of the time you won’t have clean data.

Or worse, sometimes you don’t even have data to begin with, or perhaps you’re just not sure where to get or query the data because it is scattered everywhere.

Data collection and data integrity checks are among the most important steps in any data science project, yet a lot of junior data scientists might be oblivious to that.

The reality is that you need to know where to get your data based on business requirements and the existing data architecture.

You might breathe a sigh of relief after you’ve got the data, but this is where the hard part begins: data integrity.

You need to check the collected data thoroughly, asking hard questions and seeking input from different stakeholders to see whether the data makes sense.

Without the right, accurate data in place to begin with, all of our data cleaning, EDA, machine learning model building, and deployment are simply a luxury.
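To make the idea of an integrity check a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the records, the field names (`order_id`, `amount`, `order_date`) and the rules themselves are hypothetical. In a real project the rules would come out of the hard questions you ask your stakeholders.

```python
from collections import Counter
from datetime import date

# Hypothetical raw records, as they might arrive from several scattered sources.
records = [
    {"order_id": 1, "amount": 120.0, "order_date": date(2020, 3, 1)},
    {"order_id": 2, "amount": None, "order_date": date(2020, 3, 2)},   # missing value
    {"order_id": 2, "amount": 80.0, "order_date": date(2020, 3, 2)},   # duplicate key
    {"order_id": 3, "amount": -15.0, "order_date": date(2020, 3, 5)},  # negative amount
    {"order_id": 4, "amount": 60.0, "order_date": date(2031, 1, 1)},   # date in the future
]


def integrity_report(rows, today):
    """Count rows that break a few simple, assumed business rules."""
    id_counts = Counter(row["order_id"] for row in rows)
    return {
        "rows": len(rows),
        "missing_amount": sum(row["amount"] is None for row in rows),
        "duplicate_ids": sorted(k for k, n in id_counts.items() if n > 1),
        "negative_amount": sum(
            row["amount"] is not None and row["amount"] < 0 for row in rows
        ),
        "future_dates": sum(row["order_date"] > today for row in rows),
    }


report = integrity_report(records, today=date(2020, 6, 1))
for check, result in report.items():
    print(f"{check}: {result}")
```

A report like this doesn’t fix anything by itself, but it gives you specific questions to bring back to the people who own the data.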

3. Soft Skills > Technical Skills

One of the most common questions for beginners in data science is this:

“What are the skills that I need to learn when starting out in data science?”

In my opinion, learning technical skills (programming, statistics, etc.) should be the priority when first starting out in data science.

Once we have a solid foundation in technical skills, we should focus more on building and improving our soft skills (communication, storytelling, etc.).

While this might seem a bit counter-intuitive compared with the usual way of learning data science skills, I truly believe in this approach.

WHY?

You see, data scientists are problem solvers.

We don’t just write some code, build some fancy machine learning models and call it a day.

From understanding a business problem and collecting and visualizing data, to prototyping, fine-tuning, and deploying models to real-world applications, all these steps require teamwork, communication, and storytelling skills to work with team members, manage expectations with stakeholders, and ultimately drive business decisions and actions.

There is a famous quote:

“Without data, you’re just another person with an opinion.”

— W. Edwards Deming

To me, getting data is only the first step. What’s more important is how you use data to drive business decisions and actions to make a real impact. Here is my slightly modified version of the quote:

“Without storytelling skills, you’re just another person with data.”

You can perform the best data analytics in the world.

You can build the best machine learning model in the world.

You can also write the cleanest code in the world.

But if you can’t use your results to drive business decisions and actions, or convince people to use what you’ve built, your results will just sit in your PowerPoint slides without any real impact.

Sad, but true.

4. Interpretable Models Matter, A Lot.

For most businesses, unless you’re working at a cutting-edge technology company, fancy or complex models are typically not the first choice for analytics or predictions.

Your boss and stakeholders want to understand what’s going on behind your results.

Therefore, you need to be able to explain what’s going on behind your results.

For instance, what caused this anomaly to be detected? And why is that so? Does it make sense in the business context? Why is the prediction the way it is? What are the contributing factors to the prediction? Are our assumptions correct?

All the questions above essentially boil down to one simple question:

“What’s the pattern behind the observations?”

Understanding what’s going on behind our models and results is crucial for convincing stakeholders to take action and, in turn, for driving business decisions.

Huge enterprises simply can’t afford to deploy a black-box model in the real world and let it run wild without understanding how it works or when it fails.

And this is exactly why simple models such as decision trees and logistic regression are still being used in industry today.
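To illustrate why a model like logistic regression is easy to explain, here is a small sketch that fits one from scratch on a toy churn dataset and reads the coefficients back as contributing factors. The data, the feature names (`support_tickets`, `tenure_years`) and the churn scenario are all made up for illustration, and the hand-written gradient descent is only there to keep the example self-contained. In practice you would most likely reach for a library implementation.

```python
import math

# Hypothetical toy data: (support_tickets, tenure_years) -> churned (1) or stayed (0).
FEATURES = ["support_tickets", "tenure_years"]
X = [(5, 1), (4, 2), (6, 1), (3, 1), (1, 5), (0, 4), (2, 6), (1, 3)]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, row):
    """Predicted probability of churn for one customer."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_logistic_regression(X, y, lr=0.1, steps=2000):
    """Plain batch gradient descent on the log-loss (no regularization)."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(steps):
        # Prediction errors are computed once per step, before any update.
        errors = [predict_proba(weights, bias, row) - label for row, label in zip(X, y)]
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * row[j] for e, row in zip(errors, X)) / n
        bias -= lr * sum(errors) / n
    return weights, bias


weights, bias = fit_logistic_regression(X, y)

# Each coefficient's sign says which way a feature pushes the prediction.
for name, w in zip(FEATURES, weights):
    direction = "raises" if w > 0 else "lowers"
    print(f"{name}: coefficient {w:+.2f} ({direction} the predicted churn probability)")

print(f"P(churn | 5 tickets, 1 year) = {predict_proba(weights, bias, (5, 1)):.2f}")
```

The point is not the model itself but the conversation it enables: a stakeholder can look at a coefficient’s sign and size and tell you whether it makes sense in the business context.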

5. Always See The Big Picture

I made this huge mistake when I was first starting out in data science.

I focused too much on code and errors but somehow lost sight of the big picture that was truly important: end-to-end pipeline integration in production and how the solution performed in the real world.

In other words, I was so fixated on the technical part that I over-optimized my code and models without making a real impact on the overall project or business.

Unfortunately, I learned this the hard way.

Fortunately, I now use what I’ve learned to remind myself to always see the big picture.

Hopefully, you’ll begin to realize the importance of seeing the big picture in your day-to-day work as a data scientist.

The first step is to understand the business domain and the problems you’re solving.

Be clear about what you or your team aims to achieve in a project, and understand how your role fits into the big picture and how the different small pieces work together towards the common goals.

Final Thoughts

Thank you for reading.

My data science journey has definitely been a tough one, but I enjoyed the ride and learned a lot along the way.

And I’m still learning each and every day.

I hope you found this article helpful in some way and will apply the lessons here in your work as a data scientist.

Now that I’ve moved on to become a data science instructor, you can also expect more data science content from me in the future to help you learn and get into this field.

Check out my other articles if you want to learn more about data science.

If you’re interested in learning how to go into data science, feel free to check out this article, How To Go Into Data Science, where I compiled a list of common questions (or challenges) faced by beginners in data science and answered them with guidance.

I hope you enjoyed reading this article and I look forward to having you as part of the data science community.

Remember, keep learning and never stop improving.

As always, if you have any questions or comments, feel free to leave your feedback below, or reach me on LinkedIn. Till then, see you in the next post!
