Opinion

Motivation

Like most people, the first tool I used when I started learning data science was Jupyter Notebook. Most online data science courses use Jupyter Notebook as the teaching medium. This makes sense because it is easier for beginners to start writing code in Jupyter Notebook's cells than to write a script with classes and functions.

Another reason Jupyter Notebook is so common in data science is that it makes it easy to explore and plot data. When we press Shift + Enter, we immediately see the results of the code, which makes it easy to check whether the code works.

However, I noticed several drawbacks of Jupyter Notebook as I worked on more data science projects:

  • Unorganized: As my code grows, it becomes increasingly difficult to keep track of what I have written. No matter how many Markdown cells I use to separate the notebook into sections, the disconnected cells make it difficult to concentrate on what the code does.

  • Difficult to experiment: You may want to try different ways of processing your data, or different parameters for your machine learning algorithm, to see whether the accuracy improves. But every time you experiment with a new method, you need to rerun the entire notebook. This is time-consuming, especially when the processing or the training takes a long time to run.

  • Not ideal for reproducibility: If you want to use new data with a slightly different structure, it is difficult to identify the source of an error in your notebook.

  • Difficult to debug: When you get an error, it is difficult to know whether it is caused by the code or by a change in the data. And if the error is in the code, which part of the code is causing the problem?

  • Not ideal for production: Jupyter Notebook does not play very well with other tools. It is not easy to run code from a Jupyter Notebook as part of a workflow that uses other tools.

I knew there must be a better way to handle my code, so I decided to give scripts a try. These are the benefits I found when using scripts:

Organized

The cells in Jupyter Notebook make it difficult to organize the code into different parts. With a script, we can create several small functions, each with a name that states what the code does.
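A minimal sketch of what such small functions could look like. The function names and the toy records are hypothetical (they are not from the original project), and plain dictionaries stand in for a DataFrame so the example runs without extra libraries:

```python
# Hypothetical example: each function name states what the code does.

def drop_columns(rows, columns_to_drop):
    """Return copies of the rows without the listed keys."""
    return [
        {key: value for key, value in row.items() if key not in columns_to_drop}
        for row in rows
    ]


def drop_missing(rows, required_columns):
    """Keep only the rows where every required column has a value."""
    return [
        row for row in rows
        if all(row.get(column) is not None for column in required_columns)
    ]


rows = [
    {"sentiment": 0.4, "code": 200, "usage": 12},
    {"sentiment": None, "code": 500, "usage": 7},
]
clean_rows = drop_missing(drop_columns(rows, ["code"]), ["sentiment"])
```

Reading the last line is enough to know which steps ran and in what order.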

Better yet, if these functions belong to the same category, such as functions that process the data, we can put them in the same class!

Whenever we want to process our data, we know that the functions in the class Preprocess can be used for this purpose.
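The article does not show the body of Preprocess, so the following is only a sketch of how such a class might be laid out. The constructor arguments mirror the names used later in the article; the method names and the list-of-dictionaries data format are assumptions made for illustration:

```python
class Preprocess:
    """Hypothetical sketch: groups the data-processing steps in one class."""

    def __init__(self, columns_to_drop, datetime_column, dropna_columns):
        self.columns_to_drop = columns_to_drop
        self.datetime_column = datetime_column
        self.dropna_columns = dropna_columns

    def drop_columns(self, rows):
        """Remove the columns we do not need."""
        return [
            {k: v for k, v in row.items() if k not in self.columns_to_drop}
            for row in rows
        ]

    def drop_missing(self, rows):
        """Drop rows that have no value in any of the required columns."""
        return [
            row for row in rows
            if all(row.get(col) is not None for col in self.dropna_columns)
        ]

    def truncate_datetime(self, rows):
        """Keep only 'YYYY-MM-DDTHH:MM' from the datetime column."""
        return [
            {**row, self.datetime_column: row[self.datetime_column][:16]}
            for row in rows
        ]

    def process(self, rows):
        """Run all steps in a fixed order."""
        rows = self.drop_columns(rows)
        rows = self.drop_missing(rows)
        return self.truncate_datetime(rows)


rows = [
    {"crawled": "2020-07-30T23:25:31.036+03:00", "sentiment": 0.4, "code": 200},
    {"crawled": "2020-07-30T23:26:02.120+03:00", "sentiment": None, "code": 200},
]
processor = Preprocess(
    columns_to_drop=["code"],
    datetime_column="crawled",
    dropna_columns=["sentiment"],
)
processed = processor.process(rows)
```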

Encourage Experimentation

When we want to experiment with a different approach to preprocessing the data, we can add a function, or disable one by commenting out its call, without being afraid of breaking the code! Even if we happen to break the code, we know exactly where to fix it.
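As an illustration (the step names are hypothetical, not taken from the original project), disabling one step of a preprocessing function could look like this:

```python
def strip_whitespace(texts):
    """Remove leading and trailing whitespace from every text."""
    return [text.strip() for text in texts]


def lowercase(texts):
    """Convert every text to lowercase."""
    return [text.lower() for text in texts]


def remove_empty(texts):
    """Drop texts that are empty strings."""
    return [text for text in texts if text]


def preprocess(texts):
    texts = strip_whitespace(texts)
    # texts = lowercase(texts)  # experiment: disabled to see whether casing matters
    texts = remove_empty(texts)
    return texts


result = preprocess(["  Good ", "", "BAD"])
```

Because every step takes a list and returns a list, removing one line leaves the other steps untouched.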

We can also experiment with different parameters by changing the inputs of the functions. For example, if we want to see how different methods of resampling a Pandas series affect the results, we can just switch from method_of_resample='sum' to method_of_resample='average'. How neat!
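The function that takes method_of_resample is not shown in the article, so here is a rough pure-Python stand-in for the idea. The function name, the data format, and the hourly grouping are all assumptions; in Pandas itself the averaging aggregation is named mean, so a real implementation would need to map 'average' to it:

```python
from collections import defaultdict
from statistics import mean


def resample_by_hour(readings, method_of_resample="sum"):
    """Group (timestamp, value) pairs by hour and aggregate each group.

    Hypothetical stand-in for a Pandas resample, written with the standard
    library only so that it runs anywhere.
    """
    aggregate = {"sum": sum, "average": mean}[method_of_resample]
    groups = defaultdict(list)
    for timestamp, value in readings:
        groups[timestamp[:13]].append(value)  # key is 'YYYY-MM-DDTHH'
    return {hour: aggregate(values) for hour, values in groups.items()}


readings = [
    ("2020-07-30T23:05", 2),
    ("2020-07-30T23:40", 4),
    ("2020-07-31T00:10", 6),
]
hourly_sum = resample_by_hour(readings, method_of_resample="sum")
hourly_average = resample_by_hour(readings, method_of_resample="average")
```

Switching the experiment is a one-argument change, and an unsupported method name fails immediately with a KeyError instead of silently doing something else.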

You can still use functions in a notebook, but when the number of functions gets really big, you might want to split them across different notebooks, and importing functions across notebooks is not easy.

Ideal for Reproducibility

With classes and functions, we can make the code general enough to work with other data.

For example, if we want to drop different columns in our new data, we just need to change columns_to_drop to a list of the columns we want to drop, and the code will run smoothly!

columns_to_drop = config.columns.to_drop
datetime_column = config.columns.datetime.sentiment
dropna_columns = config.columns.drop_na
processor = Preprocess(columns_to_drop, datetime_column, dropna_columns)

I can also create a pipeline that specifies the steps to process the data and train the model! Once I have a pipeline, all I need to do is call

pipeline.fit_transform(data)

to apply the same processing to both the training and test data.
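The article does not say which pipeline object is used. scikit-learn's Pipeline offers fit_transform and transform methods of this kind, and the sketch below imitates that interface with a small hand-written class so that it runs without extra libraries. The step classes are hypothetical:

```python
class FillMissing:
    """Replace None with the mean of the values seen during fit."""

    def fit(self, values):
        present = [v for v in values if v is not None]
        self.fill_value = sum(present) / len(present)
        return self

    def transform(self, values):
        return [self.fill_value if v is None else v for v in values]


class MinMaxScale:
    """Scale numbers using the minimum and maximum seen during fit."""

    def fit(self, values):
        self.low, self.high = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.high - self.low) or 1  # avoid dividing by zero
        return [(v - self.low) / span for v in values]


class Pipeline:
    """Minimal stand-in for a pipeline: runs the steps in order."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, values):
        for step in self.steps:
            values = step.fit(values).transform(values)
        return values

    def transform(self, values):
        for step in self.steps:
            values = step.transform(values)
        return values


pipeline = Pipeline([FillMissing(), MinMaxScale()])
train = pipeline.fit_transform([0, None, 10])  # learns fill value and range
test = pipeline.transform([None, 20])          # reuses what was learned
```

One design note: in this sketch the test data goes through transform rather than fit_transform, so it is processed with the fill value and range learned from the training data. That is the usual scikit-learn convention and keeps the two sets comparable.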

Easy to Debug

With functions, it is easier to test whether each function produces the output we expect. We can quickly spot where in the code we should make a change to get the output we want.

import numpy as np

def extract_date_hour_minute(string: str):
    '''Extract date, hour, and minute from a datetime string'''
    try:
        return string[:16]
    except TypeError:
        # Non-string values (such as NaN) cannot be sliced
        return np.nan

def test_extract_date_hour_minute():
    '''Test whether the function extracts date, hour, and minute'''
    string = '2020-07-30T23:25:31.036+03:00'
    assert extract_date_hour_minute(string) == '2020-07-30T23:25'

If all of the tests pass but there is still an error when running our code, we know the data is where we should look next.

For example, after passing the test above, I still got a TypeError when running the script, which told me that my data had null values. I just needed to take care of those for the code to run smoothly.
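Once the null values are known to be the cause, a second small test can pin down the behavior we want for them. This is a sketch: the test name is my own, and float("nan") is used in place of np.nan so the example needs no NumPy:

```python
import math


def extract_date_hour_minute(string):
    """Return 'YYYY-MM-DDTHH:MM', or NaN when the value is not a string."""
    try:
        return string[:16]
    except TypeError:
        # None and NaN cannot be sliced, so they end up here
        return float("nan")


def test_extract_date_hour_minute_handles_missing_values():
    """Missing values should come back as NaN instead of raising."""
    assert math.isnan(extract_date_hour_minute(None))
    assert math.isnan(extract_date_hour_minute(float("nan")))


test_extract_date_hour_minute_handles_missing_values()
```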

Ideal for Production

We can import functions from multiple scripts and combine them in another script, like this:

from preprocess import preprocess
from model import run_model
from predict import predict

def main(config):
    df = preprocess(config)
    df = run_model(config)
    df, df_scale, min_day, max_day, accuracy = predict(df, config)

We can also add a config file to control the values of the variables. This saves us from wasting time tracking down a specific variable in the code just to change its value.

columns:
  to_drop:
    #- keywords
    #- entities
    - code
    - error
    - warnings
  binary_columns:
    - sentiment
    - Diff
  datetime:
    time: Date
    sentiment: crawled
  drop_na:
    - sentiment
    - usage
    - crawled
    - emotion
  to_predict: sentiment
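The article reads these values with attribute access such as config.columns.to_drop but does not show how the config object is built. As a rough stand-in for whatever config library does the loading, the sketch below turns a nested dictionary into an attribute-style object. The helper to_namespace is hypothetical, and the dictionary copies only part of the config above:

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert nested dictionaries into attribute-style objects."""
    if isinstance(value, dict):
        return SimpleNamespace(
            **{key: to_namespace(item) for key, item in value.items()}
        )
    return value


# In practice this dictionary would come from parsing the config file.
raw_config = {
    "columns": {
        "to_drop": ["code", "error", "warnings"],
        "datetime": {"time": "Date", "sentiment": "crawled"},
        "drop_na": ["sentiment", "usage", "crawled", "emotion"],
    }
}

config = to_namespace(raw_config)
columns_to_drop = config.columns.to_drop
datetime_column = config.columns.datetime.sentiment
dropna_columns = config.columns.drop_na
```

Changing which columns are dropped then means editing the config, not the code.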

We can also easily add tools to track experiments, such as MLflow, or tools to handle configuration, such as Hydra!

I Didn't Like the Idea of Not Using Jupyter Notebook until I Pushed Myself out of My Comfort Zone

I used to use Jupyter Notebook all the time. When some data scientists advised me to switch from Jupyter Notebook to scripts to prevent some of the problems listed above, I didn't understand and felt reluctant to do so. I didn't like the uncertainty of not being able to see the outcome right away, as I could when I ran a cell.

But the disadvantages of Jupyter Notebook grew as I started my first real data science project at my new company, so I decided to push myself out of my comfort zone and experiment with scripts.

In the beginning, I felt uncomfortable, but I started to notice the benefits of using scripts. I felt more organized once my code was split into functions, classes, and multiple scripts, with each script serving a different purpose such as preprocessing, training, or testing.

So Are You Suggesting That I Stop Using Jupyter Notebook?

Don't get me wrong. I still use Jupyter Notebook if my code is small and I don't plan to put it into production. I use Jupyter Notebook when I want to explore and visualize the data. I also use it to explain how to use some Python libraries. For example, I mostly use Jupyter Notebooks in this repository as the medium to explain the code mentioned in all of my articles.

If you don't feel comfortable coding everything in scripts, you can use both scripts and Jupyter Notebook for different purposes. For example, you can create classes and functions in scripts and then import them into the notebook so that the notebook is less messy.

Another alternative is to turn the notebook into a script after writing the notebook. I personally prefer not to do this, because it often takes me longer to reorganize the notebook's code afterwards, for example by putting it into functions and classes and writing test functions.

I find that writing a small function and then writing a small test function for it is faster and safer. If I later want to speed up my code with a new Python library, I can use the test function I already wrote to make sure the code still works as expected.

That being said, I believe there are more ways to address the disadvantages of Jupyter Notebook than the ones I mentioned here, such as how Netflix puts notebooks into production and schedules them to run at a certain time.

Conclusion

Everybody has their own way of making their workflow more efficient, and for me, it is to leverage the utility of scripts. If you have just switched from Jupyter Notebook to scripts, writing code in scripts might not feel intuitive, but trust me, you will get used to it eventually.

Once that happens, you will start to see the many benefits of scripts over messy Jupyter Notebooks and will want to write most of your code in scripts.

If you don't feel comfortable with a big change, start small.

Big changes start with small steps

I like to write about basic data science concepts and play with different algorithms and data science tools. You can connect with me on LinkedIn and Twitter.

Star this repo if you want to check out the code for all of the articles I have written. Follow me on Medium to stay informed about my latest data science articles.

Translated from: https://towardsdatascience.com/5-reasons-why-you-should-switch-from-jupyter-notebook-to-scripts-cb3535ba9c95

