Top R Libraries for Data Science

Data science is the discipline of making data useful

When we talk about the top programming language for Data Science, we often find Python to be the best fit for the topic. Sure, Python is undoubtedly an excellent choice for a vast majority of Data Science-centric tasks, but there’s another programming language that was built specifically to provide superior number-crunching capabilities for Data Science, and that is R.

In addition to robust statistical computing, R offers a huge collection of more than 16,000 libraries that cater to the needs of Data Scientists, Data Miners, and Statisticians alike. In this article, we shed some light on a handful of the top R libraries for Data Science.

Best R Libraries for Data Science

R is extremely popular among Data Miners and Statisticians, and part of the reason is the extensive range of libraries available for it. These tools and functions greatly simplify statistical work, making tasks such as data manipulation, visualization, web crawling, and Machine Learning a breeze. Some of these libraries are briefly explained below:

1. dplyr

The dplyr package, also known as a grammar of data manipulation, provides frequently used tools for data manipulation, including the following functions:

  • filter(): keeps the rows that match given criteria
  • mutate(): adds new variables that are functions of existing variables
  • select(): picks variables by name
  • summarise(): reduces multiple values to a single summary
  • arrange(): changes the ordering of the rows

Additionally, you can use the group_by() function to perform any of these operations by group. If you’re keen on trying dplyr, you can either get it as part of the tidyverse or install the package directly with the command install.packages("dplyr").
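
As a minimal sketch of how these verbs chain together, assuming dplyr is installed and using the mtcars dataset that ships with R (the derived column name kpl and the 100 hp threshold are illustrative choices, not from the article):

```r
library(dplyr)

# mtcars ships with base R; hp, mpg, and cyl are its own column names.
result <- mtcars %>%
  filter(hp > 100) %>%                 # keep cars with more than 100 horsepower
  mutate(kpl = mpg * 0.425) %>%        # new variable derived from mpg (approx. km per litre)
  select(cyl, mpg, kpl) %>%            # keep only the columns we need
  group_by(cyl) %>%                    # one group per cylinder count
  summarise(mean_kpl = mean(kpl),      # collapse each group to a single row
            n = n()) %>%
  arrange(desc(mean_kpl))              # highest average first

print(result)
```

Each verb takes a data frame and returns a data frame, which is what makes the pipeline style possible.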

2. tidyr

tidyr is one of the core packages in the tidyverse ecosystem and, as the name suggests, it is used to tidy up messy data. If you’re wondering what tidy data is, let me clear that up: in tidy data, every column is a variable, every row is an observation, and every cell is a single value.

According to the tidyr documentation, tidy data is a standard way of storing data that is used throughout the tidyverse, and it can help you save time and be more productive in your analysis. You can get the package as part of the tidyverse or install it with the command install.packages("tidyr").
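
To make the definition of tidy data concrete, here is a small sketch, assuming tidyr is installed. The table, its column names, and its values are made up for illustration:

```r
library(tidyr)

# Hypothetical "wide" table: the year is a variable hidden in the column
# names, so the data is not tidy.
wide <- data.frame(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(12, 25),
  check.names = FALSE
)

# pivot_longer() gives every variable its own column
# and every observation its own row.
tidy <- pivot_longer(
  wide,
  cols      = c("2019", "2020"),
  names_to  = "year",
  values_to = "cases"
)

print(tidy)
```

After the reshape, each row describes one country in one year, which is the shape most tidyverse functions expect.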

3. ggplot2

ggplot2 is among the top R libraries for data visualization and is actively used by thousands of users around the world to create compelling charts, graphs, and plots. The reason behind this popularity is that ggplot2 was created to simplify the visualization process: it takes minimal input from the developer, such as the data to visualize, the style, and the primitives to use, and leaves the rest to the library.

The result is a graph that presents complex statistics at a glance. If you’re looking to add more customizability to your charts, an IDE such as RStudio can make fine-tuning them easier. You can get your hands on ggplot2 via the tidyverse collection or as a standalone library with the command install.packages("ggplot2").

Read the ggplot2 documentation to learn about its functions.
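
The "minimal input" idea can be sketched as follows, assuming ggplot2 is installed. The choice of the mtcars dataset and of the axis labels is illustrative:

```r
library(ggplot2)

# The three inputs are the data (mtcars), the style as an aesthetic mapping
# (aes), and a geometric primitive (geom_point). Scales, axes, and the legend
# are derived by the library.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

print(p)  # draws the plot on the active graphics device
```

Swapping geom_point() for another primitive, such as geom_smooth(), changes the kind of plot without touching the data or the mapping.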

4. lubridate

R is an excellent programming language for Data Science, but there are certain areas where it may feel incomplete. One such area is the handling of dates and times. Anyone working extensively with dates and times in R may find its built-in capabilities cumbersome.

To overcome this, we have a handy package called lubridate. The package not only handles the standard dates and times in R, but also offers enhancements such as time periods, daylight saving time, leap days, support for various time zones, fast time parsing, and many helper functions. Should your project require you to work with dates and times, you can get lubridate as part of the tidyverse or install just the package with the command install.packages("lubridate").

Read the lubridate documentation for more details.
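
A few of these features can be sketched as below, assuming lubridate is installed. The dates and the time zone are arbitrary examples chosen to exercise a leap year and a daylight saving boundary:

```r
library(lubridate)

d <- ymd("2020-01-31")                 # parse a year-month-day string

is_leap    <- leap_year(d)             # 2020 is a leap year
plus_30d   <- d + days(30)             # period arithmetic: 2020-03-01
plus_month <- d %m+% months(1)         # rolls back to the last valid day: 2020-02-29

# 01:30 on the morning US clocks moved forward in 2020, still on standard time
t     <- ymd_hms("2020-03-08 01:30:00", tz = "America/New_York")
t_utc <- with_tz(t, "UTC")             # same instant, shown in another time zone

print(list(is_leap, plus_30d, plus_month, t_utc))
```

Note that plain month addition (d + months(1)) would yield NA here because February 31 does not exist, which is why the sketch uses the %m+% operator.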

5. lattice

lattice is another elegant yet powerful data visualization library focused on multivariate data. What makes this library special is that, apart from handling regular visualizations, lattice also comes prepared with support for nonstandard situations and requirements. As an implementation of Trellis graphics for R, it allows you to create Trellis graphs and offers options to tune them according to your requirements. lattice comes with R by default, and there is also an extension package called latticeExtra, which might come in handy if you want to go beyond the core features that lattice provides.
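
A Trellis display for multivariate data might look like the following sketch, which uses the mtcars dataset that ships with R. The variable choices and labels are illustrative:

```r
library(lattice)

# The "| factor(cyl)" part of the formula is the conditioning variable:
# lattice draws one scatter-plot panel per cylinder count.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

print(p)  # lattice objects are drawn when printed
```

All panels share the same axes, which makes it easy to compare the relationship between weight and mileage across groups.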

6. mlr

Machine Learning in R (mlr) is a library that was released in 2013 and was updated in 2019 to mlr3, with newer techniques, a better architecture, and a new core design. As of now, the library provides a framework for classification, regression, support vector machines, and many other Machine Learning activities.

mlr3 is targeted at Machine Learning practitioners and researchers and facilitates the benchmarking and deployment of various Machine Learning algorithms without much hassle. Those looking to extend or even combine the existing learners and fine-tune the best technique for a task will find mlr3 to be a great option. mlr3 can be installed using the command install.packages("mlr3").

The mlr3 documentation covers its wide range of functions.
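
As a rough sketch of what benchmarking a learner looks like, assuming mlr3 is installed. The iris task, the rpart learner, and the 3-fold setup are illustrative choices, not recommendations from the article:

```r
library(mlr3)

set.seed(1)                                # make the cross-validation folds reproducible

task    <- tsk("iris")                     # built-in example classification task
learner <- lrn("classif.rpart")            # decision tree learner bundled with mlr3
cv      <- rsmp("cv", folds = 3)           # 3-fold cross-validation

rr  <- resample(task, learner, cv)         # train and predict on every fold
acc <- rr$aggregate(msr("classif.acc"))    # mean accuracy across the folds

print(acc)
```

Other learners can be swapped in by changing the lrn() key. Many of them live in companion packages such as mlr3learners.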

7. caret

Short for Classification And REgression Training, the caret library provides several functions that streamline model training for tricky regression and classification problems. caret comes with additional tools and functions for tasks like data splitting, variable importance estimation, feature selection, pre-processing, and many more. With caret, you can also measure the performance of your models and fine-tune model behavior through parameters such as tuneLength or tuneGrid. The package itself is easy to use and only loads the necessary components as it goes. The library can be installed with the command install.packages("caret").
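
A minimal sketch of data splitting, training, and tuning with caret follows. It assumes the caret and rpart packages are installed. The iris dataset, the rpart method, and the 5-fold setup are illustrative choices:

```r
library(caret)

set.seed(42)

# Data splitting: hold out 20% of the rows, stratified by class.
idx       <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# Model training: 5-fold cross-validation over 3 candidate values of the
# tuning parameter, which tuneLength asks caret to pick automatically.
fit <- train(Species ~ ., data = train_set,
             method     = "rpart",
             trControl  = trainControl(method = "cv", number = 5),
             tuneLength = 3)

# Performance measurement on the held-out rows.
pred <- predict(fit, newdata = test_set)
acc  <- mean(pred == test_set$Species)

print(fit$bestTune)
print(acc)
```

Passing an explicit tuneGrid data frame instead of tuneLength gives full control over which parameter values are tried.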

8. esquisse

esquisse is not a visualization library per se, but an add-in for the powerful data visualization library ggplot2. You might be wondering why you would need this on top of ggplot2, so let me clear that up. ggplot2 is already smart enough, but if you need an additional layer of intuitiveness for your visualizations, esquisse is the way to go. esquisse lets you simply drag and drop the required data and choose the desired customization options, and there you have it: a tailored plot built in a short time and ready to export to your application of choice. With esquisse, you can create visualizations such as bar plots, histograms, scatter plots, and plots of sf objects. You can add esquisse to your environment using install.packages("esquisse").

9. shiny

shiny is a web application framework from RStudio that allows developers to create interactive web applications using R with minimal web development background. With shiny, you can build web pages, interactive visualizations, and dashboards, and even embed widgets in R documents. shiny can also be easily extended with CSS themes, JavaScript actions, and htmlwidgets for added customization. It comes with a host of attractive built-in widgets for presenting plots, tables, and the output of R objects, and whatever you code in shiny goes live the same instant, eliminating those annoying frequent page refreshes. If you’re sold on the features and want to give it a shot, you can get shiny using the command install.packages("shiny").
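
A small app might be sketched as follows, assuming shiny is installed. The input and output ids ("bins", "hist") and the choice of a histogram are made up for illustration:

```r
library(shiny)

# User interface: one built-in input widget and one plot output.
ui <- fluidPage(
  titlePanel("Histogram of mtcars$mpg"),
  sliderInput("bins", "Number of bins:", min = 5, max = 30, value = 10),
  plotOutput("hist")
)

# Server logic: the plot is re-rendered whenever input$bins changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(mtcars$mpg, breaks = input$bins,
         main = NULL, xlab = "Miles per gallon")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the app in a browser
```

No HTML, CSS, or JavaScript is written by hand here, since the page is generated from the R function calls in ui.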

10. Rcrawler

If you’re looking for a tool that scrapes data off websites and delivers it in an understandable format, look no further: Rcrawler is the right option for you. With Rcrawler’s web crawling, data scraping, and data mining capabilities, you can not only crawl through websites and scrape data, but also analyze the network structure of any website, including its internal and external hyperlinks. In case you’re wondering why not use rvest, the Rcrawler package is a step up from rvest in that it goes through all the pages on a website and extracts the data, which can be extremely helpful when you are trying to gather all the information from one source in one go. The package can be installed with the command install.packages("Rcrawler").

11. DT

The DT package acts as an R wrapper for the JavaScript library DataTables. DT allows you to transform the data in an R matrix or data frame into an interactive table on an HTML page, which makes searching, sorting, and filtering the data easy. The package works by letting its main function, datatable(), create an HTML widget for the R object. DT allows further fine-tuning of your tables via the options argument, along with some additional customization, all without going deep into the coding. The DT package can be installed using the command install.packages("DT").
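
The datatable() call described above can be sketched as follows, assuming DT is installed. The iris dataset and the specific option values are illustrative:

```r
library(DT)

# datatable() wraps an R data frame in an HTML widget.
# filter = "top" adds per-column filter boxes, and the options list is
# passed through to the underlying DataTables library.
tbl <- datatable(
  iris,
  filter  = "top",
  options = list(pageLength = 5)   # show five rows per page
)

tbl  # in RStudio or an R Markdown document, this renders the interactive table
```

Sorting and searching are available in the rendered table by default, without any extra code.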

12. plotly

If you want to create interactive visualizations that steal the show, plotly would be perfect for you. With plotly, you can create stunning, publication-worthy visualizations from a diverse collection of charts and graphs, such as scatter and line plots, bar charts, pie charts, histograms, heatmaps, contour plots, and time series. You name it and plotly can make it. Built on top of the plotly.js library, plotly visualizations can also be displayed in web applications via Dash, shown in Jupyter Notebooks, or saved as HTML files. If you’re interested in trying out the package, you can install it using the command install.packages("plotly").
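
An interactive scatter plot can be sketched like this, assuming plotly is installed. The mtcars dataset, the variable mapping, and the title are illustrative choices:

```r
library(plotly)

# The formula syntax (~wt) maps data frame columns to plot attributes.
fig <- plot_ly(
  data  = mtcars,
  x     = ~wt,
  y     = ~mpg,
  color = ~factor(cyl),
  type  = "scatter",
  mode  = "markers"
)
fig <- layout(fig, title = "Fuel economy by weight")

fig  # renders an interactive chart with hover, zoom, and pan
# To save as an HTML file, one option is htmlwidgets::saveWidget(fig, "scatter.html")
```

The resulting object is an HTML widget, so it can also be embedded in an R Markdown document or a shiny app.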

Other Noteworthy R Libraries

  • Bioconductor
  • knitr
  • janitor
  • randomForest
  • e1071
  • stringr
  • data.table
  • rmarkdown
  • rvest

Conclusion

Throughout this article, we covered some of the top R libraries for common Data Science tasks, such as visualization, data manipulation, Machine Learning model training, and optimization. We know that this is not an exhaustive list, and it by no means covers the entirety of the vast ecosystem of libraries R has. CRAN, the repository for all things R, has thousands of equally capable and resourceful libraries for your specific needs, with detailed information and documentation. Should you ever need to find a library, we highly recommend you give CRAN a shot.

Note: This article represents only my personal opinion, and you have every right to disagree with it. If I’ve missed any important library, do let me know in the comments section.

I hope you’ve found this article useful!

About the Author

Claire D. is a Content Crafter and Marketer at Digitalogy, a tech sourcing and custom matchmaking marketplace that connects people around the globe with pre-screened, top-notch developers and designers based on their specific needs. Connect with Digitalogy on LinkedIn, Twitter, and Instagram.

Original article: https://towardsdatascience.com/top-r-libraries-for-data-science-29b4e9f4907c
