4 Less-Known Yet Very Functional Pandas Operations

Pandas, the most widely used data analysis and manipulation library, provides numerous functions and methods for working with data. Some of them are used more frequently than others because of the tasks they perform.

In this post, we will cover 4 pandas operations that are less frequently used but still very functional.

Let’s start with importing NumPy and Pandas.

import numpy as np
import pandas as pd

1. Factorize

The factorize function provides a simple way to encode categorical variables, which is a required step in most machine learning techniques.

Here is a categorical variable from a customer churn dataset.

df = pd.read_csv('/content/Churn_Modelling.csv')
df['Geography'].value_counts()

France     5014
Germany    2509
Spain      2477
Name: Geography, dtype: int64

We can encode the categories (i.e. convert to numbers) with just one line of code.

df['Geography'], unique_values = pd.factorize(df['Geography'])

The factorize function returns the converted values along with an index of categories.

df['Geography'].value_counts()

0    5014
2    2509
1    2477
Name: Geography, dtype: int64

unique_values

Index(['France', 'Spain', 'Germany'], dtype='object')
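The churn dataset is not included with this post, so here is a minimal sketch of the same idea on a small, made-up Series (the values in geo are an assumption for illustration). It shows that codes are assigned in order of first appearance, how to map codes back to labels, and how the sort option changes the numbering.

```python
import pandas as pd

# Hypothetical stand-in for the Geography column
geo = pd.Series(['France', 'Spain', 'Germany', 'France', 'Germany', 'France'])

# Codes follow the order in which each category first appears
codes, uniques = pd.factorize(geo)

# Map the integer codes back to the original labels
decoded = uniques.take(codes)

# With sort=True the categories are sorted before codes are assigned
sorted_codes, sorted_uniques = pd.factorize(geo, sort=True)

print(codes)
print(list(uniques))
print(sorted_codes)
print(list(sorted_uniques))
```

Keeping the returned uniques around is what makes the encoding reversible later, for example when reporting model results with the original labels.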

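Missing values in the original data are not treated as a category. They are encoded with a sentinel value, which is -1 by default. In pandas versions before 2.0, you can choose a different sentinel with the na_sentinel parameter. The parameter was deprecated in pandas 1.5 and removed in 2.0; newer versions offer use_na_sentinel instead, which only switches the -1 sentinel on or off.

A = ['a','b','a','c','b', np.nan]
A, unique_values = pd.factorize(A)
A
array([ 0,  1,  0,  2,  1, -1])

# na_sentinel works only in pandas versions before 2.0
A = ['a','b','a','c','b', np.nan]
A, unique_values = pd.factorize(A, na_sentinel=99)
A
array([ 0,  1,  0,  2,  1, 99])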
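Since the na_sentinel argument is not available in every pandas release (it was removed in pandas 2.0), one version-independent sketch is to keep the default sentinel of -1 and remap it afterwards. The replacement value 99 and the variable names below are chosen only for illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series(['a', 'b', 'a', 'c', 'b', np.nan])

# Default behaviour: missing values are encoded as -1
codes, uniques = pd.factorize(values)

# Replace the default sentinel with a custom one after the fact
custom_codes = np.where(codes == -1, 99, codes)

print(codes)
print(custom_codes)
print(list(uniques))
```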

2. Categorical

pd.Categorical can be used to create a categorical variable.

A = pd.Categorical(['a','c','b','a','c'])

The categories attribute is used to access the categories:

A.categories
Index(['a', 'b', 'c'], dtype='object')

We can only assign new values that belong to one of the existing categories. Otherwise, pandas raises an error (a ValueError in older versions, a TypeError in more recent ones).

A[0] = 'd'
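As a small sketch of how this plays out, the snippet below catches the error and then registers the new category with add_categories before assigning it. Both exception types are caught because the type raised depends on the pandas version.

```python
import pandas as pd

A = pd.Categorical(['a', 'c', 'b', 'a', 'c'])

# Assigning a value outside the existing categories fails
raised = False
try:
    A[0] = 'd'
except (TypeError, ValueError) as err:
    raised = True
    print('Assignment rejected:', err)

# Register the new category first, then the assignment succeeds
A = A.add_categories(['d'])
A[0] = 'd'

print(list(A))
print(list(A.categories))
```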

We can also specify the data type using the dtype parameter. The default is CategoricalDtype, which is usually the best choice because of its low memory consumption.

Let’s do an example to compare memory usage.

This is the memory usage in bytes for each column.

countries = pd.Categorical(df['Geography'])
df['Geography'] = countries

The categorical column uses about one-eighth of the memory of the original column. The amount of memory saved increases further on larger datasets, especially when there are very few categories.
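The memory figures from the churn dataset are not reproduced here, so the following is a rough sketch on synthetic data (the column contents, size, and seed are assumptions). It compares an int64 column of factorized codes, the same column as strings, and the categorical version, using memory_usage.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for an encoded Geography column: 10,000 int64 codes
rng = np.random.default_rng(0)
int_col = pd.Series(rng.integers(0, 3, size=10_000))

# The same information stored as strings and as a categorical
str_col = int_col.map({0: 'France', 1: 'Spain', 2: 'Germany'})
cat_col = pd.Series(pd.Categorical(int_col))

int_bytes = int_col.memory_usage(index=False)
str_bytes = str_col.memory_usage(index=False, deep=True)
cat_bytes = cat_col.memory_usage(index=False)

print('int64 column:      ', int_bytes)
print('string column:     ', str_bytes)
print('categorical column:', cat_bytes)
print('dtype of the stored codes:', cat_col.cat.codes.dtype)
```

With only three categories, pandas can store each value as a one-byte code, which is where the saving relative to eight-byte integers comes from.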

3. Interval

pd.Interval returns an immutable object representing an interval.

iv = pd.Interval(left=1, right=5, closed='both')
3 in iv
True
5 in iv
True

The closed parameter indicates if the bounds are included. The values it takes are “both”, “left”, “right”, and “neither”. The default value is “right”.

iv = pd.Interval(left=1, right=5, closed='neither')
5 in iv
False
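To see all four options side by side, this short sketch checks whether each endpoint belongs to the interval for every value of closed. The results dictionary is just a helper for the illustration.

```python
import pandas as pd

results = {}
for closed in ['both', 'left', 'right', 'neither']:
    iv = pd.Interval(left=1, right=5, closed=closed)
    # Record whether the left and right bounds are members of the interval
    results[closed] = (1 in iv, 5 in iv)

for closed, (has_left, has_right) in results.items():
    print(f'{closed:8s} contains 1: {has_left}, contains 5: {has_right}')
```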

The interval comes in handy when we are working with date-time data. We can easily check if the dates are in a specified interval.

date_iv = pd.Interval(left=pd.Timestamp('2019-10-02'),
                      right=pd.Timestamp('2019-11-08'))
date = pd.Timestamp('2019-10-10')
date in date_iv
True
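The same membership test can be applied to a whole column of dates. The sketch below uses a made-up Series of timestamps and keeps only those that fall inside the interval. Note that with the default closed='right', the left bound itself is excluded.

```python
import pandas as pd

date_iv = pd.Interval(left=pd.Timestamp('2019-10-02'),
                      right=pd.Timestamp('2019-11-08'))

# Hypothetical dates, including both bounds of the interval
dates = pd.Series(pd.to_datetime([
    '2019-09-30', '2019-10-02', '2019-10-10', '2019-11-08', '2019-12-01',
]))

# Keep only the dates that are members of the interval
inside = dates[dates.apply(lambda d: d in date_iv)]

print(inside.dt.strftime('%Y-%m-%d').tolist())
```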

4. Wide_to_long

The wide_to_long function converts wide dataframes to long ones. This task can also be done with the melt function, but wide_to_long offers a less flexible yet more user-friendly way.

Consider the following sample dataframe.
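The sample dataframe itself is not reproduced in this text. A minimal stand-in can be built as follows; the names and scores are made up, and the column layout (names, A1 to A3, B1 to B3) is inferred from the description and the wide_to_long call in this section.

```python
import pandas as pd

# Hypothetical wide-format data: one row per person, one column per score type
df = pd.DataFrame({
    'names': ['John', 'Jane', 'Emily'],
    'A1': [4, 7, 5],
    'A2': [6, 8, 9],
    'A3': [5, 6, 7],
    'B1': [2, 3, 1],
    'B2': [3, 2, 4],
    'B3': [1, 4, 2],
})

print(df)
```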

It contains different scores for some people. We want to reshape this dataframe so that the score types are represented as row values rather than as separate columns. For instance, there are 3 score types under A (A1, A2, A3). After we convert the dataframe, there will be only one column (A), and the types (1, 2, 3) will be represented with row values.

pd.wide_to_long(df, stubnames=['A','B'], i='names', j='score_type')

The stubnames parameter indicates the names of the new columns that will contain the values. The column names in the wide format need to start with the stubnames. The i parameter is the column to be used as the id variable, and the j parameter is the name of the new column that will contain the subcategories (the suffixes of the wide column names).

The returned dataframe has a multi-level index but we can convert it to a normal index by applying the reset_index function.

pd.wide_to_long(df, stubnames=['A','B'], i='names', j='score_type').reset_index()
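To make the comparison with melt concrete, here is a sketch that runs wide_to_long on a made-up dataframe (the names and scores are assumptions) and then rebuilds the same result with melt. The melt route needs extra steps to split the column names and pivot the stubs back into columns, which illustrates the convenience of wide_to_long.

```python
import pandas as pd

# Hypothetical wide-format data
df = pd.DataFrame({
    'names': ['John', 'Jane', 'Emily'],
    'A1': [4, 7, 5], 'A2': [6, 8, 9], 'A3': [5, 6, 7],
    'B1': [2, 3, 1], 'B2': [3, 2, 4], 'B3': [1, 4, 2],
})

# One call with wide_to_long
via_wide = pd.wide_to_long(
    df, stubnames=['A', 'B'], i='names', j='score_type'
).reset_index()

# The same reshape with melt takes several steps
melted = df.melt(id_vars='names', var_name='column', value_name='score')
melted['stub'] = melted['column'].str[0]                       # 'A' or 'B'
melted['score_type'] = melted['column'].str[1:].astype(int)    # 1, 2 or 3
via_melt = melted.pivot(
    index=['names', 'score_type'], columns='stub', values='score'
).reset_index()
via_melt.columns.name = None

# Put both results in the same row and column order for comparison
cols = ['names', 'score_type', 'A', 'B']
via_wide = via_wide.sort_values(['names', 'score_type'])[cols].reset_index(drop=True)
via_melt = via_melt.sort_values(['names', 'score_type'])[cols].reset_index(drop=True)

print(via_wide)
```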

Pandas owes its success and predominance in the field of data science and machine learning to the variety and flexibility of its functions and methods. Some methods perform basic tasks, whereas others are more detailed and specific.

There are usually multiple ways to do a task with Pandas, which makes it easy to find an approach that fits a specific task well.

Thank you for reading. Please let me know if you have any feedback.

Translated from: https://towardsdatascience.com/4-less-known-yet-very-functional-pandas-operations-46dcf2bd9688
