Pandas cut() Function Examples
1. Pandas cut() Function
The pandas cut() function segregates array elements into separate bins. It works only on one-dimensional array-like objects.
2. Usage of Pandas cut() Function
The cut() function is useful when we have a large amount of scalar data and want to perform some statistical analysis on it.
For example, let's say we have an array of numbers between 1 and 20. We want to divide them into two bins, (1, 10] and (10, 20], and add labels such as "Lows" and "Highs". We can do this easily with the pandas cut() function.
Furthermore, we can then apply functions to the elements that fall into a specific bin or carry a specific label.
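The idea above can be sketched as follows. This is a minimal illustration, not the article's own code: the sample values and the variable names (nums, groups, means) are assumptions chosen so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical sample data, chosen for illustration only.
nums = pd.Series([2, 4, 9, 12, 16, 20], name="num")

# Two bins, (1, 10] and (10, 20], labelled "Lows" and "Highs".
groups = pd.cut(nums, bins=[1, 10, 20], labels=["Lows", "Highs"])

# Apply a function (here the mean) to the elements of each labelled bin.
means = nums.groupby(groups, observed=False).mean()
print(means)
```

Any other aggregation, such as count() or max(), could be used in place of mean() in the same way.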
3. Pandas cut() Function Syntax
The cut() function syntax is:
cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates="raise")
- x is the input array to be binned. It must be one-dimensional.
- bins defines the bin edges for the segmentation.
- right indicates whether each bin includes its right edge; the default value is True.
- labels specifies the labels for the returned bins.
- retbins specifies whether to also return the bins.
- precision specifies the precision at which to store and display the bin labels.
- include_lowest specifies whether the first interval should be left-inclusive.
- duplicates specifies what to do if the bin edges are not unique: raise a ValueError ("raise") or drop the non-unique edges ("drop").
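A short sketch of how a few of these parameters behave in practice. The sample values and variable names are assumptions made for illustration; only pd.cut() and its documented parameters are taken from the list above.

```python
import pandas as pd

# Hypothetical sample values; note that 1 equals the lowest bin edge.
values = pd.Series([1, 7, 25, 25, 60, 100])

# include_lowest=True makes the first interval left-inclusive, so 1 is binned.
# retbins=True additionally returns the bin edges that were used.
binned, edges = pd.cut(values, bins=[1, 25, 50, 100],
                       include_lowest=True, retbins=True)
print(binned)
print(edges)

# With the defaults, a value equal to the lowest edge falls outside every bin
# and is reported as NaN.
default_binned = pd.cut(values, bins=[1, 25, 50, 100])
print(default_binned.isna().tolist())

# duplicates="drop" discards the repeated edge instead of raising ValueError.
deduped = pd.cut(values, bins=[1, 25, 25, 100], duplicates="drop")
print(deduped.cat.categories)
```

The NaN case is worth remembering whenever the data can contain the lowest bin edge itself.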
4. Pandas cut() Function Examples
Let's look at some examples of the pandas cut() function. I will use NumPy to generate random numbers to populate the DataFrame object, so the output will differ from run to run.
4.1) Segment Numbers into Bins
import pandas as pd
import numpy as np

df_nums = pd.DataFrame({'num': np.random.randint(1, 100, 10)})
print(df_nums)

df_nums['num_bins'] = pd.cut(x=df_nums['num'], bins=[1, 25, 50, 75, 100])
print(df_nums)

print(df_nums['num_bins'].unique())
Output:
   num
0   80
1   40
2   25
3    9
4   66
5   13
6   63
7   33
8   20
9   60

   num   num_bins
0   80  (75, 100]
1   40   (25, 50]
2   25    (1, 25]
3    9    (1, 25]
4   66   (50, 75]
5   13    (1, 25]
6   63   (50, 75]
7   33   (25, 50]
8   20    (1, 25]
9   60   (50, 75]

[(75, 100], (25, 50], (1, 25], (50, 75]]
Categories (4, interval[int64]): [(1, 25] < (25, 50] < (50, 75] < (75, 100]]
Notice that 25 is part of the bin (1, 25]. That is because the right edge of each bin is included by default. If you don't want that, pass the right=False parameter to the cut() function.
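To make the effect of right=False concrete, here is a minimal sketch that bins the same values both ways. The sample values and the names right_closed and left_closed are assumptions for illustration.

```python
import pandas as pd

# Hypothetical values, including 25 and 50, which sit exactly on bin edges.
nums = pd.Series([9, 25, 50, 99])
edges = [1, 25, 50, 75, 100]

# Default (right=True): intervals are closed on the right, e.g. (1, 25].
right_closed = pd.cut(nums, bins=edges)

# right=False: intervals are closed on the left instead, e.g. [25, 50).
left_closed = pd.cut(nums, bins=edges, right=False)

print(pd.DataFrame({
    "num": nums,
    "right=True": right_closed,
    "right=False": left_closed,
}))
```

With right=False, a value that equals an edge moves into the next bin up, so 25 lands in [25, 50) rather than (1, 25].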
4.2) Adding Labels to Bins
import pandas as pd
import numpy as np

df_nums = pd.DataFrame({'num': np.random.randint(1, 20, 10)})
print(df_nums)

df_nums['nums_labels'] = pd.cut(x=df_nums['num'], bins=[1, 10, 20], labels=['Lows', 'Highs'], right=False)
print(df_nums)

print(df_nums['nums_labels'].unique())
Since we want 10 to be part of Highs, we specify right=False in the cut() function call.
Output:
   num
0    5
1   16
2    6
3   13
4    2
5   10
6   18
7   10
8    2
9   18

   num nums_labels
0    5        Lows
1   16       Highs
2    6        Lows
3   13       Highs
4    2        Lows
5   10       Highs
6   18       Highs
7   10       Highs
8    2        Lows
9   18       Highs

[Lows, Highs]
Categories (2, object): [Lows < Highs]
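Once the labels are in place, they can be counted or used as a filter. The sketch below reuses the ten sample values from the output of this example, hard-coded so the result is reproducible. The variable names are assumptions, and labels=False is an additional documented option of cut() that returns integer bin indices instead of labels.

```python
import pandas as pd

# The ten values from the sample output, hard-coded for reproducibility.
nums = pd.Series([5, 16, 6, 13, 2, 10, 18, 10, 2, 18], name="num")

labeled = pd.cut(nums, bins=[1, 10, 20], labels=["Lows", "Highs"], right=False)

# Count how many values fall under each label, in bin order.
counts = labeled.value_counts(sort=False)
print(counts)

# Select only the elements labelled "Highs".
highs = nums[labeled == "Highs"]
print(highs.tolist())

# labels=False returns the integer position of each bin instead of a label.
codes = pd.cut(nums, bins=[1, 10, 20], labels=False, right=False)
print(codes.tolist())
```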
Translated from: https://www.journaldev.com/33394/pandas-cut-function-examples