Python distinct counting: how to quickly compute the running distinct count of an array in Python

Problem description

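Given list=['a', 'a', 'b', 'c', 'a'], compute for each position the number of distinct elements seen up to and including that position. That is:

At position 0, list[0:1] = ['a'], so the distinct count is 1

At position 1, list[0:2] = ['a', 'a'], so the distinct count is 1

At position 2, list[0:3] = ['a', 'a', 'b'], so the distinct count is 2

At position 3, list[0:4] = ['a', 'a', 'b', 'c'], so the distinct count is 3

At position 4, list[0:5] = ['a', 'a', 'b', 'c', 'a'], so the distinct count is 3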

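How can we produce the result list [1, 1, 2, 3, 3] quickly? For small inputs almost any approach works. For very large inputs, the approaches below differ markedly in efficiency.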

Method comparison

Method 1

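Using len(set(list)) is the most natural idea: deduplicate each prefix and take its length. However, it rebuilds a set for every prefix, so its cost grows quadratically and it becomes slow once the data gets large.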

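import time

import pandas as pd
import numpy as np


def method1(list_value):
    # Distinct count of the prefix ending at position i (inclusive),
    # so the slice must stop at i + 1, not i
    def _get_setlen(i, list_value):
        return len(set(list_value[0: i + 1]))

    y = [_get_setlen(i, list_value) for i in range(len(list_value))]
    return y

Method 2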

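Looking more closely at the logic, we only need to check whether the current value already appears earlier in the list. If it does, the count equals the previous position's count. Otherwise, it is one more. This avoids rebuilding a set for every prefix and is indeed much faster. It is still not fast enough, though, because the "in" check scans the whole prefix each time.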

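def method2(list_value):
    # y[i] holds the distinct count of list_value[0:i+1]
    y = np.zeros(len(list_value), dtype=int)
    if len(list_value) == 0:
        return y
    y[0] = 1
    for i in range(1, len(list_value)):
        # Same count as before if the value already appeared, otherwise one more
        if list_value[i] in list_value[0: i]:
            y[i] = y[i - 1]
        else:
            y[i] = y[i - 1] + 1
    return y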
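Method 2 still scans the prefix on every "in" check, so it remains quadratic in the worst case. The sketch below applies the same incremental idea but keeps a set of values already seen, so each membership test is an average constant-time hash lookup. This is a swapped-in variation, not one of the four original methods, and the name running_distinct_count is hypothetical:

```python
def running_distinct_count(values):
    """Return the distinct-element count of each prefix values[0:i+1].

    Hypothetical helper: keeps a set of values seen so far, so each step
    is an average O(1) hash lookup instead of a scan of the prefix.
    """
    seen = set()
    counts = []
    for v in values:
        seen.add(v)  # no-op if v was already seen
        counts.append(len(seen))
    return counts


print(running_distinct_count(['a', 'a', 'b', 'c', 'a']))  # [1, 1, 2, 3, 3]
```

The elements only need to be hashable. NumPy integer scalars, such as those produced by np.random.randint, qualify.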

Method 3

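Next I tried a higher-performance library such as numpy/pandas. This method follows Method 1's idea but deduplicates with value_counts. Deduplicating every prefix one at a time is still far too slow, and this method is the slowest of the four in the timing results.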

def method3(list_value):
    list_value = pd.Series(list_value)

    # value_counts() has one row per distinct value in the prefix 0..i (inclusive)
    def _get_setlen(i, list_value):
        return len(list_value.iloc[0: i + 1].value_counts())

    y = [_get_setlen(i, list_value) for i in range(len(list_value))]
    return y

Method 4
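pandas can avoid the per-prefix value_counts loop entirely. The sketch below assumes pandas is installed and uses a hypothetical name, method3_vectorized. Series.duplicated() (default keep='first') flags every repeat of an earlier value, so its negation marks first occurrences. A cumulative sum of those marks is the running distinct count:

```python
import pandas as pd


def method3_vectorized(list_value):
    """Running distinct count in one pandas pass (hypothetical helper)."""
    s = pd.Series(list_value)
    # duplicated() is True for every value that already appeared earlier,
    # so ~duplicated() is True exactly at first occurrences
    first_occurrence = ~s.duplicated()
    # Summing the flags as 1/0 gives the distinct count of each prefix
    return first_occurrence.astype(int).cumsum().tolist()


print(method3_vectorized(['a', 'a', 'b', 'c', 'a']))  # [1, 1, 2, 3, 3]
```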

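Thinking about the logic once more, we can first build an intermediate array [1, 0, 1, 1, 0] that marks whether each position is the first occurrence of its element. A cumulative sum over that array then gives the answer.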

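def method4(list_value):
    values = list(list_value)  # convert once, not once per distinct value
    y = np.zeros(len(values), dtype=int)
    unique_values = list(set(values))
    # Position of the first occurrence of each distinct value
    _index = [values.index(v) for v in unique_values]
    y[_index] = 1
    # Running total of first occurrences = distinct count of each prefix
    y = y.cumsum()
    return y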
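Method 4 still calls list.index once per distinct value, which costs O(n·k) for k distinct values. The sketch below finds all first occurrences in a single vectorized call. It assumes NumPy's np.unique with return_index=True, which returns the index of the first occurrence of each unique value in the input. The name method4_unique is hypothetical:

```python
import numpy as np


def method4_unique(list_value):
    """Running distinct count via first-occurrence marks (hypothetical helper)."""
    arr = np.asarray(list_value)
    # first_idx[j] is the position where the j-th unique value first appears
    _, first_idx = np.unique(arr, return_index=True)
    marks = np.zeros(len(arr), dtype=int)
    marks[first_idx] = 1  # the intermediate array, e.g. [1, 0, 1, 1, 0]
    return marks.cumsum()


print(method4_unique(['a', 'a', 'b', 'c', 'a']))  # [1 1 2 3 3]
```

np.unique sorts the data, so this runs in O(n log n) regardless of how many distinct values there are. It also requires the elements to be mutually comparable, so a mix of strings and numbers will not work.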

Efficiency comparison

if __name__ == "__main__":
    # 1000 random integers drawn from 0..19
    list_value = np.random.randint(0, 20, 1000)

    for method in [method1, method2, method3, method4]:
        start_time = time.time()
        result = method(list_value)
        print('method: %s, time cost: %s' % (method.__name__, time.time() - start_time))

Output:

method: method1, time cost: 0.04685497283935547

method: method2, time cost: 0.006494283676147461

method: method3, time cost: 0.6175949573516846

method: method4, time cost: 0.0009829998016357422

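Relying on library routines, and above all analyzing and abstracting the problem, can greatly improve computational efficiency. In this run, Method 4 was roughly 48 times faster than Method 1 and about 630 times faster than Method 3.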
