Analysis of the pandas reindex error: index must be monotonic increasing or decreasing

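While using pandas today I wrote the statement nd = d.reindex(index=ni, columns=nc, method='bfill'), which raised the error in the title.

Here d is the original DataFrame, ni holds the new index labels, and nc holds the new column labels.

Other blogs suggest a fix, which is to pull the method argument out and apply the fill in a separate call, but they do not explain why the error occurs, so I traced it myself.

My first suspicion was that columns=nc and index=ni cannot be used together, so I removed each of them in turn.

That narrowed the problem down to columns and method.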
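The elimination step can be sketched roughly as follows. The frame d and the labels ni and nc here are hypothetical stand-ins (the original data is not shown in this post); integer labels are used, and the column labels are deliberately unsorted.

```python
import pandas as pd

# Hypothetical data, not the original: note the unsorted column labels.
d = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    index=[0, 2, 3],
    columns=[30, 10, 20],
)
ni = [0, 1, 2, 3]   # new row labels (assumed)
nc = [10, 15, 20]   # new column labels (assumed)


def try_reindex(**axes):
    """Call reindex with method='bfill' and report 'ok' or the error text."""
    try:
        d.reindex(method="bfill", **axes)
        return "ok"
    except ValueError as exc:
        return str(exc)


results = {
    "index only": try_reindex(index=ni),
    "columns only": try_reindex(columns=nc),
    "index and columns": try_reindex(index=ni, columns=nc),
}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

With this data only the calls that pass columns together with method fail, which matches the narrowing described above.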

Looking at the source code, the error is raised here:

    def _searchsorted_monotonic(self, label, side: str_t = "left"):
        if self.is_monotonic_increasing:
            return self.searchsorted(label, side=side)
        elif self.is_monotonic_decreasing:
            # np.searchsorted expects ascending sort order, have to reverse
            # everything for it to work (element ordering, search side and
            # resulting value).
            pos = self[::-1].searchsorted(
                label, side="right" if side == "left" else "left"
            )
            return len(self) - pos

        raise ValueError("index must be monotonic increasing or decreasing")

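The comment is clear enough: np.searchsorted expects ascending order, so a descending index has to be reversed first. If the index is neither ascending nor descending, the ValueError is raised.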
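To make the three branches concrete, here is a pure-Python sketch of the same logic. It is only an illustration: the standard-library bisect functions stand in for np.searchsorted, and the function name and the plain-list input are my own assumptions, not pandas code.

```python
from bisect import bisect_left, bisect_right


def searchsorted_monotonic(values, label, side="left"):
    """Illustrative stand-in for Index._searchsorted_monotonic on a list."""
    pairs = list(zip(values, values[1:]))
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    search = {"left": bisect_left, "right": bisect_right}

    if increasing:
        return search[side](values, label)
    if decreasing:
        # Reverse into ascending order, flip the search side,
        # then map the position back to the original ordering.
        flipped = "right" if side == "left" else "left"
        pos = search[flipped](values[::-1], label)
        return len(values) - pos
    raise ValueError("index must be monotonic increasing or decreasing")


print(searchsorted_monotonic([10, 20, 30], 25))  # ascending input
print(searchsorted_monotonic([30, 20, 10], 25))  # descending input
try:
    searchsorted_monotonic([30, 10, 20], 25)     # neither: raises
except ValueError as exc:
    print(exc)
```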

The searchsorted() it calls is:

    @doc(_shared_docs["searchsorted"], klass="Index")
    def searchsorted(self, value, side="left", sorter=None) -> np.ndarray:
        return algorithms.searchsorted(self._values, value, side=side, sorter=sorter)

which in turn uses numpy.searchsorted().

Going one level up:

    @final
    def _get_fill_indexer_searchsorted(
        self, target: Index, method: str_t, limit: int | None = None
    ) -> np.ndarray:
        """
        Fallback pad/backfill get_indexer that works for monotonic decreasing
        indexes and non-monotonic targets.
        """
        if limit is not None:
            raise ValueError(
                f"limit argument for {repr(method)} method only well-defined "
                "if index and target are monotonic"
            )

        side = "left" if method == "pad" else "right"

        # find exact matches first (this simplifies the algorithm)
        indexer = self.get_indexer(target)
        nonexact = indexer == -1
        indexer[nonexact] = self._searchsorted_monotonic(target[nonexact], side)
        if side == "left":
            # searchsorted returns "indices into a sorted array such that,
            # if the corresponding elements in v were inserted before the
            # indices, the order of a would be preserved".
            # Thus, we need to subtract 1 to find values to the left.
            indexer[nonexact] -= 1
            # This also mapped not found values (values of 0 from
            # np.searchsorted) to -1, which conveniently is also our
            # sentinel for missing values
        else:
            # Mark indices to the right of the largest value as not found
            indexer[indexer == len(self)] = -1
        return indexer

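The docstring says this method is the fallback pad/backfill get_indexer, used for monotonic decreasing indexes and non-monotonic targets.

Reading the comments further down: when side == "left", searchsorted returns "indices into a sorted array such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved", so 1 has to be subtracted to find the value to the left. When np.searchsorted returns 0, meaning there is no value to the left, the result becomes -1, which conveniently doubles as the sentinel for missing values.

As seen above, side is "left" when method is "pad" and "right" otherwise. When it is "right", every position beyond the largest value (that is, equal to len(self)) is set to -1 (not found).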
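The index arithmetic described above can be sketched in plain Python. This is a simplified illustration, not the pandas implementation: it assumes an ascending list of unique labels, uses bisect in place of np.searchsorted, and the function name fill_indexer is hypothetical.

```python
from bisect import bisect_left, bisect_right


def fill_indexer(index, target, method):
    """Positions in `index` used to fill each label in `target` (-1 = none)."""
    side = "left" if method == "pad" else "right"
    result = []
    for label in target:
        if label in index:
            # Exact matches are resolved first, as in the pandas code.
            result.append(index.index(label))
        elif side == "left":
            # Insertion point minus 1 is the neighbour on the left;
            # an insertion point of 0 becomes -1 (nothing to the left).
            result.append(bisect_left(index, label) - 1)
        else:
            pos = bisect_right(index, label)
            # Past the largest label there is nothing to the right.
            result.append(-1 if pos == len(index) else pos)
    return result


index = [10, 20, 30]
target = [5, 10, 15, 35]
print(fill_indexer(index, target, "pad"))       # [-1, 0, 0, 2]
print(fill_indexer(index, target, "backfill"))  # [0, 0, 1, -1]
```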

One more level up:

    @final
    def _get_fill_indexer(
        self, target: Index, method: str_t, limit: int | None = None, tolerance=None
    ) -> np.ndarray:

        target_values = target._get_engine_target()

        if self.is_monotonic_increasing and target.is_monotonic_increasing:
            engine_method = (
                self._engine.get_pad_indexer
                if method == "pad"
                else self._engine.get_backfill_indexer
            )
            indexer = engine_method(target_values, limit)
        else:
            indexer = self._get_fill_indexer_searchsorted(target, method, limit)
        if tolerance is not None and len(self):
            indexer = self._filter_indexer_tolerance(target_values, indexer, tolerance)
        return indexer

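When both self and target are monotonically increasing, this method calls get_pad_indexer or get_backfill_indexer depending on method. Otherwise it falls back to _get_fill_indexer_searchsorted() shown above.

Going up once more, the picture becomes clear: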

    def _get_indexer(
        self,
        target: Index,
        method: str_t | None = None,
        limit: int | None = None,
        tolerance=None,
    ) -> np.ndarray:
        if tolerance is not None:
            tolerance = self._convert_tolerance(tolerance, target)

        if not is_dtype_equal(self.dtype, target.dtype):
            dtype = self._find_common_type_compat(target)

            this = self.astype(dtype, copy=False)
            target = target.astype(dtype, copy=False)
            return this.get_indexer(
                target, method=method, limit=limit, tolerance=tolerance
            )

        if method in ["pad", "backfill"]:
            indexer = self._get_fill_indexer(target, method, limit, tolerance)
        elif method == "nearest":
            indexer = self._get_nearest_indexer(target, limit, tolerance)
        else:
            indexer = self._engine.get_indexer(target._get_engine_target())

        return ensure_platform_int(indexer)

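This part handles the tolerance and brings both sides to a common dtype. The method we came from is reached through the branch on method.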
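The effect of the method branch can be observed through the public Index.get_indexer. The labels below are made up for illustration.

```python
import pandas as pd

idx = pd.Index([10, 20, 30])
target = [5, 20, 25, 35]

# No method: only exact matches are found, everything else is -1.
exact = idx.get_indexer(target).tolist()
# pad: use the position of the previous label.
pad = idx.get_indexer(target, method="pad").tolist()
# backfill: use the position of the next label.
backfill = idx.get_indexer(target, method="backfill").tolist()

print(exact)     # [-1, 1, -1, -1]
print(pad)       # [-1, 1, 1, 2]
print(backfill)  # [0, 1, 2, -1]
```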

Continuing upward:

    @Appender(_index_shared_docs["get_indexer"] % _index_doc_kwargs)
    @final
    def get_indexer(
        self,
        target,
        method: str_t | None = None,
        limit: int | None = None,
        tolerance=None,
    ) -> np.ndarray:
        # returned ndarray is np.intp
        method = missing.clean_reindex_fill_method(method)
        target = self._maybe_cast_listlike_indexer(target)

        self._check_indexing_method(method, limit, tolerance)

        if not self._index_as_unique:
            raise InvalidIndexError(self._requires_unique_msg)

        if not self._should_compare(target) and not is_interval_dtype(self.dtype):
            # IntervalIndex get special treatment bc numeric scalars can be
            #  matched to Interval scalars
            return self._get_indexer_non_comparable(target, method=method, unique=True)

        if is_categorical_dtype(self.dtype):
            # _maybe_cast_listlike_indexer ensures target has our dtype
            #  (could improve perf by doing _should_compare check earlier?)
            assert is_dtype_equal(self.dtype, target.dtype)

            indexer = self._engine.get_indexer(target.codes)
            if self.hasnans and target.hasnans:
                loc = self.get_loc(np.nan)
                mask = target.isna()
                indexer[mask] = loc
            return indexer

        if is_categorical_dtype(target.dtype):
            # potential fastpath
            # get an indexer for unique categories then propagate to codes via take_nd
            # get_indexer instead of _get_indexer needed for MultiIndex cases
            #  e.g. test_append_different_columns_types
            categories_indexer = self.get_indexer(target.categories)

            indexer = algos.take_nd(categories_indexer, target.codes, fill_value=-1)

            if (not self._is_multi and self.hasnans) and target.hasnans:
                # Exclude MultiIndex because hasnans raises NotImplementedError
                # we should only get here if we are unique, so loc is an integer
                # GH#41934
                loc = self.get_loc(np.nan)
                mask = target.isna()
                indexer[mask] = loc

            return ensure_platform_int(indexer)

        pself, ptarget = self._maybe_promote(target)
        if pself is not self or ptarget is not target:
            return pself.get_indexer(
                ptarget, method=method, limit=limit, tolerance=tolerance
            )

        return self._get_indexer(target, method, limit, tolerance)

The handling at the top does not matter much here. It is enough to know that it ends up calling _get_indexer.

Next:

    def reindex(
        self, target, method=None, level=None, limit=None, tolerance=None
    ) -> tuple[Index, np.ndarray | None]:
        """
        Create index with target's values.

        Parameters
        ----------
        target : an iterable

        Returns
        -------
        new_index : pd.Index
            Resulting index.
        indexer : np.ndarray[np.intp] or None
            Indices of output values in original index.
        """
        # GH6552: preserve names when reindexing to non-named target
        # (i.e. neither Index nor Series).
        preserve_names = not hasattr(target, "name")

        # GH7774: preserve dtype/tz if target is empty and not an Index.
        target = ensure_has_len(target)  # target may be an iterator

        if not isinstance(target, Index) and len(target) == 0:
            target = self[:0]
        else:
            target = ensure_index(target)

        if level is not None:
            if method is not None:
                raise TypeError("Fill method not supported if level passed")
            _, indexer, _ = self._join_level(target, level, how="right")
        else:
            if self.equals(target):
                indexer = None
            else:
                if self._index_as_unique:
                    indexer = self.get_indexer(
                        target, method=method, limit=limit, tolerance=tolerance
                    )
                else:
                    if method is not None or limit is not None:
                        raise ValueError(
                            "cannot reindex a non-unique index "
                            "with a method or limit"
                        )
                    indexer, _ = self.get_indexer_non_unique(target)

        if preserve_names and target.nlevels == 1 and target.name != self.name:
            target = target.copy()
            target.name = self.name

        return target, indexer

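According to the docstring, this creates an index with target's values. The parameter is the iterable target, and the return value is the resulting new_index together with indexer, the positions of the output values in the original index.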
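A small illustration of that return value, using made-up labels:

```python
import pandas as pd

idx = pd.Index([10, 20, 30])

# Without a method, labels missing from the original index map to -1.
new_index, indexer = idx.reindex([20, 25, 30])
print(list(new_index), indexer.tolist())    # [20, 25, 30] [1, -1, 2]

# With method="pad", a missing label takes the position of the previous label.
_, pad_indexer = idx.reindex([20, 25, 30], method="pad")
print(pad_indexer.tolist())                 # [1, 1, 2]
```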

After that come these two:

    def _reindex_columns(
        self,
        new_columns,
        method,
        copy: bool,
        level: Level,
        fill_value=None,
        limit=None,
        tolerance=None,
    ):
        new_columns, indexer = self.columns.reindex(
            new_columns, method=method, level=level, limit=limit, tolerance=tolerance
        )
        return self._reindex_with_indexers(
            {1: [new_columns, indexer]},
            copy=copy,
            fill_value=fill_value,
            allow_dups=False,
        )

    def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy):
        frame = self

        columns = axes["columns"]
        if columns is not None:
            frame = frame._reindex_columns(
                columns, method, copy, level, fill_value, limit, tolerance
            )

        index = axes["index"]
        if index is not None:
            frame = frame._reindex_index(
                index, method, copy, level, fill_value, limit, tolerance
            )

        return frame

Finally we arrive at reindex itself (with a very long docstring):

    def reindex(self: FrameOrSeries, *args, **kwargs) -> FrameOrSeries:
        """
        Conform {klass} to new index with optional filling logic.

        Places NA/NaN in locations having no value in the previous index. A new object
        is produced unless the new index is equivalent to the current one and
        ``copy=False``.

        Parameters
        ----------
        {optional_labels}
        {axes} : array-like, optional
            New labels / index to conform to, should be specified using
            keywords. Preferably an Index object to avoid duplicating data.
        {optional_axis}
        method : {{None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}}
            Method to use for filling holes in reindexed DataFrame.
            Please note: this is only applicable to DataFrames/Series with a
            monotonically increasing/decreasing index.

            * None (default): don't fill gaps
            * pad / ffill: Propagate last valid observation forward to next
              valid.
            * backfill / bfill: Use next valid observation to fill gap.
            * nearest: Use nearest valid observations to fill gap.

        copy : bool, default True
            Return a new object, even if the passed indexes are the same.
        level : int or name
            Broadcast across a level, matching Index values on the
            passed MultiIndex level.
        fill_value : scalar, default np.NaN
            Value to use for missing values. Defaults to NaN, but can be any
            "compatible" value.
        limit : int, default None
            Maximum number of consecutive elements to forward or backward fill.
        tolerance : optional
            Maximum distance between original and new labels for inexact
            matches. The values of the index at the matching locations most
            satisfy the equation ``abs(index[indexer] - target) <= tolerance``.

            Tolerance may be a scalar value, which applies the same tolerance
            to all values, or list-like, which applies variable tolerance per
            element. List-like includes list, tuple, array, Series, and must be
            the same size as the index and its dtype must exactly match the
            index's type.

        Returns
        -------
        {klass} with changed index.

        See Also
        --------
        DataFrame.set_index : Set row labels.
        DataFrame.reset_index : Remove row labels or move them to new columns.
        DataFrame.reindex_like : Change to same indices as other DataFrame.

        Examples
        --------
        ``DataFrame.reindex`` supports two calling conventions

        * ``(index=index_labels, columns=column_labels, ...)``
        * ``(labels, axis={{'index', 'columns'}}, ...)``

        We *highly* recommend using keyword arguments to clarify your
        intent.

        Create a dataframe with some fictional data.

        >>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
        >>> df = pd.DataFrame({{'http_status': [200, 200, 404, 404, 301],
        ...                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]}},
        ...                   index=index)
        >>> df
                   http_status  response_time
        Firefox            200           0.04
        Chrome             200           0.02
        Safari             404           0.07
        IE10               404           0.08
        Konqueror          301           1.00

        Create a new index and reindex the dataframe. By default
        values in the new index that do not have corresponding
        records in the dataframe are assigned ``NaN``.

        >>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
        ...              'Chrome']
        >>> df.reindex(new_index)
                       http_status  response_time
        Safari               404.0           0.07
        Iceweasel              NaN            NaN
        Comodo Dragon          NaN            NaN
        IE10                 404.0           0.08
        Chrome               200.0           0.02

        We can fill in the missing values by passing a value to
        the keyword ``fill_value``. Because the index is not monotonically
        increasing or decreasing, we cannot use arguments to the keyword
        ``method`` to fill the ``NaN`` values.

        >>> df.reindex(new_index, fill_value=0)
                       http_status  response_time
        Safari                 404           0.07
        Iceweasel                0           0.00
        Comodo Dragon            0           0.00
        IE10                   404           0.08
        Chrome                 200           0.02

        >>> df.reindex(new_index, fill_value='missing')
                      http_status response_time
        Safari                404          0.07
        Iceweasel         missing       missing
        Comodo Dragon     missing       missing
        IE10                  404          0.08
        Chrome                200          0.02

        We can also reindex the columns.

        >>> df.reindex(columns=['http_status', 'user_agent'])
                   http_status  user_agent
        Firefox            200         NaN
        Chrome             200         NaN
        Safari             404         NaN
        IE10               404         NaN
        Konqueror          301         NaN

        Or we can use "axis-style" keyword arguments

        >>> df.reindex(['http_status', 'user_agent'], axis="columns")
                   http_status  user_agent
        Firefox            200         NaN
        Chrome             200         NaN
        Safari             404         NaN
        IE10               404         NaN
        Konqueror          301         NaN

        To further illustrate the filling functionality in
        ``reindex``, we will create a dataframe with a
        monotonically increasing index (for example, a sequence
        of dates).

        >>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
        >>> df2 = pd.DataFrame({{"prices": [100, 101, np.nan, 100, 89, 88]}},
        ...                    index=date_index)
        >>> df2
                    prices
        2010-01-01   100.0
        2010-01-02   101.0
        2010-01-03     NaN
        2010-01-04   100.0
        2010-01-05    89.0
        2010-01-06    88.0

        Suppose we decide to expand the dataframe to cover a wider
        date range.

        >>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
        >>> df2.reindex(date_index2)
                    prices
        2009-12-29     NaN
        2009-12-30     NaN
        2009-12-31     NaN
        2010-01-01   100.0
        2010-01-02   101.0
        2010-01-03     NaN
        2010-01-04   100.0
        2010-01-05    89.0
        2010-01-06    88.0
        2010-01-07     NaN

        The index entries that did not have a value in the original data frame
        (for example, '2009-12-29') are by default filled with ``NaN``.
        If desired, we can fill in the missing values using one of several
        options.

        For example, to back-propagate the last valid value to fill the ``NaN``
        values, pass ``bfill`` as an argument to the ``method`` keyword.

        >>> df2.reindex(date_index2, method='bfill')
                    prices
        2009-12-29   100.0
        2009-12-30   100.0
        2009-12-31   100.0
        2010-01-01   100.0
        2010-01-02   101.0
        2010-01-03     NaN
        2010-01-04   100.0
        2010-01-05    89.0
        2010-01-06    88.0
        2010-01-07     NaN

        Please note that the ``NaN`` value present in the original dataframe
        (at index value 2010-01-03) will not be filled by any of the
        value propagation schemes. This is because filling while reindexing
        does not look at dataframe values, but only compares the original and
        desired indexes. If you do want to fill in the ``NaN`` values present
        in the original dataframe, use the ``fillna()`` method.

        See the :ref:`user guide <basics.reindexing>` for more.
        """
        # TODO: Decide if we care about having different examples for different
        # kinds

        # construct the args
        axes, kwargs = self._construct_axes_from_arguments(args, kwargs)
        method = missing.clean_reindex_fill_method(kwargs.pop("method", None))
        level = kwargs.pop("level", None)
        copy = kwargs.pop("copy", True)
        limit = kwargs.pop("limit", None)
        tolerance = kwargs.pop("tolerance", None)
        fill_value = kwargs.pop("fill_value", None)

        # Series.reindex doesn't use / need the axis kwarg
        # We pop and ignore it here, to make writing Series/Frame generic code
        # easier
        kwargs.pop("axis", None)

        if kwargs:
            raise TypeError(
                "reindex() got an unexpected keyword "
                f'argument "{list(kwargs.keys())[0]}"'
            )

        self._consolidate_inplace()

        # if all axes that are requested to reindex are equal, then only copy
        # if indicated must have index names equal here as well as values
        if all(
            self._get_axis(axis).identical(ax)
            for axis, ax in axes.items()
            if ax is not None
        ):
            if copy:
                return self.copy()
            return self

        # check if we are a multi reindex
        if self._needs_reindex_multi(axes, method, level):
            return self._reindex_multi(axes, copy, fill_value)

        # perform the reindex on the axes
        return self._reindex_axes(
            axes, level, limit, tolerance, method, fill_value, copy
        ).__finalize__(self, method="reindex")

The method entry in the docstring already warns that this parameter only applies to a DataFrame/Series with a monotonic index:

    method : {{None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}}
        Method to use for filling holes in reindexed DataFrame.
        Please note: this is only applicable to DataFrames/Series with a
        monotonically increasing/decreasing index.

        * None (default): don't fill gaps
        * pad / ffill: Propagate last valid observation forward to next
          valid.
        * backfill / bfill: Use next valid observation to fill gap.
        * nearest: Use nearest valid observations to fill gap.

In other words, the error occurs because our columns are not monotonic.
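One quick way to check this is through the public is_monotonic_increasing and is_monotonic_decreasing properties of an Index. The frame below is a hypothetical stand-in with unsorted integer column labels, not the original data.

```python
import pandas as pd

# Hypothetical frame; the unsorted column labels are an assumption.
d = pd.DataFrame([[0, 1, 2]], columns=[30, 10, 20])

increasing = d.columns.is_monotonic_increasing
decreasing = d.columns.is_monotonic_decreasing
print(increasing, decreasing)  # neither holds for 30, 10, 20

# A sorted label set, for comparison.
sorted_ok = pd.Index([10, 20, 30]).is_monotonic_increasing
print(sorted_ok)
```

If both properties are False for the axis being reindexed, passing method to reindex ends up in the ValueError branch shown earlier.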

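So why are they not monotonic? Perhaps the Chinese characters are the cause, so I tested that.

Apparently not.

Could the deletion be the cause, then?

Case closed: after the deletion the labels are no longer monotonic, so asking reindex to fill with method in the same call fails.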

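As for the actual fix, do what the other blogs describe: perform the reindex (including the deletion) first, then call .ffill() or .bfill() separately.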
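A minimal sketch of that two-step workaround, again on hypothetical data with unsorted integer column labels. One caveat that follows from the reindex docstring quoted above: method fills by comparing labels, whereas .bfill() fills NaN values in the data, so the two can give different results when the original frame already contains NaN.

```python
import pandas as pd

# Hypothetical data, not the original: note the unsorted column labels.
d = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    index=[0, 2, 3],
    columns=[30, 10, 20],
)
ni = [0, 1, 2, 3]   # new row labels (assumed)
nc = [10, 15, 20]   # new column labels (assumed); 30 is dropped, 15 is new

# Step 1: reindex without a fill method, so no monotonicity is required.
reindexed = d.reindex(index=ni, columns=nc)

# Step 2: fill the gaps afterwards, here backward along the rows.
nd = reindexed.bfill()
print(nd)
```

Row 1 is filled from row 2, while the new column 15 stays NaN because it has no values to propagate.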
