This is the list of items in the pandas 2.1 release notes that need to be addressed in pandas-stubs. I have removed the Performance improvements and Bug fixes sections.
PRs welcome. If you do a PR, check off the item and add a link to the PR that closed it. One PR can address multiple items.
Some of these may already have been taken care of; if so, check them off and add a comment such as "previously complete".
Enhancements
PyArrow will become a required dependency with pandas 3.0
Avoid NumPy object dtype for strings by default
DataFrame reductions preserve extension dtypes
Copy-on-Write improvements
New DataFrame.map method and support for ExtensionArrays
New implementation of DataFrame.stack
Series.ffill and Series.bfill are now supported for objects with IntervalDtype (ENH: Support ffill/bfill on IntervalArray pandas#54247 )
Added filters parameter to :func:read_parquet to filter out data, compatible with both engines (ENH: explicit filters parameter in pd.read_parquet pandas#53212 )
.Categorical.map and CategoricalIndex.map now have a na_action parameter.
.Categorical.map implicitly had a default value of "ignore" for na_action. This has formally been deprecated and will be changed to None in the future.
Also notice that Series.map has default na_action=None and calls to series with categorical data will now use na_action=None unless explicitly set otherwise (BUG/API: Categorical.map & CategoricalIndex.map have no 'na_action' kwarg pandas#44279 )
api.extensions.ExtensionArray now has a ~api.extensions.ExtensionArray.map method (REF: Add ExtensionArray.map pandas#51809 )
DataFrame.applymap now uses the ~api.extensions.ExtensionArray.map method of underlying api.extensions.ExtensionArray instances (ENH: make DataFrame.applymap uses the .map method of ExtensionArrays pandas#52219 )
MultiIndex.sort_values now supports na_position (BUG: na_position ignored when sorting MultiIndex with level!=None pandas#51612 )
MultiIndex.sortlevel and Index.sortlevel gained a new keyword na_position (BUG: na_position ignored when sorting MultiIndex with level!=None pandas#51612 )
arrays.DatetimeArray.map, arrays.TimedeltaArray.map and arrays.PeriodArray.map can now take a na_action argument (BUG/API: DTI/TDI/PI/IntervalIndex.map ignore na_action pandas#51644 )
arrays.SparseArray.map now supports na_action (ENH: support na_action in SparseArray.map pandas#52096 ).
pandas.read_html now supports the storage_options keyword when used with a URL, allowing users to add headers to the outbound HTTP request (ENH: support storage_options in pandas.read_html pandas#49944 )
Add Index.diff and Index.round (ENH: Index should have diff() method pandas#19708 )
Add "latex-math" as an option to the escape argument of .Styler which will not escape all characters between "\(" and "\)" during formatting (ENH: add LaTeX math mode with parentheses pandas#51903 )
Add dtype of categories to repr information of CategoricalDtype (ENH: CategoricalDtype __repr__ should show dtype of categories pandas#52179 )
Adding engine_kwargs parameter to :func:read_excel (ENH: Adding engine_kwargs to Excel engines for issue #40274 pandas#52214 )
Classes that are useful for type-hinting have been added to the public API in the new submodule pandas.api.typing (API: Add pandas.api.typing pandas#48577 )
Implemented :attr:Series.dt.is_month_start, :attr:Series.dt.is_month_end, :attr:Series.dt.is_year_start, :attr:Series.dt.is_year_end, :attr:Series.dt.is_quarter_start, :attr:Series.dt.is_quarter_end, :attr:Series.dt.days_in_month, :attr:Series.dt.unit, :attr:Series.dt.normalize, Series.dt.day_name, Series.dt.month_name, Series.dt.tz_convert for ArrowDtype with pyarrow.timestamp (BUG: ArrowTemporalProperties object has no attribute day_name pandas#52388 , BUG: tz_convert not implemented for arrow timestamps pandas#51718 )
.DataFrameGroupBy.agg and .DataFrameGroupBy.transform now support grouping by multiple keys when the index is not a MultiIndex for engine="numba" (ENH: Groupby agg support multiple funcs numba pandas#53486 )
.SeriesGroupBy.agg and .DataFrameGroupBy.agg now support passing in multiple functions for engine="numba" (ENH: Groupby agg support multiple funcs numba pandas#53486 )
.SeriesGroupBy.transform and .DataFrameGroupBy.transform now support passing in a string as the function for engine="numba" (ENH: Groupby.transform support string input with engine=numba pandas#53579 )
DataFrame.stack gained the sort keyword to dictate whether the resulting MultiIndex levels are sorted (sort=False option to stack/unstack/pivot pandas#15105 )
DataFrame.unstack gained the sort keyword to dictate whether the resulting MultiIndex levels are sorted (sort=False option to stack/unstack/pivot pandas#15105 )
Series.explode now supports PyArrow-backed list types (ENH: Series.explode to support pyarrow-backed list types pandas#53602 )
Series.str.join now supports ArrowDtype(pa.string()) (ENH: Series.str.join for ArrowDtype(pa.string()) pandas#53646 )
Add validate parameter to Categorical.from_codes (ENH/PERF: add validate parameter to 'Categorical.from_codes' get avoid validation when not needed pandas#50975 )
Added .ExtensionArray.interpolate used by Series.interpolate and DataFrame.interpolate (ENH: EA.interpolate pandas#53659 )
Added engine_kwargs parameter to DataFrame.to_excel (ENH: Adding engine_kwargs to DataFrame.to_excel pandas#53220 )
Implemented :func:api.interchange.from_dataframe for DatetimeTZDtype (BUG: Conversion of datetime64[ns, UTC] to Arrow C format string is not implemented pandas#54239 )
Implemented __from_arrow__ on DatetimeTZDtype (ENH: add __from_pyarrow__ support to DatetimeTZDtype pandas#52201 )
Implemented __pandas_priority__ to allow custom types to take precedence over DataFrame, Series, Index, or .ExtensionArray for arithmetic operations; see the developer guide, section extending.pandas_priority (ENH: __pandas_priority__ pandas#48347 )
Improve error message when having incompatible columns using DataFrame.merge (ENH: More helpful error messages for merges with incompatible keys pandas#51861 )
Improve error message when setting DataFrame with wrong number of columns through DataFrame.isetitem (ERR: Add explicit error message for isetitem for DataFrame pandas#51701 )
Improved error handling when using DataFrame.to_json with incompatible index and orient arguments (API/BUG: Make to_json index= arg consistent with orient arg pandas#52143 )
Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns (The Value Error saying that empty data was passed with indices specif… pandas#52084 )
Improved error message when providing an invalid index or offset argument to .VariableOffsetWindowIndexer (CLN: VariableOffsetWindowIndexer pandas#54379 )
Let DataFrame.to_feather accept a non-default Index and non-string column names (ENH: Let to_feather accept index and non string column names pandas#51787 )
Added a new parameter by_row to Series.apply and DataFrame.apply. When set to False the supplied callables will always operate on the whole Series or DataFrame (REF: Decouple Series.apply from Series.agg pandas#53400 , BUG: fix Series.apply(..., by_row), v2. pandas#53601 ).
DataFrame.shift and Series.shift now allow shifting by multiple periods by supplying a list of periods (ENH: pd.Series.shift and .diff to accept a collection of numbers pandas#44424 )
Groupby aggregations with numba (such as .DataFrameGroupBy.sum) now can preserve the dtype of the input instead of casting to float64 (ENH: Allow numba aggregations to return non-float64 results pandas#44952 )
Improved error message when .DataFrameGroupBy.agg failed (ERR: GroupBy.foo raises confusing error message pandas#52930 )
Many read/to_* functions, such as DataFrame.to_pickle and :func:read_csv, support forwarding compression arguments to lzma.LZMAFile (ENH: df.to_pickle xz lzma preset=level for compression compresslevel pandas#52979 )
Reductions Series.argmax, Series.argmin, Series.idxmax, Series.idxmin, Index.argmax, Index.argmin, DataFrame.idxmax, DataFrame.idxmin are now supported for object-dtype (BUG/ENH: should idxmax/idxmin work on object types? pandas#4279 , TypeError on argmax of object dtype (change from 0.20.3) pandas#18021 , BUG: idxmin and idxmax fail for groupby of decimal columns pandas#40685 , BUG: idxmax raises when used with tuples pandas#43697 )
DataFrame.to_parquet and :func:read_parquet will now write and read attrs respectively (Parquet metadata persistence of DataFrame.attrs pandas#54346 )
Index.all and Index.any with floating dtypes and timedelta64 dtypes no longer raise TypeError, matching the Series.all and Series.any behavior (ENH: support Index.any/all with float, timedelta64 dtypes pandas#54566 )
Series.cummax, Series.cummin and Series.cumprod are now supported for pyarrow dtypes with pyarrow version 13.0 and above (BUG: Cumprod failing with type float[pyarrow] pandas#52085 )
Added support for the DataFrame Consortium Standard (ENH: add consortium standard entrypoint pandas#54383 )
Performance improvement in .DataFrameGroupBy.quantile and .SeriesGroupBy.quantile (PERF: GroupBy.quantile pandas#51722 )
PyArrow-backed integer dtypes now support bitwise operations (BUG: Bitwise operators for AND ( & ) and OR( | ) doesn't work with pyarrow integers pandas#54495 )
Backwards incompatible API changes
arrays.PandasArray has been renamed .NumpyExtensionArray and the attached dtype name changed from PandasDtype to NumpyEADtype; importing PandasArray still works until the next major version (API: rename PandasArray -> NumpyExtensionArray pandas#53694 )
Deprecations
Deprecated silent upcasting in setitem-like Series operations
Deprecated parsing datetimes with mixed time zones
Deprecated :attr:.DataFrameGroupBy.dtypes, check dtypes on the underlying object instead (DEPR: GroupBy.dtypes pandas#51045 )
Deprecated :attr:DataFrame._data and :attr:Series._data, use public APIs instead (DEPR: deprecate _data when getting BlockManager pandas#33333 )
Deprecated :func:concat behavior when any of the objects being concatenated have length 0; in the past the dtypes of empty objects were ignored when determining the resulting dtype, in a future version they will not (API: concatting of Series/DataFrame - handling (not skipping) of empty objects pandas#39122 )
Deprecated .Categorical.to_list, use obj.tolist() instead (DEPR: Categorical.to_list pandas#51254 )
Deprecated .DataFrameGroupBy.all and .DataFrameGroupBy.any with datetime64 or PeriodDtype values, matching the Series and DataFrame deprecations (API: any/all logical operation for datetime-like dtypes pandas#34479 )
Deprecated axis=1 in DataFrame.ewm, DataFrame.rolling, DataFrame.expanding, transpose before calling the method instead (DEPR: axis=1 in DataFrame.window, resample pandas#51778 )
Deprecated axis=1 in DataFrame.groupby and in Grouper constructor, do frame.T.groupby(...) instead (DEPR: DataFrame.groupby(axis=1) pandas#51203 )
Deprecated broadcast_axis keyword in Series.align and DataFrame.align, upcast before calling align with left = DataFrame({col: left for col in right.columns}, index=right.index) (DEPR: NDFrame.align broadcast_axis, fill_axis keywords pandas#51856 )
Deprecated downcast keyword in Index.fillna (DEPR: downcast keyword in Index.fillna pandas#53956 )
Deprecated fill_method and limit keywords in DataFrame.pct_change, Series.pct_change, .DataFrameGroupBy.pct_change, and .SeriesGroupBy.pct_change, explicitly call e.g. DataFrame.ffill or DataFrame.bfill before calling pct_change instead (DEPR: pct_change method/limit keyword pandas#53491 )
Deprecated method, limit, and fill_axis keywords in DataFrame.align and Series.align, explicitly call DataFrame.fillna or Series.fillna on the alignment results instead (DEPR: NDFrame.align broadcast_axis, fill_axis keywords pandas#51856 )
Deprecated quantile keyword in .Rolling.quantile and .Expanding.quantile, renamed to q instead (ENH: Differing variable name for quantile vs rolling.quantile pandas#52550 )
Deprecated accepting slices in DataFrame.take, call obj[slicer] or pass a sequence of integers instead (DEPR: deprecate allowing slice in DataFrame.take pandas#51539 )
Deprecated behavior of DataFrame.idxmax, DataFrame.idxmin, Series.idxmax, Series.idxmin with all-NA entries, or with any NA entries and skipna=False; in a future version these will raise ValueError (BUG: pd.Series idxmax raises ValueError instead of returning <NA> when all values are <NA> pandas#51276 )
Deprecated explicit support for subclassing Index (DEPR: subclassing Index pandas#45289 )
Deprecated making functions given to Series.agg attempt to operate on each element in the Series and only operate on the whole Series if the elementwise operations failed. In the future, functions given to Series.agg will always operate on the whole Series only. To keep the current behavior, use Series.transform instead (DEPR: make Series.agg aggregate when possible pandas#53325 )
Deprecated making the functions in a list of functions given to DataFrame.agg attempt to operate on each element in the DataFrame and only operate on the columns of the DataFrame if the elementwise operations failed. To keep the current behavior, use DataFrame.transform instead (DEPR: make Series.agg aggregate when possible pandas#53325 )
Deprecated passing a DataFrame to DataFrame.from_records, use DataFrame.set_index or DataFrame.drop instead (PERF: DataFrame.from_records with DataFrame input pandas#51353 )
Deprecated silently dropping unrecognized timezones when parsing strings to datetimes (Timezones silently dropped in parsing pandas#18702 )
Deprecated the axis keyword in DataFrame.ewm, Series.ewm, DataFrame.rolling, Series.rolling, DataFrame.expanding, Series.expanding (DEPR: axis=1 in DataFrame.window, resample pandas#51778 )
Deprecated the axis keyword in DataFrame.resample, Series.resample (DEPR: axis=1 in DataFrame.window, resample pandas#51778 )
Deprecated the downcast keyword in Series.interpolate, DataFrame.interpolate, Series.fillna, DataFrame.fillna, Series.ffill, DataFrame.ffill, Series.bfill, DataFrame.bfill (DEPR: Deprecate downcast keyword for fillna pandas#40988 )
Deprecated the behavior of :func:concat when len(keys) != len(objs); in a future version this will raise instead of truncating to the shorter of the two sequences (API: pd.concat with len(keys) != len(values) does not raise; intentional? pandas#43485 )
Deprecated the behavior of Series.argsort in the presence of NA values; in a future version these will be sorted at the end instead of giving -1 (DEPR: Series.argsort NA behavior pandas#54219 )
Deprecated the default of observed=False in DataFrame.groupby and Series.groupby; this will default to True in a future version (DEPR: Change default to observed=True in DataFrame.groupby pandas#43999 )
Deprecated pinning group.name to each group in .SeriesGroupBy.aggregate aggregations; if your operation requires utilizing the groupby keys, iterate over the groupby object instead (DEPR: SeriesGroupBy._aggregate_named pandas#41090 )
Deprecated the axis keyword in .DataFrameGroupBy.idxmax, .DataFrameGroupBy.idxmin, .DataFrameGroupBy.fillna, .DataFrameGroupBy.take, .DataFrameGroupBy.skew, .DataFrameGroupBy.rank, .DataFrameGroupBy.cumprod, .DataFrameGroupBy.cumsum, .DataFrameGroupBy.cummax, .DataFrameGroupBy.cummin, .DataFrameGroupBy.pct_change, .DataFrameGroupBy.diff, .DataFrameGroupBy.shift, and .DataFrameGroupBy.corrwith; for axis=1 operate on the underlying DataFrame instead (DEPR: axis argument in groupby ops pandas#50405 , DEPR: GroupBy.cumsum etc with axis=1 pandas#51046 )
Deprecated .DataFrameGroupBy with as_index=False not including groupings in the result when they are not columns of the DataFrame (DEPR: groupby with as_index=False doesn't add grouper as column when passing a Series as group key pandas#49519 )
Deprecated :func:is_categorical_dtype, use isinstance(obj.dtype, pd.CategoricalDtype) instead (DEPR: is_categorical_dtype pandas#52527 )
Deprecated :func:is_datetime64tz_dtype, check isinstance(dtype, pd.DatetimeTZDtype) instead (DEPR: is_datetime64tz_dtype, is_interval_dtype pandas#52607 )
Deprecated :func:is_int64_dtype, check dtype == np.dtype(np.int64) instead (DEPR: is_int64_dtype pandas#52564 )
Deprecated :func:is_interval_dtype, check isinstance(dtype, pd.IntervalDtype) instead (DEPR: is_datetime64tz_dtype, is_interval_dtype pandas#52607 )
Deprecated :func:is_period_dtype, check isinstance(dtype, pd.PeriodDtype) instead (DEPR: is_period_dtype, is_sparse pandas#52642 )
Deprecated :func:is_sparse, check isinstance(dtype, pd.SparseDtype) instead (DEPR: is_period_dtype, is_sparse pandas#52642 )
Deprecated .Styler.applymap_index. Use the new .Styler.map_index method instead (REF: Styler.applymap -> map pandas#52708 )
Deprecated .Styler.applymap. Use the new .Styler.map method instead (REF: Styler.applymap -> map pandas#52708 )
Deprecated DataFrame.applymap. Use the new DataFrame.map method instead (API: rename DataFrame.applymap -> DataFrame.map pandas#52353 )
Deprecated DataFrame.swapaxes and Series.swapaxes, use DataFrame.transpose or Series.transpose instead (DEPR: DataFrame.swapaxes pandas#51946 )
Deprecated freq parameter in .PeriodArray constructor, pass dtype instead (REF: remove freq arg from PeriodArray constructor pandas#52462 )
Deprecated allowing non-standard inputs in :func:take, pass either a numpy.ndarray, .ExtensionArray, Index, or Series (DEPR: allowing unknowns in take pandas#52981 )
Deprecated allowing non-standard sequences for :func:isin, :func:value_counts, :func:unique, :func:factorize, cast to one of numpy.ndarray, Index, .ExtensionArray, or Series before calling (DEPR: accepting non-standard sequences in core.algorithms functions pandas#52986 )
Deprecated behavior of DataFrame reductions sum, prod, std, var, sem with axis=None, in a future version this will operate over both axes returning a scalar instead of behaving like axis=0; note this also affects numpy functions e.g. np.sum(df) (Support axis=None in all reductions pandas#21597 )
Deprecated behavior of :func:concat when DataFrame has columns that are all-NA, in a future version these will not be discarded when determining the resulting dtype (API: value-dependent behaviour in concat with all-NA data pandas#40893 )
Deprecated behavior of Series.dt.to_pydatetime, in a future version this will return a Series containing python datetime objects instead of an ndarray of datetimes; this matches the behavior of other :attr:Series.dt properties (API/INT: why is to_pydatetime handled differently in Series.dt accessor? pandas#20306 )
Deprecated logical operations (|, &, ^) between pandas objects and dtype-less sequences (e.g. list, tuple), wrap a sequence in a Series or NumPy array before operating instead (BUG: Inconsistent behavior with bitwise operations on Series with np.array vs. list pandas#51521 )
Deprecated parameter convert_type in Series.apply (API: make the func in Series.apply always operate on the Series pandas#52140 )
Deprecated passing a dictionary to .SeriesGroupBy.agg; pass a list of aggregations instead (DEPR: SeriesGroupBy.agg with dict argument pandas#50684 )
Deprecated the fastpath keyword in Categorical constructor, use Categorical.from_codes instead (CLN: remove fastpath & verify_integrity from constructors pandas#20110 )
Deprecated the behavior of :func:is_bool_dtype returning True for object-dtype Index of bool objects (DEPR: is_bool_dtype special-casing Index[object_all_bools] pandas#52680 )
Deprecated the methods Series.bool and DataFrame.bool (DEPR: NDFrame.bool pandas#51749 )
Deprecated unused closed and normalize keywords in the DatetimeIndex constructor (DEPR: unused keywords in DTI/TDI construtors pandas#52628 )
Deprecated unused closed keyword in the TimedeltaIndex constructor (DEPR: unused keywords in DTI/TDI construtors pandas#52628 )
Deprecated logical operation between two non boolean Series with different indexes always coercing the result to bool dtype. In a future version, this will maintain the return type of the inputs (BUG: Results of series bitwise ufunc operations are being casted to bool in pandas-2.0 pandas#52500 , BUG: inconsistent Series/DataFrame behavior in bitwise ops pandas#52538 )
Deprecated Period and PeriodDtype with BDay freq, use a DatetimeIndex with BDay freq instead (DEPR: Period[B] pandas#53446 )
Deprecated :func:value_counts, use pd.Series(obj).value_counts() instead (DEPR or DOCS: pd.value_counts() is public, but not documented. Deprecate or document? pandas#47862 )
Deprecated Series.first and DataFrame.first; create a mask and filter using .loc instead (BUG: DataFrame.first has unexpected behavior when passing a DateOffset pandas#45908 )
Deprecated Series.interpolate and DataFrame.interpolate for object-dtype (API/DEPR: interpolate with object dtype pandas#53631 )
Deprecated Series.last and DataFrame.last; create a mask and filter using .loc instead (DEPR: deprecate DataFrame.last and Series.last pandas#53692 )
Deprecated allowing arbitrary fill_value in SparseDtype, in a future version the fill_value will need to be compatible with the dtype.subtype, either a scalar that can be held by that subtype or NaN for integer or bool subtypes (Require the dtype of SparseArray.fill_value and sp_values.dtype to match pandas#23124 )
Deprecated allowing bool dtype in .DataFrameGroupBy.quantile and .SeriesGroupBy.quantile, consistent with the Series.quantile and DataFrame.quantile behavior (API: quantile with bool/boolean dtypes pandas#51424 )
Deprecated behavior of :func:.testing.assert_series_equal and :func:.testing.assert_frame_equal considering NA-like values (e.g. NaN vs None) as equivalent (DEPR: be stricter in assert_almost_equal pandas#52081 )
Deprecated bytes input to :func:read_excel. To read a file path, use a string or path-like object ([DEPR]: Remove literal string/bytes input from read_excel, read_html, and read_xml pandas#53767 )
Deprecated constructing .SparseArray from scalar data, pass a sequence instead (DEPR: SparseArray(scalar) pandas#53039 )
Deprecated falling back to filling when value is not specified in DataFrame.replace and Series.replace with non-dict-like to_replace (API/DEPR: DataFrame/Series.replace is too complex. pandas#33302 )
Deprecated literal json input to :func:read_json. Wrap literal json string input in io.StringIO instead (DEPR: Deprecate literal json string input to read_json pandas#53409 )
Deprecated literal string input to :func:read_xml. Wrap literal string/bytes input in io.StringIO / io.BytesIO instead ([DEPR]: Remove literal string/bytes input from read_excel, read_html, and read_xml pandas#53767 )
Deprecated literal string/bytes input to :func:read_html. Wrap literal string/bytes input in io.StringIO / io.BytesIO instead ([DEPR]: Remove literal string/bytes input from read_excel, read_html, and read_xml pandas#53767 )
Deprecated option mode.use_inf_as_na, convert inf entries to NaN before instead (DEPR: use_inf_as_na pandas#51684 )
Deprecated parameter obj in .DataFrameGroupBy.get_group (DEPR: obj argument in GroupBy.get_group pandas#53545 )
Deprecated positional indexing on Series with Series.__getitem__ and Series.__setitem__, in a future version ser[item] will always interpret item as a label, not a position (DEPR: Series.__getitem__, Series.__setitem__ pandas#50617 )
Deprecated replacing builtin and NumPy functions in .agg, .apply, and .transform; use the corresponding string alias (e.g. "sum" for sum or np.sum) instead (DEPR: Special casing of NumPy and Python builtin functions pandas#53425 )
Deprecated strings T, t, L and l denoting units in :func:to_timedelta (BUG: Either incorrect unit validation for 'T' in to_timedelta() or incorrect documentation pandas#52536 )
Deprecated the "method" and "limit" keywords in .ExtensionArray.fillna, implement _pad_or_backfill instead (API: EA.ffill/bfill? pandas#53621 )
Deprecated the method and limit keywords in DataFrame.replace and Series.replace (API/DEPR: DataFrame/Series.replace is too complex. pandas#33302 )
Deprecated the method and limit keywords on Series.fillna, DataFrame.fillna, .SeriesGroupBy.fillna, .DataFrameGroupBy.fillna, and .Resampler.fillna, use obj.bfill() or obj.ffill() instead (DEPR: fillna 'method' pandas#53394 )
Deprecated the behavior of Series.__getitem__, Series.__setitem__, DataFrame.__getitem__, DataFrame.__setitem__ with an integer slice on objects with a floating-dtype index, in a future version this will be treated as positional indexing (BUG: is ser[:2] with Int64Index positional or label-based pandas#49612 )
Deprecated the use of non-supported datetime64 and timedelta64 resolutions with :func:pandas.array. Supported resolutions are "s", "ms", "us" and "ns" (API: pd.array convert unsupported dt64/td64 to supported? pandas#53058 )
Deprecated values "pad", "ffill", "bfill", "backfill" for Series.interpolate and DataFrame.interpolate, use obj.ffill() or obj.bfill() instead (DEPR/API: disallow ffill/bfill method in "interpolate" pandas#53581 )
Deprecated the behavior of Index.argmax, Index.argmin, Series.argmax, Series.argmin with either all-NAs and skipna=True or any-NAs and skipna=False returning -1; in a future version this will raise ValueError (API/BUG: Series.argmin/max with all-NaN data returns -1 ? pandas#33941 , API: argmin/argmax behaviour for nullable dtypes with skipna=False pandas#33942 )
Deprecated allowing non-keyword arguments in DataFrame.to_sql except name and con (DEPR: Positional arguments in to_* I/O methods pandas#54229 )
Deprecated silently ignoring fill_value when passing both freq and fill_value to DataFrame.shift, Series.shift and .DataFrameGroupBy.shift; in a future version this will raise ValueError (BUG: DataFrame.shift(axis=1) with EADtype pandas#53832 )
This is the list of things that are in pandas 2.1 release notes that need to be addressed in pandas-stubs. I have removed the sections
Performance improvementsandBug fixes.PR's welcome. If you do a PR, check off the item and put a link to the PR that closed it. One PR can address multiple issues.
Some of these may already have been taken care of, so if so, check them off and indicate with a comment such as "previously complete"
Enhancements
DataFrame.mapmethod and support for ExtensionArraysDataFrame.stackSeries.ffillandSeries.bfillare now supported for objects withIntervalDtype(ENH: Support ffill/bfill on IntervalArray pandas#54247)filtersparameter to :func:read_parquetto filter out data, compatible with bothengines(ENH: explicit filters parameter in pd.read_parquet pandas#53212).Categorical.mapandCategoricalIndex.mapnow have ana_actionparameter..Categorical.mapimplicitly had a default value of"ignore"forna_action. This has formally been deprecated and will be changed toNonein the future.Also notice that
Series.maphas defaultna_action=Noneand calls to series with categorical data will now usena_action=Noneunless explicitly set otherwise (BUG/API: Categorical.map & CategoricalIndex.map have no 'na_action' kwarg pandas#44279)api.extensions.ExtensionArraynow has a~api.extensions.ExtensionArray.mapmethod (REF: Add ExtensionArray.map pandas#51809)DataFrame.applymapnow uses the~api.extensions.ExtensionArray.mapmethod of underlyingapi.extensions.ExtensionArrayinstances (ENH: make DataFrame.applymap uses the .map method of ExtensionArrays pandas#52219)MultiIndex.sort_valuesnow supportsna_position(BUG:na_positionignored when sortingMultiIndexwithlevel!=Nonepandas#51612)MultiIndex.sortlevelandIndex.sortlevelgained a new keywordna_position(BUG:na_positionignored when sortingMultiIndexwithlevel!=Nonepandas#51612)arrays.DatetimeArray.map,arrays.TimedeltaArray.mapandarrays.PeriodArray.mapcan now take ana_actionargument (BUG/API: DTI/TDI/PI/IntervalIndex.map ignore na_action pandas#51644)arrays.SparseArray.mapnow supportsna_action(ENH: support na_action in SparseArray.map pandas#52096).pandas.read_htmlnow supports thestorage_optionskeyword when used with a URL, allowing users to add headers to the outbound HTTP request (ENH: supportstorage_optionsinpandas.read_htmlpandas#49944)Index.diffandIndex.round(ENH: Index should have diff() method pandas#19708)"latex-math"as an option to theescapeargument of.Stylerwhich will not escape all characters between"\("and"\)"during formatting ( ENH: add LaTeX math mode with parentheses pandas#51903)reprinformation ofCategoricalDtype(ENH: CategoricalDtype __repr__ should show dtype of categories pandas#52179)engine_kwargsparameter to :func:read_excel(ENH: Adding engine_kwargs to Excel engines for issue #40274 pandas#52214)pandas.api.typing(API: Add pandas.api.typing pandas#48577)Series.dt.is_month_start, :attr:Series.dt.is_month_end, :attr:Series.dt.is_year_start, :attr:Series.dt.is_year_end, :attr:Series.dt.is_quarter_start, 
:attr:Series.dt.is_quarter_end, :attr:Series.dt.days_in_month, :attr:Series.dt.unit, :attr:Series.dt.normalize, Series.dt.day_name, Series.dt.month_name, Series.dt.tz_convert for ArrowDtype with pyarrow.timestamp (BUG: ArrowTemporalProperties object has no attribute day_name pandas#52388, BUG: tz_convert not implemented for arrow timestamps pandas#51718)
.DataFrameGroupBy.agg and .DataFrameGroupBy.transform now support grouping by multiple keys when the index is not a MultiIndex for engine="numba" (ENH: Groupby agg support multiple funcs numba pandas#53486)
.SeriesGroupBy.agg and .DataFrameGroupBy.agg now support passing in multiple functions for engine="numba" (ENH: Groupby agg support multiple funcs numba pandas#53486)
.SeriesGroupBy.transform and .DataFrameGroupBy.transform now support passing in a string as the function for engine="numba" (ENH: Groupby.transform support string input with engine=numba pandas#53579)
DataFrame.stack gained the sort keyword to dictate whether the resulting MultiIndex levels are sorted (sort=False option to stack/unstack/pivot pandas#15105)
DataFrame.unstack gained the sort keyword to dictate whether the resulting MultiIndex levels are sorted (sort=False option to stack/unstack/pivot pandas#15105)
Series.explode now supports PyArrow-backed list types (ENH: Series.explode to support pyarrow-backed list types pandas#53602)
Series.str.join now supports ArrowDtype(pa.string()) (ENH: Series.str.join for ArrowDtype(pa.string()) pandas#53646)
validate parameter to Categorical.from_codes (ENH/PERF: add validate parameter to 'Categorical.from_codes' get avoid validation when not needed pandas#50975)
.ExtensionArray.interpolate used by Series.interpolate and DataFrame.interpolate (ENH: EA.interpolate pandas#53659)
engine_kwargs parameter to DataFrame.to_excel (ENH: Adding engine_kwargs to DataFrame.to_excel pandas#53220)
api.interchange.from_dataframe for DatetimeTZDtype (BUG: Conversion of datetime64[ns, UTC] to Arrow C format string is not implemented pandas#54239)
__from_arrow__ on DatetimeTZDtype (ENH: add __from_pyarrow__ support to DatetimeTZDtype pandas#52201)
__pandas_priority__ to allow custom types to take precedence over DataFrame, Series, Index, or .ExtensionArray for arithmetic operations, :ref:see the developer guide <extending.pandas_priority> (ENH: __pandas_priority__ pandas#48347)
DataFrame.merge (ENH: More helpful error messages for merges with incompatible keys pandas#51861)
DataFrame with wrong number of columns through DataFrame.isetitem (ERR: Add explicit error message for isetitem for DataFrame pandas#51701)
DataFrame.to_json with incompatible index and orient arguments (API/BUG: Make to_json index= arg consistent with orient arg pandas#52143)
index or offset argument to .VariableOffsetWindowIndexer (CLN: VariableOffsetWindowIndexer pandas#54379)
DataFrame.to_feather accept a non-default Index and non-string column names (ENH: Let to_feather accept index and non string column names pandas#51787)
by_row to Series.apply and DataFrame.apply. When set to False the supplied callables will always operate on the whole Series or DataFrame (REF: Decouple Series.apply from Series.agg pandas#53400, BUG: fix Series.apply(..., by_row), v2. pandas#53601).
DataFrame.shift and Series.shift now allow shifting by multiple periods by supplying a list of periods (ENH: pd.Series.shift and .diff to accept a collection of numbers pandas#44424)
numba (such as .DataFrameGroupBy.sum) now can preserve the dtype of the input instead of casting to float64 (ENH: Allow numba aggregations to return non-float64 results pandas#44952)
.DataFrameGroupBy.agg failed (ERR: GroupBy.foo raises confusing error message pandas#52930)
DataFrame.to_pickle and :func:read_csv, support forwarding compression arguments to lzma.LZMAFile (ENH: df.to_pickle xz lzma preset=level for compression compresslevel pandas#52979)
Series.argmax, Series.argmin, Series.idxmax, Series.idxmin, Index.argmax, Index.argmin, DataFrame.idxmax, DataFrame.idxmin are now supported for object-dtype (BUG/ENH: should idxmax/idxmin work on object types? pandas#4279, TypeError on argmax of object dtype (change from 0.20.3) pandas#18021, BUG: idxmin and idxmax fail for groupby of decimal columns pandas#40685, BUG: idxmax raises when used with tuples pandas#43697)
DataFrame.to_parquet and :func:read_parquet will now write and read attrs respectively (Parquet metadata persistence of DataFrame.attrs pandas#54346)
Index.all and Index.any with floating dtypes and timedelta64 dtypes no longer raise TypeError, matching the Series.all and Series.any behavior (ENH: support Index.any/all with float, timedelta64 dtypes pandas#54566)
Series.cummax, Series.cummin and Series.cumprod are now supported for pyarrow dtypes with pyarrow version 13.0 and above (BUG: Cumprod failing with type float[pyarrow] pandas#52085)
.DataFrameGroupBy.quantile and .SeriesGroupBy.quantile (PERF: GroupBy.quantile pandas#51722)
Backwards incompatible API changes
arrays.PandasArray has been renamed .NumpyExtensionArray and the attached dtype name changed from PandasDtype to NumpyEADtype; importing PandasArray still works until the next major version (API: rename PandasArray -> NumpyExtensionArray pandas#53694)
Deprecations
.DataFrameGroupBy.dtypes, check dtypes on the underlying object instead (DEPR: GroupBy.dtypes pandas#51045)
DataFrame._data and :attr:Series._data, use public APIs instead (DEPR: deprecate _data when getting BlockManager pandas#33333)
concat behavior when any of the objects being concatenated have length 0; in the past the dtypes of empty objects were ignored when determining the resulting dtype, in a future version they will not (API: concatting of Series/DataFrame - handling (not skipping) of empty objects pandas#39122)
.Categorical.to_list, use obj.tolist() instead (DEPR: Categorical.to_list pandas#51254)
.DataFrameGroupBy.all and .DataFrameGroupBy.any with datetime64 or PeriodDtype values, matching the Series and DataFrame deprecations (API: any/all logical operation for datetime-like dtypes pandas#34479)
axis=1 in DataFrame.ewm, DataFrame.rolling, DataFrame.expanding, transpose before calling the method instead (DEPR: axis=1 in DataFrame.window, resample pandas#51778)
axis=1 in DataFrame.groupby and in Grouper constructor, do frame.T.groupby(...) instead (DEPR: DataFrame.groupby(axis=1) pandas#51203)
broadcast_axis keyword in Series.align and DataFrame.align, upcast before calling align with left = DataFrame({col: left for col in right.columns}, index=right.index) (DEPR: NDFrame.align broadcast_axis, fill_axis keywords pandas#51856)
downcast keyword in Index.fillna (DEPR: downcast keyword in Index.fillna pandas#53956)
fill_method and limit keywords in DataFrame.pct_change, Series.pct_change, .DataFrameGroupBy.pct_change, and .SeriesGroupBy.pct_change, explicitly call e.g. DataFrame.ffill or DataFrame.bfill before calling pct_change instead (DEPR: pct_change method/limit keyword pandas#53491)
method, limit, and fill_axis keywords in DataFrame.align and Series.align, explicitly call DataFrame.fillna or Series.fillna on the alignment results instead (DEPR: NDFrame.align broadcast_axis, fill_axis keywords pandas#51856)
quantile keyword in .Rolling.quantile and .Expanding.quantile, renamed to q instead (ENH: Differing variable name for quantile vs rolling.quantile pandas#52550)
DataFrame.take, call obj[slicer] or pass a sequence of integers instead (DEPR: deprecate allowing slice in DataFrame.take pandas#51539)
DataFrame.idxmax, DataFrame.idxmin, Series.idxmax, Series.idxmin with all-NA entries or any-NA and skipna=False; in a future version these will raise ValueError (BUG: pd.Series idxmax raises ValueError instead of returning <NA> when all values are <NA> pandas#51276)
Index (DEPR: subclassing Index pandas#45289)
Series.agg attempt to operate on each element in the Series and only operate on the whole Series if the elementwise operations failed. In the future, functions given to Series.agg will always operate on the whole Series only. To keep the current behavior, use Series.transform instead (DEPR: make Series.agg aggregate when possible pandas#53325)
DataFrame.agg attempt to operate on each element in the DataFrame and only operate on the columns of the DataFrame if the elementwise operations failed. To keep the current behavior, use DataFrame.transform instead (DEPR: make Series.agg aggregate when possible pandas#53325)
DataFrame to DataFrame.from_records, use DataFrame.set_index or DataFrame.drop instead (PERF: DataFrame.from_records with DataFrame input pandas#51353)
axis keyword in DataFrame.ewm, Series.ewm, DataFrame.rolling, Series.rolling, DataFrame.expanding, Series.expanding (DEPR: axis=1 in DataFrame.window, resample pandas#51778)
axis keyword in DataFrame.resample, Series.resample (DEPR: axis=1 in DataFrame.window, resample pandas#51778)
downcast keyword in Series.interpolate, DataFrame.interpolate, Series.fillna, DataFrame.fillna, Series.ffill, DataFrame.ffill, Series.bfill, DataFrame.bfill (DEPR: Deprecate downcast keyword for fillna pandas#40988)
concat with len(keys) != len(objs), in a future version this will raise instead of truncating to the shorter of the two sequences (API: pd.concat with len(keys) != len(values) does not raise; intentional? pandas#43485)
Series.argsort in the presence of NA values; in a future version these will be sorted at the end instead of giving -1 (DEPR: Series.argsort NA behavior pandas#54219)
observed=False in DataFrame.groupby and Series.groupby; this will default to True in a future version (DEPR: Change default to observed=True in DataFrame.groupby pandas#43999)
group.name to each group in .SeriesGroupBy.aggregate aggregations; if your operation requires utilizing the groupby keys, iterate over the groupby object instead (DEPR: SeriesGroupBy._aggregate_named pandas#41090)
axis keyword in .DataFrameGroupBy.idxmax, .DataFrameGroupBy.idxmin, .DataFrameGroupBy.fillna, .DataFrameGroupBy.take, .DataFrameGroupBy.skew, .DataFrameGroupBy.rank, .DataFrameGroupBy.cumprod, .DataFrameGroupBy.cumsum, .DataFrameGroupBy.cummax, .DataFrameGroupBy.cummin, .DataFrameGroupBy.pct_change, .DataFrameGroupBy.diff, .DataFrameGroupBy.shift, and .DataFrameGroupBy.corrwith; for axis=1 operate on the underlying DataFrame instead (DEPR: axis argument in groupby ops pandas#50405, DEPR: GroupBy.cumsum etc with axis=1 pandas#51046)
.DataFrameGroupBy with as_index=False not including groupings in the result when they are not columns of the DataFrame (DEPR: groupby with as_index=False doesn't add grouper as column when passing a Series as group key pandas#49519)
is_categorical_dtype, use isinstance(obj.dtype, pd.CategoricalDtype) instead (DEPR: is_categorical_dtype pandas#52527)
is_datetime64tz_dtype, check isinstance(dtype, pd.DatetimeTZDtype) instead (DEPR: is_datetime64tz_dtype, is_interval_dtype pandas#52607)
is_int64_dtype, check dtype == np.dtype(np.int64) instead (DEPR: is_int64_dtype pandas#52564)
is_interval_dtype, check isinstance(dtype, pd.IntervalDtype) instead (DEPR: is_datetime64tz_dtype, is_interval_dtype pandas#52607)
is_period_dtype, check isinstance(dtype, pd.PeriodDtype) instead (DEPR: is_period_dtype, is_sparse pandas#52642)
is_sparse, check isinstance(dtype, pd.SparseDtype) instead (DEPR: is_period_dtype, is_sparse pandas#52642)
.Styler.applymap_index. Use the new .Styler.map_index method instead (REF: Styler.applymap -> map pandas#52708)
.Styler.applymap. Use the new .Styler.map method instead (REF: Styler.applymap -> map pandas#52708)
DataFrame.applymap. Use the new DataFrame.map method instead (API: rename DataFrame.applymap -> DataFrame.map pandas#52353)
DataFrame.swapaxes and Series.swapaxes, use DataFrame.transpose or Series.transpose instead (DEPR: DataFrame.swapaxes pandas#51946)
freq parameter in .PeriodArray constructor, pass dtype instead (REF: remove freq arg from PeriodArray constructor pandas#52462)
take, pass either a numpy.ndarray, .ExtensionArray, Index, or Series (DEPR: allowing unknowns in take pandas#52981)
isin, :func:value_counts, :func:unique, :func:factorize, cast to one of numpy.ndarray, Index, .ExtensionArray, or Series before calling (DEPR: accepting non-standard sequences in core.algorithms functions pandas#52986)
DataFrame reductions sum, prod, std, var, sem with axis=None, in a future version this will operate over both axes returning a scalar instead of behaving like axis=0; note this also affects numpy functions e.g. np.sum(df) (Support axis=None in all reductions pandas#21597)
concat when DataFrame has columns that are all-NA, in a future version these will not be discarded when determining the resulting dtype (API: value-dependent behaviour in concat with all-NA data pandas#40893)
Series.dt.to_pydatetime, in a future version this will return a Series containing python datetime objects instead of an ndarray of datetimes; this matches the behavior of other :attr:Series.dt properties (API/INT: why is to_pydatetime handled differently in Series.dt accessor? pandas#20306)
|, &, ^) between pandas objects and dtype-less sequences (e.g. list, tuple), wrap a sequence in a Series or NumPy array before operating instead (BUG: Inconsistent behavior with bitwise operations on Series with np.array vs. list pandas#51521)
convert_type in Series.apply (API: make the func in Series.apply always operate on the Series pandas#52140)
.SeriesGroupBy.agg; pass a list of aggregations instead (DEPR: SeriesGroupBy.agg with dict argument pandas#50684)
fastpath keyword in Categorical constructor, use Categorical.from_codes instead (CLN: remove fastpath & verify_integrity from constructors pandas#20110)
is_bool_dtype returning True for object-dtype Index of bool objects (DEPR: is_bool_dtype special-casing Index[object_all_bools] pandas#52680)
Series.bool and DataFrame.bool (DEPR: NDFrame.bool pandas#51749)
closed and normalize keywords in the DatetimeIndex constructor (DEPR: unused keywords in DTI/TDI construtors pandas#52628)
closed keyword in the TimedeltaIndex constructor (DEPR: unused keywords in DTI/TDI construtors pandas#52628)
Series with different indexes always coercing the result to bool dtype. In a future version, this will maintain the return type of the inputs (BUG: Results of series bitwise ufunc operations are being casted to bool in pandas-2.0 pandas#52500, BUG: inconsistent Series/DataFrame behavior in bitwise ops pandas#52538)
Period and PeriodDtype with BDay freq, use a DatetimeIndex with BDay freq instead (DEPR: Period[B] pandas#53446)
value_counts, use pd.Series(obj).value_counts() instead (DEPR or DOCS: pd.value_counts() is public, but not documented. Deprecate or document? pandas#47862)
Series.first and DataFrame.first; create a mask and filter using .loc instead (BUG: DataFrame.first has unexpected behavior when passing a DateOffset pandas#45908)
Series.interpolate and DataFrame.interpolate for object-dtype (API/DEPR: interpolate with object dtype pandas#53631)
Series.last and DataFrame.last; create a mask and filter using .loc instead (DEPR: deprecate DataFrame.last and Series.last pandas#53692)
fill_value in SparseDtype, in a future version the fill_value will need to be compatible with the dtype.subtype, either a scalar that can be held by that subtype or NaN for integer or bool subtypes (Require the dtype of SparseArray.fill_value and sp_values.dtype to match pandas#23124)
.DataFrameGroupBy.quantile and .SeriesGroupBy.quantile, consistent with the Series.quantile and DataFrame.quantile behavior (API: quantile with bool/boolean dtypes pandas#51424)
.testing.assert_series_equal and :func:.testing.assert_frame_equal considering NA-like values (e.g. NaN vs None as equivalent) (DEPR: be stricter in assert_almost_equal pandas#52081)
read_excel. To read a file path, use a string or path-like object ([DEPR]: Remove literal string/bytes input from read_excel, read_html, and read_xml pandas#53767)
.SparseArray from scalar data, pass a sequence instead (DEPR: SparseArray(scalar) pandas#53039)
value is not specified in DataFrame.replace and Series.replace with non-dict-like to_replace (API/DEPR: DataFrame/Series.replace is too complex. pandas#33302)
read_json. Wrap literal json string input in io.StringIO instead (DEPR: Deprecate literal json string input to read_json pandas#53409)
read_xml. Wrap literal string/bytes input in io.StringIO / io.BytesIO instead ([DEPR]: Remove literal string/bytes input from read_excel, read_html, and read_xml pandas#53767)
read_html. Wrap literal string/bytes input in io.StringIO / io.BytesIO instead ([DEPR]: Remove literal string/bytes input from read_excel, read_html, and read_xml pandas#53767)
mode.use_inf_as_na, convert inf entries to NaN before instead (DEPR: use_inf_as_na pandas#51684)
obj in .DataFrameGroupBy.get_group (DEPR: obj argument in GroupBy.get_group pandas#53545)
Series with Series.__getitem__ and Series.__setitem__, in a future version ser[item] will always interpret item as a label, not a position (DEPR: Series.__getitem__, Series.__setitem__ pandas#50617)
.agg, .apply, and .transform; use the corresponding string alias (e.g. "sum" for sum or np.sum) instead (DEPR: Special casing of NumPy and Python builtin functions pandas#53425)
T, t, L and l denoting units in :func:to_timedelta (BUG: Either incorrect unit validation for 'T' in to_timedelta() or incorrect documentation pandas#52536)
.ExtensionArray.fillna, implement _pad_or_backfill instead (API: EA.ffill/bfill? pandas#53621)
method and limit keywords in DataFrame.replace and Series.replace (API/DEPR: DataFrame/Series.replace is too complex. pandas#33302)
method and limit keywords on Series.fillna, DataFrame.fillna, .SeriesGroupBy.fillna, .DataFrameGroupBy.fillna, and .Resampler.fillna, use obj.bfill() or obj.ffill() instead (DEPR: fillna 'method' pandas#53394)
Series.__getitem__, Series.__setitem__, DataFrame.__getitem__, DataFrame.__setitem__ with an integer slice on objects with a floating-dtype index, in a future version this will be treated as positional indexing (BUG: is ser[:2] with Int64Index positional or label-based pandas#49612)
pandas.array. Supported resolutions are: "s", "ms", "us", "ns" (API: pd.array convert unsupported dt64/td64 to supported? pandas#53058)
"pad", "ffill", "bfill", "backfill" for Series.interpolate and DataFrame.interpolate, use obj.ffill() or obj.bfill() instead (DEPR/API: disallow ffill/bfill method in "interpolate" pandas#53581)
Index.argmax, Index.argmin, Series.argmax, Series.argmin with either all-NAs and skipna=True or any-NAs and skipna=False returning -1; in a future version this will raise ValueError (API/BUG: Series.argmin/max with all-NaN data returns -1 ? pandas#33941, API: argmin/argmax behaviour for nullable dtypes with skipna=False pandas#33942)
DataFrame.to_sql except name and con (DEPR: Positional arguments in to_* I/O methods pandas#54229)
fill_value when passing both freq and fill_value to DataFrame.shift, Series.shift and .DataFrameGroupBy.shift; in a future version this will raise ValueError (BUG: DataFrame.shift(axis=1) with EADtype pandas#53832)
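
For anyone picking up the deprecation items, here is a rough sketch of a few of the replacement idioms the list recommends. It might be a starting point for stub tests, not a definitive test. The sample data and variable names are made up for illustration. The snippet only uses the replacements named in the items above: isinstance checks on the dtype, io.StringIO for read_json, ffill instead of the fillna method keyword, pd.Series(obj).value_counts(), and the "sum" string alias.

```python
import io

import numpy as np
import pandas as pd

# Dtype checks: isinstance / equality on the dtype instead of the
# deprecated is_categorical_dtype, is_datetime64tz_dtype, is_int64_dtype.
cat = pd.Series(["a", "b", "a"], dtype="category")
is_cat = isinstance(cat.dtype, pd.CategoricalDtype)

tz = pd.Series(pd.date_range("2023-01-01", periods=2, tz="UTC"))
is_tz = isinstance(tz.dtype, pd.DatetimeTZDtype)

ints = pd.Series([1, 2, 3], dtype="int64")
is_int64 = ints.dtype == np.dtype(np.int64)

# Literal JSON input: wrap the string in io.StringIO before read_json.
raw_json = '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'
frame = pd.read_json(io.StringIO(raw_json), orient="records")

# fillna(method="ffill") is deprecated: call ffill() directly.
gappy = pd.Series([1.0, np.nan, 3.0])
filled = gappy.ffill()

# Top-level pd.value_counts(obj) is deprecated: go through a Series.
counts = pd.Series(["x", "y", "x"]).value_counts()

# Pass the string alias rather than np.sum or the builtin sum to agg.
total = frame.agg("sum")
```

Each of these forms also works on pandas versions before 2.1, so checks written this way should not depend on the deprecation warnings themselves.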