PERF: DataFrame.copy(deep=True) returns a view on the original pyarrow buffer #61930

@TomAugspurger

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this issue exists on the latest version of pandas.

  • I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

Over in dask/dask#12022 (comment), I'm debugging a test failure with dask and pandas 3.x that comes down to the behavior of DataFrame.copy(deep=True) with an arrow-backed extension array.

In the copy method of the arrow-backed extension array (def copy(self) -> Self:), we deliberately return a shallow copy (a new object with a view on the original buffers) of the backing array. For correctness, this is fine: pyarrow arrays are immutable, so copying should be unnecessary. However, it does mean that after a DataFrame.copy(deep=True), the result still holds a reference to the original buffer. Even if the output of .copy(deep=True) is the only remaining object referencing that buffer, the buffer stays alive and its memory is not released. Consider:

import pandas as pd
import pyarrow as pa


pool = pa.default_memory_pool()
print("before", pool.bytes_allocated())  # 0

df = pd.DataFrame({"a": ["a", "b", "c"] * 1000})
print("df", pool.bytes_allocated())  # 27200

del df
print("after del df", pool.bytes_allocated())  # 0


df2 = pd.DataFrame({"a": ["a", "b", "c"] * 1000})
clone = df2.iloc[:0].copy(deep=True)
print("df2", pool.bytes_allocated())  # 27200

del df2
print("after - clone", pool.bytes_allocated())  # 27200

Maybe this is fine. We can probably figure out some workaround in dask: in this case we're making an empty dataframe object as a kind of schema object, and we can probably do something other than df.iloc[:0].copy(deep=True). But perhaps pandas could consider changing the behavior here.

The downside is that df.copy(deep=True) will become more expensive and use more memory.
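
For the dask-side workaround mentioned above, one possible direction is to build the empty "schema" frame from the dtypes alone, so that it never slices (and therefore never references) the original arrays. This is only a sketch under assumptions: empty_like is a hypothetical helper, not an existing pandas or dask function, and it ignores the index and does not handle duplicate column names.

```python
import pandas as pd


def empty_like(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: zero-row frame with df's column names and dtypes.

    Each column is created fresh from its dtype instead of being sliced
    from df, so the result holds no view on df's underlying buffers.
    The index is not preserved, and duplicate column names are not supported.
    """
    return pd.DataFrame(
        {name: pd.Series([], dtype=dtype) for name, dtype in df.dtypes.items()}
    )


df = pd.DataFrame({"a": ["a", "b", "c"] * 1000, "b": range(3000)})
meta = empty_like(df)
print(meta.dtypes)
```

Whether this is acceptable for dask depends on how much of the original frame's metadata (index type, attrs, and so on) the schema object needs to carry.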

Installed Versions

In [4]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit                : 962168f06d15d1aced28b414eb82909d3c930916
python                : 3.12.8
python-bits           : 64
OS                    : Darwin
OS-release            : 24.5.0
Version               : Darwin Kernel Version 24.5.0: Tue Apr 22 19:53:27 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6041
machine               : arm64
processor             : arm
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 3.0.0.dev0+2254.g962168f06d
numpy                 : 2.4.0.dev0+git20250717.d02611a
dateutil              : 2.9.0.post0
pip                   : None
Cython                : None
sphinx                : None
IPython               : 9.4.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
fastparquet           : None
fsspec                : 2025.7.0
html5lib              : None
hypothesis            : 6.136.1
gcsfs                 : None
jinja2                : 3.1.6
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
psycopg2              : None
pymysql               : None
pyarrow               : 21.0.0
pyiceberg             : None
pyreadstat            : None
pytest                : 8.4.1
python-calamine       : None
pytz                  : 2025.2
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
qtpy                  : None
pyqt5                 : None

Prior Performance

No response
