BUG: New param [use_nullable_dtypes] of pd.read_parquet() can't handle empty parquet file · Issue #41241 · pandas-dev/pandas · GitHub

BUG: New param [use_nullable_dtypes] of pd.read_parquet() can't handle empty parquet file #41241

Closed
@bob-zhao-work

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
df_pq = pd.read_parquet(x, use_nullable_dtypes=True)  # x: path to the empty parquet file

Problem description

I get an error when I add the new parameter use_nullable_dtypes to pd.read_parquet().
If I remove it, everything goes back to normal.
OS: Ubuntu 16
Python: 3.8

An empty parquet file written by Spark causes the problem. Its schema (table, column, type) is:

Authors,AuthorId,int64
Authors,Rank,int32
Authors,NormalizedName,string
Authors,DisplayName,string
Authors,LastKnownAffiliationId,int64
Authors,PaperCount,int64
Authors,PaperFamilyCount,int64
Authors,CitationCount,int64
Authors,CreatedDate,date32[day]

Error message:

df_pq = pd.read_parquet(x,use_nullable_dtypes = True)

  File "/vjan/lib/python3.8/site-packages/pandas/io/parquet.py", line 459, in read_parquet
    return impl.read(
  File "/vjan/lib/python3.8/site-packages/pandas/io/parquet.py", line 221, in read
    return self.api.parquet.read_table(
  File "pyarrow/array.pxi", line 751, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1668, in pyarrow.lib.Table._to_pandas
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 792, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1133, in _table_to_blocks
    return [_reconstruct_block(item, columns, extension_columns)
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1133, in <listcomp>
    return [_reconstruct_block(item, columns, extension_columns)
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 751, in _reconstruct_block
    pd_ext_arr = pandas_dtype.from_arrow(arr)
  File "/vjan/lib/python3.8/site-packages/pandas/core/arrays/integer.py", line 121, in from_arrow
    return IntegerArray._concat_same_type(results)
  File "/vjan/lib/python3.8/site-packages/pandas/core/arrays/masked.py", line 271, in _concat_same_type
    data = np.concatenate([x._data for x in to_concat])
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
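
The last frames of the traceback suggest that np.concatenate received an empty list, presumably because the empty column produced no chunks to convert. The sketch below reproduces just that final error with NumPy alone and shows one possible guard. The helper name concat_or_empty is hypothetical and is not claimed to be the fix pandas adopted.

```python
import numpy as np

# Reproduce the final error in isolation: concatenating an empty list.
try:
    np.concatenate([])
except ValueError as exc:
    message = str(exc)


def concat_or_empty(chunks, dtype):
    """Hypothetical guard: return a zero-length array when there is nothing to join."""
    if not chunks:
        return np.empty(0, dtype=dtype)
    return np.concatenate(chunks)


empty = concat_or_empty([], "int64")
joined = concat_or_empty([np.array([1, 2]), np.array([3])], "int64")
```

A guard of this kind needs the target dtype passed in explicitly, since an empty list of chunks carries no type information of its own.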

Expected Output

Reading the empty parquet file should succeed and return an empty DataFrame.

Output of pd.show_versions()

1.2.4

Labels

IO Parquet (parquet, feather)
Needs Tests (unit test(s) needed to prevent regressions)
Regression (functionality that used to work in a prior pandas version)
good first issue
