Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
df_pq = pd.read_parquet(x, use_nullable_dtypes=True)  # x: path to the empty Parquet file written by Spark
Problem description
Passing the new use_nullable_dtypes parameter to pd.read_parquet() raises a ValueError.
Without it, the same file is read normally.
OS: Ubuntu 16
Python: 3.8
The problem is triggered by an empty Parquet file written by Spark. Its schema (table, column, Arrow type) is:
Authors,AuthorId,int64
Authors,Rank,int32
Authors,NormalizedName,string
Authors,DisplayName,string
Authors,LastKnownAffiliationId,int64
Authors,PaperCount,int64
Authors,PaperFamilyCount,int64
Authors,CitationCount,int64
Authors,CreatedDate,date32[day]
Error message:
    df_pq = pd.read_parquet(x,use_nullable_dtypes = True)
  File "/vjan/lib/python3.8/site-packages/pandas/io/parquet.py", line 459, in read_parquet
    return impl.read(
  File "/vjan/lib/python3.8/site-packages/pandas/io/parquet.py", line 221, in read
    return self.api.parquet.read_table(
  File "pyarrow/array.pxi", line 751, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1668, in pyarrow.lib.Table._to_pandas
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 792, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1133, in _table_to_blocks
    return [_reconstruct_block(item, columns, extension_columns)
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1133, in <listcomp>
    return [_reconstruct_block(item, columns, extension_columns)
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 751, in _reconstruct_block
    pd_ext_arr = pandas_dtype.from_arrow(arr)
  File "/vjan/lib/python3.8/site-packages/pandas/core/arrays/integer.py", line 121, in from_arrow
    return IntegerArray._concat_same_type(results)
  File "/vjan/lib/python3.8/site-packages/pandas/core/arrays/masked.py", line 271, in _concat_same_type
    data = np.concatenate([x._data for x in to_concat])
  File "<array_function internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
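The last frames point at the root cause: from_arrow converts each Arrow chunk separately and then concatenates the per-chunk results, and for this empty file there appear to be no chunks at all, so np.concatenate is called on an empty list. The sketch below uses numpy only; concat_masked_chunks is a hypothetical helper, not pandas' actual code. It reproduces that error and shows the kind of zero-chunk guard that would avoid it.

```python
import numpy as np

# The failing call from the traceback: concatenating zero arrays.
try:
    np.concatenate([])
except ValueError as exc:
    print(exc)  # need at least one array to concatenate


def concat_masked_chunks(chunks, dtype):
    """Hypothetical helper: merge (data, mask) pairs, one pair per Arrow chunk.

    An empty Arrow ChunkedArray can have zero chunks, so that case is
    special-cased and mapped to zero-length arrays instead of calling
    np.concatenate on an empty list.
    """
    if not chunks:
        # Nothing to concatenate: return an empty values array and empty mask.
        return np.array([], dtype=dtype), np.array([], dtype=bool)
    data = np.concatenate([values for values, _ in chunks])
    mask = np.concatenate([missing for _, missing in chunks])
    return data, mask
```

A guard of this shape inside IntegerArray.from_arrow (returning an empty array when there are no chunks) is one plausible fix; the fix actually adopted in pandas may differ.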
Expected Output
The empty Parquet file should be read into an empty DataFrame, just as it is when use_nullable_dtypes is not passed.
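Until the conversion handles empty input, a possible workaround (an assumption, not something tested in this report) is to read the file without use_nullable_dtypes and apply the nullable dtypes afterwards with DataFrame.astype. A minimal sketch, assuming the column names from the schema above; NULLABLE_DTYPES and to_nullable are hypothetical names, and an in-memory empty DataFrame stands in for the result of pd.read_parquet(x), since the original file is not available.

```python
import pandas as pd

# Hypothetical mapping derived from the Spark schema in this report.
# CreatedDate (date32[day]) is intentionally left untouched.
NULLABLE_DTYPES = {
    "AuthorId": "Int64",
    "Rank": "Int32",
    "NormalizedName": "string",
    "DisplayName": "string",
    "LastKnownAffiliationId": "Int64",
    "PaperCount": "Int64",
    "PaperFamilyCount": "Int64",
    "CitationCount": "Int64",
}


def to_nullable(df):
    """Cast the columns listed in NULLABLE_DTYPES (if present) to nullable dtypes."""
    present = {col: dtype for col, dtype in NULLABLE_DTYPES.items() if col in df.columns}
    return df.astype(present)


# Stand-in for pd.read_parquet(x) on the empty file, without use_nullable_dtypes.
empty = pd.DataFrame({
    "AuthorId": pd.Series([], dtype="int64"),
    "Rank": pd.Series([], dtype="int32"),
    "DisplayName": pd.Series([], dtype="object"),
})
result = to_nullable(empty)
print(result.dtypes)
```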
Output of pd.show_versions()
pandas: 1.2.4