Image Feature in Datasets Library Fails to Handle bytearray Objects from Spark DataFrames #7517
Comments
Hi ! The
but it doesn't support
Hi @lhoestq, I’d be happy to work on a fix for this issue.
I see, that's an issue indeed. Feel free to ping me if I can help with reviews or any guidance. If it can help, the code that takes a Spark DataFrame and iterates on the rows is in datasets/src/datasets/packaged_modules/spark/spark.py, lines 49 to 53 in 6a96bf3.
#self-assign
Describe the bug
When using IterableDataset.from_spark() with a Spark DataFrame containing image data, the Image feature class fails to properly process this data type, causing an AttributeError: 'bytearray' object has no attribute 'get'
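Until this is fixed in the library, one possible workaround is to convert bytearray values before they reach the Image feature. The following is a minimal sketch, not the library's implementation. It assumes the Image feature accepts a dict with "bytes" and "path" keys, and the helper names normalize_image_value and normalize_row are hypothetical, introduced here only for illustration.

```python
from typing import Any, Dict, Iterable


def normalize_image_value(value: Any) -> Any:
    """Hypothetical helper: wrap raw binary data in a {"bytes", "path"} dict.

    Assumption: the Image feature can consume a dict of this shape.
    Values that are not bytes or bytearray are returned unchanged.
    """
    if isinstance(value, (bytes, bytearray)):
        # bytes(...) copies a bytearray into an immutable bytes object
        return {"bytes": bytes(value), "path": None}
    return value


def normalize_row(row: Dict[str, Any], image_columns: Iterable[str]) -> Dict[str, Any]:
    """Hypothetical helper: apply normalize_image_value to the named columns only."""
    columns = set(image_columns)
    return {
        key: normalize_image_value(val) if key in columns else val
        for key, val in row.items()
    }


# Example row shaped like one coming from a Spark binary column
row = {"image": bytearray(b"\x89PNG"), "label": 1}
fixed = normalize_row(row, image_columns=["image"])
```

A function like normalize_row could be applied to each row (for example through a map step) before the Image feature decodes it. Whether that is the right place to do the conversion depends on how the dataset is built.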
Steps to reproduce the bug
IterableDataset.from_spark()
Expected behavior
The features should work on IterableDataset the same way they work on Dataset.
Environment info
datasets version: 3.5.0
huggingface_hub version: 0.30.2
fsspec version: 2024.12.0