Bug: Type Mismatch in Dataset Mapping #7135
Comments
By the way, the following code works, which shows the inconsistency:
from datasets import Dataset
# Original data
data = {
    'text': ['Hello', 'world', 'this', 'is', 'a', 'test'],
    'label': [0, 1, 0, 1, 1, 0]
}
# Creating a Dataset object
dataset = Dataset.from_dict(data)
# Mapping function that increments the label (the value stays an integer)
def add_one(example):
    example['label'] += 1
    return example
# Applying the mapping function
dataset = dataset.map(add_one)
# Iterating over the dataset to show results
for item in dataset:
    print(item)
    print(type(item['label']))
Hello, thanks for submitting an issue. FWIU, the issue is that [...] A quick solution would be to use
from datasets import Features, Value
# Option 1: cast the column with Dataset.cast_column
dataset = dataset.cast_column('label', Value('string'))
# Option 2: supply the target features to map
dataset = dataset.map(add_one, features=Features({**dataset.features, 'label': Value('string')}))
LGTM! Thanks for the review. Just to clarify, is this intended behavior, or is it something that might be addressed in a future update?
Issue: Type Mismatch in Dataset Mapping
Description
There is an issue with the `map` function in the `datasets` library where the mapped output does not reflect the expected type change. After applying a mapping function to convert an integer label to a string, the resulting type remains an integer instead of a string.
Reproduction Code
Below is a Python script that demonstrates the problem:
Expected Output
After applying the mapping function, the expected output should have the `label` field as strings:
Actual Output
The actual output still shows the `label` field values as integers:
Why necessary
In image processing, we often need to convert PIL images to tensors while keeping the same column name.
Thanks to every dev who reviews this issue. 🤗