
NotImplementedError: While importing/Loading tfds plant_leaves dataset #5416

Open
Coolcoder45 opened this issue May 16, 2024 · 9 comments
Labels: bug (Something isn't working)

@Coolcoder45


Short description
The tfds plant_leaves dataset is not loading successfully; tfds.load raises NotImplementedError. Tried on May 16, 2024.

Environment information

  • Operating System: Windows 11

  • Python version: 3.10.12

  • tensorflow-datasets/tfds-nightly version: 4.9.4

  • tensorflow/tf-nightly version: 2.15.0

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes

Reproduction instructions

import tensorflow_datasets as tfds
plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)

Gives:

Downloading and preparing dataset 6.56 GiB (download: 6.56 GiB, generated: 6.81 GiB, total: 13.37 GiB) to /root/tensorflow_datasets/plant_leaves/0.1.1...
DlCompleted...: 100% 1/1 [10:04<00:00, 604.39s/url]
DlSize...: 100% 6718/6718 [10:04<00:00, 11.25MiB/s]
Dataset plant_leaves downloaded and prepared to /root/tensorflow_datasets/plant_leaves/0.1.1. Subsequent calls will reuse this data.
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-3-d88d46497437> in <cell line: 2>()
      1 import tensorflow_datasets as tfds
----> 2 plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)

33 frames
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/file_adapters.py in make_tf_data(cls, filename, buffer_size)
    206   ) -> tf.data.Dataset:
    207     """Returns TensorFlow Dataset comprising given array record file."""
--> 208     raise NotImplementedError(
    209         '`.as_dataset()` not implemented for ArrayRecord files. Please, use'
    210         ' `.as_data_source()`.'

NotImplementedError: `.as_dataset()` not implemented for ArrayRecord files. Please, use `.as_data_source()`.

Expected behavior
To load dataset successfully.
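
For reference, the error message itself points at the random-access API as the alternative. A minimal sketch of that path (assuming the dataset was already downloaded and prepared, as in the log above):

import tensorflow_datasets as tfds

# Random-access alternative to tfds.load: returns a Sequence-like data
# source instead of a tf.data.Dataset, so it can read ArrayRecord files.
ds = tfds.data_source('plant_leaves', split='train')
print(len(ds))   # number of examples
example = ds[0]  # dict of decoded features, e.g. example['image']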

Coolcoder45 added the bug label May 16, 2024
pierrot0 self-assigned this May 17, 2024
@pierrot0 (Collaborator)

Hi, thank you for reporting!
This is definitely a bug.

Workaround: add the following arg to your tfds.load call:

tfds.load(..., download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

We'll look into how to update the code and post updates on this bug.

@Coolcoder45 (Author)

It's still giving an error.

import tensorflow_datasets as tfds
plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

Gives:

Downloading and preparing dataset 6.56 GiB (download: 6.56 GiB, generated: 6.81 GiB, total: 13.37 GiB) to /root/tensorflow_datasets/plant_leaves/0.1.1...
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-608b46b22c6c> in <cell line: 4>()
      2 #plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)
      3 #plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, as_data_source=True)
----> 4 plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

5 frames
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/logging/__init__.py in __call__(self, function, instance, args, kwargs)
    167     metadata = self._start_call()
    168     try:
--> 169       return function(*args, **kwargs)
    170     except Exception:
    171       metadata.mark_error()

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/load.py in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
    645       try_gcs,
    646   )
--> 647   _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
    648 
    649   if as_dataset_kwargs is None:

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/load.py in _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
    504   if download:
    505     download_and_prepare_kwargs = download_and_prepare_kwargs or {}
--> 506     dbuilder.download_and_prepare(**download_and_prepare_kwargs)
    507 
    508 

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/logging/__init__.py in __call__(self, function, instance, args, kwargs)
    167     metadata = self._start_call()
    168     try:
--> 169       return function(*args, **kwargs)
    170     except Exception:
    171       metadata.mark_error()

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_builder.py in download_and_prepare(self, download_dir, download_config, file_format)
    679     # to generate the files.
    680     if file_format:
--> 681       self.info.set_file_format(file_format, override=True)
    682 
    683     # Create a tmp dir and rename to self.data_dir on successful exit.

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_info.py in set_file_format(self, file_format, override)
    470       )
    471     if override and self._fully_initialized:
--> 472       raise RuntimeError(
    473           "Cannot override the file format "
    474           "when the DatasetInfo is already fully initialized!"

RuntimeError: Cannot override the file format when the DatasetInfo is already fully initialized!
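
This RuntimeError is consistent with the first attempt having already prepared the dataset: DatasetInfo is restored from the existing copy on disk, so the format override is rejected. One sketch of retrying from a clean state (assuming the default data_dir from the log above; the TFRECORD format below is an illustrative choice, not something suggested in this thread):

import shutil
import tensorflow_datasets as tfds

# Delete the copy prepared by the earlier failed attempt so that
# DatasetInfo is no longer restored ("fully initialized") from disk.
shutil.rmtree('/root/tensorflow_datasets/plant_leaves', ignore_errors=True)

plant_leaves = tfds.load(
    'plant_leaves',
    split='train',
    shuffle_files=True,
    # A file format that .as_dataset() can read (illustrative choice).
    download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.TFRECORD},
)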

@dddraxxx commented Jul 17, 2024

Same error on the refcoco dataset.
NotImplementedError: `.as_dataset()` not implemented for ArrayRecord files. Please, use `.as_data_source()`.

@dddraxxx

Anyway, one thing I do to work around this is to force the file format before preparing the dataset:

builder = tfds.builder('ref_coco/refcocog_umd')
builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
builder.download_and_prepare()
ref_ds = tfds.load('ref_coco/refcocog_umd', split='validation')
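
Presumably the same pattern transfers to the dataset from the original report; an untested sketch along those lines, reusing the override_if_initialized argument from the snippet above:

import tensorflow_datasets as tfds

# Force PARQUET before preparing, then load as usual.
builder = tfds.builder('plant_leaves')
builder.info.set_file_format(
    tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True
)
builder.download_and_prepare()
plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)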

@Dmitry-Danchenko commented Nov 7, 2024

builder = tfds.builder('oxford_iiit_pet')
builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
builder.download_and_prepare()

dataset, info = tfds.load('oxford_iiit_pet:4.0.0', download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

also errors:

NotImplementedError Traceback (most recent call last)
Cell In[34], line 5
2 builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
3 builder.download_and_prepare()
----> 5 dataset, info = tfds.load('oxford_iiit_pet:4.0.0', download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/logging/__init__.py:176, in _FunctionDecorator.__call__(self, function, instance, args, kwargs)
174 metadata = self._start_call()
175 try:
--> 176 return function(*args, **kwargs)
177 except Exception:
178 metadata.mark_error()

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/load.py:673, in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
670 as_dataset_kwargs.setdefault('shuffle_files', shuffle_files)
671 as_dataset_kwargs.setdefault('read_config', read_config)
--> 673 ds = dbuilder.as_dataset(**as_dataset_kwargs)
674 if with_info:
675 return ds, dbuilder.info

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/logging/__init__.py:176, in _FunctionDecorator.__call__(self, function, instance, args, kwargs)
174 metadata = self._start_call()
175 try:
--> 176 return function(*args, **kwargs)
177 except Exception:
178 metadata.mark_error()

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/dataset_builder.py:1026, in DatasetBuilder.as_dataset(self, split, batch_size, shuffle_files, decoders, read_config, as_supervised)
1017 # Create a dataset for each of the given splits
1018 build_single_dataset = functools.partial(
1019 self._build_single_dataset,
1020 shuffle_files=shuffle_files,
(...)
1024 as_supervised=as_supervised,
1025 )
-> 1026 all_ds = tree.map_structure(build_single_dataset, split)
1027 return all_ds

File /usr/local/lib/python3.12/dist-packages/tree/__init__.py:428, in map_structure(func, *structures, **kwargs)
425 for other in structures[1:]:
426 assert_same_structure(structures[0], other, check_types=check_types)
427 return unflatten_as(structures[0],
--> 428 [func(*args) for args in zip(*map(flatten, structures))])

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/dataset_builder.py:1044, in DatasetBuilder._build_single_dataset(self, split, batch_size, shuffle_files, decoders, read_config, as_supervised)
1041 batch_size = self.info.splits.total_num_examples or sys.maxsize
1043 # Build base dataset
-> 1044 ds = self._as_dataset(
1045 split=split,
1046 shuffle_files=shuffle_files,
1047 decoders=decoders,
1048 read_config=read_config,
1049 )
1050 # Auto-cache small datasets which are small enough to fit in memory.
1051 if self._should_cache_ds(
1052 split=split, shuffle_files=shuffle_files, read_config=read_config
1053 ):

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/dataset_builder.py:1498, in FileReaderBuilder._as_dataset(self, split, decoders, read_config, shuffle_files)
1492 reader = reader_lib.Reader(
1493 self.data_dir,
1494 example_specs=example_specs,
1495 file_format=self.info.file_format,
1496 )
1497 decode_fn = functools.partial(features.decode_example, decoders=decoders)
-> 1498 return reader.read(
1499 instructions=split,
1500 split_infos=self.info.splits.values(),
1501 decode_fn=decode_fn,
1502 read_config=read_config,
1503 shuffle_files=shuffle_files,
1504 disable_shuffling=self.info.disable_shuffling,
1505 )

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/reader.py:430, in Reader.read(self, instructions, split_infos, read_config, shuffle_files, disable_shuffling, decode_fn)
421 file_instructions = splits_dict[instruction].file_instructions
422 return self.read_files(
423 file_instructions,
424 read_config=read_config,
(...)
427 decode_fn=decode_fn,
428 )
--> 430 return tree.map_structure(_read_instruction_to_ds, instructions)

File /usr/local/lib/python3.12/dist-packages/tree/__init__.py:428, in map_structure(func, *structures, **kwargs)
425 for other in structures[1:]:
426 assert_same_structure(structures[0], other, check_types=check_types)
427 return unflatten_as(structures[0],
--> 428 [func(*args) for args in zip(*map(flatten, structures))])

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/reader.py:422, in Reader.read.<locals>._read_instruction_to_ds(instruction)
420 def _read_instruction_to_ds(instruction):
421 file_instructions = splits_dict[instruction].file_instructions
--> 422 return self.read_files(
423 file_instructions,
424 read_config=read_config,
425 shuffle_files=shuffle_files,
426 disable_shuffling=disable_shuffling,
427 decode_fn=decode_fn,
428 )

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/reader.py:462, in Reader.read_files(self, file_instructions, read_config, shuffle_files, disable_shuffling, decode_fn)
459 raise ValueError(msg)
461 # Read serialized example (eventually with tfds_id)
--> 462 ds = _read_files(
463 file_instructions=file_instructions,
464 read_config=read_config,
465 shuffle_files=shuffle_files,
466 disable_shuffling=disable_shuffling,
467 file_format=self._file_format,
468 )
470 # Parse and decode
471 def parse_and_decode(ex: Tensor) -> TreeDict[Tensor]:
472 # TODO(pierrot): parse_example uses
473 # tf.io.parse_single_example. It might be faster to use parse_example,
474 # after batching.
475 # https://www.tensorflow.org/api_docs/python/tf/io/parse_example

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/reader.py:302, in _read_files(file_instructions, read_config, shuffle_files, disable_shuffling, file_format)
295 if (
296 shuffle_files
297 and read_config.shuffle_seed is None
298 and tf_compat.get_option_deterministic(read_config.options) is None
299 ):
300 deterministic = False
--> 302 ds = instruction_ds.interleave(
303 functools.partial(
304 _get_dataset_from_filename,
305 do_skip=do_skip,
306 do_take=do_take,
307 file_format=file_format,
308 add_tfds_id=read_config.add_tfds_id,
309 override_buffer_size=read_config.override_buffer_size,
310 ),
311 cycle_length=cycle_length,
312 block_length=block_length,
313 num_parallel_calls=read_config.num_parallel_calls_for_interleave_files,
314 deterministic=deterministic,
315 )
317 return assert_cardinality_and_apply_options(ds)

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/dataset_ops.py:2534, in DatasetV2.interleave(self, map_func, cycle_length, block_length, num_parallel_calls, deterministic, name)
2530 # Loaded lazily due to a circular dependency (
2531 # dataset_ops -> interleave_op -> dataset_ops).
2532 # pylint: disable=g-import-not-at-top,protected-access
2533 from tensorflow.python.data.ops import interleave_op
-> 2534 return interleave_op._interleave(self, map_func, cycle_length, block_length,
2535 num_parallel_calls, deterministic, name)

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/interleave_op.py:49, in _interleave(input_dataset, map_func, cycle_length, block_length, num_parallel_calls, deterministic, name)
46 return _InterleaveDataset(
47 input_dataset, map_func, cycle_length, block_length, name=name)
48 else:
---> 49 return _ParallelInterleaveDataset(
50 input_dataset,
51 map_func,
52 cycle_length,
53 block_length,
54 num_parallel_calls,
55 deterministic=deterministic,
56 name=name)

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/interleave_op.py:119, in _ParallelInterleaveDataset.__init__(self, input_dataset, map_func, cycle_length, block_length, num_parallel_calls, buffer_output_elements, prefetch_input_elements, deterministic, name)
117 """See Dataset.interleave() for details."""
118 self._input_dataset = input_dataset
--> 119 self._map_func = structured_function.StructuredFunctionWrapper(
120 map_func, self._transformation_name(), dataset=input_dataset)
121 if not isinstance(self._map_func.output_structure, dataset_ops.DatasetSpec):
122 raise TypeError(
123 "The map_func argument must return a Dataset object. Got "
124 f"{dataset_ops.get_type(self._map_func.output_structure)!r}.")

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/structured_function.py:265, in StructuredFunctionWrapper.__init__(self, func, transformation_name, dataset, input_classes, input_shapes, input_types, input_structure, add_to_graph, use_legacy_function, defun_kwargs)
258 warnings.warn(
259 "Even though the tf.config.experimental_run_functions_eagerly "
260 "option is set, this option does not apply to tf.data functions. "
261 "To force eager execution of tf.data functions, please use "
262 "tf.data.experimental.enable_debug_mode().")
263 fn_factory = trace_tf_function(defun_kwargs)
--> 265 self._function = fn_factory()
266 # There is no graph to add in eager mode.
267 add_to_graph &= not context.executing_eagerly()

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:1251, in Function.get_concrete_function(self, *args, **kwargs)
1249 def get_concrete_function(self, *args, **kwargs):
1250 # Implements PolymorphicFunction.get_concrete_function.
-> 1251 concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
1252 concrete._garbage_collector.release() # pylint: disable=protected-access
1253 return concrete

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:1221, in Function._get_concrete_function_garbage_collected(self, *args, **kwargs)
1219 if self._variable_creation_config is None:
1220 initializers = []
-> 1221 self._initialize(args, kwargs, add_initializers_to=initializers)
1222 self._initialize_uninitialized_variables(initializers)
1224 if self._created_variables:
1225 # In this case we have created variables on the first call, so we run the
1226 # version which is guaranteed to never create variables.

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:696, in Function._initialize(self, args, kwds, add_initializers_to)
691 self._variable_creation_config = self._generate_scoped_tracing_options(
692 variable_capturing_scope,
693 tracing_compilation.ScopeType.VARIABLE_CREATION,
694 )
695 # Force the definition of the function for these arguments
--> 696 self._concrete_variable_creation_fn = tracing_compilation.trace_function(
697 args, kwds, self._variable_creation_config
698 )
700 def invalid_creator_scope(*unused_args, **unused_kwds):
701 """Disables variable creation."""

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/tracing_compilation.py:178, in trace_function(args, kwargs, tracing_options)
175 args = tracing_options.input_signature
176 kwargs = {}
--> 178 concrete_function = _maybe_define_function(
179 args, kwargs, tracing_options
180 )
182 if not tracing_options.bind_graph_to_function:
183 concrete_function._garbage_collector.release() # pylint: disable=protected-access

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/tracing_compilation.py:283, in _maybe_define_function(args, kwargs, tracing_options)
281 else:
282 target_func_type = lookup_func_type
--> 283 concrete_function = _create_concrete_function(
284 target_func_type, lookup_func_context, func_graph, tracing_options
285 )
287 if tracing_options.function_cache is not None:
288 tracing_options.function_cache.add(
289 concrete_function, current_func_context
290 )

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/tracing_compilation.py:310, in _create_concrete_function(function_type, type_context, func_graph, tracing_options)
303 placeholder_bound_args = function_type.placeholder_arguments(
304 placeholder_context
305 )
307 disable_acd = tracing_options.attributes and tracing_options.attributes.get(
308 attributes_lib.DISABLE_ACD, False
309 )
--> 310 traced_func_graph = func_graph_module.func_graph_from_py_func(
311 tracing_options.name,
312 tracing_options.python_function,
313 placeholder_bound_args.args,
314 placeholder_bound_args.kwargs,
315 None,
316 func_graph=func_graph,
317 add_control_dependencies=not disable_acd,
318 arg_names=function_type_utils.to_arg_names(function_type),
319 create_placeholders=False,
320 )
322 transform.apply_func_graph_transforms(traced_func_graph)
324 graph_capture_container = traced_func_graph.function_captures

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/framework/func_graph.py:1059, in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, create_placeholders)
1056 return x
1058 _, original_func = tf_decorator.unwrap(python_func)
-> 1059 func_outputs = python_func(*func_args, **func_kwargs)
1061 # invariant: func_outputs contains only Tensors, CompositeTensors,
1062 # TensorArrays and Nones.
1063 func_outputs = variable_utils.convert_variables_to_tensors(func_outputs)

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:599, in Function._generate_scoped_tracing_options.<locals>.wrapped_fn(*args, **kwds)
595 with default_graph._variable_creator_scope(scope, priority=50): # pylint: disable=protected-access
596 # wrapped allows AutoGraph to swap in a converted function. We give
597 # the function a weak reference to itself to avoid a reference cycle.
598 with OptionalXlaContext(compile_with_xla):
--> 599 out = weak_wrapped_fn().wrapped(*args, **kwds)
600 return out

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/structured_function.py:231, in StructuredFunctionWrapper.__init__.<locals>.trace_tf_function.<locals>.wrapped_fn(*args)
230 def wrapped_fn(*args): # pylint: disable=missing-docstring
--> 231 ret = wrapper_helper(*args)
232 ret = structure.to_tensor_list(self._output_structure, ret)
233 return [ops.convert_to_tensor(t) for t in ret]

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/structured_function.py:161, in StructuredFunctionWrapper.__init__.<locals>.wrapper_helper(*args)
159 if not _should_unpack(nested_args):
160 nested_args = (nested_args,)
--> 161 ret = autograph.tf_convert(self._func, ag_ctx)(*nested_args)
162 ret = variable_utils.convert_variables_to_tensors(ret)
163 if _should_pack(ret):

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/autograph/impl/api.py:690, in convert.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
688 try:
689 with conversion_ctx:
--> 690 return converted_call(f, args, kwargs, options=options)
691 except Exception as e: # pylint:disable=broad-except
692 if hasattr(e, 'ag_error_metadata'):

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/autograph/impl/api.py:352, in converted_call(f, args, kwargs, caller_fn_scope, options)
349 new_args = f.args + args
350 logging.log(3, 'Forwarding call of partial %s with\n%s\n%s\n', f, new_args,
351 new_kwargs)
--> 352 return converted_call(
353 f.func,
354 new_args,
355 new_kwargs,
356 caller_fn_scope=caller_fn_scope,
357 options=options)
359 if inspect_utils.isbuiltin(f):
360 if f is eval:

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/autograph/impl/api.py:331, in converted_call(f, args, kwargs, caller_fn_scope, options)
329 if conversion.is_in_allowlist_cache(f, options):
330 logging.log(2, 'Allowlisted %s: from cache', f)
--> 331 return _call_unconverted(f, args, kwargs, options, False)
333 if ag_ctx.control_status_ctx().status == ag_ctx.Status.DISABLED:
334 logging.log(2, 'Allowlisted: %s: AutoGraph is disabled in context', f)

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/autograph/impl/api.py:459, in _call_unconverted(f, args, kwargs, options, update_cache)
456 return f.__self__.call(args, kwargs)
458 if kwargs is not None:
--> 459 return f(*args, **kwargs)
460 return f(*args)

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/reader.py:69, in _get_dataset_from_filename(instruction, do_skip, do_take, file_format, add_tfds_id, override_buffer_size)
60 def _get_dataset_from_filename(
61 instruction: _Instruction,
62 do_skip: bool,
(...)
66 override_buffer_size: Optional[int] = None,
67 ) -> tf.data.Dataset:
68 """Returns a tf.data.Dataset instance from given instructions."""
---> 69 ds = file_adapters.ADAPTER_FOR_FORMAT[file_format].make_tf_data(
70 instruction.filepath, buffer_size=override_buffer_size
71 )
72 if do_skip:
73 ds = ds.skip(instruction.skip)

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/file_adapters.py:267, in ArrayRecordFileAdapter.make_tf_data(cls, filename, buffer_size)
260 @classmethod
261 def make_tf_data(
262 cls,
263 filename: epath.PathLike,
264 buffer_size: int | None = None,
265 ) -> tf.data.Dataset:
266 """Returns TensorFlow Dataset comprising given array record file."""
--> 267 raise NotImplementedError(
268 '`.as_dataset()` not implemented for ArrayRecord files. Please, use'
269 ' `.as_data_source()`.'
270 )

NotImplementedError: `.as_dataset()` not implemented for ArrayRecord files. Please, use `.as_data_source()`.

@pierrot0 (Collaborator)

Can you try with the following instead?

builder = tfds.builder('oxford_iiit_pet')
builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
builder.download_and_prepare()

dataset, info = tfds.load('oxford_iiit_pet:4.0.0', download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.PARQUET})

@dsaha21 commented Nov 16, 2024

Hi @pierrot0,
I tried on both my local system and Colab. I used the PARQUET format like you mentioned, and I'm getting something like the following:

[screenshot: colab_unet]


I also tried using only the builder:

[screenshot: colab_onlybuilder1]

Using builder.as_data_source() gives the result:

{'train': ArrayRecordDataSource(name=oxford_iiit_pet, split='train', decoders=None),
 'test': ArrayRecordDataSource(name=oxford_iiit_pet, split='test', decoders=None)}
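
If a tf.data pipeline is still needed downstream, the data source can be wrapped by hand; a rough sketch (assuming the oxford_iiit_pet builder from above, with feature names and dtypes taken from that dataset):

import tensorflow as tf
import tensorflow_datasets as tfds

builder = tfds.builder('oxford_iiit_pet')
builder.download_and_prepare()
source = builder.as_data_source(split='train')

# Feed the random-access source through a generator, since ArrayRecord
# files cannot back a tf.data.Dataset directly.
def gen():
    for example in source:
        yield example['image'], example['label']

ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
)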

@Dmitry-Danchenko commented Nov 17, 2024

Hi @pierrot0 !

Still have a problem:


NotImplementedError Traceback (most recent call last)
Cell In[50], line 5
2 builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
3 builder.download_and_prepare()
----> 5 dataset, info = tfds.load('oxford_iiit_pet:4.0.0', download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.PARQUET})

[... stack trace identical to the one in the Nov 7 comment above, ending in:]

NotImplementedError: `.as_dataset()` not implemented for ArrayRecord files. Please, use `.as_data_source()`.

[screenshots attached: 2024-11-17 20:56 and 20:59]

@Dmitry-Danchenko

[screenshot attached: 2024-11-17 21:01]
