Will Avalanche's FFCV integration support concatenated datasets? #1543
-
Hi, I have been trying to integrate the FFCV implementation found on the main branch. I have run into an issue where I cannot use a concatenated dataset, because of a check in Avalanche's FFCV integration that rejects it.

Main question: will Avalanche's FFCV integration support concatenated datasets? So far, the only workaround I have thought of is to build a single dataset class that hides the concatenation.

Side note: thanks! P.S. Just dropping a tag for @lrzpellegrini, since I figured you were leading the charge on FFCV integration.
-
For posterity, I ended up going down the route of trying to "trick" Avalanche, and it seems to have worked for my case. This isn't a particularly impressive solution and I might be causing some issues for myself down the line, but for now it works. I just reused the logic from PyTorch's `ConcatDataset`. The key is to be very careful that both datasets return exactly the same types, shapes, etc., and that any transformations applied are also exactly the same (there's a quick sanity-check sketch after the snippet below). You can use this `CombinedDataset` class to enforce/override the transformations applied to the datasets being combined. Here's a stripped-down version:

```python
import bisect

from torch.utils.data import Dataset


class CombinedDataset(Dataset):
    """
    Dataset class to combine two datasets with a common transformation.
    Acts as a single dataset, hiding the fact that it is really a ConcatDataset.
    """

    def __init__(
        self,
    ):
        # ...
        # variables, transforms, etc.

        # Initialize datasets
        self.datasets = []
        self.datasets.append(
            MyFirstDataset()
        )
        self.datasets.append(
            MySecondDataset()
        )

        # Initialize targets
        self.targets = []
        for dataset in self.datasets:
            self.targets.extend(dataset.targets)

        # Compute cumulative sizes
        self.cumulative_sizes = self._cumsum(self.datasets)

    @staticmethod
    def _cumsum(sequence):
        # Running total of dataset lengths, mirroring PyTorch's ConcatDataset
        r, s = [], 0
        for e in sequence:
            l = len(e)
            r.append(l + s)
            s += l
        return r

    def __len__(self):
        return self.cumulative_sizes[-1]

    def __getitem__(self, index):
        if index < 0 or index >= len(self):
            raise IndexError("The index is out of range.")

        # Find which underlying dataset the index falls into, then map the
        # global index to a local index within that dataset.
        dataset_index = bisect.bisect_right(self.cumulative_sizes, index)
        sample_index = (
            index - self.cumulative_sizes[dataset_index - 1]
            if dataset_index > 0
            else index
        )

        # Unpack
        image, label = self.datasets[dataset_index][sample_index]
        return image, label
```

I'm going to close this discussion for now, but I'm still curious to know whether there will ever be support for concatenated datasets in Avalanche's FFCV integration! The solution above is hacky and bound to be error-prone, I think.
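In case it helps anyone copying this: below is a minimal sanity-check sketch for the `CombinedDataset` above. `MyFirstDataset` and `MySecondDataset` are still placeholders for your own datasets, and the exact checks you need depend on your data; this only illustrates the "same types, same shapes" requirement mentioned earlier.

```python
import torch

# Build the combined dataset (MyFirstDataset/MySecondDataset are placeholders).
combined = CombinedDataset()

# Basic bookkeeping checks.
assert len(combined) == sum(len(d) for d in combined.datasets)
assert len(combined.targets) == len(combined)

# Grab one sample from each underlying dataset via the combined indexing:
# index 0 hits the first dataset, cumulative_sizes[0] hits the second one.
first_image, first_label = combined[0]
second_image, second_label = combined[combined.cumulative_sizes[0]]

# The two datasets must produce identical types (and shapes/dtypes for tensors),
# otherwise the "single dataset" illusion breaks downstream.
assert type(first_image) is type(second_image)
assert type(first_label) is type(second_label)
if isinstance(first_image, torch.Tensor):
    assert first_image.shape == second_image.shape
    assert first_image.dtype == second_image.dtype
```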
The solution seems right. Just make sure that the order in which datasets are concatenated is always the same across executions!
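One way to keep that order deterministic, as a rough sketch: if the sub-datasets are discovered dynamically (for example by scanning a directory), sort them explicitly before building the list. The `data/splits` path and the `MyFolderDataset` class below are made-up placeholders.

```python
from pathlib import Path

# Directory iteration order is not guaranteed, so sort the paths explicitly
# to keep the concatenation order identical across executions.
dataset_roots = sorted(Path("data/splits").iterdir())

# MyFolderDataset is a placeholder for whatever per-folder dataset class you use.
datasets = [MyFolderDataset(root) for root in dataset_roots]
```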