
Prepare Data

Download Dataset

Prepare the data by following the instructions below.

  • data directory file tree
─── data
    ├── 50salads
    ├── egtea
    ├── gtea
    └── ...

gtea, 50salads and egtea

The video action segmentation model uses the egtea, 50salads and gtea datasets.

Download the I3D features from the ms-tcn repo.

  • Dataset tree example
─── gtea
    ├── Videos
    │   ├── S1_Cheese_C1.mp4
    │   ├── S1_Coffee_C1.mp4
    │   ├── S1_CofHoney_C1.mp4
    │   └── ...
    ├── groundTruth
    │   ├── S1_Cheese_C1.txt
    │   ├── S1_Coffee_C1.txt
    │   ├── S1_CofHoney_C1.txt
    │   └── ...
    ├── splits
    │   ├── test.split1.bundle
    │   ├── test.split2.bundle
    │   ├── test.split3.bundle
    │   └── ...
    └── mapping.txt
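
In this layout, mapping.txt maps class indices to action names, each groundTruth file lists one action label per frame, and the bundle files list which videos belong to each split. A minimal reader sketch, assuming the usual MS-TCN conventions ("index action_name" lines in mapping.txt, one label per line in groundTruth files):

# Minimal reader for this MS-TCN-style layout, assuming the usual
# conventions: "index action_name" lines in mapping.txt and one action
# label per frame in each groundTruth file.
def load_action_map(path):
    id_of = {}
    with open(path) as f:
        for line in f:
            idx, name = line.strip().split()
            id_of[name] = int(idx)
    return id_of

def load_frame_labels(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

id_of = load_action_map("data/gtea/mapping.txt")
labels = load_frame_labels("data/gtea/groundTruth/S1_Cheese_C1.txt")
frame_ids = [id_of[l] for l in labels]  # per-frame class indices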

thumos14 and TVSeries

You can download the thumos14 dataset from its website and the TVSeries dataset from its website. Thumos14 is a temporal action localization dataset.

  • Dataset tree
─── thumos14
    ├── Videos
    │   ├── video_test_0000896.mp4
    │   ├── video_test_0000897.mp4
    │   ├── video_validation_0000482.mp4
    │   └── ...
    ├── groundTruth
    │   ├── video_test_0000896.txt
    │   ├── video_test_0000897.txt
    │   ├── video_validation_0000482.txt
    │   └── ...
    ├── val_list.txt
    ├── test_list.txt
    └── mapping.txt

Extract Optical Flow (Optional)

python tools/extract/extract_flow.py -c config/extract_flow/extract_optical_flow_fastflownet.yaml -o data/gtea
python tools/extract/extract_flow.py -c config/extract_flow/extract_optical_flow_raft.yaml -o data/gtea
python tools/extract/extract_flow.py -c config/extract_flow/extract_optical_flow_liteflownetv3.yaml -o data/gtea
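
If you need flow for several datasets, the same extractor can be scripted; -c picks the optical-flow model config and -o the dataset directory, exactly as in the commands above (the dataset list here is just an example):

import subprocess

# Run the FastFlowNet extractor over several dataset directories in one
# go; swap the config path for the raft/liteflownetv3 variants as needed.
for dataset in ["data/gtea", "data/50salads", "data/egtea"]:
    subprocess.run([
        "python", "tools/extract/extract_flow.py",
        "-c", "config/extract_flow/extract_optical_flow_fastflownet.yaml",
        "-o", dataset,
    ], check=True)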

Extract Feature (Optional)

python tools/extract/extract_features.py -c config/extract_feature/extract_feature_i3d_thumos14.yaml -o data/thumos14
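
After extraction it is worth sanity-checking the output; a minimal sketch assuming the extractor writes one .npy feature file per video (the output path below is a guess, adjust it to wherever your config writes features):

import numpy as np

# Quick sanity check after extraction; the path is an assumption, and
# MS-TCN-style I3D releases are (feature_dim, num_frames) arrays.
feat = np.load("data/thumos14/features/video_test_0000896.npy")
print(feat.shape)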

Dataset Normalization

# compute mean and std from videos
# gtea
python tools/dataset_transform/transform_segmentation_label.py data/gtea data/gtea/groundTruth data/gtea --mode localization --fps 15
python tools/dataset_transform/prepare_video_recognition_data.py data/gtea/label.json data/gtea/Videos data/gtea --negative_sample_num 100 --only_norm True --fps 15 --dataset_type gtea_rgb
python tools/dataset_transform/prepare_video_recognition_data.py data/gtea/label.json data/gtea/flow data/gtea --negative_sample_num 100 --only_norm True --fps 15 --dataset_type gtea_flow

# egtea
python tools/dataset_transform/prepare_video_recognition_data.py data/egtea/egtea.json data/egtea/Videos data/egtea --negative_sample_num 1000 --only_norm True --fps 24 --dataset_type egtea_rgb

# 50salads
python tools/dataset_transform/transform_segmentation_label.py data/50salads data/50salads/groundTruth data/50salads --mode localization --fps 30
python tools/dataset_transform/prepare_video_recognition_data.py data/50salads/label.json data/50salads/Videos data/50salads --negative_sample_num 1000 --only_norm True --fps 30 --dataset_type 50salads_rgb

# thumos14
python tools/dataset_transform/transform_segmentation_label.py data/thumos14/gt.json data/thumos14/Videos data/thumos14 --fps 30
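
For reference, the statistics the normalization pass produces amount to per-channel pixel means and standard deviations over the video frames. A rough illustrative sketch of that computation (not the repo's exact implementation):

import cv2
import numpy as np

# Accumulate per-channel pixel mean/std over sampled frames of a video.
def video_channel_stats(video_path, sample_stride=10):
    cap = cv2.VideoCapture(video_path)
    sums = np.zeros(3)
    sq_sums = np.zeros(3)
    count = 0
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_stride == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).astype(np.float64)
            pixels = rgb.reshape(-1, 3)
            sums += pixels.sum(axis=0)
            sq_sums += (pixels ** 2).sum(axis=0)
            count += pixels.shape[0]
        idx += 1
    cap.release()
    mean = sums / count
    std = np.sqrt(sq_sums / count - mean ** 2)
    return mean, std  # on the 0-255 RGB scale, as in the tables below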

The released dataset mean and std values:

  • gtea:
# rgb
mean RGB: [140.39158961711036, 108.18022223151027, 45.72351736766547]
std RGB: [33.94421369129452, 35.93603536756186, 31.508484434367805]
# flows
mean RGB: [0.9686297051020777, 0.9706158002294017, 0.972493270804535] * 255
std RGB: [0.039060756165796726, 0.03689212641350189, 0.03209093941013171] * 255
  • egtea:
mean RGB: [0.47882690412518875, 0.30667687330914223, 0.1764174579795214] * 255
std RGB: [0.26380785444954574, 0.20396220265286277, 0.16305419562005563] * 255
  • 50salads:
mean RGB: [0.5139909998345553, 0.5117725498677757, 0.4798814301515671] * 255
std RGB: [0.23608918491478523, 0.23385714300069754, 0.23755006337414028] * 255
  • breakfast:
mean RGB: [0.4245283568405083, 0.3904851168609079, 0.33709139617292494] * 255
std RGB: [0.26207845745959846, 0.26008439810422, 0.24623600365905168] * 255
  • thumos14:
mean RGB: [0.384953972862144, 0.38326867429930167, 0.3525199505706894] * 255
std RGB: [0.258450710004705, 0.2544892750057763, 0.24812118173426492] * 255
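
These values are meant to be used as per-channel normalization constants. For example, with the gtea RGB statistics (already on the 0-255 scale; entries marked "* 255" above must first be multiplied by 255):

import numpy as np

# Normalize a frame with the released gtea RGB statistics.
mean = np.array([140.39158961711036, 108.18022223151027, 45.72351736766547])
std = np.array([33.94421369129452, 35.93603536756186, 31.508484434367805])

def normalize(frame_rgb):
    # frame_rgb: HxWx3 uint8 RGB frame
    return (frame_rgb.astype(np.float32) - mean) / std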

Convert Localization Label to Segmentation Label

# egtea
python tools/dataset_transform/transform_egtea_label.py data/egtea/splits_label data/egtea/verb_idx.txt data/egtea
python tools/dataset_transform/transform_segmentation_label.py data/egtea/egtea.json data/egtea/Videos data/egtea --mode segmentation --fps 24
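
Conceptually, this conversion expands (start, end, action) segments into one label per frame at the dataset fps. A toy sketch of that idea (field names and the background label are assumptions for the example):

# Expand (start_sec, end_sec, action) segments into per-frame labels.
def segments_to_frames(segments, num_frames, fps, background="background"):
    labels = [background] * num_frames
    for start_sec, end_sec, action in segments:
        start = int(start_sec * fps)
        end = min(int(end_sec * fps), num_frames)
        for t in range(start, end):
            labels[t] = action
    return labels

# segments_to_frames([(0.0, 1.0, "take_cup")], num_frames=48, fps=24)
# -> 24 "take_cup" frames followed by 24 background frames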

For EGTEA, we manually split the test set.

split1 test

OP01-R01-PastaSalad.mp4
OP01-R02-TurkeySandwich.mp4
OP01-R03-BaconAndEggs.mp4
OP01-R04-ContinentalBreakfast.mp4
OP01-R05-Cheeseburger.mp4
OP01-R06-GreekSalad.mp4
OP01-R07-Pizza.mp4
OP02-R01-PastaSalad.mp4
OP02-R02-TurkeySandwich.mp4
OP02-R03-BaconAndEggs.mp4
OP02-R04-ContinentalBreakfast.mp4
OP02-R05-Cheeseburger.mp4
OP02-R06-GreekSalad.mp4
OP02-R07-Pizza.mp4
P01-R01-PastaSalad.mp4
P01-R02-TurkeySandwich.mp4
P02-R01-PastaSalad.mp4
P02-R03-BaconAndEggs.mp4
P02-R04-ContinentalBreakfast.mp4
P02-R05-Cheeseburger.mp4
P02-R06-GreekSalad.mp4
P03-R01-PastaSalad.mp4
P04-R01-PastaSalad.mp4
P04-R05-Cheeseburger.mp4
P04-R06-GreekSalad.mp4
P05-R01-PastaSalad.mp4
P05-R02-TurkeySandwich.mp4
P06-R01-PastaSalad.mp4
P06-R02-TurkeySandwich.mp4
P07-R01-PastaSalad.mp4

split2 test

OP03-R01-PastaSalad.mp4
OP03-R02-TurkeySandwich.mp4
OP03-R03-BaconAndEggs.mp4
OP03-R04-ContinentalBreakfast.mp4
OP03-R05-Cheeseburger.mp4
OP03-R06-GreekSalad.mp4
OP03-R07-Pizza.mp4
OP04-R01-PastaSalad.mp4
OP04-R02-TurkeySandwich.mp4
OP04-R03-BaconAndEggs.mp4
OP04-R04-ContinentalBreakfast.mp4
OP04-R05-Cheeseburger.mp4
OP04-R06-GreekSalad.mp4
OP04-R07-Pizza.mp4
P08-R01-PastaSalad.mp4
P09-R01-PastaSalad.mp4
P09-R02-TurkeySandwich.mp4
P10-R01-PastaSalad.mp4
P10-R02-TurkeySandwich.mp4
P10-R05-Cheeseburger.mp4
P10-R06-GreekSalad.mp4
P11-R01-PastaSalad.mp4
P11-R02-TurkeySandwich.mp4
P12-R01-PastaSalad.mp4
P12-R02-TurkeySandwich.mp4
P13-R01-PastaSalad.mp4
P14-R01-PastaSalad.mp4
P14-R02-TurkeySandwich.mp4

split3 test

OP05-R03-BaconAndEggs.mp4
OP05-R04-ContinentalBreakfast.mp4
OP05-R07-Pizza.mp4
OP06-R02-TurkeySandwich.mp4
OP06-R03-BaconAndEggs.mp4
OP06-R04-ContinentalBreakfast.mp4
OP06-R05-Cheeseburger.mp4
OP06-R06-GreekSalad.mp4
OP06-R07-Pizza.mp4
P15-R01-PastaSalad.mp4
P16-R03-BaconAndEggs.mp4
P17-R03-BaconAndEggs.mp4
P17-R04-ContinentalBreakfast.mp4
P18-R03-BaconAndEggs.mp4
P18-R04-ContinentalBreakfast.mp4
P19-R03-BaconAndEggs.mp4
P19-R04-ContinentalBreakfast.mp4
P20-R03-BaconAndEggs.mp4
P20-R04-ContinentalBreakfast.mp4
P21-R03-BaconAndEggs.mp4
P21-R04-ContinentalBreakfast.mp4
P21-R05-Cheeseburger.mp4
P21-R06-GreekSalad.mp4
P22-R03-BaconAndEggs.mp4
P23-R03-BaconAndEggs.mp4
P24-R03-BaconAndEggs.mp4
P25-R06-GreekSalad.mp4
P26-R05-Cheeseburger.mp4
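
To materialize these manual splits, you can write them into bundle files; a sketch assuming the MS-TCN convention that bundles list groundTruth .txt filenames, one per line (check the existing gtea bundles for the exact format):

# Write the split1 test list above into an MS-TCN-style bundle file.
split1_test = [
    "OP01-R01-PastaSalad.mp4",
    "OP01-R02-TurkeySandwich.mp4",
    # ... remaining videos from the split1 list above ...
]

with open("data/egtea/splits/test.split1.bundle", "w") as f:
    for name in split1_test:
        f.write(name.replace(".mp4", ".txt") + "\n")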

Customized Dataset

The easiest way to create a customized dataset is to reuse an existing dataset class: align your data with the format that class expects, then change the file paths in the config to point at it.

For example:

  • Dataset tree example
─── your_dataset
    ├── Videos
    │   ├── S1_Cheese_C1.mp4
    │   ├── S1_Coffee_C1.mp4
    │   ├── S1_CofHoney_C1.mp4
    │   └── ...
    ├── groundTruth
    │   ├── S1_Cheese_C1.txt
    │   ├── S1_Coffee_C1.txt
    │   ├── S1_CofHoney_C1.txt
    │   └── ...
    ├── splits
    │   ├── test.split1.bundle
    │   ├── test.split2.bundle
    │   ├── test.split3.bundle
    │   └── ...
    ├── file_list.txt
    └── mapping.txt
  • Config
dict(
    name = "FeatureStreamSegmentationDataset",
    data_prefix = "./",
    file_path = "./path/to/your_dataset/file_list.txt",
    feature_path = "./path/to/your_dataset/feature_files.txt",
    gt_path = "./path/to/your_dataset/ground_truth.txt",
    actions_map_file_path = "./path/to/your_dataset/mapping.txt",
    dataset_type = "50salads",
    train_mode = True,
    sliding_window = sliding_window
)
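
Note that sliding_window on the last line refers to a variable defined elsewhere in the same config file, and dataset_type appears to select which existing dataset convention the loader follows; set both to match the format you aligned your data with.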

Of course, you can also build a new dataset class to fully customize the data processing you need.

Follow these steps:

  • Step 1: The new dataset class must inherit from the BaseDataset class
  • Step 2: Modify the DATASETPIPLINE of the model configuration file to match the required data processing flow
  • Step 3: After inheriting from the BaseDataset class, register your dataset class
  • Step 4: Modify the config file to use it

Details of Step 3

  • Step 3.1
from svtas.utils import AbstractBuildFactory

@AbstractBuildFactory.register('dataset')
class CustomizedDataset(BaseDataset):
    ...

# or
@AbstractBuildFactory.register('dataset')
class CustomizedDataset(ItemDataset):
    ...

# or
@AbstractBuildFactory.register('dataset')
class CustomizedDataset(StreamDataset):
    ...
  • Step 3.2: add the import to svtas/loader/dataset/__init__.py
from .your_customized_dataset import CustomizedDataset

__all__ = [
    "CustomizedDataset"
]
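
Step 4 then amounts to referencing the new class by its registered name in the config; a sketch reusing the placeholder paths from the earlier example (keep or drop fields to match what your class actually reads):

dict(
    name = "CustomizedDataset",  # the name registered in Step 3.1
    data_prefix = "./",
    file_path = "./path/to/your_dataset/file_list.txt",
    actions_map_file_path = "./path/to/your_dataset/mapping.txt",
    train_mode = True
)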