Training YOWO on a customized dataset #98

Open
tanthinhdt opened this issue Apr 11, 2024 · 2 comments

Comments

@tanthinhdt

Hi, I have a dataset for action recognition. I organized it following the UCF dataset's format and tried training YOWO on it with the UCF settings, but I keep getting the error below. Can anyone help me?

The error

[screenshot of the error message]

Configurations

TRAIN:
  DATASET: vsl
  BATCH_SIZE: 1
  TOTAL_BATCH_SIZE: 128
  LEARNING_RATE: 1e-4
  EVALUATE: False
  FINE_TUNE: False
  BEGIN_EPOCH: 1
  END_EPOCH: 10
SOLVER:
  MOMENTUM: 0.9
  WEIGHT_DECAY: 5e-4
  STEPS: [2, 3, 4, 5]
  LR_DECAY_RATE: 0.5
  ANCHORS:
    [
      0.70458,
      1.18803,
      1.26654,
      2.55121,
      1.59382,
      4.08321,
      2.30548,
      4.94180,
      3.52332,
      5.91979,
    ]
  NUM_ANCHORS: 5
  OBJECT_SCALE: 5
  NOOBJECT_SCALE: 1
  CLASS_SCALE: 1
  COORD_SCALE: 1
DATA:
  NUM_FRAMES: 16
  SAMPLING_RATE: 1
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 224
  MEAN: [0.4345, 0.4051, 0.3775]
  STD: [0.2768, 0.2713, 0.2737]
MODEL:
  NUM_CLASSES: 98
  BACKBONE_3D: resnext101
  BACKBONE_2D: darknet
WEIGHTS:
  BACKBONE_3D: "weights/resnext-101-kinetics.pth"
  BACKBONE_2D: "weights/yolo.weights"
  FREEZE_BACKBONE_3D: False
  FREEZE_BACKBONE_2D: False
BACKUP_DIR: "backup/vsl"
RNG_SEED: 1
LISTDATA:
  BASE_PTH: "data/vsl/yowo_vsl"
  TRAIN_FILE: "data/vsl/yowo_vsl/trainlist.txt"
  TEST_FILE: "data/vsl/yowo_vsl/testlist.txt"
  TEST_VIDEO_FILE: "data/vsl/yowo_vsl/testlist.txt"
  MAX_OBJS: 1
  CLASS_NAMES: [
    "Con chó",
    "Con mèo",
    "Con gà",
    "Con vịt",
    "Con rùa",
    "Con thỏ",
    "Con trâu",
    "Con bò",
    "Con dê",
    "Con heo",
    "Màu đen",
    "Màu trắng",
    "Màu đỏ",
    "Màu cam",
    "Màu vàng",
    "Màu hồng",
    "Màu tím",
    "Màu nâu",
    "Quả dâu",
    "Quả mận",
    "Quả dứa",
    "Quả đào",
    "Quả đu đủ",
    "Quả cam",
    "Quả bơ",
    "Quả chuối",
    "Quả xoài",
    "Quả dừa",
    "Bố",
    "Mẹ",
    "Con trai",
    "Con gái",
    "Vợ",
    "Chồng",
    "Ông nội",
    "Bà nội",
    "Ông ngoại",
    "Bà ngoại",
    "Ăn",
    "Uống",
    "Xem",
    "Thèm",
    "Mách",
    "Khóc",
    "Cười",
    "Học",
    "Dỗi",
    "Chết",
    "Đi",
    "Chạy",
    "Bận",
    "Hát",
    "Múa",
    "Nấu",
    "Nướng",
    "Nhầm lẫn",
    "Quan sát",
    "Cắm trại",
    "Cung cấp",
    "Bắt chước",
    "Bắt buộc",
    "Báo cáo",
    "Mua bán",
    "Không quen",
    "Không nên",
    "Không cần",
    "Không cho",
    "Không nghe lời",
    "Mặn",
    "Đắng",
    "Cay",
    "Ngọt",
    "Đậm",
    "Nhạt",
    "Ngon miệng",
    "Xấu",
    "Đẹp",
    "Chật",
    "Hẹp",
    "Rộng",
    "Dài",
    "Cao",
    "Lùn",
    "Ốm",
    "Mập",
    "Ngoan",
    "",
    "Khỏe",
    "Mệt",
    "Đau",
    "Giỏi",
    "Chăm chỉ",
    "Lười biếng",
    "Tốt bụng",
    "Thú vị",
    "Hài hước",
    "Dũng cảm",
    "Sáng tạo",
  ]
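
One thing that can be checked from the configuration alone: `NUM_CLASSES` is 98 and `CLASS_NAMES` has 98 entries, but the entry between "Ngoan" and "Khỏe" is an empty string. Whether that is related to the error is not known. The following is a minimal sketch of a sanity check, assuming the YAML has already been loaded into plain Python values; `check_class_config` is a hypothetical helper and not part of YOWO.

```python
def check_class_config(num_classes, class_names):
    """Return a list of human-readable problems found in the class settings."""
    problems = []

    # The number of names should match the declared class count.
    if len(class_names) != num_classes:
        problems.append(
            f"NUM_CLASSES is {num_classes} but CLASS_NAMES has {len(class_names)} entries"
        )

    # Report every blank or whitespace-only name by position.
    for index, name in enumerate(class_names):
        if not name.strip():
            problems.append(f"CLASS_NAMES[{index}] is empty")

    return problems


# Shortened, illustrative list (not the full 98 names from the config).
example_names = ["Con chó", "Con mèo", "", "Khỏe"]
for problem in check_class_config(4, example_names):
    print(problem)
```

An empty list from the helper means the two settings agree and no name is blank.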
@ValuableCache

The source code lacks usage instructions. It's not worth putting more effort into it.

@lamnguyenvux

You should describe the structure of your dataset. This algorithm does spatio-temporal action localization: it both classifies and localizes actions in video. That means you also need bounding-box annotations for each video frame. Have you prepared those?

The annotation files are .txt files, in exactly the same format as YOLO annotations.
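
To check annotation files before training, something like the following could help. It is a rough sketch that assumes the common Darknet-style YOLO layout, one object per line as `class x_center y_center width height` with coordinates normalized to [0, 1]. Whether class ids start at 0 or 1, and whether this repository's loader expects normalized or pixel coordinates, are assumptions to verify against its data-loading code. `parse_yolo_line` is a hypothetical helper, not part of YOWO.

```python
def parse_yolo_line(line, num_classes):
    """Parse one 'class x_center y_center width height' line.

    Raises ValueError if the line does not match the assumed layout.
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")

    # int() itself raises ValueError for non-integer class ids such as '1.0'.
    class_id = int(fields[0])
    if not 0 <= class_id < num_classes:  # assumes 0-based class ids
        raise ValueError(f"class id {class_id} outside [0, {num_classes})")

    box = tuple(float(value) for value in fields[1:])
    if not all(0.0 <= value <= 1.0 for value in box):  # assumes normalized boxes
        raise ValueError(f"box values not normalized to [0, 1]: {box}")

    return class_id, box


# Example: one valid line and one malformed line.
print(parse_yolo_line("3 0.5 0.5 0.2 0.4", num_classes=98))
try:
    parse_yolo_line("3 0.5 0.5", num_classes=98)
except ValueError as error:
    print("invalid annotation:", error)
```

Running this over every label file listed in the train and test lists would show whether each frame has an annotation that parses under these assumptions.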
