- trimbag.py trims a bag file to the specified time range
- video_box.py takes a bag file and draws bounding prisms/labels on the detected cars, outputting a video for a qualitative check of the results
- box_csv.py produces a csv file containing the coordinates of the four corners of each bounding label.
- gen_darknet_label.py populates a directory in darknet format (images and labels)
Recommended Usage:
- Trim your large bag into a small portion (1-2 minutes) using trimbag.py
- Run video_box.py and visually check the accuracy/consistency of labeling
- Create the necessary directories to store images/labels in darknet format
- Run gen_darknet_label.py to populate the image/label directories
Label generation pipeline:
- Transform from the tracked mocap marker to the base_link of each car.
- Transform from base_link to the centroid of each observed car, and to the camera of the observing car.
- Transform from the centroid of each observed car to the corners of its bounding prism.
- Project 3D coordinates of corners onto the video frame.
- We take the min and max of x and y over the eight projected corners to produce the 2D labels/bounding boxes.
- [gen_darknet_label only] We save the labels in darknet format (centerX, centerY, width, height), normalized by the image dimensions.
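The last two steps of the pipeline can be sketched as a small function: project the eight prism corners (already expressed in the camera frame) through pinhole intrinsics, then reduce them to a normalized darknet label. The function name, intrinsics, and array shapes below are illustrative assumptions, not the repo's actual code.

```python
import numpy as np

def corners_to_darknet(corners_cam, K, img_w, img_h):
    # Hypothetical helper, not from the repo. corners_cam is (8, 3):
    # the prism corners in the camera frame, z > 0; K is the 3x3
    # pinhole intrinsics matrix.
    pix = (K @ corners_cam.T).T          # (8, 3) homogeneous pixel coords
    uv = pix[:, :2] / pix[:, 2:3]        # perspective divide -> (8, 2) pixels
    x_min, y_min = uv.min(axis=0)        # min/max over the 8 projected corners
    x_max, y_max = uv.max(axis=0)
    # Darknet format: box center and size, normalized by image dimensions.
    cx = (x_min + x_max) / 2.0 / img_w
    cy = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return cx, cy, w, h
```

A prism centered on the optical axis should come out at cx = cy = 0.5, which is a quick sanity check on the intrinsics.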
Transforms:
- Transforms are stored as 4x4 numpy arrays and represent the rotation and translation of a point with respect to some reference frame.
- A_T_B represents object A w.r.t. object B.
- To change reference frames, we can multiply by other transforms.
- B_T_C matmul A_T_B = A_T_C (the shared inner frame B cancels)
- To take the inverse, we use the inverse_transform method in video_box.py and gen_darknet_label.py. Taking the inverse of the numpy array will not perform the expected operation.
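For a rigid 4x4 transform [R t; 0 1], the inverse has the closed form [R.T, -R.T @ t; 0 1], which exploits the rotation block being orthonormal. A sketch of what an inverse_transform helper presumably computes; the repo's actual implementation may differ:

```python
import numpy as np

def inverse_transform(T):
    # Closed-form inverse of a rigid transform [R t; 0 1]:
    # [R.T, -R.T @ t; 0 1]. No general matrix inversion needed,
    # since R is orthonormal (R^-1 == R.T).
    # Sketch only -- the repo's method may differ in detail.
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti
```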
- Using car24 and car26 as an example:
  - mark24_T_cam26 = base26_T_cam26 * mark26_T_base26 * world_T_mark26 * mark24_T_world
  - center24_T_cam26 = mark24_T_cam26 * base24_T_mark24 * center24_T_base24
- mark26_T_world and mark24_T_world are given by the mocap data; world_T_mark26 is the inverse of mark26_T_world.
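The chain above can be sanity-checked numerically: pick arbitrary poses for each frame in the world, build the relative transforms, and confirm that the inner frames cancel. All poses below are made up purely for the check; only the composition pattern reflects the pipeline.

```python
import numpy as np

def make_T(theta, t):
    # Arbitrary rigid transform: rotation about z by theta, translation t.
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T

def inv_T(T):
    # Closed-form rigid inverse [R.T, -R.T @ t]; stands in for the
    # repo's inverse_transform method.
    Ti = np.eye(4)
    Ti[:3, :3] = T[:3, :3].T
    Ti[:3, 3] = -T[:3, :3].T @ T[:3, 3]
    return Ti

# Made-up poses of each frame w.r.t. the world (A_T_B = A in frame B).
mark24_T_world = make_T(0.2, [1.0, 0.0, 0.0])
mark26_T_world = make_T(0.5, [0.0, 2.0, 0.0])
base26_T_world = make_T(0.7, [0.0, 0.0, 3.0])
cam26_T_world  = make_T(1.1, [1.0, 1.0, 1.0])

# Relative transforms, built with the rule B_T_C @ A_T_B = A_T_C.
world_T_mark26  = inv_T(mark26_T_world)
mark26_T_base26 = inv_T(base26_T_world) @ mark26_T_world
base26_T_cam26  = inv_T(cam26_T_world) @ base26_T_world

# The chain from the text: the inner world/mark26/base26 frames cancel,
# leaving mark24 expressed in the cam26 frame.
mark24_T_cam26 = base26_T_cam26 @ mark26_T_base26 @ world_T_mark26 @ mark24_T_world
assert np.allclose(mark24_T_cam26, inv_T(cam26_T_world) @ mark24_T_world)
```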