
Accuracy benchmark and inference speed #4

Open
anuar12 opened this issue Sep 5, 2018 · 4 comments

Comments

anuar12 commented Sep 5, 2018

Hi,

Great code, thanks a lot!

I had 2 questions:

  1. Accuracy
    Just wanted to know whether the code actually works and whether tracking does improve detection.

  2. Speed of tracking
    I am looking into using MultiObjDetTracker in my project, which requires real-time processing. Would you know how fast the inference time of the tracking part (excluding detection) of MultiObjDetTracker is on a GPU (a 1050 or 1080)? I would imagine it would be <20% of the detector's time since it has only 2 layers, even though the LSTM layer might take more time. I am aiming for ~10 fps.

Thanks!

ktzsh (Owner) commented Sep 6, 2018

Hi

Yes, the code actually works :) But tracking in general is only as good as your detection priors. The main purpose of my research was to build a multiple-object tracker that could exploit temporal dependencies to track occluded objects. Training both tasks together was to test the hypothesis that certain features might be better suited for tracking. For general tracking I'd say tune your detector on the dataset first and freeze it before training the tracker part.
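That freeze-then-train recipe can be sketched roughly as below. The `detector_` name prefix and the stand-in layer objects are assumptions for illustration, not what MultiObjDetTracker actually uses; with a real Keras model you would pass `model.layers` and match the repo's real layer names.

```python
from types import SimpleNamespace

def freeze_layers(layers, prefix="detector_"):
    """Set trainable=False on every layer whose name starts with `prefix`."""
    frozen = []
    for layer in layers:
        if layer.name.startswith(prefix):
            layer.trainable = False  # Keras layers expose this flag
            frozen.append(layer.name)
    return frozen

# Stand-in layer objects just to show the effect:
layers = [SimpleNamespace(name="detector_conv1", trainable=True),
          SimpleNamespace(name="tracker_lstm", trainable=True)]
print(freeze_layers(layers))  # ['detector_conv1']
```

Remember to re-compile the model after flipping `trainable` flags, since Keras bakes them in at compile time.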

So, my dataset was focused on occlusion, and there were a lot of things I also wanted to try, but hardware is a limitation for me right now. Maybe we could collaborate on this one, if that's feasible. Training, as I have seen, is also very sensitive to the ratio of the tracking and detection losses, which you would need to experiment with yourself. You could also try bidirectional LSTMs.

As far as speed goes, I haven't done any rigorous analysis, but I imagine 10 fps should be easily achievable with a 1080, since it is basically regression and does not use any RPNs, which are slow. Additionally, increasing the temporal stride will give you better tracker speed: compare 3 forward passes covering 4 frames with 2 forward passes covering 6 frames. And if you parallelize the pre- and post-processing/decoding of frames, as YOLO's original C implementation does, it should be quite easy to reach 10 fps.
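The temporal-stride point can be sanity-checked with a back-of-the-envelope helper; the millisecond figures below are made-up placeholders, not measurements of this repo.

```python
def effective_fps(detector_ms, tracker_ms, stride):
    """Throughput when one tracker forward pass is amortized over `stride` frames."""
    per_frame_ms = detector_ms + tracker_ms / stride
    return 1000.0 / per_frame_ms

# With a hypothetical 60 ms detector pass and 40 ms tracker pass:
print(effective_fps(60, 40, 1))  # 10.0 fps
print(effective_fps(60, 40, 2))  # 12.5 fps
```

The detector cost stays per-frame, so past a certain stride the detector dominates and further striding buys little.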

Regards
Kshitiz

anuar12 (Author) commented Sep 6, 2018

Great, thanks for a thorough reply!

Yeah, I agree that starting with a pre-trained detector and fine-tuning only the tracker makes sense. I am not 100% sure from the code yet, but it looks like the part between the detector and the tracker is differentiable, so turning learning back on for the detector after fine-tuning the tracker would also help, allowing the whole system to be trained end-to-end.

Yep, I see your task. Occlusion is difficult.
I also used SORT, which uses a Kalman filter (https://github.com/abewley/sort); it's very simple and fast, and its accuracy is pretty decent, but I'm looking into something that can work even better. The official ROLO code was pretty darn bad; it was very hard to believe they got the results in the paper.
My problem has lots of false-negative detections; the images can be bad quality, with bad lighting, dust everywhere, and general noise.
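For context on the motion-model idea behind SORT, here is a toy 1-D alpha-beta filter, a simplified cousin of the Kalman filter. SORT itself tracks full bounding boxes with a proper constant-velocity Kalman filter; this sketch tracks a single coordinate, and the gain values are made-up.

```python
class AlphaBeta:
    """Track one coordinate with a constant-velocity prediction plus correction."""
    def __init__(self, x0, alpha=0.85, beta=0.005):
        self.x, self.v = float(x0), 0.0
        self.alpha, self.beta = alpha, beta

    def step(self, z):
        xp = self.x + self.v          # predict (constant-velocity motion model)
        r = z - xp                    # innovation / residual
        self.x = xp + self.alpha * r  # corrected position
        self.v += self.beta * r       # corrected velocity
        return self.x

f = AlphaBeta(0.0)
for _ in range(100):
    f.step(10.0)  # noiseless stationary measurement
# f.x converges towards 10.0
```

The prediction step is what lets such a filter coast through a few frames of missed (false-negative) detections before the track is dropped.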

Also, in terms of the detector, I am using YOLOv3, which has 3 output heads (I just have small objects, and v3 predicts at 3 different layers). If you want to improve your overall result, the detector does all the hard work, so improving the detector is very important.

anuar12 (Author) commented Sep 6, 2018

I will let you know how my development goes; if anything, I could submit a PR.

anuar12 (Author) commented Sep 11, 2018

So I first wanted to train a detector with KerasYOLO.py. I think I changed all the right things (anchors, labels, etc.), but I get nonsensical low-confidence predictions after training, even though the loss steadily decreases. Also, looking at the individual loss terms, the WH term seems to be several orders of magnitude larger than the rest; I tried scaling the OBJ term, but I had to use extremely large values, which doesn't really make sense.

Would you know what the problem is? I've used the code from keras-yolo2, but I'm not sure what changes you have made.
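(For reference, the two usual ways to tame a dominant WH term are regressing square roots of width/height, as in the original YOLO paper, and applying explicit per-term weights. A hedged sketch below; the weight values are placeholders to tune, not keras-yolo2's actual defaults.)

```python
import math

def wh_loss_sqrt(pred_wh, true_wh):
    """Sum of squared errors on square-rooted widths/heights, which shrinks
    the penalty gap between large and small boxes."""
    return sum((math.sqrt(p) - math.sqrt(t)) ** 2 for p, t in zip(pred_wh, true_wh))

def total_loss(xy, wh, obj, cls, w_xy=1.0, w_wh=1.0, w_obj=5.0, w_cls=1.0):
    """Weighted sum of the individual loss terms; the scales are knobs to tune."""
    return w_xy * xy + w_wh * wh + w_obj * obj + w_cls * cls

print(wh_loss_sqrt([4.0], [9.0]))  # (2 - 3)**2 = 1.0
```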
