Please prepare a large set of images captured with your smartphone or camera. For testing purposes, I have prepared images of rock, paper, and scissors. Don't forget to convert them to png or jpg format.
And this time, we will be using the Easy-Peasy Ultimate AI Tool that I created.
First, clone the repository:
git clone https://github.com/TakanoTaiga/ml.git
When you clone it, you will get a repository containing the scripts and folders used below.
Then, put all the prepared images into the input_image folder. (No preprocessing is required at this stage.)
Next, open a terminal, resize the images, convert them all to jpg, and rename them using the following script:
cd ./ml
python3 set_format.py
You will see that an out_image folder has been generated. Check inside to make sure all the images are there.
We will use the coco-annotator for annotation.
https://github.com/jsbroks/coco-annotator
It's easy to run with docker compose:
git clone https://github.com/jsbroks/coco-annotator.git
cd coco-annotator
docker compose up
Now, access localhost:5000. You can stop it with Ctrl-C, but afterward, start and stop it using the following commands:
cd coco-annotator
docker compose start
cd coco-annotator
docker compose stop
Once it's up, create a dataset in coco-annotator and copy the contents of the out_image folder prepared earlier into it. (Do not delete or crop any images in the out_image folder.)
After annotating, download the JSON file of the annotation results and copy it to the ml/input_label folder.
After completing the data preparation, it's time for training. To train, you need to convert the annotations from COCO format to YOLO format. I've prepared a script for that, so let's run it:
cd ml
python3 coco2yolo.py
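Under the hood, the core of a COCO-to-YOLO conversion is a bounding-box coordinate change: COCO stores boxes as [x_min, y_min, width, height] in pixels, while YOLO-format labels use [x_center, y_center, width, height] normalized to the image size. A minimal sketch of that transform (the function name is mine, not taken from coco2yolo.py):

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert a COCO [x_min, y_min, w, h] pixel box to a
    YOLO [x_center, y_center, w, h] box normalized to [0, 1]."""
    x_min, y_min, w, h = bbox
    # Shift from top-left corner to box center, then normalize
    x_c = (x_min + w / 2) / img_w
    y_c = (y_min + h / 2) / img_h
    return [x_c, y_c, w / img_w, h / img_h]
```

For example, a 100x200 box at (50, 100) in a 640x480 image becomes roughly [0.156, 0.417, 0.156, 0.417]. The actual script also has to map COCO category IDs to zero-based class indices and write one .txt label file per image.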
You will see many folders generated. Once the train_rtdetr_xxxxx.py files appear, setup is complete.
Now, let's start the training.
Start the container and then execute the generated python file:
./start.sh
After it starts, run the generated file (replace xxxxx appropriately):
python3 train_rtdetr_xxxxx.py
Once training is complete, the weight files and log data will be generated in ml/run/trainNN. This completes the training.
You should find .pt files in the train/weight folder. Place either the best or the last checkpoint in a suitable folder. Then save the following code to a Python file and run it; it will automatically read from the camera and start inference. Replace hogehoge.pt with the name of your weight file:
import cv2
from ultralytics import RTDETR

# Load the trained RT-DETR weights
model = RTDETR('hogehoge.pt')

# Open the web camera stream
cap = cv2.VideoCapture(0)

while cap.isOpened():
    success, frame = cap.read()

    # Exit the loop on any key press
    k = cv2.waitKey(1)
    if k != -1:
        break

    if success:
        # Run inference and draw the detections on the frame
        results = model.predict(frame, conf=0.7, half=True)
        annotated_frame = results[0].plot()
        cv2.imshow("RT-DETR Inference", annotated_frame)

cap.release()
cv2.destroyAllWindows()