YOLOv5 with Oriented Bounding Box


Oriented Bounding Box annotations were explained in a previous article. This article explains how to train YOLOv5 and run inference using the Oriented Bounding Box annotation data generated there.

The original YOLOv5 cannot handle Oriented Bounding Boxes (OBB). Instead, use YOLOv5 for Oriented Object Detection (yolov5_obb), which extends YOLOv5 with Oriented Bounding Box support.

Conversion of annotation data

The Oriented Bounding Box annotation format was explained in a previous article. The <robndbox>...</robndbox> element contains <cx>, <cy>, <w>, <h>, and <angle>, which are the x coordinate of the center of the Oriented Bounding Box, the y coordinate of the center, the width, the height, and the angle, respectively.

According to https://github.com/hukaixuan19970627/yolov5_obb/blob/master/docs/GetStart.md, an Oriented Bounding Box in yolov5_obb is represented by (x1, y1, x2, y2, x3, y3, x4, y4), where (xi, yi) is the i-th vertex of the Oriented Bounding Box, and the vertices appear to be ordered clockwise. The coordinates are followed by the class name and the difficulty; difficulty can be set to 0. In yolov5_obb, the coordinates, name, and difficulty are separated by spaces as follows:

x1 y1 x2 y2 x3 y3 x4 y4 name 0
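For reference, converting (cx, cy, w, h, angle) to the four corner points is a standard rotated-rectangle calculation. The following is a minimal sketch, assuming the angle is in radians; the actual conversion script introduced below may use a different angle convention or vertex order.

import math

def obb_to_points(cx, cy, w, h, angle):
    # Half-extents along the box's local axes.
    dx, dy = w / 2, h / 2
    # Corners in local coordinates, clockwise starting from the top-left.
    corners = [(-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy)]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Rotate each corner by the angle and translate it to the box center.
    return [(cx + x * cos_a - y * sin_a, cy + x * sin_a + y * cos_a)
            for x, y in corners]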

Download the script for converting the XML annotation data to yolov5_obb annotation data from here and convert the annotation data. The XML annotation data converted below is the annotation data created in a previous article.

git clone https://github.com/otamajakusi/yolov5-utils.git
python3 yolov5-utils/voc2yolo5_obb.py --path data/car-parking.xml --class-file data/classes.txt

You can see the converted annotation data in data/car-parking.txt:

384.90906545146345 769.5000055959972 274.67376491516154 714.9137141146996 301.27973454853657 661.1837944040028 411.5150350848385 715.7700858853004 car 0
416.90906545146345 713.5000055959972 306.67376491516154 658.9137141146996 333.27973454853657 605.1837944040028 443.5150350848385 659.7700858853004 car 0
443.90906545146345 656.5000055959972 333.67376491516154 601.9137141146996 360.27973454853657 548.1837944040028 470.5150350848385 602.7700858853004 car 0
484.90906545146345 601.5000055959972 374.67376491516154 546.9137141146996 401.27973454853657 493.1837944040028 511.5150350848385 547.7700858853004 car 0
...

To be sure that the annotation data has been created correctly, the following script can be used.

python3 yolov5-utils/show_obb.py --image data/car-parking.jpg --anno data/car-parking.txt

This script can also be used for XML annotation data as well as yolov5_obb annotation data.
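If you just want a quick look without the script, the yolov5_obb format is easy to draw directly, since each line already contains the four vertices. A minimal sketch using OpenCV, with the file names from the example above:

import cv2
import numpy as np

img = cv2.imread("data/car-parking.jpg")
with open("data/car-parking.txt") as f:
    for line in f:
        fields = line.split()
        # The first eight fields are the four (x, y) vertices.
        pts = np.array([float(v) for v in fields[:8]], dtype=np.int32).reshape(4, 2)
        cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
cv2.imwrite("car-parking-obb.jpg", img)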

Splitting training and test data

You can also use your own GPU for training, but for ease of setup, Google Colaboratory is used here.
Access Google Colaboratory and select New notebook.

To use GPU instances, set the hardware accelerator.

Select Runtime > Change runtime type

Select Hardware accelerator > GPU

Next, before starting the training, connect Colaboratory to Google Drive on your Google account and save the data there so that it is kept between sessions.

Click the folder icon, then click Mount Drive.

Click Connect to Google Drive

Change the directory to the Google Drive directory.

%cd drive/MyDrive

In Colaboratory, enter the command to be executed in a code cell and press the play button to run it. The leading % and ! at the beginning of a command must also be entered.

Clone yolov5_obb.

!git clone https://github.com/hukaixuan19970627/yolov5_obb
%cd yolov5_obb

Now, in order to train, a large amount of training data (images and their annotation data) is required. For the purposes of this exercise, we will use the analog-meter training data prepared here.

!gdown 1XwnMKQKSyyvMTfcHNDUu7sqnJpzvgChY
!unzip analog-meter.zip

The images and XML annotation data are located in a directory named 1, and the class file is classes.txt. Download the script to convert the annotation data and convert it. If you specify a directory with the --path option, the script will convert all *.xml files under that directory.

!git clone https://github.com/otamajakusi/yolov5-utils
!python3 yolov5-utils/voc2yolo5_obb.py --path 1 --class-file classes.txt

Next, the data is split into training and test data. yolov5_obb requires the following directory structure:

├── images
│   ├── train
│   │   ├── 0123.jpg
│   │   ├── ...
│   │   └── 4567.jpg
│   └── valid
│       ├── 89ab.jpg
│       ├── ...
│       └── cdef.jpg
└── labelTxt
    ├── train
    │   ├── 0123.txt
    │   ├── ...
    │   └── 4567.txt
    └── valid
        ├── 89ab.txt
        ├── ...
        └── cdef.txt

yolov5 requires a labels directory, whereas yolov5_obb requires a labelTxt directory.

Split the data with a data splitting script. Since the data splitting script puts the data into images and labels directories, a symbolic link named labelTxt pointing to the labels directory is created to support yolov5_obb.

!python3 yolov5-utils/data_split.py --datapath 1 --outpath analog-meter
!(cd analog-meter; ln -s labels labelTxt)
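For reference, the split itself is straightforward. The following is a minimal sketch of what such a step does, assuming the images (*.jpg) and label files (*.txt) share base names in the input directory; the actual data_split.py may use a different ratio or strategy.

import random
import shutil
from pathlib import Path

random.seed(0)  # reproducible split
src = Path("1")
out = Path("analog-meter")
images = sorted(src.glob("*.jpg"))
random.shuffle(images)
n_train = int(len(images) * 0.8)  # assumed 80/20 train/valid ratio
for i, img in enumerate(images):
    split = "train" if i < n_train else "valid"
    (out / "images" / split).mkdir(parents=True, exist_ok=True)
    (out / "labelTxt" / split).mkdir(parents=True, exist_ok=True)
    shutil.copy(img, out / "images" / split / img.name)
    label = img.with_suffix(".txt")
    if label.exists():
        shutil.copy(label, out / "labelTxt" / split / label.name)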

Since yolov5_obb does not support torch version 1.12, first install a torch version that yolov5_obb supports, and then install the packages from yolov5_obb's requirements.txt with pip.

!pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
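After installation, it is worth a quick check that the pinned build is active and that the GPU is visible:

import torch

print(torch.__version__)          # expected: 1.10.1+cu113
print(torch.cuda.is_available())  # should be True on a GPU runtime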

Build libraries and install packages.

!(cd utils/nms_rotated; python3 setup.py develop)
!python3 -m pip install -r requirements.txt

Now we will create the configuration data for the data to be trained: two classes, needle and scale.

%%bash
cat << EOS > data/analog-meter.yaml

train: analog-meter/images/train/    # train images (relative to 'path') 
val: analog-meter/images/valid/   # val images (relative to 'path') 

# Classes
nc: 2  # number of classes
names: [
  'needle',
  'scale',
]
EOS
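Note that nc must equal the number of entries in names, and the names list should contain every class name that appears in the label files. A small check, assuming classes.txt holds one class name per line:

# List the classes from classes.txt for comparison with the YAML names list.
with open("classes.txt") as f:
    names = [line.strip() for line in f if line.strip()]
print("nc:", len(names))
print("names:", names)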

Training

Start training.

!python3 train.py --data data/analog-meter.yaml --cfg models/yolov5m.yaml --batch-size 4 --device 0 --epochs 10000

In this environment, training seems to terminate after about 500 epochs.

Show TensorBoard.

%load_ext tensorboard
%tensorboard --logdir runs/train/exp

When training on your own PC, you can watch the graphs in real time, but on Colaboratory they can only be viewed after the training is finished.

Inference

Run inference on the test data.

!python detect.py --weights runs/train/exp/weights/best.pt --source analog-meter/images/valid --device 0 --hide-conf --hide-labels

Take a look at the image that was created.

import glob
from IPython.display import Image, display_jpeg

# Display every detection result image saved by detect.py.
imgs = glob.glob("runs/detect/exp/*")
for img in imgs:
  display_jpeg(Image(img))

The results are not very good, but they may improve with more training data.

That’s all.
