How to Train YOLO Model to Detect Distracted Drivers

Sam Ansari
8 min read · Aug 1, 2022

Distracted driving is any activity that diverts a driver’s attention while driving a motor vehicle. This includes activities such as texting, talking on the phone, drinking, doing makeup and hair, fiddling with the stereo and radio systems, and talking to fellow passengers.

Distracted driving is one of the leading causes of death on US roads. According to the National Highway Traffic Safety Administration (NHTSA), distracted driving killed 3,142 people in the US in 2020 (source: https://www.nhtsa.gov/risky-driving/distracted-driving).

Source: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/813266

In this short article, we will explore a computer vision and machine learning technique that can detect driver distraction in real time. It is our hope that such a system, when implemented in practice, will help save thousands of lives.

I will illustrate how to train a YOLO model from a labeled set of images. You Only Look Once (YOLO) is a popular state-of-the-art object detection algorithm that can efficiently detect objects within an image. YOLO v5 is an open-source implementation of YOLO by Ultralytics. See the GitHub repository for more details: https://github.com/ultralytics/yolov5.

Dataset: For this article, we will use an already labeled image set that is publicly and freely available on Roboflow at https://universe.roboflow.com/ipylot-project/distracted-driving-v2wk5. You may have to create an account to access the labeled images.

The dataset contains about 7,000 images in the training set, 1,000 in the validation set, and 1,000 in the test set. There are 12 classes, labeled 0 through 11, mapped as follows:

0: Safe Driving

1: Texting

2: Talking on the phone

3: Operating the Radio

4: Drinking

5: Reaching Behind

6: Hair and Makeup

7: Talking to Passenger

8: Eyes Closed

9: Yawning

10: Nodding Off

11: Eyes Open
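If you plan to post-process detections in Python later, it can be handy to keep this mapping in code. Below is a minimal sketch of the same class-ID-to-name mapping; the variable name is just an illustration.

# Class-ID-to-name mapping of the 12 distraction classes listed above.
CLASS_NAMES = {
    0: "Safe Driving",
    1: "Texting",
    2: "Talking on the phone",
    3: "Operating the Radio",
    4: "Drinking",
    5: "Reaching Behind",
    6: "Hair and Makeup",
    7: "Talking to Passenger",
    8: "Eyes Closed",
    9: "Yawning",
    10: "Nodding Off",
    11: "Eyes Open",
}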

The class distribution is shown in Figure 1.0 below.

Figure 1.0: Class distribution of various driving distractions

Downloading the Dataset: Visit the above URL and click the “Download this Dataset” button. Select the “YOLO v5 PyTorch” format from the dropdown options, check “Download zip to computer”, and click the “Continue” button. See Figure 1.1 below for an example.

Figure 1.1: Roboflow screenshot showing the download options for YOLO v5 PyTorch to local computer.

If you want to label your own images using other annotation tools, such as LabelImg, follow the guidelines in section 1.2 of this page: https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data#12-create-labels-1.

Ensure that the directory structure of your labeled images looks like the one shown in Figure 1.2 below.

Figure 1.2: Directory structure of the labeled images

The images directory contains the actual .png or .jpg images. The labels directory contains a .txt file corresponding to each image in the images directory. The .txt file name is exactly the same as the corresponding image file name, except for the extension.

The *.txt file specifications are:

  • One row per object
  • Each row is in class x_center y_center width height format.
  • Box coordinates must be in normalized xywh format (from 0–1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
  • Class numbers are zero-indexed (start from 0).

If the image contains 2 objects, the label .txt file will contain two lines as shown in Figure 1.3

Figure 1.3: Sample label file containing two rows for two objects
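To make the label format concrete, here is a minimal Python sketch that converts one pixel-coordinate box into a normalized YOLO label row and appends it to the image's .txt file. All file names and numbers below are made up for illustration.

# Convert a pixel-space box (top-left x, y, width, height) to a normalized YOLO row.
img_w, img_h = 640, 480                          # image dimensions in pixels (example values)
cls = 1                                          # class index, e.g. 1 = Texting
x_min, y_min, box_w, box_h = 120, 80, 200, 150   # example pixel box

x_center = (x_min + box_w / 2) / img_w
y_center = (y_min + box_h / 2) / img_h
width = box_w / img_w
height = box_h / img_h

# One row per object; the .txt file shares its base name with the image file.
with open("img_001.txt", "a") as f:
    f.write(f"{cls} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")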

For the purpose of this article, we assume that you have downloaded the labeled images from Roboflow to your local computer as a zip file.
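Before uploading, you can quickly check that the archive matches the expected layout from Figure 1.2. Here is a small sketch using Python's standard library; the zip file name is an assumption and should be replaced with the name of your actual download.

import zipfile

# Hypothetical file name; use the name of the zip file Roboflow gave you.
archive = "distracted-driving.v2-yolov5.zip"

with zipfile.ZipFile(archive) as z:
    # Print the top two directory levels, e.g. train/images, train/labels, valid/..., test/...
    top_levels = sorted({"/".join(name.split("/")[:2]) for name in z.namelist()})
    for entry in top_levels:
        print(entry)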

YOLO v5 Model Training Tool: We will explore how to use Momentum AI’s computer vision training tool to train the YOLO v5 model to detect driver distraction. Create an account by visiting https://one.accure.ai:5555/ and log in with your username and password.

Uploading Labeled Data to Momentum: Scroll down to locate “Data Management” in the left-side menu, and click it to launch the page where you upload the zip file that you downloaded from Roboflow.

Enter the name of the directory to which you want to upload the zip file containing the images and annotations. For example, I entered the directory name “driving”, as shown in the screenshot in Figure 1.4 below. To upload, drag and drop the zip file from your local computer onto the designated area on this page. Wait until the file is fully uploaded.

Figure 1.4: File upload screen. Notice the Data Management menu, directory name, and file upload area.

After the file is fully uploaded, expand the top-level directory, and then the directory you just uploaded the file to. You should see an expanded directory structure that looks something like the one shown in Figure 1.5 below.

Figure 1.5: Directory structure to show the expanded form of the uploaded images and labels data

Training YOLO v5 Model: From the left menu list, click the “Train New Model” option and select “YOLOv5 Object Detection”. This will open a form to configure your YOLO model parameters. Figure 1.6 shows a sample training configuration. The form fields are explained below.

Figure 1.6: YOLO v5 configuration screen

Name: A user defined name

Description: Give a detailed description of the model and its purpose.

model_name: Give a meaningful name. This will be the name of the model file after the training is completed.

model_version: Specify a version number

train_dir: Path to the training directory that contains the images and labels subdirectories. For our example, the path is driving/train.

validation_dir: The path to the validation directory

test_dir: Path to the test directory

model_output_dir: This is the path where the trained model will be stored. For our example, we entered driving/output.

num_classes: Specify the number of object classes you are training to detect. In our case, we have 12 classes.

class_names: Comma-separated list of class names, for example, c0 - Safe Driving,c1 - Texting,c2 - Talking on the phone,c3 - Operating the Radio,c4 - Drinking,c5 - Reaching Behind,c6 - Hair and Makeup,c7 - Talking to Passenger,d0 - Eyes Closed,d1 - Yawning,d2 - Nodding Off,d3 - Eyes Open

image_size: Specify the dimension to which all images will be resized. This depends on the transfer learning model that we select next. If we select YOLOv5 with input size 640x640, the image size should be 640. If we select YOLOv5 with input size 1280x1280, the image size should be 1280. In our case, we will use YOLOv5 with input size 640x640 for transfer learning, and hence the image_size will be 640.

epochs: This is the maximum number of epochs (full passes over the training set) the training should run.

batch_size: This is the number of images in a single training batch. Enter an appropriate batch size based on your hardware memory. Since we are running this example on a small GPU machine, we are using a batch size of 4, but you should consider a larger batch size, such as 32, 64, or higher.

transfer_learning_using: Select the appropriate YOLO pre-trained model.

cache_images: Select Yes if you have enough memory to keep all images cached to speed up training. Otherwise, select No.
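If you are unsure what to enter for num_classes and class_names, note that Roboflow's YOLO v5 export typically ships a data.yaml file listing both. Here is a minimal sketch that reads them out, assuming you have extracted the zip into a local folder named driving:

import yaml  # pip install pyyaml

# Assumes the Roboflow zip was extracted to a local "driving" folder.
with open("driving/data.yaml") as f:
    data = yaml.safe_load(f)

print("num_classes:", data["nc"])              # expected: 12
print("class_names:", ",".join(data["names"]))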

Click the Submit button to save the model configuration.

Starting the YOLO Model Training: After saving, the page will transition to show the model details. You can also navigate to this page by expanding “My Models” on the left menu list and clicking on the model name.

On the model detail page, click the green “Start” button to start the training.

It will take a while to transfer all the images and annotations to the training cluster, and during that time you might see a message saying that the model training has failed. Ignore that “Failed” message, especially within the first few minutes of the model launch. Keep refreshing the page until the status shows “Running” with the spinner spinning.

Monitoring the Training: As shown in Figure 1.7 below, the model detail page shows the training status and the latest 1000 lines of the training logs.

Figure 1.7: YOLO model training monitoring screen showing logs, losses, precision and recall curves.

The monitoring screen also shows various types of losses and metrics as shown in Figure 1.8 below.

Figure 1.8: Training losses, precision, recall, mAP@0.5, and mAP@0.5:0.95

Displaying Evaluation Results: On the model details and monitoring page, click the “Evaluate” button to display the evaluation results. If the model is still training, you will see only the training evaluations. After the model has finished training, it will show the evaluation based on the test data. Figure 1.9 shows the evaluation results. Click on the thumbnail images to see an enlarged view.

Figure 1.9: Model evaluation results.

After the model is fully trained, clicking the “Evaluate” button will show thumbnails of the evaluation results. Clicking a thumbnail shows an enlarged view of the evaluation result (Figure 1.10 below).

Figure 1.10: Evaluation result example

Downloading the Trained Model: Click the “Download Model” button to download the model in ONNX format. The download may take a while, so wait until the browser spinner stops spinning and the model file has finished downloading.
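Before wiring the downloaded model into an application, it can be worth sanity-checking the ONNX file with onnxruntime. A minimal sketch, assuming the file was saved as distracted_driver.onnx (a placeholder name):

import onnxruntime as ort  # pip install onnxruntime

# Placeholder file name; use the actual name of the model you downloaded.
session = ort.InferenceSession("distracted_driver.onnx")

# A YOLO v5 model trained at 640x640 typically expects a (1, 3, 640, 640) float32 input.
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)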

Using the Trained Model for Inference: After you download the model, save it to a location on your local computer or server.

  1. Download the latest YOLO v5 source from GitHub using the command:

git clone https://github.com/ultralytics/yolov5

2. Install the dependencies using the command:

cd yolov5

pip install -r requirements.txt

3. After all the requirements are successfully installed, navigate to the directory that contains the yolov5 folder and run detect.py:

python yolov5/detect.py --weights path_to_onnx_file --source path_to_input_images --project path_to_save_output

In the above command, change the arguments to match the paths on your computer or server.

If the above command runs successfully, the output images, with bounding boxes drawn around the detected objects, will be saved under the directory specified by --project.
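If you prefer to stay in Python instead of calling detect.py from the command line, the YOLO v5 repository also exposes a torch.hub interface that can load custom weights, including ONNX files when onnxruntime is installed. A minimal sketch; the model and image paths are placeholders:

import torch

# Placeholder paths; point these at your downloaded model and test images.
model = torch.hub.load("ultralytics/yolov5", "custom", path="distracted_driver.onnx")

results = model("test_images/driver_001.jpg")  # accepts a file path, URL, or numpy array
results.print()                                # prints a per-image summary of detections
results.save()                                 # saves annotated images under runs/detect/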

References:

  1. Ansari, S. (2020). Building Computer Vision Applications Using Artificial Neural Networks. Apress. doi:10.1007/978-1-4842-5887-3_4. https://link.springer.com/book/10.1007/978-1-4842-5887-3
  2. https://github.com/ultralytics/yolov5
