In this tutorial, we will learn how to perform object detection and tracking with YOLOv8 and DeepSORT.
We will use the Ultralytics implementation of YOLOv8, which is built on PyTorch. The YOLO model will handle object detection, and the DeepSORT algorithm will track the detected objects across frames.
A tracker helps identify the same object from frame to frame and assign it a unique ID, even when the detector fails to detect the object in some frames (e.g., when the object is occluded).
DeepSORT is a deep learning-based algorithm for object tracking that was introduced in 2017 in the paper Simple Online and Realtime Tracking with a Deep Association Metric by Nicolai Wojke, Alex Bewley, and Dietrich Paulus.
DeepSORT is based on the SORT algorithm, which combines a Kalman filter for motion prediction with the Hungarian algorithm for data association. DeepSORT improves upon SORT by incorporating a deep appearance descriptor to make the matching of objects over time more robust.
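To make the data-association step more concrete, here is a minimal sketch of how the Hungarian algorithm matches existing tracks to new detections, using SciPy's implementation. The cost matrix below is made up purely for illustration; in DeepSORT it would combine motion (Mahalanobis) and appearance (cosine) distances:
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j] = cost of matching track i to detection j (lower is better)
cost = np.array([
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.6],
])

# the Hungarian algorithm finds the assignment with the minimum total cost
track_idx, det_idx = linear_sum_assignment(cost)
for t, d in zip(track_idx, det_idx):
    print(f"track {t} -> detection {d} (cost {cost[t, d]:.2f})")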
Related: Mastering YOLO: Build an Automatic Number Plate Recognition System with OpenCV in Python.
In order to use YOLOv8 and DeepSORT, we need to install some Python packages.
The original DeepSORT implementation has a few issues and requires some changes to run, and we want to get started with object tracking quickly, right? So in this tutorial, I prefer to use a more practical, real-time adaptation of DeepSORT.
Here are the commands to install the required Python packages:
$ pip install ultralytics # to use YOLOv8
$ pip install deep-sort-realtime
I assume that you have PyTorch and OpenCV installed on your system. If not, you can install them with the following commands:
$ pip install torch torchvision torchaudio
$ pip install opencv-python
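If you want to verify the installation before moving on, a quick import check like the following should run without errors:
# sanity check: all required packages are importable
import torch
import cv2
import ultralytics

print("PyTorch:", torch.__version__)
print("OpenCV:", cv2.__version__)
print("Ultralytics:", ultralytics.__version__)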
With the packages installed, we can start coding.
Before we start tracking objects, we first need to detect them. So in this step, we will use YOLOv8 to detect objects in the video frames.
Create a new Python file and name it object_tracking.py. Then, copy the following code into it:
import datetime
from ultralytics import YOLO
import cv2
from helper import create_video_writer
# define some constants
CONFIDENCE_THRESHOLD = 0.8
GREEN = (0, 255, 0)
# initialize the video capture object
video_cap = cv2.VideoCapture("2.mp4")
# initialize the video writer object
writer = create_video_writer(video_cap, "output.mp4")
# load the pre-trained YOLOv8n model
model = YOLO("yolov8n.pt")
First things first, we import the required packages. The create_video_writer() function is a helper function that I created to simplify the creation of the video writer object, which is then used to save the output video. The code for this function lives in the helper.py file, so make sure to download the code for this tutorial to get access to this file as well.
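In case you prefer to write the helper yourself, here is a minimal sketch of what create_video_writer() might look like; it simply mirrors the input video's dimensions and frame rate (the actual implementation in helper.py may differ):
import cv2

def create_video_writer(video_cap, output_filename):
    # grab the width, height, and fps of the input video stream
    frame_width = int(video_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(video_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(video_cap.get(cv2.CAP_PROP_FPS))
    # initialize the FourCC code and the video writer object
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    return cv2.VideoWriter(output_filename, fourcc, fps,
                           (frame_width, frame_height))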
Then, we define some constants that we will use later. We also initialize the video capture and video writer objects.
Next, we load the pre-trained YOLOv8n model. For testing purposes, we are using the smallest model in the YOLOv8 family (YOLOv8n), which is the fastest but also the least accurate.
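If you need more accuracy and can afford the extra compute, you can swap in a larger variant simply by changing the weights file name:
# any of these can replace the nano model; larger = slower but more accurate
model = YOLO("yolov8s.pt")   # small
# model = YOLO("yolov8m.pt") # medium
# model = YOLO("yolov8l.pt") # large
# model = YOLO("yolov8x.pt") # extra large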
Now, we can start looping over the video frames:
while True:
    # start time to compute the fps
    start = datetime.datetime.now()

    ret, frame = video_cap.read()

    # if there are no more frames to process, break out of the loop
    if not ret:
        break

    # run the YOLO model on the frame
    detections = model(frame)[0]
In the while loop, we start by reading the next frame from the video capture object. If there are no more frames to process, we break out of the loop. This is especially useful when the video reaches the end, so that we don't get an error.
Then, we get the detections from the YOLOv8 model that we loaded in the previous step. To get the detections in the form of:
[[xmin, ymin, xmax, ymax, confidence_score, class_id], ...]
# example:
[[835, 15, 1054, 612, 0.94, 0], [549, 260, 679, 623, 0.91, 0], [308, 370, 589, 629, 0.84, 13]]
we can call .boxes.data.tolist() on the detections object:
    # loop over the detections
    for data in detections.boxes.data.tolist():
        # extract the confidence (i.e., probability) associated with the detection
        confidence = data[4]

        # filter out weak detections by ensuring the
        # confidence is greater than the minimum confidence
        if float(confidence) < CONFIDENCE_THRESHOLD:
            continue

        # if the confidence is greater than the minimum confidence,
        # draw the bounding box on the frame
        xmin, ymin, xmax, ymax = int(data[0]), int(data[1]), int(data[2]), int(data[3])
        cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), GREEN, 2)
So here data is a list of the form:
[xmin, ymin, xmax, ymax, confidence_score, class_id]
We loop over all the detections (detections.boxes.data.tolist()) and extract the confidence of each one. If the confidence is below the confidence threshold, we skip the detection; otherwise, we draw the bounding box on the frame.
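As an optional addition (not part of the script above), you can also look up the class id in the model's names dictionary and draw a label inside the same detection loop:
# optional, inside the detection loop: draw the class name above the box
class_id = int(data[5])
label = model.names[class_id]
cv2.putText(frame, label, (xmin, ymin - 5),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, GREEN, 2)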
Finally, we can draw the fps on the frame and write the frame to the output video:
    # end time to compute the fps
    end = datetime.datetime.now()
    # show the time it took to process 1 frame
    total = (end - start).total_seconds()
    print(f"Time to process 1 frame: {total * 1000:.0f} milliseconds")

    # calculate the frames per second and draw it on the frame
    fps = f"FPS: {1 / total:.2f}"
    cv2.putText(frame, fps, (50, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255), 8)

    # show the frame to our screen
    cv2.imshow("Frame", frame)
    writer.write(frame)

    if cv2.waitKey(1) == ord("q"):
        break

video_cap.release()
writer.release()
cv2.destroyAllWindows()
Here we calculate the time it took to process one frame and then compute the fps from it. We draw the fps on the frame and write the frame to the output video.
We also show the frame on our screen and wait for the user to press the q key to break out of the loop.
The video below shows the result of the code above:
The YOLO model is detecting the two people in the video and the bench. I am getting around 15 frames per second on my laptop using the CPU. This is not bad at all.
If you check your terminal you will see the time it took to process 1 frame. On my laptop, it takes around 60-70 milliseconds to process 1 frame. We will see how the tracking algorithm will affect the fps and the processing time in the next step.
Let's move on now to the tracking part.
We will build on the code we wrote in the previous step to add the tracking code.
Create a new file called object_detection_tracking.py and let's see how we can add the tracking code:
import datetime
from ultralytics import YOLO
import cv2
from helper import create_video_writer
from deep_sort_realtime.deepsort_tracker import DeepSort
CONFIDENCE_THRESHOLD = 0.8
GREEN = (0, 255, 0)
WHITE = (255, 255, 255)
# initialize the video capture object
video_cap = cv2.VideoCapture("2.mp4")
# initialize the video writer object
writer = create_video_writer(video_cap, "output.mp4")
# load the pre-trained YOLOv8n model
model = YOLO("yolov8n.pt")
# initialize the DeepSORT tracker
tracker = DeepSort(max_age=50)
This code is similar to the code we wrote in the previous step. The only differences are that we are creating the WHITE variable, importing the DeepSort class from the deep_sort_realtime.deepsort_tracker module, and initializing the DeepSort object with the max_age parameter set to 50.
The max_age parameter determines how many frames a track can go unmatched before it is deleted. This is useful when the object is occluded for a few frames.
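max_age is only one of several parameters the deep-sort-realtime tracker exposes. Here are a few others you may want to experiment with (parameter names as they appear in the library's constructor at the time of writing; check its documentation for the full list and defaults):
tracker = DeepSort(
    max_age=50,               # frames a track can go unmatched before deletion
    n_init=3,                 # detections needed before a track is confirmed
    max_iou_distance=0.7,     # gating threshold for IoU-based matching
    max_cosine_distance=0.2,  # gating threshold for appearance matching
    nn_budget=None,           # max size of the appearance feature gallery
)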
Let's now write the main loop:
while True:
    start = datetime.datetime.now()

    ret, frame = video_cap.read()

    if not ret:
        break

    # run the YOLO model on the frame
    detections = model(frame)[0]

    # initialize the list of bounding boxes and confidences
    results = []

    ######################################
    # DETECTION
    ######################################

    # loop over the detections
    for data in detections.boxes.data.tolist():
        # extract the confidence (i.e., probability) associated with the prediction
        confidence = data[4]

        # filter out weak detections by ensuring the
        # confidence is greater than the minimum confidence
        if float(confidence) < CONFIDENCE_THRESHOLD:
            continue

        # if the confidence is greater than the minimum confidence,
        # get the bounding box and the class id
        xmin, ymin, xmax, ymax = int(data[0]), int(data[1]), int(data[2]), int(data[3])
        class_id = int(data[5])
        # add the bounding box (x, y, w, h), confidence and class id to the results list
        results.append([[xmin, ymin, xmax - xmin, ymax - ymin], confidence, class_id])
In the detection part above, we are using the same logic as in the previous step, but this time we are adding the bounding box, confidence, and class id to the results list because we will need this information for the tracking algorithm.
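Note that deep-sort-realtime expects each detection as a ([left, top, width, height], confidence, class_id) entry, which is why we convert the corner coordinates to a width and a height. For example, the first detection from the earlier output would become:
# YOLO output (corners):  [835, 15, 1054, 612, 0.94, 0]
# tracker input (box as left, top, width, height):
example = [[835, 15, 1054 - 835, 612 - 15], 0.94, 0]  # [[835, 15, 219, 597], 0.94, 0]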
Let's now start tracking the objects:
    ######################################
    # TRACKING
    ######################################

    # update the tracker with the new detections
    tracks = tracker.update_tracks(results, frame=frame)
    # loop over the tracks
    for track in tracks:
        # if the track is not confirmed, ignore it
        if not track.is_confirmed():
            continue

        # get the track id and the bounding box
        track_id = track.track_id
        ltrb = track.to_ltrb()
        xmin, ymin, xmax, ymax = int(ltrb[0]), int(
            ltrb[1]), int(ltrb[2]), int(ltrb[3])
        # draw the bounding box and the track id
        cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), GREEN, 2)
        cv2.rectangle(frame, (xmin, ymin - 20), (xmin + 20, ymin), GREEN, -1)
        cv2.putText(frame, str(track_id), (xmin + 5, ymin - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, WHITE, 2)

    # end time to compute the fps
    end = datetime.datetime.now()
    # show the time it took to process 1 frame
    print(f"Time to process 1 frame: {(end - start).total_seconds() * 1000:.0f} milliseconds")

    # calculate the frames per second and draw it on the frame
    fps = f"FPS: {1 / (end - start).total_seconds():.2f}"
    cv2.putText(frame, fps, (50, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255), 8)

    # show the frame to our screen
    cv2.imshow("Frame", frame)
    writer.write(frame)

    if cv2.waitKey(1) == ord("q"):
        break

video_cap.release()
writer.release()
cv2.destroyAllWindows()
In the code snippet above, we are updating the tracker with the new detections and then looping over the tracks.
If the track is not confirmed, we ignore it. Otherwise, we get the track id and the bounding box, and draw both on the frame.
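The track objects carry more than just the id and the box. For example, recent versions of deep-sort-realtime expose a get_det_class() method that returns the class id a track was created with (check your version of the library if it's missing), so you could label each track with both its id and its class name:
# optional, inside the track loop: label each track with its id and class name
class_id = track.get_det_class()
label = f"{track_id} {model.names[class_id]}"
cv2.putText(frame, label, (xmin + 5, ymin - 8),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, WHITE, 2)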
Let's see how the tracking algorithm performs on the video:
As you can see, each person is assigned a unique id. Notice how the DeepSORT algorithm is able to keep track of the little boy even when he is hidden behind the second person.
The frame rate dropped to around 5 fps. Also, if you check the terminal, you will see that it now takes around 200 milliseconds to process one frame. But don't forget that this time includes the detection part as well.
Using the GPU would certainly improve the performance, but my goal here was to show you how to use the DeepSORT tracker with the YOLOv8 object detector.
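For reference, here is a minimal sketch of how you could move the YOLO inference to a GPU, assuming you have a CUDA-capable device and a CUDA build of PyTorch installed:
import torch

# pick the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = YOLO("yolov8n.pt")
model.to(device)  # the YOLO object wraps a torch module, so .to() moves the weights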
So there you have it! We have successfully implemented DeepSORT with YOLOv8 to perform object detection and tracking in a video.
By combining the power of YOLOv8's accurate object detection with DeepSORT's robust tracking algorithm, we are able to identify and track objects even in challenging scenarios such as occlusion or partial visibility.
I hope that you found this tutorial helpful in understanding how to implement object detection and tracking with YOLOv8 and DeepSORT. If you have any questions or feedback, please let me know in the comments below!
You can find the complete code for this tutorial here.
Finally, if you want to dive more into the exciting world of object detection, I suggest you see our comprehensive guide, Mastering YOLO: Build an Automatic Number Plate Recognition System. Whether you're a Python programmer, a hobbyist in computer vision, or a professional developer looking to advance your skills, this book offers a practical, hands-on approach to understanding and implementing YOLO. From setting up your environment to training the model and deploying an ANPR system, this book is a complete roadmap. What's more, it comes with lifetime access to future revisions, source code, and a 30-day money-back guarantee! Elevate your skillset and create real-world solutions with our step-by-step tutorials and clear explanations. Get your digital copy today!
Learn also: Age and Gender Detection using OpenCV in Python.
Happy coding ♥