14/08/2007
- Understanding SORT: Simple Online and Realtime Tracking
- The Core Components of SORT
- The SORT Tracking Process: A Three-Step Approach
- The Role of the Detector
- SORT's Strengths and Contributions
- Limitations of the Original SORT Algorithm
- Performance Metrics and Results
- Using SORT in Your Projects
- Evolution of SORT: Deep SORT
- Conclusion
- Frequently Asked Questions (FAQs)
Understanding SORT: Simple Online and Realtime Tracking
In the dynamic world of computer vision, accurately tracking multiple objects in video sequences is a fundamental yet complex challenge. Among the many algorithms developed to tackle this, SORT, or Simple Online and Realtime Tracking, stands out as a seminal work that significantly influenced the field. Published in 2016 by Alex Bewley and his colleagues, SORT presented a streamlined approach to multiple object tracking (MOT) that prioritised speed and simplicity while keeping accuracy competitive. This article explains SORT's underlying principles, its key components, and its impact on subsequent MOT research.

SORT operates within a tracking-by-detection framework. This means it relies on an external object detector to identify potential objects in each frame of a video. The tracker's primary role is then to associate these detections across consecutive frames, assigning unique identities to each object and maintaining their trajectories over time. What set SORT apart at its inception was its pragmatic combination of established algorithms – the Kalman filter for state estimation and the Hungarian algorithm for data association – creating a system that was both efficient and effective, particularly for realtime applications.
The Core Components of SORT
SORT's effectiveness can be attributed to its clever integration of two powerful tools:
- Kalman Filter: This is a mathematical tool used for estimating the state of a system that is subject to noise and uncertainty. In the context of SORT, the Kalman filter is used to predict the next position and bounding box of an object based on its previous states. It essentially provides a sophisticated guess of where an object will be in the next frame, taking into account its motion characteristics. This prediction is crucial for associating new detections with existing tracks.
- Hungarian Algorithm: This algorithm solves the assignment problem. In SORT, it finds the optimal one-to-one matching between the predicted bounding boxes (from the Kalman filter) and the bounding boxes detected in the current frame. The cost of pairing a prediction with a detection is typically derived from their intersection-over-union (IoU), so that a larger overlap means a lower cost. Because the assignment is optimised over all pairs at once rather than one pair at a time, incorrect associations are kept to a minimum.
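The IoU measure mentioned above can be sketched in a few lines. This is a minimal illustration, assuming boxes are given as `[x1, y1, x2, y2]` corner coordinates; the function name `iou` is chosen here for illustration and is not taken from the SORT code.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0
```

Two identical boxes give an IoU of 1, and two boxes that do not touch give 0. Values in between express how well a predicted box lines up with a detection.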
The SORT Tracking Process: A Three-Step Approach
The SORT algorithm proceeds through a well-defined three-step cycle for each incoming frame:
1. Prediction
The first step uses the Kalman filter to predict the state of each currently tracked object for the current frame. In SORT, this state holds the bounding box centre, its area and aspect ratio, and the velocities of the centre and area, propagated with a constant-velocity motion model. The prediction is based on the object's history, and the predicted bounding boxes serve as the initial estimates for association.
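The prediction step can be sketched with plain NumPy. This is a minimal illustration, assuming a seven-element state `[u, v, s, r, du, dv, ds]` (box centre, area, aspect ratio, and the velocities of the first three) and a constant-velocity model with one time step per frame. The names `predict` and `state_to_box` and the noise matrix passed in are illustrative assumptions, not the reference implementation.

```python
import numpy as np

# State vector: [u, v, s, r, du, dv, ds]
#   u, v : box centre     s : box area     r : aspect ratio (width / height)
#   du, dv, ds : per-frame velocities of u, v and s
F = np.eye(7)
F[0, 4] = 1.0  # u advances by du each frame
F[1, 5] = 1.0  # v advances by dv each frame
F[2, 6] = 1.0  # s advances by ds each frame; r is modelled as constant


def predict(x, P, Q):
    """One Kalman prediction step: propagate the state mean and covariance."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


def state_to_box(x):
    """Convert the first four state entries back to [x1, y1, x2, y2]."""
    u, v, s, r = x[:4]
    w = np.sqrt(s * r)  # s = w * h and r = w / h, so s * r = w ** 2
    h = s / w
    return np.array([u - w / 2, v - h / 2, u + w / 2, v + h / 2])
```

Calling `predict` once per frame and passing the result to `state_to_box` yields the predicted box that is then compared against the new detections.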
2. Data Association
This is the critical step where new detections are matched with existing tracks. A matrix is constructed in which each entry scores the overlap between a predicted bounding box and a detected bounding box. The intersection-over-union (IoU) is the most common metric used here; a higher IoU value indicates a greater overlap and thus a higher likelihood of a match. Because the Hungarian algorithm minimises cost, the IoU values are negated (or otherwise inverted) before it searches for the optimal assignment of detections to tracks, and assignments whose IoU falls below a minimum threshold are rejected. Detections that cannot be matched to any existing track are treated as potential new objects, and tracks that are not matched with any detection are marked for potential deletion.
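The association step can be sketched with SciPy's `linear_sum_assignment`, which solves the same assignment problem as the Hungarian algorithm. This is a minimal illustration, assuming the IoU matrix has already been computed with detections as rows and predicted tracks as columns. The function name `associate` and the default threshold of 0.3 are illustrative choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(iou_matrix, iou_threshold=0.3):
    """Match detections (rows) to tracks (columns) from a 2-D IoU matrix.

    Returns (matches, unmatched_detections, unmatched_tracks), where
    matches is a list of (detection_index, track_index) pairs.
    """
    iou_matrix = np.asarray(iou_matrix, dtype=float)
    n_det, n_trk = iou_matrix.shape
    if n_det == 0 or n_trk == 0:
        return [], list(range(n_det)), list(range(n_trk))

    # The solver minimises cost, so negate IoU to maximise total overlap.
    det_idx, trk_idx = linear_sum_assignment(-iou_matrix)

    # Reject assigned pairs whose overlap is too small to be credible.
    matches = [
        (int(d), int(t))
        for d, t in zip(det_idx, trk_idx)
        if iou_matrix[d, t] >= iou_threshold
    ]
    matched_dets = {d for d, _ in matches}
    matched_trks = {t for _, t in matches}
    unmatched_dets = [d for d in range(n_det) if d not in matched_dets]
    unmatched_trks = [t for t in range(n_trk) if t not in matched_trks]
    return matches, unmatched_dets, unmatched_trks
```

The three return values map directly onto the outcomes described above: matched pairs feed the update step, unmatched detections become candidate new tracks, and unmatched tracks become candidates for deletion.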
3. Update
Once the associations are made, the Kalman filter updates the state of each matched track using the corresponding detection. This update refines the filter's estimate of the object's state, incorporating the new observation. For unmatched detections, new tracks are initiated. Tracks that have not been updated for a certain number of consecutive frames (a parameter that can be tuned) are removed from the system. This removal mechanism is designed to handle objects that have left the scene or have been occluded for too long.
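The track bookkeeping in this step can be sketched as follows. This is a simplified illustration: the Kalman update is replaced by copying the detection into the track, and the class names `Track` and `TrackManager`, the `misses` counter, and the `max_age` parameter are assumptions made for the example rather than the reference code.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    box: tuple
    misses: int = 0  # consecutive frames without a matched detection


class TrackManager:
    def __init__(self, max_age=1):
        self.max_age = max_age  # tolerated consecutive misses before removal
        self.tracks = []
        self._next_id = 1

    def step(self, detections, matches, unmatched_dets):
        """Apply one frame's association result.

        matches holds (detection_index, track_index) pairs, where track
        indices refer to positions in self.tracks before this call.
        """
        det_for_track = {t: d for d, t in matches}
        survivors = []
        for t, track in enumerate(self.tracks):
            if t in det_for_track:
                # Stand-in for the Kalman update with the matched detection.
                track.box = detections[det_for_track[t]]
                track.misses = 0
            else:
                track.misses += 1
            if track.misses <= self.max_age:
                survivors.append(track)

        # Every unmatched detection starts a new track with a fresh ID.
        for d in unmatched_dets:
            survivors.append(Track(self._next_id, detections[d]))
            self._next_id += 1

        self.tracks = survivors
        return self.tracks
```

With `max_age=1`, a track survives one missed frame and is dropped on the second. If the same object is detected again afterwards, it receives a new ID, which is the ID-switch behaviour associated with this removal mechanism.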
The Role of the Detector
It's crucial to understand that SORT's performance is heavily reliant on the quality of the object detections it receives. The paper that introduced SORT highlighted the significant impact of the detector's accuracy on the overall tracking performance. While SORT itself is efficient, a poor-quality detector will inevitably lead to suboptimal tracking results, including increased ID switches and missed objects. The original SORT implementation often used Faster R-CNN, a powerful Convolutional Neural Network (CNN)-based object detection model, as its detector. The performance gains observed when using advanced CNN detectors like Faster R-CNN underscored the importance of integrating state-of-the-art detection methods with tracking algorithms.

SORT's Strengths and Contributions
SORT brought several significant contributions to the field of Multiple Object Tracking:
- Emphasis on Realtime Performance: By simplifying the data association process and relying on efficient algorithms, SORT achieved a remarkable speed advantage over previous methods like Multiple Hypothesis Tracking (MHT) and Joint Probabilistic Data Association (JPDA). It was reported to be approximately eight times faster than JPDA, making it feasible for applications requiring live processing.
- Demonstration of Detector Impact: SORT's success clearly demonstrated that the quality of object detections is a paramount factor in achieving high-performance tracking. This insight encouraged further research into improving object detectors, particularly those based on deep learning.
- Pragmatic and Accessible Approach: The combination of Kalman filters and the Hungarian algorithm provided a robust yet relatively simple framework that was easier to implement and understand compared to more complex, bespoke tracking algorithms.
- Open Sourcing: The decision to open-source the SORT code was instrumental in its widespread adoption and further development by the research community. This fostered collaboration and accelerated progress in MOT research.
Limitations of the Original SORT Algorithm
Despite its strengths, the original SORT algorithm had certain limitations:
- Handling Occlusions: SORT's basic implementation did not explicitly handle object occlusions. If an object went undetected for longer than the permitted number of frames (a limit the paper calls T_Lost and sets to a single frame in its experiments), its track was terminated and a new track was created when the object reappeared. This often led to ID switches, where a re-emerging object was assigned a new, incorrect identity.
- Lack of Appearance Features: SORT relied solely on motion and spatial information (bounding box overlap) for data association. It did not incorporate appearance features (e.g., visual characteristics of the object). This made it difficult to re-identify objects after long periods of occlusion or when multiple objects were in close proximity and their motion patterns were similar.
- Sensitivity to Detector False Positives: The simple track deletion mechanism (removing tracks after a few unmatched frames) was intended partly to manage false positives from the detector. The trade-off is that even short-term occlusions could terminate a track and cause a subsequent ID switch.
Performance Metrics and Results
The SORT paper presented its performance on the MOT benchmark, reporting metrics such as MOTA (Multiple Object Tracking Accuracy), MOTP (Multiple Object Tracking Precision), and ID Switches. The results demonstrated that SORT, when paired with a strong detector like Faster R-CNN, could achieve competitive performance, sometimes rivalling more complex methods, particularly in terms of speed. For instance, the overall MOTA achieved with Faster R-CNN detections was reported at 34.0%, a significant improvement over other detectors.
| Sequence | MOTA (%) | MOTP (%) | ID Switches |
|---|---|---|---|
| TUD-Campus | 62.7 | 73.7 | 6 |
| ETH-Sunnyday | 59.1 | 74.4 | 22 |
| ETH-Pedcross2 | 45.4 | 74.8 | 77 |
| ADL-Rundle-8 | 28.6 | 71.1 | 103 |
| Venice-2 | 18.6 | 73.4 | 57 |
| KITTI-17 | 60.2 | 72.3 | 9 |
| Overall | 34.0 | 73.3 | 274 |
Note: These are illustrative results based on the paper; actual performance can vary depending on the specific detector and dataset.
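For readers unfamiliar with these metrics, the definitions commonly used in MOT evaluation can be written as below. This is a summary under the usual conventions, with sums taken over frames t, and not a restatement of the paper's exact evaluation script. FN, FP and IDSW are the counts of missed objects, false positives and identity switches, GT is the number of ground-truth objects, d is the overlap between matched pair i and its ground-truth box, and c is the number of matches in a frame.

```latex
\mathrm{MOTA} = 1 - \frac{\sum_t \left( \mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t \right)}{\sum_t \mathrm{GT}_t}
\qquad
\mathrm{MOTP} = \frac{\sum_{t,i} d_{t,i}}{\sum_t c_t}
```

MOTA therefore penalises every kind of tracking error against the total number of ground-truth objects, while MOTP only measures how tightly the matched boxes fit.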
Using SORT in Your Projects
Implementing SORT in your own projects is relatively straightforward, especially if you have a detection pipeline already in place. The core logic involves instantiating the `Sort` tracker and then updating it with the detections from each frame. Detections are typically provided as a NumPy array in which each row holds a bounding box's corner coordinates followed by the detector's confidence score (e.g., `[x1, y1, x2, y2, score]`). The `update` method returns the currently active tracks, each with an assigned ID.
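Before calling `update`, detector output usually needs to be packed into that array. The helper below is a minimal sketch, assuming each raw detection is a five-element tuple of corner coordinates plus a confidence score. The name `to_sort_input` and the `min_score` filter are hypothetical additions for illustration. It returns an empty array of shape `(0, 5)` for frames without detections, so the tracker can still be called on every frame.

```python
import numpy as np


def to_sort_input(raw_detections, min_score=0.0):
    """Pack (x1, y1, x2, y2, score) tuples into an (N, 5) float array.

    Detections scoring below min_score are dropped. A frame with no
    remaining detections yields an empty array of shape (0, 5).
    """
    kept = [d for d in raw_detections if d[4] >= min_score]
    if not kept:
        return np.empty((0, 5))
    return np.asarray(kept, dtype=float)
```

Filtering weak detections before tracking is a design choice rather than a requirement. A higher `min_score` reduces spurious tracks at the cost of more missed objects.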
```python
from sort import Sort

# Create an instance of the SORT tracker
mot_tracker = Sort()

# In each frame:
# Get detections from your object detector (e.g., [x1, y1, x2, y2, confidence]).
# get_detections_from_frame() is a placeholder for your own detector code.
detections = get_detections_from_frame()

# Update the SORT tracker with the current detections.
# The output is a NumPy array of active tracks, with the track ID in the last column.
track_bbs_ids = mot_tracker.update(detections)

# Process the tracked bounding boxes and their IDs
for track in track_bbs_ids:
    x1, y1, x2, y2, track_id = track
    # ... do something with the tracked object ...
```
Evolution of SORT: Deep SORT
Recognising the limitations of the original SORT, particularly its inability to handle occlusions effectively due to the lack of appearance information, subsequent research led to the development of Deep SORT. Deep SORT enhances the original SORT by incorporating appearance features extracted using deep learning models. This allows for more robust re-identification of objects, even after prolonged occlusions. Deep SORT typically uses a deep appearance descriptor alongside the Kalman filter and IoU matching, significantly improving tracking accuracy in challenging scenarios. If you're looking for a more advanced tracker, exploring Deep SORT is highly recommended.
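To give a feel for how appearance information enters the matching, the sketch below builds a cost matrix from appearance embeddings using cosine distance. It is an illustration only: the embeddings are assumed to come from some separate feature extractor, the function name `cosine_distance_matrix` is hypothetical, and the full Deep SORT association involves more than this single cost.

```python
import numpy as np


def cosine_distance_matrix(track_feats, det_feats):
    """Cosine distance between every track embedding and detection embedding.

    Both inputs are 2-D arrays with one non-zero embedding per row.
    Entry [i, j] is 0 for identical directions and 1 for orthogonal ones.
    """
    a = np.asarray(track_feats, dtype=float)
    b = np.asarray(det_feats, dtype=float)

    # Normalise each embedding to unit length so the dot product is the cosine.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T
```

A matrix like this can be handed to the same assignment solver used for IoU, which is why appearance cues slot naturally into the tracking-by-detection loop.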
Conclusion
SORT, or Simple Online and Realtime Tracking, was a pivotal algorithm in the evolution of multiple object tracking. Its emphasis on efficiency, its pragmatic use of Kalman filters and the Hungarian algorithm, and its clear demonstration of the critical role of object detectors laid the groundwork for much of the subsequent research in MOT. While the original SORT has limitations, particularly with occlusions, its foundational principles remain highly relevant, and it continues to serve as an excellent baseline for developing more sophisticated tracking systems. Understanding SORT is key to appreciating the advancements made in tracking moving objects in the complex visual world.
Frequently Asked Questions (FAQs)
Q1: What is the main advantage of SORT over older tracking methods?
A1: The primary advantage of SORT is its significant speed improvement, making it suitable for realtime applications. It achieves this by simplifying the data association process compared to methods like MHT or JPDA.

Q2: How does SORT handle objects that disappear and reappear?
A2: The original SORT algorithm has a limited ability to handle reappearances. If an object is not detected for a few consecutive frames, its track is deleted. When it reappears, it's likely to be assigned a new ID, leading to an ID switch. More advanced versions like Deep SORT address this by using appearance features.
Q3: Does SORT use object appearance information for tracking?
A3: No, the original SORT algorithm does not use object appearance features. It relies solely on the motion and spatial overlap (IoU) of bounding boxes for data association.
Q4: What are the key components of the SORT algorithm?
A4: The key components are the Kalman filter for predicting object states and the Hungarian algorithm for data association.
Q5: How important is the object detector for SORT's performance?
A5: Extremely important. SORT is a tracking-by-detection framework, meaning its performance is highly dependent on the accuracy and quality of the detections provided by the external object detector.
