Rise of multiple objects tracking in a real-time setting
MOT metrics are used to evaluate the accuracy of tracking algorithms. Experts consider two primary metrics when evaluating a tracking algorithm:
1. MOTP (Multiple Object Tracking Precision):
It measures how accurately the detection boxes are localized. It is quite similar to the mAP metric.
2. MOTA (Multiple Object Tracking Accuracy):
It measures overall tracking accuracy by accounting for three types of errors:
- Missed detection: An object is present in the ground truth but is not detected by the detection algorithm.
- False positive: An object is not present in the ground truth, but the detection algorithm reports one (a false detection).
- Mismatch error: An object in the ground truth is associated with the wrong object because of a tracking error.
1. Intersection over Union (IOU)
Intersection over Union (IOU) is a metric for evaluating an object detector’s accuracy. It is the ratio of the overlap area of two bounding boxes (the ground truth box and the predicted box) to the area of their union.
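The IOU definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming boxes are given as (x1, y1, x2, y2) corner coordinates in pixels; the function name iou and that box layout are choices made for this example, not something the article prescribes.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2).

    The corner-coordinate layout is an assumption made for this sketch.
    """
    # Corners of the overlap rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0
```

Identical boxes give an IOU of 1, and boxes that do not overlap give 0.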
2. Squared Euclidean distance:
The Euclidean pixel distance is the length of the line segment between two points in pixel coordinates. Here, it is measured between the center point of the ground truth (GT) box and the center point of the predicted box, and the squared value of that distance is used.
The squared Euclidean pixel distance between two points (x1, y1) and (x2, y2) is:
d = (x1 – x2)² + (y1 – y2)²
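As a rough sketch of this calculation, the snippet below computes the squared distance between the centers of a ground truth box and a predicted box. The (x1, y1, x2, y2) box layout and the helper names box_center and squared_center_distance are assumptions made for illustration.

```python
def box_center(box):
    """Center point of a box given as (x1, y1, x2, y2) (assumed layout)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def squared_center_distance(gt_box, pred_box):
    """Squared Euclidean pixel distance between the two box centers."""
    gx, gy = box_center(gt_box)
    px, py = box_center(pred_box)
    return (gx - px) ** 2 + (gy - py) ** 2
```

For example, boxes centered at (5, 5) and (8, 9) are 3 pixels apart horizontally and 4 vertically, so the squared distance is 9 + 16 = 25.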
Note: IOU is the preferred method for establishing correspondence.
2. If a ground truth object has NO MATCH in the detection output, the count of MISSES (FNt) is incremented by 1.
3. If an object in the detection output has NO MATCH in the real world (ground truth), it is considered a false detection, and the count of FALSE POSITIVES (FPt) is incremented by 1.
Suppose that in Frame 1 a correspondence is made between ground truth person P1 and detection output person PD3. The pair is saved as (1, 3). If, in the next frame, ground truth person P1 is paired with a different person in the detection output (PD4), the MISMATCH ERROR count (IDSt) is incremented by 1. The pair (1, 3) is deleted, and the pair (1, 4) is used for future frames.
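The counting rules above can be sketched as a small per-frame bookkeeping function. It assumes the correspondence step has already produced a mapping from ground truth IDs to detection IDs for the frame; the function name count_frame_errors and the dictionary-based representation are hypothetical choices for this example.

```python
def count_frame_errors(gt_ids, det_ids, matches, pairs):
    """Count misses, false positives and mismatches for one frame.

    gt_ids  -- IDs of ground truth objects in this frame
    det_ids -- IDs of objects in the detection output for this frame
    matches -- dict {gt_id: det_id} produced by the correspondence step
    pairs   -- dict {gt_id: det_id} remembered from earlier frames;
               it is updated in place (an assumption of this sketch)
    """
    # Ground truth objects with no matching detection are misses.
    misses = sum(1 for g in gt_ids if g not in matches)

    # Detections that match no ground truth object are false positives.
    matched_dets = set(matches.values())
    false_positives = sum(1 for d in det_ids if d not in matched_dets)

    # A mismatch occurs when a ground truth object is paired with a
    # different detection ID than the one saved from earlier frames.
    mismatches = 0
    for g, d in matches.items():
        if g in pairs and pairs[g] != d:
            mismatches += 1
        pairs[g] = d  # the new pair replaces the old one

    return misses, false_positives, mismatches


pairs = {}
frame1 = count_frame_errors([1], [3], {1: 3}, pairs)  # (0, 0, 0)
frame2 = count_frame_errors([1], [4], {1: 4}, pairs)  # (0, 0, 1)
```

After the second call, pairs holds {1: 4}, mirroring the replacement of pair (1, 3) by (1, 4) described above.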
dt – distance between the location of a matched object in the ground truth and its location in the detection output
ct – total number of matches made between the ground truth and the detection output
MOTA Calculation for a Sample Frame
MOTA = 1 – ((1 + 1 + 1) / 4) = 0.25
NOTE: This single-frame calculation is for illustration only. In practice, misses, false positives, and mismatch errors are counted for every frame of the video, and MOTA is calculated from the totals.
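A minimal sketch of the video-level calculation, assuming the standard definition of MOTA as one minus the ratio of total errors to total ground truth objects over all frames. The function name mota and the per-frame tuple layout are illustrative assumptions.

```python
def mota(frames):
    """MOTA over a sequence of frames.

    frames -- iterable of (misses, false_positives, mismatches, num_gt)
              tuples, one per frame (layout assumed for this sketch)
    """
    total_errors = 0
    total_gt = 0
    for misses, false_positives, mismatches, num_gt in frames:
        total_errors += misses + false_positives + mismatches
        total_gt += num_gt

    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground truth objects")

    return 1.0 - total_errors / total_gt


# The single sample frame: 1 miss, 1 false positive, 1 mismatch,
# and 4 ground truth objects.
sample = mota([(1, 1, 1, 4)])  # 0.25
```

Because errors are summed before dividing, a frame with many ground truth objects carries more weight than a sparse one.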
MOTP Calculation for a Sample Frame
Distance, dt = 1 – IOU = 0.22
Total match, ct = 1
MOTP = dt/ct = 0.22 / 1 = 0.22.
NOTE: This single-frame calculation is for illustration only. In practice, distances and total matches are accumulated for every frame of the video, and MOTP is calculated from the totals.
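The accumulation over frames might look like the sketch below. It assumes each frame supplies the list of distances (here 1 – IOU, as in the sample calculation) for its matched pairs; the function name motp and the list-of-lists input are assumptions for illustration.

```python
def motp(frame_distances):
    """MOTP over a sequence of frames.

    frame_distances -- one list per frame, holding the distance dt of
                       every matched pair in that frame (assumed layout)
    """
    total_distance = 0.0
    total_matches = 0
    for distances in frame_distances:
        total_distance += sum(distances)
        total_matches += len(distances)  # ct for this frame

    if total_matches == 0:
        raise ValueError("MOTP is undefined when there are no matches")

    return total_distance / total_matches


# The single sample frame: one match with distance 1 - IOU = 0.22.
sample = motp([[0.22]])
```

With the distance defined as 1 – IOU, a lower value indicates tighter localization.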
What do these values signify?
1. Mostly Tracked – The number of objects tracked for at least 80 percent of their lifespan.
2. Partially Tracked – The number of objects tracked for between 20 and 80 percent of their lifespan.
3. Mostly Lost – The number of objects tracked for less than 20 percent of their lifespan.
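These three categories can be sketched as a simple threshold check on the fraction of an object's lifespan during which it was tracked. The function name classify_track, the frame-count inputs, and the label strings are hypothetical choices for this example.

```python
from collections import Counter


def classify_track(tracked_frames, lifespan_frames):
    """Label one ground truth object by its tracked fraction."""
    if lifespan_frames <= 0:
        raise ValueError("lifespan must be at least one frame")

    ratio = tracked_frames / lifespan_frames
    if ratio >= 0.8:
        return "mostly_tracked"
    if ratio < 0.2:
        return "mostly_lost"
    return "partially_tracked"


# (tracked frames, lifespan in frames) for three hypothetical objects.
objects = [(90, 100), (50, 100), (10, 100)]
summary = Counter(classify_track(t, n) for t, n in objects)
```

Here summary holds one object in each category.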