What are the trade-offs between accuracy and speed one should know in people/human detection solutions?


Summary

People and object detection have given the field of computer vision a major boost, and the field continues to advance quickly.
Speed and accuracy are the two crucial factors to consider when developing a robust people detection solution.
This article explains the speed-accuracy trade-off in people/human detection solutions.

Written for:

Engineers and project managers who are choosing the right people detection system for their particular product and use cases.

Introduction

Speed and accuracy are the two essential parameters any developer should consider when developing an edge inferencing model. Because edge hardware has limited compute capability, we have to trade speed against accuracy to arrive at acceptable performance.
Speed and accuracy are interdependent. Making the architecture more complex to raise accuracy costs speed; simplifying the architecture to raise speed costs accuracy. In other words, the two generally move in opposite directions, so one has to find an operating point where both speed and accuracy are good enough for the application.

Model Development

Model development mainly involves three phases:
  • Build the model
  • Evaluate the model
  • Deploy the model
Once a model is built, we evaluate it to find out how accurately it predicts the objects it was trained to detect. A simple way to evaluate is to run inference on some videos and inspect the predictions manually. The standard alternative is to compute mAP (mean Average Precision) against the ground truth.
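
To make the ground-truth comparison concrete, here is a minimal sketch of the building blocks behind mAP: box overlap (IoU) and precision/recall at a single IoU threshold. It is not a full mAP implementation. The function names, the (x1, y1, x2, y2) box format, and the 0.5 threshold are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """predictions: list of (box, score); ground_truth: list of boxes.

    Greedy matching: predictions are visited in order of decreasing score,
    and each ground-truth box can be matched at most once.
    """
    matched = set()
    true_positives = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Average Precision for a class is the area under the precision-recall curve obtained by sweeping the confidence threshold, and mAP is the mean of that value over classes.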

Speed Metric

In real-time video inferencing, speed is commonly measured in FPS (frames per second), i.e., how many frames the model can process per second on the target hardware. The lower the processing time per frame, the higher the FPS.
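
As a rough sketch of how FPS can be measured, the snippet below times a callable over a batch of frames. `measure_fps` and `fake_infer` are hypothetical names; `fake_infer` merely sleeps to stand in for a real model call and would be replaced by the actual inference function on the target hardware.

```python
import time


def measure_fps(infer, frames, warmup=2):
    """Return frames processed per second by `infer` over `frames`."""
    # Warm-up runs are excluded from timing, since the first calls
    # often include one-off setup cost.
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def fake_infer(frame):
    """Placeholder for a real model: pretends each frame takes about 5 ms."""
    time.sleep(0.005)
    return []


fps = measure_fps(fake_infer, frames=list(range(20)))
print(f"{fps:.1f} FPS")
```

For a fair number, the measurement should cover the whole per-frame pipeline (pre-processing, inference, post-processing), not just the network forward pass.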
How to trade speed against accuracy depends mainly on the business requirement and how critical it is. Some business use cases require very high accuracy, which has to be supported with the right hardware; others restrict the use of sophisticated hardware, which limits the processing speed.
Use cases such as cancer-cell detection are optimized specifically for accuracy, while use cases such as autonomous driving require real-time performance. In some cases both high speed and high accuracy are needed, and the only way to meet that requirement is to use hardware powerful enough to run an accurate model quickly.
Model selection is an art: we have to select a model that strikes the right balance of speed and accuracy for the given application and hardware.
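
One simple way to frame the selection, sketched below, is to treat the required FPS as a hard constraint and pick the most accurate model that satisfies it. The candidate names and all numbers are made-up placeholders, not measurements; in practice they would come from benchmarking each model on the target hardware.

```python
def pick_model(candidates, min_fps):
    """Return the most accurate candidate that meets the FPS budget, or None."""
    feasible = [c for c in candidates if c["fps"] >= min_fps]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["map"])


# Placeholder values for illustration only (not real benchmark results).
candidates = [
    {"name": "model_a", "map": 0.60, "fps": 8.0},
    {"name": "model_b", "map": 0.52, "fps": 24.0},
    {"name": "model_c", "map": 0.41, "fps": 45.0},
]

choice = pick_model(candidates, min_fps=20.0)
print(choice["name"] if choice else "no model meets the budget")
```

The same function works the other way round if accuracy is the hard constraint: filter on a minimum mAP and maximize FPS.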
Here is the performance of the YOLO model on the Jetson TX2:
[Performance table not preserved in the source; the only surviving column header is "Resolution".]

Improving Accuracy

The YOLOv4 paper uses the term “Bag of Freebies” for methods that only change the training strategy or increase the training cost, improving accuracy without increasing the inference cost.

One method that meets the Bag of Freebies definition is data augmentation, which is commonly used in object detection to improve accuracy. Data augmentation increases the variability of the input images so that the resulting detection model is more robust to images captured in different environments.
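
As one small illustration of data augmentation for detection, the sketch below flips an image horizontally and moves the bounding boxes with it. The image is assumed to be a plain list of pixel rows, and boxes are assumed to be (x1, y1, x2, y2) pixel-edge coordinates; real pipelines usually rely on an image library and combine several augmentations.

```python
import random


def hflip(image, boxes):
    """Mirror an image (list of rows) and its boxes left to right."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # The left edge becomes width - old right edge, and vice versa.
    flipped_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped_image, flipped_boxes


def random_hflip(image, boxes, p=0.5, rng=random):
    """Apply the flip with probability p; otherwise return the inputs unchanged."""
    if rng.random() < p:
        return hflip(image, boxes)
    return image, boxes
```

The important detail is that labels must be transformed together with the pixels; flipping the image alone would leave the boxes pointing at the wrong region.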

The loss function (objective function) is another method that meets the Bag of Freebies definition. Many loss functions are available, and choosing one that suits the requirement plays a significant role in improving model accuracy.
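
As an example of a loss function choice, IoU-based losses are one family used for bounding-box regression. The sketch below computes the Generalized IoU (GIoU) loss for a single pair of boxes in plain Python. It is shown only as one possible option, not as the loss any particular solution uses, and it assumes (x1, y1, x2, y2) boxes with positive area.

```python
def giou_loss(pred, target):
    """GIoU loss (1 - GIoU) for two boxes in (x1, y1, x2, y2) form."""
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    if area_p <= 0 or area_t <= 0:
        raise ValueError("boxes must have positive area")

    # Overlap between the two boxes.
    inter_w = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    inter_h = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = inter_w * inter_h
    union = area_p + area_t - inter

    # Smallest axis-aligned box that encloses both boxes.
    enclose_w = max(pred[2], target[2]) - min(pred[0], target[0])
    enclose_h = max(pred[3], target[3]) - min(pred[1], target[1])
    enclose = enclose_w * enclose_h

    iou = inter / union
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou
```

Unlike a plain IoU loss, this value keeps changing as two non-overlapping boxes move closer together, which gives the optimizer a useful signal even when the prediction misses the target entirely.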
There are other ways to improve accuracy, such as increasing the model input size or adding layers to increase the number of learnable parameters. These methods cost speed, however, because they also increase the inference time.
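
A back-of-the-envelope estimate shows why a larger input size is expensive. Assuming a fully convolutional network whose compute grows in proportion to the number of input pixels, the relative cost of a square input is the square of the size ratio. The sizes 416 and 608 below are illustrative choices, and real inference time also depends on memory bandwidth and hardware utilization.

```python
def relative_compute(new_size, base_size):
    """Approximate compute ratio when a square input grows from base_size to new_size."""
    return (new_size / base_size) ** 2


ratio = relative_compute(608, 416)
print(f"Roughly {ratio:.2f}x the convolution work per frame")
```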

Improving Speed

  • Model Pruning
  • Model Precision Change
  • Hardware specific optimization

Model pruning is the process of removing nodes that contribute little to the output, which reduces the number of learnable parameters. After pruning, the model typically has to be retrained on the full dataset so that the remaining parameters adjust to the change.
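
The idea can be sketched on a single dense layer. The example below ranks output neurons by the L1 norm of their weights and keeps only the strongest ones. Using the L1 norm as the importance score is an assumption made for illustration (other criteria exist), and `prune_neurons` is a hypothetical helper, not part of any framework.

```python
def prune_neurons(weights, biases, keep_ratio):
    """Drop the output neurons with the smallest L1 weight norm.

    weights: list of rows, one row of input weights per output neuron.
    Returns the pruned weights, pruned biases, and the kept neuron indices.
    """
    n_keep = max(1, round(len(weights) * keep_ratio))
    norms = [sum(abs(w) for w in row) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original neuron order
    return [weights[i] for i in keep], [biases[i] for i in keep], keep


weights = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [0.0, -0.03]]
biases = [0.1, 0.2, 0.3, 0.4]
pruned_w, pruned_b, kept = prune_neurons(weights, biases, keep_ratio=0.5)
```

The returned indices matter: the next layer must drop the matching input columns, and the smaller network is then fine-tuned to recover accuracy.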

Model precision can also be reduced to improve speed. CNN models are usually trained at higher precision, such as FP32; converting the model to FP16 or INT8 can improve speed. The vital thing to remember is to check whether your target hardware supports these precision levels. Converting a model to INT8 requires quantization, which includes a calibration step run on a subset of, or the full, training dataset.
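
To show what calibration and quantization mean in practice, here is a minimal sketch of symmetric per-tensor INT8 quantization. The scale is derived from the largest absolute value seen in calibration samples, which is only one of several possible calibration strategies; the function names are hypothetical and vendor tools implement more refined schemes.

```python
def calibrate_scale(samples):
    """Derive a scale so that the largest calibration magnitude maps to 127."""
    max_abs = max(abs(v) for v in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Map floats to integers in [-127, 127]; out-of-range values are clipped."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map INT8 values back to approximate floats."""
    return [q * scale for q in quantized]


calibration_samples = [-1.0, 0.5, 0.25, 1.0]
scale = calibrate_scale(calibration_samples)
q = quantize([1.0, -1.0, 0.0, 2.0], scale)
restored = dequantize(q, scale)
```

Values inside the calibrated range are recovered to within half a quantization step, while values outside it (2.0 here) are clipped. That clipping is why the calibration data should be representative of what the model sees in deployment.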

Additionally, some SoC vendors provide tools for optimizing a model for their hardware, and we can use them to improve model performance. For example, for Jetson boards, NVIDIA provides TensorRT, which takes the trained model as input and optimizes it for the target hardware so that it runs faster on the edge device with minimal loss of accuracy.


Conclusion

Speed versus accuracy is an unavoidable trade-off that engineers must manage. By understanding the business use case, one can prioritize correctly and then, through a series of tactical interventions, strike a robust balance between speed and accuracy.

VisAI Platform – Human/People Detection Solution

VisAI Labs understands the need for edge-optimized human/people detection algorithms that satisfy a wide range of use cases and are pre-tested across diverse test scenarios.
VisAI Labs human/people detection algorithms are built specifically for people counting, congestion detection, intrusion detection, social distancing checks, dwell time detection, and building monitoring.