Top 3 techniques to improve people/human detection accuracy

Human/people detection is a machine learning application that is gaining a lot of traction in the AI community.
Accurate human detection is critical in many fields of use, such as abnormal event detection, person identification, and gender classification in a visual environment.
The first step of the detection process is to identify a moving target accurately. This blog discusses the top three techniques to increase the accuracy of people/human detection.

Written for:

Project managers and engineers who are in the process of determining the best People Detection system for their particular product/use case.


In the previous blogs, we addressed some of the technical problems associated with People Detection solutions and provided a few techniques to overcome them. This blog extends the previous one and introduces more general techniques to improve detection accuracy. Whatever technique we introduce in training, we need to evaluate it on a separate dataset that was not used for training or validation, and measure the accuracy (mAP) to find out whether the technique actually helps with our problem. This blog focuses solely on the accuracy of detection. The techniques we introduce might affect the overall throughput of the pipeline (in terms of fps); we will cover that aspect in our next blog.
Below are the top 3 techniques we present in this blog:
  • Data Augmentation
  • Design of Model architecture
  • Input resolution
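
The held-out evaluation mentioned above can be sketched in a few lines of Python. With a single "person" class, mAP reduces to the average precision (AP) of that class. This is a minimal sketch, not a reference implementation: the function names, the (x1, y1, x2, y2) box format, the single IoU threshold of 0.5, and the all-point area under the precision/recall curve are our assumptions. Some benchmarks additionally average over several IoU thresholds.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP for one class on a held-out test set.

    detections:   list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    """
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}

    # Walk detections from most to least confident and mark each as hit or miss.
    hits = []
    for img, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # A ground-truth box can be claimed by only one detection.
        if best_iou >= iou_thr and not matched[img][best_idx]:
            matched[img][best_idx] = True
            hits.append(True)
        else:
            hits.append(False)

    # Precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / total_gt)

    # Make precision non-increasing as recall grows, then integrate.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Running the same function on the same held-out images before and after introducing a technique gives a like-for-like comparison of its effect.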

Data Augmentation

An excellent public dataset is not available for every problem, yet deep learning models need large datasets to learn the variations required to generalize. Data Augmentation is a technique that increases the number of training images using the existing dataset: we apply various image processing techniques to introduce variations in the existing images. Before delving into data augmentation in depth, we’d like to emphasize that accuracy is directly related to the quality of the annotations. If the annotations themselves are poor, no technique can guarantee an improvement. When collecting larger datasets, one should not compromise on annotation quality.
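
As a small illustration of guarding annotation quality, the sketch below flags bounding boxes that are obviously wrong before they reach training. It is only a first-pass filter under our own assumptions: boxes are (x1, y1, x2, y2) in pixels, and `find_bad_boxes` and the `min_side` threshold are hypothetical names and values chosen for this example. It cannot catch boxes that are well-formed but drawn around the wrong region; those still need a visual review.

```python
def find_bad_boxes(boxes, img_w, img_h, min_side=2):
    """Return (index, reason) pairs for boxes that look wrong.

    boxes: list of (x1, y1, x2, y2) in pixels for one image.
    min_side: smallest width/height accepted (an assumed threshold).
    """
    problems = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x2 <= x1 or y2 <= y1:
            problems.append((i, "empty or inverted"))
        elif x1 < 0 or y1 < 0 or x2 > img_w or y2 > img_h:
            problems.append((i, "outside image"))
        elif (x2 - x1) < min_side or (y2 - y1) < min_side:
            problems.append((i, "too small"))
    return problems
```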


So what data augmentation methods are recommended for People Detection?

It is better to avoid artificial scenarios that will not occur in your deployment. For instance, a person is generally not going to hang upside down, so we can avoid vertical flipping. The main reason is that during training the model also tends to learn these unrealistic cases, which is not good for accuracy. The model generalizes better if we eliminate such deviations from the dataset.
To simulate real scenarios, we could introduce:
  • Noise, motion blur, and compression to simulate the image quality of the camera
  • Horizontal flipping and perspective change to simulate the view angle
  • Brightness, contrast, and colour changes to simulate lighting conditions
  • Removing certain regions of an image to generalize
  • Image Mosaicking
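
Some of the augmentations listed above can be sketched as follows. To stay dependency-free, this example works on a grayscale image stored as a list of rows; in practice one would use NumPy/OpenCV or an augmentation library. All function names, parameter ranges, and the (x1, y1, x2, y2) box format are our assumptions. Motion blur, compression, perspective change, and mosaicking are left out because they need a real image library. Note that the horizontal flip must also mirror the bounding boxes, and that there is deliberately no vertical flip.

```python
import random


def hflip(image, boxes):
    """Mirror the image left-right and remap (x1, y1, x2, y2) boxes to match."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    new_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, new_boxes


def brightness_contrast(image, gain, bias):
    """Scale (contrast) and shift (brightness) pixels, clamped to 0..255."""
    return [[min(255, max(0, round(gain * p + bias))) for p in row] for row in image]


def add_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise as a rough stand-in for sensor noise."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in image]


def cutout(image, x, y, w, h, fill=0):
    """Blank out a rectangle, clipped to the image borders."""
    out = [row[:] for row in image]
    for r in range(max(0, y), min(len(out), y + h)):
        for c in range(max(0, x), min(len(out[0]), x + w)):
            out[r][c] = fill
    return out


def augment(image, boxes, rng):
    """Apply one random combination of the augmentations above."""
    if rng.random() < 0.5:
        image, boxes = hflip(image, boxes)
    # Photometric changes and cutout leave the boxes untouched.
    image = brightness_contrast(image, rng.uniform(0.8, 1.2), rng.uniform(-20, 20))
    image = add_noise(image, sigma=5.0, rng=rng)
    h, w = len(image), len(image[0])
    image = cutout(image, rng.randrange(w), rng.randrange(h), w // 4, h // 4)
    return image, boxes
```

Passing a seeded `random.Random` instance as `rng` makes an augmentation run reproducible, which helps when comparing techniques on the same held-out set.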

Design of Model architecture

There are multiple state-of-the-art detection models available, and new ones are released frequently. Instead of sticking to a fixed model architecture, one could try new models or adapt their ideas into an existing model. In general, the more learnable parameters a model design has, the finer the features the model is able to learn.
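
To make "learnable parameters" concrete, the sketch below counts the weights and biases of plain 2D convolution layers. The two layer stacks are hypothetical and only compare a narrow design with a wider one; they are not taken from any published model. A larger count means more capacity, but whether it translates into better accuracy still has to be checked on the held-out dataset.

```python
def conv_params(c_in, c_out, kernel, bias=True):
    """Learnable parameters of one 2D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def count_params(layers):
    """Total parameters of a stack given as (c_in, c_out, kernel) tuples."""
    return sum(conv_params(c_in, c_out, k) for c_in, c_out, k in layers)


# Hypothetical stacks for illustration: same depth, doubled channel widths.
narrow = [(3, 16, 3), (16, 32, 3), (32, 64, 3)]
wide = [(3, 32, 3), (32, 64, 3), (64, 128, 3)]

print(count_params(narrow))  # 23584
print(count_params(wide))    # 93248
```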
Figure: YOLO paper

Input Resolution

The input resolution has a direct impact on accuracy. Because of pooling and similar downsampling operations, the spatial resolution of the input image is usually reduced to a much smaller representation at the output layer. If we use a lower resolution, for example 384×384, the features of smaller objects in the image might get lost, which results in poor detection accuracy. If we increase the resolution to, say, 608×608, the smaller features are retained, giving proper detections at the output.
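
The effect can be estimated with simple arithmetic. The sketch below assumes a total downsampling factor (stride) of 32 and a plain resize that scales the frame height to the network input height; both are our assumptions, and the function names are ours as well. It reports how tall an object is at the network input and how many output-grid cells it spans.

```python
def grid_size(input_px, stride=32):
    """Side length of the final feature map for a square input."""
    return input_px // stride


def object_footprint(obj_px, frame_px, input_px, stride=32):
    """Size of an object after resizing and downsampling.

    obj_px:   object height in the original frame, in pixels
    frame_px: original frame height, in pixels
    input_px: network input height, in pixels
    Returns (height at the network input, height in output-grid cells).
    """
    at_input = obj_px * input_px / frame_px
    return at_input, at_input / stride


print(grid_size(384), grid_size(608))        # 12 19
print(object_footprint(90, 1080, 384))       # (32.0, 1.0)
```

Under these assumptions, a person who is 90 pixels tall in a 1080-pixel-high frame covers a single output cell at 384×384, while at 608×608 the same person covers roughly one and a half cells, leaving more signal for the detector to work with.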
From the above figure, you can infer that going beyond a specific input resolution does not yield higher accuracy. There could be several reasons; one is a limitation of the current model architecture. Overall, you need to pick a suitable input resolution for your target model architecture.
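
One simple way to act on this is to evaluate the model at a few input resolutions on the held-out dataset and keep the smallest resolution whose mAP is within a small tolerance of the best. The helper below is a sketch of that rule; `pick_resolution` and the default `tolerance` are hypothetical choices, and the mAP values must come from your own evaluation.

```python
def pick_resolution(map_by_resolution, tolerance=0.005):
    """Smallest input resolution whose mAP is within `tolerance` of the best.

    map_by_resolution: dict mapping input side length (e.g. 608) to the mAP
    measured at that resolution on the held-out dataset.
    """
    best = max(map_by_resolution.values())
    good_enough = [res for res, score in map_by_resolution.items()
                   if score >= best - tolerance]
    return min(good_enough)
```

Preferring the smallest resolution that is good enough also keeps the cost per frame down, since larger inputs mean more computation.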


Human/people detection is a critical aspect of video technology because of its importance in identifying persons, recognizing human activity, and analyzing scenes. Many of the issues of early human detection approaches are significantly reduced by the newer techniques for improving people/human detection accuracy.

VisAI Platform – Human/People Detection Solution

VisAI Labs understands the essential need for edge-optimized human/people detection algorithms that satisfy a varied range of use cases and are pre-tested through diverse test scenarios to optimize the product/solution.
VisAI Labs human/people detection algorithms are built specifically for people counting, congestion detection, intrusion detection, social distancing checks, dwell time detection, and building monitoring.

To know more about people detection and tracking solutions, feel free to reach us at
