Object detection is a computer vision task in which an algorithm predicts a bounding box and a class label for each Region of Interest (ROI) in an image. Anchor-based detectors have dominated this space for a long time. Recently, anchor-free detectors have started to overtake them because of their lower computational complexity and efficient detection. NanoDet, a fast and lightweight anchor-free object detection model, has generated interest in the AI community.
This blog introduces NanoDet, a new model that offers a smaller memory footprint along with better detection performance than anchor-based models.
2.0 Anchor-based models vs anchor-free models
2.1 Anchor-based models
In an anchor-based model, predefined anchor boxes of various sizes and aspect ratios are used to locate objects in an image. The anchor boxes can be thought of as predefined sliding windows or proposals: each one is classified as positive or negative, and the final bounding box location is predicted by regressing additional offsets relative to the anchor.
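As a rough illustration of the idea, the sketch below generates anchors at one feature-map location and applies regression offsets to one of them. The function names, the scales and ratios, and the R-CNN-style offset parameterisation are assumptions made for this example; they are not taken from any specific detector's code.

```python
import math

def make_anchors(center, scales, ratios):
    """Return (cx, cy, w, h) anchors centred on one feature-map location."""
    cx, cy = center
    anchors = []
    for s in scales:
        for r in ratios:
            # r is the width/height ratio; the anchor area stays close to s * s
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            anchors.append((cx, cy, w, h))
    return anchors

def decode(anchor, offsets):
    """Apply (dx, dy, dw, dh) offsets to an anchor (assumed R-CNN-style encoding)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    # centre shifts are scaled by the anchor size; width and height are scaled exponentially
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

# Hypothetical values: 2 scales x 3 aspect ratios = 6 anchors per location
anchors = make_anchors((50, 50), scales=[32, 64], ratios=[0.5, 1.0, 2.0])
refined = decode(anchors[1], (0.1, 0.0, 0.2, -0.1))
```

Every location on the feature map gets its own set of anchors, which is why the total number of anchors, and the number of hyperparameters that define them, grows quickly.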
2.2 Anchor-free models
To identify and locate the objects in an image, anchor-free models use per-pixel classification, which is comparable to segmentation. As a result, computations are simplified and the need to tune anchor boxes as a hyperparameter is eliminated.
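One common way to realise per-pixel prediction is the FCOS-style formulation, in which each location inside an object predicts its distances to the four sides of the box. The sketch below assumes that formulation; the function names are hypothetical, and NanoDet's own head may encode the regression differently.

```python
def encode_location(x, y, box):
    """Regression target (left, top, right, bottom) for a location, or None if it lies outside the box."""
    x1, y1, x2, y2 = box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    if min(l, t, r, b) <= 0:
        return None  # location is not inside the box, so it is a negative sample
    return (l, t, r, b)

def decode_location(x, y, distances):
    """Turn predicted distances back into a box (x1, y1, x2, y2)."""
    l, t, r, b = distances
    return (x - l, y - t, x + r, y + b)

# Hypothetical ground-truth box and a location that falls inside it
gt_box = (10, 20, 70, 60)
target = encode_location(30, 40, gt_box)
restored = decode_location(30, 40, target)
```

No anchor sizes or aspect ratios appear anywhere in this scheme: the only inputs are the location and the predicted distances.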
Conventionally, anchor-based models such as the Single Shot Detector (SSD), You Only Look Once (YOLO) and the region-based convolutional neural network (R-CNN) were popular for object detection. Although anchor-based models have performed adequately in real-world applications such as person detection, crowd counting, and licence plate detection and recognition, they have several drawbacks.
Some of the drawbacks of anchor-based detectors are:
- The prediction depends on the size, number and aspect ratios of the anchor boxes. With predefined anchor boxes, detectors have difficulty handling objects with large shape variations, particularly small objects
- A large number of anchor boxes is required to predict the objects. Most of these anchor boxes are labeled as negative samples, resulting in an imbalance between positive and negative samples
- Complex computation is needed to calculate the Intersection over Union (IoU) with the ground truth in order to classify each anchor box as positive or negative
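The IoU-based labelling mentioned in the last point can be sketched as follows. The thresholds (0.5 for positive, 0.4 for negative) and the function names are assumptions chosen for illustration; actual detectors use their own assignment rules.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Label each anchor 1 (positive), 0 (negative) or -1 (ignored) by its best IoU."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)  # between the two thresholds: not used for training
    return labels

# Hypothetical anchors compared against a single ground-truth box
labels = label_anchors([(1, 0, 11, 10), (4, 0, 14, 10), (20, 20, 30, 30)],
                       [(0, 0, 10, 10)])
```

Because this comparison runs for every anchor against every ground-truth box, its cost grows with the number of anchors, and most anchors end up labeled negative.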
Anchor-free object detectors address the above drawbacks. In an anchor-free model, each location in the feature map predicts whether it lies within an object box, similar to segmentation.
A few prominent advantages of anchor-free object detection are:
- The detection framework works like classical fully convolutional segmentation networks
- Detection does not depend on the size of anchor boxes
- Detection becomes anchor-free, which lowers the computational complexity and the number of design hyperparameters
Since anchor-free models tend to be faster with better results and can be deployed on low-cost edge devices with a smaller memory footprint, we chose NanoDet, an anchor-free model, for detecting numbers on a number plate. Before looking into an example, let's briefly review NanoDet.
NanoDet is a lightweight, superfast, real-time object detection model that comes with the following features:
- Model file size: 980 KB (INT8) or 1.8 MB (FP16).
- Speed: 97 fps (10.23 ms) on a mobile ARM CPU.
- High accuracy: up to 34.3 mAP@0.5:0.95 on the COCO dataset while still running in real time on a CPU.
- Easier to train, with much lower GPU memory cost than other models. The model can be trained with a batch size of 80 on a GTX 1060 6GB.
3.1 Architecture of NanoDet
- Backbone: NanoDet uses ShuffleNetV2, a robust and cost-effective structure for mobile devices, as its backbone.
- Neck: NanoDet uses a Path Aggregation Network (PAN) to aggregate the feature maps from the backbone, but all convolutions except the 1x1 convolutions were removed to make the model lighter.
- Head: NanoDet uses depthwise convolutions in the head to keep it lightweight. Bounding box regression and classification are computed by the same convolutions, and the output is then split into two parts to reduce computation.
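To give a feel for why depthwise convolutions make the head lighter, the sketch below compares weight counts for a standard convolution and a depthwise convolution followed by a 1x1 pointwise convolution. The pairing with a pointwise layer and the channel count of 96 are assumptions made for this illustration, not figures stated above.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution (bias ignored)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 96 input channels, 96 output channels, 3 x 3 kernel
standard = conv_params(96, 96, 3)
separable = depthwise_separable_params(96, 96, 3)
reduction = standard / separable
```

For this hypothetical layer the separable version needs roughly an eighth of the weights, and sharing the same convolutions between classification and regression avoids duplicating even that smaller cost.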
4.0 Automatic Number Plate Recognition (ANPR)
The objective of this experiment is to detect only the large-font numbers on a number plate in real time. The same detections can also be used for digit recognition, provided the training data contains labeled examples of each digit.
A dataset of more than 2000 original number plate images was collected from various sources, covering license plates with different background colours from various vehicles. To augment the original data, synthetic data was added to increase the number of training samples and reach the expected result.
The dataset was labeled using LabelImg, and a corresponding .XML annotation file was created for each image. The class labels are the digits 0 to 9.
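Reading such annotation files might look like the sketch below. It assumes a Pascal VOC-style layout (`object`, `name` and `bndbox` elements with `xmin`, `ymin`, `xmax`, `ymax`); the exact schema of the files used here is not stated, and the sample XML and function name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in an assumed Pascal VOC-style layout
SAMPLE_XML = """
<annotation>
  <filename>plate_0001.jpg</filename>
  <object>
    <name>7</name>
    <bndbox><xmin>12</xmin><ymin>8</ymin><xmax>40</xmax><ymax>60</ymax></bndbox>
  </object>
  <object>
    <name>3</name>
    <bndbox><xmin>44</xmin><ymin>8</ymin><xmax>72</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>
"""

def parse_annotation(xml_text):
    """Return a list of (class_label, (xmin, ymin, xmax, ymax)) for one image."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")  # digit class, "0" to "9"
        bndbox = obj.find("bndbox")
        coords = tuple(int(float(bndbox.findtext(tag)))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, coords))
    return boxes

digits = parse_annotation(SAMPLE_XML)
```

Each parsed entry pairs a digit class with its box, which is the form most detection training pipelines expect after conversion to their own annotation format.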
The dataset was divided into training and validation sets in an 80%-20% ratio. The model was trained for 1000 epochs using PyTorch and later converted to TFLite. The model size was 1.9 MB in float16, which was reduced to 970 KB after int8 conversion. This small memory footprint makes it possible to deploy the model on a low-cost edge device such as a Raspberry Pi, where memory is a concern.
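A split like this can be done with a seeded shuffle, as in the minimal sketch below. The file names, the seed and the function name are hypothetical; the original split procedure is not described.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and split them into (train, validation) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]

# Hypothetical file names standing in for the number plate images
images = [f"plate_{i:04d}.jpg" for i in range(2000)]
train_set, val_set = split_dataset(images)
```

Fixing the seed keeps the validation set identical across training runs, so results from different models remain comparable.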
Several models were trained and tested on the same dataset before NanoDet was used. On the Ignitarium custom dataset, YOLOv4-Tiny had a mAP of 20.8 with a size of 3.2 MB, and YOLOX-Nano, an anchor-free model, had a mAP of 33.6. The preliminary NanoDet results were satisfactory, with a mAP of 42, so layer customization was done on the model to reduce its size further. The customized NanoDet model was retrained for 2000 epochs with the same amount of data. Its results (38.9 mAP) were comparable to those of the original model, and after conversion to int8 it had a size of ~753 KB without much drop in performance.
The results obtained from NanoDet were comparable with those of other detectors such as YOLOX-Nano and YOLOv4-Tiny. The custom NanoDet model had a mAP of 38.9 with a size of 753 KB. All models were tested on the same benchmarking data, which was created in-house for this purpose. NanoDet outperformed the other models in terms of detection accuracy and memory footprint on the Ignitarium custom test data.
NanoDet, an anchor-free and proposal-free single-stage detector, was used to detect numbers on a number plate. The model is lightweight, performs well, and is comparable with popular anchor-based models like YOLO and SSD. It avoids all computation and hyperparameters related to anchor boxes and solves object detection in a per-pixel prediction fashion. NanoDet is a strong alternative to anchor-based models and can be used for various other detection applications in the future.
For an alternative AI-based approach towards ANPR, please refer to Ignitarium's blog.
- "SSD: Single Shot MultiBox Detector", Wei Liu, Dragomir Anguelov, et al., 2016
- "You Only Look Once: Unified, Real-Time Object Detection", Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
- "Rich feature hierarchies for accurate object detection and semantic segmentation", Tech report (v5), Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
- "Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection", Shifeng Zhang, et al., 2022
- "FCOS: Fully Convolutional One-Stage Object Detection", Zhi Tian, et al., 2019
- "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design", Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun, 2018