Pooja Korah and Christy Varghese
July 30, 2024

IR Face Detection and Recognition using YOLOv8 and FaceNet

Introduction

Face recognition technology uses machine learning to identify or authenticate people from their distinct facial characteristics. It captures images or video frames of a face, extracts characteristics such as the shape of the nose, the distance between the eyes, and the jawline, and converts them into a digital representation called a face print. The face print is then compared against a database of known faces to establish identity.

Changes in illumination, pose, and expression, as well as facial disguises, can significantly decrease recognition accuracy. Infrared (IR) imaging is a good way to address the illumination problem, because changes in visible light have a much smaller impact when the scene illumination is controlled in another part of the spectrum. Face recognition has emerged as a leading physiological biometric for identity authentication in numerous applications, including the military, banking, marketing, healthcare, public security, video surveillance, access control for laptops and mobile phones, and law enforcement. One of the major challenges of IR face recognition is the distinct thermal patterns that faces emit in infrared images. In this context, an efficient infrared face recognition method suited to real-time use on edge devices becomes paramount.

IR Face Recognition

Face recognition mainly involves two tasks: face detection and face verification. Face detection is a core computer vision task that locates human faces in images or video streams. It is the first stage of a face recognition system and a precursor to applications such as facial recognition, emotion analysis, and identity verification. Various object detection techniques are available for this purpose; here, the You Only Look Once (YOLO) approach, renowned for its object detection capabilities, is used for face detection.

Face recognition involves extracting discriminative facial features from images or videos and comparing them with a database of known faces to establish identity. Deep learning, particularly Convolutional Neural Networks (CNNs), has emerged as a powerful tool for this task. These networks learn to extract intricate patterns and representations from facial images, capturing both the spatial and the semantic information needed to distinguish between individuals. Through extensive training on large-scale datasets, deep learning models can generalize across diverse facial appearances, expressions, poses, and lighting conditions, which makes them suitable for real-world deployment. Pre-trained models such as ArcFace or FaceNet, trained on large image datasets, can be used for face verification.
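The comparison step described above can be sketched in a few lines. This is a minimal illustration, assuming each face has already been converted to a fixed-length embedding vector by some pre-trained model. The function names, the toy 3-dimensional vectors, and the distance threshold of 0.8 are hypothetical and would need to be tuned on real data.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so distances are comparable."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(query, gallery, threshold=0.8):
    """Return (name, distance) of the closest known identity.

    gallery maps a person's name to a reference embedding.
    name is None when even the closest entry is farther than threshold.
    threshold is an assumed value, not one taken from the article.
    """
    query = l2_normalize(query)
    best_name, best_dist = None, float("inf")
    for name, ref in gallery.items():
        dist = euclidean(query, l2_normalize(ref))
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist

# Toy 3-dimensional embeddings; real models output far longer vectors.
gallery = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 1.0, 0.2]}
match, distance = identify([0.8, 0.2, 0.05], gallery)
stranger, _ = identify([-1.0, -1.0, 0.0], gallery)
print(match, round(distance, 3), stranger)
```

The threshold is what turns nearest-neighbour search into verification: without it, every query would be assigned to some known person, including faces that are not in the database at all.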

YOLOv8

YOLOv8 builds on earlier YOLO models. Its convolutional neural network has two primary components, the backbone and the head. The backbone extracts feature maps from the input image at several resolutions. The head combines these feature maps in a feature-pyramid-style structure and applies convolutional layers that predict bounding boxes and class probabilities at multiple scales. This allows the network to identify objects of varying sizes.
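To make the multi-scale idea concrete, the sketch below computes the size of the prediction grid at each detection scale. It assumes the three output strides (8, 16, and 32) of the standard YOLOv8 detection configuration and a 640x640 input. Both are assumptions brought in for illustration and are not stated in this article.

```python
def grid_sizes(image_size=640, strides=(8, 16, 32)):
    """Return {stride: (cells_per_side, total_cells)} for a square input."""
    grids = {}
    for stride in strides:
        cells = image_size // stride
        grids[stride] = (cells, cells * cells)
    return grids

grids = grid_sizes()
for stride, (cells, total) in grids.items():
    print(f"stride {stride:>2}: {cells}x{cells} grid, {total} candidate locations")

# Every grid cell at every scale is one candidate detection location.
total_predictions = sum(total for _, total in grids.values())
print("total candidate locations:", total_predictions)
```

The fine grid (small stride) helps localize small faces, while the coarse grid covers faces that fill most of the frame.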

FaceNet

Face recognition systems often rely on architectures such as FaceNet to identify individuals from their facial features. FaceNet is a deep learning model that maps each face image to a fixed-length vector (128-dimensional in the original paper) in a continuous embedding space, so that the distance between two vectors reflects how similar the faces are. Its architecture is designed to extract discriminative features from facial images and generate compact embeddings suitable for comparison.

One distinguishing feature of FaceNet is its use of triplet loss during training, a specialized loss function that encourages the model to map similar faces closer together in the embedding space while pushing dissimilar faces apart. By optimizing the embeddings to minimize the distance between embeddings of the same individual and maximize the distance between embeddings of different individuals, FaceNet learns to generate embeddings that are highly discriminative and invariant to variations in pose, expression, and lighting.
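The triplet loss described above can be written down compactly. The sketch below follows the general form given in the FaceNet paper (squared Euclidean distances plus a margin). The margin of 0.2 and the toy 2-dimensional embeddings are illustrative values only, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0) for one triplet."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

anchor = [1.0, 0.0]
positive = [0.9, 0.1]   # same person: close to the anchor
negative = [0.0, 1.0]   # different person: far from the anchor

# The negative is already much farther than the positive, so no penalty.
easy = triplet_loss(anchor, positive, negative)
# Swapping the roles violates the constraint, so the loss is positive.
hard = triplet_loss(anchor, negative, positive)
print(easy, hard)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training effort concentrates on triplets that still violate that constraint.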

Challenges

Building a face recognition system requires loading separate models for face detection and face recognition.
Running both models on every frame adds latency and lowers the overall FPS (frames per second) of the system.
To address this challenge and obtain a more compact system, we reduce the number of parameters in the YOLOv8n detection model.
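A rough way to see why two models in sequence hurt throughput: if each frame must pass through the detector and then the recognizer, the per-frame latencies add up. The sketch below illustrates that arithmetic. The latency figures are made-up placeholders, not measurements from this work, and the function name is hypothetical.

```python
def pipeline_fps(stage_latencies_ms):
    """FPS of a pipeline whose stages run one after another on each frame."""
    total_ms = sum(stage_latencies_ms)
    return 1000.0 / total_ms

# Hypothetical per-frame latencies in milliseconds.
detector_ms = 40.0
recognizer_ms = 10.0

baseline = pipeline_fps([detector_ms, recognizer_ms])
lighter = pipeline_fps([detector_ms / 4, recognizer_ms])
print(baseline, lighter)
```

Whichever stage dominates the total latency is the most rewarding one to shrink.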

Modified YOLOv8

Deeper neural networks encounter challenges during training due to the vanishing gradient problem. This phenomenon occurs as gradients become increasingly smaller with each layer, hindering the learning process. While techniques like skip connections and batch normalization mitigate this issue to some extent, deeper networks don’t always lead to better accuracy. On the other hand, wider neural networks can capture more detailed information, but extremely wide and shallow networks struggle to understand complex patterns.

YOLOv8 therefore exposes the depth (number of layers) and width (number of channels) of the network as scaling options, depth_multiple and width_multiple, in its YAML configuration files. Adjusting them changes the model's size to suit different hardware limitations or performance demands. A value less than 1 reduces the depth or width. The max_channels value acts as an upper limit on the number of channels, which prevents the network from becoming too wide.

Because the multipliers change only the number of layers and channels, the model's size and computational cost can be tailored to a target without altering the underlying architecture. This makes the same design adaptable to different deployment scenarios and performance goals.
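As an illustration of how such multipliers act on a layer definition, the sketch below applies a depth and a width multiplier to a few base layer settings. The rounding rules (repeat counts rounded with a floor of 1, channel counts capped by max_channels and rounded up to a multiple of 8) mirror what Ultralytics-style model parsers commonly do, but they are stated here as an assumption. The helper names, the base layer values, and the multiplier values are hypothetical examples, not the settings used for the modified model.

```python
import math

def scale_repeats(repeats, depth_multiple):
    """Scale how many times a block is repeated; never drop below one."""
    if repeats <= 1:
        return repeats
    return max(round(repeats * depth_multiple), 1)

def scale_channels(channels, width_multiple, max_channels, divisor=8):
    """Cap the channel count, scale it, and round up to a multiple of divisor."""
    capped = min(channels, max_channels)
    return math.ceil(capped * width_multiple / divisor) * divisor

# Hypothetical base layers: (repeats, output channels).
base_layers = [(1, 64), (3, 128), (6, 256), (3, 1024)]
depth_multiple, width_multiple, max_channels = 0.33, 0.25, 1024

scaled = [
    (scale_repeats(r, depth_multiple),
     scale_channels(c, width_multiple, max_channels))
    for r, c in base_layers
]
print(scaled)
```

Because the parameter count of a convolution grows with the product of its input and output channels, lowering the width multiplier shrinks the model much faster than lowering the depth multiplier by the same factor.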

Performance Evaluation

We trained both YOLOv8n and the modified YOLOv8n on a face detection dataset available on Kaggle. Table 1 compares the sizes of the two models:

Model	Model Size
YOLOv8n	6.2 MB
Modified YOLOv8n	1.2 MB

Table 1: Comparison of model size

Reducing the number of parameters in YOLOv8n shrinks the model from 6.2 MB to 1.2 MB, roughly a fivefold reduction, and the smaller model runs faster. Compared with a face recognition system built on the standard YOLOv8n and FaceNet, the system using the modified YOLOv8n with reduced parameters achieved a 7x increase in FPS (frames per second).
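For readers who want to run a similar comparison, FPS can be estimated by timing a fixed number of frames through the full pipeline. The following is a minimal sketch, not the measurement procedure used for the figures above. The process_frame argument is a hypothetical stand-in for the detection plus recognition step, and the dummy workload exists only so that the example runs on its own.

```python
import time

def measure_fps(process_frame, frames, warmup=5):
    """Average frames per second of process_frame over the given frames."""
    for frame in frames[:warmup]:      # warm-up runs are not timed
        process_frame(frame)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

def dummy_pipeline(frame):
    """Placeholder workload standing in for detection + recognition."""
    return sum(frame)

frames = [list(range(1000)) for _ in range(200)]
fps = measure_fps(dummy_pipeline, frames)
print(f"{fps:.1f} FPS")
```

Timing both systems on the same frames and the same device keeps the comparison fair, since FPS depends heavily on the hardware.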

Conclusion

IR face recognition is crucial for security applications, video surveillance, and nighttime facial recognition tasks. However, loading separate models for face detection (such as YOLOv8) and verification (such as FaceNet) significantly reduces FPS, which limits its use on edge devices. To address this challenge, we reduced the size of the YOLOv8 model by scaling down its depth and width. The modified, compact YOLOv8 model improves the overall FPS of the face recognition system, making it better suited to real-time applications on resource-constrained devices.

References

Ultralytics YOLOv8 repository: https://github.com/ultralytics/ultralytics
facenet-pytorch repository: https://github.com/timesler/facenet-pytorch
Face Detection Dataset (Kaggle): https://www.kaggle.com/datasets/fareselmenshawii/face-detection-dataset
Schroff, Florian, Dmitry Kalenichenko, and James Philbin. “FaceNet: A unified embedding for face recognition and clustering.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

©2025 Ignitarium Technology Solutions, All Rights Reserved
