Video-Based Facial Emotion Recognition using YOLO and Vision Transformer

Vidhi Sareen; K.R. Seeja

doi:10.1051/epjconf/202532801040

Issue: EPJ Web Conf., Volume 328, 2025: First International Conference on Engineering and Technology for a Sustainable Future (ICETSF-2025)
Article Number: 01040
Number of page(s): 10
DOI: https://doi.org/10.1051/epjconf/202532801040
Published online: 18 June 2025

EPJ Web of Conferences 328, 01040 (2025)

Vidhi Sareen and K.R. Seeja^*

Indira Gandhi Delhi Technical University for Women, India

^* Corresponding author: vidhi022mtcse23@igdtuw.ac.in

Abstract

This paper presents a video-based facial emotion recognition (FER) approach that combines the YOLOv8 model for face detection with a pre-trained Vision Transformer (ViT) for emotion classification. Our methodology extracts the middle frame of each video in the RAVDESS dataset and applies YOLOv8 to it for face detection. The detected facial region is then passed through the ViT model, which classifies the emotion into one of seven categories: Neutral, Happy, Sad, Angry, Fearful, Disgust, and Surprised. To improve model robustness and generalization, data augmentation techniques such as horizontal flipping, brightness adjustment, and Gaussian noise injection were applied during preprocessing. The precise face localization of YOLOv8 and the powerful feature extraction of ViT both contribute to the system’s performance. The proposed FER framework achieved an accuracy of 90.81%, surpassing several existing state-of-the-art FER systems. This research shows the strength of combining advanced face detection with a transformer-based architecture for accurate emotion recognition from facial expressions in videos.

Key words: Facial Emotion Recognition (FER) / YOLO / Vision Transformer (ViT)
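
The preprocessing steps named in the abstract (middle-frame selection, cropping the detected face, and the three augmentations) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the brightness factor, the noise standard deviation, and the bounding box are all hypothetical, and the face box is assumed to come from a detector such as YOLOv8, which is not called here. The ViT classification step is likewise omitted.

```python
import numpy as np


def middle_frame_index(num_frames: int) -> int:
    """Return the index of the middle frame of a clip."""
    if num_frames <= 0:
        raise ValueError("clip has no frames")
    return num_frames // 2


def crop_face(frame: np.ndarray, box: tuple[int, int, int, int]) -> np.ndarray:
    """Crop an (x1, y1, x2, y2) box, clamped to the frame bounds.

    The box is assumed to be produced by a face detector.
    """
    height, width = frame.shape[:2]
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return frame[y1:y2, x1:x2]


def horizontal_flip(image: np.ndarray) -> np.ndarray:
    """Mirror an H x W x C image left to right."""
    return image[:, ::-1]


def adjust_brightness(image: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities by `factor`, clipping to the uint8 range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def add_gaussian_noise(
    image: np.ndarray, sigma: float, rng: np.random.Generator
) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    noise = rng.normal(0.0, sigma, size=image.shape)
    noisy = image.astype(np.float32) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a decoded video: 9 frames of 64 x 64 RGB.
    clip = rng.integers(0, 256, size=(9, 64, 64, 3), dtype=np.uint8)
    frame = clip[middle_frame_index(len(clip))]
    face = crop_face(frame, (8, 8, 56, 56))  # hypothetical detector box
    augmented = [
        horizontal_flip(face),
        adjust_brightness(face, 1.2),       # illustrative factor
        add_gaussian_noise(face, 10.0, rng),  # illustrative sigma
    ]
    print(face.shape, [a.shape for a in augmented])
```

In a full pipeline, each augmented crop would be resized to the classifier's input size before being passed to the ViT model; that step depends on the chosen checkpoint and is left out of the sketch.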

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
