
A multiple feature fusion framework for video emotion recognition in the wild

journal contribution
posted on 2024-06-05, 00:48 authored by N Samadiani, Guangyan Huang, Wei Luo, CH Chi, Y Shu, R Wang, Tuba Kocaturk
Human emotions can be recognized from facial expressions captured in videos. This is a growing research area in which many methods have attempted to improve video emotion detection in both lab-controlled and unconstrained environments. While existing methods achieve decent recognition accuracy on lab-controlled datasets, they deliver much lower accuracy in real-world uncontrolled environments, where a variety of challenges must be addressed, such as variations in illumination, head pose, and individual appearance. Moreover, automatically identifying the key frames that contain the expression in real-world videos is a further challenge. In this article, to overcome these challenges, we propose a video emotion recognition method based on multiple feature fusion. First, uniform local binary pattern (LBP) and scale-invariant feature transform (SIFT) features are extracted from each frame in the video sequences. By applying a random forest classifier, every static frame is then labelled with its emotion class, so that the key frames, whether neutral or expressive, can be identified automatically. From the key frames, a new geometric feature vector and LBP features from three orthogonal planes (LBP-TOP) are then extracted. To further improve robustness, audio features are extracted from the video sequences as an additional modality that augments the visual facial expression analysis. The audio and visual features are fused through a kernel multimodal sparse representation. Finally, emotion labels are assigned to the video sequences, with a multimodal quality measure determining the quality of each modality and its role in the decision. Results on both the Acted Facial Expressions in the Wild (AFEW) and MMI datasets demonstrate that the proposed method outperforms several counterpart video emotion recognition methods.
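
As a rough illustration of the first stage described in the abstract (per-frame uniform-LBP and SIFT features fed to a random forest that labels frames so non-neutral key frames can be selected), here is a minimal Python sketch. The LBP neighbourhood parameters, the mean-pooling of SIFT descriptors, and the neutral_label convention are illustrative assumptions, not the authors' exact configuration.

import cv2
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.ensemble import RandomForestClassifier

P, R = 8, 1  # assumed LBP neighbourhood; the paper may use other values

def frame_features(gray):
    # Uniform LBP histogram: with P=8 there are P+2 = 10 distinct codes.
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)
    # Mean-pool the 128-D SIFT descriptors into one vector (an assumed
    # pooling choice; the abstract does not say how SIFT is aggregated).
    _, desc = cv2.SIFT_create().detectAndCompute(gray, None)
    sift_vec = desc.mean(axis=0) if desc is not None else np.zeros(128)
    return np.concatenate([hist, sift_vec])

def key_frame_indices(frames, clf, neutral_label=0):
    # Label every static frame, then keep those not classified as neutral.
    X = np.stack([frame_features(f) for f in frames])
    return [i for i, y in enumerate(clf.predict(X)) if y != neutral_label]

# Usage sketch, assuming labelled training frames are available:
#   clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
#   keys = key_frame_indices(grayscale_video_frames, clf)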

History

Journal

Concurrency and Computation: Practice and Experience

Volume

34

Article number

e5764

Location

London, Eng.

ISSN

1532-0626

eISSN

1532-0634

Language

English

Notes

In Press

Publication classification

C1 Refereed article in a scholarly journal

Issue

8

Publisher

Wiley