Context-Aware Emotion Recognition Networks (ICCV 2019)

(Work done while all authors were at Yonsei University)

The CAER benchmark contains more than 13,000 annotated videos.
(CAER-S contains 70k frame images sampled from CAER.)
You can use the CAER benchmark for emotion recognition.
The videos are annotated with an extended list of 7 emotion categories.


Traditional techniques for emotion recognition have focused on the facial expression analysis only, thus providing limited ability to encode context that comprehensively represents the emotional responses. We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner. The key idea is to hide human faces in a visual scene and seek other contexts based on an attention mechanism. Our networks consist of two sub-networks, including two-stream encoding networks to separately extract the features of face and context regions, and adaptive fusion networks to fuse such features in an adaptive fashion. We also introduce a novel benchmark for context-aware emotion recognition, called CAER, that is more appropriate than existing benchmarks both qualitatively and quantitatively. On several benchmarks, CAER-Net proves the effect of context for emotion recognition.
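
The face-hiding idea in the abstract can be illustrated with a minimal sketch. It assumes a face bounding box is already available (for example from a face detector) and simply fills that region so that only the surrounding context remains. The function name `hide_face`, the `(top, left, bottom, right)` box format and the fill value are illustrative assumptions, not taken from the CAER-Net implementation.

```python
def hide_face(image, box, fill=0):
    """Return a copy of `image` with the face region filled with `fill`.

    image: list of rows, each a list of pixel values (H x W).
    box:   (top, left, bottom, right); bottom and right are exclusive.
           The box format is an assumption made for this sketch.
    """
    top, left, bottom, right = box
    # Copy each row so the original image is left untouched.
    masked = [row[:] for row in image]
    # Clamp the box to the image bounds before filling.
    for y in range(max(top, 0), min(bottom, len(masked))):
        row = masked[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = fill
    return masked
```

In a two-stream setup of this kind, the cropped face would go to one stream and the masked image to the other.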


The CAER dataset is available to download for research purposes.
The copyright remains with the original owners of the videos.

Special thanks to Zhicheng Zhang (Nankai University) and Iris Dominguez (UPNA) for helping us share CAER and CAER-S again.


The size of the CAER-S dataset is approximately 13.5GB.
The benchmark consists of train and test folders.
You can freely configure the training and validation sets from the train folder.
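
One possible way to carve a validation set out of the train folder is sketched below. It assumes the train folder holds one sub-folder per emotion label with the image files inside, which is an assumption about the layout rather than something stated here. The function name `split_train_val`, the 10% default and the seed are likewise illustrative choices.

```python
import random
from pathlib import Path


def split_train_val(train_dir, val_fraction=0.1, seed=0):
    """Return (train_files, val_files) as lists of (path, label) pairs.

    Assumes one sub-folder per emotion label under `train_dir`
    (an assumed layout). Files are only listed, not moved or copied.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_files, val_files = [], []
    class_dirs = sorted(p for p in Path(train_dir).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        # Hold out the same fraction of every class (stratified split).
        n_val = int(len(files) * val_fraction)
        val_files += [(f, class_dir.name) for f in files[:n_val]]
        train_files += [(f, class_dir.name) for f in files[n_val:]]
    return train_files, val_files
```

Calling `split_train_val("path/to/train", val_fraction=0.1)` (with a placeholder path) returns two disjoint file lists. Splitting per label keeps the class balance of the validation set close to that of the training set.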

Extract the downloaded files with the following commands.
zip -s 0 --out; unzip;


The CAER benchmark contains more than 13K annotated dynamic videos.
It consists of train, validation and test folders.


Please cite the following if you make use of the dataset.

  author    = {Lee, Jiyoung and Kim, Seungryong and Kim, Sunok and Park, Jungin and Sohn, Kwanghoon},
  title     = {Context-aware emotion recognition networks},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},


This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (NRF-2017M3C4A7069370).