MultiMediate: Multi-modal Group Behaviour Analysis for Artificial Mediation

Eye Contact Detection Sub-challenge [data available]

This sub-challenge focuses on eye contact detection in group interactions recorded with ambient RGB cameras. We define eye contact as a discrete indication of whether a participant is looking at another participant's face, and if so, who this other participant is. Video and audio recordings over a 10-second context window are provided as input to give temporal context for the classification decision. Eye contact has to be detected for the last frame of this context window, which makes the task formulation applicable to an online prediction scenario as encountered by artificial mediators.
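To make the task formulation concrete, below is a minimal sketch of the online prediction setting. The function name, array shapes, and the assumed 30 fps / 16 kHz rates are illustrative only and not part of the official challenge interface.

```python
import numpy as np

FPS, SAMPLE_RATE, CONTEXT_S = 30, 16_000, 10  # assumed rates, for illustration only

def predict_eye_contact(video: np.ndarray, audio: np.ndarray) -> int:
    """Classify eye contact for the LAST frame of a 10-second context window.

    video: (CONTEXT_S * FPS, H, W, 3) RGB frames of the context window
    audio: (CONTEXT_S * SAMPLE_RATE,) audio samples of the same window
    Returns 0 for no eye contact, or 1-4 for the position of the
    participant whose face is being looked at.
    """
    # Placeholder baseline: always predict "no eye contact".
    return 0

# One online prediction step on dummy data for a single context window.
video = np.zeros((CONTEXT_S * FPS, 224, 224, 3), dtype=np.uint8)
audio = np.zeros(CONTEXT_S * SAMPLE_RATE, dtype=np.float32)
print(predict_eye_contact(video, audio))  # 0
```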

[Figure: Group discussion]

Next Speaker Prediction Sub-challenge [data available]

In the next speaker prediction sub-challenge, approaches need to predict which members of the group will be speaking at a future point in time. As in the eye contact detection sub-challenge, video and audio recordings over a 10-second context window are provided as input. Based on this information, approaches need to predict the speaking status of each participant one second after the end of the context window.
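A corresponding sketch for this task, under the same assumed input format as the eye contact sketch above; the output is one binary speaking label per participant, and the group size of four is an assumption for illustration.

```python
import numpy as np

NUM_PARTICIPANTS = 4  # assumed group size, for illustration only

def predict_next_speakers(video: np.ndarray, audio: np.ndarray) -> np.ndarray:
    """Predict the speaking status of every participant one second after
    the end of the 10-second context window.

    Returns a binary vector: 1 = speaking, 0 = not speaking.
    """
    # Placeholder baseline: predict that nobody will be speaking.
    return np.zeros(NUM_PARTICIPANTS, dtype=int)
```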

Evaluation of Participants’ Approaches

For the purpose of this challenge, we model the next speaker prediction problem as a multi-label problem. Hence, for a given sample, a model should predict a binary value (speaking = 1, not speaking = 0) for each participant. As the metric to compare submitted models we will use the unweighted average recall, i.e. recall computed per participant label and averaged without weighting (see scikit-learn's recall_score(y_true, y_pred, average='macro') function).
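As a concrete illustration of the metric, the toy values below are made up purely to show how the unweighted average recall is computed; rows are samples and columns are participants.

```python
import numpy as np
from sklearn.metrics import recall_score

# Toy multi-label data: rows = samples, columns = participants,
# 1 = speaking, 0 = not speaking (values are illustrative only).
y_true = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0],
                   [1, 0, 1, 0]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [1, 0, 1, 1]])

# 'macro' averaging computes recall per participant label and averages
# the results without weighting by how often each label occurs.
print(recall_score(y_true, y_pred, average='macro'))  # 0.75
```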

For the eye contact detection task, the problem is modeled as a multi-class problem. Given a specific participant, a submitted model should predict which other participant, if any, he or she is making eye contact with. The task is modeled using five classes: one for each participant's position (classes 1-4) and an additional class for no eye contact (class 0). To evaluate performance on this task we will use accuracy as the metric (see scikit-learn's accuracy_score(y_true, y_pred) function).
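The corresponding illustration for eye contact detection, again with made-up values:

```python
from sklearn.metrics import accuracy_score

# Toy predictions for 5 samples; 0 = no eye contact,
# 1-4 = position of the participant being looked at (values illustrative).
y_true = [0, 2, 1, 4, 3]
y_pred = [0, 2, 3, 4, 3]

print(accuracy_score(y_true, y_pred))  # 0.8 (4 of 5 samples correct)
```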

Participants will receive training and validation data that can be used to build solutions for each sub-challenge (eye contact detection and next speaker prediction). The evaluation of these approaches will then be performed remotely on our side on the unpublished test portion of the dataset. To this end, participants will create and upload Docker images with their solutions, which are then evaluated on our systems (for more information regarding the process, visit this link).

Rules for Participation

* The competition is team-based. A single person can only be part of a single team.
* Each team will have 5 evaluation runs on the test set (per sub-challenge).
* Additional datasets can be used, but they need to be publicly available.
* The organisers will not participate in the challenge.
* For awarding certificates for 1st, 2nd, and 3rd place in each sub-challenge, we will only consider approaches that are described in accepted papers submitted to the ACM MM Grand Challenge track.
* The evaluation servers will be open until the camera-ready deadline (August 10, 2021). If the evaluation results in the camera-ready version of a paper differ from those in the initial paper submission, the organisers need to be notified and the reason for the difference needs to be explained. An improved result can only be considered for the challenge ranking if it is obtained with the method described in the accepted paper.
* Both challenge tasks are formulated as an online prediction scenario at test time, i.e. using only information from a single test sample to perform the prediction for that sample. We are aware that the design of the evaluation server allows for offline prediction (i.e. using information from several test samples jointly). The challenge ranking will only be based on online approaches. However, we also invite submissions using an offline approach; in this case, the fact that an offline approach is presented needs to be clearly communicated in the paper, and the submission will be ranked out of competition with the online approaches.