MultiMediate: Multi-modal Group Behaviour Analysis for Artificial Mediation


Figure 2

Recording Setup

The data recording took place in a quiet office in which a larger area was cleared of existing furniture. The office was not used by anybody else during the recordings. To capture rich visual information and allow for natural bodily expressions, we used a 4DV camera system to record frame-synchronised video from eight ambient cameras. Specifically, two cameras were placed behind each participant, positioned slightly higher than the participant's head (see the green indicators in the figure). With this configuration, a near-frontal view of each participant's face could be captured throughout the experiment, even if participants turned their heads while interacting with each other. In addition, we used four Behringer B5 microphones with omnidirectional capsules to record audio. To record high-quality audio data and avoid occluding the faces, we placed the microphones in front of, but slightly above, the participants (see the blue indicators in the figure above).

Recording Procedure

We recruited 78 German-speaking participants (43 female, aged between 18 and 38 years) from a German university campus, resulting in 12 group interactions with four participants and 10 interactions with three participants. When forming the groups, we ensured that participants in the same group did not know each other prior to the study. To prevent learning effects, every participant took part in only one interaction. Before each group interaction, we told the participants that first personal encounters could result in various artifacts that we were not interested in, and that we would therefore first run a pilot discussion for them to get to know each other, followed by the actual recording. We intentionally misled the participants into believing that the recording system would be turned on only after the pilot discussion, so that they would behave naturally. In fact, the recording system was running from the beginning and there was no follow-up recording. To increase engagement, we prepared a list of potential discussion topics and asked each group to choose the topic that was most controversial among its members. Afterwards, the experimenter left the room and came back about 20 minutes later to end the discussion. Finally, participants were debriefed, in particular about the deceit, and gave free and informed consent to their data being used and published for research purposes.


* Speaking turns were annotated for all recordings. If several people speak at the same time, this is reflected in the annotations. Backchannels do not constitute a speaking turn.

* Eye contact was annotated by observers for all participants every 15 seconds. In detail, annotators indicated whether a participant was looking at another participant's face and, if so, whose face it was.
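
As an illustration of how these two annotation types might be represented once loaded, here is a minimal Python sketch. The class names, field names, and in-memory layout are assumptions made for this example, as is the choice to start eye contact sampling at time 0; the actual file format is described in the readme distributed with the dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SpeakingTurn:
    """One annotated speaking turn (hypothetical layout)."""
    participant: int  # index of the speaker within the group
    start: float      # turn start in seconds
    end: float        # turn end in seconds


@dataclass
class EyeContactSample:
    """One eye contact annotation (hypothetical layout)."""
    time: float            # annotation time in seconds
    participant: int       # the participant being annotated
    target: Optional[int]  # participant whose face is looked at, or None


def speakers_at(turns: List[SpeakingTurn], t: float) -> Set[int]:
    """Return everyone speaking at time t; overlapping turns yield several speakers."""
    return {turn.participant for turn in turns if turn.start <= t < turn.end}


def annotation_times(duration: float, step: float = 15.0) -> List[float]:
    """Eye contact annotation times, assuming one sample every `step` seconds from 0."""
    count = int(duration // step) + 1
    return [i * step for i in range(count)]
```

With two overlapping turns such as `SpeakingTurn(0, 0.0, 10.0)` and `SpeakingTurn(1, 8.0, 12.0)`, `speakers_at(turns, 9.0)` returns both speakers, which mirrors how simultaneous speech is kept in the annotations. Under the same assumptions, a discussion of about 20 minutes yields 81 eye contact samples per participant.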

Challenge Dataset

For the purpose of the MultiMediate challenge, we will provide a version of the MPIIGroupInteraction dataset that contains a single frontal view of each participant as well as audio recorded from a single microphone per interaction.

In the current release of the speaking annotations, there are two quality levels. Some recordings (detailed in the readme) come with the final, precise annotations that will be used for the final evaluation. Other recordings (also detailed in the readme) have less precise annotations, which will be replaced by a second release of speaking annotations in a few weeks. For now, we recommend using only the recordings with precise annotations.
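
Restricting an experiment to the precisely annotated recordings can be done with a simple filter. In the sketch below, the recording identifiers and the `PRECISE_RECORDINGS` set are placeholders invented for illustration; the actual list of precisely annotated recordings has to be taken from the readme.

```python
from typing import Iterable, List, Set

# Placeholder identifiers; replace with the recordings the readme lists as precise.
PRECISE_RECORDINGS: Set[str] = {"rec_a", "rec_b"}


def select_precise(
    recordings: Iterable[str],
    precise: Set[str] = PRECISE_RECORDINGS,
) -> List[str]:
    """Keep only recordings marked as precisely annotated, preserving input order."""
    return [name for name in recordings if name in precise]
```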

Download: Please download the EULA here, sign it, and send it to the address below. We will then give you the link to access the dataset, annotations, and readme.

Contact: Dominike Thomas,

The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following papers:

  1. Detecting Low Rapport During Natural Interactions in Small Groups from Non-Verbal Behavior

    Philipp Müller, Michael Xuelin Huang, Andreas Bulling

    Proc. ACM International Conference on Intelligent User Interfaces (IUI), pp. 153-164, 2018.


  2. Robust Eye Contact Detection in Natural Multi-Person Interactions Using Gaze and Speaking Behaviour

    Philipp Müller, Michael Xuelin Huang, Xucong Zhang, Andreas Bulling

    Proc. ACM International Symposium on Eye Tracking Research and Applications (ETRA), pp. 31:1-31:10, 2018.
