‘Serious games’ are becoming extremely relevant to individuals who have specific needs, such as children with an Autism Spectrum Condition (ASC). Often, individuals with an ASC have difficulties in interpreting verbal and non-verbal communication cues during social interactions. The ASC-Inclusion EU-FP7 funded project aims to provide children who have an ASC with a platform to learn emotion expression and recognition through play in the virtual world. In particular, the ASC-Inclusion platform focuses on the expression of emotion via facial, vocal, and bodily gestures. The platform combines multiple analysis tools, using on-board microphone and web-cam capabilities. The platform utilises these capabilities via training games, text-based communication, animations, video and audio clips. This paper introduces current findings and evaluations of the ASC-Inclusion platform and provides a detailed description of the different modalities.
Index Terms—Autism Spectrum Condition, inclusion, virtual computerised environment, emotion recognition, AI in games.
Authors: Erik Marchi, Bjorn Schuller, Alice Baird, Simon Baron-Cohen, Amandine Lassalle, Helen O’Rielly, Delia Pigat, Peter Robinson, Ian Davies, Tadas Baltrusaitis, Ofer Golan, Shimrit Fridenson-Hayo, Shahar Tal, Shai Newman, Noga Meir-Goren, Antonio Camurri, Stefano Piana, Sven Bolte, Metin Sezgin, Nese Alyuz, Agnieszka Rynkiewicz, Aurelie Baranger
Read the full paper.
We present HapTable, a multi-modal interactive tabletop that allows users to interact with digital images and objects through natural touch gestures and to receive visual and haptic feedback accordingly. In our system, hand pose is registered by an infrared camera and hand gestures are classified using a Support Vector Machine (SVM) classifier. To display a rich set of haptic effects for both static and dynamic gestures, we effectively integrated electromechanical and electrostatic actuation techniques on the tabletop surface of HapTable, which is a surface capacitive touch screen. We attached four piezo patches to the edges of the tabletop to display vibrotactile feedback for static gestures. For this purpose, the vibration response of the tabletop, in the form of frequency response functions (FRFs), was obtained by a laser Doppler vibrometer for 84 grid points on its surface. Using these FRFs, it is possible to display localized vibrotactile feedback on the surface for static gestures. For dynamic gestures, we utilize the electrostatic actuation technique to modulate the frictional forces between finger skin and the tabletop surface by applying voltage to its conductive layer. To our knowledge, this hybrid haptic technology is one of a kind and has not been implemented or tested on a tabletop. It opens up new avenues for gesture-based haptic interaction not only on tabletop surfaces but also on touch surfaces used in mobile devices, with potential applications in data visualization, user interfaces, games, entertainment, and education. Here, we present two examples of such applications, one for static and one for dynamic gestures, along with detailed user studies. In the first one, the user detects the direction of a virtual flow, such as that of wind or water, by putting their hand on the tabletop surface and feeling a vibrotactile stimulus traveling underneath it. In the second example, the user rotates a virtual knob on the tabletop surface to select an item from a menu while feeling the knob’s detents and resistance to rotation in the form of frictional haptic feedback.
Index Terms—Electrostatic actuation, gesture recognition, haptic interfaces, human–computer interaction, multimodal systems, vibrotactile haptic feedback
Authors: Senem Ezgi Emgin, Amirreza Aghakhani, T. Metin Sezgin, Cagatay Basdogan
Read the full paper.
Head-nods and turn-taking both contribute significantly to conversational dynamics in dyadic interactions. Timely prediction and use of these events is valuable for dialog management systems in human-robot interaction. In this study, we present an audio-visual prediction framework for head-nod and turn-taking events that can also be utilized in real-time systems. Prediction systems based on Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are trained on human-human conversational data. Unimodal and multi-modal classification performances for head-nod and turn-taking events are reported over the IEMOCAP dataset.
Index Terms: head-nod, turn-taking, social signals, event prediction, dyadic conversations, human-robot interaction
Authors: B. B. Turker, E. Erzin, Y. Yemez and M. Sezgin
Read the full paper.
This paper addresses the problem of evaluating the engagement of the human participant by combining verbal and non-verbal behaviour along with contextual information. This study will be carried out through four different corpora. Four different systems, designed to explore essential and complementary aspects of the JOKER system in terms of paralinguistic/linguistic inputs, were used for the data collection. An annotation scheme dedicated to the labeling of verbal and non-verbal behaviour has been designed. Our experiments suggest that engagement in HRI should be treated as multifaceted.
Keywords-Human-Robot Interaction; Dataset; Engagement; Speech Recognition; Affective Computing
Authors: L. Devillers and S. Rosset and G. Dubuisson Duplessis and L. Bechade and Y. Yemez and B. B. Turker and M. Sezgin and E. Erzin and K. El Haddad and S. Dupont and P. Deleglise and Y. Esteve and C. Lailler and E. Gilmartin and N. Campbell
Read the full paper.
Human eyes exhibit different characteristic patterns during different virtual interaction tasks such as moving a window, scrolling a piece of text, or maximizing an image. The human-computer interaction literature contains examples of intelligent systems that can predict a user’s task-related intentions and goals based on eye gaze behavior. However, these systems are generally evaluated in terms of prediction accuracy, and on previously collected offline interaction data. Little attention has been paid to creating real-time interactive systems using eye gaze and evaluating them in online use. We make five main contributions that address this gap from a variety of aspects. First, we present the first line of work that uses real-time feedback generated by a gaze-based probabilistic task prediction model to build an adaptive real-time visualization system. Our system is able to dynamically provide adaptive interventions that are informed by real-time user behavior data. Second, we propose two novel adaptive visualization approaches that take into account the presence of uncertainty in the outputs of prediction models. Third, we offer a personalization method to suggest which approach will be more suitable for each user in terms of system performance (measured in terms of prediction accuracy). Personalization boosts system performance and provides users with the better-suited visualization approach (measured in terms of usability and perceived task load). Fourth, by means of a thorough usability study, we quantify the effects of the proposed visualization approaches and prediction errors on natural user behavior and the performance of the underlying prediction systems. Finally, this paper also demonstrates that our previously published gaze-based task prediction system, which was assessed as successful in an offline test scenario, can also be successfully utilized in realistic online usage scenarios.
Keywords: Implicit interaction, activity prediction, task prediction, uncertainty visualization, gaze-based interfaces, predictive interfaces, proactive interfaces, gaze-contingent interfaces, usability study
Authors: Çağla Çığ and T. M. Sezgin
Read the full paper.
This work advances our understanding of children’s visualization literacy, and aims to improve it with a novel approach for teaching visualization at elementary schools. We first contribute an analysis of data graphics and activities employed in grade K to 4 educational materials, and the results of a survey conducted with 16 elementary school teachers. We find that visualization education could benefit from integrating pedagogical strategies for teaching abstract concepts with established interactive visualization techniques. Building on these insights, we develop and study design principles for novel interactive teaching material aimed at increasing children’s visualization literacy. We specifically contribute an online platform for teachers and students to respectively teach and learn about pictographs and bar charts, and report on our initial observations of its use in grades K and 2.
Author Keywords: visualization literacy; qualitative analysis.
Authors: B. Alper, N. H. Riche, F. Chevalier, J. Boy and T. M. Sezgin
Read the full paper.
We present a work-in-progress report on a sketch- and image-based software called “CHER-ish” designed to help make sense of the cultural heritage data associated with sites within 3D space. The software is based on previous work in the domain of 3D sketching for conceptual architectural design, i.e., a system that allows the user to visualize urban structures by a set of strokes located in virtual planes in 3D space. In order to interpret and infer the structure of a given cultural heritage site, we use a mix of data such as site photographs and floor plans, and then we allow the user to manually locate the available photographs and their corresponding camera positions within 3D space. With the photographs’ camera positions placed in 3D, the user defines a scene’s 3D structure by means of strokes and other simple 2D geometric entities. We introduce the main system components, virtual planes (canvases) and 2D entities (strokes, line segments, photos, polygons), and provide a description of the methods that allow the user to interact with them within the system to create a scene representation. Finally, we demonstrate the usage of the system on two different data sets: a collection of photographs and drawings from Dura-Europos, and drawings and plans from Horace Walpole’s Strawberry Hill villa.
Authors: V. Rudakova, N. Lin, N. Trayan, T. M. Sezgin, J. Dorsey and H.
We address the problem of continuous laughter detection over audio-facial input streams obtained from naturalistic dyadic conversations. We first present meticulous annotation of laughter, cross-talk and environmental noise in an audio-facial database with explicit 3D facial mocap data. Using this annotated database, we rigorously investigate the utility of facial information, head movement and audio features for laughter detection. We identify a set of discriminative features using mutual information-based criteria, and show how they can be used with classifiers based on support vector machines (SVMs) and time delay neural networks (TDNNs). Informed by the analysis of the individual modalities, we propose a multimodal fusion setup for laughter detection using different classifier-feature combinations. We also incorporate bagging into our classification pipeline to address the class imbalance problem caused by the scarcity of positive laughter instances. Our results indicate that a combination of TDNNs and SVMs leads to superior detection performance, and that bagging effectively addresses data imbalance. Our experiments show that our multimodal approach supported by bagging compares favorably to the state of the art in the presence of detrimental factors such as cross-talk, environmental noise, and data imbalance.
Index Terms—Laughter detection, naturalistic dyadic conversations, facial mocap, data imbalance
Authors: B. B. Türker, Y. Yemez, T. M. Sezgin, E. Erzin.
Read the full paper.
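The bagging idea mentioned in the laughter detection abstract above can be illustrated with a minimal sketch. It assumes one common variant, in which every bag keeps all of the rare positive instances and draws an equal number of negatives at random, and it uses a toy one-dimensional threshold learner as a stand-in for the SVM and TDNN classifiers named in the abstract. The function names, the data, and the majority-vote rule are all hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def balanced_bags(labels, n_bags, seed=0):
    """Build index lists; each bag keeps every positive (label 1)
    and an equal-sized random draw of negatives (label 0)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_bags)]


def fit_threshold(xs, ys):
    """Toy base learner (assumption): midpoint between the class means
    of a single feature, with positives expected to score higher."""
    pos_mean = mean(x for x, y in zip(xs, ys) if y == 1)
    neg_mean = mean(x for x, y in zip(xs, ys) if y == 0)
    return (pos_mean + neg_mean) / 2.0


def bagged_predict(thresholds, x):
    """Majority vote over the per-bag threshold classifiers."""
    votes = sum(1 for t in thresholds if x > t)
    return 1 if 2 * votes > len(thresholds) else 0


# Made-up imbalanced data: eight negatives, two positives.
features = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.35, 0.2, 0.9, 0.8]
labels = [0] * 8 + [1] * 2

bags = balanced_bags(labels, n_bags=5)
thresholds = [
    fit_threshold([features[i] for i in bag], [labels[i] for i in bag])
    for bag in bags
]
```

Each base learner sees a balanced training set, so no single model is dominated by the majority class, while the vote across bags still makes use of many different negatives.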
Sketch recognition is the task of converting hand-drawn digital ink into symbolic computer representations. Since the early days of sketch recognition, the bulk of the work in the field has focused on building accurate recognition algorithms for specific domains and well-defined data sets. Recognition methods explored so far have been developed and evaluated using standard machine learning pipelines and have consequently been built over many simplifying assumptions. For example, existing frameworks assume the presence of a fixed set of symbol classes, and the availability of plenty of annotated examples. However, in practice, these assumptions do not hold. In reality, the designer of a sketch recognition system starts with no labeled data at all, and faces the burden of data annotation. In this work, we propose to alleviate the burden of annotation by building systems that can learn from very few labeled examples and large amounts of unlabeled data. Our systems perform self-learning by automatically extending a very small set of labeled examples with new examples extracted from unlabeled sketches. The end result is a sufficiently large set of labeled training data, which can subsequently be used to train classifiers. We present four self-learning methods with varying levels of implementation difficulty and runtime complexity. One of these methods leverages contextual co-occurrence patterns to build a verifiably more diverse set of training instances. Rigorous experiments with large sets of data demonstrate that this novel approach based on exploiting contextual information leads to significant leaps in recognition performance. As a side contribution, we also demonstrate the utility of bagging for sketch recognition in imbalanced data sets with few positive examples and many outlier
Authors: K. T. Yeşilbek, T. M. Sezgin.
Read the full paper.
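As a rough illustration of the self-learning idea described in the sketch recognition abstract above, the following is a generic self-training loop: a classifier trained on a few labeled examples labels the unlabeled pool, and only confident predictions are added to the training set before retraining. It is not one of the four methods from the paper. The nearest-centroid classifier, the margin-based confidence measure, and all names and values are assumptions chosen to keep the example small.

```python
def centroids(labeled):
    """Mean feature value per class from (x, label) pairs."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict_with_margin(cents, x):
    """Return (nearest label, margin). The margin is the gap between the
    distances to the two closest centroids (needs at least two classes)."""
    ranked = sorted((abs(x - c), y) for y, c in cents.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]


def self_train(labeled, unlabeled, min_margin, max_rounds=10):
    """Grow the labeled set with confidently predicted unlabeled examples."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for x in pool:
            label, margin = predict_with_margin(cents, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing passes the confidence bar; stop early
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return labeled, pool


# Two hand-labeled seeds and four unlabeled feature values (made up).
seeds = [(0.0, "a"), (10.0, "b")]
grown, left = self_train(seeds, [1.0, 9.0, 5.2, 4.0], min_margin=3.0)
```

In this toy run the two unambiguous values are absorbed into the training set, while the two values near the class boundary stay unlabeled, which is the behaviour a confidence threshold is meant to produce.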
From a user interaction perspective, speech and sketching make a good couple for describing motion. Speech allows easy specification of content, events and relationships, while sketching brings in spatial expressiveness. Yet, we have insufficient knowledge of how sketching and speech can be used for motion-based video retrieval, because there are no existing retrieval systems that support such interaction. In this paper, we describe a Wizard-of-Oz protocol and a set of tools that we have developed to engage users in a sketch- and speech-based video retrieval task. We report how the tools and the protocol fit together using “retrieval of soccer videos” as a use case scenario. Our software is highly customizable, and our protocol is easy to follow. We believe that together they will serve as a convenient and powerful duo for studying a wide range of multi-modal use cases.
Keywords: sketch-based interfaces, human-centered design, motion, multimedia retrieval
Authors: O. C. Altıok, T. M. Sezgin.
Read the full paper.