We study the problem of handling inter-turn pauses in human-robot dialogue. To reduce the perceived delay while the robot transcribes the user's speech, interprets it, and starts uttering a response, we propose automatically generating conversational fillers to fill the silences. These fillers combine verbal utterances with body movements. We propose a Bayesian model that samples a filler whose production duration is close to the expected computation time needed by the robot. To increase the sense of engagement, the fillers also include contextual information gathered during the dialogue (such as the name of the interlocutor) when this information is available with high confidence. We evaluate this approach with an indirect user study measuring time perception, comparing three strategies for bridging the inter-turn time (silence, static fillers, and our approach). The results show that users prefer our dynamic fillers, even when the conversation is objectively shorter with one of the other strategies.
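As an illustration only, the duration-matched sampling and the confidence-gated use of context described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the model used in the study: the filler inventory, the Gaussian likelihood with width `sigma`, the uniform priors, and the 0.8 confidence threshold are all hypothetical, and body movements are not modelled.

```python
import math
import random

# Hypothetical filler inventory: (text, production duration in seconds, prior weight).
# None of these values come from the paper.
FILLERS = [
    ("Hmm.", 0.6, 1.0),
    ("Let me think.", 1.2, 1.0),
    ("That is a good question, {name}.", 2.5, 1.0),
    ("Give me a moment to think about that, {name}.", 3.5, 1.0),
]


def posterior_weights(expected_delay, sigma=0.5):
    """Normalised weights: prior times an assumed Gaussian likelihood
    on the gap between filler duration and expected computation time."""
    log_weights = []
    for _text, duration, prior in FILLERS:
        z = (duration - expected_delay) / sigma
        log_weights.append(math.log(prior) - 0.5 * z * z)
    # Subtract the maximum before exponentiating to avoid underflow to zero.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def sample_filler(expected_delay, name=None, name_confidence=0.0,
                  threshold=0.8, rng=random):
    """Sample a filler, inserting the name only if it is known with
    confidence at or above the (assumed) threshold."""
    probs = posterior_weights(expected_delay)
    text, duration, _prior = rng.choices(FILLERS, weights=probs, k=1)[0]
    if name is not None and name_confidence >= threshold:
        text = text.format(name=name)
    else:
        # Drop the placeholder when the name is missing or uncertain.
        text = text.replace(", {name}", "")
    return text, duration
```

For example, `sample_filler(3.0, name="Ana", name_confidence=0.9)` would most often return one of the two longer fillers with the name filled in, while a low `name_confidence` yields the same fillers without it.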
- Context-aware selection of multi-modal conversational fillers in human-robot dialogue.pdf