Recently, automatic speech and speaker recognition have matured to the degree that they have entered the daily lives of thousands of Europe’s citizens, e.g., on their smartphones or in call services. In the coming years, speech processing technology will move to a new level of social awareness to make interaction more intuitive, make speech retrieval more efficient, and lend additional competence to computer-mediated communication and speech-analysis services in the commercial, health, security, and other sectors. To reach this goal, rich speaker traits and states such as age, height, personality, and physical and mental state, as carried by the tone of the voice and the spoken words, must be reliably identified by machines.

The iHEARu project aims to push the limits of intelligent systems for computational paralinguistics by pursuing Holistic analysis of multiple speaker attributes at once, Evolving and self-learning systems, and deeper Analysis of acoustic parameters, all on Realistic data at a large scale. It ultimately progresses from individual analysis tasks towards universal speaker characteristics analysis, which can easily learn about and adapt to new, previously unexplored characteristics.