Automatic Speech Recognition and Analysis of Children's Speech

Analyzing natural speech and language samples of children is a well-established source of insight in research on speech and language acquisition. However, collecting, manually transcribing, and analyzing these data is extremely time-consuming and costly. As a result, the data available for speech and language development research is scarce.

Meanwhile, speech recognition and processing technology has matured to a point where its use for research purposes in linguistics and speech-language pathology seems feasible. For the recognition of adult speech, the technology has already reached mainstream applications. Processing child utterances, however, is much more challenging because of their acoustic and linguistic properties.

Approach

The project is an interdisciplinary collaboration with the Department for Speech and Language Therapy of the Institute of Special Education (IFS). By combining the IFS's domain knowledge about children's speech with our expertise in machine learning and signal processing, we aim to improve automatic speech recognition of children's speech to the point where it can be used for applications in speech-language therapy.

The project is part of the interdisciplinary collaboration "Leibniz Lab for Relational Communication Research" (Project Website).

kidsTALC Corpus

The kidsTALC corpus is a speech corpus of German children’s spontaneous speech. It is designed for training ASR systems, with the goal of facilitating research on speech development and assisting therapeutic applications. kidsTALC is the first German speech corpus built to modern standards that meets the requirements for developing automatic tools to support language sample analysis in research and clinical applications.

The repository consists of multiple datasets, all containing connected speech, that represent different recording settings, language statuses, and ages. In its final version, the repository will contain recordings from about 300 children ranging from kindergarten to elementary school age. The elicitation contexts will cover various settings along the unstructured-structured continuum, such as free play, storytelling, conversational discourse, and read texts, with a focus on spontaneous language. The corpus will also include children with a range of oral and written language abilities, including typically developing children and children with developmental language disorder or speech sound disorder. Participants for the entire repository are being recruited from a network of collaborating preschools, kindergartens, and elementary schools. Eligibility criteria for the currently completed part of the dataset are: 3½–11 years of age, monolingual German speakers, typically developing.

For more information on the corpus, please read our publication (pdf, BibTeX).

Access

To get access to the corpus, please send the signed end-user agreement to kidstalc@tnt.uni-hannover.de.

You will then receive a username and password to download the corpus here:

Version 1, October 2022: kidsTALC-v1

Recording Status

Year of Completion	Target Number of Speakers	Recorded Speakers	Type	Age Range (years;months)
2022	90	49	Spontaneous: Typically Developing	3;6-10;11
2023	40	0	Spontaneous: Typically Developing	3;0-7;0
2024	60	0	Spontaneous: Developmental Language Disorders and Speech Sound Disorders	3;0-7;0
2024	100	0	Read: Typically Developing and Reading Difficulties	8;0-10;0
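
The ages in the table appear to follow the years;months convention common in child language research, so 3;6 would mean 3 years and 6 months. The following is a minimal Python sketch, under that assumption, of how the table rows could be parsed and summarized. The names `age_to_months`, `RecordingBatch`, and `BATCHES` are hypothetical and are not part of the kidsTALC distribution.

```python
from dataclasses import dataclass


def age_to_months(age: str) -> int:
    """Convert a 'years;months' age such as '3;6' to total months.

    Assumes the years;months notation; this is an interpretation of the table.
    """
    years, months = age.split(";")
    return int(years) * 12 + int(months)


@dataclass
class RecordingBatch:
    """One row of the recording status table (hypothetical helper type)."""
    year: int
    target_speakers: int
    recorded_speakers: int
    age_range: str  # e.g. "3;6-10;11"

    def age_range_months(self) -> tuple[int, int]:
        """Return the lower and upper age bound in months."""
        low, high = self.age_range.split("-")
        return age_to_months(low), age_to_months(high)


# Rows copied from the recording status table above.
BATCHES = [
    RecordingBatch(2022, 90, 49, "3;6-10;11"),
    RecordingBatch(2023, 40, 0, "3;0-7;0"),
    RecordingBatch(2024, 60, 0, "3;0-7;0"),
    RecordingBatch(2024, 100, 0, "8;0-10;0"),
]

# Totals across all planned recording batches.
total_target = sum(b.target_speakers for b in BATCHES)
total_recorded = sum(b.recorded_speakers for b in BATCHES)

if __name__ == "__main__":
    print(f"Target speakers: {total_target}, recorded so far: {total_recorded}")
    for batch in BATCHES:
        low, high = batch.age_range_months()
        print(f"{batch.year}: ages {low} to {high} months")
```

Summing the target column this way gives a quick check against the roughly 300 children planned for the final repository.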

Recent Publications

Lars Rumberg, Christopher Gebauer
Rule-Based Grammatical Error Detection on Spontaneous Children’s Speech
Elektronische Sprachsignalverarbeitung (ESSV), pp. 117-124, 2025

Lars Rumberg, Christopher Gebauer
Grammatical Error Detection on Spontaneous Children’s Speech Using Iterative Pseudo Labeling
To appear in Proceedings INTERSPEECH 2025 – 26th Annual Conference of the International Speech Communication Association, ISCA, 2025

Lars Rumberg, Christopher Gebauer, Hanna Ehlert, Maren Wallbaum, Ulrike Lüdtke, Jörn Ostermann
Uncertainty Estimation for Connectionist Temporal Classification Based Automatic Speech Recognition
Proc. INTERSPEECH 2023, pp. 4583-4587, August 2023