Human Diversity seminar 27.4.2026, Törö, Suni & Simko

25.03.2026 12:15 - 13:00

Monday 27.4. at 11.45-13.00
LS XIV, 3rd floor, Natura
Zoom webinar link

Doctoral Researcher Tuukka Törö, IT Designer Antti Suni, and Senior University Lecturer Juraj Simko
Department of Digital Humanities, University of Helsinki

Novel Approaches to Investigating Linguistic Diversity Using Speech Embeddings

Traditional research on linguistic diversity requires substantial effort at every stage, from collecting and processing data to annotating and describing linguistic features. The analytical choices researchers make can influence outcomes and potentially introduce theoretical or observer bias. Moreover, many of the world's languages and varieties remain under-documented, which limits the applicability of conventional linguistic methods.
Recent advances in deep learning have opened new avenues for exploring language variation. State-of-the-art self-supervised speech models can reliably classify languages from short audio samples of varying quality. This enables us to study language relationships in the models' latent spaces in a relatively theory-independent way, based on how languages sound. These models also allow us to examine relationships involving low-resource languages and varieties, including those with minimal available data, no writing system, or no prior linguistic description.
In our talk, we will highlight both the strengths and limitations of these methods for conducting scientifically sound research on linguistic diversity across multiple levels of variation, from dialects to global language relationships.