Analyzing Dialectical Biases in LLMs for Knowledge and Reasoning Benchmarks
Authors: Eileen Pan†, Anna Seo Gyeong Choi†, Maartje ter Hoeve, Skyler Seto, Allison Koenecke†‡
Large language models (LLMs) are ubiquitous in modern-day natural language processing. However, previous work has shown degraded LLM performance for under-represented English dialects. We analyze the effects of typifying “standard” American English questions as non-“standard” dialectal variants on multiple-choice question-answering tasks and find up to a 20% reduction in accuracy. Additionally, we investigate the grammatical basis of under-performance on non-“standard” English questions. We find that individual grammatical rules have varied effects on performance, but some are more consequential than others: three specific grammar rules (existential “it”, zero copula, and y’all) can explain the majority of the performance degradation observed across multiple dialects. We call for future work to investigate bias mitigation methods focused on individual, high-impact grammatical structures.
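The per-rule evaluation described in the abstract lends itself to a short sketch. Below is a minimal Python illustration, assuming hypothetical regex-based rewriting rules (crude stand-ins for the paper's actual dialectal transformations, which are not given in the abstract) and a generic `model_answer` callable supplied by the caller:

```python
import re

# Hypothetical rewriting rules named after the paper's three high-impact
# grammar rules. These regexes are rough illustrations only, not the
# paper's actual dialectal transformations.
RULES = {
    "existential_it": lambda q: re.sub(r"\bthere (is|are)\b", "it's", q, flags=re.I),
    "zero_copula":    lambda q: re.sub(r"\b(is|are) ", "", q),
    "y_all":          lambda q: re.sub(r"\byou\b", "y'all", q, flags=re.I),
}

def accuracy(model_answer, questions, answers):
    """Fraction of multiple-choice questions answered correctly."""
    hits = sum(model_answer(q) == a for q, a in zip(questions, answers))
    return hits / len(questions)

def per_rule_degradation(model_answer, questions, answers):
    """Accuracy drop when each grammar rule is applied in isolation."""
    baseline = accuracy(model_answer, questions, answers)
    return {
        name: baseline - accuracy(model_answer,
                                  [rule(q) for q in questions], answers)
        for name, rule in RULES.items()
    }
```

Comparing the sum of single-rule drops against the drop on fully dialect-transformed questions would then indicate how much of the overall degradation these three rules account for.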
Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs
May 16, 2025 · research area: Speech and Natural Language Processing · conference: ACL
Current Large Language Models (LLMs) are predominantly designed with English as the primary language, and even the few that are multilingual tend to exhibit strong English-centric biases. Much like speakers who might produce awkward expressions when learning a second language, LLMs often generate unnatural outputs in non-English languages, reflecting English-centric patterns in both vocabulary and grammar. Despite the importance of this issue, …
Towards a World-English Language Model
March 29, 2024 · research area: Speech and Natural Language Processing · conference: ICASSP
Neural Network Language Models (NNLMs) of Virtual Assistants (VAs) are generally language-, region-, and in some cases, device-dependent, which increases the effort to scale and maintain them. Combining NNLMs for one or more of the categories could be one way to improve scalability. In this work, we combine regional variants of English by building a “World English” NNLM. We examine three data sampling techniques and we experiment with adding…
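The excerpt mentions three data sampling techniques without naming them. As a hedged illustration, the Python sketch below implements three common corpus-mixing schemes for combining regional training data (uniform, size-proportional, and exponent-smoothed sampling); these are assumptions, not necessarily the paper's techniques:

```python
import random

def uniform_weights(corpora):
    """Each regional variant contributes equally."""
    return {r: 1 / len(corpora) for r in corpora}

def proportional_weights(corpora):
    """Regions contribute in proportion to corpus size."""
    total = sum(len(c) for c in corpora.values())
    return {r: len(c) / total for r, c in corpora.items()}

def smoothed_weights(corpora, alpha=0.5):
    """Size-proportional weights raised to alpha < 1, which up-weights
    low-resource regions relative to high-resource ones."""
    scaled = {r: len(c) ** alpha for r, c in corpora.items()}
    total = sum(scaled.values())
    return {r: s / total for r, s in scaled.items()}

def sample_batch(corpora, weights, batch_size, seed=0):
    """Draw a training batch: pick a region by weight, then a sentence."""
    rng = random.Random(seed)
    regions = list(corpora)
    probs = [weights[r] for r in regions]
    batch = []
    for _ in range(batch_size):
        region = rng.choices(regions, weights=probs, k=1)[0]
        batch.append(rng.choice(corpora[region]))
    return batch
```

The smoothing exponent is the usual lever here: alpha = 1 recovers size-proportional sampling, while smaller values flatten the mixture so low-resource regional variants are seen more often during training.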