Stop splitting hairs: The problems with dichotomizing continuous data in language research


Hemelstrand S., Inoue T.
December 2025, Elsevier B.V.

Research Methods in Applied Linguistics
2025, Volume 4, Issue 3

It is common in the language sciences to dichotomize continuous data in order to fit models to data. However, several statisticians and methodologists have warned against this practice for years. Many in the language sciences seem unaware of this problem. Because of the lack of modern, robust, and open data simulations related to this issue in the language science literature, this article provides an empirical investigation of this practice. Across three different simulations, our analysis shows that dichotomization almost universally increases the standard errors, and consequently leads to inaccuracy of tests of statistical significance. Furthermore, effect sizes like R² are often diminished by the reduction of available information in the data. We conclude by providing suggestions and considerations for future empirical studies.
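The kind of comparison the abstract describes can be illustrated with a minimal sketch. This is not the authors' simulation code: the sample size, true slope, random seed, median-split rule, and the helper name `fit_simple_ols` are all assumptions chosen for illustration. It fits the same simple regression once with a continuous predictor and once with that predictor split at its median, then reports the slope, standard error, and R² of each fit.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (slope, standard error of the slope, R^2).
    Hypothetical helper written for this illustration only.
    """
    X = np.column_stack([np.ones_like(x), x])       # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares coefficients
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)                # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)           # coefficient covariance matrix
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta[1], np.sqrt(cov[1, 1]), r2

# Assumed data-generating process: a continuous predictor with a linear effect.
rng = np.random.default_rng(2025)
n = 500
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# Median split: the dichotomization under discussion (0 = low, 1 = high).
x_split = (x > np.median(x)).astype(float)

slope_c, se_c, r2_c = fit_simple_ols(x, y)
slope_d, se_d, r2_d = fit_simple_ols(x_split, y)

print(f"continuous:   slope={slope_c:.3f}  SE={se_c:.3f}  t={slope_c / se_c:.2f}  R2={r2_c:.3f}")
print(f"median split: slope={slope_d:.3f}  SE={se_d:.3f}  t={slope_d / se_d:.2f}  R2={r2_d:.3f}")
```

Because the two predictors are on different scales, the raw slopes and standard errors are not directly comparable; the t statistic (slope divided by its standard error) and R² are the more informative quantities to compare across the two fits. A single simulated data set is only one draw, so a fuller check would repeat this over many replications.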

Binning, Dichotomization, Language, Linguistics, Regression, Transformation


Department of Linguistics and Cognitive Science, KIMEP University, Kazakhstan
Department of Psychology, The Chinese University of Hong Kong, Hong Kong
Centre for Developmental Psychology, The Chinese University of Hong Kong, Hong Kong

