Center for the Study of Language and Society

Introduction to statistics for linguistics and language studies. A practical introduction to statistics using R

Thursday, 2019/02/28 - Friday, 2019/03/01

Doctoral students of the GSH are credited with 4 ECTS.

Event organizer: Workshop Language and Society
Speaker: Mathieu Avanzi (Sorbonne-Université)
Date: 2019/02/28 - 2019/03/01
Time: 09:00 - 17:30
Locality: 320/216
Uni Mittelstrasse
Mittelstrasse 43
3012 Bern
Characteristics: open to the public
free of charge

Course description – The aim of this workshop is to provide the knowledge and skills needed to understand, perform and critically evaluate basic analyses of quantitative linguistic data. It covers the most commonly used classical methods in statistics (i.e. inferential tests, regressions and multidimensional analyses), with a strong focus on data exploration and data visualization. The seminar combines lectures by the instructor with hands-on exercises on laptops. The exercises draw on datasets of English, French and German that involve different types of linguistic variables (mainly phonetic, syntactic and morphological features) and non-linguistic variables (age, gender, socioeconomic status, etc.). The R software, a free computational programming environment, will be used.

Outcomes – When students have completed this module, it is expected that they will be able to:

  • recognize the different types of variables used in linguistic studies;
  • explain key concepts in statistics in their own words;
  • describe datasets meaningfully using descriptive statistics;
  • recognize when to use a specific statistical test depending on the variables involved;
  • use the R program to conduct statistical analyses;
  • use R to produce clear, attractive plots with the widely used ggplot2 package;
  • communicate findings and present results from experimental studies in a paper;
  • understand and critically evaluate published research findings.



Baayen, R. H. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press.

Gries, S. T. (2013). Statistics for Linguistics with R: A Practical Introduction. Berlin: De Gruyter Mouton.

Levshina, N. (2015). How to Do Linguistics with R: Data Exploration and Statistical Analysis. Amsterdam/Philadelphia: John Benjamins.


Instructor – Mathieu Avanzi is currently a junior lecturer at Sorbonne-Université. His research deals with geolinguistic variation in French and Gallo-Romance dialects.


Prerequisites – The course is designed specifically for students without a background in mathematics. No computer programming skills are required.



February 28th


1. An introduction to the R language

2. Descriptive stats

3. Inferential stats in R (Mann-Whitney/t-tests, ANOVA I and II, Chi²/Fisher tests)

4. Advanced inferential stats in R (linear, logistic and multinomial regressions)

March 1st


5. Plotting with ggplot2 (barplots, pie charts, histograms, boxplots, waffle charts, etc.)

6. More advanced inferential stats in R (mixed-effects models)

7. Twitter (using Twitter as a corpus)