Corpora Usage

Publications and theses using our data


  • Kohoutková, Jolana. Analýza slovosledu v korpusu spontánních konverzací matek s dětmi. Unpublished master's thesis. FF UK. 2021. (LABELS2018)
  • Matějka, Štěpán. Pořadí osvojování gramatických slov a tvarů: elektronický dotazník. Unpublished dissertation. FF UK. 2022. (Chroma)

Projects focused on analysing our data

  • PROJECT START 2021–2023 Nominal morphological categories and the mean length of utterance in a longitudinal corpus of early language development (no. START/HUM/016), funded by Charles University under the START programme; principal investigator: Klára Matiasovitsová (Chroma and LABELS2018)

Klára’s START team consists (or has consisted) of Jakub Sláma, Kamila Homolková (until August 2021), Petra Čechová (from September 2021), Jolana Kohoutková (from October 2021) and Anna Chromá (from April 2022). Filip Smolík serves as the project’s expert consultant. The team works on two research areas: first, selected measures of general language maturity and their adaptation for use on Czech (corpus) data; and second, morphological annotation of CoCzeFLA. Among the language maturity measures, the project focuses on different ways of measuring the mean length of utterance (in words, syllables and morphemes) and on the Index of Productive Syntax (IPSyn), which was adapted to Czech for the first time in this project. Both the LABELS2018 corpus and the Chroma corpus are used to compare the different measures. Child utterances in the Chroma corpus are morphologically tagged semi-automatically with the MorphoDiTa tool.
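
As a rough illustration of the simplest of these measures, the mean length of utterance in words can be sketched as the average number of word tokens per utterance. The snippet below is only a minimal sketch and not the project's actual procedure: it assumes utterances are plain strings tokenised on whitespace, the function name is hypothetical, and the sample utterances are invented rather than taken from Chroma or LABELS2018. Counting in syllables or morphemes would need language-specific segmentation that is not shown here.

```python
def mean_length_of_utterance(utterances):
    """Return the mean number of whitespace-separated tokens per utterance.

    Hypothetical helper: empty utterances are skipped, and whitespace
    tokenisation is a simplifying assumption.
    """
    lengths = [len(u.split()) for u in utterances if u.split()]
    if not lengths:
        raise ValueError("no non-empty utterances to measure")
    return sum(lengths) / len(lengths)


# Invented child utterances, for illustration only.
sample = ["to je pes", "haf", "kde je máma"]
mlu_words = mean_length_of_utterance(sample)
print(f"MLU in words: {mlu_words:.2f}")
```

In practice the result depends heavily on transcription and tokenisation conventions, which is one reason such measures have to be adapted to each language and corpus.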

  • PROJECT PRIMUS 2019–2022 Cross-methodological approaches to syntax and semantics (XMASS), funded by the Faculty of Arts under the PRIMUS programme; principal investigator: Radek Šimík (Chroma)

Radek’s XMASS research group, funded by the PRIMUS project, studies the syntax and semantics of wh-constructions from different perspectives. Among other things, the use of these constructions is examined in child data from the Chroma corpus. The analysis of wh-constructions in the child corpus is the focus of Klára Matiasovitsová’s research. She tests the hypothesis that children acquire wh-constructions in the order questions > embedded questions > correlatives > headless / light-headed relative clauses > headed relative clauses. She also investigates the relationship between children’s acquisition of these constructions and their frequency in child-directed speech.