The linguistic corpus of Kullui was developed as part of a study of the Himachali Pahari languages, which are Indo-Aryan languages of North India. It was created by E. Renkovskaya, J. Mazurova, and A. Krylova, researchers at the Institute of Linguistics, RAS, and the Institute of Oriental Studies, RAS. The corpus comprises data collected during field trips to Kullu district (the villages of Naggar, Bashing, Thawa, and Suma) in 2014–2017.
The corpus software was developed by Eugenia Korovina. The Kullui corpus currently comprises about 18,000 tokens. It consists of two subcorpora: oral texts and elicited grammatical questionnaires, with translations into Russian and English. Phrases in the text subcorpus have corresponding audio recordings, while the questionnaires have no audio component.
All the texts have detailed morphological annotations. In addition to combined search over Kullui words, parts of words, and their Russian and English translations, the corpus supports search by grammatical category and search for combinations of two Kullui words at a specified distance from each other. Users can also access the full texts and their recordings.
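To illustrate what a search for two words at a given distance involves, here is a minimal Python sketch. It is not the corpus's actual implementation, which is not described here. The function name `find_pairs`, the placeholder tokens, and the reading of "distance" as "the second word follows the first within at most N token positions" are all assumptions made for the example.

```python
from typing import List, Tuple


def find_pairs(tokens: List[str], first: str, second: str,
               max_distance: int) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with tokens[i] == first, tokens[j] == second
    and 0 < j - i <= max_distance.

    Assumption: distance is counted in token positions and the second word
    must follow the first. A real corpus engine may define it differently.
    """
    hits = []
    for i, token in enumerate(tokens):
        if token != first:
            continue
        # Look ahead at most max_distance positions, staying inside the text.
        last = min(i + max_distance, len(tokens) - 1)
        for j in range(i + 1, last + 1):
            if tokens[j] == second:
                hits.append((i, j))
    return hits


# Placeholder tokens stand in for Kullui word forms.
sample = ["a", "x", "b", "a", "b", "y", "b"]
print(find_pairs(sample, "a", "b", 2))  # [(0, 2), (3, 4)]
```

A distance of 1 in this sketch matches only adjacent words. Larger values widen the window, so the same first word can pair with several later occurrences of the second.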
The texts carry discourse markup reflecting intonation, pauses, breaks, self-correction, code-switching, and other important characteristics of oral speech. The corpus also includes metadata on the place and time of each recording and sociolinguistic information about the speaker.
This project was supported by the Russian Foundation for Basic Research (RFBR), grant № 19-012-00355, «The investigation of the Kullui language based on the oral text corpus» (2019–2021).