Abstract
This study presents CILLCO (Corpus of Indonesian Language, Linguistics, and Communities), a digital corpus designed to document and analyze vernacular language varieties in everyday and digital contexts. Jointly established by the Research Center for Language, Literature, and Community at the National Research and Innovation Agency (BRIN) and the English Department of Universitas Islam Indonesia (UII), CILLCO addresses the historical gap between the national standard language (Bahasa Indonesia baku) and its vernacular varieties at the interpersonal, media, and online levels across the Indonesian archipelago. While most existing Indonesian corpora focus on written and formal language, CILLCO focuses on naturally occurring communication, capturing data such as WhatsApp exchanges and everyday conversations. As such, CILLCO functions as a linguistic and communicative resource platform, providing researchers with empirical materials to examine how meaning is made, identities are negotiated, and social relations are enacted in the hybrid spaces of spoken and digital communication. The corpus incorporates multimodal sources, including spoken discourse, social media interactions, online conversations, web documents and comments, transcribed interviews, and regional narratives, all annotated and made searchable through annotation and retrieval tools. By situating CILLCO within current work in corpus linguistics, communication research, and digital ethnography, this study demonstrates the corpus's potential to advance interdisciplinary investigation into language use, digital discourse, and sociocultural change in Indonesia. CILLCO offers an empirical foundation for analyzing communicative practices in Southeast Asia, contributing to decentered, corpus-driven communication research. Ultimately, it sheds light on how digital vernacular communication reshapes the linguistic landscapes and communicative identities of Indonesian speakers in an era of rapid digital transformation.
Copyright (c) 2026 Intan Pradita

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.