Abstract

This study presents CILLCO (Corpus of Indonesian Language, Linguistics, and Communities), a digital corpus designed to document and analyze vernacular language varieties in everyday and digital contexts. Established jointly by the Research Center for Language, Literature, and Community at the National Research and Innovation Agency (BRIN) and the English Department of Universitas Islam Indonesia (UII), CILLCO addresses the historical gap between the national standard language (Bahasa Indonesia baku) and its vernacular varieties at the interpersonal, media, and online levels across the Indonesian archipelago. While most existing Indonesian corpora focus on written and formal language, CILLCO concentrates on naturally occurring communication, capturing data such as WhatsApp exchanges and everyday conversations. As such, CILLCO functions as a linguistic and communicative resource platform, providing researchers with empirical materials to examine how meaning is made, identities are negotiated, and social relations are enacted in the hybrid spaces of spoken and digital communication. The corpus incorporates multimodal sources, including spoken discourse, social media interactions, online conversations, web documents and comments, transcribed interviews, and regional narratives, all encoded and made accessible through annotation and retrieval tools. By situating CILLCO within current work in corpus linguistics, communication research, and digital ethnography, this study demonstrates the corpus's potential to advance interdisciplinary investigation into language use, digital discourse, and sociocultural change in Indonesia. CILLCO offers an empirical foundation for analyzing communicative practices in Southeast Asia, contributing to decentered, corpus-driven communication research. Ultimately, it sheds light on how digital vernacular communication reshapes the linguistic landscapes and communicative identities of Indonesian speakers in an era of rapid digital transformation.

Keywords

vernacular Indonesian; corpus linguistics; sociolinguistics

Article Details

How to Cite
Pradita, I., Puspitasari, D. A., Karlina, Y., & Sukma, B. P. (2026). Introducing CILLCO: A corpus model of vernacular Indonesian as a cultural capital. Jurnal Komunikasi, 20(1). https://doi.org/10.20885/komunikasi.vol20.iss1.art9
