Automatic Annotation and Opinion Mining of Gujlish Social Media Data: A Supervised Deep Learning Approach for an Under-Resourced Language
DOI: https://doi.org/10.32628/CSEIT25111698
Keywords: Code-Mixed Language, Gujlish, YouTube, Automatic Annotation, Fuzzy, Opinion Mining, Convolutional Neural Network
Abstract
The exponential growth of multimedia content has led to an increasing presence of regional code-mixed data. However, the scarcity of language-specific tools and comprehensive corpora limits the effectiveness of opinion mining for under-resourced languages. This study addresses this gap by reviewing state-of-the-art research and proposing a generic architecture that combines natural language processing, fuzzy logic, and deep learning techniques. An experimental framework was developed to classify opinions expressed in Gujarati–English (Gujlish) code-mixed comments on YouTube comedy videos using a convolutional neural network. To the best of our knowledge, this is the first exploration of Gujlish opinion mining in the YouTube comedy domain. Key contributions include an automatic annotation module and a new Gujlish dataset. Despite the limited dataset size, the proposed model achieved superior results, with a maximum accuracy of 81%, and the findings indicate that performance can be further improved as more labelled data becomes available. This work demonstrates how AI-driven systems can improve content development and user engagement on social media platforms. The proposed tool bridges linguistic diversity and technological innovation, and it has the potential to support inclusive strategies for multilingual populations, enabling governments and organizations to address diverse socio-linguistic needs effectively. The results provide novel insights that support the development of optimized approaches for resource-poor languages. This paper highlights the practical advantages of the proposed methodology and outlines potential directions for future enhancements.
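The abstract names an automatic annotation module built on fuzzy logic but does not describe its rules. Purely as an illustration, the Python sketch below assumes a lexicon-based design: each comment gets an average polarity score from a small romanized Gujarati/English word list, the score is fuzzified with triangular membership functions, and the label with the highest membership is kept. The lexicon, its weights, the membership breakpoints, and the function names (`polarity_score`, `triangular`, `fuzzy_annotate`) are all hypothetical and are not the authors' module; labels produced this way would then serve as training data for a classifier such as the CNN mentioned above.

```python
import re

# Hypothetical toy polarity lexicon for romanized Gujarati (Gujlish) and English
# tokens. The paper's actual lexicon and weights are NOT given in the abstract.
LEXICON = {
    "majja": 1.0,    # Gujarati "fun"
    "mast": 1.0,     # Gujarati "great"
    "saras": 0.8,    # Gujarati "nice"
    "funny": 0.8,
    "best": 1.0,
    "bakwas": -1.0,  # "nonsense"
    "bekar": -0.8,   # "useless"
    "boring": -0.8,
    "worst": -1.0,
}

# Hypothetical fuzzy sets (a, b, c) over the polarity score; breakpoints are illustrative.
FUZZY_SETS = {
    "negative": (-2.0, -1.0, 0.0),
    "neutral": (-0.5, 0.0, 0.5),
    "positive": (0.0, 1.0, 2.0),
}


def polarity_score(comment: str) -> float:
    """Average lexicon polarity of matched tokens in [-1, 1]; 0.0 if nothing matches."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_annotate(comment: str) -> tuple[str, dict[str, float]]:
    """Fuzzify the polarity score and return the max-membership label plus all memberships."""
    score = polarity_score(comment)
    memberships = {label: triangular(score, *abc) for label, abc in FUZZY_SETS.items()}
    label = max(memberships, key=memberships.get)
    return label, memberships


if __name__ == "__main__":
    for c in ["Majja aavi gai, mast video!", "Bakwas comedy, boring", "Saras pan thodu boring"]:
        print(c, "->", fuzzy_annotate(c)[0])
    # Majja aavi gai, mast video! -> positive
    # Bakwas comedy, boring -> negative
    # Saras pan thodu boring -> neutral
```

Taking the maximum membership is only one defuzzification choice; an annotator could instead set aside comments whose top two memberships are close rather than force a label on them.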
Copyright (c) 2025 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.