This study introduces a methodology to detect Sinhala and English words in code-mixed data, and it is the first research conducted on this scenario.
This paper presents a language detection model with an XGB classifier with 92.1% accuracy and a CRF model with an F1-score of 0.94 for sequence labeling to ...
This study presents a language detection model using machine learning and natural language processing techniques.
The authors [Smith and Thayasivam 2019] presented the first language detection model to detect Sinhala-English code-mixed text. Since this was a novel approach, ...
Therefore, this paper presents a language detection model with an XGB classifier with 92.1% accuracy and a CRF model with an F1-score of 0.94 for sequence labeling.
Results: The Boruta algorithm identifies eleven significant features, including respondent's age, highest education level, educational attainment, ...
This study presents a new Sinhala-English code-mixed dataset collected from Facebook comments, chat history, and public posts.
Sinhala-English-Code-Mixed-Code-Switched-Dataset · Sentiment - Positive, Negative, Neutral, Conflict · Humor - Humorous, Non humorous · Hate Speech - Hate-Inducing ...
Indeed, if the code-mixed data contains Unicode characters, language detection is straightforward and can be achieved using a simple Python program.
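A minimal sketch of what such a simple Python program might look like, assuming whitespace tokenization and that Sinhala text is written in the Unicode Sinhala block (U+0D80 to U+0DFF). The function names and the labels `si`, `en`, `mixed`, and `other` are hypothetical and do not come from any of the papers above.

```python
# Illustrative sketch only: script-based tagging of code-mixed tokens.
# Assumption: Sinhala words use the Unicode Sinhala block (U+0D80..U+0DFF).
SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF


def detect_script(token: str) -> str:
    """Label a token by the script of the characters it contains."""
    has_sinhala = any(SINHALA_START <= ord(ch) <= SINHALA_END for ch in token)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_sinhala and has_latin:
        return "mixed"
    if has_sinhala:
        return "si"
    if has_latin:
        # Latin script is treated as English here (an assumption).
        return "en"
    return "other"  # digits, punctuation, emoji, other scripts


def tag_tokens(text: str) -> list[tuple[str, str]]:
    """Split on whitespace and tag every token with a script label."""
    return [(tok, detect_script(tok)) for tok in text.split()]


if __name__ == "__main__":
    for token, label in tag_tokens("මම school යනවා 123"):
        print(token, label)
```

This check only separates scripts. Sinhala words typed in Latin letters would be labeled `en`, which is the case where a learned classifier becomes necessary.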
This paper proposes a Neural Machine Translation (NMT) model to translate the Sinhala-English code-mixed text to the Sinhala language. Due to the limited ...