Topic Map v2
Topic Map
Collection
Pre-processing
•Sentiment analysis
•Classification
•Topic discovery
•Information Extraction
•Temporal
•Location
•Trends
•Manual analysis
•Comparison of different approaches
•Archival
Collection
As research is increasingly data-driven, there is a need for a databank of Philippine language resources.
To address this need, interested students will develop tools and techniques that aid the automatic
collection and categorization of texts. This includes crawling the web for language resources and
automatically storing and organizing them by language. Related work includes clustering the languages
and annotating each collected text.
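Character trigrams, which one of the starting references for this topic uses to measure language similarity, also give a simple baseline for organizing crawled texts by language. The sketch below is a minimal illustration, not the method of that paper: it builds a trigram frequency profile per language from seed text and assigns a new text to the language whose profile has the highest cosine similarity. The seed sentences and the function names trigram_profile, cosine, and identify are hypothetical.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count character trigrams; padding with spaces captures word boundaries."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(text: str, profiles: dict) -> str:
    """Return the language whose profile is most similar to the text."""
    target = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(target, profiles[lang]))


# Hypothetical one-sentence seed profiles; real ones need far more text.
profiles = {
    "Tagalog": trigram_profile("ang mga bata ay naglalaro sa labas ng bahay"),
    "English": trigram_profile("the children are playing outside the house"),
}
print(identify("naglalaro ang mga bata", profiles))  # Tagalog
```

In practice, profiles would be built from far larger samples per language, and closely related Philippine languages may be hard to tell apart on short texts.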
Target Venue(s):
Starting Reference(s):
Authors Oco, Nathaniel; Syliongka, Leif Romeritch; Allman, Tod; Roxas, Rachel Edita
Title Resources for Philippine Languages: Collection, Annotation, and Modeling
Publication The 30th Pacific Asia Conference on Language, Information and Computation
Pages 433-438
Year 2016
Publisher Institute for the Study of Language and Information at Kyung Hee University
Authors Oco, Nathaniel; Ilao, Joel; Roxas, Rachel Edita; Syliongka, Leif Romeritch
Title Measuring language similarity using trigrams: Limitations of language identification
Publication 2013 International Conference on Recent Trends in Information Technology (ICRTIT)
Pages 478-481
Year 2013
Publisher IEEE
Pre-Processing
Textual data is the main input for many software programs, so representing it properly and using
high-quality data are integral concerns. Achieving such quality requires text pre-processors: subprograms
that modify raw data to fit a given system or to provide it with new data features. Numerous pre-processors
are currently available. However, there is no compilation of tools that are lightweight and flexible enough
for different kinds of systems or language domains. Students will develop pre-processing tools for textual
data. These may consist of the following:
•Tokenization
o Cleaning
o URLs
o Special Characters
o Length Limit
o Duplicates
o Stop words
•True-casing (e.g. john -> John)
•Feature Extraction (Affixes)
•Stemming (Root words)
•Text Transformation
o Standard text normalization (e.g. resume -> résumé, canonicalization)
o Unicode normalization (e.g. ñ -> U+00F1, Å -> U+00C5)
o Shortcut text normalization (e.g. LOL -> Laughing Out Loud, gr8 -> great)
o Spell / grammar check
o Translation
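Several of the cleaning and normalization steps listed above can be chained into one function. The following is a minimal sketch, assuming whitespace tokenization, a token-count cap as the length limit, and tiny hypothetical stop-word and shortcut tables (STOP_WORDS, SHORTCUTS); a real tool would need curated, language-specific lists. True-casing, affix extraction, stemming, spell checking, and translation are left out.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
SPECIAL_RE = re.compile(r"[^\w\s]")  # \w is Unicode-aware, so letters like ñ survive

# Hypothetical, deliberately tiny tables; real ones must be curated per language.
STOP_WORDS = {"ang", "ng", "sa", "the", "a", "is"}
SHORTCUTS = {"lol": "laughing out loud", "gr8": "great"}


def preprocess(text: str, max_tokens: int = 50) -> list:
    """Normalize, clean, and tokenize one text."""
    # Unicode normalization: NFC composes "n" + U+0303 into U+00F1 (ñ).
    text = unicodedata.normalize("NFC", text)
    text = URL_RE.sub(" ", text)                 # drop URLs
    text = SPECIAL_RE.sub(" ", text.lower())     # drop special characters
    tokens = []
    for token in text.split():                   # whitespace tokenization
        # Shortcut normalization may expand one token into several.
        tokens.extend(SHORTCUTS.get(token, token).split())
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return tokens[:max_tokens]                   # length limit as a token cap


def deduplicate(texts: list) -> list:
    """Keep the first of any texts that are identical after pre-processing."""
    seen, unique = set(), []
    for text in texts:
        key = " ".join(preprocess(text))
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


print(preprocess("LOL ang baha sa Maynila! https://t.co/xyz"))
# ['laughing', 'out', 'loud', 'baha', 'maynila']
```

Deduplicating on the pre-processed form, rather than the raw string, treats texts that differ only in case, punctuation, or URLs as duplicates; whether that is desirable depends on the downstream system.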
Target Venue(s):
Starting Reference(s):
Authors Nocon, Nicco; Oco, Nathaniel; Ilao, Joel; Roxas, Rachel Edita
Title Philippine component of the network-based ASEAN language translation public service
Publication 2014 International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM)
Year 2014
Publisher IEEE
Target Venue(s):
Starting Reference(s):
Authors Regalado, Ralph Vincent J; Chua, Jenina L; Co, Justin L; Tiam-Lee, Thomas James Z
Title Subjectivity Classification of Filipino Text with Features Based on Term Frequency-Inverse Document Frequency
Publication 2013 International Conference on Asian Language Processing (IALP)
Pages 113-116
Year 2013
Publisher IEEE
Twitter has been found to be a potentially useful source of information in times of disaster. As a
microblogging platform, it is often used for near-real-time updates. During disasters in particular, some
users post damage reports, request assistance, or look for missing persons. These posts could be useful to
entities such as government agencies that conduct disaster response. However, given the sheer volume of
tweets, scouring through them manually is hard; the task is sometimes likened to finding a needle in a
haystack. Automatic classification of relevant tweets is therefore useful in such situations. Students
interested in this area will experiment with different features (such as word embeddings) and classification
algorithms toward this goal.
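A classic baseline for the relevant/irrelevant tweet task is multinomial Naive Bayes over bag-of-words counts. The sketch below uses that in place of the word-embedding features mentioned above, purely as a starting point; the training tweets and labels are invented for illustration, and the class name NaiveBayes is hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training tweets, for illustration only.
tweets = [
    "need rescue flood rising please help",
    "roof destroyed need food and water",
    "missing person last seen near evacuation center",
    "watching a movie tonight",
    "happy birthday to my best friend",
    "great coffee this morning",
]
labels = ["relevant"] * 3 + ["irrelevant"] * 3
model = NaiveBayes().fit(tweets, labels)
print(model.predict("please help we need rescue"))  # relevant
```

Swapping the raw word counts for TF-IDF weights or averaged word embeddings, and comparing the result against other classifiers, is the kind of experiment this topic calls for.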
Target Venue(s):
Starting Reference(s):
Target Venue(s):
Starting Reference(s):
Authors Ligutom III, Cerino; Orio, Jay Vincent; Ramacho, Dyannah Alexa Marie; Montenegro, Chuchi; Roxas, Rachel Edita; Oco, Nathaniel
Title Using Topic Modelling to make sense of typhoon-related tweets
Publication 2016 International Conference on Asian Language Processing (IALP)
Pages 362-365
Year 2017
Publisher IEEE
Authors Soriano, Cheryll Ruth; Roldan, Ma Divina Gracia; Cheng, Charibeth; Oco, Nathaniel
Title Social media and civic engagement during calamities: the case of Twitter use during typhoon Yolanda
Publication Philippine Political Science Journal
Volume 37
Number 1
Pages 6-25
Year 2016
Publisher Routledge
Authors Syliongka, Leif Romeritch; Oco, Nathaniel; Lam, Alron Jan; Soriano, Cheryll Ruth; Roldan, Ma Divina Gracia; Magno, Francisco; Cheng, Charibeth
Title Combining Automatic and Manual Approaches: Towards a Framework for Discovering Themes in Disaster-related Tweets
Publication Proceedings of the 24th International Conference on World Wide Web
Pages 1239-1244
Year 2015
Publisher ACM
Information Extraction
Possible Topic: Visualizing Disaster Information Extracted from Philippine News Articles / Tweets
News articles and tweets contain a wealth of information about disasters before, during, and after they
happen. These sources mention typhoon names, date ranges of occurrence, locations hit, casualties, the
financial and material needs of the victims, and more. They also report donations to the victims, including
their type, from countries, organizations, and individuals. In this research, students will create an
automated way of extracting this information from these sources and visualizing it as the series of events
related to each typhoon.
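A first step toward such a timeline view is pulling a few fields out of each text and ordering them per typhoon. This is a minimal regex-based sketch, assuming English-style dates such as "November 8, 2016", phrases such as "12 dead", and names introduced by "Typhoon" or the Filipino "Bagyong" (an assumption about how local news phrases them). It covers only the typhoon name, date, and death toll; the sample sentences and the typhoon names Alpha and Beta are invented. Capturing locations, needs, and donations would likely require named-entity recognition rather than regexes.

```python
import re
from collections import defaultdict
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical patterns: "Typhoon X" / "Bagyong X", "Month D, YYYY", "N dead".
TYPHOON_RE = re.compile(r"\b(?:Typhoon|Bagyong)\s+([A-Z][a-z]+)")
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})")
DEATHS_RE = re.compile(r"\b(\d[\d,]*)\s+(?:dead|deaths|killed)\b")


def extract_event(text):
    """Pull typhoon name, date, and death toll (if any) from one text."""
    typhoon, when = TYPHOON_RE.search(text), DATE_RE.search(text)
    if not typhoon or not when:
        return None  # not enough to place the text on a timeline
    month, day, year = when.groups()
    deaths = DEATHS_RE.search(text)
    return {
        "typhoon": typhoon.group(1),
        "date": date(int(year), MONTHS.index(month) + 1, int(day)),
        "deaths": int(deaths.group(1).replace(",", "")) if deaths else None,
    }


def build_timeline(texts):
    """Group extracted events by typhoon, ordered by date."""
    timeline = defaultdict(list)
    for text in texts:
        event = extract_event(text)
        if event:
            timeline[event["typhoon"]].append((event["date"], event["deaths"]))
    for events in timeline.values():
        events.sort(key=lambda e: e[0])  # chronological order per typhoon
    return dict(timeline)


# Invented sample sentences with invented typhoon names.
docs = [
    "Typhoon Alpha made landfall on November 8, 2016; officials report 12 dead.",
    "Relief goods arrived as Typhoon Alpha weakened on November 10, 2016.",
    "Typhoon Alpha was first spotted on November 5, 2016.",
    "Bagyong Beta left 1,024 dead by July 3, 2017.",
]
for name, events in build_timeline(docs).items():
    print(name, events)
```

Each per-typhoon list of (date, deaths) pairs is already in the shape a timeline chart would plot along its x-axis.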
Target Venue(s):
Starting Reference(s):
https://www.aclweb.org/anthology/W/W14/W14-2905.pdf
https://www.aclweb.org/anthology/W/W16/W16-3906.pdf
https://www.aclweb.org/anthology/C/C08/C08-3001.pdf
Resources
http://bit.ly/1MpcFoT
Projects
LanguageTool: https://languagetool.org/
ASEANMT: http://aseanmt.org/
QuakeCAFE: http://quakecafe.org/
Online Tools
Twitter4J: http://twitter4j.org/en/
SentiWordNet: http://sentiwordnet.isti.cnr.it/
Weka: http://www.cs.waikato.ac.nz/ml/weka/