Research:Newsletter/2016/June: Difference between revisions


Revision as of 14:54, 10 July 2016

{{Wikipedia:Signpost/Template:Signpost-article-start|Using deep learning to predict article quality; search engine helps school kids navigate Chinese Wikipedia|By Morten Warncke-Wang,Liang (WMTW), Tsung-Ho Liang …| 4 July 2016}}


Using deep learning to predict article quality

Reviewed by Morten Warncke-Wang

A short paper presented at the Joint Conference on Digital Libraries titled "Quality Assessment of Wikipedia Articles Without Feature Engineering"^[1] uses deep learning to predict the quality of articles in the English Wikipedia. As the paper's title suggests, previous research on article quality has represented articles with a hand-picked set of features, whereas the promise of deep learning is that the machine learner will determine the best representation on its own.

Some representation of the articles still needs to be chosen, and the paper uses "Doc2Vec", an extension of Word2vec that uses unsupervised machine learning to learn vector representations of the articles. A benefit of this approach is that it is language-neutral, whereas other approaches might rely on language-specific features. These vectors are learned from a training set based on the Wikimedia Foundation's dataset of 30,000 English articles. A deep neural network built with Google's TensorFlow library is then trained on these vectors with the aim of predicting which of the English Wikipedia's assessment classes an article belongs to.

The performance of the classifier is compared to the current state of the art, which at the time of writing is WMF's own Objective Revision Evaluation Service (ORES; disclaimer: the reviewer is the primary author of the research upon which ORES' article quality classifier is built). Since the number of articles in each class is fairly balanced, the proportion of correctly classified instances (accuracy) is used as the performance measure. ORES is reported to be 60% accurate (it currently reports 61.9% accuracy), and the deep neural network was found to be 55% accurate. As pointed out in the paper, this work is a first step towards using deep learning for this task, so slightly lower performance is acceptable. The authors describe a couple of changes that will most likely improve the classifier and aim to make them in future work. Deep learning is an area where much interesting work is happening, and if it can improve our ability to automatically assess Wikipedia articles, a capability that is already useful to many Wikipedians through services like WikiProject X and SuggestBot, that is only for the better!
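The accuracy measure described above can be sketched in a few lines. This is a minimal illustration, not the paper's or ORES' evaluation code: the function names, the toy labels, and the predictions are all hypothetical, and the six class names are used only as example labels. It also shows why balanced classes matter: when every class is equally common, always guessing one class scores only one in six, so accuracy is not inflated by a dominant class.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Proportion of instances whose predicted class equals the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one instance")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def majority_baseline(actual):
    """Accuracy obtained by always guessing the most common class."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)


# Hypothetical toy data: two articles per class, so the classes are balanced.
actual = ["FA", "FA", "GA", "GA", "B", "B",
          "C", "C", "Start", "Start", "Stub", "Stub"]
# Hypothetical classifier output for the same twelve articles.
predicted = ["FA", "GA", "GA", "GA", "B", "C",
             "C", "B", "Start", "Start", "Stub", "Start"]

print(f"accuracy:          {accuracy(predicted, actual):.3f}")
print(f"majority baseline: {majority_baseline(actual):.3f}")
```

With strongly imbalanced classes the majority baseline would rise, and a measure other than plain accuracy would be a safer choice.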

Taiwanese researcher develops new ontology tool using Chinese Wikipedia to help primary and secondary school kids learn better

By Liang (WMTW) and Tsung-Ho Liang (Tainan, Taiwan)

Dr. Tsung-Ho Liang (梁宗賀)^{[supp 1]} is a systems analyst in the information center at the Tainan City Government's Bureau of Education. He is currently involved in studying big data in education, especially dealing with unstructured data and natural language processing techniques. In 2013, he started a project to integrate the contents of Chinese Wikipedia with the Chinese Knowledge and Information Processing (CKIP) technology and established a new search engine for Chinese Wikipedia^{[supp 2]} – WikiSeeker (維基嬉客).

WikiSeeker is a tailor-made search system built on the Wikipedia corpus. It aims to improve search effectiveness by answering students' queries in Chinese with structured association graphs of related Wikipedia articles, presented in the form of a knowledge map. First, it produces a knowledge map with clear relationships among the fields of knowledge, so students can easily identify the most important keywords in the content. Second, the search bar of WikiSeeker accepts natural-language queries, so students do not have to type keywords. You can see a tour of WikiSeeker on YouTube.

These two features make WikiSeeker intuitive and easy to use for K-12 students. The thesis "WikiSeeker─The Study of the Impact of a Search System with Structured Association Graphs on Learning Effectiveness"^[2] by the researcher Sheng-Nan Cheng (鄭盛南) compared two experimental groups: one group of students used Chinese Wikipedia directly to answer questions, and the other used the WikiSeeker website to answer the same questions. The results showed that the students who used WikiSeeker answered 10.8% more of the questions correctly (on average 15.8 out of 19, compared with 13.73 out of 19 for the group using Chinese Wikipedia directly). Moreover, female students and middle-achieving students showed the largest learning improvements when using WikiSeeker. The study concludes that WikiSeeker is a suitable way for students to acquire knowledge from Chinese Wikipedia.
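The reported gap can be checked with a little arithmetic. The sketch below assumes that the higher mean (15.8 out of 19) belongs to the WikiSeeker group, since that group is described as more correct, and that the 10.8% figure is a difference in the share of questions answered correctly rather than a relative increase. The variable and function names are illustrative only.

```python
TOTAL_QUESTIONS = 19

# Mean number of correct answers per group, as quoted in the text.
# Assumption: the higher mean belongs to the WikiSeeker group.
wikiseeker_mean = 15.8
wikipedia_mean = 13.73


def share_correct(mean_correct, total=TOTAL_QUESTIONS):
    """Fraction of the questions answered correctly on average."""
    return mean_correct / total


# Difference in share of correct answers, in percentage points.
diff_points = (share_correct(wikiseeker_mean) - share_correct(wikipedia_mean)) * 100

# Relative increase of the WikiSeeker mean over the Wikipedia mean, in percent.
relative_increase = (wikiseeker_mean / wikipedia_mean - 1) * 100

print(f"difference in share correct: {diff_points:.1f} percentage points")
print(f"relative increase:           {relative_increase:.1f}%")
```

The difference in shares comes out at roughly 10.9 percentage points, close to the reported 10.8%, while the relative increase is about 15%. This suggests the reported figure is a percentage-point difference, with the small gap plausibly due to rounding of the quoted means.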

References

↑ Dang, Quang Vinh; Ignat, Claudia-Lavinia (2016). "Quality Assessment of Wikipedia Articles Without Feature Engineering". Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries. JCDL '16. New York, NY, USA: ACM. pp. 27–30. ISBN 9781450342292. doi:10.1145/2910896.2910917.
↑ 鄭盛南 (Sheng-Nan Cheng): 維基嬉客(WikiSeeker) 一個結構化關聯圖之搜尋系統對於學生學習成效之研究 (WikiSeeker─The Study of the Impact of a Search System with Structured Association Graphs on Learning Effectiveness). http://handle.ncl.edu.tw/11296/ndltd/34378100977780630350 Thesis, National University of Tainan 2015 (in Chinese)

Supplementary references:

↑ "Dr. Tsung-Ho Liang Curriculum Vitae". Retrieved 2016-07-09.
↑ "Chinese knowledge and information processing". Retrieved 2016-07-09.

{{Wikipedia:Signpost/Template:Signpost-article-comments-end||2016-05-28|2016-07-31}}


@@ Line 20: / Line 20: @@
 ===Taiwanese researcher develop new ontology tool using Chinese Wikipedia to help primary and secondary school kids learn better===
-:''By [[User:Shangkuanlc|Liang (WMTW)]] and [[User:Protnmike|Tsung-Ho Liang (Tainan, Taiwan)]]''
+:''By [[User:Shangkuanlc|Liang (WMTW)]] and [[User talk:Protnmike|Tsung-Ho Liang (Tainan, Taiwan)]]''
 Dr. Tsung-Ho Liang (梁宗賀)<ref group=supp>{{cite web| url=https://sites.google.com/site/protnmike/ | title = Dr. Tsung-Ho Liang Curriculum Vitae | accessdate=2016-07-09}}</ref> is a systems analyst in the information center at the [[Tainan]] City Government's Bureau of Education. He is currently involved in studying big data in education, especially dealing with unstructured data and natural language processing techniques. In 2013, he started a project to integrate the contents of Chinese Wikipedia with the Chinese Knowledge and Information Processing (CKIP) technology and established a new search engine for Chinese Wikipedia<ref group=supp>{{cite web|title=Chinese knowledge and information processing|url=http://ckip.iis.sinica.edu.tw/CKIP/engversion/|accessdate=2016-07-09}}</ref> – [http://seeker.tn.edu.tw/ WikiSeeker (維基嬉客)].