2022 Volume 30 Pages 352-360
Assessing whether an ungraded second language learner can read a given text quickly is important for supporting learners of diverse backgrounds. Second language acquisition (SLA) studies have tackled such assessment tasks, in which only a single short vocabulary test result is available for assessing a learner. These studies have shown that text coverage, namely the percentage of words in the text that the learner knows, is the key assessment measure. Currently, count-based percentages are used: each word in the given text is classified as known or unknown to the learner, and the words classified as known are simply counted. When each word is classified, we can also obtain an uncertainty value indicating how likely it is that the learner knows the word. However, how to leverage these informative values while guaranteeing an assessment measure comparable to the conventional count-based one remains unclear. We propose a novel framework that allows assessment methods to be uncertainty-aware while guaranteeing comparability with the text-coverage threshold. Such methods involve a computationally complex problem, for which we also propose a practical algorithm. In an evaluation using a newly created crowdsourcing-based dataset, the best method under our framework outperformed conventional methods.
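For concreteness, the count-based measure described above can be written as follows; the uncertainty-aware expected-coverage form shown beside it is only an illustrative sketch of one possible variant, and the symbols $N$, $\hat{y}_i$, and $p_i$ are notation introduced here for illustration rather than the paper's own formulation.

\[
c_{\text{count}} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\!\left[\hat{y}_i = \text{known}\right]
\qquad \text{vs.} \qquad
\hat{c} = \frac{1}{N} \sum_{i=1}^{N} p_i ,
\]

where $N$ is the number of word tokens in the text, $\hat{y}_i$ is the hard known/unknown decision for the $i$-th word, and $p_i$ is the estimated probability that the learner knows that word. In either case, the learner is judged able to read the text when the coverage measure exceeds a text-coverage threshold, such as the 95%-98% levels commonly cited in SLA research.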