Leveraging Temporal Word Embeddings for the Detection of Scientific Trends
Dridi, Amna (2021) Leveraging Temporal Word Embeddings for the Detection of Scientific Trends. Doctoral thesis, Birmingham City University.
Full text: Amna Dridi PhD Thesis published_Final version_Submitted Feb 2021_Final Award Jul 2021.pdf (Accepted Version, 5MB)
Abstract
Tracking the dynamics of science and detecting emerging research trends early could revolutionise the way research is done. For this reason, the computational history of science and trend analysis have become important areas in academia and industry, owing to their significant implications for research funding and public policy. The literature presents several approaches to detecting new research trends, most of which rely mainly on citation counting. While citations have been widely used as indicators of emerging research topics, they have several limitations. Most importantly, citations can take months or even years to accumulate and reveal trends. Furthermore, they do not take the content of the papers into account.
To overcome these limitations, this thesis leverages a natural language processing method, namely temporal word embeddings, which learns semantic and syntactic relations among words over time. The principal objective of this method is to study how the pairwise similarity between scientific keywords changes over time, which helps to track the dynamics of science and detect emerging scientific trends. To this end, this thesis proposes a methodological approach to tuning the hyper-parameters of word2vec (the word embedding technique used in this thesis) on scientific text. It then provides a suite of novel approaches that perform the computational history of science by detecting emerging scientific trends and tracking the dynamics of science. Emerging scientific trends are detected by two approaches, Hist2vec and Leap2Trend, which are devoted to the detection of converging keywords and contextualising keywords, respectively. The dynamics of science, on the other hand, are tracked by Vec2Dynamics, which traces the evolution of the semantic neighbourhood of keywords over time.
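The core idea of following keyword-pair similarities across time slices can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation of Hist2vec, Leap2Trend or Vec2Dynamics: it assumes one embedding table per year (in practice each would come from a word2vec model trained on that year's papers), and the toy vectors, the `slices` dictionary and the helpers `pair_similarity_series`, `nearest_neighbours` and `neighbourhood_overlap` are hypothetical. A rising similarity series for a pair hints at converging keywords, while a low overlap between a keyword's nearest neighbours in consecutive years hints at a shift in its semantic neighbourhood (Jaccard overlap is used here purely for illustration).

```python
import math
from typing import Dict, Sequence, Set

# One embedding table per time slice: keyword -> vector.
# Hypothetical toy values; real tables would come from word2vec models
# trained separately on each year's papers.
Embeddings = Dict[str, Sequence[float]]

slices: Dict[int, Embeddings] = {
    2010: {"deep": [1.0, 0.0], "learning": [0.0, 1.0], "neural": [0.9, 0.1]},
    2015: {"deep": [1.0, 0.2], "learning": [0.8, 0.6], "neural": [0.9, 0.3]},
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_similarity_series(
    tables: Dict[int, Embeddings], w1: str, w2: str
) -> Dict[int, float]:
    """Similarity of a keyword pair in each slice where both occur, in time order."""
    return {
        t: cosine(emb[w1], emb[w2])
        for t, emb in sorted(tables.items())
        if w1 in emb and w2 in emb
    }


def nearest_neighbours(emb: Embeddings, word: str, k: int) -> Set[str]:
    """The k words closest to `word` within a single slice."""
    scored = sorted(
        ((cosine(emb[word], vec), other) for other, vec in emb.items() if other != word),
        reverse=True,
    )
    return {other for _, other in scored[:k]}


def neighbourhood_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two neighbour sets: 1.0 means the neighbourhood is unchanged."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # A rising similarity between "deep" and "learning" suggests a converging pair.
    print(pair_similarity_series(slices, "deep", "learning"))
    before = nearest_neighbours(slices[2010], "deep", k=1)
    after = nearest_neighbours(slices[2015], "deep", k=1)
    print(before, after, neighbourhood_overlap(before, after))
```

Because each similarity and each neighbour set is computed within a single year's vector space, the series can be compared across independently trained models without first aligning their spaces: cosine similarity is unchanged by any rotation of a space.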
All of the proposed approaches have been applied to the area of machine learning and validated against different gold standards. The results reveal the effectiveness of the proposed approaches in detecting trends and tracking the dynamics of science. More specifically, Hist2vec strongly correlates with citation counts, with a 100% positive Spearman's correlation. Additionally, Leap2Trend achieves more than 80% accuracy and 90% precision in detecting emerging trends. Vec2Dynamics also shows great potential for tracing the history of the machine learning literature in exact agreement with the machine learning timeline. These findings demonstrate the utility of the proposed approaches for performing the computational history of science.
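Spearman's rank correlation compares the orderings of two lists rather than their raw values, so a 100% positive correlation (ρ = 1) means the two rankings agree perfectly. The sketch below is a minimal, assumption-labelled illustration of the metric: it computes ρ as the Pearson correlation of the two rank vectors, giving tied values their average rank. The helper names and the scores in the usage example are hypothetical and are not results from the thesis; in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when all values in a sequence are tied")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical per-keyword scores and citation counts with the same ordering.
    method_scores = [0.12, 0.35, 0.58, 0.81]
    citations = [3, 10, 25, 60]
    print(round(spearman_rho(method_scores, citations), 6))  # 1.0
```

Only the order matters: the citation counts grow much faster than the scores, yet ρ is 1 because both lists rank the items identically.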
| Field | Value |
|---|---|
| Item Type | Thesis (Doctoral) |
| Dates | Submitted: 8 February 2021; Accepted: 1 July 2021 |
| Uncontrolled Keywords | Scholarly data mining, trend analysis, computational history of science, machine learning, temporal word embeddings |
| Subjects | CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science; CAH11 - computing > CAH11-01 - computing > CAH11-01-08 - others in computing |
| Divisions | Doctoral Research College > Doctoral Theses Collection; Faculty of Computing, Engineering and the Built Environment > College of Computing |
| Depositing User | Jaycie Carter |
| Date Deposited | 11 Jul 2022 16:01 |
| Last Modified | 11 Jul 2022 16:01 |
| URI | https://www.open-access.bcu.ac.uk/id/eprint/13410 |