Results in Engineering
journal homepage: www.sciencedirect.com/journal/results-in-engineering
Technical Note
ARTICLE INFO

Keywords: Sustainability; United Nations; Sustainable Development Goals; Artificial Intelligence; Aerospace Engineering

ABSTRACT

The 2030 Agenda of the United Nations (UN) revolves around the Sustainable Development Goals (SDGs). A critical step towards that objective is identifying whether scientific production aligns with the SDGs' achievement. To assess this, funders and research managers need to manually estimate the impact of their funding agenda on the SDGs, focusing on accuracy, scalability, and objectiveness. With this objective in mind, in this work, we develop ASDG, an easy-to-use Artificial-Intelligence-based model for automatically identifying the potential impact of scientific papers on the UN SDGs. As a demonstrator of ASDG, we analyze the alignment of recent aerospace publications with the SDGs. The Aerospace data set analyzed in this paper consists of approximately 820,000 papers published in English from 2011 to 2020 and indexed in the Scopus database. The most-contributed SDGs are 7 (on clean energy), 9 (on industry), 11 (on sustainable cities), and 13 (on climate action). The establishment of the SDGs by the UN in the middle of the 2010s did not significantly affect the data. However, we find clear discrepancies among countries, likely indicative of different priorities. Also, different trends can be seen in the most and least cited papers, with apparent differences in some SDGs. Finally, the number of abstracts the code cannot identify decreases with time, possibly showing the scientific community's growing awareness of the SDGs.
1. Introduction

In 2015 all member states of the United Nations (UN) adopted the 2030 Agenda for Sustainable Development. The UN intends to promote peace and prosperity for people and the planet with a vision for the near future. To make that vision a reality, the 2030 Agenda consists of 17 Sustainable Development Goals (SDGs) [1]. They represent the actions that countries from all over the world (both developed and developing) should implement as global cooperation for the future of our planet.

The 17 SDGs, see description in [1], are as follows (those most closely related to Aerospace Engineering have been written in italic font):

• SDG 1: End poverty in all its forms everywhere.
• SDG 2: End hunger, achieve food security and improved nutrition and promote sustainable agriculture.
• SDG 3: Ensure healthy lives and promote well-being for all at all ages.
• SDG 4: Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all.
• SDG 5: Achieve gender equality and empower all women and girls.
* Corresponding author.
E-mail address: [email protected] (S. Hoyas).
https://doi.org/10.1016/j.rineng.2023.100940
Received 4 November 2022; Received in revised form 9 January 2023; Accepted 3 February 2023
Available online 10 February 2023
2590-1230/© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
A. Sánchez-Roncero, Ò. Garibo-i-Orts, J.A. Conejero et al. Results in Engineering 17 (2023) 100940
• SDG 6: Ensure availability and sustainable management of water and sanitation for all.
• SDG 7: Ensure access to affordable, reliable, sustainable, and modern energy for all.
• SDG 8: Promote sustained, inclusive, and sustainable economic growth, full and productive employment, and decent work for all.
• SDG 9: Build resilient infrastructure, promote inclusive and sustainable industrialization, and foster innovation.
• SDG 10: Reduce inequality within and among countries.
• SDG 11: Make cities and human settlements inclusive, safe, resilient and sustainable.
• SDG 12: Ensure sustainable consumption and production patterns.
• SDG 13: Take urgent action to combat climate change and its impacts.
• SDG 14: Conserve and sustainably use the oceans, seas, and marine resources for sustainable development.
• SDG 15: Protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, and halt and reverse land degradation and halt biodiversity loss.
• SDG 16: Promote peaceful and inclusive societies for sustainable development, provide access to justice for all and build effective, accountable and inclusive institutions at all levels.
• SDG 17: Strengthen the means of implementation and revitalize the Global Partnership for Sustainable Development.

The two main questions this article aims to address are: is the Aerospace Engineering scientific community focused on fulfilling the SDGs? What are the most relevant SDGs in this community? To the best of our knowledge, there is no published work about this relationship. In this work, our answer is given using Artificial Intelligence (AI) tools.

It is important to note that the recent paradigm change introduced by the fast digitalization of business, academia, daily life, and even policy-making is profound. A recent study by Vinuesa et al. [2] examined in depth how AI affects the accomplishment of the UN's 2030 Agenda. Although they found that 79% of the targets would be positively affected by AI, they also noted that the growth of AI could hinder or even have a detrimental impact on the achievement of 35% of these targets. The SDGs are all interconnected, and while there are numerous synergies, it is vital to recognize and properly document any trade-offs to reach the full potential of AI's ability to contribute to creating a sustainable future. Furthermore, Gupta et al. [3] extended their work to discussions on the implications of AI on the SDGs at the indicator level. In this regard, it is crucial to emphasize that implementing clear and understandable strategies requires employing AI-based technologies to achieve the SDGs. According to Vinuesa and Sirmacek [4], deploying interpretable AI would produce an algorithmic usage that focuses on accountability and transparency.

With this in mind, a preliminary version of our code ASDG (Automatic Classification of Impact to Sustainable Development Goals) can be found in [5]. We believe that a promising way to achieve significant progress in the SDG Agenda is by using AI-based methods to inform policy decisions to maximize the synergies and minimize the trade-offs. With this goal in mind, we created ASDG. This AI-based framework constitutes a step in this direction by enabling the automatic classification of hundreds of thousands of scientific papers by their impact on each SDG.

A critical example of why a funder needs to understand its position regarding the SDGs is climate change. The SDGs must be accomplished while we are amid a climate emergency, as confirmed in the latest report of the Intergovernmental Panel on Climate Change [6]. This is particularly important in the case of Aerospace Engineering [7,8]. To illustrate the importance of aerodynamics, about a quarter of today's energy is spent moving fluids along pipes or vehicles through air or water. Turbulence dissipates 25% of this energy, which is responsible for up to 5% of the CO2 dumped by humanity every year [9]. Considering that 340 billion liters of fuel were used in 2017 for air transportation worldwide (as reported by IATA [10]), there is considerable potential for energy savings and fuel consumption reduction. Before the coronavirus disease 2019 (COVID-19) pandemic, this quantity grew yearly at an unsustainable 3% rate.

To summarize, ASDG can be seen as a contribution to SDG 17. However, we will not consider this SDG as it is the most difficult to identify. Certainly, few papers coming from the aerospace world are devoted to this issue. Apart from this SDG, ASDG is ready to analyze any field of science and technology. This work presents the first application of ASDG: Aerospace Engineering. A summary of ASDG is given in the next section, together with a description of the database. The results are explained in the third section. Conclusions and future work are described in the last section.

2. Methods

The code employed for this article, ASDG, can identify the connection between a paper and an SDG through its abstract. It uses four different models: Non-Negative Matrix Factorization (NMF) [11], Distributed Representations of Topics (Top2Vec) [12], Latent Dirichlet Allocation (LDA) [13], and BERTopic [14]. Due to their inherently different nature, the information that each model extracts from a text is different. In other words, their functionalities are complementary. To take advantage of this fact, ASDG introduces a voting mechanism. Similar ideas have been used very recently for studying the social network Twitter [15]. In the voting stage, ASDG takes the scores of each model for each text as inputs. Using this information, ASDG decides which identified SDGs have enough confidence to assume that the text relates to them.

The validation of ASDG was carried out in a previous publication [5]. The model's training (based on 510 manually-curated text files related to each SDG) was described in that work. Briefly, after downloading all papers referenced in [2], for a total of 186 works, we manually selected papers with clearly differentiated Abstract and Body sections, extracting these sections in 40% of them. A Deep Neural Network [16] was used to extract the remaining 60% automatically. This tool works on page images instead of converting the PDF file to text. We validated this tool with the extracted PDF files and checked every abstract. As the authors of [2] classified all these papers based on an expert consensus, we took their classification as the reference labels, obtaining an 81% agreement.

The methods mentioned above are briefly described next.

2.1. NMF

Non-negative Matrix Factorization (NMF) [11] can reduce the dimension of the problem space, extracting essential features. We consider 16 topics, as SDG 17 is currently not considered. All training and validation texts have been preprocessed. This includes:

• Word lemmatization and stop-word removal.
• Removing numeric and non-ASCII characters.
• The word-frequency and document-frequency thresholds were set to 1. This configuration means that no words are excluded.
• Bigrams were allowed.
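The preprocessing steps listed for the NMF model in Section 2.1 can be illustrated with a minimal, standard-library-only sketch. It is not the ASDG implementation: the function name `preprocess`, the tiny `STOP_WORDS` set, and the underscore-joined bigram format are all hypothetical. Lemmatization is left out (it would normally be delegated to an NLP library), and punctuation is stripped together with digits, which is an additional assumption.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a full one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are"}


def preprocess(text):
    """Return unigram and bigram tokens after basic cleaning.

    Lemmatization is intentionally omitted in this sketch.
    """
    # Drop non-ASCII characters.
    ascii_text = text.encode("ascii", errors="ignore").decode("ascii")
    # Replace digits (and, as an assumption, punctuation) with spaces.
    letters_only = re.sub(r"[^A-Za-z]+", " ", ascii_text).lower()
    # Remove stop words.
    words = [w for w in letters_only.split() if w not in STOP_WORDS]
    # Bigrams are allowed: join adjacent words with an underscore.
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


tokens = preprocess("The 2 engines reduce CO₂ emissions")
print(tokens)
```

Because no minimum-frequency filter is applied, every remaining word is kept, which matches the statement that no words are excluded.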
All training texts are automatically identified with the appropriate SDG, using this information to associate each topic with one SDG. The score corresponding to each topic for each text file is queried after the model has been trained. The SDGs labeled for that text are multiplied by that score, then recorded in a topic association map (nTopics × 17). The values for each topic are normalized (values/sum(values)), and those topics with scores of less than 0.1 are discarded. The final result is a matrix, where each row represents the likelihood that each SDG will be associated with a particular and single topic.

2.2. Top2Vec

A Top2Vec model [12] was trained using the embedding model "all-MiniLM-L6-v2". This embedding was pre-trained on a larger corpus, which works better when the training corpus is small. A light preprocessing is required to remove non-ASCII characters. In this case, no document segmentation is defined. The extraction of topics was unsupervised. Since the association of the training texts with the SDGs was known beforehand, we queried the associated texts and their scores for each topic, creating an association matrix as was done with the NMF model.

2.3. LDA

A latent-Dirichlet-allocation model [13] was also trained with the following configuration:

• Number of topics: 16.
• Passes: 400. Iterations: 1000. Chunk size: 2000.
• Bigrams are allowed.
• Minimum word count: 1. Maximum word frequency: 0.7.

The training and validation texts were preprocessed similarly to the NMF case. In this case, the model assumes that the documents follow a Dirichlet distribution over topics and topics over words. Thus, it inherently allows having more than one topic in each document. The association matrix was calculated as with the other models. Only the UN training texts were used. Note that this method has been successfully employed to automatically classify the AI curricula of a wide range of universities based on their respective contents [17].

2.4. BERTopic

BERTopic is a topic modeling technique very similar to Top2Vec since both are unsupervised clustering-based techniques [14]. BERTopic extracts coherent topic representations by implementing a class-based variation of the term frequency-inverse document frequency (TF-IDF). The steps it follows are:

• Generating the document embeddings with a pretrained transformer-based language model. The embedded words which are semantically similar will be placed close to each other in semantic space. In this way, document-level information is extracted from the corpora.
• The document embeddings are dimensionally reduced. This is because, as the dimensionality of the data increases, the distance to the closest point tends to approach the distance to the farthest point. As a result, in high dimensional space, spatial locality becomes ill-defined, and distance measures differ little [14].
• The documents are clustered with a density-based method. A centroid-based perspective assumes that words near the cluster's centroid are most representative of that cluster. However, in practice, a cluster will not always lie within a sphere around its centroid, which might lead to the extraction of misleading topics.
• Topic vectors are extracted from the clusters. A class-based version of TF-IDF is used to overcome the limitation of the centroid-based perspective. This has the advantage of separating the clustering technique from the topic generation, allowing more flexibility.

2.5. Voting

A combination of the previously described models is used to take advantage of their respective strengths, as the models complement each other. After a careful study, one document is linked to an SDG if:

• Any model's score on an SDG is greater than 0.4 (maximum 0.5), or
• The model's score on an SDG is greater than 0.1 for LDA and BERTopic.

Using this voting system, we successfully classified 81% of the papers based only on the information in the abstract.

2.6. Database and implementation

Regarding the database, we have downloaded 820,000 documents, comprising articles, conference papers, and books from the Scopus database [18]. The search criterion relied on seeking the words "aerospace," "aeronautics," "aeronautical," and "aviation" in all the metadata of the papers. We selected papers from 2011 to 2020, saving the following data:

• Abstract
• Year
• Citations, as of November 2022.
• Country
• Keywords
• Open-access information

For obvious reasons, the language of the document must be English. This procedure may over-represent English-speaking affiliations, and some papers of one of the authors, i.e., [19,20], are not found based on these keywords. However, the list of special cases can be extremely long, and it is nearly impossible to add every possible author to the list. Nevertheless, the number of papers studied is high. We firmly believe it represents the state of Aerospace Engineering with respect to the SDGs, as we are analyzing a production of more than 80,000 papers yearly. The set was downloaded in packages of around 20,000 documents each, taking special care not to repeat any document. Finally, around one hundred papers were discarded because they did not contain an abstract.

To summarize, and following the flowchart of Fig. 1, for every document, we have performed the following algorithm:

1. Extract the abstract and metadata from a CSV file.
2. Lemmatize and remove any non-ASCII character.
3. Compute the score $\alpha_x^{\mathrm{sdg}}$ for every SDG and every method.
4. Evaluate the score for every SDG, following the rules of the box of Fig. 1.
5. Extract the SDG with the maximum score.
6. Save this SDG with the document's metadata to an output file.

This algorithm was implemented in Python version 3.9. The code is easily parallelizable, as every document can be run independently. We ran it on a typical computer, taking less than 3 hours to classify all the abstracts.

3. Results

To study the results of our analysis, we will use the term frequency, defined as

$$F = \frac{D_{\mathrm{sdg}}}{D_{\mathrm{total}}}.$$
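As a small illustration of the frequency F defined above, the following sketch counts documents per SDG and divides by the total. It assumes that D_sdg is the number of documents linked to a given SDG and that D_total is the number of all documents, including those that could not be linked to any SDG; the definitions of both symbols are not given in the text here, so this reading is an assumption, as are the function name and the toy labels.

```python
from collections import Counter


def sdg_frequency(labels):
    """Return F = D_sdg / D_total for each SDG present in `labels`.

    `labels` holds one entry per document: an SDG number, or None when the
    abstract could not be linked to any SDG (assumption: such documents
    still count towards D_total).
    """
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(label for label in labels if label is not None)
    return {sdg: n / total for sdg, n in counts.items()}


# Toy labels, not data from the paper.
freq = sdg_frequency([7, 9, 7, None, 13, 7, 9, None])
print(freq)
```

Applying the same function to the labels of a single year gives the per-year frequencies.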
Fig. 1. Flowchart of the ASDG framework, where $\alpha_x^{\mathrm{sdg}}$ stands for the score in method $x$ for SDG sdg. This process has been carried out for the 820,000 documents in the database.
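The decision step of the flowchart can be sketched in a few lines of Python, using the thresholds stated in Section 2.5 (0.4 for any single model, 0.1 for LDA and BERTopic). This is an illustrative reading, not the ASDG code: the names `linked_sdgs` and `pick_sdg`, the nested-dictionary layout `scores[model][sdg]`, the requirement that LDA and BERTopic both exceed 0.1, and the use of the largest per-model score to rank SDGs are all assumptions.

```python
MODELS = ("nmf", "top2vec", "lda", "bertopic")


def linked_sdgs(scores, high=0.4, low=0.1):
    """Apply the voting rule to `scores[model][sdg]` for one abstract.

    Returns {sdg: best_score} for every SDG that passes the rule.
    """
    sdgs = set()
    for model in MODELS:
        sdgs.update(scores.get(model, {}))
    passed = {}
    for sdg in sorted(sdgs):
        by_model = {m: scores.get(m, {}).get(sdg, 0.0) for m in MODELS}
        any_high = any(s > high for s in by_model.values())
        # Assumed reading: LDA and BERTopic must both exceed the low threshold.
        lda_and_bertopic = by_model["lda"] > low and by_model["bertopic"] > low
        if any_high or lda_and_bertopic:
            # Assumption: an SDG is ranked by its largest per-model score.
            passed[sdg] = max(by_model.values())
    return passed


def pick_sdg(scores):
    """Return the passing SDG with the largest score, or None if none passes."""
    passed = linked_sdgs(scores)
    if not passed:
        return None
    return max(passed, key=passed.get)


# Toy scores for one abstract, not data from the paper.
example = {
    "nmf": {7: 0.45, 13: 0.05},
    "top2vec": {7: 0.20, 9: 0.30},
    "lda": {13: 0.15, 9: 0.05},
    "bertopic": {13: 0.12},
}
print(linked_sdgs(example), pick_sdg(example))
```

In this toy case SDG 7 passes through the first rule, SDG 13 through the second, and SDG 9 passes neither, so the abstract would be assigned to SDG 7. An abstract for which no SDG passes is left unidentified.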
Table 1
Frequency expressed as a percentage of selected SDGs in 2011 and 2020. The
last row shows the difference between these two rows.
SDG 3 7 9 11 12 13 15
• This study has been done with abstracts, which makes the identifi-
cation more difficult.
• We have used a high threshold, avoiding false identifications as
much as possible.
Fig. 3. Distribution of the database in terms of absolute numbers (left) and frequency (right) for every year. The SDGs are represented by their colors, starting from
1 at the bottom and following the order in Fig. 2 in a counterclockwise sense. The black region corresponds to unidentified abstracts, and the white dotted lines
indicate the transition between the three large groups.