TU Berlin, Einsteinufer 25, 10587 Berlin, Germany, email: [email protected]

Next-Gen Software Engineering:
AI-Assisted Big Models

Ina K. Schieferdecker (ORCID: 0000-0001-6298-2327)
Abstract

The effectiveness of model-driven software engineering (MDSE) has been demonstrated in the context of complex software. However, MDSE has not been widely adopted because of the effort required to develop and maintain models and the specific modelling competencies it demands. Concurrently, artificial intelligence (AI) methods, particularly machine learning (ML) methods, have demonstrated considerable capabilities when applied to the huge code bases available on open-source coding platforms. This so-called big code provides the basis for significant advances in empirical software engineering, in the automation of coding processes, and in AI-based improvements of software quality. The objective of this paper is to bring together two significant domains of software engineering (SE), namely models and AI in SE. The paper provides an overview of the current status of AI-assisted software engineering. On this basis, it puts forth a vision of AI-assisted Big Models in SE that aims to capitalise on the advantages of both approaches in software development. Finally, it proposes the new paradigm of pair modelling in MDSE.

Keywords:
Software Engineering, Model-Driven Engineering, Artificial Intelligence, Big Models, AI4SE


1 Introduction

Software engineering (SE) is a field of informatics/computer science that addresses the development and analysis of systematic approaches for the design, development, verification & validation and maintenance of software and of software-based systems. (In this paper, the term "software" is used as a collective term for computer programs and related SE artefacts.) It establishes principles and identifies best practices for the production and operation of software, with the aim of ensuring the reliability, security, scalability and maintainability of software and its alignment with user, business and societal requirements. The objective of software engineering is the production of high-quality software that is cost-effective, delivered in a timely manner and readily adaptable as requirements evolve.

The term software engineering was coined in 1967 in response to the very first software crisis Valdez (1988), but software engineering practices and the resulting software quality have still not kept pace with the quality levels required in critical domains and application contexts. Recently, another dramatic story was added to the software horror show: the root cause analysis of the outage of millions of Windows systems caused by the CrowdStrike Falcon sensor CrowdStrike (2024) found that the number of fields in an IPC template type was not validated, a runtime array bounds check was missing, the content validator contained a logic error, template type testing was too limited, and template instances were not tested within the content interpreter. This was unprofessional software development that ignored the state of the art of SE.

However, software outages are not the only indication that SE still has no full answers to the challenges posed by the increasing complexity of software-based systems and the diversity of requirements placed on them. Shah et al. (2012) found that "[t]here exists a statistically significant medium sized difference between open and closed source projects: the former have a DD [(defect density)] that is 4 defects per KLoC lower than the latter. Java projects exhibit a significantly lower DD than C projects, 4.1 defects per KLoC on average: In general the Size appears to be negatively correlated to DD: the larger the project the lower the DD. In particular, large projects are 10 times less defective than medium ones." According to McConnell (2004), the defect density of industrial software projects is estimated to fall within the range of 1 to 25 defects per KLoC. This suggests that a program comprising one million lines of code (MLoC) may contain between 1,000 and 25,000 defects. Since software sizes in lines of code are well documented, the approximate number of defects in a given piece of software is straightforward to estimate. To illustrate, Softonic (2023) compares typical software such as operating systems, browsers, and office suites: the Windows 10 operating system has approximately 80 MLoC, Ubuntu 50 MLoC, MacOS X 84 MLoC, Android 12 MLoC, iOS 12 MLoC, the browser Google Chrome 6.7 MLoC, Mozilla Firefox 21 MLoC, and the office suites Microsoft Office 2013 45 MLoC, Apache OpenOffice 19 MLoC, and LibreOffice 10 MLoC. By this estimate, even the smallest of these systems may contain thousands of defects, so the challenges associated with developing correct and suitable software remain significant.
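To make the arithmetic above explicit, the following sketch multiplies each system size listed by Softonic (2023) by McConnell's range of 1 to 25 defects per KLoC. It is a back-of-the-envelope estimate that assumes the industrial range applies uniformly to every product; it does not predict actual defect counts.

```python
# Defect-density range for industrial software (defects per KLoC),
# as reported by McConnell (2004).
DD_MIN, DD_MAX = 1, 25

# Approximate sizes in million lines of code (MLoC), as listed by Softonic (2023).
SIZES_MLOC = {
    "Windows 10": 80,
    "Ubuntu": 50,
    "MacOS X": 84,
    "Android": 12,
    "iOS": 12,
    "Google Chrome": 6.7,
    "Mozilla Firefox": 21,
    "Microsoft Office 2013": 45,
    "Apache OpenOffice": 19,
    "LibreOffice": 10,
}


def defect_range(mloc: float) -> tuple[int, int]:
    """Return the (min, max) estimated defect count for a system of `mloc` MLoC."""
    kloc = mloc * 1000  # 1 MLoC = 1,000 KLoC
    return round(kloc * DD_MIN), round(kloc * DD_MAX)


for name, mloc in SIZES_MLOC.items():
    low, high = defect_range(mloc)
    print(f"{name:22s} {mloc:5} MLoC -> {low:>9,} to {high:>9,} defects")
```

For Windows 10, for instance, the estimate ranges from 80,000 to 2,000,000 defects.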

Brooks (1987) argued that the primary challenge in software development lies in the specification, design, and testing of complex systems, rather than in the labour of representing the system and testing the fidelity of that representation. He emphasised the distinction between essential difficulties (inherent complexity) and accidental difficulties (extraneous challenges), stating that past advances, such as high-level programming languages, have only reduced the latter. He concluded that addressing accidental difficulties would not lead to major improvements in software development, as the essential difficulties remain. Fraser et al. (2007) critiqued the notion of "accidental difficulties", arguing that these so-called accidents are often the result of negligence or poor practices, not mere happenstance. They advocated a disciplined, science-based, and model-driven approach to software development, similar to traditional engineering disciplines. Building on this, Schieferdecker (2020) highlighted the growing societal dependence on autonomous and intelligent software systems, suggesting that new approaches are needed not only to ensure traditional software quality facets (safety, security, etc.), but also to address socio-technical and socio-political implications, particularly in human-machine collaboration.

In model-driven development (MDD, Selic (2003)), also known as model-driven software engineering Schieferdecker (2024), models are primary artefacts that are engineered to facilitate the top-down construction of complex software. Furthermore, models are used at runtime to enable the monitoring, verification & validation of software operations, as well as the optimisation of software performance Hailpern and Tarr (2006); France and Rumpe (2007); Weißleder and Lackner (2013).

As with modelling, which has long been an integral part of software engineering, artificial intelligence (AI) methods have been employed in software engineering activities from the outset, see for example Barstow (1988). Reflecting significant advancements in AI-driven software engineering, Gartner's 2021 hype cycle for emerging technologies Burke et al. (2021) projected that AI-augmented software engineering would reach the peak of inflated expectations and subsequently the plateau of productivity within five to ten years. Since 2023, Gartner has estimated that AI-augmented software engineering will reach the plateau of productivity within only two to five years, whereas earlier iterations of the hype cycle identified AI-augmented development and AI-assisted design as separate emerging technologies on the rise. It is interesting to observe the evolving business expectations and the convergence of the trends for AI-supported design and development into a unified technological trend for AI-supported engineering. Moreover, it is encouraging to see that software engineering is receiving particular attention and priority, which indicates the growing importance of this field.

This paper addresses the role of models in software engineering in Section 2, the evolution of big code on coding platforms in Section 3, and the use of AI for coding and software engineering in Section 4, where it presents a novel taxonomy of AI for software engineering (AI4SE) to facilitate a deeper understanding of this evolving field. Section 5 examines the practice of contributing software models to coding platforms, as is increasingly done with software code. In line with the increasing prevalence of such large model collections, it examines recent developments in AI-enhanced modelling and model use in software engineering. It also presents an updated version of the AI4SE taxonomy, designated AI4BM, to outline the current status and future prospects of AI applications for Big Models. A summary and outlook conclude the paper.

2 Models in Software Engineering

According to IEEE (2022) (see Figure 1), the software lifecycle consists of distinct phases and activities that can be described as knowledge areas, including requirements, architecture, design, construction, testing and maintenance. There are also knowledge areas that deal with the fundamentals of computer science, mathematics and engineering, as well as cross-cutting activities pertaining to software quality, security, configuration management and engineering management. Furthermore, there are cross-cutting knowledge areas that encompass the processes, operations, professional practice and economics of software engineering, along with a distinct area for software engineering models and methods. This knowledge area encompasses modelling processes and methods, the various types of models, including information, behavioural and structural models, and the analysis of models, see also Schieferdecker (2024).

Figure 1: Software Engineering knowledge areas according to SWEBOK IEEE (2022)

Models in SE are key to managing the complexity of software. They provide the essential abstractions to capture requirements, support design decisions, and offer comprehensive overviews of software structures and behaviours. They are essential tools in the engineering process for constructing and maintaining software. They are also vital for configuration, monitoring and other runtime support during software operation and management.

The traditional understanding of model-driven software engineering (MDSE) has evolved over time:

  1. In addition to explicit model development and top-down model use, models are also extracted from software executions, with the objective of avoiding the costs of model development on the one hand and leveraging the benefits of MDSE on the other Bagheri and Sullivan (2013).

  2. The combination of developing model artefacts with extracting models from code artefacts has resulted in top-down/bottom-up MDSE approaches Steffen et al. (2007) that benefit from both directions of MDSE, as shown in Vaupel et al. (2015); Weißleder and Lackner (2013) or Garcia et al. (2023).

  3. Furthermore, as the concepts underlying programming became increasingly abstract, low-level information and structure models have also become integral to conventional programming. This is exemplified by the data structures often implicitly defined in JSON Lv et al. (2018), XML schemata, or SQL table specifications.
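Item 3 notes that JSON data often implies an information model without stating it explicitly. A minimal sketch of such bottom-up extraction, assuming that a model is simply a nested mapping from field names to type names, could look as follows. The functions `infer_model` and `merge` are hypothetical illustrations, not the approach of Lv et al. (2018); for brevity, they do not track whether a field is optional.

```python
import json
from typing import Any


def infer_model(value: Any) -> Any:
    """Infer a minimal structural model from a JSON value.

    Objects become dicts of field -> inferred model, arrays are described by
    the merged model of their elements, and scalars map to type names.
    """
    if isinstance(value, dict):
        return {key: infer_model(val) for key, val in value.items()}
    if isinstance(value, list):
        merged: Any = None
        for item in value:
            merged = merge(merged, infer_model(item))
        return [merged]
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


def merge(a: Any, b: Any) -> Any:
    """Merge two inferred models; None stands for 'no information yet'."""
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        return {key: merge(a.get(key), b.get(key)) for key in sorted(a.keys() | b.keys())}
    if isinstance(a, list) and isinstance(b, list):
        return [merge(a[0], b[0])]
    if a == b:
        return a
    if isinstance(a, str) and isinstance(b, str):
        # Conflicting scalar types become a sorted union such as "null|string".
        return "|".join(sorted(set(a.split("|")) | set(b.split("|"))))
    return "mixed"  # structurally incompatible, e.g. object vs. array


orders = json.loads(
    '[{"id": 1, "customer": "Ada", "paid": true, "items": [{"sku": "A1", "qty": 2}]},'
    ' {"id": 2, "customer": null, "paid": false, "items": []}]'
)
print(infer_model(orders))
```

The result describes a list of order objects whose `customer` field is either a string or null, which is exactly the kind of implicit information model the item refers to.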

These practical outcomes of model-driven, model-based and/or model-like software development established a foundation for the use of AI in software engineering, as discussed in Section 5. Before that, Section 3 takes a more detailed look at the current state of practice of big code on open-source platforms.

3 Big Code in Software Engineering

The advent of software coding platforms can be traced back to the 1990s, although they did not gain real prominence until the 2000s, with the launch of GitHub in April 2008 representing a significant milestone. Today, GitHub is the most widely used software coding platform. According to Blog (2021), GitHub hosts over 100 million projects and 40 million users. It has become not only a significant platform for software engineering collaboration, but also a prominent source for open-source software mining Kalliamvakou et al. (2014). That study also showed that a considerable number of GitHub repositories are not directly related to software development, because GitHub is used not only for coding collaborations but also for collaborations on websites, for editing books and other publications, and even as a storage platform. A subsequent manual analysis Kalliamvakou et al. (2016) revealed that over a third of the repositories on GitHub were not used for software development.

Nevertheless, by mining coding platforms, the software engineering community can further deepen its understanding of software engineering processes, methods and tools. An overview of the GitHub platform as a whole or of a set of repositories can be obtained with tools such as Kibble Apache (2024). Of all lines in the coding projects on GitHub, approximately 70% are actual code and approximately 20% are information structures in formats such as JSON or XML (see Figure 2).

Figure 2: Coding projects breakdown on GitHub by Kibble Apache (2024)

In this context, the concept of "Big Code", analogous to the notion of "Big Data", was introduced to refer to the vast quantities of software code that have accumulated over time Markovtsev and Long (2018); Ortin et al. (2016); Vechev et al. (2016), which can be analysed and reused to gain deeper insights into software engineering in general, as well as to train ML models.

4 AI in Software Engineering

The use cases for artificial intelligence (AI) in software engineering (SE) are numerous and cover a wide spectrum, as discussed decades ago by Barstow (1988) and ever since. For example, Feldt et al. (2018) propose the AI in SE Application Levels (AI-SEAL) taxonomy, which differentiates between the point of application of AI (the software engineering process, the software product, or runtime), the level of automation, ranging from 1 ("[h]uman considers alternatives, makes and implements decision") to 10 ("[c]omputer makes and implements decision if it feels it should, and informs human only if it feels this is warranted."), and the types of AI according to the five tribes of Domingos (2015). (Attributing feelings to a computer is scientifically problematic, as there is no such thing as a computer with the capacity for emotion; a better wording would have been "[c]omputer makes and implements decision if it decides it should, and informs human only if it decides this is warranted.") Notably, the majority of the papers analysed in Feldt et al. (2018) apply AI to the software engineering process. However, despite the numerous facets of the software engineering process (Figure 1), the study does not elaborate on them further. Other publications offer more detailed discussions of AI in SE, including Barenkamp et al. (2020) and Ozkaya (2023). Nevertheless, a taxonomy for the role of AI in software engineering has yet to emerge.

In light of the recent breakthroughs in machine learning (ML), this paper focuses mainly on the latest developments in the application of ML to SE. For ML to be effective, it must exploit the structures inherent to SE artefacts, and how best to represent them has been a topic of considerable debate: Allamanis et al. (2018) present an overview of the various ways in which source code can be represented, including representational models of tokens, token contexts, program dependency graphs, API calls, abstract syntax trees, object usage, and others. These representations are used for AI assistance in SE, including recommender systems, the inference of coding conventions, the detection of anomalies and defects, code analysis, the rewriting and translation of code, the conversion of code to text for documentation and information retrieval, and the synthesis and general generation of code from text. Karampatsis et al. (2020) add further applications of AI assistance such as code completion, API migration and code repair.
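To illustrate two of the representations surveyed by Allamanis et al. (2018), the following sketch uses Python's standard `tokenize` and `ast` modules to turn the same small function into a token sequence and into a list of abstract syntax tree node types. It is only illustrative: ML pipelines typically build richer encodings (e.g. paths or graphs) on top of such raw representations.

```python
import ast
import io
import tokenize

SOURCE = "def area(w, h):\n    return w * h\n"

# Layout-only token types that carry no lexical content for this illustration.
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}

# Token representation: the flat sequence of lexical tokens.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type not in LAYOUT
]

# Abstract syntax tree representation: node types in breadth-first order.
ast_nodes = [type(node).__name__ for node in ast.walk(ast.parse(SOURCE))]

print(tokens)
print(ast_nodes)
```

The token view treats the code like a sentence, whereas the tree view exposes its syntactic nesting (for example, that `w * h` is a `BinOp` inside a `Return`).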

Furthermore, Allamanis et al. (2018) present "[t]he naturalness hypothesis[:] Software is a form of human communication; software corpora have similar statistical properties to natural language corpora; and these properties can be exploited to build better software engineering tools." Given the formal character of the coding and modelling languages employed in SE, the second part of the naturalness hypothesis can be considered relatively straightforward. Nevertheless, the first part reinforces the need for SE artefacts that are readily comprehensible. Fowler (2005) previously made this point in a different form: "[A]ny fool can write code that a computer can understand, good programmers write code that humans can understand." Consequently, the application of AI in SE is not merely concerned with code generation; it also encompasses the enhancement of code through techniques such as refactoring, with the objective of improving readability and maintainability.
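A classic way to exploit the statistical regularities referred to by the naturalness hypothesis is an n-gram language model over code tokens. The following minimal sketch is a bigram model with add-one smoothing, a textbook technique rather than any specific model surveyed by Allamanis et al. (2018); the toy corpus is an assumption. It suggests the next token and scores how "natural" a token sequence is in bits per token, where lower cross-entropy means the sequence is more predictable given the training corpus.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# Toy training corpus of tokenised code lines (assumed for illustration).
CORPUS = [
    "for i in range ( n ) :".split(),
    "for j in range ( m ) :".split(),
    "for x in items :".split(),
]
VOCAB = len({tok for line in CORPUS for tok in line})


def train_bigrams(corpus: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each token follows another; '<s>' marks the line start."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens in corpus:
        for prev, nxt in zip(["<s>"] + tokens, tokens):
            counts[prev][nxt] += 1
    return counts


def suggest_next(counts: Dict[str, Counter], prev: str) -> Optional[str]:
    """Suggest the most frequent successor of `prev` (None if `prev` was never seen)."""
    successors = counts.get(prev)
    return successors.most_common(1)[0][0] if successors else None


def cross_entropy(counts: Dict[str, Counter], tokens: List[str], vocab_size: int) -> float:
    """Average bits per token under add-one (Laplace) smoothed bigram probabilities."""
    bits = 0.0
    for prev, nxt in zip(["<s>"] + tokens, tokens):
        successors = counts.get(prev, Counter())
        p = (successors[nxt] + 1) / (sum(successors.values()) + vocab_size)
        bits -= math.log2(p)
    return bits / len(tokens)


counts = train_bigrams(CORPUS)
print(suggest_next(counts, "in"))  # → range
print(cross_entropy(counts, "for i in range ( m ) :".split(), VOCAB))
print(cross_entropy(counts, ": ) m ( range in i for".split(), VOCAB))
```

The idiomatic loop header receives a markedly lower cross-entropy than the same tokens in reversed order, which is the regularity that completion and anomaly-detection tools build on.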

Figure 3: AI-assisted Coding

While a vast number of approaches to utilising AI in SE have emerged in the scientific literature, they can be broadly categorised into three principal lines of enquiry: supporting the understanding of SE artefacts, generating SE artefacts from other artefacts, and improving SE artefacts. These approaches may be applied to a variety of SE artefacts, including, but not limited to, requirements, designs, code, tests, builds and other artefacts of SE processes.

Therefore, we consider the goals of AI application to be missing from the AI-SEAL taxonomy Feldt et al. (2018) and add them as a separate dimension with the three facets of understanding, generation and improvement. Furthermore, we refine the phases of application according to IEEE (2022) by breaking software production down into the software engineering tasks of requirements, architecture & design, construction, testing, and maintenance. For the overall SE process, we differentiate its management and its economics. We also split software operations into software configuration and software execution. Finally, in analogy to the levels of autonomous driving Barabas et al. (2017), we condense the levels of autonomy into four levels of support, from recommendations to full automation: (1) AI assistance, where the developer is in full control and receives recommendations to choose from, (2) AI-assisted selection, where the AI preselects options, (3) AI-based partial automation, where the AI selects options in simple, standard cases, and (4) AI-based full automation, where the AI operates without the developer. As of today, level 1 is the most commonly used, while level 4 is far from realistic. Whether preferences for automation will differentiate further is also an open question.
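The three dimensions of the AI4SE taxonomy can be captured as a small data model, which makes the classification of a publication explicit and checkable. The sketch below is one possible encoding; the class and member names are our own choice. The example classification of Körner et al. (2014) as level 1 is an illustrative reading of the description given later in this section, where analysts make the final decisions.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Goal(Enum):
    """Goal dimension added to AI-SEAL."""
    UNDERSTANDING = "understanding"
    GENERATION = "generation"
    IMPROVEMENT = "improvement"


class Task(Enum):
    """Refined phases of application (following SWEBOK)."""
    # Software development tasks
    REQUIREMENTS = "requirements"
    ARCHITECTURE_DESIGN = "architecture & design"
    CONSTRUCTION = "construction"
    TESTING = "testing"
    MAINTENANCE = "maintenance"
    # Overall SE process
    MANAGEMENT = "management"
    ECONOMICS = "economics"
    # Software operations
    CONFIGURATION = "configuration"
    EXECUTION = "execution"


class Level(IntEnum):
    """Four levels of support, from recommendations to full automation."""
    AI_ASSISTANCE = 1          # developer in full control, chooses among recommendations
    AI_ASSISTED_SELECTION = 2  # AI preselects options
    AI_PARTIAL_AUTOMATION = 3  # AI selects options in simple, standard cases
    AI_FULL_AUTOMATION = 4     # AI operates without the developer


@dataclass(frozen=True)
class Classification:
    """Position of one publication (or tool) in the AI4SE taxonomy."""
    publication: str
    goals: frozenset
    tasks: frozenset
    level: Level

    def __post_init__(self) -> None:
        if not self.goals or not self.tasks:
            raise ValueError("a classification needs at least one goal and one task")


# Illustrative reading: analysts make the final decisions, hence level 1.
KOERNER = Classification(
    publication="Körner et al. (2014)",
    goals=frozenset(Goal),
    tasks=frozenset({Task.REQUIREMENTS}),
    level=Level.AI_ASSISTANCE,
)
print(sorted(g.value for g in KOERNER.goals), KOERNER.level.name)
```

Using an `IntEnum` for the levels keeps them ordered, so a query such as "all approaches at level 2 or above" becomes a simple comparison.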

This novel AI4SE taxonomy is shown in Figure 4. Due to space limitations, we focus only on the state of the art in applying ML to software development. The fidelity and applicability of the new taxonomy are examined by presenting mainly recent and highly cited publications.

Figure 4: AI4SE - The Taxonomy of AI-assisted Software Engineering
Figure 5: AI-assisted Software Development

Körner et al. (2014) contribute to requirements understanding, generation and improvement by presenting an AI-based approach to automating requirements engineering that begins by converting natural language into an Eclipse Modeling Framework (EMF) model. It then applies linguistic rules to identify errors, such as ambiguities or incorrect quantifiers, and provides suggestions on which requirements analysts make the final decisions. This approach supports the entire requirements elicitation and change process.
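The rule-checking step can be illustrated with a deliberately simple sketch that works on plain text instead of an EMF model. The word lists below are assumptions for illustration, not the linguistic rules of Körner et al. (2014), and realistic approaches rely on linguistic analysis (e.g. parsing) rather than keyword matching.

```python
import re
from typing import List

# Hypothetical rule vocabularies for illustration only; they are NOT the
# linguistic rules of Körner et al. (2014).
VAGUE_TERMS = {"appropriate", "adequate", "easy", "fast", "flexible", "user-friendly"}
UNIVERSAL_QUANTIFIERS = {"all", "always", "any", "each", "every", "never", "none"}


def check_requirement(text: str) -> List[str]:
    """Flag potential ambiguities and quantifiers; the analyst makes the final decision."""
    findings = []
    for word in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
        if word in VAGUE_TERMS:
            findings.append(f"ambiguity: vague term '{word}'")
        elif word in UNIVERSAL_QUANTIFIERS:
            findings.append(f"quantifier: check whether '{word}' is really intended")
    return findings


for finding in check_requirement("The system should respond fast to all user requests."):
    print(finding)
```

The checker only produces suggestions; deciding whether "all" is a genuine requirement or an overgeneralisation remains with the analyst, in line with AI assistance at level 1.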

Perini et al. (2012) contribute to requirements improvement by prioritising requirements, combining the preferences of project stakeholders with approximations of the requirements order computed by ML techniques.

For architecture & design understanding and generation, Bhat et al. (2019) describe the automated curation of design decisions to support architectural decision-making. The approach helps software architects by organising and recommending design decisions based on previous cases and the contextual information of the current project. By analysing historical data and design choices, it leverages existing design knowledge to improve the quality and consistency of architectural designs.

The literature on AI-assisted coding is extensive. For code understanding, major developments include DeepCodeReviewer by Gupta and Sundaresan (2018), which uses deep learning to recommend code reviews for common issues based on historical peer reviews. It assesses the relevance of reviews to specific code snippets, suggests appropriate reviews from a repository of common feedback, and improves code reviews by focusing on defect detection. Guo et al. (2020) describe GraphCodeBERT, a pre-trained model for programming languages that incorporates data flow semantics rather than just code syntax. GraphCodeBERT demonstrates its performance both in code understanding, for code search and clone detection, and in code generation and improvement, through code translation and code refinement. Ceran et al. (2023) present a study on predicting software quality that uses defect density as the key feature representing quality and achieves higher prediction accuracy than previous studies. The research shows that data pre-processing, feature extraction and the application of ML algorithms significantly improve prediction accuracy.

For code generation, Bird et al. (2023) discuss early experiences of developers using GitHub Copilot, which is based on a language model trained on source code. Acting as an AI pair programmer, Copilot can write code faster than a human colleague, potentially accelerating development. Three empirical studies of Copilot highlight the different ways developers use it, the challenges they face, the evolving role of code review, and the potential impact of pair programming with AI on software development. Svyatkovskiy et al. (2020) discuss IntelliCode Compose, a multilingual code completion tool that predicts entire sequences of code tokens up to full lines of code. Its generative transformer model has been trained on 1.2 billion lines of Python, C#, JavaScript and TypeScript code.

For code improvement, Bader et al. (2019) present Getafix, a tool that fixes common bugs by learning from previous human-written fixes. It uses hierarchical clustering to group bug fix patterns into a hierarchy from general to specific, and a ranking system based on the context of the code change to suggest the most appropriate fix. Another approach, DeepDebug, is presented in Drain et al. (2021); it has been trained on Java methods mined from GitHub repositories to detect and fix bugs.
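Getafix generalises concrete fixes into reusable fix patterns by anti-unification. The sketch below illustrates only this core idea on flat, equal-length token sequences; Getafix itself works on abstract syntax trees of before/after edit pairs and additionally builds the clustering hierarchy and ranking described above, which are omitted here. The function name `anti_unify` and the hole names `h0`, `h1`, ... are our own.

```python
from typing import Dict, List, Optional, Tuple

Pattern = List[str]


def anti_unify(a: Pattern, b: Pattern) -> Optional[Pattern]:
    """Compute the least general generalisation of two equal-length token sequences.

    Positions where both sequences agree are kept; each distinct pair of
    differing tokens is replaced by a hole h0, h1, ... (the same pair always
    maps to the same hole, so repeated variables stay linked).
    """
    if len(a) != len(b):
        return None  # this simplified sketch only handles sequences of equal length
    holes: Dict[Tuple[str, str], str] = {}
    result: Pattern = []
    for x, y in zip(a, b):
        if x == y:
            result.append(x)
        else:
            result.append(holes.setdefault((x, y), f"h{len(holes)}"))
    return result


fix_1 = "if ( user != null ) user . save ( ) ;".split()
fix_2 = "if ( order != null ) order . save ( ) ;".split()
print(" ".join(anti_unify(fix_1, fix_2)))  # → if ( h0 != null ) h0 . save ( ) ;
```

Because both occurrences of `user`/`order` map to the same hole, the learned pattern expresses "null-check a variable before calling `save` on that same variable", which is more useful than two independent placeholders.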

Since different software testing techniques are complementary, revealing different types of defects and testing different aspects of a program, Lenz et al. (2013) present, for test understanding and improvement, an ML-based approach that links test results from different techniques, clusters test data based on functional similarities, and generates classifiers according to test objectives, which can be used for test case selection and prioritisation.

Tufano et al. (2022) discuss a test generation approach that supports writing unit test cases by generating assert statements. The approach uses a transformer model that was first pre-trained on an English text corpus, then further trained in a semi-supervised manner on a large source code corpus, and finally fine-tuned for the task of generating assert statements for unit tests. The generated assert statements are accurate and increase test coverage.

For test improvement, Zhao et al. (2015) discuss test case prioritisation based on source code changes, software quality metrics, test coverage data, and code coverage-based clustering. The approach reduces the impact of similar test cases covering the same code and improves fault detection performance.
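One effect of such prioritisation, pushing back test cases that cover the same code as earlier ones, can be sketched with the classic greedy "additional coverage" heuristic. This is a simpler stand-in, not the clustering-based method of Zhao et al. (2015). Coverage is assumed to be given as a mapping from hypothetical test names to the set of covered code units (e.g. `file:line` identifiers).

```python
from typing import Dict, List, Set


def prioritise(coverage: Dict[str, Set[str]]) -> List[str]:
    """Order tests greedily by the number of not-yet-covered code units they add.

    Tests that only re-cover code already covered by earlier tests (similar
    test cases) sink to the end; ties are broken by test name for determinism.
    """
    remaining = dict(coverage)
    covered: Set[str] = set()
    order: List[str] = []
    while remaining:
        # max() returns the first maximum, i.e. the alphabetically smallest tie.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order


coverage = {
    "test_login": {"auth.py:10", "auth.py:11", "db.py:5"},
    "test_login_again": {"auth.py:10", "auth.py:11"},
    "test_report": {"report.py:3", "db.py:5"},
    "test_export": {"export.py:1"},
}
print(prioritise(coverage))
# → ['test_login', 'test_export', 'test_report', 'test_login_again']
```

`test_login_again` is scheduled last because it covers nothing that `test_login` has not already covered.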

For the understanding, generation and improvement of configurations for software deployments, Tatineni and Mustyala (2021) explores the role of ML in DevOps for intelligent release management. It suggests combining continuous monitoring, predictions of the likelihood of deployment failures, root cause analysis, and pipeline optimisation to reduce deployment failures and improve release management efficiency and software quality.

Some solutions address several goals and tasks and are associated with the SE process as a whole. For the improvement of the SE process, Spieker et al. (2017) introduce Retecs, a method for automatically learning test case selection and prioritisation in continuous integration (CI), aimed at minimising the time between code commits and developer feedback on failed tests. Retecs uses reinforcement learning to select and prioritise test cases based on their execution time, previous execution history and failure rates. Guided by a reward function and by analysing past CI cycles, it effectively learns to prioritise error-prone test cases.

For the understanding of the SE process, Lin et al. (2021) present the T-BERT framework for generating trace links between source code and natural language artefacts such as requirements or code issues. It demonstrates superior accuracy and efficiency for software traceability, especially in data-limited environments.
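T-BERT learns trace links with a pre-trained transformer. A classic information-retrieval approach to the same task ranks requirement/code pairs by TF-IDF cosine similarity; the sketch below shows only this simple approach, not T-BERT. Identifier splitting, the threshold value and the example artefacts are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokens(text: str) -> List[str]:
    """Lower-case words; camelCase identifiers are split into their parts."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """TF-IDF vector (term -> weight) for every document."""
    n = len(docs)
    df = Counter(term for toks in docs.values() for term in set(toks))
    return {
        name: {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for name, toks in docs.items()
    }


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def trace_links(reqs: Dict[str, str], code: Dict[str, str], threshold: float = 0.1) -> List[Tuple[str, str, float]]:
    """Candidate (requirement, source file, score) links above the threshold, best first."""
    docs = {**{f"req:{k}": tokens(v) for k, v in reqs.items()},
            **{f"src:{k}": tokens(v) for k, v in code.items()}}
    vecs = tfidf(docs)
    links = [(r, s, round(cosine(vecs[f"req:{r}"], vecs[f"src:{s}"]), 3))
             for r in reqs for s in code]
    return sorted((l for l in links if l[2] >= threshold), key=lambda l: -l[2])


REQUIREMENTS = {"R1": "A user shall reset the password via email.",
                "R2": "The system shall export a report as PDF."}
CODE = {"PasswordReset.java": "class PasswordReset { void sendResetEmail(User user) {} }",
        "ReportExporter.java": "class ReportExporter { byte[] exportPdf(Report report) {} }"}
print(trace_links(REQUIREMENTS, CODE))
```

Such purely lexical matching fails when requirements and code use different vocabulary, which is one reason why learned approaches such as T-BERT are attractive.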

The presented selection of recent research demonstrates the applicability of the dimensions and facets added to the AI4SE taxonomy in comparison to Feldt et al. (2018). We expect that primary studies addressing the remaining open aspects will be presented in due course. For further reading, see Wong et al. (2023) and Allamanis et al. (2018).
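The classifications made in this section can be tabulated to see which combinations of task and goal the selected publications cover. The sketch below encodes only the publications whose task and goals are stated explicitly in the text and only the tasks discussed, and lists the remaining gaps. It reflects this small selection, not the literature as a whole.

```python
from itertools import product

GOALS = ("understanding", "generation", "improvement")
# Only the tasks for which publications are discussed in this section.
TASKS = ("requirements", "architecture & design", "construction", "testing",
         "configuration", "SE process")

# (publication, task, goals) as assigned in the text of this section.
CLASSIFIED = [
    ("Körner et al. (2014)", "requirements", GOALS),
    ("Perini et al. (2012)", "requirements", ("improvement",)),
    ("Bhat et al. (2019)", "architecture & design", ("understanding", "generation")),
    ("Gupta and Sundaresan (2018)", "construction", ("understanding",)),
    ("Guo et al. (2020)", "construction", GOALS),
    ("Bird et al. (2023)", "construction", ("generation",)),
    ("Svyatkovskiy et al. (2020)", "construction", ("generation",)),
    ("Bader et al. (2019)", "construction", ("improvement",)),
    ("Drain et al. (2021)", "construction", ("improvement",)),
    ("Lenz et al. (2013)", "testing", ("understanding", "improvement")),
    ("Tufano et al. (2022)", "testing", ("generation",)),
    ("Zhao et al. (2015)", "testing", ("improvement",)),
    ("Tatineni and Mustyala (2021)", "configuration", GOALS),
    ("Spieker et al. (2017)", "SE process", ("improvement",)),
    ("Lin et al. (2021)", "SE process", ("understanding",)),
]


def coverage_gaps(entries):
    """Return the (task, goal) cells not covered by any classified publication."""
    covered = {(task, goal) for _, task, goals in entries for goal in goals}
    return [cell for cell in product(TASKS, GOALS) if cell not in covered]


print(coverage_gaps(CLASSIFIED))
```

For this selection, only architecture & design improvement and SE process generation remain uncovered; extending `TASKS` to the full taxonomy (maintenance, management, economics, execution) would reveal further open aspects.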

5 AI-Assisted Big Models in Software Engineering

In addition to the development of AI applications in SE, the availability of SE models for MDSE has increased significantly. This is also motivated by a modelling hypothesis that complements the naturalness hypothesis of Allamanis et al. (2018): Software is (also) a formalized communication between humans and computers; model corpora, software corpora, and natural language corpora have similar statistical properties; the properties of model corpora can be employed to develop more efficacious software engineering tools.

In their critical review, Hailpern and Tarr (2006) suggest that while MDSE may have potential in the context of large-scale, distributed industrial software development, its success is not guaranteed. However, the advent of big code and AI has opened up new avenues for integrating AI applications, particularly ML, into existing SE models. Since SE models have an intrinsic formal graph structure, graph representation learning as discussed by Hamilton et al. (2017) can be leveraged to prepare models for AI/ML applications. Conversely, higher levels of abstraction are needed to enhance the efficacy of AI-assisted automated coding Pudari and Ernst (2023).
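Because class models are graphs, they can be fed to graph learning methods. The sketch below shows one round of mean neighbourhood aggregation over a toy class model, in the spirit of the neighbourhood aggregation methods discussed by Hamilton et al. (2017). The class model, the node features (numbers of attributes and operations) and the undirected treatment of edges are assumptions for illustration; the learned weight matrices and non-linearities of real graph neural networks are omitted.

```python
from typing import Dict, List, Tuple

# A toy class model as a graph: nodes are classes, edges are associations or
# generalisations (treated as undirected here for simplicity).
EDGES: List[Tuple[str, str]] = [
    ("Customer", "Order"), ("Order", "OrderLine"),
    ("OrderLine", "Product"), ("PremiumCustomer", "Customer"),
]

# Hypothetical initial node features: [number of attributes, number of operations].
FEATURES: Dict[str, List[float]] = {
    "Customer": [3, 1], "Order": [2, 2], "OrderLine": [2, 0],
    "Product": [4, 1], "PremiumCustomer": [1, 1],
}


def neighbours(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build an undirected adjacency list from the edge list."""
    adj: Dict[str, List[str]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def mean_aggregate(features: Dict[str, List[float]], adj: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """One aggregation round: concatenate each node's own features with the
    element-wise mean of its neighbours' features."""
    out = {}
    for node, own in features.items():
        nbrs = adj.get(node, [])
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(own))]
        else:
            mean = [0.0] * len(own)
        out[node] = own + mean
    return out


for cls, vec in mean_aggregate(FEATURES, neighbours(EDGES)).items():
    print(cls, vec)
```

After one round, each class's vector also reflects its direct context in the model (e.g. `Customer` now carries information about `Order` and `PremiumCustomer`); repeated rounds propagate information further through the model graph.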

Figure 6: AI-assisted Software Modelling

When both factors are considered together, it becomes evident that there are significant opportunities for MDSE.

  (1) The provision of large amounts of top-down models on coding platforms.

  (2) The generation of bottom-up models from big code.

  (3) The formation of Big Models as a basis for more advanced empirical MDSE and further improvements of MDSE.

  (4) The application of AI methods on Big Models and for Big Models.

  (5) The shift towards the new paradigm of pair modelling in SE, which, together with the other opportunities, will lead to the next generation of software engineering.

Regarding opportunity (1), Störrle et al. (2014) presented the first entries of the SEMI Software Engineering Models Index, a catalogue of model repositories, and invited further contributions to SEMI. Hebig et al. (2016) took a different approach, mining GitHub for projects that include Unified Modelling Language (UML) models, which could well be combined with the SEMI initiative. As a result of numerous efforts towards SE model collections, the Lindholmen dataset was developed Robles et al. (2017) and reviewed in Robles et al. (2023).

In view of opportunity (2), extracting representational models from code is in fact the inverse of using models to generate code more efficiently. Since the design, specification, and maintenance of models can be complex and time-consuming, various techniques have been developed to extract models from code and/or execution traces. These include the extraction of all types of models, for example information models, see e.g. Burson et al. (1990); Murphy and Notkin (1996), structural models, see e.g. Kazman et al. (1998); Guo et al. (1999), and behavioural models, see e.g. Corbett et al. (2000); Lo et al. (2009). Extracting models from code and/or execution traces has the advantage of yielding models that are up to date with the code or traces, but it carries the potential disadvantage of a mismatch with the required model representation or level of abstraction.
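The extraction of a structural model from code can be illustrated with Python's standard `ast` module. The sketch below recovers classes, their base classes, methods and instance attributes (approximated as `self.<name> = ...` assignments), i.e. the raw material of a UML class diagram. It is a simplified illustration, not one of the cited extraction techniques, and it ignores dynamic features, annotated or augmented attribute assignments, and associations between classes.

```python
import ast
from typing import Dict, List

SOURCE = '''
class Account:
    def __init__(self, owner):
        self.owner = owner
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount


class SavingsAccount(Account):
    def add_interest(self, rate):
        self.balance = self.balance * (1 + rate)
'''


def extract_class_model(source: str) -> Dict[str, Dict[str, List[str]]]:
    """Extract classes with their base classes, instance attributes and methods."""
    model = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        bases = [ast.unparse(base) for base in node.bases]
        methods = [item.name for item in node.body
                   if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))]
        # Attributes: targets of plain `self.<name> = ...` assignments in the class.
        attributes = sorted({
            target.attr
            for sub in ast.walk(node)
            if isinstance(sub, ast.Assign)
            for target in sub.targets
            if isinstance(target, ast.Attribute)
            and isinstance(target.value, ast.Name)
            and target.value.id == "self"
        })
        model[node.name] = {"bases": bases, "attributes": attributes, "methods": methods}
    return model


for name, info in extract_class_model(SOURCE).items():
    print(name, info)
```

Note that the extracted model mirrors the code one to one; it lists `__init__` as an operation, for example, which illustrates the potential mismatch with the desired level of abstraction mentioned above.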

With regard to opportunity (3), there are initial studies on Big Models such as Ho-Quang et al. (2017) which explores the increasing role of modelling, especially in safety-critical software development. It surveyed a range of projects utilising the UML and identified collaboration as the primary rationale for employing models. This is because models facilitate communication and planning for joint implementation efforts within teams, including those who are not directly involved in modelling or are new to the team.

For opportunity (4), the application of AI methods to Big Models is explored in first studies such as:

  • Shcherban et al. (2021); Mangaroliya and Patel (2020) supporting the classification of UML diagrams and contributing to model understanding.

  • Babur (2016) presenting an approach for comparing and merging model variants by incorporating techniques from information retrieval, natural language processing, and machine learning and contributing to model understanding and improvement.

  • Baki and Sahraoui (2016) describing an approach to learn model transformations from examples of source and target model pairs and being a contribution for model generation.

Last but not least, for opportunity (5), the pair modelling paradigm follows naturally from the observation that pair programming (see e.g. Bipp et al. (2008)) with AI tools based on large code models, acting as pair programmers and partners in software development, has become a highly supportive application of AI in software engineering (see e.g. Dakhel et al. (2023)).

Based on Big Models and large SE models for ML, i.e. large ML models trained on SE models, pair modelling will emerge as a new model-driven software development technique in which a software engineer and an AI tool collaborate in software development. The engineer or the tool, in the role of the driver, writes or improves software artefacts, including models, while the other, in the role of the observer, reviews each element of the software artefact as it is entered. The observer also acts as navigator, considering systemic and strategic aspects of software development: the navigator identifies potential improvements and upcoming problems that will have to be addressed later unless they are avoided now. This allows the driver to focus on development without losing sight of the cross-cutting and overarching aspects of software development. As in pair programming, the observer in pair modelling acts as an assurance of the overall software quality and as a guide to high-quality software engineering. The driver and the observer/navigator can switch roles. Indeed, the exact interplay of driver and observer/navigator in pair modelling depends on the capabilities of the AI tool and on the level of support, in terms of the AI4SE taxonomy, that it can provide to the software engineer in either role.

Hence, as AI is increasingly applied to Big Models, the specific aspects of modelling need to be incorporated into the AI4SE taxonomy, which currently does not distinguish modelling from other software development tasks. At this early stage, it is not yet clear whether modelling should be treated as a separate task or, like software quality and software security, as a cross-cutting concern affecting numerous facets of the AI4SE taxonomy.

6 Summary

After reviewing the state of the practice of model-driven software engineering, big code on open-source software platforms, and the application of AI in software engineering, this paper develops a taxonomy of AI-assisted software engineering and uses it to categorise recent publications. In addition, the concept of Big Models is defined and explored with a view to future opportunities for the wider adoption of model-driven software engineering. Finally, recent research on the application of AI approaches to Big Models is reviewed and categorised.

Future work will further explore the evolution of Big Models and AI applications in MDSE, potentially leading to an update of the new AI4SE taxonomy.

Acknowledgements

The ideas presented in this paper have been developed through constructive dialogue in the Feldafinger Kreis, the German Testing Board and the Association for Software Quality and Education. Although the paper was written by the author, the writing process was aided by AI tools, specifically ResearchRabbit for identifying related work, and ChatGPT and DeepL for refining the wording.

The author has no competing interests to declare that are relevant to the content of this article.

References

  • Allamanis et al. (2018) Allamanis, M., Barr, E.T., Devanbu, P., Sutton, C.: A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR) 51(4), 1–37 (2018)
  • Apache (2024) Apache: Kibble - suite of tools for collecting, aggregating and visualizing activity in software projects. https://kibble.apache.org/ (Sep 2024), last access Sep. 11, 2024
  • Babur (2016) Babur, Ö.: Statistical analysis of large sets of models. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pp. 888–891 (2016)
  • Bader et al. (2019) Bader, J., Scott, A., Pradel, M., Chandra, S.: Getafix: Learning to fix bugs automatically. Proceedings of the ACM on Programming Languages 3(OOPSLA), 1–27 (2019)
  • Bagheri and Sullivan (2013) Bagheri, H., Sullivan, K.: Bottom-up model-driven development. In: 2013 35th International Conference on Software Engineering (ICSE), pp. 1221–1224, IEEE (2013)
  • Baki and Sahraoui (2016) Baki, I., Sahraoui, H.: Multi-step learning and adaptive search for learning complex model transformations from examples. ACM Transactions on Software Engineering and Methodology (TOSEM) 25(3), 1–37 (2016)
  • Barabas et al. (2017) Barabas, I., Todoruţ, A., Cordoş, N., Molea, A.: Current challenges in autonomous driving. In: IOP Conference Series: Materials Science and Engineering, vol. 252, 1, p. 012096, IOP Publishing (2017)
  • Barenkamp et al. (2020) Barenkamp, M., Rebstadt, J., Thomas, O.: Applications of AI in classical software engineering. AI Perspectives 2(1), 1 (2020)
  • Barstow (1988) Barstow, D.: Artificial intelligence and software engineering. In: Exploring artificial intelligence, pp. 641–670, Elsevier (1988)
  • Bhat et al. (2019) Bhat, M., Tinnes, C., Shumaiev, K., Biesdorf, A., Hohenstein, U., Matthes, F.: ADeX: A tool for automatic curation of design decision knowledge for architectural decision recommendations. In: 2019 IEEE International Conference on Software Architecture Companion (ICSA-C), pp. 158–161, IEEE (2019)
  • Bipp et al. (2008) Bipp, T., Lepper, A., Schmedding, D.: Pair programming in software development teams–an empirical study of its benefits. Information and Software Technology 50(3), 231–240 (2008)
  • Bird et al. (2023) Bird, C., Ford, D., Zimmermann, T., Forsgren, N., Kalliamvakou, E., Lowdermilk, T., Gazit, I.: Taking flight with Copilot. Communications of the ACM 66(6), 56–62 (2023)
  • Blog (2021) Blog, Z.: How to choose the best code repository for your project. https://huspi.com/blog-open/software-code-repositories/ (Apr 2021), last access Sep. 11, 2024
  • Brooks and Bullet (1987) Brooks, F.P., Bullet, N.S.: No silver bullet: Essence and accidents of software engineering. IEEE Computer 20(4), 10–19 (1987), doi:10.1109/MC.1987.1663532
  • Burke et al. (2021) Burke, B., Davis, M., Dawson, P.: Hype cycle for emerging technologies, 2021 (2021)
  • Burson et al. (1990) Burson, S., Kotik, G., Markosian, L.: A program transformation approach to automating software re-engineering. In: Proceedings., Fourteenth Annual International Computer Software and Applications Conference, pp. 314–322, IEEE (1990)
  • Ceran et al. (2023) Ceran, A.A., Ar, Y., Tanrıöver, Ö.Ö., Ceran, S.S.: Prediction of software quality with machine learning-based ensemble methods. Materials Today: Proceedings 81, 18–25 (2023)
  • Corbett et al. (2000) Corbett, J.C., Dwyer, M.B., Hatcliff, J., Laubach, S., Păsăreanu, C.S., Robby, Zheng, H.: Bandera: Extracting finite-state models from Java source code. In: Proceedings of the 22nd international conference on Software engineering, pp. 439–448 (2000)
  • CrowdStrike (2024) CrowdStrike: External technical root cause analysis — channel file 291. https://www.crowdstrike.com/wp-content/uploads/2024/08/Channel-File-291-Incident-Root-Cause-Analysis-08.06.2024.pdf#page=5.50 (Aug 2024), last access Sep. 9, 2024
  • Dakhel et al. (2023) Dakhel, A.M., Majdinasab, V., Nikanjam, A., Khomh, F., Desmarais, M.C., Jiang, Z.M.J.: GitHub Copilot AI pair programmer: Asset or liability? Journal of Systems and Software 203, 111734 (2023)
  • Domingos (2015) Domingos, P.: The master algorithm: How the quest for the ultimate learning machine will remake our world. Basic Books (2015)
  • Drain et al. (2021) Drain, D., Wu, C., Svyatkovskiy, A., Sundaresan, N.: Generating bug-fixes using pretrained transformers. In: Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming, pp. 1–8 (2021)
  • Feldt et al. (2018) Feldt, R., de Oliveira Neto, F.G., Torkar, R.: Ways of applying artificial intelligence in software engineering. In: Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, pp. 35–41 (2018)
  • Fowler (2005) Fowler, M.: Refactoring, a first example. https://staff.cs.utu.fi/staff/jouni.smed/doos_06/material/ (Apr 2005), last access Sep. 12, 2024
  • France and Rumpe (2007) France, R., Rumpe, B.: Model-driven development of complex software: A research roadmap. In: Future of Software Engineering (FOSE’07), pp. 37–54, IEEE (2007)
  • Fraser et al. (2007) Fraser, S.D., Brooks Jr, F.P., Fowler, M., Lopez, R., Namioka, A., Northrop, L., Parnas, D.L., Thomas, D.: "No silver bullet" reloaded: Retrospective on "essence and accidents of software engineering". In: Companion to the 22nd ACM SIGPLAN conference on Object-oriented programming systems and applications companion, pp. 1026–1030 (2007), doi:10.1145/1297846.1297973
  • Garcia et al. (2023) Garcia, N.H., Deshpande, H., Wu, R., Kahl, B., Wortmann, A.: Lifting ROS to model-driven development: Lessons learned from a bottom-up approach. In: 2023 IEEE/ACM 5th International Workshop on Robotics Software Engineering (RoSE), pp. 31–36, IEEE (2023)
  • Guo et al. (2020) Guo, D., Ren, S., Lu, S., Feng, Z., Tang, D., Liu, S., Zhou, L., Duan, N., Svyatkovskiy, A., Fu, S., et al.: GraphCodeBERT: Pre-training code representations with data flow. arXiv preprint arXiv:2009.08366 (2020)
  • Guo et al. (1999) Guo, G.Y., Atlee, J.M., Kazman, R.: A software architecture reconstruction method. In: Software Architecture: TC2 First Working IFIP Conference on Software Architecture (WICSA1) 22–24 February 1999, San Antonio, Texas, USA, pp. 15–33, Springer (1999)
  • Gupta and Sundaresan (2018) Gupta, A., Sundaresan, N.: Intelligent code reviews using deep learning. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18) Deep Learning Day (2018)
  • Hailpern and Tarr (2006) Hailpern, B., Tarr, P.: Model-driven development: The good, the bad, and the ugly. IBM Systems Journal 45(3), 451–461 (2006)
  • Hamilton et al. (2017) Hamilton, W.L., Ying, R., Leskovec, J.: Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584 (2017)
  • Hebig et al. (2016) Hebig, R., Quang, T.H., Chaudron, M.R., Robles, G., Fernandez, M.A.: The quest for open source projects that use UML: Mining GitHub. In: Proceedings of the ACM/IEEE 19th international conference on model driven engineering languages and systems, pp. 173–183 (2016)
  • Ho-Quang et al. (2017) Ho-Quang, T., Hebig, R., Robles, G., Chaudron, M.R., Fernandez, M.A.: Practices and perceptions of UML use in open source projects. In: 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP), pp. 203–212, IEEE (2017)
  • IEEE (2022) IEEE: Guide to the software engineering body of knowledge (SWEBOK Guide draft v4). https://www.scribd.com/document/715429676/Swebok-v4-Beta-v2022dec31-1 (Dec 2022), last access Sep. 9, 2024
  • Kalliamvakou et al. (2014) Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., German, D.M., Damian, D.: The promises and perils of mining GitHub. In: Proceedings of the 11th working conference on mining software repositories, pp. 92–101 (2014)
  • Kalliamvakou et al. (2016) Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., German, D.M., Damian, D.: An in-depth study of the promises and perils of mining GitHub. Empirical Software Engineering 21, 2035–2071 (2016)
  • Karampatsis et al. (2020) Karampatsis, R.M., Babii, H., Robbes, R., Sutton, C., Janes, A.: Big code != big vocabulary: Open-vocabulary models for source code. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, pp. 1073–1085 (2020)
  • Kazman et al. (1998) Kazman, R., Woods, S.G., Carrière, S.J.: Requirements for integrating software architecture and reengineering models: CORUM II. In: Proceedings fifth working conference on reverse engineering (Cat. No. 98TB100261), pp. 154–163, IEEE (1998)
  • Körner et al. (2014) Körner, S.J., Landhäußer, M., Tichy, W.F.: Transferring research into the real world: How to improve RE with AI in the automotive industry. In: 2014 IEEE 1st International Workshop on Artificial Intelligence for Requirements Engineering (AIRE), pp. 13–18, IEEE (2014)
  • Lenz et al. (2013) Lenz, A.R., Pozo, A., Vergilio, S.R.: Linking software testing results with a machine learning approach. Engineering Applications of Artificial Intelligence 26(5-6), 1631–1640 (2013)
  • Lin et al. (2021) Lin, J., Liu, Y., Zeng, Q., Jiang, M., Cleland-Huang, J.: Traceability transformed: Generating more accurate links with pre-trained BERT models. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), pp. 324–335, IEEE (2021)
  • Lo et al. (2009) Lo, D., Mariani, L., Pezzè, M.: Automatic steering of behavioral model inference. In: Proceedings of the 7th Joint Meeting Of The European Software Engineering Conference and the ACM SIGSOFT symposium on The foundations of software engineering, pp. 345–354 (2009)
  • Lv et al. (2018) Lv, T., Yan, P., He, W.: Survey on JSON data modelling. In: Journal of Physics: Conference Series, vol. 1069, 1, p. 012101, IOP Publishing (2018)
  • Mangaroliya and Patel (2020) Mangaroliya, K., Patel, H.: Classification of reverse-engineered class diagram and forward-engineered class diagram using machine learning. arXiv preprint arXiv:2011.07313 (2020)
  • Markovtsev and Long (2018) Markovtsev, V., Long, W.: Public git archive: a big code dataset for all. In: Proceedings of the 15th International Conference on Mining Software Repositories, pp. 34–37 (2018)
  • McConnell (2004) McConnell, S.: Code complete. Pearson Education (2004)
  • Murphy and Notkin (1996) Murphy, G.C., Notkin, D.: Lightweight lexical source model extraction. ACM Transactions on Software Engineering and Methodology (TOSEM) 5(3), 262–292 (1996)
  • Ortin et al. (2016) Ortin, F., Escalada, J., Rodriguez-Prieto, O.: Big code: New opportunities for improving software construction. J. Softw. 11(11), 1083–1088 (2016)
  • Ozkaya (2023) Ozkaya, I.: Application of large language models to software engineering tasks: Opportunities, risks, and implications. IEEE Software 40(3), 4–8 (2023)
  • Perini et al. (2012) Perini, A., Susi, A., Avesani, P.: A machine learning approach to software requirements prioritization. IEEE Transactions on Software Engineering 39(4), 445–461 (2012)
  • Pudari and Ernst (2023) Pudari, R., Ernst, N.A.: From Copilot to pilot: Towards AI supported software development. arXiv preprint arXiv:2303.04142 (2023)
  • Robles et al. (2023) Robles, G., Chaudron, M.R., Jolak, R., Hebig, R.: A reflection on the impact of model mining from GitHub. Information and Software Technology 164, 107317 (2023)
  • Robles et al. (2017) Robles, G., Ho-Quang, T., Hebig, R., Chaudron, M.R., Fernandez, M.A.: An extensive dataset of UML models in GitHub. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp. 519–522, IEEE (2017)
  • Schieferdecker (2020) Schieferdecker, I.: Responsible software engineering. The future of software quality assurance pp. 137–146 (2020), doi:10.1007/978-3-030-29509-7_11
  • Schieferdecker (2024) Schieferdecker, I.: The power of models for software engineering. In: Steffen, B., Hinchey, M. (eds.) The Combined Power of Research, Education and Dissemination, p. 14 (2024)
  • Selic (2003) Selic, B.: The pragmatics of model-driven development. IEEE Software 20(5), 19–25 (2003)
  • Shah et al. (2012) Shah, S.M.A., Morisio, M., Torchiano, M.: An overview of software defect density: A scoping study. In: 2012 19th Asia-Pacific Software Engineering Conference, vol. 1, pp. 406–415, IEEE (2012)
  • Shcherban et al. (2021) Shcherban, S., Liang, P., Li, Z., Yang, C.: Multiclass classification of four types of UML diagrams from images using deep learning. In: SEKE, pp. 57–62 (2021)
  • Softonic (2023) Softonic: Code by the numbers: How many lines of code in popular programs, apps, and video games? https://en.softonic.com/articles/programs-lines-code (Apr 2023), last access Sep. 9, 2024
  • Spieker et al. (2017) Spieker, H., Gotlieb, A., Marijan, D., Mossige, M.: Reinforcement learning for automatic test case prioritization and selection in continuous integration. In: Proceedings of the 26th ACM SIGSOFT international symposium on software testing and analysis, pp. 12–22 (2017)
  • Steffen et al. (2007) Steffen, B., Margaria, T., Nagel, R., Jörges, S., Kubczak, C.: Model-driven development with the jABC. In: Hardware and Software, Verification and Testing: Second International Haifa Verification Conference, HVC 2006, Haifa, Israel, October 23-26, 2006. Revised Selected Papers 2, pp. 92–108, Springer (2007)
  • Störrle et al. (2014) Störrle, H., Hebig, R., Knapp, A.: An index for software engineering models (poster). CEUR Workshop Proceedings 1258 (Sep 2014)
  • Svyatkovskiy et al. (2020) Svyatkovskiy, A., Deng, S.K., Fu, S., Sundaresan, N.: IntelliCode Compose: Code generation using transformer. In: Proceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, pp. 1433–1443 (2020)
  • Tatineni and Mustyala (2021) Tatineni, S., Mustyala, A.: AI-powered automation in DevOps for intelligent release management: Techniques for reducing deployment failures and improving software quality. Advances in Deep Learning Techniques 1(1), 74–110 (2021)
  • Tufano et al. (2022) Tufano, M., Drain, D., Svyatkovskiy, A., Sundaresan, N.: Generating accurate assert statements for unit test cases using pretrained transformers. In: Proceedings of the 3rd ACM/IEEE International Conference on Automation of Software Test, pp. 54–64 (2022)
  • Valdez (1988) Valdez, M.E.P.: A gift from Pandora’s box: The software crisis. Ph.D. thesis, University of Edinburgh (1988), URL https://era.ed.ac.uk/handle/1842/7304
  • Vaupel et al. (2015) Vaupel, S., Strüber, D., Rieger, F., Taentzer, G.: Agile bottom-up development of domain-specific ides for model-driven development. In: FlexMDE MoDELS, pp. 12–21 (2015)
  • Vechev et al. (2016) Vechev, M., Yahav, E., et al.: Programming with “big code”. Foundations and Trends® in Programming Languages 3(4), 231–284 (2016)
  • Weißleder and Lackner (2013) Weißleder, S., Lackner, H.: Top-down and bottom-up approach for model-based testing of product lines. arXiv preprint arXiv:1303.1011 (2013)
  • Wong et al. (2023) Wong, M.F., Guo, S., Hang, C.N., Ho, S.W., Tan, C.W.: Natural language generation and understanding of big code for AI-assisted programming: A review. Entropy 25(6), 888 (2023)
  • Zhao et al. (2015) Zhao, X., Wang, Z., Fan, X., Wang, Z.: A clustering-bayesian network based approach for test case prioritization. In: 2015 IEEE 39th Annual Computer Software and Applications Conference, vol. 3, pp. 542–547, IEEE (2015)