Understanding Web Application Workloads and Their Applications: Systematic Literature Review and Characterization

Roozbeh Aghili, Qiaolin Qin, Heng Li, Foutse Khomh
Polytechnique Montreal, Canada
{roozbeh.aghili, qiaolin.qin, heng.li, foutse.khomh}@polymtl.ca

Abstract

Web applications, accessible via web browsers over the Internet, facilitate complex functionalities without local software installation. In the context of web applications, a workload refers to the number of requests sent by users or applications to the underlying system. Existing studies have leveraged web application workloads to achieve various objectives, such as workload prediction and auto-scaling. However, these studies are conducted in an ad hoc manner, lacking a systematic understanding of the characteristics of web application workloads. In this study, we first conduct a systematic literature review to identify and analyze existing studies leveraging web application workloads. Our analysis sheds light on their workload utilization, analysis techniques, and high-level objectives. We further systematically analyze the characteristics of the web application workloads identified in the literature review. Our analysis centers on characterizing these workloads at two distinct temporal granularities: daily and weekly. We identify and categorize three daily and three weekly patterns within the workloads. By providing a statistical characterization of these workload patterns, our study highlights the uniqueness of each pattern, paving the way for the development of realistic workload generation and resource provisioning techniques that can benefit a range of applications and research areas.

Index Terms:

Web applications, Workload patterns, Workload analysis, Time-series clustering

I Introduction

Web applications are delivered via the World Wide Web to users, allowing them to access complex functionality without installing or configuring local software components (except a browser). In the context of web applications, the term workload refers to requests sent by users or applications to the underlying system. Web applications, such as Wikipedia [1], typically monitor the workload data for various purposes, such as user behavior analysis (e.g., [2, 3]) or resource allocation (e.g., [4, 5]). Studying and analyzing these workloads are crucial in understanding the dynamics of user interactions, server responses, and resource utilization for web applications.

Web application workloads are valuable not only for workload prediction (e.g., [6, 7]), auto-scaling (e.g., [8, 9]) and the development of self-healing systems (e.g., [10, 11]), but also for software maintenance activities such as performance optimization (e.g., [12, 13]) and capacity planning (e.g., [14, 15]). Previous efforts have also included literature reviews and survey studies that summarize advancements in closely related areas of workload characterization (e.g., [16, 17]), auto-scaling (e.g., [18, 19]), and workload prediction (e.g., [20]).

Despite the significance of workload data in employing and evaluating these techniques, no existing work has systematically studied the characteristics of web application workloads and their applications. Therefore, in this study, we undertake a Systematic Literature Review (SLR) to identify existing studies utilizing web application workloads. From our SLR, we identify 12 web application workload datasets (worth more than 8.5 years of workloads in total) and study their applications. We then systematically characterize these workloads, providing valuable insights for future research on software maintenance practices, such as realistic workload generation and resource provisioning strategies. Our two Research Questions (RQ) are as follows.

RQ1: How are web application workloads used in existing research? While web application workloads are known to be valuable for various purposes, there is a gap in understanding their usage. This RQ aims to bridge this gap by identifying diverse applications of these workloads. To achieve this, we conduct an SLR examining articles that utilize web application workloads. This SLR involves a comprehensive search across two research databases, followed by article selection and forward and backward snowballing, ultimately resulting in a dataset of 78 articles. We analyze how these workloads are used, revealing current trends and potential limitations.

RQ2: What are the existing patterns in web application workloads? Although web application workloads are crucial for tasks such as workload prediction and auto-scaling, their inherent characteristics remain largely unexplored. This RQ focuses on systematically characterizing these workloads and identifying recurring patterns. Through a thorough examination of web application workloads using clustering techniques, we offer insights that researchers and practitioners can use to develop more accurate and efficient tools and techniques that are customized to the specific characteristics of workload patterns.

Figure 1 provides an overview of our study. Our research is initiated by the identification of 78 articles published between 1995 and 2024. These research articles were selected because they utilized web application workloads in their studies. To identify these papers, we perform a systematic literature search involving three stages: a thorough exploration of research databases, article selection, and snowballing the selected articles to find relevant papers. Once the set of articles is finalized, we identify the web applications they have utilized. We extract the available workloads and employ clustering techniques to characterize these workloads. Characterizing workload patterns of web applications is of importance, offering several advantages, including (1) uncovering hidden patterns among web application workloads, (2) revealing specific characteristics associated with each workload pattern, and (3) enabling the development of load testing tools, workload generators, and resource provisioning techniques based on our findings.

Our work makes several important contributions:

1. Our SLR offers a comprehensive insight into the applications of web application workloads.
2. We identified a total of 12 publicly available web application workload datasets through our SLR.
3. We identified three daily and three weekly workload patterns across different web applications and presented their distinct characteristics. The results of this analysis could guide future work on realistic workload generation and resource provisioning.

We share our replication package (https://github.com/mooselab/web-app-workloads) so that future work can replicate or extend our study.

II Related Work

This work performs an SLR on the usage of web application workloads and explores the characteristics of these workloads. Thus, we discuss related work on the following two aspects.

II-A Systematic Literature Reviews in Similar Context

While no previous work has systematically investigated the usage of web application workloads in the literature, there are existing SLR studies and survey papers on similar domains, including workload characterization (e.g., [16, 17]), auto-scaling (e.g., [18, 19]), and workload prediction (e.g., [20]). Several articles within these domains have leveraged web application workloads to implement and evaluate their systems.

Figure 1: Overview of our study

Calzarossa et al. [16] present a survey of the state of the art of workload characterization. They divide workloads into five areas: web applications, social networks, video services, mobile devices, and cloud computing. In the section related to web application workloads, they study 31 research articles. They further divide the web workloads into three sub-domains: shopping services, web robot traffic, and web content. They conclude that workload characterization is a fundamental component for all three sub-domains. Similarly, Shishira et al. [17] categorize the workloads into four areas of web applications, social networks, video services, etc.

Auto-scaling is the dynamic adjustment of the number of resources allocated to an application based on its workload, ensuring optimal performance and cost efficiency [19]. In [19], Qu et al. review the advancements of web application auto-scaling in cloud systems. While providing a taxonomy and survey, this study does not focus on the specific input data the articles utilized for their auto-scaling techniques. A more recent survey by Singh et al. [18] in the auto-scaling domain reports the experiment phase and the data used by the studied articles. They observe that some articles in their study use web application workloads, while others rely on cloud workloads, synthetic workloads, simulations, or custom testbeds.

Masdari et al. [20] provide a survey of the workload prediction methods. Based on their categorization, the studied papers either use realistic web application workloads, cloud workloads, self-collected workloads, or simulation techniques. Their findings reveal that over 86% of the articles exclusively utilize a single workload dataset to assess their results.

II-B Workload Characterization

Previous research efforts have characterized the cloud-based batch-job workloads such as the Google cluster trace (e.g., [21, 22]) and the Alibaba cluster trace (e.g., [23, 24]). One of the common approaches to characterize workloads is utilizing clustering techniques. Patel et al. [22] apply clustering approaches to the Google cluster trace [25] and Bitbrains workloads [26], comparing K-Means and Gaussian Mixture Model performance for cluster representativeness. Similarly, Shekhawat et al. [21] employ K-Means and six other machine learning algorithms to characterize the workload patterns in the Google cluster trace [25] and Bitbrains workloads [26], considering CPU usage, memory usage, disk usage, and network usage as their workload parameters. In another study, Chen et al. [24] cluster workloads in the Alibaba cluster trace [27] using K-Means and workload features such as job duration, CPU cores, memory utilization, and disk utilization, providing a statistical profile of workload behavior.

In the context of web application workloads, Shahidinejad et al. [28] combine K-Means and the Imperialist Competitive Algorithm (ICA) to classify and analyze web application workloads, including Worldcup98 [29] and NASA [30]. In another work, Chowdhury et al. [3] employ time-series analysis techniques to characterize the YouTube workload [31].

While existing studies have characterized web application workloads (e.g., [28, 32, 3]), they often lack a comprehensive overview of the general characteristics of different workloads. Unlike prior research, focusing on one or two specific workloads, our approach involves identifying and analyzing 12 public web application workloads resulting from an SLR. With this approach, we aim to uncover broad and consistent patterns across these diverse workloads, offering a more comprehensive understanding of web application workload dynamics.

III Systematic Literature Review

III-A Dimensions of the Review

The goal of this review is to provide researchers and practitioners with a structured overview of existing research using web application workloads. Through a meticulous examination of the research literature, we derive several key dimensions. These dimensions provide objective descriptions of different techniques and objectives and organize the literature review. The identified dimensions are as follows.

• Workloads: Which web application workloads have been utilized in the literature?
• Techniques and Objectives: What are the overall research objectives of the literature and what types of techniques have been used to achieve them?
• Trends: What are the temporal trends in workload usage and research objectives over time?

III-B Approach

Through a comprehensive search across two research databases, we identify 762 potential candidate articles. After applying inclusion and exclusion criteria, we narrow down the selection to 41 papers. To ensure a comprehensive review, we employ forward and backward snowballing techniques, adding 33 more articles to our selection. The article selection methodology aligns with the systematic approach recommended by Wohlin et al. [33] and is described in the following.

III-B1 Searching Research Databases

We conduct our literature search on two highly reputable research databases: IEEE library (https://ieeexplore.ieee.org/) and ACM library (https://dl.acm.org/). We aim to identify papers from both conference proceedings and journals, written in English and published between January 2014 and December 2023, a time span of 10 years. To pinpoint the articles that either introduce or utilize web application workloads, we use the following query to search the databases:

[[Title: web application] OR [Title: workload] OR [Title: trace]] AND [Abstract: dataset] AND [E-Publication Date: (2014/01/01 TO 2023/12/31)]

Upon conducting this query on the IEEE library, we identify a total of 545 articles that satisfy our criteria. Simultaneously, our search from the ACM library yields an additional 217 articles after deduplication. Combining papers from both sources, we successfully compile a total of 762 candidate articles.

III-B2 Article Selection

We screen each paper, beginning with its abstract, and then proceed through the entire article until a decision is made based on the inclusion and exclusion criteria.

Inclusion criteria:

• Papers must be written in English.
• Papers must be in conference proceedings or journals.
• Papers must introduce or utilize web application workloads that are publicly available and representative of real-world applications.
• The utilized workloads must include timestamp information, as this data will enable us to conduct subsequent analysis and characterization of the workloads.

Exclusion criteria:

• Papers not publicly available.
• Papers introducing or utilizing private data (e.g., with Non-Disclosure Agreement (NDA) contracts).
• Papers introducing or utilizing non-web application data such as cloud-based batch-jobs (e.g., Google cluster trace), social networks data, mobile devices workloads, and synthetic workloads. Overall, we exclude articles that leverage data unrelated to user interaction workloads.

After performing the article selection step, we end up selecting 41 out of the 762 candidate articles.

TABLE I: Web application workloads used in the literature

ID	Workload	Duration ^a	# Instances	Description	Freq.	Ref. examples ^d
1	Wikipedia ^b	5.5 years	1.0 T	Wikipedia workload consists of the number of users accessing Wikimedia Foundation articles from January 2018 to August 2023.	23	[6, 34, 35]
2	Calgary	1 year	727 K	Calgary workload includes one year of HTTP requests to the University of Calgary’s Department of Computer Science server in 1994-1995.	13	[36, 37, 38]
3	Saskatchewan	7 months	2.4 M	This workload contains seven months of HTTP requests sent to the University of Saskatchewan’s server in Canada in 1995.	18	[37, 39, 40]
4	Boston	6 months	1.1 M	Boston workload contains HTTP requests sent to the Boston University Computer Science Department from November 1994 to May 1995.	4	[41, 42]
5	Retailrocket	4.5 months	2.8 M	This workload contains HTTP requests to servers of an anonymous real-world e-commerce website over a period of 4.5 months in 2015.	1	[43]
6	Worldcup98	2 months	1.3 B	Worldcup98 workload consists of all requests made to the 1998 World Cup website between 30 April 1998 and 26 July 1998.	29	[8, 43, 44]
7	NASA	2 months	3.5 M	NASA workload contains the requests sent to NASA Kennedy Space Center in Florida, USA, between 1 July 1995 and 31 August 1995.	35	[8, 28, 45]
8	YouTube	44 days	1.5 M	A collection of workloads from a campus network measurement of YouTube traffic. The collection window, from June 2007 to March 2008, spans 10 months, but the actual data covers a period of 44 days.	3	[3, 46, 47]
9	Madrid	1 month	__ ^c	Real web service logs from the Complutense University of Madrid. This dataset was collected hourly throughout the month of May 2018.	1	[48]
10	ClarkNet	14 days	3.3 M	ClarkNet was an Internet service provider located in Maryland, USA. This workload consists of two weeks of data in September 1995.	18	[44, 49, 5]
11	EPA	1 day	47.7 K	EPA workload contains one day of requests sent to the EPA server located at North Carolina, USA in August 1995.	1	[50]
12	SDSC	1 day	28.3 K	SDSC workload includes requests to the servers of the San Diego Supercomputer Center in California over a single day in August 1995.	1	[51]

^a This indicates the duration of each workload dataset. Certain literature utilized segments of these durations.
^b Unlike other workloads that have one specific dataset, Wikipedia offers a free API allowing users to download customized data sets. The duration and instances mentioned here are the data extracted from Wikipedia for this study.
^c Unlike other workloads, this dataset does not provide raw data; instead, it offers preprocessed data.
^d The full list of articles is available in our replication package.

TABLE II: Research objectives in the literature and their corresponding techniques

Objective	Utilized Techniques	Freq.	Ref. examples
Resource Management
Load Balancing	Classical Machine Learning, Deep Learning, Time-series/Statistical Analysis, Queueing Theory	4	[52, 38]
Resource Provisioning	Classical Machine Learning, Deep Learning, Optimization, Control Theory, Fuzzy Logic, Queueing Theory	6	[28, 5]
Caching Optimization	Time-series/Statistical Analysis	2	[12, 13]
Workload Analysis
Workload Prediction	Time-series Models, Classical Machine Learning, Deep Learning, Optimization, Time-series/Statistical Analysis, Filtering and Signal Processing, Markov Models, Fuzzy Logic	44	[43, 53, 40]
Workload Classification	Time-series Models, Classical Machine Learning, Optimization, Time-series/Statistical Analysis	4	[54, 28, 2]
Workload Characterization	Optimization, Time-series/Statistical Analysis	13	[55, 56, 32]
Self Adaptation
Auto-scaling	Optimization, Control Theory, Fuzzy Logic, Queueing Theory	14	[43, 8]
Self-healing	Optimization, Markov Models	2	[10, 11]
Benchmarking	Time-series Models, Optimization, Control Theory, Time-series/Statistical Analysis, Queueing Theory	6	[57, 36]

III-B3 Forward and Backward Snowballing

To ensure the comprehensiveness of our study and to capture potentially relevant articles, we initiate the forward and backward snowballing phase on the selected 41 papers. During this phase, for each of the selected articles, we inspect the list of papers that have cited them (i.e., forward snowballing) and their reference papers (i.e., backward snowballing) and perform the article selection step (as described in Section III-B2) to decide whether the new article matches with our inclusion and exclusion criteria. After performing this step, we end up having 33 new articles. Combining the articles retrieved through our initial search of research databases and the subsequent snowballing process, we obtain a total of 78 articles, each of which introduces or utilizes public web application workloads.

III-B4 Qualitative Analysis

We manually examine each of the selected articles to extract its dimensions (i.e., workloads, techniques, objectives). This labeling process involves reviewing all sections of each paper to identify the relevant information, though most of this information is typically found in the methodology sections. We use an open coding approach [58] to extract the desired information. To label the articles, the first two authors of the paper (i.e., coders) jointly perform a coding process, determining each article’s workloads, techniques, and objectives. We perform a five-step coding process as follows.

Step 1: Coding. Each coder independently analyzes the first batch of articles (i.e., the initial 37) and assigns labels for each dimension (i.e., workloads, techniques, objectives) of each paper. Multiple labels can be assigned to each dimension.

Step 2: Discussion. The coders share their responses and discuss the labels they created, aiming for a common understanding. We join related labels and refine high-level ones, then revise the coding of the first batch accordingly.

Step 3: Coding. Each coder analyzes the second batch of articles (i.e., the final 37) based on the discussion results.

Step 4: Resolving disagreements. The coders compare their final results from step 3 and discuss any remaining conflicts, attempting to resolve them. If an agreement cannot be reached, the third author makes the final decision.

Step 5: Final revision. In the final stage, we create a mind map from all the produced labels. We then discuss the labels and form a hierarchy, change some labels’ names for clarity, and merge some small categories to be cohesive.

III-B5 Measuring the Reliability

Ensuring reliability is crucial for validating coding results [59]. The coding results are reliable when there is a specific level of agreement between coders, referred to as inter-coder agreement. In this study, we employ Cohen’s kappa [60] as a metric to quantify the reliability of agreements between two coders. We evaluate our coding reliability for the second batch of the articles (i.e., after the discussion session) and achieve a Cohen’s kappa of 0.94. A value of kappa $\geq$ 0.80 indicates a strong agreement [61].
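For illustration, Cohen's kappa compares the observed agreement $p_o$ between two coders with the agreement $p_e$ expected by chance, $\kappa = (p_o - p_e) / (1 - p_e)$. The following minimal sketch shows the computation under simplifying assumptions: each article receives exactly one label per coder (the coding process above allows multiple labels per dimension), and the label lists are hypothetical rather than the study's data. The function name `cohens_kappa` is likewise illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items (one label each)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the coders agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical objective labels for ten articles (NOT the study's data).
coder_1 = ["prediction", "prediction", "auto-scaling", "characterization", "prediction",
           "auto-scaling", "benchmarking", "prediction", "characterization", "auto-scaling"]
coder_2 = ["prediction", "prediction", "auto-scaling", "characterization", "prediction",
           "auto-scaling", "benchmarking", "auto-scaling", "characterization", "auto-scaling"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the coders disagree on one of ten articles, which still yields a kappa above the 0.80 threshold mentioned above.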

III-C Results

We present the findings of our SLR in three main parts: the analysis of web application workloads, the discussion of objectives and techniques, and the temporal trend analysis.

III-C1 Web Application Workloads

Table I presents workload datasets used in the literature, showcasing details such as duration, data instances, description, frequency of usage in articles, and example references. Our review identifies 12 distinct workloads commonly used in the literature: Wikipedia [1], Worldcup98 [29], NASA [30], Saskatchewan [62], Calgary [63], EPA [64], ClarkNet [65], Retailrocket [66], Boston [67], SDSC [68], YouTube [31], and Madrid [69].

Some workloads, such as the NASA dataset, appeared in many articles (35 in total), while others, such as Retailrocket, SDSC, and Madrid, were used much less frequently, each appearing in just one article. These datasets vary significantly in duration and the number of instances. For instance, the Worldcup98 workload, characterized by high demand, consists of 1.3 billion instances. In contrast, the SDSC workload spans only one day with 28 thousand instances.

III-C2 Objectives and Techniques

Table II presents objectives and primary techniques from literature articles, along with their frequency of usage and example references. A more detailed list of objectives and techniques along with all the reference articles can be found in our replication package (https://github.com/mooselab/web-app-workloads). Below we define the objective categories.

The first objective theme is Resource Management, which aims to ensure the efficient allocation, utilization, and optimization of computing resources within a system or network. It includes three objectives: Load Balancing, Resource Provisioning, and Caching Optimization.

Load balancing is the distribution of incoming network traffic or computational tasks across multiple resources to optimize resource utilization and ensure high availability. Various approaches are employed for this purpose. For example, Riska et al. [70] apply queueing theory to evaluate load balancing policies in distributed multi-server systems, modeling each server as a queue using Markov chains.

Resource provisioning is the process of initially allocating the resources and services from a cloud provider to a customer to meet the requirements of applications. Shahidinejad et al. [28] employ classical machine learning models to design their resource provisioning system. They propose a system that first clusters the workloads using K-Means and then uses Decision Tree Regression (DTR) to determine scaling decisions for efficient resource provisioning. In another work, Zhou et al. [5] combine a deep learning model, Long Short-Term Memory (LSTM), with a classical machine learning model, XGBoost, to provide proactive resource provisioning. They center their work on microservice systems.

Caching optimization comprises the techniques and strategies aimed at improving the efficiency and effectiveness of caching mechanisms. Time-series and statistical analysis are the main techniques used for this category. For example, Bairavasundaram et al. [12] create an image of the file system cache using inferred disk traffic data and propose an array caching mechanism.

The second objective theme is Workload Analysis, which is the examination and understanding of patterns, trends, and characteristics of system or network usage. This theme also includes three objectives: Workload Prediction, Workload Classification, and Workload Characterization.

Workload prediction involves using historical data and trends to forecast future workload demands and patterns, aiming to predict resource requirements and maintain optimal system performance. This objective represents the most common theme across our literature articles, comprising 45% of the total objectives outlined in these papers. Because so many articles aim to improve workload prediction approaches, a wide variety of techniques have been used. Kim et al. [53] build an ensemble architecture encompassing various time-series models such as Weighted Moving Average (WMA) and Auto-Regressive Integrated Moving Average (ARIMA). Their ensemble architecture chooses a subset of these models based on the characteristics of the upcoming workload. Deep learning is another popular solution. Shi et al. [14] use Deep Reinforcement Learning (DRL), and Attia et al. [71] use deep learning with a Differential Evolution (DE) algorithm as their optimization technique to predict the workloads. In another work, Saravanan et al. [39] use Markov models to filter redundant historical data before passing them to a Generative Adversarial Network (GAN) model.

Workload classification involves using classification techniques to categorize workloads. In [72], time-series models such as Sample Entropy and classical machine learning models such as K-Nearest Neighbors (KNN) are employed to classify workloads. In [2], Urdaneta et al. use time-series analysis techniques to classify the Wikipedia workload, emphasizing challenges from its large, heterogeneous dataset.

Workload characterization is the process of understanding and describing the behavior and characteristics of workloads, including patterns, trends, and variations. The majority of articles within this category rely on time-series and statistical analysis due to the ability of time-series analysis to capture temporal dependencies and identify recurring patterns in workload data.

Self Adaptation is the third objective theme. It is the automatic adjustment of system parameters and resources in response to changing conditions or system faults, aiming to improve performance and reliability without human intervention. It includes two categories: Auto-scaling and Self-healing.

Auto-scaling is the dynamic adjustment of the number of resources allocated to an application based on its workload, ensuring optimal performance and cost efficiency. While resource provisioning focuses on the initial allocation and management of resources, auto-scaling continuously monitors system metrics and automatically adjusts resource levels to match current demand. Control Theory techniques such as Proportional-Integral-Derivative (PID) control [73] have been applied to automatically scale out public-cloud resources, utilizing only the customer’s existing knowledge base. In another attempt to create an auto-scaling system, Kumar et al. [74] use queueing theory and design a system that dynamically performs resource corrections at the virtual machine level by considering both underutilization and over-utilization scenarios.

Self-healing systems are designed to autonomously detect, diagnose, and recover from failures or performance degradation without human intervention, ensuring continuous operation and reliability. In both [10] and [11], the authors employ a Markov model along with an optimization technique (Stochastic Programming) to design a self-healing system compatible with various adaptation objectives.

As the last objective, Benchmarking evaluates the performance, reliability, and scalability of systems by measuring and comparing their performance. In [75], Papadopoulos et al. propose a performance evaluation framework that utilizes Control Theory and optimization to evaluate different auto-scaling systems. In another work, Kumar et al. [57] evaluate optimization algorithms such as Genetic Algorithm and DE in the task of workload prediction. In a similar attempt, Kumar et al. [36] propose a performance evaluation framework that assesses the performance of six prediction models such as ARIMA and Exponential Smoothing on Workload Prediction.

III-C3 Temporal Trend Analysis

Initially, we compare the publication years of the studied articles with those of the workload datasets they employ. Our findings indicate that a significant proportion of the articles utilize relatively dated workload datasets. Specifically, while the median publication year of the articles is 2018, the median publication year of the datasets is 1995. To provide further insight, over 82% of articles published in 2015 and later rely on datasets that were published in 2000 or earlier. This analysis emphasizes the frequent utilization of older workloads in research studies. Figure 2 presents the scatter plot of this phenomenon.

Subsequently, we analyze the popularity of different objectives over time. Figure 3 presents the cumulative number of articles associated with each objective throughout the years. The analysis highlights workload prediction, auto-scaling, and workload characterization as the most prominent objectives. Particularly, there is a noticeable increase in articles focusing on workload prediction, especially after 2014. However, objectives such as self-healing and caching optimization have received relatively less attention, indicating potential areas for further exploration. This observation aligns with the conclusions drawn by Aghili et al. [76].

IV Workload Characterization

IV-A Motivation

Clustering is an essential process for uncovering existing patterns within data. By identifying these patterns in web application workloads, we can gain valuable insights into the behavior and dynamics of web application systems. This understanding is foundational for various tasks, including the development of workload generators and resource provisioning techniques. Leveraging the insights obtained by clustering workload data, researchers and practitioners can design more accurate and efficient tools and techniques that are customized to the specific characteristics of workload patterns.

IV-B Approach

To analyze web application workloads and identify the existing workload patterns, we start by extracting the workloads. We continue by preprocessing through aggregation, standardization, and smoothing. We then analyze the workloads in terms of variability and burstiness. After that, we employ K-Means to cluster the workloads, and finally, we analyze cluster characteristics.

IV-B1 Web Application Workloads

As mentioned in Section III-C1, the selected 78 articles utilize 12 web application workloads. A comprehensive description of these workloads, along with key features for each, is presented in Table I.

IV-B2 Workload Extraction

Most of these 12 web application workloads are accessible for download. However, some, such as Wikipedia, require scripting for data extraction. For the Wikipedia workload, we develop a script to retrieve data spanning 5.5 years, from January 2018 to August 2023.

After gathering all the workloads, we realize that each of them possesses a unique data format. Some are summary-based and only provide information on the number of users who accessed a particular server (e.g., the Wikipedia workload) per time interval, while the others are in the format of log lines and are event-based (e.g., the Worldcup98 workloads). To illustrate this contrast, we provide a comparative example of the raw data from Wikipedia and Worldcup98 workloads in Table III. We write separate scripts for each of these data types, extracting the number of users interacting with the web application within one-hour time intervals.
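As an illustration of the event-based case, the following minimal sketch buckets log lines of the form shown in Table III into one-hour intervals. It is not our extraction script: the function name `hourly_counts` and the regular expression are hypothetical, and the sketch counts every log line as one request, which is an assumption about how accesses are tallied.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the bracketed timestamp, e.g. [13/Jul/1998:22:00:01 +0000]
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")

def hourly_counts(log_lines):
    """Count log events per one-hour bucket, keyed by the start of the hour."""
    counts = Counter()
    for line in log_lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue  # skip malformed lines instead of guessing a timestamp
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[ts.replace(minute=0, second=0)] += 1
    return counts

# The two Worldcup98 sample lines from Table III.
sample = [
    '2705258 - - [13/Jul/1998:22:00:01 +0000] "GET /images/102378.gif HTTP/1.0" 200 1658',
    '1630377 - - [13/Jul/1998:22:00:01 +0000] "GET /images/hm_score_up_line03.gif HTTP/1.0" 200 90',
]
counts = hourly_counts(sample)
```

For summary-based data such as Wikipedia, the analogous step would sum the per-page view counts of each hourly file rather than count lines.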

IV-B3 Preprocessing the Workloads

The preprocessing phase involves three stages: aggregation, standardization, and smoothing.

Aggregation

After extracting the user interaction information for all the workloads, we establish two temporal granularities for our analyses: daily and weekly. These granularities have been widely used for workload-related tasks such as workload generation for load testing and resource provisioning [77, 78, 79].

TABLE III: Comparing two workload types: summary-based (e.g., Wikipedia) and event-based (e.g., Worldcup98)

Wikipedia - 2023/01/01-00
en.m Cristiano_Ronaldo 4888 0
en.m Lionel_Messi 2322 0
en.m Frédéric_Chopin 83 0
Worldcup98
2705258 - - [13/Jul/1998:22:00:01 +0000] "GET /images/102378.gif HTTP/1.0" 200 1658
1630377 - - [13/Jul/1998:22:00:01 +0000] "GET /images/hm_score_up_line03.gif HTTP/1.0" 200 90

TABLE IV: The schema of daily and weekly granularities

Workload	Day	0	1	2	…	22	23
Wikipedia	2023-01-01	20.7	20.6	18.5	…	27.7	24.9
Wikipedia	2023-01-02	22.3	19.8	19.2	…	27.4	25.3
Wikipedia	2023-01-03	21.5	19.7	19.0	…	26.5	24.0

(a) Daily granularity example extracted from the Wikipedia workload. The numbers are in millions.

Workload	Week	M	Tu	…	Sa	Su
Wikipedia	2023-01-01	541.5	533.8	…	527.6	556.1
Wikipedia	2023-01-08	543.1	525.8	…	531.4	581.3
Wikipedia	2023-01-15	567.2	549.5	…	518.2	566.9

(b) Weekly granularity example extracted from the Wikipedia workload. The numbers are in millions.

Daily Granularity: Analyzing daily granularity allows for a micro-level understanding of user behavior and system performance. This examination offers detailed insights into user engagement, system load, and operational peaks over a 24-hour cycle. This granularity enables the detection of short-term trends, such as hourly spikes in traffic or variations in user engagement between weekdays and weekends.

We aggregate workloads in one-hour time intervals. Thus, if a single workload spans one day, it will be represented in 24 instances, each corresponding to the number of user accesses in a 1-hour window. We provide an example of the daily granularity in Table IV(a), where each row corresponds to one day of data. Figure 4 presents the Wikipedia and Worldcup98 workloads in daily granularity, where each data point signifies the workload for one hour. As evident, patterns observed within these two days exhibit both similarities and differences, and the ultimate objective of this work is to uncover and understand these patterns within web application workloads.

Weekly Granularity: Weekly granularity offers a broader perspective, capturing trends and variations that span different days of the week. Analyzing these patterns allows for the identification of cyclical behaviors and longer-term trends. For instance, it reveals differences in user engagement and system load across different seasons of the year.

To analyze weekly granularity, we aggregate the workloads in time intervals of one day. In this case, if a workload encompasses a duration of one week, it will be reflected in seven instances, each representing the number of user accesses in a 1-day window. We provide an example of the weekly granularity in Table IV(b), where each row corresponds to one week of data. Figure 5 shows the Wikipedia and Worldcup98 workloads in weekly granularity, where each data point signifies the workload for one day.
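The two granularities can be sketched as follows, starting from a mapping of hour-start timestamps to request counts. This is a simplified illustration rather than our actual scripts: the helper names `daily_rows` and `weekly_rows` are hypothetical, missing hours are assumed to count as zero, and weeks are keyed by their Monday (an assumption made for this example).

```python
from collections import defaultdict
from datetime import datetime, timedelta

def daily_rows(hourly):
    """Map each date to a list of 24 hourly counts (missing hours count as 0)."""
    rows = defaultdict(lambda: [0] * 24)
    for ts, count in hourly.items():
        rows[ts.date()][ts.hour] += count
    return dict(rows)

def weekly_rows(daily):
    """Map each week's Monday to a list of 7 daily totals, Monday first."""
    rows = defaultdict(lambda: [0] * 7)
    for day, hours in daily.items():
        monday = day - timedelta(days=day.weekday())
        rows[monday][day.weekday()] += sum(hours)
    return dict(rows)

# Synthetic input: 100 requests in every hour of a two-week window.
start = datetime(2023, 1, 2)  # a Monday
hourly = {start + timedelta(hours=h): 100 for h in range(14 * 24)}

daily = daily_rows(hourly)    # 14 instances, 24 values each
weekly = weekly_rows(daily)   # 2 instances, 7 values each
```

Each entry of `daily` corresponds to one row of Table IV(a), and each entry of `weekly` to one row of Table IV(b).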

Standardization

We standardize our data to ensure uniformity across different workloads, especially when they exhibit significant load variations. For example, in Figure 5, we observe a considerable difference in scale between the Wikipedia and Worldcup98 workloads: Wikipedia’s workload is more than 10 times that of Worldcup98. Thus, standardization is essential to ensure that the subsequent clustering is not biased by different workload intensities.

Z-score is a widely used technique that adjusts data to have a mean of 0 and a standard deviation of 1. The formula applies to each data point in both daily and weekly granularities.

z=\frac{X(t)-\mu}{\sigma}

(1)

where $\text{X}(t)$ represents the original data value at time $t$, and $\mu$ and $\sigma$ represent the mean and standard deviation, respectively.
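A minimal sketch of Equation (1) is given below. It assumes that $\mu$ and $\sigma$ are computed over the values passed in (for example, one daily or weekly instance) and that the population standard deviation is used; both are assumptions of this example, as is the choice to map a constant series to zeros.

```python
from statistics import fmean, pstdev

def z_score(values):
    """Standardize values to mean 0 and standard deviation 1 (Equation 1)."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        # A constant series has no spread; return zeros to avoid dividing by 0.
        return [0.0] * len(values)
    return [(x - mu) / sigma for x in values]

standardized = z_score([20.7, 20.6, 18.5, 27.7, 24.9])
```

After this step, workloads of very different intensities are expressed on the same scale.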

Smoothing

Smoothing is employed to reduce noise and fluctuations in the workload data, thereby enhancing its clarity and facilitating trend identification. One commonly used smoothing technique is Exponential Moving Average (EMA). The main benefits of using the exponential smoothing method are its low cost and ease of application [80]. It assigns exponentially decreasing weights to past observations, ensuring that recent data points have a higher influence on the smoothed value compared to older ones. The EMA smoothing formula is as follows:

\text{EMA}(t)=\alpha\times\text{X}(t)+(1-\alpha)\times\text{EMA}(t-1)

(2)

where $\text{EMA}(t)$ represents the smoothed value at time $t$, $\text{X}(t)$ denotes the original data value at time $t$, and $\alpha$ is the smoothing factor, typically a value between 0 and 1, determining the weight assigned to the current observation.
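Equation (2) can be sketched as a short recursion. The initialization $\text{EMA}(0)=\text{X}(0)$ and the default $\alpha=0.3$ below are assumptions of this example, not values reported in our study.

```python
def ema(values, alpha=0.3):
    """Exponential moving average following Equation 2."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)  # assumed seed: EMA(0) = X(0)
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# A step from 0 to 10 is approached gradually instead of jumping.
step_response = ema([0, 10, 10], alpha=0.5)
```

A larger $\alpha$ tracks the raw series more closely, while a smaller $\alpha$ produces a smoother curve that reacts more slowly.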

IV-B4 Variability and burstiness analysis

Variability and burstiness analysis is performed before clustering to understand the inherent dynamics of web application workloads. This analysis helps identify distribution differences that might influence clustering results. By examining raw datasets for variability and burstiness after aggregation but before standardization, we retain the original data characteristics.

Variability is defined as the extent to which data points in a time series differ from each other and from the mean value [81]. We calculate variability by measuring the coefficient of variation (CV) of the workload intensities within one day (i.e., 24 hourly workload intensity values) or one week (i.e., seven daily intensity values) using the below formula:

\text{CV}=\frac{\sigma}{\mu}

(3)

where $\sigma$ is the standard deviation and $\mu$ is the mean.

Burstiness refers to the tendency of a time series to exhibit sudden, irregular increases in activity or intensity over short periods. To quantify burstiness, we calculate the mean and standard deviation of workloads per day and per week. Burstiness is then determined using the formula [82]:

\text{Burstiness}=\frac{\sigma-\mu}{\sigma+\mu}

(4)

Burstiness ranges from -1 to 1, where 1 indicates high irregularity, 0 indicates no significant burstiness, and -1 suggests a regular pattern. This quantification highlights periods with significant deviations from overall trends.
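To make Equations (3) and (4) concrete, the sketch below computes both measures for two synthetic days. The function names and the two example series are hypothetical, and the population standard deviation is assumed.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(values):
    """CV = sigma / mu (Equation 3)."""
    return pstdev(values) / fmean(values)

def burstiness(values):
    """Burstiness = (sigma - mu) / (sigma + mu) (Equation 4)."""
    mu, sigma = fmean(values), pstdev(values)
    return (sigma - mu) / (sigma + mu)

flat_day = [100] * 24             # identical load in every hour
spiky_day = [0] * 23 + [2400]     # the same total load packed into one hour

flat_cv, flat_b = coefficient_of_variation(flat_day), burstiness(flat_day)
spiky_cv, spiky_b = coefficient_of_variation(spiky_day), burstiness(spiky_day)
```

The perfectly regular day has $\sigma=0$ and therefore a CV of 0 and a burstiness of -1, whereas concentrating the same total load in a single hour pushes the CV above 1 and the burstiness toward 1.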

IV-B5 Clustering the Workloads

To investigate general workload patterns across a comprehensive dataset, we employ workload clustering. We first develop a script to combine the standardized and smoothed daily and weekly workloads from all 12 workload datasets, which results in two unified datasets: the daily workload dataset and the weekly workload dataset. Pooling the workloads in this way allows us to capture robust and generalized patterns. The daily dataset contains 3191 data points, each representing one day, and the weekly dataset contains 466 data points, each representing one week.

After constructing the combined daily and weekly datasets, we employ K-Means clustering. K-Means is a widely recognized clustering method that partitions n observations into k clusters, assigning each observation to the cluster with the nearest centroid [83].

Determining the optimal number of clusters (a.k.a., k value) is critical in applying the K-Means algorithm. Various methods exist for this purpose, including the elbow method [84] and the silhouette score [85]. In this study, we opt for the silhouette score, which measures the cohesion and separation of clusters, with values ranging from -1 to 1. A higher value indicates well-separated clusters and a near-zero or negative score suggests overlapping or incorrectly assigned data points.

We experiment with Euclidean [86], Dynamic Time Warping (DTW) [87], and Soft-DTW [88] distance metrics to determine the most suitable metric for our data. Our evaluation indicates that Euclidean distance yields the most meaningful cluster separations based on the silhouette score. We execute the K-Means algorithm for k values ranging from 1 to 20 and determine the optimal k based on the silhouette score. Additionally, to validate our findings, we visually inspect the clustering results using t-SNE illustration [89].
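The selection of k can be sketched as follows. This is a simplified NumPy illustration under several assumptions, not our implementation: it uses Euclidean distance only, a deterministic farthest-point initialization in place of the usual random or k-means++ seeding, synthetic 24-hour curves instead of the real workloads, and it starts the search at k = 2 because the silhouette score is undefined for a single cluster.

```python
import numpy as np

def pairwise(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [X[0]]
    while len(centroids) < k:
        nearest = pairwise(X, np.array(centroids)).min(axis=1)
        centroids.append(X[nearest.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = pairwise(X, centroids).argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

def silhouette(X, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    D = pairwise(X, X)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() < 2:
            continue  # singleton clusters score 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

# Synthetic data: three noisy 24-hour shapes, 20 days each (hypothetical).
rng = np.random.default_rng(0)
t = np.arange(24)
shapes = [np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 24),
          -np.sin(2 * np.pi * t / 24)]
X = np.vstack([s + rng.normal(0, 0.05, size=(20, 24)) for s in shapes])

scores = {k: silhouette(X, kmeans(X, k)[0]) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
```

On this synthetic input the silhouette score peaks at the number of generating shapes; on real workloads the peak is typically less pronounced, which is why a visual check complements the score.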

IV-B6 Analyzing Cluster Characteristics

Beyond identifying workload patterns, understanding their underlying characteristics and interrelationships is crucial. We analyze the clustering results from various perspectives: analyzing centroids, studying associations between daily and weekly patterns, and exploring the time dependence of workload patterns.

Centroid Analysis

A centroid is a representative point that summarizes the central tendency of the data and provides insight into its distribution or trend over time. In our analysis, we compute centroid values for each daily and weekly pattern, aiming to capture the characteristics of each cluster’s temporal behavior. To effectively model these centroids, we use polynomial models, specifically quadratic and cubic polynomials, chosen for their ability to capture the nuanced shapes and fluctuations observed in workload centroids. The quadratic model is mathematically expressed as:

y(t)=at^{2}+bt+c

(5)

whereas the cubic model is expressed as:

y(t)=at^{3}+bt^{2}+ct+d

(6)

where $y(t)$ denotes the modeled centroid value at time $t$, and $a$, $b$, $c$, and $d$ represent the coefficients of the models.

When fitting the polynomial models, we estimate the coefficients using the Levenberg-Marquardt algorithm [90]. This method iteratively adjusts the model parameters to minimize the sum of squared differences between the model’s predictions and the observed data points. Minimizing this objective function ensures that the fitted models closely capture the temporal dynamics observed within the clusters.
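The fitting step can be illustrated with the sketch below. Note the substitution: instead of Levenberg-Marquardt, it uses ordinary least squares via `numpy.polyfit`, which minimizes the same sum-of-squares objective directly because a polynomial is linear in its coefficients. The centroid used here is a hypothetical curve, not one of our fitted centroids.

```python
import numpy as np

def fit_polynomial(t, y, degree):
    """Least-squares polynomial fit; returns (coefficients, sum of squared errors).

    Coefficients are ordered from the highest power to the constant term.
    """
    coeffs = np.polyfit(t, y, degree)
    residuals = y - np.polyval(coeffs, t)
    return coeffs, float(np.sum(residuals ** 2))

t = np.arange(24, dtype=float)
centroid = np.sin(2 * np.pi * t / 24)  # hypothetical daily centroid

quad_coeffs, quad_sse = fit_polynomial(t, centroid, 2)    # a, b, c
cubic_coeffs, cubic_sse = fit_polynomial(t, centroid, 3)  # a, b, c, d
```

Comparing `quad_sse` and `cubic_sse` shows how much the extra cubic term improves the fit; because the quadratic is nested in the cubic, the cubic error can never be larger.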

Association between daily and weekly patterns

We determine the associations between daily and weekly patterns by calculating the presence of daily patterns within each weekly pattern. Specifically, for each weekly pattern, we calculate the percentage of the instances associated with each daily pattern. Through quantifying the frequency of daily patterns within weekly patterns, our objective is to unveil the hierarchical arrangement of temporal dynamics. This analytical strategy facilitates the interpretation of temporal associations, hence providing detailed insights into the patterns.
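One way to compute such an association table is sketched below. It assumes each instance has already been labelled with both its daily pattern and the weekly pattern of the week it belongs to, and it reports joint percentages over all instances; the function name and the example labels are hypothetical.

```python
from collections import Counter

def joint_frequency(pairs):
    """Percentage of instances for every (weekly, daily) label combination."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {combo: 100 * n / total for combo, n in counts.items()}

# Hypothetical labels for ten days: (weekly pattern, daily pattern).
pairs = [("W1", "D1")] * 6 + [("W1", "D3")] * 1 + [("W3", "D3")] * 3
freq = joint_frequency(pairs)
```

Row and column totals follow by summing the cells that share a weekly or daily label.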

Time dependence of workload patterns

We explore the association between workload patterns and time. For daily patterns, we consider their association with days of the week (i.e., weekdays or weekends), and for weekly patterns, we consider their association with seasons of the year. Specifically, we calculate the percentage of each workload pattern’s presence within weekdays/weekends or each season.
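This calculation can be sketched as follows. The helper names are hypothetical, and the season definition (meteorological seasons of the Northern Hemisphere, assigned by month) is an assumption of this example rather than a definition taken from our study.

```python
from collections import Counter, defaultdict
from datetime import date

# Month index (0 = January) to season; an assumed meteorological definition.
SEASONS = ["Winter"] * 2 + ["Spring"] * 3 + ["Summer"] * 3 + ["Fall"] * 3 + ["Winter"]

def day_type(d):
    """Classify a date as weekday (Mon-Fri) or weekend (Sat-Sun)."""
    return "weekend" if d.weekday() >= 5 else "weekday"

def season(d):
    """Season of a date according to the SEASONS table above."""
    return SEASONS[d.month - 1]

def share_by(labelled_dates, key):
    """For each pattern label, the percentage of its instances per time category."""
    counts = defaultdict(Counter)
    for label, d in labelled_dates:
        counts[label][key(d)] += 1
    return {
        label: {cat: 100 * n / sum(c.values()) for cat, n in c.items()}
        for label, c in counts.items()
    }

# Hypothetical daily instances: Saturday, Sunday, Monday, Tuesday.
labelled = [
    ("D2", date(2023, 1, 7)), ("D2", date(2023, 1, 8)),
    ("D2", date(2023, 1, 9)), ("D3", date(2023, 1, 10)),
]
daily_shares = share_by(labelled, day_type)
```

Passing `season` instead of `day_type` as the key yields the seasonal distribution for weekly patterns.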

IV-C Results

We first present the variability and burstiness analysis. Then, we provide the clustering findings in four parts: clustering patterns, centroid analysis, association between daily and weekly patterns, and time dependence of workload patterns.

IV-C1 Variability and burstiness analysis

Figure 6 illustrates the variability analysis for daily and weekly granularities across each workload. The workload numbers shown in this figure (from 1 to 12) correspond to the “ID” column listed in Table I. A CV close to 0 indicates low variability, suggesting stable workloads, while a CV greater than 1 suggests high relative variability. Figure 6 reveals two key observations: (1) daily variability is significantly higher than weekly variability, and (2) although most workloads are relatively stable, high variability is observed in two daily workloads (i.e., Boston and EPA) and one weekly workload (i.e., YouTube).

Figure 7 presents the burstiness analysis for daily and weekly granularities. Similar to variability findings, daily workloads exhibit higher burstiness. Specifically, while none of the weekly workloads exhibit burstiness values exceeding 0.5, two of the daily workloads, Boston and EPA, reach this threshold. The Wikipedia workload shows the lowest burstiness, nearing -1, indicating regular patterns. This regularity may result from its large global user base, which balances bursts across different regions.

IV-C2 Clustering Patterns

The clustering results are presented in Figure 8, showcasing the identified patterns. As mentioned in Section IV-B5, we obtain 3191 datapoints at the daily granularity of web application workloads. After clustering, we identify three distinct patterns. Specifically, Figure 8(a) illustrates the first cluster (i.e., D1) with 2262 instances, Figure 8(c) shows the second cluster (i.e., D2) with 406 instances, and Figure 8(e) depicts the third cluster (i.e., D3) with 523 instances. The x-axis of these figures represents a 24-hour day.

As discussed in Section IV-B5, 466 datapoints exist at the weekly granularity of web application workloads. After clustering, we uncover three patterns, presented in Figures 8(b), 8(d), and 8(f) and denoted as W1, W2, and W3, with 283, 64, and 119 instances, respectively. The x-axis of these figures corresponds to a 7-day week, commencing from 1 (Monday) and concluding at 7 (Sunday).

Looking at daily and weekly patterns, it is evident that they exhibit unique patterns while having commonalities. Most of the clusters (i.e., D1, D2, D3, W1, W3) are non-monotonic and have curvy shapes. On the other hand, W2 has a sublinear shape with a slightly ascending pattern. All the patterns show a distinct peak and low period, yet the rate of ascending/descending to/from these peaks differs.

IV-C3 Centroid Analysis

Figure 9 shows the centroids of daily and weekly patterns. These centroids serve as fundamental representations of the underlying temporal dynamics captured within each cluster. Utilizing both quadratic polynomial and cubic polynomial approaches, we discover that the cubic polynomial model best fits our daily clusters and can successfully model the underlying nuances. On the other hand, for weekly patterns, due to their simpler shapes, the quadratic model is enough to fit the centroids. The coefficients of the cubic model (for daily patterns) and the quadratic model (for weekly patterns) are presented in Figure 9.

Complementing our mathematical modeling, we assign descriptive names to the clusters based on their centroid patterns.

Daytime active with rapid decline (D1): Substantial activity during daytime hours with a rapid decrease in activity from peak to off-peak hours.

Nighttime active (D2): Primary activity during the nighttime with gradual workload changes.

Daytime active with gradual decline (D3): Substantial activity during daytime hours with a gradual decrease in activity from peak to off-peak hours.

Weekday Active (W1): Higher activity on weekdays with consistent decrease.

Weekend Active (W2): Increased activity on weekends with steady rise.

Midweek Active (W3): Activity concentrated in midweek, exhibiting smooth changes.

IV-C4 Association between daily and weekly patterns

Table V shows the relative frequency of daily and weekly patterns. Notably, D1 emerges as the predominant daily pattern, while W1 stands out as the most common weekly pattern. Moreover, the combination of D1 and W1 represents the most frequent occurrence, encompassing over 43% of the workloads. In Table V, we include the count of workload datasets associated with each pattern within parentheses. For instance, the combination of D1 and W1 appears in six workload datasets. This observation suggests that the identified patterns are not specific to individual datasets; rather, they exist in multiple workload datasets.

TABLE V: Relative frequency of daily and weekly clusters, expressed in percentage. The numbers in parentheses indicate the frequency of these patterns across workload datasets.

	D1	D2	D3	Total
W1	43.6 (6)	2.4 (5)	3.8 (7)	49.8 (9)
W2	7.3 (8)	2.7 (9)	3.2 (6)	13.2 (10)
W3	7.8 (7)	9.4 (10)	19.8 (9)	37 (12)
Total	58.7 (9)	14.5 (10)	26.8 (11)	100 (12)

IV-C5 Time dependence of workload patterns

We evaluate how time affects workload patterns. Figure 10 illustrates each pattern’s distribution across different time periods. Figure 10(a) compares patterns between weekdays and weekends for each daily cluster. Analyzing the figure reveals that while D1 follows a typical distribution, D2 and D3 diverge from this trend, with D2 showing more weekend data and D3 displaying a higher proportion of weekday data, suggesting variations in workload patterns between weekdays and weekends.

In Figure 10(b), we examine the workload pattern distribution across weekly clusters throughout the seasons. Similar to daily patterns, there are noticeable variations in the distribution of weekly patterns across different seasons. Specifically, cluster W1 shows a higher proportion during the Summer and Winter periods, whereas cluster W3 displays more variability, with a notable increase in percentage during the Fall season.

V Discussion

Random, steady, or linearly increased/decreased workloads should be replaced by polynomially evolving workloads in realistic workload generation. For synthetic workload generation, researchers commonly use steady (i.e., constant level of activity) or linear (i.e., linear step-wise increase/decrease of the level of activity to model the light/normal/peak usage) patterns [78]. These patterns have been extensively adopted in existing work (e.g., [20, 73]). Some studies also utilize random workloads (e.g., [91, 92]). However, our RQ2 findings reveal that real-world web application workloads exhibit non-monotonic and non-linear (polynomial) behavior. Thus, we recommend that future studies consider polynomial workload patterns instead of random, linear, or steady ones. The polynomial models that we derived in this work can be directly used to guide the design of realistic workloads.
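As a rough illustration of this recommendation, a load generator could derive its hourly request rate from a polynomial instead of a constant or linear ramp. The coefficients below are hypothetical placeholders, not the fitted values reported in Figure 9, and the function name is likewise an assumption of this sketch.

```python
def polynomial_load(coeffs, hours=range(24), floor=0.0):
    """Evaluate a polynomial (highest power first) at each hour, clipped at floor."""
    loads = []
    for t in hours:
        value = 0.0
        for c in coeffs:
            value = value * t + c  # Horner's rule
        loads.append(max(floor, value))  # a request rate cannot be negative
    return loads

# Hypothetical cubic coefficients (a, b, c, d) for illustration only.
example_coeffs = (-0.5, 15.0, -80.0, 500.0)
hourly_rates = polynomial_load(example_coeffs)
```

Each value in `hourly_rates` could then serve as the target number of requests for the corresponding hour of a synthetic test day.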

Resource provisioning strategies can leverage the identified daily and weekly workload patterns to achieve simpler and more robust resource allocations. Existing proactive resource provisioning strategies usually rely on predictive models to make provisioning decisions (e.g., [28, 5]). However, such predictive models often suffer from the challenges of interpretability and short prediction windows [93]. Practitioners can leverage the characterized workload patterns in our work to enhance the simplicity and robustness of their resource provisioning strategies. For example, they can progressively increase or decrease their provisioned resources based on a polynomial pattern (the parameters of the polynomial can be periodically learned from their workload data).

We call for the sharing of newer web application workload datasets. Our findings indicate a tendency to use outdated workload datasets in the literature. This reliance may fail to reflect the current dynamics of real-world scenarios due to significant changes in web technologies and the exponential growth of data in recent years. We encourage researchers to utilize more up-to-date workload datasets such as Wikipedia, which better capture the behaviors of modern users. Additionally, we urge researchers and practitioners to extract and share more recent web application workloads publicly.

VI Threats to Validity

External validity. We conduct a thorough review of web application workloads, analyzing 78 articles that leverage them. To mitigate the risk of missing some studies, we follow a systematic approach including keyword search, forward and backward snowballing, and manual analysis. However, there is still a possibility of missing some studies, such as studies based on private datasets or non-English publications.

Internal validity. Internal validity may be threatened by the assumption that daily and weekly patterns adequately capture the temporal dynamics of web application workloads. It is possible that different temporal sequences, beyond daily and weekly, could reveal alternative patterns that were not considered. We deliberately focus on daily and weekly patterns as they are widely recognized temporal units in the context of web applications. Our analysis remains open to future exploration of other temporal granularities.

Construct validity. We use the K-Means algorithm for clustering workload patterns, known for its simplicity and effectiveness, with widespread usage in previous studies (e.g., [28, 24]). However, the choice of clustering algorithm can impact results. We encourage future research to explore alternative clustering algorithms. Moreover, the selection of the number of clusters may affect the quality and insights of the outcomes. To minimize bias, we employ a combination of quantitative metrics (i.e., silhouette score) and qualitative techniques (i.e., visualization) to determine the optimal number of clusters.

VII Conclusion

In this study, we perform a systematic literature review to understand the utilization and characteristics of web application workloads. Using a systematic approach, we identify 78 articles leveraging web application workloads and the 12 public workload datasets they utilize. While we observe that a wide spectrum of studies repeatedly leverages these workload data for resource management, workload analysis, self-adaptation, and benchmarking, we also notice a significant reliance on dated datasets. Through the characterization of the 12 identified workload datasets at daily and weekly granularities, we uncover three daily and three weekly patterns. Using statistical modeling, we find that these patterns display polynomial (non-monotonic and non-linear) behaviors. Future work can use the insights gained from our characterization in realistic workload generation and resource provisioning strategies, ultimately leading to more efficient software maintenance practices such as performance optimization and capacity planning. Another extension of our study could be to explore temporal granularities beyond daily and weekly patterns (e.g., seasonal or yearly patterns).

References

[1] Wikimedia Foundation, “Analytics datasets: Pageviews,” n.d., accessed on April 8, 2024. [Online]. Available: https://dumps.wikimedia.org/other/pageviews/readme.html
[2] G. Urdaneta, G. Pierre, and M. Van Steen, “Wikipedia workload analysis for decentralized hosting,” Computer Networks, vol. 53, no. 11, pp. 1830–1845, 2009.
[3] S. A. Chowdhury and D. J. Makaroff, “Category-based user interaction with online user-generated videos: workload characterization.” in CASCON, 2014, pp. 367–370.
[4] R. N. Calheiros, R. Ranjan, and R. Buyya, “Virtual machine provisioning based on analytical performance and qos in cloud computing environments,” in 2011 International Conference on Parallel Processing. IEEE, 2011, pp. 295–304.
[5] D. Zhou, H. Chen, K. Shang, G. Cheng, J. Zhang, and H. Hu, “Cushion: A proactive resource provisioning method to mitigate slo violations for containerized microservices,” IET Communications, vol. 16, no. 17, pp. 2105–2122, 2022.
[6] Z. Amekraz and M. Y. Hadi, “An adaptive workload prediction strategy for non-gaussian cloud service using arma model with higher order statistics,” in 2018 IEEE 11th International Conference on Cloud Computing (CLOUD). IEEE, 2018, pp. 646–651.
[7] N. Roy, A. Dubey, and A. Gokhale, “Efficient autoscaling in the cloud using predictive models for workload forecasting,” in 2011 IEEE 4th International Conference on Cloud Computing. IEEE, 2011, pp. 500–507.
[8] J. Dogani, F. Khunjush, and M. Seydali, “K-agrued: A container autoscaling technique for cloud-based web applications in kubernetes using attention-based gru encoder-decoder,” Journal of Grid Computing, vol. 20, no. 4, p. 40, 2022.
[9] M. Imdoukh, I. Ahmad, and M. G. Alfailakawi, “Machine learning-based auto-scaling for containerized applications,” Neural Computing and Applications, vol. 32, pp. 9745–9760, 2020.
[10] G. A. Moreno, J. Cámara, D. Garlan, and B. Schmerl, “Flexible and efficient decision-making for proactive latency-aware self-adaptation,” ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 13, no. 1, pp. 1–36, 2018.
[11] ——, “Efficient decision-making under uncertainty for proactive self-adaptation,” in 2016 IEEE International Conference on Autonomic Computing (ICAC). IEEE, 2016, pp. 147–156.
[12] L. N. Bairavasundaram, M. Sivathanu, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “X-ray: A non-invasive exclusive caching mechanism for raids,” ACM SIGARCH Computer Architecture News, vol. 32, no. 2, p. 176, 2004.
[13] P. Barford, A. Bestavros, A. Bradley, and M. Crovella, “Changes in web client access patterns: Characteristics and caching implications,” World Wide Web, vol. 2, pp. 15–28, 1999.
[14] T. Shi, H. Ma, G. Chen, and S. Hartmann, “Auto-scaling containerized applications in geo-distributed clouds,” IEEE Transactions on Services Computing, 2023.
[15] M. Abdullah, W. Iqbal, A. Mahmood, F. Bukhari, and A. Erradi, “Predictive autoscaling of microservices hosted in fog microdata center,” IEEE Systems Journal, vol. 15, no. 1, pp. 1275–1286, 2020.
[16] M. C. Calzarossa, L. Massari, and D. Tessera, “Workload characterization: A survey revisited,” ACM Computing Surveys (CSUR), vol. 48, no. 3, pp. 1–43, 2016.
[17] S. Shishira, A. Kandasamy, and K. Chandrasekaran, “Workload characterization: Survey of current approaches and research challenges,” in Proceedings of the 7th international conference on computer and communication technology, 2017, pp. 151–156.
[18] P. Singh, P. Gupta, K. Jyoti, and A. Nayyar, “Research on auto-scaling of web applications in cloud: survey, trends and future directions,” Scalable Computing: Practice and Experience, vol. 20, no. 2, pp. 399–432, 2019.
[19] C. Qu, R. N. Calheiros, and R. Buyya, “Auto-scaling web applications in clouds: A taxonomy and survey,” ACM Computing Surveys (CSUR), vol. 51, no. 4, pp. 1–33, 2018.
[20] M. Masdari and A. Khoshnevis, “A survey and classification of the workload forecasting methods in cloud computing,” Cluster Computing, vol. 23, no. 4, pp. 2399–2424, 2020.
[21] V. S. Shekhawat, A. Gautam, and A. Thakrar, “Datacenter workload classification and characterization: An empirical approach,” in 2018 IEEE 13th International Conference on Industrial and Information Systems (ICIIS). IEEE, 2018, pp. 1–7.
[22] E. Patel and D. S. Kushwaha, “Clustering cloud workloads: K-means vs gaussian mixture model,” Procedia computer science, vol. 171, pp. 158–167, 2020.
[23] S. Luo, H. Xu, C. Lu, K. Ye, G. Xu, L. Zhang, Y. Ding, J. He, and C. Xu, “Characterizing microservice dependency and performance: Alibaba trace analysis,” in Proceedings of the ACM Symposium on Cloud Computing, 2021, pp. 412–426.
[24] W. Chen, K. Ye, Y. Wang, G. Xu, and C.-Z. Xu, “How does the workload look like in production cloud? analysis and clustering of workloads on alibaba cluster trace,” in 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS). IEEE, 2018, pp. 102–109.
[25] Google LLC, “Borg cluster traces from google,” 2019, accessed on April 8, 2024. [Online]. Available: https://github.com/google/cluster-data
[26] Bitbrains IT Services Inc., “Bitbrains dataset,” accessed on April 8, 2024. [Online]. Available: http://gwa.ewi.tudelft.nl/datasets/gwa-t-12-bitbrains
[27] Alibaba Group, “Cluster data collected from production clusters in alibaba for cluster management research,” 2022, accessed on April 8, 2024. [Online]. Available: https://github.com/alibaba/clusterdata
[28] A. Shahidinejad, M. Ghobaei-Arani, and M. Masdari, “Resource provisioning using workload clustering in cloud computing environment: a hybrid approach,” Cluster Computing, vol. 24, no. 1, pp. 319–342, 2021.
[29] Lawrence Berkeley National Laboratory, “Worldcup98 trace,” n.d., accessed on April 8, 2024. [Online]. Available: https://ita.ee.lbl.gov/html/contrib/WorldCup.html
[30] ——, “Nasa trace,” 1995, accessed on April 8, 2024. [Online]. Available: https://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
[31] M. Zink, K. Suh, Y. Gu, and J. Kurose, “Youtube traces from the campus network,” 2008, accessed on April 8, 2024. [Online]. Available: https://traces.cs.umass.edu/index.php/Network/Network
[32] M. F. Arlitt and C. L. Williamson, “Web server workload characterization: The search for invariants,” ACM SIGMETRICS Performance Evaluation Review, vol. 24, no. 1, pp. 126–137, 1996.
[33] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén, Experimentation in software engineering. Springer Science & Business Media, 2012.
[34] T. Wu, M. Pan, and Y. Yu, “A long-term cloud workload prediction framework for reserved resource allocation,” in 2022 IEEE International Conference on Services Computing (SCC). IEEE, 2022, pp. 134–139.
[35] V. K. Jayakumar, S. Arbat, I. K. Kim, and W. Wang, “Cloudbruno: A low-overhead online workload prediction framework for cloud computing,” in 2022 IEEE International Conference on Cloud Engineering (IC2E). IEEE, 2022, pp. 188–198.
[36] J. Kumar and A. K. Singh, “Performance assessment of time series forecasting models for cloud datacenter networks’ workload prediction,” Wireless Personal Communications, vol. 116, no. 3, pp. 1949–1969, 2021.
[37] ——, “Cloud datacenter workload estimation using error preventive time series forecasting models,” Cluster Computing, vol. 23, no. 2, pp. 1363–1379, 2020.
[38] F. Ebadifard and S. M. Babamir, “Autonomic task scheduling algorithm for dynamic workloads through a load balancing technique for the cloud-computing environment,” Cluster Computing, vol. 24, pp. 1075–1101, 2021.
[39] G. Saravanan and A. Santhosh Babu, “Workload prediction for enhancing power efficiency of cloud data centers using optimized self-attention-based progressive generative adversarial network,” International Journal of Communication Systems, vol. 37, no. 1, p. e5634, 2024.
[40] R. Karthikeyan, V. Balamurugan, R. Cyriac, and B. Sundaravadivazhagan, “Cosco2: Ai-augmented evolutionary algorithm based workload prediction framework for sustainable cloud data centers,” Transactions on Emerging Telecommunications Technologies, vol. 34, no. 1, p. e4652, 2023.
[41] C. Cunha, A. Bestavros, and M. Crovella, “Characteristics of www client-based traces,” Technical Report TR-95-010, Boston University Department of Computer Science, Tech. Rep., 1995.
[42] M. E. Crovella and A. Bestavros, “Self-similarity in world wide web traffic: Evidence and possible causes,” IEEE/ACM Transactions on networking, vol. 5, no. 6, pp. 835–846, 1997.
[43] A. Bauer, N. Herbst, S. Spinner, A. Ali-Eldin, and S. Kounev, “Chameleon: A hybrid, proactive auto-scaling mechanism on a level-playing field,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 4, pp. 800–813, 2018.
[44] F. Tahir, M. Abdullah, F. Bukhari, K. M. Almustafa, and W. Iqbal, “Online workload burst detection for efficient predictive autoscaling of applications,” IEEE Access, vol. 8, pp. 73730–73745, 2020.
[45] Z. Amekraz and M. Y. Hadi, “CANFIS: a chaos adaptive neural fuzzy inference system for workload prediction in the cloud,” IEEE Access, vol. 10, pp. 49808–49828, 2022.
[46] M. Zink, K. Suh, Y. Gu, and J. Kurose, “Characteristics of YouTube network traffic at a campus network–measurements, models, and implications,” Computer Networks, vol. 53, no. 4, pp. 501–514, 2009.
[47] ——, “Watch global, cache local: YouTube network traffic at a campus network: measurements and implications,” in Multimedia Computing and Networking 2008, vol. 6818. SPIE, 2008, pp. 35–47.
[48] R. Moreno-Vozmediano, R. S. Montero, E. Huedo, and I. M. Llorente, “Efficient resource provisioning for elastic cloud services based on machine learning techniques,” Journal of Cloud Computing, vol. 8, no. 1, pp. 1–18, 2019.
[49] M. Abdullah, W. Iqbal, J. L. Berral, J. Polo, and D. Carrera, “Burst-aware predictive autoscaling for containerized microservices,” IEEE Transactions on Services Computing, vol. 15, no. 3, pp. 1448–1460, 2020.
[50] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, “Prediction of cloud data center networks loads using stochastic and neural models,” in 2011 6th International Conference on System of Systems Engineering. IEEE, 2011, pp. 276–281.
[51] D. G. Feitelson, “Metrics for mass-count disparity,” in 14th IEEE International Symposium on Modeling, Analysis, and Simulation. IEEE, 2006, pp. 61–68.
[52] Y. Lei, Y. Gong, S. Zhang, and G. Li, “Research on scheduling algorithms in web cluster servers,” Journal of Computer Science and Technology, vol. 18, pp. 703–716, 2003.
[53] I. K. Kim, W. Wang, Y. Qi, and M. Humphrey, “Cloudinsight: Utilizing a council of experts to predict future cloud application workloads,” in 2018 IEEE 11th international conference on cloud computing (CLOUD). IEEE, 2018, pp. 41–48.
[54] N. R. Herbst, N. Huber, S. Kounev, and E. Amrehn, “Self-adaptive workload classification and forecasting for proactive resource provisioning,” in Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering, 2013, pp. 187–198.
[55] V. Jaiman, S. B. Mokhtar, V. Quéma, L. Y. Chen, and E. Rivière, “Héron: Taming tail latencies in key-value stores under heterogeneous workloads,” in 2018 IEEE 37th Symposium on Reliable Distributed Systems (SRDS). IEEE, 2018, pp. 191–200.
[56] M. Arlitt and T. Jin, “A workload characterization study of the 1998 World Cup web site,” IEEE Network, vol. 14, no. 3, pp. 30–37, 2000.
[57] J. Kumar and A. K. Singh, “Performance evaluation of metaheuristics algorithms for workload prediction in cloud environment,” Applied Soft Computing, vol. 113, p. 107895, 2021.
[58] S. H. Khandkar, “Open coding,” University of Calgary, vol. 23, p. 2009, 2009.
[59] R. Artstein and M. Poesio, “Inter-coder agreement for computational linguistics,” Computational linguistics, vol. 34, no. 4, pp. 555–596, 2008.
[60] J. Cohen, “A coefficient of agreement for nominal scales,” Educational and psychological measurement, vol. 20, no. 1, pp. 37–46, 1960.
[61] M. L. McHugh, “Interrater reliability: the kappa statistic,” Biochemia medica, vol. 22, no. 3, pp. 276–282, 2012.
[62] Lawrence Berkeley National Laboratory, “Saskatchewan trace,” 1995, accessed on April 8, 2024. [Online]. Available: https://ita.ee.lbl.gov/html/contrib/Sask-HTTP.html
[63] ——, “Calgary trace,” 1995, accessed on April 8, 2024. [Online]. Available: https://ita.ee.lbl.gov/html/contrib/Calgary-HTTP.html
[64] ——, “EPA trace,” 1995, accessed on April 8, 2024. [Online]. Available: https://ita.ee.lbl.gov/html/contrib/EPA-HTTP.html
[65] ——, “ClarkNet trace,” 1995, accessed on April 8, 2024. [Online]. Available: https://ita.ee.lbl.gov/html/contrib/ClarkNet-HTTP.html
[66] Retailrocket, “Retailrocket trace,” 2015, accessed on April 8, 2024. [Online]. Available: https://www.kaggle.com/datasets/retailrocket/ecommerce-dataset
[67] Lawrence Berkeley National Laboratory, “Boston trace,” 1995, accessed on April 8, 2024. [Online]. Available: https://ita.ee.lbl.gov/html/contrib/BU-Web-Client.html
[68] ——, “SDSC trace,” 1995, accessed on April 8, 2024. [Online]. Available: https://ita.ee.lbl.gov/html/contrib/SDSC-HTTP.html
[69] Complutense University of Madrid, “Complutense University of Madrid trace,” 2018, accessed on April 8, 2024. [Online]. Available: https://drive.google.com/file/d/1a94djn7dRQAhciPsBk1SLIzsfC2EuPf4/view
[70] A. Riska, E. Smirni, and G. Ciardo, “Analytic modeling of load balancing policies for tasks with heavy-tailed distributions,” 2000.
[71] M. Attia, M. Arafa, E. Sallam, and M. Fahmy, “Application of an enhanced self-adapting differential evolution algorithm to workload prediction in cloud computing,” Int. J. Inf. Technol. Comput. Sci., vol. 11, no. 8, pp. 33–40, 2019.
[72] A. Ali-Eldin, J. Tordsson, E. Elmroth, and M. Kihl, “Workload classification for efficient auto-scaling of cloud resources,” Department of Computer Science, Umeå University, Umeå, Sweden, Tech. Rep., vol. 2, 2013.
[73] V. Persico, D. Grimaldi, A. Pescape, A. Salvi, and S. Santini, “A fuzzy approach based on heterogeneous metrics for scaling out public clouds,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 8, pp. 2117–2130, 2017.
[74] D. Kumar and N. K. Gondhi, “A qos-based reactive auto scaler for cloud environment,” in 2017 International Conference on Next Generation Computing and Information Systems (ICNGCIS). IEEE, 2017, pp. 19–23.
[75] A. V. Papadopoulos, A. Ali-Eldin, K.-E. Årzén, J. Tordsson, and E. Elmroth, “Peas: A performance evaluation framework for auto-scaling strategies in cloud applications,” ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS), vol. 1, no. 4, pp. 1–31, 2016.
[76] R. Aghili, H. Li, and F. Khomh, “Studying the characteristics of aiops projects on github,” Empirical Software Engineering, vol. 28, no. 6, p. 143, 2023.
[77] C. Fehling, F. Leymann, R. Retter, W. Schupeck, and P. Arbitter, Cloud computing patterns: fundamentals to design, build, and manage cloud applications. Springer, 2014, vol. 545.
[78] Z. M. Jiang and A. E. Hassan, “A survey on load testing of large-scale software systems,” IEEE Transactions on Software Engineering, vol. 41, no. 11, pp. 1091–1118, 2015.
[79] M. Arlitt, D. Krishnamurthy, and J. Rolia, “Characterizing the scalability of a large web-based shopping system,” ACM Transactions on Internet Technology (TOIT), vol. 1, no. 1, pp. 44–69, 2001.
[80] C. Karmaker, “Determination of optimum smoothing constant of single exponential smoothing method: a case study,” International Journal of Research in Industrial Engineering, vol. 6, no. 3, pp. 184–192, 2017.
[81] Z. Wu, N. E. Huang, S. R. Long, and C.-K. Peng, “On the trend, detrending, and variability of nonlinear and nonstationary time series,” Proceedings of the National Academy of Sciences, vol. 104, no. 38, pp. 14889–14894, 2007.
[82] K.-I. Goh and A.-L. Barabási, “Burstiness and memory in complex systems,” Europhysics Letters, vol. 81, no. 4, p. 48002, 2008.
[83] R. Xu and D. Wunsch, “Survey of clustering algorithms,” IEEE Transactions on neural networks, vol. 16, no. 3, pp. 645–678, 2005.
[84] M. Syakur, B. Khotimah, E. Rochman, and B. D. Satoto, “Integration k-means clustering method and elbow method for identification of the best customer profile cluster,” in IOP conference series: materials science and engineering, vol. 336. IOP Publishing, 2018, p. 012017.
[85] K. R. Shahapure and C. Nicholas, “Cluster quality analysis using silhouette score,” in 2020 IEEE 7th international conference on data science and advanced analytics (DSAA). IEEE, 2020, pp. 747–748.
[86] P.-E. Danielsson, “Euclidean distance mapping,” Computer Graphics and image processing, vol. 14, no. 3, pp. 227–248, 1980.
[87] M. Müller, “Dynamic time warping,” Information retrieval for music and motion, pp. 69–84, 2007.
[88] M. Cuturi and M. Blondel, “Soft-dtw: a differentiable loss function for time-series,” in International conference on machine learning. PMLR, 2017, pp. 894–903.
[89] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008.
[90] A. Ranganathan, “The Levenberg-Marquardt algorithm,” Tutorial on LM algorithm, vol. 11, no. 1, pp. 101–110, 2004.
[91] I. K. Kim, W. Wang, Y. Qi, and M. Humphrey, “Empirical evaluation of workload forecasting techniques for predictive cloud resource scaling,” in 2016 IEEE 9th International Conference on Cloud Computing (CLOUD). IEEE, 2016, pp. 1–10.
[92] L. Liao, J. Chen, H. Li, Y. Zeng, W. Shang, J. Guo, C. Sporea, A. Toma, and S. Sajedi, “Using black-box performance models to detect performance regressions under varying workloads: an empirical study,” Empirical Software Engineering, vol. 25, pp. 4130–4160, 2020.
[93] F. Di Menna, L. Traini, and V. Cortellessa, “Time series forecasting of runtime software metrics: An empirical study,” in Proceedings of the 2024 ACM/SPEC International Conference on Performance Engineering, 2024.