CRA: Identifying Key Classes Using Markov-Chain-Based Ranking Aggregation

Du, Xin; Pan, Weifeng; Jiang, Bo; Ding, Luyun; Pan, Yun; Yuan, Chengxiang; Xiang, Yiming

doi:10.3390/axioms11100491

Open AccessArticle

CRA: Identifying Key Classes Using Markov-Chain-Based Ranking Aggregation

by

Xin Du

¹

,

Weifeng Pan

¹

,

Bo Jiang

¹,

Luyun Ding

¹,

Yun Pan

¹,

Chengxiang Yuan

¹ and

Yiming Xiang

^2,*

¹

School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China

²

School of Management and E-Business, Zhejiang Gongshang University, Hangzhou 310018, China

^*

Author to whom correspondence should be addressed.

Axioms 2022, 11(10), 491; https://doi.org/10.3390/axioms11100491

Submission received: 10 August 2022 / Revised: 15 September 2022 / Accepted: 15 September 2022 / Published: 22 September 2022

Download

Browse Figures

Versions Notes

Abstract

:

Researchers have proposed many approaches to identify key classes in software from the perspective of complex networks, such as CONN-TOTAL-W,

P a g e R a n k_{B R}

, and ElementRank, which can effectively help developers understand software. However, these approaches tend to rely on a single metric when measuring the importance of classes. They do not consider the aggregation of multiple metrics to select the winner classes that rank high in majority metrics. In this work, we propose a key class identification approach using Markov-Chain-based ranking aggregation, namely CRA. First, CRA constructs a weighted directed class coupling network (WDCCNet) to describe the software and further applies existing approaches on WDCCNet to calculate class importance. Second, CRA filters out some metrics according to specific rules and uses the Markov chain to aggregate the remaining metrics. When the state probability distribution reaches a fixed point and does not change anymore, the classes in the software are sorted in a descending order according to the probability distribution, and the top-15% classes are treated as key classes. To evaluate the CRA approach, we compare it with 10 baseline approaches available on 6 pieces of software. Empirical results show that our approach is superior to the baselines according to the average ranking of the Friedman Test.

Keywords:

key class identification; complex network; Markov chain; ranking aggregation

MSC:

68N19; 68N30

1. Introduction

The development of software cannot be separated from the evolution of software—“the dynamic behavior of software systems as they are maintained and enhanced over their lifetimes” [1,2]. In practical scenarios, software needs to be constantly updated to meet new user requirements and working environments [3,4], such as code modification, functionality increase, and bug fixes. After iterations and updates, a software’s scale will become larger, and its function will be more complex, which makes the software more difficult to understand. Especially in the context of real development, when a developer unfamiliar with the development process of the software is confronted with thousands of lines of code, where does he start to understand such a complex software? In other words, what are the effective technologies to help developers deal with the great challenges they face?

We focus our attention on how to help developers understand the software. Object-Oriented (OO) languages have become the mainstream language in software development; in OO paradigms, software systems are composed of various elements, such as methods, classes, and packages. Complex networks provide a useful perspective to represent software by abstracting these components of the software as nodes and the coupling relationships between these elements as edges in the network. These essential components are closely coordinated to achieve the important function of the software. Therefore, developers only need to find the important nodes on the network and then regard them as the starting point for understanding the software. In recent works, researchers utilized complex networks to represent software at the class granularity and named such networks as Software Network (SN) [5,6,7,8,9]. SN, another type of complex network, also possesses two properties that a complex network has [9], i.e., scale-free (the degree distribution of nodes in the network obeys a power-law distribution) [10] and small-world (a node requires only a short path to reach another node) [11].

In the field of complex networks, researchers have proposed many methods for measuring important nodes in networks, such as k-shell [12], degree centrality [13], and PageRank [14]. Most of these methods are applied to unweighted and undirected networks. However, a software network is a weighted directed network whose components interact with different closeness. These methods cannot be applied directly to SN. Therefore, software engineering researchers improved these methods and proposed new metrics, such as

C o r e_{w k}

[15], CONN-TOTAL-W [6],

P a g e R a n k_{B R}

[16], ElementRank [17], and ClassRank [18] etc., to identify key class nodes in the SN. Although these metrics obtained satisfying performance in key class identification, there is still a small part of key classes that are not successfully identified. Meanwhile, using a single metric cannot measure the importance of a class comprehensively.

Motivation: Social voting systems often determine preferred candidates in elections based on preferential ballots and pairwise comparison counts. During the comparison, the object wins a majority of the counts in a head-to-head battle with other competitors. That is, if there is a candidate who ranks high in most of the voting rankings, then the object is called “Condorcet winner". This phenomenon is not only presented in social voting systems but is also applied to the field of key class identification. We can consider the above approaches as voters, and classes are treated as candidates. Aggregation of multiple metrics generated by these approaches is achieved by pairwise comparison of the performance (ranking) of each class in them. Specifically, when one class ranks higher than the others by more than half metrics, it is regarded as a winner candidate and should be ranked higher in the final aggregated sequence. Therefore, in this work, we attempt to aggregate multiple metrics in a “the majority rule” way to obtain a new metric and check the ranking of the entity in this sequence to identify the key class.

In this paper, we are inspired by social choice theory and propose the CRA approach. It can generate a ranked list of classes for each software, in which the top-ranked classes are considered as key classes. Such classes can be used as a starting point for developers to understand the software. Specifically, first, we extract the software structure at the level of class granularity and construct the software network. Second, we calculate the importance of each node in the software network using 10 types of methods that were proposed and widely used. Then, we filter out several metrics according to specific rules and use the Markov chain to aggregate the remaining. Finally, all classes will be ranked according to the steady-state distribution of the Markov chain, and the top-ranked classes will be considered as key classes for the software comprehension process. In addition, CRA is empirically evaluated in six open source software. Through a series of experiments, we compare the performance of the CRA over the 10 methods. Empirical results show that our approach is superior to these baseline approaches.

The contribution of this paper can be summarized as follows:

We propose a novel approach—CRA—to aggregate class metrics and obtain a ranking sequence of classes. Developers can treat the top-ranked classes as the starting point for software understanding;
We evaluate CRA on six existing open source software and compare it with 10 baseline approaches. The results show that our approach achieves superior performance.

The rest of this paper is organized as follows: Related work is reviewed in Section 2. Our approach is described in detail in Section 3. Section 4 is theoretical and empirical evaluations, where Section 4.5 shows the results of our experiments. Section 5 is a summary of this work and discusses future work.

2. Related Work

In machine learning, approaches are classified into supervised and unsupervised learning based on whether labels need to be known beforehand. In the field of key class identification, researchers have also conducted extensive research using these two techniques.

Supervised techniques: The supervised technology uses a set of labeled data to learn a mapping from input to output and then apply this mapping relationship to unknown datasets to achieve the purpose of key class classification. Osman et al. used supervised learning to locate key classes by compressing class diagrams in open source projects [19]. Thung et al. extended the work by Osman et al. [20]. They combined design and network metrics as features and fed them into the classifier for training. However, these approaches suffered from some problems such as lack of data and class imbalance. To address this problem, researchers began experimenting with unsupervised learning to identify key classes in software.

Unsupervised techniques: Unlike supervised learning, unsupervised technology to identify key classes reveals the inherent characteristics and laws of data through learning these unlabeled samples. Zhou et al. [21] proposed weighted undirected class dependency networks, and then they used the h-index and its invariant metric to measure the importance of the nodes. Sora et al. [6] proposed CONN-TOTAL and CONN-TOTAL-W, where CONN-TOTAL is the total number of nodes connected to the specific node, and CONN-TOTAL-W is the sum of the weights on all link (in- and out-link) connected to the one node. Meyer constructed unweighted undirected class networks and used the K-Core decomposition algorithm to identify key classes in software [5]. Pan et al. extended

C o r e_{k}

and proposed the

C o r e_{w k}

approach to identify key classes [15]. The degree calculation is the biggest difference between

C o r e_{w k}

and

C o r e_{k}

. Specifically, the

C o r e_{w k}

considers both the link direction and weight when calculating the degree, which

C o r e_{k}

does not consider. Steidl et al. employed directed class dependency networks to represent a piece of software and applied PageRank [6,22,23,24], Betweenness, and HITS algorithms to measure the importance of nodes in the network [24]. PageRank

_{B R}

was proposed by Sora et al. The difference between PageRank

_{B R}

and PageRank is that the former takes back recommendation into account [16]. ElementRank [17] was proposed by Pan et al. They constructed multi-layer software networks at the class and package granularity of level, respectively. Moreover, the traditional PageRank algorithm only fits unweighted networks. Based on this, Pan et al. applied a weighted PageRank algorithm. Recently, Pan et al. proposed a PageRank-like algorithm called ClassRank. This work believed that classes with larger out- and in- degrees are tightly coupled with others. Thus, these classes are more accessible and important [18].

However, a single metric often does not provide a more comprehensive measure of the importance of a key class. In this work, we attempt to aggregate multiple existing metrics in a “minority rule” way. The purpose of doing this is to obtain a class importance metric that considers the advantages of multiple metrics. Our approach belongs to unsupervised learning in a strict sense.

3. The Proposed Method CRA

Figure 1 gives an overview of our proposed CRA method. It consists of four steps:

Step 1: Building software network; in the first step, by compiling the source code of the Java software system, we extract its software structure and construct the software network;
Step 2: Calculating Class Importance Metrics; in the second step, we apply the mainstream 10 key class identification methods to the constructed network and obtain 10 metrics respectively to measure the importance of classes;
Step 3: Aggregation of class importance metrics; in the third step, we filter out some of the metrics by specific rules and aggregate the rest metrics. After the n-step transition, the system gradually converges to a fixed point, and, eventually, we can obtain the stationary distribution of the Markov chain;
Step 4: we rely on this probability distribution to rank the classes, and top-ranked classes will be considered key as classes.

In this work, we parse the source code of the java projects. The reasons are as follows: (1) Most applications are developed in Java. (2) Compared with other programming languages, we can more easily find software projects written in Java for all scales. Note that, our approach is shown with examples in Java, but the same approach can be extrapolated to other languages.

3.1. Building Software Network

The construction process of the software network is shown in Figure 2. Firstly, we developed SNAP (Software Network Analysis Platform [17,18]) to extract the entities and coupling types at a specific granularity level from the software system’s static source code; then, different weights are assigned to the different coupling types; finally, these components and the couplings between them form the software network.

The nature of the traditional network is a graph containing two parts,

(V, L)

, where V denotes the network nodes and L denotes the links between the nodes. The mapping from the software network to the traditional network is equivalent to abstract entities and interactions of entities in the software into the nodes and edges in the network.

Structural Information Extraction: Entities in software contain multiple granularities level, including packages, classes, etc. Considering that our work is an aggregation strategy of class importance rankings generated by existing key class identification approaches, it is natural to think of extracting software structures at the class granularity level.

Coupling Types Extraction: Traditional networks only reflect the existence of links between nodes. However, there are various coupling relationships among the entities in software. In this work, we recognize nine types existing in software entities and assign directions links according to them [18].

Definition 1.

(Coupling Types Between Classes). For any two classes node u and

v (u \neq v)

, if there is a relationship as defined in Table 1, then we will recognize and extract it as

u \to v

:

Coupling Strength Calculation: Different coupling relationships in software have different coupling strengths. In this work, we adopt the objective weighting mechanism based on the distribution of coupling types, which was proposed by Abreu et al. [25]. The reasons can be divided into two aspects: (1) Internal reason: since our approach is based on existing approaches, to ensure fairness and consistency, the weights of entities in the CRA should be consistent with the assignment in the network constructed by existing approaches. (2) External reason: this weighting mechanism can objectively measure the closeness between classes and is widely used in several works.

In the previous step, we recognize nine types of couplings, let

R =

{

I N R

,

I M R

,

I N S

,

L V R

,

G V R

,

A C C

,

M C R

,

R T R

,

P A R

}. We use two

9 * 1

column vectors

w = {[\begin{matrix} w_{I N R} & \dots & w_{P A R} \end{matrix}]}^{⊤}

and

o = {[\begin{matrix} o_{I N R} & \dots & o_{P A R} \end{matrix}]}^{⊤}

to represent the weights and occurrence frequency of various coupling types on the link

u \to v

, respectively. The definition of weight on the link that connects class nodes u and v, i.e.,

W_{u \to v}

, can be considered as the dot product of two vectors

w_{R}

and

o_{R}

.

\begin{matrix} W_{u \to v} & = w \cdot o = [w_{I N R} \dots w_{P A R}] [\begin{matrix} o_{I N R} \\ ⋮ \\ o_{P A R} \end{matrix}] = w_{I N R} o_{I N R} + \dots + w_{P A R} o_{P A R} \end{matrix}

(1)

r belongs to R, which refers to any of the nine coupling types.

w_{r}

denotes the weight assigned to a particular coupling type, and it is defined as follows:

\begin{matrix} w_{r} = \{\begin{matrix} 10 N_{i n t r a}^{r} \neq 0 \land N_{i n t e r}^{r} = 0 \\ 1 N_{i n t r a}^{r} = 0 \land N_{i n t e r}^{r} = 0 \\ R o u n d (0.5 + 10 \times \frac{N_{i n t r a}^{r}}{N_{i n t r a}^{r} + N_{i n t e r}^{r}}) & o t h e r w i s e \end{matrix} \end{matrix}

(2)

where

N_{i n t e r}^{r}

and

N_{i n t r a}^{r}

indicate the number of intra- and inter-module couplings, respectively [15].

Software Network Definition: After defining the coupling relationship (link direction in the network) and the coupling strength (link weight in the network), we construct the WDCCNet (Weighted Directed Class Coupling Network) to represent the software at the class level of granularity. It is defined as follows:

Definition 2.

WDCCNet contains two parts, i.e., (V, L). We regard class in software as nodes. Note that the term class refers to both classes and interfaces in software. If a relation exists between two classes in the software, there will be a link connect two nodes in the WDCCNet. For instance, u and v are two class nodes in WDCCNet. When they interact in the actual software with the above nine coupling types, there will be a link

L_{u \overset{W_{u \to v}}{⟶} v} = < u, v >

(where

L_{u \overset{W_{u \to v}}{⟶} v} \in L

).

3.2. Calculating Class Importance Metrics

We count 10 key class identification methods that are the current mainstream approaches, i.e., h-index [21], a-index [21], CONN-TOTAL [6], CONN-TOTAL-W [6],

C o r e_{k}

[26],

C o r e_{w k}

[15], PageRank [6,22,23,24], PageRank

_{B R}

[16], ElementRank, and ClassRank [18]. We make a brief introduction to them in Section 2.

We implement these approaches and apply them to the WDCCNet that we constructed, and we can then obtain 10 metrics for measuring the importance of classes. The larger values indicate that the class is more important, which applies to all approaches.

3.3. Aggregation of Class Important Metrics by Markov chain

3.3.1. Brief Review of Markov chain

The model of Markov chain refers to the stochastic process of transition from one state to another in the state space. Specifically, similar to [27], the Markov chain model is defined as follows:

Definition 3.

Markov

(S, {P, π^{(n)}}_{n = 1}^{N})

is composed of a state space S, a set of N Markov chains

(P, π^{(n)})

defined over S. Each

(P, π^{(n)})

is a Markov chain that is time-homogeneous, which means that the current state is only related to the previous one state and not to the past state.

P

is a transition matrix, also called stochastic matrix, whose element

P_{i j}^{(n)}

is the value of i-th row and j-th column of

P

, which represents the probability that the chain that started in state i hits j in n-steps (

i, j \in S

).

P

satisfies the following properties: (1)

P_{i j} \geq 0

for all i,j; (2) for each row i,

\sum_{j}^{} P_{i j} = 1

.

π^{(n)}

denotes the state probability vector after n-steps, where

π

equals

π_{0}, π_{1}, . . ., π_{| S | - 1}

.

π^{(0)}

is the initial state distribution of the Markov chain.

3.3.2. Refining Metrics

We use a group of agents to denote each component of the Markov chain. Let T be the number of classes (state).

S = [s_{1}, . . ., s_{T}] \in R^{1 * T}

is a state space, where

T = | S |

. Each class node in WDCCNet corresponds to a particular state

s_{t} (s_{t} \in S)

. Let

τ = (τ_{1}, τ_{2}, . . ., τ_{H})

, where

τ_{h}

and H refer to a ranking returned by one specific class importance metric and the total number of metrics, respectively. Thus, the ranking position of each class is denoted as

τ_{h}^{(s_{t})} (s_{t} \in S)

. Meanwhile, when

s_{v}

ranks before

s_{u}

, we write this process as

τ_{h}^{(s_{v})} ≺ τ_{h}^{(s_{u})}

. However, one thing that needs to be considered is whether the ranking is a strong sorting, i.e., if class u and class v have the same value calculated by a specific method, we cannot compare their importance. Thus, a rule was proposed by us to filter out unsuitable metrics according to their repetition rates. The specific definition of the Repetition Rate (RR) is as follows:

Definition 4.

Repetition Rate(RR) = 1-

\frac{T - r}{T}

, where T equals the total number of classes (state), and r denotes the count of recurring values. Suppose

R R \geq 0.05

, which means the current metric has a high repetition rate. We cannot rank the classes based on the metric values very well.

As shown in Table 2, we calculate the RR of the 10 approaches on 6 software systems. According to the threshold of 0.05, we can filter out a portion of unsuitable measures. Eventually, five metrics are left for later aggregation operations. Therefore,

H = 5

. Note that the repetition rate is calculated based on the top-15% classes of the ranking list returned by each metric in a descending order. The reasons are two-fold: (1) There are many outlier nodes in WDCCNet, whose value is small, and they are mostly distributed in lower positions. Therefore, counting the repetition rate of these class nodes is meaningless. (2) Most software engineering researchers identified key classes in software only by checking the top-ranked classes [15,17,28,29].

3.3.3. Aggregating Metrics

In this part, we elaborate on how to aggregate metrics step-by-step. After the above operation, five metrics are left. We aggregate metrics mainly by constructing transition matrix

P^{^{'}}

, based on the following idea [30]: Assuming the current state is class

s_{u}

. First, we need to evenly select another state (class)

s_{v}

from the remaining classes. Next, we need to compare the position about two classes ranked by metrics, If

τ_{h}^{(s_{v})} ≺ τ_{h}^{(s_{u})}

in majority of ranking returned by metrics, then the current state

s_{u}

transfer to

s_{v}

. Otherwise stay in state

s_{u}

. Based on the above ideas, the element of the original transition matrix

P^{^{'}}

can be constructed as follows:

\begin{matrix} P_{i, j}^{^{'}} & = \{\begin{matrix} 0, & i f \sum_{h}^{H} |τ_{h}^{(i)} ≺ τ_{h}^{(j)}| > \frac{1}{2} * H \\ \frac{1}{T}, & e l s e \end{matrix}, \end{matrix} \begin{matrix} s . t . & i \neq j \end{matrix}

(3)

Note that, there exists a constraint on the Equation (3) that

i \neq j

. After a normalized operation, the diagonal part

(i = j)

of the matrix

P^{^{'}}

can be supplemented according to one of the properties satisfied by the transfer matrix, i.e., for each row i,

\sum_{j}^{} P_{i j} = 1

. However, the constructed matrix

P_{i, j}^{^{'}}

has a small problem: there are two states that point to each other but not to the other state: once a state points to one of them, a loop transfer occurs. Thus, to avoid loop transfers between states, i.e., it is possible to go from one state to every state. We introduce the ergodic Markov chain:

\begin{matrix} P = α P^{^{'}} + (1 - α) K, \end{matrix}

(4)

where

P

is the transition matrix for an ergodic Markov chain,

α (0 < α < 1)

is a coefficient that is typically set to 0.85 [17,22,23,30].

K

is a

T \times T

matrix each of whose entries is

\frac{1}{T}

.

3.3.4. An Example to Aggregate Metrics via Markov Chain

In this section, we describe the calculation process of our CRA approach in detail by giving an example. Suppose that the existing three metrics return the importance value of classes A, B, and C, as shown in Table 3. Firstly, we translate their value into rankings as shown in Table 4.

Next, we construct the transition matrix according to Equations (3) and (4). For class A, two of the three metrics rank it higher than C (more than half of all metrics), and all metrics rank it higher than B. Therefore, the definition of the transition matrix

P

is as follows:

\begin{matrix} P = 0.85 [\begin{matrix} 1 & 0 & 0 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & 0 & \frac{2}{3} \end{matrix}] + (1 - 0.85) [\begin{matrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{matrix}] = [\begin{matrix} \frac{18}{20} & \frac{1}{20} & \frac{1}{20} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{20}{60} & \frac{3}{60} & \frac{37}{60} \end{matrix}] \end{matrix}

We set initial state

π^{(0)}

so it equals

[\begin{matrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{matrix}]

. Note that the initial state is not unique since the final stationary distribution is regardless of the initial distribution. Then, the next state probability distribution

π^{(1)}

is the product of the

π^{(0)}

and transition matrix

P

. We describe this process as

π^{(1)} = π^{(0)} P = [\begin{matrix} \frac{47}{90} & \frac{13}{90} & \frac{30}{90} \end{matrix}], π^{(1)} \neq π^{(0)}

In such a manner, we can calculate

π^{(n)}

until it reaches a fixed point and does not change anymore, i.e.,

π^{(n)} = π^{(n - 1)}

. In this example, after 27 iterations of computation, the final state probability distribution

π^{(27)}

converges to

[\begin{matrix} 0.7692 & 0.0698 & 0.161 \end{matrix}], i . e ., (π^{(27)} = π^{(26)})

. Such a distribution that satisfies

π^{(n)} = π^{(n - 1)}

is called a stationary distribution. We rank three classes according to the values of the sequence formed by the entities in the stationary distribution. Using the above process, we achieve the aggregation of multiple metrics to rank the classes.

3.4. Class-Level Ranking

We rank classes in a descending order according to the value of stationary distribution. Top-ranked classes are more important than lower-ranked classes. As with these efforts [15,17,28,29], we use 15% as the threshold for identifying key classes. That is, the top-15% of classes were regarded as key classes, and these classes can be used as a starting point for developers to understand the software.

4. Experimental Validation

In this section, we describe the experimental setup that we follow to evaluate CRA. The experimental environment is a desktop computer equipped with AMD Ryzen 5 5600G with Radeon Graphics 3.90 GHz CPU and 16GB RAM, running Win10 (64-bit).

4.1. Research Questions

In the experiments, we aim to address several research questions:

RQ1: Can our method improve how to distinguish the difference between classes? We rank classes according to the probability of each state when they converge to a steady point, Thus, if two state have the same probability in stationary distribution, they cannot be ranked in theory, In this experiment, we need to judge whether we can obtain a strong ordering according stationary distribution;
RQ2: Can our method effectively identify key classes distributed in software systems? Currently, many methods have been proposed and applied to nine key classes. In this experiment, we focus on our method’s performance compared to other baseline methods when the cut-off line equals 15%.

4.2. Subject Systems

In this work, we chose six pieces of software systems as our subject software. The reasons were twofold: (1) These systems were applied to other advanced research work, which proves that these systems are representative. (2) It can be more intuitive to compare our method with these mainstream methods on the same software systems.

We list the detailed information about these six systems in Table 5. This table consists of three columns. We provide the complete name in the left column. The second and third columns offer the version and the number of classes in each software system. In order to facilitate other researchers to download the software and reproduce our experiments, we list the specific URL of the software in the right column.

4.3. Baseline Approaches

There are 10 mainstream metrics described in Section 2. These approaches achieved excellent performance when applied to the field of key class identification. Thus, in this work, these metrics also play the role of baseline approaches compared with our proposed method. Since we briefly summarized these approaches above, we will not repeat them anymore here.

4.4. Evaluation Metrics

Classes can be divided into two classes, i.e., key and non-key classes. Key classes can be treated as positive samples while non-key classes are negative samples, indicating that identifying key classes is a binary classification problem. We adopt the evaluation metrics used by Refs. [5,6,16,18,21], which measure precision, recall, and F1 score over identification of key class. The recall is often used to correctly evaluate how many key classes are identified by our approach. Precision is used to evaluate the number of key classes in our retrieved classes. Note that most of the work is more concerned with the recall to evaluate the performance of the approach. For example, even if an approach identifies all key classes that only rely on the top-15% of classes, the base number of retrieved classes is still too large. Such an approach has a full recall but low precision, equivalent to automatically hurting the precision. Thus, we introduce the F1 score, a harmonic function about Recall and Precision, to use both together.

4.5. Experiment Results and Analysis

In this subsection, we focus on answering the RQs proposed in Section 4.1 by obtaining the results from a series of experiences in detail.

4.5.1. RQ1: Can Our Method Improve How to Distinguish the Difference between Classes?

We perform this experiment to evaluate whether the condition exists where we cannot rank classes due to their importance values being the same. Detailed steps are specifically expressed as follows. First, we apply 11 approaches to measure the classes’ importance value in 6 software networks. Then, we rank classes according to their importance value returned by 11 approaches and calculate the repetition rate of top-15% classes in this experiment; the threshold was still set at 0.05. The approach has a low repetition rate when RR is equal to or less than 0.05, which indicates that the current approach can easily distinguish the class nodes in software networks.

The experimental result is shown in Figure 3. Every subfigure corresponds to a system. Its abscissa and ordinate represent 11 approaches and repetition rates, respectively. In order to display the comparison result more intuitively, we divide the threshold using a red dotted line. A value below the red line indicates that the approach performs well.

4.5.2. RQ2: Can Our Method Effectively Identify Key Classes Distributed in Software Systems?

We aim to evaluate our approach’s retrieval ability of key class by performing this experiment. Noticeably, it exists in some approaches that classes have the same importance value and cannot be sorted. Specifically, we use 15% as the cut-off line and check the top-15% of classes whether they are key classes. However, there are still classes with the same value of around 15%. Thus, we determine the specific number of retrieval classes according to the mean ranking of these indistinguishable classes. For example, we denote ith as the class of its ranking returned by one specific approach and assume them with the same importance value, whose ranking is obtained from 2th to 6th. While these five classes can be located in any of the five rankings due to the same value, we decide whether to append all of them by comparing the number of retrieval classes with their average location 4th (i.e.,

\frac{2 + 6}{2}

th).

Table 4 visualizes the results of this experiment, which reflects the performance of the 11 methods through 3 evaluation metrics, namely Recall, Precision, and F1. The bold black values in the table indicate the approach that performs the best when we apply it to a particular software system.

However, it can be seen from Table 6 that no approach has the best performance across all software, even the most state-of-the-art approach. Thus, an objective technique is adopted to select one approach with relatively better performance by conducting a comprehensive evaluation of them. Friedman Test [31], a non-parametric statistical test, can offer an important basis for comparison between algorithms. At the same time, such a technique is also widely used in similar scenarios to rank approaches based on their performance on multiple datasets. Generally speaking, the smaller the value returned by the Friedman test is, the better the approach performs. Every column in Table 7 denotes the average ranking result for one specific evaluation metric. The bold black font indicates that the corresponding approach performs best.

4.6. Threats of Validity

Several factors may influence the validity of our experimental conclusion. We divide these factors into two threats, i.e., the internal and external threats.

4.6.1. Threats to Internal Validity

One internal threat is the accuracy of the WDCCNet. It was extracted by the SNAP tool we developed. Therefore, the final result will be influenced by the accuracy of the network. However, this threat was minimized due to the SNAP tool having been sufficiently tested in our published papers [7,15,17,18]. Another is the metrics selection. Although we filter some metrics with a high reputation rate, there are still tiny nodes in the remaining metrics that have the same importance and cannot be ranked. Since this part of the node’s position is relatively backward, we only focus on the ranking of the top class. Thus, it does not pose a threat to our conclusion.

4.6.2. Threats to External Validity

The second potentially limiting factor is the selection of programming language. In this work, we aim to analyze Java. Extending it to other programming languages may differ a little from our conclusions. We will actively explore key classes in other OO languages, such as C++.

5. Conclusions and Future Work

In this work, we propose a key class identification approach based on the existing metrics and Markov chain. Our approach addresses the problem that a single metric cannot reflect the importance of the class from multiple perspectives. Specifically, we construct weighted directed class coupling software networks and apply some mainstream key class identification approaches to them. Then, we filter out some metrics according to specific rules and use the Markov chain to aggregate the remaining metrics. The stationary distribution of the Markov chain reaching a stable state can be described as the ranking of class importance. Top-15% classes were regarded as a key class to help developers understand unfamiliar software. The empirical results of experiments conducted on 6 Java subject systems show that our approach is superior to the other 10 baseline approaches according to the average ranking of the Friedman test. In future work, we will propose more efficient approaches to the nine key classes in software to help developers understand a piece of the system and extend our approach to other OO languages.

Author Contributions

Conceptualization, W.P. and X.D.; methodology, W.P. and X.D.; software, W.P., L.D. and X.D.; supervision, B.J., Y.P., C.Y. and Y.X.; writing-original draft, X.D. and L.D.; writing-review and editing, W.P., X.D. and L.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62272412, 62032010, and 61832014), the Natural Science Foundation of Zhejiang Province (Grant Nos. LY22F020007 and LY21F020002), and the Key R&D Program of Zhejiang Province (Grant Nos. 2021C01162 and 2019C01004).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors gratefully acknowledge all the reviewers for their positive and valuable comments and suggestions regarding our manuscript.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

Belady, L.A.; Lehman, M.M. A Model of Large Program Development. IBM Syst. J. 1976, 15, 225–252. [Google Scholar] [CrossRef]
Gupta, S.; Singh, P. Comprehending Scenario-Level Software Evolution Using Calling Context Trees. In Proceedings of the 2017 International Conference on Information Technology, ICIT 2017, Bhubaneshwar, India, 21–23 December 2017; pp. 125–130. [Google Scholar] [CrossRef]
Liu, H.; Han, Y.; Zhu, A. Modeling supply chain viability and adaptation against underload cascading failure during the COVID-19 pandemic. Nonlinear Dyn. 2022, 1–17. [Google Scholar] [CrossRef]
Sun, J.; Dushime, H.; Zhu, A. Beyond beauty: A qualitative exploration of authenticity and its impacts on Chinese consumers’ purchase intention in live commerce. Front. Psychol. 2022, 13, 944607. [Google Scholar] [CrossRef]
Meyer, P.; Siy, H.P.; Bhowmick, S. Identifying Important Classes of Large Software Systems through k-Core Decomposition. Adv. Complex Syst. 2014, 17, 1550004. [Google Scholar] [CrossRef]
Sora, I.; Chirila, C. Finding key classes in object-oriented software systems by techniques based on static analysis. Inf. Softw. Technol. 2019, 116, 106176. [Google Scholar] [CrossRef]
Du, X.; Wang, T.; Wang, L.; Pan, W.; Chai, C.; Xu, X.; Jiang, B.; Wang, J. CoreBug: Improving effort-aware bug prediction in software systems using generalized k-core decomposition in class dependency networks. Axioms 2022, 11, 205. [Google Scholar] [CrossRef]
Pan, W.; Ming, H.; Yang, Z.; Wang, T. Comments on “Using k-core Decomposition on Class Dependency Networks to Improve Bug Prediction Model’s Practical Performance”. IEEE Trans. Softw. Eng. 2022. [Google Scholar] [CrossRef]
Myers, C.R. Software systems as complex networks: Structure, function, and evolvability of software collaboration graphs. Phys. Rev. E 2003, 68, 046116. [Google Scholar] [CrossRef]
Barabási, A.L.; Albert, R.; Jeong, H. Scale-free characteristics of random networks: The topology of the world-wide web. Phys. A Stat. Mech. Appl. 2000, 281, 69–77. [Google Scholar] [CrossRef]
Watts, D.J.; Strogatz, S.H. Collective dynamics of ‘small-world’networks. Nature 1998, 393, 440–442. [Google Scholar] [CrossRef]
Kitsak, M.; Gallos, L.K.; Havlin, S.; Liljeros, F.; Muchnik, L.; Stanley, H.E.; Makse, H.A. Identification of influential spreaders in complex networks. Nat. Phys. 2010, 6, 888–893. [Google Scholar] [CrossRef]
Bonacich, P. Factoring and weighting approaches to status scores and clique identification. J. Math. Sociol. 1972, 2, 113–120. [Google Scholar] [CrossRef]
Brin, S.; Page, L. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Comput. Netw. 1998, 30, 107–117. [Google Scholar] [CrossRef]
Pan, W.; Song, B.; Li, K.; Zhang, K. Identifying key classes in object-oriented software using generalized k-core decomposition. Future Gener. Comput. Syst. 2018, 81, 188–202. [Google Scholar] [CrossRef]
Sora, I. A PageRank based recommender system for identifying key classes in software systems. In Proceedings of the 10th IEEE Jubilee International Symposium on Applied Computational Intelligence and Informatics, SACI 2015, Timisoara, Romania, 21–23 May 2015; pp. 495–500. [Google Scholar] [CrossRef]
Pan, W.; Ming, H.; Chang, C.K.; Yang, Z.; Kim, D. ElementRank: Ranking Java Software Classes and Packages using a Multilayer Complex Network-Based Approach. IEEE Trans. Softw. Eng. 2021, 47, 2272–2295. [Google Scholar] [CrossRef]
Pan, W.; Ming, H.; Kim, D.K.; Yang, Z. Pride: Prioritizing Documentation Effort Based on a PageRank-Like Algorithm and Simple Filtering Rules. IEEE Trans. Softw. Eng. 2022. [Google Scholar] [CrossRef]
Osman, M.H.; Chaudron, M.R.V.; van der Putten, P. An Analysis of Machine Learning Algorithms for Condensing Reverse Engineered Class Diagrams. In Proceedings of the 2013 IEEE International Conference on Software Maintenance, Eindhoven, The Netherlands, 22–28 September 2013; pp. 140–149. [Google Scholar] [CrossRef]
Thung, F.; Lo, D.; Osman, M.H.; Chaudron, M.R.V. Condensing class diagrams by analyzing design and network metrics using optimistic classification. In Proceedings of the 22nd International Conference on Program Comprehension, ICPC 2014, Hyderabad, India, 2–3 June 2014; Roy, C.K., Begel, A., Moonen, L., Eds.; ACM: New York, NY, USA, 2014; pp. 110–121. [Google Scholar] [CrossRef]
Wang, M.; Lu, H.; Zhou, Y.; Xu, B. Identifying key classes using h-index and its variants. Jisuanji Kexue yu Tansuo 2011, 5, 891–903. [Google Scholar]
Perin, F.; Renggli, L.; Ressia, J. Ranking software artifacts. In Proceedings of the 4th Workshop on FAMIX and Moose in Reengineering (FAMOOSr 2010); 2010; pp. 1–4. [Google Scholar]
Page, L.; Brin, S.; Motwani, R.; Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web; Technical Report 1999-66; Stanford InfoLab: Stanford, CA, USA, 1999. [Google Scholar]
Steidl, D.; Hummel, B.; Jürgens, E. Using Network Analysis for Recommendation of Central Software Classes. In Proceedings of the 19th Working Conference on Reverse Engineering, WCRE 2012, Kingston, ON, Canada, 15–18 October 2012; pp. 93–102. [Google Scholar] [CrossRef]
E Abreu, F.B.; Pereira, G.; Sousa, P.M.A. A Coupling-Guided Cluster Analysis Approach to Reengineer the Modularity of Object-Oriented Systems. In Proceedings of the 4th European Conference on Software Maintenance and Reengineering, CSMR 2000, Zurich, Switzerland, 29 February–3 March 2000; pp. 13–22. [Google Scholar] [CrossRef]
Alvarez-Hamelin, J.I.; Dall’Asta, L.; Barrat, A.; Vespignani, A. K-core decomposition of Internet graphs: Hierarchies, self-similarity and measurement biases. Netw. Heterog. Media 2008, 3, 371–393. [Google Scholar] [CrossRef]
Luo, D.; Xu, H.; Zhen, Y.; Dilkina, B.; Zha, H.; Yang, X.; Zhang, W. Learning Mixtures of Markov Chains from Aggregate Data with Structural Constraints. IEEE Trans. Knowl. Data Eng. 2016, 28, 1518–1531. [Google Scholar] [CrossRef]
Zaidman, A.; Demeyer, S. Automatic identification of key classes in a software system using webmining techniques. J. Softw. Maint. Res. Pract. 2008, 20, 387–417. [Google Scholar] [CrossRef]
Jiang, S.J.; Ju, X.L.; Wang, X.Y.; Li, H.Y.; Liu, Y.Q. Measuring the importance of classes using UIO sequence. Acta Electonica Sin. 2015, 43, 2062. [Google Scholar]
Dwork, C.; Kumar, R.; Naor, M.; Sivakumar, D. Rank aggregation methods for the Web. In Proceedings of the Tenth International World Wide Web Conference, WWW 10, Hong Kong, China, 1–5 May 2001; Shen, V.Y., Saito, N., Lyu, M.R., Zurko, M.E., Eds.; ACM: New York, NY, USA, 2001; pp. 613–622. [Google Scholar] [CrossRef]
García, S.; Fernández, A.; Luengo, J.; Herrera, F. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf. Sci. 2010, 180, 2044–2064. [Google Scholar] [CrossRef]

Figure 1. The overall process of the proposed approach.

Figure 2. The definition process of a software network (SN).

Figure 3. The comparison of repetition rate with baseline approaches for the top-15% classes.

Table 1. Descriptions of the coupling relationships.

Relationship	Description
INheritance RELation (INR)	u inherits v by keywords “extends”
IMplementation Relation (IMR)	u implements interface v by keywords “implements”
INStantiates (INS)	u instantiates an object of v
Local Variable Relation (LVR)	One method of u contains a local variable of v
Global Variable Relation (GVR)	u has an attribute with type v
ACCess (ACC)	One of the u’s methods accesses a property of v
Method Call Relation (MCR)	One of the u’s methods calls other method on an object of v
Return Type Relation (RTR)	One of the u’s methods has a return type of v
PARameter type (PAR)	One of the u’s methods has a parameter type of v

Table 2. The results of the repetition rate. The cells highlighted in gray background denote the values lower than 0.05 and the corresponding metrics that are left.

Systems	Metrics
Systems	h-Index	a-Index	CONN-TOTAL	CONN-TOTAL-W	${Core}_{k}$	${Core}_{wk}$	PageRank	PageRank $_{BR}$	ElementRank	ClassRank
Ant	0.9823	0.5778	0.8605	0.2388	1.0	0.0000	0.0296	0.0296	0.0148	0.0
Argo UML	0.9796	0.7953	0.8444	0.2835	1.0	0.1811	0.0315	0.0315	0.0000	0.0
jEdit	0.9733	0.5305	0.8929	0.2439	1.0	0.0000	0.0000	0.0000	0.0000	0.0
JHotDraw	0.9756	0.3537	0.7750	0.0988	1.0	0.0244	0.0000	0.0000	0.0000	0.0
JMeter	0.9556	0.3077	0.7000	0.1538	1.0	0.0000	0.0000	0.0000	0.0000	0.0
wro4j	0.9747	0.4941	0.8000	0.1860	1.0	0.0000	0.0000	0.0000	0.0000	0.0
AVG	0.9735	0.5099	0.8121	0.2008	1.0000	0.0343	0.0102	0.0102	0.0025	0.0000

Table 3. An example of a simulation class importance calculation.

	Metric-1	Metric-2	Metric-3
A	0.3	0.2	0.3
B	0.2	0.1	0.1
C	0.1	0.3	0.2

Table 4. An example of a simulated class ranking.

	Metric-1	Metric-2	Metric-3
A	1	2	1
B	2	3	3
C	3	1	2

Table 5. Basic information of the subject software systems.

System	Version	#C	URL
Apache Ant	1.6.1	900	http://ant.apache.org
Argo UML	0.9.5	846	https://argouml-tigris-org.github.io/
jEdit	5.1.0	1082(9)	http://www.jedit.org
jHotDraw	6.0b.1	544	http://www.jhotdraw.org
jMeter	2.0.1	258	http://jmeter.apache.org
wro4j	1.6.3	567(9)	https://code.google.com/p/wro4j

Table 6. Recall, precision, and F1 comparison of CRA against other baseline approaches (cut-off = 15%).

	Approaches	a-Index			h-Index			CONN-TOTAL			CONN-TOTAL-W
Systems		Recall	Precision	F1	Recall	Precision	F1	Recall	Precision	F1	Recall	Precision	F1
Ant		0	0	0	0.7	0.0429	0.0809	0.8	0.0516	0.097	0.8	0.0584	0.1088
Argo UML		0.1667	0.0157	0.0288	0.6667	0.0468	0.0874	1	0.0889	0.1633	0.9167	0.0866	0.1583
jEdit		0	0	0	1	0.0352	0.068	1	0.0417	0.08	0.8571	0.0366	0.0702
jHotDraw		0	0	0	1	0.1098	0.1978	1	0.1011	0.1837	1	0.1084	0.1957
jMeter		0.0714	0.0256	0.0377	0.7143	0.2222	0.339	0.6429	0.225	0.3333	0.5	0.1795	0.2642
wro4j		0	0	0	0.9167	0.1158	0.2056	0.9167	0.1222	0.2157	0.9167	0.1279	0.2245
	Approaches	Core $_{k}$			Core $_{wk}$			PageRank			PageRank $_{BR}$
Systems		Recall	Precision	F1	Recall	Precision	F1	Recall	Precision	F1	Recall	Precision	F1
Ant		0.7	0.0443	0.0833	0.7	0.0519	0.0966	0.6	0.0444	0.0828	0.6	0.0444	0.0828
Argo UML		0.6667	0.0471	0.0879	0.6667	0.063	0.1151	1	0.0945	0.1727	1	0.0945	0.1727
jEdit		0.8571	0.0309	0.0597	0.8571	0.0366	0.0702	1	0.0427	0.0819	1	0.0427	0.0819
jHotDraw		1	0.0857	0.1579	1	0.1098	0.1978	1	0.1098	0.1978	1	0.1098	0.1978
jMeter		0.7143	0.2222	0.339	0.4286	0.1538	0.2264	0.5	0.1795	0.2642	0.5	0.1795	0.2642
wro4j		0.9167	0.1236	0.2178	0.8333	0.1163	0.2041	0.75	0.1047	0.1837	0.75	0.1047	0.1837
	Approaches	ElementRank			ClassRank			CRA
Systems		Recall	Precision	F1	Recall	Precision	F1	Recall	Precision	F1
Ant		0.8	0.0593	0.1103	0.8	0.0593	0.1103	0.9	0.0667	0.1241
Argo UML		1	0.0945	0.1727	1	0.0945	0.1727	1	0.0945	0.1727
jEdit		1	0.0427	0.0819	1	0.0427	0.0819	1	0.0427	0.0819
jHotDraw		1	0.1098	0.1978	1	0.1098	0.1978	1	0.1098	0.1978
jMeter		0.3571	0.1282	0.1887	0.4286	0.1538	0.2264	0.6429	0.2308	0.3396
wro4j		1	0.1395	0.2449	1	0.1395	0.2449	0.9167	0.1279	0.2245

Table 7. Average ranking of the CRA and other baseline approaches.

Approaches	Average Ranking
Approaches	Recall	Precision	F1
a-index	11.0	11.0	11.0
h-index	5.3333	7.4167	7.0833
CONN-TOTAL	4.1667	5.8333	6.0
CONN-TOTAL-W	6.0	6.0	6.0
Core $_{k}$	6.1667	7.75	7.2500
Core $_{w k}$	7.8333	6.6667	7.0
PageRank	6.3333	5.5	5.6667
PageRank $_{B R}$	6.3333	5.5	5.6667
ElementRank	4.6667	4.0	4.0
ClassRank	4.4167	3.75	3.75
CRA	3.75	2.5833	2.5833

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Du, X.; Pan, W.; Jiang, B.; Ding, L.; Pan, Y.; Yuan, C.; Xiang, Y. CRA: Identifying Key Classes Using Markov-Chain-Based Ranking Aggregation. Axioms 2022, 11, 491. https://doi.org/10.3390/axioms11100491

AMA Style

Du X, Pan W, Jiang B, Ding L, Pan Y, Yuan C, Xiang Y. CRA: Identifying Key Classes Using Markov-Chain-Based Ranking Aggregation. Axioms. 2022; 11(10):491. https://doi.org/10.3390/axioms11100491

Chicago/Turabian Style

Du, Xin, Weifeng Pan, Bo Jiang, Luyun Ding, Yun Pan, Chengxiang Yuan, and Yiming Xiang. 2022. "CRA: Identifying Key Classes Using Markov-Chain-Based Ranking Aggregation" Axioms 11, no. 10: 491. https://doi.org/10.3390/axioms11100491

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

CRA: Identifying Key Classes Using Markov-Chain-Based Ranking Aggregation

Abstract

1. Introduction

2. Related Work

3. The Proposed Method CRA

3.1. Building Software Network

3.2. Calculating Class Importance Metrics

3.3. Aggregation of Class Important Metrics by Markov chain

3.3.1. Brief Review of Markov chain

3.3.2. Refining Metrics

3.3.3. Aggregating Metrics

3.3.4. An Example to Aggregate Metrics via Markov Chain

3.4. Class-Level Ranking

4. Experimental Validation

4.1. Research Questions

4.2. Subject Systems

4.3. Baseline Approaches

4.4. Evaluation Metrics

4.5. Experiment Results and Analysis

4.5.1. RQ1: Can Our Method Improve How to Distinguish the Difference between Classes?

4.5.2. RQ2: Can Our Method Effectively Identify Key Classes Distributed in Software Systems?

4.6. Threats of Validity

4.6.1. Threats to Internal Validity

4.6.2. Threats to External Validity

5. Conclusions and Future Work

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI