Particle Swarm Optimization in the Fine-Tuning of Fuzzy Software Cost Estimation Models


Prasad Reddy.P.V.G.D
International Journal of Software Engineering (IJSE), Volume (1): Issue (2)


Prasad Reddy. P.V.G.D [email protected]
Department of Computer Science and Systems Engineering,
Andhra University,
Visakhapatnam, India

Abstract:

Software cost estimation deals with the financial and strategic planning of
software projects. Controlling the expensive investment of software development
effectively is of paramount importance. The limitation of algorithmic effort
prediction models is their inability to cope with the uncertainty and imprecision
surrounding software projects at the early development stage. More recently,
attention has turned to a variety of machine learning methods, and to soft
computing in particular, to predict software development effort. Fuzzy logic is one
such technique that can cope with uncertainty. In the present paper, a Particle
Swarm Optimization Algorithm (PSOA) is presented to fine-tune the fuzzy
estimate for the development of software projects. The efficacy of the developed
models is tested on 10 NASA software projects, 18 NASA projects and the
COCOMO 81 database on the basis of various criteria for the assessment of
software cost estimation models. A comparison of all the models shows that the
developed models provide better estimates.

Keywords: Particle Swarm Optimization Algorithm (PSOA), Effort Estimation, Fuzzy Cost Estimation, Software Cost Estimation




1. Introduction
Software cost estimation refers to predicting the likely amount of effort, time, and staffing
levels required to build software. Underestimating software costs can have detrimental effects
on the quality of the delivered software and thus on a company's business reputation and
competitiveness. Overestimation of software cost, on the other hand, can result in missed
opportunities to use funds in other projects [4]. The need for reliable and accurate cost
predictions in software engineering is an ongoing challenge [1]. Software cost estimation
techniques can be broadly classified into algorithmic and non-algorithmic models. Algorithmic
models are derived from the statistical analysis of historical project data [5]; examples are the
Constructive Cost Model (COCOMO) [2] and Software Life Cycle Management (SLIM) [11]. Non-
algorithmic techniques include Price-to-Win [2], Parkinson [2], expert judgment [5], and machine
learning approaches [5]. Machine learning groups together a set of techniques that
embody some facets of the human mind [5], for example fuzzy systems, analogy, regression
trees, rule induction, neural networks and evolutionary algorithms. Among the machine learning
approaches, fuzzy systems, neural networks and evolutionary algorithms are considered to
belong to the soft computing group. The algorithmic as well as the non-algorithmic (based on
expert judgment) cost estimation models, however, are not without errors. In the present paper a
fuzzy estimate is proposed. The parameters of the fuzzy estimate are tuned using an
optimization technique known as the Particle Swarm Optimization Algorithm (PSOA).

2. Fuzzy Logic:
Fuzzy logic control is one of the methods that has recently been used in many applications.
Fuzzy logic is one of the most useful approaches for dealing with fuzziness. It is a methodology,
based on fuzzy set theory [13,14], for solving problems that are too complex to be understood
quantitatively. The use of fuzzy sets in logical expressions is known as fuzzy logic.
A fuzzy set is characterized by a membership function (MF), which associates with each point in the
fuzzy set a real number in the interval [0, 1], called the degree or grade of membership. A triangular
fuzzy MF is described by a triplet (a, m, b), where m is the modal value and a and b are the left and
right boundaries respectively. Handling the imprecision in the size input requires that the size of the
software project be defined as a fuzzy number instead of a crisp number. The uncertainty at the
input level of the model yields uncertainty at the output. This becomes obvious and, more
importantly, bears substantial significance in any practical endeavor. By modeling the size with a
fuzzy set, we can model its impact on the estimation accuracy of the effort. Therefore, the size is
taken as an input MF and the effort is taken as an output MF. The fuzzy estimate E can be computed as
a weighted-average (Sugeno) defuzzification of the input MF:
E_WA = [e1*mu(e1) + e2*mu(e2) + e3*mu(e3)] / (e1 + e2 + e3)   ----(1)

where mu(e1), mu(e2) and mu(e3) represent the degree of fulfillment of each input, and e1, e2
and e3 are the weights of the fuzzy estimate. The parameters (weights) of the fuzzy estimate
are to be tuned properly. They are tuned using the optimization technique known as the Particle
Swarm Optimization Algorithm (PSOA).
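As a concrete illustration, the triangular membership function and the weighted-average defuzzification of Eq. (1) can be sketched as follows. This is a minimal sketch with our own function names; note that Eq. (1) in this paper divides by the sum of the weights e1 + e2 + e3 rather than by the sum of the membership degrees, as a standard Sugeno defuzzifier would.

```python
def tri_mf(x, a, m, b):
    """Triangular membership function (a, m, b): 0 outside [a, b], peak 1 at m."""
    if x == m:
        return 1.0
    if x <= a or x >= b:
        return 0.0
    if x < m:
        return (x - a) / (m - a)   # rising edge
    return (b - x) / (b - m)       # falling edge

def fuzzy_estimate(weights, degrees):
    """Weighted-average defuzzification as written in Eq. (1):
    E = sum(e_i * mu(e_i)) / sum(e_i)."""
    num = sum(e * mu for e, mu in zip(weights, degrees))
    return num / sum(weights)
```

With equal weights this reduces to the arithmetic mean of the degrees of fulfillment, which is the pre-tuning baseline the PSOA then improves on.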

3. Overview of the Particle Swarm Optimization Algorithm (PSOA):
PSOA is an optimization technique and a kind of evolutionary computation technique [6,10].
The method has been found to be robust in solving problems featuring nonlinearity,
non-differentiability, multiple optima and high dimensionality, through adaptation derived
from social-psychological theory. The features of the method are as follows:
1. The method was developed from research on swarms, such as fish schooling and bird flocking.
2. It is based on a simple concept. Therefore, the computation time is short and the memory
requirement is small.
3. It was originally developed for nonlinear optimization problems with continuous variables,
but it is easily extended to treat problems with discrete variables.
Research on bird flocking suggests that birds find food by flocking, and PSO was developed
through simulation of bird flocking in two-dimensional space. The position of each agent is
represented by its XY position, and its velocity is expressed by vx (the velocity along the X axis)
and vy (the velocity along the Y axis). Modification of the agent position is realized using the
position and velocity information. Bird flocking optimizes a certain objective function. Each agent
knows its best value so far (pbest) and its XY position; this information is an analogy of the
personal experience of each agent. Moreover, each agent knows the best value so far in the
group (gbest) among the pbests; this information is an analogy of the knowledge of how the other
agents around it have performed. Namely, each agent tries to modify its position using the
following information:
- the current position (x, y),
- the current velocity (vx, vy),
- the distance between the current position and pbest,
- the distance between the current position and gbest.
This modification can be represented by the concept of velocity. The velocity of each agent can be
modified by the following equation:
v_i^(k+1) = w*v_i^k + c1*rand*(pbest_i - s_i^k) + c2*rand*(gbest - s_i^k)   ----(2)

Prasad Reddy.P.V.G.D
International J ournal of Software Engineering (IJ SE), Volume (1): Issue (2) 14
where
v_i^k - velocity of agent i at iteration k
w - weighting function
c1, c2 - weighting factors
rand - random number between 0 and 1
s_i^k - current position of agent i at iteration k
pbest_i - pbest of agent i
gbest - gbest of the group

The following weighting function is usually utilized in (2):

w = w_max - ((w_max - w_min) / iter_max) * iter   ----(3)

where
w_max - initial weight
w_min - final weight
iter_max - maximum iteration number
iter - current iteration number

Using Eqs. (2) and (3), a velocity that gradually brings the agent closer to pbest and gbest can be
calculated. The current position can then be modified by the following equation:

s_i^(k+1) = s_i^k + v_i^(k+1)   ----(4)

where
s_i^k - current searching point
s_i^(k+1) - modified searching point
v_i^k - current velocity
v_i^(k+1) - modified velocity
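Equations (2)-(4) can be combined into a compact PSO loop. The following sketch is our own minimal implementation, not the authors' code; the parameter defaults (swarm size, iteration count, c1 = c2 = 2, w decreasing from 0.9 to 0.4) are common illustrative choices.

```python
import random

def pso(fitness, dim, n_agents=20, iter_max=100,
        w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, bounds=(-10.0, 10.0)):
    """Minimal PSO minimizer following Eqs. (2)-(4): linearly decreasing
    inertia weight, pbest/gbest velocity update, position update."""
    lo, hi = bounds
    s = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    v = [[0.0] * dim for _ in range(n_agents)]
    pbest = [p[:] for p in s]
    pbest_val = [fitness(p) for p in s]
    g = pbest_val.index(min(pbest_val))
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for it in range(iter_max):
        w = w_max - (w_max - w_min) * it / iter_max            # Eq. (3)
        for i in range(n_agents):
            for d in range(dim):
                v[i][d] = (w * v[i][d]
                           + c1 * random.random() * (pbest[i][d] - s[i][d])
                           + c2 * random.random() * (gbest[d] - s[i][d]))  # Eq. (2)
                s[i][d] += v[i][d]                             # Eq. (4)
            f = fitness(s[i])
            if f < pbest_val[i]:                               # update personal best
                pbest[i], pbest_val[i] = s[i][:], f
                if f < gbest_val:                              # update global best
                    gbest, gbest_val = s[i][:], f
    return gbest, gbest_val
```

On a simple sphere function the swarm converges close to the origin within the default budget, which is the behavior the fine-tuning in Sections 4 and 5 relies on.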

4. Proposed Models:
Case 1:
4.1 Model I based on Kilo Lines of Code (KLOC):
The COnstructive Cost Model (COCOMO) was proposed by Boehm [2][3][11]. The model
structure is classified based on the type of project to be handled: organic,
semidetached and embedded projects. The model takes the following form:

Effort = alpha * (KLOC)^beta

This model considers the effect of lines of code only. Model I is proposed taking the fuzzified size
of the software project, using triangular fuzzy sets, to account for the imprecision in size. The
estimated effort is now a fuzzy estimate obtained by the weighted-average defuzzification in (1):
Fuzzy Estimate E = alpha * (e1*a^beta + e2*m^beta + e3*b^beta) / (e1 + e2 + e3)   ----(5)

where alpha = 3.2, beta = 0.795, m represents the size in KLOC, a = m and b = 1.225m.
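Under the stated parameter values (alpha = 3.2, beta = 0.795, a = m, b = 1.225m), Eq. (5) can be sketched as below. The equal default weights are only a pre-tuning placeholder, not the PSO-tuned values used in the paper.

```python
ALPHA, BETA = 3.2, 0.795

def model1_effort(kloc, weights=(1.0, 1.0, 1.0)):
    """Eq. (5): fuzzy effort estimate from a triangular fuzzy size (a, m, b)
    with a = m and b = 1.225*m. The weights e1..e3 are to be tuned by PSOA;
    the equal defaults here are a placeholder, not the tuned values."""
    e1, e2, e3 = weights
    a, m, b = kloc, kloc, 1.225 * kloc
    return ALPHA * (e1 * a**BETA + e2 * m**BETA + e3 * b**BETA) / (e1 + e2 + e3)
```

Because b > m while a = m, equal weights push the fuzzy estimate slightly above the crisp COCOMO value alpha * m^beta; the tuning of e1..e3 is what pulls the estimate toward the measured efforts.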

4.2 Model II based on Kilo Lines of Code (KLOC) and Methodology (ME):
Model II is developed considering the effect of the methodology (ME) as an element contributing to
the computation of the software development effort. It is further modified by adding a bias term d.
Model II thus takes the following form:

Effort = alpha * (KLOC)^beta + c*(ME) + d

The fuzzy estimated effort for the above model is

Fuzzy Estimate E = alpha * (e1*a^beta + e2*m^beta + e3*b^beta) / (e1 + e2 + e3) + c*(ME) + d   ----(6)

where alpha = 3.2, beta = 0.795, m is the size in KLOC, ME is the methodology of the project,
a = m, b = 1.225m, c = -0.895 and d = 19.9, and e1, e2 and e3 are the weights of the fuzzy
estimate to be tuned. These weights are tuned using the particle swarm optimization technique.
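Tuning then amounts to minimizing an error measure over historical project data with PSOA, e.g. with a fitness function like the sketch below (shown for the Model I form without the ME term). The paper does not state its exact objective function, so the MMRE-style fitness and all names here are our assumptions.

```python
def mmre_fitness(weights, projects):
    """Candidate fitness for tuning (e1, e2, e3): mean magnitude of relative
    error over historical (kloc, measured_effort) pairs, using the Model I
    form of Eq. (5). The paper does not state its exact objective; an
    MMRE-style fitness is an assumption."""
    ALPHA, BETA = 3.2, 0.795
    e1, e2, e3 = weights
    if min(weights) <= 0:          # keep the weights positive
        return float("inf")
    total = 0.0
    for kloc, measured in projects:
        a, m, b = kloc, kloc, 1.225 * kloc
        est = ALPHA * (e1*a**BETA + e2*m**BETA + e3*b**BETA) / (e1 + e2 + e3)
        total += abs(measured - est) / measured
    return total / len(projects)
```

A PSOA run over three-dimensional weight vectors with this fitness would return the tuned (e1, e2, e3) plugged into Eq. (5) or (6).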

Case II:
The COCOMO 81 database [14] consists of data on 63 projects [15], of which 28 are embedded-mode
projects, 12 are semi-detached-mode projects, and 23 are organic-mode projects. Thus,
there is no uniformity in the selection of projects over the different modes. In carrying out our
experiments, we chose the data of 53 of the 63 projects, namely those whose size is less
than 100 KDSI.
The accuracy of Basic COCOMO is limited because it does not consider factors like
hardware, personnel, the use of modern tools and other attributes that affect the project cost.
Boehm therefore proposed Intermediate COCOMO [3,4], which adds accuracy to Basic COCOMO by
multiplying cost drivers into the equation via a new variable: EAF (Effort Adjustment Factor).

The EAF term is the product of 15 cost drivers [5], listed in Table 1. The multipliers of the
cost drivers correspond to the ratings Very Low, Low, Nominal, High, Very High and Extra High.
If the ratings of all 15 cost drivers are Nominal, then EAF equals 1.
The 15 cost drivers are broadly classified into 4 categories [15,16]:
1. Product:
   RELY - required software reliability
   DATA - database size
   CPLX - product complexity
2. Platform:
   TIME - execution time constraint
   STOR - main storage constraint
   VIRT - virtual machine volatility
   TURN - computer turnaround time
3. Personnel:
   ACAP - analyst capability
   AEXP - applications experience
   PCAP - programmer capability
   VEXP - virtual machine experience
   LEXP - language experience
4. Project:
   MODP - modern programming practices
   TOOL - use of software tools
   SCED - required development schedule
The cost drivers and their multipliers are given in Table 1. Depending on the project, the multipliers
of the cost drivers will vary, and thereby the EAF may be greater than or less than 1, thus affecting
the effort [15]. The effort is given by:

Effort = alpha * (KLOC)^beta * product(i=1..15) EM_i

where EM_i is the effort multiplier of the i-th cost driver.
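The EAF computation is then a straightforward product. The sketch below uses a small hypothetical subset of the Table 1 multipliers and the organic-mode coefficients alpha = 3.2, beta = 1.05 from Boehm [2]; drivers left at their Nominal rating contribute a multiplier of 1.

```python
# Hypothetical subset of the Table 1 multipliers, keyed by (driver, rating);
# Nominal ratings contribute 1.0 and may be omitted from the table.
EM = {("RELY", "High"): 1.15, ("CPLX", "Very high"): 1.30,
      ("ACAP", "High"): 0.86, ("SCED", "Low"): 1.08}

def intermediate_cocomo(kloc, ratings, alpha=3.2, beta=1.05):
    """Effort = alpha * KLOC**beta * EAF, where EAF is the product of the
    15 effort multipliers. alpha/beta default to the organic-mode
    Intermediate COCOMO coefficients from Boehm [2]."""
    eaf = 1.0
    for key in ratings:
        eaf *= EM.get(key, 1.0)    # unknown/nominal entries multiply by 1
    return alpha * kloc**beta * eaf
```

For example, rating only RELY as High scales the nominal estimate by exactly its multiplier, 1.15.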







Table 1: Intermediate COCOMO Cost Drivers with multipliers

S.No | Cost Driver | Very low | Low  | Nominal | High | Very high | Extra high
1    | RELY        | 0.75     | 0.88 | 1.00    | 1.15 | 1.40      |
2    | DATA        |          | 0.94 | 1.00    | 1.08 | 1.16      |
3    | CPLX        | 0.70     | 0.85 | 1.00    | 1.15 | 1.30      | 1.65
4    | TIME        |          |      | 1.00    | 1.11 | 1.30      | 1.66
5    | STOR        |          |      | 1.00    | 1.06 | 1.21      | 1.56
6    | VIRT        |          | 0.87 | 1.00    | 1.15 | 1.30      |
7    | TURN        |          | 0.87 | 1.00    | 1.07 | 1.15      |
8    | ACAP        | 1.46     | 1.19 | 1.00    | 0.86 | 0.71      |
9    | AEXP        | 1.29     | 1.13 | 1.00    | 0.91 | 0.82      |
10   | PCAP        | 1.42     | 1.17 | 1.00    | 0.86 | 0.70      |
11   | VEXP        | 1.21     | 1.10 | 1.00    | 0.90 |           |
12   | LEXP        | 1.14     | 1.07 | 1.00    | 0.95 |           |
13   | MODP        | 1.24     | 1.10 | 1.00    | 0.91 | 0.82      |
14   | TOOL        | 1.24     | 1.10 | 1.00    | 0.91 | 0.83      |
15   | SCED        | 1.23     | 1.08 | 1.00    | 1.04 | 1.10      |

5. Experimental Study:
For this study we have taken the data of 10 NASA projects [12]. The experimental results for the
various models are shown in Table 2.

Table 2: Estimated Efforts in Man Months of Various Models

Size in KLOC | Measured Effort | Alaa F. Sheta G.E [7] Model | Alaa F. Sheta Model 2 | Mittal [12] Model I | Mittal Model II | Model I  | Model II
2.1          | 5               | 8.44042                     | 11.2712               | 6.357633            | 4.257633        | 6.15     | 4.1304
3.1          | 7               | 11.2208                     | 14.45704              | 8.664902            | 7.664902        | 8.393    | 7.4914
4.2          | 9               | 14.01029                    | 19.97637              | 11.03099            | 13.88099        | 10.6849  | 13.6602
12.5         | 23.9            | 31.09857                    | 31.6863               | 26.25274            | 24.70274        | 25.4291  | 24.1772
46.5         | 79              | 81.25767                    | 85.00703              | 74.60299            | 77.45299        | 72.2623  | 75.9596
54.4         | 90.8            | 91.25759                    | 94.97778              | 84.63819            | 86.93819        | 81.8631  | 85.1229
67.5         | 98.4            | 106.7071                    | 107.2547              | 100.3293            | 97.67926        | 97.1814  | 95.6709
78.6         | 98.7            | 119.2705                    | 118.0305              | 113.238             | 107.288         | 109.6851 | 105.0212
90.2         | 115.8           | 131.8988                    | 134.0114              | 126.334             | 123.134         | 122.3703 | 120.6051
100.8        | 138.3           | 143.0604                    | 144.4488              | 138.001             | 132.601         | 132.5814 | 129.8385


Figures 1 and 2 show the comparison of estimated effort to measured effort for Model I and Model
II. It is observed that adding the effect of ME improves the prediction quality of the model.

Fig 1: Effort from Model I versus measured effort for 10 NASA projects

Fig 2: Effort from Model II versus measured effort for 10 NASA projects
Fig 3: Comparison of Error for different Models for 10 NASA projects
Fig 4: Effort from Model I versus measured effort for 18 NASA projects


Fig 5: Effort from Model II versus measured effort for 18 NASA projects

It was also found that adding a bias term, as in regression models, helps to stabilize the model
by reducing the effect of noise in the measurements. The efficacy of the models is tested on the
NASA projects. A case study based on the COCOMO 81 database compares the proposed model
with the Intermediate COCOMO effort prediction.
Fig 6: COCOMO 81 model: project id versus measured effort

Fig 7: COCOMO 81 model: project size versus measured effort
Fig 8: COCOMO 81 model: comparison of absolute error (PSO fuzzy versus adjusted COCOMO)

Figure 3 shows a comparison of the error of the various models with respect to the estimated effort.
Figures 4 to 8 show the comparison of estimated effort to measured effort for the 18 NASA
projects and the COCOMO 81 dataset. A comparison of the various models on the basis of the
various criteria is given in Figures 9 to 16.


Fig 9: Comparison of % VAF for different Models for 10 NASA projects
(% VAF - Alaa F. Sheta Model I: 98.41; Alaa F. Sheta Model II: 98.93; Mittal Model I: 98.5; Mittal Model II: 99.15; Model I: 98.55; Model II: 99.1)


Fig 10: Comparison of % Mean Absolute Relative Error for different Models for 10 NASA projects
(% MARE - Alaa F. Sheta Model I: 26.49; Alaa F. Sheta Model II: 44.75; Mittal Model I: 12.17; Mittal Model II: 10.8; Model I: 10.87; Model II: 10.69)
Fig 11: Comparison of % Mean Magnitude of Relative Error for different Models for 10 NASA projects
(% MMRE - Alaa F. Sheta Model I: 18.11; Alaa F. Sheta Model II: 24.07; Mittal Model I: 10.44; Mittal Model II: 9; Model I: 9.83; Model II: 9.29)


Fig 12: Comparison of % Median of Magnitude of Relative Error for different Models for 10 NASA projects
(% MdMRE - Alaa F. Sheta Model I: 14.73; Alaa F. Sheta Model II: 14.98; Mittal Model I: 8.65; Mittal Model II: 5.2; Model I: 9.67; Model II: 6.27)

Fig 13: Comparison of % VAF for different Models for 18 NASA projects
(% VAF - Alaa F. Sheta Model I: 96.3138; Alaa F. Sheta Model II: 97.5648; Mittal Model I: 96.2157; Mittal Model II: 97.6703; Model I: 96.3779; Model II: 98.3906)

Fig 14: Comparison of % Mean Absolute Relative Error for different Models for 18 NASA projects
(% MARE - Alaa F. Sheta Model I: 56.1; Alaa F. Sheta Model II: 63.64; Mittal Model I: 38.22; Mittal Model II: 23.73; Model I: 35.77; Model II: 19.45)

Fig 15: Comparison of % Mean Magnitude of Relative Error for different Models for 18 NASA projects
(% MMRE - Alaa F. Sheta Model I: 85.45; Alaa F. Sheta Model II: 32.25; Mittal Model I: 63.86; Mittal Model II: 17.22; Model I: 22.46; Model II: 29.65)


Fig 16: Comparison of % Median of Magnitude of Relative Error for different Models for 18 NASA projects
(% MdMRE - Alaa F. Sheta Model I: 35.19; Alaa F. Sheta Model II: 32.89; Mittal Model I: 24.14; Mittal Model II: 16.81; Model I: 21.37; Model II: 12.23)

A first criterion for comparison is the Variance Accounted For (VAF), calculated as:

%VAF = [1 - var(Measured Effort - Estimated Effort) / var(Measured Effort)] * 100   ----(7)

The second criterion is the Mean Absolute Relative Error (MARE), calculated as:

%MARE = mean(abs(Measured Effort - Estimated Effort) / Measured Effort) * 100   ----(8)

The third criterion is the Mean Magnitude of Relative Error (MMRE), which should be less than
25% to be acceptable:

%MMRE = (1/n) * sum(i=1..n) MRE_i * 100

where MRE (Magnitude of Relative Error) = abs(Measured Effort - Estimated Effort) / Measured Effort.

%MdMRE is the median of the MRE values; it should also be less than 25% to be acceptable.
For the COCOMO 81 dataset, %MdMRE is 17.02% and %MMRE is 21.15%.
It is observed that the proposed models have a higher %VAF, a lower %MARE, a lower %MMRE
and a lower %MdMRE compared to the previous methods in the literature. A model that gives a
higher VAF and a lower mean absolute relative error is the better model. Hence the proposed
models give better estimates.
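The evaluation criteria above can be computed as follows. This is a small sketch with our own function names; note that, as the formulas are written above, %MARE and %MMRE coincide.

```python
def variance(xs):
    """Population variance, as used in the %VAF formula."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def pct_vaf(measured, estimated):
    """Percentage of variance accounted for, Eq. (7)."""
    resid = [m - e for m, e in zip(measured, estimated)]
    return (1 - variance(resid) / variance(measured)) * 100

def mre_values(measured, estimated):
    """Magnitude of relative error for each project."""
    return [abs(m - e) / m for m, e in zip(measured, estimated)]

def pct_mmre(measured, estimated):
    """Mean magnitude of relative error, in percent."""
    errs = mre_values(measured, estimated)
    return 100 * sum(errs) / len(errs)

def pct_mdmre(measured, estimated):
    """Median magnitude of relative error, in percent."""
    errs = sorted(mre_values(measured, estimated))
    n = len(errs)
    mid = errs[n // 2] if n % 2 else (errs[n // 2 - 1] + errs[n // 2]) / 2
    return 100 * mid
```

Applying these to the measured and estimated columns of Table 2 reproduces the kind of comparison summarized in Figures 9 to 16.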

6. Conclusions:
In the present paper two fuzzy software cost estimation models based on weighted-average
defuzzification are considered. The weights of the models are fine-tuned using the Particle Swarm
Optimization Algorithm. The analysis based on VAF, Mean Absolute Relative Error, Mean
Magnitude of Relative Error and Median Magnitude of Relative Error shows that PSOA always
leads to a satisfactory result. The obtained results are superior to previously reported work in
the literature.

7. References:
[1] Hodgkinson, A.C. and Garratt, P.W., "A Neuro-Fuzzy Cost Estimator", In Proc. of the 3rd
International Conference on Software Engineering and Applications (SAE), 1999, pp. 401-406.
[2] Boehm, B.W., Software Engineering Economics, Englewood Cliffs, NJ, Prentice-Hall, 1981.
[3] Boehm, B.W. et al., Software Cost Estimation with COCOMO II, Prentice Hall, 2000.
[4] Briand, L.C., Langley, T. and Wieczorek, I., "A replicated assessment and comparison of
common software cost modeling techniques", In Proceedings of the 2000 International Conference
on Software Engineering, Limerick, Ireland, 2000, pp. 377-386.
[5] Schofield, C., "Non-Algorithmic Effort Estimation Techniques", Technical Report TR98-01,
Department of Computing, Bournemouth University, England, 1998.
[6] Satapathy, S.C., Murthy, J.V.R., Prasad Reddy, P.V.G.D., Misra, B.B., Dash, P.K. and
Panda, G., "Particle swarm optimized multiple regression linear model for data classification",
Applied Soft Computing, 9(2), 2009, pp. 470-476.
[7] Sheta, A.F., "Estimation of the COCOMO Model Parameters Using Genetic Algorithms for
NASA Software Projects", Journal of Computer Science, 2(2), 2006, pp. 118-123.
[8] Bailey, J.W. and Basili, "A Meta-model for software development resource expenditure",
In Proc. Intl. Conf. on Software Engineering, 1981, pp. 107-115.
[9] Putnam, L.H., "A General Empirical Solution to the Macro Software Sizing and Estimating
Problem", IEEE Transactions on Software Engineering, 4(4), 1978, pp. 345-361.
[10] Laskari, E.C., Parsopoulos, K.E. and Vrahatis, M.N., "Particle Swarm Optimization for
Minimax Problems", In Proceedings of the 2002 Congress on Evolutionary Computation
(CEC '02), vol. 2, 2002, pp. 1576-158.
[11] Matson, J.E., Barrett, B.E. and Mellichamp, J.M., "Software Development Cost Estimation
Using Function Points", IEEE Trans. on Software Engineering, 20(4), 1994, pp. 275-287.
[12] Mittal, H. and Bhatia, P., "Optimization Criteria for Effort Estimation using Fuzzy
Technique", CLEI Electronic Journal, 10(1), 2007, pp. 1-11.
[13] Zadeh, L.A., "From Computing with numbers to computing with words - from manipulation
of measurements to manipulation of perceptions", Int. J. Appl. Math. Comput. Sci., 12(3), 2002,
pp. 307-324.
[14] Zadeh, L.A., "Fuzzy Sets", Information and Control, 8, 1965, pp. 338-353.
[15] Seth, K., Sharma, A. and Seth, A., "Component Selection Efforts Estimation - a Fuzzy
Logic Based Approach", IJCSS, 3(3).
[16] Xu, Z. and Khoshgoftaar, T.M., "Identification of fuzzy models of software cost
estimation", Fuzzy Sets and Systems, 145, 2004, pp. 141-163.
