Software Testing - Past, Present, and Future


Software Testing

Past, Present, and Future


Jeffrey Voas
Chief Scientist, Reliable Software Technologies (RST)
21515 Ridgetop Circle, Suite 250
Sterling, VA 20166
(703) 404-9293
[email protected]
http://www.rstcorp.com

Outline
The Past 30 Years of Seeking Quality Software
Miscellaneous testing facts
What is software testing?
Why is testing problematic?
Where do we go from here?

Reality Check
The amount of software in a typical device doubles every 18 months [Reme Bourguignon, VP of Philips in Holland (and numerous Airbus officials)]
Our defect densities are about the same today as they were at any other time in the past 20 years
0.5-2.0 software failures/KSLOC
Over the spectrum from game software to mainframe software to safety-critical code

Q: Are we better off today with all of the recent "advances" in software engineering theory and practice than we were without them?

Our 30-Year "Quality-Seeking" Adventure

Seminal ideas from the past

1. Process Improvement/Maturity
Clean Pipes, Dirty Water?

[J. Voas, "Can Clean Pipes Produce Dirty Water?", IEEE Software, July, 1997]

2. Formal Methods
Effective, but not the silver bullet
3:1 reduction in defect density in the CAA study [Pfleeger and Hatton, IEEE Software, 1997]
London Air Traffic Control Centre, 200,000 lines of C
"We found no compelling quantitative evidence that formal design techniques alone produced code of higher quality than informal design techniques."

Recommendation 1: FMs only work in conjunction with other approaches
Recommendation 2: Some algorithms MUST have them. Find which ones

Cleanroom
Scalability

3. Languages and OO Design


Recent studies (Tichy, Hatton, Humphrey, Shepherd and Cartwright):
C++: 25% more defects than C or Pascal
An OO C++ defect takes 2-3 times as long to debug, regardless of defect complexity
Inheritance engenders 6 times more defects

We did a NIST-sponsored study that showed the horrific "testability" problems of OOD
Java: multi-threading and inheritance. Nearly impossible to write correctly, and worse to test! (see the sketch after this list)
Al Davis predicts 2020 is when we quit creating languages like ice cream flavors [IEEE Software, July 1998]
Language definition manuals have grown by an order of magnitude
Java JDK 1.2 is twice as large as JDK 1.0.2
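
A minimal sketch (in Python rather than Java, but the hazard is identical) of why shared-state multi-threading is nearly impossible to test: the racy counter below usually loses updates, yet whether any single run exposes the defect depends entirely on thread scheduling.

    import threading

    # Hypothetical racy counter: the read and the write are separate steps,
    # so a thread switch between them silently loses an update.
    count = 0

    def worker():
        global count
        for _ in range(100_000):
            tmp = count      # read shared state
            count = tmp + 1  # write it back (not atomic with the read)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Usually prints less than 400000, but occasionally exactly 400000:
    # the fault is real, yet a test suite may never observe it.
    print(count)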

4. Code Measurement
Myth: Complexity correlates to testing costs: True (but only for certain types of testing)
Myth: Complex software is bad software: False
Studies show that medium-complexity modules are more reliable than smaller ones [L. Hatton, "Reexamining the Fault Density-Component Size Connection", IEEE Software, March 1997]

Metrics: over 1000 exist [Horst Zuse]
What metrics correlate with what other metrics, and what do their results mean?
Single-metric collection to multivariate collection
Discriminant analysis; Principal Component Analysis
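
A hedged sketch of what moving from single-metric to multivariate collection buys (assumes scikit-learn; the module data is invented): Principal Component Analysis often reveals that many of the 1000+ published metrics move together, i.e., they measure one underlying factor.

    import numpy as np
    from sklearn.decomposition import PCA

    # Rows are modules; columns are [LOC, cyclomatic complexity, fan-out].
    X = np.array([
        [120,  8,  3],
        [450, 21,  9],
        [ 80,  5,  2],
        [600, 30, 12],
        [300, 14,  6],
    ])

    pca = PCA(n_components=2)
    pca.fit(X)

    # If the first component explains most of the variance, the metrics are
    # largely redundant measurements of a single "size/complexity" factor.
    print(pca.explained_variance_ratio_)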

5. Software Standards
Often very vague

Prove that your software only does "good things"


Full of disclaimers
Profitable to org. that commissions them
Used if mandated
ROI is questionable
Thwart creativity
"Protectionist" legislation

2167A: 400 English words per Ada code statement [C. Jones]
Time-to-market? DOA in industry

Standards take 8-15 years to complete (on average): "old news" before being ratified

So Where Have We Gotten To?

[C. Jones, Patterns of Software Systems Failure and Success, 1996]

The Testing Process

Why Do We Test?
1. Assess Reliability
To test to a 10^-9 probability of failure (pof) requires about 4.6 billion tests
Hardware reliability models are not well suited for software!
No mass production
No physical decay
Failures may not be independent

Predictability is much lower for digital systems than for analog systems
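
One plausible reading of the 4.6-billion figure (an assumption; the slide does not show the derivation): with zero failures observed over N independent tests drawn from the operational profile, demonstrating a probability of failure p at confidence C requires roughly N = ln(1/(1-C))/p tests.

    import math

    # Tests needed to demonstrate pof <= p with confidence C, given zero
    # failures observed (standard reliability-demonstration bound).
    p = 1e-9   # target probability of failure per demand
    C = 0.99   # desired confidence
    N = math.log(1.0 / (1.0 - C)) / p
    print(f"{N:.3g} tests")   # ~4.61e+09, i.e. about 4.6 billion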

2. "Test-out" (detect) Code Faults

The Great Myth: "Test-out" the Bugs

[C. Jones, Patterns of Software Systems Failure and Success, 1996]

When Do We Test Software?

Did You Know That:


1. Testing/debugging can worsen reliability
Each correction had a 15% chance of creating a
problem as large as the problem it was supposedly
fixing [Ed Adams, IBM, 1984]

2. We often chase the "wrong bugs"
33% of all faults failed less than once in every 5000 execution years
2% of all faults caused the common failures (more than once every 5 execution years)

3. Testing cannot show the absence of faults, but can show the existence of faults

Did You Know That:


4. The cost to develop software is directly proportional to the cost of testing
Testing accounts for 50% of all pre-release costs.
Testing accounts for 70% of all post-release costs.
Post-release costs are often more than pre-release.

5. 25% of the total cost ($5B) to design Boeing's 777 is rumored to have gone into performing Modified Condition/Decision Coverage (an advanced form of white-box unit testing). Note that this is just one of dozens of different forms of testing required by the FAA for "on-the-plane" software. Supposedly there are 8-10M SLOC on the 777.
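
To make MC/DC concrete, here is a minimal sketch on an invented three-condition decision (not from the 777's code): each condition must be shown to independently flip the outcome, which takes only n+1 tests for n conditions instead of all 2^n combinations.

    # Hypothetical guard with three conditions:
    def decision(a, b, c):
        return (a or b) and c

    # Four tests achieve MC/DC for three conditions. Each pair noted below
    # differs in exactly one condition and flips the decision, proving that
    # the condition independently affects the outcome.
    tests = [
        (True,  False, True),   # -> True   (baseline)
        (False, False, True),   # -> False  (only 'a' changed vs. baseline)
        (False, True,  True),   # -> True   (only 'b' changed vs. test 2)
        (True,  False, False),  # -> False  (only 'c' changed vs. baseline)
    ]
    for a, b, c in tests:
        print((a, b, c), "->", decision(a, b, c))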

Did You Know That:


6. Y2K testing costs:
$600B to fix Y2K globally [Computer News Daily, "Year 2000 Prophet Preaches $600 Billion Digital Fix", Oct 1997]
60% of ALL Y2K costs will go to testing [TechBeamers, "White Paper: Year 2000 Focuses on Testing", May 10, 1996]; 70% for some projects [R.L. Scheier, "Year 2000: Testing Can't Wait", Computerworld, Oct 1997]
Rumors have it that 60-80% of all Y2K fixes will not get tested

Did You Know That:


7. Software testing consultants cost!
Junior-level test engineer: $80-$100/hour (0-1 years of
experience)
Medium-level test engineer: $100-$150/hour
Senior-level test engineer: $150-$300/hour
If testing for security, these figures will go even higher.
Off-shore testers can cut these figures in half.
Question: Do you want to send your source code
abroad?

"Testing is another area where I have to say I'm a little bit


disappointed in the lack of progress. At Microsoft, in a
typical development group, there are many more testers
than there are engineers writing code. Yet engineers spend
well over a third of their time doing testing type work. You
could say that we spend more time testing than we do
writing code. And if you go back through the history of
large-scale systems, that's the way they've been. But, you
know, what kind of new techniques are there in terms of
analyzing where those things come from and having
constructs that do automatic testing? Very, very little. So if
you know a researcher out there who wants to work on that
problem, boy, we'd love to put a group together."
[Massachusetts Institute of Technology Distinguished Lecture Series 1996
Bill Gates Keynote Address, Wednesday May 30th, 1996]

Did You Know That:


8. The most commonly applied software testing techniques used today (coverage and black-box) were developed in the 1960s and 1970s
Then: 1-10K SLOC procedural systems
Today: 100M SLOC
"Guinea pig" syndrome
If your dentist told you he still drilled "the ol' fashioned way he learned 40 years ago," would you run?

9. Most oracles are humans. Can you say "error prone"?

10. 70% of safety-critical code can be exception-handling code. It is the last code written, and rarely tested to any degree of thoroughness.

Did You Know That:


11. From RST's commercial clients, approximately:
0% use formal methods
10% collect code metrics or process metrics
50% claim to be attempting to follow international software
engineering standards (6-sigma, ISO, SEI > 1)
0% are > SEI Level 1
70% use object-oriented design
90% use C++ or Java
90% unit test
100% system test
100% hire 3rd party testing organizations

Is Testing The Silver Bullet?

Testing Problem #1: Time

Problem #2: Faults Hide From Tests

Testing can be viewed as selecting different colored balls from an urn where:
Black ball = input on which the software fails.
White ball = input on which the software succeeds.

Only when testing is exhaustive is there an empty urn.
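
A small simulation of the urn view (all sizes invented for illustration): when black balls are rare, even thousands of random draws will probably miss every one of them.

    import random

    N = 1_000_000   # inputs in the urn (assumed size of the input space)
    F = 5           # inputs on which the software fails (black balls)
    T = 10_000      # random tests we can afford

    black = set(random.sample(range(N), F))
    found = sum(1 for _ in range(T) if random.randrange(N) in black)

    print("failures observed:", found)
    # Probability that T random draws miss every black ball:
    print("P(miss all) =", round((1 - F / N) ** T, 3))   # ~0.95 here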

Software that Always Fails

This urn represents a software system that fails on every possible input

Correct Software

This urn represents a software system that succeeds on every possible input

Typical Software

This urn represents virtually all software in use today

Fault Density (Size)


Fault density is the number of inputs that cause failure for a specific fault
(the number of black balls that are hooked together in a chain)

The following urn represents five faults, each with a density of one.

Fault Density
This urn has five inputs that cause failure
One fault in the program is responsible for all of these failures
Thus, this fault has a density of five
Goal of DFT: Big Bugs!
(Speculation: Formal methods decrease fault densities)

Design-for-Testability (DFT)
To change the shape of the urn such that all black balls (if any) are quickly selected during sampling
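
One way to read DFT in code terms (an illustration, not RST's actual technique): a self-checking assertion turns hidden corrupted state into an immediate, visible failure. In urn terms, it repaints many balls black, so random sampling finds the fault far sooner.

    # Hypothetical routine with a seeded fault (wrong divisor):
    def mean(xs):
        total = sum(xs)
        result = total / (len(xs) + 1)   # bug: divisor should be len(xs)
        # DFT-style self-check: without it, the bad value fails visibly only
        # on the few inputs where a caller happens to notice; with it, nearly
        # every non-trivial input fails on the spot.
        assert abs(result * len(xs) - total) < 1e-9, "mean invariant violated"
        return result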

Problem #3: Test Management/Costs


1. Testing problems are not only technical issues
2. Practical issues: test plans, estimating test costs, hiring qualified test personnel, test stoppage criteria
Microsoft: tester-to-developer ratio is now 1 to 1.1
Microsoft: test code for Windows NT was 25% as large as all of the code in Windows NT
Mutual fund project: 50% as large

Problem #3: Test Management/Costs


3. Other issues that cannot be overlooked:
Test data generation: complex structures?
Test driver (harness) generation: a must-have for object-oriented code (see the sketch after this list)
Oracles: huge defect rates, particularly for applications like simulations
Test tool market: rumored to be $1B/year currently and increasing by 20-30% per year
Embedded?
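
A minimal sketch of a generated driver plus an executable oracle (all names are hypothetical): the driver enumerates test data, and the oracle is an independent restatement of the spec that we trust more than the code under test.

    import unittest

    # Code under test (hypothetical): saturating 8-bit addition.
    def saturating_add(a, b):
        return min(a + b, 255)

    class SaturatingAddDriver(unittest.TestCase):
        def test_against_oracle(self):
            # Generated test data: a coarse grid over the input space.
            for a in range(0, 300, 7):
                for b in range(0, 300, 11):
                    # Oracle: independent (if naive) model of the spec.
                    expected = a + b if a + b <= 255 else 255
                    self.assertEqual(saturating_add(a, b), expected)

    if __name__ == "__main__":
        unittest.main()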

Problem #4: Training Personnel


Al Davis's words in his farewell letter as Editor-in-Chief of IEEE Software [July 1998]:
"Too much of the core of many computer science degrees
hinges upon achieving efficiency of computing resources
(time, memory, and so on) that are no longer in short supply,
and on building things that most practitioners no longer
build. Too little time is devoted to producing reliable,
maintainable, quality software."

We have never hired a single undergraduate CS major who knew anything about software testing or had studied it. Similar results for MS-CS candidates [RST HR data]
Summer Intern Program

Problem #5: What Techniques To Use?


Testing experts rarely agree!
(e.g., black-box vs. white-box, or system-level vs. unit)

Marginal evidence correlating testing methods and the types of errors they detect
Other than the subsumes hierarchy, most evidence relating defects found per technique to the costs of that technique is anecdotal
Reason: it is taboo for organizations to reveal real error/fault and test-effectiveness data ("dirty laundry" syndrome)
NIST: Error, Fault, Failure Database

Problem #6: Books and Education


Very few books are practical; most are outdated, except that they do teach the fundamentals and theory behind different types of testing
Probably no more than 100 universities offer grad-level classes in software testing (two per state).

Solution: Just Do More Testing, Right?


Wrong! (The Sun is going to burn out first)
We actually over-test many systems. Our testing is simply too ad hoc and brute-force. This is equivalent to repeatedly pulling the same ball from the urn.
Testing must be applied intelligently and must be goal-based

Today

"No" technologies for sufficiently testing large-scale software
"No" technologies for sufficiently testing distributed software
"No" technologies for sufficiently testing real-time software
40% COTS: Component-based software engineering is impossible without testing [J. Voas, IEEE Computer, June 1998]
Reuse requires testing! (Ariane 5)
OO languages and systems are untestable at the system level
Huge security and national IT infrastructure problems that are seeking sound solutions
A lack of diversity in commercial offerings: we are all vulnerable to the same attacks
Testers are perceived as "street sweepers" (developer wannabes), and even so, are hard to find
Legislation that favors software vendors
70% of all development organizations are SEI Level 1 [C. Jones]

The Gap Grows

Summary
All Gloom and Doom? No!
Testing is back! The key testing conferences are so well attended that they are splitting into twice-yearly forums. Practitioners are more interested than ever.
Testing, in its many forms, is still the most widely applied technique for certifying quality
Practical and theoretical weaknesses exist, even for trivial software systems
Now is the time to invest in research, training, and automated tools! We've lost 15 years!
