Testing Is Overrated
I didn’t re-implement Ruby in Erlang, or write a web server in assembly that’s 10x faster than
Apache, or start a successful company. At the end of the day, I’m just a guy who makes web
sites.
Luke Francl
Don’t get me wrong, testing is great. I’ll never forget the first time I saved myself from
committing buggy code with my own unit test. And once written, programmatic tests provide
a nice regression framework that helps catch future errors and makes refactoring possible.
test-first
coverage
autotest test cases
We as developers hear, read, and write a lot about testing.
Why so much?
But we can write code! And so we play to our strength -- coding -- and try to code our way
out of buggy software.
All you need is tests
In the worst case, this leads to a mindset that developer testing is all you need, and if we can
only get to 100% code coverage, we’ll be bug free. You’ve got people having Rcov length
contests.
I read a blog entry just last week by a guy who was predicting the “End of Bugs” thanks to
behavior-driven development and 100% rcov code coverage.
(I didn’t mention his name in my talk, but this was Adam Wiggins from Heroku:
http://adam.blog.heroku.com/past/2008/7/6/the_end_of_bugs/
I didn’t know he’d be at RubyFringe, but he came up to me later and was like “Hi, I’m Adam. You called me an idiot.”
Sorry Adam! Seriously, he was really nice about it. We had a good talk about testing.)
Extensive research
So I’ve been doing extensive research about the benefits of developer testing...
- Code Complete, 2nd edition, Steve McConnell
- Facts and Fallacies of Software Engineering, Robert L. Glass
And I’ve come to the conclusion that developer testing has some significant weaknesses.
audreyjm529: http://flickr.com/photos/audreyjm529/678762774/
testing is hard
Testing is hard, and most developers aren’t very good at it.
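The reason is that most developers tend to write “clean” tests that verify the normal path of
program execution, instead of “dirty” tests that verify error states or boundary conditions
(which is where most errors lie).
McConnell reports that immature testing organizations write about 5 clean tests for every dirty test,
while mature testing organizations write 5 dirty tests for every clean test.
That’s not fewer clean tests -- it’s 25x more dirty tests! Keep the number of clean tests the same,
flip the ratio from 5:1 to 1:5, and the dirty tests multiply by 25.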
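To make the clean/dirty distinction concrete, here is a small Ruby sketch. The parse_age method and the rejects? helper are hypothetical, invented only for illustration. The single clean test is the kind we tend to write; the dirty tests probe missing input, garbage input, and both sides of each boundary.

```ruby
# Hypothetical method under test: parses an age typed into a form field.
def parse_age(input)
  raise ArgumentError, "age is required" if input.nil? || input.strip.empty?

  age = Integer(input.strip, 10) # raises ArgumentError on input like "forty-two"
  raise ArgumentError, "age out of range: #{age}" unless (0..150).cover?(age)

  age
end

# Hypothetical helper: true if the block raises ArgumentError, false otherwise.
def rejects?
  yield
  false
rescue ArgumentError
  true
end

# The "clean" test most of us write: the normal path.
clean_results = [parse_age("42") == 42]

# "Dirty" tests: missing input, garbage, and both sides of each boundary.
dirty_results = [
  rejects? { parse_age(nil) },
  rejects? { parse_age("   ") },
  rejects? { parse_age("forty-two") },
  rejects? { parse_age("-1") },
  parse_age("0") == 0,
  parse_age("150") == 150,
  rejects? { parse_age("151") },
]

puts "clean: #{clean_results.count(true)}/#{clean_results.size} passed"
puts "dirty: #{dirty_results.count(true)}/#{dirty_results.size} passed"
```

Notice that the clean test alone already executes every line of parse_age except the raise statements, so it looks good on a coverage report while saying nothing about the error states.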
aussiegall: http://flickr.com/photos/aussiegall/2238073479/
total_withholdings = 0
employees.each do |employee|
  company_retirement = 0
  gross_pay = compute_gross_pay(employee)
  personal_retirement = 0
  if eligible_for_personal_retirement(employee)
    personal_retirement = personal_retirement_contribution(employee, company_retirement, gross_pay)
  end
  withholding = compute_withholding(employee)
  net_pay = gross_pay - withholding - company_retirement -
            government_retirement - personal_retirement
  pay_employee(employee, net_pay)
  # ... (the full routine is in the handout)
end
Let’s take a look at an example (see the handout for a version you can read). This is taken
from Code Complete, 2nd edition (CC2e), and I have translated it to Ruby.
How many test cases do you think it should take to fully test this code? A simple “clean” test
with all booleans true will give you 100% rcov code coverage.
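Structured basis testing: count 1 for the method itself, 1 for the each loop, 1 for each if,
and 1 for each boolean operator. For McConnell’s full version of this routine, that comes to 6
test cases. McConnell ultimately lists 17 test cases once you add logic combinations, boundary
conditions, and error states. The full list of test cases is in the handout.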
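Here is a small sketch of the “100% coverage from one clean test” problem, using Ruby’s built-in Coverage library (which tools like SimpleCov build on) rather than rcov itself. The net_pay method is a hypothetical one-decision stand-in for the payroll code, not McConnell’s routine. A single test with the boolean set to true reaches 100% line coverage, yet branch coverage shows the false path was never taken.

```ruby
require "coverage"
require "tempfile"

# Hypothetical one-decision routine standing in for the payroll code.
# It is written to a temp file because Coverage only measures code
# loaded from a file after Coverage.start.
SOURCE = <<~RUBY
  def net_pay(gross, eligible)
    retirement = 0
    if eligible
      retirement = gross / 10
    end
    gross - retirement
  end
RUBY

file = Tempfile.new(["net_pay", ".rb"])
file.write(SOURCE)
file.close

Coverage.start(lines: true, branches: true)
load file.path
net_pay(1000, true) # the single "clean" test, with every boolean true

# Coverage.result stops measuring and returns a hash keyed by file path.
_, data = Coverage.result.find { |path, _| File.basename(path) == File.basename(file.path) }
file.unlink

executable = data[:lines].compact # nil marks lines that never execute, such as `end`
line_pct = 100.0 * executable.count(&:positive?) / executable.size
missed_branches = data[:branches].values.sum { |targets| targets.values.count(0) }

puts "line coverage: #{line_pct.round}%"
puts "branch targets never taken: #{missed_branches}"
```

The untaken target is the implicit else of the if, exactly the path a “dirty” test with eligible set to false would exercise.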
Code Coverage
Relying on code coverage is dangerous. It led my boss to write “the red lines are the valuable
ones”. The rcov documentation is very clear about this -- if you read it. But people boil down
something very complicated (their tests) to this one number (code coverage) and then
compare. That makes no sense.
Test-to-code ratio. Could there possibly be a more useless number? Unless it’s 1:0, it tells
you just about nothing.
def test_last_day_items_are_privacy_scoped_for_non_friends
  non_friend = create_user
  story = stories(:learning_no)
  story.published_at = 10.minutes.ago
  story.save!
  story = stories(:aaron_private_story)
  story.published_at = 5.minutes.ago
  story.save!
  items_for_non_friend = accounts(:quentin_and_aaron).last_day_items
  assert_privacy_status(items_for_non_friend, "Public")
end
You can’t test what isn’t in the spec. Requirements errors are the most expensive to fix if they
sneak into production.
Story: Slantwise client wanted monthly billing. We thought “Basecamp”. What they really
wanted: customer punches in how many users they want, for how many months, and is then
billed all at once.
Tests are code, and code has bugs. Tests are just as likely to have bugs as the code they’re
testing.
jpctalbot: http://flickr.com/photos/laserstars/640499324/
def test_critical_functionality
  begin
    # ... Bunch of stuff to exercise code ...
  rescue
    # A bare rescue swallows every StandardError, so this test can never fail.
  end
end
So who tests the tests? I don’t think there’s a way to do this automatically. You need to review
them by hand.
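Here is a self-contained sketch of why that bare rescue is dangerous. critical_functionality and run_test are hypothetical stand-ins (run_test is a toy runner, not Test::Unit or Minitest): the same broken code “passes” when the test swallows exceptions and fails when the exception is allowed to propagate.

```ruby
# Hypothetical stand-in for the code under test, with a bug: it always fails.
def critical_functionality
  raise "critical_functionality is broken"
end

# Minimal hypothetical test runner: a test passes unless its block raises.
def run_test(name)
  yield
  [name, :pass]
rescue StandardError => e
  [name, :fail, e.message]
end

# The pattern from the slide: the bare rescue catches every StandardError,
# so the bug above is swallowed and the test reports success.
swallowing = run_test("with bare rescue") do
  begin
    critical_functionality
  rescue
    # nothing here, so the failure silently disappears
  end
end

# Without the rescue, the exception reaches the runner and the bug shows up.
honest = run_test("without rescue") do
  critical_functionality
end

p swallowing # => ["with bare rescue", :pass]
p honest     # => ["without rescue", :fail, "critical_functionality is broken"]
```

No tool flags the first test as wrong; only a human reading it notices that it cannot fail.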
Flowizm: http://flickr.com/photos/flowizm/178152601/
Defect Detection Rates of Selected Techniques
Defect detection rates from Code Complete. The full table is in your handout.
- Unit testing: 15-50%
- Informal code reviews: 20-35%
- Formal code inspections: 45-70%
- Modeling/prototyping: 35-80%
- System test (black box): 25-55%
First of all, unit testing isn’t all that great at finding defects. Formal code inspections
can catch up to 70% of the defects. Note also the strength of prototyping, with up to 80%. I
think this is what makes iterative development such a big win.
- Manual testing
- Code reviews
- Unit tests
- User testing
The interesting thing is that different defect detection techniques tend to find different types
of defects.
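To see why mixing techniques matters, here is a back-of-the-envelope Ruby sketch using the upper ends of the detection ranges listed above. It assumes, purely for illustration, that each technique catches defects independently of the others. That assumption is ours, not McConnell’s; real techniques overlap, so treat the combined figure as an optimistic bound rather than a prediction.

```ruby
# Upper-end detection rates from the list above, as fractions.
RATES = {
  "Unit testing" => 0.50,
  "Formal code inspections" => 0.70,
  "System test (black box)" => 0.55,
}.freeze

# Chance that a defect is caught by at least one technique, ASSUMING the
# techniques catch defects independently (an illustrative simplification).
def combined_detection_rate(rates)
  # Probability that a defect slips past every technique.
  escape = rates.reduce(1.0) { |prob, rate| prob * (1.0 - rate) }
  1.0 - escape
end

RATES.each { |name, rate| puts format("%-25s %3.0f%%", name, rate * 100) }
puts format("%-25s %3.0f%%", "All three (independent)", combined_detection_rate(RATES.values) * 100)
```

Even under these generous assumptions, no single technique comes close to what the combination does.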
Complements to developer testing
GAV01: http://flickr.com/photos/gavinatkinson/196048031/
Manual testing
And of course, there is manual QA. A good QA person is worth their weight in gold. I once
worked with a guy who was an absolute machine at finding bugs, and he was really good at
explaining how they happened and creating bug reports.
You always end up doing some amount of manual testing. It makes sense to have testers to
do this instead of making programmers do it.
Story: how we do manual testing. Our QA person is responsible for verifying fixes and also does
exploratory, black-box testing.
http://www.osnews.com/story/19266/WTFs_m
Informal “code reviews” can find 20-35% of all defects; formal “code inspections” can find
45-70%. Note the difference between formal and informal code reviews.
code review kitty is not pleased with your code
http://flickr.com/photos/louse101/454412441/in/set-72157600062650522/
Growing better developers
Aside: can code reviews help us become better developers?
As a programmer who’s not young enough to know everything any more, I am hopeful.
Usability testing
I have been blown away by the problems we have found using usability testing.
The ultimate unit test failure
You can have 150% code coverage and thousands of unit tests. Not one of them will tell you if
your application sucks.
Jeff Atwood calls usability problems “The Ultimate Unit Test Failure.”
hans.gerwitz: http://flickr.com/photos/phobia/2308371224/
From Don’t Make Me Think by Steve Krug
You may think usability testing involves expensive labs with two-way mirrors and cameras
everywhere. But usability testing doesn’t have to be expensive! It’s fun and cheap with
Steve Krug’s techniques.
We use $20 screen recording software and a USB microphone and pay participants about $50.
Don’t put all your eggs in one basket...
I’m not saying don’t write tests. I’m saying, don’t put all your eggs in one basket.
Andrew Dowsett
Jan Tik: http://flickr.com/photos/jantik/6708183/
audreyjm529: http://flickr.com/photos/audreyjm529/678762774/
jpctalbot: http://flickr.com/photos/laserstars/640499324/
Flowizm: http://flickr.com/photos/flowizm/178152601/
GAV01: http://flickr.com/photos/gavinatkinson/196048031/
Stuck in Customs: http://flickr.com/photos/stuckincustoms/858339201/
hans.gerwitz: http://flickr.com/photos/phobia/2308371224/
Andrew Dowsett: http://flickr.com/photos/andrew_dowsett/510812719/