Testing Is Overrated


?

Who is this guy?

I didn’t re-implement Ruby in Erlang, or write a web server in assembly that’s 10x faster than
Apache, or start a successful company. At the end of the day, I’m just a guy who makes web
sites.

Jan Tik: http://flickr.com/photos/jantik/6708183/


Testing is overrated

Luke Francl

So I had to pick something controversial.

Don’t get me wrong, testing is great. I’ll never forget the first time I saved myself from
committing buggy code with my own unit test. And once written, programmatic tests provide
a nice regression framework that helps catch future errors and makes refactoring possible.

But I think it’s overemphasized to the detriment of other defect-detection techniques.


[Slide: word cloud of testing terms: fuzz, story runner, fixtures, green bar, RSpec, object mother, miniunit, Shoulda, unit tests, stub, Mocha, Watir, mock, random behaviors, rcov, Test::Unit, BDD, TDD, Selenium, test-along, test-first, coverage, autotest, test cases]

We as developers hear, read, and write a lot about testing.

Why so much?

I think it’s because it’s something we, as programmers, can control.


We usually can’t hire QA testers. It may be a struggle to institute code review in our company.
We may not have the authority to set up usability tests.

But we can write code! And so we play to our strength -- coding -- and try to code our way
out of buggy software.
All you need is tests

In the worst case, this leads to a mindset that developer testing is all you need, and if we can
only get to 100% code coverage, we’ll be bug free. You’ve got people having Rcov length
contests.

I read a blog entry just last week by a guy who was suggesting the “End of Bugs” due to
behavior driven development and 100% rcov code coverage.

(I didn’t mention his name in my talk, but this was Adam Wiggins from Heroku:
http://adam.blog.heroku.com/past/2008/7/6/the_end_of_bugs/ I didn’t know he’d be at
RubyFringe, but he came up to me later and was like “Hi, I’m Adam. You called me an idiot.”
Sorry Adam! Seriously, he was really nice about it. We had a good talk about testing.)
Extensive research

So I’ve been doing extensive research about the benefits of developer testing...
- Code Complete, 2nd edition, Steve McConnell
- Facts and Fallacies of Software Engineering, Robert L. Glass

And I’ve come to the conclusion that there are some significant weaknesses of developer
testing.

audreyjm529: http://flickr.com/photos/audreyjm529/678762774/
testing is hard
Testing is hard, and most developers aren’t very good at it.

The reason is that most developers tend to write “clean” tests that verify the normal path of
program execution, instead of “dirty” tests that verify error states or boundary conditions
(which is where most errors lie).

McConnell reports that immature testing organizations write 5 clean tests for every dirty test,
while mature ones write 5 dirty tests for every clean test. That’s not fewer clean tests -- it’s
25x more dirty tests!
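
To make the clean/dirty distinction concrete, here is a minimal sketch using Minitest. The method owes_government_retirement? and the cap value are hypothetical, invented for this illustration; they are not from Code Complete. One test checks the normal path, and five poke at the boundary and at bad input.

```ruby
require "minitest/autorun"

# Hypothetical cap, chosen only for this example.
MAX_GOVT_RETIREMENT = 5_000

# Hypothetical helper: true while the amount withheld is still below the cap.
def owes_government_retirement?(withheld)
  raise ArgumentError, "withheld must not be negative" if withheld.negative?

  withheld < MAX_GOVT_RETIREMENT
end

class GovernmentRetirementTest < Minitest::Test
  # The one "clean" test: a typical value on the normal path.
  def test_clean_well_below_cap
    assert owes_government_retirement?(100)
  end

  # The "dirty" tests: boundaries and error states.
  def test_dirty_zero_withheld
    assert owes_government_retirement?(0)
  end

  def test_dirty_one_below_cap
    assert owes_government_retirement?(MAX_GOVT_RETIREMENT - 1)
  end

  def test_dirty_exactly_at_cap
    refute owes_government_retirement?(MAX_GOVT_RETIREMENT)
  end

  def test_dirty_one_above_cap
    refute owes_government_retirement?(MAX_GOVT_RETIREMENT + 1)
  end

  def test_dirty_negative_input_is_rejected
    assert_raises(ArgumentError) { owes_government_retirement?(-1) }
  end
end
```

The clean test alone already executes the comparison, so a coverage tool would be nearly satisfied by it; an off-by-one such as <= in place of < only shows up in the test at the cap itself.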

aussiegall: http://flickr.com/photos/aussiegall/2238073479/
# Totals must exist before the loop so the block can update them.
total_withholdings = 0
total_government_retirement = 0
total_retirement = 0

employees.each do |employee|
  # Default to no government retirement once the cap has been reached.
  government_retirement = 0
  if employee.government_retirement_withheld < MAX_GOVT_RETIREMENT
    government_retirement = compute_government_retirement(employee)
  end

  # Default to no company retirement contribution.
  company_retirement = 0
  if employee.wants_retirement && eligible_for_retirement(employee)
    company_retirement = get_retirement(employee)
  end

  gross_pay = compute_gross_pay(employee)

  # Default to no personal retirement contribution.
  personal_retirement = 0
  if eligible_for_personal_retirement(employee)
    personal_retirement = personal_retirement_contribution(employee, company_retirement, gross_pay)
  end

  withholding = compute_withholding(employee)
  net_pay = gross_pay - withholding - company_retirement -
            government_retirement - personal_retirement

  pay_employee(employee, net_pay)

  # Add this employee's figures to the running totals.
  total_withholdings += withholding
  total_government_retirement += government_retirement
  total_retirement += company_retirement
end

save_pay_records(total_withholdings, total_government_retirement, total_retirement)

Let’s take a look at an example (see the handout for a version you can read). This is taken
from Code Complete, 2nd edition (CC2e), and I have translated it to Ruby.

How many test cases do you think it should take to fully test this code? A simple “clean” test
with all booleans true will give you 100% rcov code coverage.

Structured basis testing: count 1 for the method itself, 1 for the each loop, 1 for each of the
three ifs, and 1 for the && = 6 test cases. McConnell ultimately lists 17 test cases once you add
logic combinations, boundary conditions, and error states. The full list of test cases is in the
handout.
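
The counting rule can be sketched in a few lines of Ruby. This is a rough illustration, not McConnell's procedure verbatim: structured_basis_count is a hypothetical helper that does a naive text scan, so keywords inside strings or comments would fool it. PAYROLL_SOURCE is a stripped-down copy of the control flow from the payroll example.

```ruby
# Naive structured-basis count: start at 1 for the straight path through the
# code, add 1 for every loop or conditional, and add 1 for every boolean
# operator. A plain regex scan is an assumption made to keep the sketch short.
def structured_basis_count(source)
  branches = source.scan(/\b(?:if|unless|elsif|when|while|until)\b|\.each\b/).size
  booleans = source.scan(/&&|\|\||\band\b|\bor\b/).size
  1 + branches + booleans
end

# Only the control flow of the payroll example, with the bodies left out.
PAYROLL_SOURCE = <<~RUBY
  employees.each do |employee|
    if employee.government_retirement_withheld < MAX_GOVT_RETIREMENT
    end
    if employee.wants_retirement && eligible_for_retirement(employee)
    end
    if eligible_for_personal_retirement(employee)
    end
  end
RUBY

puts structured_basis_count(PAYROLL_SOURCE) # prints 6
```

That 6 is only the minimum needed to exercise each branch; it says nothing yet about boundary values or bad data, which is where the rest of the 17 cases come from.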
Code coverage

The dangers of relying on code coverage led my boss to write “the red lines are the valuable
ones”. The Rcov documentation is very clear about this -- if you read it. But people boil down
something very complicated (their tests) to this one number (code coverage) and then
compare. It makes no sense.

Test-to-code ratio. Could there possibly be a more useless number? Unless it’s 1:0 (no tests
at all), it tells you just about nothing.
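
Here is a small sketch of why a coverage number can mislead. The average method is a made-up example, not taken from any real project. A single clean check executes every line, so a line-coverage tool would report 100%, yet two obvious dirty inputs still misbehave.

```ruby
# Hypothetical method under test.
def average(values)
  values.sum / values.size
end

# The "clean" check: it runs every line of average, so line coverage is 100%.
raise "clean check failed" unless average([2, 4]) == 3

# Dirty input 1: integer division silently truncates 1.5 down to 1.
puts average([1, 2]) # prints 1

# Dirty input 2: an empty list divides zero by zero.
begin
  average([])
rescue ZeroDivisionError => e
  puts "empty list raised #{e.class}" # prints "empty list raised ZeroDivisionError"
end
```

Coverage tells you which lines ran, not which inputs you forgot to try.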
def test_last_day_items_are_privacy_scoped_for_non_friends
  non_friend = create_user

  story = stories(:learning_no)
  story.published_at = 10.minutes.ago
  story.save!

  story = stories(:aaron_private_story)
  story.published_at = 5.minutes.ago
  story.save!

  items_for_non_friend = accounts(:quentin_and_aaron).last_day_items

  assert_privacy_status(items_for_non_friend, "Public")
end

You can’t test code that’s not there

You can’t test what isn’t in the spec. Requirements errors are the most expensive to fix if they
sneak into production.

Story: a Slantwise client wanted monthly billing. We thought “Basecamp”. What they really
wanted: the customer punches in how many users they want, for how many months, and is then
billed all at once.

Fortunately, requirements errors are cheap to fix if caught early, during development. That is
the case for iterative development.


Tests have bugs

Tests are code, code has bugs. Tests are just as likely to have bugs as the code they’re
testing.

jpctalbot: http://flickr.com/photos/laserstars/640499324/
def test_critical_functionality
  begin
    # ... Bunch of stuff to exercise code ...

    # Commented out by Luke to fix test failure
    # assert "Some important assert", condition
  rescue
    # Don't let anything fail this test!
  end
end

Sweet! 100% test coverage!

So who tests the tests? I don’t think there’s a way to do this automatically. You need to review
them by hand.

Adapted from: http://thedailywtf.com/Comments/AddComment.aspx?ArticleId=5128&ReplyTo=138758&Quote=Y
Developer testing isn’t very good at finding defects

Flowizm: http://flickr.com/photos/flowizm/178152601/
Defect Detection Rates of Selected Techniques

[Chart: bar chart of defect detection rates for unit testing, code reviews, code inspections, prototyping, and system test, on an axis from 0% to 100%]

Defect detection rates from Code Complete. The full table is in your handout.
Unit test: 15-50%
Informal code reviews: 20-35%
Formal code inspections: 45-70%
Modeling/prototyping: 35-80%
System test (black box): 25-55%

Note: First of all, unit testing isn’t all that great at finding defects. Formal code inspections
can catch up to 70% of the defects. Note also the strength of prototyping, with up to 80%. I
think this is what makes iterative development such a big win.
[Slide: Venn diagram of the defects found by manual testing, code reviews, unit tests, and user testing]

* Set overlap completely fabricated

The interesting thing is that different defect detection techniques tend to find different types
of defects.
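
As a back-of-the-envelope sketch of why stacking techniques pays off, suppose (and this is a loud assumption, not a claim from Code Complete) that each technique catches defects independently of the others. Then the share of defects that slips past all of them is the product of the individual miss rates. The helper combined_detection_rate is hypothetical; the rates plugged in are the upper bounds quoted above for unit tests (50%) and formal inspections (70%).

```ruby
# Combined detection rate under the ASSUMPTION that techniques find defects
# independently: 1 minus the product of each technique's miss rate.
def combined_detection_rate(rates)
  1.0 - rates.map { |rate| 1.0 - rate }.reduce(1.0, :*)
end

# Unit tests at their best (50%) plus formal inspections at their best (70%).
puts combined_detection_rate([0.50, 0.70]).round(2) # prints 0.85
```

Real techniques overlap in ways this model ignores, so treat the number as an illustration of the direction of the effect, not a prediction.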
Complements to developer testing

GAV01: http://flickr.com/photos/gavinatkinson/196048031/
Manual testing

And of course, there is manual QA. A good QA person is worth their weight in gold. I once
worked with a guy who was an absolute machine at finding bugs, and he was really good at
explaining how they happened and creating bug reports.

You always end up doing some amount of manual testing. It makes sense to have testers to
do this instead of making programmers do it.

Story: how we do manual testing: QA person responsible for verifying fixes; also does
exploratory, blackbox tests.

Stuck in Customs: http://flickr.com/photos/stuckincustoms/858339201/


So if developers aren’t very good at testing, what are they good at? Criticizing other people’s
code.

http://www.osnews.com/story/19266/WTFs_m

Informal “code reviews” can find 20-35% of all defects; formal “code inspections” find
45-70%. Explain the difference between formal and informal code reviews.
code review kitty is not pleased with your code

Sociological aspect to code reviews. Tell story of my first code review.

Reviewee’s ego as well as code is on the line.

http://flickr.com/photos/louse101/454412441/in/set-72157600062650522/
Growing better developers
Aside: can code reviews help us become better developers?

Skeptical of methodology. 10x developers will be successful no matter what methodology
they use.

So, can code reviews help us become better programmers?

- reading code is the best way to learn.
- constructive criticism from better programmers

As a programmer who’s not young enough to know everything any more, I am hopeful.
Usability testing

I have been blown away by the problems we have found using usability testing.
The ultimate unit test failure

You can have 150% code coverage and thousands of unit tests. Not one of them will tell you if
your application sucks.

Jeff Atwood calls usability problems The Ultimate Unit Test Failure.

hans.gerwitz: http://flickr.com/photos/phobia/2308371224/
From Don’t Make Me Think
by Steve Krug

You may think usability testing involves expensive labs with two-way mirrors and cameras
everywhere. But usability testing doesn’t have to be expensive! It’s fun and cheap with
Steve Krug’s techniques.

We use $20 screen recording software and a USB microphone and pay participants about $50.
Don’t put all your eggs in one basket...

I’m not saying don’t write tests. I’m saying, don’t put all your eggs in one basket.

Andrew Dowsett <http://flickr.com/photos/andrew_dowsett/510812719/>


...or you’ll end up as roadkill.
Thanks!
(You can yell at me over drinks.)
Jan Tik: http://flickr.com/photos/jantik/6708183/
audreyjm529: http://flickr.com/photos/audreyjm529/678762774/
aussiegall: http://flickr.com/photos/aussiegall/2238073479/
jpctalbot: http://flickr.com/photos/laserstars/640499324/
Flowizm: http://flickr.com/photos/flowizm/178152601/
GAV01: http://flickr.com/photos/gavinatkinson/196048031/
Stuck in Customs: http://flickr.com/photos/stuckincustoms/858339201/
hans.gerwitz: http://flickr.com/photos/phobia/2308371224/
Andrew Dowsett: http://flickr.com/photos/andrew_dowsett/510812719/
