Short Software Engineering


Enhance Your Productivity and Software Quality with Techniques from Silicon Valley

Benjamin S. Skrainka, University College London, Institute for Fiscal Studies



February 23, 2011

The Big Picture

Whether you like it or not, you are a software engineer:

- Much wisdom we can learn from Silicon Valley
- Much technology we can exploit
- About increasing your productivity
- About reproducible results (scientific method, getting sued)

Much of the cost of software is maintenance!

Good Code

Good code is:


- Easy to maintain
- Easy to extend
- Easy to understand ... even after a six month break!
- Straightforward and direct ... no side-effects or surprises!
- Reads like English (or some other human language)

Some Questions

Before writing a line of code, ask yourself:


- What will this code be used for?
- How often will it be used?
- How might it evolve?
- How can I isolate myself from possible changes, such as using a different solver?
- What part of this code is generic and what part problem-specific? I.e.,
  - What can I reuse?
  - What should I abstract into a library?

Roadmap

- Tactical Programming
- Designing Better Software
- Debugging and Optimization
- Software Development Tools

Goals of Tactical Programming

Tactics are about structuring your code so that it is:

- Easier to read
- Easier to detect bugs
- Easier to understand
- Easier to extend

I.e., to minimize the costs of working with your code

Increased productivity for free!!!

Use A Coding Convention

A good coding convention makes your code read like a good story and makes your intent clear:

- Naming of functions, variables, and filenames
- Grouping and layout of code such as braces
- Modification history
- Comments
- Respect the local coding convention when working on code

Choose a convention and stick to it!

Structure Your Code


- Group logical chunks of code together:
  - Separate larger blocks with comments
  - Create horizontal lines of -, =, etc. to indicate higher-level groupings
  - Just like books are organized into chapters, sections, subsections, etc.
- Use white space:
  - Use vertical space (blank lines) to set off lower-level chunks of code
  - Put space around operators =, +, -, *, / and inside of {}, (), and []
  - Choose a sensible indentation scheme, such as two spaces
  - Beware of tabs ...
- Anything longer than 1-2 screen-fulls of code should be a separate function

Choose Good Names


Choose names which describe the role of a function or variable:

- Separate multiple words with CamelCase or _
- Function names should start or end with a verb: CalcMarketShares()
- Encode type information into variable names: float, int, matrix, vector, etc.
- One variable definition per line + a comment
- Start indexes with ix: ixStart, ixStop
- One p for each level of pointer indirection

Bad Names: p, x, y, n, i, j, k, l, jfunc1

Good Names: dwPriceFood, dwExcessDemand, dwIncome, nGoods, vProb, IntegrateMarketShares(), IsValid(), ix, jx, kx, pHHData
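The naming rules above might look like this in C. This is only a sketch: the functions CalcExcessDemand() and IsMarketCleared() are made up for illustration, and the prefix meanings (n = count, ix = index, v = vector, dw = double-precision scalar) are an assumed reading of the good names listed above, which the slides do not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Computes excess demand (demand minus supply) for each good.
   Assumed prefixes: n = count, ix = index, v = vector, dw = double scalar. */
void CalcExcessDemand( const double* vDemand,        /* in:  demand per good  */
                       const double* vSupply,        /* in:  supply per good  */
                       int           nGoods,         /* in:  number of goods  */
                       double*       vExcessDemand ) /* out: demand - supply  */
{
  int ixGood ;   /* index of the current good */

  assert( vDemand != NULL && vSupply != NULL && vExcessDemand != NULL ) ;

  for( ixGood = 0 ; ixGood < nGoods ; ixGood++ )
  {
    vExcessDemand[ ixGood ] = vDemand[ ixGood ] - vSupply[ ixGood ] ;
  }
}

/* Returns 1 when every market clears to within dwTolerance, else 0. */
int IsMarketCleared( const double* vExcessDemand, int nGoods, double dwTolerance )
{
  int ixGood ;   /* index of the current good */

  for( ixGood = 0 ; ixGood < nGoods ; ixGood++ )
  {
    double dwGap = vExcessDemand[ ixGood ] ;   /* signed excess demand */

    if( dwGap > dwTolerance || dwGap < -dwTolerance )
    {
      return 0 ;
    }
  }
  return 1 ;
}
```

A call such as IsMarketCleared( vExcessDemand, nGoods, 1.0e-8 ) then reads almost like a sentence, which is the point of the convention.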

Braces
There are two main styles for braces:

1TBS/K&R/etc.

    if( IsBadState() ) {
      fixProblem() ;
    }

Allman/GNU/etc.

    if( IsBadState() )
    {
      fixProblem() ;
    }

Write Comments
Comments are important:

- History of changes
- Why you did something, not what you did
- Explain anything tricky: you won't remember why you did something next month...
- Use comments and white space to convey logical structure of code on small, medium, and large scales
- Start any file with a short one line comment explaining purpose of module
- Document function interfaces and any quirks

One Place Only


- Strive to minimize duplication:
  - Are you writing code with cut and paste? Then abstract it into a function ...
- Use constants whenever possible:
  - Define all numbers and constants in only one place
  - Define indexes (with good names) for different columns or rows in a matrix
  - Make arguments const when only used for input
  - No hard-coded numbers!!!
- Automate what you can:
  - macros
  - templates

When you have to make changes, it is easier if you only have to modify it in one place!
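A minimal C sketch of the "one place only" idea, assuming a hypothetical household data matrix stored row by row with price, income, and quantity columns. The names N_COLS, IX_PRICE, IX_INCOME, IX_QUANTITY, and CalcTotalExpenditure() are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* All layout constants are defined here and nowhere else. */
#define N_COLS 3   /* columns per observation (row) */

/* Named column indexes for the hypothetical data matrix. */
enum { IX_PRICE = 0, IX_INCOME = 1, IX_QUANTITY = 2 } ;

/* Sums price * quantity over nObs rows of a row-major matrix.
   The matrix is input only, so the argument is const. */
double CalcTotalExpenditure( const double* mData, int nObs )
{
  double dwTotal = 0.0 ;   /* running total of expenditure   */
  int    ixObs ;           /* index of the current observation */

  assert( mData != NULL && nObs >= 0 ) ;

  for( ixObs = 0 ; ixObs < nObs ; ixObs++ )
  {
    const double* pRow = mData + ixObs * N_COLS ;   /* start of this row */

    dwTotal += pRow[ IX_PRICE ] * pRow[ IX_QUANTITY ] ;
  }
  return dwTotal ;
}
```

If a column is later added or reordered, only the enum and N_COLS change; no loop body contains a bare 0, 1, or 2.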

Order of Operations

Don't abuse order of operations:

- Only use order of operations for +, -, /, *
- For everything else, use parentheses!
- Avoid clever tricks and side-effects
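One classic C case where relying on precedence goes wrong: == binds more tightly than &, so an unparenthesized mask test does not do what it appears to do. The sketch below spells out both readings; FLAG_CONVERGED and the function names are hypothetical.

```c
#include <assert.h>

#define FLAG_CONVERGED 0x4u   /* hypothetical status bit */

/* What C actually computes for the expression
       flags & FLAG_CONVERGED == 0
   The comparison runs first, so the mask is never applied. */
unsigned int IsNotConvergedAsParsed( unsigned int flags )
{
  return flags & ( FLAG_CONVERGED == 0 ) ;
}

/* What was meant: mask first, then compare. */
int IsNotConverged( unsigned int flags )
{
  return ( flags & FLAG_CONVERGED ) == 0 ;
}
```

The first function returns 0 for every input because FLAG_CONVERGED == 0 is false; the parenthesized version reports whether the bit is clear.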

MATLAB Tricks
Here are a couple of tricks to improve your MATLAB code:

- Use cells by commenting the start of a section with %%:
  - Group a logically-related block of code
  - Rerun the cell with CTRL + RETURN
- Handle errors with keyboard
- Store column indexes in a structure: Index.Price, Index.Income, ...
- Wrap related variables into a structure:

    ChoiceData.X = mCovariates ;
    ChoiceData.Y = vChoices ;
    ChoiceData.nObs = length( vChoices ) ;

How to Design Software

Much of good software design is based on:


- Planning ahead for maintenance (one of the biggest costs of most projects) and future extensions
- Writing testable code
- Choosing good abstractions
- Designing good interfaces

What to Worry About

Questions to ponder:

- Where will my code run?
- What technologies does it depend on?
- How is it likely to change?
- How will it be used?
- How often will it be used?

Write a design document!!! You don't have time not to plan...

Trade-offs

You need to evaluate many trade-offs:

- Speed vs. robustness
- Speed vs. memory usage
- Speed vs. maintainability (e.g. fast code may require unreadable optimizations)
- Development time vs. code quality (performance, maintainability, reusability)
- Quality vs. frequency of use

Interfaces
An interface is a contract:

- Clear and easy to remember
- Use the same interface for similar objects/operations
- Promotes loose coupling and reuse
- Minimizes maintenance headaches by isolating implementation from interface
- Publish the interface in a header file:
  - Separate from the implementation file
  - Protect with include guards if using C preprocessor
  - May need second header file for private information
- Only a few arguments; put any more in a struct
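As a sketch of these points, the block below shows what a small published header and its implementation could look like, placed in one listing for brevity. The module name (grid), the GridOptions struct, and GridCreate() are all assumptions made for this example; in a real project the two parts would live in grid.h and grid.c.

```c
/* ---- grid.h : the published interface (hypothetical module) ---- */
#ifndef GRID_H
#define GRID_H

/* More than a few arguments?  Bundle them in a struct. */
typedef struct
{
  double dwLower ;   /* left end of the interval             */
  double dwUpper ;   /* right end of the interval            */
  int    nKnots ;    /* number of evenly spaced knots, >= 2  */
} GridOptions ;

/* Fills vKnots[ 0 .. nKnots-1 ]; returns 0 on success, -1 on bad options. */
int GridCreate( const GridOptions* pOptions, double* vKnots ) ;

#endif /* GRID_H */

/* ---- grid.c : the implementation, free to change ---- */
#include <assert.h>
#include <stddef.h>

int GridCreate( const GridOptions* pOptions, double* vKnots )
{
  double dwStep ;   /* distance between neighbouring knots */
  int    ixKnot ;   /* index of the current knot           */

  assert( vKnots != NULL ) ;

  if( pOptions == NULL || pOptions->nKnots < 2
      || pOptions->dwUpper <= pOptions->dwLower )
  {
    return -1 ;
  }

  dwStep = ( pOptions->dwUpper - pOptions->dwLower ) / ( pOptions->nKnots - 1 ) ;

  for( ixKnot = 0 ; ixKnot < pOptions->nKnots ; ixKnot++ )
  {
    vKnots[ ixKnot ] = pOptions->dwLower + ixKnot * dwStep ;
  }
  return 0 ;
}
```

Callers depend only on the declaration between the include guards, so the body of GridCreate() can be rewritten without touching their code.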

Practice Information Hiding


Hiding information and implementation makes your code more robust:

- Put only the minimum amount of information in the public name space
- Make everything else private or static
- Prevents unintentional access
- Now changing implementation details won't break other code
- Encapsulate state information in a struct, not a global if possible
- Avoid global variables!!! They often lead to race conditions...
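A small C sketch of information hiding, assuming a made-up running-mean module: the state sits in a struct owned by the caller instead of a global, and the helper IsEmpty() is static so it never enters the public name space. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* State lives in a struct the caller owns, not in a global variable. */
typedef struct
{
  int    nSamples ;   /* number of values seen so far */
  double dwSum ;      /* running sum of those values  */
} RunningMean ;

/* Private helper: static keeps it invisible outside this file. */
static int IsEmpty( const RunningMean* pMean )
{
  return pMean->nSamples == 0 ;
}

/* Resets the accumulator to the empty state. */
void MeanInit( RunningMean* pMean )
{
  assert( pMean != NULL ) ;
  pMean->nSamples = 0 ;
  pMean->dwSum    = 0.0 ;
}

/* Adds one observation. */
void MeanAdd( RunningMean* pMean, double dwValue )
{
  assert( pMean != NULL ) ;
  pMean->nSamples += 1 ;
  pMean->dwSum    += dwValue ;
}

/* Returns the mean so far, or 0.0 when nothing has been added. */
double MeanGet( const RunningMean* pMean )
{
  assert( pMean != NULL ) ;
  return IsEmpty( pMean ) ? 0.0 : pMean->dwSum / pMean->nSamples ;
}
```

Because each caller passes its own RunningMean, two threads or two parts of a program can keep separate accumulators without sharing hidden state.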

Reusable Code

Write reusable code:


- Collect general tools and components into a common library
- Reuse for faster development of other projects
- Decrease bugs through use of production code

Corollary: reuse (high quality) existing software libraries and components; don't reinvent the wheel

Defensive Programming I

Write code to facilitate debugging:


- Modularize functionality
- E.g., access shared resources or special facilities only through one library: splineLib, splineCreate, splineEval, splineDelete, ...
- If a bug occurs then it is either:
  1. In the library
  2. In the use of the library

Defensive Programming II

Isolate your code from things which might change:


- Third party software: MPI, solvers, libraries
- Platform-specific technologies: OS-specific APIs
- Buggy code by co-workers (software condom)

I.e., write a thin layer between your code and volatile resources
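The thin layer can be as small as one wrapper function. In the sketch below, vendor_solve_linear() is a stand-in written for this example to play the role of a third-party routine; it is not a real library call. SolveLinear() is the only function that knows the vendor's name, argument order, and error convention.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a third-party routine (hypothetical). It solves a*x + b = 0
   and uses the vendor's own convention: nonzero return means failure. */
static int vendor_solve_linear( double a, double b, double* x )
{
  if( a == 0.0 )
  {
    return 1 ;
  }
  *x = -b / a ;
  return 0 ;
}

/* Thin layer: translates our conventions (0 = success, -1 = failure)
   to and from the vendor's. Swapping solvers means editing only this. */
int SolveLinear( double dwSlope, double dwIntercept, double* pRoot )
{
  assert( pRoot != NULL ) ;

  return vendor_solve_linear( dwSlope, dwIntercept, pRoot ) == 0 ? 0 : -1 ;
}
```

The rest of the code calls SolveLinear() and never mentions the vendor, so a change of solver touches one file.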

Test Driven Development


- TDD uses unit tests and a tight write-test-debug cycle to catch bugs early:
  - Unit tests are short pieces of code which exercise all (or the key) paths through a function
  - The sooner you find a bug, the cheaper/easier it is to fix
  - Immediately program to an interface to verify design decisions
  - Catch bugs caused by other changes to system
  - Many popular unit test frameworks are available: junit, cunit, boost::test, etc.
- Interpreted languages provide a similar productivity boost by letting you test code interactively as you develop it.
- TDD is a philosophy for software development
- Refactor code which is unwieldy
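A unit test does not need a framework to be useful. The sketch below uses plain assert() from the C standard library instead of the frameworks named above, and tests a made-up function, ClampShare(), with one check per path through it.

```c
#include <assert.h>

/* Function under test (hypothetical): clamps a market share into [0, 1]. */
double ClampShare( double dwShare )
{
  if( dwShare < 0.0 )
  {
    return 0.0 ;
  }
  if( dwShare > 1.0 )
  {
    return 1.0 ;
  }
  return dwShare ;
}

/* Unit test: one assertion per path through ClampShare().
   Returns the number of checks performed. */
int TestClampShare( void )
{
  assert( ClampShare( -0.5 ) == 0.0 ) ;    /* below the range  */
  assert( ClampShare(  1.5 ) == 1.0 ) ;    /* above the range  */
  assert( ClampShare( 0.25 ) == 0.25 ) ;   /* inside the range */
  return 3 ;
}
```

Calling TestClampShare() after every change gives the tight write-test-debug cycle: a failing assertion stops the program at the broken case.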

Debugging

Unfortunately, you will make mistakes:


- Learn to use the debugger
- Don't sprinkle your code with printf, WRITE, etc.:
  - Obscures code readability
  - I/O slows code considerably
- Add diagnostic logging to large applications:
  - Message logging to files
  - Print messages to screen in debug version only


Debugging

Use the C preprocessor to facilitate debugging (even in FORTRAN):

    #ifdef USE_DIAG
    #define DIAG_PRINT PRINT *,
    #else
    #define DIAG_PRINT !
    #endif

Must use the correct compiler flags (Intel Fortran syntax): -fpp -allow nofpp_comments
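The same idea in C, as a sketch: a DIAG_PRINT macro that expands to a print statement when USE_DIAG is defined and to nothing otherwise. The macro name follows the slide; the variadic form and the function SumFirstIntegers() are assumptions made for this example, and variadic macros require C99 or later.

```c
#include <assert.h>
#include <stdio.h>

#ifdef USE_DIAG
#define DIAG_PRINT( ... ) fprintf( stderr, __VA_ARGS__ )   /* diagnostic build */
#else
#define DIAG_PRINT( ... ) ( (void) 0 )                     /* compiled away    */
#endif

/* Sums 1..nTerms, reporting progress only in diagnostic builds. */
long SumFirstIntegers( int nTerms )
{
  long nTotal = 0 ;   /* running total          */
  int  ixTerm ;       /* current term, 1-based  */

  assert( nTerms >= 0 ) ;

  for( ixTerm = 1 ; ixTerm <= nTerms ; ixTerm++ )
  {
    nTotal += ixTerm ;
    DIAG_PRINT( "term %d, running total %ld\n", ixTerm, nTotal ) ;
  }
  return nTotal ;
}
```

Compiling with -DUSE_DIAG turns the messages on; without it the calls cost nothing at run time and the code stays readable.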

Optimization

Your intuition about what needs optimization is often wrong:


- First, get your code to work correctly
- Then optimize:
  - Measure code with a profiler
  - Optimize what needs optimizing
- MATLAB has a built-in profiler
- For gcc, use gprof

Vectorization
Write loops which support vectorization (unrolling):

Use:

- Straight-line code
- Vector (array) data only
- Local variables
- Assignment statements only
- Pre-defined (constant) exit condition

Avoid:

- Function calls
- Non-mathematical operations (which are difficult to vectorize)
- Mixing vectorizable types
- Memory access patterns which prevent vectorization, i.e. where one statement accesses future and/or previous array elements
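To make the contrast concrete, here is a sketch of one loop that follows the guidelines and one that breaks the last of them. Both functions are invented for illustration; whether a given compiler actually vectorizes the first depends on its flags and on whether it can rule out overlap between the two arrays.

```c
#include <assert.h>

/* Vectorizable: straight-line body, array data, fixed trip count,
   and no iteration reads what another iteration writes. */
void ScaleAndAdd( int nLen, double dwScale, const double* vX, double* vY )
{
  int ix ;   /* element index */

  assert( nLen >= 0 ) ;

  for( ix = 0 ; ix < nLen ; ix++ )
  {
    vY[ ix ] = dwScale * vX[ ix ] + vY[ ix ] ;
  }
}

/* Hard to vectorize: each iteration needs the element written by the
   previous one (a loop-carried dependence). */
void CumulativeSum( int nLen, double* vX )
{
  int ix ;   /* element index */

  assert( nLen >= 0 ) ;

  for( ix = 1 ; ix < nLen ; ix++ )
  {
    vX[ ix ] = vX[ ix ] + vX[ ix - 1 ] ;
  }
}
```

In ScaleAndAdd() the iterations are independent and can run several at a time; in CumulativeSum() iteration ix cannot start until iteration ix - 1 has finished.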

Make

Make manages building software:


- Checks dependencies
- Builds only what is necessary
- Allows abstraction of build process:
  - Tools
  - Options
  - Platform specific details
- Promotes portability

Editor and OS

Invest in your tools:

"Choose your editor with more care than you would your spouse because you will spend more time with your editor, even after the spouse is gone." (Harry J. Paarsch)

- Learn to use a good programming editor: Vi, Emacs, jEdit, Notepad++, Eclipse, etc.
  - Will increase your productivity
- The same applies to your OS: get some Unix in your life!
- etags, cscope, ctree, etc. make it easy to explore code
- Eclipse, MS Visual Studio have powerful tools as well

Version Control

Version Control is a safety net for programmers:


- Manages every version of your code
- Supports distributed software development
- Supports multiple developers
- Keeps everything synchronized
- Automatically merges different changes to the same code
- Common examples: SVN, hg, git, ClearCase, Perforce, ...
- Much better than Dropbox...
- Unix and Windows clients are available

Create a Repository
The first step is to create a repository to store all versions of your source code (C, FORTRAN, MATLAB, R, LaTeX, etc.):

- Should be accessible from all computers which will access your code:
  - A machine you can access via SSH which is running SVN
  - A commercial repository hosting service (sometimes free) such as www.ProjectLocker.com or github.com

Example (WARNING: always use fsfs):

    ssh [email protected]
    mkdir SVN
    mkdir SVN/ThesisCode
    svnadmin create /home/joe/SVN/ThesisCode --fs-type fsfs
    ls -F SVN/ThesisCode
    README.txt conf/ dav/ db/ format hooks/ locks/

Getting Started
SVN provides two commands:

- svnadmin: to create and administer repositories
- svn: to perform version control operations (checkout, commit, diff, etc.)

There are several ways to get help:

- Execute a command with help, e.g.:

    svn --help
    svn commit --help

- Use the man command: man svn
- Google
- Red Bean SVN book for details

Can configure in ~/.subversion/config

Import Your Code

If you have existing code, you need to import it:

    svn import -m "Descriptive Message About Your Work" \
        /local/path/to/ThesisCode \
        svn+ssh://[email protected]/home/joe/SVN/Thesis

- Your code is now under version control
- Run this on the machine which hosts your code

Checkout Your Code

To work on your code on a computer, you must first check it out:

    cd
    mkdir sbox
    cd sbox
    svn checkout \
        svn+ssh://[email protected]/home/joe/SVN/Thesis

Note: svn+ssh is just one example of the type of URLs supported by SVN to refer to a location.

Get to Work

In the course of your work you will use the following commands:

    svn commit
    svn update
    svn add [ file | directory ]
    svn mkdir dir
    svn rm [-fr] FileOrDir
    svn diff -r PREV BasicDriver.c
    svn log BasicDriver.c

More advanced operations include branches and tags...
