Short Software Engineering
There is much wisdom we can learn from Silicon Valley and much technology we can exploit:
about increasing your productivity
about reproducible results (scientific method, getting sued)
Good Code
Easy to maintain
Easy to extend
Easy to understand ... even after a six-month break!
Straightforward and direct ... no side effects or surprises!
Reads like English (or some other human language)
Some Questions
What will this code be used for?
How often will it be used?
How might it evolve?
How can I isolate myself from possible changes, such as using a different solver?
What part of this code is generic and what part is problem-specific?
Roadmap
Tactical Programming
Designing Better Software
Debugging and Optimization
Software Development Tools
A coding convention makes your code easier to read, easier to detect bugs in, easier to understand, and easier to extend, i.e., it minimizes the costs of working with your code.
A good coding convention makes your code read like a good story and makes your intent clear:
Naming of functions, variables, and filenames
Grouping and layout of code, such as braces
Modification history
Comments
Respect the local coding convention when working on existing code
Create horizontal lines of -, =, etc. in comments to indicate higher-level groupings, just as books are organized into chapters, sections, subsections, etc.
Use vertical space (blank lines) to set off lower-level chunks of code
Put spaces around operators (=, +, -, *, /) and inside {}, (), and []
Choose a sensible indentation scheme, such as two spaces
Beware of tabs ...
Separate multiple words with CamelCase or _
Function names should start or end with a verb: CalcMarketShares()
Encode type information into variable names: float, int, matrix, vector, etc.
One variable definition per line, plus a comment
Start index names with ix: ixStart, ixStop
One p for each level of pointer indirection
Bad names: p, x, y, n, i, j, k, l, jfunc1
Good names: dwPriceFood, dwExcessDemand, dwIncome, nGoods, vProb, IntegrateMarketShares(), IsValid(), ix, jx, kx, pHHData
Braces
There are two main styles for braces:
1TBS/K+R/etc.:
    if( IsBadState() ) {
      fixProblem() ;
    }
Allman/GNU/etc.:
    if( IsBadState() )
    {
      fixProblem() ;
    }
Write Comments
Comments are important:
History of changes
Why you did something, not what you did
Explain anything tricky: you won't remember why you did something next month ...
Use comments and white space to convey the logical structure of code on small, medium, and large scales
Start every file with a short one-line comment explaining the purpose of the module
Document function interfaces and any quirks
Are you writing code with cut and paste? Abstract it into a function, macro, or template.
Use constants whenever possible:
Define all numbers and constants in only one place
Define indexes (with good names) for different columns or rows in a matrix
Make arguments const when they are only used for input
No hard-coded numbers!!!
When you have to make changes, it is easier if you only have to modify it in one place!
Order of Operations
Rely on operator precedence only for +, -, *, /
For everything else, use parentheses!
Avoid clever tricks and side effects
MATLAB Tricks
Here are a couple tricks to improve your MATLAB code:
Use cells (code sections that start with %%) to group a logically-related block of code
Rerun the current cell with CTRL + RETURN
Handle errors with keyboard, which pauses execution and gives you an interactive prompt
Store column indexes in a structure: Index.Price, Index.Income, ...
Wrap related variables into a structure:
    ChoiceData.X    = mCovariates ;
    ChoiceData.Y    = vChoices ;
    ChoiceData.nObs = length( vChoices ) ;
Designing better software means:
Planning ahead for maintenance (one of the biggest costs of most projects) and future extensions
Writing testable code
Choosing good abstractions
Designing good interfaces
Questions to ponder:
Where will my code run?
What technologies does it depend on?
How is it likely to change?
How will it be used?
How often will it be used?
Trade-offs
Speed vs. robustness
Speed vs. memory usage
Speed vs. maintainability (e.g., fast code may require unreadable optimizations)
Development time vs. code quality (performance, maintainability, reusability)
Quality vs. frequency of use
Interfaces
An interface is a contract:
Clear and easy to remember
Uses the same interface for similar objects/operations
Promotes loose coupling and reuse
Minimizes maintenance headaches by isolating the implementation from the interface
Publish the interface in a header file:
  Keep it separate from the implementation file
  Protect it with include guards if using the C preprocessor
  You may need a second header file for private information
Put only the minimum amount of information in the public name space
Make everything else private or static:
  Prevents unintentional access
  Changing implementation details then won't break other code
Encapsulate state information in a struct, not a global, if possible
Avoid global variables!!! They often lead to race conditions ...
Reusable Code
Collect general tools and components into a common library:
Reuse them for faster development of other projects
Decrease bugs by using production code
Corollary: reuse (high-quality) existing software libraries and components; don't reinvent the wheel
Defensive Programming I
Modularize functionality, e.g., access shared resources or special facilities only through one library: splineLib, splineCreate, splineEval, splineDelete, ...
If a bug occurs, then it is either:
1. In the library, or
2. In your use of the library
Defensive Programming II
Protect your code from volatile resources such as:
Third-party software: MPI, solvers, libraries
Platform-specific technologies: OS-specific APIs
Buggy code by co-workers (software condom)
I.e., write a thin layer between your code and volatile resources
Unit tests are short pieces of code which exercise all (or the key) paths through a function
The sooner you find a bug, the cheaper/easier it is to fix
Writing tests forces you to program to an interface immediately, which verifies design decisions
Catch bugs caused by other changes to the system
Many popular unit test frameworks are available: junit, cunit, boost::test, etc.
Interpreted languages provide a similar productivity boost by letting you test code interactively as you develop it
Test-driven development (TDD) is a philosophy for software development
Refactor code which is unwieldy
Debugging
Learn to use the debugger. Don't sprinkle your code with printf, WRITE, etc.:
It obscures code readability
I/O slows code considerably
Instead, log messages to files
Print messages to the screen in the debug version only
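Use the C preprocessor to facilitate debugging (even in FORTRAN):
    #ifdef USE_DIAG
    #define DIAG_PRINT PRINT *,
    #else
    #define DIAG_PRINT !
    #endif
When USE_DIAG is not defined, DIAG_PRINT expands to !, which turns the rest of the line into a FORTRAN comment.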
Optimization
Vectorization
Write loops which support vectorization (unrolling):
Use:
  Straight-line code
  Vector (array) data only
  Local variables
  Assignment statements only
  A predefined (constant) exit condition
Avoid:
  Function calls
  Non-mathematical operations (which are difficult to vectorize)
  Mixing vectorizable types
  Memory access patterns which prevent vectorization, i.e., where one statement accesses future and/or previous array elements
Make
Checks dependencies
Builds only what is necessary
Allows abstraction of the build process, which promotes portability
Editor and OS
"Choose your editor with more care than you would your spouse because you will spend more time with your editor, even after the spouse is gone."
Harry J. Paarsch
Learn to use a good programming editor (Vi, Emacs, jEdit, Notepad++, Eclipse, etc.): it will increase your productivity
The same applies to your OS: get some Unix in your life!
etags, cscope, ctree, etc. make it easy to explore code
Eclipse and MS Visual Studio have powerful tools as well
Version Control
Manages every version of your code
Supports distributed software development
Supports multiple developers
Keeps everything synchronized
Automatically merges different changes to the same code
Common examples: SVN, hg, git, ClearCase, Perforce, ...
Much better than Dropbox ...
Create a Repository
The first step is to create a repository to store all versions of your source code (C, FORTRAN, MATLAB, R, LaTeX, etc.):
It should be accessible from all computers which will access your code. Options include:
A machine you can access via SSH which is running SVN
A commercial repository hosting service (sometimes free) such as www.ProjectLocker.com or github.com
Example (WARNING: always use fsfs):
    ssh [email protected]
    mkdir SVN
    mkdir SVN/ThesisCode
    svnadmin create /home/joe/SVN/ThesisCode --fs-type fsfs
    ls -F SVN/ThesisCode
    README.txt conf/ dav/ db/ format hooks/ locks/
Getting Started
SVN provides two commands:
svnadmin: to create and administer repositories
svn: to perform version control operations (checkout, commit, diff, etc.)
Run a command with --help to see its usage, e.g.:
    svn --help
    svn commit --help
Run this on the machine which hosts your code:
    cd
    mkdir sbox
    cd sbox
    svn checkout \
      svn+ssh://[email protected]/home/joe/SVN/Thesis
Your code is now under version control.
Note: svn+ssh is just one example of the type of URLs supported by SVN to refer to a location.
Get to Work
In the course of your work you will use the following commands:
    svn commit
    svn update
    svn add [ file | directory ]
    svn mkdir dir
    svn rm [-fr] FileOrDir
    svn diff -r PREV BasicDriver.c
    svn log BasicDriver.c