Python Style Guide
This style guide has been converted to several PEPs (Python Enhancement Proposals): PEP 8 for
the main text, PEP 257 for docstring conventions. See the PEP index.
XXX Intro.
But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't
apply. When in doubt, use your best judgement. Look at other examples and decide what looks
best. And don't hesitate to ask!
Table of Contents
• Lay-out -- how to use tabs, spaces, and newlines.
• Comments -- on proper use of comments (and documentation strings).
• Names -- various naming conventions.
Lay-out
XXX Intro.
Indentation
Use the default of Emacs Python-mode: 4 spaces for one indentation level. For really old code
that you don't want to mess up, you can continue to use 8-space tabs. Emacs Python-mode
auto-detects the prevailing indentation level used in a file and sets its indentation parameters
accordingly.
Tabs or Spaces?
Never mix tabs and spaces. The most popular way of indenting Python is with spaces only. The
second-most popular way is with tabs only. Code indented with a mixture of tabs and spaces
should be converted to using spaces exclusively. (In Emacs, select the whole buffer and hit
ESC-x untabify.) When invoking the python command line interpreter with the -t option, it
issues warnings about code that illegally mixes tabs and spaces. When using -tt these warnings
become errors. These options are highly recommended!
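As a rough illustration of the kind of problem those options catch, here is a minimal sketch of a checker that flags lines whose leading whitespace contains both tabs and spaces. The function name find_mixed_indentation is hypothetical, and this test is simpler than what the interpreter does: it only looks at each line in isolation.

```python
def find_mixed_indentation(source):
    """Return 1-based numbers of lines whose indentation mixes tabs and spaces."""
    mixed = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # The indentation is whatever precedes the first non-blank character.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            mixed.append(lineno)
    return mixed


sample = "if x:\n\t    y = 1\n    z = 2\n"
mixed_lines = find_mixed_indentation(sample)
```

Here only the second line of sample is reported, because it starts with a tab followed by spaces.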
The preferred way of wrapping long lines is by using Python's implied line continuation inside
parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses
around an expression, but sometimes using a backslash looks better. Make sure to indent the
continued line appropriately. Emacs Python-mode does this right. Some examples:
class Rectangle(Blob):
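Independently of the example above, the following sketch shows the three wrapping styles side by side: implied continuation in a parameter list, an extra pair of parentheses around an expression, and a backslash. The function describe and its parameters are made up for illustration.

```python
def describe(width, height, color="black", emphasis=None,
             highlight=0):
    # The open parenthesis of the parameter list lets the definition
    # span two lines without a backslash.
    area = (width * height
            + highlight)
    # A backslash continues the condition onto the next line.
    if width == 0 and height == 0 and \
       color == "red" and emphasis == "strong":
        return "degenerate red box"
    return "%s box of area %d" % (color, area)
```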
Blank Lines
Separate top-level function and class definitions with two blank lines. Method definitions inside
a class are separated by a single blank line. Extra blank lines may be used (sparingly) to
separate groups of related functions. Blank lines may be omitted between a bunch of related
one-liners (e.g. a set of dummy implementations).
When blank lines are used to separate method definitions, there is also a blank line between
the `class' line and the first method definition.
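A small sketch of these spacing rules, using a hypothetical Counter class: two blank lines between top-level definitions, one blank line between methods (and after the class line), and none between a pair of related one-liners.

```python
def make_counter():
    """Return a new Counter starting at zero."""
    return Counter()


class Counter:

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count = self.count + 1
        return self.count

    # Related one-liners may be grouped without blank lines.
    def reset(self): self.count = 0
    def value(self): return self.count
```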
Pet Peeves
x=1
y=2
long_variable = 3
(Don't bother to argue with me on any of the above -- I've grown accustomed to this style over
15 years.)
Other Recommendations
• Always surround these binary operators with a single space on either side: assignment
(=), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), Booleans (and,
or, not).
• Use your better judgement for the insertion of spaces around arithmetic operators.
Always be consistent about whitespace on either side of a binary operator. Some
examples:
• i = i+1
• submitted = submitted + 1
• x = x*2 - 1
• hypot2 = x*x + y*y
• c = (a+b) * (a-b)
• c = (a + b) * (a - b)
• Don't use spaces around the '=' sign when used to indicate a keyword argument or a
default parameter value. For instance:
  def complex(real, imag=0.0):
      return magic(r=real, i=imag)
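The recommendations above can be combined in one short sketch. The function scaled_hypot2 is a hypothetical example, not taken from any library.

```python
def scaled_hypot2(x, y, scale=1.0):
    # No spaces around '=' in the default value of scale above.
    hypot2 = x*x + y*y
    # Single spaces around assignment, comparisons and Booleans.
    result = hypot2 * scale
    if result >= 0 and scale != 0:
        # No spaces around '=' for the keyword argument either.
        return round(result, ndigits=2)
    return 0.0
```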
Comments
Comments that contradict the code are worse than no comments. Always make a priority of
keeping the comments up-to-date when the code changes!
If a comment is short, the period at the end is best omitted. Block comments generally consist
of one or more paragraphs built out of complete sentences, and each sentence should end in a
period.
Python coders from non-English speaking countries: please write your comments in English,
unless you are 120% sure that the code will never be read by people who don't speak your
language.
Block Comments
Block comments generally apply to some (or all) code that follows them, and are indented to
the same level as that code. Each line of a block comment starts with a # and a single space
(unless it is indented text inside the comment). Paragraphs inside a block comment are
separated by a line containing a single #. Block comments are best surrounded by a blank line
above and below them (or two lines above and a single line below for a block comment at the
start of a new section of function definitions).
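A minimal sketch of a two-paragraph block comment laid out as described, inside a hypothetical clamp function:

```python
def clamp(value, low, high):
    """Return value restricted to the interval [low, high]."""

    # The upper bound is applied first and the lower bound second.
    #
    # As a consequence, if the caller passes low > high the result
    # is always low.

    return max(low, min(value, high))
```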
Inline Comments
An inline comment is a comment on the same line as a statement. Inline comments should be
used sparingly. Inline comments should be separated by at least two spaces from the
statement. They should start with a # and a single space.
Inline comments are unnecessary and in fact distracting if they state the obvious. Don't do this:
x = x+1  # Increment x
But sometimes, this is useful:
x = x+1  # Compensate for border
Documentation Strings
All modules should normally have doc strings, and all functions and classes exported by a
module should also have doc strings. Public methods (including the __init__ constructor)
should also have doc strings.
The doc string of a script (a stand-alone program) should be usable as its "usage" message,
printed when the script is invoked with incorrect or missing arguments (or perhaps with a "-h"
option, for "help"). Such a doc string should document the script's function and command line
syntax, environment variables, and files. Usage messages can be fairly elaborate (several
screenfuls) and should be sufficient for a new user to use the command properly, as well as a
complete quick reference to all options and arguments for the sophisticated user.
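One possible way to reuse a script's doc string as its usage message is sketched below. The script name greet.py, its NAME argument and the main function are all hypothetical.

```python
"""Usage: greet.py [-h] NAME

Print a greeting for NAME.

Options:
  -h    print this message and exit
"""


def main(argv):
    # Incorrect or missing arguments, or -h: print the module doc string.
    if len(argv) != 2 or argv[1] == "-h":
        print(__doc__)
        return 2
    print("Hello, %s!" % argv[1])
    return 0
```

A real script would typically finish by passing sys.argv to main and handing the return value to sys.exit.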
For consistency, always use """triple double quotes""" around doc strings.
There are two forms of doc strings: one-liners and multi-line doc strings.
One-liners are for really obvious cases. They should really fit on one line. For example:
def kos_root():
    """Return the pathname of the KOS root directory."""
    global _kos_root
    if _kos_root: return _kos_root
    ...
Notes:
• Triple quotes are used even though the string fits on one line. This makes it easy to later
expand it.
• The closing quotes are on the same line as the opening quotes. This looks better for
one-liners.
• There's no blank line either before or after the doc string.
• The doc string is a phrase ending in a period. It prescribes the function's effect as a
command ("Do this", "Return that"), not as a description: e.g. don't write "Returns the
pathname ..."
Multi-line doc strings consist of a summary line just like a one-line doc string, followed by a
blank line, followed by a more elaborate description. The summary line may be used by
automatic indexing tools; it is important that it fits on one line and is separated from the rest of
the doc string by a blank line.
The entire doc string is indented the same as the quotes at its first line (see example below).
Doc string processing tools will strip an amount of indentation from the second and further lines
of the doc string equal to the indentation of the first non-blank line after the first line of the doc
string. Relative indentation of later lines in the doc string is retained.
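The standard library's inspect.getdoc function is one such processing tool. Note that it removes the smallest indentation found on the second and later lines, which is close to, but not identical to, the rule stated above. The function scale below is a hypothetical example.

```python
import inspect


def scale(value, factor=2):
    """Return value multiplied by factor.

    Arguments:
        value -- the number to scale
        factor -- the multiplier (default 2)

    """
    return value * factor


# getdoc() strips the common indentation and the trailing blank lines,
# while keeping the extra indentation of the two argument lines.
cleaned_lines = inspect.getdoc(scale).splitlines()
```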
I recommend inserting a blank line between the last paragraph in a multi-line doc string and its
closing quotes, placing the closing quotes on a line by themselves. This way, Emacs'
fill-paragraph command can be used on it.
I also recommend inserting a blank line before and after all doc strings (one-line or multi-line)
that document a class -- generally speaking, the class' methods are separated from each other
by a single blank line, and the doc string needs to be offset from the first method by a blank
line; for symmetry, I prefer having a blank line between the class header and the doc string.
Doc strings documenting functions generally don't have this requirement, unless the function's
body is written as a number of blank-line separated sections -- in this case, treat the doc string
as another section, and precede it with a blank line.
The doc string for a module should generally list the classes, exceptions and functions (and any
other objects) that are exported by the module, with a one-line summary of each. (These
summaries generally give less detail than the summary line in the object's doc string.)
The doc string for a function or method should summarize its behavior and document its
arguments, return value(s), side effects, exceptions raised, and restrictions on when it can be
called (all if applicable). Optional arguments should be indicated. It should be documented
whether keyword arguments are part of the interface.
The doc string for a class should summarize its behavior and list the public methods and
instance variables. If the class is intended to be subclassed, and has an additional interface for
subclasses, this interface should be listed separately (in the doc string). The class constructor
should be documented in the doc string for its __init__ method. Individual methods should be
documented by their own doc string.
If a class subclasses another class and its behavior is mostly inherited from that class, its doc
string should mention this and summarize the differences. Use the verb "override" to indicate
that a subclass method replaces a superclass method and does not call the superclass method;
use the verb "extend" to indicate that a subclass method calls the superclass method (in
addition to its own behavior).
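To illustrate the "override" and "extend" wording, here is a sketch with two hypothetical classes, Logger and CountingLogger. The layout of the method lists is only one plausible reading of the guidelines above.

```python
class Logger:

    """Collect messages in memory.

    Public methods:
    log -- record a message
    format -- return the text stored for a message

    """

    def __init__(self):
        """Create a logger with an empty message list."""
        self.messages = []

    def log(self, message):
        """Record message after formatting it."""
        self.messages.append(self.format(message))

    def format(self, message):
        """Return message unchanged."""
        return message


class CountingLogger(Logger):

    """Logger that numbers its messages.

    Behavior is inherited from Logger except as noted below.

    Public methods:
    log -- extend Logger.log to count messages
    format -- override Logger.format to prefix the count

    """

    def __init__(self):
        """Extend Logger.__init__ to set the counter to zero."""
        Logger.__init__(self)
        self.count = 0

    def log(self, message):
        """Extend Logger.log: also increment the message count."""
        self.count = self.count + 1
        Logger.log(self, message)

    def format(self, message):
        """Override Logger.format: prefix message with its number."""
        return "%d: %s" % (self.count, message)
```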
Do not use the Emacs convention of mentioning the arguments of functions or methods in
upper case in running text. Python is case sensitive and the argument names can be used for
keyword arguments, so the doc string should document the correct argument names. It is best
to list each argument on a separate line, with two dashes separating the name from the
description, like this:
Keyword arguments:
real -- the real part (default 0.0)
imag -- the imaginary part (default 0.0)
"""
if imag == 0.0 and real == 0.0: return complex_zero
...
Version Bookkeeping
If you have to have RCS or CVS crud in your source file, do it as follows.
__version__ = "$Revision: 6104 $"
# $Source$
These lines should be included after the module's doc string, before any other code, separated
by a blank line above and below.
Naming Conventions
The naming conventions of Python's library are a bit of a mess, so we'll never get this
completely consistent -- nevertheless, here are some guidelines.
There's also the style of using a short unique prefix to group related names together. This is not
used much in Python, but I mention it for completeness. For example, the os.stat() function
returns a tuple whose items traditionally have names like st_mode, st_size, st_mtime and so
on. The X11 library uses a leading X for all its public functions. (In Python, this style is
generally deemed unnecessary because attribute and method names are prefixed with an
object, and function names are prefixed with a module name.)
In addition, the following special forms using leading or trailing underscores are recognized
(these can generally be combined with any case convention):
• _single_leading_underscore: weak "internal use" indicator (e.g. "from M import *" does
not import objects whose name starts with an underscore).
• single_trailing_underscore_: used by convention to avoid conflicts with a Python keyword,
e.g. Tkinter.Toplevel(master, class_="ClassName").
• __double_leading_underscore: class-private names in Python 1.4.
• __double_leading_and_trailing_underscore__: "magic" objects or attributes that live in
user-controlled namespaces, e.g. __init__, __import__ or __file__. Sometimes
these are defined by the user to trigger certain magic behavior (e.g. operator
overloading); sometimes these are inserted by the infrastructure for its own use or for
debugging purposes. Since the infrastructure (loosely defined as the Python interpreter
and the standard library) may decide to grow its list of magic attributes in future
versions, user code should generally refrain from using this convention for its own use.
User code that aspires to become part of the infrastructure could combine this with a
short prefix inside the underscores, e.g. __bobo_magic_attr__.
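The first three forms can be seen together in a short sketch. The Account class and its attributes are hypothetical. The mangled attribute name shown in the comment is how current Python interpreters store class-private names.

```python
class Account:

    def __init__(self, owner, class_="standard"):
        # Trailing underscore avoids a clash with the keyword 'class'.
        self.owner = owner
        self.class_ = class_
        # Single leading underscore: weak "internal use" indicator.
        self._history = []
        # Double leading underscore: class-private, stored on the
        # instance under the mangled name _Account__pin.
        self.__pin = "0000"

    def check_pin(self, pin):
        return pin == self.__pin


acct = Account("guido")
```

The mangling is a convenience rather than a security barrier: the value is still reachable through acct.__dict__.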
Module Names
Since module names are mapped to file names, and some file systems are case insensitive and
truncate long names, it is important that module names be chosen to be fairly short and not in
conflict with other module names that only differ in the case -- this won't be a problem on Unix,
but it will be when the code is transported to Mac or Windows.
There is an emerging convention that when an extension module written in C or C++ has an
accompanying Python module that provides a higher level (e.g. more object oriented) interface,
the Python module's name uses CapWords, while the C/C++ module is named in all lowercase and
has a leading underscore (e.g. Tkinter/_tkinter).
"Packages" (groups of modules, supported by the "ni" module) generally have a short all
lowercase name.
Class Names
Almost without exception, class names use the CapWords convention. Classes for internal use
have a leading underscore in addition.
Exception Names
If a module defines a single exception raised for all sorts of conditions, it is generally called
"error" or "Error". As far as I can tell, built-in (extension) modules use "error" (e.g. os.error),
while Python modules generally use "Error" (e.g. xdrlib.Error).
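A minimal sketch of the "Error" convention for a Python module follows. The parse_port function is a hypothetical module-level function that reports every failure through the single exception.

```python
class Error(Exception):
    """Exception raised for every failure in this hypothetical module."""


def parse_port(text):
    """Return text converted to a port number, or raise Error."""
    try:
        port = int(text)
    except ValueError:
        raise Error("not a number: %r" % (text,))
    if not 0 < port < 65536:
        raise Error("port out of range: %d" % port)
    return port
```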
Function Names
Plain functions exported by a module can either use the CapWords style or lowercase (or
lower_case_with_underscores). I have no strong preference, but believe that the
CapWords style is used for functions that provide major functionality (e.g.
nstools.WorldOpen()), while lowercase is used more for "utility" functions (e.g.
pathhack.kos_root()).
Global Variable Names
(Let's hope that these variables are meant for use inside one module only.) The conventions
are about the same as those for exported functions. Modules that are designed for use via
"from M import *" should prefix their globals (and internal functions and classes) with an
underscore to prevent exporting them.
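The effect of the underscore prefix on "from M import *" can be checked with a small experiment. The module name demo_mod and both variable names are made up, and the module is built in memory only so that the sketch is self-contained.

```python
import sys
import types

# Build a throwaway module with one public and one underscore-prefixed global.
mod = types.ModuleType("demo_mod")
exec("public_value = 1\n_private_value = 2\n", mod.__dict__)
sys.modules["demo_mod"] = mod

# Run the star-import in a fresh namespace and see which names arrive.
namespace = {}
exec("from demo_mod import *", namespace)
imported = sorted(name for name in namespace if name != "__builtins__")
```

Only public_value ends up in the importing namespace. The underscore-prefixed global stays behind, although it remains reachable as demo_mod._private_value.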
Method Names
Hmm, the story is largely the same as for functions. When using ILU, here's a good convention:
use CapWords for methods published via an ILU interface. Use lowercase for methods accessed
by other classes or functions that are part of the implementation of an object type. Use one
leading underscore for "internal" methods and instance variables when there is no chance of a
conflict with subclass or superclass attributes or when a subclass might actually need access to
them. Use two leading underscores (class-private names, enforced by Python 1.4) in those
cases where it is important that only the current class accesses an attribute. (But realize that
Python contains enough loopholes so that an insistent user could gain access nevertheless, e.g.
via the __dict__ attribute. Only ILU or Python's restricted mode will XXX