Anti-Patterns in Python Programming
Constantine Lignos | Last updated: July 09, 2014
Note (9/1/2019): The advice here is somewhat out of date at this point. An update is coming someday to change the examples to reflect Python 3 and modern performance characteristics of iteration.
This page is a collection of the most unfortunate but occasionally subtle issues I’ve seen in code written by students new to writing Python. It’s written to help students get past the phase of writing ugly Python code and understand some of the most common idioms. The simplifications employed (for example, ignoring generators and the power of itertools when talking about iteration) reflect its intended audience.
There are always reasons to use some of these anti-patterns, which I’ve tried to give those where possible, but in general using these anti-patterns makes for less readable, more buggy, and less Pythonic code. If you’re looking for broader introductory materials for Python, I highly recommend The Python Tutorial or Dive into Python.
If you have comments or wish to use this work in way other than what the license allows, feel free to get in touch with me by e-mail.
Iteration
The use of range
Programmers that are new to Python love using range
to
perform simple iteration by applying it over the length of an
iterable and then getting each element:
Recite it in your sleep: range
is not for simple, obvious iterations over sequences.
For those used to numerically defined for
loops, range
feels like home,
but using it for iteration over sequences is bug-prone and less clear than using the standard
for
construct directly on an iterable. Just write:
Misuses of range
are prone to unfortunate off-by-one bugs. This is
commonly caused by forgetting that range
is inclusive in its first
argument and exclusive in its second, just like substring
in
Java
and many, many, other functions of this type. Those who think too
hard about not overrunning the end of their sequence are going to
create bugs:
The common excuses for using range
inappropriately are:
Needing the index value in the loop. This isn't a valid excuse. Instead, you should write:
Needing to iterate over two loops at once, getting a value at the same index from each. In this case, you want to use
zip
:Needing to iterate over only part of a sequence. In this case, just iterate over a slice of the sequence and include a comment to make it clear that this was intentional:
An exception to this is when you're iterating over a sequence so big that the overhead introduced by slicing the would be very expensive. If your sequence is 10 items, this is unlikely to matter, but if it is 10 million items or this is done in a performance-sensitive inner loop, this is going to be very important. Consider using
xrange
in this case.
An important use case of range
outside of iterating over a sequence
is when you genuinely need a list of numbers not to be used for indexing:
Using list comprehensions properly
If you have a loop that looks like this, you want to rewrite it as a list comprehension:
Instead, write a list comprehension:
Why do this? For one, you avoid any bugs related to correctly initializing
alist. Also, the code just looks a lot cleaner and what you’re doing is clearer.
For those from a functional programming background,
map
may feel more familiar, but I find it less Pythonic.
Some common excuses for not using a list comprehension:
You need to nest your loop. You can nest entire list comprehensions, or just put multiple loops inside a list comprehension. So, instead of writing:
Write:
Note that in a list comprehension with multiple loops, the loops have the same order as if you weren't making a list comprehension at all.
You need a condition inside your loop. But you can do this in a list comprehension just as easily:
A valid reason for not using a list comprehension is that you can’t do exception handling inside one. So if some items in the iteration will cause exceptions to be raised, you will need to either offload the exception handling to a function called by the list comprehension or not use a list comprehension at all.
Performance Pitfalls
Checking for contents in linear time
Syntactically, checking if something is contained in a list or a set/dictionary look alike, but under the hood things are different. If you need to repeatedly check whether something is contained in a data structure, use a set instead of a list. (You can use a dict if you need to associate a value with it and also get constant time membership tests.)
Keep in mind that creation of the set introduces one-time overhead; creation will take linear time even though membership testing takes constant time. So if you are checking for membership in a loop, it’s almost always worth it to take the time to build a set since you only have to build the set once.
Leaky Variables
Loops
Generally speaking, in Python the scope of a name is wider than one might expect given other languages. For example, in Java, the following code will not even compile:
However, in Python the equivalent will always compile and often produce the intended result:
This will work in all cases except when y
is empty; in that case the
loop never runs and the call to processList
will raise a NameError
because idx
is not defined. If you use Pylint, it would warn you
about “Using possibly undefined loop variable idx.”
The solution is to always be explicit and set idx
to some special
value before the loop, so you know what to look for if the loop never
runs. This is called the Sentinel
Pattern. So what value should
you use for a sentinel? Starting with C or earlier, back when int
ruled the Earth, a common pattern for a function that needed to return
an “expected error” result was to return -1
. For example,
let’s say you want to return the index of an item in a list:
In the general case, None
is a better sentinel of choice in Python,
even if it isn’t used consistently by Python’s standard types (e.g.,
str.find
). See the style section for recommended ways to test for
None.
The outer scope
Programmers new to Python often love to put everything in what is called the outer scope, the parts of a python file not contained in a block such as a function of class. The outer scope corresponds to the global namespace; for the purpose of this discussion, you should assume the contents of the global namespace are accessible anywhere within a single Python file.
The outer scope is great for defining constants that the whole module
needs access to that you want to declare at the top of a file. It’s
wise to give anything in the outer scope distinctive names, for
example IN_ALL_CAPS
. That makes it easier to avoid bugs like the
following:
If you look closely, you’ll see that the definition of print_file
names its argument filenam
, but the body of the function references
filename
. However, this program works just fine. Why? In
print_file
when a local variable named filename
isn’t found, the
next step is to look at the global namespace. Since the code that
calls print_file lives in the outer scope (even though it’s indented),
the variable filename declared there is visible to print_file
.
So, how do you avoid problems like this? First, don’t set any values
in the outer scope that aren’t IN_ALL_CAPS
. Things like parsing
arguments are best delegated to a function named main
, so that any
internal variables in that function do not live in the outer scope.
This also serves as a reminder about the global
keyword. You do not
need the global keyword if you are just reading the value of a global
name. You only need it if you want to change what object a global
variable name refers to. See this discussion of the global keyword on
Stack Overflow for more
information.
Style, style, style
Honor thy PEP 8
PEP 8 is the universal style guide for Python code. It’s wise to follow it as much as possible; the better you stick to it, the easier it will be for others to read your code. The guidelines below are all taken from PEP 8 and seem to be the ones people need to be reminded of most often.
Testing for empty
If you want to check whether a container type (e.g., list, dictionary,
set) is empty, simply test it instead of doing something
like len(x) > 0
:
If you want to store this result somewhere, use
bool(positive_numbers)
; bool
is what is called to determine the
truth value of the target of if
.
Testing for None
As I mentioned previously, None
makes a good sentinel
value. How should you check for it?
If you are specifically testing for None
and not just
other things that evaluate as False
(e.g., empty
containers, 0) use is
:
If you are using None
as a sentinel, this is the
desired pattern; you want to distinguish None
from 0,
for example.
If you are just testing for whether you have something useful to work with, a simple if pattern is usually good enough:
For example, if x
is expected to be a container type, but could be
None
based on the result of another function, this handles all the
cases that you care about at once. Just be aware that if you change
the value that goes into x
such that False
or 0.0
are useful
values, this may not behave the way you want.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.