Anti-Patterns in Python Programming

Constantine Lignos | Last updated: July 09, 2014

Note (9/1/2019): The advice here is somewhat out of date at this point. An update is coming someday to change the examples to reflect Python 3 and modern performance characteristics of iteration.


This page is a collection of the most unfortunate but occasionally subtle issues I’ve seen in code written by students new to writing Python. It’s written to help students get past the phase of writing ugly Python code and understand some of the most common idioms. The simplifications employed (for example, ignoring generators and the power of itertools when talking about iteration) reflect its intended audience.

There are always reasons to use some of these anti-patterns, which I’ve tried to give those where possible, but in general using these anti-patterns makes for less readable, more buggy, and less Pythonic code. If you’re looking for broader introductory materials for Python, I highly recommend The Python Tutorial or Dive into Python.

If you have comments or wish to use this work in way other than what the license allows, feel free to get in touch with me by e-mail.


Iteration

The use of range

Programmers that are new to Python love using range to perform simple iteration by applying it over the length of an iterable and then getting each element:

for i in range(len(alist)):
    print alist[i]

Recite it in your sleep: range is not for simple, obvious iterations over sequences. For those used to numerically defined for loops, range feels like home, but using it for iteration over sequences is bug-prone and less clear than using the standard for construct directly on an iterable. Just write:

for item in alist:
    print item

Misuses of range are prone to unfortunate off-by-one bugs. This is commonly caused by forgetting that range is inclusive in its first argument and exclusive in its second, just like substring in Java and many, many, other functions of this type. Those who think too hard about not overrunning the end of their sequence are going to create bugs:

# An incorrect way to iterate over a whole sequence
alist = ['her', 'name', 'is', 'rio']
for i in range(0, len(alist) - 1): # Off by one!
    print i, alist[i]

The common excuses for using range inappropriately are:

  1. Needing the index value in the loop. This isn't a valid excuse. Instead, you should write:

        for index, value in enumerate(alist): 
            print index, value
        
  2. Needing to iterate over two loops at once, getting a value at the same index from each. In this case, you want to use zip:

        for word, number in zip(words, numbers):
            print word, number
        
  3. Needing to iterate over only part of a sequence. In this case, just iterate over a slice of the sequence and include a comment to make it clear that this was intentional:

        for word in words[1:]: # Exclude the first word
            print word
        

    An exception to this is when you're iterating over a sequence so big that the overhead introduced by slicing the would be very expensive. If your sequence is 10 items, this is unlikely to matter, but if it is 10 million items or this is done in a performance-sensitive inner loop, this is going to be very important. Consider using xrange in this case.

An important use case of range outside of iterating over a sequence is when you genuinely need a list of numbers not to be used for indexing:

# Print foo(x) for 0<=x<5
for x in range(5):
    print foo(x)

Using list comprehensions properly

If you have a loop that looks like this, you want to rewrite it as a list comprehension:

# An ugly, slow way to build a list
words = ['her', 'name', 'is', 'rio']
alist = []
for word in words:
    alist.append(foo(word))

Instead, write a list comprehension:

words = ['her', 'name', 'is', 'rio']
alist = [foo(word) for word in words]

Why do this? For one, you avoid any bugs related to correctly initializing alist. Also, the code just looks a lot cleaner and what you’re doing is clearer. For those from a functional programming background, map may feel more familiar, but I find it less Pythonic.

Some common excuses for not using a list comprehension:

  1. You need to nest your loop. You can nest entire list comprehensions, or just put multiple loops inside a list comprehension. So, instead of writing:

        words = ['her', 'name', 'is', 'rio']
        letters = []
        for word in words:
            for letter in word:
                letters.append(letter)
        

    Write:

        words = ['her', 'name', 'is', 'rio']
        letters = [letter for word in words
                          for letter in word]
        

    Note that in a list comprehension with multiple loops, the loops have the same order as if you weren't making a list comprehension at all.

  2. You need a condition inside your loop. But you can do this in a list comprehension just as easily:

        words = ['her', 'name', 'is', 'rio', '1', '2', '3']
        alpha_words = [word for word in words if isalpha(word)] 
        

A valid reason for not using a list comprehension is that you can’t do exception handling inside one. So if some items in the iteration will cause exceptions to be raised, you will need to either offload the exception handling to a function called by the list comprehension or not use a list comprehension at all.


Performance Pitfalls

Checking for contents in linear time

Syntactically, checking if something is contained in a list or a set/dictionary look alike, but under the hood things are different. If you need to repeatedly check whether something is contained in a data structure, use a set instead of a list. (You can use a dict if you need to associate a value with it and also get constant time membership tests.)

# Assume we start with a list
lyrics_list = ['her', 'name', 'is', 'rio']
# Avoid this
words = make_wordlist() # Pretend this returns many words that we want to test
for word in words:
    if word in lyrics_list: # Linear time
        print word, "is in the lyrics"
# Do this
lyrics_set = set(lyrics_list) # Linear time set construction
words = make_wordlist() # Pretend this returns many words that we want to test
for word in words:
    if word in lyrics_set: # Constant time
        print word, "is in the lyrics"

Keep in mind that creation of the set introduces one-time overhead; creation will take linear time even though membership testing takes constant time. So if you are checking for membership in a loop, it’s almost always worth it to take the time to build a set since you only have to build the set once.


Leaky Variables

Loops

Generally speaking, in Python the scope of a name is wider than one might expect given other languages. For example, in Java, the following code will not even compile:

// Get the index of the lowest-indexed item in the array
// that is > maxValue
for(int i = 0; i < y.length; i++) {
    if (y[i] > maxValue) {
        break;
    }
}
// Oops, there is no i here
processArray(y, i);

However, in Python the equivalent will always compile and often produce the intended result:

for idx, value in enumerate(y):
    if value > max_value:
        break
    
processList(y, idx)

This will work in all cases except when y is empty; in that case the loop never runs and the call to processList will raise a NameError because idx is not defined. If you use Pylint, it would warn you about “Using possibly undefined loop variable idx.”

The solution is to always be explicit and set idx to some special value before the loop, so you know what to look for if the loop never runs. This is called the Sentinel Pattern. So what value should you use for a sentinel? Starting with C or earlier, back when int ruled the Earth, a common pattern for a function that needed to return an “expected error” result was to return -1. For example, let’s say you want to return the index of an item in a list:

def find_item(item, alist):
    # None is arguably more Pythonic than -1
    result = -1
    for idx, other_item in enumerate(alist):
        if other_item == item:
            result = idx
            break

    return result

In the general case, None is a better sentinel of choice in Python, even if it isn’t used consistently by Python’s standard types (e.g., str.find). See the style section for recommended ways to test for None.

The outer scope

Programmers new to Python often love to put everything in what is called the outer scope, the parts of a python file not contained in a block such as a function of class. The outer scope corresponds to the global namespace; for the purpose of this discussion, you should assume the contents of the global namespace are accessible anywhere within a single Python file.

The outer scope is great for defining constants that the whole module needs access to that you want to declare at the top of a file. It’s wise to give anything in the outer scope distinctive names, for example IN_ALL_CAPS. That makes it easier to avoid bugs like the following:

import sys

# See the bug in the function declaration?
def print_file(filenam):
    """Print every line of a file."""
    with open(filename) as input_file:
        for line in input_file:
            print line.strip()
            
if __name__ == "__main__":
    filename = sys.argv[1]
    print_file(filename)

If you look closely, you’ll see that the definition of print_file names its argument filenam, but the body of the function references filename. However, this program works just fine. Why? In print_file when a local variable named filename isn’t found, the next step is to look at the global namespace. Since the code that calls print_file lives in the outer scope (even though it’s indented), the variable filename declared there is visible to print_file.

So, how do you avoid problems like this? First, don’t set any values in the outer scope that aren’t IN_ALL_CAPS. Things like parsing arguments are best delegated to a function named main, so that any internal variables in that function do not live in the outer scope.

This also serves as a reminder about the global keyword. You do not need the global keyword if you are just reading the value of a global name. You only need it if you want to change what object a global variable name refers to. See this discussion of the global keyword on Stack Overflow for more information.


Style, style, style

Honor thy PEP 8

PEP 8 is the universal style guide for Python code. It’s wise to follow it as much as possible; the better you stick to it, the easier it will be for others to read your code. The guidelines below are all taken from PEP 8 and seem to be the ones people need to be reminded of most often.

Testing for empty

If you want to check whether a container type (e.g., list, dictionary, set) is empty, simply test it instead of doing something like len(x) > 0:

numbers = [-1, -2, -3]
# This will be empty
positive_numbers = [num for num in numbers if num > 0]
if positive_numbers:
    # Do something awesome

If you want to store this result somewhere, use bool(positive_numbers); bool is what is called to determine the truth value of the target of if.

Testing for None

As I mentioned previously, None makes a good sentinel value. How should you check for it?

If you are specifically testing for None and not just other things that evaluate as False (e.g., empty containers, 0) use is:

if x is not None:
    # Do something with x

If you are using None as a sentinel, this is the desired pattern; you want to distinguish None from 0, for example.

If you are just testing for whether you have something useful to work with, a simple if pattern is usually good enough:

if x:
    # Do something with x

For example, if x is expected to be a container type, but could be None based on the result of another function, this handles all the cases that you care about at once. Just be aware that if you change the value that goes into x such that False or 0.0 are useful values, this may not behave the way you want.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.