Basic data structures - Python Bootcamp

Constantine Lignos

Contents

  1. Sequences
    1. List basics
    2. Working with lists
  2. Maps
    1. Default dictionaries
  3. Strings
    1. String methods
    2. String formatting
  4. Data model

Sequences

List basics

For example, consider the following list of strings:

lyrics = ['Her', 'name', 'is', 'Rio']

Each string itself is actually its own container type, although it is not a list. Lists are special in that you can modify their contents and the items in them do not need to be the same type:

lyrics[0] = 'His'
lyrics[3] = 7
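
As a small sketch of the two points above (lists can be modified, and each string is its own container), the assignments can be run and inspected like this. The printed values in the comments follow directly from the assignments.

```python
lyrics = ['Her', 'name', 'is', 'Rio']

# Replace items in place; the list object itself stays the same.
lyrics[0] = 'His'
lyrics[3] = 7  # An int can sit alongside strings in one list

print(lyrics)  # ['His', 'name', 'is', 7]

# Each string is itself a container of characters, so it can be indexed too.
print(lyrics[1][0])  # n
```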

Working with lists

len returns the length of a sequence, which is the number of items in it:

>>> len(lyrics)
4

Square brackets ([]) let you access elements in a list by index. If you ask for an index that’s not in the sequence, you get an error. In addition to the usual indices, you can ask for negative indices, which count back from the end of the sequence:

>>> lyrics = ['Her', 'name', 'is', 'Rio']
>>> lyrics[0] # The first element
'Her'
>>> lyrics[4]
IndexError: list index out of range
>>> lyrics[-1] # The last element
'Rio'
>>> lyrics[-2] # The second-to-last element
'is'

You can get more than one element at a time via slicing. Slicing gets you an ordered subsequence of the list between two indices, inclusive of the first index, and exclusive of the second. To slice use the colon:

>>> lyrics = ['Her', 'name', 'is', 'Rio']
>>> lyrics[0:1] # From 0 to before 1
['Her']
>>> lyrics[0:2] # From 0 to before 2
['Her', 'name']
>>> lyrics[1:]  # From 1 to the end
['name', 'is', 'Rio']
>>> lyrics[:-1] # From start to before -1
['Her', 'name', 'is']
>>> lyrics[:]   # From start to the end
['Her', 'name', 'is', 'Rio']
>>> lyrics[-2:] # From -2 to end
['is', 'Rio']
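
One property worth seeing in action, sketched here with the same lyrics list: a slice is a new list, so changing it leaves the original alone. The name first_two is only illustrative.

```python
lyrics = ['Her', 'name', 'is', 'Rio']

first_two = lyrics[0:2]  # A new list holding 'Her' and 'name'
first_two[0] = 'His'     # Changes only the slice, not lyrics

print(first_two)  # ['His', 'name']
print(lyrics)     # ['Her', 'name', 'is', 'Rio']

# Unlike a single index, a slice that runs past the end is clipped
# instead of raising IndexError.
print(lyrics[2:10])  # ['is', 'Rio']
```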

The for loop gives simple iteration over sequences. Read this as “Each time the loop runs, set item equal to the next element in items.” We call item a loop variable, meaning it’s the main variable that changes as the loop runs.

for item in items:
    print(item)

If we need to do something with the index as well, enumerate can be used. Read this as “Each time the loop runs, set item equal to the next element in items and set index to the index of that item.”

for index, item in enumerate(items):
    print(item, "is at index", index)

Note two useful syntactic tidbits there. First, you can automatically unpack the index and item at once. Second, you can print multiple things by separating them with a comma, which will put a space between them.
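
The loop above assumes items already exists. Here is a runnable sketch with an example list filled in; the positions list is a hypothetical addition that records each (index, item) pair to show what enumerate hands to the loop.

```python
items = ['Her', 'name', 'is', 'Rio']

positions = []
# enumerate yields (index, item) pairs, unpacked into two loop variables.
for index, item in enumerate(items):
    print(item, "is at index", index)
    positions.append((index, item))

# The first line printed is: Her is at index 0
```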

You can add one item to a list by using append, or multiple by using extend:

>>> lyrics = ['Her', 'name', 'is', 'Rio']
>>> lyrics.append('and')
>>> lyrics
['Her', 'name', 'is', 'Rio', 'and']
>>> lyrics.extend(['she', 'dances', 'on', 'the', 'sand'])
>>> lyrics
['Her', 'name', 'is', 'Rio', 'and', 'she', 'dances', 'on', 'the', 'sand']

Note that these change the list in place but don’t return anything. Don’t make the mistake of writing something like:

# This is pointless, lyrics2 will be None
lyrics2 = lyrics.append('and')
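
A related pitfall, sketched below: append always adds its argument as a single item, so appending a list nests it, while extend adds each element. The names nested and flat are only illustrative.

```python
lyrics = ['Her', 'name', 'is', 'Rio']

# append modifies lyrics and returns None.
result = lyrics.append('and')
print(result)  # None

# append adds its argument as ONE item, even when that item is a list.
nested = ['Her', 'name']
nested.append(['is', 'Rio'])
print(nested)  # ['Her', 'name', ['is', 'Rio']]

# extend adds each element of its argument separately.
flat = ['Her', 'name']
flat.extend(['is', 'Rio'])
print(flat)  # ['Her', 'name', 'is', 'Rio']
```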

Maps

Map basics:

Curly braces ({}) are used in the creation of dictionaries, while square brackets ([]) are used for lookup, just like lists.

# Creating a new empty dictionary
words_to_nums = {}
# Assigning a value to a key in the dictionary
words_to_nums['one'] = 1
# Getting a value from the dictionary using a key
num = words_to_nums['one']
# Deleting a key from the dictionary
del words_to_nums['one']
# Creating a new dictionary with items in it
words_to_nums = {'one': 1, 'two': 2, 'three': 3}

In this example the keys of the dictionary are strings. The keys in a dict can be any immutable type, which we’ll define later, but for the moment includes strings, integers, and tuples but not lists or dictionaries.
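
A short sketch of the key rule. The grid dictionary and the list_key_allowed flag are hypothetical names, and the try/except form is the exception handling that is covered later; it is used here only to keep the example runnable.

```python
grid = {}

# Tuples and integers are immutable, so they work as keys.
grid[(0, 1)] = 'north'
grid[42] = 'answer'
print(grid[(0, 1)])  # north

# A list is mutable, so using one as a key raises TypeError.
try:
    grid[[0, 1]] = 'north'
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(list_key_allowed)  # False
```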

You can iterate over keys, values, or both using a for loop:

# Loop over keys
for key in adict:
    print("Key:", key)

# Loop over values
for value in adict.values():
    print("Value:", value)

# Loop over both
for key, value in adict.items():
    print("Key", key, "has value", value)

Checking whether a key is in the dictionary is easy:

if key in adict:
    print("Found key", key)

What if a key doesn’t exist? An exception occurs.

>>> words_to_nums = {'one': 1, 'two': 2, 'three': 3}
>>> words_to_nums['one']
1
>>> words_to_nums['four']
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    words_to_nums['four']
KeyError: 'four'

You can fix this by either checking in advance for whether a key is in the dictionary or by handling the exception, which we’ll talk about later.
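
The first option, checking in advance, might look like the sketch below. The lookup function is a hypothetical helper, and returning None for a missing key is just one possible choice.

```python
words_to_nums = {'one': 1, 'two': 2, 'three': 3}

def lookup(word):
    """Return the number for word, or None if word is not a key."""
    if word in words_to_nums:
        return words_to_nums[word]
    return None

print(lookup('one'))   # 1
print(lookup('four'))  # None
```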

Default dictionaries

Sometimes you want to be able to set up a dictionary so that everything has a default value. For example, let’s assume that you want to have a dictionary from formal names to nicknames, where each name can have multiple nicknames. Also, let’s assume that you want to read in a file that looks something like this:

Johnathan John
Johnathan Johnny
Johnathan Jack

What we want is a dictionary that looks like this, with string keys and list values:

{'Johnathan': ['John', 'Johnny', 'Jack']}

We can accomplish this by using a defaultdict, a dictionary that comes with a default value for a key we haven’t seen before. In this case, that default value would be an empty list. We can ask for that by doing the following:

from collections import defaultdict
nicknames = defaultdict(list)
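
To see the default value at work before involving a file, here is a minimal sketch that adds entries by hand. The name 'Elizabeth' is a made-up example key.

```python
from collections import defaultdict

nicknames = defaultdict(list)

# Looking up a missing key creates an empty list for it,
# so append works without checking for the key first.
nicknames['Johnathan'].append('John')
nicknames['Johnathan'].append('Johnny')
print(nicknames['Johnathan'])  # ['John', 'Johnny']

# Be aware that merely reading a missing key also adds it.
print(nicknames['Elizabeth'])    # []
print('Elizabeth' in nicknames)  # True
```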

Exercise: Write a program that populates a default dictionary from a file of nicknames as shown above.

Strings

Strings are a sequence type, so they can be sliced and iterated over just like a list:

>>> word = "happy"
>>> word[1:3]
'ap'
>>> for letter in word:
        print(letter)
   
h
a
p
p
y

However, they are immutable, so unlike a list you cannot change the contents of a string object.

>>> word[0] = "s"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

Rather than modifying a string, you can just replace it with another one:

>>> word = "sappy"
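
If the new string should be derived from the old one, slicing and concatenation can build it. A minimal sketch, where new_word is an illustrative name:

```python
word = "happy"

# Build a new string from a new first letter plus the rest of the old one.
new_word = "s" + word[1:]

print(new_word)  # sappy
print(word)      # happy (the original string object is unchanged)
```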

String methods

Here are some popular string methods you may need: endswith, split, strip, and join.

Examples:

>>> "file.py".endswith(".py")
True
>>> " 1 2 3 ".split()
['1', '2', '3']
>>> "Abracadabra".split("a")
['Abr', 'c', 'd', 'br', '']
>>> " The quick brown fox \n".strip()
'The quick brown fox'
>>> ", ".join(["cats", "dogs", "elephants"])
'cats, dogs, elephants'
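
These methods are often combined. As one sketch, split followed by join normalizes the whitespace in a line; the sample line and the variable names are made up for illustration.

```python
line = "  Her name   is Rio \n"

# split() with no argument splits on any run of whitespace
# and drops leading and trailing whitespace.
words = line.split()
print(words)  # ['Her', 'name', 'is', 'Rio']

# join puts the pieces back together with exactly one space between them.
cleaned = " ".join(words)
print(cleaned)  # Her name is Rio

# strip() removes only the outer whitespace, leaving inner runs alone.
print(line.strip())  # Her name   is Rio
```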

String formatting

You often want to print out strings nicely, as a part of normal operation or debugging. This is best accomplished by using the format method on a string.

>>> lyrics = ['Her', 'name', 'is', 'Rio']
>>> print("lyrics contains {} items".format(len(lyrics)))
lyrics contains 4 items

The format method replaces areas marked with {} with its arguments. You can use this to control the details of what goes in. For example:

>>> print("Lyrics: {}".format(lyrics))
Lyrics: ['Her', 'name', 'is', 'Rio']
>>> print("First word: {}".format(lyrics[0]))
First word: Her
>>> print("First word: {!r}".format(lyrics[0]))
First word: 'Her'
>>> print("2/3 is {}".format(2.0 / 3.0))
2/3 is 0.6666666666666666
>>> print("2/3 is {:0.2f}".format(2.0 / 3.0))
2/3 is 0.67

You can use indices to control the order in which strings are interpolated (placed into the host string). If you leave them out, strings will be interpolated in the order given:

>>> "{} spam! {} spam!".format("Lovely", "Wonderful")
'Lovely spam! Wonderful spam!'
>>> "{1} {0}! {2} {0}!".format("spam", "Lovely", "Wonderful")
'Lovely spam! Wonderful spam!'

By default, format coaxes its arguments into the prettiest strings possible. Using {!r} calls repr on the argument, which makes the representation “machine-readable” by adding things like quotation marks around it and sometimes type information. Other format specifiers can do things like control the number of digits displayed. See the format string specification for the gory details.
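
The pieces above can be combined. The sketch below checks that {!r} matches what repr returns, and reuses one argument by index with two different precisions; the variable names are illustrative.

```python
word = 'Her'

plain = "{}".format(word)     # Uses the plain string form
quoted = "{!r}".format(word)  # Uses repr, so the quotes are included

print(plain)                 # Her
print(quoted)                # 'Her'
print(quoted == repr(word))  # True

# An index and a format specifier can be used in the same field.
two_thirds = "{0:0.3f} and {0:0.1f}".format(2.0 / 3.0)
print(two_thirds)  # 0.667 and 0.7
```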

Data model

As alluded to earlier, some types of data can be changed in-place while others cannot. The tuple is a type similar to a list but one that cannot be changed once it is created; it is immutable.

>>> words1 = ['the', 'dog']
>>> words1[1] = 'cat'
>>> words1
['the', 'cat']
>>> words2 = ('the', 'dog')
>>> words2[1] = 'cat'
Traceback (most recent call last):
  File "<pyshell#58>", line 1, in <module>
    words2[1] = 'cat'
TypeError: 'tuple' object does not support item assignment

This manifests itself when considering the question “will changing this object affect anything else?”. For example:

>>> x = 7
>>> y = x
>>> x += 1
>>> x
8
>>> y
7

As integers are immutable, at the beginning x and y refer to the same object but when x is incremented it points to a different object instead. However, note:

>>> x = ['a']
>>> y = x
>>> x.append('b')
>>> x
['a', 'b']
>>> y
['a', 'b']

As lists are mutable, append changes the object in-place, affecting x and y.
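
If you want y to be unaffected, one option is to make a copy, for example with the full slice shown earlier. A minimal sketch; the is operator, which tests whether two names refer to the same object, is used here only to make the difference visible.

```python
x = ['a']
y = x     # y refers to the same list object as x
z = x[:]  # z is a new list with the same contents

x.append('b')

print(y)  # ['a', 'b'] (same object as x, so it sees the change)
print(z)  # ['a'] (a separate copy, so it does not)

print(x is y)  # True
print(x is z)  # False
```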