Using the CSV module - Python Bootcamp
Constantine Lignos
Understanding CSVs
A comma-separated values (CSV) file is a convenient way to store tabular data that is easy to read and write. Despite the name, many CSV files aren’t even comma-delimited (tab is common, though people rarely call those files TSV), and there is no single standard format that every program follows.
You might think you can trivially read and write CSVs using .split(',') to read them and ','.join(fields) to write them. However, this is risky: if a field contains a comma, that field will be enclosed in quotes, and splitting on every comma will misparse it.
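To make the risk concrete, here is a minimal sketch that compares naive splitting with the csv module on a single line. The sample line is made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io

# A made-up line whose second field contains a comma and is therefore quoted.
line = 'spam,"eggs, bacon",beans'

# Naive splitting cuts the quoted field into two pieces.
naive_fields = line.split(',')

# csv.reader accepts any iterable of lines; StringIO stands in for a file here.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)   # ['spam', '"eggs', ' bacon"', 'beans']
print(parsed_fields)  # ['spam', 'eggs, bacon', 'beans']
```

The naive version yields four fields and leaves stray quote characters in two of them, while the csv module returns the three intended fields.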
Reading a CSV
The simplest way to read a CSV is the csv.reader function. It returns a reader object that lets you iterate over the file, reading each row as a list of strings. An example adapted from the documentation:
>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile)
...     for row in spamreader:
...         print(', '.join(row))
...
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam
This demonstrates another useful Python construct: using with to mark off a block in which you use a file. The file is closed automatically when the block ends.
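The automatic closing can be checked directly. The sketch below assumes a throwaway file created in a temporary directory, so the path and contents are illustrative only.

```python
import csv
import os
import tempfile

# Create a small throwaway CSV so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'eggs.csv')
with open(path, 'w', newline='') as csvfile:
    csvfile.write('Spam,Lovely Spam,Wonderful Spam\n')

with open(path, newline='') as csvfile:
    rows = list(csv.reader(csvfile))
    print(csvfile.closed)  # False: still inside the with block

print(csvfile.closed)  # True: the file was closed when the block ended
print(rows)            # [['Spam', 'Lovely Spam', 'Wonderful Spam']]
```

Note that the rows are collected into a list inside the block, because the reader can no longer pull lines from the file once it has been closed.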
Writing a CSV
Similarly, csv.writer allows you to write a CSV one row at a time. Note that when you write a CSV file in Python 3, you should open it with mode 'w' and newline='' (the mode 'wb' applies only to Python 2).
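As a rough sketch of writing with csv.writer in Python 3, the example below writes a few rows to a temporary file and reads them back. The rows, the file name menu.csv, and the temporary directory are all made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical rows; the first field of the last row contains a comma.
rows = [
    ['item', 'price'],
    ['spam', '1.50'],
    ['eggs, bacon', '3.25'],
]

path = os.path.join(tempfile.mkdtemp(), 'menu.csv')

# newline='' lets the csv module control line endings itself.
with open(path, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in rows:
        writer.writerow(row)

# Read the file back to check that the round trip preserved every field.
with open(path, newline='') as csvfile:
    round_trip = list(csv.reader(csvfile))

print(round_trip == rows)  # True
```

The writer quotes the field that contains a comma on its own, which is exactly the case that ','.join(fields) gets wrong.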
DictReader/Writer
What if you don’t want to rely on hard-coding the order of the fields in each line? You can use the DictReader and DictWriter classes to help. These allow you to read and write each row as a dictionary, with keys being the field names and values being the value of that field in each row.
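A minimal sketch of both classes follows, assuming a file whose first row holds the field names. The data and the field names item and price are hypothetical, and io.StringIO stands in for real files.

```python
import csv
import io

# Hypothetical data; io.StringIO stands in for a file opened with newline=''.
source = io.StringIO('item,price\nspam,1.50\neggs,3.25\n')

# DictReader takes the field names from the first row by default.
reader = csv.DictReader(source)
records = list(reader)
print(records[0]['item'], records[0]['price'])  # spam 1.50

# DictWriter needs the field names up front; they set the column order.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=['price', 'item'])
writer.writeheader()
for record in records:
    writer.writerow(record)

print(output.getvalue())
```

Because each row is looked up by field name, the writer can emit the columns in a different order from the input without any change to the code that handles the records.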
Too slow?
The standard Python CSV parser is designed to handle a lot of strange input well, including CSV files exported from Excel. If you care more about speed than broad features, take a look at read_csv in pandas.
Exercises
Here are some exercises to get you used to working with the CSV module. Try these out using a sample CSV file.
- Write a function that reads a CSV and uses a defaultdict to store a list of the values for each item as it reads in the CSV. Use csv.reader and csv.writer.
- Write a second function that takes the dictionary produced above and then computes the minimum, maximum, and mean values for each item. (You’ll need to write your own function to compute the mean.) Write these values to another CSV with four fields: item, min, mean, max.
- When you’ve got everything working, replace the reader and writer with csv.DictReader and csv.DictWriter.