9. Files#
In this notebook, we cover the following subjects:
Opening Files;
Reading Files;
A Context Manager;
Writing Files;
Working with csv files;
Working with json files.
# To enable type hints for lists, dicts, tuples, and sets we need to import the following:
from typing import List, Dict, Tuple, Set
9.1. Opening Files#
So far, we have worked with data that was stored and created directly in our notebooks. For example, we created a dictionary of word frequencies from a string of text. However, most of the data we use in real-world applications is stored in files, often due to its sheer size or because it needs to be kept for future use. Therefore, it’s essential to learn how to retrieve this data so we can perform operations on it.
First things first. How do we find a file?
9.1.1. File Names and Paths#
Files are organized into directories (also called “folders”). Every running program has a current directory, which is the default directory for most operations. For example, when you open a file for reading, Python looks for it in the current directory.
A string like "/Users/mvdbrand/Documents/GitLab/course-material-jbi010/202122/Lectures/week3" that identifies a file or directory is called a path. Often we assign this path to a variable, for example:
file_path: str = "assets/halloween.txt"
Here, assets is a folder located in the same directory as this notebook. We want to access the file halloween.txt from this folder.
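To see which directory Python treats as the current one, and how a relative path like the one above is resolved against it, the standard pathlib module can help. This is only a minimal sketch: pathlib is not used elsewhere in this notebook, the variable names are made up for illustration, and the file does not have to exist for the path to be built.

```python
from pathlib import Path

# The current directory: relative paths are resolved against it
current_dir: Path = Path.cwd()

# Build the relative path from its parts (folder first, then file name)
relative_path: Path = Path("assets") / "halloween.txt"

# Combine both into an absolute path
absolute_path: Path = current_dir / relative_path

print(f"Current directory: {current_dir}")
print(f"Relative path:     {relative_path}")
print(f"Absolute path:     {absolute_path}")
print(f"Does it exist?     {absolute_path.exists()}")
```

What is printed depends on where the notebook is running, so the output will differ from machine to machine.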
9.1.2. The open() Function#
Now that we know how to locate our file, the next step is to open it. As the title of this section suggests, we’ll use the open() function for this purpose. Your first instinct might be to write something like this:
halloween_file = open(file_path)
print(f'Type of the open file object is: {type(halloween_file)}')
print('\nContent of our file is:\n')
print(halloween_file)
# Does it work like expected?
Type of the open file object is: <class '_io.TextIOWrapper'>
Content of our file is:
<_io.TextIOWrapper name='assets/halloween.txt' mode='r' encoding='UTF-8'>
This isn’t quite what we expected, is it? The function returns something called an _io.TextIOWrapper object. This happens because when we open a file, we’re really asking the operating system (OS) to locate the file by its name and check that it exists. We use Python’s built-in open() function to do this. If open() is successful, it returns a file object (often called a file handle), which in this case is the _io.TextIOWrapper object. The file handle isn’t the actual data, but rather an intermediary that allows us to read from or write to the file.
So, while we might have expected the content of the data to be displayed when we printed the file, the first step was successful; the file was located, and a file handle was returned. Now, let’s move on to the next step: how do we actually read a file?
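The file handle carries some information about the open file, even though it is not the content itself. The sketch below inspects a few of its attributes. It creates a small throwaway file first (the file name and its content are made up) so that it runs anywhere; it also uses .close(), which is discussed in the next section.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere
demo_path: str = os.path.join(tempfile.mkdtemp(), "demo.txt")
demo_file = open(demo_path, "w")
demo_file.write("Boo!")
demo_file.close()

# open() gives us a file handle, not the text itself
demo_file = open(demo_path)
print(demo_file.name)    # the path we passed to open()
print(demo_file.mode)    # 'r': reading is the default mode
print(demo_file.closed)  # False: the file is still open
demo_file.close()
print(demo_file.closed)  # True: the file is closed now
```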
Note
Programs that store their data in files are called persistent: the data is still there after the program has stopped running.
9.2. Reading Files#
9.2.1. The .read() Method#
So, in the previous section, we figured out how to open a file successfully. Now, we want to see what’s inside. If you know the file is small compared to your main memory, you can use the read() method on the file handle. This method pulls the entire content of the file into a single string, including all line breaks.
It’s smart to store the result of the read() method in a variable: reading moves through the file to its end, so a second call to read() on the same file handle returns an empty string.
halloween_file = open("assets/halloween.txt") # You can also pass the string immediately in the function call
content = halloween_file.read()
print('\nContent of our file is:\n')
print(content)
halloween_file.close()
Content of our file is:
Halloween costume inventory:
Ghost
Zombie
Witch
Pumpkin
Candy inventory:
Snickers 15
Jawbreakers 32
Tony's 8
You might be wondering, what’s the deal with this .close() at the end? Even though we’re only reading the file in this section and not writing to it, it’s still important to close it. When writing to a file, closing is crucial because, until you do, the data might not actually be saved or stored properly. By getting into the habit of closing files, even when just reading, you ensure consistent behavior across different Python environments, making your code more reliable. But don’t worry, you won’t have to always open and close files by hand. We will learn how to do it using a specific command later on.
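The sketch below illustrates why the result of read() is worth storing in a variable: once the whole file has been read, reading again from the same file handle gives back an empty string. The throwaway file and its content are made up for the example.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere
demo_path: str = os.path.join(tempfile.mkdtemp(), "demo.txt")
demo_file = open(demo_path, "w")
demo_file.write("Ghost\nZombie\n")
demo_file.close()  # closing makes sure the text is really written

demo_file = open(demo_path)
first_read: str = demo_file.read()   # the whole file
second_read: str = demo_file.read()  # nothing is left to read
demo_file.close()

print(repr(first_read))   # 'Ghost\nZombie\n'
print(repr(second_read))  # ''
```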
9.2.2. The .readlines() Method#
Sometimes, you don’t need the entire content of a file as one string; instead, you may want to process it line by line. The .readlines() method is ideal for this because it returns the content as a list of strings, one per line. Each string keeps its trailing newline character (\n), except possibly the last one.
halloween_file = open("assets/halloween.txt") # You can also pass the string immediately in the function call
content = halloween_file.readlines()
print('\nContent of our file is:\n')
print(content)
halloween_file.close()
Content of our file is:
['Halloween costume inventory:\n', 'Ghost\n', 'Zombie\n', 'Witch\n', 'Pumpkin\n', '\n', 'Candy inventory:\n', 'Snickers 15\n', 'Jawbreakers 32\n', "Tony's 8"]
To access the content line by line, you can simply use a for loop:
for line in content:
    print(line)
Halloween costume inventory:
Ghost
Zombie
Witch
Pumpkin
Candy inventory:
Snickers 15
Jawbreakers 32
Tony's 8
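Because each line still ends with \n and print() adds a line break of its own, printing the lines as they are leaves an empty line between them. One possible way around this, sketched below, is to strip the newline from each line. The sketch also loops over the file handle directly, which yields one line at a time without building the full list first. The throwaway file and its content are made up for the example.

```python
import os
import tempfile
from typing import List

# Create a small throwaway file so the example runs anywhere
demo_path: str = os.path.join(tempfile.mkdtemp(), "demo.txt")
demo_file = open(demo_path, "w")
demo_file.write("Ghost\nZombie\nWitch")
demo_file.close()

demo_file = open(demo_path)
clean_lines: List[str] = []
for line in demo_file:                     # a file handle can be looped over line by line
    clean_lines.append(line.rstrip("\n"))  # remove the trailing newline, if there is one
demo_file.close()

print(clean_lines)
```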
9.2.3. The .readline() Method#
The .readline() method might feel a bit unintuitive. It reads the content one line at a time, returning the next line each time you call it. It’s easier to understand when you see it in action.
halloween_file = open("assets/halloween.txt") # You can also pass the string immediately in the function call
line = halloween_file.readline()
print(line)
Halloween costume inventory:
And what happens when we call the method again?
line = halloween_file.readline()
print(line)
Ghost
And again?
line = halloween_file.readline()
print(line)
halloween_file.close()
Zombie
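If you keep calling .readline(), you eventually reach the end of the file; from then on the method returns an empty string. An empty line inside the file is different: it is returned as "\n". The sketch below uses this to read a file until the end with a while loop. The throwaway file and its content are made up for the example.

```python
import os
import tempfile

# Create a small throwaway file with an empty line in the middle
demo_path: str = os.path.join(tempfile.mkdtemp(), "demo.txt")
demo_file = open(demo_path, "w")
demo_file.write("Ghost\n\nZombie\n")
demo_file.close()

demo_file = open(demo_path)
line_count: int = 0
line: str = demo_file.readline()
while line != "":  # an empty string means we reached the end of the file
    line_count += 1
    print(f"Line {line_count}: {line.rstrip()}")
    line = demo_file.readline()
demo_file.close()

print(f"Number of lines read: {line_count}")
```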
9.3. A Context Manager#
Manually opening and closing files is error-prone: it is easy to forget close(), and when writing files this can mean that data is never saved. To prevent this, before we start writing files, we’ll introduce a context manager, a tool that automatically handles closing the file for you. We do this by using a with statement: the file is automatically closed once you exit the scope of this statement.
file_path: str = "assets/halloween.txt"
with open(file_path, "r") as halloween_file:  # The "r" argument indicates that we read the file
    # The file is only open in this scope
    content = halloween_file.read()
# Once you're out, the file is closed again
print(content)
Halloween costume inventory:
Ghost
Zombie
Witch
Pumpkin
Candy inventory:
Snickers 15
Jawbreakers 32
Tony's 8
Note
Using a context manager is best practice because it reduces errors, so it’s expected that you use it in future programs.
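To check that the with statement really closes the file, we can inspect the closed attribute of the file handle afterwards. The sketch below also shows that the file is closed when an error occurs inside the with block. It uses try/except, which may not have been covered yet, and a made-up throwaway file.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere
demo_path: str = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as demo_file:
    demo_file.write("Ghost\n")

# Normal case: the file is closed as soon as we leave the with block
with open(demo_path, "r") as demo_file:
    first_line: str = demo_file.readline()
closed_after_with: bool = demo_file.closed

# Error case: the file is closed even though the block fails halfway
try:
    with open(demo_path, "r") as demo_file:
        number: int = int(demo_file.read())  # fails: "Ghost" is not a number
except ValueError:
    print("Something went wrong while the file was open")
closed_after_error: bool = demo_file.closed

print(closed_after_with, closed_after_error)
```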
9.3.1. Optional parameters of the open() function#
The open() function can take several optional parameters. These parameters help you control how the file is opened and interacted with.
Let’s look at an open() call with the most common optional parameters:
with open('example.txt', mode='w', encoding='utf-8') as file:
    file.write('Hello, world!')
So what do these parameters do?
mode:
we are already familiar with this
‘w’: Write mode (creates a new file or truncates an existing file).
Other modes include ‘r’ (read), ‘a’ (append), and ‘x’ (exclusive creation); these can be combined with ‘b’ (binary) or ‘t’ (text, the default).
encoding:
Specifies the encoding to use for the file (e.g., ‘utf-8’, ‘ascii’).
Note that delimiter is not a parameter of open(): passing it raises a TypeError. The delimiter, the character that separates fields in a CSV file (e.g., ‘,’ for comma, ‘\t’ for tab), is passed to the readers of the csv module instead, such as csv.DictReader.
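As a small illustration of two of these parameters, the sketch below writes a text with accented characters using an explicit encoding, and shows that mode ‘x’ refuses to open a file that already exists. The file name and text are made up, and try/except may not have been covered yet.

```python
import os
import tempfile

demo_path: str = os.path.join(tempfile.mkdtemp(), "menu.txt")

# 'x' creates the file, but only if it does not exist yet
with open(demo_path, mode="x", encoding="utf-8") as menu_file:
    menu_file.write("Crème brûlée")

# A second attempt with 'x' fails, because the file now exists
already_exists: bool = False
try:
    with open(demo_path, mode="x", encoding="utf-8") as menu_file:
        menu_file.write("This line is never written")
except FileExistsError:
    already_exists = True

# Read the text back with the same encoding it was written with
with open(demo_path, mode="r", encoding="utf-8") as menu_file:
    content: str = menu_file.read()

print(already_exists)
print(content)
```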
9.4. Writing Files#
9.4.1. The .write() Method#
To write to a file, open it with mode "w" (for write) as the second argument. If the file already exists, opening it in write mode removes the current content from the file, so be careful! If the file does not exist, a new one is created.
file_path: str = "assets/output.txt"
a_string: str = "This is my first line written to a file!"
with open(file_path, "w") as outfile:
    outfile.write(a_string)
Open the file in a text editor: did we succeed?
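Two details of .write() are easy to miss: it does not add a line break for you, and it returns the number of characters it wrote. The sketch below writes a few items on separate lines and reads them back to check the result. The throwaway file and the item names are made up for the example.

```python
import os
import tempfile
from typing import List

demo_path: str = os.path.join(tempfile.mkdtemp(), "output.txt")
costumes: List[str] = ["Ghost", "Zombie", "Witch"]

with open(demo_path, "w") as outfile:
    for costume in costumes:
        # write() adds no line break by itself, so we add "\n" ourselves
        characters_written: int = outfile.write(costume + "\n")

# Read the file back to check what was written
with open(demo_path, "r") as infile:
    content: str = infile.read()

print(repr(content))
print(characters_written)  # length of the last string written, "Witch\n"
```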
9.4.2. Appending with Mode a#
If we want to add to an already existing file, we open it with mode a (for append) and then use the same .write() method as before. File objects have no .append() method; appending is a mode of open(). In append mode, we can only add to the end of the file.
file_path: str = "assets/output.txt"
# Note how we add a newline character to make sure the text starts on a new line in the file
a_second_string: str = "\nThis is my second line written to a file!"
with open(file_path, "a") as outfile:
    outfile.write(a_second_string)
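The difference between the two modes becomes clear when we compare them on the same file. In the sketch below, mode "a" keeps the existing content and adds to the end, while mode "w" throws the existing content away. The throwaway file and its content are made up for the example.

```python
import os
import tempfile

demo_path: str = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(demo_path, "w") as outfile:  # creates the file
    outfile.write("first line\n")

with open(demo_path, "a") as outfile:  # adds to the end of the file
    outfile.write("second line\n")

with open(demo_path, "r") as infile:
    after_append: str = infile.read()

with open(demo_path, "w") as outfile:  # removes the old content first
    outfile.write("third line\n")

with open(demo_path, "r") as infile:
    after_overwrite: str = infile.read()

print(repr(after_append))
print(repr(after_overwrite))
```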
9.5. Working with csv files#
Welcome to the world of CSV (Comma-Separated Values) files! These files are incredibly popular for storing tabular data in a straightforward text format that’s easy to read and write. In this section, we’ll delve into how to use csv.DictReader, a superb tool that allows you to work with CSV files in a more intuitive way.
9.5.1. What is csv.DictReader?#
csv.DictReader reads each row of a CSV file and converts it into a dictionary, where the keys are drawn from the first row (the header) of the file. This means you can access your data by column names instead of numerical indices, making your code cleaner and easier to understand.
9.5.2. Key Features of csv.DictReader#
Here are some important features that make csv.DictReader so useful:
Header Row: The first row of the CSV file serves as the keys for the dictionaries created from each subsequent row.
Customizable Delimiters: You can specify different delimiters (like semicolons) and quote characters, accommodating various CSV formats.
Case Sensitivity: Be mindful that dictionary keys are case-sensitive. For instance, “Name” and “name” are distinct keys.
Handling Missing Values: If a field is left empty in a row, the corresponding value in the dictionary is simply an empty string. If a row has fewer fields than the header, the missing values are filled with None by default.
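Some of these features can be tried out without a file on disk. The sketch below feeds csv.DictReader a string through io.StringIO, which behaves like an open text file; this is a choice made for the example, and the data is made up. It shows the header row, case-sensitive keys, an empty field, and a row that is shorter than the header.

```python
import csv
import io
from typing import Dict, List, Optional

# Bob has an empty Age field; Charlie's row has no Department field at all
csv_text: str = "Name,Age,Department\nAlice,30,HR\nBob,,Engineering\nCharlie,28\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows: List[Dict[str, Optional[str]]] = list(reader)

print(reader.fieldnames)      # the keys, taken from the header row
print("name" in rows[0])      # keys are case-sensitive: "name" is not "Name"
print(repr(rows[1]["Age"]))   # an empty field becomes an empty string
print(rows[2]["Department"])  # a field missing from a short row becomes None
```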
9.5.3. How to Use csv.DictReader#
Enough theory, right? Let’s see how csv.DictReader works in action. Here is the plan for reading the CSV file (use the employees.csv file located in the assets folder for testing):
First, import the csv module.
Second, open the CSV file: use the open() function to open your CSV file, ensuring you specify the correct mode (‘r’ for reading, which is the default).
Third, create a DictReader object: pass the opened file object to csv.DictReader.
Lastly, iterate over the rows: loop through the DictReader object to access each row as a dictionary.
Ok, let’s see the code:
# import the csv module
import csv
# open the CSV file
with open('assets/employees.csv') as employees_file:
    # create a DictReader object
    reader: csv.DictReader = csv.DictReader(employees_file)
    # iterate over the rows
    for row in reader:
        print(row)  # Each row is a dictionary
{'Name': 'Alice', 'Age': '30', 'Department': 'HR'}
{'Name': 'Bob', 'Age': '25', 'Department': 'Engineering'}
{'Name': 'Charlie', 'Age': '28', 'Department': 'Marketing'}
If your CSV file uses semicolons instead of commas, how would you handle that in your code?
To handle a CSV file that uses semicolons instead of commas, you can pass delimiter=';' to csv.DictReader. The open() function has no such parameter.
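A minimal sketch of reading semicolon-separated data: the delimiter is passed as a keyword argument to csv.DictReader. To keep the example self-contained, the data is made up and read from a string through io.StringIO instead of from a file.

```python
import csv
import io
from typing import Dict, List

# The same kind of data as before, but separated by semicolons
csv_text: str = "Name;Age;Department\nAlice;30;HR\nBob;25;Engineering\n"

reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
rows: List[Dict[str, str]] = list(reader)

for row in rows:
    print(row)
```

With a real file, the call would look the same, with the opened file object in place of io.StringIO(csv_text).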
See, we just have dictionaries!
Now what can we do with this? Let’s say I wanna know the names of all of my employees. How could we get their names?
# let's initialize a list to store employee names
employee_names: List[str] = []
# Read the 'employees.csv' file and extract names
with open('assets/employees.csv') as employees_file:
    reader = csv.DictReader(employees_file)
    for row in reader:
        employee_names.append(row['Name'])
employee_names
['Alice', 'Bob', 'Charlie']
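One thing to keep in mind is that csv.DictReader gives every value back as a string, including numbers such as the ages. The sketch below converts them with int() before calculating with them. It reuses the three rows printed earlier, but reads them from a string through io.StringIO so that it runs without the assets folder.

```python
import csv
import io
from typing import List

# The same rows as in employees.csv, as shown in the output above
csv_text: str = "Name,Age,Department\nAlice,30,HR\nBob,25,Engineering\nCharlie,28,Marketing\n"

ages: List[int] = []
for row in csv.DictReader(io.StringIO(csv_text)):
    ages.append(int(row["Age"]))  # "30" is a string, int("30") is a number

average_age: float = sum(ages) / len(ages)
print(ages)
print(f"Average age: {average_age:.1f}")
```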
Easy, right? Now there is one more thing we need to cover in this chapter…
9.6. Working with json files#
If you’re new to programming or Python, you might be wondering what JSON is and why it’s important. Simply put, JSON is a popular format for storing and exchanging data. It’s widely used in web applications and APIs because it’s easy to read and write for both humans and machines.
In the following, we’ll explore how to work with JSON files in Python using the built-in json module.
9.6.1. Why use json?#
OK, so we’ve already seen csv.DictReader, so why bother learning something new? Well, consider the points below:
Easy to Read: JSON is text-based and looks similar to Python dictionaries, making it simple to understand.
Language Independent: JSON can be used with many programming languages, not just Python, which makes it a versatile choice for data interchange.
Supports Complex Structures: JSON can handle nested data structures, such as lists and dictionaries, allowing you to represent more complex data easily.
Convinced? Me neither! Let me share the fact that turned me into a JSON believer:
JSON is the standard format for data exchange on the web. Understanding JSON empowers you to interact with APIs and web services, allowing you to build dynamic applications that can communicate with other platforms seamlessly! It’s like learning the secret language of the internet!
9.6.2. Getting started with the json module#
The json module in Python provides simple functions for working with JSON data. Here’s what you’ll need to know:
Importing the JSON Module: Before we use the json module, we need to import it into our Python script.
Reading JSON Files: We can read data from a JSON file and convert it into Python objects using the json.load() function.
Extracting Information: Once the data is loaded into Python, we can easily access and manipulate it.
Let’s look at an example. We will be working with the fish.json file.
# importing the json module
import json
with open('assets/fish.json') as fish_file:
    fishes: List[Dict[str, str]] = json.load(fish_file)  # load the json data into a Python list
for fish_dict in fishes:
    print(fish_dict)
{'Species': 'Clownfish', 'Color': 'Orange and white', 'Habitat': 'Coral reefs'}
{'Species': 'Guppy', 'Color': 'Varied', 'Habitat': 'Freshwater streams'}
{'Species': 'Angelfish', 'Color': 'Black, white, and yellow', 'Habitat': 'Freshwater rivers'}
{'Species': 'Betta', 'Color': 'Varied', 'Habitat': 'Freshwater ponds'}
{'Species': 'Tetra', 'Color': 'Blue, red, and yellow', 'Habitat': 'Freshwater rivers'}
{'Species': 'Discus', 'Color': 'Blue, green, and brown', 'Habitat': 'Amazon River'}
See, we have dictionaries again!
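The json module can also go the other way: json.dump() writes Python objects to a JSON file. It is not covered in the text above, so take the sketch below as an optional extra. It writes a nested structure (a dictionary that contains a list of dictionaries) to a throwaway file and loads it back; the aquarium data is made up for the example.

```python
import json
import os
import tempfile
from typing import Any, Dict

# A nested structure: a dictionary containing a list of dictionaries
aquarium: Dict[str, Any] = {
    "name": "Tank 1",
    "heated": True,
    "fishes": [
        {"Species": "Guppy", "Color": "Varied"},
        {"Species": "Betta", "Color": "Varied"},
    ],
}

demo_path: str = os.path.join(tempfile.mkdtemp(), "aquarium.json")

# Write the data to a JSON file; indent=2 makes the file easier to read
with open(demo_path, "w") as aquarium_file:
    json.dump(aquarium, aquarium_file, indent=2)

# Load it back into Python objects
with open(demo_path) as aquarium_file:
    loaded: Dict[str, Any] = json.load(aquarium_file)

print(loaded["fishes"][0]["Species"])  # access nested data step by step
print(loaded == aquarium)              # the data survived the round trip
```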
9.7. Exercises#
Let’s practice! Mind that each exercise is designed with multiple levels to help you progressively build your skills. Level 1 is the foundational level, designed to be straightforward so that everyone can successfully complete it. In Level 2, we step it up a notch, expecting you to use more complex concepts or combine them in new ways. Finally, in Level 3, we get closest to exam level questions, but we may use some concepts that are not covered in this notebook. However, in programming, you often encounter situations where you’re unsure how to proceed. Fortunately, you can often solve these problems by starting to work on them and figuring things out as you go. Practicing this skill is extremely helpful, so we highly recommend completing these exercises.
For each of the exercises, make sure to add a docstring and type hints, and do not import any libraries unless specified otherwise.
9.7.1. Exercise 1#
Level 1: Let’s practice what we’ve just learned. For this exercise, you are tasked with writing a function called add_to_shopping_list() that takes a list of strings as input, where each string represents a shopping item. You need to add these items, one by one, onto a new line in a text file called shopping_list.txt, which should be saved in the assets folder. Each time a new list of items is added to the file, the function should also print the entire content of the file to the terminal.
Example input: you pass this argument to the parameter in the function call.
items_to_add: List[str] = ["Apples", "Bananas", "Bread"]
Example output (both in the terminal and in the text file):
Apples
Bananas
Bread
# TODO.
Level 2: Modify the add_to_shopping_list() function so that it has an extra parameter, append. If append is set to False, the function should overwrite the existing shopping_list.txt file instead of appending to it. Consequently, if it is True, it should append to the file.
Example input:
shopping_list: List[str] = ['cat', 'dog', 'monkey']
append: bool = False
Example output:
'cat'
'dog'
'monkey'
# TODO.
Level 3: Further enhance the add_to_shopping_list() function to include error handling. If an invalid input is provided (e.g., a non-list type), the function should raise a ValueError with an appropriate message. Additionally, print a confirmation message every time items are successfully added.
Example input:
shopping_list: Tuple[str, ...] = ("Apples", "Bananas", "Bread")
append: bool = True
Example output:
ValueError: Input must be a list of strings.
# TODO.
9.7.2. Exercise 2#
Level 1: For this exercise, you are provided with a CSV file called flowers.csv stored in the assets folder. The file contains information about different types of flowers, with columns “Name” and “Color”. Write a function called read_flowers_from_csv() that reads the CSV file and returns its content as a list of dictionaries.
Example output:
[
{"Name": "Rose", "Color": "Red"},
{"Name": "Sunflower", "Color": "Yellow"},
{"Name": "Tulip", "Color": "Pink"}
]
# TODO.
Level 2: Modify the read_flowers_from_csv() function to return the list only if the CSV file contains data. If it’s empty, return an empty list.
# TODO.
Level 3: Enhance the read_flowers_from_csv() function to allow filtering by color. The function should take a parameter, filter_color, and only return the flowers that match the given color.
Example input:
filter_color: str = 'Yellow'
Example output:
[
{"Name": "Sunflower", "Color": "Yellow"}
]
# TODO.
9.7.3. Exercise 3#
Level 1: You are provided with a JSON file called pokemons.json stored in the assets folder. The file contains a list of Pokémon, with each Pokémon having a "name" and "type". Write a function called read_pokemons_from_json() that reads the JSON file and returns its content as a list of dictionaries.
Example output:
[
{"name": "Pikachu", "type": "Electric"},
{"name": "Charmander", "type": "Fire"},
{"name": "Bulbasaur", "type": "Grass"},
{"name": "Squirtle", "type": "Water"},
{"name": "Raichu", "type": "Electric"},
{"name": "Vulpix", "type": "Fire"},
{"name": "Jolteon", "type": "Electric"},
{"name": "Oddish", "type": "Grass"},
{"name": "Poliwag", "type": "Water"},
{"name": "Growlithe", "type": "Fire"},
{"name": "Electabuzz", "type": "Electric"},
{"name": "Tentacool", "type": "Water"}
]
# TODO.
Level 2: Modify the read_pokemons_from_json() function to add another feature. If a Pokémon’s type appears more than once in the list (i.e., multiple Pokémon of the same type exist), group the Pokémon by their type. The function should return a dictionary where the keys are the types, and the values are lists of Pokémon with that type.
Expected output:
{
"Electric": [
{"name": "Pikachu", "type": "Electric"},
{"name": "Raichu", "type": "Electric"},
{"name": "Jolteon", "type": "Electric"},
{"name": "Electabuzz", "type": "Electric"}
],
"Fire": [
{"name": "Charmander", "type": "Fire"},
{"name": "Vulpix", "type": "Fire"},
{"name": "Growlithe", "type": "Fire"}
],
"Grass": [
{"name": "Bulbasaur", "type": "Grass"},
{"name": "Oddish", "type": "Grass"}
],
"Water": [
{"name": "Squirtle", "type": "Water"},
{"name": "Poliwag", "type": "Water"},
{"name": "Tentacool", "type": "Water"}
]
}
# TODO.
Level 3: Enhance the read_pokemons_from_json() function to include an additional parameter, sort_by_name. If sort_by_name=True, the function should return the Pokémon grouped by type, but also sorted alphabetically by their name within each type group.
Example input:
sort_by_name = True
Example output:
{
"Electric": [
{"name": "Electabuzz", "type": "Electric"},
{"name": "Jolteon", "type": "Electric"},
{"name": "Pikachu", "type": "Electric"},
{"name": "Raichu", "type": "Electric"}
],
"Fire": [
{"name": "Charmander", "type": "Fire"},
{"name": "Growlithe", "type": "Fire"},
{"name": "Vulpix", "type": "Fire"}
],
"Grass": [
{"name": "Bulbasaur", "type": "Grass"},
{"name": "Oddish", "type": "Grass"}
],
"Water": [
{"name": "Poliwag", "type": "Water"},
{"name": "Squirtle", "type": "Water"},
{"name": "Tentacool", "type": "Water"}
]
}
# TODO.
Material for the VU Amsterdam course “Introduction to Python Programming” for BSc Artificial Intelligence students. These notebooks are created using the following sources:
Learning Python by Doing: This book, developed by teachers of TU/e Eindhoven and VU Amsterdam, is the main source for the course materials. Code snippets or text explanations from the book may be used in the notebooks, sometimes with slight adjustments.
Python for Text Analysis: For this particular notebook on working with files, we’ve drawn inspiration from the VU Master’s course Python for Text Analysis offered by the Humanities department.