06 - Files and modules

Files

Python makes reading and writing files convenient. The built-in open() function returns an object representing a file, much like fopen() in C. Its arguments are the filename and the file access mode (r - read, w - write (overwrites existing content), a - append to the end of the file, r+ - read and write):

file_r = open('data.txt', 'r')      # opens a file for reading
data = file_r.read()                # reads the whole file and saves to a string variable
file_r.close()                      # closes the file

file_w = open('new_data.txt', 'w')  # opens a new file for writing
file_w.write(data)                  # writes contents of a variable to a file
file_w.close()                      # closes the second file
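The same copy operation can also be sketched with the with statement, which closes the file automatically. The temporary directory and the sample text below are assumptions made only so that the example runs on its own:

```python
import os
import tempfile

# Work in a temporary directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "data.txt")
copy_path = os.path.join(workdir, "new_data.txt")

# Create a small input file to copy from (sample content, not from the course).
with open(source_path, "w") as file_w:
    file_w.write("first line\nsecond line\n")

# The file is closed automatically when the with block ends,
# even if an exception is raised inside it.
with open(source_path, "r") as file_r:
    data = file_r.read()

with open(copy_path, "w") as file_w:
    file_w.write(data)

with open(copy_path, "r") as file_r:
    copied = file_r.read()

print(copied == data)   # the copy holds the same text
print(file_r.closed)    # the file object is closed after the block
```

With this form there is no close() call to forget, which is why it is usually preferred in longer scripts.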

The text can be easily split into a list of strings, each containing a single line:

file = open('data.txt', 'r')

list_of_lines = file.read().splitlines()

for line in list_of_lines:
    print(line)

file.close()
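As an alternative to reading the whole file at once, a file object can be iterated directly, one line at a time. This is a minimal sketch; the sample file is created here only for illustration:

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch is runnable.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

lines = []
with open(path, "r") as f:
    for line in f:                       # the file object yields one line at a time
        lines.append(line.rstrip("\n"))  # drop the trailing newline character

print(lines)
```

Unlike splitlines(), lines read this way still end with the newline character, so it is stripped explicitly.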

Data is often stored in text files, with values separated by a specific delimiter character, for example:

1;2;3;4
45;34;12;32;54;21
4;5;6332;23;2

In the example above, consecutive numbers are separated with a semicolon (;). The easiest way to separate a line into multiple elements is the split() string method, which returns a list of strings. The example below reads the file line by line and sums the numbers in each line.

file = open('numbers_to_sum.txt', 'r')

for line in file.read().splitlines():   # for each line in file
    numbers_strings = line.split(';')   # split the line into a list of strings
                                        # the strings have to be converted to numeric values

    numbers = []                        # create an empty list

    for nbr in numbers_strings:         # for each element in the list of strings
        numbers.append(int(nbr))        # convert the string to an integer and add to the list

    print(sum(numbers))                 # sum all the integers

file.close()

10
198
6366
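The inner conversion loop can be written more compactly with a list comprehension. The sketch below first writes the sample data to a temporary file (an assumption, so that it runs without numbers_to_sum.txt) and collects the sums in a list:

```python
import os
import tempfile

# Recreate the sample data in a temporary file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "numbers_to_sum.txt")
with open(path, "w") as f:
    f.write("1;2;3;4\n45;34;12;32;54;21\n4;5;6332;23;2\n")

sums = []
with open(path, "r") as f:
    for line in f.read().splitlines():
        # split the line and convert every piece to an integer in one step
        numbers = [int(nbr) for nbr in line.split(";")]
        sums.append(sum(numbers))

print(sums)
```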

Modules

A lot of useful functionality is not built into the base interpreter but is available in the form of modules. A module is a library containing a specific set of functions (similar to C/C++ libraries). Some modules are included in the default Python installation; others have to be installed using the pip package manager. It is also possible to write custom modules. Usually, modules are loaded at the beginning of the script using the import statement, for example:

import os
import glob

# [...]

After importing a library, all its functionality is available under the object of the same name (library_name.function_name()), for example os.chdir().
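A short sketch of the common import forms, using only standard-library modules. The alias osp and the file name data.txt are hypothetical, chosen just for illustration:

```python
import os                      # names are accessed as os.<name>
import os.path as osp          # hypothetical alias, chosen only for this sketch
from glob import glob          # imports the glob() function itself, not the module

current_dir = os.getcwd()                       # current working directory
text_files = glob("*.txt")                      # .txt files in that directory (may be empty)
full_path = osp.join(current_dir, "data.txt")   # path built in a portable way

print(current_dir)
print(text_files)
print(full_path)
```

Note that after from glob import glob the function is called as glob(), while after import glob it would be called as glob.glob().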

`requests` module

requests is a quick and simple library for HTTP access (HTTP is the protocol used by web browsers). It is not part of the standard library and has to be installed with pip. Two commands are enough to download and print the source code of a webpage:

import requests

req = requests.get("http://google.com")
print(req.text)

Many services are available as web APIs, where the response contains the requested information as data rather than as a web page.
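Web APIs usually take their arguments as query parameters in the URL. The sketch below shows one possible way to pass them with requests; the fetch_json helper and the example.com address are hypothetical placeholders, and the request is only prepared, not sent, so no network access is needed:

```python
import requests


def fetch_json(url, params=None):
    """Hypothetical helper: send a GET request and return the decoded JSON body."""
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()      # raise an exception for 4xx/5xx responses
    return response.json()


# Preparing a request without sending it shows how params end up in the URL.
# The address below is a placeholder, not a real service.
prepared = requests.Request(
    "GET", "https://example.com/search", params={"q": "python", "page": 2}
).prepare()
print(prepared.url)
```

Passing a dictionary as params lets the library handle the ? and & separators and any required escaping.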

NumPy

NumPy is a popular library for convenient matrix operations, providing functionality very similar to MATLAB.

Import the library at the beginning of the script using the following command:

import numpy as np

This command differs from the previously used import statements: the library is imported under the name np to shorten the code that follows. All its functions are therefore accessed as np.function_name, not numpy.function_name. The np alias for numpy is a near-universal convention. Inventing your own abbreviation is strongly discouraged, as it makes the code harder for others to read.

Creating matrices

The easiest way to create a NumPy matrix is to convert a Python list using the np.array() function. For 1-dimensional matrices (vectors), individual elements are accessed the same way as in Python lists. Remember that indexing, contrary to MATLAB, is 0-based:

a = np.array([-1, 3.14, 0]) # creates a 1D matrix (vector)
print("Dimensions:", a.shape)
print(a)
a[0] = 5
print(a)

Similarly, it is possible to create a 2D matrix based on a list of lists (rows):

b = np.array([[10, 20, 30], [41, 51, 61]]) # creates a 2 by 3 matrix
print("Dimensions:", b.shape)
print("Number of rows:", b.shape[0])
print("Number of columns:", b.shape[1])
print("Various elements:", b[0, 0], b[0, 1], b[1, 0])

Commonly used types of matrices can be generated using the following functions:

c = np.zeros([2, 2])         # initialized with zeros
print("zeros:")
print(c)
print()

d = np.ones([1, 2])          # initialized with ones
print("ones:")
print(d)
print()

e = np.full([2, 2], 7)       # initialized with a constant value
print("full:")
print(e)
print()

f = np.eye(4)                # identity matrix
print("eye:")
print(f)
print()

g = np.random.random([2, 3]) # random values from uniform <0...1> range
print("random:")
print(g)

The shape attribute returns a tuple, but you can access its elements the same way as with lists, using square brackets.
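Because shape is a tuple, it can also be unpacked into separate variables. The sketch below additionally shows slicing of whole rows and columns, which is not covered above and is included only as an illustration:

```python
import numpy as np

b = np.array([[10, 20, 30], [41, 51, 61]])

rows, cols = b.shape        # tuple unpacking
print(rows, cols)
print(b.ndim)               # number of dimensions
print(b.size)               # total number of elements

first_row = b[0, :]         # all columns of row 0
second_col = b[:, 1]        # all rows of column 1
print(first_row)
print(second_col)
```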

It is often necessary to create a vector of evenly spaced values. Two functions can be used for that purpose: np.arange(start, stop, step) and np.linspace(start, stop, num):

x = np.arange(10, 30, 5)           # vector from 10 to 30 (right-open interval), with step of 5
print("arange:", x)
print()

y = np.linspace(0, 2*np.pi, 10)    # vector from 0 to 2pi (closed interval), with 10 elements
print("linspace:", y)

Basic math operations

With NumPy arrays, basic math operations (addition, subtraction, multiplication, division, exponentiation) are performed element-wise with the standard operators. The input matrices must have compatible sizes, and the result is returned as a new array. Array-aware versions of common functions, such as np.sin and np.sqrt, are also available. The full list of math routines can be found in the documentation: https://docs.scipy.org/doc/numpy/reference/routines.math.html

a = np.array([20, 30, 40, 50])
b = np.arange(0, 4)
c = a - b
print("subtraction:")
print(c)

print("exponentiation by a scalar:")
print(b**2)

print("value of function 10*sin(a):")
print(10*np.sin(a))
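What "compatible sizes" means in practice can be sketched as follows: equal shapes always work, a scalar combines with every element, and a vector can be combined with each row of a matrix when its length matches the number of columns (NumPy calls this broadcasting). The arrays below are arbitrary examples:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)

print(m + 1)        # a scalar is combined with every element
print(m + row)      # the row is added to each row of m (broadcasting)

try:
    m + np.array([1, 2])           # shape (2,) does not match the last axis (3)
except ValueError as err:
    print("incompatible shapes:", err)
```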

Contrary to MATLAB, the * operator used with NumPy arrays performs per-element multiplication. To perform matrix multiplication, use the @ operator:

A = np.array([[1, 0],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])
              
print("per-element multiplication:")
print(A * B)
print()

print("matrix multiplication:")
print(A @ B)
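The @ operator also works for matrix-vector products. A minimal sketch, reusing the matrix B from above with an arbitrary vector v:

```python
import numpy as np

B = np.array([[2, 0],
              [3, 4]])
v = np.array([1, 2])

print(B @ v)              # matrix-vector product: [2*1 + 0*2, 3*1 + 4*2]
print(np.matmul(B, v))    # function form of the @ operator
print(B.T)                # transpose of B
print(B @ B.T)            # product of B with its transpose
```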

Data plotting - Matplotlib

Matplotlib is a popular library for generating plots in Python. It is tightly integrated with NumPy, and its interface is close to the one used for plotting in MATLAB. Usually matplotlib.pyplot is imported as plt:

import matplotlib.pyplot as plt

Check the example below:

x = np.linspace(0, 10, 1000)
y = x**2

plt.figure() # creates a new plot
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(["y=x^2"])
plt.show() # show the plot window

You should get the following result:

[Figure: plot of y = x^2 with a legend]

Note that the legend() function accepts a list of labels, even if only a single one is passed.

Contrary to MATLAB, consecutive plot calls add to the existing figure instead of overwriting it. Plot formatting is very similar to MATLAB syntax; the full description can be found in the library documentation: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.plot.html

x = np.linspace(0, 10, 1000)

plt.figure()
plt.plot(x, 20*np.sin(x), 'r')
plt.plot(x[::50], x[::50]**2, '^k')
plt.plot(x, 5*np.cos(x), '--g')
plt.legend(["y=20sin(x)", "y=x^2", "y=5cos(x)"])
plt.show()
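Instead of passing a list to legend(), each series can carry its own label through the label keyword of plot(). The sketch below makes two assumptions for the sake of running anywhere: it selects the non-interactive Agg backend and saves the figure to a temporary file instead of opening a window. In an interactive session, the backend line can be dropped and plt.show() used as before:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")            # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

plt.figure()
plt.plot(x, 20*np.sin(x), 'r', label="y=20sin(x)")
plt.plot(x, 5*np.cos(x), '--g', label="y=5cos(x)")
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.legend()                     # collects the label= values given above

# Hypothetical output location, chosen only for this sketch.
output_path = os.path.join(tempfile.mkdtemp(), "plot.png")
plt.savefig(output_path)
plt.close()
print(os.path.exists(output_path))
```

Keeping the label next to the data it describes makes it harder for the legend order to drift out of sync with the plot calls.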

It is possible to pass a matrix as the plot's y values. Each column is then treated as a separate data series:

plt.figure()
plt.plot(np.random.random((30, 2)))
plt.show()

It is also possible to create figures with several subplots:

t = np.linspace(-np.pi, np.pi, 100)
plt.figure()
plt.subplot(1, 2, 1)
plt.plot(t, np.sin(t))
plt.subplot(1, 2, 2)
plt.plot(t, np.cosh(t))
plt.show()

Final assignments 🔥 🔨

🔨 🔥 Files 🔥 🔨

An input file with students' test grades is given: students.txt. Read the grades and calculate the final grade (the average of the test grades).
Print surnames, names and final grades in the following format:

Doe Jane: 4.5
Jobs Dave: 2.0
Best Stephen: 2.0

🔨 🔥 Requests 🔥 🔨

Your dormitory roommate mines Bitcoin. Write a script for him that checks the current BTC exchange rates against common currencies. Use the HTTP API available at https://blockchain.info/. The link for checking a specific currency is:

https://blockchain.info/tobtc?currency=USD&value=1

Check the output in a web browser.

The link is constructed as follows: https://blockchain.info/tobtc?currency= + currency_symbol + &value= + converted_amount. Your script should check the amount of BTC you can buy for 1 unit of each of the following currencies: USD, EUR, RUB, GBP, CHF, and print the output to the console:

1 USD to BTC: 0.0001249
1 EUR to BTC: 0.00013677
1 RUB to BTC: 0.00000192
1 GBP to BTC: 0.0001536
1 CHF to BTC: 0.00012545

Avoid repetitive code, use loops where possible.

🔨 🔥 NumPy, Matplotlib 🔥 🔨

Create a vector of x values from -5 to 5 (inclusive), with a step of 0.1.
For those x arguments, calculate a series of Gaussian curves described by the following formula:

[Image: Gauss equation]

With several sets of parameters:

[Image: Gauss parameters]

Plot the curves in a single figure.
Generate a legend label list automatically based on the parameter list. Add a legend to the figure.

Authors: Jakub Tomczyński, Tomasz Mańkowski