23. Using NumPy | LorenaGonzalezJuarez

Edit me

23.1. What are NumPy Arrays?

NumPy is a fundamental package for many math and science computations using Python. NumPy provides an array that can have N-dimensions. NumPy serves as the base for many packages that can extend its capabilities, such as, pandas and scikit-learn. It is a lean package focusing on operations on arrays. These other packages add functionality to it. However, it is a powerful package for many higher-level math operations.

Overview of NumPy

Installing and Using NumPy

Installing NumPy is as simple as using pip install:

python -m pip install numpy

Once the package is installed, it must be imported to use it. Enter into the REPL, by entering python. Once inside the REPL, standard practice is to import it as np:

import numpy as np

Arrays

NumPy operates on arrays. A 1-d array can be thought of as a Python list or as a column from a database. A multi-dimensional array can be thought of as many lists or columns in a database. While this is not exactly what the array is, it’s a useful way to imagine them. If you create arrays using the array module, all elements of the array must be of the same type.

Create Array

Let’s create an array that holds the first ten integers:

>>> import numpy as np
>>> simple_array = np.arange(10)
>>> simple_array
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>

We can also do the same thing, but instead of integers we’ll use floats by using dtypes:

>>> float_array = np.arange(10, dtype=float)
>>> float_array
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>>

It is possible to change the type of the elements in an array using astype. Let’s convert our float array to complex numbers:

>>> complex_array = float_array.astype(complex)
>>> complex_array
array([0.+0.j, 1.+0.j, 2.+0.j, 3.+0.j, 4.+0.j, 5.+0.j, 6.+0.j, 7.+0.j,
       8.+0.j, 9.+0.j])
>>>

We can also create a NumPy array for a list or data file.

Let’s load int_list as an array named example_array:

int_list = [1297, 603, 1071, 539, 1222, 1424, 986, 397, 970, 1102, 499, 533, 908, 559, 386, 1183, 595, 69, 1141, 76, 863, 1343, 185, 895, 1312, 50, 918, 677, 394, 629, 1317, 944, 466, 751, 1050, 301, 415, 784, 19, 1395, 1223, 979, 252, 1155, 59, 107, 632, 995, 972, 867, 332, 751, 810, 50, 55, 218, 997, 1085, 475, 1494]
>>> int_list = [1297, 603, 1071, 539, 1222, 1424, 986, 397, 970, 1102, 499, 533, 908, 559, 386, 1183, 595, 69, 1141, 76, 863, 1343, 185, 895, 1312, 50, 918, 677, 394, 629, 1317, 944, 466, 751, 1050, 301, 415, 784, 19, 1395, 1223, 979, 252, 1155, 59, 107, 632, 995, 972, 867, 332, 751, 810, 50, 55, 218, 997, 1085, 475, 1494]
>>> int_array = np.array(int_list)
>>> int_array.dtype
dtype('int64')
>>> int_array
array([1297,  603, 1071,  539, 1222, 1424,  986,  397,  970, 1102,  499,
        533,  908,  559,  386, 1183,  595,   69, 1141,   76,  863, 1343,
        185,  895, 1312,   50,  918,  677,  394,  629, 1317,  944,  466,
        751, 1050,  301,  415,  784,   19, 1395, 1223,  979,  252, 1155,
         59,  107,  632,  995,  972,  867,  332,  751,  810,   50,   55,
        218,  997, 1085,  475, 1494])
>>>

Let’s load the same information from a file named scores.csv. We will use np.genfromtxt function to read the file and create an array out of its content. First, let’s download the scores.csv file we want to use:

$ curl -O https://raw.githubusercontent.com/linuxacademy/content-using-pythons-maths-science-and-engineering-libraries/master/scores.csv 
>>> scores_array = np.genfromtxt('scores.csv', delimiter = ',', dtype=int)
>>> scores_array
array([1297,  603, 1071,  539, 1222, 1424,  986,  397,  970, 1102,  499,
        533,  908,  559,  386, 1183,  595,   69, 1141,   76,  863, 1343,
        185,  895, 1312,   50,  918,  677,  394,  629, 1317,  944,  466,
        751, 1050,  301,  415,  784,   19, 1395, 1223,  979,  252, 1155,
         59,  107,  632,  995,  972,  867,  332,  751,  810,   50,   55,
        218,  997, 1085,  475, 1494])
>>>

Index and Slice an Array

Slicing an array in NumPy looks like slicing a list in Python.

From scores_array, let’s count the elements by using .size:

>>> scores_array.size
60
>>>

Let’s make an array with the 2nd, 3rd, and 4th element of scores_array. It should be equal to [1071, 539, 1222]:

>>> scores_array[2:5]
array([1071,  539, 1222])
>>>

As you can see, the indexing and slicing works the same here as it does in basic Python.

Basic Operations on NumPy Arrays

Information on these operations can be found here.

Printing

Printing an array is as simple as using print:

>>> print(scores_array)
[1297  603 1071  539 1222 1424  986  397  970 1102  499  533  908  559
  386 1183  595   69 1141   76  863 1343  185  895 1312   50  918  677
  394  629 1317  944  466  751 1050  301  415  784   19 1395 1223  979
  252 1155   59  107  632  995  972  867  332  751  810   50   55  218
  997 1085  475 1494]
>>>

Note this array doesn’t possess commas as the genfromtxt() does.

Simple Math Operations

In NumPy, math operations are elementwise. That is, NumPy applies the math operand to the zeroth element of each array, and that is the zeroth element of a new array. It does this for each element in the arrays. You can perform all basic math operands on these arrays.

Let’s raise each element in the scores_array to the zero power:

>>> scores_array ** 0
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
>>> scores_array
array([1297,  603, 1071,  539, 1222, 1424,  986,  397,  970, 1102,  499,
        533,  908,  559,  386, 1183,  595,   69, 1141,   76,  863, 1343,
        185,  895, 1312,   50,  918,  677,  394,  629, 1317,  944,  466,
        751, 1050,  301,  415,  784,   19, 1395, 1223,  979,  252, 1155,
         59,  107,  632,  995,  972,  867,  332,  751,  810,   50,   55,
        218,  997, 1085,  475, 1494])
>>>

As you can see, the resulting array is an array of 1’s. Remember: Raising a number to the exponent 0 results in 1. You can also see that this created a new array and did not affect the original array.

What Did We Learn?

In this lesson, we looked at NumPy arrays. We learned to create them by various means, including reading a CSV file. We learned how to change the type stored in the array and how to index and slice. Finally, we discussed basic operations on arrays, such as printing and simple math. For more information on NumPy arrays, take a look at the official documentation.

23.2 Reshaping a Numpy Array Into a Matrix

NumPy has a matrix function, but the documentation indicates that this function is being deprecated, and using multi-dimension arrays is strongly suggested instead. We will be doing just that in this lesson: creating multi-dimension arrays that are matrices.

Arrays, or vectors, are one-dimensional. That is, they have a single column of items. A matrix is a rectangular arrangement of n rows and m columns, where n and m are greater than one. (For example: A 1 by 5 matrix is an array.)

Vectors (arrays) and matrices are essential to many science, math, and engineering operations. With these, we are reminded of linear algebra.

Note: If you have never used NumPy before, we strongly suggest you see the first lesson in this section, “What are NumPy Arrays?”.

Matrices

Create a Matrix From a Single Array

A matrix can be formed from a single array by re-arranging it. From the last lesson, we looked at int_list as an array. Remember to import NumPy.

>>> import numpy as np
>>> int_list = [1297, 603, 1071, 539, 1222, 1424, 986, 397, 970, 1102, 499, 533, 908, 559, 386, 1183, 595, 69, 1141, 76, 863, 1343, 185, 895, 1312, 50, 918, 677, 394, 629, 1317, 944, 466, 751, 1050, 301, 415, 784, 19, 1395, 1223, 979, 252, 1155, 59, 107, 632, 995, 972, 867, 332, 751, 810, 50, 55, 218, 997, 1085, 475, 1494]
>>> int_array = np.array(int_list)
>>> print(int_array)
[1297  603 1071  539 1222 1424  986  397  970 1102  499  533  908  559
  386 1183  595   69 1141   76  863 1343  185  895 1312   50  918  677
  394  629 1317  944  466  751 1050  301  415  784   19 1395 1223  979
  252 1155   59  107  632  995  972  867  332  751  810   50   55  218
  997 1085  475 1494]

shape is a tuple that gives dimensions of the array.

>>> int_array.shape
(60,)

Let’s reshape the int_array into a 10 x 6 matrix:

>>> int_matrix = int_array.reshape(10, 6)
>>> print(int_matrix)
[[1297  603 1071  539 1222 1424]
 [ 986  397  970 1102  499  533]
 [ 908  559  386 1183  595   69]
 [1141   76  863 1343  185  895]
 [1312   50  918  677  394  629]
 [1317  944  466  751 1050  301]
 [ 415  784   19 1395 1223  979]
 [ 252 1155   59  107  632  995]
 [ 972  867  332  751  810   50]
 [  55  218  997 1085  475 1494]]
 >>> int_matrix
array([[1297,  603, 1071,  539, 1222, 1424],
       [ 986,  397,  970, 1102,  499,  533],
       [ 908,  559,  386, 1183,  595,   69],
       [1141,   76,  863, 1343,  185,  895],
       [1312,   50,  918,  677,  394,  629],
       [1317,  944,  466,  751, 1050,  301],
       [ 415,  784,   19, 1395, 1223,  979],
       [ 252, 1155,   59,  107,  632,  995],
       [ 972,  867,  332,  751,  810,   50],
       [  55,  218,  997, 1085,  475, 1494]])
>>>

As you can see above, while we are calling this a matrix and it will function as a matrix, it is actually an array type.

Stacking Arrays

Two functions can be used to stack arrays, vstack and hstack. vstack stacks the arrays vertically and hstack stacks the arrays horizontally:

>>> a = np.array([1, 2, 3])
>>> print(a)
[1 2 3]
>>> b = np.array([4, 5, 6])
>>> print(b)
[4 5 6]
>>> c = np.vstack((a, b))
>>> print(c)
[[1 2 3]
 [4 5 6]]
>>> d = np.hstack((a,b))
>>> print(d)
[1 2 3 4 5 6]
>>>

These functions can also be used on a matrix, however the matrices must be of the same dimensions. As an example, let’s do a horizontal stack of a, b, and c:

>>> np.hstack((a,b,c))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in hstack
  File "/Users/lfritts/.pyenv/versions/621-3.8.2/lib/python3.8/site-packages/numpy/core/shape_base.py", line 343, in hstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 2 has 2 dimension(s)

This is also an example of a well-formed error message. It is specific in describing the problem. a (index 0) has one dimension, and c (index 2) has two dimensions.

The concatenate function

concatenate joins arrays and/or matrices along an axis. Axis 0 stacks them horizontally, axis 1 stacks them vertically, and axis None flattens them into an array.

axis = None

>>> a = np.array([[1, 2], [3, 4]])  # this is a matrix
>>> b = np.array([[5, 6]])
>>> np.concatenate((a,b), axis=None) # should flatten them
array([1, 2, 3, 4, 5, 6])
>>>

axis = 0

>>> np.concatenate((a,b), axis = 0)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>>

Notice the element in b was appended to the items in a. But what would happen if b = [5] instead of b = [5, 6]? An error would occur.

>>> b = np.array([5])
>>> np.concatenate((a,b), axis = 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
>>>

Remember, the dimensions must be the same along the axis you are joining.

axis = 1

>>> b = np.array([[5, 6]])
>>> np.concatenate((a,b), axis = 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2 and the array at index 1 has size 1
>>>

Please take a minute and determine what the problem is before reading on for the fix.

>>> b = np.array([[5, 6], [7, 8]])
>>> np.concatenate((a,b), axis = 1)
array([[1, 2, 5, 6],
       [3, 4, 7, 8]])
>>>

What Did We Learn

In this lesson, we learned various ways of making matrices and ways to concatenate matrices. It is important to consider the dimensions when concatenating arrays or matrices, regardless of the method.

23.3. Math Operations on Arrays/Matrices

NumPy allows for simple math and linear algebra operations to be used on arrays and matrices. In this lesson, we will look at simple math operands. Here are some linear algebra operations.

Note: If you have never used NumPy before, we strongly suggest you see the first two lessons in this section, What are NumPy Arrays? and Reshaping a Numpy Array into a Matrix.

Math Operands on Arrays and Matrices

Let’s try addition on two arrays. Note that these arrays need the same amount of elements in both arrays when adding:

>>> a = np.array([1, 2, 3, 4, 5])
>>> b = np.array([10, 20, 30, 40, 50])
>>> c = a + b
>>> print(c)
[11 22 33 44 55]
>>>

You can also perform operations between an array and a single value:

>>> print(a)
[1 2 3 4 5]
>>> a / 2.5
array([0.4, 0.8, 1.2, 1.6, 2. ])
>>>

Notice the ints in the array were changed to floats.

Linear Algebra

In this lesson, we looked at simple math operands. You can also do linear algebra on arrays and matrices. For more information, see here.

Hands-On Lab 23: Creating a Matrix Using NumPy Arrays

Tags: