Why do I need to know about arrays?

In the first tutorials, we did nothing more than use Python as a glorified calculator. To really see the power of computational mathematics, we have to start thinking about what a computer can do that humans can't or, more accurately in this case, don't want to do. Computers are machines, so they are very good at doing lots of things very quickly. Because of this, they are ideal for working with many numbers at once. Say, for instance, we have an equation which gives the final price of a stock option:

$$ P_{\mathrm{final}} = P_{\mathrm{start}}e^{kt}, \mbox{ where } k \mbox{ is a constant}$$

For a given starting price, I can use the equation to compute the final price. But perhaps I need the final price for a whole set of starting prices. Or perhaps I want to know how the value of a given option increases as a function of time, i.e. I need to know the value of $P_{\mathrm{final}}$ for a whole set of values of $t$. In these cases, I wouldn't want to do all of the separate calculations by hand.

Using numpy for arrays

In [1]:
import numpy as np

As before we import the numpy library (most of the code you write will have this line at the start!). We have already used some of numpy's functions (np.sin and np.cos etc.), but now we are going to use numpy to build 'arrays' (basically lists of numbers, or vectors).

The basic command to build an array is

np.array( [ 1, 2 , 3, 4 ] )

Note that the spacing doesn't matter; I've spaced this out so you can see the key components. A one-dimensional array (i.e. a row or column vector) is enclosed in [square] brackets, and the elements (i.e. the individual numbers in my array) are separated by commas.

Let's define a few arrays to use:

In [3]:
y=np.array([2,4,6,8])     # vector with 4 elements
z=np.array([3,4,5,7])     # vector with 4 elements
t=np.array([1,2,3,4,5])   # vector with 5 elements

We can add or subtract two arrays of the same size

In [4]:
print(y+z)
print(z-y)
[ 5  8 11 15]
[ 1  0 -1 -1]

But we'll get an error if we use differently sized arrays

In [5]:
print(t+z)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-bd7d45739d8d> in <module>()
----> 1 print(t+z)

ValueError: operands could not be broadcast together with shapes (5,) (4,) 

Other simple operations include adding or multiplying (equivalently, subtracting or dividing) by a constant. y+2 adds 2 to each element in the vector while z*3 multiplies each element by 3.
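As a quick sketch of these operations with a constant, here are the arrays y and z from above (redefined so the snippet runs on its own):

```python
import numpy as np

y = np.array([2, 4, 6, 8])
z = np.array([3, 4, 5, 7])

print(y + 2)   # adds 2 to every element: 4, 6, 8, 10
print(z * 3)   # multiplies every element by 3: 9, 12, 15, 21
print(y - 2)   # subtracts 2 from every element: 0, 2, 4, 6
print(y / 2)   # divides every element by 2 (result is floats): 1.0, 2.0, 3.0, 4.0
```

In each case the constant is applied to every element, and the result is a new array of the same length.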

We can also compute the elementwise product of two arrays of the same size using the * operator. For two arrays of length $n$: $(a_1,\: a_2,\: \ldots,\: a_n)$ and $(b_1,\: b_2,\: \ldots,\: b_n)$ the elementwise product is the vector $(a_1\times b_1, \: a_2 \times b_2 , \: \ldots, \: a_n \times b_n)$.

In [6]:
print(y*z)
[ 6 16 30 56]

(Note that this is different from the dot product of two vectors, which you study in linear algebra. You can do that in Python, but we won't look at it in this course.)

Any numpy function (np.sin, np.log, etc.) can be applied to each element of an array.

For example, taking

v=np.array([ 1, 2, 3])
np.sin(v)

gives back an array: $\quad [\sin(1), \: \sin(2), \: \sin(3)]$

In [8]:
v=np.array([1,2,3])
print(v)
print(np.sin(v))
[1 2 3]
[ 0.84147098  0.90929743  0.14112001]
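The same idea applies to the option-price formula from the start of this tutorial, $P_{\mathrm{final}} = P_{\mathrm{start}}e^{kt}$. The sketch below uses np.exp to evaluate it for a whole set of times at once. The values of k, P_start and t are made up purely for illustration.

```python
import numpy as np

# Illustrative values only (assumptions, not real data)
k = 0.05                           # the constant k in the formula
P_start = 100.0                    # a single starting price
t = np.array([0, 1, 2, 3, 4, 5])   # a set of times

# One line computes the final price for every time in t
P_final = P_start * np.exp(k * t)
print(P_final)
```

The result is an array with one final price per entry of t, so the first entry (t = 0) is just the starting price.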

Array elements

Having all the information packaged up into one convenient vector is very useful, but sometimes we need to pick out particular elements of vectors. To do this we use square brackets to reference the elements we want. For instance

In [5]:
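x = np.array([1, 10, 100, 1000])
print(x[0])     # first element
print(x[3])     # fourth element
print(x[-1])    # last element
y = np.sin(x)   # note: this replaces the y defined earlier
print(y[2])     # third element of y, i.e. sin(100)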
1
1000
1000
-0.5063656411097588

The first element in an array x is referenced by x[0]. This might seem counterintuitive, but you will see the same convention a lot in mathematical language too: $x_n = 10^n \mbox{ for } n \in \{0,1,2,3\}$ gives $x_0, x_1, x_2$ and $x_3$, which corresponds exactly to what is encoded above. Notice the rather strange looking x[-1], though. This refers to the last element in the array. In this case, then, x[3] and x[-1] are the same element. (You can also use x[-2] to mean the second-to-last element, and so on.)
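A short sketch of counting from the end, using the same array x as above. The last few lines also show what happens if we ask for an element that doesn't exist: because counting starts at 0, a 4-element array has no x[4].

```python
import numpy as np

x = np.array([1, 10, 100, 1000])

print(x[-2])           # second-to-last element: 100
print(x[3] == x[-1])   # both refer to the last element: True
print(len(x))          # number of elements: 4

# Valid indices run from 0 to len(x)-1, so x[4] is out of range
try:
    x[4]
except IndexError as err:
    print("IndexError:", err)
```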

Two very useful commands

So far we have been constructing arrays by typing in the values we want. However, ordinarily if we want to compute the result of a calculation for different inputs, we want many more inputs than we would type in by hand.

For instance, if I want to know the value of $\sin(x)$ for 100 values of $x$ between 0 and $2\pi$, I don't want to work out what the spacing between the points is, calculate the values of $x$ and type them all into an array.

Thankfully numpy has two built in functions which can help us build arrays. They do similar but subtly different things.

The first is called np.linspace which builds an array with a particular number of points between two limits. The input variables for linspace are the lower limit, the upper limit and the number of points: so if I want 100 equally spaced points for $x$ between 0 and $2\pi$ I type

x=np.linspace(0,2*np.pi,100)

The points are equally (or linearly) spaced, hence the name np.linspace. This is really good if we know the range of values that we want and don't particularly care about the spacing.

In [12]:
x = np.linspace(0, 1, 10)
print(x)
[ 0.          0.11111111  0.22222222  0.33333333  0.44444444  0.55555556
  0.66666667  0.77777778  0.88888889  1.        ]
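If we do want to know what spacing linspace has chosen, we can check it. With n points there are n-1 gaps between them, so the spacing is (upper limit - lower limit)/(n-1), which is 1/9 here. The sketch below uses np.diff, which returns the differences between neighbouring elements. The names theta and s are just illustrative.

```python
import numpy as np

x = np.linspace(0, 1, 10)

print(len(x))                           # 10 points
print(x[1] - x[0])                      # gap between the first two points, 1/9
print(np.allclose(np.diff(x), 1 / 9))   # every gap is the same size: True

# The sin(x) example from the text: 100 points between 0 and 2*pi
theta = np.linspace(0, 2 * np.pi, 100)
s = np.sin(theta)
print(len(s))                           # 100 values of sin
```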

The second command is useful when you want to set the spacing between the values. In the above example we wanted 10 points between 0 and 1, which you might think would mean you get an array [0, 0.1, 0.2 ...]. But if you count that up, you'll realise you get to ten points at 0.9, and not 1.0.

As you can see, this isn't particularly intuitive: if we have to count in our heads how many elements there will be in the array, we're likely to get things wrong. A better route for this type of array is to use the command np.arange, which builds an array with a particular spacing between two limits. The input variables are the lower limit, the upper limit and the gap between the points. Hence, to create the array [0, 0.1, 0.2, ...] I type

x=np.arange(0,1,0.1)

However, there are some subtleties to this: executing this command I actually get

x=np.array([ 0. ,  0.1,  0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

That is, I don't get the last element, 1.0. This is because the upper limit we pass to np.arange is not included in the range. So we must instead type

x=np.arange(0,1.1,0.1)

or in fact replace '1.1' with any $N$ such that $1.0 < N \leq 1.1$. This gives me 11 values, from 0 up to and including 1.0.
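Counting the elements is a quick way to check what np.arange actually returns. The sketch below compares the two upper limits; the names a and b are just illustrative.

```python
import numpy as np

a = np.arange(0, 1, 0.1)     # upper limit 1 is excluded
b = np.arange(0, 1.1, 0.1)   # upper limit pushed just past 1.0

print(len(a), a[-1])   # 10 values, the last one is (approximately) 0.9
print(len(b), b[-1])   # 11 values, the last one is (approximately) 1.0

# linspace with 11 points gives the same values, with no need to nudge the limit
print(np.allclose(b, np.linspace(0, 1, 11)))   # True
```

When both end points matter, np.linspace(0, 1, 11) is often the less error-prone choice, since the upper limit is included and the number of points is stated directly.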

This often confuses people, but understanding the motivation might help. Typing

In [13]:
print(np.arange(10))
[0 1 2 3 4 5 6 7 8 9]

gives me 10 values. This is because the default behaviour is to start at 0, and use a spacing of 1. Think of the arange command as performing the operation

start at lower limit, move up in steps, stop before you reach the upper limit.
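The rule above can be written out as a small loop. The function arange_sketch below is a hypothetical helper for illustration only. It is not how numpy implements np.arange, and it is only meant for positive whole-number steps, which avoids any floating-point rounding.

```python
import numpy as np

def arange_sketch(lower, upper, step=1):
    """Hypothetical helper mimicking the arange rule for positive steps."""
    values = []
    current = lower                 # start at the lower limit
    while current < upper:          # stop before reaching the upper limit
        values.append(current)
        current += step             # move up in steps
    return values

print(arange_sketch(0, 10))      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(arange_sketch(2, 10, 3))   # [2, 5, 8]
print(np.arange(2, 10, 3))       # the same values as a numpy array
```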

Exercises

Once you have read this information, proceed to the exercises "2.0 Arrays" and "2.1 Debug Arrays".