4  Intro to numpy

In this lecture we will get to know and become experts in:

  1. Introduction to numpy
  2. Introduction to Simulating Probabilistic Events
import numpy as np
from numpy.random import default_rng

Introduction to numpy

NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python.

  1. Vectorized, fast mathematical operations.
  2. A key feature of NumPy is its N-dimensional array object, or ndarray.
height = [1.79, 1.85, 1.95, 1.55]
weight = [70, 80, 85, 65]

#bmi = weight/height**2  # raises a TypeError: lists do not support element-wise arithmetic
height = np.array([1.79, 1.85, 1.95, 1.55])
weight = np.array([70, 80, 85, 65])

bmi = weight/height**2
bmi
array([21.84700852, 23.37472608, 22.35371466, 27.05515088])
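
The commented-out line above fails for plain lists because Python lists do not support element-wise arithmetic. A minimal sketch of the contrast follows; the names `height_list`, `weight_list`, `bmi_loop` and `bmi_vec` are introduced here purely for illustration:

```python
import numpy as np

height_list = [1.79, 1.85, 1.95, 1.55]
weight_list = [70, 80, 85, 65]

# Plain lists: dividing one list by another raises a TypeError,
# so the element-wise computation needs an explicit loop.
try:
    weight_list / height_list
    list_division_failed = False
except TypeError:
    list_division_failed = True

bmi_loop = [w / h**2 for w, h in zip(weight_list, height_list)]

# ndarrays: the same formula is applied to all elements at once.
bmi_vec = np.array(weight_list) / np.array(height_list)**2

print(list_division_failed)
print(np.allclose(bmi_loop, bmi_vec))
```

Both routes give the same numbers; the array version is shorter and, for large inputs, typically much faster.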

Multiple Dimensions

Multiple dimensions are handled naturally by numpy, e.g.

hw1 = np.array([height, weight])
print(hw1)
print(hw1.shape)
hw2 = hw1.transpose()
print(hw2)
print(hw2.shape)
[[ 1.79  1.85  1.95  1.55]
 [70.   80.   85.   65.  ]]
(2, 4)
[[ 1.79 70.  ]
 [ 1.85 80.  ]
 [ 1.95 85.  ]
 [ 1.55 65.  ]]
(4, 2)

Accessing array elements

Accessing array elements is similar to indexing lists, but arrays also allow multidimensional indices:

print(hw2[0,1])
70.0
print(hw2[:,0])
[1.79 1.85 1.95 1.55]
print(hw2[0])
#equivalent to
print(hw2[0,:])
#shape:
print(hw2[0].shape)
[ 1.79 70.  ]
[ 1.79 70.  ]
(2,)

To select a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order:

print(hw2[[2,0,1]])
[[ 1.95 85.  ]
 [ 1.79 70.  ]
 [ 1.85 80.  ]]

Negative indices

print(hw2)
print("Using negative indices selects rows from the end:")
print(hw2[[-2,-1]])
[[ 1.79 70.  ]
 [ 1.85 80.  ]
 [ 1.95 85.  ]
 [ 1.55 65.  ]]
Using negative indices selects rows from the end:
[[ 1.95 85.  ]
 [ 1.55 65.  ]]

You can pass multiple slices just like you can pass multiple indexes:

hw2[:2,:1]
array([[1.79],
       [1.85]])

Reshaping

np.arange(32).reshape((8, 4))
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

Boolean indexing

height_gt_185 = hw2[:,0]>1.85
print(height_gt_185)
print(hw2[height_gt_185,1])
[False False  True False]
[85.]
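
Boolean masks can also be combined. The sketch below rebuilds the height/weight table under the hypothetical name `hw` and uses `&` (and) and `~` (not); note that each comparison needs its own parentheses:

```python
import numpy as np

hw = np.column_stack([np.array([1.79, 1.85, 1.95, 1.55]),
                      np.array([70, 80, 85, 65])])

# each comparison yields a boolean array; & combines them element-wise
tall_and_light = (hw[:, 0] >= 1.79) & (hw[:, 1] < 85)
print(tall_and_light)
print(hw[tall_and_light])      # rows where both conditions hold

# ~ negates a mask: heights of all remaining rows
print(hw[~tall_and_light, 0])
```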

numpy arrays cannot contain elements with different types. If you try to build such an array, some of the elements’ types are changed so that the result is homogeneous. This is known as type coercion.

print(np.array([True, 1, 2]))
print(np.array(["True", 1, 2]))
print(np.array([1.3, 1, 2]))
[1 1 2]
['True' '1' '2']
[1.3 1.  2. ]
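
To see which type numpy settled on, inspect the array's `dtype`. A small sketch; the variable names `a`, `b`, `c` are illustrative, and the exact integer width (e.g. int64 vs. int32) depends on the platform:

```python
import numpy as np

a = np.array([True, 1, 2])      # bool is promoted to integer
b = np.array(["True", 1, 2])    # everything becomes a string
c = np.array([1.3, 1, 2])       # integers are promoted to float

for arr in (a, b, c):
    # dtype.kind is a one-letter code: 'i' integer, 'U' unicode string, 'f' float
    print(arr, arr.dtype, arr.dtype.kind)
```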

Lots of extra useful functions!

np.zeros((2,3))
#np.ones((2,3))
array([[0., 0., 0.],
       [0., 0., 0.]])
np.eye(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
np.column_stack([height, weight])
array([[ 1.79, 70.  ],
       [ 1.85, 80.  ],
       [ 1.95, 85.  ],
       [ 1.55, 65.  ]])

Data Summaries in numpy

We can compute simple statistics:

print(np.mean(hw2))
print(np.mean(hw2, axis=0))
38.3925
[ 1.785 75.   ]
print(np.unique([1,1,2,1,2,3,2,2,3]))

print(np.unique([1,1,2,1,2,3,2,2,3], return_counts=True))
[1 2 3]
(array([1, 2, 3]), array([3, 4, 2], dtype=int64))
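
The `axis` argument decides which dimension is collapsed. A minimal sketch, using a hypothetical copy `hw` of the height/weight table:

```python
import numpy as np

hw = np.array([[1.79, 70.], [1.85, 80.], [1.95, 85.], [1.55, 65.]])

col_means = hw.mean(axis=0)   # collapse the rows: one value per column
row_means = hw.mean(axis=1)   # collapse the columns: one value per row
print(col_means.shape, row_means.shape)

# np.unique with return_counts=True gives a simple frequency table
values, counts = np.unique([1, 1, 2, 1, 2, 3, 2, 2, 3], return_counts=True)
freq = dict(zip(values.tolist(), counts.tolist()))
print(freq)
```

Here only `axis=0` is meaningful (mean height, mean weight); averaging a height with a weight along `axis=1` is shown solely to illustrate the shapes.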

Introduction to Simulating Probabilistic Events

Generating Data in numpy

Meet your friends, all of them methods of a random generator rng created with default_rng():

  1. rng.permutation: Return a random permutation of a sequence, or return a permuted range
  2. rng.integers: Draw random integers from low (inclusive) to high (exclusive)
  3. rng.choice: Generate a random sample from a given 1-D array
# Do this (new version)
from numpy.random import default_rng
rng = default_rng()

x= np.arange(10)
print(x)
print(rng.permutation(x))
print(rng.permutation(list('intelligence')))
[0 1 2 3 4 5 6 7 8 9]
[6 7 9 4 1 0 3 8 2 5]
['t' 'c' 'n' 'l' 'e' 'n' 'i' 'e' 'e' 'l' 'i' 'g']
print(rng.integers(0,10,5))
print(rng.integers(0,10,(5,2)))
[7 9 7 9 4]
[[9 0]
 [8 6]
 [6 7]
 [0 5]
 [1 5]]
rng.choice(x,4)
array([8, 5, 1, 4])
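
Because the draws are random, the outputs above change on every run. If reproducible results are needed, a seed can be passed to `default_rng`. A sketch under that assumption; the seed value 42 and the names `rng_a`, `rng_b`, `d6` are arbitrary choices for illustration:

```python
from numpy.random import default_rng

# two generators created with the same seed produce the same stream
rng_a = default_rng(42)
rng_b = default_rng(42)

draws_a = rng_a.integers(0, 10, 5)
draws_b = rng_b.integers(0, 10, 5)
print((draws_a == draws_b).all())

# integers excludes the upper bound unless endpoint=True is given
d6 = rng_a.integers(1, 6, 1000, endpoint=True)
print(d6.min(), d6.max())
```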

Examples:

  • Spotify playlist
  • Movie List
movies_list = ['The Godfather', 'The Wizard of Oz', 'Citizen Kane', 'The Shawshank Redemption', 'Pulp Fiction']

# pick two random movies from a list of strings (with replacement by default, so repeats are possible)
movie = rng.choice(movies_list,2)
print(movie)
['The Shawshank Redemption' 'The Godfather']
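
By default `rng.choice` samples with replacement, so the same movie could be drawn twice. For a movie night or a shuffled playlist, distinct picks are probably what is wanted. A sketch of that variant; `two_movies` and `playlist` are illustrative names:

```python
from numpy.random import default_rng

rng = default_rng()
movies_list = ['The Godfather', 'The Wizard of Oz', 'Citizen Kane',
               'The Shawshank Redemption', 'Pulp Fiction']

# replace=False guarantees two different movies
two_movies = rng.choice(movies_list, 2, replace=False)
print(two_movies)

# a full shuffle, e.g. a playlist played in random order
playlist = rng.permutation(movies_list)
print(playlist)
```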

Birthday “Paradox”

Please enter your birthday on google drive https://forms.gle/CeqyRZ4QzWRmJFvs9

How many people do you think will share a birthday? Would that be a rare, highly unusual event?

How can we find out how likely it is that across \(n\) folks in a room at least two share a birthday?

Hint: can we put our random number generators to work?

# Can you simulate n birthdays and estimate how often at least two coincide?
from numpy.random import default_rng
rng = default_rng()

n = 40        # number of people in the room
nSim = 1000   # number of simulated rooms; only determines the precision of the estimate

PossibleBDays = np.arange(1,366)  # days 1..365
# number of days shared by two or more people, one entry per simulated room:
sharedBday = np.zeros(nSim)

for i in range(nSim):
  # "draw" n random birthdays;
  # replace = True is of utmost importance: it allows for same birthdays!
  ranBdays = rng.choice(PossibleBDays, n, replace = True)
  x, cts = np.unique(ranBdays, return_counts=True)
  sharedBday[i] = np.sum(cts>1)

# fraction of simulated rooms with at least one shared birthday
np.mean(sharedBday > 0)
0.893
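
The simulated estimate can be checked against the exact probability. Assuming 365 equally likely birthdays and independent people, the chance that all n birthdays differ is the product of (365 - k)/365 for k = 0, ..., n-1, and the complement is the chance of at least one match. A sketch; the function name `prob_shared_birthday` is made up for this example:

```python
import numpy as np

def prob_shared_birthday(n, days=365):
    """Exact probability that at least two of n people share a birthday,
    assuming `days` equally likely birthdays and independent people."""
    k = np.arange(n)
    # probability that all n birthdays are distinct
    p_distinct = np.prod((days - k) / days)
    return 1 - p_distinct

print(round(prob_shared_birthday(23), 3))
print(round(prob_shared_birthday(40), 3))
```

For n = 40 the exact value is about 0.891, in line with the simulated 0.893; already at n = 23 the probability exceeds one half.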
5 != 3 #not equal
True
#Boolean indexing !!
x[cts > 1]
array([ 71, 192])
x[23]
182
#can you design a coin flip with an arbitrary probability, e.g. p = 0.25?
#simulate 365 days with a 1/4 chance of being sunny

#fair coin
coins = rng.integers(0,2,365)

np.unique(coins, return_counts=True)
(array([0, 1]), array([189, 176], dtype=int64))
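
One possible answer to the biased-coin question above: compare uniform random numbers with p. This is a sketch, not the only way; the names `sunny` and `sunny2` are illustrative:

```python
from numpy.random import default_rng

rng = default_rng()
p = 0.25

# rng.random draws uniform numbers in [0, 1); each one falls below p with probability p
sunny = rng.random(365) < p
print(sunny.sum(), sunny.mean())

# alternative: a weighted choice between 0 (not sunny) and 1 (sunny)
sunny2 = rng.choice([0, 1], size=365, p=[1 - p, p])
print(sunny2.sum())
```

On average about 365/4, i.e. roughly 91 days, come out sunny, with run-to-run variation.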

Tossing dice and coins

Let us toss many dice or coins to find out:

  • the average value of a six-faced die
  • the variation around the mean when averaging
  • the probability of various “common hands” in the game Liar’s Dice:
      • Full house: e.g., 66111
      • Three of a kind: e.g., 44432
      • Two pair: e.g., 22551
      • Pair: e.g., 66532
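
A possible starting point for these questions, assuming hands of five fair dice. The helper `hand_pattern` and the simulation size `n_sim` are hypothetical choices for this sketch:

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng()
n_sim = 20_000

# average value of a six-faced die
rolls = rng.integers(1, 7, n_sim)
print(rolls.mean())

# variation of the average over 5 dice
hands = rng.integers(1, 7, (n_sim, 5))
print(hands.mean(axis=1).std())

def hand_pattern(hand):
    """Counts of equal faces, largest first, e.g. 66111 -> (3, 2)."""
    _, cts = np.unique(hand, return_counts=True)
    return tuple(sorted(cts.tolist(), reverse=True))

# classify every simulated hand and estimate the probability of each pattern
patterns = [hand_pattern(h) for h in hands]
for name, pat in [("Full house", (3, 2)), ("Three of a kind", (3, 1, 1)),
                  ("Two pair", (2, 2, 1)), ("Pair", (2, 1, 1, 1))]:
    print(name, np.mean([p == pat for p in patterns]))
```

The estimates fluctuate from run to run; increasing `n_sim` makes them more precise at the cost of run time.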

Some real world problems:

  1. Overbooking flights: airlines
  2. Home Office days: planning office capacities and minimizing social isolation