4 Intro to numpy
import numpy as np
from numpy.random import default_rng
In this lecture we will get to know and become experts in:
- Introduction to numpy
- DataCamp, Introduction to Python, Chap 4
- Multiple Dimensions
- Data Summaries in numpy
- Introduction to Simulating Probabilistic Events
Introduction to numpy
NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python.
- Vectorized, fast mathematical operations.
- A key feature of NumPy is its N-dimensional array object, the ndarray.
height = [1.79, 1.85, 1.95, 1.55]
weight = [70, 80, 85, 65]
#bmi = weight/height**2  # fails with a TypeError: plain lists do not support element-wise arithmetic
height = np.array([1.79, 1.85, 1.95, 1.55])
weight = np.array([70, 80, 85, 65])
bmi = weight/height**2
bmi
array([21.84700852, 23.37472608, 22.35371466, 27.05515088])
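The sketch below shows why the commented-out `#bmi = weight/height**2` line fails on plain lists, and checks that the vectorized numpy expression gives the same numbers as an explicit Python loop. The `_list` variable names are only for illustration.

```python
import numpy as np

height_list = [1.79, 1.85, 1.95, 1.55]
weight_list = [70, 80, 85, 65]

# Plain Python lists do not support element-wise arithmetic
try:
    weight_list / height_list ** 2
    list_math_failed = False
except TypeError:
    list_math_failed = True  # ** and / are undefined for lists

# Loop version: compute the BMI one person at a time
bmi_loop = [w / h ** 2 for w, h in zip(weight_list, height_list)]

# Vectorized version: a single expression over whole arrays
bmi_vec = np.array(weight_list) / np.array(height_list) ** 2

print(list_math_failed)                # True
print(np.allclose(bmi_loop, bmi_vec))  # True
```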
Multiple Dimensions
Multiple dimensions are handled naturally by numpy, e.g.:
hw1 = np.array([height, weight])
print(hw1)
print(hw1.shape)
hw2 = hw1.transpose()
print(hw2)
print(hw2.shape)
[[ 1.79 1.85 1.95 1.55]
[70. 80. 85. 65. ]]
(2, 4)
[[ 1.79 70. ]
[ 1.85 80. ]
[ 1.95 85. ]
[ 1.55 65. ]]
(4, 2)
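As a short aside, `.T` is the usual shorthand for `.transpose()`. The sketch also shows two related array attributes, `ndim` (number of dimensions) and `size` (total number of elements).

```python
import numpy as np

height = np.array([1.79, 1.85, 1.95, 1.55])
weight = np.array([70, 80, 85, 65])
hw1 = np.array([height, weight])

# .T is shorthand for .transpose()
hw2 = hw1.T
print(np.array_equal(hw2, hw1.transpose()))  # True
print(hw1.ndim, hw2.ndim)  # 2 2
print(hw1.size)            # 8: total number of elements
```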
Accessing array elements
Accessing array elements works like indexing lists, but numpy also allows a multidimensional index:
print(hw2[0,1])
70.0
print(hw2[:,0])
[1.79 1.85 1.95 1.55]
print(hw2[0])
#equivalent to
print(hw2[0,:])
#shape:
print(hw2[0].shape)
[ 1.79 70. ]
[ 1.79 70. ]
(2,)
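List-style chained indexing (`hw2[0][1]`) gives the same element as a single multidimensional index (`hw2[0, 1]`). Whether a dimension is kept depends on how you index it: an integer drops that dimension, while a slice keeps it. The array is rebuilt inline here so the snippet can run on its own.

```python
import numpy as np

hw2 = np.array([[1.79, 70.], [1.85, 80.], [1.95, 85.], [1.55, 65.]])

# Chained (list-style) indexing and one multidimensional index agree
print(hw2[0][1] == hw2[0, 1])  # True

# An integer index drops that dimension ...
print(hw2[:, 0].shape)    # (4,)
# ... while a slice keeps it, giving a 4x1 column
print(hw2[:, 0:1].shape)  # (4, 1)
```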
To select a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order:
print(hw2[[2,0,1]])
[[ 1.95 85. ]
[ 1.79 70. ]
[ 1.85 80. ]]
Negative indices
print(hw2)
print("Using negative indices selects rows from the end:")
print(hw2[[-2,-1]])
[[ 1.79 70. ]
[ 1.85 80. ]
[ 1.95 85. ]
[ 1.55 65. ]]
Using negative indices selects rows from the end:
[[ 1.95 85. ]
[ 1.55 65. ]]
You can pass multiple slices just like you can pass multiple indexes:
hw2[:2,:1]
array([[1.79],
[1.85]])
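Slicing has one consequence worth knowing: a basic slice such as `hw2[:2, :1]` is a *view* that shares memory with the original array, so writing to it changes `hw2` as well. The sketch shows this, together with `.copy()`, and shows that integer-list indexing like `hw2[[2, 0, 1]]` returns a copy.

```python
import numpy as np

hw2 = np.array([[1.79, 70.], [1.85, 80.], [1.95, 85.], [1.55, 65.]])

# Basic slicing returns a view: it shares memory with hw2
top = hw2[:2, :1]
top[0, 0] = 1.80
print(hw2[0, 0])  # 1.8: the original array changed too

# Use .copy() when you want an independent array
top_copy = hw2[:2, :1].copy()
top_copy[0, 0] = 0.0
print(hw2[0, 0])  # still 1.8

# Integer-list ("fancy") indexing always returns a copy
rows = hw2[[2, 0, 1]]
rows[0, 0] = 0.0
print(hw2[2, 0])  # still 1.95
```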
Reshaping
np.arange(32).reshape((8, 4))
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]])
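With `reshape`, one dimension may be given as `-1`, and numpy then infers it from the total size. The new shape must hold exactly the same number of elements, or numpy raises a `ValueError`. A minimal sketch:

```python
import numpy as np

a = np.arange(32)
# -1 lets numpy infer that dimension: 32 / 4 = 8 rows
b = a.reshape((-1, 4))
print(b.shape)  # (8, 4)

# The total number of elements must be preserved (5 * 7 != 32)
try:
    a.reshape((5, 7))
except ValueError as err:
    print("cannot reshape:", err)

# ravel() flattens back to 1-D in the original order
print(np.array_equal(b.ravel(), a))  # True
```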
Boolean indexing
height_gt_185 = hw2[:,0]>1.85
print(height_gt_185)
print(hw2[height_gt_185,1])
[False False True False]
[85.]
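Several conditions can be combined element-wise with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because `&` binds more tightly than `>` or `<`. The thresholds below are arbitrary and only for illustration.

```python
import numpy as np

hw2 = np.array([[1.79, 70.], [1.85, 80.], [1.95, 85.], [1.55, 65.]])
heights, weights = hw2[:, 0], hw2[:, 1]

# Parentheses are required around each comparison
mask = (heights > 1.7) & (weights < 85)
print(mask)          # [ True  True False False]
print(hw2[mask])     # rows for the first two people
print(np.sum(mask))  # 2: each True counts as 1
```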
numpy arrays cannot contain elements of different types. If you try to build such an array, some of the elements' types are converted so that the array ends up homogeneous. This is known as type coercion.
print(np.array([True, 1, 2]))
print(np.array(["True", 1, 2]))
print(np.array([1.3, 1, 2]))
[1 1 2]
['True' '1' '2']
[1.3 1. 2. ]
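The `dtype` attribute tells you which common type was chosen. The sketch checks `dtype.kind` instead of the exact dtype name, because the exact integer width and string length can vary by platform and input.

```python
import numpy as np

a = np.array([True, 1, 2])
b = np.array(["True", 1, 2])
c = np.array([1.3, 1, 2])

# dtype.kind: 'i' = signed integer, 'U' = unicode string, 'f' = float
print(a.dtype.kind, b.dtype.kind, c.dtype.kind)  # i U f

# True was coerced to the integer 1
print(a[0] + a[1])  # 2
# the integer 1 was coerced to the string "1"
print(b[1] == "1")  # True
```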
Lots of extra useful functions!
np.zeros((2,3))
#np.ones((2,3))
array([[0., 0., 0.],
[0., 0., 0.]])
np.eye(3)
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
np.column_stack([height, weight])
array([[ 1.79, 70. ],
[ 1.85, 80. ],
[ 1.95, 85. ],
[ 1.55, 65. ]])
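`np.column_stack` produces the same 4x2 table as transposing `np.array([height, weight])`. The sketch checks this and also tries a few more constructors (`np.ones`, `np.full`, `np.linspace`) that belong to the same family as `np.zeros` and `np.eye`.

```python
import numpy as np

height = np.array([1.79, 1.85, 1.95, 1.55])
weight = np.array([70, 80, 85, 65])

# column_stack gives the same 4x2 table as the transposed 2x4 array
stacked = np.column_stack([height, weight])
print(np.array_equal(stacked, np.array([height, weight]).T))  # True

# A few related constructors
print(np.ones((2, 3)))       # 2x3 array of ones
print(np.full((2, 2), 7))    # 2x2 array filled with 7
print(np.linspace(0, 1, 5))  # 5 evenly spaced values from 0 to 1
```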
Data Summaries in numpy
We can compute simple statistics:
print(np.mean(hw2))
print(np.mean(hw2, axis=0))
38.3925
[ 1.785 75. ]
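The `axis` argument chooses which dimension gets collapsed. With `axis=0` you get one summary per column (height, weight), and with `axis=1` one summary per row (person). For this table a per-row summary mixes meters and kilograms, so it is rarely meaningful. Other summary functions accept the same argument:

```python
import numpy as np

hw2 = np.array([[1.79, 70.], [1.85, 80.], [1.95, 85.], [1.55, 65.]])

# axis=0 collapses the rows: one value per column
col_means = hw2.mean(axis=0)
# axis=1 collapses the columns: one value per row (person)
row_means = hw2.mean(axis=1)

print(col_means)               # mean height, mean weight
print(row_means.shape)         # (4,)
print(np.median(hw2, axis=0))  # column medians
print(hw2.std(axis=0))         # column standard deviations
print(hw2.max(axis=0))         # column maxima
```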
print(np.unique([1,1,2,1,2,3,2,2,3]))
print(np.unique([1,1,2,1,2,3,2,2,3], return_counts=True))
[1 2 3]
(array([1, 2, 3]), array([3, 4, 2], dtype=int64))
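Dividing the counts from `np.unique(..., return_counts=True)` by their total turns them into relative frequencies. The same step converts simulation counts into estimated probabilities later in this lecture.

```python
import numpy as np

data = np.array([1, 1, 2, 1, 2, 3, 2, 2, 3])
values, counts = np.unique(data, return_counts=True)

# Relative frequencies: counts divided by the total (they sum to 1)
freqs = counts / counts.sum()
for v, f in zip(values, freqs):
    print(v, round(f, 3))
```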
Introduction to Simulating Probabilistic Events
Generating Data in numpy
Meet your friends:
- rng.permutation: return a randomly permuted copy of a sequence, or a permuted range if given an integer
- rng.integers: draw random integers from a given range, low (inclusive) to high (exclusive)
- rng.choice: generate a random sample from a given 1-D array
Here rng is a random Generator created with default_rng().
# Do this (the newer, recommended Generator API)
from numpy.random import default_rng
rng = default_rng()
x = np.arange(10)
print(x)
print(rng.permutation(x))
print(rng.permutation(list('intelligence')))
[0 1 2 3 4 5 6 7 8 9]
[6 7 9 4 1 0 3 8 2 5]
['t' 'c' 'n' 'l' 'e' 'n' 'i' 'e' 'e' 'l' 'i' 'g']
print(rng.integers(0,10,5))
print(rng.integers(0,10,(5,2)))
[7 9 7 9 4]
[[9 0]
[8 6]
[6 7]
[0 5]
[1 5]]
rng.choice(x,4)
array([8, 5, 1, 4])
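Two details of the Generator API come up often. First, passing a seed to `default_rng` makes the "random" results reproducible. Second, `choice` samples with replacement by default, so values can repeat, and `replace=False` forces distinct values. The seeds 42 and 0 below are arbitrary.

```python
import numpy as np
from numpy.random import default_rng

x = np.arange(10)

# Two generators with the same seed produce the same sequence
rng_a = default_rng(seed=42)
rng_b = default_rng(seed=42)
draw_a = rng_a.integers(0, 10, 5)
draw_b = rng_b.integers(0, 10, 5)
print(np.array_equal(draw_a, draw_b))  # True

# choice draws WITH replacement by default, so values may repeat;
# replace=False guarantees distinct elements
rng = default_rng(seed=0)
sample = rng.choice(x, 4, replace=False)
print(len(np.unique(sample)))  # 4
```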
Examples:
- Spotify playlist
- Movie List
movies_list = ['The Godfather', 'The Wizard of Oz', 'Citizen Kane', 'The Shawshank Redemption', 'Pulp Fiction']
# pick two different movies at random (replace=False avoids picking the same title twice)
movie = rng.choice(movies_list, 2, replace=False)
print(movie)
['The Shawshank Redemption' 'The Godfather']
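For the playlist example, "shuffle play" works like `rng.permutation`: every title appears exactly once, in random order. Drawing titles one at a time with replacement behaves differently, because the same title can come up again. A minimal sketch:

```python
from numpy.random import default_rng

rng = default_rng()
movies_list = ['The Godfather', 'The Wizard of Oz', 'Citizen Kane',
               'The Shawshank Redemption', 'Pulp Fiction']

# Shuffle play: a random order of the whole list, each title exactly once
shuffled = rng.permutation(movies_list)
print(shuffled)

# Ten independent picks with replacement: titles can repeat
picks = rng.choice(movies_list, 10)
print(picks)
```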
Birthday “Paradox”
Please enter your birthday in this Google Form: https://forms.gle/CeqyRZ4QzWRmJFvs9
How many people do you think will share a birthday? Would that be a rare, highly unusual event?
How can we find out how likely it is that across \(n\) folks in a room at least two share a birthday?
Hint: can we put our random number generators to work?
# Can you simulate the birthdays of n = 40 people?
from numpy.random import default_rng
rng = default_rng()
#initialize it to be the empty list:
sharedBday = []
n = 40
PossibleBDays = np.arange(1,366)
#now "draw" n random birthdays, once for each of 1000 simulated rooms:
for i in range(1000):  # is the 1000 an important number??
    #no, it only determines the precision of our estimate!
    ranBdays = rng.choice(PossibleBDays, n, replace=True)
    #it is of utmost importance to allow for repeated birthdays (replace=True)!
    #rng.choice(PossibleBDays, 366, replace=False) raises a ValueError: 366 distinct days cannot be drawn from 365
    x, cts = np.unique(ranBdays, return_counts=True)
    #number of birthdays shared by at least two people in this room
    sharedBday = np.append(sharedBday, np.sum(cts>1))  #keep this!
#fraction of rooms with at least one shared birthday
#(equivalently: np.sum(sharedBday>0)/1000)
np.mean(sharedBday > 0)
#shardBday = 20.893
5 != 3  #not equal
True
#Boolean indexing: which days were shared in the last simulated room?
x[cts > 1]
array([ 71, 192])
x[23]
182
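The simulation can be checked against the exact answer. Assuming all 365 days are equally likely (no leap years), the probability that all $n$ birthdays differ is $\prod_{k=0}^{n-1} \frac{365-k}{365}$, so $P(\text{at least one shared}) = 1 - \prod_{k=0}^{n-1} \frac{365-k}{365}$. The sketch below computes this formula. It also shows a vectorized version of the loop, which draws with `rng.integers(1, 366)` instead of `rng.choice(PossibleBDays, ...)` (an equivalent way to pick days 1 to 365) and detects a shared birthday by sorting each row and comparing neighbors. The function names are hypothetical.

```python
import numpy as np
from numpy.random import default_rng

def p_shared_exact(n, days=365):
    """P(at least two of n people share a birthday), all days equally likely."""
    k = np.arange(n)
    p_distinct = np.prod((days - k) / days)  # P(all n birthdays differ)
    return 1 - p_distinct

def p_shared_sim(n, trials=10_000, days=365, seed=None):
    """Monte Carlo estimate: simulate all rooms at once instead of looping."""
    rng = default_rng(seed)
    # one row per simulated room, one column per person
    bdays = rng.integers(1, days + 1, size=(trials, n))
    # after sorting a row, a shared birthday shows up as two equal neighbors
    s = np.sort(bdays, axis=1)
    has_shared = np.any(s[:, 1:] == s[:, :-1], axis=1)
    return has_shared.mean()

print(round(p_shared_exact(23), 3))  # 0.507
print(round(p_shared_exact(40), 3))  # 0.891
print(p_shared_sim(40, seed=1))      # close to 0.891
```

So with 40 people a shared birthday is not rare at all: it happens in roughly 9 out of 10 rooms.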
#can you design a coin flip with an arbitrary probability, e.g. p = 0.25?
#e.g. simulate 365 days, each with a 1/4 chance of being sunny
#fair coin:
coins = rng.integers(0, 2, 365)
np.unique(coins, return_counts=True)
(array([0, 1]), array([189, 176], dtype=int64))
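Here is one possible answer to the biased-coin question above. It shows two standard approaches: compare uniform random numbers in [0, 1) with p, or pass outcome weights through the `p` argument of `rng.choice`. The name `sunny` is just for illustration.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng()
p = 0.25  # probability of a sunny day

# Option 1: a uniform number in [0, 1) is below p with probability p
sunny = rng.random(365) < p

# Option 2: let choice weight the two outcomes directly
sunny2 = rng.choice([0, 1], size=365, p=[1 - p, p])

print(np.unique(sunny, return_counts=True))
print(sunny.mean(), sunny2.mean())  # both roughly 0.25
```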
Tossing dice and coins
Let us toss many dice or coins to find out:
- the average value of a six-faced die
- the variation around the mean when averaging
- the probability of various “common hands” in the game Liar’s Dice:
  * Full house: e.g., 66111
  * Three of a kind: e.g., 44432
  * Two pair: e.g., 22551
  * Pair: e.g., 66532
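The sketch below works through the three questions above by simulation. It assumes fair dice, five dice per Liar's Dice hand, and a fixed, arbitrary seed. Each hand is classified by its sorted face counts (for example 66111 gives [3, 2]), and anything that is not one of the four listed hands is lumped into "other". For comparison, the exact values for fair dice are a mean of 3.5 and hand probabilities of 300/7776 (full house), 1200/7776 (three of a kind), 1800/7776 (two pair) and 3600/7776 (pair).

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(seed=2024)  # arbitrary fixed seed for reproducibility

# 1) Average value of a six-faced die (upper bound 7 is exclusive)
rolls = rng.integers(1, 7, size=100_000)
print(rolls.mean())  # close to 3.5

# 2) Spread of the average of k dice shrinks roughly like 1/sqrt(k)
for k in (1, 10, 100):
    averages = rng.integers(1, 7, size=(10_000, k)).mean(axis=1)
    print(k, averages.std())

# 3) Liar's Dice hands, identified by their sorted face counts
HANDS = {
    (3, 2): "full house",
    (3, 1, 1): "three of a kind",
    (2, 2, 1): "two pair",
    (2, 1, 1, 1): "pair",
}

def classify(hand):
    # sorted face counts, largest first: e.g. 66111 -> [3, 2]
    counts = sorted(np.unique(hand, return_counts=True)[1].tolist(), reverse=True)
    return HANDS.get(tuple(counts), "other")

hands = rng.integers(1, 7, size=(20_000, 5))
labels = np.array([classify(h) for h in hands])
names, cts = np.unique(labels, return_counts=True)
freq = dict(zip(names.tolist(), (cts / len(hands)).tolist()))
print(freq)
```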
Some real world problems:
- Overbooking flights: airlines
- Home Office days: planning office capacities and minimizing social isolation