import numpy as np
from numpy.random import default_rng
4 Intro to numpy
In this lecture we will get to know and become experts in:
- Introduction to numpy
- DataCamp, Introduction to Python, Chap 4
- Multiple Dimensions
- Data Summaries in numpy
- Introduction to Simulating Probabilistic Events
Introduction to numpy
NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python.
- Vectorized, fast mathematical operations.
- A key feature of NumPy is its N-dimensional array object, the ndarray.
height = [1.79, 1.85, 1.95, 1.55]
weight = [70, 80, 85, 65]
#bmi = weight/height**2
height = np.array([1.79, 1.85, 1.95, 1.55])
weight = np.array([70, 80, 85, 65])
bmi = weight/height**2
bmi
array([21.84700852, 23.37472608, 22.35371466, 27.05515088])
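The formula is commented out in the list version because plain Python lists do not support elementwise arithmetic. The following is a minimal sketch of the contrast, reusing the values above; the names `height_list`, `weight_list`, `bmi_loop` and `bmi_vec` are introduced here only for illustration.

```python
import numpy as np

height_list = [1.79, 1.85, 1.95, 1.55]
weight_list = [70, 80, 85, 65]

# Plain lists: the ** operator is not defined between a list and an int
try:
    weight_list / height_list ** 2
    list_error = None
except TypeError as err:
    list_error = err
print("lists raise:", type(list_error).__name__)

# With lists we need an explicit loop (here a list comprehension)
bmi_loop = [w / h ** 2 for w, h in zip(weight_list, height_list)]

# With arrays the same formula is applied element by element
bmi_vec = np.array(weight_list) / np.array(height_list) ** 2

print(np.allclose(bmi_loop, bmi_vec))
```

Both routes give the same numbers; the array version is shorter and avoids the Python-level loop.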
Multiple Dimensions
Multiple dimensions are handled naturally by numpy, e.g.:
hw1 = np.array([height, weight])
print(hw1)
print(hw1.shape)
hw2 = hw1.transpose()
print(hw2)
print(hw2.shape)
[[ 1.79 1.85 1.95 1.55]
[70. 80. 85. 65. ]]
(2, 4)
[[ 1.79 70. ]
[ 1.85 80. ]
[ 1.95 85. ]
[ 1.55 65. ]]
(4, 2)
Accessing array elements
Accessing array elements is similar to indexing lists, but allows multidimensional indices:
print(hw2[0,1])
70.0
print(hw2[:,0])
[1.79 1.85 1.95 1.55]
print(hw2[0])
#equivalent to
print(hw2[0,:])
#shape:
print(hw2[0].shape)
[ 1.79 70. ]
[ 1.79 70. ]
(2,)
To select a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order:
print(hw2[[2,0,1]])
[[ 1.95 85. ]
[ 1.79 70. ]
[ 1.85 80. ]]
Negative indices
print(hw2)
print("Using negative indices selects rows from the end:")
print(hw2[[-2,-1]])
[[ 1.79 70. ]
[ 1.85 80. ]
[ 1.95 85. ]
[ 1.55 65. ]]
Using negative indices selects rows from the end:
[[ 1.95 85. ]
[ 1.55 65. ]]
You can pass multiple slices just like you can pass multiple indexes:
hw2[:2,:1]
array([[1.79],
[1.85]])
Reshaping
np.arange(32).reshape((8, 4))
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]])
Boolean indexing
height_gt_185 = hw2[:,0]>1.85
print(height_gt_185)
print(hw2[height_gt_185,1])
[False False True False]
[85.]
numpy arrays cannot contain elements with different types. If you try to build such an array, some of the elements’ types are changed so that the result is homogeneous. This is known as type coercion.
print(np.array([True, 1, 2]))
print(np.array(["True", 1, 2]))
print(np.array([1.3, 1, 2]))
[1 1 2]
['True' '1' '2']
[1.3 1. 2. ]
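One way to see which type the elements were coerced to is the array's `dtype` attribute. The sketch below rebuilds the three arrays above (the variable names are made up for this example) and prints the one-letter `kind` of each dtype, which does not depend on the platform's integer size.

```python
import numpy as np

a_bool_int = np.array([True, 1, 2])
a_str = np.array(["True", 1, 2])
a_float = np.array([1.3, 1, 2])

print(a_bool_int.dtype.kind)  # 'i': the boolean became an integer
print(a_str.dtype.kind)       # 'U': everything became a unicode string
print(a_float.dtype.kind)     # 'f': the integers became floats

# An explicit conversion is possible with astype
as_float = a_bool_int.astype(float)
print(as_float.dtype.kind)    # 'f'
```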
Lots of extra useful functions!
np.zeros((2,3))
#np.ones((2,3))
array([[0., 0., 0.],
[0., 0., 0.]])
np.eye(3)
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
np.column_stack([height, weight])
array([[ 1.79, 70. ],
[ 1.85, 80. ],
[ 1.95, 85. ],
[ 1.55, 65. ]])
Data Summaries in numpy
We can compute simple statistics:
print(np.mean(hw2))
print(np.mean(hw2, axis=0))
38.3925
[ 1.785 75. ]
print(np.unique([1,1,2,1,2,3,2,2,3]))
print(np.unique([1,1,2,1,2,3,2,2,3], return_counts=True))
[1 2 3]
(array([1, 2, 3]), array([3, 4, 2], dtype=int64))
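The `axis` argument works the same way for most summary functions: `axis=0` collapses the rows (one result per column), `axis=1` collapses the columns (one result per row). Below is a small sketch on the height/weight table, rebuilt here as `hw` so that the block runs on its own.

```python
import numpy as np

# same values as hw2: column 0 is height, column 1 is weight
hw = np.array([[1.79, 70.0],
               [1.85, 80.0],
               [1.95, 85.0],
               [1.55, 65.0]])

col_means = hw.mean(axis=0)          # one value per column
row_means = hw.mean(axis=1)          # one value per row (rarely useful here)
col_std = hw.std(axis=0)             # default ddof=0, i.e. divides by n
col_median = np.median(hw, axis=0)
col_min, col_max = hw.min(axis=0), hw.max(axis=0)

print(col_means, col_median)
print(col_min, col_max)
```

Averaging over rows mixes metres and kilograms, which is why the overall mean printed above (38.3925) has no useful interpretation.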
Introduction to Simulating Probabilistic Events
Generating Data in numpy
Meet your friends, all methods of the generator object returned by default_rng():
- permutation: Return a random permutation of a sequence, or return a permuted range
- integers: Draw random integers from a given low-to-high range (the newer counterpart of np.random.randint)
- choice: Generate a random sample from a given 1-D array
# Do this (new version)
from numpy.random import default_rng
rng = default_rng()

x = np.arange(10)
print(x)
print(rng.permutation(x))
print(rng.permutation(list('intelligence')))
[0 1 2 3 4 5 6 7 8 9]
[6 7 9 4 1 0 3 8 2 5]
['t' 'c' 'n' 'l' 'e' 'n' 'i' 'e' 'e' 'l' 'i' 'g']
print(rng.integers(0,10,5))
print(rng.integers(0,10,(5,2)))
[7 9 7 9 4]
[[9 0]
[8 6]
[6 7]
[0 5]
[1 5]]
rng.choice(x, 4)
array([8, 5, 1, 4])
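Because `default_rng()` is called without a seed, the outputs above differ on every run. If reproducible results are wanted, a seed can be passed; the sketch below also shows `replace=False`, which stops `choice` from drawing the same element twice. The seed values are arbitrary.

```python
import numpy as np
from numpy.random import default_rng

x = np.arange(10)

# Two generators with the same seed produce the same stream
rng_a = default_rng(42)
rng_b = default_rng(42)
same = np.array_equal(rng_a.permutation(x), rng_b.permutation(x))
print(same)

# Sampling without replacement: no value can appear twice
sample = default_rng(0).choice(x, 4, replace=False)
print(sample)
```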
Examples:
- Spotify playlist
- Movie List
movies_list = ['The Godfather', 'The Wizard of Oz', 'Citizen Kane', 'The Shawshank Redemption', 'Pulp Fiction']
# pick a random choice from a list of strings.
movie = rng.choice(movies_list,2)
print(movie)
['The Shawshank Redemption' 'The Godfather']
Birthday “Paradox”
Please enter your birthday on google drive https://forms.gle/CeqyRZ4QzWRmJFvs9
How many people do you think will share a birthday? Would that be a rare, highly unusual event?
How can we find out how likely it is that across \(n\) folks in a room at least two share a birthday?
Hint: can we put our random number generators to work?
# Can you simulate 25 birthdays?
from numpy.random import default_rng
rng = default_rng()

#initialize it to be the empty list:
shardBday = []

n = 40  #number of people in the room (set to 25 to match the question above)

PossibleBDays = np.arange(1,366)
#now "draw" n random bDays, 1000 times over:
for i in range(1000):# is the 1000 an important number ??
    #no it only determines the precision of my estimate !!
    ran25Bdays = rng.choice(PossibleBDays, n, replace = True)
    #it is of utmost importance to allow for same birthdays !!
    #rng.choice(PossibleBDays, 366, replace = False)
    x, cts = np.unique(ran25Bdays, return_counts=True)
    #number of days shared by more than one person in this run:
    shardBday = np.append(shardBday, np.sum(cts>1))#keep this !!
    #shardBday = np.sum(cts>1)

#np.sum(shardBday>0)/1000
#fraction of runs with at least one shared birthday:
np.mean(shardBday > 0)
#shardBday = 2
0.893
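The simulated estimate can be checked against the exact answer. Assuming 365 equally likely birthdays and independent people (the same assumptions the simulation makes), the probability that all \(n\) birthdays differ is a product of shrinking fractions, so \(P(\text{shared}) = 1 - \prod_{k=0}^{n-1} \frac{365-k}{365}\). A minimal sketch follows; the function name `p_shared_birthday` is made up for this example.

```python
import numpy as np

def p_shared_birthday(n, days=365):
    """Exact probability that at least two of n people share a birthday."""
    k = np.arange(n)
    # person k must avoid the k birthdays already taken
    return 1.0 - np.prod((days - k) / days)

p23 = p_shared_birthday(23)
p40 = p_shared_birthday(40)
print(round(p23, 3), round(p40, 3))
```

For \(n = 40\) this gives about 0.891, close to the simulated 0.893; for \(n = 23\) the probability already exceeds one half.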
5 != 3 #not equal
True
#Boolean indexing !!
x[cts > 1]
array([ 71, 192])
x[23]
182
#can you design a coin flip with an arbitrary probability p = 0.25
#simulate 365 days with a 1/4 chance of being sunny
#fair coin
coins = np.random.randint(0,2,365)
np.unique(coins, return_counts=True)
(array([0, 1]), array([189, 176], dtype=int64))
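The cell above only covers the fair coin. One possible answer to the question about an arbitrary probability is sketched below in two equivalent ways: comparing uniform random numbers with `p`, or passing weights to `choice`. The names `sunny` and `sunny_alt` and the seed are illustrative choices, not part of the lecture code.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(1)   # seed chosen arbitrarily
p = 0.25               # probability of a sunny day
n_days = 365

# A uniform number in [0, 1) falls below p with probability p
sunny = rng.random(n_days) < p

# Equivalent: draw 0/1 with explicit weights
sunny_alt = rng.choice([0, 1], size=n_days, p=[1 - p, p])

print(sunny.sum(), sunny_alt.sum())   # both should be near 365/4
```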
Tossing dice and coins
Let us toss many dice or coins to find out:
- the average value of a six-faced die
- the variation around the mean when averaging
- the probability of various “common hands” in the game Liar’s Dice:
  * Full house: e.g., 66111
  * Three of a kind: e.g., 44432
  * Two pair: e.g., 22551
  * Pair: e.g., 66532
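A possible way to approach these questions is sketched below, assuming a hand consists of five fair dice. Each hand is classified by counting how often every face occurs and sorting those counts; the helper `share` and all variable names are made up for this example.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(2)   # seed chosen arbitrarily
n_hands = 100_000

# each row is one hand of five dice, faces 1..6 (upper bound is exclusive)
hands = rng.integers(1, 7, size=(n_hands, 5))

mean_face = hands.mean()                     # average value of a die
sd_of_hand_mean = hands.mean(axis=1).std()   # spread of the five-dice average

# counts[i, f] = how often face f+1 occurs in hand i, shape (n_hands, 6)
counts = (hands[:, :, None] == np.arange(1, 7)).sum(axis=1)
# sort the counts in descending order so the pattern identifies the hand
pattern = np.sort(counts, axis=1)[:, ::-1]

def share(*top):
    """Fraction of hands whose largest counts equal `top`."""
    return np.mean(np.all(pattern[:, :len(top)] == top, axis=1))

p_full_house = share(3, 2)       # e.g. 66111
p_three_kind = share(3, 1, 1)    # e.g. 44432
p_two_pair = share(2, 2, 1)      # e.g. 22551
p_pair = share(2, 1, 1, 1)       # e.g. 66532

print(mean_face, sd_of_hand_mean)
print(p_full_house, p_three_kind, p_two_pair, p_pair)
```

For comparison, counting the 7776 equally likely five-dice outcomes by hand gives 300, 1200, 1800 and 3600 favourable cases respectively, i.e. roughly 0.039, 0.154, 0.231 and 0.463.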
Some real world problems:
- Overbooking flights: airlines
- Home Office days: planning office capacities and minimizing social isolation
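As an illustration of the first problem, overbooking can be simulated with the same tools. The sketch assumes that every ticket holder shows up independently with the same probability; the seat count, the number of tickets sold and the show-up probability are made-up numbers, not real airline data.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(3)   # seed chosen arbitrarily

# All three values are hypothetical assumptions for illustration
seats = 100
tickets_sold = 110
p_show_up = 0.9

n_flights = 10_000
# independent show-ups: the number of arrivals per flight is binomial
arrivals = rng.binomial(tickets_sold, p_show_up, size=n_flights)

# fraction of simulated flights with more passengers than seats
p_overbooked = np.mean(arrivals > seats)
# average number of passengers who cannot board, per flight
avg_bumped = np.mean(np.maximum(arrivals - seats, 0))

print(p_overbooked, avg_bumped)
```

Changing `tickets_sold` and rerunning shows how quickly the risk of turning passengers away grows with each extra ticket.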