
Python NumPy Cheatsheet
Welcome to your go-to NumPy cheatsheet! If you're working with numerical data in Python, you're almost certainly using NumPy. It's fast, versatile, and forms the foundation for many other data science libraries. Let's dive into the essentials and some pro-tips to make your code more efficient and readable.
Getting Started with NumPy
First things first, you need to install and import NumPy. If you haven't installed it yet, you can do so using pip:
pip install numpy
Once installed, import it in your Python script or notebook. The convention is to import it as np:
import numpy as np
Now you're ready to harness the power of NumPy!
Creating Arrays
NumPy arrays store elements of a single data type in one contiguous block of memory, which makes them faster and more compact than Python lists for numerical operations. Here’s how you can create them.
Creating a one-dimensional array from a list:
arr = np.array([1, 2, 3, 4, 5])
print(arr)
# Output: [1 2 3 4 5]
Creating a two-dimensional array:
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
# Output:
# [[1 2 3]
# [4 5 6]]
You can also create arrays with initial values using built-in functions.
zeros = np.zeros((3, 4)) # 3x4 array of zeros
ones = np.ones((2, 3)) # 2x3 array of ones
range_arr = np.arange(0, 10, 2) # similar to range, but returns an array
random_arr = np.random.rand(2, 2) # 2x2 array with uniform random values in [0, 1)
Common array creation functions:
| Function | Description | Example |
|---|---|---|
| np.array() | Create array from list/tuple | np.array([1,2,3]) |
| np.zeros() | Array filled with zeros | np.zeros((2,2)) |
| np.ones() | Array filled with ones | np.ones((3,3)) |
| np.arange() | Array with a range of values | np.arange(0, 5) |
| np.linspace() | Array with evenly spaced numbers | np.linspace(0, 1, 5) |
| np.random.rand() | Array with random values (uniform) | np.random.rand(2,2) |
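The table lists both np.arange() and np.linspace(). The practical difference is whether you specify a step or a count of points, and whether the stop value is included. NumPy's own documentation suggests linspace for non-integer steps, because floating-point rounding can make the number of arange results hard to predict. A small comparison:

```python
import numpy as np

# linspace: you choose how many points; the stop value is included by default
lin = np.linspace(0, 1, 5)
print(lin)  # [0.   0.25 0.5  0.75 1.  ]

# arange: you choose the step; the stop value is excluded
ar = np.arange(0, 1, 0.25)
print(ar)   # [0.   0.25 0.5  0.75]
```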
Pro tip: Use np.empty() if you need an array quickly without initializing values, but be cautious: its contents are arbitrary (whatever happened to be in memory), so fill every element before reading it.
Array Attributes
Understanding array attributes helps you debug and manipulate arrays effectively.
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim) # Number of dimensions: 2
print(arr.shape) # Shape: (2, 3)
print(arr.size) # Total number of elements: 6
print(arr.dtype) # Data type: int64 (or similar)
- ndim: Number of array dimensions.
- shape: Tuple indicating the size of each dimension.
- size: Total number of elements in the array.
- dtype: Data type of the array elements.
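Two related attributes, itemsize and nbytes, tell you how much memory an array uses, and the astype() method converts to another dtype. A short sketch, using an explicitly requested int32 dtype so the byte counts are predictable:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)  # request a specific dtype

print(arr.itemsize)  # bytes per element: 4 for int32
print(arr.nbytes)    # total bytes = size * itemsize: 24

as_float = arr.astype(np.float64)  # astype returns a new array with the new dtype
print(as_float.dtype)  # float64
print(arr.dtype)       # int32 (the original is unchanged)
```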
You can change the shape of an array without changing its data using reshape().
arr = np.arange(6)
reshaped = arr.reshape(2, 3)
print(reshaped)
# Output:
# [[0 1 2]
# [3 4 5]]
Important: The total size must remain the same when reshaping.
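In practice you can pass -1 for one dimension and let NumPy work it out from the total size; a shape whose sizes don't multiply to the element count is rejected with a ValueError:

```python
import numpy as np

arr = np.arange(12)

# One dimension may be -1; NumPy infers it from the total size (12 / 3 = 4)
auto = arr.reshape(3, -1)
print(auto.shape)  # (3, 4)

# 5 * 3 != 12, so this reshape is rejected
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```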
Array Indexing and Slicing
Indexing and slicing in NumPy are powerful and similar to Python lists, but with extra capabilities for multidimensional arrays.
Basic indexing for 1D arrays:
arr = np.array([10, 20, 30, 40, 50])
print(arr[0]) # First element: 10
print(arr[-1]) # Last element: 50
Slicing:
print(arr[1:4]) # Elements from index 1 to 3: [20 30 40]
print(arr[::2]) # Every second element: [10 30 50]
For 2D arrays, you use comma-separated indices:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr_2d[0, 1]) # Element at row 0, column 1: 2
print(arr_2d[:, 1]) # All rows, column 1: [2 5 8]
print(arr_2d[1:, :2]) # Rows from index 1, first two columns:
# Output:
# [[4 5]
# [7 8]]
You can also use boolean indexing to filter arrays:
arr = np.array([5, 10, 15, 20])
filtered = arr[arr > 10]
print(filtered) # [15 20]
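To filter on more than one condition, combine masks with &, | and ~ rather than Python's and/or (which raise a ValueError about an ambiguous truth value on arrays). Each comparison needs its own parentheses because & binds more tightly than > or <:

```python
import numpy as np

arr = np.array([5, 10, 15, 20])

# Combine conditions with & (and), | (or), ~ (not); each comparison needs parentheses
between = arr[(arr > 5) & (arr < 20)]
print(between)  # [10 15]

outside = arr[(arr <= 5) | (arr >= 20)]
print(outside)  # [ 5 20]

# The mask itself is a boolean array of the same shape
mask = arr > 10
print(mask)     # [False False  True  True]
```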
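Remember: Basic slicing returns a view of the array, not a copy, so modifying the slice modifies the original array. Use copy() if you need a separate copy. Boolean indexing (as in the example above) and indexing with integer arrays always return a copy.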
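The view/copy distinction is easy to check with np.shares_memory(), which reports whether two arrays overlap in memory. A small demonstration:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]                 # basic slice: a view onto arr's memory
view[0] = 99
print(arr)                      # [ 1 99  3  4  5]  (original changed)

independent = arr[1:4].copy()   # explicit copy: separate memory
independent[0] = -1
print(arr)                      # [ 1 99  3  4  5]  (original unchanged)

filtered = arr[arr > 3]         # boolean indexing returns a copy
print(np.shares_memory(arr, view))      # True
print(np.shares_memory(arr, filtered))  # False
```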
Array Operations
NumPy allows you to perform element-wise operations easily, which is much faster than using loops.
Basic arithmetic operations:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # [5 7 9]
print(a - b) # [-3 -3 -3]
print(a * b) # [4 10 18] (element-wise multiplication)
print(b / a) # [4. 2.5 2. ] (element-wise division)
print(a ** 2) # [1 4 9] (squares)
For matrix multiplication, use @ or np.dot():
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
print(matrix_a @ matrix_b)
# Output:
# [[19 22]
# [43 50]]
print(np.dot(matrix_a, matrix_b)) # Same as above
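A common mistake is using * where a matrix product was intended. On the same two matrices, the two operators give different results:

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# * multiplies matching positions (element-wise)
print(matrix_a * matrix_b)
# [[ 5 12]
#  [21 32]]

# @ computes row-by-column dot products (matrix product)
print(matrix_a @ matrix_b)
# [[19 22]
#  [43 50]]
```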
Other useful operations:
arr = np.array([1, 2, 3, 4, 5])
print(np.sum(arr)) # Sum of all elements: 15
print(np.mean(arr)) # Mean: 3.0
print(np.max(arr)) # Maximum value: 5
print(np.min(arr)) # Minimum value: 1
print(np.std(arr)) # Population standard deviation (ddof=0): ~1.414
You can also specify the axis for operations on multidimensional arrays:
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arr_2d, axis=0)) # Collapse axis 0 (sum down each column): [5 7 9]
print(np.sum(arr_2d, axis=1)) # Collapse axis 1 (sum across each row): [ 6 15]
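The axis argument works the same way for other reductions such as np.mean() and np.max(): the chosen axis disappears from the result's shape. Passing keepdims=True keeps it with length 1 instead, which is handy when the result is combined with the original array later:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

col_means = np.mean(arr_2d, axis=0)  # collapses axis 0 -> shape (3,)
row_max = np.max(arr_2d, axis=1)     # collapses axis 1 -> shape (2,)
print(col_means)  # [2.5 3.5 4.5]
print(row_max)    # [3 6]

# keepdims=True keeps the reduced axis with length 1 -> shape (2, 1)
row_sums = arr_2d.sum(axis=1, keepdims=True)
print(row_sums.shape)  # (2, 1)
```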
Common mathematical functions:
| Function | Description | Example |
|---|---|---|
| np.sqrt() | Square root | np.sqrt(arr) |
| np.sin() | Sine | np.sin(arr) |
| np.exp() | Exponential | np.exp(arr) |
| np.log() | Natural log | np.log(arr) |
| np.abs() | Absolute value | np.abs(arr) |
Key point: NumPy operations are vectorized: a single function call processes the whole array, and the per-element loop runs in compiled C code rather than in the Python interpreter. That is where the significant performance gains come from.
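A rough way to see this on your own machine is to compute the same squares with a Python loop and with one vectorized expression. The functions loop_squares and vectorized_squares are illustrative names, and the timings will vary with hardware and NumPy version, so no numbers are shown:

```python
import timeit
import numpy as np

values = np.arange(100_000, dtype=np.float64)

def loop_squares(a):
    # Python-level loop: one interpreter round-trip per element
    out = np.empty_like(a)
    for i in range(a.size):
        out[i] = a[i] ** 2
    return out

def vectorized_squares(a):
    # Single call; the per-element loop runs in NumPy's compiled code
    return a ** 2

# Both approaches produce the same result
assert np.array_equal(loop_squares(values), vectorized_squares(values))

# Rough timing; exact numbers depend on your machine
print("loop:      ", timeit.timeit(lambda: loop_squares(values), number=3))
print("vectorized:", timeit.timeit(lambda: vectorized_squares(values), number=3))
```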
Broadcasting
Broadcasting allows NumPy to perform operations on arrays of different shapes, under certain conditions.
Basic example:
a = np.array([1, 2, 3])
b = 2
print(a * b) # [2 4 6] - scalar is broadcast to array shape
With arrays of different shapes:
a = np.array([[1], [2], [3]]) # Shape (3,1)
b = np.array([10, 20, 30]) # Shape (3,)
print(a + b)
# Output:
# [[11 21 31]
# [12 22 32]
# [13 23 33]]
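Rules for broadcasting:
1. Compare the shapes dimension by dimension, aligning them from right to left.
2. Each pair of dimensions must be equal, or one of them must be 1 (a size-1 dimension is stretched to match the other).
3. If the arrays don't have the same number of dimensions, prepend 1s to the shape of the smaller array.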
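The three rules are mechanical enough to write out by hand. The helper broadcast_shape below is hypothetical, written only to illustrate the rules; NumPy (1.20 and later) exposes the real check as np.broadcast_shapes(), which the example uses for comparison:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper applying the broadcasting rules by hand."""
    # Prepend 1s so both shapes have the same length; this also
    # aligns the dimensions from the right
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    # Each pair must be equal, or one side must be 1 (which is stretched)
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(result)

print(broadcast_shape((3, 1), (3,)))     # (3, 3)
print(np.broadcast_shapes((3, 1), (3,)))  # (3, 3), NumPy's own check
```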
Example where broadcasting fails:
a = np.array([1, 2, 3]) # Shape (3,)
b = np.array([10, 20]) # Shape (2,)
# a + b would raise ValueError: operands could not be broadcast together
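If what you actually wanted in the failing example was every pairwise sum (an "outer" sum), one fix is to give a an extra axis with np.newaxis. Shapes (3, 1) and (2,) then satisfy the rules and broadcast to (3, 2):

```python
import numpy as np

a = np.array([1, 2, 3])  # shape (3,)
b = np.array([10, 20])   # shape (2,)

# a[:, np.newaxis] has shape (3, 1); (3, 1) and (2,) broadcast to (3, 2)
outer_sum = a[:, np.newaxis] + b
print(outer_sum.shape)  # (3, 2)
print(outer_sum)
# [[11 21]
#  [12 22]
#  [13 23]]
```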
Use broadcasting to write concise and efficient code without unnecessary loops or array manipulations.
Useful Functions and Methods
NumPy is packed with functions. Here are some you'll use frequently.
np.where(): Pick each element from one of two options, depending on a condition.
arr = np.array([1, 2, 3, 4, 5])
result = np.where(arr > 3, arr, 0)
print(result) # [0 0 0 4 5]
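np.where() has a second form: given only the condition, it returns a tuple of index arrays (one per dimension) marking where the condition holds. In the three-argument form, both branches can also be arrays:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# With only a condition, np.where returns a tuple of index arrays (one per dimension)
idx = np.where(arr > 3)
print(idx[0])    # [3 4]
print(arr[idx])  # [4 5]

# Both branches can be arrays; here values <= 3 are replaced by their negatives
print(np.where(arr > 3, arr, -arr))  # [-1 -2 -3  4  5]
```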
np.unique(): Find unique elements (the result is sorted).
arr = np.array([1, 2, 2, 3, 3, 3])
unique_elements = np.unique(arr)
print(unique_elements) # [1 2 3]
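Passing return_counts=True also tells you how often each unique value occurs, which covers a common counting task without a loop:

```python
import numpy as np

arr = np.array([4, 1, 4, 4, 2])

values, counts = np.unique(arr, return_counts=True)
print(values)  # [1 2 4]
print(counts)  # [1 1 3]

# Pair each value with its count
print(dict(zip(values.tolist(), counts.tolist())))  # {1: 1, 2: 1, 4: 3}
```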
np.concatenate(): Join arrays.
a = np.array([1, 2])
b = np.array([3, 4])
combined = np.concatenate((a, b))
print(combined) # [1 2 3 4]
For 2D arrays, specify the axis:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
combined = np.concatenate((a, b), axis=0)
print(combined)
# Output:
# [[1 2]
# [3 4]
# [5 6]]
np.split(): Split an array into multiple equal-sized sub-arrays.
arr = np.arange(9)
split_arr = np.split(arr, 3)
print(split_arr)
# [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]
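np.split() with a section count requires the length to divide evenly. When it doesn't, np.array_split() accepts uneven pieces; alternatively, pass a list of indices to split at specific positions:

```python
import numpy as np

arr = np.arange(10)

# np.split(arr, 3) raises ValueError because 10 is not divisible by 3
try:
    np.split(arr, 3)
except ValueError as err:
    print("ValueError:", err)

# np.array_split allows unequal pieces; the first pieces get the extra elements
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# A list of indices splits at those positions instead
pieces = np.split(arr, [2, 5])
print([p.tolist() for p in pieces])  # [[0, 1], [2, 3, 4], [5, 6, 7, 8, 9]]
```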
np.sort(): Sort an array (returns a sorted copy).
arr = np.array([3, 1, 2])
sorted_arr = np.sort(arr)
print(sorted_arr) # [1 2 3]
For in-place sorting, use the array method:
arr.sort()
print(arr) # [1 2 3]
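A close relative is np.argsort(), which returns the indices that would sort an array. That lets you reorder a second, related array by the first one. The scores and names below are made-up illustration data:

```python
import numpy as np

scores = np.array([70, 95, 82])          # illustrative data
names = np.array(["ana", "ben", "cai"])  # illustrative data

order = np.argsort(scores)  # indices that would sort scores ascending
print(order)                # [0 2 1]
print(names[order])         # ['ana' 'cai' 'ben']

# Descending order: reverse the index array
print(scores[order[::-1]])  # [95 82 70]
```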
Handy tip: Use np.save() and np.load() to save and load arrays to/from disk efficiently.
np.save('my_array.npy', arr) # Save
loaded_arr = np.load('my_array.npy') # Load
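To store several arrays in one file, np.savez() writes a .npz archive keyed by the keyword names you choose (np.savez_compressed() is the compressed variant). A sketch that uses a temporary directory and a hypothetical file name, params.npz, so it leaves nothing behind:

```python
import os
import tempfile
import numpy as np

weights = np.arange(6).reshape(2, 3)
bias = np.array([0.5, -0.5])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.npz")  # hypothetical file name
    # savez stores each array under its keyword name
    np.savez(path, weights=weights, bias=bias)

    # np.load on a .npz returns an NpzFile, which supports the with statement
    with np.load(path) as data:
        loaded_weights = data["weights"]
        loaded_bias = data["bias"]

print(np.array_equal(weights, loaded_weights))  # True
print(np.array_equal(bias, loaded_bias))        # True
```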
Linear Algebra with NumPy
NumPy has a submodule, numpy.linalg, for linear algebra operations.
Common operations:
from numpy import linalg
A = np.array([[1, 2], [3, 4]])
# Determinant
det = linalg.det(A)
print(det) # approximately -2.0 (floating-point rounding may show e.g. -2.0000000000000004)
# Inverse
inv = linalg.inv(A)
print(inv)
# Output:
# [[-2. 1. ]
# [ 1.5 -0.5]]
# Eigenvalues and eigenvectors
eigenvals, eigenvecs = linalg.eig(A)
print(eigenvals) # [-0.37228132 5.37228132]
print(eigenvecs)
# Output:
# [[-0.82456484 -0.41597356]
# [ 0.56576746 -0.90937671]]
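Note that the eigenvectors are the columns of eigenvecs, not the rows: column i pairs with eigenvals[i]. A quick sanity check confirms A v = λ v for each pair, and that the eigenvalues multiply to the determinant and sum to the trace:

```python
import numpy as np
from numpy import linalg

A = np.array([[1, 2], [3, 4]])
eigenvals, eigenvecs = linalg.eig(A)

# Column i of eigenvecs pairs with eigenvals[i]: A @ v == lambda * v
for i in range(len(eigenvals)):
    v = eigenvecs[:, i]
    print(np.allclose(A @ v, eigenvals[i] * v))  # True

# The eigenvalues multiply to the determinant and sum to the trace
print(np.isclose(np.prod(eigenvals), linalg.det(A)))  # True
print(np.isclose(np.sum(eigenvals), np.trace(A)))     # True
```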
Solving a system of linear equations:
# Solve Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
x = linalg.solve(A, b)
print(x) # [2. 3.]
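It is cheap to verify a solution by substituting it back with np.allclose(). linalg.solve() needs a square, non-singular matrix; for an overdetermined system (more equations than unknowns), linalg.lstsq() returns the least-squares solution instead. The tall system below is a made-up example that happens to have an exact solution:

```python
import numpy as np
from numpy import linalg

A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

x = linalg.solve(A, b)
print(np.allclose(A @ x, b))  # True: x satisfies both equations

# Overdetermined system: 3 equations, 2 unknowns (illustrative data)
A_tall = np.array([[1, 0], [0, 1], [1, 1]])
b_tall = np.array([1, 2, 3])
x_ls, residuals, rank, sing_vals = linalg.lstsq(A_tall, b_tall, rcond=None)
print(x_ls)  # close to [1. 2.]
print(rank)  # 2
```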
Note: For large-scale or more advanced linear algebra, consider SciPy, which builds on NumPy and adds functionality such as additional matrix decompositions (scipy.linalg) and sparse matrices (scipy.sparse).
Random Number Generation
The numpy.random module is useful for generating random numbers and arrays.
Generating random integers:
random_int = np.random.randint(0, 10, size=5) # 5 integers from 0 up to, but not including, 10
print(random_int) # e.g., [3 7 2 8 1]
Generating random floats from a uniform distribution:
random_floats = np.random.rand(3, 2)
print(random_floats)
# e.g.,
# [[0.5488135 0.71518937]
# [0.60276338 0.54488318]
# [0.4236548 0.64589411]]
From a normal (Gaussian) distribution:
normal_vals = np.random.randn(1000) # mean=0, std=1
Seeding for reproducibility:
np.random.seed(42) # Set seed
random_arr = np.random.rand(2, 2)
# Same random numbers every time
Common numpy.random functions:
| Function | Description | Example |
|---|---|---|
| rand() | Uniform distribution [0,1) | np.random.rand(2,2) |
| randn() | Standard normal distribution | np.random.randn(100) |
| randint() | Random integers | np.random.randint(1,10,5) |
| choice() | Random choice from array | np.random.choice([1,2,3], size=2) |
| shuffle() | Shuffle array in-place | np.random.shuffle(arr) |
Best practice: Always set a seed when you need reproducible results, especially in testing or sharing code.
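The functions above use NumPy's legacy global random state. Newer code often uses the Generator API instead: np.random.default_rng(seed) creates an independent generator object whose methods roughly mirror the legacy calls. Note that it produces a different stream of numbers than np.random.seed() with the same seed. A minimal sketch:

```python
import numpy as np

# A Generator carries its own state, seeded once
rng = np.random.default_rng(42)

ints = rng.integers(0, 10, size=5)         # like randint: 0 <= value < 10
floats = rng.random((3, 2))                # like rand: uniform on [0, 1)
normals = rng.normal(0.0, 1.0, size=1000)  # like randn: mean 0, std 1

# Re-creating the generator with the same seed reproduces the same stream
rng_again = np.random.default_rng(42)
print(np.array_equal(ints, rng_again.integers(0, 10, size=5)))  # True
```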
Performance Tips
To get the most out of NumPy, keep these tips in mind.
- Avoid loops: Use vectorized operations whenever possible.
- Use built-in functions: They are optimized in C and much faster.
- Preallocate arrays: If you know the size, create the array first instead of appending.
- Use appropriate data types: For example, use np.float32 instead of np.float64 if precision allows, to save memory.
- Be mindful of copies vs views: Understand when an operation returns a view (no copy) versus a copy to avoid unexpected behavior.
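The memory saving from a smaller dtype is easy to measure with nbytes. The trade-off is precision: float32 keeps only about 7 significant decimal digits, so the same literal can round to a different value than in float64:

```python
import numpy as np

a64 = np.zeros(1_000_000, dtype=np.float64)
a32 = np.zeros(1_000_000, dtype=np.float32)

print(a64.nbytes)  # 8000000 bytes (8 per element)
print(a32.nbytes)  # 4000000 bytes (4 per element)

# 0.1 rounds differently at the two precisions
print(np.float32(0.1) == np.float64(0.1))  # False
```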
Example of preallocation:
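# Instead of this (slow: np.append copies the whole array on every call):
arr = np.array([])
for i in range(1000):
    arr = np.append(arr, i)
# Do this (allocate once, then fill in place):
arr = np.empty(1000)
for i in range(1000):
    arr[i] = i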
Or even better, use vectorized creation:
arr = np.arange(1000)
Final thought: NumPy is a powerful library. The more you use it, the more you'll appreciate its efficiency and elegance. Practice these basics, and soon you'll be handling numerical data like a pro!