Python NumPy Cheatsheet

Welcome to your go-to NumPy cheatsheet! If you're working with numerical data in Python, you're almost certainly using NumPy. It's fast, versatile, and forms the foundation for many other data science libraries. Let's dive into the essentials and some pro-tips to make your code more efficient and readable.

Getting Started with NumPy

First things first, you need to install and import NumPy. If you haven't installed it yet, you can do so using pip:

pip install numpy

Once installed, import it in your Python script or notebook. The convention is to import it as np:

import numpy as np

Now you're ready to harness the power of NumPy!

Creating Arrays

NumPy arrays are more efficient than Python lists for numerical operations. Here’s how you can create them.

Creating a one-dimensional array from a list:

arr = np.array([1, 2, 3, 4, 5])
print(arr)
# Output: [1 2 3 4 5]

Creating a two-dimensional array:

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
# Output:
# [[1 2 3]
#  [4 5 6]]

You can also create arrays with initial values using built-in functions.

zeros = np.zeros((3, 4))  # 3x4 array of zeros
ones = np.ones((2, 3))    # 2x3 array of ones
range_arr = np.arange(0, 10, 2)  # similar to range, but returns an array
random_arr = np.random.rand(2, 2)  # 2x2 array of uniform random values in [0, 1)

Common array creation functions:

Function	Description	Example
`np.array()`	Create array from list/tuple	`np.array([1,2,3])`
`np.zeros()`	Array filled with zeros	`np.zeros((2,2))`
`np.ones()`	Array filled with ones	`np.ones((3,3))`
`np.arange()`	Evenly spaced values with a given step (stop excluded)	`np.arange(0, 5)`
`np.linspace()`	A given number of evenly spaced values (stop included by default)	`np.linspace(0, 1, 5)`
`np.random.rand()`	Array with random values (uniform)	`np.random.rand(2,2)`

Pro tip: If you are going to overwrite every element anyway, np.empty() skips initialization and can save a little time. Be cautious: until you fill it, the array contains whatever happened to be in memory.
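Here is a minimal sketch of the pattern the tip describes. It allocates with np.empty and then overwrites every element before reading it. For comparison, it also shows np.full, which allocates and fills in one call.

```python
import numpy as np

# np.empty allocates memory but does not initialise it, so the
# contents are arbitrary until every element is overwritten.
buf = np.empty((2, 3))
buf[:] = 7.0  # fill every slot before reading from buf

# When the fill value is known up front, np.full does both steps at once.
sevens = np.full((2, 3), 7.0)

print(np.array_equal(buf, sevens))  # True
```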

Array Attributes

Understanding array attributes helps you debug and manipulate arrays effectively.

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.ndim)    # Number of dimensions: 2
print(arr.shape)   # Shape: (2, 3)
print(arr.size)    # Total number of elements: 6
print(arr.dtype)   # Data type: int64 (or similar)

ndim: Number of array dimensions.
shape: Tuple indicating the size of each dimension.
size: Total number of elements in the array.
dtype: Data type of the array elements.
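You can also choose the dtype yourself. This sketch sets it at creation time and inspects the per-element and total memory. It then converts the array with astype, which returns a new array rather than modifying the original.

```python
import numpy as np

# Request a dtype explicitly when creating the array...
arr = np.array([1, 2, 3], dtype=np.float32)
print(arr.dtype)     # float32
print(arr.itemsize)  # 4 (bytes per element)
print(arr.nbytes)    # 12 (size * itemsize)

# ...or convert later with astype, which returns a new array.
as_int = arr.astype(np.int16)
print(as_int.dtype)  # int16
```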

You can change the shape of an array without changing its data using reshape().

arr = np.arange(6)
reshaped = arr.reshape(2, 3)
print(reshaped)
# Output:
# [[0 1 2]
#  [3 4 5]]

Important: The total number of elements must stay the same when reshaping (here, 6 elements become 2 rows of 3).
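Two details are worth knowing about reshape. First, one dimension can be given as -1 so NumPy infers it. Second, a shape whose element count doesn't match raises a ValueError. A short sketch:

```python
import numpy as np

arr = np.arange(12)

# -1 asks reshape to infer that dimension from the total size (12 / 3 = 4).
grid = arr.reshape(3, -1)
print(grid.shape)  # (3, 4)

# A target shape with a different element count (5 * 3 = 15) is rejected.
try:
    arr.reshape(5, 3)
except ValueError as exc:
    print("reshape failed:", exc)
```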

Array Indexing and Slicing

Indexing and slicing in NumPy work much like they do for Python lists, with extra capabilities for multidimensional arrays.

Basic indexing for 1D arrays:

arr = np.array([10, 20, 30, 40, 50])
print(arr[0])   # First element: 10
print(arr[-1])  # Last element: 50

Slicing:

print(arr[1:4])   # Elements at indices 1 to 3: [20 30 40]
print(arr[::2])   # Every second element: [10 30 50]

For 2D arrays, you use comma-separated indices:

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr_2d[0, 1])   # Element at row 0, column 1: 2
print(arr_2d[:, 1])   # All rows, column 1: [2 5 8]
print(arr_2d[1:, :2]) # Rows from index 1, first two columns:
# Output:
# [[4 5]
#  [7 8]]

You can also use boolean indexing to filter arrays:

arr = np.array([5, 10, 15, 20])
filtered = arr[arr > 10]
print(filtered)  # [15 20]
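Conditions can be combined for more selective filters. Use & (and), | (or) and ~ (not) instead of Python's and/or, which don't work element-wise on arrays. Wrap each comparison in parentheses:

```python
import numpy as np

arr = np.array([5, 10, 15, 20])

# & and | bind more tightly than comparisons, so the parentheses are required.
between = arr[(arr > 5) & (arr < 20)]
outside = arr[(arr <= 5) | (arr >= 20)]

print(between)  # [10 15]
print(outside)  # [ 5 20]
```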

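Remember: Basic slicing returns a view of the array, not a copy, so modifying the slice also modifies the original array. Use .copy() if you need an independent copy. Boolean indexing (as in the filter above) and indexing with integer arrays always return copies.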
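The sketch below makes the view/copy difference concrete. It modifies a slice and an explicit copy. It then uses np.shares_memory to check which results still point at the original data.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

view = arr[1:4]                 # basic slice: a view onto arr's data
view[0] = 99
print(arr)                      # [10 99 30 40 50] (original changed)

independent = arr[1:4].copy()   # explicit copy: separate data
independent[0] = -1
print(arr)                      # [10 99 30 40 50] (original unchanged)

mask_result = arr[arr > 25]     # boolean indexing returns a copy
print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, mask_result))  # False
```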

Array Operations

NumPy allows you to perform element-wise operations easily, which is much faster than using loops.

Basic arithmetic operations:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)   # [5 7 9]
print(a - b)   # [-3 -3 -3]
print(a * b)   # [ 4 10 18]  (element-wise multiplication)
print(b / a)   # [4.  2.5 2. ]  (element-wise division)
print(a ** 2)  # [1 4 9]  (squares)

For matrix multiplication, use @ or np.dot():

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

print(matrix_a @ matrix_b)
# Output:
# [[19 22]
#  [43 50]]

print(np.dot(matrix_a, matrix_b))  # Same as above
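Matrix multiplication doesn't require square matrices, only matching inner dimensions. Here is a small sketch that multiplies a (2, 3) array by a (3, 2) array:

```python
import numpy as np

# Inner dimensions must match: (2, 3) @ (3, 2) -> (2, 2).
m = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]
n = np.ones((3, 2))

product = m @ n
print(product.shape)  # (2, 2)
print(product)        # each entry is a row sum of m: rows [3. 3.] and [12. 12.]

# Contrast with *, which is element-wise and keeps the (2, 3) shape here.
print((m * m).shape)  # (2, 3)
```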

Other useful operations:

arr = np.array([1, 2, 3, 4, 5])

print(np.sum(arr))      # Sum of all elements: 15
print(np.mean(arr))     # Mean: 3.0
print(np.max(arr))      # Maximum value: 5
print(np.min(arr))      # Minimum value: 1
print(np.std(arr))      # Standard deviation: ~1.414
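One detail about np.std: by default it divides by N, which gives the population standard deviation (ddof=0). That is where the ~1.414 above comes from. For the sample standard deviation, pass ddof=1:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Default ddof=0: divide the summed squared deviations (10) by N = 5.
pop_std = np.std(arr)
# ddof=1: divide by N - 1 = 4 instead.
sample_std = np.std(arr, ddof=1)

print(pop_std)     # ~1.414 (sqrt(2))
print(sample_std)  # ~1.581 (sqrt(2.5))
```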

You can also specify the axis for operations on multidimensional arrays:

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(arr_2d, axis=0))  # Collapse the rows, one sum per column: [5 7 9]
print(np.sum(arr_2d, axis=1))  # Collapse the columns, one sum per row: [ 6 15]
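The axis argument works the same way for mean, max, min and the other reductions. Adding keepdims=True keeps the reduced axis with length 1, which makes the result easy to combine with the original array. This relies on broadcasting, covered in the next section. A sketch that centers each row on its own mean:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows, leaving one mean per column.
col_means = arr_2d.mean(axis=0)                 # shape (3,)
# keepdims=True keeps the reduced axis as length 1: shape (2, 1).
row_means = arr_2d.mean(axis=1, keepdims=True)

# (2, 3) minus (2, 1) broadcasts, subtracting each row's mean from that row.
centered_rows = arr_2d - row_means
print(col_means)      # [2.5 3.5 4.5]
print(centered_rows)  # both rows become [-1. 0. 1.]
```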

Common mathematical functions:

Function	Description	Example
`np.sqrt()`	Square root	`np.sqrt(arr)`
`np.sin()`	Sine	`np.sin(arr)`
`np.exp()`	Exponential	`np.exp(arr)`
`np.log()`	Natural log	`np.log(arr)`
`np.abs()`	Absolute value	`np.abs(arr)`

Key point: NumPy operations are vectorized, meaning they are applied to each element without explicit loops, leading to significant performance gains.
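To make the vectorization point concrete, this sketch computes square roots both ways and checks that the results agree. The vectorized version replaces a Python-level loop with a single NumPy call.

```python
import math
import numpy as np

values = np.linspace(0.0, 1.0, 5)

# Loop version: one Python-level math.sqrt call per element.
looped = np.array([math.sqrt(v) for v in values])

# Vectorized version: one call that processes the whole array.
vectorized = np.sqrt(values)

print(np.allclose(looped, vectorized))  # True
```

To measure the speed difference on your own machine, the standard-library timeit module is a simple way to time both versions on a much larger array.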

Broadcasting

Broadcasting allows NumPy to perform operations on arrays of different shapes, under certain conditions.

Basic example:

a = np.array([1, 2, 3])
b = 2
print(a * b)  # [2 4 6] - scalar is broadcast to array shape

With arrays of different shapes:

a = np.array([[1], [2], [3]])  # Shape (3,1)
b = np.array([10, 20, 30])     # Shape (3,)
print(a + b)
# Output:
# [[11 21 31]
#  [12 22 32]
#  [13 23 33]]

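Rules for broadcasting:
1. If the arrays have different numbers of dimensions, prepend 1s to the shape of the one with fewer dimensions.
2. Compare the shapes dimension by dimension, starting from the rightmost.
3. Each pair of dimensions must be equal, or one of them must be 1. A dimension of size 1 is stretched to match the other, so the result takes the larger size.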
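The rules can be written out as a small function. broadcast_shape below is a hypothetical helper written only for illustration. Its result is checked against np.broadcast_shapes, which NumPy provides from version 1.20 onward.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    # Prepend 1s so both shapes have the same number of dimensions.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    # After padding, pairs line up as if aligned from the right.
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)  # size-1 dimension is stretched
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(result)

print(broadcast_shape((3, 1), (3,)))      # (3, 3)
print(np.broadcast_shapes((3, 1), (3,)))  # (3, 3)
```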

Example where broadcasting fails:

a = np.array([1, 2, 3])        # Shape (3,)
b = np.array([10, 20])         # Shape (2,)
# a + b would raise ValueError: operands could not be broadcast together

Use broadcasting to write concise and efficient code without unnecessary loops or array manipulations.
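As an example of broadcasting replacing a loop, this sketch computes the distance from every point to every reference point. It does this by adding an axis with np.newaxis. The data are made up for illustration.

```python
import numpy as np

# Illustrative data: four points on a line and three reference points.
points = np.array([0.0, 1.0, 4.0, 9.0])  # shape (4,)
refs = np.array([1.0, 5.0, 10.0])        # shape (3,)

# points[:, np.newaxis] has shape (4, 1); against refs (3,) it broadcasts
# to (4, 3): every point-to-reference distance, with no Python loop.
distances = np.abs(points[:, np.newaxis] - refs)
print(distances.shape)  # (4, 3)

# Index of the nearest reference point for each point.
print(distances.argmin(axis=1))  # [0 0 1 2]
```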

Useful Functions and Methods

NumPy is packed with functions. Here are some you'll use frequently.

np.where(): Choose between two values element by element, based on a condition.

arr = np.array([1, 2, 3, 4, 5])
result = np.where(arr > 3, arr, 0)
print(result)  # [0 0 0 4 5]
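np.where has two other handy uses. Called with only a condition, it returns the indices where the condition holds. And the two fallback values don't have to be numbers; any broadcastable values work, including strings.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Condition only: a tuple with one index array per dimension.
(indices,) = np.where(arr > 3)
print(indices)  # [3 4]

# Three arguments: "even" where the condition is True, "odd" elsewhere.
parity = np.where(arr % 2 == 0, "even", "odd")
print(parity)   # ['odd' 'even' 'odd' 'even' 'odd']
```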

np.unique(): Find unique elements.

arr = np.array([1, 2, 2, 3, 3, 3])
unique_elements = np.unique(arr)
print(unique_elements)  # [1 2 3]
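np.unique can also count occurrences. With return_counts=True it returns the counts alongside the unique values, which makes finding the most frequent value a one-liner:

```python
import numpy as np

arr = np.array([1, 2, 2, 3, 3, 3])

# return_counts=True adds a second array: how often each value occurs.
values, counts = np.unique(arr, return_counts=True)
print(values)  # [1 2 3]
print(counts)  # [1 2 3]

# Most frequent value.
most_common = values[counts.argmax()]
print(most_common)  # 3
```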

np.concatenate(): Join arrays.

a = np.array([1, 2])
b = np.array([3, 4])
combined = np.concatenate((a, b))
print(combined)  # [1 2 3 4]

For 2D arrays, specify the axis:

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
combined = np.concatenate((a, b), axis=0)
print(combined)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]]

np.split(): Split an array into multiple sub-arrays. When you pass a number of sections, it must divide the array evenly.

arr = np.arange(9)
split_arr = np.split(arr, 3)
print(split_arr)
# [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]
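When the array can't be divided evenly, np.split raises a ValueError, but np.array_split allows unequal pieces. You can also pass a list of split indices instead of a count:

```python
import numpy as np

arr = np.arange(10)

# 10 elements can't be split into 3 equal parts, so np.split fails.
try:
    np.split(arr, 3)
except ValueError as exc:
    print("np.split failed:", exc)

# np.array_split allows unequal parts; the earlier parts get the extras.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])     # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Split before indices 2 and 5 instead of giving a section count.
by_index = np.split(arr, [2, 5])
print([p.tolist() for p in by_index])  # [[0, 1], [2, 3, 4], [5, 6, 7, 8, 9]]
```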

np.sort(): Sort an array.

arr = np.array([3, 1, 2])
sorted_arr = np.sort(arr)
print(sorted_arr)  # [1 2 3]

For in-place sorting, use the array method:

arr.sort()
print(arr)  # [1 2 3]
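If you need the sorting order itself rather than the sorted values, use np.argsort. The indices it returns can reorder a related array the same way. The scores and names here are made-up sample data.

```python
import numpy as np

# Made-up sample data: parallel arrays of scores and names.
scores = np.array([88, 95, 70])
names = np.array(["ana", "ben", "cy"])

# Indices that would sort scores in ascending order.
order = np.argsort(scores)
print(order)         # [2 0 1]
print(names[order])  # ['cy' 'ana' 'ben']

# Reverse the indices for descending order.
descending = scores[order[::-1]]
print(descending)    # [95 88 70]
```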

Handy tip: Use np.save() and np.load() to save and load arrays to/from disk efficiently.

np.save('my_array.npy', arr)   # Save
loaded_arr = np.load('my_array.npy')  # Load
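To store several arrays in one file, np.savez writes an .npz archive that saves each array under a keyword name. This sketch writes into a fresh temporary directory created with tempfile. The file name arrays.npz is arbitrary.

```python
import os
import tempfile
import numpy as np

a = np.arange(5)
b = np.linspace(0.0, 1.0, 3)

# Each keyword argument becomes a named array inside the .npz archive.
path = os.path.join(tempfile.mkdtemp(), "arrays.npz")
np.savez(path, first=a, second=b)

# np.load on an .npz file returns a dict-like object keyed by those names.
with np.load(path) as data:
    loaded_a = data["first"]
    loaded_b = data["second"]

print(np.array_equal(a, loaded_a), np.array_equal(b, loaded_b))  # True True
```

np.savez_compressed takes the same arguments and produces a compressed archive.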

Linear Algebra with NumPy

NumPy has a submodule numpy.linalg for linear algebra operations.

Common operations:

from numpy import linalg

A = np.array([[1, 2], [3, 4]])

# Determinant
det = linalg.det(A)
print(det)  # approximately -2.0 (may print as -2.0000000000000004 due to floating-point rounding)

# Inverse
inv = linalg.inv(A)
print(inv)
# Output:
# [[-2.   1. ]
#  [ 1.5 -0.5]]

# Eigenvalues and eigenvectors
eigenvals, eigenvecs = linalg.eig(A)
print(eigenvals)   # [-0.37228132  5.37228132]
print(eigenvecs)
# Output:
# [[-0.82456484 -0.41597356]
#  [ 0.56576746 -0.90937671]]

Solving a system of linear equations:

# Solve Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
x = linalg.solve(A, b)
print(x)  # [2. 3.]
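It's a good habit to check a solution by plugging it back in, using np.allclose to allow for floating-point error. linalg.solve also raises a LinAlgError when the matrix is singular, as this sketch shows:

```python
import numpy as np
from numpy import linalg

A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
x = linalg.solve(A, b)

# A @ x should reproduce b up to floating-point error.
print(np.allclose(A @ x, b))  # True

# The second row is twice the first, so there is no unique solution.
singular = np.array([[1, 2], [2, 4]])
solve_failed = False
try:
    linalg.solve(singular, b)
except linalg.LinAlgError as exc:
    solve_failed = True
    print("solve failed:", exc)
```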

Note: For large-scale linear algebra, consider SciPy which builds on NumPy and offers more advanced functionality.

Random Number Generation

The numpy.random module is useful for generating random numbers and arrays.

Generating random integers:

random_int = np.random.randint(0, 10, size=5)
print(random_int)  # e.g., [3 7 2 8 1]

Generating random floats from a uniform distribution:

random_floats = np.random.rand(3, 2)
print(random_floats)
# e.g.,
# [[0.5488135  0.71518937]
#  [0.60276338 0.54488318]
#  [0.4236548  0.64589411]]

From a normal (Gaussian) distribution:

normal_vals = np.random.randn(1000)  # mean=0, std=1

Seeding for reproducibility:

np.random.seed(42)  # Set seed
random_arr = np.random.rand(2, 2)
# Same random numbers every time

Common numpy.random functions:

Function	Description	Example
`rand()`	Uniform distribution [0,1)	`np.random.rand(2,2)`
`randn()`	Standard normal distribution	`np.random.randn(100)`
`randint()`	Random integers	`np.random.randint(1,10,5)`
`choice()`	Random choice from array	`np.random.choice([1,2,3], size=2)`
`shuffle()`	Shuffle array in-place	`np.random.shuffle(arr)`

Best practice: Always set a seed when you need reproducible results, especially in testing or sharing code.
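The functions above use NumPy's legacy global random state. For new code, NumPy's documentation recommends creating a Generator with np.random.default_rng and calling methods on it. Each generator carries its own seed. This sketch shows rough equivalents. Note that a Generator does not produce the same numbers as the legacy functions, even with the same seed.

```python
import numpy as np

# A Generator object with its own seed, instead of the global state.
rng = np.random.default_rng(seed=42)

ints = rng.integers(0, 10, size=5)                    # like randint: high is exclusive
floats = rng.random((3, 2))                           # like rand: uniform on [0, 1)
normals = rng.normal(loc=0.0, scale=1.0, size=1000)   # like randn
picks = rng.choice([1, 2, 3], size=2)                 # like choice

# Two generators built from the same seed produce the same stream.
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)
same_stream = np.array_equal(rng_a.random(4), rng_b.random(4))
print(same_stream)  # True
```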

Performance Tips

To get the most out of NumPy, keep these tips in mind.

Avoid loops: Use vectorized operations whenever possible.
Use NumPy's built-in functions: They are implemented in C and much faster than equivalent Python code (for example, prefer np.sum(arr) over Python's sum(arr)).
Preallocate arrays: If you know the size, create the array first instead of appending.
Use appropriate data types: For example, use np.float32 instead of np.float64 if precision allows, to save memory.
Be mindful of copies vs views: Understand when an operation returns a view (no copy) versus a copy to avoid unexpected behavior.
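To see what the data-type tip saves, this sketch compares the memory used by a million float64 zeros and a million float32 zeros. It also shows the precision you give up.

```python
import numpy as np

n = 1_000_000
a64 = np.zeros(n, dtype=np.float64)
a32 = np.zeros(n, dtype=np.float32)

print(a64.nbytes)  # 8000000 (8 bytes per element)
print(a32.nbytes)  # 4000000 (4 bytes per element)

# The trade-off: float32 keeps only about 7 significant decimal digits,
# so 0.1 stored as float32 differs from 0.1 stored as float64.
print(np.float32(0.1) == np.float64(0.1))  # False
```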

Example of preallocation:

# Instead of:
arr = np.array([])
for i in range(1000):
    arr = np.append(arr, i)

# Do:
arr = np.empty(1000)
for i in range(1000):
    arr[i] = i

Or even better, use vectorized creation:

arr = np.arange(1000)

Final thought: NumPy is a powerful library. The more you use it, the more you'll appreciate its efficiency and elegance. Practice these basics, and soon you'll be handling numerical data like a pro!