
Python Copying Objects: Shallow vs Deep Copy
When you're working with objects in Python, you'll eventually face a common question: how do I properly copy this thing? It might seem straightforward at first, but Python's approach to copying objects comes with important nuances. Understanding the difference between shallow and deep copying is essential for writing reliable, bug-free code. Let's break it down together.
What is Copying?
In Python, assignment doesn't create a copy of an object. Instead, it creates a new reference to the same object. This means if you modify the new variable, the original object changes too. That's where copying comes in - it creates a new object with the same content.
original = [1, 2, 3]
reference = original # Not a copy!
reference[0] = 99
print(original) # Output: [99, 2, 3] - original changed!
To actually create a copy, we need to use Python's copying mechanisms. This is where shallow and deep copying enter the picture.
Shallow Copy
A shallow copy creates a new object, but doesn't create copies of the objects found within the original. Instead, it inserts references to the same nested objects. Think of it as copying the "outer shell" but keeping the same "insides."
import copy
original = [1, [2, 3], 4]
shallow_copy = copy.copy(original)
# Modify the outer list
shallow_copy[0] = 99
print(original[0]) # Output: 1 - unchanged!
# But modify the nested list...
shallow_copy[1][0] = 88
print(original[1][0]) # Output: 88 - changed in original too!
The shallow copy created a new list, but the nested list [2, 3]
is shared between both the original and the copy. This behavior is exactly what makes shallow copying both useful and potentially dangerous.
Copy Type | Outer Object | Nested Objects | Memory Usage |
---|---|---|---|
Shallow | New copy | Shared references | Lower |
Deep | New copy | New copies | Higher |
When to Use Shallow Copy
Shallow copying is appropriate when: - You're working with simple, flat data structures - You want to save memory by sharing nested objects - The nested objects are immutable and won't change - You intentionally want changes to nested objects to affect both copies
import copy
# Good use case: immutable nested objects
config = {'setting': 'default', 'options': ('read', 'write')}
config_copy = copy.copy(config)
# Since tuples are immutable, this is safe
config_copy['setting'] = 'custom'
# The tuple remains shared but can't be modified
Deep Copy
A deep copy creates a new object and recursively copies all objects found within the original. This means everything is duplicated - no shared references between the original and the copy.
import copy
original = [1, [2, 3], 4]
deep_copy = copy.deepcopy(original)
# Modify the nested list in the copy
deep_copy[1][0] = 88
print(original[1][0]) # Output: 2 - unchanged!
The deep copy created completely independent copies of all objects, including the nested list. Changes to one won't affect the other.
Deep copying is essential when: - You need complete independence between copies - Working with complex, nested data structures - The original contains mutable objects that might change - You want to avoid unexpected side effects
import copy
# Complex nested structure
employee = {
'name': 'Alice',
'skills': ['Python', 'SQL'],
'projects': [
{'name': 'Project A', 'status': 'active'},
{'name': 'Project B', 'status': 'completed'}
]
}
# Deep copy ensures complete independence
employee_copy = copy.deepcopy(employee)
employee_copy['skills'].append('JavaScript')
employee_copy['projects'][0]['status'] = 'paused'
# Original remains untouched
print(employee['skills']) # ['Python', 'SQL']
print(employee['projects'][0]['status']) # 'active'
Performance Considerations
Deep copying is more expensive than shallow copying in terms of both time and memory. The deeper and more complex your data structure, the more significant this difference becomes.
import copy
import time
# Create a complex nested structure
big_data = [[i for i in range(1000)] for _ in range(1000)]
# Time shallow copy
start = time.time()
shallow = copy.copy(big_data)
shallow_time = time.time() - start
# Time deep copy
start = time.time()
deep = copy.deepcopy(big_data)
deep_time = time.time() - start
print(f"Shallow copy: {shallow_time:.4f} seconds")
print(f"Deep copy: {deep_time:.4f} seconds")
You'll typically see that deep copying takes significantly longer, especially with large, complex structures.
Custom Copy Behavior
Sometimes you might want to define custom copying behavior for your own classes. Python's copy
module provides hooks for this purpose.
import copy
class CustomObject:
def __init__(self, data, metadata=None):
self.data = data
self.metadata = metadata or {}
self.timestamp = time.time()
def __copy__(self):
# Custom shallow copy
new_obj = self.__class__(self.data)
new_obj.metadata = self.metadata # Shared reference
new_obj.timestamp = time.time() # New timestamp
return new_obj
def __deepcopy__(self, memo):
# Custom deep copy
new_obj = self.__class__(copy.deepcopy(self.data, memo))
new_obj.metadata = copy.deepcopy(self.metadata, memo)
new_obj.timestamp = time.time()
return new_obj
# Usage
original = CustomObject([1, 2, 3], {'author': 'You'})
shallow_copy = copy.copy(original)
deep_copy = copy.deepcopy(original)
Common Pitfalls and Best Practices
Always be aware of what you're copying. The most common mistake is assuming a copy is independent when it's actually sharing references. This can lead to subtle bugs that are hard to track down.
Consider using immutable objects for nested data when appropriate. If your nested objects are tuples instead of lists, or frozensets instead of sets, you can safely use shallow copies without worrying about unexpected modifications.
Profile before optimizing. Don't avoid deep copying just because it's slower. If you need independent copies, use deep copying. Only optimize to shallow copying when you've confirmed it's safe and you have a performance requirement.
# Safe pattern with immutable nested objects
safe_data = (1, 2, (3, 4)) # Tuple with nested tuple
copy = safe_data[:] # Even slicing creates a shallow copy, but it's safe
# Potentially problematic pattern
risky_data = [1, 2, [3, 4]] # List with nested list
copy = risky_data[:] # Shallow copy - nested list is shared!
Remember that Python's built-in types have different copying behaviors. Lists, dictionaries, and sets can be shallow-copied using their built-in methods, while custom objects typically require the copy
module.
Operation | Type Affected | Result |
---|---|---|
list[:] | List | Shallow copy |
dict.copy() | Dictionary | Shallow copy |
set.copy() | Set | Shallow copy |
copy.copy() | Any object | Shallow copy |
copy.deepcopy() | Any object | Deep copy |
The key takeaway is this: choose your copy method based on your needs. If you want to share nested objects to save memory and don't mind coordinated changes, use shallow copying. If you need complete independence and isolation, use deep copying. Always test your assumptions about what's being shared and what's being copied.
Practice with different data structures and observe the behavior. The more you work with copying in Python, the more intuitive these concepts will become. Happy coding!