Python Copy Module: Shallow vs Deep Copy

Python Copy Module: Shallow vs Deep Copy

Welcome back to our Python learning journey! Today, we're going to dive into one of those topics that seems simple on the surface but has layers of complexity beneath: copying objects in Python. Specifically, we'll explore the copy module and the crucial difference between shallow and deep copies. Understanding this distinction is essential for avoiding subtle bugs in your code, so let's get started!

Understanding Assignment in Python

Before we talk about copying, let's quickly revisit how assignment works in Python. When you assign one variable to another, you're not creating a new object – you're creating a new reference to the same object.

original_list = [1, 2, 3]
assigned_list = original_list

assigned_list.append(4)
print(original_list)  # Output: [1, 2, 3, 4]

See what happened? Both variables point to the same list object in memory. This is why modifying assigned_list also changed original_list. This behavior is fine in many cases, but sometimes you need actual separate copies.

The Copy Module to the Rescue

Python provides the copy module with two main functions: copy() for shallow copies and deepcopy() for deep copies. Let's import it:

import copy

Now let's explore what each of these functions does and when to use them.

Shallow Copy: A Surface-Level Duplicate

A shallow copy creates a new object, but doesn't create copies of the objects found within the original. Instead, it inserts references to the objects from the original.

original_list = [1, 2, [3, 4]]
shallow_copied_list = copy.copy(original_list)

# Modify the top level
shallow_copied_list[0] = 99
print(original_list)  # Output: [1, 2, [3, 4]]
print(shallow_copied_list)  # Output: [99, 2, [3, 4]]

# Modify the nested object
shallow_copied_list[2].append(5)
print(original_list)  # Output: [1, 2, [3, 4, 5]]
print(shallow_copied_list)  # Output: [99, 2, [3, 4, 5]]

Notice how changing the nested list affected both the original and the copy? That's because the nested list is the same object in both cases.

Operation Type Affects Original? Use Case
Top-level modification No Changing individual elements
Nested object modification Yes When you want shared nested objects
Appending to list No Adding new elements to collections

Here are some common scenarios where shallow copies are useful:

  • When you want to modify a copy without affecting the original's top-level structure
  • When memory efficiency is important and nested objects don't need to be independent
  • When you intentionally want nested objects to be shared between copies

Deep Copy: A Complete Separation

A deep copy creates a new object and recursively copies all objects found within the original. This means everything is duplicated, creating completely independent objects.

original_list = [1, 2, [3, 4]]
deep_copied_list = copy.deepcopy(original_list)

# Modify the nested object
deep_copied_list[2].append(5)
print(original_list)  # Output: [1, 2, [3, 4]]
print(deep_copied_list)  # Output: [1, 2, [3, 4, 5]]

Now the nested list modification only affects the copy, not the original. This complete independence comes at a cost: deep copies are more memory-intensive and slower than shallow copies.

When to Use Each Type

Choosing between shallow and deep copy depends on your specific needs:

  • Use shallow copy when you only need to modify the top-level structure or when nested objects should remain shared
  • Use deep copy when you need complete independence between the original and the copy
  • Consider performance - deep copy can be significantly slower for complex objects

Let's look at a more complex example with nested dictionaries:

import copy

original = {
    'name': 'Alice',
    'scores': [85, 92, 78],
    'metadata': {'id': 123, 'tags': ['student', 'python']}
}

# Shallow copy
shallow = copy.copy(original)
shallow['scores'].append(95)
print(original['scores'])  # Output: [85, 92, 78, 95]

# Deep copy
deep = copy.deepcopy(original)
deep['metadata']['tags'].append('advanced')
print(original['metadata']['tags'])  # Output: ['student', 'python']
Copy Type Memory Usage Speed Nested Object Independence
Shallow Copy Low Fast No
Deep Copy High Slow Yes

Performance Considerations

The performance difference between shallow and deep copies can be significant, especially with large or complex objects. Let's test this:

import copy
import time

# Create a large nested structure
large_data = [[i for i in range(1000)] for _ in range(1000)]

# Time shallow copy
start = time.time()
shallow_copy = copy.copy(large_data)
shallow_time = time.time() - start

# Time deep copy
start = time.time()
deep_copy = copy.deepcopy(large_data)
deep_time = time.time() - start

print(f"Shallow copy time: {shallow_time:.4f} seconds")
print(f"Deep copy time: {deep_time:.4f} seconds")

You'll typically find that deep copy takes orders of magnitude longer than shallow copy for complex structures.

Special Cases and Gotchas

Some objects have special copying behavior. For example, modules, classes, functions, and methods are not copied – references to the same objects are always used.

import math

original = {'func': math.sqrt}
shallow = copy.copy(original)
deep = copy.deepcopy(original)

print(original['func'] is shallow['func'])  # Output: True
print(original['func'] is deep['func'])    # Output: True

Also, be aware that some objects may have custom __copy__() and __deepcopy__() methods that override the default behavior.

Practical Examples and Use Cases

Let's explore some real-world scenarios where understanding copy types is crucial:

Configuration Management: When you have default configuration settings and want to create modified versions without affecting the defaults.

default_config = {
    'timeout': 30,
    'retries': 3,
    'servers': ['server1', 'server2']
}

# For user-specific config that might modify servers
user_config = copy.deepcopy(default_config)
user_config['servers'].append('server3')

Data Processing: When processing data that shouldn't be modified during analysis.

raw_data = load_large_dataset()
processing_copy = copy.deepcopy(raw_data)  # Work on copy, preserve original

Caching Mechanisms: When you need to return data without allowing external modification.

def get_cached_data():
    cached = cache.get('expensive_data')
    return copy.deepcopy(cached)  # Prevent external modifications

Best Practices and Recommendations

Based on years of Python development, here are my recommendations:

  • Default to deep copy when you're unsure – it's safer but less efficient
  • Use shallow copy when you know nested objects won't be modified or should be shared
  • Document your choice – comment why you chose shallow or deep copy
  • Test edge cases – especially with nested structures and custom objects
  • Consider alternatives – sometimes rebuilding structures is better than copying

Remember that for simple, flat structures (lists of integers, strings, etc.), shallow and deep copy behave identically since there are no nested objects to copy.

Common Pitfalls to Avoid

Many developers encounter these issues when working with copies:

Unexpected shared state: Modifying a nested object in a shallow copy affects the original.

Performance bottlenecks: Using deep copy on large objects without considering the cost.

Circular references: Deep copy can handle circular references, but it's good to be aware of them.

a = []
b = [a]
a.append(b)

# This works, but creates complex structures
deep = copy.deepcopy(a)

Custom object issues: Objects with unusual __copy__ or __deepcopy__ implementations might not behave as expected.

Testing Your Understanding

Let's test your knowledge with a quick exercise:

import copy

data = {'a': 1, 'b': [2, 3], 'c': {'d': 4}}

copy1 = data.copy()  # dict's own copy method (shallow)
copy2 = copy.copy(data)
copy3 = copy.deepcopy(data)

copy1['b'].append(5)
copy2['c']['d'] = 99
copy3['a'] = 100

# What are the values of data now?

Take a moment to think about it before checking the solution!

The answer: data becomes {'a': 1, 'b': [2, 3, 5], 'c': {'d': 99}} because: - copy1 is a shallow copy (affects nested list) - copy2 is a shallow copy (affects nested dict) - copy3 is deep copy (doesn't affect original)

Integration with Other Python Features

The copy module works seamlessly with other Python features. For example, you can use it with:

List comprehensions: Sometimes it's cleaner to use comprehensions than copy:

# Instead of copy.copy() for each element
copied_list = [item[:] for item in nested_list]  # Shallow copy of each sublist

Custom classes: You can define __copy__ and __deepcopy__ methods for your classes:

class MyClass:
    def __init__(self, value):
        self.value = value

    def __copy__(self):
        return MyClass(self.value)

    def __deepcopy__(self, memo):
        return MyClass(copy.deepcopy(self.value, memo))

Memory Management Considerations

When working with large data structures, be mindful of memory usage:

import sys

large_object = create_large_data_structure()
print(f"Original size: {sys.getsizeof(large_object)} bytes")

shallow_copy = copy.copy(large_object)
print(f"Shallow copy size: {sys.getsizeof(shallow_copy)} bytes")

deep_copy = copy.deepcopy(large_object)
print(f"Deep copy size: {sys.getsizeof(deep_copy)} bytes")

The shallow copy will use much less memory since it shares nested objects, while the deep copy duplicates everything.

Final Thoughts and Next Steps

Understanding the difference between shallow and deep copies is a fundamental Python skill that will save you from many headaches. Remember:

  • Shallow copy = new container, same contents
  • Deep copy = new container, new contents

Practice with different data structures, and always think about whether you need true independence or if shared references are acceptable. When in doubt, test your assumptions – create small examples to verify the copying behavior before applying it to production code.

Happy coding, and may your copies always be of the appropriate depth!