Lukas Schwarzlmüller

Free-Threading in Python
Python has long been bound by the Global Interpreter Lock (GIL), which prevents multiple threads from executing Python bytecode simultaneously. But that's changing now.

What is the GIL and Why Does It Matter?
The Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode at once. While this simplifies memory management and makes Python easier to extend with C libraries, it creates a significant bottleneck for CPU-intensive workloads.

Here's the reality with traditional Python:

  • Multiple threads for I/O-bound tasks: Great performance
  • Multiple threads for CPU-bound tasks: No real speedup (sometimes even slower due to context switching overhead)

Free-threading changes this equation entirely.

Setting Up: Why I Used uv
I used uv to install Python 3.13's free-threaded build. Python 3.13 is the first official release with free-threading support (PEP 703), while Python 3.14 is still in development and not yet available as a stable release in uv.

Installing with uv
Installing the free-threaded build with uv is straightforward:

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Python 3.13 free-threaded build
uv python install 3.13+freethreaded

# Or install the specific version
uv python install cpython-3.13.1+freethreaded

Note on version naming: uv uses the `+freethreaded` suffix for free-threaded builds (e.g., `3.13+freethreaded`), which is different from pyenv's `t` suffix convention (e.g., `3.13t`).

Building the Benchmark
I created a simple but effective benchmark to demonstrate the difference between GIL-enabled and free-threaded Python. The demo consists of three main components:

1. CPU-Bound Task

def cpu_bound_task(n, task_id):
    """
    A CPU-intensive task that calculates sum of squares.
    This is purely CPU-bound work that benefits from parallel execution.
    """
    result = 0
    for i in range(n):
        result += i * i
    return f"Task {task_id}: {result}"

This function performs intensive calculations—perfect for demonstrating the impact of parallel execution.
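As a quick sanity check (my own addition, not part of the original demo), the sum of squares has a closed form, so the task's output can be verified cheaply before benchmarking it:

```python
def cpu_bound_task(n, task_id):
    """Same function as above: sum of i * i for i in range(n)."""
    result = 0
    for i in range(n):
        result += i * i
    return f"Task {task_id}: {result}"

def sum_of_squares(n):
    # Closed form: sum_{i=0}^{n-1} i^2 = (n-1) * n * (2n - 1) / 6
    return (n - 1) * n * (2 * n - 1) // 6

assert cpu_bound_task(1_000, 0) == f"Task 0: {sum_of_squares(1_000)}"
```

This also makes it easy to confirm that threaded and sequential runs produce identical results later on.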

2. Sequential Execution (Baseline)

def demo_sequential(iterations, num_tasks):
    """Run tasks sequentially (baseline)."""
    results = []
    for i in range(num_tasks):
        result = cpu_bound_task(iterations, i)
        results.append(result)
    return results

This establishes our baseline performance by running tasks one after another.

3. Multi-Threaded Execution

from concurrent.futures import ThreadPoolExecutor

def demo_threaded(iterations, num_tasks):
    """Run tasks with threading (shows GIL impact or free-threading benefits)."""
    with ThreadPoolExecutor(max_workers=num_tasks) as executor:
        futures = [executor.submit(cpu_bound_task, iterations, i)
                   for i in range(num_tasks)]
        results = [f.result() for f in futures]
    return results

Using Python's `ThreadPoolExecutor`, this runs multiple tasks concurrently.
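To tie the pieces together, here is a minimal timing driver in the same spirit as the benchmark (the `time_it` helper and the smaller iteration count are my own choices so the sketch runs quickly):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def cpu_bound_task(n, task_id):
    result = 0
    for i in range(n):
        result += i * i
    return f"Task {task_id}: {result}"

def demo_sequential(iterations, num_tasks):
    return [cpu_bound_task(iterations, i) for i in range(num_tasks)]

def demo_threaded(iterations, num_tasks):
    with ThreadPoolExecutor(max_workers=num_tasks) as executor:
        futures = [executor.submit(cpu_bound_task, iterations, i)
                   for i in range(num_tasks)]
        return [f.result() for f in futures]

def time_it(fn, *args):
    """Time a callable and return (elapsed seconds, return value)."""
    start = time.perf_counter()
    value = fn(*args)
    return time.perf_counter() - start, value

seq_time, seq_results = time_it(demo_sequential, 100_000, 4)
thr_time, thr_results = time_it(demo_threaded, 100_000, 4)
assert seq_results == thr_results  # same answers either way
print(f"sequential {seq_time:.3f}s, threaded {thr_time:.3f}s")
```

On a GIL-enabled build the two timings come out roughly equal; on a free-threaded build the threaded path should pull ahead.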

Detecting Free-Threading Status

One of the interesting aspects of this project was detecting whether Python is running with the GIL enabled or disabled:

import sys

def check_gil_status():
    """Check if Python is running with free-threading (no GIL)."""
    if hasattr(sys, '_is_gil_enabled'):
        if not sys._is_gil_enabled():
            print("Status: FREE-THREADING ENABLED (GIL is disabled)")
        else:
            print("Status: GIL is ENABLED (traditional Python)")
    else:
        print("Status: Python version doesn't support free-threading check")

Python 3.13+ provides `sys._is_gil_enabled()` to check the GIL status at runtime.
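There are actually two separate questions worth asking: was the interpreter *built* with free-threading support, and is the GIL *currently* disabled? (A free-threaded build can still run with the GIL re-enabled, e.g. via `PYTHON_GIL=1`.) A small sketch distinguishing the two:

```python
import sys
import sysconfig

# Build-time: was this interpreter compiled with free-threading support?
build_supports_ft = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Run-time: is the GIL actually disabled right now? (Python 3.13+ only)
if hasattr(sys, "_is_gil_enabled"):
    gil_enabled = sys._is_gil_enabled()
    print(f"free-threaded build: {build_supports_ft}, GIL enabled: {gil_enabled}")
else:
    print("This interpreter predates the free-threading runtime check")
```

On a regular CPython build both checks report the traditional state: no free-threading support, GIL enabled.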

The Results: A Game-Changer

When I ran the benchmark with 4 tasks and 10 million iterations each, the results were striking:

With Traditional Python (GIL Enabled):

Sequential time: 2.500s
Threaded time:   2.600s
Slowdown: 1.04x slower with threading

As expected, threading provides no benefit for CPU-bound tasks when the GIL is present. In fact, it's slightly slower due to context switching overhead.

With Free-Threaded Python 3.13 (GIL Disabled):

Sequential time: 2.500s
Threaded time:   0.650s
Speedup: 3.85x faster with threading

The difference is dramatic—nearly a 4x speedup with 4 threads! This is the kind of near-linear scaling we expect from truly parallel execution.

What This Means for Python
Free-threading fundamentally changes what's possible with Python:

  1. Simplified Parallel Programming: No need to use multiprocessing with its overhead of process spawning and inter-process communication
  2. Better Resource Utilization: Take full advantage of multi-core processors for CPU-bound tasks
  3. Competitive Performance: Python becomes more viable for compute-intensive workloads
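Point 1 is worth illustrating: with threads, workers share memory directly, so there is no process spawning and no pickling of arguments or results across process boundaries. A small sketch (the chunked fill is my own example):

```python
from concurrent.futures import ThreadPoolExecutor

def fill_chunk(buffer, start, stop):
    # Threads write directly into shared memory: no pickling and no
    # inter-process communication, which multiprocessing would require.
    for i in range(start, stop):
        buffer[i] = i * i

n = 1_000_000
num_workers = 4
buffer = [0] * n
chunk = n // num_workers
with ThreadPoolExecutor(max_workers=num_workers) as executor:
    futures = [executor.submit(fill_chunk, buffer, w * chunk, (w + 1) * chunk)
               for w in range(num_workers)]
    for f in futures:
        f.result()  # surface any exceptions raised in the workers
assert buffer[12345] == 12345 * 12345
```

Each worker owns a disjoint slice of the buffer, so no locking is needed; under free-threading the four fills can genuinely run in parallel.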

Why This Is Especially Useful for AI
The timing of Python's free-threading capability couldn't be better, especially for the AI and machine learning community. Here's why this is a game-changer for AI workloads:

Parallel Inference and Batch Processing
Many AI applications need to process multiple requests simultaneously:

  • Model serving: Handle concurrent inference requests without spawning separate processes
  • Batch preprocessing: Transform multiple data samples in parallel before feeding them to models
  • Ensemble models: Run multiple models concurrently and aggregate their predictions
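The batch-preprocessing case above can be sketched with nothing more than a thread pool (`preprocess` here is a hypothetical stand-in for real work such as tokenization, resizing, or normalization):

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(sample):
    # Hypothetical stand-in for real preprocessing work.
    return [pixel / 255.0 for pixel in sample]

samples = [[0, 128, 255]] * 1_000
with ThreadPoolExecutor(max_workers=8) as pool:
    batches = list(pool.map(preprocess, samples))
print(len(batches))
```

Under the GIL this pure-Python loop gains nothing from the pool; under free-threading the eight workers can transform samples in parallel.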

The NumPy/PyTorch/TensorFlow Caveat
It's important to note that while free-threading is exciting for AI, many core AI libraries like NumPy, PyTorch, and TensorFlow are still working on full free-threading support. These libraries already use optimized C/C++ code that releases the GIL during computations, so they benefit less from free-threading than pure Python code.

However, free-threading is incredibly valuable for the "glue code" around these libraries—the data loading, preprocessing, post-processing, and orchestration logic that often becomes a bottleneck in production AI systems.

The Road Ahead: Open Source Challenges
There's an important caveat: **most Python projects and libraries don't support free-threading yet**. The Python ecosystem is vast, with thousands of packages that were built with the assumption that the GIL exists.

Making the transition to free-threading requires significant work:

  • C Extensions Need Updates: Many popular libraries with C extensions (NumPy, Pandas, Pillow, etc.) need to be made thread-safe
  • Code Assumptions: Code that relied on the GIL for thread safety needs to be refactored
  • Testing and Validation: Every library needs thorough testing in free-threaded environments
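The "Code Assumptions" point is the subtlest. A classic example (my own illustration): code that leaned on the GIL to keep a shared counter roughly coherent must switch to explicit locking, which was the correct fix all along, since the GIL never truly guaranteed atomicity of `+=`:

```python
import threading

class Counter:
    """A shared counter. Under free-threading, an explicit lock is mandatory
    to keep `count += 1` from losing updates between threads."""

    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # serialize the read-modify-write
            self.count += 1

def worker(counter, times):
    for _ in range(times):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter.count == 40_000
```

Auditing a large codebase for implicit GIL reliance like this is exactly the kind of work the ecosystem now faces.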

This represents a massive opportunity for open source contribution. If you're looking to contribute to the Python ecosystem, helping libraries become free-threading compatible is one of the most impactful ways to do so. The Python core team and community are actively working on this transition, but it will take time.

You can track the progress and contribute at:

  • PEP 703 - Making the Global Interpreter Lock Optional
  • Python Free-Threading Compatibility Tracking
  • Individual project repositories that need free-threading support