
Parallel And Concurrent Computation In Python

When I first ran my full‐factorial simulation, it took over 13 hours on a 16-core machine. By restructuring it to use both process‐based parallelism for the outer loop and thread‐based concurrency for the inner work, I slashed total runtime to 30 minutes. Here’s how I did it:


1. The Challenge

A naïve serial loop ran every combination end-to-end and simply couldn’t finish in a workday.
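
To make the starting point concrete, here is a minimal sketch of that serial structure. The parameter grids and the simulate_one helper are hypothetical stand-ins, not the actual simulation code.

import itertools
import time

SAMPLE_SIZES = [100, 500, 1000]               # hypothetical grid
REPLICATES = range(20)                        # hypothetical replicate indices
DISTRIBUTIONS = ['normal', 't', 'lognormal']  # hypothetical distributions
SNRS = [0.5, 1.0, 2.0]                        # hypothetical SNR levels

def simulate_one(n, rep, dist, snr):
    # Placeholder for the real work: generate data, fit models, save output.
    time.sleep(0.01)

def run_serial():
    # Every (sample size, replicate, distribution, SNR) combination runs
    # one after another on a single core.
    for n, rep, dist, snr in itertools.product(
            SAMPLE_SIZES, REPLICATES, DISTRIBUTIONS, SNRS):
        simulate_one(n, rep, dist, snr)

run_serial()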

2. Parallelizing the Outer Loop with multiprocessing.Pool

I treated each (sample size, replicate index) pair as an independent “task.” By dispatching my generate_task(...) function through a multiprocessing.Pool, I could schedule dozens of these tasks across all available CPU cores:

import multiprocessing as mp

def scenarios_generate(...):
    # One task per (sample size, replicate index) pair.
    tasks = [(..., 'parallel')
             for i in I          # I: sample sizes
             for j in J]         # J: replicate indices
    # One worker process per CPU core; starmap unpacks each task tuple
    # into the arguments of generate_task.
    with mp.Pool(mp.cpu_count()) as pool:
        pool.starmap(generate_task, tasks)

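For a self-contained picture of this outer-loop pattern, here is a runnable sketch. The grids, the generate_task body, and the returned values are hypothetical placeholders rather than my actual simulation code.

import multiprocessing as mp

SAMPLE_SIZES = [100, 500, 1000]   # hypothetical grid
REPLICATES = range(20)            # hypothetical replicate indices

def generate_task(sample_size, replicate, mode):
    # Placeholder for the real per-task work (simulate data, fit models, save files).
    checksum = sum(x * x for x in range(sample_size * 100))
    return sample_size, replicate, mode, checksum

if __name__ == '__main__':
    tasks = [(n, r, 'parallel') for n in SAMPLE_SIZES for r in REPLICATES]
    # starmap unpacks each task tuple into generate_task's arguments,
    # and the Pool spreads the calls across one worker process per core.
    with mp.Pool(mp.cpu_count()) as pool:
        results = pool.starmap(generate_task, tasks)
    print(f'{len(results)} tasks finished')

The if __name__ == '__main__' guard matters here: on platforms that start workers with the spawn method (Windows, and macOS by default on recent Python versions), each child re-imports the module, and without the guard the pool would be created recursively.
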
3. Making the Inner Loop Concurrent with ThreadPoolExecutor

Inside each generate_task, I still had to loop over multiple (distribution, SNR) combinations. Those steps generate data and save files, so they mix CPU work with I/O. Threads share CPython's GIL, but the GIL is released while files are being written, so a thread pool overlaps the I/O nicely without the startup and memory overhead of spawning yet more processes:

from concurrent.futures import ThreadPoolExecutor

def generate_task(...):
    # ... build effects ...
    # f is the per-(distribution, SNR) worker: it generates data and saves files.
    with ThreadPoolExecutor() as threads:
        list(threads.map(              # consume the results so worker errors surface
            lambda args: f(*args),     # unpack each argument tuple
            [(...)
             for i in I                # I: distributions
             for j in J]               # J: SNR levels
        ))
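
As a self-contained illustration of that inner pattern, the sketch below uses hypothetical distribution and SNR grids and a dummy worker that generates numbers and writes them to a temporary directory; the real worker and its arguments differ.

import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

DISTRIBUTIONS = ['normal', 't', 'lognormal']  # hypothetical grid
SNRS = [0.5, 1.0, 2.0]                        # hypothetical SNR levels
OUT_DIR = tempfile.mkdtemp()

def simulate_and_save(dist, snr):
    # Stand-in for the real work: generate data (CPU), then write it out (I/O).
    values = [snr * i for i in range(10_000)]
    path = os.path.join(OUT_DIR, f'{dist}_snr{snr}.txt')
    with open(path, 'w') as fh:               # the GIL is released during file I/O
        fh.write('\n'.join(str(v) for v in values))
    return path

def generate_task():
    combos = [(d, s) for d in DISTRIBUTIONS for s in SNRS]
    with ThreadPoolExecutor(max_workers=8) as threads:
        # Collect the results so any exception raised in a worker is re-raised here.
        paths = list(threads.map(lambda args: simulate_and_save(*args), combos))
    return paths

if __name__ == '__main__':
    print(f'{len(generate_task())} files written')

Threads are cheap to start and share memory with their parent, which is exactly what you want for short, I/O-heavy steps inside a task that is already running in its own process.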

4. Results & Lessons Learned

5. Tips for Your Own Simulations

By combining process‐based and thread‐based concurrency, I turned an overnight batch of more than 12 hours into a half-hour morning run, freeing up my workstation for other experiments!
