Futures encapsulate pending operations so that we can put them in queues, check whether they are done, and retrieve results (or exceptions) when they become available.
This chapter focuses on the concurrent.futures.Executor classes, which encapsulate the pattern of “spawning a bunch of independent threads and collecting the results in a queue.” It also introduces futures: objects representing the asynchronous execution of an operation.
The main features of the concurrent.futures package are the ThreadPoolExecutor and ProcessPoolExecutor classes, which implement an API to submit callables for execution in different threads or processes, respectively. The classes transparently manage a pool of worker threads or processes, and queues to distribute jobs and collect results.
The ThreadPoolExecutor constructor takes several arguments not shown here, but the first and most important one is max_workers, which sets the maximum number of worker threads. When max_workers is None (the default), ThreadPoolExecutor computes its value using the following expression—since Python 3.8: max_workers = min(32, os.cpu_count() + 4)
This default preserves at least 5 workers for I/O-bound tasks, uses at most 32 CPU cores for CPU-bound tasks that release the GIL, and avoids implicitly consuming large resources on many-core machines. In addition, ThreadPoolExecutor reuses idle worker threads before starting new ones, up to max_workers.
To conclude: the computed default for max_workers is sensible, and ThreadPoolExecutor avoids starting new workers unnecessarily. Understanding the logic behind max_workers may help you decide when and how to set it yourself.
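The default described above can be reproduced in a couple of lines. This sketch simply evaluates the documented formula, guarding against os.cpu_count() returning None on platforms where the CPU count is undetermined:

```python
import os

# Reproduce ThreadPoolExecutor's default max_workers computation
# (the formula used since Python 3.8). os.cpu_count() may return
# None, so fall back to 1 in that case.
default_workers = min(32, (os.cpu_count() or 1) + 4)

print(default_workers)  # between 5 and 32, depending on the machine
```

On a 4-core machine this yields 8; on a 64-core machine the cap kicks in and it yields 32.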
Where Are the Futures?
Since Python 3.4, there are two classes named Future in the standard library: concurrent.futures.Future and asyncio.Future. They serve the same purpose: an instance of either Future class represents a deferred computation that may or may not have completed.
An important thing to know about futures is that you and I should not create them: they are meant to be instantiated exclusively by the concurrency framework, be it concurrent.futures or asyncio. Here is why: a Future represents something that will eventually run, therefore it must be scheduled to run, and that’s the job of the framework. In particular, concurrent.futures.Future instances are created only as the result of submitting a callable for execution with a concurrent.futures.Executor subclass. For example, the Executor.submit() method takes a callable, schedules it to run, and returns a Future.
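The submit-then-Future flow can be seen in a minimal sketch; the `double` function here is a made-up example, not from the book:

```python
from concurrent.futures import Future, ThreadPoolExecutor

def double(n):
    # Trivial callable to hand to the executor (hypothetical example)
    return n * 2

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() schedules the callable and returns a Future immediately;
    # we never instantiate the Future ourselves.
    future = executor.submit(double, 21)
    print(isinstance(future, Future))  # True
    print(future.result())             # 42
```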
Both types of Future have a .done() method that is nonblocking and returns a Boolean that tells you whether the callable wrapped by that future has executed or not. However, instead of repeatedly asking whether a future is done, client code usually asks to be notified. That’s why both Future classes have an .add_done_callback() method: you give it a callable, and the callable will be invoked with the future as the single argument when the future is done. Be aware that the callback callable will run in the same worker thread or process that ran the function wrapped in the future.
There is also a .result() method, which works the same in both classes when the future is done: it returns the result of the callable, or re-raises whatever exception might have been thrown when the callable was executed. However, when the future is not done, the behavior of the result method is very different between the two flavors of Future. In a concurrent.futures.Future instance, invoking f.result() will block the caller’s thread until the result is ready. An optional timeout argument can be passed, and if the future is not done in the specified time, the result method raises TimeoutError. The asyncio.Future.result method does not support timeout, and await is the preferred way to get the result of futures in asyncio—but await doesn’t work with concurrent.futures.Future instances.
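The .done(), .add_done_callback(), and .result() behaviors described above can be combined in one short sketch. The `slow` function and the `results` list are illustrative names, not from the book; note that the callback runs in the worker thread that executed `slow`:

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

def slow():
    # Hypothetical task that takes a moment to finish
    time.sleep(0.05)
    return 'done'

results = []

def on_done(fut: Future):
    # Invoked with the future as its single argument, in the worker
    # thread that ran slow(); by then fut.result() cannot block.
    results.append(fut.result())

with ThreadPoolExecutor() as executor:
    fut = executor.submit(slow)
    fut.add_done_callback(on_done)
    print(fut.done())                # may still be False while slow() runs
    print(fut.result(timeout=5))     # blocks until the result is ready: 'done'

print(results)  # ['done'] -- the callback ran when the future completed
```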
An example of futures used behind the scenes is the Executor.map we saw in Example 20-3: it returns an iterator in which __next__ calls the result method of each future, so we get the results of the futures, and not the futures themselves.
Launching Processes with concurrent.futures
The concurrent.futures package enables parallel computation on multicore machines because it supports distributing work among multiple Python processes using the ProcessPoolExecutor class.
Both ProcessPoolExecutor and ThreadPoolExecutor implement the Executor interface, so it’s easy to switch from a thread-based to a process-based solution using concurrent.futures.
The constructor for ProcessPoolExecutor also has a max_workers parameter, which defaults to None. In that case, the executor limits the number of workers to the number returned by os.cpu_count().
Processes use more memory and take longer to start than threads, so the real value of ProcessPoolExecutor is in CPU-intensive jobs.
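A minimal sketch of the thread-to-process switch: because both executors implement the same interface, only the class name changes. The recursive `fib` function is a made-up stand-in for a CPU-intensive job; the `__main__` guard is required because ProcessPoolExecutor workers must be able to import the worker function:

```python
from concurrent.futures import ProcessPoolExecutor

def fib(n):
    # Deliberately naive recursion: a CPU-bound task where
    # processes (which sidestep the GIL) can beat threads.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    # max_workers defaults to None, i.e., os.cpu_count() workers
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(fib, [10, 20, 25]))
    print(results)  # [55, 6765, 75025]
```

Swapping ProcessPoolExecutor for ThreadPoolExecutor here would run the same code on threads instead, usually slower for this kind of workload.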
executor.map(check, numbers) returns the results in the same order as the numbers are given, regardless of the order in which the individual calls finish.
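This ordering guarantee can be demonstrated with a hypothetical `check` function rigged so that later inputs finish first; map still yields results in input order:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def check(n):
    # Made-up stand-in for the book's check function: earlier
    # inputs sleep longer, so later inputs complete first.
    time.sleep(0.05 * (3 - n))
    return n * n

numbers = [0, 1, 2, 3]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(check, numbers))

print(results)  # [0, 1, 4, 9] -- same order as `numbers`
```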