Skip to content

Config

https://docs.dask.org/en/stable/configuration.html

  • num_workers set the number of processes or threads to use (defaults to number of cores)

Example

export DASK_DISTRIBUTED__WORKERS__MEMORY__SPILL=0.85
dask.config.set({"distributed.workers.memory.spill": 0.85})

import dask
dask.config.config #show config

threads

from multiprocessing.pool import ThreadPool
dask.config.set(scheduler='threads', pool=ThreadPool(2)) #num_workers has no effect

processes

from multiprocessing.pool import ThreadPool
dask.config.set(scheduler='processes', num_workers=2, pool=ThreadPool(2)) #two workers, 2 threads per worker

comm

  • distributed.comm.zstd.threads 0 Number of threads to use. 0 for single-threaded, -1 to infer from cpu count.

scheduler

  • distributed.scheduler.work-stealing True Whether or not to move tasks around to balance work between workers dynamically.\ https://dask.discourse.group/t/understanding-work-stealing/335/9

  • distributed.scheduler.work-stealing-interval 100ms How frequently to balance worker loads.

worker

  • distributed.worker.lifetime.duration None The time (seconds) after creation to close the worker, like "1 hour".\ Can also have other units like "2 hours"

  • distributed.worker.lifetime.stagger 0 seconds Workers close with random variation time so not closed all at the same time.

  • distributed.worker.lifetime.restart False Do we try to resurrect the worker after the lifetime deadline?\ Mainly used to deal with worker stalls or memory leak