Introduction
In a shared computing cluster, users compete for multiple resource types: CPU, memory, disk I/O, network bandwidth.
Classical fair scheduling algorithms like max-min fairness were designed for a single resource dimension.
When you try to apply single-resource fairness to a multi-resource environment, you face a fundamental question: fair with respect to which resource?
Dominant Resource Fairness (DRF) answers this by identifying each user's most heavily demanded resource (their "dominant resource") and then equalizing the dominant share across users.
The result is a generalization of max-min fairness to multiple resource types, and it satisfies a set of desirable fairness and incentive-compatibility properties that no prior multi-resource allocation policy could jointly guarantee.
DRF was introduced by Ghodsi et al. in 2011 and became the default scheduling policy in Apache Mesos and (in a modified form) Apache YARN.
Understanding DRF is essential for anyone designing or operating multi-tenant resource schedulers.
Problem Setting
Consider a cluster with a total capacity vector of resources.
For concreteness, suppose the cluster has 9 CPUs and 18 GB of RAM.
Two users submit tasks:
- User A's tasks each require ⟨1 CPU, 4 GB RAM⟩.
- User B's tasks each require ⟨3 CPUs, 1 GB RAM⟩.
User A's tasks are memory-heavy relative to CPU.
User B's tasks are CPU-heavy relative to memory.
Any allocation scheme must decide how many tasks of each type to run, subject to the capacity constraints.
Dominant Resource and Dominant Share
Normalize each user's demand by the total cluster capacity to get the resource share per task.
- User A per task: ⟨1/9 CPU, 4/18 RAM⟩ = ⟨1/9, 2/9⟩.
- User B per task: ⟨3/9 CPU, 1/18 RAM⟩ = ⟨1/3, 1/18⟩.
The dominant resource for a user is the resource for which the user has the largest per-task share.
User A's dominant resource is RAM (2/9 > 1/9).
User B's dominant resource is CPU (1/3 > 1/18).
The dominant share of a user is the total fraction of their dominant resource consumed by all their running tasks.
If User A runs x tasks, User A's dominant share is x × 2/9.
If User B runs y tasks, User B's dominant share is y × 1/3.
DRF maximizes the minimum dominant share across all users, subject to capacity constraints.
This is max-min fairness applied to dominant shares.
Walkthrough
Concrete Example
Using the cluster above (9 CPUs, 18 GB RAM), we solve for the DRF allocation.
Step 1: Identify dominant resources.
- User A: dominant resource = RAM (share per task = 2/9).
- User B: dominant resource = CPU (share per task = 1/3).
Step 2: Equalize dominant shares.
Set User A's dominant share equal to User B's dominant share:
x × (2/9) = y × (1/3)
Simplify: 2x/9 = y/3, so y = 2x/3.
Step 3: Apply capacity constraints.
CPU constraint: x × 1 + y × 3 ≤ 9
RAM constraint: x × 4 + y × 1 ≤ 18
Substitute y = 2x/3:
CPU: x + 2x = 3x ≤ 9, so x ≤ 3.
RAM: 4x + 2x/3 = 14x/3 ≤ 18, so x ≤ 18 × 3/14 ≈ 3.86.
The binding constraint is CPU, giving x = 3 and y = 2.
Result: User A gets 3 tasks (consuming 3 CPUs, 12 GB RAM).
User B gets 2 tasks (consuming 6 CPUs, 2 GB RAM).
Total usage: 9 CPUs, 14 GB RAM.
User A's dominant share = 3 × 2/9 = 6/9 ≈ 66.7%.
User B's dominant share = 2 × 1/3 = 2/3 ≈ 66.7%.
The dominant shares are equal, as intended.
Progressive Filling Interpretation
In practice, you do not solve the system of equations directly.
DRF is implemented as a progressive allocation: at each scheduling step, the scheduler picks the user with the smallest current dominant share and grants them one additional task (if resources permit).
This is exactly analogous to the progressive filling algorithm for single-resource max-min fairness.
Algorithm
The following pseudocode describes the DRF scheduling loop used in Mesos.
Input:
R = <r_1, r_2, ..., r_m> # total cluster capacity (m resource types)
D_i = <d_i1, d_i2, ..., d_im> # per-task demand vector of user i
n users
State:
C_i = <0, 0, ..., 0> for each user i # consumed resources
s_i = 0 for each user i # dominant share
U = <r_1, r_2, ..., r_m> # unallocated resources
Algorithm DRF:
while true:
# Pick user with smallest dominant share
i = argmin_i(s_i)
# Check if user i's task fits in remaining resources
if D_i <= U componentwise:
# Allocate one task to user i
C_i = C_i + D_i
U = U - D_i
# Update dominant share
s_i = max_j(C_ij / R_j) # max over all resource types j
else:
break # or skip user i and try others
Each iteration runs in O(n × m) time (n users, m resource types), though in practice a min-heap on dominant shares reduces user selection to O(log n).
The loop terminates when no user's task fits in the remaining capacity.
Properties
DRF's significance lies not just in the allocation it produces, but in the formal properties it satisfies simultaneously.
Ghodsi et al. proved that DRF is the only policy satisfying all of the following.
Sharing Incentive
Every user gets at least as many resources as they would under an equal static partition of the cluster.
If there are n users, each user receives utility at least as high as if they exclusively owned 1/n of every resource.
This guarantees that no user is worse off by participating in the shared cluster.
Strategy-Proofness
No user can increase their allocation by lying about their resource demands.
A user who inflates their CPU request to manipulate the scheduler will not end up with a better outcome.
This is critical in multi-tenant environments where users control their task specifications.
Envy-Freeness
No user prefers another user's allocation to their own.
User A would not want to swap allocations with User B.
This is a strong fairness guarantee that prevents perceptions of unfairness.
Pareto Efficiency
It is not possible to increase one user's allocation without decreasing another's.
The cluster's resources are fully utilized (to the extent possible given the discrete task sizes and demand vectors).
Single Resource Fairness
When there is only one resource type, DRF reduces to max-min fairness.
This makes DRF a proper generalization.
Why Other Approaches Fail
Asset Fairness (equalize total resource value across users, with some pricing of resources) violates the sharing incentive. Competitive Equilibrium from Equal Incomes (CEEI) violates strategy-proofness.
Slot-based schedulers (where each slot bundles a fixed amount of CPU and RAM together) waste resources because the fixed slot dimensions rarely match actual task demands.
DRF avoids all of these pitfalls.
Practical Considerations
Weighted DRF
In production, not all users have equal priority.
Weighted DRF assigns a weight w_i to each user, and the scheduler picks the user minimizing s_i / w_i at each step.
This is the multi-resource analog of weighted fair queuing.
Discrete Tasks and Non-Divisibility
DRF as described assumes tasks are atomic units that cannot be split.
The progressive filling algorithm naturally handles this: you simply stop allocating to a user when their next task would exceed available capacity.
The resulting allocation may not achieve perfectly equal dominant shares, but it approximates the continuous solution closely.
Dynamic Environments
Real clusters are dynamic.
Tasks finish and free resources.
New users arrive, and Mesos handles this by recomputing DRF allocations when offers become available, using the current state of consumed resources.
Mesos handles this by recomputing DRF allocations when offers become available, using the current state of consumed resources.
Preemption can be layered on top to reclaim resources from users whose dominant shares exceed the fair level.
Limitations
DRF assumes tasks have fixed, known resource demands.
In practice, resource usage varies over time, and users may not accurately declare their needs (though strategy-proofness reduces the incentive to lie).
DRF also does not account for placement constraints (data locality, rack affinity), which require additional scheduling logic.
Hierarchical scheduling, where organizational units have sub-allocations, requires extensions like H-DRF.
Key Points
- DRF generalizes max-min fairness to multiple resource types by equalizing each user's share of their most demanded (dominant) resource.
- The dominant resource for a user is the resource type for which that user's aggregate consumption, as a fraction of cluster capacity, is largest.
- DRF is implemented via progressive filling: repeatedly allocate one task to the user with the smallest current dominant share.
- DRF simultaneously satisfies sharing incentive, strategy-proofness, envy-freeness, and Pareto efficiency, a combination no other multi-resource fairness policy achieves.
- Weighted DRF extends the basic algorithm to handle heterogeneous user priorities by dividing dominant shares by user weight's.
- DRF became the default allocation policy in Apache Mesos and influenced the capacity and fair schedulers in Apache YARN.
- DRF assumes fixed, known per-task resource demands and does not natively handle placement constraints or dynamic resource usage profiles.
References
Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., and Stoica, I. "Dominant Resource Fairness: Fair Allocation of Multiple Resource Types." NSDI 2011.
Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A.D., Katz, R., Shenker, S., and Stoica, I. "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center." NSDI 2011.
Vavilapalli, V.K., Murthy, A.C., Douglas, C., et al. "Apache Hadoop YARN: Yet Another Resource Negotiator." SoCC 2013.
Bertsekas, D. and Gallager, R. "Data Networks." Prentice Hall, 1992. (Chapter 6: max-min fairness foundations.)