View the source on GitHub

wassign Manual


wassign is a tool to solve many kinds of assignment or bipartite matching problems that are preference based (like the assignment of college students to exercise groups). Here are some useful links to the most relevant parts of this document if you want to get started with wassign:


Table of contents

1 Introduction to wassign

We will use the following scenario throughout this introduction in order to introduce all the features of wassign:

1.1 A simple example

You are planning a convention where every participant has given preferences to different workshops that will be held at this convention. There are four events:

and 10 participants which gave the following preferences (between 0 and 10) to each of the workshops:

Ethan Fanny Gavin Hanna Isaac July Kevin Lily Mark Norah
A 10 8 10 5 8 8 0 10 10 9
B 6 10 4 0 5 0 0 9 3 5
C 0 0 1 0 5 0 10 6 0 1
D 5 4 7 10 10 10 0 5 0 10

We now want to assign each of our 10 participants to one of the four events using wassign. To do this, we first have to translate our problem into terms that wassign can understand. We can use two core concepts of wassign to solve our example problem:

You can see that in our example, the workshops are the choices and our participants are the choosers. We now have to put our data into an input file that wassign understands. For our example, a basic input file would look like this:

+choice("How to become famous",                    bounds(1, 4));
+choice("Paleo cooking for beginners",             bounds(2, 9));
+choice("Left-handed scissors: A critical review", bounds(2, 5));
+choice("Should you invest in bitcoin now?",       bounds(1, 6));

+chooser("Ethan", [10, 6, 0, 5]);
+chooser("Fanny", [8, 10, 0, 4]);
+chooser("Gavin", [10, 4, 1, 7]);
+chooser("Hanna", [5, 0, 0, 10]);
+chooser("Isaac", [8, 5, 5, 10]);
+chooser("July",  [8, 0, 0, 10]);
+chooser("Kevin", [0, 0, 10, 0]);
+chooser("Lily",  [10, 9, 6, 5]);
+chooser("Mark",  [10, 3, 0, 0]);
+chooser("Norah", [9, 5, 1, 10]);

The details of the syntax required for these input files (and other neat features like reading things from CSV files so you don’t have to type all these lines by yourself) can be found in the manual. If we put this input file through wassign with

wassign -i our-input-file.txt -o output-file

we promptly get back the following result as a CSV file (called output-file.assignment.csv):

"Chooser", "Generated Slot"

"Ethan",  "Paleo cooking for beginners"
"Fanny",  "Paleo cooking for beginners"
"Gavin",  "How to become famous"
"Hanna",  "Should you invest in bitcoin now?"
"Isaac",  "Left-handed scissors: A critical review"
"July",   "Should you invest in bitcoin now?"
"Kevin",  "Left-handed scissors: A critical review"
"Lily",   "Paleo cooking for beginners"
"Mark",   "How to become famous"
"Norah",  "Should you invest in bitcoin now?"

Which translates to the following assignment:

You may note the “Generated Slot” column header in the output file. This is present because we did not specify any slots in our input (so wassign generated one for us); in the next section we will look at slots and what they can be used for.

1.2 Getting more complicated with slots

Let’s say that the time schedule of our convention (which is held in a single day) looks as follows:

10:00 – 11:00 Welcome speech
11:00 – 12:30 Workshops I
12:30 – 14:00 Lunch break
14:00 – 15:30 Workshops II
15:30 – 17:00 After-convention party

As you can see, we actually have two timeslots where workshops will be held. But which workshop should go into which timeslot? We could now go on and assign our workshops to the timeslots by hand, but we can actually use wassign to do this for us based on the preferences the participants gave, so workshops are assigned to timeslots in such a way that the preferences of our participants can be satisfied as much as possible.

To do this, we use another core concept of wassign called slots. Slots are simply “buckets” where the choices will be distributed to. Then, every participant gets assigned to a choice in every slot.

We just have to modify our input file slightly:

+slot("Workshops I");
+slot("Workshops II");

+choice("How to become famous",                    bounds(1, 4));
+choice("Paleo cooking for beginners",             bounds(2, 9));
+choice("Left-handed scissors: A critical review", bounds(2, 5));
+choice("Should you invest in bitcoin now?",       bounds(1, 6));

+chooser("Ethan", [10, 6, 0, 5]);
+chooser("Fanny", [8, 10, 0, 4]);

... (the rest of our "old" input file) ...

If we now input this into wassign with

wassign -i our-input-file.txt -o output-file -t 3s

it will give us the following two output files after 3 seconds (the -t 3s option we gave wassign this time tells the program to search exactly 3 seconds for the best possible solution):

A file called output-file.scheduling.csv

"Choice",                                   "Slot"

"How to become famous",                     "Workshops II"
"Paleo cooking for beginners",              "Workshops II"
"Left-handed scissors: A critical review",  "Workshops I"
"Should you invest in bitcoin now?",        "Workshops I"

describing the scheduling of the workshops into the slots and a file called output-file.assignment.csv

"Chooser", "Workshops I",                              "Workshops II"

"Ethan",   "Should you invest in bitcoin now?",        "Paleo cooking for beginners"
"Fanny",   "Should you invest in bitcoin now?",        "Paleo cooking for beginners"
"Gavin",   "Should you invest in bitcoin now?",        "How to become famous"
"Hanna",   "Should you invest in bitcoin now?",        "How to become famous"
"Isaac",   "Left-handed scissors: A critical review",  "Paleo cooking for beginners"
"July",    "Should you invest in bitcoin now?",        "How to become famous"
"Kevin",   "Left-handed scissors: A critical review",  "Paleo cooking for beginners"
"Lily",    "Left-handed scissors: A critical review",  "Paleo cooking for beginners"
"Mark",    "Left-handed scissors: A critical review",  "How to become famous"
"Norah",   "Should you invest in bitcoin now?",        "Paleo cooking for beginners"

describing the assignment of the participants into the workshops.

As you can see, we got exactly what we were looking for: wassign tells us which workshop should go into which time slot and at the same time gives us the corresponding assignment for the participants.

1.3 Specifying custom constraints

We now assume that Lily called us beforehand and told us that she can’t attend the paleo cooking workshop because she is a vegetarian. We can use another feature of wassign called constraints to tell wassign that Lily must not be assigned to the paleo workshop under any circumstance. We do this by adding the following line to our input file:

+constraint( chooser("Lily").choices.contains_not(choice("Paleo")) );

Note that we do not have to specify the full name of the workshop; if there are no ambiguities, you can just use the first word of the name (in this case “Paleo”).

wassign will now give us a different solution where Lily is not in the paleo workshop. Try it out!

1.4 What else?

If you want to get an exhaustive overview over all features of wassign, you should consult the manual. Examples of things wassign can also do that were not covered in this short introduction:

2 Building wassign

2.1 Linux

To build wassign under Linux, you need the following prerequisites:

After making sure all prerequisites are met you can build wassign:

  1. git clone https://github.com/MaximilianAzendorf/wassign
  2. cd wassign
  3. cargo build --release

The resulting executable is written to target/release/wassign.

To run the test suite:

  1. cd wassign
  2. cargo test

2.2 Windows

Building on Windows is currently untested, but the project uses the standard Rust/Cargo toolchain there as well.

2.3 Building the documentation

You can build this documentation using pandoc and the respective makefile. Some figures require a LaTeX distribution, inkscape and ghostscript.

If you want a containerized build, use the Docker image based on pandoc/latex:3.9.0.2-ubuntu:

  1. cd doc
  2. make docker-build

The resulting files are written into doc/build/ on the host via a bind mount.

3 Using wassign

The following section contains the complete documentation of all wassign features and how to use them.

3.1 Input syntax

Input files are Rhai scripts, so all typical programming constructs like variables (let x = ...), loops (for(...)), conditionals (if(...)) and others are available to construct the input. In addition, the following API is used to interact with wassign.

3.1.1 Adding slots, choices choosers and constraints

Function
slot(name)
Creates a new slot with the given name or returns the slot with the given name if a slot with such name was already created. Note that if you create a new slot you still have to add it to the input with add or +.
Function
choice(name)
choice(name, args...)
Creates a new choice with the given name and choice arguments (see choice arguments for more information) or (if just a name is given) returns the choice with the given name if a choice with such name was already created. Note that if you create a new choice you still have to add it to the input with add or +.
Function
chooser(name)
chooser(name, preferences)
Creates a new chooser with the given name and preference list (e.g. chooser("Karl", [100, 50, 0])) or (if just a name is given) returns the chooser with the given name if a chooser with such name was already created. The preferences have to be in the same order in which the corresponding choices were added to the input. Note that if you create a new choice you still have to add it to the input with add or +.
Function
constraint(expression)
Creates a new constraint from the given expression. For more information on how to construct constraint expressions, see the respective section. Note that if you create a new constraint you still have to add it to the input with add or +.
Function
add(arg)
+arg (operator)
Adds the given argument arg (a newly created slot, choice, chooser or constraint) to the input.

The preference list may also contain numeric strings, which are parsed as integers. This is useful when the preferences come from CSV data.

3.1.2 Choice arguments

min(x) The choice must have at least x participants (e.g. +choice("Foo", min(5))). The default is 1.
max(x) The choice must have at most x participants (e.g. +choice("Foo", max(10)). The default is 1.
bounds(x, y) The choice must have at least x and at most y participants. This is the same as min(x), max(y).
parts(x) The choice has x parts. This means that the choice will be scheduled to x consecutive slots (in the order they were added to the input), and each chooser assigned to this choice will have this choice in the assignment at each of the slots.
optional The choice is optional, this means that this choice may not be scheduled at all. If the choice is not scheduled there will also be no chooser that get assigned to this choice.
optional_if(x) The choice is optional if x is true.

As an example, a choice named Foo that must have between 5 and 12 participants, is optional and has 3 parts would look like this in the input:

+choice("Foo", bounds(5, 12), parts(3), optional);

3.1.3 Constraints

3.1.3.1 Constraint objects

Constraints are relations between two constraint objects. All slots, choices and choosers are simple constraint objects. In addition to them, the following derived constraint objects can be used to construct constraints:

Constraint object
SLOT.choices
A list constraint object describing all choices that are scheduled to the slot SLOT.
Constraint object
SLOT.size
A numerical constraint object describing the number of choices that are scheduled to the slot SLOT.
Constraint object
CHOICE.slot
CHOICE.slot(part)
A simple constraint object describing the the slot to which choice CHOICE is scheduled to. You can specify a part index (starting at 0) to refer to a specific part of the choice; by default, part=0.
Constraint object
CHOICE.choosers
A list constraint object describing the the list of choosers that are assigned to the choice CHOICE.
Constraint object
CHOOSER.choices
A list constraint object describing the the list of choices that the chooser CHOOSER is assigned to.

3.1.3.2 Relational operators

Relation
Relations between simple constraint objects

left == right (equality)
left != right (inequality)
States that the simple constraint object left must be equal (or unequal) to the simple constraint object right.
Relation
Relations between numerical constraint objects

left == right (equality)
left != right (inequality)
left > right (greater-than)
left >= right (greater-or-equal-than)
left < right (less-than)
left <= right (less-or-equal-than)
States that the given relation must hold true between the numerical constraint object left and the numerical constraint object right. Note that right can also be a numerical constant.
Relation
Relations between list constraint objects

left == right (equality)
left != right (inequality)
left.contains(right) (contains-relation)
left.contains_not(right) (contains-not-relation)
States that the given relation must hold true between the list constraint object left and the list constraint object right, or (in the case of contains and contains_not) the simple constraint object right.

3.1.3.3 Examples

  • choice("X").slot != slot("Y"): Choice “X” must not be in slot “Y”.
  • slot("Y").size <= 5: Slot “Y” must have 5 or less choices scheduled to it.
  • chooser("X").choices.contains_not(choice("Y")): Chooser “Y” must not be assigned to choice “Y”.

3.1.4 Reading from CSV files

Function
read_csv(filename)
read_csv(filename, separator)
Reads the given file as a CSV file, optionally specifying a separator (the default is ,).
Function
readFile(filename)
Reads the given file and returns its full contents as a string.
Function
set_arguments(args)
Parses CLI-style arguments for the current input reader.
Function
CSVFILE.row(n)
CSVFILE[n]
Returns the row with number n (numbering starting at 0) of the CSVFILE (returned by read_csv). Rows are just plain lists of the row values, so individual values of a row can be accessed with the [] operator.
Function
CSVFILE.rows
Returns all rows of the CSVFILE (returned by read_csv).

3.1.4.1 Example

Given is the following CSV file called workshops.csv:

"Choice", "minimum", "maximum", "optional?"
"A",      5,         10,         "no"
"B",      5,         20,         "no"
"C",      10,        30,         "yes"

We can then use this file to generate choices for our input as follows:

let file = read_csv("workshops.csv");

for row in file.rows.slice(1, end) {
    +choice(row[0], bounds(row[1], row[2]), optional_if(row[3] == "yes"));
    // or add(choice(...))
}

Note that we have to skip the first row (with .slice(1, end)) because it contains the column headers.

Because read_csv keeps cell values as strings, helper functions such as min, max, bounds and parts accept numeric strings directly.

3.1.5 Utility functions

Function
LIST.slice(x, y)
Returns a list that only contains the element of the list LIST with indices between x (inclusive) and y (inclusive). Note that index numbering starts at 0. You can give end as the value for y as a replacement for LIST.length - 1.
Function
end
Sentinel value used with slice to indicate the last element.
Function
range(x, y)
Returns a list that contains all integers between x (inclusive) and y (inclusive) in order.

3.2 Program options

-h, --help Show a short summary of all program options.
--version Show the current version of wassign.
-i [file], --input [file] Read the input from the specified file(s). If this option is present more than once, all files will be read in the order they were given.
-o [prefix], --output [prefix] Write the output to files starting with the given prefix. The files generated will be named [prefix].assignment.csv and [prefix].scheduling.csv.
-a, --any If this option is given, wassign will not optimize any solution and will just return the first solution it finds.
-p [exp], --pref-exp [exp] Sets the preference exponent to the given value. See the respective section for more information.
-t [time], --timeout [time] Sets the optimization timeout. The syntax for this argument is described under the respective section.
--cs-timeout [time] Sets the timeout for attempting to satisfy critical sets of a certain preference level. Higher values may lead to better initial solutions, but it may take longer to find an initial solution in the first place. The syntax for this argument is described under the respective section.
--no-cs If this option is given, no critical set analysis is performed.
--no-cs-simp If this option is given, critical set simplification is disabled.
-j [n], --threads [n] Specifies the maximum number of computation threads. By default, wassign will use as many threads as there are logical CPU cores on the system.
-n [n], --max-neighbors [n] Specifies the maximum number of neighbor schedulings that will be explored per hill climbing iteration.
-g, --greedy If this option is given, wassign will not use the worst-preference scoring as a primary score and will instead just use sum-based scoring instead.

3.2.1 Preference exponent

The preference exponent is a parameter of wassign that directly affects which solutions are considered “fairer” than others.

As a rule of thumb: The higher the preference exponent, the more “even” the assignment will be, in the sense that e.g. wassign will prefer to give many people their second choice instead of giving few their first and many their third.

3.2.2 Time format

Program options like --timeout require a time string that has the following format: A time string consists of one or more components of the form [number][unit], concatenated without spaces. Examples:

  • 10s is a timespan of ten seconds.
  • 5h is a timespan of five hours.
  • 1d30m is a timespan of one day and 30 minutes.
  • 2w3d5h7m11s is a timespan of 2 weeks, 3 days, 5 hours, 7 minutes and 11 seconds.

4 Inner workings of wassign

4.1 Overview

The algorithm is based on shotgun hill climbing. More details about each step are found under the respective section:

  1. Perform Critical Set Analysis: This step is optional (and is only performed at the very beginning), but the results of the critical set analysis are used in the scheduling as well as the assignment solver to drastically decrease the time it takes for good solutions to be found.

  2. Solving the scheduling: Find a possible scheduling using a randomized depth-first search.

  3. Solving the assignment: Find the best possible assignment for the given scheduling by constructing a corresponding linear optimization or (depending on the given extra constraints) mixed integer programming problem instance and solving it using an MIP solver.

  4. Performing hill climbing: Find a “neighbor scheduling” of the given scheduling (by moving a single choice to another slot or by applying a randomized swap perturbation) and solve the assignment again. Repeat this until no explored neighbor improves the current solution (this is the hill climbing part).

  5. Performing shotgun hill climbing: Do the above again and again until the timeout is reached (this is the shotgun part). The best solution found by then is the output of the program.

4.2 Some notation and a word on preferences

4.2.1 Common notation used throughout this section

This part of the manual is rather theory-heavy, so we have to introduce some common notation that is used from here onwards:

4.2.1.1 Notation regarding the input

  • \(\mathbb{S}\), \(\mathbb{W}\) and \(\mathbb{P}\) are the non-empty sets of slots, choices and choosers respectively.

  • \(\min(w)\in\mathbb{N}\) and \(\max(w)\in\mathbb{N}\) are the minimum and maximum number of choosers of a choice \(w\in\mathbb{W}\). Generally, \(0<\min(w)\leq\max(w)\).

  • \(\varphi_p(w)\in\mathbb{N}\) is the preference given by a chooser \(p\in\mathbb{P}\) to the choice \(w\). We oftentimes use \(\varphi_p=\left[\varphi_p(w_1), \varphi_p(w_2), ..., \varphi_p(w_{|\mathbb{W}|})\right]\) as an abbreviation for the “preference vector” of chooser \(p\) (with respect to an “obvious” ordering of \(\mathbb{W}=\left\{w_1,...,w_{|\mathbb{W}|}\right\}\)).

  • \(\Phi\) is the set of all preferences that are given by at least one chooser to at least one choice, so \[\Phi=\bigcup_{p\in\mathbb{P}}\left\{\varphi_p(w)\mid w\in\mathbb{W}\right\}\text{ .}\]

4.2.1.2 Notation regarding the solutions

  • A scheduling (generally called \(\mathcal{S}\)) is a mapping \(\mathcal{S}:\mathbb{W}\to\mathbb{S}\cup\{\bot\}\) where \(\mathcal{S}(w)=\bot\) means that the optional choice \(w\) is not scheduled. More on that in the the respective section.

  • An assignment (generally called \(\mathcal{A}\)) is a mapping \(\mathcal{A}:\mathbb{P}\times\mathbb{S}\to\mathbb{W}\) (which means that a chooser \(p\) was assigned to the choice \(\mathcal{A}(p, s)\) in slot \(s\)). More on that in the respective section.

  • A solution (generally \(\mathcal{L}\)) is a pair \(\mathcal{L}=(\mathcal{S}_\mathcal{L},\mathcal{A}_\mathcal{L})\) of a scheduling and an assignment (that is based on \(\mathcal{S}_\mathcal{L}\)). We call \(\mathbb{L}\) the set of all solutions.

  • \(\varphi_\max(\mathcal{L})\) is the maximum preference that a chooser \(c\) has given to a choice \(w\) where \(c\) was assigned to \(w\). It is defined as \[\varphi_\max(\mathcal{L})=\max\left\{\varphi_p(w)|p\in\mathbb{P},w\in\mathbb{W}\wedge \exists s\in\mathbb{S}:\mathcal{A}_\mathcal{L}(p,s)=w\right\}\text{ .}\]

  • A \(\varphi\)-solution is a solution \(\mathcal{L}\) with \(\varphi_\max(\mathcal{L})=\varphi\).

4.2.2 Preferences are mirrored

It is very important to note that throughout this section, preferences are “mirrored”, meaning a lower preference given by a chooser to a choice means that the chooser “likes” the choice more. This is because

  • The formula used to score solutions does not work the other way around, and therefore
  • the program also handles preferences this way (they are converted after the input is read).

So, for example, if a chooser has the preferences \[100, 50, 0, 70, 30, 22\] given in the input, they will be converted to \[0, 50, 100, 30, 70, 78\] assuming that 100 is the maximum preference given in the whole input.

4.3 Solving schedulings

4.3.1 The problem

When we want to compute a scheduling, we want to assign each choice either one of the slots available or leave it unscheduled if it is optional. Given a scheduling \(\mathcal{S}\), \(\mathcal{S}(c)=s\) means that the choice \(c\) was assigned to the slot \(s\), while \(\mathcal{S}(c)=\bot\) means that it is omitted entirely. We also define \(\mathcal{S}^{-1}(s)=\left\{w\mid \mathcal{S}(w)=s\right\}\) as the inverse scheduling (the set of all choices that are scheduled to be in slot \(s\)).

This scheduling \(\mathcal{S}\) has to adhere to some rules because we want to also calculate an assignment based on the scheduling:

  1. For every slot, the sum of the minima and maxima of the choices scheduled into that slot have to allow for the number of choosers, so \[\sum_{w\in\mathcal{S}^{-1}(s)}\min(w) \,\leq \,|\mathbb{P}|\, \leq \sum_{w\in\mathcal{S}^{-1}(s)}\max(w)\] has to hold for each slot \(s\), because else it would be impossible to assign every chooser to a choice in a slot where this constraint is violated.

  2. Additional scheduling constraints from the input have to be satisfied.

  3. Ideally, we also want to get schedulings that allow for a better assignment solution. We use critical set analysis as a heuristic for this.

Because of the first requirement (sum of minima and maxima), this problem is NP-hard (proof).

4.3.2 Algorithm

Fortunately, despite the problem being NP-hard, it generally isn’t hard at all to find valid schedulings for real-world input data.

Because of this, valid schedulings are generated using a randomized (meaning solutions are visited in a semi-randomized order) depth-first backtracking search. A single backtracking step consists of deciding the slot of a single choice; for optional choices, the solver may also leave the choice unscheduled.

There are multiple heuristics at play to improve the performance of the scheduling solver.

  • Heuristic for choice order: The order in which choices are handled by the solver is determined by the number of scheduling constraints that apply to the single choices. Choices with the most constraints are handled first. Apart from that, the order of choices is random.

  • Heuristic for set order: Given the current partial solution \(\mathcal{S}_p\), which slots are tried first in the backtracking is determined by \(\sum_{w\in\mathcal{S}^{-1}_p(s)}\max(w)\) (so basically by how “full” the slot \(s\) already is in terms of choice maxima). Slots with a lower value are tried first.

  • Critical set analysis: Critical set analysis is used by trying to satisfy all critical sets for a set preference level \(p\) until a timeout is reached. If the timeout is reached without finding a valid solution, \(p\) is raised to the next level (so all critical sets with a preference of \(p\) are discarded), and the process is repeated until a solution is found.

4.4 Solving assignments

4.4.1 The problem

When we want to compute an assignment \(\mathcal{A}\) based on a given scheduling \(\mathcal{S}\), we want to assign each chooser to exactly one choice per slot (so for every pair of a slot \(s\) and a chooser \(p\), there is exactly one choice \(\mathcal{A}(s, p)\)).

We also want the resulting solution \(\mathcal{L}=(\mathcal{S}, \mathcal{A})\) to be the “best” one for the given scheduling, but we first have to define a metric for how “good” a solution is in order to be able to compare it to other solutions.

4.4.2 Solution scoring

We define a scoring function \(F:\mathbb{L}\to\mathbb{R}^2\) as follows

\[F(\mathcal{L})=\left(\varphi_\max(\mathcal{L}), \sum_{s\in\mathbb{S},p\in\mathbb{P}} \varphi_p\left(\mathcal{A}_\mathcal{L}(s,p)\right)^{\gamma}\right)\]

where \(\gamma\in\mathbb{R}^+\) is the so-called preference exponent that can be chosen freely and affects which solutions are favored over others. It can be thought of as the “fairness parameter”. For more information on this parameter see the respective section in the manual.

We use this function to assign a score to each solution, where lower scores (defined by the usual order relation \(<\) over \(\mathbb{R}^2\)) are assigned to “better” solutions.

Note that we have a scoring function with two components, the first one being \(\varphi_\max(\mathcal{L})\). This means that a solution \(\mathcal{L}_1\) is always “better” than a solution \(\mathcal{L}_2\) if \(\varphi_\max(\mathcal{L}_1)<\varphi_\max(\mathcal{L}_2)\).

We may also look at the two components of the scoring function separately. We then call them the major and minor term \(F_\textrm{maj}\) and \(F_\min\) of \(F\), so \(F(\mathcal{L})=\left(F_\textrm{maj}(\mathcal{L}), F_\min(\mathcal{L})\right)\).

In the Rust implementation, these terms are evaluated in two different places:

  • The major term is exactly the maximum mirrored preference that appears in the finished assignment.
  • The minor term is the normalized sum of \(\varphi^\gamma\) over the finished assignment.

During assignment solving, however, the flow model uses edge costs of \((\varphi+ 1)^\gamma\) for admissible chooser-to-choice edges. This shifts all feasible edge costs away from zero without changing the ordering of assignments that stay within the same major preference bound.

4.4.3 Algorithm

4.4.3.1 Minimum-cost flow network

We calculate the optimal assignment for a given scheduling by modelling the problem as a minimum-cost flow problem instance like in this example with 2 slots, 3 choices and 4 choosers and a scheduling \(\mathcal{S}^{-1}(s_1)=\{w_1, w_2\}\), \(\mathcal{S}^{-1}(s_2)=\{w_3\}\):

Or more formally: Given a scheduling \(\mathcal{S}\), we can construct a minimum-cost flow network \(N=(V,E,z,c,a)\) with

  • \(V=\mathbb{S}\cup\mathbb{W}\cup\left\{q^p_s\mid p\in\mathbb{P},s\in\mathbb{S}\right\}\) with a supply function \(z:V\to\mathbb{Z}\) having the following values:
    • \(z\left(q^p_s\right)=1\).
    • \(z(w)=-\min(w)\).
    • \(z(s)=-|\mathbb{P}|+\sum_{w\in\mathcal{S}^{-1}(s)}\min(w)\).
  • \(E=\{(w,s)\mid \mathcal{S}(w)=s\}\cup\left\{\left(q^p_s, w\right)\mid\mathcal{S}(w)=s \wedge q\in\mathbb{P}\right\}\) with a capacity function \(c:E\to\mathbb{N}\) and a cost function \(a:E\to\mathbb{N}\) having the following values:
    • \(c(w,s)=\max(w)-\min(w)\)
    • \(a(w,s)=0\).
    • \(c(q^p_s, w)=1\).
    • \(a(q^p_s, w)=\left(\varphi_p(w)+1\right)^\gamma\).

The optimal flow \(f\) of \(N\) then is also directly the optimal assignment \(\mathcal{A}\) with the following conversion: \[f(q^p_s, w)=1 \Leftrightarrow \mathcal{A}(p, s) = w \text{ .}\] Note that the implementation models this network as an integer linear program and solves it with an MIP backend even in the plain flow case, so integer assignments are enforced directly by the solver model.

Furthermore, because how we defined our edge costs, the total cost of the flow is a monotone proxy for the minor score \(F_\min((\mathcal{S}, \mathcal{A}))\) of the resulting solution inside a fixed major preference limit.

4.4.3.2 Handling additional assignment constraints

The implementation handles additional assignment constraints in two ways:

  • Some constraints directly block chooser-to-choice edges before the model is solved. For example, chooser("X").choices.contains_not(choice("Y")) removes all corresponding assignment edges.
  • Equality-style assignment constraints are encoded as edge groups. Each group is controlled by a binary switch variable that forces all grouped edges to act together.

This is used both for chooser-equality constraints and for dependent-choice groups induced by the constraint system. The result is an integer programming model that stays close to the flow formulation while still supporting the extra combinatorial restrictions.

4.4.3.3 Binary search through preference levels

So far, we are only optimizing by the minor score \(F_\min\). In order to find the optimal solution by \(F\), we just do binary search to find the minimum \(F_\textrm{maj}\) for which there exists a solution.

4.5 Critical set analysis

The most important (and most computationally expensive) heuristic used throughout wassign is called critical set analysis. It is computed once at the start of the computation and its result is a set \(\mathbb{C}\) of so called critical sets.

4.5.1 Definition

4.5.1.1 Critical set, CSA, \(\varphi\)-relevant CSA

A critical set \(C\) is a set is a pair \(C=(\varphi_C, W_C)\) with the following properties:

  • \(\varphi_C\in\Phi\) and \(W_C\subseteq\mathbb{W}\),

  • There exists a chooser \(p\in\mathbb{P}\) for which \(W_C=\left\{w\mid \varphi_p(w)\leq\varphi_C\right\}\) holds true.

The critical set analysis (in short CSA) \(\mathbb{C}=\left\{C_1, C_2, ..., C_{|\mathbb{C}|}\right\}\) is the set of all critical sets.

We further call \(\mathbb{C}(\varphi)=\left\{C\in\mathbb{C}\mid \varphi_C=\varphi\right\}\) the \(\varphi\)-relevant critical set analysis (\(\varphi\)-relevant CSA).

Example: Given slots \(\mathbb{S}=\{s_1,s_2,s_3\}\), choices \(\mathbb{W}=\{a, b, c, d, e, f\}\) and a single chooser \(\mathbb{P}=\{p\}\) with preferences \(\varphi_p=\left[100, 50, 0, 90, 70, 100\right]\), all valid critical sets would be \[C_1=(100, \{a, b, c, d, e, f\}), \quad C_2=(90, \{b, c, d, e\}),\] \[C_3=(70, \{b, c, e\}), \quad C_4=(50, \{b, c\}), \quad C_5=(0, \{c\})\text{ .}\]

4.5.1.2 Satisfying critical sets

Given a scheduling \(\mathcal{S}\), we call a critical set \(C\) satisfied if \[\bigcup_{w\in C}\left\{\mathcal{S}(w)\right\} = \mathbb{S}\text{ .}\]

Or in plain terms: A critical set is satisfied by a scheduling if, in this scheduling, every slot contains at least one choice in the critical set.

We can then use critical sets as a heuristic for how “good” assignments based on a given scheduling can be: If \(\mathcal{S}\) does not satisfy all critical sets of \(\mathbb{C}(\varphi)\), there can not be a solution based on \(\mathcal{S}\) that has a maximum preference of \(\varphi\) or lower (proof).

4.5.2 Preference bound

A useful piece of information we can immediately get out of the critical sets is a lower bound of the maximum preference a solution can have. We call this bound \(\overline{\varphi}\) the preference bound; it is defined as \[\overline{\varphi}=\min\left\{\varphi_C\mid C\in\mathbb{C}\wedge |C|<|\mathbb{S}|\right\}\text{ .}\] Or in plain words: If a critical set has fewer elements than there are slots in the input, this critical set can not be satisfied by any scheduling, and therefore there can not be any solution with a maximum preference lower than \(\overline{\varphi}\).

4.5.3 Simplifying critical sets

We can eliminate most of the critical sets in a CSA by the following observation: Given two critical sets \(C_1=(\varphi_1, W_1)\) and \(C_2=(\varphi_2, W_2)\), if \(\varphi_1\leq\varphi_2\) and \(W_1\supseteq W_2\) then all schedulings satisfying \(C_1\) also satisfy \(C_2\). We say that \(C_1\) is covered by \(C_2\).

We can therefore discard any critical sets that are already covered by another critical set; we then just have to alter our definition of \(\varphi\)-relevant critical sets to include all critical sets that have the same or greater preference: \[\mathbb{C}(\varphi)=\left\{C\in\mathbb{C}\mid\varphi_C\geq\varphi\right\} \text{ .}\]

Note: If we need \(\mathbb{C}(\varphi)\) e.g. in the scheduling solver, we can also discard all critical sets that are simply superset of another critical set in \(\mathbb{C}(\varphi)\).

4.6 Hill climbing

So far, we only optimize the solution for a given scheduling. In order to find a locally optimal scheduling (and the corresponding optimal assignment), we first find any valid scheduling \(S\) (using the scheduling solving algorithm as described above) and then inspect a bounded set of neighbor schedulings. These neighbors are generated by moving a single choice to another slot and, in the current Rust implementation, also by random swap-style perturbations between multiple choices.

Because the number of neighbors can be very large (and solving the assignment is quite expensive), only a bounded number of feasible neighbors are explored in each iteration. If any explored neighbor improves the score, the best improving neighbor becomes the next current solution; otherwise hill climbing stops at the current local optimum.

4.6.1 Shotgun hill climbing

We now optimize to a local optimum. To find the global optimum (or at least a very good local one), we simply repeat the hill climbing process over and over and choose the best solution found after a certain timeout; this is also called shotgun hill climbing or random-restart hill climbing.

This (very simple) approach has three main advantages over other optimization methods:

  1. It is very easy to implement,
  2. It is trivial to parallelize,
  3. It is quite effective.

Appendix

Proof: Scheduling is NP-hard

We reduce PARTITION to SCHEDULING.

Given a PARTITION-Instance \(S=\{n_1, n_2, ..., n_k\}\), let \(\sigma=\frac{1}{2}\sum_{i=1}^k n_i\). We define the following slot, choice and chooser sets \[\mathbb{S}= \{s_1, s_2\},\; \mathbb{W}=\{w_1, w_2, ..., w_k\},\; \mathbb{P}=\{p_1, p_2, ..., p_{\sigma}\}\] with \[\min(w_i)=\max(w_i)=n_i \text{ for all } i\in\{1,\dots,k\}\text{ .}\]

Because \(\min(w_i)=\max(w_i)=n_i\) for all \(i\), the scheduling constraint (for a slot \(s\)) \[\sum_{w\in\mathcal{S}^{-1}(s)}\min(w) \leq |\mathbb{P}| \leq \sum_{w\in\mathcal{S}^{-1}(s)}\max(w)\text{ ,}\] implies \[\sum_{w\in\mathcal{S}^{-1}(s_1)}\min(w) = \sum_{w\in\mathcal{S}^{-1}(s_2)}\min(w)\]

Therefore, if and only if there exists a valid scheduling for \((\mathbb{S}, \mathbb{W}, \mathbb{P})\) there exists a perfect partitioning of \(S\).

Proof: Non-satisfiability of \(\varphi\)-relevant CSA implies non-existence of \(\varphi\)-solutions

Proof by contradiction: Suppose there is a \(\varphi\)-solution \(\mathcal{L}=(\mathcal{S}, \mathcal{A})\) and let \(C\in\mathbb{C}(\varphi)\) be a \(\varphi\)-relevant critical set that is not satisfied by \(\mathcal{S}\).

Then, because \(C\) is not satisfied by \(\mathcal{S}\), there is a slot \(s\in\mathbb{S}\setminus\bigcup_{w\in C}\{\mathcal{S}(w)\}\) and a chooser \(p\) with \[\min\left\{\varphi_p(w)\mid w\in\mathcal{S}^{-1}(s)\right\} > \varphi\] and therefore \(\varphi_p\left(\mathcal{A}(p, s)\right) > \varphi\), so \(\varphi_\max(\mathcal{L})>\varphi\) and \(\mathcal{L}\) can not be a \(\varphi\)-solution.