
Extended Polymerization - Dynamic Programming

This post is part of a series: "Divide-and-conquer, Memoization, and Dynamic Programming"
  1. Divide and Conquer
  2. Memoization
  3. Dynamic Programming
  4. Benchmarking & Analysis (in progress)

In the previous post, we used memoization to speed up a divide-and-conquer algorithm with overlapping subproblems. This works by storing a table of solutions for each subproblem and re-using those solutions when the same subproblem is encountered again.
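As a refresher, the heart of that technique can be sketched as a small cache-wrapping function (a minimal sketch, not necessarily the exact code from the previous post):

open System.Collections.Generic

// Minimal memoization sketch: compute f x once per distinct x,
// then answer repeated calls from the cache.
let memoize (f: 'a -> 'b) : 'a -> 'b =
    let cache = Dictionary<'a, 'b>()
    fun x ->
        match cache.TryGetValue x with
        | true, result -> result
        | false, _ ->
            let result = f x
            cache[x] <- result
            result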

Although very effective, memoization does have tradeoffs: the cache of solved subproblems takes up memory, every call pays for a cache lookup, and the algorithm is still recursive, so it still consumes stack space proportional to the depth of the problem.

We can further increase performance in terms of both runtime and memory usage by converting our memoized algorithm into an iterative one, using a technique called "dynamic programming".[1]

Recursion vs. Iteration

Recursion (i.e. a function that invokes itself) and iteration (i.e. 'while' loops) are equally expressive in theory. Any problem that can be solved with a recursive algorithm can be solved with an equivalent iterative one, and vice versa. The reasons to prefer one technique over the other are usually practical: how clearly each expresses the problem, how well the language supports each, and how much time and memory each consumes.
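To make the equivalence concrete, here is the same trivial computation written both ways (a toy illustration, separate from the polymerization problem):

// Summing the integers 1..n, recursively...
let rec sumRec n =
    if n = 0 then 0
    else n + sumRec (n - 1)

// ...and iteratively, with a mutable accumulator.
let sumIter n =
    let mutable total = 0
    for i in 1 .. n do
        total <- total + i
    total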

Dynamic programming[2] can be thought of as a way to transform a recursive, divide-and-conquer algorithm into an equivalent iterative one in order to use the minimum amount of resources.

Dynamic programming is not applicable to all recursive algorithms. For the technique to be effective, the problem must have "optimal substructure" and "overlapping subproblems". We have already demonstrated these attributes in previous posts, but in short: a problem has optimal substructure when a correct solution to the whole can be assembled from correct solutions to its subproblems, and it has overlapping subproblems when the same subproblems recur many times during the computation. Our problem clearly has both - each pair's counts are built from the counts of exactly two smaller subproblems, and with only a handful of distinct letter pairs, an expansion tree with \(2^n\) leaves must revisit the same subproblems over and over.

Converting to Dynamic Programming

With divide-and-conquer, we start by taking the full problem and splitting it into parts. We then split each of those parts into their own subparts, and those subparts into subparts... and so on until we reach a problem that is small enough to have a known solution. We can show this process using a tree structure. Here I've labeled each sub-problem with a letter pair and a number to represent "the number of times each letter occurs after expanding this pair of letters N times".

Tree diagram rooted at Σ(QW, 3) pointing to subproblems Σ(QW, 2) and Σ(WW, 2), and those pointing to their subproblems

To solve this problem iteratively, we can imagine building up the solutions in reverse - instead of starting with a full problem and dividing it, start with all the base cases (the smallest subproblems) and combine them repeatedly into larger solutions until we get a solution of the order we want. Representing each base case as a letter pair, imagine a table where each row is an iteration, and each column is a letter pair:

    | QW | WW | WE | EW | WQ | QE | ...
  0 |    |    |    |    |    |    |
  1 |    |    |    |    |    |    |
... |    |    |    |    |    |    |

We can fill in the first (or should I say "zeroth"?) row immediately - it is easy to know how many times each letter occurs in a letter pair when it hasn't been expanded at all.

  | QW            | WW            | WE            | EW            | WQ            | QE            | ...
0 | Q:1, W:1, E:0 | Q:0, W:2, E:0 | Q:0, W:1, E:1 | Q:0, W:1, E:1 | Q:1, W:1, E:0 | Q:1, W:0, E:1 | ...

To determine the next row, we can use the same relationships between letter pairs that we figured out in the recursive solution. For example, the letter count for "QW" after 1 round of expansion is equal to the letter count for "QW" after no rounds of expansion, plus the letter count for "WW" after no rounds of expansion, minus a single "W" for the letter shared between the pairs:

  | QW                    | WW            | WE            | EW            | WQ            | QE            | ...
0 | Q:1, W:1, E:0         | Q:0, W:2, E:0 | Q:0, W:1, E:1 | Q:0, W:1, E:1 | Q:1, W:1, E:0 | Q:1, W:0, E:1 | ...
1 | \(qw_0 + ww_0 - 1_w\) |               |               |               |               |               |

We can do the same for all letter pairs:[3]

  | QW                    | WW                    | WE                    | EW            | WQ            | QE            | ...
0 | Q:1, W:1, E:0         | Q:0, W:2, E:0         | Q:0, W:1, E:1         | Q:0, W:1, E:1 | Q:1, W:1, E:0 | Q:1, W:0, E:1 | ...
1 | \(qw_0 + ww_0 - 1_w\) | \(we_0 + ew_0 - 1_e\) | \(wq_0 + qe_0 - 1_q\) | ...           | ...           | ...           |

Each of these expressions can be filled in with actual values, because we know the values of all the variables!

  | QW            | WW            | WE            | EW            | WQ            | QE            | ...
0 | Q:1, W:1, E:0 | Q:0, W:2, E:0 | Q:0, W:1, E:1 | Q:0, W:1, E:1 | Q:1, W:1, E:0 | Q:1, W:0, E:1 | ...
1 | Q:1, W:2, E:0 | Q:0, W:2, E:1 | Q:1, W:1, E:1 | ...           | ...           | ...           |

To add another row, we simply repeat this process. For each letter pair in the new row, calculate a new value using values from the previous row and the relationships between letter pairs:

    | QW                            | WW                            | WE                            | EW            | WQ            | QE            | ...
  0 | Q:1, W:1, E:0                 | Q:0, W:2, E:0                 | Q:0, W:1, E:1                 | Q:0, W:1, E:1 | Q:1, W:1, E:0 | Q:1, W:0, E:1 | ...
  1 | Q:1, W:2, E:0                 | Q:0, W:2, E:1                 | Q:1, W:1, E:1                 | ...           | ...           | ...           |
  2 | \(qw_1 + ww_1 - 1_w\)         | \(we_1 + ew_1 - 1_e\)         | \(wq_1 + qe_1 - 1_q\)         | ...           | ...           | ...           |
...
  n | \(qw_{n-1} + ww_{n-1} - 1_w\) | \(we_{n-1} + ew_{n-1} - 1_e\) | \(wq_{n-1} + qe_{n-1} - 1_q\) | ...           | ...           | ...           |
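In general, if the insertion rule for the pair \((l, r)\) produces the character \(c\), then every cell in row \(n\) is computed from exactly two cells in row \(n-1\):

\[ \Sigma(lr,\ n) = \Sigma(lc,\ n-1) + \Sigma(cr,\ n-1) - 1_c \]

where the \(1_c\) term removes the double-counted shared character, just as in the recursive solution.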

Alternatively, we can represent the process of building this table by "flipping" the tree diagram of our divide-and-conquer algorithm. Instead of starting at the top of the tree and splitting each node into 2 subproblems, we pre-define all the nodes at the bottom of the tree and work our way up towards the top:

Diagram showing subproblems arranged horizontally in rows, where each subproblem in a row is linked to 2 subproblems in the previous row

Implementation

First, we decide how to represent "rows" in the table. We'll try the simple thing first and just map character pairs to counters:

type CounterRow = Map<Pair, Counter<char>>

The map keys are like the columns in our table example, and the values are like the cells - they represent the values after a specific number of expansions.
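For example, under the sample rules used later in this post, row zero would map each letter pair to the counts of its own two characters. Here is a hand-built sketch of two of its cells (the name rowZeroSketch is just for illustration; this assumes the Counter module from the earlier posts):

// Two cells of "row zero", built by hand:
let rowZeroSketch : CounterRow =
    Map.ofList [
        // "QW" expanded 0 times contains one 'Q' and one 'W'
        ('Q', 'W'), (Counter.empty () |> Counter.incr 'Q' |> Counter.incr 'W')
        // "WW" expanded 0 times contains two 'W's
        ('W', 'W'), (Counter.empty () |> Counter.incr 'W' |> Counter.incr 'W')
    ]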

From there we'll start with the same function signature as before:[4]

let countAllCharacters (rules: Rules) (template: char list) (iterations: int) : Counter<char> =
    // ...

Inside this function, we have access to the rule set for a particular problem instance. We can use it to enumerate all the possible letter pairs. In our problem instances so far, there has been a rule for every letter combination, so we can just take the "key" side of the rules.

let allPairs : Pair array =
    rules
    |> Map.keys
    |> Array.ofSeq

Next, the starting data - this builds the "row zero" for our iteration.

// a helper function to create a base counter for a letter pair
let startingCounterForPair (l: char, r: char) : Counter<char> =
    Counter.empty ()
    |> Counter.incr l
    |> Counter.incr r

// build the start row by applying that function for all known letter pairs
let initialRow: CounterRow =
    allPairs
    |> Seq.map (fun pair ->
        pair, (startingCounterForPair pair)
    )
    |> Map.ofSeq

Once we have the starting row, how do we build the subsequent rows? We can break this down a little by first getting the next row value for a single letter pair, i.e. one column in the next row.

let nextCounterForPair (previous: CounterRow) (l: char, r: char) : Counter<char> =
    // get the expansion character for this pair
    let c = Map.find (l, r) rules
    // use that to find the right cells in the previous row to add together
    let prevLeft = Map.find (l, c) previous
    let prevRight = Map.find (c, r) previous
    // add cells from previous iteration, and account for overlap
    Counter.add prevLeft prevRight |> Counter.decr c
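To tie this back to the worked table: under the sample rules, the pair "QW" expands with a 'W', so applying this function to row zero reproduces the \(qw_0 + ww_0 - 1_w\) cell we filled in by hand (a hypothetical check):

// Adds the row-zero counters for "QW" and "WW" and removes one 'W',
// yielding Q:1, W:2, E:0 - matching row 1 of the table above.
let qw1 = nextCounterForPair initialRow ('Q', 'W')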

With this in place, building a whole row is just applying nextCounterForPair to all letter pairs, e.g.:

allPairs
|> Seq.map (fun pair -> pair, nextCounterForPair previousRow pair)
|> Map.ofSeq

So we wrap this operation in a loop -

let rec loop (iteration: int) (previousRow: CounterRow) =
    // iteration*s* is from outer function scope
    if iterations - iteration = 0 then
        previousRow
    else
        let nextRow =
            allPairs
            |> Seq.map (fun p -> p, nextCounterForPair previousRow p)
            |> Map.ofSeq
        // recursion is in tail call position
        loop (iteration + 1) nextRow

"But that's not a loop, that's recursion again!" Yes - but this time, our recursive call is eligible for tail call optimization. The recursion is the last thing the function does before returning. This means the compiler should literally convert this to a loop for us, so the above should behave equivalently to:

let loop (iterations: int) (initialRow : CounterRow) =
    let mutable i = 0
    let mutable row = initialRow
    while i < iterations do
        let nextRow =
            allPairs
            |> Seq.map (fun p -> p, nextCounterForPair row p)
            |> Map.ofSeq
        row <- nextRow
        i <- i + 1

    row

But why deal with mutable semantics if you don't have to?

Finally, we need to use our building blocks to process the template argument of countAllCharacters.

let countersByPair: CounterRow = loop 0 initialRow // (1)

template
|> List.pairwise
|> List.map (fun p -> countersByPair[p]) // (2)
|> List.fold Counter.add (Counter.empty ())
|> fun countWithOverlap ->
    //(3)
    template[1..(List.length template - 2)]
    |> List.fold (fun counter ch -> Counter.decr ch counter) countWithOverlap
  1. Run our iterative algorithm, yielding a "row" containing the counters for each possible letter pair after the specified number of iterations. All the heavy computation happens here.
  2. When we go through the letter pairs in the template, we don't need to calculate anything - we just look up the computed result
  3. After adding the pairwise results together, we have to account for how the letter pairs overlap. The first and last characters in the template string only appear in one pair, but every other character appears in two different letter pairs: once as the second character of a pair and once as the first. So we can account for the overlaps by subtracting 1 from the count of each letter in the template except the first and last - as the quick check below illustrates.
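As that check, running the whole pipeline for zero expansions should simply reproduce the character counts of the template itself (a hypothetical call, using the sampleRules defined in the Testing section below):

// "QWE" -> pairs QW and WE; adding their row-zero counters gives
// Q:1, W:2, E:1, and subtracting the interior 'W' once leaves
// Q:1, W:1, E:1 - exactly the character counts of "QWE".
countAllCharacters sampleRules [ 'Q'; 'W'; 'E' ] 0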
Here is the completed module code:

Requires the Counter module from previous posts.

type Pair = char * char
type Rules = Map<Pair, char>

module Iterative =

    type CounterRow = Map<Pair, Counter<char>>

    let countAllCharacters (rules: Rules) (template: char list) (iterations: int) =
        let allPairs =
            rules
            |> Map.keys
            |> Array.ofSeq

        let startingCounterForPair (l: char, r: char) =
            Counter.empty ()
            |> Counter.incr l
            |> Counter.incr r

        let nextCounterForPair (previous: CounterRow) (l: char, r: char) =
            let c = Map.find (l, r) rules
            let prevLeft = Map.find (l, c) previous
            let prevRight = Map.find (c, r) previous
            Counter.add prevLeft prevRight |> Counter.decr c

        let rec loop (iteration: int) (previousRow: CounterRow) =
            if iterations - iteration = 0 then
                previousRow
            else
                let nextRow =
                    allPairs
                    |> Array.map (fun p -> p, nextCounterForPair previousRow p)
                    |> Map.ofArray
                loop (iteration + 1) nextRow

        let initialRow : CounterRow =
            allPairs
            |> Seq.map (fun pair ->
                pair, (startingCounterForPair pair)
            )
            |> Map.ofSeq

        // this is our completed table - or rather the last row
        let countersByPair = loop 0 initialRow

        // goal is to process the *template*

        let countWithOverlaps =
            template
            |> List.pairwise
            |> List.map (fun p -> countersByPair[p])
            |> List.fold Counter.add (Counter.empty ())

        // must subtract one for every character in the template except for the first and the last
        let adjustedCount =
            template[1..(List.length template - 2)]
            |> List.fold (fun counter ch -> Counter.decr ch counter) countWithOverlaps

        adjustedCount

Testing

In previous posts we built up a small F# script to run our different algorithms interactively. Because we've used the same function signature for all the algorithms, it is simple to add the new implementation as well.

Outline of `ExtendedPolymerization.fsx`
module String =
    // ...

open System.Collections.Generic
type Counter<'t when 't: comparison> = Dictionary<'t, uint64>

module Counter =
    // ...

type Pair = char * char
type Rules = Map<Pair, char>

module Recursive =
    // ...

module Memoized =
    // ...

module Iterative =
    // ...

let sampleRules =
    Map.ofList [
        ('Q', 'Q'), 'E'
        ('Q', 'W'), 'W'
        ('Q', 'E'), 'Q'
        ('W', 'Q'), 'W'
        ('W', 'W'), 'E'
        ('W', 'E'), 'Q'
        ('E', 'Q'), 'Q'
        ('E', 'W'), 'E'
        ('E', 'E'), 'W'
    ]

let sampleString = "QWE"
let longSampleString = "EWQQWEEWQEWEQ"

let countWithFunction solver start n =
    start
    |> String.toCharList
    |> fun cs ->
        solver sampleRules cs n
    |> Seq.map (fun (kv: KeyValuePair<char, uint64>) -> kv.Key, kv.Value)
    |> Seq.toList

let countRecursive = countWithFunction Recursive.countAllCharacters
let countMemoized = countWithFunction Memoized.countAllCharacters
let countIterative = countWithFunction Iterative.countAllCharacters

We can now compare our new algorithm to the previous ones in the F# REPL fsi:

> #load "ExtendedPolymerization.fsx";;
[Loading C:\Users\mrsei\source\advent of code\2021\ExtendedPolymerization.fsx]
module FSI_0002.ExtendedPolymerization
...

> open FSI_0002.ExtendedPolymerization;;
> #time "on";;

--> Timing now on

> countRecursive longSampleString 20;;
Real: 00:00:10.573, CPU: 00:00:10.593, GC gen0: 1877, gen1: 2, gen2: 1
val it: (char * uint64) list =
  [('E', 4413883UL); ('Q', 7361286UL); ('W', 807744UL)]

> countMemoized longSampleString 20;;
Real: 00:00:00.000, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
val it: (char * uint64) list =
  [('E', 4413879UL); ('Q', 7361286UL); ('W', 807740UL)]

> countIterative longSampleString 20;;
Real: 00:00:00.001, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
val it: (char * uint64) list =
  [('E', 4413883UL); ('Q', 7361286UL); ('W', 807744UL)]

The output of our new implementation exactly matches the original recursive one, so we can be reasonably confident it produces correct results. But we don't seem to be stressing either the memoized or iterative implementations enough to tell a difference between them.

> countMemoized longSampleString 40;;
Real: 00:00:00.001, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
...

> countIterative longSampleString 40;;
Real: 00:00:00.001, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
...

> countMemoized longSampleString 60;;
Real: 00:00:00.001, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
...

> countIterative longSampleString 60;;
Real: 00:00:00.001, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
...

> countMemoized longSampleString 80;;
Real: 00:00:00.001, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
...

> countIterative longSampleString 80;;
Real: 00:00:00.001, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
...

After more than 80 iterations, we quickly approach the maximum value of UInt64, with no discernible difference between the two algorithms. There are several possible reasons: the workload may simply be too small to register, fsi's #time directive only reports times down to the millisecond, and a single run tells us nothing about variance, JIT warm-up, or garbage collection effects.

Better Benchmarking

If we really want to compare the memoized and iterative algorithms, we will have to consider many more factors, such as: JIT warm-up, garbage collection, timer precision, statistical variation across many runs, and how each algorithm scales with the size of the input.

Fortunately other people have done a lot of work figuring out how best to benchmark .NET code. One of the results is BenchmarkDotNet - a framework specifically for .NET performance testing. In the future, I plan to write about using this framework to measure these algorithms.
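As a rough preview, a benchmark class for these algorithms might look something like the sketch below. This is a hypothetical setup, not the measured code from the upcoming post; it assumes the modules and sample data from the script above are available in a compiled project (BenchmarkDotNet does not run well from .fsx scripts):

open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running

[<MemoryDiagnoser>]
type PolymerBenchmarks () =
    // reuse the sample data from the script above
    let template = String.toCharList longSampleString

    // number of expansion rounds, varied per benchmark run
    [<Params(20, 40, 80)>]
    member val Iterations = 0 with get, set

    [<Benchmark(Baseline = true)>]
    member this.Memoized () =
        Memoized.countAllCharacters sampleRules template this.Iterations

    [<Benchmark>]
    member this.Iterative () =
        Iterative.countAllCharacters sampleRules template this.Iterations

[<EntryPoint>]
let main _ =
    BenchmarkRunner.Run<PolymerBenchmarks>() |> ignore
    0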


  1. And it only took me 6 months to get around to writing about it. 🙂 ↩︎

  2. The name "dynamic programming" has more to do with the context in which it was developed than the actual technique, at least in computer science. ↩︎

  3. The expansion rules used here are described in a previous post: Extended Polymerization - Divide and Conquer ↩︎

  4. See part 1, part 2 ↩︎