Saturday, May 17, 2014

Types, and two approaches to problem solving

Introduction

There are two broad approaches to problem solving that I see frequently in mathematics and computing. One is attacking a problem via subproblems, and another is attacking a problem via quotient problems. The former is well known though I’ll give some examples to make things clear. The latter can be harder to recognise but there is one example that just about everyone has known since infancy.


Subproblems

Consider sorting algorithms. A large class of sorting algorithms, including quicksort, break a sequence of values into two pieces. The two pieces are smaller so they are easier to sort. We sort those pieces and then combine them, using some kind of merge operation, to give an ordered version of the original sequence. Breaking things down into subproblems is ubiquitous and is useful far outside of mathematics and computing: in cooking, in finding our path from A to B, in learning the contents of a book. So I don’t need to say much more here.


Quotient problems

The term quotient is a technical term from mathematics. But I want to use the term loosely to mean something like this: a quotient problem is what a problem looks like if you wear a certain kind of filter over your eyes. The filter hides some aspect of the problem that simplifies it. You solve the simplified problem and then take off the filter. You now ‘lift’ the solution of the simplified problem to a solution to the full problem. The catch is that your filter needs to match your problem so I’ll start by giving an example where the filter doesn’t work.


Suppose we want to add a list of integers, say: 123, 423, 934, 114. We can try simplifying this problem by wearing a filter that makes numbers fuzzy so we can’t distinguish numbers that differ by less than 10. When we wear this filter 123 looks like 120, 423 looks like 420, 934 looks like 930 and 114 looks like 110. So we can try adding 120+420+930+110. This is a simplified problem and in fact this is a common technique to get approximate answers via mental arithmetic. We get 1580. We might hope that when wearing our filters, 1580 looks like the correct answer. But it doesn’t. The correct answer is 1594. This filter doesn’t respect addition in the sense that if a looks like a’ and b looks like b’ it doesn’t follow that a+b looks like a’+b’.


To solve a problem via quotient problems we usually need to find a filter that does respect the original problem. So let’s wear a different filter that allows us just to see the last digit of a number. Our original problem now looks like summing the list 3, 3, 4, 4. We get 4. This is the correct last digit. If we now try a filter that allows us to see just the last two digits we see that summing 23, 23, 34, 14 does in fact give the correct last two digits. This is why the standard elementary school algorithms for addition and multiplication work through the digits from right to left: at each stage we’re solving a quotient problem but the filter only respects the original problem if it allows us to see the digits to the right of some point, not digits to the left. This filter does respect addition in the sense that if a looks like a’ and b looks like b’ then a+b looks like a’+b’.


Another example of the quotient approach is to look at the knight’s tour problem in the case where two opposite corners have been removed from the chessboard. A knight’s tour is a sequence of knight’s moves that visit each square on a board exactly once. If we remove opposite corners of the chessboard, there is no knight’s tour of the remaining 62 squares. How can we prove this? If you don’t see the trick you can get get caught up in all kinds of complicated reasoning. So now put on a filter that removes your ability to see the spatial relationships between the squares so you can only see the colours of the squares. This respects the original problem in the sense that a knight’s move goes from a black square to a white square, or from a white square to a black square. The filter doesn’t stop us seeing this. But now it’s easier to see that there are two more squares of one colour than the other and so no knight’s tour is possible. We didn’t need to be able to see the spatial relationships at all.


(Note that this is the same trick as we use for arithmetic, though it’s not immediately obvious. If we think of the spatial position of a square as being given by a pair of integers (x, y), then the colour is given by x+y modulo 2. In other words, by the last digit of x+y written in binary. So it’s just the see-only-digits-on-the-right filter at work again.)


Wearing filters while programming

So now think about developing some code in a dynamic language like Python. Suppose we execute the line:


a = 1


The Python interpreter doesn’t just store the integer 1 somewhere in memory. It also stores a tag indicating that the data is to be interpreted as an integer. When you come to execute the line:


b = a+1


it will first examine the tag in a indicating its type, in this case int, and use that to determine what the type for b should be.


Now suppose we wear a filter that allows us to see the tag indicating the type of some data, but not the data itself. Can we still reason about what our program does?


In many cases we can. For example we can, in principle, deduce the type of


a+b*(c+1)/(2+d)


if we know the types of a, b, c, d. (As I’ve said once before, it’s hard to make any reliable statement about a bit of Python code so let's suppose that a, b, c and d are all either of type int or type float.) We can read and understand quite a bit of Python code wearing this filter. But it’s easy to go wrong. For example consider


if a>1 then:
return 1.0
else:
return 1


The type of the result depends on the value of the variable a. So if we’re wearing the filter that hides the data, then we can’t predict what this snippet of code does. When we run it, it might return an int sometimes and a float other times, and we won’t be able to see what made the difference.


In a statically typed language you can predict the type of an expression knowing the type of its parts. This means you can reason reliably about code while wearing the hide-the-value filter. This means that almost any programming problem can be split into two parts: a quotient problem where you forget about the values, and then problem of lifting a solution to the quotient problem to a solution to the full problem. Or to put that in more conventional language: designing your data and function types, and then implementing the code that fits those types.


I chose to make the contrast between dynamic and static languages just to make the ideas clear but actually you can happily use similar reasoning for both types of language. Compilers for statically typed languages, give you a lot of assistance if you choose to solve your programming problems this way.


A good example of this at work is given in Haskell. If you're writing a compiler, say, you might want to represent a piece of code as an abstract syntax tree, and implement algorithms that recurse through the tree. In Haskell the type system is strong enough that once you’ve defined the tree type the form of the recursion algorithms is often more or less given. In fact, it can be tricky to implement tree recursion incorrectly and have the code compile without errors. Solving the quotient problem of getting the types right gets you much of the way towards solving the full problem.


And that’s my main point: types aren’t simply a restriction mechanism to help you avoid making mistakes. Instead they are a way to reduce some complex programming problems to simpler ones. But the simpler problem isn’t a subproblem, it’s a quotient problem.

Dependent types

Dependently typed languages give you even more flexibility with what filters you wear. They allow you to mix up values and types. For example both C++ and Agda (to pick an unlikely pair) allow you to wear filters that hide the values of elements in your arrays while allowing you to see the length of your arrays. This makes it easier to concentrate on some aspects of your problem while completely ignoring others.


Notes

I wrote the first draft of this a couple of years ago but never published it. I was motivated to post by a discussion kicked off by Voevodsky on the TYPES mailing list http://lists.seas.upenn.edu/pipermail/types-list/2014/001745.html


This article isn’t a piece of rigorous mathematics and I’m using mathematical terms as analogies.


The notion of a subproblem isn’t completely distinct from a quotient problem. Some problems are both, and in fact some problems can be solved by transforming them so they become both.

More generally, looking at computer programs through different filters is one approach to abstract interpretation http://en.wikipedia.org/wiki/Abstract_interpretation. The intuition section there (http://en.wikipedia.org/wiki/Abstract_interpretation#Intuition) has much in common with what I’m saying.

4 comments:

  1. I think you could argue that the partition step of quicksort is a quotienty algorithm: to sort xs, we take the quotient of less-or-greater-than-pivot, and sort in the quotient. Compare to merge sort, which does the division into subproblems as the first step.

    ReplyDelete
  2. @quiz If you partition any set into a bunch of subsets then you have two ways of looking at the situation. You can think of the subsets as being *sub*sets. But you can think of the collection of subsets itself as being a quotient set. So you can move back and forth between the two views.

    A Galois connection (http://en.wikipedia.org/wiki/Galois_connection) is also something that allows you to go back and forth between quotient and sub- things. (That article also mentions abstract interpretation.)

    So to some extent it's a point of view. But in different contexts one or other view can seem more natural.

    ReplyDelete
  3. The intro reminded me of a recent post on cstheory, http://cstheory.stackexchange.com/questions/22578/would-it-be-possible-for-a-compiler-to-convert-a-recursive-sum-into-the-average

    The post could be cleaned up, but it asks an interesting question as to if a compiler could recognize and translate between various recursive formulas for the triangular numbers.

    $T_{a+b} = T_a + T_b + ab$, addition recursion

    $T_{ab} = T_aT_b + T_{a-1}T_{b-1}$, quotient recursion

    ReplyDelete
  4. @chad I finally got around to looking at that stackexchange link. As usual, my rule of thumb "90% of statements involving the term "Turing complete" are incorrect" seems to hold. The example given is part of a large family of recurrence relations that are trivial to solve and Mathematica's RSolve will handle it.

    Your own examples are a bit harder and I don't know how to attack them generally. If I were writing a compiler I'd throw expressions of the form SUM of POLYNOMIAL*EXPONENTIAL at them and solve for the coefficients.

    ReplyDelete