The standard algorithm for multiplying two n×n matrices
requires n3 multiplications, but, remarkably, in 1969, Strassen
showed it could be done in ≈n2.8 multiplications. This is
fascinating to me, but the algorithm itself is arcane and seemingly
comes out of the blue. In this little post I hope to demystify it by
showing how you could have discovered Strassen’s algorithm using a
computer-algebra system.
While ostensibly about 2×2 matrices, this technique can be
used to multiply arbitrary n×n matricies as well. The basic
idea is to break the n×n matricies into four n/2×n/2
quadrant matrices and use Strassen's algorithm to compute the
n×n product using seven matrix-multiplies over the
n/2×n/2 quadrants. Letting t(n) denote the number of
multiplies required to multiply n×n matricies using Strassen's
algorithm, this gives the following recurrence relation for the value
of t(n):
t(n)={17t(n/2)if n=1otherwise
When n is not a power of two, we can pad the matrices with zeros to
make the recurrence terminating. Doing so gives
t(n)=7log2n=nlog27≈n2.8.
I find this remarkable: It wouldn't have even occurred to me the
typical n3 algorithm for matrix multiplication is
sub-optimal. Moreover, the expressions for Pi seemingly comes from
thin air. How would one go about "discovering" Strassen's algorithm?
In 1970 Richard Brent gave an algebraic
method (p. 33)
which we will work with for the remainder of this essay. At a high
level, we will set up an algebra problem, the solutions to which
correspond to ways to multiply 2×2 matrices using 7
multiplications. This will have 64 equations with 84 variables, so to
avoid solving this by hand we will use a SMT
solver.
First, let C=AB and to simplify indexing let
aˉ=(a11,a12,a21,a22) be A in row-major form and
let bˉ and cˉ be likewise. Because aijakl nor
bijbkl appear in the expression for C, the kth product
will be of the form
Pk=(i∑aˉiαi(k))(j∑bˉjβj(k))
where αi(k),βi(k)∈{−1,0,1} are variables
indicating the leading coefficent of aˉi/bˉj in the
kth product. For example, in Strassen's algorithm P3=a11(b12−b22). As aˉ1 appears in P3,
α1(3)=1. As −bˉ4 appears in P3,
β4(3)=−1.
cˉl will be a sum of Pks, so letting
γl(k)∈{−1,0,1} denote Pk's coefficient in the expression
for cˉl,
cˉl=k∑γl(k)Pk.
To make this an algebra problem, we need another formula for
cˉl. We can use the standard expression for C=AB
an algebra problem, the solutions to which correspond to ways to
compute C=AB using k multiplies. Though, as hinted before, because
our matrices have 4 entries each, the i, j, and l indicies
range over 4 values, making 4⋅4⋅4=64 equations. Adding to
our complications, to recover Strassen's algorithm we'd set k=7,
meaning there are 3⋅4⋅7=84wl(k), αi(k),
and βj(k) variables. We will need a computer to solve this
algebra problem.
I used Z3 for this, an SMT solver
written at Microsoft. The
code is
straightforward, you can read it if you like, but trying to solve the
system of equations above will fail at first blush. To narrow the
search space, I added two constraints to the equations:
Swapping Pi with Pj yields a new, but uninteresting,
solution. To stop the solver from finding these duplicates, I
require the Pks to be lexographically ordered, viewing them as
strings of αi(k) and βj(k) variables.
Negating Pk and all its corresponding wl(k) values, or
negating both of Pk's summations yields a new, but
uninteresting, solution. To stop the solver from finding these
duplicates, I require the first non-zero αi(k) and
βj(k) values in Pk's summations to be positive. This
picks a sign for Pk, as exactly one of (−1)(Σ…) and
(Σ…) has a positive first non-zero term.
With these symmetries removed, the solver finds several Strassen-like
schemes in a minute. Nice - we've rediscovered Strassen's algorithm
from first principles.
Thanks to Pete, Jacob, Groceries, and Elam for interesting
conversations, and Entrepreneurs First for for hosting the weekend
hackathon where I got to weave this
together.