The standard algorithm for multiplying two n×nn\times n matrices requires n3n^3 multiplications, but, remarkably, in 1969, Strassen showed it could be done in n2.8\approx n^{2.8} multiplications. This is fascinating to me, but the algorithm itself is arcane and seemingly comes out of the blue. In this little post I hope to demystify it by showing how you could have discovered Strassen’s algorithm using a computer-algebra system.


The product of 2×22\times 2 matricies

A=(a11a12a21a22),B=(b11b12b21b22)A=\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix},\quad B=\begin{pmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{pmatrix}

is typically written

AB=(a11b11+a12b21a11b12+a12b22a21b11+a22b21a21b12+a22b22).AB = \begin{pmatrix} a_{11}b_{11}+a_{12}b_{21} & a_{11}b_{12}+a_{12}b_{22}\\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22} \end{pmatrix}.

but this requires 8 multiplications and multiplication is computationally expensive. Can we do better?

Remarkably, yes. In 1969 Volker Strassen showed it can be done with 7:

P1=(a11+a22)(b11+b22)P2=(a21+a22)b11P3=a11(b12b22)P4=a22(b21b11)P5=(a11+a12)b22P6=(a21a11)(b11+b12)P7=(a12a22)(b21+b22)\begin{aligned} P_1 &= (a_{11}+a_{22})(b_{11}+b_{22})\\ P_2 &= (a_{21}+a_{22}) b_{11}\\ P_3 &= a_{11} (b_{12}-b_{22})\\ P_4 &= a_{22} (b_{21}-b_{11})\\ P_5 &= (a_{11}+a_{12}) b_{22}\\ P_6 &= (a_{21}-a_{11})(b_{11}+b_{12})\\ P_7 &= (a_{12}-a_{22})(b_{21}+b_{22}) \end{aligned}
AB=(P1+P4P5+P7P3+P5P2+P4P1P2+P3+P6)AB = \begin{pmatrix} P_1 + P_4 - P_5 + P_7 & P_3 + P_5\\ P_2 + P_4 & P_1 - P_2 + P_3 + P_6 \end{pmatrix}

While ostensibly about 2×22\times 2 matrices, this technique can be used to multiply arbitrary n×nn\times n matricies as well. The basic idea is to break the n×nn\times n matricies into four n/2×n/2n/2\times n/2 quadrant matrices and use Strassen's algorithm to compute the n×nn\times n product using seven matrix-multiplies over the n/2×n/2n/2\times n/2 quadrants. Letting t(n)t(n) denote the number of multiplies required to multiply n×nn\times n matricies using Strassen's algorithm, this gives the following recurrence relation for the value of t(n)t(n):

t(n)={1if n=17t(n/2)otherwiset(n)=\begin{cases} 1 & \text{if } n=1 \\ 7t(n/2) &\text{otherwise}\end{cases}

When nn is not a power of two, we can pad the matrices with zeros to make the recurrence terminating. Doing so gives t(n)=7log2n=nlog27n2.8t(n)=7^{\log_2n}=n^{\log_27}\approx n^{2.8}.

I find this remarkable: It wouldn't have even occurred to me the typical n3n^3 algorithm for matrix multiplication is sub-optimal. Moreover, the expressions for PiP_i seemingly comes from thin air. How would one go about "discovering" Strassen's algorithm?

In 1970 Richard Brent gave an algebraic method (p. 33) which we will work with for the remainder of this essay. At a high level, we will set up an algebra problem, the solutions to which correspond to ways to multiply 2×22\times 2 matrices using 7 multiplications. This will have 64 equations with 84 variables, so to avoid solving this by hand we will use a SMT solver.

First, let C=ABC=AB and to simplify indexing let aˉ=(a11,a12,a21,a22)\bar{a}=(a_{11},a_{12},a_{21},a_{22}) be AA in row-major form and let bˉ\bar{b} and cˉ\bar{c} be likewise. Because aijakla_{ij} a_{kl} nor bijbklb_{ij} b_{kl} appear in the expression for CC, the kkth product will be of the form

Pk=(iaˉiαi(k))(jbˉjβj(k))P_k = (\sum_i \bar{a}_i\alpha^{(k)}_i)(\sum_j\bar{b}_j\beta^{(k)}_j)

where αi(k),βi(k){1,0,1}\alpha^{(k)}_i,\beta^{(k)}_i\in\{-1,0,1\} are variables indicating the leading coefficent of aˉi\bar{a}_i/bˉj\bar{b}_j in the kkth product. For example, in Strassen's algorithm P3=a11(b12b22)P_3 = a_{11} (b_{12}-b_{22}). As aˉ1\bar{a}_1 appears in P3P_3, α1(3)=1\alpha^{(3)}_1=1. As bˉ4-\bar{b}_4 appears in P3P_3, β4(3)=1\beta^{(3)}_4=-1.

cˉl\bar{c}_l will be a sum of PkP_ks, so letting γl(k){1,0,1}\gamma_l^{(k)}\in\{-1,0,1\} denote PkP_k's coefficient in the expression for cˉl\bar{c}_l,

cˉl=kγl(k)Pk.\bar{c}_l = \sum_k\gamma_l^{(k)}P_k.

To make this an algebra problem, we need another formula for cˉl\bar{c}_l. We can use the standard expression for C=ABC=AB

C=(aˉ1bˉ1+aˉ2bˉ3aˉ1bˉ2+aˉ2bˉ4aˉ3bˉ1+aˉ4bˉ3aˉ3bˉ2+aˉ4bˉ4)C = \begin{pmatrix} \bar{a}_1\bar{b}_1+\bar{a}_2\bar{b}_3 & \bar{a}_1\bar{b}_2+\bar{a}_2\bar{b}_4 \\ \bar{a}_3\bar{b}_1+\bar{a}_4\bar{b}_3 & \bar{a}_3\bar{b}_2+\bar{a}_4\bar{b}_4 \end{pmatrix}

and let TijlT_{ijl} be 11 if aˉibˉj\bar{a}_i\bar{b}_j appears in the above expression for cˉl\bar{c}_l and 00 otherwise, giving the desired formula for cˉl\bar{c}_l below.

cˉl=ijaˉibˉjTijl.\bar{c}_l = \sum_i\sum_j\bar{a}_i\bar{b}_jT_{ijl}.

To equate these two expressions, note

cˉl=kwl(k)Pk=kwl(k)(iaˉiαi(k))(jbˉjβj(k))=kwl(k)ijaˉibˉjαi(k)βj(k)=ijaˉibˉjkwl(k)αi(k)βj(k)\begin{aligned} \bar{c}_l &= \sum_kw_l^{(k)}P_k \\ &= \sum_kw_l^{(k)}(\sum_i \bar{a}_i\alpha^{(k)}_i)(\sum_j \bar{b}_j\beta^{(k)}_j) \\ &= \sum_kw_l^{(k)}\sum_i\sum_j\bar{a}_i\bar{b}_j\alpha^{(k)}_i\beta^{(k)}_j \\ &= \sum_i\sum_j\bar{a}_i\bar{b}_j\sum_kw_l^{(k)}\alpha^{(k)}_i\beta^{(k)}_j \end{aligned}

so we have

Tijl=kwl(k)αi(k)βj(k)T_{ijl} = \sum_kw_l^{(k)}\alpha^{(k)}_i\beta^{(k)}_j

an algebra problem, the solutions to which correspond to ways to compute C=ABC=AB using kk multiplies. Though, as hinted before, because our matrices have 4 entries each, the ii, jj, and ll indicies range over 4 values, making 444=644\cdot 4\cdot 4=64 equations. Adding to our complications, to recover Strassen's algorithm we'd set k=7k=7, meaning there are 347=843\cdot 4\cdot 7=84 wl(k)w_l^{(k)}, αi(k)\alpha^{(k)}_i, and βj(k)\beta^{(k)}_j variables. We will need a computer to solve this algebra problem.

I used Z3 for this, an SMT solver written at Microsoft. The code is straightforward, you can read it if you like, but trying to solve the system of equations above will fail at first blush. To narrow the search space, I added two constraints to the equations:

  1. Swapping PiP_i with PjP_j yields a new, but uninteresting, solution. To stop the solver from finding these duplicates, I require the PkP_ks to be lexographically ordered, viewing them as strings of αi(k)\alpha^{(k)}_i and βj(k)\beta^{(k)}_j variables.
  2. Negating PkP_k and all its corresponding wl(k)w^{(k)}_l values, or negating both of PkP_k's summations yields a new, but uninteresting, solution. To stop the solver from finding these duplicates, I require the first non-zero αi(k)\alpha^{(k)}_i and βj(k)\beta^{(k)}_j values in PkP_k's summations to be positive. This picks a sign for PkP_k, as exactly one of (1)(Σ)(-1)(\Sigma\dots) and (Σ)(\Sigma\dots) has a positive first non-zero term.

With these symmetries removed, the solver finds several Strassen-like schemes in a minute. Nice - we've rediscovered Strassen's algorithm from first principles.


Thanks to Pete, Jacob, Groceries, and Elam for interesting conversations, and Entrepreneurs First for for hosting the weekend hackathon where I got to weave this together.