The standard algorithm for multiplying two nΓ—nn\times n matrices requires n3n^3 multiplications, but, remarkably, in 1969, Strassen showed it could be done in β‰ˆn2.8\approx n^{2.8} multiplications. This is fascinating to me, but the algorithm itself is arcane and seemingly comes out of the blue. In this little post I hope to demystify it by showing how you could have discovered Strassen’s algorithm using a computer-algebra system.


The product of 2Γ—22\times 2 matricies

A=(a11a12a21a22),B=(b11b12b21b22)A=\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix},\quad B=\begin{pmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{pmatrix}

is typically written

AB=(a11b11+a12b21a11b12+a12b22a21b11+a22b21a21b12+a22b22).AB = \begin{pmatrix} a_{11}b_{11}+a_{12}b_{21} & a_{11}b_{12}+a_{12}b_{22}\\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22} \end{pmatrix}.

but this requires 8 multiplications and multiplication is computationally expensive. Can we do better?

Remarkably, yes. In 1969 Volker Strassen showed it can be done with 7:

P1=(a11+a22)(b11+b22)P2=(a21+a22)b11P3=a11(b12βˆ’b22)P4=a22(b21βˆ’b11)P5=(a11+a12)b22P6=(a21βˆ’a11)(b11+b12)P7=(a12βˆ’a22)(b21+b22)\begin{aligned} P_1 &= (a_{11}+a_{22})(b_{11}+b_{22})\\ P_2 &= (a_{21}+a_{22}) b_{11}\\ P_3 &= a_{11} (b_{12}-b_{22})\\ P_4 &= a_{22} (b_{21}-b_{11})\\ P_5 &= (a_{11}+a_{12}) b_{22}\\ P_6 &= (a_{21}-a_{11})(b_{11}+b_{12})\\ P_7 &= (a_{12}-a_{22})(b_{21}+b_{22}) \end{aligned}
AB=(P1+P4βˆ’P5+P7P3+P5P2+P4P1βˆ’P2+P3+P6)AB = \begin{pmatrix} P_1 + P_4 - P_5 + P_7 & P_3 + P_5\\ P_2 + P_4 & P_1 - P_2 + P_3 + P_6 \end{pmatrix}

While ostensibly about 2Γ—22\times 2 matrices, this technique can be used to multiply arbitrary nΓ—nn\times n matricies as well. The basic idea is to break the nΓ—nn\times n matricies into four n/2Γ—n/2n/2\times n/2 quadrant matrices and use Strassen's algorithm to compute the nΓ—nn\times n product using seven matrix-multiplies over the n/2Γ—n/2n/2\times n/2 quadrants. Letting t(n)t(n) denote the number of multiplies required to multiply nΓ—nn\times n matricies using Strassen's algorithm, this gives the following recurrence relation for the value of t(n)t(n):

t(n)={1ifΒ n=17t(n/2)otherwiset(n)=\begin{cases} 1 & \text{if } n=1 \\ 7t(n/2) &\text{otherwise}\end{cases}

When nn is not a power of two, we can pad the matrices with zeros to make the recurrence terminating. Doing so gives t(n)=7log⁑2n=nlog⁑27β‰ˆn2.8t(n)=7^{\log_2n}=n^{\log_27}\approx n^{2.8}.

I find this remarkable: It wouldn't have even occurred to me the typical n3n^3 algorithm for matrix multiplication is sub-optimal. Moreover, the expressions for PiP_i seemingly comes from thin air. How would one go about "discovering" Strassen's algorithm?

In 1970 Richard Brent gave an algebraic method (p. 33) which we will work with for the remainder of this essay. At a high level, we will set up an algebra problem, the solutions to which correspond to ways to multiply 2Γ—22\times 2 matrices using 7 multiplications. This will have 64 equations with 84 variables, so to avoid solving this by hand we will use a SMT solver.

First, let C=ABC=AB and to simplify indexing let aˉ=(a11,a12,a21,a22)\bar{a}=(a_{11},a_{12},a_{21},a_{22}) be AA in row-major form and let bˉ\bar{b} and cˉ\bar{c} be likewise. Because aijakla_{ij} a_{kl} nor bijbklb_{ij} b_{kl} appear in the expression for CC, the kkth product will be of the form

Pk=(βˆ‘iaΛ‰iΞ±i(k))(βˆ‘jbΛ‰jΞ²j(k))P_k = (\sum_i \bar{a}_i\alpha^{(k)}_i)(\sum_j\bar{b}_j\beta^{(k)}_j)

where Ξ±i(k),Ξ²i(k)∈{βˆ’1,0,1}\alpha^{(k)}_i,\beta^{(k)}_i\in\{-1,0,1\} are variables indicating the leading coefficent of aΛ‰i\bar{a}_i/bΛ‰j\bar{b}_j in the kkth product. For example, in Strassen's algorithm P3=a11(b12βˆ’b22)P_3 = a_{11} (b_{12}-b_{22}). As aΛ‰1\bar{a}_1 appears in P3P_3, Ξ±1(3)=1\alpha^{(3)}_1=1. As βˆ’bΛ‰4-\bar{b}_4 appears in P3P_3, Ξ²4(3)=βˆ’1\beta^{(3)}_4=-1.

cΛ‰l\bar{c}_l will be a sum of PkP_ks, so letting Ξ³l(k)∈{βˆ’1,0,1}\gamma_l^{(k)}\in\{-1,0,1\} denote PkP_k's coefficient in the expression for cΛ‰l\bar{c}_l,

cΛ‰l=βˆ‘kΞ³l(k)Pk.\bar{c}_l = \sum_k\gamma_l^{(k)}P_k.

To make this an algebra problem, we need another formula for cˉl\bar{c}_l. We can use the standard expression for C=ABC=AB

C=(aˉ1bˉ1+aˉ2bˉ3aˉ1bˉ2+aˉ2bˉ4aˉ3bˉ1+aˉ4bˉ3aˉ3bˉ2+aˉ4bˉ4)C = \begin{pmatrix} \bar{a}_1\bar{b}_1+\bar{a}_2\bar{b}_3 & \bar{a}_1\bar{b}_2+\bar{a}_2\bar{b}_4 \\ \bar{a}_3\bar{b}_1+\bar{a}_4\bar{b}_3 & \bar{a}_3\bar{b}_2+\bar{a}_4\bar{b}_4 \end{pmatrix}

and let TijlT_{ijl} be 11 if aˉibˉj\bar{a}_i\bar{b}_j appears in the above expression for cˉl\bar{c}_l and 00 otherwise, giving the desired formula for cˉl\bar{c}_l below.

cΛ‰l=βˆ‘iβˆ‘jaΛ‰ibΛ‰jTijl.\bar{c}_l = \sum_i\sum_j\bar{a}_i\bar{b}_jT_{ijl}.

To equate these two expressions, note

cΛ‰l=βˆ‘kwl(k)Pk=βˆ‘kwl(k)(βˆ‘iaΛ‰iΞ±i(k))(βˆ‘jbΛ‰jΞ²j(k))=βˆ‘kwl(k)βˆ‘iβˆ‘jaΛ‰ibΛ‰jΞ±i(k)Ξ²j(k)=βˆ‘iβˆ‘jaΛ‰ibΛ‰jβˆ‘kwl(k)Ξ±i(k)Ξ²j(k)\begin{aligned} \bar{c}_l &= \sum_kw_l^{(k)}P_k \\ &= \sum_kw_l^{(k)}(\sum_i \bar{a}_i\alpha^{(k)}_i)(\sum_j \bar{b}_j\beta^{(k)}_j) \\ &= \sum_kw_l^{(k)}\sum_i\sum_j\bar{a}_i\bar{b}_j\alpha^{(k)}_i\beta^{(k)}_j \\ &= \sum_i\sum_j\bar{a}_i\bar{b}_j\sum_kw_l^{(k)}\alpha^{(k)}_i\beta^{(k)}_j \end{aligned}

so we have

Tijl=βˆ‘kwl(k)Ξ±i(k)Ξ²j(k)T_{ijl} = \sum_kw_l^{(k)}\alpha^{(k)}_i\beta^{(k)}_j

an algebra problem, the solutions to which correspond to ways to compute C=ABC=AB using kk multiplies. Though, as hinted before, because our matrices have 4 entries each, the ii, jj, and ll indicies range over 4 values, making 4β‹…4β‹…4=644\cdot 4\cdot 4=64 equations. Adding to our complications, to recover Strassen's algorithm we'd set k=7k=7, meaning there are 3β‹…4β‹…7=843\cdot 4\cdot 7=84 wl(k)w_l^{(k)}, Ξ±i(k)\alpha^{(k)}_i, and Ξ²j(k)\beta^{(k)}_j variables. We will need a computer to solve this algebra problem.

I used Z3 for this, an SMT solver written at Microsoft. The code is straightforward, you can read it if you like, but trying to solve the system of equations above will fail at first blush. To narrow the search space, I added two constraints to the equations:

  1. Swapping PiP_i with PjP_j yields a new, but uninteresting, solution. To stop the solver from finding these duplicates, I require the PkP_ks to be lexographically ordered, viewing them as strings of Ξ±i(k)\alpha^{(k)}_i and Ξ²j(k)\beta^{(k)}_j variables.
  2. Negating PkP_k and all its corresponding wl(k)w^{(k)}_l values, or negating both of PkP_k's summations yields a new, but uninteresting, solution. To stop the solver from finding these duplicates, I require the first non-zero Ξ±i(k)\alpha^{(k)}_i and Ξ²j(k)\beta^{(k)}_j values in PkP_k's summations to be positive. This picks a sign for PkP_k, as exactly one of (βˆ’1)(Σ… )(-1)(\Sigma\dots) and (Σ… )(\Sigma\dots) has a positive first non-zero term.

With these symmetries removed, the solver finds several Strassen-like schemes in a minute. Nice - we've rediscovered Strassen's algorithm from first principles.


Thanks to Pete, Jacob, Groceries, and Elam for interesting conversations, and Entrepreneurs First for for hosting the weekend hackathon where I got to weave this together.