3 Comments

Beautiful, amazing post. It cleared up a lot of things for me. There are very few resources that put the core ideas in plain words (like `just set strides to 0 for broadcasting`) the way this one does. Kudos!
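
(For anyone else reading along: the stride trick mentioned above can be shown in a few lines. This is just a toy sketch of the general idea, not code from Tensorken or tinygrad.)

```rust
// Broadcasting a length-3 vector to a 4x3 view by giving the broadcast
// dimension a stride of 0: every "row" reads from the same underlying data,
// so nothing is copied.
fn main() {
    let data = [10, 20, 30];
    let shape = [4usize, 3];
    let strides = [0usize, 1]; // stride 0 along the broadcast dimension

    for i in 0..shape[0] {
        for j in 0..shape[1] {
            let idx = i * strides[0] + j * strides[1];
            print!("{} ", data[idx]);
        }
        println!();
    }
    // Prints four identical rows: 10 20 30
}
```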

Any idea how Tinygrad achieves fusing of operations? I mean, I can understand superficially that you'd do some kind of tracing, à la compilers, and figure out when ops can be fused. But is there a similar resource that explains how fusing is actually done? Thanks!


Thanks!

Yes, I have some idea of how Tinygrad fuses. It basically performs all operations lazily and builds a syntax tree of sorts, and then, before actual execution (which in tinygrad can mean: spit out some C code and run it), a bunch of handwritten code does optimisation and fusion. At least that was the state when I looked at it, maybe 6 months ago.
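
To make the shape of that concrete, here is a toy sketch of a lazy expression tree with a hand-written rewrite that fuses a multiply followed by a sum. It only illustrates the idea; tinygrad's actual graph and fusion code look quite different.

```rust
// Operations build an expression tree instead of running immediately.
// A rewrite pass then looks for a Sum whose input is a Mul and replaces
// the pair with one fused node before any kernel code is generated.
#[derive(Debug)]
enum Expr {
    Input(&'static str),
    Mul(Box<Expr>, Box<Expr>),
    Sum(Box<Expr>),
    FusedMulSum(Box<Expr>, Box<Expr>), // one kernel instead of two
}

fn fuse(e: Expr) -> Expr {
    match e {
        // Sum over a Mul: collapse into a single fused op.
        Expr::Sum(inner) => match *inner {
            Expr::Mul(a, b) => Expr::FusedMulSum(Box::new(fuse(*a)), Box::new(fuse(*b))),
            other => Expr::Sum(Box::new(fuse(other))),
        },
        Expr::Mul(a, b) => Expr::Mul(Box::new(fuse(*a)), Box::new(fuse(*b))),
        other => other,
    }
}

fn main() {
    let graph = Expr::Sum(Box::new(Expr::Mul(
        Box::new(Expr::Input("a")),
        Box::new(Expr::Input("b")),
    )));
    println!("{:?}", fuse(graph)); // FusedMulSum(Input("a"), Input("b"))
}
```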

I am writing a third post on how to add automatic differentiation to Tensorken. That one will also describe how I added automatic fusion. Spoiler: by implementing RawTensor with a Fuse type that delays the execution of a multiplication until it sees the next operation. If that's a sum, then it fuses the two.
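
Roughly, the wrapper idea looks something like the sketch below. The trait here is a stand-in for illustration only; the actual RawTensor interface in Tensorken has more operations and different signatures.

```rust
// Sketch of the wrapper idea: a Fuse type implements the same tensor
// interface as the thing it wraps, but remembers a pending multiplication
// instead of running it. When the next op is a sum, it emits one fused
// multiply-and-sum (a real version would also flush the pending mul
// before any other op).
trait Ops: Sized {
    fn mul(&self, other: &Self) -> Self;
    fn sum(&self) -> Self;
    fn mul_sum(&self, other: &Self) -> Self; // the fused kernel
}

struct Fuse<T: Ops + Clone> {
    value: T,
    pending_mul: Option<T>, // right-hand side of a delayed mul
}

impl<T: Ops + Clone> Fuse<T> {
    fn new(value: T) -> Self {
        Fuse { value, pending_mul: None }
    }

    fn mul(&self, other: &Fuse<T>) -> Fuse<T> {
        // Don't execute yet: just record what we would have multiplied by.
        Fuse { value: self.value.clone(), pending_mul: Some(other.value.clone()) }
    }

    fn sum(&self) -> Fuse<T> {
        match &self.pending_mul {
            // A sum right after a mul: run the single fused kernel.
            Some(rhs) => Fuse::new(self.value.mul_sum(rhs)),
            None => Fuse::new(self.value.sum()),
        }
    }
}
```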


Thanks Kurt! That explanation makes sense, though I guess I'll have to actually see this in Tinygrad's source code.

Eagerly looking forward to your autodiff implementation and blog post! 🙂
