3 Comments
Harshad:

Beautiful, amazing post. It cleared up a lot of things for me. There are very few resources that put the core ideas in plain words like this (e.g. `just set strides to 0 for broadcasting`). Kudos!
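(To illustrate for other readers: the stride trick, in a made-up Rust view type of my own, nothing like the post's actual code, looks something like this:)

```rust
// Hypothetical minimal "view" over a flat buffer, just to show the trick.
struct View<'a> {
    data: &'a [f32],
    shape: [usize; 2],
    strides: [usize; 2],
}

impl View<'_> {
    // Element (i, j) lives at i * strides[0] + j * strides[1].
    fn at(&self, i: usize, j: usize) -> f32 {
        self.data[i * self.strides[0] + j * self.strides[1]]
    }
}

fn main() {
    // A length-3 vector...
    let row = [1.0, 2.0, 3.0];
    // ...broadcast to a 4x3 matrix without copying: stride 0 along the
    // new axis makes every row index the same three elements.
    let b = View { data: &row, shape: [4, 3], strides: [0, 1] };
    for i in 0..b.shape[0] {
        for j in 0..b.shape[1] {
            print!("{} ", b.at(i, j)); // prints "1 2 3" four times
        }
        println!();
    }
}
```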

Any idea how Tinygrad achieves fusing of operations? I mean, I can understand superficially that you'd do some kind of tracing à la compilers and figure out when ops can be fused. But is there a similar resource that explains how fusing is actually done? Thanks!

Kurt:

Thanks!

Yes, I have some idea of how Tinygrad fuses. It basically does all operations lazily and builds a syntax tree of sorts; then, before actual execution (which in tinygrad can be: spit out some C code and run that), a bunch of handwritten code does optimisation and fusion. At least, that was the state when I looked at it maybe 6 months ago.
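Sketched in made-up Rust (the node names and the single rewrite rule are invented for illustration, not tinygrad's actual IR), the lazy-tree-plus-rewrite idea is something like:

```rust
// Operations build an expression tree instead of running immediately.
#[derive(Debug)]
enum Expr {
    Input(Vec<f32>),
    Mul(Box<Expr>, Box<Expr>),
    Sum(Box<Expr>),
    // What the rewrite below produces: one node a code generator
    // could lower to a single multiply-accumulate loop.
    FusedMulSum(Box<Expr>, Box<Expr>),
}

// One handwritten rewrite rule: Sum(Mul(a, b)) => FusedMulSum(a, b).
fn fuse(e: Expr) -> Expr {
    match e {
        Expr::Sum(inner) => match fuse(*inner) {
            Expr::Mul(a, b) => Expr::FusedMulSum(a, b),
            other => Expr::Sum(Box::new(other)),
        },
        Expr::Mul(a, b) => Expr::Mul(Box::new(fuse(*a)), Box::new(fuse(*b))),
        leaf => leaf,
    }
}

fn main() {
    // "Running" mul and sum only builds the tree...
    let tree = Expr::Sum(Box::new(Expr::Mul(
        Box::new(Expr::Input(vec![1.0, 2.0])),
        Box::new(Expr::Input(vec![3.0, 4.0])),
    )));
    // ...and the optimiser pass rewrites it before execution.
    println!("{:?}", fuse(tree)); // FusedMulSum(Input(..), Input(..))
}
```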

I am writing a third post on how to add automatic differentiation to Tensorken. That one will also describe how I added automatic fusion. Spoiler: by implementing RawTensor with a Fuse type that delays the execution of multiplication until it sees the next operation. If that's a sum, then it fuses.
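Very roughly, in made-up Rust (the names `Fuse`, `mul`, `sum` and the flat `Vec<f32>` storage are mine for illustration here, not Tensorken's actual RawTensor API), the idea looks like:

```rust
enum Fuse {
    // An already-computed tensor (flattened to a Vec for simplicity).
    Realized(Vec<f32>),
    // A multiplication we have not run yet: keep both operands
    // around and wait to see what the next operation is.
    PendingMul(Vec<f32>, Vec<f32>),
}

impl Fuse {
    fn mul(self, other: &[f32]) -> Fuse {
        // Don't multiply yet; just remember the operands.
        // (If self was itself pending, realize forces it first.)
        Fuse::PendingMul(self.realize(), other.to_vec())
    }

    fn sum(self) -> f32 {
        match self {
            // The op after the mul turned out to be a sum: run both as
            // one multiply-accumulate loop, never materialising the product.
            Fuse::PendingMul(a, b) => a.iter().zip(&b).map(|(x, y)| x * y).sum(),
            Fuse::Realized(v) => v.iter().sum(),
        }
    }

    fn realize(self) -> Vec<f32> {
        match self {
            // Any other use forces the delayed multiplication to run.
            Fuse::PendingMul(a, b) => a.iter().zip(&b).map(|(x, y)| x * y).collect(),
            Fuse::Realized(v) => v,
        }
    }
}

fn main() {
    let t = Fuse::Realized(vec![1.0, 2.0, 3.0]);
    // mul followed by sum fuses into a single dot-product-style loop.
    let dot = t.mul(&[4.0, 5.0, 6.0]).sum();
    println!("{dot}"); // 32
}
```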

Harshad:

Thanks Kurt! That explanation makes sense. Though I guess I'll have to actually go see this in the source code of Tinygrad.

Eagerly looking forward to your autodiff implementation/blog post! 🙂
