I realized that my research style has evolved into this highly-empirical, inductive, ablation-heavy, and rigorous-to-a-fault nature. It’s good and I think that’s where my strengths lie especially given my experience in frontier LM training, but I’m starting to realize that this style is unsustainable in an environment where compute is scarce (e.g., academia): you can’t do all the ablations you want, no matter how creative you can be.
This term, I am slowly shifting towards more theory-heavy approaches. This is quite exciting, as I think these are the types of work that can only be done in an academic setting (industry will always ask: “how can we apply this at the present moment?"). Recently, I’ve been interested about geometric ML, I think it’s pretty cool.