Lj V. Miranda - Micro Blog

On Filipino NLP

Over the holidays, I’ve been thinking a lot about what it means to do Filipino NLP now that we’re in the age of LLMs. Multilingual LLMs are getting better, and core NLP tasks such as named-entity recognition (NER) and sentiment analysis are now streamlined by models like GPT-4.

I’ve decided to bet on post-training and adaptation. I believe that this unlocks several opportunities for resource-constrained and small Filipino NLP research communities to contribute to a larger whole. Here’s an excerpt from my blog post:

While I still believe in building artisanal Filipino NLP resources, I now see that we need to simultaneously support the development of multilingual LLMs by creating high-quality Filipino datasets and benchmarks. This way, we can actively push for the inclusion of Philippine languages in the next generation of multilingual LLMs, rather than just waiting for improvements to happen on their own.