Russian Natural Language Processing
It has become standard practice in the Natural Language Processing (NLP) community to release a well-optimized model trained on an English corpus and then procedurally apply it to dozens (or even hundreds) of additional languages. These secondary language models are usually trained in a fully unsupervised manner, published on arXiv a few months after the initial English version, and greeted with a big splash in the tech press. But can English-trained models be naively extended to other, non-English languages, or does updating a model require native-level understanding of the target language?