HLI’s Franz Och, Chief Data Scientist, wins NAACL Best Paper Award for Machine Learning Word Study

The North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL HLT) honored HLI’s Franz Och with its award for best long paper. Three awards were presented: one for best long paper and two for best student paper.

Franz Och, HLI’s Chief Data Scientist, wrote the paper before joining HLI, while he was at Google, where he devised Google Translate. The paper, “Unsupervised Morphology Induction Using Word Embeddings,” co-written with Radu Soricut of Google, explores how to learn word morphology in an unsupervised way from large amounts of text.

The method does this by representing words as vectors in a high-dimensional space and representing morphological rules as ‘vector arithmetic’ on those vectors. The system learns, for example, how the suffix ‘-ly’ changes the meaning of a word (immediate -> immediately) or what the prefix ‘over-’ does to a word. “The cool thing of the paper,” as Franz describes his work, is that the machine does this completely unsupervised, so it can be applied to any language that has prefix or suffix morphology. The paper reports results for a diverse set of languages and shows that modeling morphology in this way improves performance on tasks such as word similarity tests.
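The ‘vector arithmetic’ idea can be illustrated with a minimal sketch. It assumes tiny, hand-made 3-dimensional word vectors (real embeddings are learned from large corpora and have hundreds of dimensions). It also assumes that a rule such as ‘-ly’ can be summarized as the average offset between known base/derived word pairs. The word list, vectors and function names (`rule_direction`, `apply_rule`) are hypothetical. This is a simplified offset-plus-nearest-neighbour demonstration, not the code or the full procedure from the paper.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# The third dimension loosely plays the role of "is an adverb".
EMBEDDINGS = {
    "quick": [1.0, 0.0, 0.0],
    "quickly": [0.9, 0.1, 1.0],
    "slow": [0.0, 1.0, 0.0],
    "slowly": [0.1, 0.9, 1.0],
    "immediate": [0.7, 0.7, 0.0],
    "immediately": [0.6, 0.8, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rule_direction(pairs, emb):
    """Average offset vector (derived minus base) over example word pairs."""
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for base, derived in pairs:
        for i in range(dim):
            total[i] += emb[derived][i] - emb[base][i]
    return [t / len(pairs) for t in total]


def apply_rule(word, direction, emb):
    """Add the rule's offset to a word's vector; return the closest other word."""
    target = [w + d for w, d in zip(emb[word], direction)]
    candidates = (w for w in emb if w != word)  # never return the input word itself
    return max(candidates, key=lambda w: cosine(target, emb[w]))


# Learn the hypothetical '-ly' direction from two example pairs, then apply it.
ly_direction = rule_direction([("quick", "quickly"), ("slow", "slowly")], EMBEDDINGS)
print(apply_rule("immediate", ly_direction, EMBEDDINGS))  # prints "immediately"
```

Here the small differences between the two example pairs cancel out when averaged, leaving an offset of roughly [0, 0, 1]. Adding that offset to ‘immediate’ lands closest to ‘immediately’ by cosine similarity. Excluding the input word from the candidates keeps the rule from trivially returning the word it started from.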