Upcoming Events

Presenter: Ben Snyder

Date: Mon. 11/9

Title: Multilingual Grammar Induction and the Decipherment of Ugaritic

In this talk, I will focus on two topics in multilingual learning: 1) the unsupervised induction of grammatical structure, and 2) the automatic decipherment of lost ancient languages. In the first task, the goal is to automatically discover the latent grammatical structure of multiple languages by exploiting cross-lingual variations in word order and syntax. These variations serve as a form of naturally occurring supervision that can guide unsupervised learning methods. To represent these variations, we develop a probabilistic tree alignment formalism that binds grammatical subtrees of translated sentences, while allowing them to diverge in structure when needed. An added benefit of this formalism is that it supports the efficient computation of probabilistic terms using dynamic programming. I will present experimental results on three language pairings.

My current work focuses on the second task: deciphering lost languages with very limited text data. Computational decipherment is a long-standing open problem, and to date, no dead language has been deciphered by computational means. I will present an unsupervised model for inducing both character-level and morpheme-level mappings between an unknown dead language and related living languages. Preliminary results on the Ugaritic language will be shown.