[Eigenword] (sometimes eigenfeature): a real-valued vector associated with a word that captures its meaning, in the sense that distributionally similar words have similar eigenwords. Eigenwords are computed as the singular vectors of a word-context co-occurrence matrix.
- context-oblivious: the vector does not depend on the context, only on the word
- context-specific: the vector does depend on the context
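As a minimal sketch of the idea above, the snippet below builds a small word-context co-occurrence matrix and takes its truncated SVD; the rows of the left singular-vector matrix serve as (context-oblivious) eigenwords. The toy corpus, window size of 1, and k = 2 are illustrative choices, not the project's actual settings.

```python
# Sketch: eigenwords as left singular vectors of a word-context
# co-occurrence matrix (toy corpus; parameters are illustrative).
import numpy as np

corpus = ["the cat sat", "the dog sat", "a cat ran", "a dog ran"]
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences of each word with its immediate neighbors.
C = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                C[idx[w], idx[sent[j]]] += 1

# Truncated SVD: rows of U[:, :k] are k-dimensional eigenwords.
U, s, Vt = np.linalg.svd(C)
k = 2
eigenwords = {w: U[idx[w], :k] for w in vocab}

# "cat" and "dog" occur in identical contexts here, so their
# eigenwords should be (nearly) identical.
def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(eigenwords["cat"], eigenwords["dog"]))
```

Because "cat" and "dog" have identical co-occurrence rows in this toy corpus, their cosine similarity comes out at essentially 1.0, which is exactly the property the definition asks for.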
We use spectral methods (SVD) to build statistical language models. The resulting vector models of language are then used to predict a variety of properties of words, including their entity type (e.g., person, place, organization ...), their part of speech, and their "meaning" (or at least their word sense). Canonical Correlation Analysis (CCA), a generalization of Principal Component Analysis (PCA), gives context-oblivious vector representations of words. More sophisticated spectral methods are used to estimate Hidden Markov Models (HMMs) and generative parsing models such as dependency parsers. These methods give context-dependent state estimates, which again improve performance on many NLP tasks.
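To illustrate the CCA step, here is a hedged sketch of classical CCA on two views (think: a word view and a context view), computed by whitening each view's covariance and taking the SVD of the whitened cross-covariance. The synthetic two-view data, dimensions, and regularizer are assumptions for demonstration, not the project's code.

```python
# Sketch of CCA between two views sharing a latent signal
# (synthetic data; all sizes and the regularizer are illustrative).
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, k = 500, 5, 4, 2

# Both views are linear mixes of the same latent z, plus noise.
z = rng.normal(size=(n, k))
X = z @ rng.normal(size=(k, d1)) + 0.1 * rng.normal(size=(n, d1))
Y = z @ rng.normal(size=(k, d2)) + 0.1 * rng.normal(size=(n, d2))

def cca(X, Y, k, reg=1e-8):
    n = len(X)
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Whiten each view (inverse Cholesky factors), then SVD of the
    # whitened cross-covariance; singular values are the canonical
    # correlations.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]

A, B, corrs = cca(X, Y, k)
# With a strong shared signal, the top canonical correlations
# should come out close to 1.
print(np.round(corrs, 2))
```

Projecting the word view through the matrix A (analogously, the context view through B) is what yields low-dimensional word representations in the CCA approach.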
contact: Prof. Lyle Ungar (firstname.lastname@example.org)
co-advisor: Prof. Dean Foster (Statistics)
To get a flavor for our approach:
- Read the introductory background on eigenwords and the survey From Frequency to Meaning: Vector Space Models of Semantics
- For more detail on the methods, see Background and Method, especially the git directories listed under "software"
For more information:
- Multi-View Learning of Word Embeddings via CCA
- NIPS 2011: Dhillon, Foster and Ungar
- Spectral dimensionality reduction for HMMs
- ArXiV 2012: Foster, Rodu and Ungar
- Spectral Learning of Latent-Variable PCFGs
- ACL 2012: Cohen, Stratos, Collins, Foster and Ungar
- Spectral Dependency Parsing with Latent Variables
- EMNLP-CoNLL 2012: Dhillon, Rodu, Collins, Foster and Ungar
- Two Step CCA: A new spectral method for estimating vector models of words
    - ICML 2012: Dhillon, Rodu, Foster and Ungar
- and supplemental material