From Spitkovsky et al. (2009)[1]:

We present an empirical study of two very simple approaches to unsupervised grammar induction. Both are based on Klein and Manning’s Dependency Model with Valence. The first, Baby Steps, requires no initialization and bootstraps itself via iterated learning of increasingly longer sentences. This method substantially exceeds Klein and Manning’s published numbers and achieves 39.4% accuracy on Section 23 of the Wall Street Journal corpus, a result that is already competitive with the recent state-of-the-art. The second, Less is More, is based on the observation that there is sometimes a trade-off between the quantity and complexity of training data. Using the standard linguistically-informed prior but training at the “sweet spot” (sentences up to length 15), it attains 44.1% accuracy, beating state-of-the-art. Both results generalize to the Brown corpus and shed light on opportunities in the present state of unsupervised dependency parsing.
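The two training schedules described in the abstract differ mainly in which sentences are fed to the learner and in what order. Below is a minimal Python sketch of that data scheduling only, under stated assumptions. `train` is a hypothetical stand-in for one full run of EM on the Dependency Model with Valence (not shown). Sentences are assumed to be lists of tokens. The names `baby_steps` and `less_is_more_data` are illustrative and are not taken from the authors' code. The length-15 cutoff comes from the abstract. For Baby Steps, the final length limit is left as a required parameter rather than guessed.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Model = TypeVar("Model")
Sentence = Sequence[str]  # assumption: a sentence is a sequence of tokens


def baby_steps(
    corpus: List[Sentence],
    train: Callable[[List[Sentence], Optional[Model]], Model],
    max_len: int,
) -> Optional[Model]:
    """Curriculum in the spirit of Baby Steps: learn from sentences of
    length <= 1, then re-train on length <= 2 starting from that model,
    and so on up to max_len. `train` is a hypothetical EM-on-DMV routine."""
    model: Optional[Model] = None  # no initializer: the first stage starts from scratch
    for k in range(1, max_len + 1):
        subset = [s for s in corpus if len(s) <= k]
        if not subset:
            continue  # nothing this short yet; keep the previous model
        model = train(subset, model)  # warm-start from the shorter-sentence model
    return model


def less_is_more_data(corpus: List[Sentence], cutoff: int = 15) -> List[Sentence]:
    """Data selection for Less is More: keep only sentences up to `cutoff`
    tokens. The linguistically-informed prior is assumed to be supplied by
    the (hypothetical) trainer and is not modeled here."""
    return [s for s in corpus if len(s) <= cutoff]
```

A toy `train` that just counts its calls is enough to see the schedule. On a corpus with sentences of lengths 1, 2 and 3, `baby_steps(corpus, toy_train, max_len=3)` trains three times, on 1, 2 and 3 sentences respectively, and each stage receives the previous stage's model.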

References

  1. Spitkovsky, V. I., Alshawi, H., & Jurafsky, D. (2009). Baby Steps: How “Less is More” in unsupervised dependency parsing. NIPS 2009 Workshop on Grammar Induction, Representation of Language and Language Learning, 1–10.