Core features (e.g. word form, part-of-speech) are cheap to extract but insufficient on their own for good performance. Different combinations of core features differ greatly in their impact, and finding the most effective ones among exponentially many candidates is an obstacle to progress in NLP.
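
To make the size of the search space concrete, the following minimal sketch counts and enumerates conjunctions of core features. The function names and the three example features are illustrative assumptions, not taken from any of the cited work.

```python
from itertools import combinations
from math import comb


def count_combinations(n_core, max_order):
    """Number of conjunctions of 1..max_order distinct core features."""
    return sum(comb(n_core, k) for k in range(1, max_order + 1))


def enumerate_combinations(core_features, max_order):
    """Yield every conjunction of up to max_order core features as a tuple."""
    for k in range(1, max_order + 1):
        yield from combinations(core_features, k)


if __name__ == "__main__":
    # Hypothetical core features, chosen only for illustration.
    core = ["word", "pos", "lemma"]
    print(list(enumerate_combinations(core, 2)))

    # With 20 core features, conjunctions of up to 3 are still manageable ...
    print(count_combinations(20, 3))   # 1350
    # ... but all non-empty subsets already number 2**20 - 1.
    print(count_combinations(20, 20))  # 1048575
```

Even a modest inventory of core features therefore rules out evaluating every combination exhaustively.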

Kernel methods

According to Goldberg (2015)[1]: "Kernel methods (Shawe-Taylor & Cristianini, 2004)[2], and in particular polynomial kernels (Kudo & Matsumoto, 2003)[3] allow the feature designer to specify only core features [...] However, the classification efficiency in kernel methods scales linearly with the size of the training data, making them too slow for most practical purposes, and not suitable for training with large datasets."
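
The trade-off described in the quote can be illustrated with a small sketch, assuming a degree-2 polynomial kernel of the form (x·z + 1)^2. It shows that the kernel value equals a dot product over an explicit feature space containing all pairwise products of core features, and that a kernel classifier evaluates the kernel once per stored training example. The function names and the example vectors are hypothetical.

```python
from itertools import combinations
from math import isclose, sqrt


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel: (x . z + 1)^2."""
    return (dot(x, z) + 1.0) ** 2


def poly2_features(x):
    """Explicit feature map whose dot product reproduces poly2_kernel."""
    feats = [1.0]                                    # constant term
    feats += [sqrt(2) * xi for xi in x]              # single core features
    feats += [xi * xi for xi in x]                   # squared core features
    feats += [sqrt(2) * xi * xj                      # pairwise conjunctions
              for xi, xj in combinations(x, 2)]
    return feats


def kernel_decision(x, support_vectors, alphas, bias=0.0):
    """Kernel classifier score: one kernel evaluation per support vector."""
    return sum(a * poly2_kernel(sv, x)
               for a, sv in zip(alphas, support_vectors)) + bias


if __name__ == "__main__":
    # Two binary core-feature vectors, made up for illustration.
    x = [1.0, 0.0, 1.0]
    z = [1.0, 1.0, 0.0]
    implicit = poly2_kernel(x, z)
    explicit = dot(poly2_features(x), poly2_features(z))
    print(implicit, explicit, isclose(implicit, explicit))
```

The designer only supplies the core vectors, and the pairwise combinations are handled implicitly. The cost appears in `kernel_decision`: its loop runs over every support vector, and the number of support vectors tends to grow with the training set.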

Forward, backward and two-way selection

Zhou et al. (2003)[4]: smart selection, lookahead,...?

Nilsson & Nugues (2010)[5] proposed a procedure to discover primitive features and combine them automatically. Their work is restricted to dependency parsing, where they define a neighbourhood relation between primitive features.

Ballesteros & Bohnet (2014)[6] used a joint forward-backward approach similar to MaltOptimizer[7].
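
As a rough illustration of what a combined forward-backward search looks like, here is a generic greedy sketch. It is not the algorithm of any of the papers cited above. The `score` callback stands in for training and evaluating a model on a feature set, and `toy_score` with its gain values is invented purely for the example.

```python
def forward_backward_select(candidates, score, min_gain=1e-9):
    """Greedy selection: add the best feature, then try to prune, and repeat."""
    selected = []
    best = score(selected)
    improved = True
    while improved:
        improved = False

        # Forward step: add the single candidate with the largest improvement.
        best_add, best_add_score = None, best
        for f in candidates:
            if f in selected:
                continue
            s = score(selected + [f])
            if s > best_add_score + min_gain:
                best_add, best_add_score = f, s
        if best_add is not None:
            selected.append(best_add)
            best = best_add_score
            improved = True

        # Backward step: drop any feature whose removal improves the score.
        for f in list(selected):
            trial = [g for g in selected if g != f]
            s = score(trial)
            if s > best + min_gain:
                selected, best = trial, s
                improved = True
    return selected, best


def toy_score(features):
    """Hypothetical stand-in for dev-set accuracy, with a per-feature cost."""
    gains = {"word": 0.5, "pos": 0.3, "word+pos": 0.2, "suffix": 0.0}
    return sum(gains[f] for f in features) - 0.05 * len(features)


if __name__ == "__main__":
    chosen, final_score = forward_backward_select(
        ["word", "pos", "word+pos", "suffix"], toy_score)
    print(chosen, final_score)
```

Each accepted change strictly increases the score, so the loop terminates. In practice every call to `score` means retraining a model, which is why the cited approaches put effort into reducing the number of candidates evaluated.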

Lei et al. (2014)[8]???

Neural network

Chen & Manning (2014)[9], ...

References

  1. Goldberg, Y. (2015). A Primer on Neural Network Models for Natural Language Processing.
  2. Shawe-Taylor, J., & Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.
  3. Kudo, T., & Matsumoto, Y. (2003). Fast Methods for Kernel-based Text Analysis. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, pp. 24–31, Stroudsburg, PA, USA. Association for Computational Linguistics.
  4. Zhou, Y., Weng, F., Wu, L., & Schmidt, H. (2003). A Fast Algorithm for Feature Selection in Conditional Maximum Entropy Modeling. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.
  5. Nilsson, P., & Nugues, P. (2010). Automatic Discovery of Feature Sets for Dependency Parsing. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010) (pp. 824–832). Coling 2010 Organizing Committee.
  6. Ballesteros, M., & Bohnet, B. (2014). Automatic Feature Selection for Agenda-Based Dependency Parsing. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers (pp. 794–805). Dublin City University and Association for Computational Linguistics.
  7. Ballesteros, M., & Nivre, J. (2014). MaltOptimizer: Fast and Effective Parser Optimization. Natural Language Engineering.
  8. Lei, T., Xin, Y., Zhang, Y., Barzilay, R., & Jaakkola, T. (2014). Low-rank Tensors for Scoring Dependency Structures. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Volume 1 (pp. 1381–1391).
  9. Chen, D., & Manning, C. (2014). A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 740–750). Doha, Qatar: Association for Computational Linguistics.