Sources of information

  • Lexical resources
    • WordNet
    • Ontologies
  • Corpora

Theoretical considerations

  • Identity: maximal score for identical concepts

Triangle inequality

Triangle inequality: if A is close to B and B is close to C, then A and C cannot be too far apart.[1][2] Formally, a distance d satisfies it when d(A, C) ≤ d(A, B) + d(B, C) for all A, B and C. The triangle inequality is one of the metric axioms; a distance measure that violates it is not a proper metric.

Tversky argued that the triangle inequality is not valid,[1] but Rada et al. (1989)[3] showed that his examples were inconsistent.

Lin (1998)[4] also argued that the triangle inequality is undesirable, but he used an artificial and limited example.
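Whether a given distance measure satisfies the triangle inequality on a finite set of concepts can be checked by brute force. The following is a minimal sketch, not taken from any of the cited papers: the function name `triangle_violations`, the tolerance `tol`, and the toy distances are all hypothetical and made up for illustration.

```python
from itertools import permutations


def triangle_violations(items, dist, tol=1e-12):
    """Return ordered triples (a, b, c) with dist(a, c) > dist(a, b) + dist(b, c)."""
    return [
        (a, b, c)
        for a, b, c in permutations(items, 3)
        if dist(a, c) > dist(a, b) + dist(b, c) + tol
    ]


# Hypothetical toy distances (invented for this example, not from WordNet
# or any corpus): "cat" and "wolf" are placed further apart than the
# detour through "dog" would allow under a proper metric.
TOY_DISTANCES = {
    frozenset(("cat", "dog")): 1.0,
    frozenset(("dog", "wolf")): 1.0,
    frozenset(("cat", "wolf")): 3.0,
}


def toy_dist(a, b):
    """Symmetric lookup; identical items are at distance 0."""
    return 0.0 if a == b else TOY_DISTANCES[frozenset((a, b))]


violations = triangle_violations(["cat", "dog", "wolf"], toy_dist)
for a, b, c in violations:
    print(f"d({a}, {c}) > d({a}, {b}) + d({b}, {c})")
```

The check enumerates every ordered triple, so it is cubic in the number of concepts and only practical for small sets or samples. An empty result shows only that no violation exists among the tested items, not that the measure is a metric in general.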

Similarity measures

Purely WordNet

Purely corpus-based

Hybrid

Applications

  • Semantic Role Labeling: Fuerstenau and Lapata (2012)[5]
  • Textual Entailment: Berant et al. (2012)[6]
  • Question Answering: Surdeanu et al. (2011)[7]

Evaluation

TODO: is there a better evaluation measure than Spearman's rho? Would MaxDiff (Louviere 1991; Orme 2009) avoid “scale bias”?
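For reference, Spearman's rho compares the ranking that a similarity measure induces on word pairs with the ranking given by human judges. It is the Pearson correlation of the two rank vectors. The sketch below is a plain-Python illustration; the helper names and the scores in `human` and `system` are hypothetical and invented for the example, not taken from any dataset.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (std_x * std_y)


# Hypothetical scores for four word pairs (made up for illustration).
human = [9.0, 7.5, 3.0, 1.0]    # assumed human similarity judgments
system = [0.9, 0.6, 0.7, 0.1]   # assumed output of a similarity measure

rho = spearman_rho(human, system)
print(f"Spearman's rho = {rho:.2f}")
```

Because only ranks enter the computation, rho is unaffected by any monotone rescaling of either score list. The function is undefined (division by zero) when all values in one list are equal.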

References

  1. Tversky, Amos (1977). "Features of Similarity". Psychological Review 84 (4): 327–352.
  2. There is also a "reverse triangle inequality" for similarity: the similarity of A to C is greater than the sum of the similarity of A to B and the similarity of B to C. However, it has been shown not to hold (Rada et al., 1989).
  3. Rada, R., Mili, H., Bicknell, E., & Blettner, M. (1989). Development and application of a metric on semantic nets. Systems, Man and Cybernetics, IEEE Transactions on, 19(1), 17-30.
  4. Lin, Dekang. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, pages 296–304, July.
  5. Hagen Fuerstenau and Mirella Lapata. Semisupervised semantic role labeling via structural alignment. Computational Linguistics, 38(1): 135–171, 2012.
  6. Jonathan Berant, Ido Dagan, and Jacob Goldberger. Learning entailment relations by global graph structure optimization. Computational Linguistics, 38(1):73–111, 2012.
  7. Mihai Surdeanu, Massimiliano Ciaramita, and Hugo Zaragoza. Learning to rank answers to non-factoid questions from web collections. Computational Linguistics, 37(2):351–383, 2011.

External links