TODO: cover Andreas et al. (2016)[1], a very interesting paper that might have lasting impact; it won best long paper at NAACL 2016.

With the wave of deep learning, researchers have paid more and more attention to distributed representations. Although successful in many tasks, this approach has long been known to have serious drawbacks in areas where logic is strong, such as compositionality. Interest in combining the two has therefore risen significantly.

We may frame this line of research within the larger topic of combining symbolic and sub-symbolic approaches, which was fashionable during the 1980s and 1990s (e.g. Hinton, 1986[2]; Ultsch, 1994[3]; Ultsch & Korus, 1995[4]). However, the aims of recent research have narrowed and the terminology has become much more refined.

Different models have been proposed for specific tasks such as knowledge base completion (Socher et al. 2013[5]) and small-scale reasoning (Rocktäschel et al. 2014[6]).


Approaches

Direct mapping

Herbelot & Vecchi (2015)[7]: "We predict that there is a functional relationship between distributional information and vectorial concept representations in which dimensions are predicates and weights are generalised quantifiers."
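As a rough illustration of what such a functional relationship could look like, the sketch below fits a linear map from distributional vectors to concept vectors whose dimensions are predicates and whose values are quantifier-like weights. The toy vectors, the predicate names, and the choice of ordinary least squares are all assumptions made for illustration; they are not the setup of Herbelot & Vecchi (2015).

```python
import numpy as np

# Hypothetical toy data: 4 words, 3-dimensional distributional vectors.
distributional = np.array([
    [1.0, 0.0, 0.5],
    [0.9, 0.1, 0.4],
    [0.0, 1.0, 0.2],
    [0.1, 0.9, 0.3],
])

# Hypothetical model-theoretic vectors: one dimension per predicate,
# each weight in [0, 1] standing in for a quantifier (0 = "no", 1 = "all").
predicates = ["is_animal", "has_wheels"]
model_theoretic = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.9],
])

def fit_linear_map(X, Y):
    """Least-squares solution W of X @ W ~ Y."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def predict(W, x):
    """Map a distributional vector to predicate weights, clipped to [0, 1]."""
    return np.clip(x @ W, 0.0, 1.0)

W = fit_linear_map(distributional, model_theoretic)
unseen = np.array([0.95, 0.05, 0.45])  # hypothetical new word
weights = predict(W, unseen)
```

Each entry of `weights` lines up with one name in `predicates`, so the output can be read as "how many instances of this concept satisfy this predicate".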

Proposition completion

Hinton 1986 - trained weights

From Hinton (1986): White rectangles stand for positive weights, black for negative weights, and the area of a rectangle encodes the magnitude of its weight. The two isomorphic family trees are represented in two rows, aligned to equivalent members.

Hinton (1986)[2] had his neural network learn two family trees and obtained interesting representations of family members as a by-product. The trees were turned into 104 propositions of the form (person1, relation, person2), of which 100 were used for training. For each proposition, the neural network was given the fillers of the first two roles and asked to predict the filler of the third.
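The input/output setup described above can be sketched as follows. The four propositions, the one-hot coding, and the "first n" train/test split are assumptions made for illustration; they are not the actual dataset or procedure from the paper.

```python
import numpy as np

# Hypothetical propositions in (person1, relation, person2) form.
propositions = [
    ("Colin", "has-father", "James"),
    ("Colin", "has-mother", "Victoria"),
    ("Charlotte", "has-father", "James"),
    ("Victoria", "has-husband", "James"),
]

people = sorted({p for p1, _, p2 in propositions for p in (p1, p2)})
relations = sorted({r for _, r, _ in propositions})

def one_hot(item, vocabulary):
    """Vector of zeros with a single 1 at the item's index."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(item)] = 1.0
    return vec

def encode(proposition):
    """Input = fillers of the first two roles; target = filler of the third."""
    person1, relation, person2 = proposition
    x = np.concatenate([one_hot(person1, people), one_hot(relation, relations)])
    y = one_hot(person2, people)
    return x, y

def split(props, n_train):
    """Hold out everything after the first n_train propositions for testing."""
    return props[:n_train], props[n_train:]

train, test = split(propositions, n_train=3)
X_train = np.stack([encode(p)[0] for p in train])
Y_train = np.stack([encode(p)[1] for p in train])
```

With the full set of 104 propositions, the same helper would be called as `split(propositions, n_train=100)`, leaving four propositions for testing.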

As of 2014, the paper had been cited more than 500 times. However, the approach seems limited in both applicability and scalability.

Paccanaro & Hinton (2000)[8] proposed linear relational embedding, which is somewhat simpler. Their later paper[9] extended the model to handle special cases where there is no answer or there are multiple answers.
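The core idea of linear relational embedding, concepts as vectors and relations as matrices acting on them, can be illustrated with a hand-built example. In the actual model the vectors and matrices are learned from data; here they are set by hand, and the concept names, the `turn-left` relation, and the nearest-neighbour lookup are all hypothetical.

```python
import numpy as np

# Hypothetical 2-D concept vectors placed on the unit circle.
concepts = {
    "north": np.array([0.0, 1.0]),
    "west": np.array([-1.0, 0.0]),
    "south": np.array([0.0, -1.0]),
    "east": np.array([1.0, 0.0]),
}

# Hypothetical relation: a 90-degree counter-clockwise rotation matrix.
relations = {
    "turn-left": np.array([[0.0, -1.0],
                           [1.0, 0.0]]),
}

def complete(concept, relation):
    """Apply the relation matrix to the concept vector, then return the nearest concept."""
    predicted = relations[relation] @ concepts[concept]
    return min(concepts, key=lambda name: np.linalg.norm(concepts[name] - predicted))

answer = complete("north", "turn-left")
```

Completing a proposition then amounts to one matrix-vector product followed by a lookup of the closest known concept.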

Relation predicting

Bowman (2014)[10] employed a neural network with one hidden layer and one softmax layer to predict the relation between two phrases (one of entailment, reverse entailment, equivalence, alternation, negation, cover, and independence).
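A minimal sketch of such a classifier's forward pass is given below. The layer sizes, the tanh nonlinearity, the concatenation of the two phrase vectors, and the random (untrained) parameters are assumptions for illustration; how the phrase vectors themselves are computed is not covered here.

```python
import numpy as np

RELATIONS = ["entailment", "reverse entailment", "equivalence", "alternation",
             "negation", "cover", "independence"]

rng = np.random.default_rng(0)
DIM, HIDDEN = 8, 16  # hypothetical phrase-vector and hidden-layer sizes

# Randomly initialised (untrained) parameters, for shape illustration only.
W_hidden = rng.normal(scale=0.1, size=(HIDDEN, 2 * DIM))
b_hidden = np.zeros(HIDDEN)
W_out = rng.normal(scale=0.1, size=(len(RELATIONS), HIDDEN))
b_out = np.zeros(len(RELATIONS))

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def relation_distribution(left, right):
    """One hidden layer over the concatenated phrases, then a softmax over relations."""
    hidden = np.tanh(W_hidden @ np.concatenate([left, right]) + b_hidden)
    return softmax(W_out @ hidden + b_out)

# Hypothetical phrase vectors.
left, right = rng.normal(size=DIM), rng.normal(size=DIM)
probs = relation_distribution(left, right)
predicted = RELATIONS[int(np.argmax(probs))]
```

Because the parameters are untrained, `predicted` is arbitrary; the point is only that the output is a probability distribution over the seven relations.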

Relation classification

Main article: Models of relation classification

TODO: Socher et al. 2013[5]

Socher's neural network for reasoning

From Socher et al. 2013

Probabilistic inference informed by distributional similarity

Beltagy et al. (2013)[11] tackled textual entailment recognition and semantic textual similarity by casting them as probabilistic entailment in Markov logic. For example, the similarity between two sentences:

S1: A man is slicing a cucumber.
S2: A man is slicing a zucchini.

is judged as the average degree of mutual entailment ($ S_1 \models S_2 $ and $ S_2 \models S_1 $). Strictly speaking, S1 does not entail S2, nor vice versa. The authors addressed this by adding the weighted rule cucumber(x)→zucchini(x) | wt(cuc., zuc.), which literally means "if something is a cucumber, it is also a zucchini" (with inference cost = wt(...)). Here wt(.) is a function of the cosine similarity between the two words.
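The two numerical ingredients mentioned above, a rule weight derived from cosine similarity and a sentence similarity computed as the average of two entailment degrees, can be sketched as follows. The word vectors, the entailment probabilities, and the `rule_weight` function are hypothetical placeholders; in particular, `rule_weight` simply clips the cosine to [0, 1] and is not the weighting function used by the authors, and no Markov logic inference is performed here.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rule_weight(u, v):
    """Hypothetical stand-in for wt(.): the cosine similarity clipped to [0, 1]."""
    return min(1.0, max(0.0, cosine(u, v)))

def sentence_similarity(p_s1_entails_s2, p_s2_entails_s1):
    """Average degree of mutual entailment."""
    return 0.5 * (p_s1_entails_s2 + p_s2_entails_s1)

# Hypothetical word vectors.
cucumber = np.array([0.9, 0.1, 0.3])
zucchini = np.array([0.8, 0.2, 0.4])
wt = rule_weight(cucumber, zucchini)

# Hypothetical entailment probabilities, such as an inference engine might return.
similarity = sentence_similarity(0.8, 0.6)
```

The closer the two word vectors are, the larger the weight attached to the cucumber-to-zucchini rule, and the more the two sentences can be taken to entail each other.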

TODO: Further development: Beltagy et al. (2014)[12].

References

  1. Andreas, J., Rohrbach, M., Darrell, T., & Klein, D. (2016). Learning to Compose Neural Networks for Question Answering. In NAACL 2016.
  2. Hinton, G. E. (1986, August). Learning distributed representations of concepts. In Proceedings of the eighth annual conference of the cognitive science society (Vol. 1, p. 12).
  3. Ultsch, A. (1994). The integration of neural networks with symbolic knowledge processing. In New Approaches in Classification and Data Analysis (pp. 445-454). Springer Berlin Heidelberg.
  4. Ultsch, A., & Korus, D. (1995, November). Integration of neural networks with knowledge-based systems. In Neural Networks, 1995. Proceedings., IEEE International Conference on (Vol. 4, pp. 1828-1833). IEEE.
  5. Socher, R., Chen, D., Manning, C. D., & Ng, A. (2013). Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems (pp. 926-934).
  6. Rocktäschel, T., Bosnjak, M., Singh, S., & Riedel, S. (2014). Low-Dimensional Embeddings of Logic. In ACL 2014 Workshop on Semantic Parsing.
  7. Herbelot, A., & Vecchi, E. M. (2015). Building a shared world: Mapping distributional to model-theoretic semantic spaces.
  8. Paccanaro, A., & Hinton, G. E. (2000). Learning Distributed Representations by Mapping Concepts and Relations into a Linear Space. In ICML-2000, Proceedings of the Seventeenth International Conference on Machine Learning, Langley, P. (Ed.) (pp. 711-718). Stanford University. Morgan Kaufmann Publishers, San Francisco.
  9. Paccanaro, A., & Hinton, G. E. (2000). Extracting distributed representations of concepts and relations from positive and negative propositions. In Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on (Vol. 2, pp. 259-264). IEEE.
  10. Bowman, S. R. (2014). Can recursive neural tensor networks learn logical reasoning? In ICLR 2014.
  11. Beltagy, I., Chau, C., Boleda, G., Garrette, D., & Erk, K. (2013). Montague Meets Markov : Deep Semantics with Probabilistic Logical Form. Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*Sem-2013), 11–21.
  12. Beltagy, I., Roller, S., Boleda, G., Erk, K., & Mooney, R. J. (2014). UTexas: Natural Language Semantics using Distributional Semantics and Probabilistic Logic. SemEval 2014, 796.