Training

Problem: vanishing or exploding gradients.
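To make the problem concrete, here is a minimal sketch, assuming a one-unit RNN with a scalar recurrent weight `w`. The function name `gradient_through_time` and its parameters are illustrative, not taken from any cited paper. It multiplies the per-step derivatives that backpropagation through time chains together, so the result is the derivative of the final hidden state with respect to the initial one.

```python
import math

def gradient_through_time(w, steps, use_tanh=False, h0=0.5):
    """Return d h_T / d h_0 for a scalar RNN.

    Linear case: h_t = w * h_{t-1}
    Tanh case:   h_t = tanh(w * h_{t-1})
    (Toy model, assumed here for illustration only.)
    """
    h, grad = h0, 1.0
    for _ in range(steps):
        pre = w * h
        if use_tanh:
            h = math.tanh(pre)
            grad *= w * (1.0 - h * h)  # chain rule through tanh
        else:
            h = pre
            grad *= w                  # chain rule through identity
    return grad

if __name__ == "__main__":
    for w in (0.5, 1.0, 1.5):
        print(w, gradient_through_time(w, 50))
```

With `|w| < 1` the product shrinks geometrically with the number of steps (vanishing); with `|w| > 1` in the linear case it grows geometrically (exploding).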

Long Short-Term Memory

Structurally constrained network

Mikolov et al. (2015)[1] combine a feed-forward NN with a cache model.
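One way to picture the cache-like part is as a slowly changing context state: an exponentially decaying average of past inputs. The sketch below is a simplification under stated assumptions, not the paper's full model. The name `context_update` is hypothetical, the input is assumed to be already projected to the context dimension, and `alpha` is treated as a fixed scalar.

```python
def context_update(s_prev, x_proj, alpha):
    """Slow context state: s_t = (1 - alpha) * x_t + alpha * s_{t-1}.

    s_prev : previous context state (list of floats)
    x_proj : current input, already projected to the same size (assumption)
    alpha  : decay in [0, 1); values near 1 make the state change slowly
    """
    return [(1.0 - alpha) * x + alpha * s for s, x in zip(s_prev, x_proj)]

if __name__ == "__main__":
    s = [0.0]
    for x in ([1.0], [1.0], [0.0], [0.0]):
        s = context_update(s, x, alpha=0.5)
        print(s)
```

Because each step keeps a fraction `alpha` of the previous state, an input seen many steps ago still contributes, which is what gives the state its longer memory.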

Rectified units with initialization trick

Le et al. (2015)[2] use rectified linear units and initialize the recurrent weight matrix to the identity matrix or a scaled-down version of it.
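The initialization can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the authors' code: the helper names `scaled_identity` and `relu_rnn_step` are hypothetical, biases are taken to be zero, and matrices are nested lists.

```python
def scaled_identity(n, scale=1.0):
    """n x n recurrent matrix equal to scale * I."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

def relu_rnn_step(W_hh, W_xh, h, x):
    """One step of h_t = relu(W_hh h_{t-1} + W_xh x_t), with zero bias (assumption)."""
    new_h = []
    for i in range(len(h)):
        pre = sum(W_hh[i][j] * h[j] for j in range(len(h)))
        pre += sum(W_xh[i][k] * x[k] for k in range(len(x)))
        new_h.append(max(0.0, pre))  # rectified linear unit
    return new_h

if __name__ == "__main__":
    W_xh = [[0.0], [0.0]]  # no input contribution in this toy run
    h = [1.0, 2.0]
    for _ in range(100):
        h = relu_rnn_step(scaled_identity(2), W_xh, h, [0.0])
    print(h)
```

With the identity as the recurrent matrix and no input, a non-negative hidden state is copied unchanged from step to step, so the per-step derivative stays at 1 instead of shrinking or growing. A scale below 1 makes the state decay, which shortens the memory.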

Limitations

  • cannot model reduplication (Prickett, 2017)[3], though it is possible with special treatment (Gu et al., 2016[4]; Alhama, 2017[5])

Can RNNs model hierarchy?

From Gulordava et al. (2018):

"Linzen et al. (2016) directly evaluated the extent to which RNNs can approximate hierarchical structure in corpus-extracted natural language data [...]

Bernardy and Lappin (2017) observed that RNNs are better at long-distance agreement when they construct rich lexical representations of words [...]

Early work showed that RNNs can, to a certain degree, handle data generated by context-free and even context-sensitive grammars (e.g., Elman, 1991, 1993; Rohde and Plaut, 1997; Christiansen and Chater, 1999; Gers and Schmidhuber, 2001; Cartling, 2008). [...]

We tentatively conclude that LM-trained RNNs can construct abstract grammatical representations of their input. This, in turn, suggests that the input itself contains enough information to trigger some form of syntactic learning in a system, such as an RNN, that does not contain an explicit prior bias in favour of syntactic structures."

Glossary

  • highway connections (Srivastava et al., 2015)[6]
  • SRU = Simple Recurrent Unit (Lei et al. 2017)[7]
  • QRNN = quasi-recurrent neural network (Bradbury et al. 2016)[8]
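As an illustration of the first entry, a highway connection mixes a transformed signal with the untouched input through a learned gate: y = T(x) * H(x) + (1 - T(x)) * x. The sketch below assumes the transform output and the gate pre-activation have already been computed. The function name `highway` and its argument names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway(x, h, gate_pre):
    """Elementwise highway connection.

    x        : layer input
    h        : transformed input H(x), computed elsewhere (assumption)
    gate_pre : pre-activation of the transform gate T(x)
    """
    out = []
    for xi, hi, gi in zip(x, h, gate_pre):
        t = sigmoid(gi)                       # transform gate in (0, 1)
        out.append(t * hi + (1.0 - t) * xi)   # carry the rest of x through
    return out

if __name__ == "__main__":
    print(highway([2.0], [4.0], [0.0]))
    print(highway([2.0], [4.0], [-50.0]))
```

When the gate is close to 0 the layer passes its input through nearly unchanged, which is what lets gradients flow across many layers.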

References

  1. Mikolov, T., Joulin, A., Chopra, S., Mathieu, M., & Ranzato, M. A. (2014). Learning Longer Memory in Recurrent Neural Networks. arXiv preprint arXiv:1412.7753.
  2. Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton, 2015. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. URL
  3. Prickett, Brandon. "Vanilla Sequence-to-Sequence Neural Nets cannot Model Reduplication." (2017).
  4. Gu, Jiatao, et al. "Incorporating copying mechanism in sequence-to-sequence learning." arXiv preprint arXiv:1603.06393 (2016).
  5. Garrido Alhama, R. "Computational modelling of Artificial Language Learning." (2017).
  6. Rupesh K Srivastava, Klaus Greff, and Jürgen Schmidhuber. Training very deep networks. In Advances in neural information processing systems, pp. 2377–2385, 2015.
  7. Lei, T., Zhang, Y., & Artzi, Y. (2017). Training RNNs as Fast as CNNs. Retrieved from
  8. Bradbury, James, Stephen Merity, Caiming Xiong, and Richard Socher. "Quasi-recurrent neural networks." arXiv preprint arXiv:1611.01576 (2016).