Useful Links:
* Understanding LSTMs
* The Unreasonable Effectiveness of Recurrent Neural Networks
* TensorFlow

Some Relevant Papers:
* Long Short-Term Memory (original paper)
* Generating Sequences With Recurrent Neural Networks
* Why LSTM?
+ In theory, RNNs are absolutely capable of handling long-term dependencies: a human could carefully pick parameters for them to solve toy problems that require remembering information across many time steps. Sadly, in practice, RNNs don’t seem to be able to learn them. Thankfully, LSTMs don’t have this problem! (A minimal TensorFlow sketch of the RNN-to-LSTM swap follows after this list.)
+ Learning Long-Term Dependencies with Gradient Descent is Difficult
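
The TensorFlow link above is a convenient way to see this difference in practice. Below is a minimal sketch, assuming TensorFlow 2.x / Keras; the toy model, layer sizes, and the helper name `build_model` are illustrative assumptions, not taken from any of the links or papers:

```python
import tensorflow as tf

def build_model(cell="lstm", seq_len=100, vocab_size=50):
    # Choose the recurrent layer: an LSTM maintains a gated cell state,
    # while SimpleRNN relies on the raw hidden state alone.
    recurrent = (tf.keras.layers.LSTM(64) if cell == "lstm"
                 else tf.keras.layers.SimpleRNN(64))
    return tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len,)),
        tf.keras.layers.Embedding(vocab_size, 16),
        recurrent,
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])

# Build both variants; on tasks where the useful signal sits many time
# steps in the past, the LSTM variant typically reaches a much lower
# training loss than the SimpleRNN one.
lstm_model = build_model("lstm")
rnn_model = build_model("rnn")
lstm_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
rnn_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```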