LSTM is a type of Recurrent Neural Network. So in order to understand what an LSTM is, we must first understand the basic structure of a Recurrent Neural Network (RNN).

What is an RNN?

If you are familiar with Neural Networks, then you can imagine an RNN as a recursive Neural Network: a Recurrent Neural Network takes in the output of the previous step together with a new input in order to make a prediction for that new input.

![alt text][RNNgraph]

Even though a Recurrent Neural Network is often described as recursive, it is more like a recursive hidden cell. The RNN cell applies a matrix, referred to as the weights of the Neural Network, to every single input that it reads in: it performs a matrix multiplication on the input to produce the predictions.

![alt text][RNNsimplified]

In other words, you can think of the Recurrent Neural Network as a cell that performs the same operation on each input and incorporates the previous output into its prediction.
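
To make this concrete, here is a minimal sketch of a single RNN step in Python with NumPy. The names (`rnn_step`, `W_xh`, `W_hh`, `b_h`) and the sizes are illustrative assumptions of mine, not taken from any particular library.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: combine the new input x_t with the
    previous hidden state h_prev using the same shared weights."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Illustrative sizes: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((3, 4)) * 0.1  # input-to-hidden weights
W_hh = rng.standard_normal((3, 3)) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(3)

h = np.zeros(3)              # initial hidden state
x = rng.standard_normal(4)   # one input vector
h = rnn_step(x, h, W_xh, W_hh, b_h)
```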


Getting Technical

We can also represent an RNN as a Feedforward Neural Network by unrolling it. The main difference from a normal Feedforward Neural Network is that part of the input (the previous hidden state) is fed into the middle of the network instead of the start. There is still only one type of cell that processes the inputs, and its parameters are shared across every step of the network.
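
As a sketch of this unrolling (again with illustrative names, reusing the `rnn_step` idea from above), notice that one set of weights is reused at every position in the sequence:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    """Unroll the RNN over a whole sequence. Every step reuses the
    same parameters -- this is the parameter sharing described above."""
    h, hidden_states = h0, []
    for x_t in xs:  # xs is a list of input vectors
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        hidden_states.append(h)
    return hidden_states
```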


I will write another blog post explaining the details of RNNs.


What is an LSTM?

An LSTM is very similar to a vanilla RNN; the main difference between them is the hidden layer. Both the vanilla RNN and the LSTM have a cell for transforming the input and extracting the required information. The vanilla RNN cell is the most basic form, as described above, while the LSTM cell consists of three parts: an input, a memory cell, and an output. The additional properties that the LSTM memory cell holds are what distinguish it from a vanilla RNN's cell.

What are Memory cells?

A memory cell has a write, a read, and a forget gate, often referred to as the three gates of the LSTM (in most references these are called the input, output, and forget gates). They are called gates because their job is to control the flow of incoming information and filter out what does not qualify.

![alt text][LSTM Memory cell]

The way each gate controls the flow of the input is by using a Sigmoid function. The output of the sigmoid function (also often referred to as a sigmoid layer) is between 0 and 1.

![alt text][sigmoidgraph]

The sigmoid lets the model filter unwanted data mathematically: multiplying a value by a gate output near 0 blocks it, while multiplying by a value near 1 lets it pass through almost unchanged.
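
Here is a minimal sketch of that gating idea; the specific numbers are arbitrary, chosen only to show a closed, an open, and a half-open gate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A gate near 0 blocks a value; a gate near 1 passes it through.
candidate = np.array([2.0, -1.5, 0.7])
gate = sigmoid(np.array([-4.0, 5.0, 0.0]))  # roughly [0.02, 0.99, 0.5]
filtered = gate * candidate                 # element-wise filtering
```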


Getting Technical

The vanishing gradient problem and the exploding gradient problem are common issues for deep Feedforward Neural Networks, and especially for RNNs. We can address the exploding gradient problem by putting a cap on the gradient (gradient clipping). The vanishing gradient problem, however, is harder to solve; the three gates of the LSTM help mitigate it.
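
As a minimal sketch of gradient clipping (the function name and the 5.0 threshold are illustrative choices of mine):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale grad so its L2 norm never exceeds max_norm,
    preventing a single huge update from destabilizing training."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```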

Since the multiplicative factor for each gate is a sigmoid function, and the sigmoid is smooth, the factor is differentiable everywhere. One of the benefits of a differentiable multiplicative factor is that we can perform Backpropagation on the network. Backpropagation gives us a computationally inexpensive way to calculate the derivatives of the loss function with respect to the weights, which is necessary for tweaking the weight matrices.
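
The sigmoid's derivative even has a simple closed form, σ'(z) = σ(z)(1 - σ(z)), which backpropagation can reuse cheaply from the forward pass; a sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """sigma'(z) = sigma(z) * (1 - sigma(z)); backprop reuses the
    value already computed in the forward pass."""
    s = sigmoid(z)
    return s * (1.0 - s)
```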

The LSTM also applies a tanh function before sending its prediction to the output gate. The tanh function keeps the output between -1 and 1, squashing it to reduce the effect of outlying values.
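
Putting the pieces together, here is a minimal sketch of one LSTM step. The weight names and shapes are illustrative assumptions, and the write/read/forget terminology follows this post (they correspond to the input/output/forget gates in most references):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step. Each gate is a sigmoid over the concatenated
    input and previous hidden state; tanh keeps values in [-1, 1]."""
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(p["W_f"] @ z + p["b_f"])  # forget gate: what to erase
    i = sigmoid(p["W_i"] @ z + p["b_i"])  # write (input) gate
    o = sigmoid(p["W_o"] @ z + p["b_o"])  # read (output) gate
    g = np.tanh(p["W_g"] @ z + p["b_g"])  # candidate memory values
    c = f * c_prev + i * g                # update the memory cell
    h = o * np.tanh(c)                    # squash, then let the read gate filter
    return h, c
```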


What are the applications of LSTM?

LSTMs are especially successful in many NLP tasks and in analyzing time series or other sequential data. Their gating mechanism allows them to capture patterns with long-term dependencies better than vanilla RNNs. They are commonly used for sentence generation, word-level sentiment understanding, and sentence-level sentiment analysis.

[RNNgraph] [RNNsimplified] [sigmoidgraph] [LSTM Memory cell]