mlc-course
March 9, 2023
RNN
Wait! What actually is an RNN?
Well, RNN stands for "Recurrent Neural Network". You might wonder why we need another kind of NN (neural network) besides the traditional one. Trust me, I was in the same situation, my friend :)
When we look at traditional NNs, all the inputs and outputs are independent of each other. That's fine, but what if we need to remember the previous steps?
That's where the RNN comes in. It's a type of NN where the output from the previous step is fed as input to the current step, like predicting the next word of a sentence by looking at the previous words. The remembering is handled by the hidden layer, and the RNN's really delicious feature is its hidden state, which keeps the information about the sequence.
Let's break the concept down a little bit
Let's say some calculation has been done. The RNN has its own memory to remember it. While performing exactly the same task on the inputs and hidden layers to get the result (output), it simply uses the same parameters for every input.
Diving deeper
Okay! Here we have a somewhat deeper network with three hidden layers. Each layer has its own w (weights) and b (biases). Since that's the case, they don't remember the previous outputs, so these layers are independent. Right?
That was just an ordinary NN, but how does the RNN deal with this? ~Well, by giving the same w's and b's to all the layers, the RNN converts the independent activations into dependent ones. This keeps the number of parameters from growing, and each previous result is memorized and passed to the next hidden layer as input.
Since we now share the same w's and b's, we can reduce the network above (reduce the number of hidden layers) with the help of the RNN. The result is a somewhat simpler network with a single recurrent layer, as sketched below.
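To make that concrete, here's a tiny NumPy sketch (all sizes and values are toy placeholders I've made up, not something from the original post). The point is just that one shared set of weights and biases is applied at every time step, while the hidden state is carried along:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, steps = 4, 3, 5   # toy sizes, purely for illustration

# Plain "deep" view: every hidden layer would get its OWN weights,
# so no layer knows anything about what happened before it.
# (Left unused here; it only shows the contrast with the recurrent view.)
per_layer_W = [rng.normal(size=(hidden_size, hidden_size)) for _ in range(steps)]

# Recurrent view: ONE shared set of weights and biases, applied at every step,
# with the hidden state h carried over from step to step.
W = rng.normal(size=(hidden_size, hidden_size))   # hidden -> hidden
U = rng.normal(size=(hidden_size, input_size))    # input  -> hidden
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                          # the hidden state
for t in range(steps):
    x_t = rng.normal(size=input_size)              # input at time step t
    h = np.tanh(W @ h + U @ x_t + b)               # same W, U, b at every step
```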
Sweet formulas
The formula for calculating the current state, where h_t is the current state, h_{t-1} the previous state, and x_t the current input:

h_t = f(h_{t-1}, x_t)

The formula for applying the activation function (tanh), with W_hh the recurrent (hidden-to-hidden) weights and W_xh the input weights:

h_t = tanh(W_hh · h_{t-1} + W_xh · x_t)

The formula for calculating the output, with W_hy the output weights:

y_t = W_hy · h_t
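And here's how those three formulas look in plain NumPy. Again, a minimal sketch with made-up toy dimensions and random parameters, not code from the post:

```python
import numpy as np

rng = np.random.default_rng(42)
input_size, hidden_size, output_size = 4, 3, 2       # toy dimensions for illustration

W_hh = rng.normal(size=(hidden_size, hidden_size))   # hidden -> hidden weights
W_xh = rng.normal(size=(hidden_size, input_size))    # input  -> hidden weights
W_hy = rng.normal(size=(output_size, hidden_size))   # hidden -> output weights

x_t    = rng.normal(size=input_size)   # current input x_t
h_prev = np.zeros(hidden_size)         # previous hidden state h_{t-1}

# Current state with the tanh activation applied:
# h_t = tanh(W_hh · h_{t-1} + W_xh · x_t)
h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)

# Output: y_t = W_hy · h_t
y_t = W_hy @ h_t
print(h_t, y_t)
```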
Are we done? ~No, let's see how we can train an RNN
I. A single time step of the input is given to the network.
II. The current state is calculated from the current input and the previous state.
III. The current h_t then becomes h_{t-1} for the next time step.
IV. We can go through as many time steps as the problem requires, joining the information from all the previous states.
V. Once all the time steps are completed, the output is calculated from the final current state.
VI. The output is compared to the actual (target) output, and the error is generated.
VII. The error is back-propagated through the network to update the weights, and hence the RNN is trained :) (a minimal sketch of these steps follows right after this list)
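Here's a minimal sketch of that whole loop in Keras. The toy task (predicting the next value of a sine wave), the layer sizes, and the hyperparameters are all assumptions made up for illustration; Keras runs steps I-V in the forward pass and steps VI-VII through the loss and back-propagation through time:

```python
import numpy as np
import tensorflow as tf

timesteps, features = 10, 1   # assumed window length and feature count

# Toy data: windows of a sine wave as input, the value right after each window as target.
wave = np.sin(np.linspace(0, 20, 500)).astype("float32")
X = np.stack([wave[i:i + timesteps] for i in range(len(wave) - timesteps)])[..., None]
y = wave[timesteps:]

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(16, input_shape=(timesteps, features)),  # carries h_t across the time steps (I-IV)
    tf.keras.layers.Dense(1),                                          # output from the final state (V)
])

# Compare the output to the target and back-propagate the error to update the weights (VI-VII).
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```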
Where do we apply RNNs?
- Time series forecasting
- Machine Translation
- Speech recognition
- Language modelling and generating text, and so on...
Pros of RNN
An RNN can carry information forward through time, which is really useful for time series prediction because it remembers the previous inputs as well. The architecture built specifically to hold on to information over long spans is called Long Short-Term Memory (LSTM); a tiny sketch of it is below.
RNNs are even combined with convolutional layers to extend the effective pixel neighborhood in image tasks.
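If you'd like to see what that looks like in code, here's a purely hypothetical Keras sketch with an LSTM layer standing in for the plain recurrent one (shapes and sizes are made up for illustration):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(10, 1)),  # gated recurrent layer that can hold information over long spans
    tf.keras.layers.Dense(1),                       # maps the final state to a single prediction
])
model.compile(optimizer="adam", loss="mse")
```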
Cons of RNN
Training an RNN is a very difficult task.
With tanh or ReLU as the activation function, it cannot process very long sequences.
Vanishing and exploding gradient problems.
Want more? ~Here are some awesome resources
How Recurrent Neural Networks work
An Introduction to Recurrent Neural Networks and the Math That Powers Them
What is Recurrent Neural Network (RNN)? Deep Learning Tutorial 33 (Tensorflow, Keras & Python)
Credits
Most of the concepts and theories were inspired by and taken from https://www.geeksforgeeks.org
Huge thanks for providing such a wonderful explanation of such a difficult topic :)