Entry 22 of 24
ML Fundamentals Series

RNNs Don't Just Process Data: They Process Data in Order, and Order Is the Point

Feedforward networks and CNNs both treat each input as independent of the others: shuffle the rows of a dataset and a feedforward network doesn't notice. That assumption breaks completely for text, audio, or time series, where the order of the data carries the signal. The sequence learning problem is exactly this: processing an input, or generating an output, that is ordered and can be of variable length.

Sequence tasks split into four shapes. Many-to-one: multiple inputs produce a single output, like classifying the sentiment of a whole sentence. One-to-many: a single input produces a sequence of outputs, like generating a caption from one image. Many-to-many (synchronous): an input sequence maps directly to an output sequence of the same length, step for step. Many-to-many (asynchronous): an input sequence of one length transforms into an output sequence of a different length, the shape of machine translation, where a five-word sentence in one language might take seven words in another.
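One way to keep the four shapes straight is to look only at how many timesteps go in and how many come out. The snippet below is a rough sketch, not a model: `feature_dim`, the `seq` helper, and every length in it are made-up placeholders chosen to mirror the examples above.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dim = 4  # hypothetical size of each input/output vector


def seq(length):
    """Stand-in sequence: `length` timesteps of `feature_dim` values each."""
    return rng.normal(size=(length, feature_dim))


# (input, output) pairs for each task shape; the lengths are illustrative only.
task_shapes = {
    "many-to-one": (seq(5), seq(1)),                  # sentence -> one label
    "one-to-many": (seq(1), seq(6)),                  # image -> caption tokens
    "many-to-many (synchronous)": (seq(5), seq(5)),   # one output per input step
    "many-to-many (asynchronous)": (seq(5), seq(7)),  # five words in, seven out
}

for name, (x, y) in task_shapes.items():
    print(f"{name}: input steps={x.shape[0]}, output steps={y.shape[0]}")
```

The only thing separating the two many-to-many cases here is whether the step counts match.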

Recurrent Neural Networks are built specifically to handle this. They process sequential data by retaining information from previous steps, which makes them effective for exactly the tasks where context and order matter, and unnecessary overhead where they don't. The core piece is the recurrent neuron, which holds a hidden state: a running summary of everything the network has seen in the sequence so far, updated at every timestep and capturing dependencies across time rather than treating each input in isolation.
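The hidden-state update can be sketched in a few lines of NumPy. This assumes the common vanilla formulation, h_t = tanh(W_x x_t + W_h h_{t-1} + b), which the text above does not spell out; the names `W_x`, `W_h`, `b`, `rnn_step`, and `final_hidden_state`, the sizes, and the random weights are all illustrative. For contrast it also computes a plain mean over timesteps, used here as a stand-in for an order-insensitive summary.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # illustrative sizes, not from the article

# Randomly initialised parameters (an untrained network, for demonstration).
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)


def rnn_step(x_t, h_prev):
    """One timestep: mix the new input with the previous hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)


def final_hidden_state(sequence):
    """Run the step over the sequence in order, carrying the state along."""
    h = np.zeros(hidden_dim)
    for x_t in sequence:
        h = rnn_step(x_t, h)
    return h


sequence = rng.normal(size=(5, input_dim))
reversed_sequence = sequence[::-1]

# An order-insensitive summary cannot tell the two orderings apart...
same_mean = np.allclose(sequence.mean(axis=0), reversed_sequence.mean(axis=0))
# ...while the recurrent summary depends on the order the inputs arrived in.
same_hidden = np.allclose(final_hidden_state(sequence),
                          final_hidden_state(reversed_sequence))
print("mean unchanged by reversal:", same_mean)
print("hidden state unchanged by reversal:", same_hidden)
```

Feeding the same five vectors in reverse leaves the mean untouched but changes the final hidden state, which is the sense in which the hidden state is a summary of the sequence rather than of a bag of inputs.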

To train an RNN, you conceptually unfold it: expand the recurrent loop into a chain, one copy of the network per timestep, with every copy sharing the same weights and each one handing its hidden state to the next. This unfolded view is what makes backpropagation possible for a network with a loop in it (the procedure is known as backpropagation through time), and the resulting topology comes in the same four shapes as the sequence problems themselves: many-to-one, one-to-many, and the synchronous and asynchronous forms of many-to-many.
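Unfolding is easier to see with a tiny sequence written out by hand. The sketch below assumes the same vanilla tanh cell as a stand-in for "the network" and covers only the forward pass; the names `cell`, `xs`, and `h0` and the sizes are hypothetical. It runs a three-step sequence once as a loop and once as three explicit copies of the cell that reuse the same weights.

```python
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim, steps = 2, 3, 3  # illustrative sizes

# One set of weights, reused by every unrolled copy of the cell.
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)


def cell(x_t, h_prev):
    """A single copy of the recurrent cell."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)


xs = rng.normal(size=(steps, input_dim))
h0 = np.zeros(hidden_dim)

# Rolled view: one cell, applied in a loop.
h = h0
for x_t in xs:
    h = cell(x_t, h)
h_rolled = h

# Unrolled view: three copies chained by the hidden state they pass along.
h1 = cell(xs[0], h0)
h2 = cell(xs[1], h1)
h3 = cell(xs[2], h2)

all_states = np.stack([h1, h2, h3])  # a synchronous many-to-many head reads every row
last_state = h3                      # a many-to-one head reads only the final state

print("rolled and unrolled agree:", np.allclose(h_rolled, h3))
```

Because the unrolled chain is an ordinary feedforward graph, gradients can flow back through `h3`, `h2`, and `h1` in turn, and the contributions to the shared weights are summed across copies.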

The defining structural difference from a feedforward network is a feedback loop: the hidden state computed at one timestep is fed back into the same layer as an input at the next, instead of information flowing strictly forward through the layers, which is precisely what gives the network memory of past states. Several variants build on this base architecture (vanilla, bidirectional, and the gated versions, LSTM and GRU), and the gated ones in particular target a weakness in how well that memory actually holds up over long sequences. That weakness, it turns out, is the real story.