LSTM Gives a Network Three Gates to Choose What to Forget; GRU Does It With Two
Vanilla RNNs forget almost everything within a few timesteps because of vanishing gradients, which defeats the entire point of using a recurrent network on long
Learning ML in public. Every entry is a day of studying: what clicked, what confused me, and what I'm pulling on next. Writing forces me to actually understand things, not just watch them go by. If it helps someone else figure something out, even better.
Daily notes on machine learning, deep learning, and AI.
24 entries