Dropout and Biology: seriously, they are related.... (TensorFlow + Python + Machine Learning + Dropout + Adam Optimizer)
So after looking through the code, if you are like me, you are prolly wondering what the heck dropout or the Adam optimizer is.
line 74 h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
Well, dropout, as it happens, is pretty cool: it's based on the theory of evolution, and the role of sex in it, more specifically! Who knew that when you studied machine learning, you would be studying baby-making theory too! Well, Dr. Geoffrey Hinton and his team figured it out. He's one smart dude.
So when you were born, you received half of your genes from each parent, added a bit of random mutation, and combined them to produce you, the baby. Simple, right?
Well, it's actually a super strange phenomenon, or at least back in the day it was. It would be more intuitive to reproduce asexually: create an offspring with a slightly mutated (more evolved) copy of your genes, thereby optimizing for fitness and survival. I mean, you start with a good set of genes, which you know for certain work together extremely well (they made you, you, and you are the best, right?), and pass them on to your kid.
If you think about the latter, when two people with super large and complicated genomes come together, and half of each of their genes have to work in unison to make someone, you would think that someone might come out a bit messed up and mutated, and overall the organism would go kaput.
(Same fit superperson) + (a bit of randomness) = (another unique superperson)
1/2 (large, complicated person) + 1/2 (another large, complicated, different person) = super complicated person, most likely Frankenstein.
Well nope. This doesn't happen.
Sexual reproduction is thought to break up sets of genes that worked well together, aka co-adapted genes. And yes, this does happen, buuuuut it also happens to be the most advanced way organisms have evolved.
Individual fitness/survivability isn't what matters most; rather, it's the mixability of genes. The ability of a set of genes to work well with another random set of genes is what makes them robust. Kinda like a good employee: gotta be able to play nice with others... random others. A gene can't rely on its partner genes always being present, so it learns to do something useful by itself or with a small set of other genes. This spreads the useful genes around, and it actually breaks up the complex co-adaptations that would reduce the chances of a new gene improving fitness.
It's a crazy process that our bodies go through to become us.
So, in kind, every hidden unit in a neural network trained with dropout has to learn to work with a randomly sampled set of other hidden units. That makes it more robust and drives it toward awesomeness, aka creating useful features on its own instead of relying on its neighbors to fix its mistakes. A unit could also just try to protect itself by making lots of redundant copies of its feature, but that would be a poor solution, in the same way replica codes are a poor way to deal with a noisy channel. But if you figure it out, let me know.
An easier way to grasp this concept is a closely related one: successful conspiracies. Same benefits as the evolution-of-sexual-reproduction argument, but simpler.
So which would succeed?
1 big conspiracy with 50 players all doing their specific parts
10 conspiracies each with 5 people doing their specific parts
50 players, 1 conspiracy? Okay, if the world is static. But if it's dynamic and new? Prolly the 10 conspiracies of 5 players each.
Ok ok. BUT HOW DOES IT RELATE TO DROPOUT?!!!?
What dropout does is pretty much this, while solving a couple of problems in machine learning: overfitting, and providing a way to efficiently approximate combining exponentially many different neural network architectures.
How? We "drop out" the unit activations of a given layer by setting them to zero.
We sample a "thinned" out network from the standard.
At training, because the thinning is random have a probability, p, we effectively are sampling 2^n thinned networks with extensive weight sharing. Then at test time, you use the neural net with the scaled down training weights.
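Why scale down at test time? Because a unit that was only kept with probability p contributed, on average, p times its value during training, and the test-time network should match that expectation. A hypothetical NumPy sketch (the helper names and sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout_train(h, keep_prob):
    # random thinning, as during training
    return h * (rng.random(h.shape) < keep_prob)

def dropout_test(h, keep_prob):
    # test time: keep every unit, but scale by keep_prob so the
    # expected activation matches what downstream units saw in training
    return h * keep_prob

h = np.full(1000, 2.0)  # 1000 identical units, value 2.0
p = 0.5
avg_train = dropout_train(h, p).mean()  # roughly 1.0 (noisy)
test_out = dropout_test(h, p)[0]        # exactly 1.0
print(avg_train, test_out)
```

Note that many modern implementations instead use "inverted" dropout, scaling the kept units up by 1/keep_prob during training so test time needs no adjustment; the expectation-matching idea is the same.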
Therefore, dropout prevents co-adaptation of units and reduces the complexity of the overall system, aka reducing the number of players in each conspiracy.
It can also be seen as a way of ensembling many networks that share the same weights. For each training example, a different set of units to drop is randomly chosen, forcing each unit to play nice with others and be useful on its own.
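For a single linear layer you can even check the ensemble claim numerically: averaging the outputs of many randomly thinned nets matches one pass through the full net with weights scaled by the keep probability. A toy NumPy sketch (W, x, and p are all made up here):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 3))  # weights of a toy linear layer
x = rng.normal(size=4)       # one input example
p = 0.5                      # keep probability

# Average the outputs of many sampled "thinned" nets, each
# dropping input units independently with probability 1 - p.
outs = [((rng.random(4) < p) * x) @ W for _ in range(20000)]
ensemble_avg = np.mean(outs, axis=0)

# One forward pass through the full net with scaled-down weights.
scaled = x @ (p * W)

# The two agree up to Monte Carlo noise.
print(np.max(np.abs(ensemble_avg - scaled)))
```

For deep nonlinear nets the equivalence is only approximate, but it's the intuition behind using the single scaled network at test time instead of actually averaging 2^n models.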
Cool, right? All that in one crazy line 74....