March 27th, 2017
(Written by Lawrence Krubner; however, indented passages are often quotes.) You can contact Lawrence at: firstname.lastname@example.org
Designing a neural network is a thousand times harder than I imagined.
After AlphaGo, I tasked myself with creating a neural network that would use Q-Learning to play Reversi (aka Othello).
At that point, I had already used Q-Learning (the tabular version, not using a neural network) for some very simple and mostly proof-of-concept projects, so I understood how it worked. I read up on perceptrons, ReLU, the benefits and disadvantages of having more or fewer layers, etc.
Then I actually started on the project, thinking "I know about Q-Learning, I know about neural networks, now I just need to use Keras and I'll have a network ready to learn in about twenty lines of Python."
Boy, was that naive. Regardless of how well you understand the CONCEPTS of neural networks, actually putting together an effective one that matches the problem perfectly is so, so difficult (especially if there are no examples to build on). How many layers? Dropout or no, and if so, how much? Do you flatten this layer, do you use ReLU, do you need a SECOND neural network to approximate one part of the Q-function and another to approximate a different part?
I spent MONTHS messing with the hyperparameters and got nowhere, because I'm doing this on a desktop PC without CUDA, so it takes days to train a new configuration only to find out it hardly "learned" anything.
At one point after days of training, my agent actually had a 90% LOSE rate against an opponent that played totally randomly. To this day I am baffled by this.
I went into the project thinking "I have this working with a table, and the Q-learning part is in place; I just need to drop in a neural net in place of the table and I'm good to go!" It's been almost a year and I still haven't figured this thing out.
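For readers who have not seen the tabular version the commenter started from, here is a minimal Python sketch of one-step tabular Q-learning with epsilon-greedy action selection. The "table" is a plain dictionary keyed by (state, action). The hyperparameter values and the function names `choose_action` and `q_update` are illustrative assumptions, not the commenter's code, and the sketch assumes a single-agent view in which `next_state` is the agent's own next decision point.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (assumed values, not from the original post).
ALPHA = 0.1    # learning rate
GAMMA = 0.99   # discount factor
EPSILON = 0.1  # probability of taking a random exploratory move

# The table: maps (state, action) pairs to estimated values, defaulting to 0.0.
Q = defaultdict(float)


def choose_action(state, legal_actions, rng=random):
    """Epsilon-greedy: usually pick the highest-valued legal move, sometimes a random one."""
    if rng.random() < EPSILON:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda a: Q[(state, a)])


def q_update(state, action, reward, next_state, next_legal_actions, done):
    """One tabular Q-learning step: Q(s, a) += alpha * (target - Q(s, a))."""
    if done or not next_legal_actions:
        # Terminal transition: nothing left to bootstrap from.
        target = reward
    else:
        # Bootstrap from the best estimated value available in the next state.
        target = reward + GAMMA * max(Q[(next_state, a)] for a in next_legal_actions)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
    return Q[(state, action)]
```

Replacing the table with a neural network means `Q[(state, action)]` becomes a network prediction and the `+=` line becomes a gradient step that shrinks `(target - prediction) ** 2`. That single substitution is where questions like how many layers, how much dropout, or whether to use ReLU come from. In a two-player game such as Reversi there is also a wrinkle this sketch ignores: if the next state is the opponent's turn, the bootstrapped value has to be expressed from the agent's perspective, not the opponent's.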
Another post of the "If it makes you feel any better" type: I'm a relatively established researcher in NLP, having worked with a variety of methods from theoretical to empirical, publishing in the top venues with decent frequency, and still I'm having a really hard time getting into the deep learning (DL) stuff.
I'm training a sequence-to-sequence model and have been tuning hyperparameters for the last 2-3 months. I'm making progress, but painfully slowly, for two reasons. First, training and testing models takes a long time: I have a local Titan X and some Tesla K80s in a remote cluster, to which I can send models expecting a queue latency of 3-4 days and a throughput of around 4 models running simultaneously on average (probably more than many people can get, but it still feels slow for this purpose). Second, hyperparameter optimization seems to be little more than blind guessing with some very rough rules of thumb. The randomness doesn't help either: running the same model with different random seeds, I have noticed huge variance in accuracy. So sometimes I tweak a parameter and get improvements, but who knows whether they are significant or just luck with the initialization. I would have to run every experiment with a bunch of seeds to be sure, but that would mean waiting even longer for results, and my research would be old before I reached state-of-the-art accuracy.
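The commenter's point about seed variance can be made concrete. Below is a minimal Python sketch, under stated assumptions, of training one configuration over several seeds and comparing two configurations with Welch's t statistic. `train_and_evaluate` is a hypothetical placeholder that only simulates an accuracy with seeded noise; in a real project it would launch an actual training run. The config keys `base_accuracy` and `seed_noise` are likewise invented for illustration.

```python
import math
import random
import statistics


def train_and_evaluate(config, seed):
    """Hypothetical placeholder for one real training run.

    It merely simulates an accuracy: the config's base value plus Gaussian
    noise from a generator seeded with `seed`, mimicking run-to-run
    variance caused by random initialization.
    """
    rng = random.Random(seed)
    return config["base_accuracy"] + rng.gauss(0.0, config["seed_noise"])


def scores_over_seeds(config, seeds):
    """Train the same configuration once per seed and return every score."""
    return [train_and_evaluate(config, seed) for seed in seeds]


def welch_t(scores_a, scores_b):
    """Welch's t statistic for mean(scores_b) - mean(scores_a).

    Does not assume equal variances; needs at least two scores per group.
    """
    mean_a, mean_b = statistics.mean(scores_a), statistics.mean(scores_b)
    var_a, var_b = statistics.variance(scores_a), statistics.variance(scores_b)
    std_err = math.sqrt(var_a / len(scores_a) + var_b / len(scores_b))
    return (mean_b - mean_a) / std_err


if __name__ == "__main__":
    # Simulated baseline vs. a "tweaked" config, each over five distinct seeds.
    baseline = scores_over_seeds({"base_accuracy": 0.70, "seed_noise": 0.02}, range(0, 5))
    tweaked = scores_over_seeds({"base_accuracy": 0.71, "seed_noise": 0.02}, range(5, 10))
    print(f"baseline {statistics.mean(baseline):.3f} +/- {statistics.stdev(baseline):.3f}")
    print(f"tweaked  {statistics.mean(tweaked):.3f} +/- {statistics.stdev(tweaked):.3f}")
    print(f"Welch t = {welch_t(baseline, tweaked):.2f}")
```

As a rough rule of thumb, with a handful of seeds per configuration, a |t| well above 2 suggests a tweak is more than initialization luck, while a value near 0 means the difference sits inside seed-to-seed noise. For an actual p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test. The cost is exactly the one the commenter describes: five seeds per configuration means five times the GPU time.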
Maybe I'm just not good at it and I'm a bit bitter, but my feeling is that this DL revolution is turning research in my area from a battle of brain power and ingenuity into a battle of GPU power and economic means (in fact, my brain doesn't work much in this research project, as it spends most of the time waiting for results from some GPU; fortunately, I have a lot of other non-DL research to do in parallel, so the brain doesn't get bored). In the same vein, I can't help but notice that most of the top DL NLP papers come from a select few institutions with huge resources (even though there are heroic exceptions). This doesn't happen as much with non-DL papers.
Good thing there is still plenty of non-DL research to do, and if DL takes over the whole empirical arena, I'm not bad at theoretical research…