When I started playing with word2vec four years ago I needed (and luckily had) tons of supercomputer time. But because of advances in our understanding of word2vec, computing word vectors now takes fifteen minutes on a single run-of-the-mill computer with standard numerical libraries¹. Word vectors are awesome, but you don’t need a neural network, and definitely don’t need deep learning, to find them². So if you’re using word vectors and aren’t gunning for state of the art or a paper publication, then stop using word2vec.

When we’re finished you’ll measure word similarities:

facebook ~ twitter, google, ...

… and the classic word vector operations:

zuckerberg - facebook + microsoft ~ nadella

… but you’ll do it mostly by counting words and dividing. No gradients harmed in the making!
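To make "counting words and dividing" a bit more concrete, here is a minimal sketch of one common count-based recipe. It assumes a particular pipeline that this intro does not spell out: count co-occurrences in a small window, divide by the word frequencies to get positive pointwise mutual information (PPMI), then compress with an SVD. The function names (`pmi_vectors`, `cosine`, `nearest`), the window size, the dimension, and the toy corpus are all made up for illustration, and a corpus this tiny will not give meaningful neighbors.

```python
import numpy as np


def pmi_vectors(sentences, window=2, dim=4):
    """Toy count-based word vectors: co-occurrence counts -> PPMI -> SVD."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}

    # Counting: how often does each context word appear near each word?
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            context = s[max(0, i - window):i] + s[i + 1:i + 1 + window]
            for c in context:
                counts[index[w], index[c]] += 1

    # Dividing: joint probability over the product of the marginals.
    total = counts.sum()
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts / total) / (p_w * p_c))
        # Keep only positive, finite PMI values (the "positive PMI" variant).
        ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

    # Compress the PPMI matrix to `dim` dimensions with a plain SVD.
    u, s, _ = np.linalg.svd(ppmi)
    vectors = u[:, :dim] * np.sqrt(s[:dim])
    return vocab, index, vectors


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def nearest(query, vocab, vectors, exclude=()):
    """Word whose vector has the highest cosine similarity to `query`."""
    scored = [(cosine(query, v), w)
              for w, v in zip(vocab, vectors) if w not in exclude]
    return max(scored)[1]


# Made-up toy corpus, only to show the calls; real use needs far more text.
corpus = [
    "zuckerberg runs facebook".split(),
    "nadella runs microsoft".split(),
    "facebook is a social network".split(),
    "twitter is a social network".split(),
    "microsoft is a software company".split(),
]
vocab, index, vecs = pmi_vectors(corpus)

# Similarity query: which word is closest to "facebook"?
print(nearest(vecs[index["facebook"]], vocab, vecs, exclude={"facebook"}))

# Analogy query: zuckerberg - facebook + microsoft ~ ?
query = (vecs[index["zuckerberg"]] - vecs[index["facebook"]]
         + vecs[index["microsoft"]])
print(nearest(query, vocab, vecs,
              exclude={"zuckerberg", "facebook", "microsoft"}))
```

The only "training" here is a matrix of counts, a division, a logarithm, and one call to a standard linear algebra routine, which is the spirit of the claim above. On a real corpus you would want sparse matrices and a truncated SVD rather than the dense versions used in this sketch.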
