The other day I built a crawler that checks the links on your website and reports any that you could update from HTTP to HTTPS.
I came up with an implementation that abstracts the coordination using channels, and I would like to share it in this article.
kgretzky/dcrawl: Simple, but smart, multi-threaded web crawler for randomly gathering huge lists of unique domain names.
