Mapping controversies can be greatly facilitated by studying them through the prism of the web. Analysing the websites of the actors of a controversy and building a map from the links between them can be a source of great knowledge, although it can be quite complex to carry out, especially for social scientists. Released as free software available on GitHub, Hyphe was designed to offer researchers and students a web corpus curation tool featuring a research-driven web crawler. It provides users with a method to build web corpora with granularity, flexibility and simple curation principles. Rather than websites, Hyphe manipulates WebEntities, each of which can be defined as a single page, a subdomain, a combination of websites, and so on. The webpages lying within these WebEntities can then be crawled in order to collect all the outbound links and text found in the pages of the entity. The most cited discovered WebEntities can then be prospected to enlarge the corpus, before visualizing it as a network and exporting it for refinement within Gephi and publication with manylines.
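To make the notion of a WebEntity more concrete, the following Python sketch groups page URLs into entities by longest matching URL prefix, aggregates page-level links into entity-level links, and ranks the discovered (not yet curated) hosts by the number of corpus entities citing them. It is an illustration only and not Hyphe's actual API or data model: the function names, the entity names, the example URLs and the choice of plain "host/path" prefixes are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to 'host/path', dropping scheme, 'www.', query and fragment."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def find_entity(url, entities):
    """Return the entity whose longest prefix matches the URL, or None."""
    target = normalize(url)
    best_name, best_len = None, -1
    for name, prefixes in entities.items():
        for prefix in prefixes:
            p = normalize(prefix)
            matches = target == p or target.startswith(p + "/")
            if matches and len(p) > best_len:
                best_name, best_len = name, len(p)
    return best_name


def entity_links(page_links, entities):
    """Aggregate (source page, target page) pairs into entity-level link counts.

    Only pages belonging to a corpus entity are treated as crawled sources.
    Targets outside the corpus are labelled by their bare host name.
    """
    links = Counter()
    for source_url, target_url in page_links:
        source = find_entity(source_url, entities)
        target = find_entity(target_url, entities)
        if target is None:
            target = normalize(target_url).split("/")[0]
        if source is not None and source != target:
            links[(source, target)] += 1
    return links


def most_cited_discovered(links, entities):
    """Rank targets outside the corpus by how many corpus entities cite them."""
    cited_by = {}
    for source, target in links:
        if target not in entities:
            cited_by.setdefault(target, set()).add(source)
    return sorted(cited_by, key=lambda t: (-len(cited_by[t]), t))


# Hypothetical corpus: one entity per kind mentioned in the text.
entities = {
    "Example NGO": ["https://example.org", "https://example-ngo.net"],  # combination of websites
    "Example blog": ["https://blog.example.com"],                       # a subdomain
    "Single report": ["https://example.com/reports/2020.html"],         # a single page
}

# Hypothetical (source page, target page) links, as a crawl might collect them.
page_links = [
    ("https://example.org/about", "https://blog.example.com/post/1"),
    ("https://example-ngo.net/", "https://news.example.net/story"),
    ("https://blog.example.com/post/1", "https://news.example.net/story"),
    ("https://blog.example.com/post/1", "https://example.com/reports/2020.html"),
    ("https://example.org/about", "https://other.example.net/x"),
]

links = entity_links(page_links, entities)
candidates = most_cited_discovered(links, entities)
print(candidates)
```

In this toy corpus the host cited by two entities is ranked ahead of the one cited by a single entity, which mirrors the idea of prospecting the most cited discovered entities first. The resulting `links` counter is an edge list that could be written out in any graph format for further work in a tool such as Gephi.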
There are many other scraper projects, mostly written in Python and built either for scraping a specific website or for indexing. None, however, is as easy to use as Scrapy, and most have smaller communities.
