Web scraping with Node.JS and Cheerio | ButterCMS
Cheerio is a Node.js library that helps developers interpret and analyze web pages using a jQuery-like syntax. In this post, I will explain how to use Cheerio to scrape the web.
nodejs  cheerio  browserbased  javascript  scraping  tutorial 
5 days ago by cyberchucktx
How Do I Download an Entire Website for Offline Reading?
It’s easy enough to save individual web pages for offline reading, but what if you want to download an entire website? Well, it’s easier than you think! Here are four nifty tools you can use to download any website for offline reading, zero effort required.
Wget is a command-line utility that can retrieve all kinds of files over the HTTP and FTP protocols. Since websites are served through HTTP and most web media files are accessible through HTTP or FTP, this makes Wget an excellent tool for ripping websites.

While Wget is typically used to download single files, it can also recursively download all pages and files that are reachable from an initial page.
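The recursive idea described above (start at one page, follow its links, repeat) can be illustrated with a minimal Python sketch. This is not Wget and not the article's own command; it only shows the link-following logic, using the standard library. The names `LinkCollector`, `fetch`, and `crawl`, the same-host restriction, and the `max_pages` cap are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page and return its text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl from start_url, staying on the same host.

    Returns a dict mapping each visited URL to its HTML.
    max_pages is an assumed safety cap, not a Wget setting.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            target, _ = urldefrag(urljoin(url, href))
            if urlparse(target).netloc == host and target not in pages:
                queue.append(target)
    return pages
```

Calling `crawl("https://example.com/")` (a placeholder URL) would return the pages reachable from that address on the same host. A real mirroring tool also saves images, stylesheets, and other assets and rewrites links for offline use, which this sketch leaves out.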
scraping  tools  archives  cache  commandline  wget 
5 days ago by paulbradshaw
Web Scraping 101 with Python & Beautiful Soup – codeburst
Web scraping is a method of mining data from websites: software extracts the information available on a targeted site by simulating human behavior. Each year, more and more…
scraping  bautifulsoup  python 
6 days ago by alabra
Scalable Web Scraping Software & Services. Free trial +1 (801) 995-4550
scraping  comercial 
8 days ago by kmara
khpeek/funda-scraper: Scraper of the Dutch real estate website, implemented in Python with Scrapy
dataset  scraping  python 
10 days ago by hay
Web scraping case fails under Dastar
The court also applied the 2003 Supreme Court case of Dastar Corp. v. Twentieth Century Fox Film Corp., 539 U.S. 23 (2003) to find that plaintiff’s Lanham Act claim failed. In Dastar, the Supreme Court concluded that “false designation of origin” as it is used in the Lanham Act attaches to the producer of tangible goods that are offered for sale, and not to the author of any idea, concept, or communication embodied in those goods. In this case, the defendant created the final product (website listings), albeit using the plaintiff’s content (just like in Dastar). Because plaintiff was not the source of the product (the duplicated listings), it did not have a claim under the Lanham Act.
scraping  law  t 
12 days ago by paulbradshaw
To scrape or not to scrape: technical and ethical challenges of collecting data off the web - Storybench
Best Practices for Scraping: From Ethics to Techniques. "Just because you can scrape it, does that mean you should? As a data journalist, when is web scraping the right choice? And, more importantly, when is it right for you?" Contains a graphic: a flow chart showing when you should or should not scrape.
scraping  python  ethics  journalism 
13 days ago by macloo

