scraper   2183


GitHub - vfreefly/kimurai
Kimurai is a modern web scraping framework written in Ruby that works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests, and lets you scrape and interact with JavaScript-rendered websites.
scraper  ruby 
4 days ago by DMisener
sindresorhus/get-urls
Get all URLs in a string


refrr:https://www.npmjs.com/package/url-regex
javascript  js  scraper  crawler  bookmarks  chrome  extension 
5 days ago by michaelfox
kevva/url-regex
Regular expression for matching URLs


refrr:https://www.npmjs.com/package/url-regex
javascript  js  scraper  crawler  bookmarks  chrome  extension  url  regex 
5 days ago by michaelfox
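Both get-urls and url-regex are JavaScript packages. As a rough illustration of the idea only, here is a minimal Python sketch that pulls URLs out of a string with a regular expression. The pattern, the `get_urls` name and the trailing-punctuation trimming are assumptions for this example, not the packages' actual regex or API; the real url-regex pattern handles far more cases (bare domains, IPs, IDNs).

```python
import re

# A deliberately simple pattern (an assumption, not the url-regex package's):
# an http/https scheme followed by characters that are not whitespace or
# common closing delimiters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def get_urls(text: str) -> list:
    """Return unique URLs in order of first appearance.

    Trailing sentence punctuation (e.g. a comma after a link) is trimmed.
    """
    seen = {}  # dict preserves insertion order, giving stable de-duplication
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")
        seen.setdefault(url, None)
    return list(seen)


text = "See https://example.com/a, and (http://example.org/b). Again: https://example.com/a"
print(get_urls(text))
```

Returning a de-duplicated list mirrors the "get all URLs" use case; a plain `findall` would keep repeats.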
Introduction — Fathom 2.3 documentation
Fathom is a JavaScript framework for extracting meaning from web pages, identifying parts like Previous/Next buttons, address forms, and the main textual content—or classifying a page as a whole. Essentially, it scores DOM nodes and extracts them based on conditions you specify. A Prolog-inspired system of types and annotations expresses dependencies between scoring steps and keeps state under control. It also provides the freedom to extend existing sets of scoring rules without editing them directly, so multiple third-party refinements can be mixed together.


refrr:https://www.npmjs.com/package/fathom-web
javascript  js  scraper  crawler  bookmarks  chrome  extension 
5 days ago by michaelfox
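Fathom's core idea is scoring DOM nodes against rules and extracting the best match. The toy Python sketch below illustrates that idea for finding a "Next" link. It is not Fathom's API or its scoring math: the rules, weights and the plain weighted sum are hypothetical, and the stdlib `html.parser` stands in for a real DOM.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Node:
    """A parsed <a> element: its attributes and concatenated inner text."""
    attrs: dict
    text: str = ""


class AnchorCollector(HTMLParser):
    """Collect every (non-nested) <a> element from an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: list = []
        self._current: Optional[Node] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Valueless attributes (e.g. <a download>) arrive as None.
            self._current = Node({name: value or "" for name, value in attrs})

    def handle_data(self, data):
        if self._current is not None:
            self._current.text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.nodes.append(self._current)
            self._current = None


# Hypothetical rules as (predicate, weight) pairs. A node's score is the
# sum of the weights of the predicates it satisfies.
RULES = [
    (lambda n: "next" in n.text.lower(), 3.0),
    (lambda n: "next" in n.attrs.get("class", "").lower(), 2.0),
    (lambda n: n.attrs.get("rel", "") == "next", 4.0),
]


def score(node: Node) -> float:
    return sum(weight for predicate, weight in RULES if predicate(node))


def best_next_link(html: str) -> Optional[Node]:
    """Return the highest-scoring <a> element, or None if there are none."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return max(parser.nodes, key=score, default=None)


html = ('<a href="/1">Home</a><a class="pager-next" href="/3">More</a>'
        '<a href="/2" rel="next">Next page</a>')
print(best_next_link(html).attrs["href"])
```

Keeping rules as data means new rules can be appended without editing existing ones, which loosely echoes Fathom's goal of mixing third-party refinements.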
An Open-Source Search Engine Takes Shape
Commercial search-engine providers soon could face a serious competitor if the vision of some open-source developers materializes. A team of open-source programmers recently launched a project called Nutch to provide search-engine software for free. Doug Cutting, president of the Nutch Organization, told TechNewsWorld that Nutch eventually will provide a transparent alternative to commercial Web search engines.


refrr:https://en.wikipedia.org/
scraper  crawler 
5 days ago by michaelfox
A case study in writing an open source search engine
Read 100 Computer Science Journals for Free. The complete contents of more than 100 journals and 35 LNCS volumes are waiting for you inside the Computer Science Reading Room.


refrr:https://en.wikipedia.org/
scraper  crawler 
5 days ago by michaelfox
GitHub - grubbins2/IndeedJobScraperAndTFIDF
Scrape job descriptions from indeed.com, build term frequency–inverse document frequency vectors for all words in corpus and compare salience of different terms across descriptions.
indeed  job  scraper 
7 days ago by stevecooks
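The TF-IDF step that repository describes can be sketched in a few lines of stdlib Python. This is a minimal illustration, not the repository's code. It assumes lowercase alphabetic tokens, term frequency as count divided by document length, and the plain `log(N / df)` inverse document frequency, so a term that appears in every description gets a weight of 0.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and keep alphabetic runs only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs: list) -> list:
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


descriptions = [
    "python developer python sql",
    "java developer sql",
    "python data analyst",
]
vectors = tf_idf(descriptions)
# The most salient term in the second description:
print(max(vectors[1], key=vectors[1].get))
```

Comparing a term's weight across the per-description dicts gives the kind of cross-description salience comparison the bookmark mentions.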
opensemanticsearch/open-semantic-search-apps
Python/Django-based web apps and user interfaces for search, structure (metadata management such as thesauri, ontologies, annotations and named entities) and data import (ETL such as text extraction, OCR and crawling of file systems or websites). https://opensemanticsearch.org/


refrr:https://github.com/search?o=desc&q=solr+ui&s=stars&type=Repositories
search  db  solr  scraper  crawler  semantic  wiki  data  knowledge 
9 days ago by michaelfox
Web Scraping Using Python (article) - DataCamp
You'll learn how to extract data from the web, manipulate and clean it using Python's Pandas library, and visualize it using Python's Matplotlib library.
scrape  scraping  scraper  python  deep-learning  beautiful-soup  datacamp 
12 days ago by nharbour
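The extraction step in that tutorial uses Beautiful Soup. As a dependency-free illustration only, here is a minimal sketch that pulls the rows of an HTML table with the stdlib `html.parser`. The `TableParser` class is hypothetical. It assumes well-formed markup with explicit `</td>` and `</tr>` tags and does not handle nested tables. In real use the HTML would first be fetched, for example with `urllib.request.urlopen(url).read().decode()`.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect <tr> rows as lists of stripped cell strings (<td> and <th>)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


html = ("<table><tr><th>Name</th><th>Score</th></tr>"
        "<tr><td>Ann</td><td> 9 </td></tr></table>")
parser = TableParser()
parser.feed(html)
parser.close()
print(parser.rows)
```

From there, `pandas.DataFrame(parser.rows[1:], columns=parser.rows[0])` would turn the rows into a DataFrame for the cleaning and plotting steps the tutorial covers.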
Hunter
Hunter lets you find email addresses in seconds and connect with the people that matter for your business.
email  marketing  scraper  tools 
13 days ago by devin


related tags

amazon  analysis  analytics  apache  api  audio  automate  automation  bbq  beautiful-soup  bookmarks  bot  browser  chrome  circleci  cli  cloud  code  connections  contacts  content  conversion  convert  converter  crawl  crawler  cron  csv  curl  data  datacamp  datajournalism  dates  db  ddj  deep-learning  diff  diffbot  documentation  dom  download  driven  elasticsearch  email  emails  export  extension  extract  extraction  extractor  feed  feeds  find  freeware  generator  github  go  golang  graphql  hacking  headless  headlesschrome  howtheydidit  html  image  indeed  infosec  instagram  intelligence  interface  javascript  job  journalism  js  keyword  knowledge  learning  library  linkbuilding  linkedin  management  marketing  media_pc  metadata  moz  netsec  networks  nlp  node  ocr  osint  overview  overviewdocs  pandas  parser  pastebin  php  pinterest  platform  products  programming  proxy  puppeteer  py  python  quotes  readability  reading  recon  regex  requests  rss  ruby  saas  scanner  scrap  scrape  scraping  screenscraper  screenshot  scripting  search  searchengine  security  semantic  seo  service  sfi  shell  social  socialmedia  software  solr  spider  sportsdata  task  tech  testing  text  textanalysis  textsources  thumbnail  tool  toolkit  tools  twisted  twitter  type:tool  url  video  vixfutures  vx  web-scraper  web  webdevelopment  webservices  website  wiki  windows  woodworking  xpath  yhat  youtube  zend   
