datasets   8600

Datasets « Deep Learning
These datasets can be used for benchmarking deep learning algorithms
database  data  ai  datasets  machine_learning 
3 days ago by rrraul
1 Billion Word Language Model Benchmark
The purpose of the project is to make available a standard training and test setup for language modeling experiments.

The training/held-out data was produced from the WMT 2011 News Crawl data using a combination of Bash shell and Perl scripts distributed here.

This also means that your results on this data set are reproducible by the research community at large.

Besides the scripts needed to rebuild the training/held-out data, the project also provides log-probability values for each word in each of ten held-out data sets, for each of the following baseline models:

unpruned Katz (1.1B n-grams),
pruned Katz (~15M n-grams),
unpruned Interpolated Kneser-Ney (1.1B n-grams),
pruned Interpolated Kneser-Ney (~15M n-grams)
Happy benchmarking!
data  datasets 
3 days ago by ttpro1995
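The per-word log-probabilities released for the baseline models are usually summarized as a single perplexity figure. The following is a minimal sketch, assuming the values are base-10 log-probabilities with one value per scored token. Check the benchmark's own files for the actual log base and whether end-of-sentence tokens are counted. The function name `perplexity` is illustrative and does not come from the benchmark's scripts.

```python
import math
from typing import Iterable


def perplexity(log_probs: Iterable[float], base: float = 10.0) -> float:
    """Compute perplexity from per-token log-probabilities.

    ASSUMPTION: each value is log_base P(w_i | history) for one scored token.
    PPL = base ** (-(1/N) * sum_i log_base P(w_i | history))
    """
    total = 0.0
    count = 0
    for lp in log_probs:
        total += lp
        count += 1
    if count == 0:
        raise ValueError("need at least one log-probability")
    # Average negative log-probability per token, mapped back out of log space.
    return base ** (-total / count)


if __name__ == "__main__":
    # A model that gives every token probability 0.01 has perplexity 100.
    print(perplexity([math.log10(0.01)] * 5))
```

Comparing two models on the same held-out set then reduces to comparing their perplexities, where lower is better. Keep `base` consistent with how the log-probabilities were written.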
Amherst-Statistics/Cars-Scraping-Webinar: scraping and multivariate analysis CAUSE activity webinar
A full classroom example for scraping data from I can update my Honda example!
datasets  ba  regression 
3 days ago by sburer
Our first public datasets: Host-level WebGraph and PageRank! - Common Search
"Common Search is building an open source search engine with transparent rankings, and analyzing the hyperlinks on the web is a major part of this effort. To make that possible, we are going to publish datasets that will let contributors, students and researchers reproduce the rankings, submit improvements and hopefully use the underlying data for their own work."
datasets  web  search  pagerank  networks 
4 days ago by arsyed
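The Common Search entry is about reproducing rankings from a host-level link graph. For orientation, the following is a textbook PageRank power iteration over a small adjacency dict. It is a sketch under assumptions: the damping factor of 0.85, uniform redistribution of rank from dangling hosts, and the hypothetical host names are all illustrative. The entry does not say which parameters or method Common Search's pipeline actually uses.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Textbook PageRank by power iteration (illustrative, not Common Search's code)."""
    # Every host that appears as a source or a target is a node.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # ASSUMPTION: rank held by hosts with no out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # stop once the ranking has converged
            break
    return rank


if __name__ == "__main__":
    # Hypothetical host-level graph: two hosts link to hub.example.
    hosts = {"a.example": ["hub.example"], "b.example": ["hub.example"],
             "hub.example": ["a.example"]}
    for host, score in sorted(pagerank(hosts).items(), key=lambda kv: -kv[1]):
        print(host, round(score, 4))
```

Scores sum to 1 on every iteration, so they can be compared directly across hosts in the same graph.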
ML-friendly Public Datasets | Kaggle
This Kaggle website has some clean data sets, e.g., for regression.
datasets  ba  analytics 
5 days ago by sburer
WebVectors: Models
part-of-speech tagged models
word2vec  datasets  nlp 
5 days ago by arnicas

related tags

action  ai  amazon  analysis  analytics  annotation  api  aws  ba  bigdata  bioinformatics  blm  blogging  blogs  businessmodels  calendar  candy  career  census  chords  clickbait  clusters  collections  corpus  cryptocurrencies  csv  data  data_science  dataanalysis  database  databases  datadecisions  datagovernance  dataquality  datascience  dataset  dataviz  dcatap  deep-learning  deeplearning  dev  dialog  discovery  dpir  dryad  edges2cats  enigma  ethics  facebook  figshare  finance  food  france  friends  generators  geo  geospatial  gis  graphics  guns  hacker-news-comments  haven  hcds  health  homework  image  images  interannotator  ip  isoforms  journalism  json-apis  json  jupyter  law  leak  leaks  libraries  linguistics  lungcancer  machine_learning  machinelearning  map  maps  media  medicaldiagnosis  mining  mobile  models  mooc  narrative  networks  nlp  oaweek2017  offshore  open  opendata  openinorderto  openstreetmap  pagerank  pandas  parser  personalbranding  pictures  poetics  policedata  policy  politics  pre-trained  pricing  programming  psc693  public  publicsector  publishing  python  qanda  question-answering  reasoning  recipes  regression  repositories  research  resources  rlang  rstats  russian  satellite  science  search  secondary  selfdrivingcar  sensor  sensors  shopping  social  spark  sqlite  standards  stata  stats  tax  teaching  television  tensorflow  text  timezones  tutorial  tv  twitter  udacity  un  video  vimeo  visual  visualization  web  wikidata  word2vec  writing  xplain  zenodo 