datasets   7408


🎶 Two thousand cans of craft beer on the wall 🎶.
The website CraftCans.com publishes a database of 2,000+ canned beers. For each beer, the database lists its name, style, brewery, size, alcohol level, and bitterness. The website doesn’t provide a direct download, but — as Jean-Nicholas Hould points out — you can basically just copy-paste the website’s data into your favorite spreadsheet program. Or, if you want something slightly cleaner, you can use this script. Related: This data-profiling tutorial by Hould, which uses the data. Also related: RateBeer.com’s API, but you’ll need to request a developer key to use it. Plus: This interactive graphic, which uses the RateBeer data to explore America’s microbrew epicenters. And also: Official brewery production stats from the U.S. Alcohol and Tobacco Tax and Trade Bureau. [h/t Daniel Brady]
DataSets  DataIsPlural 
yesterday by kanarinka
Fifty million doodles.
Google is clever: It created a drawing game, got 15 million people to play it, and then turned those doodles into a public dataset of people drawing. You can download the raw data, or just browse the doodles online.
DataSets  DataIsPlural 
yesterday by kanarinka
Ransomware payments.
When the malware program known as “WannaCry” hit hundreds of thousands of computers earlier this month, it demanded that the computers’ owners pay $300 in Bitcoin — or lose all of their data. Keith Collins at Quartz has been using Blockchain’s API to track Bitcoin payments to the three digital wallets that the hackers designated to receive the ransoms. He’s published the data and is also using it to power a Twitter bot. Related: “Victims of the WannaCry ransomware attacks have stopped paying up” and “Inside the digital heist that terrorized the world—and only made $100k,” both by Collins. Previously: Historical Bitcoin prices (DIP 2017.03.08).
DataSets  DataIsPlural 
yesterday by kanarinka
Domestic radicalization.
The Profiles of Individual Radicalization in the United States (PIRUS) database “contains deidentified individual-level information on the backgrounds, attributes, and radicalization processes of nearly 1,500 violent and non-violent extremists who adhere to far right, far left, Islamist, or single issue ideologies in the United States” — including the Ku Klux Klan, the Taliban, and the Animal Liberation Front, among others. The dataset covers 1948 through 2013 and was released earlier this year by a team at the University of Maryland. [h/t Lorand Bodo]
DataSets  DataIsPlural 
yesterday by kanarinka
America’s card catalog.
Last week, the Library of Congress released its largest dataset ever: nearly 25 million records for books, maps, manuscripts and other items in its online catalog. For each item, the data includes standardized bibliographic information, such as the title, author, publication date, and genre. (The dataset represents the online catalog as it was in 2013; more recent data will cost you.) Related: A bit of background about the library’s MARC (Machine Readable Cataloging Records) data format.
DataSets  DataIsPlural 
yesterday by kanarinka
mdeff/fma: FMA: A Dataset For Music Analysis
"The dataset is a dump of the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads."
datasets  music  audio 
2 days ago by arsyed
caesar0301/awesome-public-datasets: An awesome list of high-quality open datasets in public domains (on-going). By everyone, for everyone!
datasets  data 
5 days ago by MattieTK
*Such* an important dataset.
Grad students in Princeton’s computer science department have published a dataset they call the Self-Annotated Reddit Corpus, or “SARC” for short. “The corpus has 1.3 million sarcastic statements — 10 times more than any previous dataset,” the authors write. It takes advantage of Reddit users’ habit of tagging sarcastic comments with a “/s”. Related: A dataset of sarcastic Amazon reviews. [h/t Carlos Somohano + Reddit user cavedave]
DataSets  DataIsPlural 
6 days ago by kanarinka
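A rough sketch of the self-annotation idea behind SARC: treat a trailing “/s” as the author's own sarcasm label, then strip it from the text. This is not the Princeton team's actual pipeline, which isn't described here. The function name, the regular expression, and the sample comments are all hypothetical.

```python
import re
from typing import Tuple

# Assumption: sarcasm is marked by "/s" at the very end of a comment,
# preceded by whitespace (or standing alone). Real Reddit data is messier.
SARCASM_TAG = re.compile(r"(?:^|\s)/s\s*$")


def split_sarcasm_tag(comment: str) -> Tuple[str, bool]:
    """Return (comment text without the tag, True if the tag was present)."""
    match = SARCASM_TAG.search(comment)
    if match is None:
        return comment.strip(), False
    return comment[:match.start()].strip(), True


# Hypothetical sample comments, for illustration only.
comments = [
    "Great, another Monday. /s",
    "The docs live at https://example.com/s",
    "I genuinely like this dataset.",
]
labeled = [split_sarcasm_tag(c) for c in comments]
```

Requiring whitespace before the tag keeps URLs and paths that merely end in “/s” from being counted as sarcasm.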
What do you do with a PhD in science?
The National Science Foundation’s Survey of Doctorate Recipients “is a longitudinal biennial survey conducted since 1973 that provides demographic and career history information about individuals with a research doctoral degree in a science, engineering, or health (SEH) field from a U.S. academic institution.” You can download aggregated data and detailed survey responses going back to 1993. The next release is scheduled for this month. Related: The NSF has published an interactive graphic of the data. [h/t Peter Aldhous]
DataSets  DataIsPlural 
6 days ago by kanarinka
