build   25970

« earlier    

GitHub is where people build software. More than 27 million people use GitHub to discover, fork, and contribute to over 80 million projects.
toread  build  learning  tutorial  awesome  programming 
yesterday by LaptopHeaven
10 Ways You Can Build Arm Muscles Without Weights
0 Ways You Can Build Arm Muscles Without Weights
10  Ways  You  Can  Build  Arm  Muscles  Without  Weights 
2 days ago by kilroy2
Datasette Facets
Datasette Facets
Datasette 0.22 is out with the most significant new feature I’ve added since the initial release: faceted browse.
Datasette lets you deploy an instant web UI and JSON API for any SQLite database. csvs-to-sqlite makes it easy to create a SQLite database out of any collection of CSV files. Datasette Publish is a web app that can run these combined tools against CSV files you upload from your browser. And now the new Datasette Facets feature lets you explore any CSV file using faceted navigation with a couple of clicks.
Exploring characters from Marvel comics
Let’s use facets to explore every character in the Marvel Universe.
FiveThirtyEight have published a CSV file of 16,376 characters from Marvel comics, scraped from Wikia as part of the research for their 2014 story Comic Books Are Still Made By Men, For Men And About Men.
Here’s that CSV file loaded into the latest version of Datasette:
We start by applying the identity status, alignment and gender facets. Then we filter down to just the bad characters with a public identity, and apply the eye colour facet. Now we can filter to just the 20 bad characters with a public identity and yellow eyes.
At each stage along the way we could see numerical summaries of the other facets. That’s a pretty sophisticated piece of analysis we’ve been able to run with just a few clicks (and it works responsively on mobile as well).
I’ve published a full copy of everything else in the FiveThirtyEight data repository, which means you can find plenty more examples of facets in action at—one example: Actions under the Antiquities Act, faceted by states, pres_or_congress, action and current_agency.
Analyzing GSA IT Standards with Datasette Publish
The US government’s General Services Administration have a GitHub account, and they use it to publish a repository of assorted data as CSVs.
Let’s take one of those CSVS and analyze it with Datasette Facets, using the Datasette Publish web app to upload and process the CSV.
We’ll start with the it-standards.csv file, downloaded from their repo. We’ll upload it to Datasette Publish and add some associated metadata:
Here’s the result, with the Status and Deployment Type facets applied. And here’s a query showing just SaaS tools with status Approved—Preferred.
European Power Stations
The Open Power System Data project publishes data about electricity systems. They publish data in a number of formats, including SQLite databases. Let’s take their conventional_power_plants.sqlite file and explore it with Datasette. With Datasette installed, run the following commands in your terminal:
datasette conventional_power_plants.sqlite
This will start Datasette running at ready for you to explore the data.
Next we can publish the SQLite database directly to the internet using the datasette publish command-line tool:
$ datasette publish now conventional_power_plants.sqlite \
--source="Open Power System Data. 2018. Data Package Conventional power plants. Version 2018-02-27" \
--source_url="" \
--title="Conventional power plants" \
> Deploying /private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/tmpufvxrzgp/datasette under simonw
> [in clipboard] (sfo1) [11s]
> Synced 3 files (1.28MB) [11s]
> Building…
> ▲ docker build
Sending build context to Docker daemon 1.343 MBkB
> Step 1/7 : FROM python:3
> 3: Pulling from library/python
> 3d77ce4481b1: Already exists
> 534514c83d69: Already exists
> Successfully built da7ac223e8aa
> Successfully tagged
> ▲ Storing image
> Build completed
> Verifying instantiation in sfo1
> [0] Serve! files=('conventional_power_plants.sqlite',) on port 8001
> [0] [2018-05-20 22:51:51 +0000] [1] [INFO] Goin' Fast @
> [0] [2018-05-20 22:51:51 +0000] [1] [INFO] Starting worker [1]
Finally, let’s give it a nicer URL using now alias:
now alias
The result can now be seen at
Here’s every conventional power plant in Europe faceted by country, energy source and technology.
Implementation notes
I love faceted search engines. One of my first approaches to understanding any new large dataset has long been to throw it into a faceted search engine and see what comes out. I’m the past I’ve built them using Solr, Elasticsearch, PostgreSQL and even Whoosh. I guess it was inevitable that I’d try to build one with SQLite.
You can follow the development of Datasette Facets in the now-closed issue #255 on GitHub.
Facets are requested by appending one or more ?_facet=colname parameters to the URL. This causes Datasette to run the following SQL query for each of those specified columns:
select colname as value, count(*) as count
from tablename where (current where clauses)
group by colname order by count desc limit 31
For large tables, this could get expensive. Datasette supports time limits for SQLite queries, and facets are given up to 200ms (by default, this limit can be customized) to finish executing. If the query doesn’t complete in the given time the user sees a warning that the facet could not be displayed.
We ask for 31 values in the limit clause even though we only display 30. This lets us detect if there are more values available and show a ... indicator to let the user know that the facets were truncated.
Datasette also suggests facets that you might want to apply. This is implemented using another query, this time run against every column that is not yet being used as a facet. If a table has 20 columns this means 20 queries, so they run with an even tighter 50ms time limit. The query looks like this:
select distinct colname
from tablename where (current where clauses)
limit 31
All we are doing here is trying to determine if the column in question has less than 30 unique values. The limit clause here means that if you run this query against a column with entirely distinct values (the primary key for example) the query will terminate extremely quickly—after it has found just the first 31 values.
Once the query has executed, we count the distinct values and check to see if this column, when used as a facet:
Will return 30 or less unique options
Will return more than one unique option
Will return less unique options than the current total number of filtered rows
If the query takes longer than 50ms we terminate it and do not suggest that column as a potential facet.
Facets via JSON
As with everything in Datasette, the facets you can view in your browser are also available as part of the JSON API (which ships with CORS headers so you can easily fetch data from JavaScript running in a browser on any web page).
To get back JSON, add .json to the path (before the first ?). Here’s that power plants example returned as JSON:
Please  let  me  know  if  you  build  something  interesting  with  Datasette  Facets!  from iphone
2 days ago by kgiverson
danistefanovic/build-your-own-x: 🤓 Build your own (insert technology here)
Lots of learning by building your own systems in different programming languages.
programming  Build  Project  Tutorial  JavaScipt 
5 days ago by lost_in_space

« earlier    

related tags

#12)  (byot  -  10  2  2018  2018:  6  6:  7  a  ai  an  and  android  annual  api  app  apple  apps  ar  architecture  arkit  arm  article  articles  aspnetcore  assembly  async  audio  awesome  azure  babel  bazel  beautiful  bicycle  block  boat  bond  bot  bottomed  box  brand  browser  brunson  build-your-own-x  c  can  cart  cd  chrome  ci  cli  clickfunnels  cloud  compatibility  condition  config  configuration  constellation  construction  continuous-delivery  continuous-integration  continuousintegration  create  dat  datasette  deck  deploy  deployment  depth  develop  developers  development  digital  diy  diy:  docker-compose  docker  dotnetcore  dual-blades  dyi  e-commerce  ekranoplane  electric  electron  entry  es6  everything  example  extension  facets!  fb  feeds  fence  feynman  fire  firefox  flat  flutter  for  foundation  funnel  future  gage  games  gaming  gauge  github  glue...  go  golang  gorilla  hardware  holger  home  house  hovercraft  how  howto  html  identify  if  important  income  interesting  ios  iot  java  javascipt  javascript  jenkins  jetboat  jm  joists  js  kiplinger  kitset  know  kubernetes'  ladder  large  learn  learning  let  linux  list  live  mac  machine  magazine  make  marketing  me  metal  microsoft  ml  mobile  modules  monster-hunter  more  mortar  mueller  multiplatform  muscles  music  mvn  mvp  need  needsediting  nextgen  no  node.js  node  nodejs  npm  nsynth  object  of  old  opinion  optimization  osx  out  own  papers  part  peddle  pi  pico8  pit  please  plugin  polycarbonate  portfolio  postgres  practice  product  programming  project  projects  protobuf  push  python  questionable  race  railing  railings  react-native  reddit  reference  release  renovision  research  resources  retailers  retro  return  reviews  rollup  router  rss  ruby  russell  sales  security  seo  serialization  server  shed  shelf  shiny  site  skiff  skirt  something  sound  stairs  startup  strategy  studs  suggestions  synthesizer  tent  test  this  tiny  tips  to  tool  tools  toread  trailer  travis  tumblr  tutorial  uh-14p  used  utility  validation  video  wall  ways  webdev  webextension  website  weights  windows  with  without  wood  wooden  woodworking  wordpress  work  workshop  xamarin  xcode  you  your  youtube  yurt  |   

Copy this bookmark: