copystar + data   70

The Code4Lib Journal – The Wise Use of Statistics in a Library-Oriented Environment
In publications of all kinds, one often finds statements such as “In 2007, there were 361 deadly traffic accidents in Switzerland, which is an increase of 6% compared to 2006” or “Graduate students borrowed 361 items from the library this month, which is an increase of 6% compared to last month’s 340.” Such a statement seems very easy to interpret at first glance. Either the number of loans rises, which is usually considered good [1], or it falls, right? – Not exactly! Let’s have a closer look at the loans example: Every single user decides, more or less independently of the others, whether to borrow an item or not. The probability of a loan transaction can be influenced by weather, upcoming exams, holidays, opening hours, a cute librarian at the desk and a vast number of other things, but it is still a probability. Let’s assume that none of these influencing factors changed over two consecutive weeks. If we treat this gedankenexperiment [2] as a stochastic process, the weekly loan counts would most likely still differ, even though the environment did not change. In case this puzzles you: Roll a die twice. The pip count will most likely be different, even though the environment did not change at all. This is quite a disturbing fact in connection with the above loan statistic, because an increase or decrease is not as clear-cut as we would like.

Just to show how meaningless the declaration of just two numbers is, I will briefly work out what kind of statement can be established from such a declaration.
august 2019 by copystar
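The loans argument above can be made concrete with a short simulation. This is a minimal sketch that rests on an assumption the excerpt does not state: monthly loan counts follow a Poisson distribution, which is the usual model for many independent borrowers who each have a small chance of borrowing.

The sketch does two things:

- It draws two "identical" periods from the same hypothetical rate of 350 loans.
- It applies a standard exact test for two Poisson counts to the article's 340 and 361. The test conditions on the total, which makes the first count binomial.

The function names and the rate of 350 are illustrative. They are not taken from the article.

```python
import math
import random


def poisson_sample(rate: float, rng: random.Random) -> int:
    """Draw one Poisson-distributed count using Knuth's multiplication method.

    Adequate for rates up to a few hundred; exp(-rate) underflows for
    rates above roughly 700.
    """
    threshold = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def equal_rate_p_value(a: int, b: int) -> float:
    """Two-sided exact test of "both periods had the same loan rate".

    If two Poisson counts share one rate, then given their total n = a + b,
    the first count follows Binomial(n, 1/2). The p-value sums the
    probabilities of all splits at least as lopsided as the observed one.
    """
    n = a + b
    observed = abs(a - n / 2)
    # Sum exact integer binomial coefficients, then divide once by 2**n.
    extreme = sum(math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= observed)
    return extreme / 2**n


if __name__ == "__main__":
    rng = random.Random(2007)
    # Two periods with an unchanged (hypothetical) rate of 350 loans each.
    print([poisson_sample(350, rng) for _ in range(2)])
    # The article's example: 340 loans last month, 361 this month (+6%).
    print(round(equal_rate_p_value(361, 340), 3))
```

Under these assumptions the test returns a p-value well above 0.05. A 6% rise from 340 to 361 is therefore entirely consistent with chance variation alone.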
Every network tells a story

Build one quickly and easily to tell yours
march 2019 by copystar
Against Cleaning
Seeing Nonscalability in NYPL’s Crowdsourced Menus Project

Making menus is a scalable process. Although menus are sometimes handwritten or elaborately printed on ribbon-sewn silk, the format of a menu is designed to be scalable. Menus are an efficient typographical vehicle for communicating a set of offerings for often low-margin dining enterprises. Part of the way that we know that menus are scalable is how alike they appear. “Eggs Benedict” or “caviar”, with their accompanying prices may fit interchangeably into the “slots” of the menu’s layout. Within the menus themselves, we also see evidence of the nexus of printing scalability, dish scalability, and cost in, for example, the use of ellipses to express different options: eggs with … cheese, … ham, … tomatoes, etc. The visual evidence of What’s on the Menu? shows us how headings, cover images, epigraphs—for all their surface variations—follow recognizable patterns. These strong genre conventions and the mass production of the menus as physical objects allow us to see and treat them as scaled and scalable.[7]
january 2019 by copystar
Canada Map
I taught my Data Visualization seminar in Philadelphia this past Friday and Saturday. It covers most of the content of my book, including a unit on making maps. The examples in the book are from the United States. But what about other places? Two of the participants were from Canada, and so here’s an example that walks through the process of grabbing a shapefile and converting it to a simple-features object for use in R. A self-contained R project with this code is available on GitHub.
december 2018 by copystar
VOSviewer - Visualizing scientific landscapes
VOSviewer is a software tool for constructing and visualizing bibliometric networks. These networks may for instance include journals, researchers, or individual publications, and they can be constructed based on citation, bibliographic coupling, co-citation, or co-authorship relations. VOSviewer also offers text mining functionality that can be used to construct and visualize co-occurrence networks of important terms extracted from a body of scientific literature.
scholcomm  data  scholcomm-tools 
may 2018 by copystar
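Two of the relation types VOSviewer is described as using are easy to confuse:

- **Co-citation** links two works that are cited together by a third.
- **Bibliographic coupling** links two papers that share a reference.

The following is a minimal sketch of both counts on hypothetical toy data. It only illustrates the definitions. It is not VOSviewer's implementation, and it applies none of VOSviewer's weighting or normalization.

```python
from collections import Counter
from itertools import combinations


def co_citation(citing: dict[str, set[str]]) -> Counter:
    """Count, for each pair of cited works, how many papers cite both."""
    counts: Counter = Counter()
    for refs in citing.values():
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return counts


def bibliographic_coupling(citing: dict[str, set[str]]) -> Counter:
    """Count, for each pair of citing papers, how many references they share."""
    counts: Counter = Counter()
    for a, b in combinations(sorted(citing), 2):
        shared = len(citing[a] & citing[b])
        if shared:
            counts[(a, b)] = shared
    return counts


if __name__ == "__main__":
    # Hypothetical toy data: paper -> the works in its reference list.
    citing = {"P1": {"A", "B", "C"}, "P2": {"A", "B"}, "P3": {"B", "C"}}
    print(co_citation(citing))
    print(bibliographic_coupling(citing))
```

The co-citation function computes term co-occurrence unchanged if each set holds the terms extracted from one document rather than its references.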
My Favorite Tool: Rasterio
New on the blog: Robert Sare on the benefits of rasterio for earth sciences research #Python #spatialdata
data  geo 
november 2017 by copystar
Visualization: Bubble Chart  |  Charts  |  Google Developers
We use a bubble chart to show this kind of data, where the X-axis values (1-7) show the day of the week, the Y-axis values (0-23) show the hour of the day, and the bubble size and color show patrons per hour. In the Google version cited, size and color can vary independently to show different values, but we found that mapping both to the same value works well, e.g. a small yellow bubble shows low usage and a large red bubble shows high usage.
may 2017 by copystar
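The data preparation behind a chart like this can be sketched in a few lines. The sketch makes these assumptions:

- The input is a hypothetical list of patron visit timestamps, for example from a gate counter.
- Day 1 is Monday, using Python's ISO weekday. The note above only says "1-7".
- The rows follow the bubble chart's data format: ID, X, Y, colour, then size.
- The rows, header included, could be handed to `google.visualization.arrayToDataTable` on the JavaScript side.

```python
from collections import Counter
from datetime import datetime


def bubble_rows(visits: list[datetime]) -> list[list]:
    """Turn patron visit timestamps into bubble-chart rows.

    Column order: ID, X (ISO day of week, 1 = Monday ... 7 = Sunday),
    Y (hour of day, 0-23), colour value, size value. The same patron
    count feeds both colour and size, as in the note above.
    """
    counts = Counter((ts.isoweekday(), ts.hour) for ts in visits)
    rows: list[list] = [["ID", "Day of week", "Hour of day", "Patrons", "Patrons (size)"]]
    for (day, hour), n in sorted(counts.items()):
        rows.append([f"{day}-{hour:02d}", day, hour, n, n])
    return rows


if __name__ == "__main__":
    # Hypothetical visits: two on a Monday morning, one on a Tuesday afternoon.
    visits = [datetime(2017, 5, 1, 9, 15), datetime(2017, 5, 1, 9, 45), datetime(2017, 5, 2, 14, 0)]
    for row in bubble_rows(visits):
        print(row)
```

If the timestamps span several weeks, each cell holds a total rather than a per-hour figure. Dividing by the number of weeks gives an average.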
What Facebook learned when it opened its data to every employee - O'Reilly Media
The democratization of data is one of the most powerful ideas to come out of data science. Everyone in an organization should have access to as much data as legally possible.
data  opendata 
may 2017 by copystar
history-timeline/ at master · ybogdanov/history-timeline · GitHub
It is inspired by Wait But Why's blog post about Horizontal History: the idea of taking a "horizontal" slice of time and tracing the lifetimes of all the famous people living at that time. It certainly gives you a fresh perspective on a particular era (a feel for the time, so to speak), unlike the conventional “vertical” approach of learning who came after whom and what happened after what. I can imagine how much fun the blog’s author had drawing all of those lifetime rectangles in the Numbers spreadsheet, but simple graphics have their limitations, and a lot of famous people simply didn’t make “the cut”. I wanted to play with the concept on a bigger scale. The motivation of this project is to make the idea expandable, interactive, and crowd-sourced by leveraging modern software engineering tools and approaches.
may 2017 by copystar
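The "horizontal slice" idea reduces to an interval-overlap query over lifetimes. The following is a minimal sketch under these assumptions:

- Lifetimes are whole birth and death years.
- The people are hypothetical placeholders.
- The text "rectangles" only echo the spreadsheet drawings mentioned above. They do not reproduce how the project itself renders anything.

```python
from typing import NamedTuple


class Life(NamedTuple):
    name: str
    born: int
    died: int


def alive_during(lives: list[Life], start: int, end: int) -> list[Life]:
    """Return everyone whose lifetime overlaps the years start..end (inclusive),
    ordered by birth year: one horizontal slice of history."""
    overlapping = [p for p in lives if p.born <= end and p.died >= start]
    return sorted(overlapping, key=lambda p: p.born)


def render(lives: list[Life], start: int, end: int, step: int = 10) -> str:
    """Draw each lifetime in the slice as a bar: one column per `step` years,
    '#' where the person was alive during that column's years, '.' otherwise."""
    lines = []
    for p in alive_during(lives, start, end):
        cells = "".join(
            "#" if p.born <= y + step - 1 and p.died >= y else "."
            for y in range(start, end + 1, step)
        )
        lines.append(f"{p.name:<10} {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical people, not real historical figures.
    lives = [Life("Person A", 1600, 1650), Life("Person B", 1640, 1700), Life("Person C", 1705, 1760)]
    print(render(lives, 1600, 1690))
```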
Welcome to
This site is a collection of graphs showcasing the bike counts from the counters set up by the City along with some weather data from Environment Canada. The data is automatically scraped once a day from the respective sites.

Counts and Weather - Daily counts and the high for the day
Trending Averages - Rolling monthly average for each counter.
Individual Daily Graphs - Each installation measured in trips/hr.
Cumulative Counts - Cumulative totals since January 1, 2015.
Data Examples - Counts from each installation with hourly weather information.

For more details on what data is available, please check out the README on GitHub.

Thanks to the City of Calgary, Eco-Counter, and Environment Canada for making the data available.

Calgary Bike Counter Page
Peace Bridge - Bikes and Pedestrians
Stephen Ave
7th St.
5th St. - North Leg, Under the Tracks, South Leg
12th Ave. - West Leg, Central Memorial, East Leg
8th Ave. - West Leg, Centre
9th Ave. - NMC


The graphs are generated by Grafana, with Graphite/Whisper as the data store and Python scripts to scrape the data. The scripts and setup details are available on GitHub.

There are a couple of caveats. First, Grafana will sometimes bin results, so the daily counts can be off by 1 or 2, with the extra 1 or 2 showing up in the day before or the day after. Second, although the dashboards are set to correct for the browser's time zone, daylight saving time makes the hourly and 15-minute graphs appear an hour earlier than reality (MDT is UTC-6, not UTC-7).
data  civictech 
may 2017 by copystar
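A scraped count typically reaches Graphite as one line of its plaintext protocol, `<metric path> <value> <Unix timestamp>`, sent to Carbon's default plaintext port 2003. The following is a minimal sketch of that step. It is not the site's actual scripts, and the metric path and reading are hypothetical.

The sketch also shows the one-hour shift behind the daylight saving caveat. The same stored timestamp reads as 08:00 with the summer offset (MDT, UTC-6) but as 07:00 if it is displayed with the winter offset (MST, UTC-7).

```python
import socket
from datetime import datetime, timedelta, timezone

# Fixed offsets for Calgary's two seasons (no tz database needed for this sketch).
MST = timezone(timedelta(hours=-7), "MST")  # Mountain Standard Time, winter
MDT = timezone(timedelta(hours=-6), "MDT")  # Mountain Daylight Time, summer


def graphite_line(path: str, value: float, when: datetime) -> str:
    """Format one sample in Graphite's plaintext protocol: "<path> <value> <epoch>\\n"."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the epoch is unambiguous")
    return f"{path} {value} {int(when.timestamp())}\n"


def send_to_graphite(lines: list[str], host: str = "localhost", port: int = 2003) -> None:
    """Send plaintext lines to a Carbon listener (2003 is Carbon's default plaintext port)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    # Hypothetical reading: 42 trips at 08:00 local time on a summer morning.
    reading = datetime(2017, 5, 1, 8, 0, tzinfo=MDT)
    print(graphite_line("bikes.example_counter.hourly", 42, reading), end="")
    epoch = reading.timestamp()
    print(datetime.fromtimestamp(epoch, MDT).hour)  # 8: shown with the summer offset
    print(datetime.fromtimestamp(epoch, MST).hour)  # 7: shown with the winter offset, an hour early
```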
Extract data from any website with 1 Click with Data Miner
Data Miner is an add-on for the Google Chrome browser that helps you extract data from web pages into an Excel spreadsheet or CSV file.
april 2017 by copystar
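For simple cases, the same page-to-CSV step can also be scripted. The following is a minimal standard-library sketch. It is not how Data Miner works, and it makes these assumptions:

- The data sits in plain HTML `<table>` markup.
- Nested tables would be flattened incorrectly.
- An omitted `</td>` or `</tr>` is tolerated.

```python
import csv
import io
from html.parser import HTMLParser
from typing import Optional


class TableParser(HTMLParser):
    """Collect the text of <th>/<td> cells, row by row, from the tables on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list = []
        self._row: Optional[list] = None
        self._cell: Optional[list] = None

    def _close_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self) -> None:
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html: str) -> str:
    """Return every table row found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    page = "<table><tr><th>Site</th><th>Count</th></tr><tr><td>Example</td><td>1,204</td></tr></table>"
    print(table_to_csv(page))
```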
GitHub - nrchtct/monadicexploration: Seeing the Whole Through Its Parts
Monadic exploration is a new approach to interacting with relational information spaces that challenges the distinction between the whole and its parts. Building on the work of sociologists Gabriel Tarde and Bruno Latour we turn to the concept of the monad as a useful lens on online communities and collections that expands the possibility for creating meaning in their navigation. While existing interfaces tend to emphasize either the structure of the whole or details of a part, monadic exploration brings these opposing perspectives closer together in continuous movements between partially overlapping points of view. The resulting visualization reflects a given node’s relative position within a network using radial displacements and visual folding.
july 2016 by copystar
OSF | Home
*very interesting* It makes me wonder where we should similarly go: research enviros like Islandora or
data  opendata  openaccess  from twitter
june 2016 by copystar
Tools for Thought
visualizations using WebGL
march 2016 by copystar
Have a repository full of Jupyter notebooks? With Binder, you can add a badge that opens those notebooks in an executable environment, making your code immediately reproducible by anyone, anywhere.
codelearning  data 
february 2016 by copystar
Networking the Republic of Letters, 1550-1750
Our new data structures will be made discoverable and manageable within the existing front end (implemented in Python/Pylons with search provided by Apache Solr). This means new search forms, a restructuring of existing forms, updated filtering and faceting, updated browse lists, and – most excitingly – a reconfiguration of profile pages for letters, people, publications, and objects as dynamic, aggregated, configurable streams in which personal and intellectual events – letters sent and received, books and pamphlets written, rites of passage, professional associations, and so on – are seamlessly intertwined. For example, this Hartlibian person profile will be reconceived as something like the following, with underlined items indicating links to the relevant entity (which would also, in turn, be presented in streamed format):
february 2016 by copystar
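The "aggregated, configurable streams" described above amount to merging several date-ordered event lists into one timeline, optionally filtered by event type. The following is a minimal sketch under these assumptions:

- Each source list is already sorted by date.
- The events and the type labels are hypothetical.
- The project itself uses Python/Pylons with Solr, and this is not its code.

```python
import heapq
from datetime import date
from typing import Iterable, NamedTuple, Optional


class Event(NamedTuple):
    when: date
    kind: str   # e.g. "letter sent", "publication", "professional association"
    label: str


def profile_stream(*sources: Iterable[Event], kinds: Optional[set] = None) -> list[Event]:
    """Interleave date-sorted event lists into one chronological stream.

    heapq.merge relies on each source already being sorted by date and
    consumes them lazily instead of concatenating and re-sorting.
    `kinds`, if given, keeps only those event types (the "configurable" part).
    """
    merged = heapq.merge(*sources, key=lambda e: e.when)
    return [e for e in merged if kinds is None or e.kind in kinds]


if __name__ == "__main__":
    # Hypothetical events for one person profile.
    letters = [
        Event(date(1640, 1, 1), "letter sent", "to correspondent X"),
        Event(date(1645, 6, 1), "letter received", "from correspondent Y"),
    ]
    publications = [Event(date(1642, 3, 1), "publication", "pamphlet Z")]
    for event in profile_stream(letters, publications):
        print(event.when.isoformat(), event.kind, event.label)
```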