tsuomela + data-science 124
First Python Notebook — First Python Notebook 1.0 documentation
9 weeks ago by tsuomela
"A step-by-step guide to analyzing data with Python and the Jupyter Notebook."
data-science
python
programming
notebook
SciServer – Collaborative data-driven science
10 weeks ago by tsuomela
"SciServer is a revolutionary new approach to doing science by bringing the analysis to the data. SciServer consists of data hosting services coupled with integrated Tools that work together to create a full-featured system."
data-science
big-data
[1710.00027v1] Toward a System Building Agenda for Data Integration
november 2018 by tsuomela
"In this paper we argue that the data management community should devote far more effort to building data integration (DI) systems, in order to truly advance the field. Toward this goal, we make three contributions. First, we draw on our recent industrial experience to discuss the limitations of current DI systems. Second, we propose an agenda to build a new kind of DI systems to address these limitations. These systems guide users through the DI workflow, step by step. They provide tools to address the "pain points" of the steps, and tools are built on top of the Python data science and Big Data ecosystem (PyData). We discuss how to foster an ecosystem of such tools within PyData, then use it to build DI systems for collaborative/cloud/crowd/lay user settings. Finally, we discuss ongoing work at Wisconsin, which suggests that these DI systems are highly promising and building them raises many interesting research challenges."
research-data
management
integration
sharing
data-science
Enigma Labs | Temperature Anomalies
september 2018 by tsuomela
"Every day, the Global Historical Climatology Network collects temperatures from 90,000 weather stations. Dating back as far as the late 1700's, the records provide an incredible source of insight into our changing climate. Using this data, we can determine what the weather is normally like for most places on Earth. We can tell you that the average low temperature in New York City on January 11th is 29°F and that the average high temperature in Los Angeles on July 24th is 80°F. Once we know what temperatures to expect on any given day with a certain degree of confidence, we can sift out the uneventful days, leaving only anomalous weather events."
data-science
demonstration
weather
environment
temperature
climate-change
public-data
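The Enigma Labs description outlines a two-step idea: first establish what temperature is normal for a given place and calendar day, then sift out the days that fall far outside that range. Below is a minimal Python sketch of that idea, not Enigma Labs' actual method. It assumes daily readings for a single station given as `(date, temperature)` pairs, uses the mean and sample standard deviation for each calendar day as the "normal", and treats anything beyond an assumed z-score cutoff of `k = 2` as anomalous. The function names and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev


def daily_normals(records):
    """Group (date, temp) readings by calendar day and return
    {(month, day): (mean, sample standard deviation)}.

    Calendar days with fewer than two readings are skipped, because a
    standard deviation needs at least two values.
    """
    by_day = defaultdict(list)
    for d, temp in records:
        by_day[(d.month, d.day)].append(temp)
    return {
        key: (mean(temps), stdev(temps))
        for key, temps in by_day.items()
        if len(temps) >= 2
    }


def find_anomalies(records, normals, k=2.0):
    """Return (date, temp, z) for readings more than k standard deviations
    away from that calendar day's normal (k=2.0 is an assumed cutoff)."""
    anomalies = []
    for d, temp in records:
        stats = normals.get((d.month, d.day))
        if stats is None:
            continue  # no baseline for this calendar day
        mu, sigma = stats
        if sigma == 0:
            continue  # all past readings identical; z-score undefined
        z = (temp - mu) / sigma
        if abs(z) > k:
            anomalies.append((d, temp, z))
    return anomalies


if __name__ == "__main__":
    # Hypothetical January 11 lows (°F) for one station, 2000-2004.
    history = [(date(2000 + i, 1, 11), t) for i, t in enumerate([27, 29, 31, 28, 30])]
    normals = daily_normals(history)
    new_readings = [(date(2005, 1, 11), 40), (date(2006, 1, 11), 29.5)]
    # Only the 40°F reading lies more than 2 standard deviations from the mean.
    print(find_anomalies(new_readings, normals))
```

A real climatology would pool many years and stations and would likely smooth the normals across neighboring days; the per-day grouping here only shows the shape of the computation.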
How to Make Better-Looking, More Readable Charts in R | FlowingData
august 2018 by tsuomela
"Defaults are generalized settings to work with many datasets. This is fine for analysis, but data graphics for presentation benefit from context-specific design."
data-science
visualization
r
tutorial
Data Love - The Seduction and Betrayal of Digital Technologies | Columbia University Press
august 2018 by tsuomela
"Intelligence services, government administrations, businesses, and a growing majority of the population are hooked on the idea that big data can reveal patterns and correlations in everyday life. Initiated by software engineers and carried out through algorithms, the mining of big data has sparked a silent revolution. But algorithmic analysis and data mining are not simply byproducts of media development or the logical consequences of computation. They are the radicalization of the Enlightenment's quest for knowledge and progress. Data Love argues that the "cold civil war" of big data is taking place not among citizens or between the citizen and government but within each of us. Roberto Simanowski elaborates on the changes data love has brought to the human condition while exploring the entanglements of those who—out of stinginess, convenience, ignorance, narcissism, or passion—contribute to the amassing of ever more data about their lives, leading to the statistical evaluation and individual profiling of their selves. Writing from a philosophical standpoint, Simanowski illustrates the social implications of technological development and retrieves the concepts, events, and cultural artifacts of past centuries to help decode the programming of our present."
book
publisher
data-science
data-mining
epistemology
CRAN - Package cdparcoord
september 2017 by tsuomela
For visualizing parallel coordinate plots.
data-science
visualization
parallel
library
Data, a first-class research output
may 2017 by tsuomela
"The Make Data Count (MDC) project is funded by the Alfred P. Sloan Foundation to develop and deploy the social and technical infrastructure necessary to elevate data to a first-class research output alongside more traditional products, such as publications. It will run between May 2017 and April 2019. The project will address the significant social as well as technical barriers to widespread incorporation of data-level metrics in the research data management ecosystem through consultation, recommendation, new technical capability, and community outreach. Project work will build upon long-standing partner initiatives supporting research data management and DLM, leverage prior Sloan investments in key technologies such as Lagotto, and enlist the cooperation of the research, library, funder, and publishing stakeholder communities."
research-data
management
metrics
altmetrics
data-science
data
publishing
scholarly-communication
Data School
april 2017 by tsuomela
"My name is Kevin Markham. I'm a data scientist and a teacher. Previously, I was the lead instructor for General Assembly's 11-week data science course in Washington, DC, as well as an instructor fellow, responsible for training and mentoring new data science instructors. I have over 400 hours of classroom experience teaching data science in Python. Here are testimonials about my teaching."
weblog-individual
courses
statistics
data-science
CS109 Data Science
april 2017 by tsuomela
"Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries."
data-science
courses
open-education
school(Harvard)
Welcome to the NEON Data Skills Portal! – NEON Data Skills
april 2017 by tsuomela
"This site contains data lessons, background materials and other resources that support working with large spatio-temporal datasets, like those offered by the NEON project. We welcome any comments and feedback that you have and also materials that support or expand upon what’s available on this site!"
data-science
education
data-sources
tutorials
pedagogy
lessons
[1502.05256] Cultural Anthropology Through the Lens of Wikipedia - A Comparison of Historical Leadership Networks in the English, Chinese, Japanese and German Wikipedia
april 2017 by tsuomela
"In this paper we study the differences in historical worldview between Western and Eastern cultures, represented through the English, Chinese, Japanese, and German Wikipedia. In particular, we analyze the historical networks of the World's leaders since the beginning of written history, comparing them in the four different Wikipedias."
anthropology
data-science
computational-science
NSI | Deeper Analyses. Clarifying Insights. Better Decisions.
april 2017 by tsuomela
"NSI is a professional services firm specializing in multidisciplinary data-driven analytics. Our niche is helping clients better and more reliably understand people and their behaviors on critical, complex decision-making problems. Our highly diverse and experienced team of researchers and analysts apply multidisciplinary social science techniques and various analytic methods to create deeper analyses and clarifying insights enabling our clients to make more informed and better decisions."
business
consulting
data-science
social-science
Reproducible Data Analysis in Jupyter | Pythonic Perambulations
march 2017 by tsuomela
"Python Data Science Handbook"
python
programming
ipython
data-science
GitHub - Factual/drake: Data workflow tool, like a "Make for data"
december 2016 by tsuomela
"Drake is a simple-to-use, extensible, text-based data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs and Drake automatically resolves their dependencies and calculates: which commands to execute (based on file timestamps); in what order to execute the commands (based on dependencies). Drake is similar to GNU Make, but designed especially for data workflow management. It has HDFS support, allows multiple inputs and outputs, and includes a host of features designed to help you bring sanity to your otherwise chaotic data processing workflows."
data-science
research
automation
scripting
reproducible
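The Drake description names the two decisions a "Make for data" tool has to make: which steps to run (by comparing file timestamps) and in what order (by following input/output dependencies). Below is a minimal Python sketch of those two decisions. It is not Drake's implementation or workflow syntax. The `Step` class, `is_stale`, and `plan` are hypothetical names, and the staleness rule is an assumption modeled on GNU Make: a step runs if an output is missing, if any input is newer than its oldest output, or if an upstream step is about to regenerate one of its inputs. It relies on `graphlib.TopologicalSorter` from the Python 3.9+ standard library, which raises `CycleError` if the steps form a cycle.

```python
import os
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Optional


@dataclass
class Step:
    """One hypothetical workflow step: the files it reads and writes."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def file_mtime(path: str) -> Optional[float]:
    """Modification time of a file, or None if it does not exist."""
    try:
        return os.path.getmtime(path)
    except FileNotFoundError:
        return None


def is_stale(step: Step, rebuilt: set, mtime: Callable) -> bool:
    """Decide Make-style whether a step must run."""
    if not step.outputs:
        return True  # nothing to compare against, so always run
    out_times = [mtime(o) for o in step.outputs]
    if any(t is None for t in out_times):
        return True  # an output does not exist yet
    if any(i in rebuilt for i in step.inputs):
        return True  # an upstream step will regenerate one of our inputs
    in_times = [t for t in (mtime(i) for i in step.inputs) if t is not None]
    return bool(in_times) and max(in_times) > min(out_times)


def plan(steps: list, mtime: Callable = file_mtime) -> list:
    """Return the names of the steps to run, in dependency order."""
    producer = {out: s.name for s in steps for out in s.outputs}
    # Map each step to the steps that produce its inputs (its predecessors).
    graph = {s.name: {producer[i] for i in s.inputs if i in producer} for s in steps}
    by_name = {s.name: s for s in steps}
    rebuilt, order = set(), []
    for name in TopologicalSorter(graph).static_order():
        step = by_name[name]
        if is_stale(step, rebuilt, mtime):
            order.append(name)
            rebuilt.update(step.outputs)
    return order
```

For example, with hypothetical steps `download` (writes `raw.csv`), `clean` (`raw.csv` to `clean.csv`), and `report` (`clean.csv` to `report.html`), touching `raw.csv` so it is newer than `clean.csv` makes `plan` return `["clean", "report"]`: `clean` is stale by timestamp, and `report` reruns because its input is being rebuilt. The `mtime` parameter is injectable so the logic can be exercised without touching the filesystem.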
The Engine Room
december 2016 by tsuomela
"The Engine Room helps activists, organisations, and other social change agents make the most of data and technology to increase their impact. We are a non-profit organisation ourselves, and our international team is made up of experienced and committed practitioners. Since 2011, we have supported more than 200 organisations, big and small, from every corner of the globe. Technology and data have the potential to dramatically accelerate the impact of any group or organisation that promotes equality, justice, human rights, good governance and accountability."
data-science
analysis
data
non-profit
activism
related tags
3d, academic, activism, altmetrics, analysis, analytics, anthropology, api, apprenticeship, archives, assessment, astronomy, automation, bash, best-practices, big-data, book, books, boundaries, business, campaign, challenge, checklist, citizen-science, cleaning, cleanup, climate-change, command-line, commercial, communication, competition, computational-science, computer-science, conference, consulting, country(China), courses, criticism, crowdsourcing, culturomics, data, data-curation, data-exploration, data-management, data-mining, data-science, data-sharing, data-sources, database, demonstration, description, design, digital-humanities, discipline, distribution, diy, economics, education, election, environment, epistemology, ethics, etl, example, examples, exploration, facebook, files, genetics, google, graduate-student, graphics, historiography, history, hubris, human-subjects, humanities, ide, ideology, illustration, infographics, information-science, infrastructure, integration, intuition, ipython, journal, language, languages, learning, lessons, libraries, library, list, machine-learning, management, manifesto, mathematics, metaphor, methodology, methods, metrics, modeling, models, news, non-profit, notebook, online, open-education, ornithology, package, parallel, pdf, pedagogy, philosophy, podcast, polling, practice, preservation, privacy, probability, products, professional-association, programming, psychology, public-data, public-understanding, publisher, publishing, python, r, reading, recommendation-systems, recommendations, reference, report, reproducible, research, research-data, resources, rhetoric, roles, scholarly-communication, school(Harvard), science, scripting, sentiment, sharing, shell, skills, social-media, social-science, software, Spreadsheet, sql, standards, startup, statistics, stewardship, sts, survey, syllabi, teaching, temperature, templates, text-analysis, textbook, time-series, timeline, tips, tool, tools, topic-modeling, tutorial, tutorials, twitter, uncertainty, understanding, unix, video, visualization, weather, web, weblog-individual, weblog-organization