tsuomela + data-science   124

First Python Notebook — First Python Notebook 1.0 documentation
"A step-by-step guide to analyzing data with Python and the Jupyter Notebook."
data-science  python  programming  notebook 
9 weeks ago by tsuomela
SciServer – Collaborative data-driven science
"SciServer is a revolutionary new approach to doing science by bringing the analysis to the data. SciServer consists of data hosting services coupled with integrated Tools that work together to create a full-featured system."
data-science  big-data 
10 weeks ago by tsuomela
[1710.00027v1] Toward a System Building Agenda for Data Integration
"In this paper we argue that the data management community should devote far more effort to building data integration (DI) systems, in order to truly advance the field. Toward this goal, we make three contributions. First, we draw on our recent industrial experience to discuss the limitations of current DI systems. Second, we propose an agenda to build a new kind of DI systems to address these limitations. These systems guide users through the DI workflow, step by step. They provide tools to address the "pain points" of the steps, and tools are built on top of the Python data science and Big Data ecosystem (PyData). We discuss how to foster an ecosystem of such tools within PyData, then use it to build DI systems for collaborative/cloud/crowd/lay user settings. Finally, we discuss ongoing work at Wisconsin, which suggests that these DI systems are highly promising and building them raises many interesting research challenges."
research-data  management  integration  sharing  data-science 
november 2018 by tsuomela
Enigma Labs | Temperature Anomalies
"Every day, the Global Historical Climatology Network collects temperatures from 90,000 weather stations. Dating back as far as the late 1700's, the records provide an incredible source of insight into our changing climate. Using this data, we can determine what the weather is normally like for most places on Earth. We can tell you that the average low temperature in New York City on January 11th is 29°F and that the average high temperature in Los Angeles on July 24th is 80°F. Once we know what temperatures to expect on any given day with a certain degree of confidence, we can sift out the uneventful days, leaving only anomalous weather events."
data-science  demonstration  weather  environment  temperature  climate-change  public-data 
september 2018 by tsuomela
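The description above outlines a two-step idea: establish what is normal for a given calendar day, then keep only the days that deviate from it. A minimal sketch of that idea follows. The z-score rule, the `threshold` parameter, the function name `find_anomalies`, and the `(year, month, day, temp)` tuple layout are all assumptions made for illustration; the source does not say how Enigma Labs actually defines an anomaly.

```python
from collections import defaultdict
from statistics import mean, stdev


def find_anomalies(readings, threshold=2.0):
    """Return readings that deviate strongly from their calendar day's norm.

    readings: list of (year, month, day, temp) tuples for one station
              (hypothetical layout, not the GHCN file format).
    threshold: number of sample standard deviations that counts as
               anomalous (an assumed cutoff).
    """
    # Group temperatures by calendar day across all years.
    by_day = defaultdict(list)
    for _year, month, day, temp in readings:
        by_day[(month, day)].append(temp)

    anomalies = []
    for year, month, day, temp in readings:
        temps = by_day[(month, day)]
        if len(temps) < 2:
            continue  # not enough history to estimate a spread
        mu, sigma = mean(temps), stdev(temps)
        # Keep only readings far from the expected value for that day.
        if sigma > 0 and abs(temp - mu) > threshold * sigma:
            anomalies.append((year, month, day, temp))
    return anomalies
```

Grouping by `(month, day)` is what makes a reading comparable only against the same date in other years, rather than against the whole record.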
How to Make Better-Looking, More Readable Charts in R | FlowingData
"Defaults are generalized settings to work with many datasets. This is fine for analysis, but data graphics for presentation benefit from context-specific design."
data-science  visualization  r  tutorial 
august 2018 by tsuomela
Data Love - The Seduction and Betrayal of Digital Technologies | Columbia University Press
"Intelligence services, government administrations, businesses, and a growing majority of the population are hooked on the idea that big data can reveal patterns and correlations in everyday life. Initiated by software engineers and carried out through algorithms, the mining of big data has sparked a silent revolution. But algorithmic analysis and data mining are not simply byproducts of media development or the logical consequences of computation. They are the radicalization of the Enlightenment's quest for knowledge and progress. Data Love argues that the "cold civil war" of big data is taking place not among citizens or between the citizen and government but within each of us. Roberto Simanowski elaborates on the changes data love has brought to the human condition while exploring the entanglements of those who—out of stinginess, convenience, ignorance, narcissism, or passion—contribute to the amassing of ever more data about their lives, leading to the statistical evaluation and individual profiling of their selves. Writing from a philosophical standpoint, Simanowski illustrates the social implications of technological development and retrieves the concepts, events, and cultural artifacts of past centuries to help decode the programming of our present."
book  publisher  data-science  data-mining  epistemology 
august 2018 by tsuomela
Data, a first-class research output
"The Make Data Count (MDC) project is funded by the Alfred P. Sloan Foundation to develop and deploy the social and technical infrastructure necessary to elevate data to a first-class research output alongside more traditional products, such as publications. It will run between May 2017 and April 2019. The project will address the significant social as well as technical barriers to widespread incorporation of data-level metrics in the research data management ecosystem through consultation, recommendation, new technical capability, and community outreach. Project work will build upon long-standing partner initiatives supporting research data management and DLM, leverage prior Sloan investments in key technologies such as Lagotto, and enlist the cooperation of the research, library, funder, and publishing stakeholder communities."
research-data  management  metrics  altmetrics  data-science  data  publishing  scholarly-communication 
may 2017 by tsuomela
Data School
"My name is Kevin Markham. I'm a data scientist and a teacher. Previously, I was the lead instructor for General Assembly's 11-week data science course in Washington, DC, as well as an instructor fellow, responsible for training and mentoring new data science instructors. I have over 400 hours of classroom experience teaching data science in Python. Here are testimonials about my teaching."
weblog-individual  courses  statistics  data-science 
april 2017 by tsuomela
CS109 Data Science
"Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries."
data-science  courses  open-education  school(Harvard) 
april 2017 by tsuomela
Welcome to the NEON Data Skills Portal! – NEON Data Skills
"This site contains data lessons, background materials and other resources that support working with large spatio-temporal datasets, like those offered by the NEON project. We welcome any comments and feedback that you have and also materials that support or expand upon what’s available on this site!"
data-science  education  data-sources  tutorials  pedagogy  lessons 
april 2017 by tsuomela
[1502.05256] Cultural Anthropology Through the Lens of Wikipedia - A Comparison of Historical Leadership Networks in the English, Chinese, Japanese and German Wikipedia
"In this paper we study the differences in historical worldview between Western and Eastern cultures, represented through the English, Chinese, Japanese, and German Wikipedia. In particular, we analyze the historical networks of the World's leaders since the beginning of written history, comparing them in the four different Wikipedias."
anthropology  data-science  computational-science 
april 2017 by tsuomela
NSI | Deeper Analyses. Clarifying Insights. Better Decisions.
"NSI is a professional services firm specializing in multidisciplinary data-driven analytics. Our niche is helping clients better and more reliably understand people and their behaviors on critical, complex decision-making problems. Our highly diverse and experienced team of researchers and analysts apply multidisciplinary social science techniques and various analytic methods to create deeper analyses and clarifying insights enabling our clients to make more informed and better decisions."
business  consulting  data-science  social-science 
april 2017 by tsuomela
GitHub - Factual/drake: Data workflow tool, like a "Make for data"
"Drake is a simple-to-use, extensible, text-based data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs and Drake automatically resolves their dependencies and calculates: which commands to execute (based on file timestamps); in what order to execute the commands (based on dependencies). Drake is similar to GNU Make, but designed especially for data workflow management. It has HDFS support, allows multiple inputs and outputs, and includes a host of features designed to help you bring sanity to your otherwise chaotic data processing workflows."
data-science  research  automation  scripting  reproducible 
december 2016 by tsuomela
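The Drake description names two decisions a Make-style workflow tool has to take: which steps to run (from file timestamps) and in what order (from dependencies). The sketch below illustrates that general technique in Python. It is not Drake's implementation or its workflow syntax: the `steps` and `mtimes` dictionaries and the functions `is_stale` and `plan` are hypothetical names invented for this example, and timestamps are passed in as plain numbers so the logic can be followed without touching the file system.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def is_stale(inputs, outputs, mtimes):
    """A step is stale if an output is missing or older than some input.

    mtimes maps each existing file path to its modification time.
    """
    if not outputs or any(out not in mtimes for out in outputs):
        return True
    newest_input = max(
        (mtimes[i] for i in inputs if i in mtimes), default=float("-inf")
    )
    oldest_output = min(mtimes[out] for out in outputs)
    return newest_input > oldest_output


def plan(steps, mtimes):
    """Return the names of the steps to run, in a valid execution order.

    steps maps a step name to (list_of_inputs, list_of_outputs).
    """
    # Which step produces each file?
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    # A step depends on every step that produces one of its inputs.
    deps = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in steps.items()
    }
    to_run = []
    for name in TopologicalSorter(deps).static_order():
        ins, outs = steps[name]
        # Run a step if it is stale itself or if anything upstream reruns.
        if is_stale(ins, outs, mtimes) or any(d in to_run for d in deps[name]):
            to_run.append(name)
    return to_run
```

For example, with a `clean` step that turns `raw.csv` into `clean.csv` and a `report` step that turns `clean.csv` into `report.txt`, touching `raw.csv` makes both steps run, with `clean` first, because the rerun propagates downstream.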
The Engine Room
"The Engine Room helps activists, organisations, and other social change agents make the most of data and technology to increase their impact. We are a non-profit organisation ourselves, and our international team is made up of experienced and committed practitioners. Since 2011, we have supported more than 200 organisations, big and small, from every corner of the globe. Technology and data have the potential to dramatically accelerate the impact of any group or organisation that promotes equality, justice, human rights, good governance and accountability."
data-science  analysis  data  non-profit  activism 
december 2016 by tsuomela

