big_data   4007


Sarnia
Sarnia allows a Drupal site to interact with and display data from Solr cores with arbitrary (non-Drupal) schemas, mainly by building Views. This is useful for Solr cores that index large, external (non-Drupal) datasets that are either not practical to store in Drupal or may already be indexed in Solr.

Sarnia treats records from Solr as Drupal entities, although listing, filtering, and displaying Sarnia entities should be done using Views.
search  solr  nutch  big_data 
yesterday by liberatr
Common Crawl
We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone.
search  big_data  open_data 
2 days ago by liberatr
GitHub - IntelLabs/hpat
High Performance Analytics Toolkit (HPAT) is a compiler-based framework for big data analytics and machine learning on cluster/cloud environments that is both easy to use and extremely fast; it is orders of magnitude faster than alternatives like Apache Spark.
analytics  big_data  machine_learning 
6 days ago by edgaron
The Greatest Number - Triple Canopy
Theodore Porter, a historian of science at UCLA, describes the widespread adoption of quantification—the foundation of today’s algorithmic number-crunching—in Trust in Numbers: The Pursuit of Objectivity in Science and Public Life (1995). He tells of nineteenth-century accountants and actuaries who distinguished themselves through their pursuit of objectivity, and the scores of professions that subsequently sought the same credibility and authority via tabulation. He also observes the ramifications of filtering all kinds of natural and social phenomena through numeric measurements.....

In Trust in Numbers, you chronicle the adoption of quantitative methodologies across a wide range of scientific and political domains, from British accountants and actuaries and French civil engineers in the nineteenth century to American ones in the twentieth. In each case, those who claimed the mantle of objectivity accrued power. Were these professions borrowing from the natural sciences? Or did these forms of quantification emerge from the burgeoning social sciences? And how did the spread of quantification legitimize the work of early social scientists?
THEODORE PORTER These quantitative approaches actually emerged within a variety of institutions, and never simply by imitating science or claiming the prestige of science. The methods of accountants, bookkeepers, and economists have their histories, worked out as solutions to their own problems. They aren’t alien impositions, imported wholesale from academic science; yet it was important to these professionals to achieve the dignity of science, which they construed in terms of uniform and rigorous calculation....

Part of the danger of automating decision-making processes and downplaying human intuition has to do with what you call, in Trust in Numbers, the “moral distance encouraged by a quantitative method.”... in the early twentieth century in the United States, “middle-class philanthropists and social workers used statistics to learn about kinds of people whom they did not know, and often did not care to know, as persons.” How can the benefits of quantification be weighed against the diminution of empathy for—or a true understanding of the conditions of—the people being analyzed?...

In your article “Thin Description: Surface and Depth in Science and Science Studies” (2012), you note, “Statistics in the human domain retains an element of its primal meaning, state-istics, the descriptive science of the state.”

PORTER Yes, but today private businesses are also able to collect massive amounts of data. Instead of the centralized, planned counts of government censuses and surveys, they prefer chaotic counts of data drawn from transactions as they happen. Instead of a scientific approach to research, they rely on the outputs that result from social-media interactions, purchases, clicks on online ads, time spent on websites, and so on. People in the technology industry are extremely proud of the disruption represented in this move away from the centralized planned count...

The historian Daniel Rosenberg points out that “data” descends from the Latin for “given,” but the scholar Johanna Drucker argues that the word has always been a misnomer. Noting the labor involved in measurement, she proposes “capta” as an alternative. “Statisticians know very well that no ‘data’ preexist their parameterization,” she writes in “Humanities Approaches to Graphical Display” (2011). “Data are capta, taken not given, constructed as an interpretation of the phenomenal world, not inherent in it.”...
big_data  quantification  science  epistemology  statistics 
14 days ago by shannon_mattern
The Space between Us by Ryan D. Enos
The Space Between Us brings the connection between geography, psychology, and politics to life. By going into the neighborhoods of real cities, Enos shows how our perceptions of racial, ethnic, and religious groups are intuitively shaped by where these groups live and interact daily. Through the lens of numerous examples across the globe and drawing on a compelling combination of research techniques including field and laboratory experiments, big data analysis, and small-scale interactions, this timely book provides a new understanding of how geography shapes politics and how members of groups think about each other. Enos' analysis is punctuated with personal accounts from the field. His rigorous research unfolds in accessible writing that will appeal to specialists and non-specialists alike, illuminating the profound effects of social geography on how we relate to, think about, and politically interact across groups in the fabric of our daily lives.
book  geography  political_psychology  political_science  spatial_statistics  political_sociology  big_data  experiments  homophily  social_networks  networks  teaching  ? 
16 days ago by rvenkat
Curriculum Guidelines for Undergraduate Programs in Data Science | Annual Review of Statistics and Its Application
The Park City Math Institute 2016 Summer Undergraduate Faculty Program met for the purpose of composing guidelines for undergraduate programs in data science. The group consisted of 25 undergraduate faculty from a variety of institutions in the United States, primarily from the disciplines of mathematics, statistics, and computer science. These guidelines are meant to provide some structure for institutions planning for or revising a major in data science.
data_science  big_data  course  pedagogy  teaching  for_friends 
22 days ago by rvenkat
Why Data Will Never Replace Thinking
Big data, it has been said, is making science obsolete. No longer do we need theories of genetics or linguistics or sociology, Wired editor Chris Anderson wrote in a manifesto four years ago: “With enough data, the numbers speak for themselves.”
hcds  big_data  data_science  research 
25 days ago by jaimoe
Privacy in the Age of Big Data | Stanford Law Review
The harvesting of large data sets and the use of analytics clearly implicate privacy concerns. The tasks of ensuring data security and protecting privacy become harder as information is multiplied and shared ever more widely around the world. Information regarding individuals’ health, location, electricity use, and online activity is exposed to scrutiny, raising concerns about profiling, discrimination, exclusion, and loss of control. Traditionally, organizations used various methods of de-identification (anonymization, pseudonymization, encryption, key-coding, data sharding) to distance data from real identities and allow analysis to proceed while at the same time containing privacy concerns. Over the past few years, however, computer scientists have repeatedly shown that even anonymized data can often be re-identified and attributed to specific individuals.[7]
law  privacy  hcds  data_science  big_data  anonymization 
28 days ago by jaimoe


