data-analysis   2049


[1712.05630] Sparse principal component analysis via random projections
We introduce a new method for sparse principal component analysis, based on the aggregation of eigenvector information from carefully-selected random projections of the sample covariance matrix. Unlike most alternative approaches, our algorithm is non-iterative, so is not vulnerable to a bad choice of initialisation. Our theory provides great detail on the statistical and computational trade-off in our procedure, revealing a subtle interplay between the effective sample size and the number of random projections that are required to achieve the minimax optimal rate. Numerical studies provide further insight into the procedure and confirm its highly competitive finite-sample performance.
dimension-reduction  statistics  data-analysis  algorithms  performance-measure  consider:lexicase  sparseness 
11 days ago by Vaguery
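A rough numpy sketch of the idea in the abstract. It assumes axis-aligned random projections (random subsets of d coordinates), picks the best of each group of B projections by leading eigenvalue, and aggregates eigenvalue-weighted absolute loadings into per-coordinate importance scores. These choices and all names (sparse_pca_random_projections, k, d, A, B) are our assumptions for illustration, not the paper's exact procedure or tuning.

```python
import numpy as np


def sparse_pca_random_projections(X, k, d, A=50, B=20, seed=0):
    # Hypothetical helper: a simplified take on random-projection sparse PCA.
    # X: (n, p) data; k: number of nonzero loadings wanted;
    # d: coordinates per axis-aligned projection; A groups of B projections each.
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    sigma = np.cov(X, rowvar=False)  # sample covariance matrix, shape (p, p)
    scores = np.zeros(p)
    for _ in range(A):
        best_val, best_idx, best_vec = -np.inf, None, None
        for _ in range(B):
            idx = rng.choice(p, size=d, replace=False)  # random subset of coordinates
            vals, vecs = np.linalg.eigh(sigma[np.ix_(idx, idx)])  # ascending eigenvalues
            if vals[-1] > best_val:  # keep the projection with the largest leading eigenvalue
                best_val, best_idx, best_vec = vals[-1], idx, vecs[:, -1]
        # Aggregate (assumed weighting): each selected coordinate earns eigenvalue * |loading|.
        scores[best_idx] += best_val * np.abs(best_vec)
    support = np.sort(np.argsort(scores)[-k:])  # top-k coordinates by aggregated score
    vals, vecs = np.linalg.eigh(sigma[np.ix_(support, support)])
    v = np.zeros(p)
    v[support] = vecs[:, -1]  # leading eigenvector restricted to the chosen support
    return v, support


# Synthetic check: a spike of strength 3 on the first three coordinates.
rng = np.random.default_rng(1)
n, p = 500, 30
u = np.zeros(p)
u[:3] = 1 / np.sqrt(3)
X = rng.standard_normal((n, p)) + 3 * rng.standard_normal((n, 1)) * u
v, support = sparse_pca_random_projections(X, k=3, d=5)
print(support)
```

Note there's no initialisation to get wrong: the loops only draw and score projections, which is the non-iterative property the abstract highlights.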
Oracle Labs PGX: Parallel Graph AnalytiX Overview
PGX is a toolkit for graph analysis: it runs algorithms such as PageRank against graphs and performs SQL-like pattern matching against them, including over the results of algorithmic analysis. Algorithms are parallelized for extreme performance. The PGX toolkit includes both a single-node in-memory engine and a distributed engine for extremely large graphs. Graphs can be loaded from a variety of sources, including flat files, SQL and NoSQL databases, Apache Spark and Hadoop; incremental updates are supported.
data-analysis  big-data  esoteric  SQL  oracle 
13 days ago by kmt
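PGX's own API isn't shown here. As a generic illustration of the kind of algorithm it parallelizes, here is a plain power-iteration PageRank over a directed edge list in pure Python. The function name, the 0.85 damping factor and the uniform redistribution of dangling-node mass are conventional choices we've assumed, not details taken from the PGX docs.

```python
def pagerank(edges, damping=0.85, tol=1e-12, max_iter=500):
    """Power-iteration PageRank over a directed edge list (generic sketch, not PGX)."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out = {node: [] for node in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out[node])
        new = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            if out[node]:
                share = damping * rank[node] / len(out[node])
                for dst in out[node]:
                    new[dst] += share
        delta = sum(abs(new[node] - rank[node]) for node in nodes)
        rank = new
        if delta < tol:
            break
    return rank


ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")])
print(max(ranks, key=ranks.get))
```

Here "c" collects links from both "b" and "d", so it ends up with the highest score; a parallel engine distributes the same per-edge updates across cores or machines.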
API and JSON in R
Tutorial from Paul Bradshaw
R  Rstudio  data-analysis  JSON  API 
18 days ago by wanulfa
Introduction to data cleaning using Pandas
I had been using Excel for data cleaning until I discovered how powerful pandas is for data analysis and data cleaning. In this article I want to go over the basics of how to use pandas to clean data in Excel files.
Pandas  data  data-cleaning  data-analysis  python 
19 days ago by wanulfa
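A minimal pandas cleaning pass of the kind the article describes. It assumes a small in-memory table instead of an Excel file; in practice `pd.read_excel` would load the sheet, which needs an engine such as openpyxl for .xlsx files. The column names, the `clean` helper and the particular steps are illustrative assumptions, not the article's code.

```python
import pandas as pd

TEXT_COLUMNS = ["name", "city"]  # hypothetical text columns in this toy table


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning pass for a spreadsheet-style table (illustrative sketch)."""
    df = df.copy()
    # Normalise headers: trim, lower-case, spaces -> underscores.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    # Trim stray whitespace and fix capitalisation in text columns.
    for col in TEXT_COLUMNS:
        df[col] = df[col].str.strip().str.title()
    # Coerce numbers stored as text; unparseable values such as "n/a" become NaN.
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    # Drop exact duplicate rows, then rows missing a required value.
    df = df.drop_duplicates().dropna(subset=["amount"])
    return df.reset_index(drop=True)


raw = pd.DataFrame({
    "Name ": [" Alice", "Bob ", "Bob ", "carol"],
    " Amount": ["10", "20.5", "20.5", "n/a"],
    "City": ["Leeds", " York", " York", "Hull"],
})
tidy = clean(raw)
print(tidy)
```

Doing the whitespace and type fixes before `drop_duplicates` matters: rows that differ only by a stray space would otherwise survive as "distinct".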
R, RStudio, and the tidyverse for data analysis
A tutorial on using R for data analysis; very useful for journalists.
R  Rstudio  data-analysis 
19 days ago by wanulfa
A non-spatial account of place and grid cells based on clustering models of concept learning | bioRxiv
One view is that conceptual knowledge is organized as a "cognitive map" in the brain, using the circuitry in the medial temporal lobe (MTL) that supports spatial navigation. In contrast, we find that a domain-general learning algorithm explains key findings in both spatial and conceptual domains. When the clustering model is applied to spatial navigation tasks, so-called place and grid cells emerge because of the relatively uniform sampling of possible inputs in these tasks. The same mechanism applied to conceptual tasks, where the overall space can be higher-dimensional and sampling sparser, leads to representations more aligned with human conceptual knowledge. Although the types of memory supported by the MTL are superficially dissimilar, the information processing steps appear shared.
models-and-modes  emergence  data-analysis  rather-interesting  to-write-about  consider:the-mangle 
25 days ago by Vaguery
The Dictatorship of Data - MIT Technology Review
McNamara was a numbers guy. Appointed the U.S. secretary of defense when tensions in Vietnam rose in the early 1960s, he insisted on getting data on everything he could. Only by applying statistical rigor, he believed, could decision makers understand a complex situation and make the right choices. The world in his view was a mass of unruly information that—if delineated, denoted, demarcated, and quantified—could be tamed by human hand and fall under human will. McNamara sought Truth, and that Truth could be found in data. Among the numbers that came back to him was the “body count.”
oh yeah, this is classic:
"McNamara rose swiftly up the ranks, trotting out a data point for every situation. Harried factory managers produced the figures he demanded—whether they were correct or not. When an edict came down that all inventory from one car model must be used before a new model could begin production, exasperated line managers simply dumped excess parts into a nearby river. The joke at the factory was that a fellow could walk on water—atop rusted pieces of 1950 and 1951 cars."
big-data  data-analysis  stats  bias  history  methodology  argument 
4 weeks ago by kmt
Jupyter notebooks as Markdown documents, Julia, Python or R scripts. Supports round-trip conversion.
jupyter  data-analysis  workfkflow  collaboration 
5 weeks ago by mjlassila
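That description matches jupytext. To show roughly what a round trip between a notebook and a "percent"-format script (cells marked with `# %%`) involves, here is a stdlib-only toy that handles only code and markdown cells. It is not jupytext's implementation or API; both function names are hypothetical, and it ignores metadata and outputs and would mis-split code that itself contains `# %%`.

```python
import json


def notebook_to_percent_script(nb_json: str) -> str:
    """Render notebook JSON as a percent-style script (toy sketch, not jupytext)."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb["cells"]:
        # nbformat 4 stores a cell's source as a string or a list of lines.
        src = cell["source"]
        text = "".join(src) if isinstance(src, list) else src
        if cell["cell_type"] == "code":
            chunks.append("# %%\n" + text)
        elif cell["cell_type"] == "markdown":
            # Markdown lines become comments so the script stays valid Python.
            lines = ["# " + line if line else "#" for line in text.splitlines()]
            chunks.append("# %% [markdown]\n" + "\n".join(lines))
    return "\n\n".join(chunks) + "\n"


def percent_script_to_cells(script: str) -> list[dict]:
    """Parse the script back into minimal cell dicts (toy sketch)."""
    cells = []
    for block in script.split("# %%")[1:]:
        header, _, body = block.partition("\n")
        body = body.rstrip("\n")
        if header.strip() == "[markdown]":
            lines = [line[2:] if line.startswith("# ") else line.lstrip("#")
                     for line in body.splitlines()]
            cells.append({"cell_type": "markdown", "source": "\n".join(lines)})
        else:
            cells.append({"cell_type": "code", "source": body})
    return cells


notebook = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Title\n", "\n", "Some text"]},
    {"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
]})
script = notebook_to_percent_script(notebook)
cells = percent_script_to_cells(script)
print(script)
```

Keeping markdown as comments is what lets the script run unchanged while still diffing cleanly in version control, which is the collaboration angle of the tool.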


related tags

$  **  *  2018  @-public  advice  ai  algorithms  americana  analysis  analytics  api  architecture  argument  automatic  automation  awk  benchmarks  bias  bibliography  big-data  bigdata  biology  book  books  business  calcio  charts  cheat  cheatsheet  chicago  cli  code  collaboration  command-line  command  computation  consider:lexicase  consider:the-mangle  cooking  cosma-shalizi  course-notes  course  cross-validation  css  csv  cv  d3  d51  data-cleaning  data-frames  data-mining  data-science  data-sets  data-visualization  data  data_science  database  datamining  datascience  dataset  datasets  dataviz  deep-learning  deeplearning  defensive-coding  design  dev  differential-privacy  dimension-reduction  dimensionality-reduction  documentation  ebooks  ecology  economics  eda  emacs  emergence  esoteric  etl  example  explanation  exploratory-data-analysis  extract  feature-engineering  feature-extraction  files  football  free  github  google  googleanalytics  graphics  grid-search  growth  haskell  hci  hey-i-know-this-guy  history  howto  html  human-rights  import  information  introduction  java  jeopardy  js  json  jupyter  kaggle  language-processing  leading-questions  learning  library  linguistics  lists  livestream  logistic-regression  looking-to-see  machine-learning  machinelearning  maps  match-analysis  math  mathematical-recreations  methodology  migration  modelling  models-and-modes  money  monitoring  neoliberalism  neural-network  neural-networks  nhs-digital  numerical-methods  nyt  omni-mate  opencontent  oracle  org-mode  package  packages  pandas  parking  patreon  performance-measure  pinker  pipeline  politics  population-density  practical  prediction  predictive-analytics  programming  propaganda  psychology  python-libraries  python  r-project  r  race  rather-interesting  read-later  reading  recipes  reference  regression  relationships  repository  representation  research  reusable-holdout  review  rstats  rstudio  
russian  rust  science  scipy  shape-analysis  soccer  social-network-analysis  social-networks  socialmedia  sparseness  spatial-analysis  spatial-statistics  spatial  sport  sports  sql  statistical-thinking  statistics  stats  structure  surveillance  survival-analysis  svm  symbolic-regression  tableau  technology  text-mining  text  textbook  tickets  tips-and-tricks  tips  to-understand  to-write-about  to:learn  to:make-second-nature  to:read  tool  tooling  tools  topology  transaltion  tutorial  twitch  unix  validation  visiual  visualisation  visualization  voting  war  webdev  workfkflow  writing 
