heavy_tails   144

A Non‐Gaussian Spatio‐Temporal Model for Daily Wind Speeds Based on a Multi‐Variate Skew‐t Distribution - Tagle - - Journal of Time Series Analysis - Wiley Online Library
"Facing increasing domestic energy consumption from population growth and industrialization, Saudi Arabia is aiming to reduce its reliance on fossil fuels and to broaden its energy mix by expanding investment in renewable energy sources, including wind energy. A preliminary task in the development of wind energy infrastructure is the assessment of wind energy potential, a key aspect of which is the characterization of its spatio‐temporal behavior. In this study we examine the impact of internal climate variability on seasonal wind power density fluctuations over Saudi Arabia using 30 simulations from the Large Ensemble Project (LENS) developed at the National Center for Atmospheric Research. Furthermore, a spatio‐temporal model for daily wind speed is proposed with neighbor‐based cross‐temporal dependence, and a multi‐variate skew‐t distribution to capture the spatial patterns of higher‐order moments. The model can be used to generate synthetic time series over the entire spatial domain that adequately reproduce the internal variability of the LENS dataset."
to:NB  spatio-temporal_statistics  heavy_tails  meteorology  statistics  to_teach:data_over_space_and_time 
10 weeks ago by cshalizi
Skewed Wealth Distributions: Theory and Empirics
"Invariably across a cross-section of countries and time periods, wealth distributions are skewed to the right displaying thick upper tails, that is, large and slowly declining top wealth shares. In this survey we categorize the theoretical studies on the distribution of wealth in terms of the underlying economic mechanisms generating skewness and thick tails. Further, we show how these mechanisms can be micro-founded by the consumption-saving decisions of rational agents in specific economic and demographic environments. Finally we map the large empirical work on the wealth distribution to its theoretical underpinnings."
to:NB  heavy_tails  inequality  economics 
december 2018 by cshalizi
Deadly Quarrels by David Wilkinson - Paperback - University of California Press
"Lewis Fry Richardson was one of the first to develop the systematic study of the causes of war; yet his great war data archive, Statistics of Deadly Quarrels, posthumously published, has yet to be fully systematized and assimilated by war-causation scholars. David Wilkinson has reanalyzed Richardson's data and drawn together the results of kindred quantitative work on the causes of war, from others as well as from Richardson. He has translated this classic of international relations literature into contemporary idiom, fully and accurately presenting the substance of Richardson's idea and at the same time bringing it up to date with judicious comment, updating the references to the critical and successor literature, and dealing in some detail with Richardson himself. Professor Wilkinson lists among the findings: 1. the death toll of war is largely the product of a very few immense wars; 2. most wars do not escalate out of control, they are very likely to be small, brief, and exclusive; 3. great powers have done most of the world's fighting, inflicting and suffering most of the casualties; 4. the propensity of any two groups to fight increases as the ethnocultural differences between them increase. Contemporary peace strategy would therefore seem to be to avoid World War III by promoting superpower detente, and reanimating, accelerating, and civilizing the process of world economic development.
"This title is part of UC Press's Voices Revived program, which commemorates University of California Press’s mission to seek out and cultivate the brightest minds and give them voice, reach, and impact. Drawing on a backlist dating to 1893, Voices Revived makes high-quality, peer-reviewed scholarship accessible once again using print-on-demand technology. This title was originally published in 1980."
to:NB  violence  war  heavy_tails  lives_of_the_scholars  books:noted 
october 2018 by cshalizi
How much has wealth concentration grown in the United States? A re-examination of data from 2001-2013
"Well known research based on capitalized income tax data shows robust growth in wealth concentration in the late 2000s. We show that these robust growth estimates rely on an assumption—homogeneous rates of return across the wealth distribution—that is not supported by data. When the capitalization model incorporates heterogeneous rates of return (on just interest-bearing assets), wealth concentration estimates in 2011 fall from 40.5% to 33.9%. These estimates are consistent in levels and trend with other micro wealth data and show that wealth concentration increases until the Great Recession, then declines before increasing again."
to:NB  economics  inequality  heavy_tails  class_struggles_in_america 
may 2018 by cshalizi
Robust Regression on Stationary Time Series: A Self‐Normalized Resampling Approach - Akashi - 2018 - Journal of Time Series Analysis - Wiley Online Library
"This article extends the self‐normalized subsampling method of Bai et al. (2016) to the M‐estimation of linear regression models, where the covariate and the noise are stationary time series which may have long‐range dependence or heavy tails. The method yields an asymptotic confidence region for the unknown coefficients of the linear regression. The determination of these regions does not involve unknown parameters such as the intensity of the dependence or the heaviness of the distributional tail of the time series. Additional simulations can be found in a supplement. The computer codes are available from the authors."
to:NB  time_series  statistics  linear_regression  heavy_tails  long-range_dependence 
may 2018 by cshalizi
The power of absolute discounting: all-dimensional distribution estimation
"Categorical models are the natural fit for many problems. When learning the distribution of categories from samples, high-dimensionality may dilute the data. Minimax optimality is too pessimistic to remedy this issue. A serendipitously discovered estimator, absolute discounting, corrects empirical frequencies by subtracting a constant from observed categories, which it then redistributes among the unobserved. It outperforms classical estimators empirically, and has been used extensively in natural language modeling. In this paper, we rigorously explain the prowess of this estimator using less pessimistic notions. We show (1) that absolute discounting recovers classical minimax KL-risk rates, (2) that it is adaptive to an effective dimension rather than the true dimension, (3) that it is strongly related to the Good-Turing estimator and inherits its competitive properties. We use power-law distributions as the cornerstone of these results. We validate the theory via synthetic data and an application to the Global Terrorism Database."
to:NB  to_read  density_estimation  statistics  heavy_tails 
november 2017 by cshalizi
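
--- A minimal sketch of the estimator as the abstract describes it: subtract a constant from each observed category's count and spread the freed mass over the unobserved categories. The function name, the default discount of 0.5, the uniform redistribution, and the fallback when every category has been seen are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def absolute_discounting(samples, alphabet, discount=0.5):
    """Estimate category probabilities by absolute discounting (illustrative sketch).

    Assumes 0 < discount < 1 and that every sample belongs to `alphabet`.
    """
    counts = Counter(samples)
    n = len(samples)
    seen = [c for c in alphabet if counts[c] > 0]
    unseen = [c for c in alphabet if counts[c] == 0]

    if not unseen:
        # Nothing to redistribute to: fall back to empirical frequencies
        # (an assumption of this sketch).
        return {c: counts[c] / n for c in seen}

    # Each observed category gives up `discount` counts.
    probs = {c: (counts[c] - discount) / n for c in seen}
    # The freed mass is shared equally among unobserved categories
    # (uniform sharing is an assumption of this sketch).
    freed = discount * len(seen) / n
    for c in unseen:
        probs[c] = freed / len(unseen)
    return probs


if __name__ == "__main__":
    est = absolute_discounting(list("aaabbc"), "abcde")
    for category, p in sorted(est.items()):
        print(category, round(p, 4))
```

With six samples over a five-letter alphabet, the three observed letters each lose half a count and the two unobserved letters split the resulting 0.25 of probability mass.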
Phys. Rev. Lett. 117, 230601 (2016) - Interevent Correlations from Avalanches Hiding Below the Detection Threshold
"Numerous systems ranging from deformation of materials to earthquakes exhibit bursty dynamics, which consist of a sequence of events with a broad event size distribution. Very often these events are observed to be temporally correlated or clustered, evidenced by power-law-distributed waiting times separating two consecutive activity bursts. We show how such interevent correlations arise simply because of a finite detection threshold, created by the limited sensitivity of the measurement apparatus, or used to subtract background activity or noise from the activity signal. Data from crack-propagation experiments and numerical simulations of a nonequilibrium crack-line model demonstrate how thresholding leads to correlated bursts of activity by separating the avalanche events into subavalanches. The resulting temporal subavalanche correlations are well described by our general scaling description of thresholding-induced correlations in crackling noise."
in_NB  heavy_tails  point_processes  time_series 
december 2016 by cshalizi
AEAweb: JEP (30,1) p. 185 - Power Laws in Economics: An Introduction
"Many of the insights of economics seem to be qualitative, with many fewer reliable quantitative laws. However a series of power laws in economics do count as true and nontrivial quantitative laws—and they are not only established empirically, but also understood theoretically. I will start by providing several illustrations of empirical power laws having to do with patterns involving cities, firms, and the stock market. I summarize some of the theoretical explanations that have been proposed. I suggest that power laws help us explain many economic phenomena, including aggregate economic fluctuations. I hope to clarify why power laws are so special, and to demonstrate their utility. In conclusion, I list some power-law-related economic enigmas that demand further exploration."
to:NB  heavy_tails  economics  to_be_shot_after_a_fair_trial  gabaix.xaiver 
february 2016 by cshalizi
[1504.04580] Robust estimation of U-statistics
"An important part of the legacy of Evarist Giné is his fundamental contributions to our understanding of U-statistics and U-processes. In this paper we discuss the estimation of the mean of multivariate functions in case of possibly heavy-tailed distributions. In such situations, reliable estimates of the mean cannot be obtained by usual U-statistics. We introduce a new estimator, based on the so-called median-of-means technique. We develop performance bounds for this new estimator that generalizes an estimate of Arcones and Giné (1993), showing that the new estimator performs, under minimal moment conditions, as well as classical U-statistics for bounded random variables. We discuss an application of this estimator to clustering."
in_NB  heavy_tails  statistics  estimation  deviation_inequalities  re:smoothing_adjacency_matrices  u-statistics 
december 2015 by cshalizi
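
--- For reference, a sketch of the plain median-of-means idea for an ordinary sample mean: split the data into blocks, average each block, and report the median of the block averages. This is the textbook univariate version, not the U-statistic estimator of the paper; the function name, the shuffling step, and the block count are assumptions for illustration.

```python
import random
import statistics


def median_of_means(xs, n_blocks, seed=0):
    """Median of block means for a one-dimensional sample (illustrative sketch).

    Assumes 1 <= n_blocks <= len(xs).
    """
    xs = list(xs)
    if not 1 <= n_blocks <= len(xs):
        raise ValueError("n_blocks must be between 1 and the sample size")
    # Shuffle so that block membership does not depend on the input order.
    random.Random(seed).shuffle(xs)
    # Deal the shuffled points into n_blocks roughly equal blocks.
    blocks = [xs[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)


if __name__ == "__main__":
    data = [1.0] * 99 + [1e6]  # one wild observation
    print("plain mean:     ", statistics.fmean(data))
    print("median of means:", median_of_means(data, n_blocks=10))
```

A single extreme observation can spoil at most one block, so the median of the block means is unaffected while the plain mean is dragged far from the bulk of the data.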
Anticipating Rare Events: Can Acts of Terror, Use of Weapons of Mass Destruction or Other High Profile Acts Be Anticipated? A Scientific Perspective on Problems, Pitfalls and Prospective Solutions
"This white paper covers topics related to the field of anticipating/forecasting specific categories of 'rare events' such as acts of terror, use of a weapon of mass destruction, or other high profile attacks. It is primarily meant for the operational community in DoD, DHS, and other USG agencies. […] The body of work before you should be viewed as the commencement of a journey with a somewhat murky destination-an exploration of terra incognita. Indeed the challenge addressed in this white paper, that of anticipating 'rare events' is daunting and represents a gathering threat to national security. The threat is supercharged by the increasing lateral connectedness of global societies enabled by the internet, cell phones and other technologies. This 'connected collective' as Carl Hunt has termed it, has allowed violent ideologies to metastasize globally often with no hierarchical, command-directed rules to govern their expansion. It is the emergent franchising of violence whose metaphorical 'genome' is exposed to constant co-evolutionary pressures and non-linearity that results in continuous adaptation and increasing resiliency making the task of effectively anticipating their courses of action all the more difficult. So what distinguishes a rare event in the context of national security? The easy response is to describe them as unlikely actions of high consequence and for which there is a sparse historical record from which to develop predictive patterns or indications."

--- I wonder how much of a period piece this now appears.
to:NB  heavy_tails  terrorism_fears  terrorism  prediction  to_be_shot_after_a_fair_trial 
october 2015 by cshalizi
[1507.03293] Tail Analysis without Tail Information : A Worst-case Perspective
"Tail modeling refers to the task of selecting the best probability distributions that describe the occurrences of extreme events. One common bottleneck in this task is that, due to their very nature, tail data are often very limited. The conventional approach uses parametric fitting, but the validity of the choice of a parametric model is usually hard to verify. This paper describes a reasonable alternative that does not require any parametric assumption. The proposed approach is based on a worst-case analysis under the geometric premise of tail convexity, a feature shared by all known parametric tail distributions. We demonstrate that the worst-case convex tail behavior is either extremely light-tailed or extremely heavy-tailed. We also construct low-dimensional nonlinear programs that can both distinguish between the two cases and find the worst-case tail. Numerical results show that the proposed approach gives a competitive performance versus using conventional parametric methods."
to:NB  statistics  heavy_tails 
august 2015 by cshalizi
[1503.05077] Tail index estimation, concentration and adaptivity
"This paper presents an adaptive version of the Hill estimator based on Lepski's model selection method. This simple data-driven index selection method is shown to satisfy an oracle inequality and is checked to achieve the lower bound recently derived by Carpentier and Kim. In order to establish the oracle inequality, we derive non-asymptotic variance bounds and concentration inequalities for Hill estimators. These concentration inequalities are derived from Talagrand's concentration inequality for smooth functions of independent exponentially distributed random variables combined with three tools of Extreme Value Theory: the quantile transform, Karamata's representation of slowly varying functions, and Rényi's characterisation of the order statistics of exponential samples. The performance of this computationally and conceptually simple method is illustrated using Monte-Carlo simulations."
to:NB  heavy_tails  statistics  to_read  concentration_of_measure 
may 2015 by cshalizi
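
--- A sketch of the basic Hill estimator with a fixed number k of upper order statistics, for orientation only. The paper's contribution is the data-driven choice of k via Lepski's method, which is not implemented here; the function name and the error handling are assumptions.

```python
import math


def hill_estimator(xs, k):
    """Hill estimate of the extreme value index gamma = 1/alpha (illustrative sketch).

    Uses the k largest observations, measured relative to the (k+1)-th largest.
    Assumes 1 <= k < len(xs) and a positive (k+1)-th largest value.
    """
    order = sorted(xs, reverse=True)
    if not 1 <= k < len(order):
        raise ValueError("k must satisfy 1 <= k < sample size")
    threshold = order[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("threshold must be positive to take logarithms")
    # Average log-excess of the top k points over the threshold.
    return sum(math.log(order[i] / threshold) for i in range(k)) / k


if __name__ == "__main__":
    sample = [math.exp(3), math.exp(2), math.exp(1), 1.0]
    print(hill_estimator(sample, k=3))
```

The estimate is sensitive to k: too small gives high variance, too large brings in observations from outside the tail, which is the trade-off an adaptive selection rule is meant to handle.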
[1505.01547] Understanding the Heavy Tailed Dynamics in Human Behavior
"The recent availability of electronic datasets containing large volumes of communication data has made it possible to study human behavior on a larger scale than ever before. From this, it has been discovered that across a diverse range of data sets, the inter-event times between consecutive communication events obey heavy tailed power law dynamics. Explaining this has proved controversial, and two distinct hypotheses have emerged. The first holds that these power laws are fundamental, and arise from the mechanisms such as priority queuing that humans use to schedule tasks. The second holds that they are a statistical artifact which only occur in aggregated data when features such as circadian rhythms and burstiness are ignored. We use a large social media data set to test these hypotheses, and find that although models that incorporate circadian rhythms and burstiness do explain part of the observed heavy tails, there is residual unexplained heavy tail behavior which suggests a more fundamental cause. Based on this, we develop a new quantitative model of human behavior which improves on existing approaches, and gives insight into the mechanisms underlying human interactions."
in_NB  to_read  heavy_tails  time_series  point_processes  statistics 
may 2015 by cshalizi
What Do Data on Millions of U.S. Workers Reveal about Life-Cycle Earnings Risk?
"We study the evolution of individual labor earnings over the life cycle, using a large panel data set of earnings histories drawn from U.S. administrative records. Using fully nonparametric methods, our analysis reaches two broad conclusions. First, earnings shocks display substantial deviations from lognormality—the standard assumption in the literature on incomplete markets. In particular, earnings shocks display strong negative skewness and extremely high kurtosis—as high as 30 compared with 3 for a Gaussian distribution. The high kurtosis implies that, in a given year, most individuals experience very small earnings shocks, and a small but non-negligible number experience very large shocks. Second, these statistical properties vary significantly both over the life cycle and with the earnings level of individuals. We also estimate impulse response functions of earnings shocks and find important asymmetries: Positive shocks to high-income individuals are quite transitory, whereas negative shocks are very persistent; the opposite is true for low-income individuals. Finally, we use these rich sets of moments to estimate econometric processes with increasing generality to capture these salient features of earnings dynamics."

--- Last tag conditional on what exactly is in the "data appendix" at https://fguvenendotcom.files.wordpress.com/2014/04/moments_for_publication.xls
to:NB  to_read  economics  inequality  heavy_tails  to_teach:undergrad-ADA  statistics  great_risk_shift 
february 2015 by cshalizi
[1405.0058] Underestimating extreme events in power-law behavior due to machine-dependent cutoffs
"Power-law distributions are typical macroscopic features occurring in almost all complex systems observable in nature. As a result, researchers in quantitative analyses must often generate random synthetic variates obeying power-law distributions. The task is usually performed through standard methods that map uniform random variates into the desired probability space. Whereas all these algorithms are theoretically solid, in this paper we show that they are subject to severe machine-dependent limitations. As a result, two dramatic consequences arise: (i) the sampling in the tail of the distribution is not random but deterministic; (ii) the moments of the sample distribution, which are theoretically expected to diverge as functions of the sample sizes, converge instead to finite values. We provide quantitative indications for the range of distribution parameters that can be safely handled by standard libraries used in computational analyses. Whereas our findings indicate possible reinterpretations of numerical results obtained through flawed sampling methodologies, they also pave the way for the search for a concrete solution to this central issue shared by all quantitative sciences dealing with complexity."
to:NB  to_read  heavy_tails  approximation  computational_statistics  have_skimmed 
january 2015 by cshalizi
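
--- One simple way to see a machine-dependent cutoff of this kind, sketched under my own assumptions rather than following the paper's analysis: inverse-transform sampling of a continuous power law with density proportional to x^(-alpha) for x >= xmin, driven by Python's random.random(), which returns floats with 53 bits of precision in [0, 1). Because 1 - u can never be smaller than 2^-53, the sampler has a hard upper limit. The function names are hypothetical.

```python
import random


def sample_power_law(alpha, xmin, rng):
    """Draw one variate with density proportional to x**(-alpha) on [xmin, inf).

    Standard inverse-transform method; assumes alpha > 1 and xmin > 0.
    """
    u = rng.random()  # uniform on [0, 1), a multiple of 2**-53
    return xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))


def largest_reachable_value(alpha, xmin, bits=53):
    """Largest value the sampler above can return when the uniform variate
    has `bits` bits of resolution (so that 1 - u >= 2**-bits)."""
    return xmin * (2.0 ** bits) ** (1.0 / (alpha - 1.0))


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [sample_power_law(2.5, 1.0, rng) for _ in range(10_000)]
    print("sample maximum:      ", max(draws))
    print("hard ceiling (53 bit):", largest_reachable_value(2.5, 1.0))
```

Any sample moment computed from such draws is bounded by the ceiling, however large the sample, even when the theoretical moment is infinite.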
[1410.3192] Learning without Concentration for General Loss Functions
"We study prediction and estimation problems using empirical risk minimization, relative to a general convex loss function. We obtain sharp error rates even when concentration is false or is very restricted, for example, in heavy-tailed scenarios. Our results show that the error rate depends on two parameters: one captures the intrinsic complexity of the class, and essentially leads to the error rate in a noise-free (or realizable) problem; the other measures interactions between class members, the target and the loss, and is dominant when the problem is far from realizable. We also explain how one may deal with outliers by choosing the loss in a way that is calibrated to the intrinsic complexity of the class and to the noise-level of the problem (the latter is measured by the distance between the target and the class)."
to:NB  learning_theory  heavy_tails  statistics  to_read  re:your_favorite_dsge_sucks 
january 2015 by cshalizi
[1408.1554] A complete data framework for fitting power law distributions
"Over the last few decades power law distributions have been suggested as forming generative mechanisms in a variety of disparate fields, such as astrophysics, criminology and database curation. However, fitting these heavy tailed distributions requires care, especially since the power law behaviour may only be present in the distributional tail. Current state of the art methods for fitting these models rely on estimating the cut-off parameter xmin. This results in the majority of collected data being discarded. This paper provides an alternative, principled approach for fitting heavy tailed distributions. By directly modelling the deviation from the power law distribution, we can fit and compare a variety of competing models in a single unified framework."
to:NB  heavy_tails  statistics  estimation  to_read 
august 2014 by cshalizi
