Swedberg, R.: The Art of Social Theory (eBook and Hardcover).
"In the social sciences today, students are taught theory by reading and analyzing the works of Karl Marx, Max Weber, and other foundational figures of the discipline. What they rarely learn, however, is how to actually theorize. The Art of Social Theory is a practical guide to doing just that.
"In this one-of-a-kind user’s manual for social theorists, Richard Swedberg explains how theorizing occurs in what he calls the context of discovery, a process in which the researcher gathers preliminary data and thinks creatively about it using tools such as metaphor, analogy, and typology. He guides readers through each step of the theorist’s art, from observation and naming to concept formation and explanation. To theorize well, you also need a sound knowledge of existing social theory. Swedberg introduces readers to the most important theories and concepts, and discusses how to go about mastering them. If you can think, you can also learn to theorize. This book shows you how."

- The endorsement from Schelling is interesting.

- ETA: see also https://pinboard.in/u:cshalizi/b:9a3e4323dd9c
to:NB  books:noted  social_science_methodology  theorizing  art_of_conjecture  heuristics  analogy  philosophy_of_science  context_of_discovery_vs_context_of_justification  methodological_advice 
15 days ago
Community and Social Change in America
"Did urbanization kill 'community' in the nineteenth century, or even earlier? In this highly regarded volume Bender argues not only that community survived the trials of industrialization and urbanization but that it remains a fundamental element of American society today."
in_NB  books:noted  american_history  sociology  cities 
16 days ago
Academic urban legends
"Many of the messages presented in respectable scientific publications are, in fact, based on various forms of rumors. Some of these rumors appear so frequently, and in such complex, colorful, and entertaining ways that we can think of them as academic urban legends. The explanation for this phenomenon is usually that authors have lazily, sloppily, or fraudulently employed sources, and peer reviewers and editors have not discovered these weaknesses in the manuscripts during evaluation. To illustrate this phenomenon, I draw upon a remarkable case in which a decimal point error appears to have misled millions into believing that spinach is a good nutritional source of iron. Through this example, I demonstrate how an academic urban legend can be conceived and born, and can continue to grow and reproduce within academia and beyond."
to:NB  social_life_of_the_mind  epidemiology_of_representations  epidemiology_of_ideas  science_as_a_social_process  academia  sociology_of_science  sociology  natural_history_of_truthiness  have_read  via:? 
19 days ago
Statistical text analysis for social science
"What can text corpora tell us about society? How can automatic text analysis algorithms efficiently and reliably analyze the social processes revealed in language production?
"This work develops statistical text analyses of dynamic social and news media datasets to extract indicators of underlying social phenomena, and to reveal how social factors guide linguistic production. This is illustrated through three case studies: first, examining whether sentiment expressed in social media can track opinion polls on economic and political topics; second, analyzing how novel online slang terms can be very specific to geographic and demographic communities, and how these social factors affect their transmission over time; and third, automatically extracting political events from news articles, to assist analyses of the interactions of international actors over time.
"We summarize a variety of computational, linguistic, and statistical tools that are employed for these analyses; we also contribute MiTextExplorer, an interactive system for exploratory analysis of text data against document covariates, whose design was informed by the experience of researching these and other similar works. These case studies illustrate recurring themes toward developing text analysis as a social science methodology: computational and statistical complexity, and domain knowledge and linguistic assumptions."
to:NB  have_read  text_mining  data_mining  network_data_analysis  social_media  linguistics  social_science_methodology  kith_and_kin  oconnor.brendan  computational_statistics  statistics  sociology  sociolinguistics 
19 days ago
Experimental evidence for the influence of group size on cultural complexity (Nature)
"The remarkable ecological and demographic success of humanity is largely attributed to our capacity for cumulative culture [1-3]. The accumulation of beneficial cultural innovations across generations is puzzling because transmission events are generally imperfect, although there is large variance in fidelity. Events of perfect cultural transmission and innovations should be more frequent in a large population [4]. As a consequence, a large population size may be a prerequisite for the evolution of cultural complexity [4, 5], although anthropological studies have produced mixed results [6-9] and empirical evidence is lacking [10]. Here we use a dual-task computer game to show that cultural evolution strongly depends on population size, as players in larger groups maintained higher cultural complexity. We found that when group size increases, cultural knowledge is less deteriorated, improvements to existing cultural traits are more frequent, and cultural trait diversity is maintained more often. Our results demonstrate how changes in group size can generate both adaptive cultural evolution and maladaptive losses of culturally acquired skills. As humans live in habitats for which they are ill-suited without specific cultural adaptations [11, 12], it suggests that, in our evolutionary past, group-size reduction may have exposed human societies to significant risks, including societal collapse [13]."

--- See also "communication arising", doi:10.1038/nature13411, and reply
to:NB  to_read  experimental_psychology  experimental_sociology  cultural_evolution  social_life_of_the_mind  collective_cognition  re:democratic_cognition  human_evolution 
4 weeks ago
Truth approximation, belief merging, and peer disagreement
"In this paper, we investigate the problem of truth approximation via belief merging, i.e., we ask whether, and under what conditions, a group of inquirers merging together their beliefs makes progress toward the truth about the underlying domain. We answer this question by proving some formal results on how belief merging operators perform with respect to the task of truth approximation, construed as increasing verisimilitude or truthlikeness. Our results shed new light on the issue of how rational (dis)agreement affects the inquirers’ quest for truth. In particular, they vindicate the intuition that scientific inquiry, and rational discussion in general, benefits from some heterogeneity in opinion and interaction among different viewpoints. The links between our approach and related analyses of truth tracking, judgment aggregation, and opinion dynamics, are also highlighted."
to:NB  epistemology  social_life_of_the_mind  collective_cognition  re:democratic_cognition  science_as_a_social_process  to_read 
4 weeks ago
Culture-dependent strategies in coordination games
"We examine different populations’ play in coordination games in online experiments with over 1,000 study participants. Study participants played a two-player coordination game that had multiple equilibria: two equilibria with highly asymmetric payoffs and another equilibrium with symmetric payoffs but a slightly lower total payoff. Study participants were predominantly from India and the United States. Study participants residing in India played the strategies leading to asymmetric payoffs significantly more frequently than study participants residing in the United States, who more often played the strategy leading to the symmetric payoffs. In addition, when prompted to play asymmetrically, the population from India responded even more strongly than those from the United States. Overall, study participants’ predictions of how others would play were more accurate when the other player was from their own populations, and they coordinated significantly more frequently and earned significantly higher payoffs when matched with other study participants from their own population than when matched across populations."
to:NB  experimental_economics  institutions  game_theory  evolution_of_cooperation  cultural_differences  re:do-institutions-evolve  jackson.matthew_o.  to_read  homophily 
4 weeks ago
Multivariate statistics and the enactment of metabolic complexity
"This ethnographic study, based on fieldwork at the Computational and Systems Medicine laboratory at Imperial College London, shows how researchers in the field of metabolomics – the post-genomic study of the molecules and processes that make up metabolism – enact and coproduce complex views of biology with multivariate statistics. From this data-driven science, metabolism emerges as a multiple, informational and statistical object, which is both produced by and also necessitates particular forms of data production and analysis. Multivariate statistics emerge as ‘natural’ and ‘correct’ ways of engaging with a metabolism that is made up of many variables. In this sense, multivariate statistics allow researchers to engage with and conceptualize metabolism, and also disease and processes of life, as complex entities. Consequently, this article builds on studies of scientific practice and visualization to examine data as material objects rather than black-boxed representations. Data practices are not merely the technological components of experimentation, but are simultaneously technologies and methods and are intertwined with ways of seeing and enacting the biological world. Ultimately, this article questions the increasing invocation and role of complexity within biology, suggesting that discourses of complexity are often imbued with reductionist and determinist ways of thinking about biology, as scientists engage with complexity in calculated and controlled, but also limited, ways."
to:NB  to_read  ethnography  science_as_a_social_process  biochemical_networks  biology  statistics  complexity  data_analysis 
4 weeks ago
A Traditional City Primer
Look, the eye-candy is great, but let's get real. There were very powerful drives towards very large cities, in the form of economies of scale in production and infrastructure, and economies of agglomeration. At the same time, living in a city of great size (say > 500k, a huge metropolis by pre-modern standards) without transit sucks. Renaissance Florence --- a "traditional city" --- was counted very large at ~100k, or even ~70k, during its peak of prominence. This is smaller than a modern college town like Ann Arbor or Madison, and comparable to a resort town like Santa Fe. Show me this scaling up to even half a million and we'll talk.
cities  urbanism  design  architecture  via:arsyed  have_read  nostalgia  via:arthegall 
4 weeks ago
Can Achievement Peer Effect Estimates Inform Policy? A View from Inside the Black Box
"Empirical studies of peer effects rely on the assumption that peer spillovers can be measured through observables. However, in the education context, many theories of peer spillovers center around unobservables, such as ability, effort, or motivation. I show that when peer effects arise from unobservables, the typical empirical specifications will not measure these effects accurately, which may help explain differences in the magnitude and even sign of peer effect estimates across studies. I also show that under reasonable assumptions, these estimates cannot be applied to determine the effects of regrouping students, a central motivation of the literature."
to:NB  social_influence  economics  causal_inference  re:homophily_and_confounding  statistics 
5 weeks ago
[1404.7530] Design and analysis of experiments in networks: Reducing bias from interference
"Estimating the effects of interventions in networks is complicated when the units are interacting, such that the outcomes for one unit may depend on the treatment assignment and behavior of many or all other units (i.e., there is interference). When most or all units are in a single connected component, it is impossible to directly experimentally compare outcomes under two or more global treatment assignments since the network can only be observed under a single assignment. Familiar formalism, experimental designs, and analysis methods assume the absence of these interactions, and result in biased estimators of causal effects of interest. While some assumptions can lead to unbiased estimators, these assumptions are generally unrealistic, and we focus this work on realistic assumptions. Thus, in this work, we evaluate methods for designing and analyzing randomized experiments that aim to reduce this bias and thereby reduce overall error. In design, we consider the ability to perform random assignment to treatments that is correlated in the network, such as through graph cluster randomization. In analysis, we consider incorporating information about the treatment assignment of network neighbors. We prove sufficient conditions for bias reduction through both design and analysis in the presence of potentially global interference. Through simulations of the entire process of experimentation in networks, we measure the performance of these methods under varied network structure and varied social behaviors, finding substantial bias and error reductions. These improvements are largest for networks with more clustering and data generating processes with both stronger direct effects of the treatment and stronger interactions between units."
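
--- The design half is easy to sketch. A toy illustration of graph cluster randomization (my own sketch on invented data, not the authors' code): flip one treatment coin per cluster rather than per node, so most of a node's neighbors share its assignment.

```python
import random

random.seed(1)  # fixed seed so the toy assignment is reproducible

# Toy graph: adjacency lists for 8 nodes forming two tight clusters
# joined by a single bridge edge (invented data, not from the paper).
graph = {
    0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5, 7], 7: [6],
}
clusters = [{0, 1, 2, 3}, {4, 5, 6, 7}]

# Graph cluster randomization: one coin per cluster, not per node.
treated = set()
for cluster in clusters:
    if random.random() < 0.5:
        treated |= cluster

# Fraction of each node's neighbors that are treated: clustered
# assignment pushes these fractions toward 0 or 1.
exposure = {node: sum(n in treated for n in nbrs) / len(nbrs)
            for node, nbrs in graph.items()}
```

Under i.i.d. node-level assignment each neighborhood would be a roughly even mix; clustered assignment pushes every node's treated-neighbor fraction toward 0 or 1, which is what reduces the interference bias.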
in_NB  experimental_design  causal_inference  statistics  network_data_analysis  network_experiments  re:do_not_adjust_your_receiver  eckles.dean  karrer.brian  ugander.johan  to_read 
6 weeks ago
Social selection and peer influence in an online social network
"Disentangling the effects of selection and influence is one of social science's greatest unsolved puzzles: Do people befriend others who are similar to them, or do they become more similar to their friends over time? Recent advances in stochastic actor-based modeling, combined with self-reported data on a popular online social network site, allow us to address this question with a greater degree of precision than has heretofore been possible. Using data on the Facebook activity of a cohort of college students over 4 years, we find that students who share certain tastes in music and in movies, but not in books, are significantly likely to befriend one another. Meanwhile, we find little evidence for the diffusion of tastes among Facebook friends—except for tastes in classical/jazz music. These findings shed light on the mechanisms responsible for observed network homogeneity; provide a statistically rigorous assessment of the coevolution of cultural tastes and social relationships; and suggest important qualifications to our understanding of both homophily and contagion as generic social processes."
to:NB  social_networks  social_influence  homophily  re:homophily_and_confounding  sociology  causal_inference  to_read  to_be_shot_after_a_fair_trial 
6 weeks ago
[1407.0224] Concentric Symmetry
"The quantification of symmetries in complex networks is typically done globally in terms of automorphisms. In this work we focus on local symmetries around nodes, which we call connectivity patterns. We develop two topological transformations that allow a concise characterization of the different types of symmetry appearing on networks and apply these concepts to six network models, namely the Erdős-Rényi, Barabási-Albert, random geometric graph, Waxman, Voronoi and rewired Voronoi models. Real-world networks, namely the scientific areas of Wikipedia, the world-wide airport network and the street networks of Oldenburg and San Joaquin, are also analyzed in terms of the proposed symmetry measurements. Several interesting results, including the high symmetry exhibited by the Erdős-Rényi model, are observed and discussed."
to:NB  graph_theory  network_data_analysis 
7 weeks ago
[1407.0323] Laboratories of Oligarchy? How the Iron Law Extends to Peer Production
"Peer production projects like Wikipedia have inspired voluntary associations, collectives, social movements, and scholars to embrace open online collaboration as a model of democratic organization. However, many peer production projects exhibit entrenched leadership and deep inequalities, suggesting that they may not fulfill democratic ideals. Instead, peer production projects may conform to Robert Michels' "iron law of oligarchy," which proposes that democratic membership organizations become increasingly oligarchic as they grow. Using exhaustive data of internal processes from a sample of 683 wikis, we construct empirical measures of participation and test for increases in oligarchy associated with growth. In contrast to previous studies, we find support for Michels' iron law and conclude that peer production entails oligarchic organizational forms."
to:NB  to_read  social_life_of_the_mind  networked_life  institutions  inequality  peer_production  social_media  re:democratic_cognition 
7 weeks ago
[1407.0440] Measuring Team Creativity Through Longitudinal Social Signals
"Research into human dynamical systems has long sought to identify robust signals for human behavior. We have discovered a series of social network-based indicators that are reliable predictors of team creativity and collaborative innovation. We extract these signals from electronic records of interpersonal interactions, including e-mail, and face-to-face interaction measured via sociometric badges. The first of these signals is Rotating Leadership, measuring the degree to which, over time, actors in a team vary in how central they are to the team's communication network structure. The second is Rotating Contribution, which measures the degree to which, over time, actors in a team vary in the ratio of communications they distribute versus receive. The third is Prompt Response Time, which measures, over time, the responsiveness of actors to one another's communications. Finally, we demonstrate the predictive utility of these signals in a variety of contexts, showing them to be robust to various methods of evaluating innovation."

--- But how are they measuring "team creativity and collaborative innovation"? That seems key...
to:NB  to_read  innovation  social_life_of_the_mind  social_networks  re:democratic_cognition  to_be_shot_after_a_fair_trial 
7 weeks ago
[1406.2293] Gossip: Identifying Central Individuals in a Social Network
"We examine individuals' abilities to identify the highly central people in their social networks, where centrality is defined by diffusion centrality (Banerjee et al., 2013), which characterizes a node's influence in spreading information. We first show that diffusion centrality nests standard centrality measures -- degree, eigenvector and Katz-Bonacich centrality -- as extreme special cases. Next, we show that boundedly rational individuals can, simply by tracking sources of gossip, identify who is central in their social network in the specific sense of having high diffusion centrality. Finally, we examine whether the model's predictions are consistent with data in which we ask people in each of 35 villages who would be the most effective point from which to initiate a diffusion process. We find that individuals accurately nominate central individuals in the diffusion centrality sense. Additionally, the nominated individuals are more central in the network than "village leaders" as well as those who are most central in a GPS sense. This suggests that individuals can rank others according to their centrality in the networks even without knowing the network, and that eliciting network centrality of others simply by asking individuals may be an inexpensive research and policy tool."
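
--- Banerjee et al.'s diffusion centrality is concrete enough to restate: DC(q, T) = sum over t = 1..T of (qA)^t applied to the all-ones vector, where A is the adjacency matrix, q a passing probability, and T the number of periods. A minimal pure-Python sketch (toy example of mine, not the authors' code):

```python
# Diffusion centrality (Banerjee et al. 2013):
#   DC(q, T) = sum_{t=1}^{T} (q A)^t 1
# where A is the adjacency matrix, q a passing probability, and T the
# number of communication periods.

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def diffusion_centrality(A, q, T):
    n = len(A)
    dc = [0.0] * n
    v = [1.0] * n                          # all-ones vector
    for _ in range(T):
        v = [q * x for x in matvec(A, v)]  # v <- (qA) v
        dc = [d + x for d, x in zip(dc, v)]
    return dc

# Toy 4-node path graph 0-1-2-3 (my example, not from the paper).
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(diffusion_centrality(A, 1.0, 1))   # T=1, q=1: just the degrees
```

With q = 1 and T = 1 this reduces to degree, consistent with the abstract's claim that the standard centrality measures appear as extreme special cases in q and T.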
to:NB  network_data_analysis  social_networks  chandrashekar.arun  jackson.matthew_o. 
7 weeks ago
[1406.7542] Crowdsourcing for Participatory Democracies: Efficient Elicitation of Social Choice Functions
"We present theoretical and empirical results demonstrating the usefulness of voting rules for participatory democracies. We first give algorithms which efficiently elicit ε-approximations to two prominent voting rules: the Borda rule and the Condorcet winner. This result circumvents previous prohibitive lower bounds and is surprisingly strong: even if the number of ideas is as large as the number of participants, each participant will only have to make a logarithmic number of comparisons, an exponential improvement over the linear number of comparisons previously needed. We demonstrate the approach in an experiment in Finland's recent off-road traffic law reform, observing that the total number of comparisons needed to achieve a fixed ε approximation is linear in the number of ideas and that the constant is not large.
"Finally, we note a few other experimental observations which support the use of voting rules for aggregation. First, we observe that rating, one of the common alternatives to ranking, manifested effects of bias in our data. Second, we show that very few of the topics lacked a Condorcet winner, one of the prominent negative results in voting. Finally, we show data hinting at a potential future direction: the use of partial rankings as opposed to pairwise comparisons to further decrease the elicitation time."
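
--- The elicitation trick is easy to mock up. A toy simulation (my own illustration; the setup and parameters are invented, not from the paper): estimate each idea's Borda score from a random sample of pairwise comparisons instead of the full O(n^2) set.

```python
import random

random.seed(0)

# Invented toy setup: each "idea" has a latent quality, and the exact
# Borda score of an idea is the number of rivals it beats in pairwise
# comparison.  Instead of eliciting all O(n^2) comparisons, sample a
# modest number and estimate each idea's Borda score as
#   n * (fraction of its sampled comparisons it wins).
n_ideas = 50
quality = [random.random() for _ in range(n_ideas)]

wins = [0] * n_ideas
trials = [0] * n_ideas
for _ in range(5000):                    # ~200 comparisons per idea
    i, j = random.sample(range(n_ideas), 2)
    winner = i if quality[i] > quality[j] else j
    wins[winner] += 1
    trials[i] += 1
    trials[j] += 1

borda_hat = [n_ideas * w / max(t, 1) for w, t in zip(wins, trials)]
best = max(range(n_ideas), key=borda_hat.__getitem__)
```

Since each idea appears in only a couple hundred sampled comparisons, each participant need only answer a handful, yet the estimated ranking tracks the exact Borda ranking closely.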
to:NB  mechanism_design  social_choice  social_life_of_the_mind  collective_cognition  networked_life  re:democratic_cognition 
7 weeks ago
[1406.7729] Popularity and Performance: A Large-Scale Study
"Social scientists have long sought to understand why certain people, items, or options become more popular than others. One seemingly intuitive theory is that inherent value drives popularity. An alternative theory claims that popularity is driven by the rich-get-richer effect of cumulative advantage---certain options become more popular, not because they are higher quality, but because they are already relatively popular. Realistically, it seems likely that popularity is driven by neither one of these forces alone but rather both together. Recently, researchers have begun using large-scale online experiments to study the effect of cumulative advantage in realistic scenarios, but there have been no large-scale studies of the combination of these two effects. We are interested in studying a case where decision-makers observe explicit signals of both the popularity and the quality of various options. We derive a model for change in popularity as a function of past popularity and past perceived quality. Our model implies that we should expect an interaction between these two forces---popularity should amplify the effect of quality, so that the more popular an option is, the faster we expect it to increase in popularity with better perceived quality. We use a data set from eToro.com, an online social investment platform, to support this hypothesis."
to:NB  social_influence  collective_cognition  della_penna.nicholas  social_life_of_the_mind  path_dependence  experimental_sociology  re:democratic_cognition  to_read 
7 weeks ago
[1406.7588] Lessons Learned from an Experiment in Crowdsourcing Complex Citizen Engineering Tasks with Amazon Mechanical Turk
"We investigate the feasibility of obtaining highly trustworthy results using crowdsourcing on complex engineering tasks. Crowdsourcing is increasingly seen as a potentially powerful way of increasing the supply of labor for solving society's problems. While applications in domains such as citizen-science, citizen-journalism or knowledge organization (e.g., Wikipedia) have seen many successful applications, there have been fewer applications focused on solving engineering problems, especially those involving complex tasks. This may be in part because of concerns that low quality input into engineering analysis and design could result in failed structures leading to loss of life. We compared the quality of work of the anonymous workers of Amazon Mechanical Turk (AMT), an online crowdsourcing service, with the quality of work of expert engineers in solving the complex engineering task of evaluating virtual wind tunnel data graphs. On this representative complex engineering task, our results showed that there was little difference between expert engineers and crowdworkers in the quality of their work and explained reasons for these results. Along with showing that crowdworkers are effective at completing new complex tasks our paper supplies a number of important lessons that were learned in the process of collecting this data from AMT, which may be of value to other researchers."
to:NB  to_read  collective_cognition  networked_life  re:democratic_cognition 
7 weeks ago
[1406.7586] Facts and Figuring: An Experimental Investigation of Network Structure and Performance in Information and Solution Spaces
"Using data from a large laboratory experiment on problem solving, in which we varied the structure of 16-person networks, we investigate how an organization's network structure may be constructed to optimize performance in complex problem-solving tasks. Problem solving involves both search for information and search for theories to make sense of that information. We show that the effect of network structure is opposite for these two equally important forms of search. Dense clustering encourages members of a network to generate more diverse information, but it also has the power to discourage the generation of diverse theories: clustering promotes exploration in information space, but decreases exploration in solution space. Previous research, tending to focus on only one of those two spaces, had produced inconsistent conclusions about the value of network clustering. By adopting an experimental platform on which information was measured separately from solutions, we were able to reconcile past contradictions and clarify the effects of network clustering on performance. The finding both provides a sharper tool for structuring organizations for knowledge work and reveals the challenges inherent in manipulating network structure to enhance performance, as the communication structure that helps one aspect of problem solving may harm the other."
in_NB  to_read  experimental_sociology  experimental_psychology  collective_cognition  social_life_of_the_mind  lazer.david  re:democratic_cognition 
7 weeks ago
[1406.7564] Analytical reasoning task reveals limits of social learning in networks
"Social learning (by observing and copying others) is a highly successful cultural mechanism for adaptation, outperforming individual information acquisition and experience. Here, we investigate social learning in the context of the uniquely human capacity for reflective, analytical reasoning. A hallmark of the human mind is our ability to engage analytical reasoning, and suppress false associative intuitions. Through a set of lab-based network experiments, we find that social learning fails to propagate this cognitive strategy. When people make false intuitive conclusions, and are exposed to the analytic output of their peers, they recognize and adopt this correct output. But they fail to engage analytical reasoning in similar subsequent tasks. Thus, humans exhibit an 'unreflective copying bias,' which limits their social learning to the output, rather than the process, of their peers' reasoning, even when doing so requires minimal effort and no technical skill. In contrast to much recent work on observation-based social learning, which emphasizes the propagation of successful behavior through copying, our findings identify a limit on the power of social networks in situations that require analytical reasoning."

--- OK, before reading beyond the abstract, I have a problem. To learn from someone else here by just seeing their conclusion, you'd have to solve a potentially very complicated inverse problem, of reconstructing their train of reasoning. In a phrase: social learning of reasoning only works if you share the reasons. (Thus Mercier and Sperber, or for that matter Socrates.)
to:NB  to_read  experimental_psychology  social_life_of_the_mind  re:democratic_cognition  to_be_shot_after_a_fair_trial 
7 weeks ago
[1406.7563] When is a crowd wise?
"Numerous studies and anecdotes demonstrate the "wisdom of the crowd," the surprising accuracy of a group's aggregated judgments. Less is known, however, about the generality of crowd wisdom. For example, are crowds wise even if their members have systematic judgmental biases, or can influence each other before members render their judgments? If so, are there situations in which we can expect a crowd to be less accurate than skilled individuals? We provide a precise but general definition of crowd wisdom: A crowd is wise if a linear aggregate, for example a mean, of its members' judgments is closer to the target value than a randomly, but not necessarily uniformly, sampled member of the crowd. Building on this definition, we develop a theoretical framework for examining, a priori, when and to what degree a crowd will be wise. We systematically investigate the boundary conditions for crowd wisdom within this framework and determine conditions under which the accuracy advantage for crowds is maximized. Our results demonstrate that crowd wisdom is highly robust: Even if judgments are biased and correlated, one would need to nearly deterministically select only a highly skilled judge before an individual's judgment could be expected to be more accurate than a simple averaging of the crowd. Our results also provide an accuracy rationale behind the need for diversity of judgments among group members. Contrary to folk explanations of crowd wisdom which hold that judgments should ideally be independent so that errors cancel out, we find that crowd wisdom is maximized when judgments systematically differ as much as possible. We re-analyze data from two published studies that confirm our theoretical results."
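
--- The definition is simple to operationalize. A toy check (my sketch, with invented parameters): give every judge the same systematic bias plus independent noise, and compare the mean's error to the average individual error.

```python
import random

random.seed(42)

# The paper's criterion (paraphrased): a crowd is wise if a linear
# aggregate of its judgments, here the mean, is closer to the truth
# than a randomly drawn individual judgment is on average.
truth = 100.0
n = 200
bias = 5.0        # shared systematic bias (invented toy parameter)

# Every judge shares the bias; the independent noise is much larger.
judgments = [truth + bias + random.gauss(0, 15) for _ in range(n)]

crowd_error = abs(sum(judgments) / n - truth)
individual_error = sum(abs(j - truth) for j in judgments) / n
```

Averaging cancels the independent noise but not the shared bias, so the crowd still beats the typical individual whenever the bias is smaller than the individual noise, which is the robustness the abstract claims.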
to:NB  to_read  collective_cognition  re:democratic_cognition 
7 weeks ago
[1406.7551] Collective Intelligence in Citizen Science -- A Study of Performers and Talkers
"The recent emergence of online citizen science is illustrative of an efficient and effective means to harness the crowd in order to achieve a range of scientific discoveries. Fundamentally, citizen science projects draw upon crowds of non-expert volunteers to complete short Tasks, which can vary in domain and complexity. However, unlike most human-computational systems, participants in these systems, the `citizen scientists' are volunteers, whereby no incentives, financial or otherwise, are offered. Furthermore, encouraged by citizen science platforms such as Zooniverse, online communities have emerged, providing them with an environment to discuss, share ideas, and solve problems. In fact, it is the result of these forums that has enabled a number of scientific discoveries to be made. In this paper we explore the phenomenon of collective intelligence via the relationship between the activities of online citizen science communities and the discovery of scientific knowledge. We perform a cross-project analysis of ten Zooniverse citizen science projects and analyse the behaviour of users with regards to their Task completion activity and participation in discussion and discover collective behaviour amongst highly active users. Whilst our findings have implications for future citizen science design, we also consider the wider implications for understanding collective intelligence research in general."
to:NB  networked_life  sociology  sociology_of_science  re:democratic_cognition 
7 weeks ago
[1406.5481] Modeling and Measuring Graph Similarity: The Case for Centrality Distance
"The study of the topological structure of complex networks has fascinated researchers for several decades, and today we have a fairly good understanding of the types and reoccurring characteristics of many different complex networks. However, surprisingly little is known today about models to compare complex graphs, and quantitatively measure their similarity. This paper proposes a natural similarity measure for complex networks: centrality distance, the difference between two graphs with respect to a given node centrality. Centrality distances allow to take into account the specific roles of the different nodes in the network, and have many interesting applications. As a case study, we consider the closeness centrality in more detail, and show that closeness centrality distance can be used to effectively distinguish between randomly generated and actual evolutionary paths of two dynamic social networks."
to:NB  network_data_analysis  re:network_differences 
7 weeks ago
[1406.3411] VoG: Summarizing and Understanding Large Graphs
"How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the "importance" of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a "vocabulary" of subgraph-types that often occur in real graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary. We measure success in a well-founded way by means of the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph.
"Our contributions are three-fold: (a) formulation: we provide a principled encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop \method, an efficient method to minimize the description cost, and (c) applicability: we report experimental results on multi-million-edge real graphs, including Flickr and the Notre Dame web graph."
to:NB  network_data_analysis  mdl  statistics 
7 weeks ago
[1407.2256] Inferring latent structures via information inequalities
"One of the goals of probabilistic inference is to decide whether an empirically observed distribution is compatible with a candidate Bayesian network. However, Bayesian networks with hidden variables give rise to highly non-trivial constraints on the observed distribution. Here, we propose an information-theoretic approach, based on the insight that conditions on entropies of Bayesian networks take the form of simple linear inequalities. We describe an algorithm for deriving entropic tests for latent structures. The well-known conditional independence tests appear as a special case. While the approach applies for generic Bayesian networks, we presently adopt the causal view, and show the versatility of the framework by treating several relevant problems from that domain: detecting common ancestors, quantifying the strength of causal influence, and inferring the direction of causation from two-variable marginals."
to:NB  causal_inference  statistics  graphical_models  information_theory  to_read  janzing.dominik  causal_discovery 
7 weeks ago
[1406.1222] Discovering Structure in High-Dimensional Data Through Correlation Explanation
"We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for the simplest set of factors that can explain the correlations in the data as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables which makes it an attractive approach for very high dimensional systems. We demonstrate that Correlation Explanation (CorEx) automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language."
to:NB  statistics  inference_to_latent_objects  hierarchical_structure  kith_and_kin  galstyan.aram  ver_steeg.greg  to_read 
7 weeks ago
[1406.6670] Learning the ergodic decomposition
"A Bayesian agent learns about the structure of a stationary process from ob- serving past outcomes. We prove that his predictions about the near future become ap- proximately those he would have made if he knew the long run empirical frequencies of the process."
to:NB  ergodic_theory  statistics  time_series  statistical_inference_for_stochastic_processes  bayesian_consistency 
7 weeks ago
[1406.6592] A Kriging procedure for processes indexed by graphs
"We provide a new kriging procedure of processes on graphs. Based on the construction of Gaussian random processes indexed by graphs, we extend to this framework the usual linear prediction method for spatial random fields, known as kriging. We provide the expression of the estimator of such a random field at unobserved locations as well as a control for the prediction error."
to:NB  statistics  spatial_statistics  network_data_analysis 
7 weeks ago
[1406.6018] A brief history of long memory
"Long memory plays an important role, determining the behaviour and predictibility of systems, in many fields; for instance, climate, hydrology, finance, networks and DNA sequencing. In particular, it is important to test if a process is exhibiting long memory since that impacts the confidence with which one may predict future events on the basis of a small amount of historical data. A major force in the development and study of long memory was the late Benoit B. Mandelbrot. Here we discuss the original motivation of the development of long memory and Mandelbrot's influence on this fascinating field. We will also elucidate the contrasting approaches to long memory in the physics and statistics communities with an eye towards their influence on modern practice in these fields."
in_NB  have_read  long-range_dependence  time_series  statistics  history_of_science  watkins.nicholas  kith_and_kin 
7 weeks ago
[1406.4592] Non-subjective power analysis to detect G*E interactions in Genome-Wide Association Studies in presence of confounding factor
"It is generally acknowledged that most complex diseases are affected in part by interactions between genes and genes and/or between genes and environmental factors. Taking into account environmental exposures and their interactions with genetic factors in genome-wide association studies (GWAS) can help to identify high-risk subgroups in the population and provide a better understanding of the disease. For this reason, many methods have been developed to detect gene-environment (G*E) interactions. Despite this, few loci that interact with environmental exposures have been identified so far. Indeed, the modest effect of G*E interactions as well as confounding factors entail low statistical power to detect such interactions. In this work, we provide a simulated dataset in order to study methods for detecting G*E interactions in GWAS in presence of confounding factor and population structure. Our work applies a recently introduced non-subjective method for H1 simulations called waffect and exploits the publicly available HapMap project to build a datasets with real genotypes and population structures. We use this dataset to study the impact of confounding factors and compare the relative performance of popular methods such as PLINK, random forests and linear mixed models to detect G*E interactions. Presence of confounding factor is an obstacle to detect G*E interactions in GWAS and the approaches considered in our power study all have insufficient power to detect the strong simulated interaction. Our simulated dataset could help to develop new methods which account for confounding factors through latent exposures in order to improve power."
to:NB  genetics  statistics  re:g_paper 
7 weeks ago
[1406.2462] Empirical risk minimization for heavy-tailed losses
"The purpose of this paper is to discuss empirical risk minimization when the losses are not necessarily bounded and may have a distribution with heavy tails. In such situations usual empirical averages may fail to provide reliable estimates and empirical risk minimization may provide large excess risk. However, some robust mean estimators proposed in the literature may be used to replace empirical means. In this paper we investigate empirical risk minimization based on a robust estimate proposed by Catoni. We develop performance bounds based on chaining arguments tailored to Catoni's mean estimator."
to:NB  learning_theory  heavy_tails  statistics  re:your_favorite_dsge_sucks 
7 weeks ago
[1406.2098] Learning directed acyclic graphs via bootstrap aggregating
"Probabilistic graphical models are graphical representations of probability distributions. Graphical models have applications in many fields including biology, social sciences, linguistic, neuroscience. In this paper, we propose directed acyclic graphs (DAGs) learning via bootstrap aggregating. The proposed procedure is named as DAGBag. Specifically, an ensemble of DAGs is first learned based on bootstrap resamples of the data and then an aggregated DAG is derived by minimizing the overall distance to the entire ensemble. A family of metrics based on the structural hamming distance is defined for the space of DAGs (of a given node set) and is used for aggregation. Under the high-dimensional-low-sample size setting, the graph learned on one data set often has excessive number of false positive edges due to over-fitting of the noise. Aggregation overcomes over-fitting through variance reduction and thus greatly reduces false positives. We also develop an efficient implementation of the hill climbing search algorithm of DAG learning which makes the proposed method computationally competitive for the high-dimensional regime. The DAGBag procedure is implemented in the R package dagbag."

--- Could this be adapted to the more-causally-sensible distance measure of [memory lapse] and Buhlmann?
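--- The aggregation step itself is easy to prototype (a toy sketch, not the dagbag package, and a medoid simplification of "minimize the overall distance to the entire ensemble"): given an ensemble of bootstrap-learned adjacency matrices, return the member with the smallest total structural Hamming distance to the rest.

```python
# Medoid aggregation of a DAG ensemble under structural Hamming distance.
def shd(a, b):
    """Count edge disagreements between two 0/1 adjacency matrices."""
    return sum(a[i][j] != b[i][j]
               for i in range(len(a)) for j in range(len(a)))

def aggregate(ensemble):
    """Ensemble member minimizing total SHD to the whole ensemble."""
    return min(ensemble, key=lambda g: sum(shd(g, h) for h in ensemble))

g1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # 0 -> 1 -> 2
g2 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # same chain, learned twice
g3 = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]  # 0 -> 2 only (an outlier fit)
# The majority structure wins: total SHD is 3 for g1/g2 vs 6 for g3.
print(aggregate([g1, g2, g3]) == g1)
```

Swapping `shd` for a different metric is all it would take to try a more causally sensible distance here.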
to:NB  graphical_models  ensemble_methods  statistics  causal_inference  causal_discovery 
7 weeks ago
[1406.2083] Kernel MMD, the Median Heuristic and Distance Correlation in High Dimensions
"This paper is about two related methods for two sample testing and independence testing which have emerged over the last decade: Maximum Mean Discrepancy (MMD) for the former problem and Distance Correlation (dCor) for the latter. Both these methods have been suggested for high-dimensional problems, and sometimes claimed to be unaffected by increasing dimensionality of the samples. We will show theoretically and practically that the power of both methods (for different reasons) does actually decrease polynomially with dimension. We also analyze the median heuristic, which is a method for choosing tuning parameters of translation invariant kernels. We show that different bandwidth choices could result in the MMD decaying polynomially or even exponentially in dimension."
to:NB  hypothesis_testing  two-sample_tests  kernel_estimators  dependence_measures  kith_and_kin  wasserman.larry  singh.aarti  ramdas.aaditya  high-dimensional_statistics 
7 weeks ago
[1406.1845] Detecting Feature Interactions in Bagged Trees and Random Forests
"Additive models remain popular statistical tools due to their ease of interpretation and as a result, hypothesis tests for additivity have been developed to asses the appropriateness of these models. However, as data continues to grow in size and complexity, practicioners are relying more heavily on learning algorithms because of their predictive superiority. Due to the black-box nature of these learning methods, the increase in predictive power is assumed to come at the cost of interpretability and understanding. However, recent work suggests that many popular learning algorithms, such as bagged trees and random forests, have desireable asymptotic properties which allow for formal statistical inference when base learners are built with subsamples. This work extends the hypothesis tests previously developed and demonstrates that by constructing an appropriate test set, we may perform formal hypothesis tests for additivity amongst features. We develop notions of total and partial additivity and demonstrate that both tests can be carried out at no additional computational cost to the original ensemble. Simulations and demonstrations on real data are also provided."
to:NB  additive_models  ensemble_methods  statistics  hypothesis_testing  decision_trees  hooker.giles 
7 weeks ago
[1406.1037] Bootstrapping High Dimensional Time Series
"We focus on the problem of conducting inference for high dimensional weakly dependent time series. Our results are motivated by the applications in modern high dimensional inference including (1) constructing uniform confidence band for high dimensional mean vector and (2) specification testing on the second order property of high dimensional time series such as white noise testing and testing for bandedness of covariance matrix. In theory, we derive a Gaussian approximation result for the maximum of a sum of weakly dependent vectors by adapting Stein's method, where the dimension of the vectors is allowed to be exponentially larger than the sample size. Our result reveals an interesting phenomenon arising from the interplay between the dependence and dimensionality: the more dependent of the data vectors, the slower diverging rate of the dimension is allowed for obtaining valid statistical inference. Building on the Gaussian approximation result, we propose a blockwise multiplier (wild) bootstrap that is able to capture the dependence amongst and within the data vectors and thus provides high-quality distributional approximation to the distribution of the maximum of vector sum in the high dimensional context."
in_NB  to_read  bootstrap  time_series  high-dimensional_statistics  statistics  re:your_favorite_dsge_sucks 
7 weeks ago
[1406.0873] Unifying linear dimensionality reduction
"Linear dimensionality reduction methods are a cornerstone of analyzing high dimensional data, due to their simple geometric interpretations and typically attractive computational properties. These methods capture many data features of interest, such as covariance, dynamical structure, correlation between data sets, input-output relationships, and margin between data classes. Methods have been developed with a variety of names and motivations in many fields, and perhaps as a result the deeper connections between all these methods have not been understood. Here we unify methods from this disparate literature as optimization programs over matrix manifolds. We discuss principal component analysis, factor analysis, linear multidimensional scaling, Fisher's linear discriminant analysis, canonical correlations analysis, maximum autocorrelation factors, slow feature analysis, undercomplete independent component analysis, linear regression, and more. This optimization framework helps elucidate some rarely discussed shortcomings of well-known methods, such as the suboptimality of certain eigenvector solutions. Modern techniques for optimization over matrix manifolds enable a generic linear dimensionality reduction solver, which accepts as input data and an objective to be optimized, and returns, as output, an optimal low-dimensional projection of the data. This optimization framework further allows rapid development of novel variants of classical methods, which we demonstrate here by creating an orthogonal-projection canonical correlations analysis. More broadly, we suggest that our generic linear dimensionality reduction solver can move linear dimensionality reduction toward becoming a blackbox, objective-agnostic numerical technology."
to:NB  data_analysis  principal_components  factor_analysis  optimization  statistics  dimension_reduction  ghahramani.zoubin 
7 weeks ago
[1406.0118] Improved graph Laplacian via geometric self-consistency
"We address the problem of setting the kernel bandwidth used by Manifold Learning algorithms to construct the graph Laplacian. Exploiting the connection between manifold geometry, represented by the Riemannian metric, and the Laplace-Beltrami operator, we set the bandwidth by optimizing the Laplacian's ability to preserve the geometry of the data. Experiments show that this principled approach is effective and robust."
to:NB  manifold_learning  kernel_estimators  statistics 
7 weeks ago
[1406.0063] Causal network inference using biochemical kinetics
"Network models are widely used as structural summaries of biochemical systems. Statistical estimation of networks is usually based on linear or discrete models. However, the dynamics of these systems are generally nonlinear, suggesting that suitable nonlinear formulations may offer gains with respect to network inference and associated prediction problems. We present a general framework for both network inference and dynamical prediction that is rooted in nonlinear biochemical kinetics. This is done by considering a dynamical system based on a chemical reaction graph and associated kinetics parameters. Inference regarding both parameters and the reaction graph itself is carried out within a fully Bayesian framework. Prediction of dynamical behavior is achieved by averaging over both parameters and reaction graphs, allowing prediction even when the underlying reactions themselves are unknown or uncertain. Results, based on (i) data simulated from a mechanistic model of mitogen-activated protein kinase signaling and (ii) phosphoproteomic data from cancer cell lines, demonstrate that nonlinear formulations can yield gains in network inference and permit dynamical prediction in the challenging setting where the reaction graph is unknown."
to:NB  biochemical_networks  graphical_models  statistics  estimation 
7 weeks ago
[1406.0052] Variable selection in high-dimensional additive models based on norms of projections
"We consider the problem of variable selection in high-dimensional sparse additive models. The proposed method is motivated by geometric considerations in Hilbert spaces, and consists in comparing the norms of the projections of the data on various additive subspaces. Our main results are concentration inequalities which lead to conditions making variable selection possible. In special cases these conditions are known to be optimal. As an application we consider the problem of estimating single components. We show that, up to first order, one can estimate a single component as well as if the other components were known."
to:NB  additive_models  variable_selection  hilbert_space  statistics 
7 weeks ago
[1406.0013] Estimating Vector Fields on Manifolds and the Embedding of Directed Graphs
"This paper considers the problem of embedding directed graphs in Euclidean space while retaining directional information. We model a directed graph as a finite set of observations from a diffusion on a manifold endowed with a vector field. This is the first generative model of its kind for directed graphs. We introduce a graph embedding algorithm that estimates all three features of this model: the low-dimensional embedding of the manifold, the data density and the vector field. In the process, we also obtain new theoretical results on the limits of "Laplacian type" matrices derived from directed graphs. The application of our method to both artificially constructed and real data highlights its strengths."
to:NB  network_data_analysis  manifold_learning  statistics 
7 weeks ago
I'm Looking For an FBI Profiler Who Really Gets Me
"I know the local detectives have done their best, but I really think it’s time for us to move on. This investigation isn’t going anywhere, and we all know it. It’s time to call in someone a little more sophisticated, someone more empathetic–someone who really gets me.
"Part of this is my fault, I know. I should have known that my pattern of elegant but brutal serial murders would go right over the heads of these local cops. But honestly, I thought that after the third victim, they’d catch on. My exquisite design, tableaux morts of the Greek muses surrounded by signifiers of their arts– it’s not that subtle. I thought they’d appreciate the elaborate staging, the intricate detail, the nuanced references to classical poetry and painting. I thought they’d see me as Apollo, leading my choir of muses to create the most beautiful and perfect work of art the world has ever seen.
"Instead, they’ve dubbed me the MILF Killer.
"Putting my own ego aside, that term is an insult to my victims. Clio, the history professor; Melpomene, the theatre patron; Terpsichore, the retired ballerina — these women were leaders, inspirations. And they weren’t even that hot. Is that what these yokels are looking for? Maybe if I were slashing coeds, I’d get more respect.
"But that’s not the kind of serial killer I am. I just wish someone understood that."
funny:morbid  affectionate_parody 
7 weeks ago
Dirtbag John Milton
"Did I request thee, Maker, from my Clay
To mould me Man, did I sollicite thee
From darkness to promote me, or here place
In this delicious Garden?
- Adam, Paradise Lost, John Milton

"“I didn’t ask to be born, you know.”
- Teens"
funny:geeky  funny:pointed  affectionate_parody  milton.john 
7 weeks ago
Pinboard Turns Five (Pinboard Blog)
"I enjoy the looking-glass aspect of our industry, where running a mildly profitable small business makes me a crazy maverick not afraid to break all the rules."
networked_life  moral_psychology 
7 weeks ago
I Keep Thinking That There Is Something Very Powerful and True in This Critique: But I Cannot Figure Out What It Is...: Afternoon Comment | Washington Center for Equitable Growth
"Suppose we decide that we are no longer going to:
"Pretend that agents–or economists–know the data-generating process…
"Recognize that people are not terribly committed to Bayesianism–that they do not model probabilities as if they have well-defined priors and all there is is risk…
"What do we then do–what kind of economic arguments do we make–once we have made those decisions?"

--- Y'know, I think I can hear Herb Simon rolling in his grave from here.
economics  social_science_methodology  bounded_rationality 
7 weeks ago
Complex Operational Decision Making in Networked Systems of Humans and Machines: A Multidisciplinary Approach
"Over the last two decades, computers have become omnipresent in daily life. Their increased power and accessibility have enabled the accumulation, organization, and analysis of massive amounts of data. These data, in turn, have been transformed into practical knowledge that can be applied to simple and complex decision making alike. In many of today's activities, decision making is no longer an exclusively human endeavor. In both virtual and real ways, technology has vastly extended people's range of movement, speed and access to massive amounts of data. Consequently, the scope of complex decisions that human beings are capable of making has greatly expanded. At the same time, some of these technologies have also complicated the decision making process. The potential for changes to complex decision making is particularly significant now, as advances in software, memory storage and access to large amounts of multimodal data have dramatically increased. Increasingly, our decision making process integrates input from human judgment, computing results and assistance, and networks. Human beings do not have the ability to analyze the vast quantities of computer-generated or -mediated data that are now available. How might humans and computers team up to turn data into reliable (and when necessary, speedy) decisions?
"Complex Operational Decision Making in Networked Systems of Humans and Machines explores the possibilities for better decision making through collaboration between humans and computers. This study is situated around the essence of decision making; the vast amounts of data that have become available as the basis for complex decision making; and the nature of collaboration that is possible between humans and machines in the process of making complex decisions. This report discusses the research goals and relevant milestones in several enabling subfields as they relate to enhanced human-machine collaboration for complex decision making; the relevant impediments and systems-integration challenges that are preventing technological breakthroughs in these subfields; and a sense of the research that is occurring in university, government and industrial labs outside of the United States, and the implications of this research for U.S. policy. The development of human-machine collaboration for complex decision making is still in its infancy relative to where cross-disciplinary research could take it over the next generation. Complex Operational Decision Making explores challenges to progress, impediments to achieving technological breakthroughs, opportunities, and key research goals."
to:NB  books:noted  decision-making  networks  distributed_systems  networked_life  computers  natural_born_cyborgs  collective_cognition 
7 weeks ago
Text Analysis with R for Students of Literature
"Text Analysis with R for Students of Literature is written with students and scholars of literature in mind but will be applicable to other humanists and social scientists wishing to extend their methodological tool kit to include quantitative and computational approaches to the study of text. Computation provides access to information in text that we simply cannot gather using traditional qualitative methods of close reading and human synthesis. Text Analysis with R for Students of Literature provides a practical introduction to computational text analysis using the open source programming language R. R is extremely popular throughout the sciences and because of its accessibility, R is now used increasingly in other research areas. Readers begin working with text right away and each chapter works through a new technique or process such that readers gain a broad exposure to core R procedures and a basic understanding of the possibilities of computational text analysis at both the micro and macro scale. Each chapter builds on the previous as readers move from small scale “microanalysis” of single texts to large scale “macroanalysis” of text corpora, and each chapter concludes with a set of practice exercises that reinforce and expand upon the chapter lessons. The book’s focus is on making the technical palatable and making the technical useful and immediately gratifying."
to:NB  books:noted  data_analysis  R  text_mining  humanities 
7 weeks ago
Democracy in the Making: How Activist Groups Form (Oxford Studies in Culture and Politics) by Kathleen M. Blee - Powell's Books
"With civic engagement commonly understood to be on the decline and traditional bases of community and means of engagement increasingly fractured, how do people become involved in collective civic action? How do activist groups form? What hampers the ability of these groups to invigorate political life, and what enables it?
"Kathleen Blee's groundbreaking new study provides a provocative answer: the early times matter. By following grassroots groups from their very beginnings, Blee traces how their sense of possibility shrinks over time as groups develop a shared sense of who they are that forecloses options that were once open. At the same time, she charts the turning points at which options re-open and groups become receptive to change and reinvention.
"Based on observing more than sixty grassroots groups in Pittsburgh for three years, Democracy in the Making is an unprecedented look at how ordinary people come together to change society. It gives a close-up look at the deliberations of activists on the left and right as they work for animal rights, an end to the drug trade in their neighbourhood, same-sex marriage, global peace, and more. It shows how grassroots activism can provide an alternative to civic disengagement and a forum for envisioning how the world can be transformed. At the same time, it documents how activist groups become mired in dysfunctional and undemocratic patterns that their members dislike, but cannot fix.
"By analyzing the possibilities and pitfalls that face nascent activist organizations, Blee reveals how critical early choices are to the success of grassroots activism. Vital for scholars and activists alike, this practical yet profound study shows us, through the examples of both groups that flourish and those that flounder, how grassroots activism can better live up to its democratic potential."
to:NB  books:noted  social_movements  political_science  democracy  pittsburgh  via:orgtheory 
7 weeks ago
Axiomatizing bounded rationality: the priority heuristic - Springer
"This paper presents an axiomatic framework for the priority heuristic, a model of bounded rationality in Selten’s (in: Gigerenzer and Selten (eds.) Bounded rationality: the adaptive toolbox, 2001) spirit of using empirical evidence on heuristics. The priority heuristic predicts actual human choices between risky gambles well. It implies violations of expected utility theory such as common consequence effects, common ratio effects, the fourfold pattern of risk taking and the reflection effect. We present an axiomatization of a parameterized version of the heuristic which generalizes the heuristic in order to account for individual differences and inconsistencies. The axiomatization uses semiorders (Luce, Econometrica 24:178–191, 1956), which have an intransitive indifference part and a transitive strict preference component. The axiomatization suggests new testable predictions of the priority heuristic and makes it easier for theorists to study the relation between heuristics and other axiomatic theories such as cumulative prospect theory."
to:NB  heuristics  decision_theory  cognitive_science  gigerenzer.gerd 
7 weeks ago
Piketty in R markdown – we need some help from the crowd | Simply Statistics
The non-proportional spacing of points on the time axis bugged me too, but I think it's more a case of spreadsheet defaults than anything else.
piketty.thomas  economics  data_sets  to_teach:statcomp 
7 weeks ago
Model change and reliability in scientific inference - Springer
"One persistent challenge in scientific practice is that the structure of the world can be unstable: changes in the broader context can alter which model of a phenomenon is preferred, all without any overt signal. Scientific discovery becomes much harder when we have a moving target, and the resulting incorrect understandings of relationships in the world can have significant real-world and practical consequences. In this paper, we argue that it is common (in certain sciences) to have changes of context that lead to changes in the relationships under study, but that standard normative accounts of scientific inquiry have assumed away this problem. At the same time, we show that inference and discovery methods can “protect” themselves in various ways against this possibility by using methods with the novel methodological virtue of “diligence.” Unfortunately, this desirable virtue provably is incompatible with other desirable methodological virtues that are central to reliable inquiry. No scientific method can provide every virtue that we might want."
to:NB  philosophy_of_science  epistemology  non-stationarity  danks.david 
7 weeks ago
The Slack Wire: The Rentier Would Prefer Not to Be Euthanized
"The rentiers would prefer not to be euthanized. Under capitalism, the elite are those who own (or control) money. Their function is, in a broad sense, to provide liquidity. To the extent that pure money-holders facilitate production, it is because money serves as a coordination mechanism, bridging gaps — over time and especially with unknown or untrusted counterparties — that would otherwise prevent cooperation from taking place. [1] In a world where liquidity is abundant, this coordination function is evidently obsolete and can no longer be a source of authority or material rewards."

--- I'm sympathetic (the new mode of production being prepared within the old FTW), but isn't there a fallacy of aggregation here? Why would liquidity (etc.) still be abundant if its provision is _not_ paid for? Admittedly, going from being a Master of the Universe to a modestly-compensated provider of a hum-drum service would be a big come-down, like going from hydraulic despotism to being a plumber, and maybe that'd be enough for Mason's argument to work.
economics  political_economy  finance  financial_crisis_of_2007--  mason.j.w. 
7 weeks ago
LIWC: Linguistic Inquiry and Word Count
Have they really just stuck words into various categories, and then counted up how often they appear in the document? It seems so, since "It was a beautiful funeral" scores as 20% positive, 0% negative. (If so: problem set for the kids in statistical computing?) Maybe this would get the emotional drift from a long piece of text, but from short snippets like Twitter or Facebook status updates, this has got to be super noisy.
Memo to self, look at whether CMU has a site license before shelling out $29.95.

ETA: The classic "I am in no way unhappy" scores as 1/6 negative.
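If LIWC really is just sticking words into categories and counting, the failure mode is easy to reproduce. A toy sketch, with tiny made-up lexicons standing in for LIWC's proprietary dictionaries (these are not LIWC's actual word lists):

```python
# Naive dictionary-based scoring: count category hits, divide by token count.
# The word lists below are made-up stand-ins for LIWC's dictionaries; note
# that "funeral" is deliberately absent from NEGATIVE, matching LIWC's
# 20% positive / 0% negative score for "It was a beautiful funeral".
POSITIVE = {"beautiful", "happy", "love", "good"}
NEGATIVE = {"unhappy", "sad", "terrible", "bad"}

def score(text):
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    n = len(tokens)
    return {
        "positive": sum(t in POSITIVE for t in tokens) / n,
        "negative": sum(t in NEGATIVE for t in tokens) / n,
    }

print(score("It was a beautiful funeral"))   # positive 0.2, negative 0.0
print(score("I am in no way unhappy"))       # negative 1/6: blind to negation
```

Pure bag-of-words counting has no way to see that "no way" flips the polarity of "unhappy", which is exactly the problem with scoring short snippets.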
text_mining  linguistics  psychology  to:blog  to_teach:statcomp 
7 weeks ago
Review-a-Day - Manliness by Harvey C. Mansfield, reviewed by The New Republic Online - Powell's Books
The usual caveats about interpreting Straussians apply. This book would seem to be aimed at "gentlemen", and so the illogicality and appeal to prejudice would be features, not bugs, in an attempt at exercising reason.
book_reviews  evisceration  philosophy  mansfield.harvey_c.  nussbaum.martha  feminism  masculinity  sexist_idiocy  have_read  via:? 
8 weeks ago
Hospitals Are Mining Patients' Credit Card Data to Predict Who Will Get Sick - Businessweek
So, do they actually have labeled training data which links patient outcomes to patient purchases, or are they just trolling for stuff that sounds bad?
medicine  data_mining  national_surveillance_state  via:clay  have_read  to:blog  to_teach:data-mining 
8 weeks ago
[1406.6956] Order-Optimal Estimation of Functionals of Discrete Distributions
"We propose a general framework for the construction and analysis of estimators for a wide class of functionals of discrete distributions, where the alphabet size S is unknown and may be scaling with the number of observations n. We treat the respective regions where the functional is "nonsmooth" and "smooth" separately. In the "nonsmooth" regime, we apply an unbiased estimator for the best polynomial approximation of the functional whereas, in the "smooth" regime, we apply a bias-corrected version of the Maximum Likelihood Estimator (MLE).
"We illustrate the merit of this approach by thoroughly analyzing the performance of the resulting schemes for estimating two important information measures: the entropy and the Rényi entropy of order α. We obtain the best known upper bounds for the maximum mean squared error incurred in estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity n = Θ(S/ln S) for entropy estimation. We also demonstrate that it suffices to have n = ω(S^{1/α}/ln S) for estimating the Rényi entropy of order α, 0 < α < 1. Conversely, we establish a minimax lower bound that establishes optimality of this sample complexity to within a √(ln S) factor.
"We highlight the practical advantages of our schemes for the estimation of entropy and mutual information. We compare our performance with the popular MLE and with the order-optimal entropy estimator of Valiant and Valiant. As we illustrate with a few experiments, our approach results in shorter running time and higher accuracy."
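For intuition about what these papers are improving on: the plug-in MLE systematically underestimates entropy, and the simplest classical fix is the Miller-Madow correction, which adds (K − 1)/(2n) for K observed symbols. A quick illustration (this is the classical correction, not the paper's estimator):

```python
import math
import random

def plugin_entropy(samples):
    """Plug-in (MLE) entropy: entropy of the empirical distribution, in nats."""
    n = len(samples)
    counts = {}
    for s in samples:
        counts[s] = counts.get(s, 0) + 1
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def miller_madow(samples):
    """Classical first-order bias correction: plug-in + (K_observed - 1)/(2n)."""
    n = len(samples)
    k = len(set(samples))
    return plugin_entropy(samples) + (k - 1) / (2 * n)

random.seed(0)
S = 100                   # alphabet size
true_h = math.log(S)      # entropy of the uniform distribution over S symbols
samples = [random.randrange(S) for _ in range(200)]

print(f"true: {true_h:.3f}  plug-in: {plugin_entropy(samples):.3f}  "
      f"Miller-Madow: {miller_madow(samples):.3f}")
# With n only twice S, the plug-in estimate falls well below ln(S);
# the correction recovers part of the gap.
```

The point of the estimators above is that this first-order correction is not enough when S grows with n; they get consistency down to n = Θ(S/ln S) rather than the n = ω(S) the MLE needs.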
in_NB  entropy_estimation  statistics  to_read  estimation  information_theory 
8 weeks ago
[1406.6959] Maximum Likelihood Estimation of Functionals of Discrete Distributions
"The Maximum Likelihood Estimator (MLE) is widely used in estimating functionals of discrete probability distributions, and involves "plugging-in" the empirical distribution of the data. In this work we propose a general framework and procedure to analyze the performance of the MLE in estimating functionals of discrete distributions, under the worst-case mean squared error criterion. In particular, we use approximation theory to bound the bias incurred by the MLE, and concentration inequalities to bound the variance. We highlight our techniques by considering two important information measures: the entropy, and the Rényi entropy of order α. For entropy estimation, we show that it is necessary and sufficient to have n = ω(S) observations for the MLE to be consistent, where S represents the alphabet size. In addition, we obtain that it is necessary and sufficient to consider n = ω(S^{1/α}) samples for the MLE to consistently estimate ∑_{i=1}^S p_i^α, 0 < α < 1. For both these problems, the MLE achieves the best possible sample complexity up to logarithmic factors. When α > 1, we show that n = ω(max(S^{2/α−1}, 1)) samples suffice."
in_NB  to_read  statistics  entropy_estimation  estimation  information_theory 
8 weeks ago
Capital in Piketty's Capital
This is, I think, the best explanation of why the Cambridge Capital Controversy is quite irrelevant here.
economics  piketty.thomas  via:jbdelong  have_read 
8 weeks ago
What’s the Matter With Eastern Kentucky? - NYTimes.com
"The public debate about the haves and the have-nots tends to focus on the 1 percent, especially on the astonishing, breakaway wealth in cities like New York, San Francisco and Washington and the great disparities contained therein. But what has happened in the smudge of the country between New Orleans and Pittsburgh — the Deep South and Appalachia — is in many ways as remarkable as what has happened in affluent cities. In some places, decades of growth have failed to raise incomes, and of late, poverty has become more concentrated not in urban areas but in rural ones."

--- My experience of living in this part of the country is that lots of these places look like they lost the Cold War. Maybe in some sense they did.
--- I also think the ghost of LBJ could fairly complain about much of the political framing in this article. "Does the welfare state actually make life better for these poor people?" is a very different question from "has the welfare state eliminated poverty?", let alone from "has the welfare state brought equal prosperity to all areas of the country?"
have_read  inequality  class_struggles_in_america  rural_decay  appalachia  poverty  whats_gone_wrong_with_america  via:jbdelong 
8 weeks ago
Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa
"Equilibrium models of isolation by distance predict an increase in genetic differentiation with geographic distance. Here we find a linear relationship between genetic and geographic distance in a worldwide sample of human populations, with major deviations from the fitted line explicable by admixture or extreme isolation. A close relationship is shown to exist between the correlation of geographic distance and genetic differentiation (as measured by F ST) and the geographic pattern of heterozygosity across populations. Considering a worldwide set of geographic locations as possible sources of the human expansion, we find that heterozygosities in the globally distributed populations of the data set are best explained by an expansion originating in Africa and that no geographic origin outside of Africa accounts as well for the observed patterns of genetic diversity. Although the relationship between F ST and geographic distance has been interpreted in the past as the result of an equilibrium model of drift and dispersal, simulation shows that the geographic pattern of heterozygosities in this data set is consistent with a model of a serial founder effect starting at a single origin. Given this serial-founder scenario, the relationship between genetic and geographic distance allows us to derive bounds for the effects of drift and natural selection on human genetic variation."

--- Contributed rather than peer-reviewed, so who knows?
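The serial-founder mechanism is easy to see in a toy simulation (my sketch, not the paper's model): each colonization event draws a small founder sample from the parent deme, and this drift erodes expected heterozygosity 2p(1−p) by a factor of roughly (1 − 1/(2N_founders)) per step, so heterozygosity falls off with the number of founder events, i.e. with distance from the origin.

```python
import random

def serial_founder_heterozygosity(n_steps=20, n_loci=500, founders=10, seed=1):
    """Toy serial-founder model: each colonization step draws 2*founders gene
    copies binomially from the parent deme's allele frequencies. Returns the
    mean expected heterozygosity 2p(1-p) across loci after each step."""
    rng = random.Random(seed)
    freqs = [0.5] * n_loci            # ancestral allele frequencies
    het = []
    for _ in range(n_steps):
        copies = 2 * founders
        freqs = [sum(rng.random() < p for _ in range(copies)) / copies
                 for p in freqs]
        het.append(sum(2 * p * (1 - p) for p in freqs) / n_loci)
    return het

h = serial_founder_heterozygosity()
print([round(x, 3) for x in h])
# Heterozygosity decays roughly geometrically, ~(1 - 1/(2*founders)) per
# bottleneck, giving the near-linear decline with distance from the origin
# over moderate numbers of steps.
```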
to:NB  to_read  human_genetics  historical_genetics  human_evolution  cavalli-sforza.l.luca 
9 weeks ago
Experimental evidence of massive-scale emotional contagion through social networks
"Emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness. Emotional contagion is well established in laboratory experiments, with people transferring positive and negative emotions to others. Data from a large real-world social network, collected over a 20-y period suggests that longer-lasting moods (e.g., depression, happiness) can be transferred through networks [Fowler JH, Christakis NA (2008) BMJ 337:a2338], although the results are controversial. In an experiment with people who use Facebook, we test whether emotional contagion occurs outside of in-person interaction between individuals by reducing the amount of emotional content in the News Feed. When positive expressions were reduced, people produced fewer positive posts and more negative posts; when negative expressions were reduced, the opposite pattern occurred. These results indicate that emotions expressed by others on Facebook influence our own emotions, constituting experimental evidence for massive-scale contagion via social networks. This work also suggests that, in contrast to prevailing assumptions, in-person interaction and nonverbal cues are not strictly necessary for emotional contagion, and that the observation of others’ positive experiences constitutes a positive experience for people."
to:NB  to_read  social_influence  contagion  experimental_psychology  psychology  re:homophily_and_confounding  to_be_shot_after_a_fair_trial 
9 weeks ago
The Big Five Personality Traits - Psychological Entities or Statistical Constructs?
"The present study employed multivariate genetic item-level analyses to examine the ontology and the genetic and environmental etiology of the Big Five personality dimensions, as measured by the NEO Five Factor Inventory (NEO-FFI) [Costa and McCrae, Revised NEO personality inventory (NEO PI-R) and NEO five-factor inventory (NEO-FFI) professional manual, 1992; Hoekstra et al., NEO personality questionnaires NEO-PI-R, NEO-FFI: manual, 1996]. Common and independent pathway model comparison was used to test whether the five personality dimensions fully mediate the genetic and environmental effects on the items, as would be expected under the realist interpretation of the Big Five. In addition, the dimensionalities of the latent genetic and environmental structures were examined. Item scores of a population-based sample of 7,900 adult twins (including 2,805 complete twin pairs; 1,528 MZ and 1,277 DZ) on the Dutch version of the NEO-FFI were analyzed. Although both the genetic and the environmental covariance components display a 5-factor structure, applications of common and independent pathway modeling showed that they do not comply with the collinearity constraints entailed in the common pathway model. Implications for the substantive interpretation of the Big Five are discussed."
to:NB  psychometrics  human_genetics  graphical_models  model_selection  statistics  re:g_paper  have_read  borsboom.denny 
9 weeks ago
Vergara vs. California: Are the top 0.1% buying their version of education reform? - The Washington Post
"Common Core and Vergara are, of course, isolated instances, but they are both important and representative. In case after case, theories and approaches favored by a handful of very wealthy individuals received preferential treatment in the education debate. You cannot call that a democratic process."
education  inequality  class_struggles_in_america  us_politics  have_read 
9 weeks ago