European Protest and Coercion Data
Via Jeremy Singer-Vine's Data Is Plural
data:social-science  polisci:protest 
10 days ago
Fiscal Crises
"This paper presents a new database of fiscal crises covering different country groups, including low-income developing countries (LIDCs) that have been mostly ignored in the past. Countries faced on average two crises since 1970, with the highest frequency in LIDCs and lowest in advanced economies."

@via Data Is Plural newsletter, by Jeremy Singer-Vine
data:economics  data:social-science  political-economy 
11 days ago
World Prison Brief | an online database comprising information on prisons and the use of imprisonment around the world
"The World Prison Brief is an online database providing free access to information on prison systems around the world."
violence:prison  data:social-science 
15 days ago
Clayton Thyne
Lots of Stata replication materials, using the Powell and Thyne dataset on coups.
data:social-science  polisci:state  stata 
23 days ago
Bilateral Labor Agreements Dataset | University of Chicago Law School
"International labor treaties. Bilateral labor agreements regulate the migration of workers between two countries, and the Bilateral Labor Agreements Dataset aims to catalog as many of these treaties as it can. So far the University of Chicago Law School professors and researchers running the initiative have identified 582 treaties signed between 1945 and 2015. “However, this list is almost certainly underinclusive,” they write. “Many BLAs are not deposited in the major international treaty databases and they often do not receive much, if any, publicity.” [h/t Adam Chilton]"

data:social-science  data:legislation  data:dyadic 
6 weeks ago
Party Facts
"Party Facts links datasets on political parties and provides an online platform about parties and their history as recorded in social science datasets."
data:social-science  polisci:parties 
7 weeks ago
ToxicDocs by Gautam Shine
Document analysis and classification for the ToxicDocs collection
data:social-science  data:text  python 
june 2018
The False Allure of Hashing for Anonymization
MD5 hashing is ineffective for anonymization when the data are structured.

Also leads to this interesting story from 2014 [1] (de-anonymizing cabs in New York City). See also [2], about differential privacy.
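
A minimal sketch of why hashing structured identifiers does not anonymize them: when the set of possible values is small, every candidate can be hashed in advance and the hash simply looked up. The identifier format used here (one letter followed by three digits) is a made-up assumption for illustration, not the format discussed in the linked articles.

```python
import hashlib
import string


def md5_hex(value: str) -> str:
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical identifier format: one uppercase letter plus three digits,
# which gives only 26 * 1000 = 26,000 possible values.
candidates = [
    f"{letter}{number:03d}"
    for letter in string.ascii_uppercase
    for number in range(1000)
]

# Precompute the hash of every possible identifier once.
lookup = {md5_hex(candidate): candidate for candidate in candidates}

# An "anonymized" value is recovered with a plain dictionary lookup.
hashed = md5_hex("K042")
recovered = lookup[hashed]
print(recovered)
```

The whole table is built almost instantly on a laptop, so the hash offers no protection once the format of the identifier is known.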

tech:security  tech:cryptography  data:social-science  data:privacy 
june 2018
Mass Mobilization in Autocracies Database
"The Mass Mobilization in Autocracies Database (MMAD) contains sub-national data on mass mobilization events in autocracies worldwide."
data:social-science  polisci 
may 2018
"Rulers, Elections, and Irregular Governance dataset"
april 2018
Welcome @PartiRep | Partirep
"This website is the home of the PARTIREP research project. PARTIREP analyzes changing patterns of participation and representation in modern democracies."
data:social-science  polisci:parties  polisci:democracy 
march 2018
Maddison Project Database 2018 | Releases | Maddison Historical Statistics | Historical Development | University of Groningen
"The Maddison Project Database provides information on comparative economic growth and income levels over the very long run. The 2018 version of this database covers 169 countries and the period up to 2016."
data:social-science  data:economics  via:xmarquez 
january 2018
CRAN - Package ess
"Download data from the European Social Survey directly from their website. There are two families of functions that allow you to download and interactively check all countries and rounds available."
data:social-science  stats:surveys  r  data:availability 
december 2017
Ethnographic Atlas - InterSciWiki
Dataset mentioned in Carles Boix's _Political Order and Inequality_, although he does not say which digitised version he used. Googling the data returns lots of different results, with no clear dominant one.

data:social-science  ethnography 
november 2017
Conquest and State Size Database
Oona A. Hathaway & Scott J. Shapiro, Conquest and State Size Database. Users may also wish to reference Oona A. Hathaway & Scott J. Shapiro, The Internationalists (2017), particularly chapters 13 and 14.
data:social-science  history  violence:war  violence:colonialism 
november 2017
RoR datasets - Irina Mirkina
"'Regions of Russia (RoR), 1990-2015' is an exhaustive database, created for research on regional development. It includes over 2000 indicators from numerous sources, representing a wide range of economic, political, and social changes in 82 Russian regions during 25 years of reforms and rebounds. Indicators, most of which have never been published in English before, cover various topics, including trade, production, population structure, labour, investment, economic growth, climate, crime rates, terrorism, education, health care, culture, banking system, insurance, services, communication, and infrastructure."
data:social-science  world:russia 
september 2017
MEDW Survey Data Now Available in Open Access! | Making Electoral Democracy Work
"The MEDW team is proud to announce that all survey data from the Making Electoral Democracy Work project are now available in open access in the Harvard Dataverse. In total, this represents 32 studies from national, subnational and supranational elections, which were conducted in seven established democracies between 2010 and 2016 (see the list below). The core data module is an aggregated dataset that combines information from 27 surveys fielded in 11 regions from five countries (Canada, France, Germany, Spain, and Switzerland) and that spans over 40 000 respondents. The project dataverse also includes 5 “special datasets” (additional studies conducted in Belgium, France, Germany, Spain and Sweden) that are not incorporated in the aggregated dataset but contain a large number of questions asked in the regular MEDW surveys."
data:social-science  polisci:democracy  stats:surveys 
august 2017
QuantGov: A New Home for Open Source Policy Analytics
"QuantGov is an effort to expand the frontiers of economic and legal research by providing an open-source framework to uncover the latent data in legal text. QuantGov grew out of the RegData project, which seeks to solve the longstanding problem of the quantification of federal regulation for use in economic analysis."
data:social-science  data:economics  soc:quantification 
july 2017
Data & Tools | International IDEA
"International IDEA's Global Database on Elections and Democracy" (included in the Quality of Government dataset).
data:social-science  polisci:elections  polisci:democracy 
april 2017
Hunting Game: Targeting the Big Five | Rene Bekkers
"What is wrong with the data for Bahrain? The patterns of responses for cases from Bahrain, it turns out, are surprisingly often a series of ten exactly the same values, such as 1111111111 or 555555555555. I routinely check data from surveys for such patterns. While it is impossible to prove this, serial response patterns suggest fabrication of data. Participants and/or interviewers skipping questions may follow such patterns. Almost half of all the cases from Bahrain follow such a pattern. Other countries with a relatively high proportion of serial pattern responses are South Africa, Singapore, and China. The two countries for which the BFI-10 behaves close to what previous research has reported, the Netherlands and Germany, have a very low occurrence of serial pattern responses."

The author also points to translation problems, which was my first hunch (the same goes for "democracy" in Vietnam).
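
A rough sketch of the kind of check the quote describes: flagging respondents whose answers to a battery of items are all the same value. The rows below are invented for illustration and are not WVS data; the function names are hypothetical.

```python
def is_serial(responses):
    """True when all answered items share a single value (e.g. 1111111111)."""
    answered = [r for r in responses if r is not None]
    return len(answered) > 1 and len(set(answered)) == 1


def serial_share(rows):
    """Proportion of respondents with a serial response pattern."""
    if not rows:
        return 0.0
    return sum(is_serial(row) for row in rows) / len(rows)


# Invented ten-item responses on a 1-5 scale, one list per respondent.
rows = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    [2, 4, 3, 5, 1, 2, 4, 3, 5, 1],
    [3, 3, 4, 2, 5, 1, 2, 4, 3, 3],
]

share = serial_share(rows)
print(share)
```

Computing this share per country would give the comparison made in the post, though a high share only suggests a problem and does not identify its cause.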

data:social-science  stats:surveys  psychology  stats:measurement 
april 2017
Problems with the Big Five assessment in the World Values Survey – Erik Gahner Larsen
"In the present paper, we show for the first time that the Big Five data from WVS Wave 6 is extremely problematic: items from the same trait correlate negatively with each other as often as not, occasionally to truly extreme degrees. Particular caution is warranted for any future research aiming to use this data, as we do not identify any straightforward solution to the data’s challenges." -- I am not surprised.

stats:surveys  data:social-science  psychology  stats:measurement 
april 2017
Yes, Virginia, the social and data sciences are “science” | asecondmouse
"[fn1] So I’m not going to be using the phrase “data science” much in this essay, though in most of the “data sciences” one is interested in human individual and social behavior, so it’s the same thing."
data:social-science  data:science  social-science  epistemology:causality  polisci  polisci:profession 
april 2017
Correlates of State Policy | IPPSR
"The Correlates of State Policy Project aims to compile, disseminate, and encourage the use of data relevant to U.S. state policy research, tracking policy differences across and change over time in the 50 states. We have gathered more than nine-hundred variables from various sources and assembled them into one large, useful dataset."
data:social-science  polisci:federalism  usa:politics  usa:economy 
april 2017
[News] Datasets
"An awesome list of high-quality datasets. We have extracted the datasets from the repositories so they can be accessed and analyzed by anyone."
data:social-science  journalism 
april 2017
Political elites - CIRCaP | Centre for the Study of Political Change - University of Siena
"EURélite: European Political Elites in Comparison. The Long Road to Convergence – coordinated by Prof. Maurizio Cotta and Prof. Heinrich Best (University of Jena), aimed at the development of historical databases on the European parliamentary elites and ministries."
data:social-science  polisci:parliaments 
february 2017
eliflab/European-Parliament-Open-Data: European Parliament Open Data // Twitter
"This file gathers together all the Twitter accounts of the Members of the European Parliament in office."
web:twitter  polisci:parliaments  data:social-science 
january 2017
The Correlates of Media Freedom: An Introduction of the Global Media Freedom Dataset* | Political Science Research and Methods | Cambridge Core
Global Media Freedom Dataset, 196 countries, 1948-2014

"Media freedom has motivated substantial political activism, yet there are surprising gaps in the related academic literature, in part because of a lack of historic and consistent data. We introduce the newly expanded Global Media Freedom Dataset, which includes 196 countries from 1948 to 2014, and demonstrate how it can be used to test previous hypotheses and assumptions about the correlates of media freedom."
data:social-science  journalism  from twitter_favs
december 2016
Phoenix Data Project
"The Phoenix dataset is a new, near real-time event dataset created using the next-generation event data coding software, PETRARCH. The data is generated using news content scraped from over 400 sources. This scraped content is run through a processing pipeline that produces coded event data as a final output. Our current settings produce roughly 3,000 coded events per day. These coded events are in the standard who-did-what-to-whom format typically associated with event data. Each event is coded along multiple dimensions, specifically source and target actors and event type."

december 2016
South African Social Attitudes Survey (SASAS)
From the ESS team: "Data will soon be available from the 2015 South African Social Attitudes Survey that replicated ESS #justice, #democracy and #health items"
data:social-science  stats:surveys 
december 2016
Trade Research - Temporary Trade Barriers Database
"The Temporary Trade Barriers Database (TTBD) website hosts newly collected, freely available, and detailed data on more than thirty different national governments’ use of policies such as antidumping (AD), global safeguards (SG), China-specific transitional safeguard (CSG) measures, and countervailing duties (CVD). The information provided in this detailed data base will cover over 95% of the global use of these particular import-restricting trade remedy instruments."

data:economics  data:social-science  polisci:ir  political-economy 
december 2016
bowerth/nsoApi - Gitter
"The last 12 months, I have been creating functions to access information from National Statistical Institutes."
r  data:social-science 
december 2016
Data - The Reform Capacity of Governments
School system and Head of Government data up to 2012.
november 2016
rudeboybert/okcupiddata: R Package of Cleaned OkCupid Data
"R package of cleaned profile data from OkCupid Profile Data for Introductory Statistics and Data Science Courses (Journal of Statistics Education 2015): 59,946 OkCupid users who were living within 25 miles of San Francisco, had active profiles on June 26, 2012, were online in the previous year, and had at least one picture in their profile." -- screams "TEACH ME"

data:social-science  usa:society  stats:teaching 
september 2016
Constituency-Level Elections Archive (CLEA)
"The Constituency-Level Elections Archive (CLEA) is a repository of detailed election results at the constituency level for lower house legislative elections from around the world. Our motivation is to preserve and consolidate these valuable data in one comprehensive and reliable resource that is ready for analysis and publicly available at no cost. This public good is expected to be of use to a range of audiences for research, education, and policy-making."
data:social-science  polisci:elections 
may 2016
