A Method of Automated Nonparametric Content Analysis for Social Science | Gary King

Hopkins, Daniel, and Gary King. "A Method of Automated Nonparametric Content Analysis for Social Science." American Journal of Political Science 54, no. 1 (2010): 229-247. copy at http://j.mp/jNFDgI


The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.
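The abstract's claim that a classifier with high per-document accuracy can still give badly biased category proportions can be checked with a little arithmetic. The sketch below is only a two-category illustration and is not the estimator or the software from the paper. The function names, the 10% true share, and the 80%/20% hit and false-alarm rates are all made-up assumptions chosen for the example.

```python
# Illustrative sketch only: hypothetical names and numbers, not the paper's method.

def classify_and_count(true_share, tpr, fpr):
    """Expected share of documents labeled positive by a classifier.

    true_share: actual proportion of documents in the category
    tpr: probability a document in the category is labeled positive
    fpr: probability a document outside the category is labeled positive
    """
    return true_share * tpr + (1.0 - true_share) * fpr


def corrected_share(observed_share, tpr, fpr):
    """Invert the relation above to estimate the category proportion directly."""
    if tpr == fpr:
        raise ValueError("classifier carries no information when tpr == fpr")
    estimate = (observed_share - fpr) / (tpr - fpr)
    # Clip to the valid range for a proportion.
    return min(1.0, max(0.0, estimate))


true_share = 0.10      # assumed: 10% of documents belong to the category
tpr, fpr = 0.80, 0.20  # assumed: classifier is right on 80% of each class

naive = classify_and_count(true_share, tpr, fpr)
fixed = corrected_share(naive, tpr, fpr)
print(f"classify-and-count: {naive:.2f}, corrected: {fixed:.2f}")
```

Under these assumed rates the classifier labels 80% of individual documents correctly, yet counting its labels puts the category share near 26% rather than 10%. Correcting for the known error rates recovers the true share, which is the sense in which aiming at the proportion itself differs from aiming at per-document accuracy.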
twitter  bigdata  data  nlp  methodology  method  text  analysis  analytics  academia  research 
september 2012 by driscoll
