nlp   29293


Structured Generation of Technical Reading Lists
Jonathan Gordon
USC Information Sciences Institute
Marina del Rey, CA, USA
Stephen Aguilar
USC Rossier School of Education
Los Angeles, CA, USA
Emily Sheng
USC Information Sciences Institute
Marina del Rey, CA, USA
Gully Burns
USC Information Sciences Institute
Marina del Rey, CA, USA
21 hours ago by hustwj
CSCI 582: Computational Journalism
This course is designed to teach the application of big data and data science in textual domains, particularly journalism and reporting. Topics include data journalism, natural language processing, visualization, automated fact-checking and story finding, social media sensing, and web data analysis. The course also explores open-source tools focused on journalism and reporting. In short, it is an ideal course for computer science students who are fascinated by natural language and for journalism students who are enthusiastic about data.
courses  NLP 
yesterday by hustwj
GitHub - RaRe-Technologies/gensim: Topic Modelling for Humans
Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.
python  opensource  nlp 
yesterday by tguemes
Building Brundage Bot – Hacker Noon
I concatenated the title of each paper to its abstract and created tf-idf n-gram features (up to trigrams) from the text. I then concatenated one-hot-encoded vectors representing the paper’s authors and arXiv category. I filtered out n-grams that appeared fewer than 30 times in the training set (out of ~25k total abstracts) and authors who appeared fewer than 3 times. This left around 17k total features.
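The feature construction in this excerpt can be sketched as follows. This is a minimal, standard-library sketch. It assumes that each paper is a dict with hypothetical keys title, abstract, authors, and category, and that text is tokenized by lowercasing and splitting on whitespace. The count thresholds (30 occurrences for n-grams, 3 for authors) come from the excerpt. The smoothed idf, log((1 + N) / (1 + df)) + 1, is one common variant and is an assumption; the post may have weighted terms differently. The function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, max_n=3):
    """Return all word n-grams of length 1..max_n (up to trigrams), joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def tokenize(paper):
    # Title concatenated to abstract; lowercase + whitespace split is an assumed simplification.
    return (paper["title"] + " " + paper["abstract"]).lower().split()

def fit_vocab(papers, min_ngram_count=30, min_author_count=3):
    """Keep n-grams seen >= min_ngram_count times and authors seen >= min_author_count times."""
    ngram_counts, doc_freq, author_counts = Counter(), Counter(), Counter()
    for p in papers:
        grams = ngrams(tokenize(p))
        ngram_counts.update(grams)      # total occurrences, used for the frequency filter
        doc_freq.update(set(grams))     # number of documents containing each n-gram, for idf
        author_counts.update(p["authors"])
    vocab = sorted(g for g, c in ngram_counts.items() if c >= min_ngram_count)
    authors = sorted(a for a, c in author_counts.items() if c >= min_author_count)
    categories = sorted({p["category"] for p in papers})
    n_docs = len(papers)
    # Smoothed idf (an assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {g: math.log((1 + n_docs) / (1 + doc_freq[g])) + 1 for g in vocab}
    return vocab, authors, categories, idf

def featurize(paper, vocab, authors, categories, idf):
    """tf-idf n-gram features, then author indicators, then a one-hot arXiv category."""
    counts = Counter(ngrams(tokenize(paper)))
    text_part = [counts[g] * idf[g] for g in vocab]
    author_part = [1 if a in paper["authors"] else 0 for a in authors]
    category_part = [1 if c == paper["category"] else 0 for c in categories]
    return text_part + author_part + category_part
```

On a real corpus of about 25k abstracts, the defaults mirror the excerpt's thresholds. On a toy corpus, lower thresholds (for example, min_ngram_count=2) keep the vocabulary non-empty.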

Finally, I held out a randomly selected 10% of the data as a test set and trained a logistic regression using sklearn. I added L1 regularization (with the regularization parameter chosen by cross-validation) and a class-weighted loss to help with the large number of features and the class imbalance.
NLP  ML  CNN  example 
2 days ago by foodbaby
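The training step from the Brundage Bot excerpt (a 10% random hold-out, then L1-regularized, class-weighted logistic regression in sklearn) could look roughly like this. The sketch uses scikit-learn's LogisticRegressionCV, which selects the inverse regularization strength C by cross-validation. The grid size (Cs=10), the 5 folds, the liblinear solver, and class_weight="balanced" are assumptions standing in for the post's unstated settings. train_l1_classifier is a hypothetical helper name, and the synthetic data is a toy stand-in, not the arXiv corpus.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

def train_l1_classifier(X, y, seed=0):
    """Hold out a random 10% as a test set; fit class-weighted L1 logistic regression with CV-chosen C."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=seed)
    clf = LogisticRegressionCV(
        Cs=10,                    # assumed grid of 10 candidate C values
        cv=5,                     # assumed 5-fold (stratified for classifiers) CV to pick C
        penalty="l1",             # sparsity-inducing penalty for a large feature set
        solver="liblinear",       # a solver that supports the L1 penalty
        class_weight="balanced",  # weight each class inversely to its frequency
        random_state=seed,
    )
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)  # mean accuracy on the held-out 10%

if __name__ == "__main__":
    # Toy, imbalanced data (roughly 9% positives), not the post's data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = (X[:, 0] + 0.5 * X[:, 1] > 1.5).astype(int)
    clf, acc = train_l1_classifier(X, y)
    print(f"held-out accuracy: {acc:.3f}")
```

The L1 penalty drives many coefficients to exactly zero. That is one reason it suits a problem with around 17k features.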
tools and knowledge needed to begin anonymizing documents they have written.

It does this by using the JStylo libraries (an authorship-detection application also developed by PSAL) to detect stylometric patterns and identify features (such as word length, bigrams, and trigrams) that the user should remove or add to help obscure their style and identity.
analysis  writing  privacy  github  anonymous  nlp 
3 days ago by sprague
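To illustrate the kind of stylometric features the anonymization tool above works with, here is a toy, standard-library sketch. It profiles a text by average word length, a word-length histogram, and its most frequent bigrams and trigrams. It is not JStylo's feature set or API. Treating the bigrams and trigrams as character n-grams (spaces included) and the function name stylometric_features are assumptions made for illustration.

```python
from collections import Counter

def stylometric_features(text, top_k=5):
    """Toy stylometric profile (hypothetical helper, not JStylo's API)."""
    words = text.split()
    avg_word_len = sum(len(w) for w in words) / len(words) if words else 0.0
    word_len_hist = Counter(len(w) for w in words)
    chars = text.lower()
    # Assumption: bigrams/trigrams are character n-grams, including spaces.
    bigrams = Counter(chars[i:i + 2] for i in range(len(chars) - 1))
    trigrams = Counter(chars[i:i + 3] for i in range(len(chars) - 2))
    return {
        "avg_word_len": avg_word_len,
        "word_len_hist": dict(word_len_hist),
        "top_bigrams": bigrams.most_common(top_k),
        "top_trigrams": trigrams.most_common(top_k),
    }
```

Comparing such a profile against a reference corpus would show which features stand out, and those are the kind of features a user might change to obscure their style.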


related tags

adam-kalai  ai  algorithm  algorithms  analysis  anonymous  ap  api  artificialintelligence  asr  audio  autoestima  bias  bigdata  blogs  bots  chat  cnn  compsci  computerscience  computing  conversation  corpora  corpus  course  courses  creativity  cs  data  datasets  deep-learning  deep_learning  deeplearning  dialog  documentation  edu  embedded  embedding  embeddings  emotions  english  ethics  example  firefox  generative  github  google  googletranslate  graphics  howto  ia  image  informationretrieval  ios11  iosdev  java  javascript  js  kaggle  keras  keyword  language-detection  language  law  lectures  legal  length  library  linguistics  literature  machine-learning  machine_learning  machinelearning  microsoft  ml  mozilla  music  narrative  natural-language-processing  natural  naturallanguage  netneutrality  newsroom  nlg  nlproc  nodejs  npm  opendata  opensource  papers  parsing  pixelbuds  pnl  privacy  process  processing  programming  protest  pwc  python  rap  readability  research  robotics  rpa  rstats  search  sentiment  sentimentanalysis  slides  software  spacy  spark  speech  speechrecognition  speechtotext  stanford  statistics  summary  swift  t  tagger  teaching  tensorflow  text  text_analysis  textanalysis  textbook  textmining  toolkit  tools  treebanks  trolls  tutorial  twitter  usecases  verification  vietnamese  voice-recognition  voice  voicerecognition  word-embedding  word-embeddings  word2vec  word_emedding  wordembedding  writing 
