nlp   29285

« earlier    

Building Brundage Bot – Hacker Noon
I concatenated the title of each paper to its abstract and created tf-idf n-gram features (up to trigrams) from the text. I then concatenated one-hot-encoded vectors representing the paper’s authors and arXiv category. I filtered out n-grams that appeared fewer than 30 times in the training set (out of ~25k total abstracts) and authors who appeared fewer than 3 times. This left around 17k total features.

Finally, I held out a randomly selected 10% of the data as a test set and trained a logistic regression using sklearn. I added L1 regularization (with the parameter chosen by cross-validation) and a class-weighted loss to help with the large number of features and class imbalance.
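
The pipeline described in this excerpt could be sketched roughly as follows. This is a minimal sketch and not the author's actual code: the function name `train_paper_classifier`, its argument names, the use of document frequency (`min_df`) as the "appeared fewer than N times" filter, and the choice of the `liblinear` solver are all assumptions made for illustration.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder


def train_paper_classifier(titles, abstracts, authors, categories, labels,
                           min_ngram_df=30, min_author_df=3, seed=0):
    """Hypothetical helper: tf-idf n-grams + author/category indicators
    fed to an L1-regularized, class-weighted logistic regression."""
    # Concatenate each title to its abstract.
    text = [f"{t} {a}" for t, a in zip(titles, abstracts)]

    # Hold out a random 10% of the papers as a test set.
    idx = list(range(len(text)))
    train_idx, test_idx = train_test_split(idx, test_size=0.1,
                                           random_state=seed)

    def pick(seq, ids):
        return [seq[i] for i in ids]

    # Word n-grams up to trigrams; rare n-grams dropped (assumption:
    # "rare" is measured by document frequency in the training set).
    tfidf = TfidfVectorizer(ngram_range=(1, 3), min_df=min_ngram_df)
    # Each paper has a list of authors, so this is a multi-hot encoding;
    # the identity analyzer treats every author name as one token.
    author_vec = CountVectorizer(analyzer=lambda names: names, binary=True,
                                 min_df=min_author_df)
    cat_enc = OneHotEncoder(handle_unknown="ignore")

    def featurize(ids, fit):
        parts = [
            (tfidf, pick(text, ids)),
            (author_vec, pick(authors, ids)),
            (cat_enc, [[c] for c in pick(categories, ids)]),
        ]
        mats = [enc.fit_transform(x) if fit else enc.transform(x)
                for enc, x in parts]
        return hstack(mats).tocsr()

    # Fit the encoders on the training split only, then reuse them.
    X_train = featurize(train_idx, fit=True)
    X_test = featurize(test_idx, fit=False)
    y_train, y_test = pick(labels, train_idx), pick(labels, test_idx)

    # L1 penalty with strength chosen by cross-validation, plus a
    # class-weighted loss to counter class imbalance.
    model = LogisticRegressionCV(penalty="l1", solver="liblinear",
                                 class_weight="balanced", Cs=5, cv=3)
    model.fit(X_train, y_train)
    return model, X_test, y_test
```

Fitting the vectorizers on the training split alone keeps the frequency filters from seeing held-out papers, which matches the excerpt's wording that counts were taken "in the training set".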
NLP  ML  CNN  example 
yesterday by foodbaby
tools and knowledge needed to begin anonymizing documents they have written.

It does this by firing up JStylo libraries (an author detection application also developed by PSAL) to detect stylometric patterns and determine features (like word length, bigrams, trigrams, etc.) that the user should remove/add to help obscure their style and identity.
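
To make the kind of features named here concrete, the following is a small standalone sketch that computes average word length and word bigram/trigram counts for a text. It does not use or imitate the JStylo API; the function name `stylometric_features` and the simple regex tokenizer are assumptions made for illustration.

```python
import re
from collections import Counter


def stylometric_features(text):
    """Hypothetical helper: a few simple stylometric measurements."""
    # Crude tokenizer (assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())

    def ngrams(n):
        # Count every run of n consecutive words.
        return Counter(zip(*(words[i:] for i in range(n))))

    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "avg_word_length": avg_len,
        "bigrams": ngrams(2),
        "trigrams": ngrams(3),
    }
```

Comparing these counts between a writer's own documents and a reference corpus is one plausible way to spot habits worth changing, though which features matter in practice depends on the detection tool.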
analysis  writing  privacy  github  anonymous  nlp 
2 days ago by sprague
Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions"). Tregex comes with Tsurgeon, a tree transformation language. The Stanford Natural Language Processing Group
software  nlp  treebanks  java 
4 days ago by jerid.francom


related tags

adam-kalai  agents  ai  algorithm  algorithms  analysis  anonymous  ap  api  archaeology  asr  audio  audio_recognition  bias  bigdata  blogs  bots  chat  chatbot  classification  cnn  coding  compsci  computerscience  computing  conversation  corpora  corpus  course  courses  creativity  cs  data-structures  data  datasets  deep-learning  deep_learning  deeplearning  dialog  dialogflow  documentation  edu  embedded  embedding  embeddings  emotions  english  ethics  example  firefox  gaming  generative  github  globalization  google  graphics  history  howto  ia  image  informationretrieval  ios11  iosdev  java  javascript  jprg  js  kaggle  keras  keyword  language-detection  language  law  lectures  legal  length  library  linguistics  literature  localization  machine-learning  machine_learning  machinelearning  machinetranslation  microsoft  ml  movies  mozilla  music  narrative  natural-language-processing  natural  naturallanguage  netneutrality  neuralnet  newsroom  nlg  nlproc  nodejs  npm  opendata  opensource  papers  parsing  privacy  process  processing  programming  protest  python  rap  readability  research  robotics  rpa  rpg  search  sentiment  sentimentanalysis  sequence  slides  software  spacy  spark  speech  speechrecognition  speechtotext  stanford  statistics  summary  swift  t  tagger  teaching  tensorflow  text  textanalysis  textbook  textmining  toolkit  tools  translation  treebanks  trolls  tutorial  twitter  usecases  verification  vietnamese  voice-recognition  voice  voice_recognition  voicerecognition  word-embedding  word-embeddings  word2vec  word_emedding  wordembedding  writing 
