nhaliday + nlp   65

Ask HN: Getting into NLP in 2018? | Hacker News
syllogism (spaCy author):
I think it's probably a bad strategy to try to be the "NLP guy" to potential employers. You'd be much better off being a software engineer on a project with people with ML or NLP expertise.

NLP projects fail a lot. If you line up a job as a company's first NLP person, you'll probably be setting yourself up for failure. You'll get handed an idea that can't work, you won't know enough about how to push back to change it into something that might, etc. After the project fails, you might get a chance to fail at a second one, but maybe not a third. This isn't a great way to move into any new field.

I think a cunning plan would be to angle to be the person who "productionises" models.

Basically, don't just work on having more powerful solutions. Make sure you've tried hard to have easier problems as well --- that part tends to be higher leverage.

hn  q-n-a  discussion  tech  programming  machine-learning  nlp  strategy  career  planning  human-capital  init  advice  books  recommendations  course  unit  links  automation  project  examples  applications  multi  mooc  lectures  video  data-science  org:com  roadmap  summary  error  applicability-prereqs  ends-means  telos-atelos  cost-benefit 
29 days ago by nhaliday
Ask HN: Favorite note-taking software? | Hacker News
Ask HN: What is your ideal note-taking software and/or hardware?: https://news.ycombinator.com/item?id=13221158

my wishlist as of 2019:
- web + desktop macOS + mobile iOS (at least viewing on the last but ideally also editing)
- sync across all those
- open-source data format that's easy to manipulate for scripting purposes
- flexible organization: mostly tree hierarchical (subsuming linear/unorganized) but with the option for directed (acyclic) graph (possibly a second layer of structure/linking)
- can store plain text, LaTeX, diagrams, and raster/vector images (video prob not necessary except as links to elsewhere)
- full-text search
- somehow digest/import data from Pinboard, Workflowy, Papers 3/Bookends, and Skim, ideally absorbing most of their functionality
- so, eg, track notes/annotations side-by-side w/ original PDF/DjVu/ePub documents (to replace Papers3/Bookends/Skim), and maybe web pages too (to replace Pinboard)
- OCR of handwritten notes (how to handle equations/diagrams?)
- various forms of NLP analysis of everything (topic models, clustering, etc)
- maybe version control (less important than export)

- Evernote prob ruled out due to heavy use of proprietary data formats (unless I can find some way to export with tolerably clean output)
- Workflowy/Dynalist are good but only cover a subset of functionality I want
- org-mode doesn't interact w/ mobile well (and I haven't evaluated it in detail otherwise)
- TiddlyWiki/Zim are in the running, but not sure about mobile
- idk about vimwiki but I'm not that wedded to vim and it seems less widely used than org-mode/TiddlyWiki/Zim so prob pass on that
- Quiver/Joplin/Inkdrop look similar and cover a lot of bases, TODO: evaluate more
- Trilium looks especially promising, tho read-only mobile and for macOS desktop look at this: https://github.com/zadam/trilium/issues/511
- RocketBook is an interesting scanning/OCR solution but prob not sufficient due to proprietary data format
- TODO: many more candidates, eg, TreeSheets, Gingko, OneNote (macOS?...), Notion (proprietary data format...), Zotero, Nodebook (https://nodebook.io/landing), Polar (https://getpolarized.io), Roam (looks very promising)

Ask HN: What do you use for you personal note taking activity?: https://news.ycombinator.com/item?id=15736102

Ask HN: What are your note-taking techniques?: https://news.ycombinator.com/item?id=9976751

Ask HN: How do you take notes (useful note-taking strategies)?: https://news.ycombinator.com/item?id=13064215

Ask HN: How to get better at taking notes?: https://news.ycombinator.com/item?id=21419478

Ask HN: How did you build up your personal knowledge base?: https://news.ycombinator.com/item?id=21332957
nice comment from math guy on structure and difference between math and CS: https://news.ycombinator.com/item?id=21338628
useful comment collating related discussions: https://news.ycombinator.com/item?id=21333383
Designing a Personal Knowledge base: https://news.ycombinator.com/item?id=8270759
Ask HN: How to organize personal knowledge?: https://news.ycombinator.com/item?id=17892731
Do you use a personal 'knowledge base'?: https://news.ycombinator.com/item?id=21108527
Ask HN: How do you share/organize knowledge at work and life?: https://news.ycombinator.com/item?id=21310030

other stuff:
plain text: https://news.ycombinator.com/item?id=21685660

Tiago Forte: https://www.buildingasecondbrain.com

hn search: https://hn.algolia.com/?query=notetaking&type=story

Slant comparison commentary: https://news.ycombinator.com/item?id=7011281

good comparison of options in the comments here (and Trilium itself looks good): https://news.ycombinator.com/item?id=18840990



Roam: https://news.ycombinator.com/item?id=21440289

intriguing but probably not appropriate for my needs: https://www.sophya.ai/

Inkdrop: https://news.ycombinator.com/item?id=20103589

Joplin: https://news.ycombinator.com/item?id=15815040


Leo Editor (combines tree outlining w/ literate programming/scripting, I think?): https://news.ycombinator.com/item?id=17769892

Frame: https://news.ycombinator.com/item?id=18760079

Notion: https://news.ycombinator.com/item?id=18904648


maybe not the best source for a review/advice

interesting comment(s) about tree outliners and spreadsheets: https://news.ycombinator.com/item?id=21170434

hn  discussion  recommendations  software  tools  desktop  app  notetaking  exocortex  wkfly  wiki  productivity  multi  comparison  crosstab  properties  applicability-prereqs  nlp  info-foraging  chart  webapp  reference  q-n-a  retention  workflow  reddit  social  ratty  ssc  learning  studying  commentary  structure  thinking  network-structure  things  collaboration  ocr  trees  graphs  LaTeX  search  todo  project  money-for-time  synchrony  pinboard  state  duplication  worrydream  simplification-normalization  links  minimalism  design  neurons  ai-control  openai  miri-cfar  parsimony  intricacy 
8 weeks ago by nhaliday
Why is Google Translate so bad for Latin? A longish answer. : latin
> All it does is correlate sequences of up to five consecutive words in texts that have been manually translated into two or more languages.
That sort of system ought to be perfect for a dead language, though. Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.

We're not exactly inundated with brand new Latin to translate.
> Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.
What makes you think that the Google folks haven't done so and used that to create the language models they use?
> That sort of system ought to be perfect for a dead language, though.
Perhaps. But it will be bad at translating novel English sentences to Latin.
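
A toy sketch of what "correlating sequences of up to five consecutive words" across manually translated texts could look like. This is only an illustration of the idea in the quoted comment, not Google's actual system; the function names `ngrams` and `cooccurrence_counts` and the sample sentence pairs are hypothetical.

```python
from collections import Counter
from itertools import product


def ngrams(tokens, max_n=5):
    """Return every run of 1..max_n consecutive tokens as a tuple."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def cooccurrence_counts(parallel_pairs, max_n=5):
    """Count how often a source n-gram and a target n-gram share a sentence pair."""
    counts = Counter()
    for source, target in parallel_pairs:
        source_grams = set(ngrams(source.split(), max_n))
        target_grams = set(ngrams(target.split(), max_n))
        counts.update(product(source_grams, target_grams))
    return counts


# Hypothetical two-sentence "parallel corpus" (assumption, for illustration only).
pairs = [
    ("arma virumque cano", "I sing of arms and the man"),
    ("arma cano", "I sing of arms"),
]
counts = cooccurrence_counts(pairs)
```

With so little data every pairing looks plausible; the counts only become informative over a large corpus, which is the point the thread argues about for Latin.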
foreign-lang  reddit  social  discussion  language  the-classics  literature  dataset  measurement  roots  traces  syntax  anglo  nlp  stackex  links  q-n-a  linguistics  lexical  deep-learning  sequential  hmm  project  arrows  generalization  state-of-art  apollonian-dionysian  machine-learning  google 
june 2019 by nhaliday
Basic Error Rates
This page describes human error rates in a variety of contexts.

Most of the error rates are for mechanical errors. A good general figure for mechanical error rates appears to be about 0.5%.

Of course the denominator differs across studies. However only fairly simple actions are used in the denominator.

The Klemmer and Snyder study shows that much lower error rates are possible--in this case for people whose job consisted almost entirely of data entry.

The error rate for more complex logic errors is about 5%, based primarily on data on other pages, especially the program development page.
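
To get a feel for what a 0.5% per-action rate implies, here is a minimal arithmetic sketch. It assumes errors are independent across actions, which the page does not claim; the function names and the choice of 100 actions are hypothetical.

```python
def expected_errors(rate: float, actions: int) -> float:
    """Expected number of errors when each action fails with probability `rate`."""
    return rate * actions


def prob_at_least_one_error(rate: float, actions: int) -> float:
    """Chance of one or more errors, assuming independent actions (an assumption)."""
    return 1.0 - (1.0 - rate) ** actions


MECHANICAL_RATE = 0.005  # the ~0.5% general figure quoted above
LOGIC_RATE = 0.05        # the ~5% figure for more complex logic errors

mechanical_expected = expected_errors(MECHANICAL_RATE, 100)
mechanical_any = prob_at_least_one_error(MECHANICAL_RATE, 100)
logic_any = prob_at_least_one_error(LOGIC_RATE, 100)
```

Under that independence assumption, even a small per-action rate makes at least one error fairly likely over a long run of simple actions.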
org:junk  list  links  objektbuch  data  database  error  accuracy  human-ml  machine-learning  ai  pro-rata  metrics  automation  benchmarks  marginal  nlp  language  density  writing  dataviz  meta:reading  speedometer 
may 2019 by nhaliday
AI-complete - Wikipedia
In the field of artificial intelligence, the most difficult problems are informally known as AI-complete or AI-hard, implying that the difficulty of these computational problems is equivalent to that of solving the central artificial intelligence problem—making computers as intelligent as people, or strong AI.[1] To call a problem AI-complete reflects an attitude that it would not be solved by a simple specific algorithm.

AI-complete problems are hypothesised to include computer vision, natural language understanding, and dealing with unexpected circumstances while solving any real world problem.[2]

Currently, AI-complete problems cannot be solved with modern computer technology alone, but would also require human computation. This property can be useful, for instance to test for the presence of humans as with CAPTCHAs, and for computer security to circumvent brute-force attacks.[3][4]


AI-complete problems are hypothesised to include:

Bongard problems
Computer vision (and subproblems such as object recognition)
Natural language understanding (and subproblems such as text mining, machine translation, and word sense disambiguation[8])
Dealing with unexpected circumstances while solving any real world problem, whether it's navigation or planning or even the kind of reasoning done by expert systems.


Current AI systems can solve very simple and/or restricted versions of AI-complete problems, but never in their full generality. When AI researchers attempt to "scale up" their systems to handle more complicated, real world situations, the programs tend to become excessively brittle without commonsense knowledge or a rudimentary understanding of the situation: they fail as unexpected circumstances outside of their original problem context begin to appear. When human beings are dealing with new situations in the world, they are helped immensely by the fact that they know what to expect: they know what all things around them are, why they are there, what they are likely to do and so on. They can recognize unusual situations and adjust accordingly. A machine without strong AI has no other skills to fall back on.[9]
concept  reduction  cs  computation  complexity  wiki  reference  properties  computer-vision  ai  risk  ai-control  machine-learning  deep-learning  language  nlp  order-disorder  tactics  strategy  intelligence  humanity  speculation  crux 
march 2018 by nhaliday
Thursday assorted links - Marginal REVOLUTION
2. “A new study of English spelling practices demonstrates that the way we spell words is much more orderly and self-organizing than previously thought.”

3. Why we cry, and the economics of weeping. And Michael Cannon on the new health care bill.

4. Economic ideas we should forget (keep on clicking through to see the whole list). By no means do I always agree — the Coase theorem??
econotariat  marginal-rev  links  language  news  org:sci  nlp  emergent  history  medieval  early-modern  mostly-modern  economics  error  map-territory  simler  status  signaling  anthropology  evopsych  postrat  current-events 
march 2017 by nhaliday
Peter Norvig, the meaning of polynomials, debugging as psychotherapy | Quomodocumque
He briefly showed a demo where, given values of a polynomial, a machine can put together a few lines of code that successfully computes the polynomial. But the code looks weird to a human eye. To compute some quadratic, it nests for-loops and adds things up in a funny way that ends up giving the right output. So has it really “learned” the polynomial? I think in computer science, you typically feel you’ve learned a function if you can accurately predict its value on a given input. For an algebraist like me, a function determines but isn’t determined by the values it takes; to me, there’s something about that quadratic polynomial the machine has failed to grasp. I don’t think there’s a right or wrong answer here, just a cultural difference to be aware of. Relevant: Norvig’s description of “the two cultures” at the end of this long post on natural language processing (which is interesting all the way through!)
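
A hypothetical illustration of the contrast described above, not the code from Norvig's demo: two programs that agree on every non-negative integer input, one written as a formula and one that "nests for-loops and adds things up in a funny way". The quadratic n^2 + n and both function names are assumptions chosen for the example.

```python
def quadratic_closed_form(n: int) -> int:
    """The algebraist's view: the polynomial n^2 + n written as a formula."""
    return n * n + n


def quadratic_nested_loops(n: int) -> int:
    """A loop-based program that returns the same values for n >= 0.

    It adds 2 once for every pair (i, j) with 0 <= j <= i < n,
    and there are n * (n + 1) / 2 such pairs.
    """
    total = 0
    for i in range(n):
        for _ in range(i + 1):
            total += 2
    return total


# The two agree on every tested input, yet only one "looks like" a polynomial.
agree = all(
    quadratic_closed_form(n) == quadratic_nested_loops(n) for n in range(50)
)
```

Note that the loop version only matches on non-negative integers, while the formula extends to any input; that gap is one concrete way to read "a function determines but isn't determined by the values it takes".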
mathtariat  org:bleg  nibble  tech  ai  talks  summary  philosophy  lens  comparison  math  cs  tcs  polynomials  nlp  debugging  psychology  cog-psych  complex-systems  deep-learning  analogy  legibility  interpretability  composition-decomposition  coupling-cohesion  apollonian-dionysian  heavyweights 
march 2017 by nhaliday
Performance Trends in AI | Otium
Deep learning has revolutionized the world of artificial intelligence. But how much does it improve performance? How have computers gotten better at different tasks over time, since the rise of deep learning?

In games, what the data seems to show is that exponential growth in data and computation power yields exponential improvements in raw performance. In other words, you get out what you put in. Deep learning matters, but only because it provides a way to turn Moore’s Law into corresponding performance improvements, for a wide class of problems. It’s not even clear it’s a discontinuous advance in performance over non-deep-learning systems.

In image recognition, deep learning clearly is a discontinuous advance over other algorithms. But the returns to scale and the improvements over time seem to be flattening out as we approach or surpass human accuracy.

In speech recognition, deep learning is again a discontinuous advance. We are still far away from human accuracy, and in this regime, accuracy seems to be improving linearly over time.

In machine translation, neural nets seem to have made progress over conventional techniques, but it’s not yet clear if that’s a real phenomenon, or what the trends are.

In natural language processing, trends are positive, but deep learning doesn’t generally seem to do better than trendline.


The learned agent performs much better than the hard-coded agent, but moves more jerkily and “randomly” and doesn’t know the law of reflection. Similarly, the reports of AlphaGo producing “unusual” Go moves are consistent with an agent that can do pattern-recognition over a broader space than humans can, but which doesn’t find the “laws” or “regularities” that humans do.

Perhaps, contrary to the stereotype that contrasts “mechanical” with “outside-the-box” thinking, reinforcement learners can “think outside the box” but can’t find the box?

ratty  core-rats  summary  prediction  trends  analysis  spock  ai  deep-learning  state-of-art  🤖  deepgoog  games  nlp  computer-vision  nibble  reinforcement  model-class  faq  org:bleg  shift  chart  technology  language  audio  accuracy  speaking  foreign-lang  definite-planning  china  asia  microsoft  google  ideas  article  speedometer  whiggish-hegelian  yvain  ssc  smoothness  data  hsu  scitariat  genetics  iq  enhancement  genetic-load  neuro  neuro-nitgrit  brain-scan  time-series  multiplicative  iteration-recursion  additive  multi  arrows 
january 2017 by nhaliday
How to Get into Natural Language Processing | Hacker News
We’re excited to introduce a new series we’re calling Paths. Each post will outline an emerging technology and give you clear steps on how to get started in that field.
hn  commentary  yc  tech  startups  business  data-science  nlp  ai 
january 2017 by nhaliday
Information Processing: Thought vectors and the dimensionality of the space of concepts
If we trained a deep net to translate sentences about Physics from Martian to English, we could (roughly) estimate the "conceptual depth" of the subject. We could even compare two different subjects, such as Physics versus Art History.
hsu  ai  deep-learning  google  speculation  commentary  news  language  embeddings  neurons  thinking  papers  summary  scitariat  dimensionality  conceptual-vocab  vague  nlp  nibble  state-of-art  features 
december 2016 by nhaliday
The goal of the Lean Forward project is to collaborate with number theorists to formally prove theorems about research mathematics and to address the main usability issues hampering the adoption of proof assistants in mathematical circles. The theorems will be selected together with our collaborators to guide the development of formal libraries and verified tools.

mostly happening in the Netherlands


A Review of the Lean Theorem Prover: https://jiggerwit.wordpress.com/2018/09/18/a-review-of-the-lean-theorem-prover/
- Thomas Hales
seems like Coq might be a better starter if I ever try to get into proof assistants/theorem provers

edit: on second thought this actually seems like a wash for beginners

An Argument for Controlled Natural Languages in Mathematics: https://jiggerwit.wordpress.com/2019/06/20/an-argument-for-controlled-natural-languages-in-mathematics/
By controlled natural language for mathematics (CNL), we mean an artificial language for the communication of mathematics that is (1) designed in a deliberate and explicit way with precise computer-readable syntax and semantics, (2) based on a single natural language (such as Chinese, Spanish, or English), and (3) broadly understood at least in an intuitive way by mathematically literate speakers of the natural language.

The definition of controlled natural language is intended to exclude invented languages such as Esperanto and Logjam that are not based on a single natural language. Programming languages are meant to be excluded, but a case might be made for TeX as the first broadly adopted controlled natural language for mathematics.

Perhaps it is best to start with an example. Here is a beautifully crafted CNL text created by Peter Koepke and Steffen Frerix. It reproduces a theorem and proof in Rudin’s Principles of mathematical analysis almost word for word. Their automated proof system is able to read and verify the proof.

research  math  formal-methods  msr  multi  homepage  research-program  skunkworks  math.NT  academia  ux  CAS  mathtariat  expert-experience  cost-benefit  nitty-gritty  review  critique  rant  types  learning  intricacy  functional  performance  c(pp)  ocaml-sml  comparison  ecosystem  DSL  tradeoffs  composition-decomposition  interdisciplinary  europe  germanic  grokkability  nlp  language  heavyweights  inference  rigor  automata-languages  repo  software  tools  syntax  frontier  state-of-art  pls  grokkability-clarity  technical-writing  database  lifts-projections 
january 2016 by nhaliday
