nhaliday + project   117

Ask HN: Getting into NLP in 2018? | Hacker News
syllogism (spaCy author):
I think it's probably a bad strategy to try to be the "NLP guy" to potential employers. You'd be much better off being a software engineer on a project with people with ML or NLP expertise.

NLP projects fail a lot. If you line up a job as a company's first NLP person, you'll probably be setting yourself up for failure. You'll get handed an idea that can't work, you won't know enough about how to push back to change it into something that might, etc. After the project fails, you might get a chance to fail at a second one, but maybe not a third. This isn't a great way to move into any new field.

I think a cunning plan would be to angle to be the person who "productionises" models.
...
--
...

Basically, don't just work on having more powerful solutions. Make sure you've tried hard to have easier problems as well --- that part tends to be higher leverage.

https://news.ycombinator.com/item?id=14008752
https://news.ycombinator.com/item?id=12916498
https://algorithmia.com/blog/introduction-natural-language-processing-nlp
hn  q-n-a  discussion  tech  programming  machine-learning  nlp  strategy  career  planning  human-capital  init  advice  books  recommendations  course  unit  links  automation  project  examples  applications  multi  mooc  lectures  video  data-science  org:com  roadmap  summary  error  applicability-prereqs  ends-means  telos-atelos  cost-benefit 
4 days ago by nhaliday
The Open Steno Project | Hacker News
https://web.archive.org/web/20170315133208/http://www.danieljosephpetersen.com/posts/programming-and-stenography.html
I think at the end of the day, the Plover guys are trying to solve the wrong problem. Stenography is a dying field. I don’t wish anyone to lose their livelihood, but realistically speaking, the job should not exist once speech-to-text technology advances far enough. I’m not claiming that the field will be replaced by it, but I also don’t love the idea of people having to learn such an inane and archaic system.
hn  commentary  keyboard  speed  efficiency  writing  language  maker  homepage  project  multi  techtariat  cost-benefit  critique  expert-experience  programming  backup  contrarianism 
9 days ago by nhaliday
Ask HN: Favorite note-taking software? | Hacker News
Ask HN: What is your ideal note-taking software and/or hardware?: https://news.ycombinator.com/item?id=13221158

my wishlist as of 2019:
- web + desktop macOS + mobile iOS (at least viewing on the last but ideally also editing)
- sync across all those
- open-source data format that's easy to manipulate for scripting purposes
- flexible organization: mostly tree hierarchical (subsuming linear/unorganized) but with the option for directed (acyclic) graph (possibly a second layer of structure/linking)
- can store plain text, LaTeX, diagrams, and raster/vector images (video prob not necessary except as links to elsewhere)
- full-text search
- somehow digest/import data from Pinboard, Workflowy, Papers 3/Bookends, and Skim, ideally absorbing most of their functionality
- so, eg, track notes/annotations side-by-side w/ original PDF/DjVu/ePub documents (to replace Papers3/Bookends/Skim), and maybe web pages too (to replace Pinboard)
- OCR of handwritten notes (how to handle equations/diagrams?)
- various forms of NLP analysis of everything (topic models, clustering, etc)
- maybe version control (less important than export)

candidates?:
- Evernote prob ruled out due to heavy use of proprietary data formats (unless I can find some way to export with tolerably clean output)
- Workflowy/Dynalist are good but only cover a subset of functionality I want
- org-mode doesn't interact w/ mobile well (and I haven't evaluated it in detail otherwise)
- TiddlyWiki/Zim are in the running, but not sure about mobile
- idk about vimwiki but I'm not that wedded to vim and it seems less widely used than org-mode/TiddlyWiki/Zim so prob pass on that
- Quiver/Joplin/Inkdrop look similar and cover a lot of bases, TODO: evaluate more
- Trilium looks especially promising, tho read-only mobile and for macOS desktop look at this: https://github.com/zadam/trilium/issues/511
- RocketBook is an interesting scanning/OCR solution but prob not sufficient due to proprietary data format
- TODO: many more candidates, eg, TreeSheets, Gingko, OneNote (macOS?...), Notion (proprietary data format...), Zotero, Nodebook (https://nodebook.io/landing), Polar (https://getpolarized.io), Roam (looks very promising)

Ask HN: What do you use for you personal note taking activity?: https://news.ycombinator.com/item?id=15736102

Ask HN: What are your note-taking techniques?: https://news.ycombinator.com/item?id=9976751

Ask HN: How do you take notes (useful note-taking strategies)?: https://news.ycombinator.com/item?id=13064215

Ask HN: How to get better at taking notes?: https://news.ycombinator.com/item?id=21419478

Ask HN: How did you build up your personal knowledge base?: https://news.ycombinator.com/item?id=21332957
nice comment from math guy on structure and difference between math and CS: https://news.ycombinator.com/item?id=21338628
useful comment collating related discussions: https://news.ycombinator.com/item?id=21333383
highlights:
Designing a Personal Knowledge base: https://news.ycombinator.com/item?id=8270759
Ask HN: How to organize personal knowledge?: https://news.ycombinator.com/item?id=17892731
Do you use a personal 'knowledge base'?: https://news.ycombinator.com/item?id=21108527
Ask HN: How do you share/organize knowledge at work and life?: https://news.ycombinator.com/item?id=21310030

other stuff:
https://www.getdnote.com/blog/how-i-built-personal-knowledge-base-for-myself/
Tiago Forte: https://www.buildingasecondbrain.com

hn search: https://hn.algolia.com/?query=notetaking&type=story

Slant comparison commentary: https://news.ycombinator.com/item?id=7011281

good comparison of options in the comments here (and Trilium itself looks good): https://news.ycombinator.com/item?id=18840990

https://en.wikipedia.org/wiki/Comparison_of_note-taking_software

wikis:
https://www.slant.co/versus/5116/8768/~tiddlywiki_vs_zim
https://www.wikimatrix.org/compare/tiddlywiki+zim
http://tiddlymap.org/
https://www.zim-wiki.org/manual/Plugins/BackLinks_Pane.html
https://zim-wiki.org/manual/Plugins/Link_Map.html

apps:
Roam: https://news.ycombinator.com/item?id=21440289

intriguing but probably not appropriate for my needs: https://www.sophya.ai/

Inkdrop: https://news.ycombinator.com/item?id=20103589

Joplin: https://news.ycombinator.com/item?id=15815040
https://news.ycombinator.com/item?id=21555238

Leo Editor (combines tree outlining w/ literate programming/scripting, I think?): https://news.ycombinator.com/item?id=17769892

Frame: https://news.ycombinator.com/item?id=18760079

https://www.reddit.com/r/TheMotte/comments/cb18sy/anyone_use_a_personal_wiki_software_to_catalog/
Notion: https://news.ycombinator.com/item?id=18904648

Anki:
https://www.reddit.com/r/Anki/comments/as8i4t/use_anki_for_technical_books/
https://www.freecodecamp.org/news/how-anki-saved-my-engineering-career-293a90f70a73/

interesting comment(s) about tree outliners and spreadsheets: https://news.ycombinator.com/item?id=21170434
hn  discussion  recommendations  software  tools  desktop  app  notetaking  exocortex  wkfly  wiki  productivity  multi  comparison  crosstab  properties  applicability-prereqs  nlp  info-foraging  chart  webapp  reference  q-n-a  retention  workflow  reddit  social  ratty  ssc  learning  studying  commentary  structure  thinking  network-structure  things  collaboration  ocr  trees  graphs  LaTeX  search  todo  project  money-for-time  synchrony  pinboard  state  duplication  worrydream  simplification-normalization  links  minimalism  design  neurons  ai-control  openai  miri-cfar 
5 weeks ago by nhaliday
Is there a common method for detecting the convergence of the Gibbs sampler and the expectation-maximization algorithm? - Quora
In practice and theory it is much easier to diagnose convergence in EM (vanilla or variational) than in any MCMC algorithm (including Gibbs sampling).

https://www.quora.com/How-can-you-determine-if-your-Gibbs-sampler-has-converged
There is a special case when you can actually obtain the stationary distribution, and be sure that you did! If your markov chain consists of a discrete state space, then take the first time that a state repeats in your chain: if you randomly sample an element between the repeating states (but only including one of the endpoints) you will have a sample from your true distribution.

One can achieve this 'exact MCMC sampling' more generally by using the coupling from the past algorithm.

Otherwise, there is no rigorous statistical test for convergence. It may be possible to obtain a theoretical bound for the convergence rates: but these are quite difficult to obtain, and quite often too large to be of practical use. For example, even for the simple case of using the Metropolis algorithm for sampling from a two-dimensional uniform distribution, the best convergence rate upper bound achieved, by Persi Diaconis, was something with an astronomical constant factor like 10^300.

In fact, it is fair to say that for most high dimensional problems, we have really no idea whether Gibbs sampling ever comes close to converging, but the best we can do is use some simple diagnostics to detect the most obvious failures.
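
One commonly used "simple diagnostic" of this kind is the Gelman-Rubin potential scale reduction factor (R-hat), which compares between-chain and within-chain variance across several independently started chains. The answer above does not name it; the sketch below is an illustration under that assumption, with a hypothetical `r_hat` helper and toy chains. A value far above 1 flags an obvious failure to mix; a value near 1 does not prove convergence.

```rust
/// Gelman-Rubin potential scale reduction factor for equal-length chains.
/// Returns None for fewer than 2 chains, fewer than 2 draws per chain,
/// unequal chain lengths, or zero within-chain variance.
fn r_hat(chains: &[Vec<f64>]) -> Option<f64> {
    let m = chains.len();
    if m < 2 {
        return None;
    }
    let n = chains[0].len();
    if n < 2 || chains.iter().any(|c| c.len() != n) {
        return None;
    }
    let (mf, nf) = (m as f64, n as f64);

    // Per-chain means and the grand mean.
    let means: Vec<f64> = chains.iter().map(|c| c.iter().sum::<f64>() / nf).collect();
    let grand = means.iter().sum::<f64>() / mf;

    // B: between-chain variance. W: average within-chain sample variance.
    let b = nf / (mf - 1.0) * means.iter().map(|mu| (mu - grand).powi(2)).sum::<f64>();
    let w = chains
        .iter()
        .zip(&means)
        .map(|(c, mu)| c.iter().map(|x| (x - mu).powi(2)).sum::<f64>() / (nf - 1.0))
        .sum::<f64>()
        / mf;
    if w == 0.0 {
        return None;
    }

    // Pooled variance estimate, then the ratio to within-chain variance.
    let var_plus = (nf - 1.0) / nf * w + b / nf;
    Some((var_plus / w).sqrt())
}

fn main() {
    // Toy chains (made up): two that overlap, and one stuck in a different region.
    let a = vec![0.0, 1.0, 0.0, 1.0];
    let b = vec![1.0, 0.0, 1.0, 0.0];
    let stuck = vec![10.0, 11.0, 10.0, 11.0];

    println!("overlapping chains: {:?}", r_hat(&[a.clone(), b]));
    println!("separated chains:   {:?}", r_hat(&[a, stuck]));
}
```

The usual rule of thumb treats values above roughly 1.1 as a warning sign, which fits the point above: the diagnostic catches obvious failures and nothing more.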
nibble  q-n-a  qra  acm  stats  probability  limits  convergence  distribution  sampling  markov  monte-carlo  ML-MAP-E  checking  equilibrium  stylized-facts  gelman  levers  mixing  empirical  plots  manifolds  multi  fixed-point  iteration-recursion  heuristic  expert-experience  theory-practice  project 
5 weeks ago by nhaliday
A Formal Verification of Rust's Binary Search Implementation
Part of the reason for this is that it’s quite complicated to apply mathematical tools to something unmathematical like a functionally unpure language (which, unfortunately, most programs tend to be written in). In mathematics, you don’t expect a variable to suddenly change its value, and it only gets more complicated when you have pointers to those dang things:

“Dealing with aliasing is one of the key challenges for the verification of imperative programs. For instance, aliases make it difficult to determine which abstractions are potentially affected by a heap update and to determine which locks need to be acquired to avoid data races.”

While there are whole logics focused on trying to tackle these problems, a master’s thesis wouldn’t be nearly enough time to model a formal Rust semantics on top of these, so I opted for a more straightforward solution: Simply make Rust a purely functional language!

Electrolysis: Simple Verification of Rust Programs via Functional Purification
If you know a bit about Rust, you may have noticed something about that quote in the previous section: There actually are no data races in (safe) Rust, precisely because there is no mutable aliasing. Either all references to some datum are immutable, or there is a single mutable reference. This means that mutability in Rust is much more localized than in most other imperative languages, and that it is sound to replace a destructive update like

p.x += 1
with a functional one – we know there’s no one else around observing p:

let p = Point { x: p.x + 1, ..p };
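
A minimal runnable sketch of that equivalence, assuming a hypothetical two-field `Point` struct (the excerpt does not define one) and two illustrative helper functions. Note that Rust's struct update syntax writes the field as `x: ...`.

```rust
// Hypothetical type for illustration; the excerpt does not define `Point`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Destructive update through a unique mutable reference.
fn bump_in_place(p: &mut Point) {
    p.x += 1;
}

// Functional update: build a new value, copying the untouched fields.
fn bump_pure(p: Point) -> Point {
    Point { x: p.x + 1, ..p }
}

fn main() {
    let mut a = Point { x: 1, y: 7 };
    let b = a; // `Point` is `Copy`, so `a` stays usable

    bump_in_place(&mut a);
    let b = bump_pure(b);

    // Nothing else can observe `a` while the mutable borrow is live,
    // so both styles produce the same value.
    assert_eq!(a, b);
    println!("{:?}", a);
}
```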
techtariat  plt  programming  formal-methods  rust  arrows  reduction  divide-and-conquer  correctness  project  state  functional  concurrency  direct-indirect  pls  examples  simplification-normalization  compilers 
august 2019 by nhaliday
Why is Google Translate so bad for Latin? A longish answer. : latin
hmm:
> All it does is correlate sequences of up to five consecutive words in texts that have been manually translated into two or more languages.
That sort of system ought to be perfect for a dead language, though. Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.

We're not exactly inundated with brand new Latin to translate.
--
> Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.
What makes you think that the Google folks haven't done so and used that to create the language models they use?
> That sort of system ought to be perfect for a dead language, though.
Perhaps. But it will be bad at translating novel English sentences to Latin.
foreign-lang  reddit  social  discussion  language  the-classics  literature  dataset  measurement  roots  traces  syntax  anglo  nlp  stackex  links  q-n-a  linguistics  lexical  deep-learning  sequential  hmm  project  arrows  generalization  state-of-art  apollonian-dionysian  machine-learning  google 
june 2019 by nhaliday
Burrito: Rethinking the Electronic Lab Notebook
Seems very well-suited for ML experiments (if you can get it to work); the nilfs aspect is also cool and basically implements exactly one of my project ideas (mini-VCS for competitive programming). Unfortunately, the gnarly installation instructions specify running it in a Linux VM: https://github.com/pgbovine/burrito/blob/master/INSTALL. Linux is a hard requirement due to nilfs.
techtariat  project  tools  devtools  linux  programming  yak-shaving  integration-extension  nitty-gritty  workflow  exocortex  scholar  software  python  app  desktop  notetaking  state  machine-learning  data-science  nibble  sci-comp  oly  vcs  multi  repo  paste  homepage  research 
may 2019 by nhaliday
Dimensions - Geert Hofstede
http://geerthofstede.com/culture-geert-hofstede-gert-jan-hofstede/6d-model-of-national-culture/

https://www.reddit.com/r/europe/comments/4g88kt/eu28_countries_ranked_by_hofstedes_cultural/
https://archive.is/rXnII

https://hbdchick.wordpress.com/2013/09/07/national-individualism-collectivism-scores/

Individualism and Collectivism in Israeli Society: Comparing Religious and Secular High-School Students: https://sci-hub.tw/https://link.springer.com/article/10.1023/A:1016945121604
A common collective basis of mutual value consensus was found in the two groups; however, as predicted, there were differences between secular and religious students on the three kinds of items, since the religious scored higher than the secular students on items emphasizing collectivist orientation. The differences, however, do not fit the common theoretical framework of collectivism-individualism, but rather tend to reflect the distinction between in-group and universal collectivism.

Individualism and Collectivism in Two Conflicted Societies: Comparing Israeli-Jewish and Palestinian-Arab High School Students: https://sci-hub.tw/http://journals.sagepub.com/doi/10.1177/0044118X01033001001
Both groups were found to be more collectivistic than individualistic oriented. However, as predicted, the Palestinians scored higher than the Israeli students on items emphasizing in-group collectivist orientation (my nationality, my country, etc.). The differences between the two groups tended to reflect some subdistinctions such as different elements of individualism and collectivism. Moreover, they reflected the historical context and contemporary influences, such as the stage where each society is at in the nation-making process.

Religion as culture: religious individualism and collectivism among american catholics, jews, and protestants.: https://www.ncbi.nlm.nih.gov/pubmed/17576356
We propose the theory that religious cultures vary in individualistic and collectivistic aspects of religiousness and spirituality. Study 1 showed that religion for Jews is about community and biological descent but about personal beliefs for Protestants. Intrinsic and extrinsic religiosity were intercorrelated and endorsed differently by Jews, Catholics, and Protestants in a pattern that supports the theory that intrinsic religiosity relates to personal religion, whereas extrinsic religiosity stresses community and ritual (Studies 2 and 3). Important life experiences were likely to be social for Jews but focused on God for Protestants, with Catholics in between (Study 4). We conclude with three perspectives in understanding the complex relationships between religion and culture.

Inglehart–Welzel cultural map of the world: https://en.wikipedia.org/wiki/Inglehart%E2%80%93Welzel_cultural_map_of_the_world
Live cultural map over time 1981 to 2015: https://www.youtube.com/watch?v=ABWYOcru7js

https://en.wikipedia.org/wiki/Post-materialism

https://ourworldindata.org/materialism-and-post-materialism
By Income of the Country

Most of the low post-materialism, high income countries are East Asian :(. Some decent options: Norway, Netherlands, Iceland (surprising!). Other Euro countries fall into that category but interest me less for other reasons.

https://graphpaperdiaries.com/2016/06/10/materialism-and-post-materialism/

Postmaterialism and the Economic Condition: https://www.jstor.org/stable/2111573
prof  psychology  social-psych  values  culture  cultural-dynamics  anthropology  individualism-collectivism  expression-survival  long-short-run  time-preference  uncertainty  outcome-risk  gender  egalitarianism-hierarchy  things  phalanges  group-level  world  tools  comparison  data  database  n-factor  occident  social-norms  project  microfoundations  multi  maps  visualization  org:junk  psych-architecture  personality  hari-seldon  discipline  self-control  geography  shift  developing-world  europe  the-great-west-whale  anglosphere  optimate  china  asia  japan  sinosphere  orient  MENA  reddit  social  discussion  backup  EU  inequality  envy  britain  anglo  nordic  ranking  top-n  list  eastern-europe  germanic  gallic  mediterranean  cog-psych  sociology  guilt-shame  duty  tribalism  us-them  cooperate-defect  competition  gender-diff  metrics  politics  wiki  concept  society  civilization  infographic  ideology  systematic-ad-hoc  let-me-see  general-survey  chart  video  history  metabuch  dynamic  trends  plots  time-series  reference  water  mea 
june 2017 by nhaliday
Reuters Institute Digital News Report 2017
Section 3.2, p. 39 has polarization data
A new way to chart ideological leanings in news media: https://www.axios.com/a-new-way-to-chart-ideological-leanings-in-news-media-2475716743.html
(using Twitter follows)
Exploring the Ideological Nature of Journalists’ Social Networks on Twitter and Associations with News Story Content: https://drive.google.com/file/d/0B8CcT_0LwJ8QVnJMR1QzcGNuTkk/view
Visualizing Political Polarization on Twitter: http://www.theoutgroup.org/
Dear Mainstream Media: Why so liberal?: https://www.washingtonpost.com/blogs/erik-wemple/wp/2017/01/27/dear-mainstream-media-why-so-liberal/
Political Leanings of US Journalists vs. the Public in 2002

Topline Results: 2017 Texas Media & Society Survey: https://moody.utexas.edu/sites/default/files/TMASS_2017Topline_final.pdf
https://twitter.com/gelliottmorris/status/915295562123108352
https://archive.is/sE5cg
Some interesting results from a poll about media & polarization that I presented today for @AStraussInst <THREAD>
pdf  news  org:lite  media  database  data  analysis  politics  polarization  poll  values  time-use  world  usa  europe  EU  britain  internet  tv  social  white-paper  org:ngo  org:edu  ideology  multi  visualization  spatial  exploratory  polisci  wonkish  network-structure  twitter  techtariat  ssc  neocons  info-dynamics  project  org:junk  journos-pundits  info-foraging  track-record  objektbuch  chart  commentary  backup  org:rec  distribution  biases  comparison  within-without  input-output  supply-demand 
june 2017 by nhaliday
Distribution of Word Lengths in Various Languages - Ravi Parikh's Website
Note that this visualization isn't normalized based on usage. For example the English word 'the' is used frequently, while the word 'lugubrious' is rarely used; however both words count the same in computing the histogram and average word lengths. A great idea for a follow-up would be to use language corpuses instead of word lists in order to build these histograms.
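
A rough sketch of the suggested follow-up, assuming a toy one-sentence "corpus" and hypothetical helper names (none of this is from the original post). It contrasts the mean length over distinct words, as in a word list, with the mean weighted by how often each word occurs.

```rust
use std::collections::HashMap;

/// Count lowercase word occurrences, splitting on non-alphabetic characters.
fn count_words(text: &str) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for w in text.split(|ch: char| !ch.is_alphabetic()).filter(|w| !w.is_empty()) {
        *counts.entry(w.to_lowercase()).or_insert(0) += 1;
    }
    counts
}

/// Mean word length where every distinct word counts once (word-list style).
fn mean_len_by_type(counts: &HashMap<String, u64>) -> Option<f64> {
    if counts.is_empty() {
        return None;
    }
    let total: usize = counts.keys().map(|w| w.chars().count()).sum();
    Some(total as f64 / counts.len() as f64)
}

/// Mean word length weighted by each word's frequency in the corpus.
fn mean_len_by_token(counts: &HashMap<String, u64>) -> Option<f64> {
    let tokens: u64 = counts.values().sum();
    if tokens == 0 {
        return None;
    }
    let weighted: u64 = counts
        .iter()
        .map(|(w, c)| w.chars().count() as u64 * c)
        .sum();
    Some(weighted as f64 / tokens as f64)
}

fn main() {
    // Toy corpus (made up for illustration).
    let counts = count_words("The cat and the lugubrious dog saw the bird");
    println!("by distinct word: {:?}", mean_len_by_type(&counts));
    println!("by usage:         {:?}", mean_len_by_token(&counts));
}
```

In this toy text the usage-weighted mean comes out lower, because the frequent short word 'the' outweighs the single occurrence of 'lugubrious'.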
techtariat  data  visualization  project  anglo  language  foreign-lang  distribution  expectancy  measure  lexical 
june 2017 by nhaliday
Comprehensive Military Power: World’s Top 10 Militaries of 2015 - The Unz Review
gnon  military  defense  scale  top-n  list  ranking  usa  china  asia  analysis  data  sinosphere  critique  russia  capital  magnitude  street-fighting  individualism-collectivism  europe  germanic  world  developing-world  latin-america  MENA  india  war  meta:war  history  mostly-modern  world-war  prediction  trends  realpolitik  strategy  thucydides  great-powers  multi  news  org:mag  org:biz  org:foreign  current-events  the-bones  org:rec  org:data  org:popup  skunkworks  database  dataset  power  energy-resources  heavy-industry  economics  growth-econ  foreign-policy  geopolitics  maps  project  expansionism  the-world-is-just-atoms  civilization  let-me-see  wiki  reference  metrics  urban  population  japan  britain  gallic  allodium  definite-planning  kumbaya-kult  peace-violence  urban-rural  wealth  wealth-of-nations  econ-metrics  dynamic  infographic 
june 2017 by nhaliday
Lost and Found | West Hunter
I get the distinct impression that someone (probably someone other than Varro) came up with an approximation of germ theory 1500 years before Girolamo Fracastoro. But his work was lost.

Everybody knows, or should know, that the vast majority of Classical literature has not been preserved. Those lost works contained facts and ideas that might have value today – certainly there are topics that we understand much better because of insights from Classical literature. For example, Reich and Patterson find that some of the Indian castes have existed for something like three thousand years: this is easier to believe when you consider that Megasthenes wrote about the caste system as early as 300 BC.

We don’t put much effort into recovering lost Classical literature. But there are ways in which we could push harder – by increased funding for work on the Herculaneum scrolls, or the Oxyrhynchus papyri collection, for example. Some old-fashioned motivated archaeology might get lucky and find another set of Amarna cuneiform letters, or a new Antikythera mechanism.

https://westhunt.wordpress.com/2012/03/06/spontaneous-generation/
Here we have yet another case in which a discovery was possible for a long time before it was actually accepted. Aristotle is the villain here: he clearly endorses spontaneous generation of many plants and animals. On the other hand, I don’t remember him saying that people should accept all of his conclusions uncritically and without further experimentation for the next couple of thousand years, which is what happened. So maybe we’re all guilty.

...

Part of the funny here (not even counting practical experience) is that almost every educated man over these two millennia had read, and indeed studied deeply, a work with a fairly clear statement of the actual fly->egg->maggot->fly process. As far as I can tell, only one person (Redi) seems to have picked up on this.

“But the more Achilles gazed, the greater rose his desire for vengeance, and his eyes flashed terribly, like coals beneath his lids, as he lifted the god’s marvellous gifts and exulted. When he had looked his fill on their splendour, he spoke to Thetis winged words; ‘Mother, the god grants me a gift fit for the immortals, such as no mortal smith could fashion. Now I shall arm myself for war. Yet I fear lest flies infest the wounds the bronze blades made, and maggots breed in the corpse of brave Patroclus, and now his life is fled, rot the flesh, and disfigure all his body.’ ”

You’d think a blind man would have noticed this.

Anyhow, the lesson is clear. Low hanging fruit can persist for a long time if the conventional wisdom is wrong – and sometimes it is.

http://www.bede.org.uk/literature.htm

Transmission of the Greek Classics: https://en.wikipedia.org/wiki/Transmission_of_the_Greek_Classics
https://www.quora.com/How-much-writing-from-ancient-Greece-is-preserved-Is-it-a-finite-amount-that-someone-could-potentially-read

By way of comparison, the complete Loeb Classical Library (which includes all the important classical texts) has 337 volumes for Ancient Greek --- and those aren't 100,000 word-long door-stoppers.
https://www.loebclassics.com/
$65/year for individuals (I wonder if public libraries have subscriptions?)

http://www.roger-pearse.com/weblog/2009/10/26/reference-for-the-claim-that-only-1-of-ancient-literature-survives/
http://www.patheos.com/blogs/geneveith/2015/01/finding-the-lost-texts-of-classical-antiquity/
http://www.historyofinformation.com/narrative/loss-of-information.php

https://twitter.com/futurepundit/status/927344648154112000
https://archive.is/w86uL
1/ Thinking about what Stephen Greenblatt described in The Swerve as a mass extinction of ancient books (we have little of what they wrote)
2/ If I could go back in time to, say, 100 AD or 200 AD I would go with simple tech for making books last for a thousand years. Possible?

https://www.gnxp.com/WordPress/2018/01/28/the-rapid-fading-of-information/
I’ve put a lot of content out there over the years. Probably on the order of 5 million words across my blogs. Some publications here and there. Lots of tweets. But very little of it will persist into future generations. Digital is evanescent.

But so is paper. I believe that even good hardcover books probably won’t last more than a few hundred years.

Perhaps we should go back to some form of cuneiform? Stone and metal will last thousands of years.

How long does a paperback book last?: https://www.quora.com/How-long-does-a-paperback-book-last

A 500 years vault for books?: https://worldbuilding.stackexchange.com/questions/137583/a-500-years-vault-for-books
There are about four solutions that have actually worked in history

1. The desert method
2. Give them to an institution which will preserve them
3. The opposite of secrecy: duplicate them extensively

4. Transcribe them to durable materials

It is hard to keep books for a really long time because paper, parchment and papyrus are easily destroyed. However books have been produced on much more durable materials. Nowadays a holographic copy can be laser etched into stainless steel. In Sumer, 5300 years ago they pressed them into clay tablets. If the document was important, they fired the clay; otherwise they just let it dry. The fired versions are close to indestructible.
west-hunter  scitariat  discussion  ideas  speculation  history  iron-age  mediterranean  the-classics  innovation  low-hanging  spreading  disease  parasites-microbiome  🔬  archaeology  discovery  epidemiology  canon  multi  literature  fiction  agriculture  india  asia  pop-structure  social-structure  ethnography  the-trenches  nihil  flux-stasis  science  medieval  europe  the-great-west-whale  letters  info-dynamics  being-right  scale  wiki  reference  trivia  cocktail  curiosity  enlightenment-renaissance-restoration-reformation  article  q-n-a  qra  data  database  project  toys  religion  christianity  civilization  twitter  social  gedanken  gnon  backup  time  volo-avolo  brands  money  gnxp  store  stackex  traces  sequential  knowledge  pro-rata 
may 2017 by nhaliday
:feed v1 - /fora/posts/~2017.4.12..21.14.00..fe17~
The goal of this demo was to show that building a Twitter replacement actually isn't that hard at all; and it can be done almost entirely on the frontend. As shown, you don't even have to use React/Redux. But that's probably the way to go if you want to build the real thing.
techtariat  urbit  software  decentralized  twitter  social  internet  web  programming  tutorial  project  gnon 
april 2017 by nhaliday
Dgsh – Directed graph shell | Hacker News
I've worked with and looked at a lot of data processing helpers: tools that try to help you build data pipelines for the sake of performance, reproducibility, or simply code uniformity.
What I've found so far: most tools that invent a new language or try to cram complex processes into less suitable syntactic environments are not much loved.

...

I'll give dgsh a try. The tool reuse approach and the UNIX spirit seem nice. But my initial impression of the "C code metrics" example from the site is mixed: it reminds me of awk, about which one of the authors said that it's a beautiful language, but if your programs are getting longer than a hundred lines, you might want to switch to something else.

Two libraries which have a great grip on the plumbing aspect of data processing systems are airflow and luigi. They are Python libraries, and with them you have a concise syntax and basically all Python libraries plus non-Python tools with a command line interface at your fingertips.

I am curious, what kind of process orchestration tools people use and can recommend?

--

Exactly our experience too, from complex machine learning workflows in various aspects of drug discovery.
We basically did not really find any of the popular DSL-based bioinformatics pipeline tools (snakemake, bpipe etc) to fit the bill. Nextflow came close, but in fact allows quite some custom code too.

What worked for us was to use Spotify's Luigi, which is a python library rather than DSL.

The only thing was that we had to develop a flow-based inspired API on top of Luigi's more functional programming based one, in order to make defining dependencies fluent and easy enough to specify for our complex workflows.

Our flow-based inspired Luigi API (SciLuigi) for complex workflows, is available at:

https://github.com/pharmbio/sciluigi

--

We have measured many of the examples against the use of temporary files and the web report one against (single-threaded) implementations in Perl and Java. In almost all cases dgsh takes less wall clock time, but often consumes more CPU resources.
commentary  project  programming  terminal  worrydream  pls  plt  unix  hn  graphs  tools  devtools  let-me-see  composition-decomposition  yak-shaving  workflow  exocortex  hmm  cool  software  desktop  sci-comp  stock-flow  performance  comparison  links  libraries  python 
january 2017 by nhaliday
Cryptpad: Zero Knowledge, Collaborative Real Time Editing | Hacker News
comments have interesting discussion of use of "zero-knowledge" in practice
commentary  hn  project  software  tools  crypto  privacy  hmm  engineering 
september 2016 by nhaliday
matrix-factorization  meaningness  measure  measurement  mechanics  media  medieval  mediterranean  memory-management  MENA  meta:science  meta:war  metabuch  metal-to-virtual  metaprogramming  methodology  metrics  michael-nielsen  microbiz  microfoundations  military  minimalism  minimum-viable  miri-cfar  mit  mixing  ML-MAP-E  mobile  model-class  models  modernity  money  money-for-time  monte-carlo  mooc  morality  mostly-modern  move-fast-(and-break-things)  multi  music  n-factor  nascent-state  neocons  network-structure  networking  neuro  neurons  news  nibble  nihil  nitty-gritty  nlp  noahpinion  nonlinearity  nonparametric  nordic  notation  notetaking  numerics  objektbuch  ocaml-sml  occident  ocr  oly  oly-programming  openai  optimate  orders  org:biz  org:bleg  org:com  org:data  org:edu  org:foreign  org:junk  org:lite  org:mag  org:med  org:nat  org:ngo  org:popup  org:rec  org:sci  organization  orient  os  oscillation  oss  outcome-risk  outliers  overflow  p2p  papers  parametric  parasites-microbiome  parsimony  paste  path-dependence  pdf  peace-violence  people  performance  personality  phalanges  philosophy  photography  physics  pic  pinboard  planning  play  plots  pls  plt  polarization  polis  polisci  politics  poll  pop-structure  population  positivity  postmortem  power  ppl  pragmatic  prediction  prepping  prioritizing  privacy  pro-rata  probability  productivity  prof  programming  project  properties  property-rights  proposal  protestant-catholic  protocol-metadata  psych-architecture  psychedelics  psychology  python  q-n-a  qra  quality  quixotic  quora  random  ranking  ratty  reading  realness  realpolitik  rec-math  recommendations  reddit  reduction  reference  reflection  reinforcement  religion  replication  repo  research  retention  retrofit  rhetoric  roadmap  roots  russia  rust  sample-complexity  sampling  scale  scaling-tech  scholar  sci-comp  science  science-anxiety  scitariat  search  security  
self-control  sequential  shift  shipping  SIGGRAPH  signal-noise  signum  similarity  simplification-normalization  simulation  sinosphere  skunkworks  sleuthin  slides  smart-contracts  social  social-norms  social-psych  social-structure  society  sociology  software  spatial  speculation  speed  spreading  ssc  stackex  startups  stat-power  state  state-of-art  static-dynamic  stats  stock-flow  store  stories  strategy  stream  street-fighting  strings  structure  studying  stylized-facts  subculture  summary  supply-demand  sv  synchrony  syntax  system-design  systematic-ad-hoc  systems  szabo  tactics  tails  talks  tcstariat  tech  tech-infrastructure  technical-writing  techtariat  telos-atelos  terminal  the-bones  the-classics  the-great-west-whale  the-trenches  the-world-is-just-atoms  theory-practice  theos  thesis  things  thinking  thucydides  time  time-complexity  time-preference  time-series  time-use  tip-of-tongue  todo  tools  top-n  toxoplasmosis  toys  traces  track-record  tradeoffs  trees  trends  tribalism  trivia  trump  turchin  tutorial  tv  twitter  types  ui  uncertainty  unit  universalism-particularism  unix  urban  urban-rural  urbit  us-them  usa  usaco-ioi  ux  values  vcs  vgr  video  virtu  virtualization  visual-understanding  visualization  visuo  volo-avolo  vr  war  water  wealth  wealth-of-nations  web  webapp  west-hunter  white-paper  wiki  wire-guided  within-group  within-without  wkfly  wonkish  workflow  world  world-war  worrydream  writing  yak-shaving  zeitgeist  zooming  🎩  🔬  🖥 
