nhaliday + vcs   63

The Future of Mathematics? [video] | Hacker News
Kevin Buzzard (the Lean guy)

- general reflection on proof asssistants/theorem provers
- Kevin Hale's formal abstracts project, etc
- thinks of available theorem provers, Lean is "[the only one currently available that may be capable of formalizing all of mathematics eventually]" (goes into more detail right at the end, eg, quotient types)
hn  commentary  discussion  video  talks  presentation  math  formal-methods  expert-experience  msr  frontier  state-of-art  proofs  rigor  education  higher-ed  optimism  prediction  lens  search  meta:research  speculation  exocortex  skunkworks  automation  research  math.NT  big-surf  software  parsimony  cost-benefit  intricacy  correctness  programming  pls  python  functional  haskell  heavyweights  research-program  review  reflection  multi  pdf  slides  oly  experiment  span-cover  git  vcs  teaching  impetus  academia  composition-decomposition  coupling-cohesion  database  trust  types  plt  lifts-projections  induction  critique  beauty  truth  elegance  aesthetics 
8 weeks ago by nhaliday
Three best practices for building successful data pipelines - O'Reilly Media
Drawn from their experiences and my own, I’ve identified three key areas that are often overlooked in data pipelines, and those are making your analysis:
1. Reproducible
2. Consistent
3. Productionizable


Science that cannot be reproduced by an external third party is just not science — and this does apply to data science. One of the benefits of working in data science is the ability to apply the existing tools from software engineering. These tools let you isolate all the dependencies of your analyses and make them reproducible.

Dependencies fall into three categories:
1. Analysis code ...
2. Data sources ...
3. Algorithmic randomness ...


Establishing consistency in data

There are generally two ways of establishing the consistency of data sources. The first is by checking-in all code and data into a single revision control repository. The second method is to reserve source control for code and build a pipeline that explicitly depends on external data being in a stable, consistent format and location.

Checking data into version control is generally considered verboten for production software engineers, but it has a place in data analysis. For one thing, it makes your analysis very portable by isolating all dependencies into source control. Here are some conditions under which it makes sense to have both code and data in source control:
Small data sets ...
Regular analytics ...
Fixed source ...

Productionizability: Developing a common ETL

1. Common data format ...
2. Isolating library dependencies ...

Rigorously enforce the idempotency constraint
For efficiency, seek to load data incrementally
Always ensure that you can efficiently process historic data
Partition ingested data at the destination
Rest data between tasks
Pool resources for efficiency
Store all metadata together in one place
Manage login details in one place
Specify configuration details once
Parameterize sub flows and dynamically run tasks where possible
Execute conditionally
Develop your own workflow framework and reuse workflow components

more focused on details of specific technologies:

techtariat  org:com  best-practices  engineering  code-organizing  machine-learning  data-science  yak-shaving  nitty-gritty  workflow  config  vcs  replication  homo-hetero  multi  org:med  design  system-design  links  shipping  minimalism  volo-avolo  causation  random  invariance  structure  arrows  protocol-metadata  interface-compatibility 
august 2019 by nhaliday
How to work with GIT/SVN — good practices - Jakub Kułak - Medium
best part of this is the links to other guides
Commit Often, Perfect Later, Publish Once: https://sethrobertson.github.io/GitBestPractices/

My Favourite Git Commit: https://news.ycombinator.com/item?id=21289827
I use the following convention to start the subject of commit(posted by someone in a similar HN thread):
org:med  techtariat  tutorial  faq  guide  howto  workflow  devtools  best-practices  vcs  git  engineering  programming  multi  reference  org:junk  writing  technical-writing  hn  commentary  jargon  list  objektbuch  examples  analysis 
june 2019 by nhaliday
Fossil: Home
VCS w/ builtin issue tracking and wiki used by SQLite
tools  devtools  software  vcs  wiki  debugging  integration-extension  oss  dbs 
may 2019 by nhaliday
Is backing up a MySQL database in Git a good idea? - Software Engineering Stack Exchange
*no: list of alternatives*

Top 2 answers contradict each other but both agree that you should at least version the schema and other scripts.

My impression is that the guy linked in the accepted answer is arguing for a minority practice.
q-n-a  stackex  programming  engineering  dbs  vcs  git  debate  critique  backup  best-practices  flux-stasis  nitty-gritty  gotchas  init  advice  code-organizing  multi  hmm  idk  contrarianism  rhetoric  links  system-design 
may 2019 by nhaliday
Burrito: Rethinking the Electronic Lab Notebook
Seems very well-suited for ML experiments (if you can get it to work), also the nilfs aspect is cool and basically implements exactly one of the my project ideas (mini-VCS for competitive programming). Unfortunately gnarly installation instructions specify running it on Linux VM: https://github.com/pgbovine/burrito/blob/master/INSTALL. Linux is hard requirement due to nilfs.
techtariat  project  tools  devtools  linux  programming  yak-shaving  integration-extension  nitty-gritty  workflow  exocortex  scholar  software  python  app  desktop  notetaking  state  machine-learning  data-science  nibble  sci-comp  oly  vcs  multi  repo  paste  homepage  research 
may 2019 by nhaliday
When to use C over C++, and C++ over C? - Software Engineering Stack Exchange
You pick C when
- you need portable assembler (which is what C is, really) for whatever reason,
- your platform doesn't provide C++ (a C compiler is much easier to implement),
- you need to interact with other languages that can only interact with C (usually the lowest common denominator on any platform) and your code consists of little more than the interface, not making it worth to lay a C interface over C++ code,
- you hack in an Open Source project (many of which, for various reasons, stick to C),
- you don't know C++.
In all other cases you should pick C++.


At the same time, I have to say that @Toll's answers (for one obvious example) have things just about backwards in most respects. Reasonably written C++ will generally be at least as fast as C, and often at least a little faster. Readability is generally much better, if only because you don't get buried in an avalanche of all the code for even the most trivial algorithms and data structures, all the error handling, etc.


As it happens, C and C++ are fairly frequently used together on the same projects, maintained by the same people. This allows something that's otherwise quite rare: a study that directly, objectively compares the maintainability of code written in the two languages by people who are equally competent overall (i.e., the exact same people). At least in the linked study, one conclusion was clear and unambiguous: "We found that using C++ instead of C results in improved software quality and reduced maintenance effort..."


(Side-note: Check out Linus Torvads' rant on why he prefers C to C++. I don't necessarily agree with his points, but it gives you insight into why people might choose C over C++. Rather, people that agree with him might choose C for these reasons.)


Why would anybody use C over C++? [closed]: https://stackoverflow.com/questions/497786/why-would-anybody-use-c-over-c
Joel's answer is good for reasons you might have to use C, though there are a few others:
- You must meet industry guidelines, which are easier to prove and test for in C.
- You have tools to work with C, but not C++ (think not just about the compiler, but all the support tools, coverage, analysis, etc)
- Your target developers are C gurus
- You're writing drivers, kernels, or other low level code
- You know the C++ compiler isn't good at optimizing the kind of code you need to write
- Your app not only doesn't lend itself to be object oriented, but would be harder to write in that form

In some cases, though, you might want to use C rather than C++:
- You want the performance of assembler without the trouble of coding in assembler (C++ is, in theory, capable of 'perfect' performance, but the compilers aren't as good at seeing optimizations a good C programmer will see)
- The software you're writing is trivial, or nearly so - whip out the tiny C compiler, write a few lines of code, compile and you're all set - no need to open a huge editor with helpers, no need to write practically empty and useless classes, deal with namespaces, etc. You can do nearly the same thing with a C++ compiler and simply use the C subset, but the C++ compiler is slower, even for tiny programs.
- You need extreme performance or small code size, and know the C++ compiler will actually make it harder to accomplish due to the size and performance of the libraries
- You contend that you could just use the C subset and compile with a C++ compiler, but you'll find that if you do that you'll get slightly different results depending on the compiler.

Regardless, if you're doing that, you're using C. Is your question really "Why don't C programmers use C++ compilers?" If it is, then you either don't understand the language differences, or you don't understand compiler theory.


- Because they already know C
- Because they're building an embedded app for a platform that only has a C compiler
- Because they're maintaining legacy software written in C
- You're writing something on the level of an operating system, a relational database engine, or a retail 3D video game engine.
q-n-a  stackex  programming  engineering  pls  best-practices  impetus  checklists  c(pp)  systems  assembly  compilers  hardware  embedded  oss  links  study  evidence-based  devtools  performance  rant  expert-experience  types  blowhards  linux  git  vcs  debate  rhetoric  worse-is-better/the-right-thing  cracker-prog  multi  metal-to-virtual  interface-compatibility 
may 2019 by nhaliday
Delta debugging - Wikipedia
good overview of with examples: https://www.csm.ornl.gov/~sheldon/bucket/Automated-Debugging.pdf

Not as useful for my usecases (mostly contest programming) as QuickCheck. Input is generally pretty structured and I don't have a long history of code in VCS. And when I do have the latter git-bisect is probably enough.

good book tho: http://www.whyprogramsfail.com/toc.php
WHY PROGRAMS FAIL: A Guide to Systematic Debugging\
wiki  reference  programming  systems  debugging  c(pp)  python  tools  devtools  links  hmm  formal-methods  divide-and-conquer  vcs  git  search  yak-shaving  pdf  white-paper  multi  examples  stories  books  unit  caltech  recommendations  advanced  correctness 
may 2019 by nhaliday
I'm curious what Gerrit gets them that Github doesn't have natively, too. | Hacker News
There are a lot of things lacking about GitHub's code review process (pull requests). Off the top of my head:
- Merging a pull request (almost) always creates a merge commit, polluting the change history. Gerrit will automatically rebase the changes atop the master branch head, leaving a nice linear history.

- Pull requests require the contributor to create a public fork of the repository they're committing to. I can see how this works for some people, but I find it gross for each contributor to the Go project to have their own public fork. What a mess.

- Comments on pull requests are sent as soon as they are created. Gerrit allows you to make many comments on a change, as drafts, and then send them all in one go, sending just a single email. This is much easier to manage for a large project.

- Gerrit has the notion of multiple 'patch sets' for a particular change, and you can see diffs between patch sets, so it's much easier to progressively review large changes.

And there are many other small issues with the GitHub pull request process that make it untenable for the Go project.
FYI if you have a "CONTRIBUTING" or "CONTRIBUTING.md" document at the project root, a neat little info bar "Please read the [contributing] guidelines before proceeding" will show up to anyone filing a bug or a PR.
- scrollaway
engineering  best-practices  collaboration  tech  sv  oss  github  vcs  howto  comparison  critique  nitty-gritty  golang 
may 2016 by nhaliday
The Next Generation of Software Stacks | StackShare
most interesting part to me:
GECS have a clear bias towards certain types of applications and services as well. These preferences are particularly apparent in the analytics stack. Tools typically aimed primarily at marketing teams—tools like Crazy Egg, Optimizely, and Google Analytics, the most popular tool on Stackshare—are extremely unpopular among GECS. These services are being replaced by tools that are aimed a serving both marketing and analytics teams. Segment, Mixpanel, Heap, and Amplitude, which provide flexible access to raw data, are well-represented among GECS, suggesting that these companies are looking to understand user behavior beyond clicks and page views.
data  analysis  business  startups  tech  planning  techtariat  org:com  ecosystem  software  saas  network-structure  integration-extension  cloud  github  oss  vcs  amazon  communication  trends  pro-rata  crosstab  visualization  sv  programming  pls  web  javascript  frontend  marketing  tech-infrastructure 
april 2016 by nhaliday

bundles : techie

related tags

ability-competence  abstraction  academia  accretion  advanced  advice  aesthetics  aggregator  allodium  amazon  analogy  analysis  anomie  aphorism  app  arrows  assembly  attention  automation  backup  bangbang  beauty  best-practices  better-explained  big-picture  big-surf  blog  blowhards  books  bounded-cognition  browser  build-packaging  business  c(pp)  caching  calculator  caltech  career  carmack  CAS  causation  chart  cheatsheet  checking  checklists  civic  client-server  cloud  code-dive  code-organizing  collaboration  comics  commentary  communication  community  comparison  compilers  composition-decomposition  computer-memory  concurrency  confidence  config  contradiction  contrarianism  correctness  correlation  cost-benefit  coupling-cohesion  cracker-prog  critique  crooked  crosstab  cynicism-idealism  dan-luu  data  data-science  database  dataviz  dbs  debate  debugging  decentralized  degrees-of-freedom  design  desktop  devops  devtools  differential  direct-indirect  direction  discipline  discussion  distributed  divide-and-conquer  documentation  dotnet  dropbox  DSL  ecosystem  editors  education  elegance  embedded  engineering  epistemic  error  error-handling  evidence-based  examples  exocortex  experiment  expert-experience  explanation  exploratory  exposition  facebook  faq  feynman  fiction  flexibility  flux-stasis  form-design  formal-methods  forum  frontend  frontier  functional  giants  git  github  golang  google  gotchas  graphs  grokkability  grokkability-clarity  guide  gwern  hacker  hardware  haskell  hci  heavyweights  hg  higher-ed  history  hmm  hn  homepage  homo-hetero  howto  huge-data-the-biggest  ide  ideas  idk  impetus  induction  info-foraging  init  integration-extension  interface  interface-compatibility  internet  intricacy  invariance  jargon  javascript  jobs  jvm  knowledge  latency-throughput  learning  legibility  lens  lesswrong  let-me-see  libraries  lifts-projections  linear-algebra  links  linux  lisp  list  llvm  lol  long-short-run  longform  machine-learning  madisonian  management  marketing  math  math.CA  math.CO  math.NT  measure  measurement  media  meta:prediction  meta:reading  meta:research  metal-to-virtual  methodology  metrics  microbiz  mindful  minimalism  minimum-viable  models  money-for-time  move-fast-(and-break-things)  msr  multi  network-structure  networking  neurons  news  nibble  nitty-gritty  nostalgia  notetaking  number  numerics  objektbuch  ocaml-sml  oly  oop  optimism  optimization  org:com  org:junk  org:med  organization  os  oss  papers  parable  pareto  parsimony  paste  pdf  people  performance  pic  planning  plots  pls  plt  polynomials  pragmatic  prediction  presentation  prioritizing  priors-posteriors  pro-rata  profile  programming  project  proofs  propaganda  protocol-metadata  python  q-n-a  quality  quotes  r-lang  random  rant  rationality  ratty  reading  realness  recommendations  reddit  reference  reflection  regularizer  replication  repo  research  research-program  responsibility  retention  retrofit  review  rhetoric  rigidity  rigor  roadmap  roots  rsc  rust  saas  safety  scala  scale  scaling-tech  scholar  sci-comp  science  scifi-fantasy  search  security  self-control  sequential  shipping  similarity  skunkworks  sleuthin  slides  social  software  span-cover  speculation  spreading  stackex  stanford  startups  state  state-of-art  static-dynamic  stock-flow  stories  structure  study  sub-super  subculture  summary  summer-2014  sv  syntax  system-design  systems  talks  teaching  tech  tech-infrastructure  technical-writing  techtariat  terminal  thinking  time  time-series  time-use  tools  top-n  traces  track-record  tracker  tradeoffs  trends  trust  truth  tutorial  types  ubiquity  ui  uncertainty  unit  unix  ux  vcs  video  visualization  volo-avolo  vulgar  web  white-paper  whole-partial-many  wiki  wire-guided  workflow  working-stiff  worse-is-better/the-right-thing  writing  yak-shaving  yc  🖥 

Copy this bookmark: