nhaliday + dbs   47

"Performance Matters" by Emery Berger - YouTube
Stabilizer is a tool that enables statistically sound performance evaluation, making it possible to understand the impact of optimizations and conclude things like the fact that the -O2 and -O3 optimization levels are indistinguishable from noise (sadly true).

Since compiler optimizations have run out of steam, we need better profiling support, especially for modern concurrent, multi-threaded applications. Coz is a new "causal profiler" that lets programmers optimize for throughput or latency, and which pinpoints and accurately predicts the impact of optimizations.

- randomize extraneous factors like code layout and stack size to avoid spurious speedups
- simulate a speedup of one component of a concurrent system (to assess the effect of an optimization before attempting it) by slowing down the complement (everything except that component)
- latency vs. throughput, Little's law
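
For reference, Little's law relates the two quantities in the last note: the average number of items in a system equals throughput times average latency (L = λW), so latency can be estimated from the other two. A minimal sketch follows; the function name and the numbers are hypothetical and are not taken from the talk.

```python
def mean_latency(items_in_system: float, throughput_per_sec: float) -> float:
    """Little's law: L = lambda * W, rearranged to W = L / lambda."""
    if throughput_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return items_in_system / throughput_per_sec


# Hypothetical workload: 50 requests in flight on average, 200 completed per second.
latency = mean_latency(50, 200)
print(f"average latency: {latency:.3f} s")
```

The relation only holds for long-run averages of a stable system, so it is a sanity check rather than a per-request measurement.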
video  presentation  programming  engineering  nitty-gritty  performance  devtools  compilers  latency-throughput  concurrency  legacy  causation  wire-guided  let-me-see  manifolds  pro-rata  tricks  endogenous-exogenous  control  random  signal-noise  comparison  marginal  llvm  systems  hashing  computer-memory  build-packaging  composition-decomposition  coupling-cohesion  local-global  dbs  direct-indirect  symmetry  research  models  metal-to-virtual  linux  measurement  simulation  magnitude  realness  hypothesis-testing 
5 weeks ago by nhaliday
The Law of Leaky Abstractions – Joel on Software
[TCP/IP example]

All non-trivial abstractions, to some degree, are leaky.

...

- Something as simple as iterating over a large two-dimensional array can have radically different performance if you do it horizontally rather than vertically, depending on the “grain of the wood” — one direction may result in vastly more page faults than the other direction, and page faults are slow. Even assembly programmers are supposed to be allowed to pretend that they have a big flat address space, but virtual memory means it’s really just an abstraction, which leaks when there’s a page fault and certain memory fetches take way more nanoseconds than other memory fetches.

- The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify “where a=b and b=c and a=c” than if you only specify “where a=b and b=c” even though the result set is the same. You’re not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster.
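
The array-iteration point above can be made concrete with a small model. This sketch assumes a row-major layout, a 4096-byte page, and 8-byte elements (all assumptions, not from the essay), and counts how often consecutive accesses land on a different page. That count is only a proxy for locality; real page-fault counts depend on the OS and on how much memory is resident.

```python
PAGE_BYTES = 4096  # assumed page size
ELEM_BYTES = 8     # assumed element size (e.g. a 64-bit float)


def page_switches(rows: int, cols: int, horizontal: bool) -> int:
    """Count page changes while walking a rows x cols array stored row-major."""
    if horizontal:
        order = ((r, c) for r in range(rows) for c in range(cols))
    else:
        order = ((r, c) for c in range(cols) for r in range(rows))

    switches = 0
    last_page = None
    for r, c in order:
        # Byte offset of element (r, c) in the flat buffer, mapped to a page number.
        page = (r * cols + c) * ELEM_BYTES // PAGE_BYTES
        if page != last_page:
            switches += 1
            last_page = page
    return switches


# With 512 columns of 8 bytes, each row fills exactly one page.
print("horizontal:", page_switches(512, 512, horizontal=True))
print("vertical:  ", page_switches(512, 512, horizontal=False))
```

In this model the horizontal walk touches each page once, while the vertical walk changes page on every single access.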

...

- C++ string classes are supposed to let you pretend that strings are first-class data. They try to abstract away the fact that strings are hard and let you act as if they were as easy as integers. Almost all C++ string classes overload the + operator so you can write s + “bar” to concatenate. But you know what? No matter how hard they try, there is no C++ string class on Earth that will let you type “foo” + “bar”, because string literals in C++ are always char*’s, never strings. The abstraction has sprung a leak that the language doesn’t let you plug. (Amusingly, the history of the evolution of C++ over time can be described as a history of trying to plug the leaks in the string abstraction. Why they couldn’t just add a native string class to the language itself eludes me at the moment.)

- And you can’t drive as fast when it’s raining, even though your car has windshield wipers and headlights and a roof and a heater, all of which protect you from caring about the fact that it’s raining (they abstract away the weather), but lo, you have to worry about hydroplaning (or aquaplaning in England) and sometimes the rain is so strong you can’t see very far ahead so you go slower in the rain, because the weather can never be completely abstracted away, because of the law of leaky abstractions.

One reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to. When I’m training someone to be a C++ programmer, it would be nice if I never had to teach them about char*’s and pointer arithmetic. It would be nice if I could go straight to STL strings. But one day they’ll write the code “foo” + “bar”, and truly bizarre things will happen, and then I’ll have to stop and teach them all about char*’s anyway.

...

The law of leaky abstractions means that whenever somebody comes up with a wizzy new code-generation tool that is supposed to make us all ever-so-efficient, you hear a lot of people saying “learn how to do it manually first, then use the wizzy tool to save time.” Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning.
techtariat  org:com  working-stiff  essay  programming  cs  software  abstraction  worrydream  thinking  intricacy  degrees-of-freedom  networking  examples  traces  no-go  volo-avolo  tradeoffs  c(pp)  pls  strings  dbs  transportation  driving  analogy  aphorism  learning  paradox  systems  elegance  nitty-gritty  concrete  cracker-prog  metal-to-virtual  protocol-metadata  design  system-design 
july 2019 by nhaliday
Fossil: Home
VCS with built-in issue tracking and wiki; used by SQLite
tools  devtools  software  vcs  wiki  debugging  integration-extension  oss  dbs 
may 2019 by nhaliday
Is backing up a MySQL database in Git a good idea? - Software Engineering Stack Exchange
*no: list of alternatives*

https://stackoverflow.com/questions/115369/do-you-use-source-control-for-your-database-items
Top 2 answers contradict each other but both agree that you should at least version the schema and other scripts.

My impression is that the guy linked in the accepted answer is arguing for a minority practice.
q-n-a  stackex  programming  engineering  dbs  vcs  git  debate  critique  backup  best-practices  flux-stasis  nitty-gritty  gotchas  init  advice  code-organizing  multi  hmm  idk  contrarianism  rhetoric  links  system-design 
may 2019 by nhaliday
its-not-software - steveyegge2
You don't work in the software industry.

...

So what's the software industry, and how do we differ from it?

Well, the software industry is what you learn about in school, and it's what you probably did at your previous company. The software industry produces software that runs on customers' machines — that is, software intended to run on a machine over which you have no control.

So it includes pretty much everything that Microsoft does: Windows and every application you download for it, including your browser.

It also includes everything that runs in the browser, including Flash applications, Java applets, and plug-ins like Adobe's Acrobat Reader. Their deployment model is a little different from the "classic" deployment models, but it's still software that you package up and release to some unknown client box.

...

Servware

Our industry is so different from the software industry, and it's so important to draw a clear distinction, that it needs a new name. I'll call it Servware for now, lacking anything better. Hardware, firmware, software, servware. It fits well enough.

Servware is stuff that lives on your own servers. I call it "stuff" advisedly, since it's more than just software; it includes configuration, monitoring systems, data, documentation, and everything else you've got there, all acting in concert to produce some observable user experience on the other side of a network connection.
techtariat  sv  tech  rhetoric  essay  software  saas  devops  engineering  programming  contrarianism  list  top-n  best-practices  applicability-prereqs  desktop  flux-stasis  homo-hetero  trends  games  thinking  checklists  dbs  models  communication  tutorial  wiki  integration-extension  frameworks  api  whole-partial-many  metrics  retrofit  c(pp)  pls  code-dive  planning  working-stiff  composition-decomposition  libraries  conceptual-vocab  amazon  system-design  cracker-prog  tech-infrastructure  blowhards 
may 2019 by nhaliday
Recitation 25: Data locality and B-trees
The same idea can be applied to trees. Binary trees are not good for locality because a given node of the binary tree probably occupies only a fraction of a cache line. B-trees are a way to get better locality. As in the hash table trick above, we store several elements in a single node -- as many as will fit in a cache line.

B-trees were originally invented for storing data structures on disk, where locality is even more crucial than with memory. Accessing a disk location takes about 5ms = 5,000,000ns. Therefore if you are storing a tree on disk you want to make sure that a given disk read is as effective as possible. B-trees, with their high branching factor, ensure that few disk reads are needed to navigate to the place where data is stored. B-trees are also useful for in-memory data structures because these days main memory is almost as slow relative to the processor as disk drives were when B-trees were introduced!
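
A rough way to see the effect of branching factor: assume one disk read per tree level and the 5 ms access time quoted above, then compare how many levels are needed to reach one of a million keys. The helper name, the key count, and the branching factor of 256 are illustrative assumptions, not figures from the recitation.

```python
DISK_READ_MS = 5  # access time quoted in the notes above


def levels_needed(num_keys: int, branching_factor: int) -> int:
    """Smallest number of levels h with branching_factor ** h >= num_keys."""
    levels = 0
    capacity = 1
    while capacity < num_keys:
        capacity *= branching_factor
        levels += 1
    return levels


num_keys = 1_000_000
for name, fanout in [("binary tree", 2), ("B-tree, fan-out 256", 256)]:
    depth = levels_needed(num_keys, fanout)
    print(f"{name}: {depth} reads, about {depth * DISK_READ_MS} ms")
```

This ignores caching of the upper levels and partially filled nodes, both of which matter in practice, but the gap in read counts is the core of the argument.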
nibble  org:junk  org:edu  cornell  lecture-notes  exposition  programming  engineering  systems  dbs  caching  performance  memory-management  os  computer-memory  metal-to-virtual 
september 2017 by nhaliday
Anatomy of an SQL Index: What is an SQL Index
“An index makes the query fast” is the most basic explanation of an index I have ever seen. Although it describes the most important aspect of an index very well, it is—unfortunately—not sufficient for this book. This chapter describes the index structure in a less superficial way but doesn't dive too deeply into details. It provides just enough insight for one to understand the SQL performance aspects discussed throughout the book.

B-trees, etc.
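
A small way to watch an index change how a query runs, sketched with Python's built-in sqlite3 module. The table, column, and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions, so the sketch prints the plan rather than assuming a fixed string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO employees (last_name) VALUES (?)",
    [(f"name{i}",) for i in range(1000)],
)


def plan(sql: str) -> str:
    """Return SQLite's query plan; the last column of each row is the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT id FROM employees WHERE last_name = 'name42'"

before = plan(query)  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_last_name ON employees (last_name)")
after = plan(query)   # the planner can now search the index instead

print("before:", before)
print("after: ", after)
```

SQLite stores such an index as a B-tree, so the "after" plan corresponds to a tree descent instead of a walk over every row.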
techtariat  tutorial  explanation  performance  programming  engineering  dbs  trees  data-structures  nibble  caching  metal-to-virtual  abstraction  applications 
september 2017 by nhaliday
Camlistore
renamed to https://perkeep.org/

very similar thing by Rob Pike: https://upspin.io
https://news.ycombinator.com/item?id=13700492
Hi, Camlistore author here.
Andrew Gerrand worked with me on Camlistore too and is one of the Upspin authors.

The main difference I see is that Camlistore can model POSIX filesystems for backup and FUSE, but that's not its preferred view of the world. It is perfectly happy modeling a tweet or a "like" on its own, without any name in the world.

Upspin's data model is very much a traditional filesystem.

Also, upspin cared about the interop between different users from day 1 with keyservers etc, whereas for Camlistore that was not the primary design criterion. (We're only starting to work on that now in Camlistore).

But there is some similarity for sure, and Andrew knows both.
tools  golang  cloud  yak-shaving  software  libraries  google  oss  exocortex  nostalgia  summer-2014  retention  database  dbs  multi  rsc  networking  web  distributed  hn  commentary 
october 2016 by nhaliday

