nhaliday + nitty-gritty   250

javascript - ReactJS - Does render get called any time "setState" is called? - Stack Overflow
By default - yes.

There is a method, boolean shouldComponentUpdate(object nextProps, object nextState); each component has this method, and it is responsible for answering "should the component update (run its render function)?" every time you change state or pass new props from the parent component.

You can write your own implementation of the shouldComponentUpdate method for your component, but the default implementation always returns true - meaning the render function is always re-run.

...

Next part of your question:

If so, why? I thought the idea was that React only rendered as little as needed - when state changed.

There are two steps in what we may call "render":

Virtual DOM render: when the render method is called it returns a new virtual DOM structure of the component. As I mentioned before, this render method is always called when you call setState(), because shouldComponentUpdate always returns true by default. So, by default, there is no optimization here in React.

Native DOM render: React changes real DOM nodes in your browser only if they were changed in the Virtual DOM, and as little as needed - this is the great React feature that optimizes real DOM mutation and makes React fast.
q-n-a  stackex  programming  intricacy  nitty-gritty  abstraction  state  frontend  web  javascript  libraries  facebook  frameworks  explanation  summary  models 
13 days ago by nhaliday
What does it mean when a CSS rule is grayed out in Chrome's element inspector? - Stack Overflow
It seems that a strike-through indicates that a rule was overridden, but what does it mean when a style is grayed out?

--

Greyed/dimmed-out text can mean either:

1. It's a default rule/property the browser applies, which includes defaulted shorthand properties.
2. It involves inheritance, which is a bit more complicated.

...

In the case where a rule is applied to the currently selected element due to inheritance (i.e. the rule was applied to an ancestor, and the selected element inherited it), Chrome will again display the entire ruleset.

The rules which are applied to the currently selected element appear in normal text.

If a rule exists in that set but is not applied because it's a non-inheritable property (e.g. background color), it will appear as greyed/dimmed text.

https://stackoverflow.com/questions/34712218/what-does-it-mean-when-chrome-dev-tools-shows-a-computed-property-greyed-out
Please note, not the Styles panel (I know what greyed-out means in that context—not applied), but the next panel over, the Computed properties panel.
--
The gray calculated properties are neither default, nor inherited. This only occurs on properties that were not defined for the element, but _calculated_ from either its children or parent _based on runtime layout rendering_.

Take this simple page as an example, where display is default and font-size is inherited:
q-n-a  stackex  programming  frontend  web  DSL  form-design  devtools  explanation  trivia  tip-of-tongue  direct-indirect  trees  spreading  multi  nitty-gritty  static-dynamic  constraint-satisfaction  ui  browser  properties 
19 days ago by nhaliday
exponential function - Feynman's Trick for Approximating $e^x$ - Mathematics Stack Exchange
1. e^2.3 ~ 10
2. e^.7 ~ 2
3. e^x ~ 1+x
e = 2.71828...

errors (absolute, relative):
1. +0.0258, 0.26%
2. -0.0138, -0.68%
3. 1 + x approximates e^x on [-.3, .3] with absolute error < .05, and relative error < 5.6% (3.7% for [0, .3]).
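
[ed.: quick sanity check combining the three rules: e^3 = e^2.3 * e^0.7 ~ 10*2 = 20 (true value 20.086, error ~0.4%); e^2.4 = e^2.3 * e^0.1 ~ 10*1.1 = 11 (true value 11.023, error ~0.2%).]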
nibble  q-n-a  overflow  math  feynman  giants  mental-math  calculation  multiplicative  AMT  identity  objektbuch  explanation  howto  estimate  street-fighting  stories  approximation  data  trivia  nitty-gritty 
25 days ago by nhaliday
CppCon 2014: Chandler Carruth "Efficiency with Algorithms, Performance with Data Structures" - YouTube
- idk how I feel about this
- makes a distinction between efficiency (basically asymptotic complexity, "doing less work") and performance ("doing that work faster"). idiosyncratic terminology but similar to the "two performance aesthetics" described here: https://pinboard.in/u:nhaliday/b:913a284640c5
- some bikeshedding about vector::reserve and references
- "discontiguous data structures are the root of all evil" (cache-locality, don't use linked lists, etc)
- stacks? queues? just use vector. also suggests circular buffers. says std::deque is really bad
- std::map is bad too (for real SWE, not oly-programming). if you want ordered associative container, just binary search in vector
- std::unordered_map is poorly implemented, unfortunately (due to requirement for buckets in API)
- good implementation of hash table uses open addressing and local (linear?) probing
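
[ed.: a minimal sketch (mine, not from the talk) of the "ordered associative container = sorted vector + binary search" suggestion:]

    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    // Keep (key, value) pairs sorted by key; lookups are O(log n) via binary search,
    // but everything lives in one contiguous allocation instead of a node per element.
    using Entry = std::pair<int, std::string>;

    const std::string* find(const std::vector<Entry>& v, int key) {
        auto it = std::lower_bound(v.begin(), v.end(), key,
                                   [](const Entry& e, int k) { return e.first < k; });
        return (it != v.end() && it->first == key) ? &it->second : nullptr;
    }
    // Build the vector once with std::sort (or keep it sorted on insert) and you keep
    // ordered iteration without a per-element node allocation and pointer chase.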
video  presentation  performance  nitty-gritty  best-practices  working-stiff  programming  c(pp)  systems  data-structures  algorithms  jvm  pls  metal-to-virtual  stylized-facts  rhetoric  expert-experience  google  llvm  efficiency  time-complexity  mobile  computer-memory  caching  oly-programming  common-case  hashing  multi  energy-resources  methodology  trees 
4 weeks ago by nhaliday
"Performance Matters" by Emery Berger - YouTube
Stabilizer is a tool that enables statistically sound performance evaluation, making it possible to understand the impact of optimizations and conclude things like the fact that the -O2 and -O3 optimization levels are indistinguishable from noise (sadly true).

Since compiler optimizations have run out of steam, we need better profiling support, especially for modern concurrent, multi-threaded applications. Coz is a new "causal profiler" that lets programmers optimize for throughput or latency, and which pinpoints and accurately predicts the impact of optimizations.

- randomize extraneous factors like code layout and stack size to avoid spurious speedups
- simulate speedup of component of concurrent system (to assess effect of optimization before attempting) by slowing down the complement (all but that component)
- latency vs. throughput, Little's law
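
[ed.: a quick Little's law illustration with made-up but self-consistent numbers: L = λW, so a service averaging λ = 200 requests/s at W = 50 ms latency has L = 200 * 0.05 = 10 requests in flight; if concurrency stays pinned at 10, halving latency to 25 ms doubles achievable throughput to 400 requests/s.]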
video  presentation  programming  engineering  nitty-gritty  performance  devtools  compilers  latency-throughput  concurrency  legacy  causation  wire-guided  let-me-see  manifolds  pro-rata  tricks  endogenous-exogenous  control  random  signal-noise  comparison  marginal  llvm  systems  hashing  computer-memory  build-packaging  composition-decomposition  coupling-cohesion  local-global  dbs  direct-indirect  symmetry  research  models  metal-to-virtual  linux  measurement  simulation  magnitude  realness  hypothesis-testing 
5 weeks ago by nhaliday
Two Performance Aesthetics: Never Miss a Frame and Do Almost Nothing - Tristan Hume
I’ve noticed when I think about performance nowadays that I think in terms of two different aesthetics. One aesthetic, which I’ll call Never Miss a Frame, comes from the world of game development and is focused on writing code that has good worst case performance by making good use of the hardware. The other aesthetic, which I’ll call Do Almost Nothing, comes from a more academic world and is focused on algorithmically minimizing the work that needs to be done to the extent that there’s barely any work left, paying attention to the performance at all scales.

[ed.: Neither of these exactly matches TCS performance PoV but latter is closer (the focus on diffs is kinda weird).]

...

Never Miss a Frame

In game development the most important performance criterion is that your game doesn’t miss frame deadlines. You have a target frame rate and if you miss the deadline for the screen to draw a new frame your users will notice the jank. This leads to focusing on the worst case scenario and often having fixed maximum limits for various quantities. This property can also be important in areas other than game development, like other graphical applications, real-time audio, safety-critical systems and many embedded systems. A similar dynamic occurs in distributed systems where one server needs to query 100 others and combine the results: you’ll wait for the slowest of the 100 every time, so speeding up some of them doesn’t make the query faster, and queries occasionally taking longer (e.g. because of garbage collection) will impact almost every request!
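
[ed.: back-of-the-envelope for that last point: if each of the 100 servers is slow on only 1% of requests, the chance that at least one of them is slow on a given query is 1 - 0.99^100 ≈ 63%, so the "rare" slow case dominates what users see.]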

...

In this kind of domain you’ll often run into situations where in the worst case you can’t avoid processing a huge number of things. This means you need to focus your effort on making the best use of the hardware by writing code at a low level and paying attention to properties like cache size and memory bandwidth.

Projects with inviolable deadlines need to adjust different factors than speed if the code runs too slow. For example a game might decrease the size of a level or use a more efficient but less pretty rendering technique.

Aesthetically: Data should be tightly packed, fixed size, and linear. Transcoding data to and from different formats is wasteful. Strings and their variable lengths and inefficient operations must be avoided. Only use tools that allow you to work at a low level, even if they’re annoying, because that’s the only way you can avoid piles of fixed costs making everything slow. Understand the machine and what your code does to it.

Personally I identify this aesthetic most with Jonathan Blow. He has a very strong personality and I’ve watched enough of videos of him that I find imagining “What would Jonathan Blow say?” as a good way to tap into this aesthetic. My favourite articles about designs following this aesthetic are on the Our Machinery Blog.

...

Do Almost Nothing

Sometimes, it’s important to be as fast as you can in all cases and not just orient around one deadline. The most common case is when you simply have to do something that’s going to take an amount of time noticeable to a human, and if you can make that time shorter in some situations that’s great. Alternatively each operation could be fast but you may run a server that runs tons of them and you’ll save on server costs if you can decrease the load of some requests. Another important case is when you care about power use, for example your text editor not rapidly draining a laptop’s battery, in this case you want to do the least work you possibly can.

A key technique for this approach is to never recompute something from scratch when it’s possible to re-use or patch an old result. This often involves caching: keeping a store of recent results in case the same computation is requested again.
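
[ed.: the simplest form of this is plain memoization; a sketch in C++ (my illustration, not the author's):]

    #include <unordered_map>

    // Stand-in for any pure, costly computation (layout, highlighting, ...).
    int expensive_layout(int paragraph_width) { return paragraph_width * 42; }

    int cached_layout(int paragraph_width) {
        static std::unordered_map<int, int> cache;   // store of recent results
        auto it = cache.find(paragraph_width);
        if (it != cache.end()) return it->second;    // re-use the old result
        return cache[paragraph_width] = expensive_layout(paragraph_width);
    }
    // The "deal only in differences" refinement below goes further: instead of caching
    // whole answers, each stage consumes and produces deltas against the previous state.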

The ultimate realization of this aesthetic is for the entire system to deal only in differences between the new state and the previous state, updating data structures with only the newly needed data and discarding data that’s no longer needed. This way each part of the system does almost no work because ideally the difference from the previous state is very small.

Aesthetically: Data must be in whatever structure scales best for the way it is accessed, lots of trees and hash maps. Computations are graphs of inputs and results so we can use all our favourite graph algorithms to optimize them! Designing optimal systems is hard so you should use whatever tools you can to make it easier, any fixed cost they incur will be made negligible when you optimize away all the work they need to do.

Personally I identify this aesthetic most with my friend Raph Levien and his articles about the design of the Xi text editor, although Raph also appreciates the other aesthetic and taps into it himself sometimes.

...

_I’m conflating the axes of deadline-oriented vs time-oriented and low-level vs algorithmic optimization, but part of my point is that while they are different, I think these axes are highly correlated._

...

Text Editors

Sublime Text is a text editor that mostly follows the Never Miss a Frame approach. ...

The Xi Editor is designed to solve this problem by being designed from the ground up to grapple with the fact that some operations, especially those interacting with slow compilers written by other people, can’t be made instantaneous. It does this using a fancy asynchronous plugin model and lots of fancy data structures.
...

...

Compilers

Jonathan Blow’s Jai compiler is clearly designed with the Never Miss a Frame aesthetic. It’s written to be extremely fast at every level, and the language doesn’t have any features that necessarily lead to slow compiles. The LLVM backend wasn’t fast enough to hit his performance goals so he wrote an alternative backend that directly writes x86 code to a buffer without doing any optimizations. Jai compiles something like 100,000 lines of code per second. Designing both the language and compiler to not do anything slow led to clean build performance 10-100x faster than other commonly-used compilers. Jai is so fast that its clean builds are faster than most compilers’ incremental builds on common project sizes, due to limitations in how incremental the other compilers are.

However, Jai’s compiler is still O(n) in the codebase size where incremental compilers can be O(n) in the size of the change. Some compilers like the work-in-progress rust-analyzer and I think also Roslyn for C# take a different approach and focus incredibly hard on making everything fully incremental. For small changes (the common case) this can let them beat Jai and respond in milliseconds on arbitrarily large projects, even if they’re slower on clean builds.

Conclusion
I find both of these aesthetics appealing, but I also think there are real trade-offs that incentivize leaning one way or the other for a given project. I think people having different performance aesthetics, often because one aesthetic really is better suited for their domain, is the source of a lot of online arguments about making fast systems. The different aesthetics also require different bases of knowledge to pursue, like knowledge of data-oriented programming in C++ vs knowledge of abstractions for incrementality like Adapton, so different people may find that one approach seems way easier and better for them than the other.

I try to choose how to dedicate my effort to pursuing each aesthetic on a per-project basis by trying to predict how effort in each direction would help. For some projects I know that if I code them efficiently they will always hit the performance deadline; for others I know a way to drastically cut down on work by investing time in algorithmic design; some projects need a mix of both. Personally I find it helpful to think of different programmers where I have a good sense of their aesthetic and ask myself how they’d solve the problem. One reason I like Rust is that it can do both low-level optimization and also has a good ecosystem and type system for algorithmic optimization, so I can more easily mix approaches in one project. In the end the best approach to follow depends not only on the task, but on your skills or the skills of the team working on it, as well as how much time you have to work towards an ambitious design that may take longer for a better result.
techtariat  reflection  things  comparison  lens  programming  engineering  cracker-prog  carmack  games  performance  big-picture  system-design  constraint-satisfaction  metrics  telos-atelos  distributed  incentives  concurrency  cost-benefit  tradeoffs  systems  metal-to-virtual  latency-throughput  abstraction  marginal  caching  editors  strings  ideas  ui  common-case  examples  applications  flux-stasis  nitty-gritty  ends-means  thinking  summary  correlation  degrees-of-freedom  c(pp)  rust  interface  integration-extension  aesthetics  interface-compatibility  efficiency  adversarial 
10 weeks ago by nhaliday
multithreading - C++11 introduced a standardized memory model. What does it mean? And how is it going to affect C++ programming? - Stack Overflow
I like the analogy of abandonment of sequential consistency to special relativity, tho I think (emphasis on *think*, not know...) GR might actually be the more appropriate one
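
[ed.: for concreteness, the canonical message-passing idiom the standardized model lets you write portably (a sketch, not taken from the linked thread):]

    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;
    std::atomic<bool> ready{false};

    void producer() {
        payload = 42;                                  // plain write...
        ready.store(true, std::memory_order_release);  // ...published by the release store
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {}  // acquire synchronizes-with the release
        assert(payload == 42);  // guaranteed: no data race, no stale read of payload
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join(); t2.join();
    }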
q-n-a  stackex  programming  pls  c(pp)  systems  metal-to-virtual  computer-memory  concurrency  intricacy  nitty-gritty  analogy  comparison  physics  relativity  advanced 
august 2019 by nhaliday
How good are decisions?
A statement I commonly hear in tech-utopian circles is that some seeming inefficiency can’t actually be inefficient because the market is efficient and inefficiencies will quickly be eliminated. A contentious example of this is the claim that companies can’t be discriminating because the market is too competitive to tolerate discrimination. A less contentious example is that when you see a big company doing something that seems bizarrely inefficient, maybe it’s not inefficient and you just lack the information necessary to understand why the decision was efficient.

Unfortunately, arguments like this are difficult to settle because, even in retrospect, it’s usually not possible to get enough information to determine the precise “value” of a decision. Even in cases where the decision led to an unambiguous success or failure, there are so many factors that led to the result that it’s difficult to figure out precisely why something happened.

One nice thing about sports is that they often have detailed play-by-play data and well-defined win criteria which lets us tell, on average, what the expected value of a decision is. In this post, we’ll look at the cost of bad decision making in one sport and then briefly discuss why decision quality in sports might be the same or better as decision quality in other fields.

Just to have a concrete example, we’re going to look at baseball, but you could do the same kind of analysis for football, hockey, basketball, etc., and my understanding is that you’d get a roughly similar result in all of those cases.

We’re going to model baseball as a state machine, both because that makes it easy to understand the expected value of particular decisions and because this lets us talk about the value of decisions without having to go over most of the rules of baseball.
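
[ed.: a sketch of that framing; structure only, since in a real analysis the expected-runs table is estimated from play-by-play data:]

    #include <array>

    // A half-inning state: number of outs plus which bases are occupied.
    struct State { int outs; bool on1, on2, on3; };

    int state_index(const State& s) { return s.outs * 8 + (s.on1 | (s.on2 << 1) | (s.on3 << 2)); }

    // 3 outs x 8 base configurations = 24 states. Expected runs scored from each
    // state until the end of the inning; left zeroed here, filled from data in practice.
    std::array<double, 24> expected_runs{};

    // Expected value of a decision that moves the inning from `before` to `after`
    // while scoring `runs_now` immediately.
    double decision_value(const State& before, const State& after, double runs_now) {
        return runs_now + expected_runs[state_index(after)] - expected_runs[state_index(before)];
    }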

exactly the kinda thing Dad likes
techtariat  dan-luu  data  analysis  examples  nitty-gritty  sports  street-fighting  automata-languages  models  optimization  arbitrage  data-science  cost-benefit  tactics  baseball  low-hanging 
august 2019 by nhaliday
Three best practices for building successful data pipelines - O'Reilly Media
Drawn from their experiences and my own, I’ve identified three key areas that are often overlooked in data pipelines, and those are making your analysis:
1. Reproducible
2. Consistent
3. Productionizable

...

Science that cannot be reproduced by an external third party is just not science — and this does apply to data science. One of the benefits of working in data science is the ability to apply the existing tools from software engineering. These tools let you isolate all the dependencies of your analyses and make them reproducible.

Dependencies fall into three categories:
1. Analysis code ...
2. Data sources ...
3. Algorithmic randomness ...

...

Establishing consistency in data
...

There are generally two ways of establishing the consistency of data sources. The first is by checking-in all code and data into a single revision control repository. The second method is to reserve source control for code and build a pipeline that explicitly depends on external data being in a stable, consistent format and location.

Checking data into version control is generally considered verboten for production software engineers, but it has a place in data analysis. For one thing, it makes your analysis very portable by isolating all dependencies into source control. Here are some conditions under which it makes sense to have both code and data in source control:
Small data sets ...
Regular analytics ...
Fixed source ...

Productionizability: Developing a common ETL
...

1. Common data format ...
2. Isolating library dependencies ...

https://blog.koresoftware.com/blog/etl-principles
Rigorously enforce the idempotency constraint
For efficiency, seek to load data incrementally
Always ensure that you can efficiently process historic data
Partition ingested data at the destination
Rest data between tasks
Pool resources for efficiency
Store all metadata together in one place
Manage login details in one place
Specify configuration details once
Parameterize sub flows and dynamically run tasks where possible
Execute conditionally
Develop your own workflow framework and reuse workflow components

more focused on details of specific technologies:
https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7

https://www.cloudera.com/documentation/director/cloud/topics/cloud_de_best_practices.html
techtariat  org:com  best-practices  engineering  code-organizing  machine-learning  data-science  yak-shaving  nitty-gritty  workflow  config  vcs  replication  homo-hetero  multi  org:med  design  system-design  links  shipping  minimalism  volo-avolo  causation  random  invariance  structure  arrows  protocol-metadata  interface-compatibility 
august 2019 by nhaliday
Karol Kuczmarski's Blog – A Haskell retrospective
Even in this hypothetical scenario, I posit that the value proposition of Haskell would still be a tough sell.

There is this old quote from Bjarne Stroustrup (creator of C++) where he says that programming languages divide into those everyone complains about, and those that no one uses.
The first group consists of old, established technologies that managed to accrue significant complexity debt through years and decades of evolution. All the while, they’ve been adapting to the constantly shifting perspectives on what are the best industry practices. Traces of those adaptations can still be found today, sticking out like a leftover appendix or residual tail bone — or like the built-in support for XML in Java.

Languages that “no one uses”, on the other hand, haven’t yet passed the industry threshold of sufficient maturity and stability. Their ecosystems are still cutting edge, and their future is uncertain, but they sometimes champion some really compelling paradigm shifts. As long as you can bear with things that are rough around the edges, you can take advantage of their novel ideas.

Unfortunately for Haskell, it manages to combine the worst parts of both of these worlds.

On one hand, it is a surprisingly old language, clocking more than two decades of fruitful research around many innovative concepts. Yet on the other hand, it bears the signs of a fresh new technology, with relatively few production-grade libraries, scarce coverage of some domains (e.g. GUI programming), and not too many stories of commercial successes.

There are many ways to do it
String theory
Errors and how to handle them
Implicit is better than explicit
Leaky modules
Namespaces are apparently a bad idea
Wild records
Purity beats practicality
techtariat  reflection  functional  haskell  programming  pls  realness  facebook  pragmatic  cost-benefit  legacy  libraries  types  intricacy  engineering  tradeoffs  frontier  homo-hetero  duplication  strings  composition-decomposition  nitty-gritty  error  error-handling  coupling-cohesion  critique  ecosystem  c(pp)  aphorism 
august 2019 by nhaliday
Errors in Math Functions (The GNU C Library)
https://stackoverflow.com/questions/22259537/guaranteed-precision-of-sqrt-function-in-c-c
For C99, there are no specific requirements. But most implementations try to support Annex F: IEC 60559 floating-point arithmetic as well as possible. It says:

An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex.

And:

The sqrt functions in <math.h> provide the IEC 60559 square root operation.

IEC 60559 (equivalent to IEEE 754) says about basic operations like sqrt:

Except for binary <-> decimal conversion, each of the operations shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range, and then coerced this intermediate result to fit in the destination's format.

The final step consists of rounding according to several rounding modes but the result must always be the closest representable value in the target precision.

[ed.: The list of other such correctly rounded functions is included in the IEEE-754 standard (which I've put w/ the C1x and C++2x standard drafts) under section 9.2, and it mainly consists of stuff that can be expressed in terms of exponentials (exp, log, trig functions, powers) along w/ sqrt/hypot functions.

Fun fact: this question was asked by Yeputons who has a codeforces profile.]
https://stackoverflow.com/questions/20945815/math-precision-requirements-of-c-and-c-standard
oss  libraries  systems  c(pp)  numerics  documentation  objektbuch  list  linux  unix  multi  q-n-a  stackex  programming  nitty-gritty  sci-comp  accuracy  types  approximation  IEEE  protocol-metadata  gnu 
july 2019 by nhaliday
Inventor CEOs - Marginal REVOLUTION
One in five U.S. high-technology firms are led by CEOs with hands-on innovation experience as inventors. Firms led by “Inventor CEOs” are associated with higher quality innovation, especially when the CEO is a high-impact inventor. During an Inventor CEO’s tenure, firms file a greater number of patents and more valuable patents in technology classes where the CEO’s hands-on experience lies. Utilizing plausibly exogenous CEO turnovers to address the matching of CEOs to firms suggests these effects are causal. The results can be explained by an Inventor CEO’s superior ability to evaluate, select, and execute innovative investment projects related to their own hands-on experience.
econotariat  marginal-rev  commentary  study  summary  economics  industrial-org  management  leadership  the-world-is-just-atoms  realness  nitty-gritty  innovation  novelty  business  growth-econ  ability-competence  intellectual-property 
july 2019 by nhaliday
The Law of Leaky Abstractions – Joel on Software
[TCP/IP example]

All non-trivial abstractions, to some degree, are leaky.

...

- Something as simple as iterating over a large two-dimensional array can have radically different performance if you do it horizontally rather than vertically, depending on the “grain of the wood” — one direction may result in vastly more page faults than the other direction, and page faults are slow. Even assembly programmers are supposed to be allowed to pretend that they have a big flat address space, but virtual memory means it’s really just an abstraction, which leaks when there’s a page fault and certain memory fetches take way more nanoseconds than other memory fetches.

- The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify “where a=b and b=c and a=c” than if you only specify “where a=b and b=c” even though the result set is the same. You’re not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster.

...

- C++ string classes are supposed to let you pretend that strings are first-class data. They try to abstract away the fact that strings are hard and let you act as if they were as easy as integers. Almost all C++ string classes overload the + operator so you can write s + “bar” to concatenate. But you know what? No matter how hard they try, there is no C++ string class on Earth that will let you type “foo” + “bar”, because string literals in C++ are always char*’s, never strings. The abstraction has sprung a leak that the language doesn’t let you plug. (Amusingly, the history of the evolution of C++ over time can be described as a history of trying to plug the leaks in the string abstraction. Why they couldn’t just add a native string class to the language itself eludes me at the moment.)

- And you can’t drive as fast when it’s raining, even though your car has windshield wipers and headlights and a roof and a heater, all of which protect you from caring about the fact that it’s raining (they abstract away the weather), but lo, you have to worry about hydroplaning (or aquaplaning in England) and sometimes the rain is so strong you can’t see very far ahead so you go slower in the rain, because the weather can never be completely abstracted away, because of the law of leaky abstractions.
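
[ed.: the two-dimensional array bullet above in code; same arithmetic, different walk over memory:]

    #include <vector>

    constexpr int N = 4096;
    std::vector<double> grid(N * N);   // one logical 2D array, stored row-major and flat

    double sum_row_major() {           // inner loop walks memory contiguously: cache-friendly
        double s = 0;
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                s += grid[i * N + j];
        return s;
    }

    double sum_column_major() {        // same result, but each step strides N doubles:
        double s = 0;                  // many more cache misses and page faults
        for (int j = 0; j < N; ++j)
            for (int i = 0; i < N; ++i)
                s += grid[i * N + j];
        return s;
    }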

One reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to. When I’m training someone to be a C++ programmer, it would be nice if I never had to teach them about char*’s and pointer arithmetic. It would be nice if I could go straight to STL strings. But one day they’ll write the code “foo” + “bar”, and truly bizarre things will happen, and then I’ll have to stop and teach them all about char*’s anyway.

...

The law of leaky abstractions means that whenever somebody comes up with a wizzy new code-generation tool that is supposed to make us all ever-so-efficient, you hear a lot of people saying “learn how to do it manually first, then use the wizzy tool to save time.” Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning.
techtariat  org:com  working-stiff  essay  programming  cs  software  abstraction  worrydream  thinking  intricacy  degrees-of-freedom  networking  examples  traces  no-go  volo-avolo  tradeoffs  c(pp)  pls  strings  dbs  transportation  driving  analogy  aphorism  learning  paradox  systems  elegance  nitty-gritty  concrete  cracker-prog  metal-to-virtual  protocol-metadata  design  system-design 
july 2019 by nhaliday
Computer latency: 1977-2017
If we look at overall results, the fastest machines are ancient. Newer machines are all over the place. Fancy gaming rigs with unusually high refresh-rate displays are almost competitive with machines from the late 70s and early 80s, but “normal” modern computers can’t compete with thirty to forty year old machines.

...

If we exclude the game boy color, which is a different class of device than the rest, all of the quickest devices are Apple phones or tablets. The next quickest device is the blackberry q10. Although we don’t have enough data to really tell why the blackberry q10 is unusually quick for a non-Apple device, one plausible guess is that it’s helped by having actual buttons, which are easier to implement with low latency than a touchscreen. The other two devices with actual buttons are the gameboy color and the kindle 4.

After the iphones and non-kindle button devices, we have a variety of Android devices of various ages. At the bottom, we have the ancient palm pilot 1000 followed by the kindles. The palm is hamstrung by a touchscreen and display created in an era with much slower touchscreen technology and the kindles use e-ink displays, which are much slower than the displays used on modern phones, so it’s not surprising to see those devices at the bottom.

...

Almost every computer and mobile device that people buy today is slower than common models of computers from the 70s and 80s. Low-latency gaming desktops and the ipad pro can get into the same range as quick machines from thirty to forty years ago, but most off-the-shelf devices aren’t even close.

If we had to pick one root cause of latency bloat, we might say that it’s because of “complexity”. Of course, we all know that complexity is bad. If you’ve been to a non-academic non-enterprise tech conference in the past decade, there’s a good chance that there was at least one talk on how complexity is the root of all evil and we should aspire to reduce complexity.

Unfortunately, it's a lot harder to remove complexity than to give a talk saying that we should remove complexity. A lot of the complexity buys us something, either directly or indirectly. When we looked at the input of a fancy modern keyboard vs. the apple 2 keyboard, we saw that using a relatively powerful and expensive general purpose processor to handle keyboard inputs can be slower than dedicated logic for the keyboard, which would both be simpler and cheaper. However, using the processor gives people the ability to easily customize the keyboard, and also pushes the problem of “programming” the keyboard from hardware into software, which reduces the cost of making the keyboard. The more expensive chip increases the manufacturing cost, but considering how much of the cost of these small-batch artisanal keyboards is the design cost, it seems like a net win to trade manufacturing cost for ease of programming.

...

If you want a reference to compare the kindle against, a moderately quick page turn in a physical book appears to be about 200 ms.

https://twitter.com/gravislizard/status/927593460642615296
almost everything on computers is perceptually slower than it was in 1983
https://archive.is/G3D5K
https://archive.is/vhDTL
https://archive.is/a3321
https://archive.is/imG7S
techtariat  dan-luu  performance  time  hardware  consumerism  objektbuch  data  history  reflection  critique  software  roots  tainter  engineering  nitty-gritty  ui  ux  hci  ios  mobile  apple  amazon  sequential  trends  increase-decrease  measure  analysis  measurement  os  systems  IEEE  intricacy  desktop  benchmarks  rant  carmack  system-design  degrees-of-freedom  keyboard  terminal  editors  links  input-output  networking  world  s:**  multi  twitter  social  discussion  tech  programming  web  internet  speed  backup  worrydream  interface  metal-to-virtual  latency-throughput  workflow  form-design  interface-compatibility 
july 2019 by nhaliday
Which of Haskell and OCaml is more practical? For example, in which aspect will each play a key role? - Quora
- Tikhon Jelvis,

Haskell.

This is a question I'm particularly well-placed to answer because I've spent quite a bit of time with both Haskell and OCaml, seeing both in the real world (including working at Jane Street for a bit). I've also seen the languages in academic settings and know many people at startups using both languages. This gives me a good perspective on both languages, with a fairly similar amount of experience in the two (admittedly biased towards Haskell).

And so, based on my own experience rather than the languages' reputations, I can confidently say it's Haskell.

Parallelism and Concurrency

...

Libraries

...

Typeclasses vs Modules

...

In some sense, OCaml modules are better behaved and founded on a sounder theory than Haskell typeclasses, which have some serious drawbacks. However, the fact that typeclasses can be reliably inferred whereas modules have to be explicitly used all the time more than makes up for this. Moreover, extensions to the typeclass system enable much of the power provided by OCaml modules.

...

Of course, OCaml has some advantages of its own as well. It has a performance profile that's much easier to predict. The module system is awesome and often missed in Haskell. Polymorphic variants can be very useful for neatly representing certain situations, and don't have an obvious Haskell analog.

While both languages have a reasonable C FFI, OCaml's seems a bit simpler. It's hard for me to say this with any certainty because I've only used the OCaml FFI myself, but it was quite easy to use—a hard bar for Haskell's to clear. One really nice use of modules in OCaml is to pass around values directly from C as abstract types, which can help avoid extra marshalling/unmarshalling; that seemed very nice in OCaml.

However, overall, I still think Haskell is the more practical choice. Apart from the reasoning above, I simply have my own observations: my Haskell code tends to be clearer, simpler and shorter than my OCaml code. I'm also more productive in Haskell. Part of this is certainly a matter of having more Haskell experience, but the delta is limited especially as I'm working at my third OCaml company. (Of course, the first two were just internships.)

Both Haskell and OCaml are unequivocally superb options—miles ahead of any other languages I know. While I do prefer Haskell, I'd choose either one in a pinch.

--
I've looked at F# a bit, but it feels like it makes too many tradeoffs to be on .NET. You lose the module system, which is probably OCaml's best feature, in return for an unfortunate, nominally typed OOP layer.

I'm also not invested in .NET at all: if anything, I'd prefer to avoid it in favor of simplicity. I exclusively use Linux and, from the outside, Mono doesn't look as good as it could be. I'm also far more likely to interoperate with a C library than a .NET library.

If I had some additional reason to use .NET, I'd definitely go for F#, but right now I don't.

https://www.reddit.com/r/haskell/comments/3huexy/what_are_haskellers_critiques_of_f_and_ocaml/
https://www.reddit.com/r/haskell/comments/3huexy/what_are_haskellers_critiques_of_f_and_ocaml/cub5mmb/
Thinking about it now, it boils down to a single word: expressiveness. When I'm writing OCaml, I feel more constrained than when I'm writing Haskell. And that's important: unlike so many others, what first attracted me to Haskell was expressiveness, not safety. It's easier for me to write code that looks how I want it to look in Haskell. The upper bound on code quality is higher.

...

Perhaps it all boils down to OCaml and its community feeling more "worse is better" than Haskell, something I highly disfavor.

...

Laziness or, more strictly, non-strictness is big. A controversial start, perhaps, but I stand by it. Unlike some, I do not see non-strictness as a design mistake but as a leap in abstraction. Perhaps a leap before its time, but a leap nonetheless. Haskell lets me program without constantly keeping the code's order in my head. Sure, it's not perfect and sometimes performance issues jar the illusion, but they are the exception not the norm. Coming from imperative languages where order is omnipresent (I can't even imagine not thinking about execution order as I write an imperative program!) it's incredibly liberating, even accounting for the weird issues and jinks I'd never see in a strict language.

This is what I imagine life felt like with the first garbage collectors: they may have been slow and awkward, the abstraction might have leaked here and there, but, for all that, it was an incredible advance. You didn't have to constantly think about memory allocation any more. It took a lot of effort to get where we are now and garbage collectors still aren't perfect and don't fit everywhere, but it's hard to imagine the world without them. Non-strictness feels like it has the same potential, without anywhere near the work garbage collection saw put into it.

...

The other big thing that stands out are typeclasses. OCaml might catch up on this front with implicit modules or it might not (Scala implicits are, by many reports, awkward at best—ask Edward Kmett about it, not me) but, as it stands, not having them is a major shortcoming. Not having inference is a bigger deal than it seems: it makes all sorts of idioms we take for granted in Haskell awkward in OCaml which means that people simply don't use them. Haskell's typeclasses, for all their shortcomings (some of which I find rather annoying), are incredibly expressive.

In Haskell, it's trivial to create your own numeric type and operators work as expected. In OCaml, while you can write code that's polymorphic over numeric types, people simply don't. Why not? Because you'd have to explicitly convert your literals and because you'd have to explicitly open a module with your operators—good luck using multiple numeric types in a single block of code! This means that everyone uses the default types: (63/31-bit) ints and doubles. If that doesn't scream "worse is better", I don't know what does.

...

There's more. Haskell's effect management, brought up elsewhere in this thread, is a big boon. It makes changing things more comfortable and makes informal reasoning much easier. Haskell is the only language where I consistently leave code I visit better than I found it. Even if I hadn't worked on the project in years. My Haskell code has better longevity than my OCaml code, much less other languages.

http://blog.ezyang.com/2011/02/ocaml-gotchas/
One observation about purity and randomness: I think one of the things people frequently find annoying in Haskell is the fact that randomness involves mutation of state, and thus must be wrapped in a monad. This makes building probabilistic data structures a little clunkier, since you can no longer expose pure interfaces. OCaml is not pure, and as such you can query the random number generator whenever you want.

However, I think Haskell may get the last laugh in certain circumstances. In particular, if you are using a random number generator in order to generate random test cases for your code, you need to be able to reproduce a particular set of random tests. Usually, this is done by providing a seed which you can then feed back to the testing script, for deterministic behavior. But because OCaml's random number generator manipulates global state, it's very easy to accidentally break determinism by asking for a random number for something unrelated. You can work around it by manually bracketing the global state, but explicitly handling the randomness state means providing determinism is much more natural.
q-n-a  qra  programming  pls  engineering  nitty-gritty  pragmatic  functional  haskell  ocaml-sml  dotnet  types  arrows  cost-benefit  tradeoffs  concurrency  libraries  performance  expert-experience  composition-decomposition  comparison  critique  multi  reddit  social  discussion  techtariat  reflection  review  random  data-structures  numerics  rand-approx  sublinear  syntax  volo-avolo  causation  scala  jvm  ecosystem  metal-to-virtual 
june 2019 by nhaliday
Regex cheatsheet
Many programs use regular expression to find & replace text. However, they tend to come with their own different flavor.

You can probably expect most modern software and programming languages to be using some variation of the Perl flavor, "PCRE"; however, command-line tools (grep, less, ...) will often use the POSIX flavor (sometimes with an extended variant, e.g. egrep or sed -r). ViM also comes with its own syntax (a superset of what Vi accepts).

This cheatsheet lists the respective syntax of each flavor, and the software that uses it.

accidental complexity galore
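
[ed.: a tiny demonstration of how much flavor matters, via std::regex, which can switch grammars:]

    #include <iostream>
    #include <regex>
    #include <string>

    int main() {
        std::string s = "abbb";
        std::regex ecma("ab+");                              // ECMAScript (default): + is a quantifier
        std::regex bre("ab+", std::regex_constants::basic);  // POSIX BRE: + is a literal character
        std::cout << std::regex_match(s, ecma) << "\n";      // 1
        std::cout << std::regex_match(s, bre) << "\n";       // 0 (only the literal string "ab+" would match)
    }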
techtariat  reference  cheatsheet  documentation  howto  yak-shaving  editors  strings  syntax  examples  crosstab  objektbuch  python  comparison  gotchas  tip-of-tongue  automata-languages  pls  trivia  properties  libraries  nitty-gritty  intricacy  degrees-of-freedom  DSL  programming 
june 2019 by nhaliday
Hardware is unforgiving
Today, anyone with a CS 101 background can take Geoffrey Hinton's course on neural networks and deep learning, and start applying state of the art machine learning techniques in production within a couple months. In software land, you can fix minor bugs in real time. If it takes a whole day to run your regression test suite, you consider yourself lucky because it means you're in one of the few environments that takes testing seriously. If the architecture is fundamentally flawed, you pull out your copy of Feathers' “Working Effectively with Legacy Code” and you apply minor fixes until you're done.

This isn't to say that software isn't hard, it's just a different kind of hard: the sort of hard that can be attacked with genius and perseverance, even without experience. But, if you want to build a ship, and you "only" have a decade of experience with carpentry, milling, metalworking, etc., well, good luck. You're going to need it. With a large ship, “minor” fixes can take days or weeks, and a fundamental flaw means that your ship sinks and you've lost half a year of work and tens of millions of dollars. By the time you get to something with the complexity of a modern high-performance microprocessor, a minor bug discovered in production costs three months and five million dollars. A fundamental flaw in the architecture will cost you five years and hundreds of millions of dollars.[2]

Physical mistakes are costly. There's no undo and editing isn't simply a matter of pressing some keys; changes consume real, physical resources. You need enough wisdom and experience to avoid common mistakes entirely – especially the ones that can't be fixed.
techtariat  comparison  software  hardware  programming  engineering  nitty-gritty  realness  roots  explanans  startups  tech  sv  the-world-is-just-atoms  examples  stories  economics  heavy-industry  hard-tech  cs  IEEE  oceans  trade  korea  asia  recruiting  britain  anglo  expert-experience  growth-econ  world  developing-world  books  recommendations  intricacy  dan-luu  age-generation  system-design  correctness  metal-to-virtual  psycho-atoms  move-fast-(and-break-things)  kumbaya-kult 
june 2019 by nhaliday
Bareiss algorithm - Wikipedia
During the execution of the Bareiss algorithm, every integer that is computed is the determinant of a submatrix of the input matrix. This allows one, using the Hadamard inequality, to bound the size of these integers. Otherwise, the Bareiss algorithm may be viewed as a variant of Gaussian elimination and needs roughly the same number of arithmetic operations.
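
[ed.: a sketch of the fraction-free elimination; no pivoting for zero pivots, and it assumes the intermediates fit in 64 bits, which is what the Hadamard bound lets you check up front:]

    #include <cstdint>
    #include <vector>

    // Returns det(m) for a non-empty square integer matrix using only exact integer
    // divisions: every intermediate entry is itself the determinant of a submatrix.
    int64_t bareiss_det(std::vector<std::vector<int64_t>> m) {
        int n = (int)m.size();
        int64_t prev = 1;                      // pivot from the previous step (1 initially)
        for (int k = 0; k + 1 < n; ++k) {
            for (int i = k + 1; i < n; ++i)
                for (int j = k + 1; j < n; ++j)
                    m[i][j] = (m[i][j] * m[k][k] - m[i][k] * m[k][j]) / prev;  // division is exact
            prev = m[k][k];
        }
        return m[n - 1][n - 1];
    }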
nibble  ground-up  cs  tcs  algorithms  complexity  linear-algebra  numerics  sci-comp  fields  calculation  nitty-gritty 
june 2019 by nhaliday
package writing - Where do I start LaTeX programming? - TeX - LaTeX Stack Exchange
I think there are three categories which need to be mastered (perhaps not all to the same degree) in order to become comfortable around TeX programming:

1. TeX programming. That's very basic, it deals with expansion control, counters, scopes, basic looping constructs and so on.

2. TeX typesetting. That's on a higher level, it includes control over boxes, lines, glues, modes, and perhaps about 1000 parameters.

3. Macro packages like LaTeX.
q-n-a  stackex  programming  latex  howto  nitty-gritty  yak-shaving  links  list  recommendations  books  guide  DSL 
may 2019 by nhaliday
packages - Are the TeX semantics and grammar defined somewhere in some official documents? - TeX - LaTeX Stack Exchange
The grammar of each TeX command is more or less completely given in The TeXBook. Note, however, that unlike most programming languages the lexical analysis and tokenisation of the input cannot be separated from execution as the catcode table which controls tokenisation is dynamically changeable. Thus parsing TeX tends to defeat most parser generation tools.

LaTeX is a set of macros written in TeX so is defined by its implementation, although there is fairly extensive documentation in The LaTeX Companion, the LaTeX book (LaTeX: A Document Preparation System), and elsewhere.
q-n-a  stackex  programming  compilers  latex  yak-shaving  nitty-gritty  syntax 
may 2019 by nhaliday
performance - What is the difference between latency, bandwidth and throughput? - Stack Overflow
Latency is the amount of time it takes to travel through the tube.
Bandwidth is how wide the tube is.
The amount of water flow will be your throughput

Vehicle Analogy:

Container travel time from source to destination is latency.
Container size is bandwidth.
Container load is throughput.

--

Note: bandwidth in particular has other common meanings. I've assumed networking because this is Stack Overflow, but if it were a maths or amateur radio forum I might be talking about something else entirely.
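
[ed.: a quick numeric illustration: on a 100 Mbit/s link (12.5 MB/s of bandwidth) with 50 ms one-way latency, fetching a 1 MB file in a single request costs ~100 ms of round trip plus 80 ms of transmission, ~180 ms total, so achieved throughput is roughly 1 MB / 0.18 s ≈ 5.6 MB/s, well below what the bandwidth alone suggests.]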
q-n-a  stackex  programming  IEEE  nitty-gritty  definition  jargon  network-structure  metrics  speedometer  time  stock-flow  performance  latency-throughput  amortization-potential  thinking 
may 2019 by nhaliday
What every computer scientist should know about floating-point arithmetic
Floating-point arithmetic is considered as esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating point standard, and concludes with examples of how computer system builders can better support floating point.

https://stackoverflow.com/questions/2729637/does-epsilon-really-guarantees-anything-in-floating-point-computations
"you must use an epsilon when dealing with floats" is a knee-jerk reaction of programmers with a superficial understanding of floating-point computations, for comparisons in general (not only to zero).

This is usually unhelpful because it doesn't tell you how to minimize the propagation of rounding errors, it doesn't tell you how to avoid cancellation or absorption problems, and even when your problem is indeed related to the comparison of two floats, it doesn't tell you what value of epsilon is right for what you are doing.

...

Regarding the propagation of rounding errors, there exist specialized analyzers that can help you estimate it, because it is a tedious thing to do by hand.

https://www.di.ens.fr/~cousot/projects/DAEDALUS/synthetic_summary/CEA/Fluctuat/index.html

This was part of HW1 of CS24:
https://en.wikipedia.org/wiki/Kahan_summation_algorithm
In particular, simply summing n numbers in sequence has a worst-case error that grows proportional to n, and a root mean square error that grows as √n for random inputs (the roundoff errors form a random walk).[2] With compensated summation, the worst-case error bound is independent of n, so a large number of values can be summed with an error that only depends on the floating-point precision.[2]
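
[ed.: the compensated loop itself is tiny; a sketch (note that aggressive fast-math compiler flags can legally optimize the compensation away):]

    #include <vector>

    // Kahan (compensated) summation: c carries the low-order bits that would
    // otherwise be lost when adding a small term to a large running sum.
    double kahan_sum(const std::vector<double>& xs) {
        double sum = 0.0, c = 0.0;
        for (double x : xs) {
            double y = x - c;    // apply the correction from the previous step
            double t = sum + y;  // low-order digits of y are lost here...
            c = (t - sum) - y;   // ...and recovered here
            sum = t;
        }
        return sum;
    }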

cf:
https://en.wikipedia.org/wiki/Pairwise_summation
In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence.[1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost—it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers x_n works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its worst-case roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below).[1] In comparison, the naive technique of accumulating the sum in sequence (adding each x_i one at a time for i = 1, ..., n) has roundoff errors that grow at worst as O(εn).[1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations.[1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of O(ε √(log n)) for pairwise summation.[2]

A very similar recursive structure of summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs.[2][3]

https://eng.libretexts.org/Bookshelves/Electrical_Engineering/Book%3A_Fast_Fourier_Transforms_(Burrus)/10%3A_Implementing_FFTs_in_Practice/10.8%3A_Numerical_Accuracy_in_FFTs
However, these encouraging error-growth rates only apply if the trigonometric “twiddle” factors in the FFT algorithm are computed very accurately. Many FFT implementations, including FFTW and common manufacturer-optimized libraries, therefore use precomputed tables of twiddle factors calculated by means of standard library functions (which compute trigonometric constants to roughly machine precision). The other common method to compute twiddle factors is to use a trigonometric recurrence formula—this saves memory (and cache), but almost all recurrences have errors that grow as O(√n), O(n), or even O(n^2), which lead to corresponding errors in the FFT.

...

There are, in fact, trigonometric recurrences with the same logarithmic error growth as the FFT, but these seem more difficult to implement efficiently; they require that a table of Θ(log n) values be stored and updated as the recurrence progresses. Instead, in order to gain at least some of the benefits of a trigonometric recurrence (reduced memory pressure at the expense of more arithmetic), FFTW includes several ways to compute a much smaller twiddle table, from which the desired entries can be computed accurately on the fly using a bounded number (usually < 3) of complex multiplications. For example, instead of a twiddle table with n entries ω_n^k, FFTW can use two tables with Θ(√n) entries each, so that ω_n^k is computed by multiplying an entry in one table (indexed with the low-order bits of k) by an entry in the other table (indexed with the high-order bits of k).

[ed.: Nicholas Higham's "Accuracy and Stability of Numerical Algorithms" seems like a good reference for this kind of analysis.]
nibble  pdf  papers  programming  systems  numerics  nitty-gritty  intricacy  approximation  accuracy  types  sci-comp  multi  q-n-a  stackex  hmm  oly-programming  accretion  formal-methods  yak-shaving  wiki  reference  algorithms  yoga  ground-up  divide-and-conquer  fourier  books  tidbits  chart  caltech  nostalgia 
may 2019 by nhaliday
c++ - Why is the code in most STL implementations so convoluted? - Stack Overflow
Similar questions have been posed previously:

Is there a readable implementation of the STL

Why STL implementation is so unreadable? How C++ could have been improved here?

--

Neil Butterworth, now listed as "anon", provided a useful link in his answer to the SO question "Is there a readable implementation of the STL?". Quoting his answer there:

There is a book The C++ Standard Template Library, co-authored by the original STL designers Stepanov & Lee (together with P.J. Plauger and David Musser), which describes a possible implementation, complete with code - see http://www.amazon.co.uk/C-Standard-Template-Library/dp/0134376331.

See also the other answers in that thread.

Anyway, most of the STL code (by STL I here mean the STL-like subset of the C++ standard library) is template code, and as such must be header-only, and since it's used in almost every program it pays to have that code as short as possible.

Thus, the natural trade-off point between conciseness and readability is much farther over on the conciseness end of the scale than with "normal" code.

--

About the variable names: library implementors must use "crazy" naming conventions, such as names starting with an underscore followed by an uppercase letter, because such names are reserved for them. They cannot use "normal" names, because those may have been redefined by a user macro.

Section 17.6.3.3.2 "Global names" §1 states:

Certain sets of names and function signatures are always reserved to the implementation:

Each name that contains a double underscore or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.

Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

(Note that these rules forbid header guards like __MY_FILE_H which I have seen quite often.)
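
[ed.: a hypothetical illustration of why the convention exists; user macros may not use reserved names, so a header that sticks to them cannot be broken by the preprocessor:]

    // User translation unit: ugly but legal macros, since neither name is reserved.
    #define x  (*obj)
    #define Tp int

    // A header that spelled its template parameter "Tp" or a parameter "x" would now be
    // silently rewritten. One written in the reserved-name style is immune:
    template <class _Tp>
    inline const _Tp& __my_min(const _Tp& __x, const _Tp& __y) {
        return __y < __x ? __y : __x;
    }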

--

Implementations vary. libc++ for example, is much easier on the eyes. There's still a bit of underscore noise though. As others have noted, the leading underscores are unfortunately required. Here's the same function in libc++:
q-n-a  stackex  programming  engineering  best-practices  c(pp)  systems  pls  nitty-gritty  libraries  code-organizing  grokkability  grokkability-clarity 
may 2019 by nhaliday
c++ - Debugging template instantiations - Stack Overflow
Yes, there is a template metaprogramming debugger: Templight.

https://github.com/mikael-s-persson/templight
--
Seems to be dead now, though :( [ed.: Partially true. They've merged pull requests recently tho.]
--
Metashell is still in active development though: github.com/metashell/metashell
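
[ed.: the kind of code these tools are built for is ordinary recursive template instantiation; a toy metaprogram one might step through looks like the sketch below — invocation details for Templight/Metashell are left to their READMEs, since the flags vary.]

#include <cstdio>

// A classic recursive metaprogram: compile-time factorial. A template
// debugger lets you watch the instantiation chain
// Factorial<5> -> Factorial<4> -> ... -> Factorial<0>
// instead of reconstructing it from a wall of compiler diagnostics.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

static_assert(Factorial<5>::value == 120, "instantiation chain works");

int main() { std::printf("%llu\n", Factorial<10>::value); }  // 3628800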
q-n-a  stackex  nitty-gritty  pls  types  c(pp)  debugging  devtools  tools  programming  howto  advice  checklists  multi  repo  wire-guided  static-dynamic  compilers  performance  measurement  time  latency-throughput 
may 2019 by nhaliday
Is backing up a MySQL database in Git a good idea? - Software Engineering Stack Exchange
*no: list of alternatives*

https://stackoverflow.com/questions/115369/do-you-use-source-control-for-your-database-items
Top 2 answers contradict each other but both agree that you should at least version the schema and other scripts.

My impression is that the guy linked in the accepted answer is arguing for a minority practice.
q-n-a  stackex  programming  engineering  dbs  vcs  git  debate  critique  backup  best-practices  flux-stasis  nitty-gritty  gotchas  init  advice  code-organizing  multi  hmm  idk  contrarianism  rhetoric  links  system-design 
may 2019 by nhaliday
oop - Functional programming vs Object Oriented programming - Stack Overflow
When you anticipate a different kind of software evolution:
- Object-oriented languages are good when you have a fixed set of operations on things, and as your code evolves, you primarily add new things. This can be accomplished by adding new classes which implement existing methods, and the existing classes are left alone.
- Functional languages are good when you have a fixed set of things, and as your code evolves, you primarily add new operations on existing things. This can be accomplished by adding new functions which compute with existing data types, and the existing functions are left alone.

When evolution goes the wrong way, you have problems:
- Adding a new operation to an object-oriented program may require editing many class definitions to add a new method.
- Adding a new kind of thing to a functional program may require editing many function definitions to add a new case.

This problem has been well known for many years; in 1998, Phil Wadler dubbed it the "expression problem". Although some researchers think that the expression problem can be addressed with such language features as mixins, a widely accepted solution has yet to hit the mainstream.
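
[ed.: a compressed C++ sketch of the two axes — type and function names are made up for illustration. The class-hierarchy version makes adding a new shape cheap but adding a new operation means touching every class; the sum-type version makes adding an operation cheap but adding a shape means touching every function.]

#include <cstdio>
#include <type_traits>
#include <variant>

// Object-oriented axis: new "things" are easy (add a subclass); a new
// operation (say, perimeter()) means editing Circle, Square, ...
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};
struct Circle : Shape {
    double r;
    explicit Circle(double r_) : r(r_) {}
    double area() const override { return 3.141592653589793 * r * r; }
};
struct Square : Shape {
    double s;
    explicit Square(double s_) : s(s_) {}
    double area() const override { return s * s; }
};

// Functional axis (sum type + case analysis): a new operation is one new
// function; a new "thing" (TriangleF) means revisiting every such function.
struct CircleF { double r; };
struct SquareF { double s; };
using ShapeF = std::variant<CircleF, SquareF>;

double areaF(const ShapeF& sh) {
    return std::visit([](const auto& x) -> double {
        if constexpr (std::is_same_v<std::decay_t<decltype(x)>, CircleF>)
            return 3.141592653589793 * x.r * x.r;
        else
            return x.s * x.s;
    }, sh);
}

int main() {
    Circle c{2.0};
    ShapeF cf = CircleF{2.0};
    std::printf("%.2f %.2f\n", c.area(), areaF(cf));  // same area both ways
}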

What are the typical problem definitions where functional programming is a better choice?

Functional languages excel at manipulating symbolic data in tree form. A favorite example is compilers, where source and intermediate languages change seldom (mostly the same things), but compiler writers are always adding new translations and code improvements or optimizations (new operations on things). Compilation and translation more generally are "killer apps" for functional languages.
q-n-a  stackex  programming  engineering  nitty-gritty  comparison  best-practices  cost-benefit  functional  data-structures  arrows  flux-stasis  atoms  compilers  examples  pls  plt  oop  types 
may 2019 by nhaliday
Burrito: Rethinking the Electronic Lab Notebook
Seems very well-suited for ML experiments (if you can get it to work); also the nilfs aspect is cool and basically implements exactly one of my project ideas (a mini-VCS for competitive programming). Unfortunately, the gnarly installation instructions specify running it in a Linux VM: https://github.com/pgbovine/burrito/blob/master/INSTALL. Linux is a hard requirement due to nilfs.
techtariat  project  tools  devtools  linux  programming  yak-shaving  integration-extension  nitty-gritty  workflow  exocortex  scholar  software  python  app  desktop  notetaking  state  machine-learning  data-science  nibble  sci-comp  oly  vcs  multi  repo  paste  homepage  research 
may 2019 by nhaliday
Why is Software Engineering so difficult? - James Miller
basic message: No silver bullet!

most interesting nuggets:
Scale and Complexity
- Windows 7 > 50 million LOC
Expect a staggering number of bugs.

Bugs?
- Well-written C and C++ code contains some 5 to 10 errors per 100 LOC after a clean compile, but before inspection and testing.
- At a 5% rate any 50 MLOC program will start off with some 2.5 million bugs.

Bug removal
- Testing typically exercises only half the code.

Better bug removal?
- There are better ways to do testing that do produce fantastic programs.
- Are we sure about this fact?
* No, it's only an opinion!
* In general Software Engineering has ...
NO FACTS!

So why not do this?
- The costs are unbelievable.
- It’s not unusual for the qualification process to produce a half page of documentation for each line of code.
pdf  slides  engineering  nitty-gritty  programming  best-practices  roots  comparison  cost-benefit  software  systematic-ad-hoc  structure  error  frontier  debugging  checking  formal-methods  context  detail-architecture  intricacy  big-picture  system-design  correctness  scale  scaling-tech  shipping  money  data  stylized-facts  street-fighting  objektbuch  pro-rata  estimate  pessimism  degrees-of-freedom  volo-avolo  no-go  things  thinking  summary  quality  density  methodology 
may 2019 by nhaliday
unix - How can I profile C++ code running on Linux? - Stack Overflow
If your goal is to use a profiler, use one of the suggested ones.

However, if you're in a hurry and you can manually interrupt your program under the debugger while it's being subjectively slow, there's a simple way to find performance problems.

Just halt it several times, and each time look at the call stack. If there is some code that is wasting some percentage of the time, 20% or 50% or whatever, that is the probability that you will catch it in the act on each sample. So that is roughly the percentage of samples on which you will see it. There is no educated guesswork required. If you do have a guess as to what the problem is, this will prove or disprove it.

You may have multiple performance problems of different sizes. If you clean out any one of them, the remaining ones will take a larger percentage, and be easier to spot, on subsequent passes. This magnification effect, when compounded over multiple problems, can lead to truly massive speedup factors.

Caveat: Programmers tend to be skeptical of this technique unless they've used it themselves. They will say that profilers give you this information, but that is only true if they sample the entire call stack, and then let you examine a random set of samples. (The summaries are where the insight is lost.) Call graphs don't give you the same information, because they don't summarize at the instruction level, and they give confusing summaries in the presence of recursion.
They will also say it only works on toy programs, when actually it works on any program, and it seems to work better on bigger programs, because they tend to have more problems to find. They will say it sometimes finds things that aren't problems, but that is only true if you see something once. If you see a problem on more than one sample, it is real.
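
[ed.: a toy program to try the random-pausing technique on — the hotspot and file name are invented; gdb's `run`, `bt`, and `continue` are standard commands, and the rest of the session is just one plausible setup.]

#include <cstdint>
#include <cstdio>

// Deliberately wasteful "hash" so nearly every sample lands here.
std::uint64_t slow_hash(std::uint64_t x) {
    for (int i = 0; i < 1000; ++i)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x;
}

int main() {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < 5'000'000; ++i)
        acc ^= slow_hash(i);  // ~all wall-clock time is spent in this call
    std::printf("%llu\n", static_cast<unsigned long long>(acc));
}

// One plausible session (assumes g++ and gdb are available):
//   g++ -O0 -g pause_demo.cpp -o pause_demo
//   gdb ./pause_demo
//   (gdb) run        # hit Ctrl-C a few times while it churns
//   (gdb) bt         # slow_hash shows up in nearly every backtrace
//   (gdb) continue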

http://poormansprofiler.org/

gprof, Valgrind and gperftools - an evaluation of some tools for application level CPU profiling on Linux: http://gernotklingler.com/blog/gprof-valgrind-gperftools-evaluation-tools-application-level-cpu-profiling-linux/
gprof is the dinosaur among the evaluated profilers - its roots go back to the 1980’s. It seems it was widely used and a good solution during the past decades. But its limited support for multi-threaded applications, the inability to profile shared libraries and the need for recompilation with compatible compilers and special flags that produce a considerable runtime overhead, make it unsuitable for today’s real-world projects.

Valgrind delivers the most accurate results and is well suited for multi-threaded applications. It’s very easy to use and there is KCachegrind for visualization/analysis of the profiling data, but the slow execution of the application under test disqualifies it for larger, longer running applications.

The gperftools CPU profiler has a very little runtime overhead, provides some nice features like selectively profiling certain areas of interest and has no problem with multi-threaded applications. KCachegrind can be used to analyze the profiling data. Like all sampling based profilers, it suffers statistical inaccuracy and therefore the results are not as accurate as with Valgrind, but practically that’s usually not a big problem (you can always increase the sampling frequency if you need more accurate results). I’m using this profiler on a large code-base and from my personal experience I can definitely recommend using it.
q-n-a  stackex  programming  engineering  performance  devtools  tools  advice  checklists  hacker  nitty-gritty  tricks  lol  multi  unix  linux  techtariat  analysis  comparison  recommendations  software  measurement  oly-programming  concurrency  debugging  metabuch 
may 2019 by nhaliday
Teach debugging
A friend of mine and I couldn't understand why some people were having so much trouble; the material seemed like common sense. The Feynman Method was the only tool we needed.

1. Write down the problem
2. Think real hard
3. Write down the solution

The Feynman Method failed us on the last project: the design of a divider, a real-world-scale project an order of magnitude more complex than anything we'd been asked to tackle before. On the day he assigned the project, the professor exhorted us to begin early. Over the next few weeks, we heard rumors that some of our classmates worked day and night without making progress.

...

And then, just after midnight, a number of our newfound buddies from dinner reported successes. Half of those who started from scratch had working designs. Others were despondent, because their design was still broken in some subtle, non-obvious way. As I talked with one of those students, I began poring over his design. And after a few minutes, I realized that the Feynman method wasn't the only way forward: it should be possible to systematically apply a mechanical technique repeatedly to find the source of our problems. Beneath all the abstractions, our projects consisted purely of NAND gates (woe to those who dug around our toolbox enough to uncover dynamic logic), which outputs a 0 only when both inputs are 1. If the correct output is 0, both inputs should be 1. The input that isn't is in error, an error that is, itself, the output of a NAND gate where at least one input is 0 when it should be 1. We applied this method recursively, finding the source of all the problems in both our designs in under half an hour.
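
[ed.: a sketch of that mechanical trace in code — the circuit representation is invented for illustration, but the recurrence is the one described above: if a gate's output is wrong, follow whichever input disagrees with its expected value, until nothing upstream disagrees and the fault is pinned.]

#include <cstdio>
#include <vector>

// Each gate is a two-input NAND; an input is either a primary circuit input
// or the output of another gate. `broken` marks the gate we miswired (the
// bug whose location we want to recover).
struct Gate {
    int in0, in1;
    bool in0_is_primary, in1_is_primary;
    bool broken;
};

struct Circuit {
    std::vector<bool> inputs;
    std::vector<Gate> gates;

    bool eval(int g) const {                       // what the hardware does
        const Gate& gt = gates[g];
        bool a = gt.in0_is_primary ? inputs[gt.in0] : eval(gt.in0);
        bool b = gt.in1_is_primary ? inputs[gt.in1] : eval(gt.in1);
        bool out = !(a && b);
        return gt.broken ? !out : out;             // a broken gate inverts its output
    }

    bool spec(int g) const {                       // what it should do
        const Gate& gt = gates[g];
        bool a = gt.in0_is_primary ? inputs[gt.in0] : spec(gt.in0);
        bool b = gt.in1_is_primary ? inputs[gt.in1] : spec(gt.in1);
        return !(a && b);
    }

    // The debugging recurrence: if gate g's output is wrong, either one of
    // its (non-primary) inputs is wrong too -- recurse into it -- or both
    // inputs are as expected and g itself is the faulty gate.
    int find_fault(int g) const {
        const Gate& gt = gates[g];
        if (!gt.in0_is_primary && eval(gt.in0) != spec(gt.in0)) return find_fault(gt.in0);
        if (!gt.in1_is_primary && eval(gt.in1) != spec(gt.in1)) return find_fault(gt.in1);
        return g;
    }
};

int main() {
    Circuit c;
    c.inputs = {true, false};
    c.gates = {
        {0, 1, true,  true,  false},  // gate 0 = NAND(in0, in1)
        {0, 0, false, true,  true },  // gate 1 = NAND(gate0, in0), miswired
        {1, 0, false, false, false},  // gate 2 = NAND(gate1, gate0), the observed output
    };
    std::printf("observed=%d expected=%d faulty gate=%d\n",
                int(c.eval(2)), int(c.spec(2)), c.find_fault(2));  // faulty gate=1
}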

How To Debug Any Program: https://www.blinddata.com/blog/how-to-debug-any-program-9
May 8th 2019 by Saketh Are

Start by Questioning Everything

...

When a program is behaving unexpectedly, our attention tends to be drawn first to the most complex portions of the code. However, mistakes can come in all forms. I've personally been guilty of rushing to debug sophisticated portions of my code when the real bug was that I forgot to read in the input file. In the following section, we'll discuss how to reliably focus our attention on the portions of the program that need correction.

Then Question as Little as Possible

Suppose that we have a program and some input on which its behavior doesn’t match our expectations. The goal of debugging is to narrow our focus to as small a section of the program as possible. Once our area of interest is small enough, the value of the incorrect output that is being produced will typically tell us exactly what the bug is.

In order to catch the point at which our program diverges from expected behavior, we must inspect the intermediate state of the program. Suppose that we select some point during execution of the program and print out all values in memory. We can inspect the results manually and decide whether they match our expectations. If they don't, we know for a fact that we can focus on the first half of the program. It either contains a bug, or our expectations of what it should produce were misguided. If the intermediate state does match our expectations, we can focus on the second half of the program. It either contains a bug, or our understanding of what input it expects was incorrect.

Question Things Efficiently

For practical purposes, inspecting intermediate state usually doesn't involve a complete memory dump. We'll typically print a small number of variables and check whether they have the properties we expect of them. Verifying the behavior of a section of code involves:

1. Before it runs, inspecting all values in memory that may influence its behavior.
2. Reasoning about the expected behavior of the code.
3. After it runs, inspecting all values in memory that may be modified by the code.

Reasoning about expected behavior is typically the easiest step to perform even in the case of highly complex programs. Practically speaking, it's time-consuming and mentally strenuous to write debug output into your program and to read and decipher the resulting values. It is therefore advantageous to structure your code into functions and sections that pass a relatively small amount of information between themselves, minimizing the number of values you need to inspect.
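
[ed.: a minimal sketch of that checkpoint discipline, with a hypothetical three-stage pipeline (parse → normalize → summarize are invented names): one check at the stage boundary tells you which half of the program to distrust next.]

#include <cassert>
#include <cstdio>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical pipeline: each stage passes a small amount of state to the
// next, so inspecting that state is cheap and cuts the suspect region in half.
std::vector<int> parse(const std::string& s) {
    std::vector<int> out;
    std::istringstream in(s);
    for (int x; in >> x;) out.push_back(x);
    return out;
}

std::vector<int> normalize(std::vector<int> v) {
    for (int& x : v) x = x < 0 ? 0 : x;   // clamp negatives
    return v;
}

int summarize(const std::vector<int>& v) {
    return (int)std::accumulate(v.begin(), v.end(), 0LL);
}

int run(const std::string& input) {
    std::vector<int> parsed = parse(input);

    // Checkpoint halfway through execution: inspect the state that crosses
    // the stage boundary. If it already violates expectations, the bug (or a
    // wrong expectation about the input) lies in the first half; otherwise
    // it lies in the second half.
    std::fprintf(stderr, "[debug] parsed %zu values\n", parsed.size());
    assert(!parsed.empty() && "expected at least one value after parsing");

    std::vector<int> normed = normalize(parsed);
    assert(normed.size() == parsed.size() && "normalize should preserve count");

    return summarize(normed);
}

int main() {
    std::printf("%d\n", run("3 -1 4 1 -5 9"));  // expect 3+0+4+1+0+9 = 17
}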

...

Finding the Right Question to Ask

We’ve assumed so far that we have available a test case on which our program behaves unexpectedly. Sometimes, getting to that point can be half the battle. There are a few different approaches to finding a test case on which our program fails. It is reasonable to attempt them in the following order:

1. Verify correctness on the sample inputs.
2. Test additional small cases generated by hand.
3. Adversarially construct corner cases by hand.
4. Re-read the problem to verify understanding of input constraints.
5. Design large cases by hand and write a program to construct them.
6. Write a generator to construct large random cases and a brute force oracle to verify outputs.
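
[ed.: a minimal stress-testing harness in the spirit of step 6, with placeholder components: `solve` stands in for your fast solution (here with a deliberately planted bug), `brute` is a trivially correct oracle, and the loop body is the random-case generator.]

#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// Placeholder "fast" solution: maximum subarray sum, but it mishandles the
// all-negative case on purpose so the harness has something to catch.
long long solve(const std::vector<int>& a) {
    long long best = 0, cur = 0;              // bug: best should start at a[0]
    for (int x : a) {
        cur = std::max<long long>(x, cur + x);
        best = std::max(best, cur);
    }
    return best;
}

// Trivially correct brute-force oracle: try every subarray.
long long brute(const std::vector<int>& a) {
    long long best = a[0];
    for (size_t i = 0; i < a.size(); ++i) {
        long long s = 0;
        for (size_t j = i; j < a.size(); ++j) { s += a[j]; best = std::max(best, s); }
    }
    return best;
}

int main() {
    std::mt19937 rng(12345);
    for (int it = 0; it < 10000; ++it) {
        // Generator: small random cases so any failing case is easy to read.
        std::uniform_int_distribution<int> len(1, 6), val(-5, 5);
        std::vector<int> a(len(rng));
        for (int& x : a) x = val(rng);

        long long got = solve(a), want = brute(a);
        if (got != want) {
            std::printf("MISMATCH on case:");
            for (int x : a) std::printf(" %d", x);
            std::printf("\n  solve=%lld brute=%lld\n", got, want);
            return 1;
        }
    }
    std::puts("all random cases agree");
}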
techtariat  dan-luu  engineering  programming  debugging  IEEE  reflection  stories  education  higher-ed  checklists  iteration-recursion  divide-and-conquer  thinking  ground-up  nitty-gritty  giants  feynman  error  input-output  structure  composition-decomposition  abstraction  systematic-ad-hoc  reduction  teaching  state  correctness  multi  oly  oly-programming  metabuch  neurons  problem-solving  wire-guided  marginal  strategy  tactics  methodology  simplification-normalization 
may 2019 by nhaliday

