nhaliday + yak-shaving   258

Learn to speak vim — verbs, nouns, and modifiers! - Yan Pritzker
learn some verbs: v (visual), c (change), d (delete), y (yank/copy). these are the most important; there are others.
learn some modifiers: i (inside), a (around), t (till..finds a character), f (find..like till except including the char), / (search..find a string/regex)
learn some text objects: w (word), s (sentence), p (paragraph), b (block/parentheses), t (tag, works for html/xml). there are others.
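[ed.: a few compositions for flavor (stock vim assumed, no plugins):
- ci" → change inside the surrounding quotes
- dap → delete around paragraph (the paragraph plus its trailing blank line)
- dt, → delete till (but not including) the next comma
- yi( → yank inside the surrounding parentheses
- vf; → visually select up to and including the next semicolon]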
techtariat  howto  objektbuch  top-n  list  editors  composition-decomposition  DSL  lexical  atoms  yak-shaving  language 
august 2019 by nhaliday
selenium - What is the difference between cssSelector & Xpath and which is better with respect to performance for cross browser testing? - Stack Overflow
CSS selectors perform far better than XPath, and it is well documented in the Selenium community. Here are some reasons:
- XPath engines are different in each browser, which makes them inconsistent
- IE does not have a native XPath engine, therefore Selenium injects its own XPath engine for compatibility of its API. Hence we lose the advantage of using native browser features that WebDriver inherently promotes.
- XPath expressions tend to become complex and hence hard to read, in my opinion
However, there are some situations where you need to use XPath, for example searching for a parent element or searching for an element by its text (I wouldn't recommend the latter).
--
I’m going to defend the opinion, unpopular on SO's selenium tag, that XPath is preferable to CSS in the longer run.

This long post has two sections - first I'll put down a back-of-the-napkin proof that the performance difference between the two is 0.1-0.3 milliseconds (yes; that's 100 microseconds), and then I'll share my opinion why XPath is more powerful.

...

With the performance out of the picture, why do I think XPath is better? Simple – versatility and power.

XPath is a language developed for working with XML documents; as such, it allows for much more powerful constructs than CSS.
For example, navigation in every direction in the tree – find an element, then go to its grandparent and search for a child of it having certain properties.
It allows embedded boolean conditions – cond1 and not(cond2 or not(cond3 and cond4)); embedded selectors – "find a div having these children with these attributes, and then navigate according to it".
XPath allows searching based on a node's value (its text) – however frowned upon this practice is, it does come in handy, especially in badly structured documents (no definite attributes to step on, like dynamic ids and classes – locate the element by its text content).

Getting started with CSS is definitely easier – one can start writing selectors in a matter of minutes; but after a couple of days of usage, the power and possibilities of XPath quickly overcome CSS.
And purely subjectively – a complex CSS selector is much harder to read than a complex XPath expression.
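[ed.: to make the versatility point concrete, a minimal sketch in Python with lxml (my choice for illustration; the thread is about Selenium locators, but the XPath semantics are identical):

from lxml import html

doc = html.fromstring("""
<div class="card"><span>Price</span><b>42</b></div>
<div class="card"><span>Qty</span><b>7</b></div>
""")

# XPath can anchor on text, then walk *up* and sideways:
# find the span by its text, go to its parent, grab the sibling <b>.
print(doc.xpath('//span[text()="Price"]/parent::div/b/text()'))  # ['42']

# CSS only walks downward (needs the cssselect package installed):
print([b.text for b in doc.cssselect("div.card b")])  # ['42', '7']

in Selenium itself the same split shows up as By.XPATH vs By.CSS_SELECTOR locators]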
q-n-a  stackex  comparison  best-practices  programming  yak-shaving  python  javascript  web  frontend  performance  DSL  debate  grokkability  trees  grokkability-clarity 
august 2019 by nhaliday
How can lazy importing be implemented in Python? - Quora
The Mercurial revision control system has the most solid lazy import implementation I know of. Note well that it's licensed under the GPL, so you can't simply use that code in a project of your own.
- Bryan O'Sullivan
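[ed.: for flavor, a minimal proxy-object sketch of the idea (not Mercurial's GPL implementation; the stdlib's importlib.util.LazyLoader is a more careful version of the same trick):

import importlib

class LazyModule:
    """Defers the real import until the first attribute access."""
    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):  # only called when normal lookup fails
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

json = LazyModule("json")    # no import cost paid yet
print(json.dumps({"a": 1}))  # the real import happens here

real implementations also have to handle "from x import y" and submodules]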
q-n-a  qra  programming  python  howto  examples  performance  tricks  time  latency-throughput  yak-shaving  expert-experience  hg  build-packaging  oss  property-rights  intellectual-property 
august 2019 by nhaliday
How to make a fast command line tool in Python
An overview of why Python programs tend to be slow to start running, and some techniques Bazaar uses to start quickly, such as lazy imports.
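[ed.: relatedly, CPython 3.7+ can show where startup time goes, which is a quick way to find candidates for lazy importing:
$ python3 -X importtime -c 'import json' 2>&1 | tail -5
prints a per-module self/cumulative microsecond breakdown to stderr]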
techtariat  presentation  howto  objektbuch  tutorial  python  programming  performance  tricks  time  latency-throughput  yak-shaving  build-packaging 
august 2019 by nhaliday
The Compositional Nature of Vim - Ismail Badawi
https://medium.com/@mkozlows/why-atom-cant-replace-vim-433852f4b4d1
1976 was a good year for text editors. At MIT, Richard Stallman and Guy Steele wrote the first version of Emacs. And over at Berkeley, Bill Joy wrote vi (though it wouldn’t be called that for a few years yet).
It’s reductionist to say that these two editors were each built around one big idea, but what the hell, let’s be reductionist. Because what stands out today, looking at modern editors like Sublime Text and Atom, is how Emacs’ big idea has been thoroughly learned — and how vi’s big idea hasn’t.

Emacs and Extensibility
Vi and Composability
techtariat  editors  yak-shaving  productivity  workflow  composition-decomposition  metabuch  howto  syntax  lexical  objektbuch  degrees-of-freedom  flexibility  DSL  multi  integration-extension  org:med  atoms 
august 2019 by nhaliday
Three best practices for building successful data pipelines - O'Reilly Media
Drawn from their experiences and my own, I’ve identified three key areas that are often overlooked in data pipelines, and those are making your analysis:
1. Reproducible
2. Consistent
3. Productionizable

...

Science that cannot be reproduced by an external third party is just not science — and this does apply to data science. One of the benefits of working in data science is the ability to apply the existing tools from software engineering. These tools let you isolate all the dependencies of your analyses and make them reproducible.

Dependencies fall into three categories:
1. Analysis code ...
2. Data sources ...
3. Algorithmic randomness ...
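[ed.: for the third category the fix is usually just pinning seeds up front; a trivial sketch (which generators need seeding depends on your stack):

import random
random.seed(42)  # stdlib PRNG: reruns now replay the same draws
# if numpy is in the stack (an assumption), seed it too:
# import numpy as np; rng = np.random.default_rng(42)

]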

...

Establishing consistency in data
...

There are generally two ways of establishing the consistency of data sources. The first is by checking in all code and data into a single revision control repository. The second method is to reserve source control for code and build a pipeline that explicitly depends on external data being in a stable, consistent format and location.

Checking data into version control is generally considered verboten for production software engineers, but it has a place in data analysis. For one thing, it makes your analysis very portable by isolating all dependencies into source control. Here are some conditions under which it makes sense to have both code and data in source control:
Small data sets ...
Regular analytics ...
Fixed source ...

Productionizability: Developing a common ETL
...

1. Common data format ...
2. Isolating library dependencies ...

https://blog.koresoftware.com/blog/etl-principles
Rigorously enforce the idempotency constraint
For efficiency, seek to load data incrementally
Always ensure that you can efficiently process historic data
Partition ingested data at the destination
Rest data between tasks
Pool resources for efficiency
Store all metadata together in one place
Manage login details in one place
Specify configuration details once
Parameterize sub flows and dynamically run tasks where possible
Execute conditionally
Develop your own workflow framework and reuse workflow components
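[ed.: a minimal sketch of the first two principles combined, an idempotent partition-replacing load; all paths and names here are hypothetical, not the article's code:

from pathlib import Path
import shutil

def load_partition(extracted: Path, warehouse: Path, ds: str) -> Path:
    """Idempotent load: running twice for the same ds yields the same
    state, so a failed run is fixed by simply re-executing it."""
    final = warehouse / f"ds={ds}"        # partition data at the destination
    staging = warehouse / f"_staging_ds={ds}"
    if staging.exists():
        shutil.rmtree(staging)            # clear debris from a failed run
    shutil.copytree(extracted, staging)   # "rest" the data between tasks
    if final.exists():
        shutil.rmtree(final)              # replace the partition, never append
    staging.rename(final)
    return final

]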

more focused on details of specific technologies:
https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7

https://www.cloudera.com/documentation/director/cloud/topics/cloud_de_best_practices.html
techtariat  org:com  best-practices  engineering  code-organizing  machine-learning  data-science  yak-shaving  nitty-gritty  workflow  config  vcs  replication  homo-hetero  multi  org:med  design  system-design  links  shipping  minimalism  volo-avolo  causation  random  invariance  structure  arrows  protocol-metadata  interface-compatibility 
august 2019 by nhaliday
Call graph - Wikipedia
I've found both static and dynamic versions useful (former mostly when I don't want to go thru pain of compiling something)

best options AFAICT:

C/C++ and maybe Go: https://github.com/gperftools/gperftools
https://gperftools.github.io/gperftools/cpuprofile.html

static: https://github.com/Vermeille/clang-callgraph
https://stackoverflow.com/questions/5373714/how-to-generate-a-calling-graph-for-c-code
I had to go through some extra pain to get this to work:
- if you use Homebrew LLVM, it's slightly incompatible w/ macOS c++filt; make sure to pass the -n flag
- similarly, macOS sed needs two extra backslashes for each escape of the angle brackets

another option: doxygen

Go: https://stackoverflow.com/questions/31362332/creating-call-graph-in-golang
both static and dynamic in one tool

Java: https://github.com/gousiosg/java-callgraph
both static and dynamic in one tool

Python:
https://github.com/gak/pycallgraph
more up-to-date forks: https://github.com/daneads/pycallgraph2 and https://github.com/YannLuo/pycallgraph
old docs: https://pycallgraph.readthedocs.io/en/master/

static: https://github.com/davidfraser/pyan

various: https://github.com/jrfonseca/gprof2dot
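[ed.: for the Python dynamic case, one recipe that composes with the above (gprof2dot reads cProfile's pstats output; dot is from graphviz):
$ python3 -m cProfile -o prof.out yourscript.py
$ gprof2dot -f pstats prof.out | dot -Tsvg -o callgraph.svg]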

I believe all the dynamic tools listed here support weighting nodes and edges by CPU time/samples (inclusive and exclusive of descendants) and by discrete call counts. In the case of gperftools and the Java option you probably have to parse the output to get the latter, tho.

IIRC DTrace has probes for function entry/exit, so that's an option as well.
concept  wiki  reference  tools  devtools  graphs  trees  programming  code-dive  let-me-see  big-picture  libraries  software  recommendations  list  top-n  links  c(pp)  golang  python  javascript  jvm  stackex  q-n-a  howto  yak-shaving  visualization  dataviz  performance  structure  oss  osx  unix  linux  static-dynamic 
july 2019 by nhaliday
Laurence Tratt: What Challenges and Trade-Offs do Optimising Compilers Face?
Summary
It's important to be realistic: most people don't care about program performance most of the time. Modern computers are so fast that most programs run fast enough even with very slow language implementations. In that sense, I agree with Daniel's premise: optimising compilers are often unimportant. But “often” is often unsatisfying, as it is here. Users find themselves transitioning from not caring at all about performance to suddenly really caring, often in the space of a single day.

This, to me, is where optimising compilers come into their own: they mean that even fewer people need to care about program performance. And I don't mean that they get us from, say, 98 to 99 people out of 100 not needing to care: it's probably more like going from 80 to 99 people out of 100 not needing to care. This is, I suspect, more significant than it seems: it means that many people can go through an entire career without worrying about performance. Martin Berger reminded me of A N Whitehead’s wonderful line that “civilization advances by extending the number of important operations which we can perform without thinking about them” and this seems a classic example of that at work. Even better, optimising compilers are widely tested and thus generally much more reliable than the equivalent optimisations performed manually.

But I think that those of us who work on optimising compilers need to be honest with ourselves, and with users, about what performance improvement one can expect to see on a typical program. We have a tendency to pick the maximum possible improvement and talk about it as if it's the mean, when there's often a huge difference between the two. There are many good reasons for that gap, and I hope in this blog post I've at least made you think about some of the challenges and trade-offs that optimising compilers are subject to.

[1]
Most readers will be familiar with Knuth’s quip that “premature optimisation is the root of all evil.” However, I doubt that any of us have any real idea what proportion of time is spent in the average part of the average program. In such cases, I tend to assume that Pareto’s principle won't be too far wrong (i.e. that 80% of execution time is spent in 20% of code). In 1971, a study by Knuth and others of Fortran programs found that 50% of execution time was spent in 4% of code. I don't know of modern equivalents of this study, and for them to be truly useful, they'd have to be rather big. If anyone knows of something along these lines, please let me know!
techtariat  programming  compilers  performance  tradeoffs  cost-benefit  engineering  yak-shaving  pareto  plt  c(pp)  rust  golang  trivia  data  objektbuch  street-fighting  estimate  distribution  pro-rata 
july 2019 by nhaliday
python - Executing multi-line statements in the one-line command-line? - Stack Overflow
you could do
> echo -e "import sys\nfor r in range(10): print 'rob'" | python
or w/out pipes:
> python -c "exec(\"import sys\nfor r in range(10): print 'rob'\")"
or
> (echo "import sys" ; echo "for r in range(10): print 'rob'") | python

[ed.: In fish
> python -c "import sys"\n"for r in range(10): print 'rob'"]
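[ed.: these snippets use the Python 2 print statement; the same quoting trick under python3:
> python3 -c "exec(\"import sys\nfor r in range(10): print('rob')\")"]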
q-n-a  stackex  programming  yak-shaving  pls  python  howto  terminal  parsimony  syntax  gotchas 
july 2019 by nhaliday
linux - How do I insert a tab character in Iterm? - Stack Overflow
However, this isn't an iTerm thing; it's your shell that's doing it.

ctrl-V for inserting nonprintable literals doesn't work in fish, neither in vi mode nor emacs mode. prob easiest to just switch to bash.
q-n-a  stackex  terminal  yak-shaving  gotchas  keyboard  tip-of-tongue  strings 
june 2019 by nhaliday
Regex cheatsheet
Many programs use regular expressions to find & replace text. However, they tend to come with their own different flavor.

You can probably expect most modern software and programming languages to be using some variation of the Perl flavor, "PCRE"; however command-line tools (grep, less, ...) will often use the POSIX flavor (sometimes with an extended variant, e.g. egrep or sed -r). Vim also comes with its own syntax (a superset of what Vi accepts).

This cheatsheet lists the respective syntax of each flavor, and the software that uses it.

accidental complexity galore
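[ed.: e.g. "one or more digits" across flavors (GNU grep assumed); all three print 12:
$ echo a12b | grep -o '[0-9]\{1,\}'    # POSIX BRE: interval braces must be escaped
$ echo a12b | grep -Eo '[0-9]+'        # POSIX ERE: + works bare
$ python3 -c "import re; print(re.search(r'\d+', 'a12b').group())"  # Perl-style: \d]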
techtariat  reference  cheatsheet  documentation  howto  yak-shaving  editors  strings  syntax  examples  crosstab  objektbuch  python  comparison  gotchas  tip-of-tongue  automata-languages  pls  trivia  properties  libraries  nitty-gritty  intricacy  degrees-of-freedom  DSL  programming 
june 2019 by nhaliday
An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development
The choice of an efficient document preparation system is an important decision for any academic researcher. To assist the research community, we report a software usability study in which 40 researchers across different disciplines prepared scholarly texts with either Microsoft Word or LaTeX. The probe texts included simple continuous text, text with tables and subheadings, and complex text with several mathematical equations. We show that LaTeX users were slower than Word users, wrote less text in the same amount of time, and produced more typesetting, orthographical, grammatical, and formatting errors. On most measures, expert LaTeX users performed even worse than novice Word users. LaTeX users, however, more often report enjoying using their respective software. We conclude that even experienced LaTeX users may suffer a loss in productivity when LaTeX is used, relative to other document preparation systems. Individuals, institutions, and journals should carefully consider the ramifications of this finding when choosing document preparation strategies, or requiring them of authors.

...

However, our study suggests that LaTeX should be used as a document preparation system only in cases in which a document is heavily loaded with mathematical equations. For all other types of documents, our results suggest that LaTeX reduces the user’s productivity and results in more orthographical, grammatical, and formatting errors, more typos, and less written text than Microsoft Word over the same duration of time. LaTeX users may argue that the overall quality of the text that is created with LaTeX is better than the text that is created with Microsoft Word. Although this argument may be true, the differences between text produced in more recent editions of Microsoft Word and text produced in LaTeX may be less obvious than it was in the past. Moreover, we believe that the appearance of text matters less than the scientific content and impact to the field. In particular, LaTeX is also used frequently for text that does not contain a significant amount of mathematical symbols and formula. We believe that the use of LaTeX under these circumstances is highly problematic and that researchers should reflect on the criteria that drive their preferences to use LaTeX over Microsoft Word for text that does not require significant mathematical representations.

...

A second decision criterion that factors into the choice to use a particular software system is reflection about what drives certain preferences. A striking result of our study is that LaTeX users are highly satisfied with their system despite reduced usability and productivity. From a psychological perspective, this finding may be related to motivational factors, i.e., the driving forces that compel or reinforce individuals to act in a certain way to achieve a desired goal. A vital motivational factor is the tendency to reduce cognitive dissonance. According to the theory of cognitive dissonance, each individual has a motivational drive to seek consonance between their beliefs and their actual actions. If a belief set does not concur with the individual’s actual behavior, then it is usually easier to change the belief rather than the behavior [6]. The results from many psychological studies in which people have been asked to choose between one of two items (e.g., products, objects, gifts, etc.) and then asked to rate the desirability, value, attractiveness, or usefulness of their choice, report that participants often reduce unpleasant feelings of cognitive dissonance by rationalizing the chosen alternative as more desirable than the unchosen alternative [6, 7]. This bias is usually unconscious and becomes stronger as the effort to reject the chosen alternative increases, which is similar in nature to the case of learning and using LaTeX.

...

Given these numbers it remains an open question to determine the amount of taxpayer money that is spent worldwide for researchers to use LaTeX over a more efficient document preparation system, which would free up their time to advance their respective field. Some publishers may save a significant amount of money by requesting or allowing LaTeX submissions because a well-formed LaTeX document complying with a well-designed class file (template) is much easier to bring into their publication workflow. However, this is at the expense of the researchers’ labor time and effort. We therefore suggest that leading scientific journals should consider accepting submissions in LaTeX only if this is justified by the level of mathematics presented in the paper. In all other cases, we think that scholarly journals should request authors to submit their documents in Word or PDF format. We believe that this would be a good policy for two reasons. First, we think that the appearance of the text is secondary to the scientific merit of an article and its impact to the field. And, second, preventing researchers from producing documents in LaTeX would save time and money to maximize the benefit of research and development for both the research team and the public.

[ed.: I sense some salt.

And basically no description of how "# errors" was calculated.]

https://news.ycombinator.com/item?id=8797002
I question the validity of their methodology.
At no point in the paper is exactly what is meant by a "formatting error" or a "typesetting error" defined. From what I gather, the participants in the study were required to reproduce the formatting and layout of the sample text. In theory, a LaTeX file should strictly be a semantic representation of the content of the document; while TeX may have been a raw typesetting language, this is most definitely not the intended use case of LaTeX and is overall a very poor test of its relative advantages and capabilities.
The separation of the semantic definition of the content from the rendering of the document is, in my opinion, the most important feature of LaTeX. Like CSS, this allows the actual formatting to be abstracted away, allowing plain (marked-up) content to be written without worrying about typesetting.
Word has some similar capabilities with styles, and can be used in a similar manner, though few Word users actually use the software properly. This may sound like a relatively insignificant point, but in practice, almost every Word document I have seen has some form of inconsistent formatting. If Word disallowed local formatting changes (including things such as relative spacing of nested bullet points), forcing all formatting changes to be done in document-global styles, it would be a far better typesetting system. Also, the users would be very unhappy.
Yes, LaTeX can undeniably be a pain in the arse, especially when it comes to trying to get figures in the right place; however the combination of a simple, semantic plain-text representation with a flexible and professional typesetting and rendering engine are undeniable and completely unaddressed by this study.
--
It seems that the test was heavily biased in favor of WYSIWYG.
Of course that approach makes it very simple to reproduce something, as has been tested here. Even simpler would be to scan the document and run OCR. The massive problem with both approaches (WYSIWYG and scanning) is that you can't generalize any of it. You're doomed to repeat it forever.
(I'll also note the other significant issue with this study: when the ratings provided by participants came out opposite of their test results, they attributed it to irrational bias.)

https://www.nature.com/articles/d41586-019-01796-1
Over the past few years however, the line between the tools has blurred. In 2017, Microsoft made it possible to use LaTeX’s equation-writing syntax directly in Word, and last year it scrapped Word’s own equation editor. Other text editors also support elements of LaTeX, allowing newcomers to use as much or as little of the language as they like.

https://news.ycombinator.com/item?id=20191348
study  hmm  academia  writing  publishing  yak-shaving  technical-writing  software  tools  comparison  latex  scholar  regularizer  idk  microsoft  evidence-based  science  desktop  time  efficiency  multi  hn  commentary  critique  news  org:sci  flux-stasis  duplication  metrics  biases 
june 2019 by nhaliday
macos - Converting cron to launchd - MAILTO - Stack Overflow
one way to convert to launchd is the lazy way (how i do it)

You pay $0.99 for Lingon from the app store; then you can just fill out a few fields and it makes the launchd...

otherwise: a launchd would look like this
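[ed.: a minimal hand-written job, for reference (label and script path hypothetical); note launchd has no direct MAILTO equivalent, the usual workaround being StandardOutPath/StandardErrorPath plus your own mail step:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>            <string>com.example.nightly</string>
    <key>ProgramArguments</key> <array><string>/usr/local/bin/nightly.sh</string></array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>   <integer>3</integer>   <!-- cron: 0 3 * * * -->
        <key>Minute</key> <integer>0</integer>
    </dict>
</dict>
</plist>

]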
q-n-a  stackex  howto  yak-shaving  osx  desktop  automation 
june 2019 by nhaliday
bibliographies - bibtex vs. biber and biblatex vs. natbib - TeX - LaTeX Stack Exchange
- bibtex and biber are external programs that process bibliography information and act (roughly) as the interface between your .bib file and your LaTeX document.
- natbib and biblatex are LaTeX packages that format citations and bibliographies; natbib works only with bibtex, while biblatex (at the moment) works with both bibtex and biber.

natbib
The natbib package has been around for quite a long time, and although still maintained, it is fair to say that it isn't being further developed. It is still widely used, and very reliable.

Advantages
...
- The resulting bibliography code can be pasted directly into a document (often required for journal submissions). See Biblatex: submitting to a journal.

...

biblatex
The biblatex package is being actively developed in conjunction with the biber backend.

Advantages
*lots*

Disadvantages
- Journals and publishers may not accept documents that use biblatex if they have a house style with its own natbib compatible .bst file.
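[ed.: the difference in practice, as minimal sketches:

% natbib + bibtex:
\usepackage{natbib}           % preamble
\bibliographystyle{plainnat}  % body; formatting lives in a .bst file
\bibliography{refs}           % body; run: pdflatex, bibtex, pdflatex, pdflatex

% biblatex + biber:
\usepackage[backend=biber,style=authoryear]{biblatex}  % preamble
\addbibresource{refs.bib}                              % preamble
\printbibliography            % body; run: pdflatex, biber, pdflatex

]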
q-n-a  stackex  latex  comparison  cost-benefit  writing  scholar  technical-writing  yak-shaving  publishing 
may 2019 by nhaliday
c++ - How to check if LLDB loaded debug symbols from shared libraries? - Stack Overflow
Now this question is also answered in the official LLDB documentation in the "Troubleshooting LLDB" section, please see "How do I check if I have debug symbols?": https://lldb.llvm.org/use/troubleshooting.html#how-do-i-check-if-i-have-debug-symbols It gives a slightly different approach, even though the approach from the accepted answer worked quite fine for me. – dying_sphynx Nov 3 '18 at 10:58

One fairly simple way to do it is:
(lldb) image lookup -vn <SomeFunctionNameThatShouldHaveDebugInfo>
q-n-a  stackex  programming  yak-shaving  gotchas  howto  debugging  build-packaging  llvm  multi  documentation 
may 2019 by nhaliday
package writing - Where do I start LaTeX programming? - TeX - LaTeX Stack Exchange
I think there are three categories which need to be mastered (perhaps not all in the same degree) in order to become comfortable around TeX programming:

1. TeX programming. That's very basic, it deals with expansion control, counters, scopes, basic looping constructs and so on.

2. TeX typesetting. That's on a higher level, it includes control over boxes, lines, glues, modes, and perhaps about 1000 parameters.

3. Macro packages like LaTeX.
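[ed.: a taste of category 1, plain TeX that also runs inside LaTeX:

\newcount\cnt
\cnt=0
\loop
  \advance\cnt by 1
  \the\cnt\space
\ifnum\cnt<5 \repeat
% typesets: 1 2 3 4 5

]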
q-n-a  stackex  programming  latex  howto  nitty-gritty  yak-shaving  links  list  recommendations  books  guide  DSL 
may 2019 by nhaliday
packages - Are the TeX semantics and grammar defined somewhere in some official documents? - TeX - LaTeX Stack Exchange
The grammar of each TeX command is more or less completely given in The TeXBook. Note, however, that unlike most programming languages the lexical analysis and tokenisation of the input cannot be separated from execution as the catcode table which controls tokenisation is dynamically changeable. Thus parsing TeX tends to defeat most parser generation tools.

LaTeX is a set of macros written in TeX so is defined by its implementation, although there is fairly extensive documentation in The LaTeX Companion, the LaTeX book (LaTeX: A Document Preparation System), and elsewhere.
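[ed.: a two-line illustration of the catcode point:
\catcode`\@=11  % "@" becomes a letter, so \some@name is now a single control word
\catcode`\|=0   % "|" becomes an escape character, so |bye now behaves as \bye
since either line can appear anywhere, the tokenizer can never run ahead of execution]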
q-n-a  stackex  programming  compilers  latex  yak-shaving  nitty-gritty  syntax 
may 2019 by nhaliday
documentation - Materials for learning TikZ - TeX - LaTeX Stack Exchange
The way I learned all three was basically demand-driven --- "learning by doing". Whenever I needed something "new", I'd dig into the manual and try stuff until either it worked (not always most elegantly), or in desperation go to the examples website, or moan here on TeX-'n-Friends. Occasionally supplemented by trying to answer "challenging" questions here.

yeah I kinda figured that was the right approach. just not worth the time to be proactive.
q-n-a  stackex  latex  list  links  tutorial  guide  learning  yak-shaving  recommendations  programming  visuo  dataviz  prioritizing  technical-writing 
may 2019 by nhaliday
What every computer scientist should know about floating-point arithmetic
Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating point standard, and concludes with examples of how computer system builders can better support floating point.

https://stackoverflow.com/questions/2729637/does-epsilon-really-guarantees-anything-in-floating-point-computations
"you must use an epsilon when dealing with floats" is a knee-jerk reaction of programmers with a superficial understanding of floating-point computations, for comparisons in general (not only to zero).

This is usually unhelpful because it doesn't tell you how to minimize the propagation of rounding errors, it doesn't tell you how to avoid cancellation or absorption problems, and even when your problem is indeed related to the comparison of two floats, it doesn't tell you what value of epsilon is right for what you are doing.

...

Regarding the propagation of rounding errors, there exist specialized analyzers that can help you estimate it, because it is a tedious thing to do by hand.

https://www.di.ens.fr/~cousot/projects/DAEDALUS/synthetic_summary/CEA/Fluctuat/index.html

This was part of HW1 of CS24:
https://en.wikipedia.org/wiki/Kahan_summation_algorithm
In particular, simply summing n numbers in sequence has a worst-case error that grows proportional to n, and a root mean square error that grows as √n for random inputs (the roundoff errors form a random walk).[2] With compensated summation, the worst-case error bound is independent of n, so a large number of values can be summed with an error that only depends on the floating-point precision.[2]
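[ed.: a minimal sketch of the compensated loop in Python, with math.fsum as the correctly-rounded reference:

import math

def kahan_sum(xs):
    total, c = 0.0, 0.0
    for x in xs:
        y = x - c            # apply the correction carried from the last step
        t = total + y        # low-order bits of y are lost in this add...
        c = (t - total) - y  # ...and recovered here for the next iteration
        total = t
    return total

xs = [0.1] * 10**6
print(abs(sum(xs) - math.fsum(xs)))        # naive drift, grows with n (~1e-6 here)
print(abs(kahan_sum(xs) - math.fsum(xs)))  # essentially zero, independent of n

]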

cf:
https://en.wikipedia.org/wiki/Pairwise_summation
In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence.[1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost—it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers x_n works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its worst-case roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below).[1] In comparison, the naive technique of accumulating the sum in sequence (adding each x_i one at a time for i = 1, ..., n) has roundoff errors that grow at worst as O(εn).[1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations.[1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of O(ε √log n) for pairwise summation.[2]
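[ed.: and the pairwise scheme is equally small:

def pairwise_sum(xs):
    n = len(xs)
    if n <= 8:          # small blocks are summed naively, as real
        return sum(xs)  # implementations do for speed
    mid = n // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

]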

A very similar recursive structure of summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs.[2][3]

https://eng.libretexts.org/Bookshelves/Electrical_Engineering/Book%3A_Fast_Fourier_Transforms_(Burrus)/10%3A_Implementing_FFTs_in_Practice/10.8%3A_Numerical_Accuracy_in_FFTs
However, these encouraging error-growth rates only apply if the trigonometric “twiddle” factors in the FFT algorithm are computed very accurately. Many FFT implementations, including FFTW and common manufacturer-optimized libraries, therefore use precomputed tables of twiddle factors calculated by means of standard library functions (which compute trigonometric constants to roughly machine precision). The other common method to compute twiddle factors is to use a trigonometric recurrence formula—this saves memory (and cache), but almost all recurrences have errors that grow as O(√n), O(n), or even O(n²), which lead to corresponding errors in the FFT.

...

There are, in fact, trigonometric recurrences with the same logarithmic error growth as the FFT, but these seem more difficult to implement efficiently; they require that a table of Θ(log n) values be stored and updated as the recurrence progresses. Instead, in order to gain at least some of the benefits of a trigonometric recurrence (reduced memory pressure at the expense of more arithmetic), FFTW includes several ways to compute a much smaller twiddle table, from which the desired entries can be computed accurately on the fly using a bounded number (usually < 3) of complex multiplications. For example, instead of a twiddle table with n entries ω_n^k, FFTW can use two tables with Θ(√n) entries each, so that ω_n^k is computed by multiplying an entry in one table (indexed with the low-order bits of k) by an entry in the other table (indexed with the high-order bits of k).

[ed.: Nicholas Higham's "Accuracy and Stability of Numerical Algorithms" seems like a good reference for this kind of analysis.]
nibble  pdf  papers  programming  systems  numerics  nitty-gritty  intricacy  approximation  accuracy  types  sci-comp  multi  q-n-a  stackex  hmm  oly-programming  accretion  formal-methods  yak-shaving  wiki  reference  algorithms  yoga  ground-up  divide-and-conquer  fourier  books  tidbits  chart  caltech  nostalgia 
may 2019 by nhaliday