nhaliday + syntax   49

The Compositional Nature of Vim - Ismail Badawi
1976 was a good year for text editors. At MIT, Richard Stallman and Guy Steele wrote the first version of Emacs. And over at Berkeley, Bill Joy wrote vi (though it wouldn’t be called that for a few years yet).
It’s reductionist to say that these two editors were each built around one big idea, but what the hell, let’s be reductionist. Because what stands out today, looking at modern editors like Sublime Text and Atom, is how Emacs’ big idea has been thoroughly learned — and how vi’s big idea hasn’t.

Emacs and Extensibility
Vi and Composability
techtariat  editors  yak-shaving  productivity  workflow  composition-decomposition  metabuch  howto  syntax  lexical  objektbuch  degrees-of-freedom  flexibility  DSL  multi  integration-extension  org:med  atoms 
11 weeks ago by nhaliday
Panel: Systems Programming in 2014 and Beyond | Lang.NEXT 2014 | Channel 9
- Bjarne Stroustrup, Niko Matsakis, Andrei Alexandrescu, Rob Pike
- from 2014, so pretty outdated, but rare to find a discussion with people like this together
- pretty sure Jonathan Blow asked a couple questions
- Rob Pike compliments Rust at one point. Also kinda softly rags on dynamic typing at one point ("unit testing is what they have instead of static types").
video  presentation  debate  programming  pls  c(pp)  systems  os  rust  d-lang  golang  computer-memory  legacy  devtools  formal-methods  concurrency  compilers  syntax  parsimony  google  intricacy  thinking  cost-benefit  degrees-of-freedom  facebook  performance  people  rsc  cracker-prog  critique  types  checking  api  flux-stasis  engineering  time  wire-guided  worse-is-better/the-right-thing  static-dynamic  latency-throughput 
july 2019 by nhaliday
python - Executing multi-line statements in the one-line command-line? - Stack Overflow
you could do
> echo -e "import sys\nfor r in range(10): print('rob')" | python
or w/out pipes:
> python -c "exec(\"import sys\nfor r in range(10): print('rob')\")"
> (echo "import sys" ; echo "for r in range(10): print('rob')") | python

[ed.: In fish
> python -c "import sys"\n"for r in range(10): print('rob')"]
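The exec() version works because the embedded \n characters turn the argument into a complete multi-line program. A minimal sketch of the same trick from inside Python (print() assumed, i.e. Python 3):

```python
import io
import contextlib

# The one-liner trick above, in miniature: a compound statement
# (the for loop) runs from a single string because exec() accepts
# a whole program with embedded newlines.
program = "import sys\nfor r in range(3): print('rob')"

# Capture stdout just to show what the program emits.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(program)
output = buf.getvalue()  # "rob\nrob\nrob\n"
```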
q-n-a  stackex  programming  yak-shaving  pls  python  howto  terminal  parsimony  syntax  gotchas 
july 2019 by nhaliday
Why is Google Translate so bad for Latin? A longish answer. : latin
> All it does is correlate sequences of up to five consecutive words in texts that have been manually translated into two or more languages.
That sort of system ought to be perfect for a dead language, though. Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.

We're not exactly inundated with brand new Latin to translate.
> Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.
What makes you think that the Google folks haven't done so and used that to create the language models they use?
> That sort of system ought to be perfect for a dead language, though.
Perhaps. But it will be bad at translating novel English sentences to Latin.
foreign-lang  reddit  social  discussion  language  the-classics  literature  dataset  measurement  roots  traces  syntax  anglo  nlp  stackex  links  q-n-a  linguistics  lexical  deep-learning  sequential  hmm  project  arrows  generalization  state-of-art  apollonian-dionysian  machine-learning  google 
june 2019 by nhaliday
Which of Haskell and OCaml is more practical? For example, in which aspect will each play a key role? - Quora
- Tikhon Jelvis


This is a question I'm particularly well-placed to answer because I've spent quite a bit of time with both Haskell and OCaml, seeing both in the real world (including working at Jane Street for a bit). I've also seen the languages in academic settings and know many people at startups using both languages. This gives me a good perspective on both languages, with a fairly similar amount of experience in the two (admittedly biased towards Haskell).

And so, based on my own experience rather than the languages' reputations, I can confidently say it's Haskell.

Parallelism and Concurrency




Typeclasses vs Modules


In some sense, OCaml modules are better behaved and founded on a sounder theory than Haskell typeclasses, which have some serious drawbacks. However, the fact that typeclasses can be reliably inferred whereas modules have to be explicitly used all the time more than makes up for this. Moreover, extensions to the typeclass system enable much of the power provided by OCaml modules.


Of course, OCaml has some advantages of its own as well. It has a performance profile that's much easier to predict. The module system is awesome and often missed in Haskell. Polymorphic variants can be very useful for neatly representing certain situations, and don't have an obvious Haskell analog.

While both languages have a reasonable C FFI, OCaml's seems a bit simpler. It's hard for me to say this with any certainty because I've only used the OCaml FFI myself, but it was quite easy to use—a hard bar for Haskell's to clear. One really nice use of modules in OCaml is to pass around values directly from C as abstract types, which can help avoid extra marshalling/unmarshalling; that seemed very nice in OCaml.

However, overall, I still think Haskell is the more practical choice. Apart from the reasoning above, I simply have my own observations: my Haskell code tends to be clearer, simpler and shorter than my OCaml code. I'm also more productive in Haskell. Part of this is certainly a matter of having more Haskell experience, but the delta is limited especially as I'm working at my third OCaml company. (Of course, the first two were just internships.)

Both Haskell and OCaml are unequivocally superb options—miles ahead of any other languages I know. While I do prefer Haskell, I'd choose either one in a pinch.

I've looked at F# a bit, but it feels like it makes too many tradeoffs to be on .NET. You lose the module system, which is probably OCaml's best feature, in return for an unfortunate, nominally typed OOP layer.

I'm also not invested in .NET at all: if anything, I'd prefer to avoid it in favor of simplicity. I exclusively use Linux and, from the outside, Mono doesn't look as good as it could be. I'm also far more likely to interoperate with a C library than a .NET library.

If I had some additional reason to use .NET, I'd definitely go for F#, but right now I don't.

Thinking about it now, it boils down to a single word: expressiveness. When I'm writing OCaml, I feel more constrained than when I'm writing Haskell. And that's important: unlike so many others, what first attracted me to Haskell was expressiveness, not safety. It's easier for me to write code that looks how I want it to look in Haskell. The upper bound on code quality is higher.


Perhaps it all boils down to OCaml and its community feeling more "worse is better" than Haskell, something I highly disfavor.


Laziness or, more strictly, non-strictness is big. A controversial start, perhaps, but I stand by it. Unlike some, I do not see non-strictness as a design mistake but as a leap in abstraction. Perhaps a leap before its time, but a leap nonetheless. Haskell lets me program without constantly keeping the code's order in my head. Sure, it's not perfect and sometimes performance issues jar the illusion, but they are the exception, not the norm. Coming from imperative languages where order is omnipresent (I can't even imagine not thinking about execution order as I write an imperative program!) it's incredibly liberating, even accounting for the weird issues and kinks I'd never see in a strict language.

This is what I imagine life felt like with the first garbage collectors: they may have been slow and awkward, the abstraction might have leaked here and there, but, for all that, it was an incredible advance. You didn't have to constantly think about memory allocation any more. It took a lot of effort to get where we are now and garbage collectors still aren't perfect and don't fit everywhere, but it's hard to imagine the world without them. Non-strictness feels like it has the same potential, without anywhere near the work garbage collection saw put into it.
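As a rough analogy (mine, not the answer's), Python generators give a small taste of what demand-driven evaluation buys you: a value is only computed when something actually consumes it, so an "infinite" definition is perfectly usable.

```python
import itertools

# An infinite sequence that is only evaluated as far as it is
# demanded -- definition order need not match evaluation order.
def naturals():
    n = 0
    while True:
        yield n
        n += 1

squares = (n * n for n in naturals())            # nothing computed yet
first_five = list(itertools.islice(squares, 5))  # forces exactly 5 elements
```

This is only an analogy: Haskell's non-strictness is pervasive and default, while Python generators must be requested explicitly at each use site.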


The other big thing that stands out is typeclasses. OCaml might catch up on this front with implicit modules or it might not (Scala implicits are, by many reports, awkward at best—ask Edward Kmett about it, not me) but, as it stands, not having them is a major shortcoming. Not having inference is a bigger deal than it seems: it makes all sorts of idioms we take for granted in Haskell awkward in OCaml, which means that people simply don't use them. Haskell's typeclasses, for all their shortcomings (some of which I find rather annoying), are incredibly expressive.

In Haskell, it's trivial to create your own numeric type and operators work as expected. In OCaml, while you can write code that's polymorphic over numeric types, people simply don't. Why not? Because you'd have to explicitly convert your literals and because you'd have to explicitly open a module with your operators—good luck using multiple numeric types in a single block of code! This means that everyone uses the default types: (63/31-bit) ints and doubles. If that doesn't scream "worse is better", I don't know what does.


There's more. Haskell's effect management, brought up elsewhere in this thread, is a big boon. It makes changing things more comfortable and makes informal reasoning much easier. Haskell is the only language where I consistently leave code I visit better than I found it. Even if I hadn't worked on the project in years. My Haskell code has better longevity than my OCaml code, much less other languages.

One observation about purity and randomness: I think one of the things people frequently find annoying in Haskell is the fact that randomness involves mutation of state, and thus must be wrapped in a monad. This makes building probabilistic data structures a little clunkier, since you can no longer expose pure interfaces. OCaml is not pure, and as such you can query the random number generator whenever you want.

However, I think Haskell may get the last laugh in certain circumstances. In particular, if you are using a random number generator in order to generate random test cases for your code, you need to be able to reproduce a particular set of random tests. Usually, this is done by providing a seed which you can then feed back to the testing script, for deterministic behavior. But because OCaml's random number generator manipulates global state, it's very easy to accidentally break determinism by asking for a random number for something unrelated. You can work around it by manually bracketing the global state, but in Haskell, where the randomness state is handled explicitly, providing determinism is much more natural.
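The failure mode can be sketched in Python, which (like OCaml) has both a global RNG and explicit generator objects; this is my illustration, not from the answer:

```python
import random

# Global-state RNG: an unrelated draw in the middle silently shifts
# the "reproducible" sequence -- the determinism bug described above.
random.seed(42)
a1 = random.random()
random.random()        # unrelated consumer of the global stream
a2 = random.random()

random.seed(42)
b1 = random.random()   # matches a1
b2 = random.random()   # does NOT match a2: determinism broken

# Explicitly threaded state: each test run owns its generator, so
# unrelated code cannot perturb the sequence.
rng1 = random.Random(42)
run1 = [rng1.random() for _ in range(2)]
rng2 = random.Random(42)
run2 = [rng2.random() for _ in range(2)]  # identical to run1
```

`random.Random` instances play the role of explicitly passed RNG state; Haskell's purity simply forces that discipline everywhere.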
q-n-a  qra  programming  pls  engineering  nitty-gritty  pragmatic  functional  haskell  ocaml-sml  dotnet  types  arrows  cost-benefit  tradeoffs  concurrency  libraries  performance  expert-experience  composition-decomposition  comparison  critique  multi  reddit  social  discussion  techtariat  reflection  review  random  data-structures  numerics  rand-approx  sublinear  syntax  volo-avolo  causation  scala  jvm  ecosystem  metal-to-virtual 
june 2019 by nhaliday
Regex cheatsheet
Many programs use regular expressions to find & replace text. However, they tend to come with their own different flavor.

You can probably expect most modern software and programming languages to be using some variation of the Perl flavor, "PCRE"; however, command-line tools (grep, less, ...) will often use the POSIX flavor (sometimes with an extended variant, e.g. egrep or sed -r). Vim also comes with its own syntax (a superset of what Vi accepts).

This cheatsheet lists the respective syntax of each flavor, and the software that uses it.

accidental complexity galore
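A small illustration of the flavor gap, assuming Python's `re` module as a stand-in for the Perl family: `\d` and non-greedy quantifiers work there, whereas POSIX BRE (plain grep) needs `[0-9]` and has no non-greedy matching at all.

```python
import re

# Python's re module follows the Perl-style flavor: \d is a digit
# class, and .*? is a non-greedy quantifier. In POSIX BRE you would
# write [0-9]\{1,\} instead, and non-greedy matching simply does not
# exist -- exactly the kind of difference the cheatsheet tabulates.
text = "item 12, item 345"
nums = re.findall(r"\d+", text)                 # ['12', '345']
first = re.search(r"item .*?,", text).group()   # stops at first comma
```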
techtariat  reference  cheatsheet  documentation  howto  yak-shaving  editors  strings  syntax  examples  crosstab  objektbuch  python  comparison  gotchas  tip-of-tongue  automata-languages  pls  trivia  properties  libraries  nitty-gritty  intricacy  degrees-of-freedom  DSL  programming 
june 2019 by nhaliday
packages - Are the TeX semantics and grammar defined somewhere in some official documents? - TeX - LaTeX Stack Exchange
The grammar of each TeX command is more or less completely given in The TeXBook. Note, however, that unlike most programming languages the lexical analysis and tokenisation of the input cannot be separated from execution as the catcode table which controls tokenisation is dynamically changeable. Thus parsing TeX tends to defeat most parser generation tools.

LaTeX is a set of macros written in TeX so is defined by its implementation, although there is fairly extensive documentation in The LaTeX Companion, the LaTeX book (LaTeX: A Document Preparation System), and elsewhere.
q-n-a  stackex  programming  compilers  latex  yak-shaving  nitty-gritty  syntax 
may 2019 by nhaliday
What makes Java easier to parse than C? - Stack Overflow
Parsing C++ is getting hard. Parsing Java is getting to be just as hard

cf the Linked questions too, lotsa good stuff
q-n-a  stackex  compilers  pls  plt  jvm  c(pp)  intricacy  syntax  automata-languages  cost-benefit  incentives  legacy 
may 2019 by nhaliday
Perseus Digital Library
This is actually really useful.

- Load English translation side-by-side if available.
- Click on any word and see the best guess for definition+inflection given context.
tools  reference  history  iron-age  mediterranean  the-classics  canon  foreign-lang  linguistics  database  quixotic  stoic  syntax  lexical  exocortex  aggregator  search 
february 2019 by nhaliday
Roman naming conventions - Wikipedia
The distinguishing feature of Roman nomenclature was the use of both personal names and regular surnames. Throughout Europe and the Mediterranean, other ancient civilizations distinguished individuals through the use of single personal names, usually dithematic in nature. Consisting of two distinct elements, or "themes", these names allowed for hundreds or even thousands of possible combinations. But a markedly different system of nomenclature arose in Italy, where the personal name was joined by a hereditary surname. Over time, this binomial system expanded to include additional names and designations.[1][2]

In ancient Rome, a gens (/ˈɡɛns/ or /ˈdʒɛnz/), plural gentes, was a family consisting of all those individuals who shared the same nomen and claimed descent from a common ancestor. A branch of a gens was called a stirps (plural stirpes). The gens was an important social structure at Rome and throughout Italy during the period of the Roman Republic. Much of an individual's social standing depended on the gens to which he belonged. Certain gentes were considered patrician, others plebeian, while some had both patrician and plebeian branches. The importance of membership in a gens declined considerably in imperial times.[1][2]


The word gens is sometimes translated as "race" or "nation", meaning a people descended from a common ancestor (rather than sharing a common physical trait). It can also be translated as "clan" or "tribe", although the word tribus has a separate and distinct meaning in Roman culture. A gens could be as small as a single family, or could include hundreds of individuals. According to tradition, in 479 BC the gens Fabia alone were able to field a militia consisting of three hundred and six men of fighting age. The concept of the gens was not uniquely Roman, but was shared with communities throughout Italy, including those who spoke Italic languages such as Latin, Oscan, and Umbrian as well as the Etruscans. All of these peoples were eventually absorbed into the sphere of Roman culture.[1][2][3][4]


Persons could be adopted into a gens and acquire its nomen. A libertus, or "freedman", usually assumed the nomen (and sometimes also the praenomen) of the person who had manumitted him, and a naturalized citizen usually took the name of the patron who granted his citizenship. Freedmen and newly enfranchised citizens were not technically part of the gentes whose names they shared, but within a few generations it often became impossible to distinguish their descendants from the original members. In practice this meant that a gens could acquire new members and even new branches, either by design or by accident.[1][2][7]

Ancient Greek personal names: https://en.wikipedia.org/wiki/Ancient_Greek_personal_names
Ancient Greeks usually had one name, but another element was often added in semi-official contexts or to aid identification: a father’s name (patronym) in the genitive case, or in some regions as an adjectival formulation. A third element might be added, indicating the individual’s membership in a particular kinship or other grouping, or city of origin (when the person in question was away from that city). Thus the orator Demosthenes, while proposing decrees in the Athenian assembly, was known as "Demosthenes, son of Demosthenes of Paiania"; Paiania was the deme or regional sub-unit of Attica to which he belonged by birth. If Americans used that system, Abraham Lincoln would have been called "Abraham, son of Thomas of Kentucky" (where he was born). In some rare occasions, if a person was illegitimate or fathered by a non-citizen, they might use their mother's name (metronym) instead of their father's. Ten days after a birth, relatives on both sides were invited to a sacrifice and feast called dekátē (δεκάτη), 'tenth day'; on this occasion the father formally named the child.[3]


In many contexts, etiquette required that respectable women be spoken of as the wife or daughter of X rather than by their own names.[6] On gravestones or dedications, however, they had to be identified by name. Here, the patronymic formula "son of X" used for men might be replaced by "wife of X", or supplemented as "daughter of X, wife of Y".

Many women bore forms of standard masculine names, with a feminine ending substituted for the masculine. Many standard names related to specific masculine achievements had a common feminine equivalent; the counterpart of Nikomachos, "victorious in battle", would be Nikomachē. The taste mentioned above for giving family members related names was one motive for the creation of such feminine forms. There were also feminine names with no masculine equivalent, such as Glykera "sweet one"; Hedistē "most delightful".
wiki  history  iron-age  mediterranean  the-classics  conquest-empire  culture  language  foreign-lang  social-norms  kinship  class  legacy  democracy  status  multi  gender  syntax  protocol-metadata 
august 2018 by nhaliday
parsing - lexers vs parsers - Stack Overflow
Yes, they are very different in theory, and in implementation.

Lexers are used to recognize "words" that make up language elements, because the structure of such words is generally simple. Regular expressions are extremely good at handling this simpler structure, and there are very high-performance regular-expression matching engines used to implement lexers.

Parsers are used to recognize the "structure" of a language's phrases. Such structure is generally far beyond what "regular expressions" can recognize, so one needs "context sensitive" parsers to extract such structure. Context-sensitive parsers are hard to build, so the engineering compromise is to use "context-free" grammars and add hacks to the parsers ("symbol tables", etc.) to handle the context-sensitive part.

Neither lexing nor parsing technology is likely to go away soon.

They may be unified by deciding to use "parsing" technology to recognize "words", as is currently explored by so-called scannerless GLR parsers. That has a runtime cost, as you are applying more general machinery to what is often a problem that doesn't need it, and usually you pay for that in overhead. Where you have lots of free cycles, that overhead may not matter. If you process a lot of text, then the overhead does matter and classical regular expression parsers will continue to be used.
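The division of labor can be shown in miniature (a toy of my own, not from the answer): a regex-based lexer chops "words" out of the input, while recognizing nested structure such as balanced parentheses is left to a recursive parser, since a regular expression alone cannot count nesting depth.

```python
import re

# Lexer: regular expressions recognize the flat "words".
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")

def lex(src):
    tokens = []
    for num, op in TOKEN.findall(src):
        tokens.append(("NUM", int(num)) if num else ("OP", op))
    return tokens

# Parser: recursion handles the nested structure regexes cannot.
def parse(tokens):
    """Evaluate a fully parenthesized sum like (1 + (2 + 3))."""
    def expr(i):
        kind, val = tokens[i]
        if kind == "NUM":
            return val, i + 1
        assert val == "("
        left, i = expr(i + 1)
        i += 1                       # skip the '+' operator
        right, i = expr(i)
        assert tokens[i] == ("OP", ")")
        return left + right, i + 1
    value, _ = expr(0)
    return value

result = parse(lex("(1 + (2 + 3))"))  # 6
```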
q-n-a  stackex  programming  compilers  explanation  comparison  jargon  strings  syntax  lexical  automata-languages 
november 2017 by nhaliday
The weirdest people in the world?
Abstract: Behavioral scientists routinely publish broad claims about human psychology and behavior in the world’s top journals based on samples drawn entirely from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. Researchers – often implicitly – assume that either there is little variation across human populations, or that these “standard subjects” are as representative of the species as any other population. Are these assumptions justified? Here, our review of the comparative database from across the behavioral sciences suggests both that there is substantial variability in experimental results across populations and that WEIRD subjects are particularly unusual compared with the rest of the species – frequent outliers. The domains reviewed include visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, reasoning styles, self-concepts and related motivations, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans. Many of these findings involve domains that are associated with fundamental aspects of psychology, motivation, and behavior – hence, there are no obvious a priori grounds for claiming that a particular behavioral phenomenon is universal based on sampling from a single subpopulation. Overall, these empirical patterns suggest that we need to be less cavalier in addressing questions of human nature on the basis of data drawn from this particularly thin, and rather unusual, slice of humanity. We close by proposing ways to structurally re-organize the behavioral sciences to best tackle these challenges.
pdf  study  microfoundations  anthropology  cultural-dynamics  sociology  psychology  social-psych  cog-psych  iq  biodet  behavioral-gen  variance-components  psychometrics  psych-architecture  visuo  spatial  morality  individualism-collectivism  n-factor  justice  egalitarianism-hierarchy  cooperate-defect  outliers  homo-hetero  evopsych  generalization  henrich  europe  the-great-west-whale  occident  organizing  🌞  universalism-particularism  applicability-prereqs  hari-seldon  extrema  comparison  GT-101  ecology  EGT  reinforcement  anglo  language  gavisti  heavy-industry  marginal  absolute-relative  reason  stylized-facts  nature  systematic-ad-hoc  analytical-holistic  science  modernity  behavioral-econ  s:*  illusion  cool  hmm  coordination  self-interest  social-norms  population  density  humanity  sapiens  farmers-and-foragers  free-riding  anglosphere  cost-benefit  china  asia  sinosphere  MENA  world  developing-world  neurons  theory-of-mind  network-structure  nordic  orient  signum  biases  usa  optimism  hypocrisy  humility  within-without  volo-avolo  domes 
november 2017 by nhaliday
Latin spelling and pronunciation - Wikipedia
From Solodow's "Latin Alive": Classical Latin (for literature, as opposed to the common tongue) was formed out of a crucible of nativist anxieties

The earliest continuous Latin texts we have date from the late third and early second centuries B.C.E., beginning with the comedies of Plautus. The Latin in these and the other texts that follow them for the next century displays a certain amount of variety, as we might expect: a large and expressive vocabulary, some freedom with genders, declensions, and conjugations, a certain diversity in inflections and syntax. But in the first half of the first century B.C.E., this changed quickly and definitively. A group of men set about to find and fix a suitable form for the language. Their goal was to settle the language once and for all, and, in an important sense, they succeeded. These men, of whom the two most familiar are Caesar (100-44 B.C.E.) and Cicero (106-43 B.C.E.), did not constitute an academy of the Latin language, like those established in modern times for French and Spanish. Instead, by their own conscious practice they shaped the language into a form that seemed pure and worthy.

Their concerted effort to give the Latin language a fixed form was driven in part by the linguistic unsettledness and disorder they perceived around them. Language - actual spoken language - perhaps always appears messy to the ears and eyes of some, but at that time and place the messiness may have been very marked. Rome from its beginnings had been a city of immigrants, and the conquests abroad and other social upheavals of the preceding century had brought into the capital a swarm of people who did not speak Latin as their native language or were not familiar with the variety characteristic of the city. Some men consequently feared the disappearance of authentic, correct Latin. In his history of Roman oratory, Cicero links the deplorable linguistic situation of his day with social changes: “In those days [a century earlier] nearly everybody who had lived in this city and not been corrupted by home-bred provincialism spoke correctly. But the passage of time unquestionably changed the situation for the worse, no less at Rome than in Greece. Many people from different places who spoke a debased language poured into Athens and into this city. The language therefore needs to be purified” (Brutus 258).

Another impetus was the recognition that the linguistic situation, if grave, was not irremediable. Here the model of the Greek language played an important part. As Cicero draws a parallel between the problems at Rome and those at Athens, so he and his contemporaries looked to the latter for guidance in finding a solution. The dialect of Athens, known as Attic, which had established itself among the various Greek dialects as the one most prestigious and most suitable for refined speech and writing, had itself passed through a period of conscious purification; this purified Attic Greek served the Romans as an example. And at the same time that Attic offered a model to imitate, Greek rhetoricians were extolling the virtues of language that was logical, unambiguous, and otherwise clear.

Goaded by the current unhappy state of Latin and drawn by a vision of how it might be bettered, Caesar, Cicero, and others set about the task of purifying Latin. They shunned rusticitas “rusticity,” anything that smacked of the countryside. They strove for urbanitas “urbanity, refinement,” and in the sphere of language this was synonymous with Latinitas “(genuine) Latin-ness”; this equation is evident in the passage quoted from Cicero, who identifies as the genuine and desirable variety of Latin the one that had been spoken in the city of Rome by native Romans.
language  foreign-lang  mediterranean  the-classics  wiki  reference  history  iron-age  medieval  early-modern  article  howto  tutorial  multi  twitter  social  commentary  quotes  gnon  unaffiliated  right-wing  statesmen  big-peeps  leadership  tribalism  us-them  migration  speaking  linguistics  quixotic  syntax  lexical 
june 2017 by nhaliday
mandarin - Is it easier to learn Chinese after learning Japanese or vice versa? - Chinese Language Stack Exchange
Apart from the Kanji/Hanzi, that they (partly) have in common, concerning the written part, there is nothing that can really help you with the other language:
- Chinese is pretty much SVO, Japanese is SOV;
- Chinese has tones, Japanese has no tones. When speaking, sentences do have a certain "tone", but not phonemic, i.e. it doesn't totally change the meaning;
- Chinese has one writing system (Hanzi), Japanese has 3 (Hiragana, Katakana, Kanji);

q-n-a  stackex  world  foreign-lang  language  china  asia  japan  sinosphere  multi  qra  direct-indirect  syntax 
june 2017 by nhaliday
The goal of the Lean Forward project is to collaborate with number theorists to formally prove theorems about research mathematics and to address the main usability issues hampering the adoption of proof assistants in mathematical circles. The theorems will be selected together with our collaborators to guide the development of formal libraries and verified tools.

mostly happening in the Netherlands


A Review of the Lean Theorem Prover: https://jiggerwit.wordpress.com/2018/09/18/a-review-of-the-lean-theorem-prover/
- Thomas Hales
seems like Coq might be a better starter if I ever try to get into proof assistants/theorem provers

edit: on second thought this actually seems like a wash for beginners

An Argument for Controlled Natural Languages in Mathematics: https://jiggerwit.wordpress.com/2019/06/20/an-argument-for-controlled-natural-languages-in-mathematics/
By controlled natural language for mathematics (CNL), we mean an artificial language for the communication of mathematics that is (1) designed in a deliberate and explicit way with precise computer-readable syntax and semantics, (2) based on a single natural language (such as Chinese, Spanish, or English), and (3) broadly understood at least in an intuitive way by mathematically literate speakers of the natural language.

The definition of controlled natural language is intended to exclude invented languages such as Esperanto and Lojban that are not based on a single natural language. Programming languages are meant to be excluded, but a case might be made for TeX as the first broadly adopted controlled natural language for mathematics.

Perhaps it is best to start with an example. Here is a beautifully crafted CNL text created by Peter Koepke and Steffen Frerix. It reproduces a theorem and proof in Rudin’s Principles of mathematical analysis almost word for word. Their automated proof system is able to read and verify the proof.

research  math  formal-methods  msr  multi  homepage  research-program  skunkworks  math.NT  academia  ux  CAS  mathtariat  expert-experience  cost-benefit  nitty-gritty  review  critique  rant  types  learning  intricacy  functional  performance  c(pp)  ocaml-sml  comparison  ecosystem  DSL  tradeoffs  composition-decomposition  interdisciplinary  europe  germanic  grokkability  nlp  language  heavyweights  inference  rigor  automata-languages  repo  software  tools  syntax  frontier  state-of-art  pls  grokkability-clarity  technical-writing  database  lifts-projections 
january 2016 by nhaliday
Rob Pike: Notes on Programming in C
Issues of typography
Sometimes they care too much: pretty printers mechanically produce pretty output that accentuates irrelevant detail in the program, which is as sensible as putting all the prepositions in English text in bold font. Although many people think programs should look like the Algol-68 report (and some systems even require you to edit programs in that style), a clear program is not made any clearer by such presentation, and a bad program is only made laughable.
Typographic conventions consistently held are important to clear presentation, of course - indentation is probably the best known and most useful example - but when the ink obscures the intent, typography has taken over.


Finally, I prefer minimum-length but maximum-information names, and then let the context fill in the rest. Globals, for instance, typically have little context when they are used, so their names need to be relatively evocative. Thus I say maxphysaddr (not MaximumPhysicalAddress) for a global variable, but np not NodePointer for a pointer locally defined and used. This is largely a matter of taste, but taste is relevant to clarity.


C is unusual in that it allows pointers to point to anything. Pointers are sharp tools, and like any such tool, used well they can be delightfully productive, but used badly they can do great damage (I sunk a wood chisel into my thumb a few days before writing this). Pointers have a bad reputation in academia, because they are considered too dangerous, dirty somehow. But I think they are powerful notation, which means they can help us express ourselves clearly.
Consider: When you have a pointer to an object, it is a name for exactly that object and no other.


A delicate matter, requiring taste and judgement. I tend to err on the side of eliminating comments, for several reasons. First, if the code is clear, and uses good type names and variable names, it should explain itself. Second, comments aren't checked by the compiler, so there is no guarantee they're right, especially after the code is modified. A misleading comment can be very confusing. Third, the issue of typography: comments clutter code.
But I do comment sometimes. Almost exclusively, I use them as an introduction to what follows.


Most programs are too complicated - that is, more complex than they need to be to solve their problems efficiently. Why? Mostly it's because of bad design, but I will skip that issue here because it's a big one. But programs are often complicated at the microscopic level, and that is something I can address here.
Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.

Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.) For example, binary trees are always faster than splay trees for workaday problems.

Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.

The following data structures are a complete list for almost all practical programs:

array
linked list
hash table
binary tree
Of course, you must also be prepared to collect these into compound data structures. For instance, a symbol table might be implemented as a hash table containing linked lists of arrays of characters.
Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming. (See The Mythical Man-Month: Essays on Software Engineering by F. P. Brooks, page 102.)

Rule 6. There is no Rule 6.

Programming with data.
One of the reasons data-driven programs are not common, at least among beginners, is the tyranny of Pascal. Pascal, like its creator, believes firmly in the separation of code and data. It therefore (at least in its original form) has no ability to create initialized data. This flies in the face of the theories of Turing and von Neumann, which define the basic principles of the stored-program computer. Code and data are the same, or at least they can be. How else can you explain how a compiler works? (Functional languages have a similar problem with I/O.)
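A minimal illustration of the point (the table and token codes are hypothetical, not from Pike's notes): initialized data is "code as data" — adding a row to the table changes the program's behavior with no new control flow.

```c
#include <string.h>

typedef struct {
    const char *name;
    int         token;  /* hypothetical token codes */
} Keyword;

enum { TIF = 1, TWHILE, TRETURN };

/* The program's behavior lives in this initialized table. */
static const Keyword keywords[] = {
    { "if",     TIF     },
    { "while",  TWHILE  },
    { "return", TRETURN },
};

/* Return the token for a keyword, or 0 for an ordinary identifier. */
int keyword_token(const char *s)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(keywords[i].name, s) == 0)
            return keywords[i].token;
    return 0;
}
```

In original Pascal the `keywords` table could not be written down as data at all; it would have to be built by code at run time.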

Function pointers
Another result of the tyranny of Pascal is that beginners don't use function pointers. (You can't have function-valued variables in Pascal.) Using function pointers to encode complexity has some interesting properties.
Some of the complexity is passed to the routine pointed to. The routine must obey some standard protocol - it's one of a set of routines invoked identically - but beyond that, what it does is its business alone. The complexity is distributed.

There is this idea of a protocol, in that all functions used similarly must behave similarly. This makes for easy documentation, testing, growth and even making the program run distributed over a network - the protocol can be encoded as remote procedure calls.

I argue that clear use of function pointers is the heart of object-oriented programming. Given a set of operations you want to perform on data, and a set of data types you want to respond to those operations, the easiest way to put the program together is with a group of function pointers for each type. This, in a nutshell, defines class and method. The O-O languages give you more of course - prettier syntax, derived types and so on - but conceptually they provide little extra.
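The "group of function pointers per type" idea can be sketched as below; the `Shape`/`Rect` names are illustrative, not Pike's. Each type supplies functions obeying one protocol, and callers dispatch through the pointer without knowing the concrete type — class and method in a nutshell.

```c
typedef struct Shape Shape;

struct Shape {
    /* the "method": every type supplies one with this exact protocol */
    double (*area)(const Shape *);
};

typedef struct {
    Shape  base;  /* must be first, so a Rect * is usable as a Shape * */
    double w, h;
} Rect;

static double rect_area(const Shape *s)
{
    const Rect *r = (const Rect *)s;
    return r->w * r->h;
}

Rect make_rect(double w, double h)
{
    Rect r;
    r.base.area = rect_area;  /* bind the method for this type */
    r.w = w;
    r.h = h;
    return r;
}

/* Callers see only the protocol; the complexity is behind the pointer. */
double area(const Shape *s)
{
    return s->area(s);
}
```

Adding a new type means writing its functions and filling in its pointers; `area` and everything built on it need not change.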


Include files
Simple rule: include files should never include include files. If instead they state (in comments or implicitly) what files they need to have included first, the problem of deciding which files to include is pushed to the user (programmer) but in a way that's easy to handle and that, by construction, avoids multiple inclusions. Multiple inclusions are a bane of systems programming. It's not rare to have files included five or more times to compile a single C source file. The Unix /usr/include/sys stuff is terrible this way.
There's a little dance involving #ifdef's that can prevent a file being read twice, but it's usually done wrong in practice - the #ifdef's are in the file itself, not the file that includes it. The result is often thousands of needless lines of code passing through the lexical analyzer, which is (in good compilers) the most expensive phase.

Just follow the simple rule.
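The rule sketched with hypothetical file names: the guard lives in the file doing the including, so a second inclusion is skipped before the lexer ever reads the header.

```c
/* dat.h itself includes nothing; a comment states its needs:
   "dat.h: requires <stddef.h> to have been included first" */

/* In a source file that uses it: */
#include <stddef.h>

#ifndef DAT_H_INCLUDED   /* guard in the includer, not inside dat.h */
#define DAT_H_INCLUDED
#include "dat.h"
#endif
```

The common practice puts the `#ifndef` inside `dat.h`, which still forces the whole file through the lexical analyzer on every inclusion before the guard takes effect.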

cf https://stackoverflow.com/questions/1101267/where-does-the-compiler-spend-most-of-its-time-during-parsing
First, I don't think it actually is true: in many compilers, most time is not spent lexing source code. For example, in C++ compilers (e.g. g++), most time is spent in semantic analysis, in particular in overload resolution (trying to find out what implicit template instantiations to perform). Also, in C and C++, most time is often spent in optimization (creating graph representations of individual functions or the whole translation unit, and then running long algorithms on these graphs).

When comparing lexical and syntactical analysis, it may indeed be the case that lexical analysis is more expensive. This is because both use state machines, i.e. there is a fixed number of actions per element, but the number of elements is much larger in lexical analysis (characters) than in syntactical analysis (tokens).

programming  systems  philosophy  c(pp)  summer-2014  intricacy  engineering  rhetoric  contrarianism  diogenes  parsimony  worse-is-better/the-right-thing  data-structures  list  algorithms  stylized-facts  essay  ideas  performance  functional  state  pls  oop  gotchas  blowhards  duplication  compilers  syntax  lexical  checklists  metabuch  lens  notation  thinking  neurons  guide  pareto  heuristic  time  cost-benefit  multi  q-n-a  stackex  plt  hn  commentary  minimalism  techtariat  rsc  writing  technical-writing  cracker-prog  code-organizing  grokkability  protocol-metadata  direct-indirect  grokkability-clarity  latency-throughput 
august 2014 by nhaliday
