nhaliday + unix (69 bookmarks)

C++ IDE for Linux? - Stack Overflow
A:
- Vim/Emacs + Unix/GNU tools
- VSCode or Sublime
- CodeLite
- Netbeans
- Qt Creator
q-n-a  stackex  programming  c(pp)  devtools  tools  ide  software  recommendations  unix  linux 
11 days ago by nhaliday
Unix philosophy - Wikipedia
1. Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features".
2. Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input.
3. Design and build software, even operating systems, to be tried early, ideally within weeks. Don't hesitate to throw away the clumsy parts and rebuild them.
4. Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you've finished using them.
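[ed.: a minimal illustration of point 2 (hypothetical log file and field position; the composition is the point): each stage emits plain text that the next stage can consume.]

grep ERROR app.log | cut -d' ' -f3 | sort | uniq -c | sort -rn | head -5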
wiki  concept  philosophy  lens  ideas  design  system-design  programming  engineering  systems  unix  subculture  composition-decomposition  coupling-cohesion  metabuch  skeleton  hi-order-bits  summary  list  top-n  quotes  aphorism  minimalism  minimum-viable  best-practices  intricacy  parsimony  protocol-metadata 
28 days ago by nhaliday
Call graph - Wikipedia
I've found both static and dynamic versions useful (the former mostly when I don't want to go through the pain of compiling something).

best options AFAICT:

C/C++ and maybe Go: https://github.com/gperftools/gperftools
https://gperftools.github.io/gperftools/cpuprofile.html
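[ed.: typical usage per the cpuprofile doc above (binary name and library path are placeholders): preload the profiler (or link with -lprofiler), point CPUPROFILE at an output file, then render the weighted graph with pprof (needs graphviz).]

CPUPROFILE=/tmp/myprog.prof LD_PRELOAD=/usr/lib/libprofiler.so ./myprog
pprof --pdf ./myprog /tmp/myprog.prof > callgraph.pdf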

static: https://github.com/Vermeille/clang-callgraph
https://stackoverflow.com/questions/5373714/how-to-generate-a-calling-graph-for-c-code
doxygen
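[ed.: for the doxygen route, a sketch of the relevant config switches (later assignments in a Doxyfile override earlier ones, so appending works):]

doxygen -g Doxyfile
printf '%s\n' 'EXTRACT_ALL = YES' 'HAVE_DOT = YES' 'CALL_GRAPH = YES' 'CALLER_GRAPH = YES' >> Doxyfile
doxygen Doxyfile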

Go: https://stackoverflow.com/questions/31362332/creating-call-graph-in-golang
both static and dynamic in one tool
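[ed.: for the dynamic side, Go's built-in pprof tooling gives a weighted call graph directly (standard toolchain flags; graphviz needed for image output):]

go test -cpuprofile=cpu.out ./...
go tool pprof -png cpu.out > callgraph.png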

Java: https://github.com/gousiosg/java-callgraph
both static and dynamic in one tool

Python:
https://github.com/gak/pycallgraph
more up-to-date forks: https://github.com/daneads/pycallgraph2 and https://github.com/YannLuo/pycallgraph
old docs: https://pycallgraph.readthedocs.io/en/master/

static: https://github.com/davidfraser/pyan

various: https://github.com/jrfonseca/gprof2dot
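[ed.: gprof2dot turns the text output of many profilers into graphviz input; e.g. for gprof, assuming a binary compiled with -pg that wrote gmon.out on a run:]

gprof ./myprog gmon.out | gprof2dot | dot -Tpng -o callgraph.png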

I believe all the dynamic tools listed here support weighting nodes and edges by CPU time/samples (inclusive and exclusive of descendants) and by discrete call counts. In the case of gperftools and the Java option you probably have to parse the output to get the latter, though.

IIRC DTrace has probes for function entry/exit. So that's an option as well.
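[ed.: a sketch using the pid provider (needs root; macOS/FreeBSD/illumos): count entries into each function of the main executable of the target process.]

sudo dtrace -n 'pid$target:a.out::entry { @calls[probefunc] = count(); }' -c ./myprog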
concept  wiki  reference  tools  devtools  graphs  trees  programming  code-dive  let-me-see  big-picture  libraries  software  recommendations  list  top-n  links  c(pp)  golang  python  javascript  jvm  stackex  q-n-a  howto  yak-shaving  visualization  dataviz  performance  structure  oss  osx  unix  linux  static-dynamic 
8 weeks ago by nhaliday
Errors in Math Functions (The GNU C Library)
https://stackoverflow.com/questions/22259537/guaranteed-precision-of-sqrt-function-in-c-c
For C99, there are no specific requirements. But most implementations try to support Annex F (IEC 60559 floating-point arithmetic) as well as possible. It says:

An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex.

And:

The sqrt functions in <math.h> provide the IEC 60559 square root operation.

IEC 60559 (equivalent to IEEE 754) says about basic operations like sqrt:

Except for binary <-> decimal conversion, each of the operations shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range, and then coerced this intermediate result to fit in the destination's format.

The final step consists of rounding according to several rounding modes but the result must always be the closest representable value in the target precision.
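[ed.: a worked instance, not from the thread: the infinitely precise value is sqrt(2) = 1.41421356237309504880..., so under round-to-nearest a conforming sqrt(2.0) must return exactly the closest binary64 value, which prints as 1.4142135623730951; returning a value even one ulp off violates the clause above.]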

[ed.: The list of other such correctly rounded functions is included in the IEEE-754 standard (which I've put w/ the C1x and C++2x standard drafts) under section 9.2, and it mainly consists of stuff that can be expressed in terms of exponentials (exp, log, trig functions, powers) along w/ sqrt/hypot functions.

Fun fact: this question was asked by Yeputons who has a codeforces profile.]
https://stackoverflow.com/questions/20945815/math-precision-requirements-of-c-and-c-standard
oss  libraries  systems  c(pp)  numerics  documentation  objektbuch  list  linux  unix  multi  q-n-a  stackex  programming  nitty-gritty  sci-comp  accuracy  types  approximation  IEEE  protocol-metadata 
8 weeks ago by nhaliday
history - Why are UNIX/POSIX system call namings so illegible? - Unix & Linux Stack Exchange
It's due to the technical constraints of the time. The POSIX standard was created in the 1980s and referred to UNIX, which was born in the 1970s. Several C compilers at that time were limited to identifiers 6 or 8 characters long, so that settled the standard for the length of variable and function names.

http://neverworkintheory.org/2017/11/26/abbreviated-full-names.html
We carried out a family of controlled experiments to investigate whether the use of abbreviated identifier names, with respect to full-word identifier names, affects fault fixing in C and Java source code. This family consists of an original (or baseline) controlled experiment and three replications. We involved 100 participants with different backgrounds and experiences in total. Overall results suggested that there is no difference in terms of effort, effectiveness, and efficiency to fix faults, when source code contains either only abbreviated or only full-word identifier names. We also conducted a qualitative study to understand the values, beliefs, and assumptions that inform and shape fault fixing when identifier names are either abbreviated or full-word. We involved in this qualitative study six professional developers with 1--3 years of work experience. A number of insights emerged from this qualitative study and can be considered a useful complement to the quantitative results from our family of experiments. One of the most interesting insights is that developers, when working on source code with abbreviated identifier names, adopt a more methodical approach to identify and fix faults by extending their focus point and only in a few cases do they expand abbreviated identifiers.
q-n-a  stackex  trivia  programming  os  systems  legacy  legibility  ux  libraries  unix  linux  hacker  cracker-prog  multi  evidence-based  empirical  expert-experience  engineering  study  best-practices  comparison  quality  debugging  efficiency  time  code-organizing  grokkability 
9 weeks ago by nhaliday
The End of the Editor Wars » Linux Magazine
Moreover, even if you assume a broad margin of error, the polls aren't even close. With all the various text editors available today, Vi and Vim continue to be the choice of over a third of users, while Emacs is well back in the pack, no longer a competitor for most popular text editor.

https://www.quora.com/Are-there-more-Emacs-or-Vim-users
I believe Vim is actually more popular, but it's hard to find any real data on it. The best source I've seen is the annual StackOverflow developer survey where 15.2% of developers used Vim compared to a mere 3.2% for Emacs.

Oddly enough, the report noted that "Data scientists and machine learning developers are about 3 times more likely to use Emacs than any other type of developer," which is not necessarily what I would have expected.

[ed. NB: Vim still dominates overall.]

https://pinboard.in/u:nhaliday/b:6adc1b1ef4dc

Time To End The vi/Emacs Debate: https://cacm.acm.org/blogs/blog-cacm/226034-time-to-end-the-vi-emacs-debate/fulltext

Vim, Emacs and their forever war. Does it even matter any more?: https://blog.sourcerer.io/vim-emacs-and-their-forever-war-does-it-even-matter-any-more-697b1322d510
Like an episode of “Silicon Valley”, a discussion of Emacs vs. Vim used to have a polarizing effect that would guarantee a stimulating conversation, regardless of an engineer’s actual alignment. But nowadays, diehard Emacs and Vim users are getting much harder to find. Maybe I’m in the wrong orbit, but looking around today, I see that engineers are equally or even more likely to choose any one of a number of great (for any given definition of ‘great’) modern editors or IDEs such as Sublime Text, Visual Studio Code, Atom, IntelliJ (… or one of its siblings), Brackets, Visual Studio or Xcode, to name a few. It’s not surprising really — many top engineers weren’t even born when these editors were at version 1.0, and GUIs (for better or worse) hadn’t been invented.

...

… both forums have high traffic and up-to-the-minute comment and discussion threads. Some of the available statistics paint a reasonably healthy picture — Stackoverflow’s 2016 developer survey ranks Vim 4th out of 24 with 26.1% of respondents in the development environments category claiming to use it. Emacs came 15th with 5.2%. In combination, over 30% is, actually, quite impressive considering they’ve been around for several decades.

What’s odd, however, is that if you ask someone — say a random developer — to express a preference, the likelihood is that they will favor one or the other even if they have used neither in anger. Maybe the meme has spread so widely that all responses are now predominantly ritualistic, and represent something more fundamental than people’s mere preference for an editor? There’s a rather obvious political hypothesis waiting to be made — that Emacs is the leftist, socialist, centralized state, while Vim represents the right and the free market, specialization and capitalism red in tooth and claw.

How is Emacs/Vim used in companies like Google, Facebook, or Quora? Are there any libraries or tools they share in public?: https://www.quora.com/How-is-Emacs-Vim-used-in-companies-like-Google-Facebook-or-Quora-Are-there-any-libraries-or-tools-they-share-in-public
In Google there's a fair amount of vim and emacs. I would say at least every other engineer uses one or the other.

Among Software Engineers, emacs seems to be more popular, about 2:1. Among Site Reliability Engineers, vim is more popular, about 9:1.
--
People use both at Facebook, with (in my opinion) slightly better tooling for Emacs than Vim. We share a master.emacs and master.vimrc file, which contains the bare essentials (like syntactic highlighting for the Hack language). We also share a Ctags file that's updated nightly with a cron script.

Beyond the essentials, there's a group for Emacs users at Facebook that provides tips, tricks, and major-modes created by people at Facebook. That's where Adam Hupp first developed his excellent mural-mode (ahupp/mural), which does for Ctags what ido did for file finding and buffer switching.
--
For emacs, it was very informal at Google. There wasn't a huge community of Emacs users at Google, so there wasn't much more than a wiki and a couple language styles matching Google's style guides.

https://trends.google.com/trends/explore?date=all&geo=US&q=%2Fm%2F07zh7,%2Fm%2F01yp0m

https://www.quora.com/Why-is-interest-in-Emacs-dropping
And it is still that. It’s just that emacs is no longer unique, and neither is Lisp.

Dynamically typed scripting languages with garbage collection are a dime a dozen now. Anybody in their right mind developing an extensible text editor today would just use python, ruby, lua, or JavaScript as the extension language and get all the power of Lisp combined with vibrant user communities and millions of lines of ready-made libraries that Stallman and Steele could only dream of in the 70s.

In fact, in many ways emacs and elisp have fallen behind: 40 years after Lambda, the Ultimate Imperative, elisp is still dynamically scoped, and it still doesn’t support multithreading — when I try to use dired to list the files on a slow NFS mount, the entire editor hangs just as thoroughly as it might have in the 1980s. And when I say “doesn’t support multithreading,” I don’t mean there is some other clever trick for continuing to do work while waiting on a system call, like asynchronous callbacks or something. There’s start-process which forks a whole new process, and that’s about it. It’s a concurrency model straight out of 1980s UNIX land.

But being essentially just a decent text editor has robbed emacs of much of its competitive advantage. In a world where every developer tool is scriptable with languages and libraries an order of magnitude more powerful than cranky old elisp, the reason to use emacs is not that it lets a programmer hit a button and evaluate the current expression interactively (which must have been absolutely amazing at one point in the past).

https://www.reddit.com/r/emacs/comments/bh5kk7/why_do_many_new_users_still_prefer_vim_over_emacs/

more general comparison, not just popularity:
Differences between Emacs and Vim: https://stackoverflow.com/questions/1430164/differences-between-Emacs-and-vim

https://www.reddit.com/r/emacs/comments/9hen7z/what_are_the_benefits_of_emacs_over_vim/

https://unix.stackexchange.com/questions/986/what-are-the-pros-and-cons-of-vim-and-emacs

https://www.quora.com/Why-is-Vim-the-programmers-favorite-editor
- Adrien Lucas Ecoffet

Because it is hard to use. Really.

However, the second part of this sentence applies to just about every good editor out there: if you really learn Sublime Text, you will become super productive. If you really learn Emacs, you will become super productive. If you really learn Visual Studio… you get the idea.

Here’s the thing though, you never actually need to really learn your text editor… Unless you use vim.

...

For many people new to programming, this is the first time they have been a power user of… well, anything! And because they’ve been told how great Vim is, many of them will keep at it and actually become productive, not because Vim is particularly more productive than any other editor, but because it didn’t provide them with a way to not be productive.

They then go on to tell their friends how great Vim is, and their friends go on to become power users and tell their friends in turn, and so forth. All these people believe they became productive because they changed their text editor. Little do they realize that they became productive because their text editor changed them[1].

This is in no way a criticism of Vim. I myself was a beneficiary of such a phenomenon when I learned to type using the Dvorak layout: at that time, I believed that Dvorak would help you type faster. Now I realize the evidence is mixed and that Dvorak might not be much better than Qwerty. However, learning Dvorak forced me to develop good typing habits because I could no longer rely on looking at my keyboard (since I was still using a Qwerty physical keyboard), and this has made me a much more productive typist.

Technical Interview Performance by Editor/OS/Language: https://triplebyte.com/blog/technical-interview-performance-by-editor-os-language
[ed.: I'm guessing this is confounded to all hell.]

The #1 most common editor we see used in interviews is Sublime Text, with Vim close behind.

Emacs represents a fairly small market share today, at just about a quarter of the userbase of Vim in our interviews. This nicely matches the 4:1 ratio of Google Search Trends for the two editors.

...

Vim takes the prize here, but PyCharm and Emacs are close behind. We’ve found that users of these editors tend to pass our interview at an above-average rate.

On the other end of the spectrum is Eclipse: it appears that someone using either Vim or Emacs is more than twice as likely to pass our technical interview as an Eclipse user.

...

In this case, we find that the average Ruby, Swift, and C# users tend to be stronger, with Python and Javascript in the middle of the pack.

...

Here’s what happens after we select engineers to work with and send them to onsites:

[Python does best.]

There are no wild outliers here, but let’s look at the C++ segment. While C++ programmers have the most challenging time passing Triplebyte’s technical interview on average, the ones we choose to work with tend to have a relatively easier time getting offers at each onsite.

The Rise of Microsoft Visual Studio Code: https://triplebyte.com/blog/editor-report-the-rise-of-visual-studio-code
This chart shows the rates at which each editor's users pass our interview compared to the mean pass rate for all candidates. First, notice the preeminence of Emacs and Vim! Engineers who use these editors pass our interview at significantly higher rates than other engineers. And the effect size is not small. Emacs users pass our interview at a rate 50… [more]
news  linux  oss  tech  editors  devtools  tools  comparison  ranking  flux-stasis  trends  ubiquity  unix  increase-decrease  multi  q-n-a  qra  data  poll  stackex  sv  facebook  google  integration-extension  org:med  politics  stereotypes  coalitions  decentralized  left-wing  right-wing  chart  scale  time-series  distribution  top-n  list  discussion  ide  parsimony  intricacy  cost-benefit  tradeoffs  confounding  analysis  crosstab  pls  python  c(pp)  jvm  microsoft  golang  hmm  correlation  debate  critique  quora  contrarianism  ecosystem  DSL 
june 2019 by nhaliday
One week of bugs
If I had to guess, I'd say I probably work around hundreds of bugs in an average week, and thousands in a bad week. It's not unusual for me to run into a hundred new bugs in a single week. But I often get skepticism when I mention that I run into multiple new (to me) bugs per day, and that this is inevitable if we don't change how we write tests. Well, here's a log of one week of bugs, limited to bugs that were new to me that week. After a brief description of the bugs, I'll talk about what we can do to improve the situation. The obvious answer is to spend more effort on testing, but everyone already knows we should do that and no one does it. That doesn't mean it's hopeless, though.

...

Here's where I'm supposed to write an appeal to take testing more seriously and put real effort into it. But we all know that's not going to work. It would take 90k LOC of tests to get Julia to be as well tested as a poorly tested prototype (falsely assuming linear complexity in size). That's two person-years of work, not even including time to debug and fix bugs (which probably brings it closer to four or five years). Who's going to do that? No one. Writing tests is like writing documentation. Everyone already knows you should do it. Telling people they should do it adds zero information[1].

Given that people aren't going to put any effort into testing, what's the best way to do it?

Property-based testing. Generative testing. Random testing. Concolic Testing (which was done long before the term was coined). Static analysis. Fuzzing. Statistical bug finding. There are lots of options. Some of them are actually the same thing because the terminology we use is inconsistent and buggy. I'm going to arbitrarily pick one to talk about, but they're all worth looking into.

...

There are a lot of great resources out there, but if you're just getting started, I found this description of types of fuzzers to be one of the most helpful (and simplest) things I've read.

John Regehr has a Udacity course on software testing. I haven't worked through it yet (Pablo Torres just pointed me to it), but given the quality of Dr. Regehr's writing, I expect the course to be good.

For more on my perspective on testing, there's this.

https://hypothesis.works/articles/the-purpose-of-hypothesis/
From the perspective of a user, the purpose of Hypothesis is to make it easier for you to write better tests.

From my perspective as the primary author, that is of course also a purpose of Hypothesis. I write a lot of code, it needs testing, and the idea of trying to do that without Hypothesis has become nearly unthinkable.

But, on a large scale, the true purpose of Hypothesis is to drag the world kicking and screaming into a new and terrifying age of high quality software.

Software is everywhere. We have built a civilization on it, and it’s only getting more prevalent as more services move online and embedded and “internet of things” devices become cheaper and more common.

Software is also terrible. It’s buggy, it’s insecure, and it’s rarely well thought out.

This combination is clearly a recipe for disaster.

The state of software testing is even worse. It’s uncontroversial at this point that you should be testing your code, but it’s a rare codebase whose authors could honestly claim that they feel its testing is sufficient.

Much of the problem here is that it’s too hard to write good tests. Tests take up a vast quantity of development time, but they mostly just laboriously encode exactly the same assumptions and fallacies that the authors had when they wrote the code, so they miss exactly the same bugs that you missed when you wrote the code.

Preventing the Collapse of Civilization [video]: https://news.ycombinator.com/item?id=19945452
- Jonathan Blow

NB: DevGAMM is a game industry conference

- loss of technological knowledge (Antikythera mechanism, aqueducts, etc.)
- hardware driving most gains, not software
- software's actually less robust, often poorly designed and overengineered these days
- *list of bugs he's encountered recently*:
https://youtu.be/pW-SOdj4Kkk?t=1387
- knowledge of trivia becomes valued more than general, deep knowledge
- does at least acknowledge value of DRY, reusing code, abstraction saving dev time
techtariat  dan-luu  tech  software  error  list  debugging  linux  github  robust  checking  oss  troll  lol  aphorism  webapp  email  google  facebook  games  julia  pls  compilers  communication  mooc  browser  rust  programming  engineering  random  jargon  formal-methods  expert-experience  prof  c(pp)  course  correctness  hn  commentary  video  presentation  carmack  pragmatic  contrarianism  pessimism  sv  unix  rhetoric  critique  worrydream  hardware  performance  trends  multiplicative  roots  impact  comparison  history  iron-age  the-classics  mediterranean  conquest-empire  gibbon  technology  the-world-is-just-atoms  flux-stasis  increase-decrease  graphics  hmm  idk  systems  os  abstraction  intricacy  worse-is-better/the-right-thing  build-packaging  microsoft  osx  apple  reflection  assembly  things  knowledge  detail-architecture  thick-thin  trivia  info-dynamics  caching  frameworks  generalization  systematic-ad-hoc  universalism-particularism  analytical-holistic  structure  tainter  libraries  tradeoffs  prepping  threat-modeling  network-structure  writing  risk  local-glob 
may 2019 by nhaliday
unix - How can I profile C++ code running on Linux? - Stack Overflow
If your goal is to use a profiler, use one of the suggested ones.

However, if you're in a hurry and you can manually interrupt your program under the debugger while it's being subjectively slow, there's a simple way to find performance problems.

Just halt it several times, and each time look at the call stack. If there is some code that is wasting some percentage of the time, 20% or 50% or whatever, that is the probability that you will catch it in the act on each sample. So that is roughly the percentage of samples on which you will see it. There is no educated guesswork required. If you do have a guess as to what the problem is, this will prove or disprove it.

You may have multiple performance problems of different sizes. If you clean out any one of them, the remaining ones will take a larger percentage, and be easier to spot, on subsequent passes. This magnification effect, when compounded over multiple problems, can lead to truly massive speedup factors.

Caveat: Programmers tend to be skeptical of this technique unless they've used it themselves. They will say that profilers give you this information, but that is only true if they sample the entire call stack, and then let you examine a random set of samples. (The summaries are where the insight is lost.) Call graphs don't give you the same information, because they don't summarize at the instruction level, and they give confusing summaries in the presence of recursion.
They will also say it only works on toy programs, when actually it works on any program, and it seems to work better on bigger programs, because they tend to have more problems to find. They will say it sometimes finds things that aren't problems, but that is only true if you see something once. If you see a problem on more than one sample, it is real.

http://poormansprofiler.org/
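[ed.: a scripted sketch of the halt-and-inspect technique, in the spirit of that link; the process name, sample count, and backtrace parsing are assumptions (gdb frame formats vary):]

pid=$(pgrep -n myprog)    # assumes one running instance of "myprog"
for i in $(seq 1 20); do  # 20 samples, 0.5s apart
  gdb -batch -ex 'set pagination 0' -ex 'thread apply all bt' -p "$pid" 2>/dev/null
  sleep 0.5
done |
awk '/^#/ {sub(/^#[0-9]+ +(0x[0-9a-f]+ in +)?/, ""); print $1}' |
sort | uniq -c | sort -rn | head   # functions ranked by sample count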

gprof, Valgrind and gperftools - an evaluation of some tools for application level CPU profiling on Linux: http://gernotklingler.com/blog/gprof-valgrind-gperftools-evaluation-tools-application-level-cpu-profiling-linux/
gprof is the dinosaur among the evaluated profilers - its roots go back to the 1980s. It seems it was widely used and a good solution during the past decades. But its limited support for multi-threaded applications, the inability to profile shared libraries, and the need for recompilation with compatible compilers and special flags that produce a considerable runtime overhead make it unsuitable for use in today’s real-world projects.

Valgrind delivers the most accurate results and is well suited for multi-threaded applications. It’s very easy to use and there is KCachegrind for visualization/analysis of the profiling data, but the slow execution of the application under test disqualifies it for larger, longer running applications.

The gperftools CPU profiler has very little runtime overhead, provides some nice features like selectively profiling certain areas of interest, and has no problem with multi-threaded applications. KCachegrind can be used to analyze the profiling data. Like all sampling-based profilers, it suffers from statistical inaccuracy and therefore the results are not as accurate as with Valgrind, but practically that’s usually not a big problem (you can always increase the sampling frequency if you need more accurate results). I’m using this profiler on a large code-base and from my personal experience I can definitely recommend using it.
q-n-a  stackex  programming  engineering  performance  devtools  tools  advice  checklists  hacker  nitty-gritty  tricks  lol  multi  unix  linux  techtariat  analysis  comparison  recommendations  software  measurement  oly-programming  concurrency  debugging  metabuch 
may 2019 by nhaliday
man page - Wikipedia
NAME
The name of the command or function, followed by a one-line description of what it does.
SYNOPSIS
In the case of a command, a formal description of how to run it and what command line options it takes. For program functions, a list of the parameters the function takes and which header file contains its definition.
DESCRIPTION
A textual description of the functioning of the command or function.
EXAMPLES
Some examples of common usage.
SEE ALSO
A list of related commands or functions.
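[ed.: sections are numbered, so the same name can resolve to several pages; e.g.]

man 1 printf   # the shell command
man 3 printf   # the C library function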
explanation  programming  engineering  documentation  howto  terminal  unix  wiki  reference  cheatsheet  trivia  info-foraging 
september 2017 by nhaliday
bash - Queue up commands while one command is being executed - Unix & Linux Stack Exchange
Press Ctrl+Z and immediately run bg. This causes the current command to keep running in the background. Then you can use fg && otherCommand to schedule otherCommand after the current one.

To make this easier, I've configured Ctrl+Z in my shell to run bg when I press it on an empty command line. See In zsh, how can I more quickly disown the foreground process? and How do you send command line apps directly to the background? ; I haven't checked if modern versions of bash make it easy to do the same.
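[ed.: concretely, the sequence looks like this; ./long_build and make test are placeholders:]

./long_build       # running in the foreground...
# Ctrl+Z, then:
bg                 # ...now it continues in the background
fg && make test    # re-foreground it; run "make test" when it succeeds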
q-n-a  stackex  unix  terminal  howto  workflow  short-circuit  tip-of-tongue 
september 2017 by nhaliday
How to extract files to another directory using 'tar' command? - Ask Ubuntu
Combining the previous answers and comments:

To simply extract the contents and create the target directory if it is missing:

mkdir -p /target/directory && tar xf archive.tar -C /target/directory
To extract and also remove the root (first-level) directory in the archive:

mkdir -p /target/directory && tar xf archive.tar -C /target/directory --strip-components=1
q-n-a  stackex  howto  terminal  unix  yak-shaving  workflow  intricacy 
september 2017 by nhaliday
bash - How to find/replace and increment a matched number with sed/awk? - Stack Overflow
The /e flag allows you to pass the matched part to an external command and do the substitution with the execution result. GNU sed only.
why you need to capture the first and last parts of the lines: https://unix.stackexchange.com/questions/180783/sed-e-and-g-flags-not-working-together
That is a bit tortuously written. What it means is that, after the completion of a s/// command for this line, if there was a change, the (new) line is executed as a command and its output used as the replacement for this line.

example of what I had to do to get this to work w/ embedded quotes:
gsed -E 's/^\("(.*)", ([0-9]+)(.*)/echo "(\\\\"\1\\\\", $((\2+54))\3"/e'
maps ("foo", 3... -> ("foo", 57..
q-n-a  stackex  programming  howto  terminal  unix  yak-shaving  multi  gotchas  DSL 
september 2017 by nhaliday
linux - How do I replace the last occurrence of a character in a string using sed? - Unix & Linux Stack Exchange
You can do it with single command:

sed 's/\(.*\)-/\1 /'
The point is that sed's .* is greedy, so it matches as many characters before the - as possible, including other -s.
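[ed.: e.g., with a hypothetical input:]

$ echo 'a-b-c' | sed 's/\(.*\)-/\1 /'
a-b c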
q-n-a  stackex  howto  workflow  yak-shaving  terminal  unix  programming  DSL 
august 2017 by nhaliday
unix - How to split a delimited string into an array in awk? - Stack Overflow
To split a string into an array in awk, we use the function split():

awk '{split($0, a, ":")}'
#           ^^  ^  ^^^
#           |   |   |
#       string  |   delimiter
#               |
#        array to store the pieces
If no separator is given, it uses the FS, which defaults to the space:

$ awk '{split($0, a); print a[2]}' <<< "a:b c:d e"
c:d
q-n-a  stackex  programming  howto  yak-shaving  terminal  unix  workflow  DSL 
august 2017 by nhaliday
Where are my iBooks stored in macOS Sierra? - Ask Different
example of finding mentions of a string, aggregated by top-level directory:
<go to that directory>
pt -c 'foobar' | awk -F: 'function dir(path) {sub("/.*", "", path); return path} {a[dir($1)]+=$2} END{for (k in a) {print a[k], k}}' | sort -nr
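# pt -c prints file:count per match; the awk sums the counts by top-level directory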
now wrapped up in a script: ~/bin/ibooks_mentions
q-n-a  stackex  workflow  yak-shaving  integration-extension  studying  sleuthin  info-foraging  osx  desktop  multi  terminal  unix  howto 
august 2017 by nhaliday
What is the best way to parse command-line arguments with Python? - Quora
- Anders Kaseorg

Use the standard optparse library.

It’s important to uphold your users’ expectation that your utility will parse arguments in the same way as every other UNIX utility. If you roll your own parsing code, you’ll almost certainly break that expectation in obvious or subtle ways.

Although the documentation claims that optparse has been deprecated in favor of argparse, which supports more features like optional option arguments and configurable prefix characters, I can’t recommend argparse until it’s been fixed to parse required option arguments in the standard UNIX way. Currently, argparse uses an unexpected heuristic which may lead to subtle bugs in other scripts that call your program.

consider also click (which uses the optparse behavior)
q-n-a  qra  oly  best-practices  programming  terminal  unix  python  libraries  gotchas  howto  pls  yak-shaving  integration-extension  protocol-metadata 
august 2017 by nhaliday
Dgsh – Directed graph shell | Hacker News
I've worked with and looked at a lot of data processing helpers: tools that try to help you build data pipelines, for the sake of performance, reproducibility, or simply code uniformity.
What I found so far: most tools that invent a new language, or try to cram complex processes into ill-suited syntactic environments, are not loved too much.

...

I'll give dgsh a try. The tool-reuse approach and the UNIX spirit seem nice. But my initial impression of the "C code metrics" example from the site is mixed: it reminds me of awk, about which one of the authors said that it's a beautiful language, but if your programs get longer than a hundred lines, you might want to switch to something else.

Two libraries which have a great grip on the plumbing aspect of data processing systems are airflow and luigi. They are python libraries, and with them you have a concise syntax and basically all python libraries, plus non-python tools with a command-line interface, at your fingertips.

I am curious what kind of process orchestration tools people use and can recommend?

--

Exactly our experience too, from complex machine learning workflows in various aspects of drug discovery.
We basically did not find any of the popular DSL-based bioinformatics pipeline tools (snakemake, bpipe, etc.) to fit the bill. Nextflow came close, but in fact allows quite a bit of custom code too.

What worked for us was to use Spotify's Luigi, which is a python library rather than DSL.

The only thing was that we had to develop a flow-based inspired API on top of Luigi's more functional programming based one, in order to make defining dependencies fluent and easy enough to specify for our complex workflows.

Our flow-based-inspired Luigi API (SciLuigi) for complex workflows is available at:

https://github.com/pharmbio/sciluigi

--

We have measured many of the examples against the use of temporary files and the web report one against (single-threaded) implementations in Perl and Java. In almost all cases dgsh takes less wall clock time, but often consumes more CPU resources.
commentary  project  programming  terminal  worrydream  pls  plt  unix  hn  graphs  tools  devtools  let-me-see  composition-decomposition  yak-shaving  workflow  exocortex  hmm  cool  software  desktop  sci-comp  stock-flow  performance  comparison  links  libraries  python 
january 2017 by nhaliday
