Live Free or Dichotomize - Using AWK and R to parse 25tb


10 bookmarks. First posted by rlaksana 5 days ago.


How to read this post: I sincerely apologize for how long and rambling the following text is. To speed up skimming of it for those who have better things to do with their time, I have started most sections with a “Lesson learned” blurb that boils down the takeaway from the following text into a sentence or two.
R 
2 days ago by prcleary
Every generation, these techniques are rediscovered. Note how Apache Spark choked out!
datascience  aws  unix  awk  r_language 
5 days ago by mechazoidal
A story of going from Spark, 8 minutes, and $20 per AWS query to mostly R + AWK, 0.1 s, and $0.0001 per query.

Pre-processing a massive amount (25 TB) of DNA(?) data into a format easily queryable on AWS.
rlang  devops  datascience  ml  aws 
5 days ago by davison
Using AWK and R to parse 25tb https://t.co/2jLh9jWvyt

— Richard Laksana (@RichardLaksana) June 10, 2019
saved-twitter 
5 days ago by rlaksana