jm + files   7

Files Are Hard
This is basically terrifying. A catalog of race conditions and reliability horrors around the POSIX filesystem abstraction in Linux -- it's a wonder anything works.

'Where’s this documented? Oh, in some mailing list post 6-8 years ago (which makes it 12-14 years from today). The fs devs whose posts I’ve read are quite polite compared to LKML’s reputation, and they generously spend a lot of time responding to basic questions, but it’s hard for outsiders to troll [sic] through a decade and a half of mailing list postings to figure out which ones are still valid and which ones have been obsoleted! I don’t mean to pick on filesystem devs. In their OSDI 2014 talk, the authors of the paper we’re discussing noted that when they reported bugs they’d found, developers would often respond “POSIX doesn’t let filesystems do that”, without being able to point to any specific POSIX documentation to support their statement. If you’ve followed Kyle Kingsbury’s Jepsen work, this may sound familiar, except devs respond with “filesystems don’t do that” instead of “networks don’t do that”. I think this is understandable, given how much misinformation is out there. Not being a filesystem dev myself, I’d be a bit surprised if I don’t have at least one bug in this post.'
filesystems  linux  unix  files  operating-systems  posix  fsync  osdi  papers  reliability 
december 2015 by jm
The problem of managing schemas
Good post on the pain of using CSV/JSON as a data interchange format:
eventually, the schema changes. Someone refactors the code generating the JSON and moves fields around, perhaps renaming few fields. The DBA added new columns to a MySQL table and this reflects in the CSVs dumped from the table. Now all those applications and scripts must be modified to handle both file formats. And since schema changes happen frequently, and often without warning, this results in both ugly and unmaintainable code, and in grumpy developers who are tired of having to modify their scripts again and again.
schema  json  avro  protobuf  csv  data-formats  interchange  data  hadoop  files  file-formats 
november 2014 by jm
Friends don't let friends use mmap(2)
Rather horrific update from the trenches of Mozilla
mozilla  mmap  performance  linux  io  files  memory  unix  windows 
may 2014 by jm
LevelDB Benchmarks
nice results, particularly for sequential ops. LevelDB will be a Riak backend, as an alternative to InnoDB
leveldb  riak  databases  files  disk  google  storage  benchmarks 
july 2011 by jm
SoundCloud Developers Manifesto
'We recognize that only through your apps and hacks, can SoundCloud fully realize its potential as the audio platform.'
apps  hacks  soundcloud  mp3  music  hosting  files  json  rest  oauth  apis  http  from delicious
may 2010 by jm
filemap
'File-based, rather than tuple-based processing'; based around UNIX command-line toolset; good UNIXish UI; lots of caching of intermediate results; low setup overhead -- although it does require a shared POSIX filesystem, e.g. NFS, for synchronization
networking  python  opensource  grid  map-reduce  filemap  files  unix  command-line  parallel  distcomp 
july 2009 by jm
