jm + cs   11

HyperBitBit
jomsdev notes:

'Last year, at the AofA’16 conference, Robert Sedgewick proposed a new algorithm for cardinality estimation. Robert Sedgewick is a professor at Princeton with a long track record of publications on combinatorial/randomized algorithms. He was a good friend of Philippe Flajolet (creator of HyperLogLog), and HyperBitBit is based on the same ideas. However, it uses less memory than HyperLogLog and can provide the same results. On practical data, for N < 2^64, HyperBitBit estimates cardinality to within 10% using only 128 + 6 bits.'
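
For the curious, here is a minimal Python sketch following the pseudocode from Sedgewick's AofA'16 slides; the hash function is an illustrative choice and the 5.4 bias constant is the empirical correction given in the slides, so treat this as a sketch rather than a reference implementation. The state is just two 64-bit words plus a small counter, which is where the 128 + 6 bits figure comes from.

    import hashlib

    def _hash64(x):
        # Any good 64-bit hash works; blake2b is just a convenient stand-in.
        digest = hashlib.blake2b(str(x).encode(), digest_size=8).digest()
        return int.from_bytes(digest, 'big')

    def hyperbitbit(stream):
        lgN, sketch, sketch2 = 5, 0, 0  # 6-bit counter + two 64-bit sketch words
        for x in stream:
            h = _hash64(x)
            k = h & 63                  # low 6 bits pick one of 64 "buckets"
            rest = h >> 6
            # r is geometrically distributed: the number of trailing zero bits.
            r = (rest & -rest).bit_length() - 1 if rest else 58
            if r > lgN:
                sketch |= 1 << k
            if r > lgN + 1:
                sketch2 |= 1 << k
            if bin(sketch).count('1') > 31:   # half the bits set: double the scale
                sketch, sketch2, lgN = sketch2, 0, lgN + 1
        return 2 ** (lgN + 5.4 + bin(sketch).count('1') / 32.0)
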
algorithms  programming  cs  hyperloglog  estimation  cardinality  counting  hyperbitbit
march 2017 by jm
Jeff Erickson's Algorithms, Etc.
This page contains lecture notes and other course materials for various algorithms classes I have taught at the University of Illinois, Urbana-Champaign. The notes are numbered in the order I cover the material in a typical undergraduate class, with notes on more advanced material (indicated by the symbol ♥) interspersed appropriately. [...] In addition to the algorithms notes I have been maintaining since 1999, this page also contains new notes on "Models of Computation", which cover a small subset of the material normally taught in undergraduate courses in formal languages and automata. I wrote these notes for a new junior-level course on "Algorithms and Models of Computation" that Lenny Pitt and I developed, which is now required for all undergraduate computer science and computer engineering majors at UIUC.

Via Tony Finch
via:fanf  book  cs  algorithms  jeff-erickson  uiuc
november 2016 by jm
Control theory meets machine learning
'DB: Is there a difference between how control theorists and machine learning researchers think about robustness and error?

BR: In machine learning, we almost always model our errors as being random rather than worst-case. In some sense, random errors are actually much more benign than worst-case errors. [...] In machine learning, by assuming average-case performance, rather than worst-case, we can design predictive algorithms by averaging out the errors over large data sets. We want to be robust to fluctuations in the data, but only on average. This is much less restrictive than the worst-case restrictions in controls.

DB: So control theory is model-based and concerned with the worst case. Machine learning is data-based and concerned with the average case. Is there a middle ground?

BR: I think there is! And I think there's an exciting opportunity here to understand how to combine robust control and reinforcement learning. Being able to build systems from data alone simplifies the engineering process, and has had several recent promising results. Guaranteeing that these systems won't behave catastrophically will enable us to actually deploy machine learning systems in a variety of applications with major impacts on our lives. It might enable safe autonomous vehicles that can navigate complex terrains. Or could assist us in diagnostics and treatments in health care. There are a lot of exciting possibilities, and that's why I'm excited about how to find a bridge between these two viewpoints.'
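
To make the average-case/worst-case contrast concrete, here is a toy numerical illustration (synthetic Gaussian errors; my own example, not from the interview). Empirical risk minimisation in ML averages the per-sample errors, while a robust-control view asks about the single worst one:

    import numpy as np

    rng = np.random.default_rng(0)
    errors = rng.normal(0.0, 1.0, size=10_000)  # per-sample prediction errors

    avg_case = np.mean(errors ** 2)       # what ML typically minimises (empirical risk)
    worst_case = np.max(np.abs(errors))   # what robust control must guard against

    print(f'average squared error: {avg_case:.3f}')    # roughly 1.0, stable across draws
    print(f'worst-case |error|:    {worst_case:.3f}')  # roughly 4, grows with sample size

The average settles down as the data set grows; the worst case only grows, which is the sense in which worst-case guarantees are far more restrictive.
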
control-theory  interviews  machine-learning  ml  worst-case  self-driving-cars  cs
november 2015 by jm
How the NSA Converts Spoken Words Into Searchable Text - The Intercept
This hits the nail on the head, IMO:
To Phillip Rogaway, a professor of computer science at the University of California, Davis, keyword-search is probably the “least of our problems.” In an email to The Intercept, Rogaway warned that “When the NSA identifies someone as ‘interesting’ based on contemporary NLP methods, it might be that there is no human-understandable explanation as to why beyond: ‘his corpus of discourse resembles those of others whom we thought interesting’; or the conceptual opposite: ‘his discourse looks or sounds different from most people’s.’ If the algorithms NSA computers use to identify threats are too complex for humans to understand, it will be impossible to understand the contours of the surveillance apparatus by which one is judged. All that people will be able to do is to try their best to behave just like everyone else.”
privacy  security  gchq  nsa  surveillance  machine-learning  liberty  future  speech  nlp  pattern-analysis  cs
may 2015 by jm
Cuckoo Filters
'In many networking systems, Bloom filters are used for high-speed set membership tests. They permit a small fraction of false positive answers with very good space efficiency. However, they do not permit deletion of items from the set, and previous attempts to extend “standard” Bloom filters to support deletion all degrade either space or performance. We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set membership tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters outperform previous data structures that extend Bloom filters to support deletions substantially in both time and space.'
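
The core trick is partial-key cuckoo hashing: store a short fingerprint of each item, and derive an item's alternate bucket from its current bucket XORed with a hash of the fingerprint, so an evicted fingerprint can be relocated without access to the original key. A minimal Python sketch of the idea (the fingerprint width, bucket size, and hash choices are illustrative, not the paper's tuned parameters):

    import random

    class CuckooFilter:
        def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
            # num_buckets must be a power of two so the XOR trick below
            # is its own inverse.
            self.num_buckets = num_buckets
            self.bucket_size = bucket_size
            self.max_kicks = max_kicks
            self.buckets = [[] for _ in range(num_buckets)]

        def _fingerprint(self, item):
            # 8-bit fingerprint, never zero; a real implementation
            # would use a proper hash, not Python's built-in.
            return (hash(('fp', item)) & 0xFF) or 1

        def _index1(self, item):
            return hash(item) % self.num_buckets

        def _index2(self, i, fp):
            # Partial-key cuckoo hashing: the alternate bucket depends only on
            # the current bucket and the fingerprint, never on the item itself.
            return (i ^ hash(('alt', fp))) % self.num_buckets

        def insert(self, item):
            fp = self._fingerprint(item)
            i1 = self._index1(item)
            i2 = self._index2(i1, fp)
            for i in (i1, i2):
                if len(self.buckets[i]) < self.bucket_size:
                    self.buckets[i].append(fp)
                    return True
            # Both buckets full: evict a random resident and relocate it.
            i = random.choice((i1, i2))
            for _ in range(self.max_kicks):
                j = random.randrange(self.bucket_size)
                fp, self.buckets[i][j] = self.buckets[i][j], fp
                i = self._index2(i, fp)
                if len(self.buckets[i]) < self.bucket_size:
                    self.buckets[i].append(fp)
                    return True
            return False  # table too full; a real filter would resize or fail

        def contains(self, item):
            fp = self._fingerprint(item)
            i1 = self._index1(item)
            return fp in self.buckets[i1] or fp in self.buckets[self._index2(i1, fp)]

        def delete(self, item):
            fp = self._fingerprint(item)
            i1 = self._index1(item)
            for i in (i1, self._index2(i1, fp)):
                if fp in self.buckets[i]:
                    self.buckets[i].remove(fp)
                    return True
            return False

Deletion works because lookup probes exactly the two buckets an insert could have used, which is precisely what a plain Bloom filter cannot offer without keeping counters.
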
algorithms  cs  coding  cuckoo-filters  bloom-filters  sets  data-structures
october 2014 by jm
The Stony Brook Algorithm Repository
This WWW page is intended to serve as a comprehensive collection of algorithm implementations for over seventy of the most fundamental problems in combinatorial algorithms. The problem taxonomy, implementations, and supporting material are all drawn from my [ie. Steven Skiena's] book 'The Algorithm Design Manual'. Since the practical person is more often looking for a program than an algorithm, we provide pointers to solid implementations of useful algorithms, when they are available.
algorithms  reference  coding  steven-skiena  combinatorial  cs
march 2014 by jm
To solve hard problems, you need to use bricolage
In a talk about a neat software component he designed, Bruce Haddon observed that there is no way that the final structure and algorithmic behavior of this component could have been predicted, designed, or otherwise anticipated.
Haddon observed that computer science serves as a source of core ideas: it provides the data structures and algorithms that are the building blocks. Meanwhile, he views software engineering as a useful set of methods to help design reliable software without losing your mind. Yet he points out that neither captures the whole experience.

That’s because much of the work is what Haddon calls hacking, but what others would call bricolage. Simply put, there is much trial and error: we put ideas together and see where they go.

This is a great post, and I agree (broadly). IMO, most software engineering requires little CS, but there are occasional moments where a single significant aspect of a project requires a particular algorithm, and would be kludgy, hacky, or over-complex to solve without it.
bricolage  hacking  cs  computer-science  work  algorithms
september 2013 by jm
Beating the CAP Theorem Checklist
'Your ( ) tweet ( ) blog post ( ) marketing material ( ) online comment
advocates a way to beat the CAP theorem. Your idea will not work. Here is why
it won't work:'

lovely stuff, via Bill De hOra
via:dehora  funny  cap  cs  distributed-systems  distcomp  networking  partitions  state  checklists
august 2013 by jm
CS in VN
Neil Fraser visits a school in Vietnam, and investigates their computer science curriculum. They are doing an incredible job, it looks like -- very impressive!
vietnam  programming  education  cs  computer-science  schools  coding  children
march 2013 by jm
