One of the things that surprised me most teaching this week was how useful it was to share my contact sheets - fill…
from twitter_favs
22 days ago
The rain finally let up and a family of red foxes went hunting for food. D850 & 600mm + X2 @ 1/640, f/8,…
from twitter_favs
22 days ago
The main reason not to visit Antarctica: Once you've been, you can't stop thinking about it.
from twitter_favs
6 weeks ago
dgryski/go-simstore: simhash storage and searching
go-simstore: store and search through simhashes

This package is an implementation of section 3 of "Detecting Near-Duplicates for Web Crawling" by Manku, Jain, and Sarma,

simhash is a simple simhashing library.
simstore is the storage and searching logic
simd is a small daemon that wraps simstore and exposes an HTTP /search endpoint
This code is licensed under the MIT license
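
For reference, a minimal Python sketch of the fingerprinting idea behind this package. It is not go-simstore's API: the function names, the BLAKE2b token hash, and the 64-bit width are assumptions made for illustration, and it only shows the fingerprint and the Hamming-distance comparison, not the table-based search from section 3 of the paper.

```python
import hashlib


def token_hash(token):
    """Hash one token to a 64-bit integer (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(tokens):
    """Return a 64-bit simhash: each bit is the majority vote of the token hashes."""
    counts = [0] * 64
    for token in tokens:
        h = token_hash(token)
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit, count in enumerate(counts) if count > 0)


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog".split()
    doc_b = "the quick brown fox jumped over the lazy dog".split()
    print(hamming_distance(simhash(doc_a), simhash(doc_b)))
```

Documents whose fingerprints differ in only a few bits are treated as near-duplicates; the threshold is application-specific.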
simhash  golang  github  code  library 
7 weeks ago
The 2K cover we deserve 🙌
from twitter_favs
9 weeks ago
A Neat Trick For Compressing Networked State Data « The blog at the bottom of the sea
For instance, if you had a large struct of data containing information about all the players in the world, projectiles, enemies, and other game objects – and you wanted to send this information to a specific player so their game client could render the world appropriately / do collision detection / etc.

It goes like this:

1) Get the initial state and send it (compressed).
2) When it’s time to send an update to the state, XOR it against the previous state, compress that result and send it.
3) Rinse and repeat.

The magic here is in the assumption that the state as a whole isn’t going to change much from update to update. If this assumption is true, when you do the XOR against the previous state, you are going to end up with a lot of zeroes, which compress very nicely, making for a small data payload.
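
The three steps can be sketched in Python as below. This is an illustration under assumptions, not the article's code: the function names are made up, zlib stands in for whatever compressor is actually used, and the state is assumed to be a fixed-length byte string.

```python
import zlib


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("states must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_initial(state):
    """Step 1: send the full state, compressed."""
    return zlib.compress(state)


def encode_update(previous, current):
    """Step 2: XOR against the previous state, then compress the result."""
    return zlib.compress(xor_bytes(previous, current))


def decode_update(previous, payload):
    """Receiver side: decompress, then XOR with the previous state to recover the new one."""
    return xor_bytes(previous, zlib.decompress(payload))


if __name__ == "__main__":
    previous = bytes(range(256)) * 16
    current = bytearray(previous)
    current[100] ^= 0x7F  # only a couple of bytes change between updates
    current[2000] ^= 0x01
    payload = encode_update(previous, bytes(current))
    print(len(payload), "bytes sent for a", len(previous), "byte state")
```

Because XOR is its own inverse, the receiver needs only the previous state and the payload to rebuild the current state.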
algorithm  compression  hack  cool-trick  tricks 
9 weeks ago
Search / Information Retrieval Ranking Metrics
Information Retrieval metrics:

Useful Resources:
Learning to Rank for Information Retrieval (Tie-Yan Liu)

- mean_reciprocal_rank
- r_precision
- precision_at_k
- average_precision
- mean_average_precision
- dcg_at_k
- ndcg_at_k
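
A compact Python sketch of several of these metrics, written from their standard definitions rather than copied from the linked code. Each input is a list of relevance scores in ranked order (nonzero means relevant). Two choices here are assumptions: DCG uses the `rel / log2(rank + 1)` form, and average precision is normalized by the number of relevant items that appear in the list.

```python
import math


def precision_at_k(rels, k):
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in rels[:k] if r) / k


def reciprocal_rank(rels):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, r in enumerate(rels, start=1):
        if r:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over several queries."""
    return sum(reciprocal_rank(rels) for rels in runs) / len(runs)


def average_precision(rels):
    """Mean of precision@rank taken at each relevant position."""
    precisions = [precision_at_k(rels, rank)
                  for rank, r in enumerate(rels, start=1) if r]
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(runs):
    """Average of average_precision over several queries."""
    return sum(average_precision(rels) for rels in runs) / len(runs)


def dcg_at_k(rels, k):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(r / math.log2(rank + 1)
               for rank, r in enumerate(rels[:k], start=1))


def ndcg_at_k(rels, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```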
code  python  ranking  ranking-metrics  metrics  dcg  ndcg  cg  rank 
11 weeks ago
Machine Learning Crash Course  |  Google Developers
A self-study guide for aspiring machine learning practitioners
Machine Learning Crash Course features a series of lessons with video lectures, real-world case studies, and hands-on practice exercises.
machinelearning  ai  google  tensorflow  course  video  class  lectures 
12 weeks ago
NLP · Practical NLP
Today we’re releasing our paper Universal Language Model Fine-tuning for Text Classification (ULMFiT), pre-trained models, and full source code in the Python programming language. The paper has been peer-reviewed and accepted for presentation at the Annual Meeting of the Association for Computational Linguistics (ACL 2018).

This method dramatically improves over previous approaches to text classification, and the code and pre-trained models allow anyone to leverage this new approach to better solve problems such as:

* Finding documents relevant to a legal case;
* Identifying spam, bots, and offensive comments;
* Classifying positive and negative reviews of a product;
* Grouping articles by political orientation;
…and much more.
nlp  deeplearning  machinelearning  python 
may 2018
On Bifunctor IO and Java's Checked Exceptions
The Bifunctor IO data type is a hot topic in the Scala community. In this article however I'm expressing my dislike for it because it shares the same problems as Java's Checked Exceptions.
scala  functional  programming  java  io  bifunctor 
may 2018
C Primer
This is not a theoretical C language specifications document. It is a practical primer for the vast majority of real-life cases of C usage that are relevant to EFL on today's common architectures. It covers application executables and shared library concepts and is written from a Linux/UNIX perspective where you would have your code running with an OS doing memory mappings and probably protection for you. It really is fundamentally not much different on Android, iOS, OSX or even Windows.

It won't cover esoteric details of “strange architectures”. It pretty much covers C as a high level assembly language that is portable across a range of modern architectures.
tutorial  programming  reference  c  introduction 
may 2018
google/randen: Fast backtracking-resistant random generator (pending publication).
What if we could default to attack-resistant random generators without excessive CPU cost? We introduce 'Randen', a new generator with security guarantees; it outperforms MT19937 and pcg64_c32 in real-world benchmarks. This is made possible by AES hardware acceleration and a large Feistel permutation.
random  rng  random-number-generator  google  algorithm  security  github  cpp  c++ 
may 2018
narqo/psqr: P-Square Algorithm in Go
Go implementation of The P-Square Algorithm for Dynamic Calculation of Quantiles and Histograms Without Storing Observations.

The algorithm is proposed for dynamic calculation of [..] quantiles. The estimates are produced dynamically as the observations are generated. The observations are not stored, therefore, the algorithm has a very small and fixed storage requirement regardless of the number of observations.
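
A Python sketch of the P-Square estimator for a single quantile, following the five-marker scheme in the Jain and Chlamtac paper as I understand it. It is not the psqr package's API: the class and method names are hypothetical, and the handling of fewer than five observations is a simplification.

```python
class P2Quantile:
    """Streaming estimate of the p-quantile using five markers and no stored data."""

    def __init__(self, p):
        if not 0.0 < p < 1.0:
            raise ValueError("p must be strictly between 0 and 1")
        self.p = p
        self.q = []                                      # marker heights
        self.n = [1, 2, 3, 4, 5]                         # actual marker positions
        self.np = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]  # desired positions
        self.dn = [0, p / 2, p, (1 + p) / 2, 1]          # desired-position increments

    def add(self, x):
        q, n = self.q, self.n
        if len(q) < 5:                                   # bootstrap with the first five values
            q.append(x)
            q.sort()
            return
        if x < q[0]:                                     # find the cell k containing x
            q[0] = x
            k = 0
        elif x >= q[4]:
            q[4] = x
            k = 3
        else:
            k = 0
            while x >= q[k + 1]:
                k += 1
        for i in range(k + 1, 5):                        # shift markers above the cell
            n[i] += 1
        for i in range(5):
            self.np[i] += self.dn[i]
        for i in (1, 2, 3):                              # nudge the three middle markers
            d = self.np[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                candidate = self._parabolic(i, s)
                if not q[i - 1] < candidate < q[i + 1]:
                    candidate = self._linear(i, s)
                q[i] = candidate
                n[i] += s

    def _parabolic(self, i, s):
        q, n = self.q, self.n
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]))

    def _linear(self, i, s):
        q, n = self.q, self.n
        return q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])

    def value(self):
        if not self.q:
            raise ValueError("no observations yet")
        if len(self.q) < 5:                              # too few points: pick from the sorted sample
            return self.q[min(len(self.q) - 1, int(self.p * len(self.q)))]
        return self.q[2]                                 # the middle marker tracks the quantile
```

The state is five heights and ten position values, which is where the fixed storage requirement comes from.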
quantile  golang  algorithm  github 
may 2018
The Spa - Divi Theme Examples
The Spa - Divi Theme Examples via @ Theme
from twitter_favs
may 2018
Someone was recently telling me they were a "Code Ninja". I don't want to work with a Ninja, they leave a bloody me…
from twitter_favs
may 2018
phogolabs/parcello: Golang Resource Bundler / Embedder
Parcello is a simple resource manager for Golang that allows embedding assets like SQL, bash scripts and images. That allows easy release management by deploying just a single binary rather than many files.
golang  package  bundle  package-embed  github  assets-embed  assets-packager  assets  deployment 
april 2018
The keepers took Chendra the elephant to visit the sea lions at Oregon zoo before they open, pur…
from twitter_favs
april 2018
RT : You know, I really hate to keep beating a downed zuckerberg, but to the extent that expensive patents indicate corp…
from twitter_favs
april 2018
rqlite/rqlite: The lightweight, distributed relational database built on SQLite.
rqlite is a lightweight, distributed relational database, which uses SQLite as its storage engine. Forming a cluster is very straightforward; it gracefully handles leader elections and tolerates failures of machines, including the leader. rqlite is available for Linux, OSX, and Microsoft Windows.


rqlite gives you the functionality of a rock solid, fault-tolerant, replicated relational database, but with very easy installation, deployment, and operation. With it you've got a lightweight and reliable distributed relational data store. Think etcd or Consul, but with relational data modelling also available.

You could use rqlite as part of a larger system, as a central store for some critical relational data, without having to run larger, more complex distributed databases.


rqlite uses Raft to achieve consensus across all the instances of the SQLite databases, ensuring that every change made to the system is made to a quorum of SQLite databases, or none at all. You can learn more about the design here.

Key features

Very easy deployment, with no need to separately install SQLite.
Fully replicated production-grade SQL database.
Production-grade distributed consensus system.
An easy-to-use HTTP(S) API, including leader-redirection and bulk-update support. A CLI is also available, as are various client libraries.
Discovery Service support, allowing clusters to be dynamically created.
Extensive security and encryption support, including node-to-node encryption.
Choice of read consistency levels.
A flavour of transaction support.
Hot backups.
distributed  database  sqlite  golang  raft 
april 2018
CanonicalLtd/dqlite: Distributed SQLite for Go applications
This repository provides the dqlite Go package, which can be used to replicate a SQLite database across a cluster, using the Raft algorithm.

Design highlights

No external processes needed: dqlite is just a Go library; you link it to your application exactly like you would with SQLite.
Replication needs a SQLite patch which is not yet included upstream.
The Go Raft package from Hashicorp is used internally for replicating the write-ahead log frames of SQLite across all nodes.
How does it compare to rqlite?

The main differences from rqlite are:

Full support for transactions
No need for statements to be deterministic (e.g. you can use time())
Frame-based replication instead of statement-based replication. This means that in dqlite there's more data flowing between nodes, so expect lower performance. This should not really matter for most use cases.
database  sqlite  golang  programming  distributed-system 
april 2018
Rank correlation - Wikipedia
A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them.
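
Two common rank correlation coefficients, sketched in Python from their textbook definitions. The function names are my own, and both versions assume the inputs are rankings of the same items with no ties (Kendall's tau-a and the simple Spearman formula).

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs, no ties assumed."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two rankings of the same length (at least 2)")
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1      # pair ordered the same way in both rankings
            elif product < 0:
                score -= 1      # pair ordered in opposite ways
    return score / (n * (n - 1) / 2)


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties assumed."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two rankings of the same length (at least 2)")
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Both return 1 for identical rankings and -1 for exactly reversed ones.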
wiki  wikipedia  ranking  rank  rank-correlation  top-k-comparision 
april 2018
Hero Template (golang)
Hero is a handy, fast and powerful Go template engine, which pre-compiles the HTML templates to Go code. It has been used in production environment in
golang  html  template  templating  github  code  pre-compile 
march 2018
buraksezer/consistent: Consistent hashing with bounded loads in Golang
This library provides a consistent hashing function which simultaneously achieves both uniformity and consistency.
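
A rough Python sketch of the general idea of consistent hashing with bounded loads: walk the ring clockwise from the key's position and skip any member that is already at its load cap. This is not the buraksezer/consistent API; the class name, the `load_factor` and `vnodes` parameters, and the BLAKE2b hash are all assumptions for illustration.

```python
import bisect
import hashlib
import math


def ring_hash(text):
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class BoundedLoadRing:
    def __init__(self, members, load_factor=1.25, vnodes=20):
        if load_factor < 1.0:
            raise ValueError("load_factor must be at least 1.0")
        self.load_factor = load_factor
        self.loads = {member: 0 for member in members}
        # Each member gets several virtual nodes to smooth out the distribution.
        self.ring = sorted((ring_hash(f"{member}#{i}"), member)
                           for member in members for i in range(vnodes))
        self.points = [point for point, _ in self.ring]

    def max_load(self):
        """Cap per member: load_factor times the average load after the next assignment."""
        total = sum(self.loads.values()) + 1
        return math.ceil(self.load_factor * total / len(self.loads))

    def assign(self, key):
        """Place a key on the first member clockwise that is below the cap."""
        limit = self.max_load()
        start = bisect.bisect_left(self.points, ring_hash(key))
        for step in range(len(self.ring)):
            member = self.ring[(start + step) % len(self.ring)][1]
            if self.loads[member] < limit:
                self.loads[member] += 1
                return member
        raise RuntimeError("no member below the load cap")  # unreachable for load_factor >= 1
```

The cap keeps any single member from exceeding the average load by more than the chosen factor, while most keys still land on their natural ring position.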
golang  hash  hashing  consistent-hashing  google  github  algorithm 
march 2018
Probabilistic Filters By Example: Cuckoo Filter and Bloom Filters
Probabilistic filters are high-speed, space-efficient data structures that support set-membership tests with a one-sided error. These filters can claim that a given entry is definitely not represented in a set of entries, or might be represented in the set. That is, negative responses are conclusive, whereas positive responses incur a small false positive probability (FPP).

The trade-off for this one-sided error is space-efficiency. Cuckoo Filters and Bloom Filters require approximately 7 bits per entry at 3% FPP, regardless of the size of the entries. This makes them useful for applications where the volume of original data makes traditional storage impractical.

Bloom filters have been in use since the 1970s and are well understood. Implementations are widely available. Variants exist that support deletion and counting, though with expanded storage requirements.

Cuckoo filters were described in Cuckoo Filter: Practically Better Than Bloom, a paper by researchers at CMU in 2014. Cuckoo filters improve on Bloom filters by supporting deletion, limited counting, and bounded FPP, with storage efficiency similar to that of a standard Bloom filter.
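
A minimal Bloom filter sketch in Python, to make the one-sided error concrete. It is an illustration, not the article's simulation code: the class name, the BLAKE2b-based double hashing, and the default 3% target are assumptions. The sizing uses the standard formula m = -n ln(p) / (ln 2)^2, which works out to about 7.3 bits per entry at 3% FPP.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, capacity, fpp=0.03):
        # Standard sizing: m bits and k hash functions for n entries at the target FPP.
        self.m = math.ceil(-capacity * math.log(fpp) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        """Derive k bit positions from two 64-bit hashes (double hashing)."""
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Once an item is added, all of its bits stay set, so a lookup for it can never return False. That is the "negative responses are conclusive" property.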

Below is a side-by-side simulation of the inner workings of Cuckoo and Bloom filters.
datastructure  probabilistic  algorithm  bloom-filter  cuckoo-filter  visualization 
march 2018
rs/xid: xid is a globally unique id generator thought for the web
Package xid is a globally unique id generator library, ready to be used safely directly in your server code.

Xid uses the Mongo Object ID algorithm to generate globally unique ids, with a different serialization (base32 hex) to make them shorter when transported as a string:

4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.
The binary representation of the id is compatible with Mongo's 12-byte Object IDs. The string representation uses base32 hex (without padding) for better space efficiency when stored in that form (20 bytes). The hex variant of base32 is used to retain the sortable property of the id.

Xid doesn't use base64 because case sensitivity and the 2 non-alphanumeric chars may be an issue when transported as a string between various systems. Base36 wasn't retained either because 1/ it's not standard, 2/ the resulting size is not predictable (not bit aligned), and 3/ it would not remain sortable. To validate a base32 xid, expect a 20-character, all-lowercase sequence of the letters a to v and the digits 0 to 9 ([0-9a-v]{20}).

UUIDs are 16 bytes (128 bits) and 36 chars as string representation. Twitter Snowflake ids are 8 bytes (64 bits) but require machine/data-center configuration and/or central generator servers. xid stands in between with 12 bytes (96 bits) and a more compact URL-safe string representation (20 chars). No configuration or central generator server is required so it can be used directly in server's code.
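
The layout and string format described above can be sketched in Python as follows. This is not the xid package itself: the function names are hypothetical, and the code only covers the base32 hex encoding, the `[0-9a-v]{20}` validation, and splitting the 12 bytes into the four fields. Big-endian byte order for the fields is an assumption.

```python
import re

ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # base32 hex, lowercase
XID_PATTERN = re.compile(r"[0-9a-v]{20}")


def encode_id(raw):
    """Encode 12 raw bytes as 20 base32 hex characters (no padding)."""
    if len(raw) != 12:
        raise ValueError("an id is exactly 12 bytes")
    n = int.from_bytes(raw, "big") << 4  # 96 bits padded to 100 = 20 groups of 5
    return "".join(ALPHABET[(n >> shift) & 31] for shift in range(95, -1, -5))


def decode_id(text):
    """Validate the string form and return the 12 raw bytes."""
    if not XID_PATTERN.fullmatch(text):
        raise ValueError("expected 20 characters from [0-9a-v]")
    n = 0
    for char in text:
        n = (n << 5) | ALPHABET.index(char)
    return (n >> 4).to_bytes(12, "big")  # drop the 4 padding bits


def split_fields(raw):
    """Split 12 bytes into timestamp, machine id, process id, and counter."""
    if len(raw) != 12:
        raise ValueError("an id is exactly 12 bytes")
    return {
        "timestamp": int.from_bytes(raw[0:4], "big"),  # seconds since the Unix epoch
        "machine": raw[4:7],
        "pid": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }
```

Because the alphabet is in ascending ASCII order and the timestamp comes first, comparing the encoded strings gives the same order as comparing the raw bytes.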
uuid  golang  generator  guid  github  library  code 
march 2018
sharkdp/hyperfine: A command-line benchmarking tool
A command-line benchmarking tool (inspired by bench).

Statistical analysis across multiple runs.
Support for arbitrary shell commands.
Constant feedback about the benchmark progress and current estimates.
Warmup runs can be executed before the actual benchmark.
Cache-clearing commands can be set up before each timing run.
Statistical outlier detection.
Export results to various formats: CSV, JSON, Markdown.
Parameterized benchmarks.
cli  commandline  performance  rust  github  benchmark  tools  console 
march 2018