nhaliday + hashing   23

Anti-hash test. - Codeforces
- Thue-Morse sequence
- nice paper: http://www.mii.lt/olympiads_in_informatics/pdf/INFOL119.pdf
In general, polynomial string hashing is a useful technique in construction of efficient string algorithms. One simply needs to remember to carefully select the modulus M and the variable of the polynomial p depending on the application. A good rule of thumb is to pick both values as prime numbers with M as large as possible so that no integer overflow occurs and p being at least the size of the alphabet.
2.2. Upper Bound on M
[stuff about 32- and 64-bit integers]
2.3. Lower Bound on M
On the other side Mis bounded due to the well-known birthday paradox: if we consider a collection of m keys with m ≥ 1.2√M then the chance of a collision to occur within this collection is at least 50% (assuming that the distribution of fingerprints is close to uniform on the set of all strings). Thus if the birthday paradox applies then one needs to choose M=ω(m^2)to have a fair chance to avoid a collision. However, one should note that not always the birthday paradox applies. As a benchmark consider the following two problems.

I generally prefer to use Schwartz-Zippel to reason about collision probabilities w/ this kind of thing, eg, https://people.eecs.berkeley.edu/~sinclair/cs271/n3.pdf.

A good way to get more accurate results: just use multiple primes and the Chinese remainder theorem to get as large an M as you need w/o going beyond 64-bit arithmetic.

more on this: https://codeforces.com/blog/entry/60442
oly  oly-programming  gotchas  howto  hashing  algorithms  strings  random  best-practices  counterexample  multi  pdf  papers  nibble  examples  fields  polynomials  lecture-notes  yoga  probability  estimate  magnitude  hacker  adversarial  CAS  lattice  discrete 
8 days ago by nhaliday
Merkle tree - Wikipedia
In cryptography and computer science, a hash tree or Merkle tree is a tree in which every non-leaf node is labelled with the hash of the labels or values (in case of leaves) of its child nodes.
concept  cs  data-structures  bitcoin  cryptocurrency  blockchain  atoms  protocol  wiki  reference  nibble  hashing  ideas  crypto  rigorous-crypto 
june 2017 by nhaliday
MinHash - Wikipedia
- goal: compute Jaccard coefficient J(A, B) = |A∩B| / |A∪B| in sublinear space
- idea: pick random injective hash function h, define h_min(S) = argmin_{x in S} h(x), and note that Pr[h_min(A) = h_min(B)] = J(A, B)
- reduce variance w/ Chernoff bound
algorithms  data-structures  sublinear  hashing  wiki  reference  random  tcs  nibble  measure  metric-space  metrics  similarity  PAC  intersection  intersection-connectedness 
february 2017 by nhaliday
Count–min sketch - Wikipedia
- estimates frequency vector (f_i)
- idea:
d = O(log 1/δ) hash functions h_j: [n] -> [w] (w = O(1/ε))
d*w counters a[r, c]
for each event i, increment counters a[1, h_1(i)], a[2, h_2(i)], ..., a[d, h_d(i)]
estimate for f_i is min_j a[j, h_j(i)]
- never underestimates but upward-biased
- pf: Markov to get constant probability of success, then exponential decrease with repetition
lecture notes: http://theory.stanford.edu/~tim/s15/l/l2.pdf
- note this can work w/ negative updates. just use median instead of min. pf still uses markov on the absolute value of error.
algorithms  data-structures  sublinear  hashing  wiki  reference  bias-variance  approximation  random  tcs  multi  stanford  lecture-notes  pdf  tim-roughgarden  nibble  pigeonhole-markov  PAC 
february 2017 by nhaliday

bundles : abstracttcs

related tags

acm  advanced  adversarial  algorithms  amortization-potential  analysis  ankur-moitra  applications  approximation  atoms  average-case  best-practices  bias-variance  bio  bioinformatics  bitcoin  bits  blockchain  books  c(pp)  caching  carmack  CAS  circuits  coding-theory  communication-complexity  complexity  compressed-sensing  computational-geometry  computer-memory  concentration-of-measure  concept  contiguity-proximity  contrarianism  convexity-curvature  counterexample  counting  course  crypto  cryptocurrency  cs  curvature  data-structures  decentralized  decision-theory  differential-privacy  dimensionality  discovery  discrete  distributed  divide-and-conquer  duality  duplication  elegance  embedding  embeddings  engineering  ensembles  entropy-like  erik-demaine  estimate  examples  expanders  expert  expert-experience  explanation  exploratory  exposition  features  fields  flux-stasis  fourier  frontier  game-theory  generalization  genetics  genomics  geometry  gotchas  gradient-descent  graph-theory  graphs  greedy  guide  GWAS  gwern  hacker  hashing  heuristic  high-dimension  history  homepage  howto  huge-data-the-biggest  ideas  information-theory  init  innovation  interdisciplinary  intersection  intersection-connectedness  iterative-methods  jvm  latency-throughput  latent-variables  lattice  lecture-notes  lectures  levers  libraries  linear-algebra  linear-programming  linearity  liner-notes  links  lower-bounds  magnitude  markov  matrix-factorization  measure  metabuch  methodology  metric-space  metrics  mihai  minimum-viable  mit  monte-carlo  multi  narrative  nibble  nitty-gritty  norms  novelty  oly  oly-programming  online-learning  optimization  orders  org:bleg  p2p  p:*  p:**  p:***  p:someday  PAC  papers  pdf  perturbation  pigeonhole-markov  polynomials  pragmatic  princeton  probability  programming  protocol  q-n-a  qra  quixotic  rand-approx  random  random-matrices  random-networks  ratty  reduction  reference  reflection  regularization  research  research-program  rhetoric  rigorous-crypto  roots  rounding  sampling  sanjeev-arora  scaling-tech  SDP  sequential  shipping  similarity  slides  software  space-complexity  sparsity  spatial  spectral  stackex  stanford  state  stock-flow  stories  strings  sublinear  submodular  synthesis  systems  szabo  talks  tcs  tcstariat  techtariat  texas  the-prices  the-trenches  tidbits  tim-roughgarden  time  toolkit  topics  trees  tricks  tutorial  unit  valiant  video  web  wiki  worse-is-better/the-right-thing  yak-shaving  yoga  👳 

Copy this bookmark: