algorithm   19701

Probabilistic Data Structures for Web Analytics and Data Mining « Highly Scalable Blog
Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce. This approach often leads to heavyweight, high-latency analytical processes that are poorly suited to real-time use cases.
bigdata  algorithm  algorithms  datamining  probability  compsci  via:fourshortlinks 
9 hours ago by thadk
py-rollingsuperfasthash - A python library implementing a rolling version of Paul Hsieh's SuperFastHash - Google Project Hosting
This is a quick and dirty Python/C rolling hash implementation to generate a series of unsigned integer values given a string of text and a window size. This conversion of strings to integer hash values allows for quick longest-common-substring identification between large sets of documents using Rabin-Karp searching.
string  similarity  plagiarism  hash  python  algorithm 
yesterday by raphman
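A minimal sketch of the rolling-hash idea described in the entry above, assuming a plain polynomial (Rabin-Karp style) hash rather than Paul Hsieh's SuperFastHash, which the linked library actually uses. The function name `rolling_hashes` and the `base` and `mod` parameters are illustrative choices, not part of py-rollingsuperfasthash. It turns a string and a window size into one unsigned integer per window, updating each hash from the previous one instead of rehashing the whole window.

```python
def rolling_hashes(text, window, base=257, mod=1 << 32):
    """Return one unsigned 32-bit hash per `window`-byte slice of `text`.

    Hypothetical helper: a polynomial rolling hash, not SuperFastHash.
    """
    data = text.encode("utf-8")
    if window <= 0 or window > len(data):
        return []

    # Weight of the byte that leaves the window: base^(window-1) mod `mod`.
    high = pow(base, window - 1, mod)

    # Hash the first window from scratch.
    h = 0
    for b in data[:window]:
        h = (h * base + b) % mod
    hashes = [h]

    # Slide the window: drop the oldest byte, then append the newest one.
    for i in range(window, len(data)):
        h = (h - data[i - window] * high) % mod
        h = (h * base + data[i]) % mod
        hashes.append(h)
    return hashes


# Identical windows hash to the same value, so shared substrings between
# documents show up as shared hash values.
hs = rolling_hashes("abcabc", 3)
print(hs[0] == hs[3])
```

Equal hashes only suggest a match, since different windows can collide; a Rabin-Karp search would confirm each candidate by comparing the underlying substrings.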
MoFR
Method of Four Russians
algorithm  datastructure 
2 days ago by tabito
