algorithm 19701
Probabilistic Data Structures for Web Analytics and Data Mining « Highly Scalable Blog
9 hours ago by thadk
Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce. This approach often leads to heavyweight high-latency analytical processes and poor applicability to realtime use cases.
bigdata
algorithm
algorithms
datamining
probability
compsci
via:fourshortlinks
9 hours ago by thadk
py-rollingsuperfasthash - A python library implementing a rolling version of Paul Hsieh's SuperFastHash - Google Project Hosting
yesterday by raphman
This is a quick and dirty Python/C rolling hash implementation to generate a series of unsigned integer values given a string of text and a window size. This conversion of strings to integer hash values allows for quick longest-common-substring identification between large sets of documents using Rabin-Karp searching.
string
similarity
plagiarism
hash
python
algorithm
yesterday by raphman
Copy this bookmark: