strings 1249
utf8everywhere
24 days ago by rafaeldff
Purpose of this document: to promote usage and support of the UTF-8 encoding, and to convince readers that it should be the default choice of encoding for storing text strings in memory or on disk, for communication, and for all other uses. We believe that all other encodings of Unicode (or text, in general) belong to rare edge cases of optimization and should be avoided by mainstream users.
utf
utf-8
Unicode
utf-16
Utf8
char
charset
character
codepage
ASCII
text
string
strings
programming
windows
encoding
[1204.3293] Efficiently decoding strings from their shingles
4 weeks ago by Vaguery
"Determining whether an unordered collection of overlapping substrings (called shingles) can be uniquely decoded into a consistent string is a problem that lies within the foundation of a broad assortment of disciplines ranging from networking and information theory through cryptography and even genetic engineering and linguistics. We present three perspectives on this problem: a graph theoretic framework due to Pevzner, an automata theoretic approach from our previous work, and a new insight that yields a time-optimal streaming algorithm for determining whether a string of $n$ characters over the alphabet $\Sigma$ can be uniquely decoded from its two-character shingles. Our algorithm achieves an overall time complexity $\Theta(n)$ and space complexity $O(|\Sigma|)$. As an application, we demonstrate how this algorithm can be extended to larger shingles for efficient string reconciliation."
strings
algorithms
computational-complexity
nudge-targets
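To make the question in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's linear-time streaming algorithm: it simply enumerates every string consistent with a bag of two-character shingles and checks whether exactly one exists, so it takes exponential time in the worst case. It assumes the shingles form a multiset (duplicates are counted), that the first character of the string is not known in advance, and that the string has at least two characters. The function names `shingles`, `decodings`, and `uniquely_decodable` are illustrative, not taken from the paper.

```python
from collections import Counter


def shingles(s, k=2):
    """Return the multiset of overlapping k-character substrings of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def decodings(bag):
    """Brute force: every string whose 2-character shingle multiset equals bag."""
    total = sum(bag.values())
    found = set()

    def extend(prefix, remaining, used):
        # All shingles consumed: prefix is a complete decoding.
        if used == total:
            found.add(prefix)
            return
        # Try each unused shingle that overlaps the last character so far.
        for sh in list(remaining):
            if remaining[sh] > 0 and sh[0] == prefix[-1]:
                remaining[sh] -= 1
                extend(prefix + sh[1], remaining, used + 1)
                remaining[sh] += 1  # backtrack

    # Assumption: the starting character is unknown, so try every candidate.
    for start in {sh[0] for sh in bag}:
        extend(start, Counter(bag), 0)
    return found


def uniquely_decodable(s):
    """True if s is the only string sharing its 2-character shingle multiset."""
    return decodings(shingles(s)) == {s}
```

For example, `"abcbdb"` and `"abdbcb"` share the shingles `ab`, `bc`, `cb`, `bd`, `db`, so neither can be uniquely decoded, whereas `"abc"` can.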