Streaming Regex with io.Reader : golang
I have a small application that scans for credit card numbers on a local system. Because it scans every file on the system regardless of size, I don't want to read a file's full contents into a `[]byte`. My current solution uses a ring buffer and looks at each byte as it is read to see if it matches the pattern, a sort of FSM.

This works pretty well, but the matching code is now so specific to credit card numbers that I'm having trouble extending it to match other types of sensitive data.

The easiest approach would be a regex implementation that can take an `io.Reader`. I've searched and don't think one exists, but I'd love to know if I missed something.

An alternative I considered is to keep using the ring buffer and run a regex each time a byte is read. This seems to work OK with https://ift.tt/2MRXgkL, since the pattern must always match at the first character and it bails out if the first byte doesn't match. It works less well with the regex package in the standard library, which runs much slower. Another downside of this approach is that I need to know the maximum length of input the regex could match in order to size the ring buffer. That is less of a problem, since most of my patterns will be non-recursive and relatively short.

A final approach I considered is to break the file up into larger chunks, say 1 MB, and run the full regex against each one. I would use the reader to fill a buffer and then run the regex against it. I could introduce some overlap between chunks so that fewer matches spanning two chunks are missed.

The main reason I'm using a reader is that I'm decompressing files on the fly, and I don't want to worry about whether the decompressed size will fit into memory.

Is there a better way of doing this? I like the idea of moving to regex because it makes introducing new patterns much easier, but I don't want to introduce false negatives or use significantly more memory.

via /r/golang
IFTTT  reddit  golang 
23 days ago by rtluckie
