At the cost of security everywhere, Google dorking is still a thing | Ars Technica

27 days ago by jm

I'd never heard of this term!

dorking
google
security
searching
web

The Bkd Tree

january 2016 by jm

good explanation of this new data structure for searching multidimensional data

search
lucene
bkd-trees
searching
data-structures

Efficient substring searching

march 2014 by jm

This is a couple of years old, but I like this:

'Turbo Boyer-Moore is disappointing, its name doesn’t do it justice. In academia constant overhead doesn’t matter, but here we see that it matters a lot in practice. Turbo Boyer-Moore’s inner loop is so complex that we think we’re better off using the original Boyer-Moore.'

A good demo of how large values of O(n) can be slower than small values of O(mn).

algorithms
search
strings
coding
big-o
string-search
searching

How the search for flight AF447 used Bayesian inference

march 2014 by jm

Via jgc, the search for the downed Air France flight was optimized using this technique:

'Metron’s approach to this search planning problem is rooted in classical Bayesian inference, which allows organization of available data with associated uncertainties and computation of the Probability Distribution Function (PDF) for target location given these data. In following this approach, the first step was to gather the available information about the location of the impact site of the aircraft. This information was sometimes contradictory and filled with ambiguities and uncertainties. Using a Bayesian approach we organized this material into consistent scenarios, quantified the uncertainties with probability distributions, weighted the relative likelihood of each scenario, and performed a simulation to produce a prior PDF for the location of the wreck.'
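
A rough sketch of the idea in Python, not Metron's actual code: each scenario is a probability distribution over grid cells, the scenarios are blended by their relative weights into one prior, and the prior is then updated with Bayes' rule after a search that finds nothing. The cell names, the weights and the `p_detect` value are made up for illustration, and the update step is the textbook Bayesian search rule rather than anything stated in the quote.

```python
def mix_scenarios(scenarios, weights):
    """Blend per-scenario location distributions into a single prior.

    scenarios: list of dicts mapping cell -> probability (each sums to 1)
    weights:   relative likelihood of each scenario (need not sum to 1)
    """
    total = sum(weights)
    prior = {}
    for dist, weight in zip(scenarios, weights):
        for cell, p in dist.items():
            prior[cell] = prior.get(cell, 0.0) + (weight / total) * p
    return prior


def update_after_failed_search(prior, searched, p_detect):
    """Posterior over cells after searching `searched` and finding nothing.

    p_detect is the assumed chance of spotting the wreck in a searched cell
    if it really is there. Searched cells are scaled by (1 - p_detect) and
    the result is renormalised.
    """
    unnormalised = {
        cell: p * ((1.0 - p_detect) if cell in searched else 1.0)
        for cell, p in prior.items()
    }
    z = sum(unnormalised.values())
    if z == 0:
        raise ValueError("no probability mass left; check p_detect and prior")
    return {cell: p / z for cell, p in unnormalised.items()}


# Hypothetical example: two scenarios over three cells, weighted 3:1.
scenarios = [{"A": 0.5, "B": 0.5}, {"B": 0.2, "C": 0.8}]
prior = mix_scenarios(scenarios, weights=[3, 1])
posterior = update_after_failed_search(prior, searched={"B"}, p_detect=0.9)
```

Searching the most likely cell without success shifts the probability mass to the cells that have not been searched yet, which is what tells you where to look next.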

metron
bayes
bayesian-inference
machine-learning
statistics
via:jgc
air-france
disasters
probability
inference
searching

feedback loop n-gram analyzer

september 2011 by jm

'a simple parser of ARF compliant FBL complaints, which normalizes the email complaints and generates a 6-tuple n-gram version of the message. These n-grams are stored in a Redis database, keyed by the file in which they can be found. An inverse index also exists that allow you to find all messages containing a particular n-gram word.'
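
A minimal sketch of the two indexes described there, assuming "6-tuple n-gram" means runs of six consecutive words. Plain Python dicts stand in for Redis here, and the class and method names are hypothetical rather than taken from the tool itself.

```python
import re
from collections import defaultdict


def word_ngrams(text, n=6):
    """Return the set of n-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


class NgramIndex:
    """In-memory stand-in for the forward and inverse n-gram indexes."""

    def __init__(self, n=6):
        self.n = n
        self.by_file = {}                 # file name -> set of n-grams
        self.by_ngram = defaultdict(set)  # n-gram -> set of file names

    def add(self, filename, text):
        grams = word_ngrams(text, self.n)
        self.by_file[filename] = grams
        for gram in grams:
            self.by_ngram[gram].add(filename)

    def files_containing(self, gram):
        """Inverse lookup: every file whose message contains this n-gram."""
        return set(self.by_ngram.get(gram, ()))

    def similarity(self, a, b):
        """Jaccard overlap between the n-gram sets of two indexed files."""
        ga, gb = self.by_file[a], self.by_file[b]
        union = ga | gb
        return len(ga & gb) / len(union) if union else 0.0


# Hypothetical example, using 3-grams so the short messages still overlap.
idx = NgramIndex(n=3)
idx.add("a.eml", "Buy cheap pills online today now")
idx.add("b.eml", "buy cheap pills online tomorrow")
shared = idx.files_containing("cheap pills online")
score = idx.similarity("a.eml", "b.eml")
```

With Redis, each of the two dicts would presumably map onto a set per key, but that mapping is a guess and not something the description spells out.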

anti-spam
spam
fbl
feedback
filtering
n-grams
similarity
hashing
redis
searching

Dutch grepping Facebook for welfare fraud

september 2011 by jm

'The [Dutch] councils are working with a specialist Amsterdam research firm, using the type of computer software previously deployed only in counterterrorism, monitoring [LinkedIn, Facebook and Twitter] traffic for keywords and cross-referencing any suspicious information with digital lists of social welfare recipients.

Among the giveaway terms, apparently, are “holiday” and “new car”. If the automated software finds a match between one of these terms and a person claiming social welfare payments, the information is passed on to investigators to gather real-life evidence.' With a 30% false positive rate, apparently -- let's hope those investigations aren't too intrusive!

grep
dutch
holland
via:tjmcintyre
privacy
facebook
twitter
linkedin
welfare
dole
fraud
false-positives
searching
