jm + read-alignment   1

'Highly Sensitive Short Read Mapping with MapReduce'. Current state of the art in DNA sequence read-mapping algorithms.
CloudBurst uses well-known seed-and-extend algorithms to map reads to a reference genome. It can map reads with any number of differences or mismatches. [..] Given an exact seed, CloudBurst attempts to extend the alignment into an end-to-end alignment with at most k mismatches or differences by either counting mismatches of the two sequences, or with a dynamic programming algorithm to allow for gaps. CloudBurst uses [Hadoop] to catalog and extend the seeds. In the map phase, the map function emits all length-s k-mers from the reference sequences, and all non-overlapping length-s k-mers from the reads. In the shuffle phase, read and reference k-mers are brought together. In the reduce phase, the seeds are extended into end-to-end alignments. The power of MapReduce and CloudBurst is that the map and reduce functions run in parallel over dozens or hundreds of processors.
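
A rough sketch of the map / shuffle / reduce flow described above, in plain Python and in a single process rather than on Hadoop. This is not CloudBurst's code: the function names are made up for illustration, only the mismatch-counting extension is shown (no dynamic programming for gaps), and the seed length s is passed in by hand. One assumption worth stating: if a read of length m is cut into k+1 non-overlapping seeds of length s = m // (k+1), at least one seed must match exactly whenever the read aligns with at most k mismatches.

```python
from collections import defaultdict


def map_phase(reference, reads, s):
    """Emit (seed, record) pairs: every length-s k-mer of the reference,
    and the non-overlapping length-s k-mers of each read."""
    for pos in range(len(reference) - s + 1):
        yield reference[pos:pos + s], ("ref", pos)
    for read_id, read in reads.items():
        for off in range(0, len(read) - s + 1, s):
            yield read[off:off + s], ("read", read_id, off)


def shuffle_phase(pairs):
    """Group records by seed, standing in for the MapReduce shuffle."""
    groups = defaultdict(list)
    for seed, record in pairs:
        groups[seed].append(record)
    return groups


def reduce_phase(groups, reference, reads, k):
    """Extend each shared seed into an end-to-end alignment,
    keeping those with at most k mismatches."""
    hits = set()
    for records in groups.values():
        ref_positions = [r[1] for r in records if r[0] == "ref"]
        read_seeds = [(r[1], r[2]) for r in records if r[0] == "read"]
        for read_id, off in read_seeds:
            read = reads[read_id]
            for pos in ref_positions:
                start = pos - off  # where the whole read would begin
                if start < 0 or start + len(read) > len(reference):
                    continue
                window = reference[start:start + len(read)]
                mismatches = sum(a != b for a, b in zip(read, window))
                if mismatches <= k:
                    hits.add((read_id, start, mismatches))
    return sorted(hits)


def map_reads(reference, reads, k, s):
    """Run the three phases in sequence; returns (read_id, start, mismatches)."""
    groups = shuffle_phase(map_phase(reference, reads, s))
    return reduce_phase(groups, reference, reads, k)


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    reads = {"r1": "ACGTTAGC", "r2": "ACGAACGT"}
    for hit in map_reads(reference, reads, k=1, s=4):
        print(hit)
```

In a real MapReduce job each seed's group would go to a separate reducer, which is where the parallelism comes from; here the groups are simply looped over one after another.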

JM_SOUGHT -- the next generation ;)
bioinformatics  mapreduce  hadoop  read-alignment  dna  sequencing  sought  antispam  algorithms 
july 2012 by jm
