leetcode_187
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Solutions
hash map
hash map with bit representation
Since there are only
4
possible characters in string,20
(2 bit per char) bits is the smallest number of bits that can represent a unique10-char
substring.A 32 bits
integer
is enough.When sliding the string, bits representation of the next substring can be quickly calculated by
shifting
the integer of the current substring.To faciliate the integer-subtring conversion, use
3 bits
representation(char & 7
) instead.A = 0b1000001, B = 0b000010, C = 0b1000011, D = 0b1000100
Or replace the int
value type to bool
.
rolling hash
Last updated
Was this helpful?