leetcode_187

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Solutions

  1. hash map

 straight forward with hash map.
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        if (s.size() < 10) return vector<string>();
        unordered_map<string, int> m;
        vector<string> res;
        for (int i = 0; i < s.size() - 9; i++) {
            string substr = s.substr(i, 10);
            m[substr]++;
            if (m[substr] == 2) res.push_back(substr);
        }
        return res;
    }
};
  1. hash map with bit representation

  2. Since there are only 4 possible characters in string, 20(2 bit per char) bits is the smallest number of bits that can represent a unique 10-char substring.

  3. A 32 bits integer is enough.

  4. When sliding the string, bits representation of the next substring can be quickly calculated by shifting the integer of the current substring.

  5. To faciliate the integer-subtring conversion, use 3 bits representation(char & 7) instead.

    • A = 0b1000001, B = 0b000010, C = 0b1000011, D = 0b1000100

      class Solution {
      public:
      vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<int, unsigned int> m;
        vector<string> res;
        unsigned int hash = 0;
        for (int i = 0; i < s.size(); i++)
        // hash & 0x3fffffff to make sure the first two bits are always 0.
            if (m[hash = hash << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1)
                res.push_back(s.substr(i - 9, 10));
        return res;
      }  
      };

Or replace the int value type to bool.

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<int, bool> m;
        vector<string> res;
        unsigned int hash = 0;
        for  (int i = 0; i < s.size(); i++) {
            hash = hash << 3 & 0x3FFFFFFF | s[i] & 7;
            if (m.find(hash) != m.end())
                if (m[hash]) {
                    m[hash] = false;
                    res.push_back(s.substr(i - 9, 10));
                }
            else
                m[hash] = true;
        }
        return res;
    }  
};
  1. rolling hash

Last updated

Was this helpful?