Files
2020-08-09 00:39:24 +08:00

25 lines
1.4 KiB
Markdown
Executable File
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# [187. Repeated DNA Sequences](https://leetcode.com/problems/repeated-dna-sequences/)
## 题目
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
**Example:**
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
## 题目大意
所有 DNA 由一系列缩写为 ACG 和 T 的核苷酸组成例如“ACGAATTCCG”。在研究 DNA 时,识别 DNA 中的重复序列有时会对研究非常有帮助。编写一个函数来查找 DNA 分子中所有出现超多一次的10个字母长的序列子串
## 解题思路
- 这一题不用位运算比较好做,维护一个长度为 10 的字符串,在 map 中出现次数 > 1 就输出。
- 用位运算想做这一题,需要动态的维护长度为 10 的 hashkey先计算开头长度为 9 的 hash在往后面扫描的过程中如果长度超过了 10 ,就移除 hash 开头的一个字符,加入后面一个字符。具体做法是先将 ATCG 变成 00011011 的编码,那么长度为 10 hashkey 就需要维护在 20 位。mask = 0xFFFFF 就是 20 位的。维护了 hashkey 以后,根据这个 hashkey 进行去重和统计频次。