子序列是,一个字符串中的任意字符组成的序列,重点在于,不要求子序列是原字符串的连续序列。
如下例子所示,acg
是abcdefg
的子序列,但不是连续子序列。
abcdefg ==> acg
两个字符串的最大公共子序列的状态转移方程式如下:
dp[i][j]={max{dp[i−1][j],dp[i][j−1]}if s1[i]!=s2[j]dp[i][j]+1if s1[i] =s2[j]dp[i][j]= \begin{cases} max\{dp[i-1][j],dp[i][j-1]\}& \text{if s1[i]!=s2[j]}\\ dp[i][j]+1& \text{if s1[i] =s2[j]} \end{cases}dp[i][j]={max{dp[i−1][j],dp[i][j−1]}dp[i][j]+1if s1[i]!=s2[j]if s1[i] =s2[j]
实现的代码如下所示:
def longestCommonSubsequence(text1: str, text2: str) -> int:
m = len(text1)
n = len(text2)
if m * n == 0:
return 0
dp = [[0]*(n+1) for _ in range(m+1)]
for i in range(1, m+1):
for j in range(1, n+1):
if text1[i-1] != text2[j-1]:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
else:
dp[i][j] = dp[i-1][j-1] + 1
return dp[-1][-1]
最大公共子串
子串则是严格连续的字符串,具体的例子可以如下所示:
abcdefg ==> abc
则可以认为 abc
为abcedfg
的子串,但是aeg
不是。
两个字符串的最大公共子串的状态转移方程为:
dp[i][j]={0if s1[i]!=s2[j]dp[i][j]+1if s1[i] =s2[j]dp[i][j]= \begin{cases} 0& \text{if s1[i]!=s2[j]}\\ dp[i][j]+1& \text{if s1[i] =s2[j]} \end{cases}dp[i][j]={0dp[i][j]+1if s1[i]!=s2[j]if s1[i] =s2[j]
具体的python代码为:
def findLength(A: List[int], B: List[int]) -> int:
m =len(A)
n = len(A)
if m * n == 0:
return 0
max_len = 0
dp = [[0]*(n+1) for _ in range(m+1)]
for i in range(1, m+1):
for j in range(1, n+1):
if A[i-1] != B[j-1]:
dp[i][j] = 0
else:
dp[i][j] = dp[i-1][j-1] + 1
max_len = max(max_len, dp[i][j])
return max_len
编辑距离
编辑距离是针对,两个字符串A,B进行计算将A转变为B所需的最小操作次数。
一次操作可以是插入
,删除
,替换
两个字符串的编辑距离的状态转移方程为:
dp[i][j]={max{dp[i−1][j]+1,dp[i][j−1]+1,dp[i−1][j−1]+1}if s1[i]!=s2[j]max{dp[i−1][j]+1,dp[i][j−1]+1,dp[i−1][j−1]}if s1[i] =s2[j]dp[i][j]= \begin{cases} max\{dp[i-1][j]+1,dp[i][j-1]+1,dp[i-1][j-1]+1\}& \text{if s1[i]!=s2[j]}\\ max\{dp[i-1][j]+1,dp[i][j-1]+1,dp[i-1][j-1]\}& \text{if s1[i] =s2[j]} \end{cases}dp[i][j]={max{dp[i−1][j]+1,dp[i][j−1]+1,dp[i−1][j−1]+1}max{dp[i−1][j]+1,dp[i][j−1]+1,dp[i−1][j−1]}if s1[i]!=s2[j]if s1[i] =s2[j]
具体的python代码如下:
def edit_distance(word1,word2):
n = len(word1)
m = len(word2)
if n*m == 0: #有一个字符串为空
return n+m
d = [[0]*(m+1) for _ in range(n+1)]
for i in range(n+1):
d[i][0] = i
for j in range(m+1):
d[0][j] = j
for i in range(1,n+1):
for j in range(1,m+1):
right = d[i][j-1] + 1 # 插入
down = d[i-1][j] + 1 # 删除
right_down = d[i-1][j-1] # 替换
if word1[i-1] != word2[j-1]:
right_down += 1
d[i][j] = min(right, down, right_down)
return d[n][m]