How to Get Longest Common Subsequence in Python

  1. Understanding the Longest Common Subsequence
  2. Method 1: Dynamic Programming Approach
  3. Method 2: Recursive Approach with Memoization
  4. Conclusion
  5. FAQ
How to Get Longest Common Subsequence in Python

Finding the longest common subsequence (LCS) between two sequences is a classic problem in computer science, particularly in the fields of bioinformatics, text processing, and version control systems like Git. The LCS is the longest sequence that can be derived from both original sequences by deleting some elements without changing the order of the remaining elements.

In this tutorial, we’ll explore how to implement an efficient algorithm to find the LCS using Python. Whether you’re dealing with strings, lists, or any other sequences, this guide will provide you with the tools to tackle this problem effectively.

Understanding the Longest Common Subsequence

Before diving into the code, it’s essential to grasp the concept of the longest common subsequence. Given two sequences, the LCS is a subsequence that appears in both sequences in the same order. For example, for sequences “ABCBDAB” and “BDCAB”, the LCS is “BCAB”.

The LCS problem can be solved using dynamic programming, which allows us to build a solution incrementally and store intermediate results to avoid redundant calculations. Let’s explore how to implement this in Python.

Method 1: Dynamic Programming Approach

The dynamic programming approach is one of the most efficient methods to find the LCS. It involves creating a 2D table to store lengths of longest common subsequences for substrings of both sequences.

Here’s how you can implement it in Python:

def longest_common_subsequence(seq1, seq2):
    m, n = len(seq1), len(seq2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if seq1[i - 1] == seq2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    lcs_length = dp[m][n]
    
    lcs = []
    while m > 0 and n > 0:
        if seq1[m - 1] == seq2[n - 1]:
            lcs.append(seq1[m - 1])
            m -= 1
            n -= 1
        elif dp[m - 1][n] > dp[m][n - 1]:
            m -= 1
        else:
            n -= 1
            
    return ''.join(reversed(lcs)), lcs_length

# Example usage
seq1 = "ABCBDAB"
seq2 = "BDCAB"
lcs, length = longest_common_subsequence(seq1, seq2)
print("Longest Common Subsequence:", lcs)
print("Length of LCS:", length)

Output:

Longest Common Subsequence: BCAB
Length of LCS: 4

In this code, we first initialize a 2D array dp where dp[i][j] holds the length of the LCS for the substrings seq1[0:i] and seq2[0:j]. We iterate through both sequences, updating the dp table based on whether the characters match or not. Finally, we backtrack through the dp table to construct the LCS string, which we return along with its length.

Method 2: Recursive Approach with Memoization

Another way to solve the LCS problem is through recursion combined with memoization. This method is less efficient than dynamic programming but can be easier to understand for those familiar with recursive functions.

Here’s how you can implement this approach in Python:

def lcs_recursive(seq1, seq2, m, n, memo):
    if m == 0 or n == 0:
        return 0
    
    if (m, n) in memo:
        return memo[(m, n)]
    
    if seq1[m - 1] == seq2[n - 1]:
        memo[(m, n)] = 1 + lcs_recursive(seq1, seq2, m - 1, n - 1, memo)
    else:
        memo[(m, n)] = max(lcs_recursive(seq1, seq2, m - 1, n, memo),
                           lcs_recursive(seq1, seq2, m, n - 1, memo))
    
    return memo[(m, n)]

def longest_common_subsequence_recursive(seq1, seq2):
    memo = {}
    length = lcs_recursive(seq1, seq2, len(seq1), len(seq2), memo)
    return length

# Example usage
seq1 = "ABCBDAB"
seq2 = "BDCAB"
length = longest_common_subsequence_recursive(seq1, seq2)
print("Length of LCS:", length)

Output:

Length of LCS: 4

In this implementation, we use a recursive function lcs_recursive that checks the last characters of the sequences. If they match, we add one to the result and call the function for the remaining characters. If they don’t match, we take the maximum result from either ignoring the last character of seq1 or seq2. The memoization dictionary stores previously computed results for efficiency.

Conclusion

Finding the longest common subsequence in Python can be achieved through various methods, with dynamic programming and recursion being the most common. The dynamic programming approach is generally more efficient, while recursion with memoization offers a more intuitive understanding of the problem. By mastering these techniques, you can tackle a wide range of problems in programming, from text analysis to version control systems.

FAQ

  1. What is the longest common subsequence?
    The longest common subsequence is the longest sequence that can be derived from two sequences by deleting some elements without changing the order of the remaining elements.

  2. Which algorithm is best for finding the LCS?
    The dynamic programming algorithm is generally the most efficient for finding the longest common subsequence, as it avoids redundant calculations.

  3. Can LCS be applied to strings of different lengths?
    Yes, the longest common subsequence algorithm works for strings or sequences of any length.

  4. What are some practical applications of LCS?
    LCS is used in various fields such as bioinformatics for DNA sequencing, text comparison tools, and version control systems like Git.

  5. Is there a simpler method to find LCS?
    While dynamic programming is the most efficient, simpler methods like brute-force can be used for small sequences, though they are not practical for larger datasets.

Enjoying our tutorials? Subscribe to DelftStack on YouTube to support us in creating more high-quality video guides. Subscribe
Rohan Timalsina avatar Rohan Timalsina avatar

Rohan is a learner, problem solver, and web developer. He loves to write and share his understanding.

LinkedIn Website