Introduction

In this project, we use OpenCL to implement and benchmark various parallel implementations of the Needleman-Wunsch algorithm to compute edit distances between two sequences of strings.


Definition

An edit is defined as a single-character substitution, insertion, or deletion when comparing two string sequences. An edit distance measures the number of edits needed to go from one string sequence to another.

edits

Application

Efficient computation of edit distances between string sequences has many applications in computational biology, machine translation, speech recognition, and malware detection. For example, we can use edit distances to measure the similarities between two DNA molecules, which typically consist of billions of base pairs. Here the problem is particularly computationally intensive with the standard dynammic programming algorithm.

Our work draws inspiration from the Needleman-Wunsch algorithm. Since each iteration in the algorithm is not computationally intensive by itself, parallel algorithms with GPUs are particularly suited to this problem. We develop several versions of parallel algorithms implemented in OpenCL to compute edit distances between long string sequences.