Summary: The dynamic time warping (DTW) algorithm is widely used to determine time series similarity search. As DTW has quadratic time complexity, the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. In this paper, we present a parallel approach for DTW on a heterogeneous platform with a graphics processing unit (GPU). In order to exploit fine-grained data-level parallelism, we propose a specific parallel decomposition in DTW. Furthermore, we introduce an optimization technique called diamond tiling to improve the utilization of threads. Results show that our approach substantially reduces computational time.