Parallel sparse LU decomposition using FPGA with an efficient cache architecture
X Ge, H Zhu, F Yang, L Wang, X Zeng
2017 IEEE 12th International Conference on ASIC (ASICON), 2017 - ieeexplore.ieee.org
LU decomposition is widely used in numerical analysis and engineering to solve large-scale sparse linear equations. The complex data dependencies make LU decomposition difficult to parallelize. In this paper, an architecture with an efficient cache for parallel sparse LU decomposition on FPGA is proposed. The proposed architecture is based on the Gilbert-Peierls (G-P) algorithm. By using the elimination graph, we find the column dependencies of the LU decomposition, which makes it possible to exploit parallelism. Through a dependency table, a simple but efficient cache strategy and its corresponding architecture are proposed. The proposed cache strategy avoids cache misses and reduces the size of the cache needed to store all the intermediate data on chip. Experiments demonstrate that our design achieves speedups of 2.85x to 10.27x compared with UMFPACK running on general-purpose processors, and the cache size can be reduced by 50.93% on average with the proposed cache strategy.
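The abstract's core parallelization idea is that once column dependencies of the factorization are known (here, extracted from the elimination graph), independent columns can be factorized concurrently. A common way to express this is levelization: assign each column a level one greater than the maximum level of the columns it depends on, so all columns in a level are mutually independent. The sketch below is a minimal illustration of that general idea under an assumed dependency-table format (`deps[j]` = set of earlier columns that column j depends on); it is not the paper's FPGA implementation, and the toy dependency table is invented for illustration.

```python
# Minimal sketch of column levelization for parallel sparse LU.
# Assumption: deps[j] lists columns k < j whose updates reach column j
# (as would be read off an elimination graph / dependency table).

def levelize(deps, n):
    """Group columns 0..n-1 into levels of mutually independent columns."""
    level = [0] * n
    for j in range(n):
        # a column sits one level above the deepest column it depends on
        for k in deps.get(j, ()):
            level[j] = max(level[j], level[k] + 1)
    groups = {}
    for j, lv in enumerate(level):
        groups.setdefault(lv, []).append(j)
    # columns within one returned group can be factorized in parallel
    return [groups[lv] for lv in sorted(groups)]

# toy dependency table: column 1 depends on 0; column 3 on 1 and 2
print(levelize({1: {0}, 3: {1, 2}}, 4))  # → [[0, 2], [1], [3]]
```

Columns 0 and 2 have no dependencies and form the first parallel level; column 3 must wait for both 1 and 2, giving three sequential levels in total.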