Hybrid MPI + OpenMP algorithm for reordering symmetric sparse matrices and its application to solving SLAE

Bibliographic record details
Title: Hybrid MPI + OpenMP algorithm for reordering symmetric sparse matrices and its application to solving SLAE
Publisher information: Проблемы информатики, 2022.
Publication year: 2022
Subject terms: nested dissection method, sparse matrix solve, distributed-memory parallel algorithms, sparse matrix reordering, parallel algorithms, sparse SLAE solution, nested dissection ordering
Description: Systems of linear equations (SLAE) with a large sparse matrix arise from numerical simulations in various scientific and engineering applications. SLAE solution is one of the most time- and memory-consuming stages of modeling, which requires efficient implementation on modern supercomputers. Depending on the properties of the matrix, direct or iterative methods are applied to solve such systems. Direct methods are based on factorizing the matrix into two triangular matrices and solving the resulting triangular systems. A feature of direct methods is that the number of non-zero elements can grow significantly during the factorization step compared to the initial matrix. Symmetric row and column reordering is applied before numerical factorization to minimize this fill-in. A suitable permutation reduces the memory needed to store the factor and the time required for the most expensive stage, numerical factorization. It is therefore important to develop new parallel reordering algorithms that reduce the total time for solving SLAEs with a sparse matrix and, hence, the time of numerical simulation in applications. The problem of finding the ordering that minimizes the factor fill-in is NP-hard. In practice, two heuristic approaches are commonly used: the minimum degree and nested dissection algorithms. The nested dissection algorithm is built on the divide-and-conquer principle. At each step of the algorithm, a graph separator is found. The separator vertices divide the graph into two disconnected subgraphs of similar size. The separator vertices are then numbered and removed from the graph, and the algorithm operates recursively on the new subgraphs. The process stops when all vertices are numbered. There are many modifications of the nested dissection method that differ in the algorithm for finding the separator. Since 1993, modifications of the nested dissection algorithm employing the multilevel approach have been used in parallel computations.
Most sparse SLAE solvers have built-in implementations of reordering methods, as well as an interface for third-party libraries and user-supplied permutations. The open-source libraries ParMETIS and PT-Scotch are widely used in the academic community. They implement a multilevel nested dissection algorithm for distributed-memory systems using MPI. In particular, the open academic direct solver MUMPS has an interface for using these libraries. The ParMETIS library also has a version for shared-memory systems called mt-metis. The commercial solver Intel MKL PARDISO and its cluster version, Intel Cluster PARDISO, are examples of tight integration of the solver and the ordering routine. They include optimized versions of the METIS library designed for shared-memory and distributed-memory systems, but their approaches to optimizing and combining shared- and distributed-memory computations during reordering have not been published. Earlier, we presented the PMORSy and DMORSy reordering libraries, which implement the multilevel nested dissection method for shared- and distributed-memory systems, respectively. The ordering algorithm in PMORSy is based on parallel processing of a task queue. A task is to find the separator in the current subgraph. After a separator is computed, the new subgraphs go into the shared task queue, from which they are assigned for execution to free threads. The algorithm is implemented using OpenMP. The parallel algorithm in DMORSy is based on parallel processing of the binary tree constructed during the nested dissection method. Let the original graph be distributed among P processes. Its separator is then calculated in parallel by all P processes using a multilevel algorithm similar to that used in PT-Scotch. After the separator is found, the new subgraphs are redistributed to P / 2 processes each, and the ordering continues independently on the new subgraphs. The algorithm is implemented using MPI.
In this paper, we propose a hybrid MPI + OpenMP parallel algorithm for distributed-memory systems that combines the use of processes and threads within a single computing node. The algorithm is constructed as follows. While the input graph is distributed among P > 1 processes, its separator is calculated in parallel by all P processes using the multilevel algorithm from DMORSy. Once an input subgraph is stored on a single process, the ordering is performed using a parallel task queue according to the algorithm from PMORSy. This combination of parallelization schemes inherits their respective advantages: the scalability of the distributed-memory algorithm and the smaller factor fill-in obtained by the shared-memory algorithm. We show the competitiveness of the implementation in comparison with analogs in terms of ordering time and factor fill-in. We test our implementation on 37 symmetric matrices from the SuiteSparse Matrix Collection and the LOGOS-FEM Collection. Computational experiments show that the hybrid parallelization scheme reduces the reordering run-time by 5 % on average compared with the pure MPI DMORSy parallelization when working on two computational nodes. In comparison with ParMETIS, hybrid DMORSy works faster on most matrices: the average advantage is 2 times for matrices over 1 million rows and 4.9 times for matrices under 1 million rows. Moreover, hybrid DMORSy produces orderings with 8 % smaller fill-in on average than ParMETIS for 2/3 of the test matrices. Further, we tested the obtained permutations for solving the SLAEs in a series of experiments with the open-source solver MUMPS. Using the hybrid DMORSy permutations reduced the run-time of the solver on 26 of the 37 matrices compared with the permutations from ParMETIS (by 26 % on average). For most of these matrices, the advantage comes both from reducing the ordering time and from reducing the numerical factorization time.
Document type: Research
DOI: 10.24412/2073-0667-2022-1-28-41
Rights: CC BY
Accession number: edsair.doi...........f0b93914b906efe9c94206b1c6d60d50
Database: OpenAIRE