Abstract:
The paper presents a study of the optimal shapes for partitioning matrix
elements among three abstract heterogeneous processors performing matrix
multiplication. Using an abstract processor model makes the results applicable
to systems with different heterogeneous architectures. To determine the optimal
partitioning shape, the work uses the non-rectangular candidate shapes that
Ashley DeFlumere identified by applying the Push technique, which redistributes
matrix elements between the processors: Square Corner, Rectangle Corner,
Square Rectangle, Block Rectangle, L-Rectangle, and Traditional 1D Rectangular.
The optimality of the shapes is determined for four classes
of matrix multiplication algorithms: Serial Communication with Barrier (SCB),
Parallel Communication with Barrier (PCB), Serial Communication with Bulk Overlap
(SCO) and Parallel Communication with Overlap (PCO). The Hockney model is used
to estimate the communication cost of the algorithms. For each candidate shape
and each algorithm, the paper introduces a mathematical model of the execution
time. Based on these models, software was developed that selects the shape for
partitioning matrix elements among the processors, depending on the ratio of
their speeds and the latency of the transmission medium.
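
As a rough illustration of the Hockney model mentioned above, the sketch below
estimates the time to send one message as latency plus message size divided by
bandwidth. The function and parameter names are hypothetical and are not taken
from the paper. Treating serial communication as a sum of message times and
parallel communication as their maximum is a simplifying assumption made here
for illustration only, not the paper's execution-time model.

```python
def hockney_time(message_size, latency, bandwidth):
    """Hockney estimate for one point-to-point message.

    message_size: amount of data sent (e.g. bytes)
    latency:      fixed start-up cost per message (seconds)
    bandwidth:    data sent per second (same unit as message_size)
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return latency + message_size / bandwidth


def serial_comm_time(message_sizes, latency, bandwidth):
    # Assumption: messages are sent one after another, so their times add up.
    return sum(hockney_time(m, latency, bandwidth) for m in message_sizes)


def parallel_comm_time(message_sizes, latency, bandwidth):
    # Assumption: all messages are sent at once without contention,
    # so the longest message determines the total time.
    return max(
        (hockney_time(m, latency, bandwidth) for m in message_sizes),
        default=0.0,
    )
```

Under these assumptions, a call such as serial_comm_time([1000, 2000], 0.5, 1000)
adds the two message estimates, while parallel_comm_time with the same arguments
returns only the larger of the two.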