Optimization of the Brillouin operator on the KNL architecture

Stephan Dürr

doi:10.1051/epjconf/201817502001

Open Access

Issue		EPJ Web Conf. Volume 175, 2018 35th International Symposium on Lattice Field Theory (Lattice 2017)


Article Number		02001
Number of page(s)		8
Section		2 Algorithms and Machines
DOI		https://doi.org/10.1051/epjconf/201817502001
Published online		26 March 2018

EPJ Web of Conferences 175, 02001 (2018)
https://doi.org/10.1051/epjconf/201817502001

Stephan Dürr¹,²

¹ University of Wuppertal, Gaußstraße 20, D-42119 Wuppertal, Germany
² IAS/JSC, Forschungszentrum Jülich GmbH, D-52425 Jülich, Germany

Abstract

Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. These figures are obtained with N_c = 3 colors, N_v = 12 right-hand sides, and N_thr = 256 threads on lattices of size 32³ × 64, using exclusively OpenMP (OMP) pragmas. Interestingly, the same routine also performs quite well on Intel Core i7 architectures. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.
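To give a sense of the problem size behind these figures, the following minimal sketch estimates the memory occupied by one block of right-hand sides on the quoted lattice. It is an illustration only, not code from the article: the function name `spinor_block_bytes` is hypothetical, and the 4 spin components per site are an assumption (the standard four-dimensional Dirac spinor) that the abstract does not state explicitly.

```python
from math import prod


def spinor_block_bytes(dims, n_color, n_spin, n_rhs, bytes_per_real):
    """Bytes needed to store n_rhs complex spinor fields on a lattice.

    Hypothetical helper for illustration; n_spin = 4 is an assumption
    (standard Dirac spinor), not a number quoted in the abstract.
    """
    sites = prod(dims)                           # number of lattice sites
    complex_per_site = n_color * n_spin * n_rhs  # complex numbers per site
    return sites * complex_per_site * 2 * bytes_per_real  # re + im parts


if __name__ == "__main__":
    dims = (32, 32, 32, 64)  # lattice size 32^3 x 64 from the abstract
    for label, width in (("single", 4), ("double", 8)):
        size = spinor_block_bytes(dims, n_color=3, n_spin=4,
                                  n_rhs=12, bytes_per_real=width)
        print(f"{label}: {size / 2**30:.2f} GiB")
```

Under these assumptions, one block of 12 right-hand sides occupies 2.25 GiB in single precision and 4.50 GiB in double precision. A matrix-times-vector application reads one such block and writes another.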

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
