Symbolically generated GPU-based LBM

Experimental generation of OpenCL kernels using SymPy, Mako and PyOpenCL.

Implements a straightforward AB pattern (two population buffers that are alternately read and written)
All memory offsets are statically resolved
Underlying symbolic formulation is optimized using CSE
Characteristic constants of D2Q9 and D3Q27 are transparently recovered using only discrete velocities
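
The last point can be illustrated with a small SymPy sketch. It is not taken from the project. It assumes that the constants in question are the lattice weights and the squared lattice speed of sound, and that they follow from requiring the velocity moments up to fourth order to be isotropic. All names (`velocities`, `moment`, `c_s2`) are hypothetical.

```python
import itertools
import sympy

# D2Q9 discrete velocities: all combinations of {-1, 0, 1} in two dimensions
velocities = list(itertools.product([-1, 0, 1], repeat=2))

# ASSUMPTION: one unknown weight per squared velocity norm (0, 1, 2),
# plus the squared lattice speed of sound as a fourth unknown
w = {n: sympy.Symbol(f'w_{n}') for n in (0, 1, 2)}
cs2 = sympy.Symbol('c_s2')

def moment(f):
    """Weighted sum of f(cx, cy) over all discrete velocities."""
    return sum(w[cx*cx + cy*cy] * f(cx, cy) for cx, cy in velocities)

# Isotropy conditions on the zeroth, second and fourth order moments
equations = [
    sympy.Eq(moment(lambda cx, cy: 1), 1),
    sympy.Eq(moment(lambda cx, cy: cx**2), cs2),
    sympy.Eq(moment(lambda cx, cy: cx**4), 3 * cs2**2),
    sympy.Eq(moment(lambda cx, cy: cx**2 * cy**2), cs2**2),
]

solutions = sympy.solve(equations, [*w.values(), cs2], dict=True)

# Discard the degenerate solution with zero speed of sound
physical = [s for s in solutions if s[cs2] != 0][0]

for symbol, value in physical.items():
    print(symbol, '=', value)
```

For D2Q9 the non-degenerate solution consists of the familiar weights 4/9, 1/9 and 1/36 together with a squared speed of sound of 1/3. The only lattice-specific input is the list of velocities.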

This code was originally written for a seminar report on the advantages of symbolic generation of LBM implementations.

The source is available on GitHub and cgit alongside detailed benchmarks on both an older Nvidia K2200 and a more recent Nvidia Tesla P100. All in all, the results of the CSE-optimized versions are surprisingly good, reaching up to 99.8% of the theoretical upper performance limit.
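
For readers unfamiliar with common subexpression elimination (CSE), the following is a minimal SymPy sketch of the idea. The two expressions are hypothetical stand-ins, not the project's actual collision terms. They only share a subterm in the way the populations of a real collision step do.

```python
import sympy

rho, u_x, u_y = sympy.symbols('rho u_x u_y')

# Hypothetical stand-ins for two collision terms sharing (u_x + u_y)**2
shared = (u_x + u_y)**2
exprs = [
    rho * (1 + 3*u_x + 3*u_y + shared),
    rho * (1 - 3*u_x - 3*u_y + shared),
]

# cse returns a list of (symbol, subexpression) pairs and the input
# expressions rewritten in terms of those symbols
replacements, reduced = sympy.cse(exprs)

def restore(expr):
    """Substitute the extracted subexpressions back, last one first."""
    for symbol, subexpr in reversed(replacements):
        expr = expr.subs(symbol, subexpr)
    return expr

for symbol, subexpr in replacements:
    print(f'{symbol} = {subexpr}')
for i, expr in enumerate(reduced):
    print(f'f_{i} = {expr}')
```

In a generated kernel, each extracted pair would become one temporary that is computed once and reused by every expression that refers to it.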

Performance

Theoretical maximum performance on tested hardware:

GPU	Bandwidth	D2Q9		D3Q19		D3Q27
		single	double	single	double	single	double
K2200	63.2 GiB/s	893	459	435	220	308	156
P100	512.6 GiB/s	7242	3719	3528	1787	2502	1262
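
The page does not state how these limits are derived or which unit they use. The sketch below reproduces all twelve values under two assumptions that are inferred from the numbers, not confirmed by the source: each cell update reads and writes all q populations and moves 4 additional bytes (for example a per-cell integer), and performance is given in million lattice updates per second. The function name and parameters are hypothetical.

```python
GIB = 2**30  # bytes per GiB

def theoretical_mlups(bandwidth_gib_s, q, value_bytes, extra_bytes=4):
    """Bandwidth-bound update rate in million lattice updates per second.

    ASSUMPTION: one update reads and writes q populations of value_bytes
    each and moves extra_bytes of additional per-cell data.
    """
    bytes_per_update = 2 * q * value_bytes + extra_bytes
    return bandwidth_gib_s * GIB / bytes_per_update / 1e6

# Bandwidths as listed in the table above
for gpu, bandwidth in (('K2200', 63.2), ('P100', 512.6)):
    for q in (9, 19, 27):
        single = round(theoretical_mlups(bandwidth, q, 4))
        double = round(theoretical_mlups(bandwidth, q, 8))
        print(gpu, f'Q{q}', single, double)
```

Dividing a measured value by the matching limit gives the relative figures, for example 843.4 / 893 ≈ 94.4% for D2Q9 in single precision on the K2200.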

Maximum measured performance…

GPU	D2Q9		D3Q19		D3Q27
	single	double	single	double	single	double
K2200	843.4	326.4	423.2	163.8	303.0	116.0
P100	6957.4	3585.0	3420.2	1763.8	2374.6	1259.6

…relative to theoretical maximum

GPU	D2Q9		D3Q19		D3Q27
	single	double	single	double	single	double
K2200	94.4%	71.1%	97.3%	74.5%	98.4%	74.4%
P100	96.1%	96.4%	96.9%	98.7%	94.9%	99.8%

For more details see the result/ and notebook/ directories.
