Toby St Clere Smithe, who I mentored during the Google Summer of Code 2013, released PyViennaCL 1.0.0 today. PyViennaCL provides the Python bindings for the ViennaCL linear algebra and numerical computation library for general purpose computations on massively parallel hardware such as graphics processing units (GPUs) and other heterogeneous systems. ViennaCL itself is a header-only C++ library, so these bindings make available to Python programmers ViennaCL’s fast OpenCL and CUDA algorithms, in a way that is idiomatic and compatible with the Python community’s most popular scientific packages, NumPy and SciPy.

Let us have a quick look at how PyViennaCL is used. For simplicity, I assume that the underlying OpenCL device is a GPU, so all objects with data stored on the GPU is prefixed accordingly.

>>> import pyviennacl >>> import numpy >>> x = [1.0, 2.0, 3.0] #define a simple array >>> A = numpy.array([[1.0, 2.0, 3.0], #define a matrix using numpy [0.0, 3.0, 4.0], [0.0, 0.0, 5.0]]) >>> # Create a vector on the GPU holding the entries of x >>> gpu_x = pyviennacl.Vector(x) >>> # Create a matrix on the GPU holding the entries of A >>> gpu_A = pyviennacl.Matrix(A)

Copying data back and forth between the GPU is a pretty expensive operation, hence PyViennaCL requires you to be explicit about it when creating the respective objects.

Since the bandwidth of PCI-Express can be more than ten-fold smaller than that of high-end GPUs, you really want to have explicit control at this point.

As soon as the data is transferred over, you can start manipulating the PyViennaCL objects pretty much like you would do with numpy objects:

>>> gpu_y = gpu_A * gpu_x # matrix-vector product, lazy! >>> gpu_z = pyviennacl.sin(gpu_y) # element-wise sine

Note that operations are lazy, i.e. gpu_y and gpu_z only hold the expressions for evaluation.

You can of course output the results straight away:

>>> gpu_y array([ 14., 18., 15.]) >>> gpu_z array([ 0.99060736, -0.75098725, 0.65028784])

but on closer inspection you will see that they indeed hold the expressions:

>>> type(gpu_y) <class 'pyviennacl.pycore.Mul'> >>> type(gpu_z) <class 'pyviennacl.pycore.ElementSin'>

Thus, the expression gets evaluated automatically if you either print the object, or if you copy the data back to the host:

>>> y = gpu_y.value # trigger evaluation and GPU -> host data copy >>> type(y) <type 'numpy.ndarray'> # y is now a 1-D numpy array with dtype float64 >>> y array([ 14., 18., 15.])

If you want to trigger the computation explicitly, call the explicit() member function:

>>> gpu_z = gpu_y.execute() # trigger computation explicitly >>> type(gpu_z) <class 'pyviennacl.pycore.Vector'> # this is no longer an expression >>> gpu_z array([ 14., 18., 15.])

There are a lot of operations you can play with, which would certainly exceed the space available here:

- Various products: inner products, outer-products, matrix-vector products, matrix-matrix products
- Direct manipulation of subvectors and submatrices
- Dense triangular solvers
- Sparse matrices
- Iterative solvers

Since everything is along the spirit of NumPy, the syntax is pretty natural and self-explaining thanks to operator overloading. Very well done, Toby!

I'm happily looking forward to future extensions of PyViennaCL. Although performance is already pretty good for large problem sizes, there are a couple of places where both PyViennaCL and the underlying ViennaCL core can further increase in efficiency. Any input on how to further improve both libraries is of course very welcome 🙂

Pingback: Mentored Project Ideas for GSoC 2014 | Karl Rupp