Cooking GPU
Let's play with GPUs for computationally massive projects!
Tuesday, April 4, 2017
Please feel free to make suggestions.
We would appreciate any comments or suggestions that help us improve the blog.
Thursday, May 15, 2014
Example codes for our book
- Download example codes from the publisher's website: http://booksite.elsevier.com/9780124080805/
Erratum
We realized there is an error in the textbook example in Chapter 2.5 (Simple Vector Addition Using CUDA, page 28).
Erroneous code example (AddVectors.cu):
#include "AddVectors.h"
#include "mex.h"
__global__ void addVectorsMask(float* A, float* B, float* C, int size)
{
int i = blockIdx.x;
if(i >= size)
return;
C[i] = A[i] + B[i];
}
Since this is the .cu file, we don't need the "mex.h" header.
We had already corrected this in the example codes (downloadable from the publisher's website), but by mistake we did not delete it from the manuscript.
After fix (AddVectors.cu):
#include "AddVectors.h"
// #include "mex.h"
__global__ void addVectorsMask(float* A, float* B, float* C, int size)
{
int i = blockIdx.x;
if(i >= size)
return;
C[i] = A[i] + B[i];
}
Comment out or delete the line: #include "mex.h"
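For reference, the corrected kernel indexes elements with blockIdx.x only, so it expects a launch with one block per element and one thread per block. A minimal standalone host-side sketch (our own illustration, not from the book; error checking omitted) would look like this:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void addVectorsMask(float* A, float* B, float* C, int size)
{
    int i = blockIdx.x;          // one element per block
    if (i >= size)
        return;
    C[i] = A[i] + B[i];
}

int main(void)
{
    const int size = 8;
    float hA[size], hB[size], hC[size];
    for (int i = 0; i < size; ++i) { hA[i] = (float)i; hB[i] = 2.0f * i; }

    // allocate device buffers and copy the inputs over
    float *dA, *dB, *dC;
    cudaMalloc(&dA, size * sizeof(float));
    cudaMalloc(&dB, size * sizeof(float));
    cudaMalloc(&dC, size * sizeof(float));
    cudaMemcpy(dA, hA, size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, size * sizeof(float), cudaMemcpyHostToDevice);

    // launch configuration matching the kernel's indexing:
    // 'size' blocks of a single thread each
    addVectorsMask<<<size, 1>>>(dA, dB, dC, size);

    cudaMemcpy(hC, dC, size * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < size; ++i)
        printf("%g\n", hC[i]);   // each hC[i] should equal 3*i

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

One thread per block is fine for a small teaching example, but it wastes most of each warp; production kernels normally index with blockIdx.x * blockDim.x + threadIdx.x over multi-thread blocks.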
Thank you, and we apologize for the inconvenience.
Jung W. Suh
Monday, April 14, 2014
Friday, January 10, 2014
How should you choose between GPU utilization through C-MEX and the Parallel Computing Toolbox for MATLAB?
Finally, our book “Accelerating MATLAB using GPU Computing” was published. Although this blog is not limited to GPU computing in the MATLAB environment, we are going to start a series of posts on MATLAB-related GPU computing topics that we missed while preparing the book manuscript under a tight deadline (actually, we extended our manuscript due date several times).
In the book, we devoted a lot of space to explaining both ways of making use of the GPU under MATLAB. One is the more general approach of GPU utilization through C-MEX; the other is simply using MathWorks’ Parallel Computing Toolbox for the GPU.
Today we deal with another topic: how to choose between GPU utilization through C-MEX and the Parallel Computing Toolbox for accelerating your MATLAB code. Both approaches have their own strengths and weaknesses. Let’s list their features:
|              | Parallel Computing Toolbox           | C-MEX with CUDA |
|--------------|--------------------------------------|-----------------|
| Ease of use  | Easy                                 | Complicated     |
| Cost         | Extra cost of buying the toolbox     | No extra cost   |
| Flexibility  | Not flexible                         | Flexible        |
| Performance  | Limited                              | Excellent       |
| Limitations  | Many                                 | Few             |
From the ease-of-use point of view, when we choose GPU utilization through C-MEX, the user must explicitly assign the block size and grid size, in addition to handling memory allocation; this forces the user to consider many things for efficiency. Users of the Parallel Computing Toolbox do not have to consider memory allocation or block/grid sizes, for either built-in or non-built-in MATLAB functions, which makes it feel easier. However, this automatic assignment and allocation in the Parallel Computing Toolbox also limits the performance gain, because the method is fixed. Although you may use NVIDIA’s Visual Profiler to analyze your Parallel Computing Toolbox code, it leaves very little room for improvement. In fact, flexibility and ease of use are two sides of the same coin.
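To make concrete what the C-MEX route asks of the user, here is a minimal gateway sketch (the file name, function names, and launch parameters are our own hypothetical choices, not the book's; error checking omitted). Every step that the Parallel Computing Toolbox does automatically appears here as explicit code:

```cuda
// AddVectorsMex.cu -- hypothetical C-MEX gateway; build with: mexcuda AddVectorsMex.cu
#include "mex.h"
#include <cuda_runtime.h>

__global__ void addVectors(const float* A, const float* B, float* C, int size)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < size)
        C[i] = A[i] + B[i];
}

// Usage from MATLAB: C = AddVectorsMex(single(A), single(B))
void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
    int size = (int)mxGetNumberOfElements(prhs[0]);
    const float* hA = (const float*)mxGetData(prhs[0]);
    const float* hB = (const float*)mxGetData(prhs[1]);

    plhs[0] = mxCreateNumericMatrix(1, size, mxSINGLE_CLASS, mxREAL);
    float* hC = (float*)mxGetData(plhs[0]);

    // explicit device memory management -- the user's responsibility in C-MEX
    float *dA, *dB, *dC;
    cudaMalloc(&dA, size * sizeof(float));
    cudaMalloc(&dB, size * sizeof(float));
    cudaMalloc(&dC, size * sizeof(float));
    cudaMemcpy(dA, hA, size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, size * sizeof(float), cudaMemcpyHostToDevice);

    // explicit launch configuration -- the user picks block and grid sizes
    int blockSize = 256;
    int gridSize  = (size + blockSize - 1) / blockSize;
    addVectors<<<gridSize, blockSize>>>(dA, dB, dC, size);

    cudaMemcpy(hC, dC, size * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}
```

This is exactly the flexibility/effort trade-off in the table above: the block size (256 here) and grid size are free parameters you can tune with the profiler, whereas the toolbox fixes those choices for you.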
Usually, the limitations of GPU utilization through C-MEX for MATLAB are the generic limitations of the GPU itself, while the limitations of the Parallel Computing Toolbox come from implementation limits within the toolbox.
The bottom line: if you want a modest performance improvement for a simple project and have the extra budget, then the Parallel Computing Toolbox is a good choice for accelerating your MATLAB code. If your project is large and you want a big performance benefit, then GPU utilization through C-MEX would be the better choice.