Tutorial

Please have a look at the architecture page if you haven't done so yet, to see what OpenCLIPER can do for you, and then come back again to start coding.

Overview

The goal of OpenCLIPER is to simplify writing OpenCL code. It will not make your code faster, nor more parallel, nor more bug-free. However, it will allow you to focus on coding kernels, because all the plethora of housekeeping work which is typically needed by OpenCL applications is taken care of automatically.

Note: from now on, we assume you have at least a very basic knowledge of how OpenCL works.

Tutorial: Intensity inverting filter

In this tutorial, we will write a first, simple program which just reverses the pixels of an image and, for the sake of demonstration, we will be comparing OpenCLIPER code with its pure-OpenCL equivalent.


Intensity inverting in C

First, let's see a pure C implementation. Because all the code runs in the CPU, there is no need todo anything special and the source ends up being pretty short:

  1. Load an input image. There are plenty of libraries that can do this. We will use DevIL, for instance (as OpenCLIPER does internally):

    ilInit();
    ILuint inputId;
    ilGenImages(1, &inputId);
    ilBindImage(inputId);
    ilLoadImage("input.png");
    unsigned int width = ilGetInteger(IL_IMAGE_WIDTH);
    unsigned int height = ilGetInteger(IL_IMAGE_HEIGHT);
    unsigned char* pixmap = (unsigned char*)malloc(width * height);
    ilCopyPixels(0, 0, 0, width, height, 1, IL_LUMINANCE, IL_UNSIGNED_BYTE, pixmap);
    

    That gives us the image in the pixmap buffer. For brevity, error checking is ommited.

  2. Do the real work, i.e. reverse the pixels in the image. Let's assume the image is one byte per pixel:

    for(unsigned int i = 0; i < width * height; i++)
        pixmap[i]=255-pixmap[i];
    

    Of course, this is not optimized in any way but we are not measuring performance here.

  3. Save the output image. Again, let's use DevIL to do the dirty work:

    ILuint outputId;
    ilGenImages(1, &outputId);
    ilBindImage(outputId);
    ilTexImage(width, height, 1, 1, IL_LUMINANCE, IL_UNSIGNED_BYTE, pixmap);
    ilEnable(IL_FILE_OVERWRITE);
    ilSaveImage("output.png");
    


Intensity inverting in OpenCL vs. Intensity inverting in OpenCLIPER

Well, that was simple. Next, we'll do the same thing in OpenCL. As you know, the processing work is written in a separate kernel file. Kernels are natively parallel, so we just need to specify how to transform a single pixel and then let OpenCL spread the work throughout all possible ALUs, like this:

__kernel void negate_kernel(__global uchar* input, __global uchar* output) {
    uint i = get_global_id(0);
    output[i] = (255 - input[i]);
}

But the tricky part is how to tell the processing device to execute that code. This is where OpenCLIPER comes in to help. Don't trust us; see it for yourself.


1a) Initializing the device in OpenCL

  1. With OpenCL, we need at least one platform available, so let's look for any of them and choose the first one we find:

    cl_uint numPlatforms;
    cl_platform_id platform = NULL;
    cl_int status = clGetPlatformIDs(0, NULL, &numPlatforms);
    if (status != CL_SUCCESS) {
        fprintf(stderr,"Error: Getting platforms!\n");
        return 1;
    }
    
    if(numPlatforms > 0) {
        cl_platform_id* platforms = (cl_platform_id* )malloc(numPlatforms* sizeof(cl_platform_id));
        status = clGetPlatformIDs(numPlatforms, platforms, NULL);
        platform = platforms[0];
        free(platforms);
    }
    

    It is not very elegant to pick the first available platform just because it was the first. Maybe we are missing an interesting device which is only present in, say, the fifth available platform.

  2. Now we need the list of devices in the chosen platform. We'll try to get a GPU-class device and, if there aren't any (remember: in the platform we chose above), then fall back to the CPU (again, if this platform supports it):

    cl_uint numDevices = 0;
    cl_device_id *devices;
    status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
    
    if (numDevices == 0) { //no GPU available.
        fprintf(stderr,"No GPU device available.\n");
        fprintf(stderr,"Choose CPU as default device.\n");
    
        status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 0, NULL, &numDevices);    
        devices = (cl_device_id*)malloc(numDevices * sizeof(cl_device_id));
        status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, numDevices, devices, NULL);
    }
    else {
        devices = (cl_device_id*)malloc(numDevices * sizeof(cl_device_id));
        status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, numDevices, devices, NULL);
    }
    
  3. Last for device initialization, we need an OpenCL context and a command queue associated to the context and the device we found:

    cl_context context = clCreateContext(NULL,1, devices,NULL,NULL,NULL);
    cl_command_queue commandQueue = clCreateCommandQueue(context, devices[0], 0, NULL);
    

With this, device initialization is over, but keep in mind that we are possibly missing some interesting devices, and that we did not care about any device properties. If we need a device that supports, say, OpenCL 1.1 or want to try the platform from vendor X, we have to throw more code at it.


1b) Initializing the device in OpenCLIPER

That's it for device selection and initialization. On any error, an exception is raised accordingly, so no need to check explicitly after each call.


2a) Loading the kernel in OpenCL

Next, we need to load the kernel that negates the pixels. Assume we saved it in a file called negate.cl. Then:

  1. We need to load the kernel into memory as a string, and then create a CL program from it:

    char *source;
    FILE *f;
    struct stat fileInfo;
    size_t fileSize;
    
    f=fopen("negate.cl","r");
    fstat(f->_fileno,&fileInfo);
    fileSize=fileInfo.st_size;
    source=(char*)malloc(fileSize+1);
    fread(source,sizeof(char),fileSize,f);
    fclose(f);
    source[fileSize]='\0';
    
    size_t sourceSize[] = {strlen(source)};
    cl_program program = clCreateProgramWithSource(context, 1, (const char**)&source, sourceSize, NULL);
    
  2. Now we need to compile the program, dump the log to the screen (just in case anything went wrong) and then create the kernel object.

    status=clBuildProgram(program, 1,devices,NULL,NULL,NULL);
    printf("status_build=%d\n",status);
    char *info=(char*)malloc(1000000);
    clGetProgramBuildInfo(program,devices[0],CL_PROGRAM_BUILD_LOG,1000000,info,NULL);
    printf("log=%s\n",info);
    
    cl_kernel kernel = clCreateKernel(program,"negate_kernel", NULL);
    

With this, we have created one kernel object, which we must keep track of by hand. Now, for the OpenCLIPER variant.


2b) Loading the kernel in OpenCLIPER

And that's all. The CL file has been loaded, the CL program created and compiled, and a kernel object has been created automagically for every kernel present in the source file(s), which we can access by name anywhere in our code. If there were any errors, an exception is raised with the build log.


3a) Setting inputs and outputs in OpenCL

  1. First we have to load the input image and copy it to device memory. We can use DevIL again to load the image:

    ilInit();
    ILuint inputId;
    ilGenImages(1, &inputId);
    ilBindImage(inputId);
    ilLoadImage("input.png");
    unsigned int width = ilGetInteger(IL_IMAGE_WIDTH);
    unsigned int height = ilGetInteger(IL_IMAGE_HEIGHT);
    unsigned char* hostInBuffer=(unsigned char*)malloc(width * height);
    ilCopyPixels(0, 0, 0, width, height, 1, IL_LUMINANCE, IL_UNSIGNED_BYTE, hostInBuffer);
    
    cl_mem inputBuffer = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR, width * height * sizeof(char),(void *) hostInBuffer, NULL);
    
  2. Now reserve another chunk of memory for the output, in the host and in the device:

    unsigned char* hostOutBuffer=(unsigned char*)malloc(width * height);
    cl_mem outputBuffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY , width * height * sizeof(char), NULL, NULL);
    

With this, we have our input image in the device, awaiting to be processed, and an empty output buffer to store the result.


3b) Setting inputs and outputs in OpenCLIPER

  1. As in plain OpenCL, we load the input image, but instead of explicitly copying it to the device, we just have to associate it with our CL application and get a handle to it in exchange (note that OpenCLIPER uses floats by default as the data type):

    std::shared_ptr<Data> pIn(new XData(std::string("input.png"), type_index(typeid(realType))));
    DataHandle inHandle = pCLapp->addData(pIn);
    
  2. Now, the same goes for the output: first create it (empty), then associate it with our app:

    std::shared_ptr<Data> pOut(new XData((dynamic_pointer_cast<XData>(pIn)), false));
    DataHandle outHandle = pCLapp->addData(pOut);
    

Once we have the handles, we can be sure that the data have been pinned in memory to improve transfer speeds and are properly laid in the device memory, their offsets and sizes conveniently accessible from CL kernels (in this simple tutorial this is not noticeable but when you have to work with n-dimensional, multi-slice, heterogeneous data sets, it comes in very handy).


4a) Setting up and launching the process in OpenCL

  1. Before launching the kernel, we have to set its parameters (the input and output buffers):

    clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&inputBuffer1);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&outputBuffer);
    
  2. Now we can finally launch (enqueue in OpenCL terms) the kernel:

    size_t global_work_size[1] = {width * height};
    clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL, global_work_size, NULL, 0, NULL, NULL);
    

This was simple indeed, but note how the final user has to deal with the internals of your kernel (i.e. know how you programmed it) to be able to use it. The solution is, of course, to encapsulate the kernel in its own class (or procedures if you are stuck to pure C) and provide a convenient interface to the user. That is exactly what OpenCLIPER does already for you.


4b) Setting up and launching the process in OpenCLIPER

To overcome the the need to code a proper interface for your kernels, OpenCLIPER includes the concept of processes. Processes expose a simple way to initialize and launch kernels, as well as to change their parameters safely. Processes can contain as many kernels as you like, and they can be chained at your discretion at no cost (i.e. setting a process output as another process' input is zero-copy).

Note also that OpenCLIPER is OpenCL, so you can call your kernel directly if you so desire, but then you'll have to explain your users all the gory details about your kernel internals!

  1. So, first we need to create a process that encapsulates our kernel and exposes a nice interface to the user. To this end, we write its header file, Negate.hpp:

    #include <OpenCLIPER/CLapp.hpp>
    #include <OpenCLIPER/Process.hpp>
    
    namespace OpenCLIPER {
        class Negate : public OpenCLIPER::Process {
            public:
                Negate(std::shared_ptr<OpenCLIPER::CLapp> pCLapp): Process(pCLapp) {};
                void init();
                void launch(ProfileParameters profParms);
        };
    }
    

    And then its implementation file, Negate.cpp:

    #include <Negate.hpp>
    
    namespace OpenCLIPER {
        void Negate::init() {
            kernel = getApp()->getKernel("negate_kernel");
            queue = getApp()->getCommandQueue();
        }
    
        void Negate::launch(ProfileParameters profileParameters) {
            cl::Buffer* pInBuf = getInput()->getContiguousMemoryDeviceBuffer();
            cl::Buffer* pOutBuf = getOutput()->getContiguousMemoryDeviceBuffer();
    
            kernel.setArg(0, *pInBuf);
            kernel.setArg(1, *pOutBuf);
    
            cl::NDRange globalSizes = {NDARRAYWIDTH(getInput()->getNDArray(0)) * NDARRAYHEIGHT(getInput()->getNDArray(0))};
    
            queue.enqueueNDRangeKernel(kernel, cl::NullRange, globalSizes, cl::NDRange(), NULL, NULL);
        }
    }
    

    Note how kernel initialization and launching are separated so that lengthy initializations don't hinder consecutive executions. Additional parameters can be set by the user on-the-fly, but we don't use that functionality in this tutorial.

  2. Now that we have our process ready, we can link to it any buffer available in our app and launch it. Every process has an input and an output so that they can be chained easily:

    std::unique_ptr<Process> pProcess(new Negate(pCLapp));
    pProcess->setInHandle(inHandle);
    pProcess->setOutHandle(outHandle);
    
    pProcess->init();
    pProcess->launch();
    

    ... and your users just see what they have (and want) to see. No more, no less.


5a) Saving results in OpenCL

Just the opposite of loading input data. First we copy the result image from the device to the host, and then save it:

    clEnqueueReadBuffer(commandQueue, outputBuffer, CL_TRUE, 0, width * height * sizeof(char), hostOutBuffer, 0, NULL, NULL);

    ILuint outputId;
    ilGenImages(1, &outputId);
    ilBindImage(outputId);
    ilTexImage(width, height, 1, 1, IL_LUMINANCE, IL_UNSIGNED_BYTE, hostOutBuffer);
    ilEnable(IL_FILE_OVERWRITE);
    ilSaveImage("output.png");


5b) Saving results in OpenCLIPER

Accordingly, copy the result image to the host (albeit this time pinned memory is used) and save:

    pCLapp->device2Host(outHandle, SyncSource::BUFFER_ONLY);

    auto outputData=dynamic_pointer_cast<XData>(pCLapp->getData(outHandle));
    outputData->save("output.png", SyncSource::BUFFER_ONLY);

Summary

Here you have the listings of both versions side by side.

OpenCL

#include <CL/cl.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <string>
#include <fstream>
#include <sys/stat.h>
#include <sys/time.h>

int convertToString(const char *filename,char **s) {
    FILE *f;
    struct stat fileInfo;
    size_t fileSize;

    f=fopen(filename,"r");
    if(f!=NULL) {
        fstat(f->_fileno,&fileInfo);
        fileSize=fileInfo.st_size;

        *s=(char*)malloc(fileSize+1);

        fread(*s,sizeof(char),fileSize,f);
        fclose(f);
        s[fileSize]='\0';
        return 0;
    }
    fprintf(stderr,"Error: failed to open file %s\n",
        filename);
    return -1;
}



int main(int argc, char* argv[]) {
    cl_uint numPlatforms;
    cl_platform_id platform = NULL;
    cl_int status = clGetPlatformIDs(0, NULL,
        &numPlatforms);
    if (status != CL_SUCCESS) {
        fprintf(stderr,"Error: Getting platforms!\n");
        return 1;
    }

    if(numPlatforms > 0) {
        cl_platform_id* platforms =
           (cl_platform_id* )malloc(numPlatforms*
           sizeof(cl_platform_id));
        status = clGetPlatformIDs(numPlatforms,
            platforms, NULL);
        platform = platforms[0];
        free(platforms);
    }

    cl_uint numDevices = 0;
    cl_device_id *devices;
    status = clGetDeviceIDs(platform,
        CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);

    if (numDevices == 0) {
        fprintf(stderr,"No GPU device available.\n");
        fprintf(stderr,"Choose CPU as default device.
            \n");

        status = clGetDeviceIDs(platform,
            CL_DEVICE_TYPE_CPU, 0, NULL, &numDevices);
        devices = (cl_device_id*)malloc(numDevices *
            sizeof(cl_device_id));
        status = clGetDeviceIDs(platform,
            CL_DEVICE_TYPE_CPU, numDevices, devices,
            NULL);
    }
    else {
        devices = (cl_device_id*)malloc(
            numDevices * sizeof(cl_device_id));
        status = clGetDeviceIDs(platform,
            CL_DEVICE_TYPE_GPU, numDevices, devices,
            NULL);
    }

    cl_context context = clCreateContext(NULL,1,
         devices,NULL,NULL,NULL);
    cl_command_queue commandQueue =
        clCreateCommandQueue(context, devices[0], 0,
        NULL);

    const char *filename = "negate.cl";
    char *source;
    status = convertToString(filename, &source);
    size_t sourceSize[] = {strlen(source)};
    cl_program program = clCreateProgramWithSource(
        context, 1, (const char**)&source, sourceSize,
        NULL);

    status=clBuildProgram(program, 1,devices,NULL,
        NULL, NULL);
    printf("status_build=%d\n",status);
    char *info=(char*)malloc(1000000);
    clGetProgramBuildInfo(program,devices[0],
        CL_PROGRAM_BUILD_LOG,1000000,info,NULL);
    printf("log=%s\n",info);

    cl_kernel kernel = clCreateKernel(program,
        "negate_kernel", NULL);

    unsigned char* hostInBuffer=(unsigned char*)
        malloc(bufSize);
    unsigned char* hostOutBuffer=(unsigned char*)
        malloc(bufSize);

    ilInit();
    ILuint inputId;
    ilGenImages(1, &inputId);
    ilBindImage(inputId);
    ilLoadImage("input.png");
    unsigned int width = ilGetInteger(
        IL_IMAGE_WIDTH);
    unsigned int height = ilGetInteger(
        IL_IMAGE_HEIGHT);
    unsigned char* hostInBuffer=(unsigned char*)
        malloc(width * height);
    ilCopyPixels(0, 0, 0, width, height, 1,
        IL_LUMINANCE, IL_UNSIGNED_BYTE, hostInBuffer);

    cl_mem inputBuffer = clCreateBuffer(context,
        CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
        width * height * sizeof(char),(void *)
        inBuffer1, NULL);
    cl_mem outputBuffer = clCreateBuffer(context,
        CL_MEM_WRITE_ONLY , width * height *
        sizeof(char), NULL, NULL);

    status = clSetKernelArg(kernel, 0, sizeof(cl_mem),
        (void *)&inputBuffer1);
    status = clSetKernelArg(kernel, 1, sizeof(cl_mem),
        (void *)&outputBuffer);

    size_t global_work_size[1] = {width * height};
    status = clEnqueueNDRangeKernel(commandQueue,
        kernel, 1, NULL, global_work_size, NULL, 0,
        NULL, NULL);

    status = clEnqueueReadBuffer(commandQueue,
        outputBuffer, CL_TRUE, 0, width * height *
        sizeof(char), outBuffer, 0, NULL, NULL);

    ILuint outputId;
    ilGenImages(1, &outputId);
    ilBindImage(outputId);
    ilTexImage(width, height, 1, 1, IL_LUMINANCE,
        IL_UNSIGNED_BYTE, hostOutBuffer);
    ilEnable(IL_FILE_OVERWRITE);
    ilSaveImage("output.png");

    status = clReleaseKernel(kernel);
    status = clReleaseProgram(program);
    status = clReleaseMemObject(inputBuffer1);
    status = clReleaseMemObject(inputBuffer2);
    status = clReleaseMemObject(outputBuffer);
    status = clReleaseCommandQueue(commandQueue);
    status = clReleaseContext(context);

    if (outBuffer != NULL) {
        free(outBuffer);
        outBuffer = NULL;
    }

    if (devices != NULL) {
        free(devices);
        devices = NULL;
    }

    return 0;
}
}

OpenCLIPER: main program

#include <OpenCLIPER/XData.hpp>
#include <OpenCLIPER/processes/examples/Negate.hpp>
#include <iostream>
#include <string>

using namespace OpenCLIPER;
int main(int argc, char *argv[]) {
  std::shared_ptr pCLapp = std::make_shared();

  try {
    CLapp::PlatformTraits platformTraits;
    CLapp::DeviceTraits deviceTraits;
    pCLapp->init(platformTraits,deviceTraits);

    pCLapp->loadKernels("examples/negate.cl");

    std::shared_ptr pIn(new XData(std::
        string("input.png"), type_index(typeid(
        realType))));

    std::shared_ptr pOut(new XData((
        dynamic_pointer_cast(pIn)), false));

    DataHandle inHandle = pCLapp->addData(pIn);
    DataHandle outHandle = pCLapp->addData(pOut);

    std::unique_ptr pProcess(new
        Negate(pCLapp));
    pProcess->setInHandle(inHandle);
    pProcess->setOutHandle(outHandle);

    pProcess->init();
    pProcess->launch();

    pCLapp->device2Host(outHandle,
        SyncSource::BUFFER_ONLY);

    auto outputData=dynamic_pointer_cast
        (pCLapp->getData(outHandle));
    outputData->save("output.png",
        SyncSource::BUFFER_ONLY);

    pProcess.reset(nullptr);
    pCLapp->delData(inHandle);
    pCLapp->delData(outHandle);
    pCLapp = nullptr;
  } catch (std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
  }
}

OpenCLIPER: Negate.hpp

#include <OpenCLIPER/CLapp.hpp>
#include <OpenCLIPER/Process.hpp>

namespace OpenCLIPER {
    class Negate : public OpenCLIPER::Process {
        public:
            Negate(std::shared_ptr pCLapp):
                Process(pCLapp) {};
            void init();
            void launch(ProfileParameters profParms);
    };
}

OpenCLIPER: Negate.cpp

#include <Negate.hpp>

namespace OpenCLIPER {
    void Negate::init() {
        kernel = getApp()->getKernel("negate_kernel");
        queue = getApp()->getCommandQueue();
    }

    void Negate::launch(ProfileParameters profParms) {
        cl::Buffer* pInBuf = getInput()->
            getContiguousMemoryDeviceBuffer();
        cl::Buffer* pOutBuf = getOutput()->
            getContiguousMemoryDeviceBuffer();

        kernel.setArg(0, *pInBuf);
        kernel.setArg(1, *pOutBuf);

        cl::NDRange globalSizes = {NDARRAYWIDTH(getInput()->
            getNDArray(0)) * NDARRAYHEIGHT(getInput()->
            getNDArray(0))};

        queue.enqueueNDRangeKernel(kernel, cl::NullRange,
            globalSizes, cl::NDRange(), NULL, NULL);
    }
}