
Drivers

  • Find the files in lib/CL/devices/formosa. pocl-formosa.h and pocl-formosa.c define and implement the user-space device driver for FORMOSA. pocl-formosa-util.h and pocl-formosa-util.cc contain the support functions for the driver.

pocl-formosa.c

void pocl_formosa_init_device_ops(struct pocl_device_ops *ops)

ops is the struct that stores the driver's handler functions for the device. For example, ops->alloc_mem_obj = pocl_formosa_alloc_mem_obj sets the memory allocation handler to our implementation, pocl_formosa_alloc_mem_obj.
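The handler-table pattern can be sketched in isolation. The block below is a minimal illustration, not the pocl code: struct sketch_device_ops is a hypothetical, trimmed-down stand-in for struct pocl_device_ops, and the handler signatures and the starting address 0x1000 are assumptions made only so the example compiles on its own.

```c
#include <stddef.h>

/* Hypothetical stand-in for pocl's struct pocl_device_ops. The real struct
   has many more handlers, and their signatures differ from these. */
struct sketch_device_ops {
  const char *device_name;
  int (*alloc_mem_obj)(size_t size, size_t *device_addr);
  void (*free_mem_obj)(size_t device_addr);
};

/* Next free device address handed out by the toy allocator (assumed base). */
static size_t sketch_next_addr = 0x1000;

/* Toy allocation handler: hands out increasing addresses, never reuses them. */
static int sketch_alloc_mem_obj(size_t size, size_t *device_addr) {
  if (size == 0 || device_addr == NULL)
    return -1;
  *device_addr = sketch_next_addr;
  sketch_next_addr += size;
  return 0;
}

/* Toy free handler: nothing to release in this sketch. */
static void sketch_free_mem_obj(size_t device_addr) { (void)device_addr; }

/* Counterpart of an init_device_ops function: fill in the handler table so
   the runtime can call the driver through function pointers. */
void sketch_init_device_ops(struct sketch_device_ops *ops) {
  ops->device_name = "formosa";
  ops->alloc_mem_obj = sketch_alloc_mem_obj;
  ops->free_mem_obj = sketch_free_mem_obj;
}
```

After sketch_init_device_ops(&ops) has run, a caller that only knows the struct can invoke ops.alloc_mem_obj(64, &addr) without referring to the driver's function names. This indirection is what lets one runtime serve several device drivers.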

pocl_formosa_probe

This function tests whether the device is available. If a FORMOSA device is probed and responds correctly, the function sets the global variable formosa_available to true.

pocl_formosa_init

This function defines the attributes of the device and performs initialization for the device driver. For example, the device name and the local/global memory sizes are set here, and the memory allocator is initialized here. There is also a pocl_formosa_uninit function that frees the driver's data structures safely.

pocl_formosa_read / pocl_formosa_write

These functions implement reading data from device memory into host memory and writing data from host memory into device memory. They are called when an OpenCL read/write buffer command is enqueued.

pocl_formosa_alloc_mem_obj / pocl_formosa_free

Allocate and free device memory. Note that only the CL_MEM_READ_WRITE, CL_MEM_READ_ONLY, and CL_MEM_WRITE_ONLY memory flags are currently supported. Global memory allocation is managed entirely by the host driver: given a size to allocate, the allocator maintains a data structure on the host and returns an available address on the device.
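To make the idea of host-managed device memory concrete, here is a minimal first-fit sketch. It is not the FORMOSA allocator: all names (sketch_allocator, sketch_alloc, sketch_release), the fixed-size block table, and the lack of alignment handling or merging of freed neighbours are simplifying assumptions. The point it shows is that the bookkeeping lives on the host and only addresses are handed to the device.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_BLOCKS 64
#define SKETCH_ALLOC_FAIL UINT64_MAX

/* One contiguous range of device memory, tracked on the host. */
struct sketch_block {
  uint64_t addr;
  uint64_t size;
  int used;
};

/* Blocks are kept sorted by address; blocks[0..count) cover the region. */
struct sketch_allocator {
  struct sketch_block blocks[SKETCH_MAX_BLOCKS];
  size_t count;
};

/* Start with a single free block covering the whole device region. */
void sketch_allocator_init(struct sketch_allocator *a, uint64_t base,
                           uint64_t size) {
  a->blocks[0].addr = base;
  a->blocks[0].size = size;
  a->blocks[0].used = 0;
  a->count = 1;
}

/* First fit: take the first free block that is large enough and split off
   the remainder as a new free block when there is room in the table. */
uint64_t sketch_alloc(struct sketch_allocator *a, uint64_t size) {
  if (size == 0)
    return SKETCH_ALLOC_FAIL;
  for (size_t i = 0; i < a->count; i++) {
    struct sketch_block *b = &a->blocks[i];
    if (b->used || b->size < size)
      continue;
    if (b->size > size && a->count < SKETCH_MAX_BLOCKS) {
      /* Shift later entries right by one to make room for the remainder. */
      for (size_t j = a->count; j > i + 1; j--)
        a->blocks[j] = a->blocks[j - 1];
      a->blocks[i + 1].addr = b->addr + size;
      a->blocks[i + 1].size = b->size - size;
      a->blocks[i + 1].used = 0;
      a->count++;
      b->size = size;
    }
    b->used = 1;
    return b->addr;
  }
  return SKETCH_ALLOC_FAIL;
}

/* Mark the block starting at addr as free again. Returns 0 on success and
   -1 if addr was not returned by sketch_alloc. Freed neighbours are not
   merged in this sketch. */
int sketch_release(struct sketch_allocator *a, uint64_t addr) {
  for (size_t i = 0; i < a->count; i++) {
    if (a->blocks[i].used && a->blocks[i].addr == addr) {
      a->blocks[i].used = 0;
      return 0;
    }
  }
  return -1;
}
```

No device access happens during allocation in this scheme. The device only ever sees the resulting addresses, for example inside the kernel argument buffer.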

pocl_formosa_post_build_program

This function builds the kernel program in the following steps:

  1. Run the pocl LLVM passes
    • Output: LLVM kernel modules (in LLVM IR format)
  2. Compile the program (fsa_compile_program in pocl_formosa_utils)
    1. Generate trampoline functions for the modules
    2. Write the bitcode
    3. Compile the bitcode and link it with 1) the kernel library and 2) start.S, using our linker script

pocl_formosa_run

This is the most important function: it gathers the kernel arguments and runs the kernel. It is called when enqueueNDRangeKernel is called. The execution steps are as follows:

  1. Iterate over all kernel arguments and calculate the space the device needs to store them
  2. Allocate the host argument buffer and the device argument buffer
  3. Place the arguments in the host buffer
  4. Upload the host argument buffer to the device argument buffer
  5. Set up the context data
    • Work dimension
    • Kernel ID
    • Local sizes for a workgroup
    • Number of workgroups in 3 dimensions
    • Set the entry pc and trampoline function pc
    • etc.
  6. Upload the kernel to the device
  7. Start the kernel
  8. Wait for the kernel to finish
  9. Release buffers
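
Steps 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not the driver's code: struct sketch_arg, the function names, and the 8-byte slot alignment are all hypothetical. The real layout depends on what the device-side kernel entry code expects.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of one kernel argument: a pointer to its value
   on the host and its size in bytes. */
struct sketch_arg {
  const void *value;
  size_t size;
};

/* Round n up to a multiple of a (a must be non-zero). */
static size_t sketch_align_up(size_t n, size_t a) {
  return (n + a - 1) / a * a;
}

/* Step 1: total space needed, with each argument padded to an 8-byte slot
   (the slot size is an assumption). */
size_t sketch_args_size(const struct sketch_arg *args, size_t n) {
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    total += sketch_align_up(args[i].size, 8);
  return total;
}

/* Steps 2 and 3: allocate a zeroed host buffer and copy every argument into
   its slot. Returns NULL on allocation failure; the caller frees the
   buffer. The device-side buffer and the upload are not shown. */
uint8_t *sketch_pack_args(const struct sketch_arg *args, size_t n,
                          size_t *out_size) {
  size_t total = sketch_args_size(args, n);
  uint8_t *buf = calloc(1, total ? total : 1);
  if (buf == NULL)
    return NULL;
  size_t offset = 0;
  for (size_t i = 0; i < n; i++) {
    memcpy(buf + offset, args[i].value, args[i].size);
    offset += sketch_align_up(args[i].size, 8);
  }
  *out_size = total;
  return buf;
}
```

For a buffer argument, the value copied into the slot would be the device address returned by the allocator rather than a host pointer. The packed buffer is then uploaded with the driver's write path in step 4 and freed in step 9.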