ISPC and Integrating ISPC Into CMake Projects

ISPC (Implicit SPMD Program Compiler) is a compiler made by Intel that can be used to achieve a high degree of parallelism in CPU computations on multiple architectures (not just Intel's). It generates parallel programs that use SIMD (Single Instruction, Multiple Data) instructions, and it does so in an architecture-agnostic way. As the name implies, a SIMD instruction performs the same operation on multiple data elements at once. This is useful for fast vector computations such as adding two vectors or arrays. The standard way of adding two arrays in C++ is to loop through the elements and add them, compile the program with optimization flags, and hope that the compiler turns the loop into SIMD instructions. For something as simple as vector addition a compiler such as GCC can vectorize the loop, but some other common operations, such as the dot product, are typically not vectorized by default, because vectorizing a floating-point sum changes the order of the additions.
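The loop-based approach described above can be sketched in plain C++ as follows. The function names add_arrays and dot are my own, chosen for illustration. Whether either loop ends up using SIMD instructions depends on the compiler and its flags.

```cpp
#include <cstddef>

// Element-wise addition: every iteration is independent of the others,
// so an optimizing compiler can usually map this loop to SIMD instructions.
void add_arrays(const float* a, const float* b, float* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Dot product: every iteration updates the same accumulator. Vectorizing
// this means reordering floating-point additions, which compilers are
// usually only willing to do when explicitly allowed to.
float dot(const float* a, const float* b, std::size_t count) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Both functions are ordinary scalar code. The point is only that the first loop has no dependency between iterations, while the second carries one through sum.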

Integrating the ISPC compiler into a project is useful for, say, a game programmer, an AI developer, or anyone else with heavy, parallelizable CPU workloads who wants to write efficient computations that support multiple platforms and targets, without writing separate code for each architecture or making portability compromises.

Strictly speaking, ISPC uses an SPMD (Single Program, Multiple Data) programming model: you write the program as if it processed a single element, and the compiled code runs many instances of that program in parallel by mapping them onto SIMD lanes.
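To make the SPMD idea more concrete, here is a rough scalar emulation in C++ of how a group of program instances walks over an array. This is only a mental model, not what ISPC literally generates. The names scale_spmd and kGangSize are hypothetical, and the width of 8 is an assumption that matches 8 floats per 256-bit register.

```cpp
#include <cstddef>

// Assumed number of program instances running side by side
// (8 floats fit into one 256-bit register).
constexpr std::size_t kGangSize = 8;

// Scalar emulation of SPMD execution. Each value of `lane` stands for one
// program instance. In real SIMD code the inner loop would be a single
// instruction operating on all lanes at once.
void scale_spmd(const float* in, float* out, std::size_t count, float factor) {
    for (std::size_t base = 0; base < count; base += kGangSize) {
        for (std::size_t lane = 0; lane < kGangSize; ++lane) {
            const std::size_t index = base + lane;
            // Lanes that fall past the end of the array are switched off.
            const bool active = index < count;
            if (active) {
                out[index] = in[index] * factor;
            }
        }
    }
}
```

With count = 10, the first pass handles elements 0 to 7 with all lanes active, and the second pass handles elements 8 and 9 with the remaining six lanes inactive.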

How do ISPC programs compare to programs written with intrinsics? Would you prefer to write code such as this

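//ISPC dot product
export uniform float dotProduct(uniform const float v1[], uniform const float v2[],
                                uniform int count) {
    // Each program instance accumulates its own partial sum
    float sum = 0;
    foreach(i = 0 ... count) {
        sum += v1[i] * v2[i];
    }
    // Combine the per-instance partial sums into a single value
    return reduce_add(sum);
}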

Or this?

//SSE dot product using intrinsics
#include <xmmintrin.h>

inline __m128 sse_dot4(__m128 v0, __m128 v1)
{
    v0 = _mm_mul_ps(v0, v1);
    v1 = _mm_shuffle_ps(v0, v0, _MM_SHUFFLE(2, 3, 0, 1));
    v0 = _mm_add_ps(v0, v1);
    v1 = _mm_shuffle_ps(v0, v0, _MM_SHUFFLE(0, 1, 2, 3));
    v0 = _mm_add_ps(v0, v1);

    return v0;
}

I’m sure the code above is efficient, but it was probably not simple to write, and it only targets SSE.

That’s enough introduction. You can follow the steps below to create a minimal working project that uses ISPC to write efficient parallel functions. You can then integrate ISPC programs into your own project.

The ISPC compiler

I recommend downloading the precompiled binaries of the ISPC compiler. You can find the precompiled binaries here. Extract them somewhere accessible on your filesystem, for example:

/home/my-user/tools/ispc-v1.15.0-linux
C:/tools/ispc-v1.15.0-windows

As of writing this article (March 2021), GCC has an ABI bug, which means the Clang C++ compiler is needed on Linux to link with the ISPC object files. If you’d like to use GCC, you’d need to compile ISPC from source using GCC. ISPC source code is available here.

ISPC Hello World

We’ll write a C++ main function that calls an ISPC program to perform some parallel computations. We’ll begin by writing the ISPC program, using a simple example from the ISPC website. ISPC uses a C-like language that should feel familiar. You can read more about the language details in the ISPC user’s guide. We’ll call this program simple.ispc.

//simple.ispc
export void simple(uniform float vin[], uniform float vout[],
                   uniform int count) {
    foreach (index = 0 ... count) {
        float v = vin[index];
        if (v < 3.)
            v = v * v;
        else
            v = sqrt(v);
        vout[index] = v;
    }
}

The program accepts an array of floats and does some basic computation with it, squaring every value less than 3 (the first three elements of our input) and taking the square root of every other value. When we compile this with the ISPC compiler, we’ll instruct it to also output a header that we can use to access the function symbol from our main program. The header basically consists of this

//simple.h (loosely)
namespace ispc {
    extern "C" {
        extern void simple(float * vin, float * vout, int32_t count);
    }
}

We’ll use this main C++ program to execute the ISPC program.

//main.cpp
#include <stdio.h>
#include "simple.h"

int main() {
    float vin[16], vout[16];
    for (int i = 0; i < 16; ++i)
        vin[i] = i;

    ispc::simple(vin, vout, 16);

    for (int i = 0; i < 16; ++i)
        printf("%d: simple(%f) = %f\n", i, vin[i], vout[i]);
}

We’re simply initializing an array of floats from 0 to 15 and calling the ISPC program with it and printing the results.
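If you want something to check the ISPC results against, a scalar C++ version of the same computation is easy to sketch. The name simple_reference is my own and is not part of the ISPC example.

```cpp
#include <cmath>
#include <cstddef>

// Scalar reference for the `simple` kernel: the same branch as in
// simple.ispc, applied to one element at a time.
void simple_reference(const float* vin, float* vout, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        const float v = vin[i];
        vout[i] = (v < 3.0f) ? v * v : std::sqrt(v);
    }
}
```

Calling simple_reference(vin, vout, 16) on the same input as main.cpp should produce values that agree with the ISPC output, possibly up to small floating-point differences in the square roots.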

CMake project

Before the C++ code can be built, simple.ispc has to be compiled into an object file and a header. This is done using the following custom command. The example is for a Linux system; change the path to point to your own ISPC compiler. The custom command for Windows using MSVC is almost identical except for the command path. The compiler path and flags should really be passed to CMake from outside, for example as an environment variable, but I’ve kept them hardcoded here for simplicity.

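    add_custom_command(OUTPUT ${PROJECT_BINARY_DIR}/simple.o ${PROJECT_BINARY_DIR}/include/simple.h
                        # Make sure the header directory exists, in case ISPC does not create it
                        COMMAND ${CMAKE_COMMAND} -E make_directory ${PROJECT_BINARY_DIR}/include
                        # Arguments are passed unquoted: quoting the whole line would make
                        # CMake treat it as a single executable name
                        COMMAND /home/my-user/tools/ispc-v1.15.0-linux/bin/ispc
                                --target=avx2 --arch=x86-64 ${CMAKE_CURRENT_SOURCE_DIR}/simple.ispc
                                --header-outfile=${PROJECT_BINARY_DIR}/include/simple.h
                                -o ${PROJECT_BINARY_DIR}/simple.o
                        DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/simple.ispc
                        VERBATIM
    )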

The command outputs the ISPC object file into the project binary directory and the header file into an include directory under the project binary directory. The command declares a dependency on the source file so that the outputs are rebuilt when it changes, and it also declares the outputs so CMake can figure out the dependency graph. You could use some nicer directories than the root of the binary output directory.

CMake version 3.19 and above has some nice built-in support for ISPC, but it still doesn’t properly target MSVC, which is why I’m using this custom command.

I’m sure there are multiple ways to link against the ISPC object files using CMake. Here I’ll use the CMake add_library function to create a link target. We’ll define a library called ispcLibrary consisting of the ISPC object file and tell CMake how it should be linked (C-style linking).

add_library(ispcLibrary
        STATIC
        ${PROJECT_BINARY_DIR}/simple.o)
# Set the linker language to C so the linker knows how to link
set_target_properties(
    ispcLibrary
    PROPERTIES
    LINKER_LANGUAGE C
)

What’s left is to define the main executable and tell CMake how it should be compiled and linked. We tell CMake that ispcLibrary must be generated before the main executable, and we add the directory that ISPC writes the header to as an include directory.

# Add the main program executable
add_executable(main)
target_sources(main PRIVATE main.cpp)
add_dependencies(main ispcLibrary) # Needed as dependency as the header file is generated by ISPC
target_include_directories(main PRIVATE ${PROJECT_BINARY_DIR}/include)

# Now create main by linking with the ispc library
target_link_libraries(
    main 
    ispcLibrary
)

I’ll show a build under Linux using Clang, since I’ve used the precompiled ISPC binaries. I’ll also be using the more modern Ninja build system instead of make.

mkdir linux_build
cd linux_build
cmake -GNinja -DCMAKE_CXX_COMPILER=clang++ ..
ninja

Now when I run the main binary the following output is produced

0: simple(0.000000) = 0.000000
1: simple(1.000000) = 1.000000
2: simple(2.000000) = 4.000000
3: simple(3.000000) = 1.732051
4: simple(4.000000) = 2.000000
5: simple(5.000000) = 2.236068
6: simple(6.000000) = 2.449490
7: simple(7.000000) = 2.645751
8: simple(8.000000) = 2.828427
9: simple(9.000000) = 3.000000
10: simple(10.000000) = 3.162278
11: simple(11.000000) = 3.316625
12: simple(12.000000) = 3.464102
13: simple(13.000000) = 3.605551
14: simple(14.000000) = 3.741657
15: simple(15.000000) = 3.872983

Further, if I disassemble the simple.o object file, which was compiled for the AVX2 target, I can see AVX-family instructions operating on 256-bit ymm registers, such as vbroadcastss.

objdump -d simple.o
...
19:   c4 e2 7d 18 05 00 00 vbroadcastss 0x0(%rip),%ymm0 # 22 <simple___un_3C_unf_3E_un_3C_unf_3E_uni+0x22>

You can access all relevant source files at my GitHub.