The massive scalability of modern GPUs has been put to many uses, from Bitcoin mining to Folding@home, SETI@home and various other general-purpose tasks that involve huge amounts of number crunching. While not as flexible as CPUs, modern GPUs scale extremely well on this kind of work thanks to the large number of stream processors they contain. I've long been impressed by the technology, and have long known that to implement something with it you can use CUDA and/or OpenCL. For about the same amount of time there have also been low-level Java wrappers that let you execute OpenCL without resorting to native languages, even if you still had to write the OpenCL code directly, along with huge amounts of boilerplate code to interface with it. Of course, if you wanted your application to work in the real world, you also needed a completely separate load of code for it to fall back on if the device didn't support OpenCL. For about the same amount of time, I've thought that what we really need is a ...