FP10: Generic Number Crunching Via Pixel Bender And ShaderJob
Though Pixel Bender kernels are mainly designed for carrying out graphics-related tasks, they have another powerful use: generic number crunching. Since Pixel Bender kernels and ActionScript run on separate threads, you can use this to your advantage to a.) avoid sluggish UI performance and b.) increase the speed at which complex calculations are made. For instance, let's say that you have a large collection of numbers and you need to loop through the collection and perform a series of mathematical operations on each one (for a 3D or audio engine, perhaps). Passing the collection to a Pixel Bender kernel for processing can result in some pretty massive performance gains, even over heavily optimized ActionScript code.
Below is a very simple example of a Pixel Bender kernel which is designed to accept a collection of numbers and return a new collection that contains the corresponding square root of each number.
<languageVersion : 1.0;>

kernel NumberCruncher
<
    namespace : "AIF";
    vendor : "Ryan Taylor";
    version : 1;
    description : "Basic example of a generic number cruncher.";
>
{
    input image1 src;      // one float per input pixel
    output pixel3 result;  // three channels; only the first one carries data

    void evaluatePixel()
    {
        // Square root of this pixel's number, padded out to three channels.
        pixel1 value = pixel1(sqrt(sample(src, outCoord())));
        result = pixel3(value, 0.0, 0.0);
    }
}
First of all, note that the result is being passed back as a three-element vector of pixel values. The current release of the Pixel Bender Toolkit throws an error and refuses to export byte code when the output has fewer than three channels. The Pixel Bender specification states that it supports output types of pixel1, pixel2, pixel3, and pixel4, so I am assuming this is a bug. To work around it for now, I am simply using pixel3 and passing back values of 0 for the other two elements. Also note that the input type needs to be image1 for this layout, so that each input pixel holds exactly one number.
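To make the channel layout concrete, here is a small Python model of what the kernel computes. It is a plain CPU sketch, not Pixel Bender or a Flash API, and the function name `number_cruncher` is hypothetical. Each input pixel holds one number (image1), and each output pixel is a pixel3 whose first channel carries the square root while the other two carry the 0.0 padding.

```python
import math

def number_cruncher(values):
    """Hypothetical CPU model of the kernel: one image1 pixel in, one pixel3 pixel out."""
    output = []
    for v in values:
        # evaluatePixel(): sqrt in channel 0, zero padding in channels 1 and 2
        output.extend([math.sqrt(v), 0.0, 0.0])
    return output

result = number_cruncher([4, 16, 100, 400])
print(result)  # -> [2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 10.0, 0.0, 0.0, 20.0, 0.0, 0.0]
```

The output is three times as long as the input, which is why the ActionScript side has to skip two of every three floats when reading the results.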
On the ActionScript side of things, the workflow is as follows:
1.) Create a 'ByteArray' and use the 'writeFloat' method to add each number that you would like to pass through the Pixel Bender kernel for processing. Make sure that its endian setting is set to 'little endian'. Documentation also states that a 'Vector.<Number>' collection can be used as the shader input; however, it appears that there is currently a bug with this functionality (it throws an error saying that the parameter is invalid).
2.) Create a 'Shader' instance and pass it the Pixel Bender byte code. Set the input's width to the number of values in the collection that you are passing. In the case of a ByteArray, that means dividing its length by four, since each float written with 'writeFloat' occupies four bytes. Set the input's height to one. Last, but not least, set the input's 'input' property to the collection.
3.) Create a 'ByteArray' for storing the shader's resulting output. Again, make sure that the endian setting is set to 'little endian'.
4.) Create a 'ShaderJob' instance and pass it the shader, the output 'ByteArray' (as the target), and the width and height values that you set for the 'Shader' input. Add an event listener for the 'complete' event so that you can read the output 'ByteArray' once the shader has finished processing. Lastly, call the 'start' method to execute the shader.
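The byte-level bookkeeping in steps 1 to 3 can be checked outside Flash. The Python sketch below uses the standard `struct` module as a stand-in for a little-endian `ByteArray`: the `'<f'` format writes a 4-byte little-endian float, as `writeFloat` does with `Endian.LITTLE_ENDIAN`. The helper names are hypothetical, and the output size assumes the pixel3 kernel above writes three 4-byte floats per output pixel, which is what the modulus filter in the ActionScript example implies.

```python
import struct

BYTES_PER_FLOAT = 4

def pack_input(values):
    """Step 1: serialize the numbers as little-endian 32-bit floats."""
    return b"".join(struct.pack("<f", v) for v in values)

def input_dimensions(buffer):
    """Step 2: one float per image1 pixel, laid out in a single row."""
    width = len(buffer) >> 2  # same as length / 4
    height = 1
    return width, height

def expected_output_bytes(width, height, channels=3):
    """Step 3 (assumption): a pixel3 output writes three floats for every pixel."""
    return width * height * channels * BYTES_PER_FLOAT

data = pack_input([4, 16, 100, 400])
width, height = input_dimensions(data)
print(len(data), width, height, expected_output_bytes(width, height))  # -> 16 4 1 48
```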
Here is a basic example of this workflow in action:
package
{
    import flash.display.Shader;
    import flash.display.ShaderJob;
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.utils.ByteArray;
    import flash.utils.Endian;

    public class Main extends Sprite
    {
        protected var _shader:Shader;
        protected var _shaderJob:ShaderJob;
        protected var _input:ByteArray;
        protected var _output:ByteArray;

        [Embed(source="/../assets/filters/NumberCruncher.pbj", mimeType="application/octet-stream")]
        protected var NumberCruncher:Class;

        public function Main()
        {
            init();
        }

        protected function init():void
        {
            // Step 1: write the numbers as little-endian 32-bit floats.
            _input = new ByteArray();
            _input.endian = Endian.LITTLE_ENDIAN;
            _input.writeFloat(4);
            _input.writeFloat(16);
            _input.writeFloat(100);
            _input.writeFloat(400);
            _input.position = 0;

            // One float (4 bytes) per image1 pixel, in a single row.
            var width:int = _input.length >> 2;
            var height:int = 1;

            // Step 2: configure the shader input.
            _shader = new Shader(new NumberCruncher());
            _shader.data.src.width = width;
            _shader.data.src.height = height;
            _shader.data.src.input = _input;

            // Step 3: little-endian target for the results.
            _output = new ByteArray();
            _output.endian = Endian.LITTLE_ENDIAN;

            // Step 4: run the kernel asynchronously and wait for COMPLETE.
            _shaderJob = new ShaderJob(_shader, _output, width, height);
            _shaderJob.addEventListener(Event.COMPLETE, onShaderJobComplete, false, 0, true);
            _shaderJob.start();
        }

        protected function onShaderJobComplete(event:Event):void
        {
            _output.position = 0;

            var length:int = _output.length;

            // i is a byte offset; each float takes 4 bytes.
            for(var i:int = 0; i < length; i += 4)
            {
                var output:Number = _output.readFloat();

                // Keep only the first channel of each pixel3 (the square root).
                if(i % 3 == 0)
                    trace("value -> " + output);
            }
        }
    }
}
To work around the three-channel output bug that I mentioned earlier, I am filtering out the irrelevant padding values with a modulus operation in the 'onShaderJobComplete' event handler, so that only the first channel of each output pixel is traced. The resulting values should each be the square root of the original value that was written to the input 'ByteArray': 2, 4, 10, and 20 for the inputs above.
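Why does `i % 3 == 0` pick out the right floats when `i` is a byte offset that steps by four? Because 4 and 3 share no common factor, `i % 3 == 0` holds exactly when the float index `i / 4` is a multiple of three, i.e. the first channel of each pixel3. The Python sketch below replays that loop over a simulated output buffer (built with `struct`, an assumption standing in for what the ShaderJob writes) and checks it against a more explicit float-index test. The function names are hypothetical.

```python
import math
import struct

def simulated_output(values):
    """Assumption: the bytes a pixel3 kernel would produce, (sqrt(v), 0, 0) per value."""
    floats = []
    for v in values:
        floats.extend([math.sqrt(v), 0.0, 0.0])
    return struct.pack("<%df" % len(floats), *floats)

def filter_like_actionscript(buffer):
    """Mirror of onShaderJobComplete: step through byte offsets, test the offset itself."""
    kept = []
    for i in range(0, len(buffer), 4):
        (value,) = struct.unpack_from("<f", buffer, i)
        if i % 3 == 0:  # byte-offset test used in the post
            kept.append(value)
    return kept

def filter_by_float_index(buffer):
    """Equivalent, more explicit test on the float index i / 4."""
    floats = struct.unpack("<%df" % (len(buffer) // 4), buffer)
    return [f for k, f in enumerate(floats) if k % 3 == 0]

out = simulated_output([4, 16, 100, 400])
print(filter_like_actionscript(out))  # -> [2.0, 4.0, 10.0, 20.0]
```

The explicit float-index version is arguably easier to read; the byte-offset version just happens to agree with it because of the coprime step sizes.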
So that is pretty much it. As you can imagine, once the little bugs are worked out, this is going to be extremely useful stuff.
16 Comments so far

Hey Ryan,
Fascinating stuff. So can you speak more to some possible scenarios where this could be used? For example, if someone were to refactor the Papervision3D libraries to use this process, would we see a much higher ceiling of capabilities out of Papervision3D?
Hey Tony,
Yes and no. As I mentioned in the post, there is performance to be gained by having your complex algorithms crunch numbers on a different thread than your ActionScript. This technique, combined with the new drawing API and 3D utilities, is definitely going to improve 3D libraries such as Papervision3D in terms of both performance and texture quality. The new Matrix3D class has API elements such as rawData, which exposes the matrix as a Vector of Numbers, and decompose, which returns its components as a Vector of Vector3D objects. In Vector-of-Numbers form, the data can then be passed to a Pixel Bender kernel for processing. Once processing is complete, you can use the resulting Vector with the Utils3D.projectVectors method to output a Vector of 2D coordinates and uvtData with the proper T values for use with the Graphics.drawTriangles method. As you can see, everything has been thought out for a smooth 3D workflow.
That being said, this doesn't change the fact that a 3D engine has a lot of DisplayObjects on screen at any given time, and Flash Player can only handle so much. For that reason, I wouldn't go as far as saying that the ceiling is going to be a lot higher, but there will certainly be some noticeable improvements.
An audio application may be one of the best examples for use of this technique. When applying an effect to a series of audio samples, each sample needs to be passed through the effect's algorithm. This technique lends itself nicely to handling exactly that and will also prevent the UI from locking up while the processing is taking place.
In general, any time that you find yourself doing some heavy computing in ActionScript, you should consider the use of this technique.
Cool article.
I've found a better workaround that avoids wasting two of the three values of the pixel3 result. I have been using matched types for both input and output: image4 as my input and float4 as my output. This implies that each evaluatePixel call consumes 4 floats from the ByteArray on each pass and writes 4 back out, which means you have to provide your source data in chunks of 4 floats. In evaluatePixel, you can either access each component of the input (no loops, so this sucks) or work on it as a whole. For instance, in your example above, if your input type were image4, you could apply sqrt directly to 4 values at a time! I'll post some code on my blog soon.
I also posted this on the FP10 forum. You might be interested.
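The image4 variant described in this comment changes the packing arithmetic: each pixel now consumes four floats, so the width becomes the float count divided by four, and the data has to be padded up to a multiple of four. Here is a Python sketch of that bookkeeping, assuming zero padding and a kernel that applies sqrt to all four channels; the helper names are hypothetical and this is a CPU model, not the commenter's code.

```python
import math
import struct

FLOATS_PER_PIXEL = 4  # image4 input / pixel4 output

def pack_image4(values):
    """Assumption: pad with zeros to a multiple of four, then pack little-endian floats."""
    padded = list(values) + [0.0] * (-len(values) % FLOATS_PER_PIXEL)
    width = len(padded) // FLOATS_PER_PIXEL  # one pixel per group of four floats
    return struct.pack("<%df" % len(padded), *padded), width, len(values)

def crunch_image4(buffer, count):
    """CPU model of a pixel4 kernel applying sqrt to all four channels at once."""
    floats = struct.unpack("<%df" % (len(buffer) // 4), buffer)
    results = [math.sqrt(f) for f in floats]
    return results[:count]  # drop the padding again

buf, width, count = pack_image4([4, 16, 100, 400, 9, 25])
print(width, crunch_image4(buf, count))  # -> 2 [2.0, 4.0, 10.0, 20.0, 3.0, 5.0]
```

Nothing is thrown away on output here; the only waste is the padding in the last pixel when the count is not a multiple of four.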
Hey spender,
I did in fact toy with that idea around the time that I originally posted this article; however, I decided to stick with handling the excess data in ActionScript. My reasoning is based solely on the hope that the limitation will be removed from the toolkit by its final release, at which point the ActionScript workaround can simply be removed as well. Both approaches have their pros and cons, so it is mainly a matter of preference until the toolkit supports pixel1 and pixel2 output types.
Well, I got it working without the excess. I used image4 as the input and pixel4 as the output.
I used Vectors as the input from Flash, and a Vector as the target.
The key is to set up the width on the shader input as the length of the Vector divided by 4, and the same for the target width on the ShaderJob.
The height should be 1.
When you read a pixel4 via sample(), the individual values are available as r, g, b, a, which represent 4 values from your input Vector.
brain not hurt
scratch all that, brain still hurt
OK, so I did get this working with image3 and float3. If interested, email me for the source: aqueen@gmail.com
So I tried your example, but instead of using image1 I used image4, and instead of adding 4 floats I added 16*4. If you set the width to 16 and the height to 1, the code actually does not work. Somehow Pixel Bender only samples the first 16 and the last 16 values (that is, the first 4 pixels and the last 4 pixels). Have you run into this type of issue?
Pixel Bender is good when you want to apply the same calculation to all the numbers, but most of the time a calculation depends on the answer of a previous calculation. Doing this with Pixel Bender means loads of trips to the shader, back and forth.
What about using C/C++ for number crunching, together with Alchemy?
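The distinction drawn in this comment can be shown in a few lines of plain Python (the function names are hypothetical). A kernel computes each output pixel from its own input, so an element-wise map like the square-root example fits in one pass; a recurrence such as a running total needs the previous result, so it cannot be split into independent pixels without extra passes.

```python
import math

def independent_map(values):
    """Each result depends only on its own input: one pixel per value works."""
    return [math.sqrt(v) for v in values]

def running_total(values):
    """Each result depends on the previous result: a sequential chain."""
    totals = []
    acc = 0.0
    for v in values:
        acc += v  # needs the previous iteration's acc
        totals.append(acc)
    return totals

data = [4.0, 16.0, 100.0, 400.0]
print(independent_map(data))  # -> [2.0, 4.0, 10.0, 20.0]
print(running_total(data))    # -> [4.0, 20.0, 120.0, 520.0]
```

Slicing the input before mapping gives the same results as mapping and then slicing, which is the property that lets work be spread across pixels; the running total does not have it.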
Cool: image4 / pixel4 also works.
sqrt works on all channels (4 in this case), which means that the input should be quadruples of floats:
=> 4 floats: width = 1
=> 8 floats: width = 2
etc.

input image4 src;
output pixel4 dst;

void evaluatePixel()
{
    pixel4 value = sqrt(sample(src, outCoord()));
    dst = value;
}
01. For all of us who come later, thanks to everyone for the original and subsequent posts, which help us incrementally learn what we need to know in order to successfully run a ShaderJob.
02. Using input and output variables that are vectors of four floating-point numbers, can anyone definitively answer the question: "What appears to be the thread dispatching model used inside the black box?"
03. I thought I had read somewhere that it was based on 'rows' (in addition to the number of cores it finds itself running on), so I was surprised that all of you, despite experimenting with different widths, always recommended a width of 1. My first tests, therefore, were to extend your helpful examples to cases of width = 2 to width = 4. Actually, on a 4-core box running XP Pro, there is so much startup/teardown overhead, even in a test of 640 square roots, that I cannot positively 'see' what is happening. The granularity of Task Manager is not good enough. Basically, no matter which test case I run, the time to complete the job is consistently between 20 and 28 milliseconds. Task Manager does seem to report slightly higher aggregate CPU usage if I increase height, but the difference is only from 4% to 6%. It appears, in every case, that the CPU load is distributed across all four cores, but what can you tell in 20 ms?
04. No, this is not intended as a 'trick question'; I really would like to understand what is happening. It may be that the degree of parallelism is NOT related to height. It seems very unlikely, but the parallelization may depend on the input/output data type. Using image4 instead of image1 does cause sqrt() to be evaluated four times per pixel; is that sequential or parallel at that level?
Wow, it is a good thing this site is moderated; maybe you will be able to fix the mix-up of width and height in my post.
I meant to say that because all the examples kept height at 1 while varying width, I purposely tested heights of 2 and 4.
Awesome tutorial, I'm loving Pixel Bender xD
I also loved the talk at Adobe MAX, brilliant stuff. How would I go about passing in float3s? I'm trying to get a Pixel Bender kernel to transform some 3D vectors: I'm passing in my Matrix3D as a parameter, then sending in my vertices as a Vector, with input type image3. It tends to either lock up or spew errors... I haven't found any examples on the web of crunching more than 1 number at once.
Hi,
How do I properly create the Pixel Bender file?
I am getting the runtime error "image depth mismatch" while running the .pbk file. How do I correct it?
Yes, I got it. Thank you for your great example.