Multi-threading and .Net 4.0

Posted by Berin Loritsch Wed, 07 Jul 2010 13:19:00 GMT

.Net 4.0 has some new features that will make it easier to work with multiple threads. This will in turn make it easier to use up all those cores on your multi-core processors. The usual warnings and caveats about making sure your classes are thread safe still apply. So, what is there to get excited about?

First, .Net finally gets its Barrier class. A barrier makes threads that are running in parallel wait for one another, so that they all pass a given point at the same time. Java has had its version since Java 5 (more than a couple of years old). I find it most useful when I am using multiple threads to do some number crunching, but I need to make sure I’ve incorporated all their work before I go on. Its original purpose is to have multiple threads sync up before starting the next round of processing. Using the multi-threaded ray tracer problem from yesterday, let’s look at the problem it will solve:


ThreadPool.SetMaxThreads(screenWidth,25);

for (int y = 0; y < screenHeight; y++)
{
    for (int x = 0; x < screenWidth; x++)
    {
        int cx = x;
        int cy = y;

        ThreadPool.QueueUserWorkItem(state =>
        {
            Color c = RenderColor(cx, cy, scene);
            RenderPixelDelegate dl = RenderPixel;
            pictureBox.Invoke(dl, new Object[] { cx, cy, c });
        }, (y+1) * (x+1));
    }
}

Now, the RenderPixel(x,y,color) method takes care of plotting the pixel on the bitmap and telling the screen to redraw. The problem is the screen redrawing. That takes a lot of time, and the whole program performs better when you only redraw when the whole raster line is done. The problem is, when you are computing the pixels in parallel you can’t be sure that the raster line is complete just because the last pixel in the row finished. Truly we only care about the last line. In Java we could poll the ThreadPool to see if there are any remaining workers. Unfortunately .Net doesn’t give us that option. This is where the Barrier comes into play. By using a Barrier we can ensure all the pixels are rendered before issuing the final redraw command:


ThreadPool.SetMaxThreads(screenWidth,25);
Barrier barrier = new Barrier(screenHeight * screenWidth + 1);

for (int y = 0; y < screenHeight; y++)
{
    for (int x = 0; x < screenWidth; x++)
    {
        int cx = x;
        int cy = y;

        ThreadPool.QueueUserWorkItem(state =>
        {
            Color c = RenderColor(cx, cy, scene);
            RenderPixelDelegate dl = RenderPixel;
            pictureBox.Invoke(dl, new Object[] { cx, cy, c });
            barrier.SignalAndWait();
        }, (y+1) * (x+1));
    }
}

barrier.SignalAndWait();
barrier.Dispose();

pictureBox.Refresh();

A couple of things I’ll point out here. When you want the barrier to force the main thread to wait, you have to include it in the number of Barrier participants, which is why I added one participant for each work item plus one for the main thread. SignalAndWait() blocks the calling thread until every participant has arrived, so each worker holds on to its pool thread until the whole image is done. The Java variation allowed a thread to signal that it got to the synchronization point without pausing. That’s more useful in a situation like this, where we only need to be notified that the thread is done. Having this many participants parked on a Barrier may have bad effects on the ThreadPool, because the queued work items can’t finish until all the participants are synced up; with more participants than pool threads, the remaining items may never get a thread to run on. Unfortunately I do not have .Net 4 installed on my machine, so I can’t say for sure. It’s my suspicion based on my experience with the Java equivalent.
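The "signal but don't pause" behaviour described above is available in .Net 4.0 through CountdownEvent, which lives next to Barrier in System.Threading. The following is only a sketch of how that could look, not a drop-in replacement for the ray tracer: Shade, RenderAll, and the int array standing in for the bitmap are made-up names for illustration, and there is no UI involved.

```csharp
using System;
using System.Threading;

static class CountdownSketch
{
    // Hypothetical stand-in for the per-pixel work (RenderColor in the post).
    static int Shade(int x, int y) { return (x * 31 + y * 17) % 256; }

    public static int[] RenderAll(int width, int height)
    {
        int[] pixels = new int[width * height];

        // One count per work item. The waiting thread is not a participant.
        using (CountdownEvent done = new CountdownEvent(width * height))
        {
            for (int y = 0; y < height; y++)
            {
                for (int x = 0; x < width; x++)
                {
                    int cx = x, cy = y; // per-iteration copies for the lambda

                    ThreadPool.QueueUserWorkItem(state =>
                    {
                        try
                        {
                            pixels[cy * width + cx] = Shade(cx, cy);
                        }
                        finally
                        {
                            // Report completion and return immediately, so the
                            // pool thread is free to pick up the next item.
                            done.Signal();
                        }
                    });
                }
            }

            // Only the caller blocks, until the count reaches zero.
            done.Wait();
        }

        return pixels;
    }
}
```

Because Signal() never blocks, the workers hand their threads back to the pool as soon as they finish, and only the thread calling Wait() pauses.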

Another key threading improvement is the Parallel class. In some ways I think it is a step up from the Java Fork/Join Task architecture. The nice thing about the Parallel class is that you can use the Action delegate instead of the WaitCallback delegate. The WaitCallback delegate passes in an object for the thread state, but you rarely need it. The Parallel class will let you invoke several delegates in parallel, or perform a parallel loop for you. Essentially, the code block I’ve been using can look like this now:


// Parallel.For does not return until every iteration has finished,
// so no Barrier is needed before the final redraw.
Parallel.For(0, screenHeight, y =>
{
    Parallel.For(0, screenWidth, x =>
    {
        int cx = x;
        int cy = y;

        Color c = RenderColor(cx, cy, scene);
        RenderPixelDelegate dl = RenderPixel;
        pictureBox.Invoke(dl, new Object[] { cx, cy, c });
    });
});

pictureBox.Refresh();

I kept the copies of x and y out of habit, assuming the Parallel For loop has the same mutable integer problem I ran into with the delegate BeginInvoke approach. The problem still exists when you queue lambdas onto the ThreadPool from an ordinary for loop. With Parallel.For, though, the index arrives as a parameter to the lambda, so each invocation gets its own value and the copies are harmless but not strictly needed. I like the simplicity and lack of clutter that the Parallel class provides.
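To make the point about the loop index concrete, here is a small sketch with no UI and no ray tracer. FillCoordinates and the seen array are hypothetical names. It parallelizes only the outer loop over rows and records which (x, y) each invocation actually saw.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelForSketch
{
    public static int[,] FillCoordinates(int width, int height)
    {
        int[,] seen = new int[height, width];

        // y is a parameter of the lambda, so every invocation gets its own
        // value. Nothing here is shared with a loop counter that keeps moving.
        Parallel.For(0, height, y =>
        {
            // x is an ordinary local inside one invocation, so it is private
            // to the thread running this row.
            for (int x = 0; x < width; x++)
            {
                seen[y, x] = y * width + x; // each iteration writes its own cell
            }
        });

        // Parallel.For returns only after every row has been processed.
        return seen;
    }
}
```

Splitting by row rather than by pixel is a design choice in this sketch: each task gets a reasonable amount of work, and no two tasks ever write to the same element.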

Please do note that I chose a highly parallelizable problem to show off these features. Not every problem splits up this cleanly. Below is a checklist to determine whether a type of problem/algorithm is what is usually called embarrassingly parallel:

  • No state needs to be shared between threads
  • All state necessary for the function was passed in as parameters
  • There are no reference or output parameters

Ray tracing fits this bill well because each pixel can be computed and plotted completely independently of the other pixels. If we added anti-aliasing to the mix, we would need to create more samples than we display. Web applications also fit this bill well because HTTP is a stateless protocol. Each request is handled independently of the others. There are several other problems that fit this bill.

When you have a set of data that is shared between threads and it has to remain consistent (i.e. race conditions would cause major problems or unstable behavior), you have to be more careful with your threads. Sure, you have locks, mutexes, etc., but around the shared data they essentially turn a multi-threaded application back into a single-threaded one, and they carry more overhead than if you had never dealt with threads in the first place. By rethinking the problem a bit, you might be able to make the overall solution a bit more friendly to multiple threads. Below are some ways of making room:

  • Copy all data into each thread
    • Avoids synchronizations because the data is being used only in one thread at a time
    • Adds memory overhead and requires efficient and safe copy routines
  • Don’t do micro threads
    • The ThreadPool and Parallel classes handle micro processing needs, reusing threads as necessary
    • For very small tasks the costs outweigh the benefits. Always look for the major points that can be parallelized before looking at smaller items.
  • Understand the goals you have for multi-threading
    • The raw processing time might be shorter if you did everything in one thread
    • You can process more things at once if you have multiple cores, but you won’t gain much from having more threads than processing units (in fact you might lose some)
    • Is it the data or the process that can be run in parallel? Sometimes it’s not recommended to split an algorithm up into multiple threads, but you still might be able to process the data in chunks.
    • Stop if you can’t understand what the code is supposed to be doing. Debugging parallel code is very difficult, but if you have no working mental model for how the code is supposed to work it is impossible.
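As an illustration of processing the data in chunks while keeping shared state to a minimum, here is a hedged sketch. SumOfSquares is a made-up example problem, and it assumes .Net 4.0's Partitioner.Create(fromInclusive, toExclusive) overload, which hands each worker a contiguous index range as a Tuple<int, int>.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class ChunkedSumSketch
{
    public static long SumOfSquares(int[] data)
    {
        // Partitioner.Create rejects an empty range, so handle it up front.
        if (data.Length == 0) return 0;

        long total = 0;

        // Split the index range into contiguous chunks instead of
        // scheduling one tiny task per element.
        var ranges = Partitioner.Create(0, data.Length);

        Parallel.ForEach(ranges, range =>
        {
            // Each chunk accumulates into its own local variable...
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
            {
                local += (long)data[i] * data[i];
            }

            // ...and touches the shared total exactly once per chunk.
            Interlocked.Add(ref total, local);
        });

        return total;
    }
}
```

The only coordination between threads is one atomic add per chunk, which is the kind of minimal dependency the list above is arguing for.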

The trick with multi-threaded programming is minimizing the dependencies between the threads. The less they have to coordinate with each other the more efficiently the threads can use your computer. That’s a lesson from Erlang.

.Net Asynchronous Features Require Some Rethinking

Posted by Berin Loritsch Tue, 06 Jul 2010 17:31:00 GMT

Disclaimer: I’m originally a Java guy, and I know all kinds of asynchronous patterns and approaches that work in the Java world. The .Net approach to multi-threading and asynchronous applications is quite a bit different. In some ways Microsoft’s approach to the problem is more problematic. I stumbled across someone’s code for a small C# ray tracer with a fixed scene. I figured that this would be the perfect place to start playing with asynchronous code. Microsoft likes to hide things from you. I find this a bit problematic while trying to find out what is going wrong. Case in point: my assumptions about the way lambdas and closures ought to work (from my Ruby background) turned out to be wrong. Consider this code snippet:


for (int y = 0; y < screenHeight; y++)
{
    for (int x = 0; x < screenWidth; x++)
    {
        RenderColorDelegate px = RenderColor;
        px.BeginInvoke(x, y, scene, ar =>
           {
                Color c = px.EndInvoke(ar);
                RenderPixelDelegate dl = RenderPixel;
                pictureBox.Invoke(dl,new Object[]{x,y,c});
            }, null);
    }
}

Before I go into the surprise, let me tell you what this code is doing. This code is launching the rays from the view port one pixel at a time to calculate the color for that pixel. The px.BeginInvoke call is done on a delegate I had to create to alias the method I wanted to make asynchronous. I needed to pass in the variables “x” and “y”, and the scene that was passed in to the Render method. The original code makes use of synchronous calls, and behaves exactly as you would expect. X is always between 0 and one less than screenWidth. Y is always between 0 and one less than screenHeight. Now here is the surprise: when calling the methods asynchronously, X can reach screenWidth by the time the callback runs, which is outside the valid range!

Let me state that again in other terms: the values you see inside the callback are not the values you had when you created it. The arguments passed to BeginInvoke are copied at the time of the call, but the lambda captures the variables x and y themselves rather than their current values, so any x++ that runs before a pending callback executes changes what that callback sees. That should alarm you. This can be (and was in this case) the source of serious and difficult to trace errors. With Java anonymous classes (which can only capture final locals) and the Ruby blocks I was used to, each block sees the x and y it was created with. However, with a .Net for loop it does not. In order to fix the problem we have to copy x and y to new integers strictly for the purpose of passing them in to the asynchronous call:


for (int y = 0; y < screenHeight; y++)
{
    for (int x = 0; x < screenWidth; x++)
    {
        // Copy the loop variables so each callback captures its own values
        int cx = x;
        int cy = y;

        RenderColorDelegate px = RenderColor;
        px.BeginInvoke(cx, cy, scene, ar =>
            {
                Color c = px.EndInvoke(ar);
                RenderPixelDelegate dl = RenderPixel;
                pictureBox.Invoke(dl,new Object[]{cx,cy,c});
            }, null);
    }
}
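The capture behaviour can be reproduced without threads or a ray tracer. The following is a minimal sketch: ClosureCaptureSketch and its two methods are hypothetical names, and the lambdas are simply stored in a list and run after the loop has finished, which is roughly what happens when an asynchronous callback fires late.

```csharp
using System;
using System.Collections.Generic;

static class ClosureCaptureSketch
{
    // Every lambda captures the single loop variable i.
    public static List<int> SharedVariable(int count)
    {
        var callbacks = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            callbacks.Add(() => i);
        }

        // By now the loop has ended and i == count.
        var results = new List<int>();
        foreach (Func<int> callback in callbacks) results.Add(callback());
        return results;
    }

    // Every lambda captures its own copy, declared inside the loop body.
    public static List<int> PerIterationCopy(int count)
    {
        var callbacks = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            int copy = i;
            callbacks.Add(() => copy);
        }

        var results = new List<int>();
        foreach (Func<int> callback in callbacks) results.Add(callback());
        return results;
    }
}
```

SharedVariable(3) yields 3, 3, 3, while PerIterationCopy(3) yields 0, 1, 2. The first result also shows why the captured value lands one past the valid range.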

The “pictureBox.Invoke” method should be familiar to .Net and Java UI developers. Essentially screen drawing and refreshing has to be done within the GUI thread. The “Invoke” method makes that happen when you are in another thread. The style of asynchronous call that I used included the callback function. This makes the redraw happen after the calculation is done.

When it comes to more traditional Thread manipulation, Java’s bag of tricks is a bit more complete. That may have changed with .Net 4.0 but I need to stick with 3.5. For example, the Future construct is really what I need. A Future is basically the promise of a value for when you need it later. It allows you to start processing a complex algorithm before you need the result, and pause for the result only when you really need it. It works for lazy initialization, but it’s more effective when you have a number of complex calculations you need to merge together. Java also has some good thread pool support, tasks, and other features. The Fork/Join approach in Java 7 is also a very effective way to divvy up work. .Net 4.0 may have something similar to that, but both of those are taboo for new work for the time being.
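For reference, .Net 4.0 does ship a close analogue to Java’s Future: Task<TResult> in System.Threading.Tasks. Here is a minimal sketch of the "start early, block only when you need the value" pattern, assuming you are able to target 4.0. SlowSum and Combine are made-up stand-ins for real calculations.

```csharp
using System;
using System.Threading.Tasks;

static class FutureSketch
{
    // Hypothetical expensive calculation: the sum 1 + 2 + ... + n.
    static long SlowSum(int n)
    {
        long sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    public static long Combine(int a, int b)
    {
        // Start both calculations right away. Each Task<long> is a
        // promise of a value that will be available later.
        Task<long> first = Task.Factory.StartNew<long>(() => SlowSum(a));
        Task<long> second = Task.Factory.StartNew<long>(() => SlowSum(b));

        // ... other work could happen here while the tasks run ...

        // Reading Result blocks only if the value is not ready yet.
        return first.Result + second.Result;
    }
}
```

Combine(10, 100) returns 5105, that is 55 + 5050, with the two sums free to run on separate cores.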

All in all, I only scratched the surface. The delegate BeginInvoke() approach was designed for asynchronous event processing. It wasn’t really designed for more serious multi-threading. In fact, the way I used it for the ray tracer was the wrong tool for the job.

The bottom line is that I have to change the way I think about multi-threaded application design in the .Net platform. Some of the principles I’ve picked up from other languages still carry over, but there’s a new twist on them. It’s to be expected. However, do be cautious of the delegate problem I ran into. It’s not good to have your values change underneath you.