Nothin' Like Good POO

Posted by Berin Loritsch Fri, 27 Aug 2010 00:24:00 GMT

In a reaction to the whole fad of using components for everything, the Java folks came up with a back to grass roots movement called Plain Old Java Objects (POJO). That’s all fine and good, but Java’s not the only language with objects. So in this article I’m writing about Plain Old Objects, or POO for short.

Fragile POO and Testing Foo

Writing good unit tests is its own challenge. In order to have confidence that you are testing what you believe you are testing, you really need to isolate the code. I don’t know about you, but most objects I write use other objects. In fact, you’ll find it fairly common to see object trees in your application. That’s true whether you are creating a web application that manages people’s information or writing games. There are probably a lot of ways to create the tree structure, but a few of them lead to a fragile architecture and easily broken tests.

Let’s say you have some code that needs to make decisions based on when a user first opened their account. Now, to make this difficult, the code is separated from the user account object by a couple of intermediate objects. The fragile approach would be to chain the calls back to the user account. For example:


if ( Owner.Owner.Account.StartDate < DecisionDate )
{
    DoSomethingSpecial();
}

You can substitute code from your own language of choice if you like. So what’s so bad about this approach? The problem is we are making some assumptions that may not always hold true. Let’s say you got a Null Pointer/Reference exception in your if statement. Where do you look first? It could be that your object doesn’t have an owner. That wouldn’t be uncommon in a unit test environment. It could be your object’s owner doesn’t have an owner. Easy to miss in unit testing. Or it could be that there is no account. Perhaps it’s not the account but there is no date and the comparison is throwing the exception. That’s four things that can go wrong just on the left side of the comparison statement.

Degrees of Separation

So, we still need to make a decision on the user’s account start date. If we have corrupt data, or we are migrating accounts from one back end server to another, our code might break suddenly. If we were to be truly defensive, we would need a very complex null checking condition before we made this simple check. In the case we outlined above, we have three degrees of separation from the information we need. Really, what we need is direct access to the account.

You should have no more than one degree of separation from any information you need.

Something the component folks got right is that, by definition, every component has direct access to the information it needs. The container, or controlling object, takes care of that for us. The mechanisms to get at that information vary between the component systems, and can even be more complex than necessary.

If the Account object is central to your application, and multiple objects need to access it, you have a couple choices. You can make the Account static and centrally accessible. That’s great for easy access, and might even work if you can guarantee only one user’s account will ever be needed at one time. It’s a solution that would work for a desktop application with a single document interface. If you work on the web, or you want to let your users have more than one document open at a time, a centrally accessible static object will introduce other fragilities that are not worth the trouble.

Another solution would be to provide a lookup interface to find the right Account. This is the solution that some component frameworks use (ahem. JNDI, Avalon, etc.). A pro here is that you can do some cool things to make sure you get the right account. It allows you the flexibility of serving some accounts from LDAP while some are in a database. Problem is, we have left the world of POO and introduced a lot of complexity we probably just don’t need.

A better solution would be the judicious use of the Hollywood Principle. “Don’t call us, we’ll call you.” Some of the POJO (or POO) proponents will object, exclaiming that’s exactly the thing we wanted to avoid. Component frameworks like Spring, Castle, and PicoContainer use this approach extensively. The only thing that the component frameworks do for you is to automate how the component objects are assigned to each other. We don’t need to go all the way down the path of using a component architecture. All we need to do is ensure that every parent object assigns its child objects the same Account object that it holds.

Now we have one degree of separation. We write the test to make sure children objects get assigned the account object that the parent has. We write the test to make sure the child object can use the account properly. This makes it easier to mock out the Account object and simplify our test setup.
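To make the idea concrete, here is a minimal sketch of handing the account down from parent to child. The class names (Account, Parent, Child) and the IsSpecial method are hypothetical, chosen only for illustration; the article does not show the real classes.

```csharp
using System;

// Hypothetical classes for illustration only.
public class Account
{
    public DateTime StartDate { get; set; }
}

public class Child
{
    private readonly Account account;

    // The parent hands the account in; the child never walks Owner.Owner.Account.
    public Child(Account account)
    {
        if (account == null) throw new ArgumentNullException("account");
        this.account = account;
    }

    public bool IsSpecial(DateTime decisionDate)
    {
        return account.StartDate < decisionDate;
    }
}

public class Parent
{
    private readonly Account account;

    public Parent(Account account)
    {
        if (account == null) throw new ArgumentNullException("account");
        this.account = account;
    }

    // Every child receives the same Account instance the parent holds.
    public Child CreateChild()
    {
        return new Child(account);
    }
}

public static class Demo
{
    public static void Main()
    {
        var account = new Account { StartDate = new DateTime(2009, 1, 1) };
        Child child = new Parent(account).CreateChild();
        Console.WriteLine(child.IsSpecial(new DateTime(2010, 8, 27))); // True
    }
}
```

A test for Child only has to build an Account, and a missing account fails loudly in the constructor instead of somewhere in the middle of a call chain.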

In our tests, we should only ever have to assign the object(s) that will be used directly by the code we are testing. We should never have to create a whole hierarchy of objects for code to work. When something goes wrong with your object chain, the only way to find out exactly what it is will be to break out the debugger and step through your program. Any change to any object in the chain can potentially break the code you are writing. That is why we only want one degree of separation between any information we need and the object we are writing.

Say it with me, Singletons are Evil

I’m sure we can all point to a situation where having a singleton was very useful, and made code a lot easier. If we are honest with ourselves, we’ll realize the number of times it worked we can count on one hand, or maybe even one finger. The problem isn’t necessarily the singleton at the time it was created. The problem has more to do with how code evolves.

When you have an object that is easily accessible from anywhere in your program, more and more code will depend on that object being there. Then the code will depend on the objects that it can get from the singleton. Then you have code that accesses the singleton that would be better off written differently. Over time that singleton becomes a point of contention in your application.

In this day of multi-core processors and the demand for multi-threaded or multi-process applications, having a single object that all other objects access introduces subtle concurrency errors. Even if you handle the concurrency problems well with mutexes and a slick locking mechanism, you introduce bottlenecks and even the risk of deadlock because all your code is accessing this one singleton.

It’s worse if you mix the singleton with the multiple degrees of separation problem we talked about earlier. Now, instead of proper object oriented design, you are developing a procedural application that mixes the worst from both object programming and procedural programming metaphors.

Singletons also are a sore point with testing. The problem is that if the singleton carries any state whatsoever, the effects of one test will affect other tests. If the tests are not executed in the correct order, they may not pass. There’s nothing more frustrating than trying to figure out why a test provides different results when it is doing the same thing. The code behaves insanely, when the tests expect it to behave the same way every time.
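A small sketch of that order dependence, assuming a hypothetical Settings singleton with one mutable field. The "tests" are plain methods here so the example does not depend on any particular test framework.

```csharp
using System;

// Hypothetical singleton that carries state between callers.
public sealed class Settings
{
    public static readonly Settings Instance = new Settings();
    private Settings() { }
    public int RetryCount = 3;
}

public static class Tests
{
    // Passes only if nothing has touched the singleton first.
    public static bool DefaultRetryCountIsThree()
    {
        return Settings.Instance.RetryCount == 3;
    }

    // Mutates the shared instance and never restores it.
    public static bool RetryCountCanBeRaised()
    {
        Settings.Instance.RetryCount = 10;
        return Settings.Instance.RetryCount == 10;
    }
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(Tests.DefaultRetryCountIsThree()); // True
        Console.WriteLine(Tests.RetryCountCanBeRaised());    // True
        Console.WriteLine(Tests.DefaultRetryCountIsThree()); // False: same check, different answer
    }
}
```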

There’s More to Good Architecture

I don’t have the time or space to write out every little thing that can go wrong. The important approach is to simplify your system as much as possible. If you can compose your application of several well contained classes that behave predictably in tests, you stand a better chance of writing a solid application. With every version of your application you release, ask yourself, “Is there anything I can get rid of?” or “Can I do the same work with less code?”

To me, an impressive mark of a solid application is how little code is needed to do all that it does. The more lines of code, the more opportunities for something to go wrong. The more moving parts, the more unpredictable the application becomes. Pursue simplicity, one step at a time.

Reflectively Yours

Posted by Berin Loritsch Fri, 06 Aug 2010 01:02:00 GMT

I’ve been doing a lot of work with serialization in .Net, which also means I’ve been doing a lot with .Net’s reflection API. I’m sure you may be wondering why we didn’t just use IXmlSerializable or something like that. The long and the short of it is that there were reasons. Mainly it was the fact that we wanted to use the same set of attributes to pull double duty for serialization and for the user interface. I won’t bore you with the details of the serialization approach; the focus today is something that utterly surprised me.

If you see the code below, do you think it will work?


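using System.Reflection;

public class PersonalClass
{
    private string ccInfo = "XXXXXXXXXXXX1234";
}

public class DeviousReflection
{
    public static void Main(string[] args)
    {
        // Ask for a private (NonPublic) instance field by name.
        FieldInfo field = typeof(PersonalClass).GetField("ccInfo",
                      BindingFlags.Instance | BindingFlags.NonPublic);

        PersonalClass instance = new PersonalClass();

        // GetValue returns object, so cast back to the field's type.
        string gotcha = (string)field.GetValue(instance);

        // SetValue takes the new value itself, not an argument array.
        field.SetValue(instance, "Really Devious");
    }
}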

I apologize in advance if I mistyped something from the API, it was from memory. So for the reflection uninitiated, what I’ve done is set up an object with a private member that I shouldn’t be able to touch. Next I set up a separate class that should never have access to the private members of another class to perform some reflection. I looked at the type of the class I wanted to exploit (... um, access …) and got a reference to the private field as described through reflection. I then took an instance of the class I wanted to exploit, and used the reflection to not only get its current value but to set a new value.

Now, if this were the Java VM you would have to ask permission first. Java reflection refuses to touch a private field until you call setAccessible(true) on it, and a security manager can deny that request. The protections are built in at a low level, although in a default configuration they are more of a speed bump than a wall.

We’re talking .Net here, so as long as you can get an instance of the FieldInfo for the private field, you can get and set information in private fields. I even tested this out in classes that weren’t in the same assembly. It still works. I’ll leave the implications this has for you to your imagination. However, this is a really good reason to not use “Magic Values” hard coded in your application. I don’t even have to know the structure of your application. I can iterate through the fields and find something that looks promising.
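Iterating through the fields without knowing their names takes only a few lines. This is a sketch under the same assumptions as the listing above (a fully trusted application); the extra visits field is hypothetical and only there to give the loop more than one thing to find.

```csharp
using System;
using System.Reflection;

public class PersonalClass
{
    private string ccInfo = "XXXXXXXXXXXX1234";
    private int visits = 7;   // hypothetical second field
}

public static class FieldDump
{
    // Returns every private instance field declared on the object's type.
    public static FieldInfo[] PrivateFields(object target)
    {
        return target.GetType().GetFields(BindingFlags.Instance | BindingFlags.NonPublic);
    }

    public static void Main()
    {
        object instance = new PersonalClass();

        foreach (FieldInfo field in PrivateFields(instance))
        {
            // Name, declared type, and current value of each private field.
            Console.WriteLine("{0} ({1}) = {2}",
                field.Name, field.FieldType.Name, field.GetValue(instance));
        }
    }
}
```

The order in which GetFields returns the fields is not guaranteed, so code like this should not depend on it.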

What this means for my application and the serialization problem is that it really frees the developers up to develop the way they normally would. No need to artificially make something public just because you want to write it to a file. What this means for folks releasing .Net applications for money to the world at large is that code obfuscation tools become really important. An application built for hire is probably OK. There is no real security through obscurity, but at least it makes it a bit harder to find where the important stuff is since everything looks the same.

The More Things Change, the More They Stay the Same

Posted by Berin Loritsch Wed, 21 Jul 2010 23:25:00 GMT

Garbage collection is nothing new, and the Java folks have pretty much proven that garbage collection is a valid solution to memory management problems. Thanks to the JVM, the bar was set pretty high when Microsoft entered the fray with the Common Language Runtime (CLR). Based on some recent observations tracking down some performance issues in a C# application, it turns out all the experience writing Java paid off. So here are the high level observations:

  • Both the JVM and CLR use a variation of mark/sweep and generational garbage collection
  • Both the JVM and CLR will pause program execution during collection processing (even if there are threaded garbage collectors, the program is paused in that thread while collection occurs)
  • Small, simple objects created in large numbers are collected quickly in the first generation sweep
  • Large, hierarchical object trees are collected more slowly, in later generation sweeps
  • Later generation sweeps take a long time in comparison to the first generation
  • The CLR garbage collector is more deterministic than the JVM garbage collector

All the bullet points up to the last one should be common sense if you’ve been developing in a garbage collected language for any amount of time. The last one actually surprised me a bit. When I performed some micro benchmarks to confirm my suspicions about the performance characteristics, the number of garbage collections was always the same. The amount of time the process took varied, the amount of memory varied (albeit very little), but the number of garbage collections performed in each test was identical. What this means is that it is a bit easier to perform micro benchmarks in the CLR and have reasonable confidence in the results.

Needless to say, when you are having performance problems and it is not immediately obvious what the problem is, suspect unplanned garbage collection first. I was given two algorithms that did the same thing, but a little differently. The original algorithm was accurate, but slow. The new algorithm was inaccurate but about 4-5 times faster. The question was, can we keep the accuracy but get the speed? So the first thing I did was put together a test harness to ensure that everything was measured the same way. I can’t post my code here, but there are a few things to consider when writing micro benchmarks:

  1. Always provide a warm up period so the environments can perform any optimizations (yes, the CLR does runtime optimizations)
  2. Perform your garbage collection, and wait for all finalizers (there are two calls on the GC class to do this: GC.Collect() and GC.WaitForPendingFinalizers())
  3. Collect your pre-test measurements: time, memory usage, number of garbage collections in each generation
  4. Perform your test (use several iterations)
  5. Collect your post-test measurements: time, memory usage, number of garbage collections in each generation
  6. Report the difference between the pre-test measurements and the post-test measurements
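The six steps above can be sketched roughly as follows. This is not the harness from the project; the names MicroBenchmark and Measure are hypothetical, and the second GC.Collect() after the finalizers run is my own assumption about what a clean starting point looks like.

```csharp
using System;
using System.Diagnostics;

public static class MicroBenchmark
{
    public static void Measure(string name, Action action, int warmUp, int iterations)
    {
        // 1. Warm up so the JIT has compiled the code under test.
        for (int i = 0; i < warmUp; i++) action();

        // 2. Start from a clean heap.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        // 3. Pre-test measurements.
        int[] before = new int[GC.MaxGeneration + 1];
        for (int g = 0; g <= GC.MaxGeneration; g++) before[g] = GC.CollectionCount(g);
        long memoryBefore = GC.GetTotalMemory(false);
        Stopwatch timer = Stopwatch.StartNew();

        // 4. The test itself, run several times.
        for (int i = 0; i < iterations; i++) action();

        // 5. Post-test measurements.
        timer.Stop();
        long memoryAfter = GC.GetTotalMemory(false);

        // 6. Report the differences.
        Console.WriteLine("{0}: {1} ms, {2} bytes",
            name, timer.ElapsedMilliseconds, memoryAfter - memoryBefore);
        for (int g = 0; g <= GC.MaxGeneration; g++)
        {
            Console.WriteLine("  gen {0} collections: {1}", g, GC.CollectionCount(g) - before[g]);
        }
    }

    public static void Main()
    {
        // Example workload: allocate a 64 x 64 array of arrays on every call.
        Measure("array of arrays", () =>
        {
            double[][] m = new double[64][];
            for (int i = 0; i < m.Length; i++) m[i] = new double[64];
        }, 100, 10000);
    }
}
```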

I was able to document the difference between the two algorithms. The slow algorithm was four times slower, used 2 MB of memory, and caused more than 350 garbage collections, over a hundred of which were mature generation collections. Looking at the algorithms, the biggest difference that could explain a gap in memory consumption like that was the object being used to hold the data. The faster algorithm used a pair of objects that held the raw data directly encapsulated. The slower algorithm used a matrix library, which held the values in an array of arrays. An array of arrays is a complex object from the viewpoint of the garbage collector. So on a hunch (that Java experience coming into play), I made a copy of the older algorithm that used a simple array.

By just changing the object holding the values, the original accurate algorithm was marginally faster than the newer “optimized” inaccurate one. It caused fewer garbage collections, and they were all first generation collections which are faster. I thought, “this is great, let’s try to do even better”. If the source of the performance drain was related to garbage collection, let’s see if we can eliminate it altogether. The key to making this happen is to use the Flyweight Pattern. Basically, the flyweight pattern says to create a small set of unique objects and then just change the values in them as necessary. In order to make that happen, I had to adjust the original API for the original algorithm to use references to the return object so I can let it reuse the response object. Just to up the ante, I used the original expensive value object to perform this test.

The results had the original algorithm performing twice as fast as the optimized algorithm, using only 8 KB of memory, and 0 garbage collections. All of this just by changing how we are using the objects that are expensive to create. I realize it is not always practical to use the flyweight pattern, but in the times where it can be used the results can be quite impressive. I was able to drop something that took 4 seconds down to 400 milliseconds. Those results are not typical, but it is always worth throwing a couple of tests at a problem to discover its nature. If my first hunch hadn’t paid off, I would never have tried the flyweight pattern. I would have looked at other avenues of speeding things up.
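The API change described above, letting the caller pass in the object that receives the answer, might look something like this. It is only a sketch: Result, Stats, and Compute are hypothetical stand-ins, not the real algorithm or its value object.

```csharp
using System;

// Hypothetical result type standing in for the article's value object.
public class Result
{
    public double Sum;
    public double Mean;
}

public static class Stats
{
    // Allocating version: one new Result per call.
    public static Result Compute(double[] data)
    {
        Result r = new Result();
        Compute(data, r);
        return r;
    }

    // Reusing version: the caller owns the Result and it is overwritten in place.
    public static void Compute(double[] data, Result result)
    {
        double sum = 0;
        for (int i = 0; i < data.Length; i++) sum += data[i];
        result.Sum = sum;
        result.Mean = data.Length == 0 ? 0 : sum / data.Length;
    }
}

public static class Demo
{
    public static void Main()
    {
        double[] data = { 1, 2, 3, 4 };
        Result shared = new Result();        // allocated once
        for (int i = 0; i < 1000000; i++)
        {
            Stats.Compute(data, shared);     // no allocation inside the loop
        }
        Console.WriteLine(shared.Mean);
    }
}
```

The trade-off is that the shared object is overwritten on every call, so a caller that needs to keep a value has to copy it out, and the same instance must not be handed to two threads at once.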

The bottom line is this: the more pressure you put on your garbage collector by creating and discarding objects in rapid succession, the more your garbage collector will punish you by sucking up all the CPU cycles.

Multi-threading and .Net 4.0

Posted by Berin Loritsch Wed, 07 Jul 2010 13:19:00 GMT

.Net 4.0 has some new features that will make it easier to work with multiple threads. This will in turn make it easier to use up all those cores on your multi-core processors. The usual warnings and caveats apply with ensuring your classes are thread safe. So, what is there to get excited about?

First, .Net finally gets its Barrier class. A barrier makes all the threads running in parallel wait for one another at a common point before any of them continues. Java has had its equivalent since Java 5 (more than a couple of years old). I find it most useful when I am using multiple threads to do some number crunching, but I need to make sure I’ve incorporated all their work before I go on. The initial purpose is to have the multiple threads sync up before doing the next round of processing. Using the multi-threaded ray tracer problem from yesterday, let’s look at the problem it will solve:


ThreadPool.SetMaxThreads(screenWidth,25);

for (int y = 0; y < screenHeight; y++)
{
    for (int x = 0; x < screenWidth; x++)
    {
        int cx = x;
        int cy = y;

        ThreadPool.QueueUserWorkItem(state =>
        {
            Color c = RenderColor(cx, cy, scene);
            RenderPixelDelegate dl = RenderPixel;
            pictureBox.Invoke(dl, new Object[] { cx, cy, c });
        }, (y+1) * (x+1));
    }
}

Now, the RenderPixel(x,y,color) method takes care of plotting the pixel on the bitmap and telling the screen to redraw. The problem is the screen redrawing. That takes a lot of time, and the whole program performs better when you only redraw when the whole raster line is done. Problem is, when you are computing the pixels in parallel you can’t be sure that the raster line is complete just because the last pixel in the row finished. Truly we only care about the last line. In Java we could poll the ThreadPool to see if there are any remaining workers. Unfortunately .Net doesn’t give us that option. This is where the Barrier comes into play. By using a Barrier we can ensure all the pixels are rendered before issuing the final redraw command:


ThreadPool.SetMaxThreads(screenWidth,25);
Barrier barrier = new Barrier(screenHeight * screenWidth + 1);

for (int y = 0; y < screenHeight; y++)
{
    for (int x = 0; x < screenWidth; x++)
    {
        int cx = x;
        int cy = y;

        ThreadPool.QueueUserWorkItem(state =>
        {
            Color c = RenderColor(cx, cy, scene);
            RenderPixelDelegate dl = RenderPixel;
            pictureBox.Invoke(dl, new Object[] { cx, cy, c });
            barrier.SignalAndWait();
        }, (y+1) * (x+1));
    }
}

barrier.SignalAndWait();
barrier.Dispose();

pictureBox.Refresh();

A couple of things I’ll point out here. When you want the barrier to force the main thread to wait, you have to include it in the number of Barrier participants, which is why I added one to the count of work items. The Java variation allowed for a thread to signal that it got to the synchronization point but not pause. That’s more useful in a situation like this where we only need to be notified that the thread is done; .Net 4 offers the same thing in its CountdownEvent class, whose Signal() call does not block. Be careful with a Barrier used this way. Every work item that reaches SignalAndWait() holds on to its ThreadPool thread until all the participants arrive, so once every thread the pool allows is waiting, the remaining queued items can never run and the program hangs. A Barrier is also limited to 32,767 participants, far fewer than one per pixel on any real screen. Unfortunately I do not have .Net 4 installed on my machine, so treat the listing above as an untested illustration of the idea, based on my experience with the Java equivalent.
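For the "signal but don’t pause" style, a rough sketch using CountdownEvent (also new in .Net 4) is shown below. It is a console-only stand-in: the method name RenderAll and the x + y "color" are hypothetical, and the pictureBox.Invoke call is left out on purpose, because blocking the GUI thread in Wait() while workers call Invoke on it would stall the program.

```csharp
using System;
using System.Threading;

public static class CountdownDemo
{
    // Queues one work item per pixel and returns once every item has signalled.
    public static int[] RenderAll(int width, int height)
    {
        int[] pixels = new int[width * height];

        using (CountdownEvent done = new CountdownEvent(width * height))
        {
            for (int y = 0; y < height; y++)
            {
                for (int x = 0; x < width; x++)
                {
                    int cx = x;   // per-iteration copies for the closure
                    int cy = y;

                    ThreadPool.QueueUserWorkItem(state =>
                    {
                        try
                        {
                            pixels[cy * width + cx] = cx + cy;   // stand-in for RenderColor
                        }
                        finally
                        {
                            done.Signal();   // report completion without blocking this worker
                        }
                    });
                }
            }

            done.Wait();   // only the calling thread waits
        }

        return pixels;
    }

    public static void Main()
    {
        int[] pixels = RenderAll(64, 48);
        Console.WriteLine(pixels[pixels.Length - 1]);   // 63 + 47 = 110
    }
}
```

Because the workers never wait on each other, the pool can keep recycling a small number of threads through all the queued items.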

Another key threading improvement is the Parallel class. In some ways I think it is a step up from the Java Fork/Join Task architecture. The nice thing about the Parallel class is that you can use the Action delegate instead of the WaitCallback delegate. The WaitCallback delegate passes in an object for the thread state; however, you rarely need it. The Parallel class will let you invoke the same delegate several times, or perform a parallel loop for you. Essentially, the code block I’ve been using can look like this now:


Barrier barrier = new Barrier(screenHeight * screenWidth + 1);

Parallel.For(0, screenHeight, y =>
{
    Parallel.For(0, screenWidth, x =>
    {
        int cx = x;
        int cy = y;

        Color c = RenderColor(cx, cy, scene);
        RenderPixelDelegate dl = RenderPixel;
        pictureBox.Invoke(dl, new Object[] { cx, cy, c });
        barrier.SignalAndWait();
    });
});

barrier.SignalAndWait();
barrier.Dispose();

pictureBox.Refresh();

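Of course, I’m assuming that the Parallel For loop has the same mutable integer problem that I experienced with delegate BeginInvoke, which is why the copies are still there. Strictly speaking they should not be needed: x and y are now parameters of the lambdas, so each iteration receives its own values rather than sharing a loop variable. Note also that Parallel.For does not return until every iteration has finished, so the Barrier carried over from the ThreadPool version is not needed and can simply be dropped. I like the simplicity and lack of clutter that the Parallel class provides.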
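A stripped-down sketch of the same loop, with no Barrier and no copies, shows how little is left. As before, RenderAll and the x + y "color" are hypothetical stand-ins and the GUI call is omitted. Only the outer loop is parallel here, which is my own choice to keep each unit of work a whole raster line.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelDemo
{
    public static int[] RenderAll(int width, int height)
    {
        int[] pixels = new int[width * height];

        // Parallel.For returns only after every iteration has run,
        // so no extra synchronization object is needed here.
        Parallel.For(0, height, y =>
        {
            // y is the lambda's parameter and x is declared inside it,
            // so each iteration works on its own values.
            for (int x = 0; x < width; x++)
            {
                pixels[y * width + x] = x + y;   // stand-in for RenderColor
            }
        });

        return pixels;
    }

    public static void Main()
    {
        int[] pixels = RenderAll(64, 48);
        Console.WriteLine(pixels[pixels.Length - 1]);   // 63 + 47 = 110
    }
}
```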

Please do note that I chose a highly parallelizable problem to show off these features. Not all complex actions are as responsive. Below is a checklist to determine if a type of problem/algorithm can be ridiculously parallel:

  • No state needs to be shared between threads
  • All state necessary for the function was passed in as parameters
  • There are no reference or output parameters

Ray tracing fits this bill well because each pixel can be computed and plotted completely independent of the other pixels. If we added anti-aliasing to the mix, we would need to create more samples than we display. Web applications also fit this bill well because HTTP is a stateless protocol. Each request is handled independently from the others. There are several other problems that fit this bill.

When you have a set of data that is shared between threads and it has to remain consistent (i.e. race conditions would cause major problems or unstable behavior), you have to be more careful with your threads. Sure you have locks, mutexes, etc. but they essentially turn a multi-threaded application into a single threaded application and carry more overhead than if you never dealt with threads in the first place. By rethinking the problem a bit, you might be able to make the overall solution a bit more friendly to multiple threads. Below are some ways of making room:

  • Copy all data into each thread
    • Avoids synchronizations because the data is being used only in one thread at a time
    • Adds memory overhead and requires efficient and safe copy routines
  • Don’t do micro threads
    • The ThreadPool and Parallel classes handle micro processing needs, reusing threads as necessary
    • The costs outweigh the benefits. Always look for the major points that can be parallelized before looking at smaller items.
  • Understand the goals you have for multi-threading
    • The raw processing time might be shorter if you did everything in one thread
    • You can process more things at once if you have multiple cores, but you won’t gain much from having more threads than processing units (in fact you might lose some)
    • Is it the data or the process that can be run in parallel? Sometimes it’s not recommended to split an algorithm up into multiple threads, but you still might be able to process the data in chunks.
    • Stop if you can’t understand what the code is supposed to be doing. Debugging parallel code is very difficult, but if you have no working mental model for how the code is supposed to work it is impossible.

The trick with multi-threaded programming is minimizing the dependencies between the threads. The less they have to coordinate with each other the more efficiently the threads can use your computer. That’s a lesson from Erlang.

.Net Asynchronous Features Require Some Rethinking

Posted by Berin Loritsch Tue, 06 Jul 2010 17:31:00 GMT

Disclaimer: I’m originally a Java guy, and I know all kinds of asynchronous patterns and approaches that work in the Java world. The .Net approach to multi-threading and asynchronous applications is quite a bit different. In some ways the way that Microsoft approached the problem is more problematic. I stumbled across someone’s code for a small C# ray tracer with a fixed scene. I figured that this would be the perfect place to start playing with asynchronous code. Microsoft likes to hide things from you. I find this a bit problematic while trying to find out what is going wrong. Case in point: my assumptions about the way lambdas and closures ought to work (from my Ruby background) turned out to be wrong. Consider this code snippet:


for (int y = 0; y < screenHeight; y++)
{
    for (int x = 0; x < screenWidth; x++)
    {
        RenderColorDelegate px = RenderColor;
        px.BeginInvoke(x, y, scene, ar =>
           {
                Color c = px.EndInvoke(ar);
                RenderPixelDelegate dl = RenderPixel;
                pictureBox.Invoke(dl,new Object[]{x,y,c});
            }, null);
    }
}

Before I go into the surprise, let me tell you what this code is doing. This code is launching the rays from the view port one pixel at a time to calculate the color for that pixel. The px.BeginInvoke call is done on a delegate I had to create to alias the method I wanted to make asynchronous. I needed to pass in the variables “x” and “y”, and the scene that was passed in to the Render method. The original code makes use of synchronous calls, and behaves exactly as you would expect. X is always between 0 and one less than screenWidth. Y is always between 0 and one less than screenHeight. Now here is the surprise: when calling the methods asynchronously, the callback can see X equal to screenWidth, one past the last valid column!

Let me state that again in other terms: the values the callback sees are not the values x and y had when the call was started. The arguments handed to BeginInvoke itself are copied at the moment of the call, but the lambda does not get a copy. A C# closure captures the variable, not its value, so every callback shares the loop’s single x and y, and any x++ that runs before a callback executes changes what that callback will read. That should alarm you. This can be (and was in this case) the source of serious and difficult to trace errors. Java anonymous classes can only capture final variables, and a Ruby block passed to each gets a fresh block parameter on every pass, so in both cases the values are effectively frozen at the moment the asynchronous block is created. With a C# for loop they are not. In order to fix the problem we have to copy x and y to new integers declared inside the loop body, strictly for the purpose of using them in the asynchronous call:


for (int y = 0; y < screenHeight; y++)
{
    for (int x = 0; x < screenWidth; x++)
    {
        // Copy the loop variables so each callback captures its own values
        int cx = x;
        int cy = y;

        RenderColorDelegate px = RenderColor;
        px.BeginInvoke(cx, cy, scene, ar =>
            {
                Color c = px.EndInvoke(ar);
                RenderPixelDelegate dl = RenderPixel;
                pictureBox.Invoke(dl,new Object[]{cx,cy,c});
            }, null);
    }
}
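The same behavior can be reproduced without any threads at all, which makes it easier to see what is going on. In this sketch (ClosureDemo and its method names are hypothetical) the lambdas are stored in a list and only run after the loop has finished.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureDemo
{
    public static List<int> CaptureLoopVariable()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            readers.Add(() => i);      // captures the variable i itself
        }
        return Run(readers);           // i is already 3 by now
    }

    public static List<int> CaptureCopy()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;              // a new variable on every pass
            readers.Add(() => copy);
        }
        return Run(readers);
    }

    private static List<int> Run(List<Func<int>> readers)
    {
        var values = new List<int>();
        foreach (Func<int> reader in readers) values.Add(reader());
        return values;
    }

    public static void Main()
    {
        List<int> shared = CaptureLoopVariable();
        List<int> copied = CaptureCopy();
        Console.WriteLine("{0} {1} {2}", shared[0], shared[1], shared[2]);  // 3 3 3
        Console.WriteLine("{0} {1} {2}", copied[0], copied[1], copied[2]);  // 0 1 2
    }
}
```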

The “pictureBox.Invoke” method should be familiar to .Net and Java UI developers. Essentially screen drawing and refreshing has to be done within the GUI thread. The “Invoke” method makes that happen when you are in another thread. The style of asynchronous call that I used included the callback function. This makes the redraw happen after the calculation is done.

When it comes to more traditional Thread manipulation, Java’s bag of tricks is a bit more complete. That may have changed with .Net 4.0 but I need to stick with 3.5. For example, the Future construct is really what I need. A Future is basically the promise of a value for when you need it later. It allows you to start processing a complex algorithm before you need the result, and pause for the result only when you really need it. It works for lazy initialization, but it’s more effective when you have a number of complex calculations you need to merge together. Java also has some good thread pool support, tasks, and other features. The Fork/Join approach in Java 7 is also a very effective way to divvy up work. .Net 4.0 may have something similar to that, but both of those are taboo for new work for the time being.
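For readers who are not tied to 3.5: the closest thing to a Future in .Net 4.0 is Task<TResult> in System.Threading.Tasks. The sketch below is not something I could use on this project; SumTo is a hypothetical stand-in for an expensive calculation.

```csharp
using System;
using System.Threading.Tasks;

public static class FutureDemo
{
    // Hypothetical stand-in for an expensive calculation.
    public static long SumTo(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    public static long Combined()
    {
        // Start both calculations now...
        Task<long> first = Task.Factory.StartNew<long>(() => SumTo(1000));
        Task<long> second = Task.Factory.StartNew<long>(() => SumTo(2000));

        // ...and block only at the point where the values are needed.
        return first.Result + second.Result;
    }

    public static void Main()
    {
        Console.WriteLine(Combined());   // 500500 + 2001000 = 2501500
    }
}
```

Reading Result blocks until the task has finished, which is the "pause only when you really need it" behavior described above.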

All in all, I only scratched the surface. The delegate BeginInvoke() approach was designed for asynchronous event processing. It wasn’t really designed for more serious multi-threading. In fact, the way I used it for the ray tracer was the wrong tool for the job.

The bottom line is that I have to change the way I think about multi-threaded application design in the .Net platform. Some of the principles I’ve picked up from other languages still carry over, but there’s a new twist on them. It’s to be expected. However, do be cautious of the delegate problem I ran into. It’s not good to have your values change underneath you.

Fun with Delegates, or How to make .Net more like Ruby 2

Posted by Berin Loritsch Thu, 01 Jul 2010 16:53:00 GMT

I’ll admit it, I’m a Ruby fan. There are certain aspects of programming with that language that are really fun. While I don’t have the privilege of working with it every day, I’m working with it enough. With yesterday’s look at the world of LINQ, I decided to play a bit more with delegates and extension methods. For the other Ruby fans out there, it is true that C# is a statically typed language and that those types are locked in at compile time. However, there are ways to make C# behave more like Ruby. Sure, you can use the var keyword, but that just means the compiler infers the type from the initializer; the variable is still statically typed. Instead, I’m going to look at imitating the way Ruby does looping in C#.

Our example Ruby code that we want to imitate:

list = ["one", "two", "three"]

list.each {|item| puts item}

# Alternately:

list.each do |item|
  puts item
end

Perusing the .Net APIs, you’ll find that your IEnumerable<T> objects don’t have an Each() method (List<T> has ForEach(), but that does not help with arrays or other sequences). It’s not difficult to add one after the fact. The first step is to create an extension method. Extension methods are much like Ruby mixins in the way you use them. As long as our mixin class is in scope, so is our extension method. Let’s look at how it would look in C#:


string[] list = {"one", "two", "three"};

list.Each( item => Console.WriteLine(item) );

// Alternately:

list.Each(
    delegate(string item)
    {
        Console.WriteLine(item);
    });

The code to implement this really is not difficult to do, but it requires some understanding of the underlying mechanisms that make it possible:


public static class IEnumerableExtensions
{
    public static void Each<T>(this IEnumerable<T> list,
            Action<T> callback)
    {
        foreach(T item in list)
        {
             callback(item);
        }
    }
}

To better understand what’s going on here, I’m going to have to call out a few things. First, the mixin class IEnumerableExtensions must be static. Without the static keyword in front of the class, the compiler will not look inside for extension methods. That keyword also tells the compiler that all members of this class must be static. The class cannot be instantiated in any way. Functionally, it behaves a lot like a Ruby module, which is why I’m calling it a mixin class. Note: the name of the class really doesn’t matter, but convention adds the word “Extensions” to the name of the class or interface we are adding functionality to.

Next, the first parameter of an extension method carries the keyword “this” in front of the type we are extending. Without the keyword “this” we would have just another random static method. Also note, this is a feature that is not possible in Java without bytecode manipulation or dynamic proxy chicanery. Perhaps that will change in the future. For those who remember how C++ objects worked, the structure here makes a lot of sense. It’s similar to the idiom popular among C programmers of passing the struct being modified as the first parameter. I chose the IEnumerable interface because the same method will work on anything that implements that interface, which is exactly how Ruby attacks the problem. Essentially Ruby has an Enumerable module (mixin) that defines all the extra methods the collection classes can use.
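
To make the “just another static method” point concrete, here is a minimal sketch showing that the extension syntax and an ordinary static call do the same thing. The ExtensionCallDemo class and its Visit() helper are names I made up for illustration; only Each() comes from the code above, repeated so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class IEnumerableExtensions
{
    // Same Each() as in the article, repeated so this sketch is self-contained.
    public static void Each<T>(this IEnumerable<T> list, Action<T> callback)
    {
        foreach (T item in list)
        {
            callback(item);
        }
    }
}

public static class ExtensionCallDemo
{
    // Hypothetical helper: records the items visited by each call style.
    public static List<string> Visit(bool useExtensionSyntax)
    {
        string[] list = { "one", "two", "three" };
        var seen = new List<string>();

        if (useExtensionSyntax)
        {
            // Reads as if Each() were a member of the array.
            list.Each(item => seen.Add(item));
        }
        else
        {
            // The same method invoked as a plain static call.
            IEnumerableExtensions.Each(list, item => seen.Add(item));
        }

        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Visit(true)));
        Console.WriteLine(string.Join(",", Visit(false)));
    }
}
```

Both calls visit the same items in the same order; the “this” keyword only changes how the call is allowed to be written.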

We used generics here to provide the same idioms and levels of type checking that come with the .Net platform. Even when you are mixing in concepts derived from other languages, you should never completely throw away the principles in the language you are using. The generics allow us to use the method preserving all the type safety built into the language without requiring us to write a number of overloads. This helps using the method feel a little more Ruby-like without ignoring C# principles.

Lastly, I want to point out the type Action. It is a delegate defined by the .Net framework, along with its companion “Func”. The only difference between the two is that Action does not return anything and Func does. Instead of creating my own delegates, I simply used what was already available. With the pair of delegates supplied, we can have a lot of fun. For example, let’s say we wanted to derive a list of answers from executing the same function across all the members of a set of values. One example would be to derive the Root Mean Square (RMS) value on a set of numbers. Our extension method would look like this:


public static List<TResult> Collect<T,TResult>(
        this IEnumerable<T> list, Func<T,TResult> callback)
{
    List<TResult> answers = new List<TResult>();

    foreach (T item in list)
    {
        answers.Add(callback(item));
    }

    return answers;
}

The way you would use the “Collect” method we just defined to calculate the RMS would look something like this:


double[] values = {1.0, 4.0, 3.0};

double rms = Math.Sqrt(
    values.Collect(x => x * x).Sum() / values.Count());

Console.WriteLine("RMS is: {0}", rms);

The “Collect” method as it is written will also allow you to do a mass conversion of all the elements in one IEnumerable object into a new list. For example, if you had an array of numbers and you wanted a list of strings, it can be done with just one line:


double[] values = {1.0, 4.0, 3.0};

List<string> conversion = values.Collect( item => item.ToString() );

conversion.Each( item => Console.WriteLine(item) );
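
For comparison, LINQ already ships a Select() extension method that performs the same kind of projection, so Collect() is roughly Select() followed by ToList(). The sketch below computes the RMS both ways; the CollectDemo class and its two method names are my own inventions for the comparison, and Collect() is repeated so the sketch stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectExtensions
{
    // Same Collect() as in the article, repeated so this sketch is self-contained.
    public static List<TResult> Collect<T, TResult>(
            this IEnumerable<T> list, Func<T, TResult> callback)
    {
        List<TResult> answers = new List<TResult>();

        foreach (T item in list)
        {
            answers.Add(callback(item));
        }

        return answers;
    }
}

public static class CollectDemo
{
    // RMS using the hand-written Collect() extension.
    public static double RmsWithCollect(double[] values)
    {
        return Math.Sqrt(values.Collect(x => x * x).Sum() / values.Length);
    }

    // RMS using LINQ's built-in Select(), which projects lazily.
    public static double RmsWithSelect(double[] values)
    {
        return Math.Sqrt(values.Select(x => x * x).Sum() / values.Length);
    }

    public static void Main()
    {
        double[] values = { 1.0, 4.0, 3.0 };

        Console.WriteLine("Collect: {0}", RmsWithCollect(values));
        Console.WriteLine("Select:  {0}", RmsWithSelect(values));
    }
}
```

The one practical difference is that Collect() builds its list immediately, while Select() does no work until something (here Sum()) iterates over it.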

Your creativity is bounded only by your imagination.

Language Integrated Query has definite implications for Testing

Posted by Berin Loritsch Wed, 30 Jun 2010 15:49:00 GMT

I know I just talked about the conceptual complexity of C# and Java, and C#’s Language Integrated Query (LINQ) is another example of why that is. However, it provides a feature that can potentially aid in properly unit testing database bound objects. The way LINQ is implemented, it works on anything that is IEnumerable. That means you can have a List of objects or a database bound collection of objects and your interface to query the contents is identical. Put in other terms, you can write your queries once and they will function the same regardless of the back end implementation. Technically speaking, that means you can swap out databases fairly easily.

So, how does this work, you might ask (especially if you are a Java developer)? The book Professional C# 2008 from Wrox publishing does a good job of walking you through the transformation. Essentially, the query is performed in the object space rather than in the back end space, which may not support queries anyway. First let’s look at Java’s file filtering API:


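// Needs java.io.File and java.io.FileFilter.
// listFiles() is an instance method, so we need a directory to call it on;
// the current directory is used here purely as an example.
File directory = new File(".");
File[] files = directory.listFiles(new FileFilter() {
    public boolean accept(File file) {
        return file.getName().contains("like_it");
    }
});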

This helps set the stage, because it’s the same principle. From here on out we are using C#. The equivalent C# code uses something called delegates which in some ways resemble Ruby’s code blocks you can pass in to certain functions. The starting code looks like this:


List<Widget> widgets = new List<Widget>(Repository.GetWidgets());
List<Widget> filteredWidgets = widgets.FindAll(
    delegate(Widget w)
    {
        return w.Name.Contains("like_it");
    });
filteredWidgets.Sort(
    delegate(Widget w1, Widget w2)
    {
        return w2.Name.CompareTo(w1.Name);
    });

As you can see, it’s the same principle. Instead of the anonymous inner class mess in Java, you are using delegates in C#. Very little difference here. C# also has something called extension methods, which I can only imagine were inspired by Ruby or something similar (yes, Ruby is one of my favorite languages). Essentially, extension methods let you create static methods that accept the object you want to extend and let you work with the public interface of that object. The twist is that you can call the extension method on your object as if it were part of the original API. This is how the core of LINQ is implemented. LINQ adds the following extension methods for the IEnumerable interface:

  • Where
  • OrderBy
  • OrderByDescending
  • Select
  • ... and more …

Using the extension methods, our code sample gets rewritten as:


IEnumerable<Widget> filteredWidgets = Repository.GetWidgets()
    .Where(
        delegate(Widget w)
        {
            return w.Name.Contains("like_it");
        })
    .OrderByDescending(
        delegate(Widget w)
        {
            return w.Name;
        })
    .Select(
        delegate(Widget w)
        {
            return w;
        });

It’s further refined by lambda expressions. Lambda expressions have existed in venerable languages like Smalltalk and Lisp, which claim the origin of all useful language features (tongue in cheek here). They also exist in Ruby. Essentially, they let you rewrite the whole delegate clause to take up much less space and read more elegantly:


IEnumerable<Widget> filteredWidgets = Repository.GetWidgets()
    .Where( w => w.Name.Contains("like_it") )
    .OrderByDescending( w => w.Name )
    .Select( w => w );

This is further simplified using new keywords. To the best of my knowledge the keywords map back to the extension methods with lambda expressions. I’m not positive about that. However, there is a key distinction here. Unlike the FindAll() and Sort() calls we started with, a LINQ query is nothing but an execution plan, similar in concept to how your database server will take your query and create a plan for it. The query plan is not executed until you start iterating over it. The final form of the LINQ query is below:


var query = from w in Repository.GetWidgets()
            where w.Name.Contains("like_it")
            orderby w.Name descending
            select w;
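
The “nothing runs until you iterate” behavior is easy to check for yourself. Here is a minimal sketch that counts how often the filter is actually called; it uses a plain list of strings rather than the Repository and Widget types above, and the DeferredExecutionDemo class and CountFilterCalls() method are names I made up for the experiment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns { filter calls before iterating, filter calls after, result count }.
    public static int[] CountFilterCalls()
    {
        var names = new List<string> { "like_it_a", "skip_me", "like_it_b" };
        int filterCalls = 0;

        // The filter bumps a counter each time it is invoked.
        Func<string, bool> matches = n =>
        {
            filterCalls++;
            return n.Contains("like_it");
        };

        // Building the query only describes the work; nothing is filtered yet.
        var query = from n in names
                    where matches(n)
                    orderby n descending
                    select n;

        int before = filterCalls;

        // Iterating (here via ToList) is what actually runs the plan.
        List<string> results = query.ToList();

        int after = filterCalls;

        return new[] { before, after, results.Count };
    }

    public static void Main()
    {
        int[] counts = CountFilterCalls();
        Console.WriteLine("Filter calls before iterating: {0}", counts[0]);
        Console.WriteLine("Filter calls after iterating:  {0}", counts[1]);
        Console.WriteLine("Matching items:                {0}", counts[2]);
    }
}
```

The counter stays at 0 after the query is built and only reaches 3 (one call per element) once ToList() walks the results.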

OK. So let’s be straight about this. The concept of LINQ is pretty cool. It also helps simplify some otherwise complex filtering you would have to do on your objects. However, it appears to be strictly run in the object space. I can’t be certain because I haven’t read up on DataSource bound LINQ queries yet. There are some analogs to Hibernate’s HQL. If all the processing is done within the object space you run the risk of having very long running queries when dealing with large data sets. LINQ works with any object implementing IEnumerable, so there is nothing that says you have to keep all objects resident in memory. I’m pretty sure that the DataSource only loads what is needed at one time. Since the query is run as you iterate over it, you don’t run the risk of running out of memory when your query delivers a large result set.

For most CRUD based applications (like blogs, etc.) LINQ will work pretty well—particularly if you are dealing with limited result sets. However, if you deal with very large result sets involving joins you may have some performance problems. Those are the types of things that databases were designed to handle well. Like I said, I don’t know if there is a mapping to the underlying database querying through LINQ if it is working on DataSets. It’s still a potential performance problem that has plagued other languages and frameworks doing something similar.

All in all, I think LINQ has been done in a pretty slick way. What it allows you to do is focus on the logic of the query and how it is supposed to work for you. It is data agnostic, so it doesn’t care if it is working with lists, sets, databases, or a custom object that implements the IEnumerable interface. That really helps out when you need to mock up some base data to run your queries against in your unit tests. The fact that you can use this process with any list also opens up some unique possibilities for dealing with lists of data on screen. Hope you enjoyed this intro to a C# topic that piqued my interest.
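
As a rough sketch of the testing angle, the query below is written once against IEnumerable<Widget> and then fed a hand-built list instead of a repository. The Widget class here is a hypothetical stand-in with just a Name property, and WidgetQueries and MockDataDemo are names I invented for the example. Keep in mind that a database-backed provider may evaluate the same query differently, so this exercises the query logic, not the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Widget used in the examples above.
public class Widget
{
    public string Name { get; set; }
}

public static class WidgetQueries
{
    // The query only asks for IEnumerable<Widget>, so the source can be
    // a repository in production or a plain list in a unit test.
    public static List<string> LikedNamesDescending(IEnumerable<Widget> widgets)
    {
        return widgets
            .Where(w => w.Name.Contains("like_it"))
            .OrderByDescending(w => w.Name)
            .Select(w => w.Name)
            .ToList();
    }
}

public static class MockDataDemo
{
    public static void Main()
    {
        // Mock data: no database required.
        var fakeWidgets = new List<Widget>
        {
            new Widget { Name = "like_it_1" },
            new Widget { Name = "ignore_me" },
            new Widget { Name = "like_it_2" },
        };

        foreach (string name in WidgetQueries.LikedNamesDescending(fakeWidgets))
        {
            Console.WriteLine(name);
        }
    }
}
```

In a real test you would assert on the returned list rather than print it, but the point is the same: the query method never knows where its widgets came from.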

Java and C# suffer from the same ailment

Posted by Berin Loritsch Tue, 29 Jun 2010 16:46:00 GMT

I have an interest in language design, even though I have no direct outlet for it at the moment. So as I’ve been contemplating what I like and what I don’t like about the languages I have been exposed to, I’ve realized that both Java and C# are suffering from the same core ailment. That ailment is the conceptual complexity underlying these platforms. I have to say platform because both Java and C# use a virtual machine that has been used to host other languages as well. C# without the CLR is like Java without the JVM: useless. This is in stark contrast to the almost sublime conceptual simplicity of Lisp, Smalltalk, and even Ruby.

Both C# and Java have bolted on several different features to deal with the underlying complexities, much like the English language has imported words from several different languages. English, technically a Germanic language, borrows significantly from Latin and the Romance languages, and even some Greek. We won’t mention some import words from vastly different languages like Japanese (kimono, karaoke, katana, kanji). So it is with Java and C#. A short list of concepts shared by both languages includes:

  • Autoboxing
  • Attributes/Annotations
  • Dynamic binding (.Net 4.0 has a DLR and Java 7 has new JVM opcodes for this purpose)
  • For each style iterating
  • API document generation
  • and more…

The problem isn’t so much the features in and of themselves. The problem is more subtle than that. In order to deal with the complexity of the language itself, these features are necessary. In some ways, a language like Lisp has conceptual appeal, even though its syntax is hard to wrap your head around. If everything is a list, from parameters passed in to a function to data values, and the language is built around set theory, it maps pretty well to a discipline of math. Heck, with Lisp a function is just a list of operations. Although perhaps in some ways Lisp is too conceptually simple.

The problem I’m getting at is being able to form a reasonable hypothesis of how the software is addressing your problems. I remember reading a PR piece on how Java was better than C# that had a small snippet of code asking how many method invocations there were. The two or three line snippet actually ended up invoking an unexpectedly large number of methods, from attribute accessors to delegates and some other magic. The intent of the developer was clear, although the impact of the code was unexpectedly complex. That’s not to say that C# is bad. The article was a PR piece to help Java developers still feel good about themselves. However, Java is just as guilty. Have you ever tried to debug dynamic proxy code? Have you worked with features that injected functionality into your code for you (Spring/Hibernate comes to mind)?

Other than the general second law of thermodynamics, what is it that drives languages to be more complex? Rather than truly seeking simplicity, both Java and C# have progressively moved toward sweeping the inherent complexity under the rug. Essentially moving the problem from something the developer has to worry about to something the platform has to worry about. To paraphrase my wife’s favorite movie:

There are three kinds of pipe. You have nickel, and you can see where that’s gotten you. You have bronze, which is very good… until something goes wrong. Something always goes wrong. And then you have copper, which is the only kind I use. (from Moonstruck)

Programming is a complex process. Translating the sometimes conflicting desires of a human into something a computer can understand is not easy. That complexity is further compounded by the moving parts we need to work together to accomplish our goals. My goal in exploring the world of language design is to find the right path for true simplicity. While we are closing in on that ideal from different programming paradigms, we haven’t quite reached it yet. It feels like we live in a world where there is only nickel and bronze, and copper has yet to be discovered. I’m not the only one thinking about this for sure.

.NET Culture Shock

Posted by Berin Loritsch Fri, 25 Jun 2010 12:29:00 GMT

In my transition to learning C#/.NET I’ve run into what is my biggest hurdle: culture shock. The technology behind Java and the JVM and the technology behind C# and the CLR are becoming more similar than different. However, the cultures behind the technologies are like night and day, oil and water. I suppose if we were going to liken it to eras gone by, the Java culture would be more like the 70s hippie culture and .NET would be like the 80s yuppie culture.

One of the things I liked about Java was the share and share alike mentality. There are thousands of open source projects in Java, many of them with free integrations into your IDE. If you needed help with anything, there was someone with a clue that could help you—and they would. Much of what I expect out of an IDE came from the Java world. The concept of refactorings included and automated in the IDE was a major breakthrough, and now no Java IDE wanting to be taken seriously can exclude that feature. Thank you JetBrains for introducing the world to the way it should be. When .NET first came on the scene I don’t think Microsoft was ready in this regard. When the MS peddler came to my company I asked about refactoring tools in Visual Studio (this was around 2003), and the guy looked at me sheepishly. “But with .NET you can design one interface and use it on the web or on a desktop…” he fumbled. When I said I don’t do that every day, and I need something that helps me do my job better every day he wrote it down. I don’t think I was the only one to raise that objection because by the next release of Visual Studio they had the beginnings of refactoring tools included.

The Java culture works well with venerable organizations like the Apache Software Foundation. In some ways the Java culture mirrored the meritocracy already ingrained at the ASF. However, the one thing that hurt Java in the long run is also the one thing that made it better as a first language to learn. It’s that just about every major infrastructure piece has been freely distributed under open source projects. Sure, you get what you pay for in terms of set up and configuration, but even that got better with time. The free aspect is what undermined the ability of companies to make money. Why spend tens of thousands of dollars on a license when the free option was there? Just put an intern on it and it will cost less than the commercial option. Of course, that meant that it was equally easy to play with these tools yourself and make yourself more valuable to the company.

The .NET culture is a pay and share alike mentality. While there are a number of open source projects, they are fewer and farther between. Even so, plugins we expect to be included with the IDE like JUnit integration or ANT/Maven/Boost integration for builds either are non-existent or require you to pay a hefty fee. There’s an NUnit for .NET projects, and by the looks of it, it is a bit better off than the JUnit 4 equivalent. However, unlike JUnit 4, NUnit has no plugin for the IDE: Visual Studio doesn’t offer one. In fact, it never will because Microsoft has its own testing features it’s wanting to push. You can incorporate NUnit, but it’s a bit more involved. Or you can use TestDriven.NET, which is not cheap.

The pay and share alike culture extends to the community surrounding .NET as well. When you need help, it’s hard to find what you want online. The only people volunteering free advice shouldn’t, and I’m not convinced that I’d be getting my money’s worth if I paid someone either. While the MSDN has gotten better, it still has a long way to go before it is truly usable. Part of the problem is that it is so big that trying to find the answer you need is quite difficult. Even when you find it, there’s rarely enough depth to be able to put it to good use. Many .NET books are well in excess of 1000 pages. Rather than focusing on one corner of the technology and bringing the user through the process of solving a problem, the books focus on comprehensively covering all of the API and assume that the reader has more knowledge than they do. Or they are written down to a third grader, and the happy medium is nowhere to be found. I’m sure the Head First book is good, as they usually are.

Bottom line is that there are obstacles in the .NET world that impede self learning. They’re not insurmountable, and many of the obstacles are cultural in nature. The job market for Java is still double that of .NET, but enough key customers insist on the technology that you can’t completely ignore it. Additionally, any new languages you learn can open your eyes to new and better ways to solve problems. I’m still hunting for my go-to resources in the .NET space. It’s going to be an interesting ride, and I’m actually looking forward to it.

What's Black and Blue and Eats Kids for Breakfast?

Posted by Berin Loritsch Thu, 24 Jun 2010 13:10:00 GMT

That would be the beast under my desk I built over the past couple months. Allow me to indulge for a moment. This is the first time I’ve had a machine that was better than you could get from a retailer. I also have the satisfaction of building this thing with my own hands. It’s running 6 AMD cores at over 4GHz each, has 8GB of fast RAM, Windows 7 x64 Pro, and a few other goodies to boot. My thoughts are organized into the following sections: overclocking, unexpected effects of screen size, and why did I do all this?

Thoughts on Overclocking

I’ll admit my complete ignorance of many of the finer details of overclocking. These days motherboards have made this a fairly trivial task, as long as you stay within operating parameters. Many motherboards have automatic overclocking modes, with varying levels of success. I’ve read people bragging about getting their AMD Phenom II 1090T Thuban x6 processors up to about 4.4GHz with air cooling. More commonly, the numbers were around 4.1GHz. I found that for me, cooling was not a major problem. It was voltage sag. I’ve got an MSI motherboard, which doesn’t have as many power phases as the competing Asus motherboard. As a result I have to raise the CPU voltage higher than spec to compensate for the voltage sag of using the CPU above its rated speed. Anyone will tell you that this is potentially dangerous. It could render the $300 chip dead if you are not careful, so easy does it on pushing the voltage up. That said, $300 is a price point I can risk for some additional speed that you just can’t buy.

I currently have my front side bus at 240MHz, which pushes the CPU to about 4,080MHz and the RAM bus to about 1,600MHz. My RAM is rated for 1,600MHz with memory timings of 6-8-6-24-1T (which was Greek to me until I started reading up on it). Basically, the RAM is still operating within spec on my overclock. I also didn’t have to slow down the timings to compensate for system stability. I’m not willing to push both the RAM and the CPU, but it’s nice to have some fast RAM. When I ran the Windows Experience test, I hadn’t sped up the RAM yet. At its last test I got a score of about 7.6 (out of a possible 7.9) for that subsystem.

Overall, I did a good job balancing the components on this build. The Windows Experience ratings for the different subsystems were all within a couple tenths of a point of each other. The lone straggler was the IBM SSD boot drive that got a rating of about 7.3. Its lousy write performance hurt the score a bit, but it’s a boot drive so I’m not complaining.

Screen Resolution has Many Side Effects

I’m sure part of the reason my Windows Experience rating for the GPU was as high as it was has to do with the fact that I’m running at 1280×1024. When I originally purchased that monitor, it was about the best you could do without spending ungodly amounts of money. Of course, video cards these days look at that resolution and laugh. Essentially the video cards don’t see any distinction until you get up into the 1080p high def range.

Something that I wasn’t expecting was the increased sensitivity of my mouse. I purchased a Razer Lachesis gaming mouse with up to 4,000 DPI sensitivity. That means every inch the mouse travels the cursor travels 4,000 pixels. When you only have 1280×1024 pixels on screen, that’s way more distance than is called for. A slight bump of the mouse will send the cursor off the screen. I had to lower the sensitivity to a usable level. It’s good to know that I have the extra there, just in case I decide to push a stack of monitors as one screen.

Why Go Through All the Trouble?

Honestly, I’m tired of complaining about my machine. I’ve been running underpowered gear for a good long time. I haven’t been able to enjoy PC gaming in the least due to my mediocre gear. Nor have I been happy with the screen real estate for programming and other activities. So I decided to treat myself. Overall I spent a little over $2100 USD on this machine, including the fancy keyboard, mouse, and the full Windows 7 Pro install. The price is right about what I wanted to pay, which is a good thing. I still have a few more accessories to get, so when all is said and done I’ll probably add another $700 to the rig. I need a new monitor (at least 1080p), a second video card to keep up with the increased resolution, and MS Office.

All that aside, I do have some experiments to try out, and some new programming ventures I want to expand into. With a rig like this, I have something powerful enough to keep up, and enable some more esoteric things like using your GPU for calculations. Imagine not being limited to just what your CPU can push. I’m looking forward to my new endeavors.
