Intro to Programming with Ruby

Posted by Berin Loritsch Mon, 28 Jul 2008 12:12:00 GMT

First of all, Ruby is a programming language that has become pretty popular with the advent of Ruby on Rails. That said, Ruby is a great first programming language to learn. I won’t bore you with its history, even though the language has been in existence for over a decade. For a summary of Ruby, check out http://www.ruby-lang.org/en/about/ and also look at the different tutorials. This tutorial is meant for people who don’t know squat about programming. If it’s confusing at all, let me know in the comments.

Before we delve into objects and things like that, let’s consider what happens when we have to use Ruby to help with other tasks. Those tasks can be deployment scripts, test support, etc. One of the most important things you will learn has nothing to do with actually making stuff work. It’s the comment. Basically you want to leave notes for yourself so that you can get back into the swing of things after you leave something alone for a while. To do that with Ruby, all you need is the ’#’ symbol. It marks the beginning of a comment, and the comment is over at the end of the line.

#
# A block of comments looks like this, with a '#' symbol
# at the beginning of every line.  Comments are supposed
# to help you remember things later on.
#

Start with good commenting habits, and keep them up. Some things are pretty self-explanatory, so make sure you document the big things and not what every line is doing. Remember that these are clues you leave for yourself, or for anyone else who will work with the code, about what’s going on.

Now, Ruby is an object-oriented language, but before I get into making objects let’s look at what they can do for you. In Ruby, everything is an object: your numbers, ranges, collections, etc. To help wrap your brain around objects, think of them as “things”.

Objects are things you can tell what to do.

Let’s say you need something to be done five times. The number five is an object, which means you can tell it to do something for you. Core Ruby Doc has all the standard objects that are part of the language, but it can be a little overwhelming at the beginning. First of all, the number five is an Integer (it has no decimal point, so math people tell us that the correct name is an integer). If you look up Integer in the Ruby docs, you’ll find a list of things that it can do. There are two that look pretty interesting to us, but for now we just want to do something five times. Here’s how we do it:

5.times do |num|
    puts 'Do something!'
end

Ok, what’s going on here? The number five is an object, and the ”.” symbol means “tell the object to do something”. The name after the ”.” is what we are telling it to do. So we are basically saying “Number 5, I want you to .times”. OK, so we are missing part of the picture. That part is the do block. Everything between the do and end is also being passed along with the message ”.times”. Let’s call the ”.times” message a method.

A method is something you can tell an object to do.

Continuing on, we have the do block followed by some pipe symbols ”|” and a name in the middle. This is how the number 5 passes something back into your do block so that you can use it. The name you give this something is called a variable. You can use it or ignore it, it’s up to you. We are ignoring it for now, but if we want to use it, we just have to use that name.

5.times do |count|
    # count is a number, so convert it to text before joining it to a string
    puts 'Do something on time ' + count.to_s
end

What you see will change based on the value of the variable “count” that the number 5 is giving us. The documentation tells us that the number will start at 0 and go to just under our number. So we will see five lines counting up from 0 to 4. A lot of programming languages do this, so it’s just something you have to get used to. Everything between the do and the end markers gets run each time. The important thing is that the name you put between the “|” symbols is how you refer to that value inside the block.

OK, we are taking some baby steps here, but I’ll stop for today. First of all, we learned that things are called Objects and we tell Objects what to do by calling Methods (some people call them messages, but other languages use the word Method so we are just being consistent). We also learned that you can pass a whole block into a method and have that object run that block for us. We learned that when we pass a block to an object’s method, it can pass something back into our block so that we can use it. I want to point out that the whole “pass a block of code” thing is something that not every language can do. For instance, at the time I am writing this, Java, C#, and C++ can’t do that.

Lastly, I want to start you thinking about something. The best code is self-documenting, but it will never read like a book. As long as the details are clear, all you have to worry about in your comments is why you are doing something five times. The last snippet of code above almost reads like English. We are saying “five times do something using ‘count’”. The word “puts” is actually a method on the text console object ($stdout); Ruby lets you leave the object off to make the code a little more readable.

Using Java Enums for Finite State Machines

Posted by Berin Loritsch Mon, 23 Jul 2007 12:57:00 GMT

Finite state machines are useful design constructs for a number of situations, although they seem to be fewer and farther between these days. Currently, the only place I tend to use them is when I have to write a parser by hand. Sure there are BNF parser generators around, but not all parsing requirements fit those restrictions. I have to parse legacy message formats which predate BNF parser theory (i.e. from the 1950s), so this is a useful tool to ensure that the message is properly formatted and all the information is pulled out properly.

According to Design Patterns by the GoF, the way to do an object-oriented finite state machine is to use objects to represent each state, each with a common interface. My first introduction to one was a C++ program that was written with the old C mindset. That meant that the states were represented by an enum and all actions were taken with large switch or if/else hierarchies. I can see how using objects can clean things up, because the conditional logic would be decided by the state object. Of course, the state pattern in the GoF book required keeping the state in an external object and neglected to dictate how to change the state properly.

The Problem We are Solving

For my purposes, I have to populate a message object with all the information from a text message. That includes things like security markings, addresses, tags, captions, etc. The message format is distinct enough so that each line means something. Of course, there is a proper order of processing, but I can parse one line at a time. That simplifies things a whole bunch. I can have the parser work with an interface that looks like this:

public interface Partial {
    public State parse(Message message, String line)
        throws ParseException;
}

In Java 7 we will likely be able to use closures for this approach, which will clean up a lot of the code clutter. In that case the interface would look like this:

{Message, String => State throws ParseException}

Either approach you take will allow you to write the enumeration delegating the actual processing to anonymous classes. I’m sure you’re thinking, “My God! This guy’s off his rocker! I thought he liked elegant code!” Trust me, you’ll see the elegance in a minute. You aren’t going to do this all the time, but when the situation calls for it, you’ll appreciate it. The real benefit comes from testing.

Enums and Finite State Machines

Java enums are objects, which is really useful. I’ve used this fact to associate sort order SQL snippets with an enum for the different sorting algorithms supported in a system, along with other uses. I decided to do an experiment with rewriting a parser we have. The parser works for the most part, but it leaves out some important information we want to support, and more importantly it is difficult to change. It needs some major rework beyond the scope of a bunch of refactorings. That is why I chose to rewrite it.
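As an illustration of enums carrying data and behavior, the sort-order idea mentioned above might be sketched as follows. The SortOrder name, the column names, and the applyTo method are all made up for this example; it is not the code from the system described.

```java
// Hypothetical example: the constants and column names are invented.
enum SortOrder {
    NEWEST_FIRST("created_at DESC"),
    OLDEST_FIRST("created_at ASC"),
    BY_TITLE("title ASC");

    // Each constant carries its own SQL fragment.
    private final String orderBy;

    SortOrder(String orderBy) {
        this.orderBy = orderBy;
    }

    /** Appends this sort order's ORDER BY clause to a base query. */
    String applyTo(String baseQuery) {
        return baseQuery + " ORDER BY " + orderBy;
    }
}
```

Calling SortOrder.BY_TITLE.applyTo("SELECT id FROM messages") would hand back the query with the matching ORDER BY clause attached, with no switch statement needed at the call site.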

We have to start writing the State instances, and it really helps to have a starting point. For this blog, I’ll only split a message into header and body sections. There’s a whole lot more going on, but I just want to show how things work. The marker that splits the message header from the body will be a line that has “BODY” on it with nothing else. First, let’s look at our State enum. The important thing here is that we are not providing the state implementations yet:

public enum State implements Partial {
    HEADER(null),
    BODY(null);

    private final Partial partial;

    private State(Partial parser) {
        partial = parser;
    }

    public State parse(Message message, String line)
            throws ParseException {
        return partial.parse(message, line);
    }
}

The implementations of HEADER and BODY are null right now because we will get into them a bit later. First, you’ll notice a couple of things about the enum. We are passing something that does work into the constructor, which also means that our enums can do work. For consistency’s sake we used the same interface for the enum as we did for the object we pass into the constructors. If we were to use the closures spec, we wouldn’t have an interface to implement, so the method we provided would be how we access the blocks passed into the constructor. The spirit of the design is the same; it’s just that there is less extra code to type. Just so you can see what it looks like (assuming I understand the spec correctly), here you go:

public enum State {
    HEADER(null),
    BODY(null);

    private final {Message,String=>State throws ParseException}
           partial;

    private State(
            {Message,String=>State throws ParseException} parser) {
        partial = parser;
    }

    public State parse(Message message, String line)
            throws ParseException {
        return partial.invoke(message, line);
    }
}

In either case, the base design is identical. The only way to have the functionality of enum values change based on the value is to use the delegate approach. In short, we are passing in an object that does the work that is specific to that state in the constructor and calling it later when we call the parse method. It’s also important to note that enums are singletons by definition. There is one and only one State.HEADER enum value in the system, as there is one and only one State.BODY enum value in the system. That means the implementation has to be re-entrant. As long as you don’t attempt to keep any state in the objects you should be fine.
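The singleton point is easy to trip over, so here is a deliberately flawed sketch of what goes wrong when a state keeps its own data. LeakyState and its linesSeen field are hypothetical names used only to show the problem.

```java
// Deliberately flawed: the counter lives on the single COUNTING constant,
// so every caller in the JVM shares (and mutates) the same field.
enum LeakyState {
    COUNTING;

    private int linesSeen;

    /** Returns how many lines this constant has seen across ALL callers. */
    int parse(String line) {
        linesSeen++;
        return linesSeen;
    }
}
```

Every parse that goes through LeakyState.COUNTING bumps the same counter, whichever message it was working on. Keeping per-message data in the Message argument, as the State enum does, avoids this.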

For the rest of the article, I’ll be focusing on the anonymous class approach (i.e. the first version). I’m assuming you can do the translation into closures later. Besides, I’m not sure if it would be legal to use the control invocation syntax for the constructor or not. This is a question for Neal Gafter: would the following be legal syntax for the constructor of an object? It would be better if that’s the case:

HEADER(Message message, String line:) {
    HEADER
}

The State Implementations

The implementation of the state is very simple. We are providing an anonymous class (or closure declaration). The header is going to add the line of text to the header provided until we hit the “BODY” line. We aren’t going to copy that line at all. The important thing is that we can easily test these conditions in isolation from any other state. Let’s write some tests to make sure our implementation does what it is supposed to do (we are skipping the boilerplate JUnit code):

public void testCopiesLineToMessageHeader_and_ReturnsHEADERstate()
        throws ParseException {
    String line = "test line";
    Message message = new Message();

    State state = State.HEADER.parse(message, line);

    assertEquals( State.HEADER, state );
    assertEquals( line, message.getHeader() );
}

Currently our implementation compiles but only throws NullPointerExceptions because we haven’t given it anything to do yet. Let’s at least get this to pass. We have to rewrite the HEADER constructor in the enum:

HEADER(new Partial() {
    public State parse(Message message, String line)
            throws ParseException {

        message.addLineToHeader( line );
        return HEADER;
    }
}),  // a comma, because the BODY constant still follows in the enum

That’s all well and good, but we need to make sure that we switch to the BODY state eventually. So let’s add a new test:

public void testBodyLineReturnsBodyState_and_DoesNotWriteToMessage()
        throws ParseException {
    String line = "BODY";
    Message message = new Message();

    State state = State.HEADER.parse(message, line);

    assertEquals( State.BODY, state );
    // assertEmpty is not a stock JUnit assertion; it is assumed to be a local helper.
    assertEmpty( message.getHeader() );
}

Now, all we have to do is make this pass in the HEADER enum:

HEADER(new Partial() {
    public State parse(Message message, String line)
            throws ParseException {
        if ( line.equals("BODY") ) return BODY;

        message.addLineToHeader( line );

        return HEADER;
    }
}),  // a comma, because the BODY constant still follows in the enum

Now, if you are using Java closures, you simply cannot have multiple exit points. It’s easy enough to alter the logic. Some people don’t like what I did as a matter of principle. That’s OK. We know it works and it’s tested. We can refactor it later to our heart’s content. For the purpose of brevity, it’s up to you to do the same thing with the State.BODY implementation.
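For reference, here is one way the finished two-state enum could look, with the HEADER logic reworked to a single exit point and a guess at the BODY implementation. The Message class below is a hypothetical stand-in (addLineToBody and getBody are assumed names, and joining lines with a newline is an assumption), so treat this as a sketch rather than the original parser.

```java
import java.text.ParseException;

// Hypothetical stand-in for the article's Message class. Only the header and
// body are modeled; addLineToBody and getBody are assumed names.
final class Message {
    private final StringBuilder header = new StringBuilder();
    private final StringBuilder body = new StringBuilder();

    void addLineToHeader(String line) { append(header, line); }
    void addLineToBody(String line) { append(body, line); }

    String getHeader() { return header.toString(); }
    String getBody() { return body.toString(); }

    private static void append(StringBuilder target, String line) {
        if (target.length() > 0) {
            target.append('\n');
        }
        target.append(line);
    }
}

interface Partial {
    State parse(Message message, String line) throws ParseException;
}

enum State implements Partial {
    HEADER(new Partial() {
        public State parse(Message message, String line) {
            // Single exit point: decide the next state, then return once.
            State next = State.HEADER;
            if (line.equals("BODY")) {
                next = State.BODY;  // the marker line itself is not copied
            } else {
                message.addLineToHeader(line);
            }
            return next;
        }
    }),
    BODY(new Partial() {
        public State parse(Message message, String line) {
            // Once in the body, every remaining line belongs to the body.
            message.addLineToBody(line);
            return State.BODY;
        }
    });

    private final Partial partial;

    State(Partial parser) {
        partial = parser;
    }

    public State parse(Message message, String line) throws ParseException {
        return partial.parse(message, line);
    }
}
```

The delegates refer to the constants by their qualified names (State.HEADER, State.BODY) and hold no fields of their own, so all per-message data stays in the Message that is passed in.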

Using our Finite State Machine

Using the Finite State Machine we just created (granted it has only two states) is really easy. We know that we need a message object, and we need to iterate over the lines of a message. We’ll assume that you got it from some Reader. Here is a method that does the hard work for you:

public Message parseMessage(Reader in)
        throws ParseException, IOException
{
    Message message = new Message();
    State state = State.HEADER;
    BufferedReader reader = new BufferedReader(in);

    String line = null;

    while((line = reader.readLine()) != null) {
        state = state.parse(message, line);
    }

    reader.close();

    return message;
}

I left all the error handling code as an exercise for you, dear reader. If you want to provide accurate line numbers for your ParseException objects, you can surround the thing in a try/catch and use the following construct:

catch(ParseException pe) {
    // Rewrite the line number where the problem occurred.
    ParseException npe =
        new ParseException(pe.getMessage(), lineNumber);
    npe.setStackTrace(pe.getStackTrace());
    throw npe;
}

You have to keep track of the line number yourself. The lineNumber variable was incremented in the while loop in my code. I also wrapped the IOException in a ParseException, keeping track of the line number, so the signature was simplified. Of course, you’ll want to close the reader in a finally clause.
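Putting those pieces together, a version of parseMessage with line tracking might look roughly like this. It is a sketch under assumptions: Message and State are tiny stand-ins (State uses constant-specific method bodies just to keep it short), the blank-line error is invented so there is something to report, and MessageParser is a made-up class name. As in the catch snippet above, the line number is stored in ParseException's errorOffset field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.text.ParseException;

// Tiny stand-ins so the sketch compiles on its own; the real Message and
// State would take their place.
final class Message {
    final StringBuilder header = new StringBuilder();
    final StringBuilder body = new StringBuilder();
}

enum State {
    HEADER {
        State parse(Message message, String line) throws ParseException {
            if (line.equals("BODY")) {
                return State.BODY;
            }
            // Made-up rule, only so there is an error to report. The state
            // does not know the line number, so it passes 0.
            if (line.isEmpty()) {
                throw new ParseException("blank line in header", 0);
            }
            message.header.append(line).append('\n');
            return State.HEADER;
        }
    },
    BODY {
        State parse(Message message, String line) {
            message.body.append(line).append('\n');
            return State.BODY;
        }
    };

    abstract State parse(Message message, String line) throws ParseException;
}

final class MessageParser {
    static Message parseMessage(Reader in) throws ParseException {
        Message message = new Message();
        State state = State.HEADER;
        int lineNumber = 0;
        BufferedReader reader = new BufferedReader(in);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;  // 1-based number of the line being parsed
                state = state.parse(message, line);
            }
            return message;
        } catch (ParseException pe) {
            // Rewrite the line number where the problem occurred.
            ParseException npe = new ParseException(pe.getMessage(), lineNumber);
            npe.setStackTrace(pe.getStackTrace());
            throw npe;
        } catch (IOException ioe) {
            // Wrap I/O failures so callers only deal with ParseException.
            ParseException npe = new ParseException(ioe.getMessage(), lineNumber);
            npe.initCause(ioe);
            throw npe;
        } finally {
            try {
                reader.close();
            } catch (IOException ignored) {
                // Nothing useful can be done if close() itself fails.
            }
        }
    }
}
```

A caller can then read the offending line number back with getErrorOffset() on the ParseException it catches.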

In Conclusion

You aren’t going to write a finite state machine every day, but when you do you’ll want to keep things as simple as possible. The approach I outlined is very handy in the sense that you can completely test each state in isolation from the others. Because it is its own object, you don’t have to worry about testing the whole of the application to ensure your states are doing what they were designed to do. I typically have each State instance tested in its own TestCase object. This makes it particularly clear what each state is supposed to do, and how/when transitions take place.

I find that FSMs are much easier to understand when you think about one state at a time. That way you avoid trying to keep the whole “if this, then that, or is it the other thing” reasoning in your head. Doing it the old procedural way is really not very useful these days. Don’t query data and keep control. Ask your objects to do work for you. Delegate.