Intro to Programming with Ruby
First of all, Ruby is a programming language that has become pretty popular with the advent of Ruby on Rails. It is also a great first programming language to learn. I won’t bore you with its history, even though the language has been around for over a decade. For a summary of Ruby, check out http://www.ruby-lang.org/en/about/ and take a look at the different tutorials there. This tutorial is meant for people who don’t know squat about programming. If anything is confusing at all, let me know in the comments.
Before we delve into objects and things like that, let’s consider what happens when we use Ruby to help with other tasks, such as deployment scripts or test support. One of the most important things you will learn has nothing to do with actually making stuff work: it’s the comment. Basically, you want to leave notes for yourself so that you can get back into the swing of things after you leave something alone for a while. To do that in Ruby, all you need is the '#' symbol. It marks the beginning of a comment, and the comment ends at the end of the line.
#
# A block of comments looks like this, with a '#' symbol
# at the beginning of every line. Comments are supposed
# to help you remember things later on.
#
Start with good commenting habits, and keep them up. Some things are pretty self-explanatory, so make sure you document the big things rather than what every line is doing. Remember that comments are clues you leave for yourself, or for anyone else who will work with the code, about what’s going on.
Now, Ruby is an object oriented language, but before I get into making objects let’s look at what they can do for you. In Ruby, everything is an object—your numbers, ranges, collections, etc. To help wrap your brain around objects, think of them as “things”.
Objects are things you can tell what to do.
Let’s say you need something to be done five times. The number five is an object, which means you can tell it to do something for you. The Core Ruby Doc lists all the standard objects that are part of the language, but it can be a little overwhelming at the beginning. First of all, the number five is an Integer (it has no decimal point, so math people tell us the correct name is an integer). If you look up Integer in the Ruby docs, you’ll find a list of things that it can do. There are two that look pretty interesting to us, but for now we just want to do something five times. Here’s how we do it:
5.times do |num|
  puts 'Do something!'
end
OK, what’s going on here? The number five is an object, and the “.” symbol means “tell the object to do something”. The name after the “.” is what we are telling it to do. So we are basically saying “Number 5, I want you to .times”. We are still missing part of the picture: the do block. Everything between do and end is passed along with the “.times” message. Let’s call the “.times” message a method.
A method is something you can tell an object to do.
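Here is a short sketch of the same dot-and-method pattern with a few other built-in objects. The methods used (class, even?, to_s, upcase, length) are standard Ruby methods, and the comments describe what each call asks for. On Ruby versions older than 2.4, 5.class reports Fixnum rather than Integer.

```ruby
# Every value in Ruby is an object, so each one can be told to do something with "."
puts 5.class           # ask 5 what kind of object it is (Integer)
puts 5.even?           # ask 5 whether it is even (false)
puts 5.to_s + '!'      # ask 5 to turn itself into text, then add "!" to it
puts 'hello'.upcase    # text is an object too (HELLO)
puts [1, 2, 3].length  # and so is a collection of things (3)
```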
Continuing on, we have the do block followed by some pipe symbols “|” with a name in the middle. This is how the number 5 passes something back into your do block so that you can use it. The name you give this something is called a variable. You can use it or ignore it; it’s up to you. We ignored it in the first example, but if we want to use it, we just refer to it by that name.
5.times do |count|
  # to_s turns the number into text; Ruby won't add a number directly to text
  puts 'Do something on time ' + count.to_s
end
What you see changes based on the value of the variable “count” that the number 5 gives us. The documentation tells us that the count starts at 0 and goes up to one less than our number, so we will see five lines counting up from 0 to 4. A lot of programming languages count from zero like this, so it’s just something you have to get used to. Everything between the do and end markers gets run each time. The important thing is that the name you put inside the “|” symbols is how you refer to that value inside the block.
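If you want to check the counting for yourself, a sketch like this one collects every value that 5.times hands to the block. The variable name counts is just for illustration.

```ruby
counts = []               # an empty list to collect values in

5.times do |count|
  counts << count         # remember the value for this trip through the block
end

puts counts.inspect       # prints [0, 1, 2, 3, 4]

# If you would rather count from 1, add 1 inside the block.
5.times do |count|
  puts "Pass number #{count + 1}"   # "#{...}" drops a value into the text
end
```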
OK, we are taking some baby steps here, but I’ll stop for today. First of all, we learned that things are called objects, and we tell objects what to do by calling methods (some people call them messages, but other languages use the word method, so we are just being consistent). We also learned that you can pass a whole block of code into a method and have the object run that block for us, and that the object can pass something back into our block so that we can use it. I want to point out that the whole “pass a block of code” thing is something that not every language can do. For instance, at the time I am writing this, Java, C#, and C++ can’t do that.
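To see how a method runs the block you give it, here is a sketch of a small method of our own. The name twice is made up for illustration; the Ruby keyword yield is what runs the caller’s block, and whatever you pass to yield shows up between the “|” symbols.

```ruby
# A made-up method that runs whatever block it is given two times.
def twice
  yield 1   # run the caller's block, handing it the number 1
  yield 2   # run it again, this time handing it 2
end

twice do |n|
  puts "Running the block, time #{n}"
end
```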
Lastly, I want to start you thinking about something. The best code is self-documenting, but it will never read like a book. As long as the details are clear, all you have to worry about in your comments is why you are doing something five times. The last snippet of code above almost reads like English: “five times, do something using ‘count’”. The word “puts” is actually a method too; it writes text to the console. Ruby makes some assumptions so the code reads a little more naturally.
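As a small illustration of the assumption Ruby makes, the two lines below print the same thing. Plain puts is Kernel#puts, which sends the text on to the standard output object, $stdout; the second line spells that out.

```ruby
puts 'Hello, console!'           # the short form: Ruby fills in where the text goes
$stdout.puts 'Hello, console!'   # the same request, sent to the standard output object
```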
What's the Minimum Requirement for a Successful Project?
This is in preparation for a newbie series on the Ruby language; however, it’s included for your benefit here. There are certain things that any project needs to keep track of if it is going to be successful:
- What needs to be done?
- What was done?
- And how close are we to being done?
I’m going to avoid project management and tech speak as much as possible. College students, pay attention: if you aren’t doing these things, you are writing a recipe for disaster. So let’s look at these things one by one.
What needs to be done?
To answer the question, you need to ask yourself, “If I get nothing else done, what will make me successful?” These things are called requirements. Every project has a different focus, so the requirements will not be the same from project to project. The next thing you have to ask yourself is, “What is standing between me and something I am proud to show off?” These things are called issues, and they can be of your own doing or caused by external forces. You can lump requirements and issues together, because you have to address both in any given release. A release is a version of what you are doing that addresses a list of things, whether they are requirements or issues. It’s kind of a package deal.
I’ve managed a few projects where we put everything in an issues list and assigned version numbers to those things. The issues could be further classified into requirements, enhancements (aka new requirements), bugs (mistakes we made), configuration problems (environment problems), and risks (things that may or may not happen that we need to be aware of). The important thing is that you assign these things to a version so that you have an idea of what the package is. The complete list of things to be done is your release, and as you knock them off, you can see how close you are to making a release. The list can grow as you are working, when new bugs come up or the people you are writing the application for come up with something new.
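Since this series is heading toward Ruby, here is a rough sketch of the issues list described above as a tiny Ruby program. The Issue structure, its fields, and the sample items are all hypothetical; real issue trackers store far more. The point is only that once every item carries a version, the release and its progress fall out of a simple query.

```ruby
# A hypothetical issue list: each item has a title, a kind, a target
# version, and a done flag. The fields and data are illustrative only.
Issue = Struct.new(:title, :kind, :version, :done)

issues = [
  Issue.new('User can log in',          :requirement,   '1.0', true),
  Issue.new('Export to spreadsheet',    :enhancement,   '1.1', false),
  Issue.new('Crash on empty password',  :bug,           '1.0', false),
  Issue.new('Wrong database port',      :configuration, '1.0', true),
  Issue.new('Hosting contract expires', :risk,          '1.0', false)
]

# The release is everything assigned to one version...
release = issues.select { |issue| issue.version == '1.0' }
# ...and "how close are we" is just how many of those are done.
done = release.count(&:done)

puts "Release 1.0: #{done} of #{release.size} items done"
release.reject(&:done).each do |issue|
  puts "  still open (#{issue.kind}): #{issue.title}"
end
```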
What was done?
To track what was done, you not only need to track how you are doing on your issues list, but you also need to make sure that your workspace (source code, etc.) matches what you said is done. When a number of people are working together on the same project, you need to make sure that no one accidentally deletes someone else’s work or adds something extra. For software projects, the chief tool for keeping track of what was in older releases and for merging multiple people’s work together is called a version control system. Some common free version control systems include CVS and Subversion. For software projects, there really is no excuse for not using a tool for version control. The two listed are free open source projects, and the second one is a little easier to manage. There are commercial products too, but the important thing is that you are using something other than people to manage the process. Your version control software has to support the following functions:
- Tagging (marking a set of code at a version in history so you can say what was done for a previous release)
- Merging (when more than one person is working, their work should be merged into the controlled version rather than replacing the whole file)
- Logging (every commit should have a message summarizing what was included in the change)
Document management systems usually don’t support tagging and merging. They just keep a history of versions of a document. They also usually require you to lock a file so that no one else can touch it while the changes are happening. While that may be necessary when you are working with word processor documents, with source code it is an unnecessary restriction that gets in the way of doing work.
So how do we make sure we did what we said we did? You test. Testing is essentially checking that the application is doing what we expect it to do. You can do testing manually, automatically, or some combination of the two. The best option is to combine automated testing with manual testing. The automated testing will always check the “blessed path”, or expected way to use an application. That way you make sure you don’t accidentally break something with new changes. The manual testing (people actually using the application) will have people trying to break or abuse the system. If they succeed in breaking it in an unexpected way, you have a new bug and a new automated test to deal with.
How close are we?
Hopefully, the list of open issues is getting smaller and the number of passing tests is increasing. Eventually, the number of open issues for a release will reach zero, and all the new stuff will be fully tested and given the green light. That’s when you know you are ready to cut a new release. It helps to chunk up the work so that you can get a feel for how close we are getting a little at a time. Let’s say you have a long list of things for a release, and you think it’s going to take about six months to get it all done. If you wait until the end of those six months to begin checking your work, you are probably going to get overwhelmed with the list of things that went wrong. It’s better to break up the work into smaller chunks.
I find that one- or two-week cycles give a good balance between getting the work done and testing early. If you do too much work before testing, it will take a lot of work to fix the problems. If you do too little work before testing, the work becomes tedious and unpleasant. It’s best to keep a steady marching pace of little mini releases so that the people testing can catch and report problems before they get too big.
For any project where money is involved (i.e. just about anything other than an open source project), it’s important to see how close you are to your estimate. Unexpected problems can really affect when you get the work done. Let’s say you are marching along really well and it looks like you might even beat your six-month deadline, but suddenly the database is imploding under the workload. Now you have to find out why it’s misbehaving and fix it. The problem might be a configuration problem, it might be that you have too little database for your application, or it might be that you are doing too much with the application. Finding the problem and fixing it might set you back a month, so meeting that six-month deadline is starting to look like a pipe dream. Now you have to figure out how you are going to handle the schedule problem. You can move the schedule, change what’s required for the release, change the way you are doing things to speed it up, or make people work longer. There’s a physical limit to how long people can work, and the longer they work, the more inefficient they become. If you can work smarter instead of harder, go for it. It will help you with the next release too.
With open source projects, the schedule is less of an issue, and things are released when they are ready. However, if someone wants to start helping out, it’s always best to give a clear answer about what you are trying to do. If they can see the list of issues and help get it done quicker, more power to them.
Testing and Choosing Your Process
Process. It’s a dirty word around some people. All it means is “a way to get something done”. Nothing more, nothing less. However, we have things like ISO 9001, CMM, Agile, XP, Scrum, etc. all presenting themselves as the way to develop your software. Each has its benefits and drawbacks, and each works with a different kind of client. If anyone tells you that their way of doing things is better, faster, etc. than what you are doing without knowing the details, you know you are dealing with a snake oil salesman. Even within the CMMI framework, there is a lot of leeway for you to have a heavy process or an agile process.
Process is just a way to get things done.
I’ve heard some people describe their software development process as “write a little code, run the app, write a little more code, and so on.” That process can work for some people, but not for everyone. I happen to be a fan of Test Driven Development coupled with Continuous Design, a fairly Agile approach. However, I know there are a lot of people who just can’t think that way. That’s fine. We just probably won’t be working on the same projects.
Manual Process vs. Automatic Process
The first aspect of a software development process is finding the right ratio between the things you do manually and the things the process takes care of for you as a side effect of just doing the process. We programmers tend to be lazy, so we do whatever is easiest on us. Ironically, we are sometimes so lazy that we would rather not invest the time to set up automation unless there is a clear reason to do so. That’s actually a good thing, because too much automation can get in the way of making changes. Of course, manual processes don’t scale very well.
Anything you do manually is something you have to remember to tell someone else.
When you are working alone, you can take all the shortcuts you want because you don’t have to explain what you are doing. The problem comes when you add someone to your team. All the things you were doing, you now have to explain to someone else and get them to do, or they won’t get done. Heck, sometimes you forget a step even when you are by yourself. That’s the reason we consider automation at all. The places where we get the most benefit from tools include configuration management and creating builds. I wouldn’t imagine writing any code without version control software of some sort, even if I’m by myself. The versions are managed for you so you can go back to a release to reproduce issues if necessary, the code is backed up and managed beyond your own hard drive, and as soon as you add more people to the team it takes care of merging code for you. The benefits far outweigh the alternative. Creating a build might be easy or difficult depending on the structure of the complete application. The build process will create your distributables and run any tests you have included. Tools include Make, Rake, Ant, etc., with differing levels of complexity. The important aspect is that you aren’t relying on everyone making identical configuration settings in their IDE to get the same executable. It’s all taken care of by the script.
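For a Ruby project, the build script is often a Rakefile. The sketch below is a minimal one, assuming (hypothetically) that your tests live under test/ in files whose names end in _test.rb; Rake::TestTask ships with Rake. With it in place, everyone runs rake and gets the same test run, with no IDE settings involved.

```ruby
# Rakefile: a minimal sketch; the test file layout is an assumption.
require 'rake/testtask'

# Define a "test" task that runs every file matching the pattern.
Rake::TestTask.new(:test) do |t|
  t.libs << 'test'                   # also put test/ on the load path (lib/ is included by default)
  t.pattern = 'test/**/*_test.rb'    # which files count as tests
end

# Running plain `rake` with no arguments runs the tests.
task default: :test
```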
There are other areas where automation can help out, but it does depend on the culture of the project, and the ease of setting it up. How do you manage your issues? Many tools generate reports just from how you use the application. You can show how quickly you are burning through your issues, and you can see how fast or slowly you are approaching the completed release. Sometimes you can add flags to the tool to notify you if values are outside a certain set of parameters.
Testing your Process
In any software project there are a number of things you have to keep track of, and your process (the way you get things done) should make that easier. So how do we know whether the process is working? You have to start by identifying what’s important. What is going to make your client lose confidence in you? If you do nothing else, what has to be done? Do you really need full Earned Value Management (EVM), or can you get away with proving you are making progress at the expected rate? Many times it’s enough just to know when you are falling behind. One thing that’s a must in any process is that you produce the deliverables.
If anything important slips through the cracks, your process is broken. Period.
Once you’ve identified the important stuff you can’t miss, you keep track of it. So how do I do that? You do your process. If you can ensure your process takes care of tracking the important stuff, all the better. Keeping track of something is what some people call “metrics”. The thing I don’t like about metrics is that you are essentially attaching numbers to parts of your process. Numbers people like metrics because they have something to play with. The reality is that a number on its own doesn’t mean anything. Trends mean something; numbers don’t. If you are getting better (i.e. your trend is going in the right direction), your process is working. If you are getting worse (i.e. your trend is going in the wrong direction), your process needs to be fixed.
So how do you track trends, and when does a change really mean something? The tricky thing with trends is that if your sample is too short you might make adjustments that are not warranted, but if the sample is too long you won’t make the adjustments soon enough. There is no substitute for your gut feeling when you are just starting to track how well your process is working. The trick is to avoid being alarmist and to make measured adjustments. It’s like learning to drive for the first time: you either make large adjustments and overcompensate, or you make too many small adjustments. The good news is that you aren’t going to crash right away.
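As a rough illustration of the sample-length trade-off, here is a sketch that smooths a weekly number with a moving average and reports which way it is heading. The weekly counts are made up, and the moving average is just one simple way to look at a trend, not a prescribed method. With open issues, falling is the direction you want.

```ruby
# Smooth a series of weekly numbers with a moving average of the given window.
def moving_averages(values, window)
  values.each_cons(window).map { |slice| slice.sum.to_f / window }
end

# Compare the two most recent smoothed values to see which way things are heading.
def trend(values, window)
  averages = moving_averages(values, window)
  return :not_enough_data if averages.size < 2

  change = averages[-1] - averages[-2]
  if change > 0
    :rising
  elsif change < 0
    :falling
  else
    :flat
  end
end

# Made-up data: open issues counted at the end of each week.
open_issues = [40, 38, 36, 34, 32, 30, 33]

puts trend(open_issues, 2)  # short window: reacts to last week's bump (rising)
puts trend(open_issues, 4)  # longer window: smoother, slower to react (falling)
```

With this made-up data, the two-week window reports rising because of the bump in the last week, while the four-week window still reports falling. Neither answer is wrong: the short window reacts sooner, and the long one is harder to spook.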
There are a few things that process people (you know, those heavy-process proponents) forget to look at while they are testing their process. How much work is needed to feed the process, vs. how much work is spent actually making deliverables? If you need 5 people who do nothing other than track the project’s progress when you have a team of 3 developers generating code, something is a little off-kilter. Notice that I said the 5 people are feeding the process. Ensuring you have a quality product is work that’s done for the deliverable. Ensuring you have a quality code review is not. That’s “meta-work”, or work about work. It’s stuff done to feed the process. Even the discipline of code reviews can be brought into question, as long as you gain the benefit another way. A code review is merely a means both to mentor other team members and to ensure the quality of the product. If your process provides those benefits in other ways, then you don’t need a formal code review.
Fixing a Broken Process
If you are like me, then you really won’t wonder about how you fix a broken process. You just make the changes on the fly and the process evolves. You know these “lessons learned” meetings people have that dig up all the surprises and issues that we ran across? Are you merely identifying lessons, or are you actually learning them? If a technology choice turned out to be bad, how can you make sure the same mistake doesn’t happen again?
You haven’t really learned a lesson unless you did something to correct it.
If you discover that something you are doing isn’t adding any value whatsoever, you shouldn’t be doing it anymore. The advantage of changing your ways vs. logging issues is that it actually fixes the cause of your heartburn. Logging an issue helps you vent and get things off your chest, but the problem is still there until you actually do something about it. That’s why the traditional “lessons learned” meetings don’t work. They let people vent, and then nothing is done about it. “Yeah, that was painful. We’ll be doing it again next go ‘round too.”
One of the biggest lessons I learned is that waiting until a “lessons learned” meeting to vent is waiting too long. If you are experiencing some heartburn now, fix it now. If you spend more time collecting numbers than you do using those numbers (um, I mean metrics), then just how important are those numbers in the grand scheme of things? Can you find another way of collecting them? It’s funny, but machines are much better at collecting numbers than people are. There’s probably some way of making those numbers a byproduct of how you do things. In essence, if you can make your tools support the way you do things and keep track of the numbers for you, you are better off. As long as you are depending on people to actually log numbers, you are depending on the most unreliable source for numbers. Keeping track of them is a pain, even if we only need to type them into a form. Let your tools do all the data entry for you.
The bottom line is: prove you need to change, change, and then prove your change is helping. It’s TDD for process. If you see a bad trend and the process is the culprit, prove it. Then decide what a good trend would be. Make your changes and see how close you are to the results you want. You may find that your change is “close enough”, in which case you are done. Of course, you may find that you just introduced more work than the old way of doing things, in which case you revert or make a different change.
Test Driven Development 101
I’m back after a long hiatus, and I’m probably talking to the air right now, but that’s OK. For the two or three of you actually listening to me, listen on. We have a number of projects of different sorts at the company I work for, and you’d be surprised at how few do test driven development—and how few even write unit tests. In this day and age, not writing tests that can be automated at all is inexcusable. At the very least, the tricky stuff should be thoroughly tested. This article is for anyone who is either skeptical about or interested in Test Driven Development. That includes managers as well as developers.
What’s the Problem, Man?
We all want (or at least should want) to write quality software, and we will take this as the “given”, as in geometry. Since we want to write quality software, how do we do it? In the early cowboy days of software engineering, it was painful to write code. First you did all your planning, wrote what you hoped would work, punched the stacks of cards, and then loaded them onto the mainframe. That explains all the texts on the big design up front methodology. However, that was before my time. When I first started, we wrote a little, ran the software, and debugged, which explains why debuggers were so important at that time. However, times have changed. We now have a plethora of unit testing frameworks and maintainable build scripts to incorporate the tests into the build process.
As soon as we ran into some hard problems that really couldn’t be completely designed up front (like TCP/IP stacks), we developers had to figure out a way to make sure things worked properly. I mean, the spec is pretty complete, but there are a lot of details and practical limitations imposed by hardware that you won’t find in the spec. If you attempted to alter the spec to include all these corner cases, you would also lose any way of understanding how it is supposed to work. This is how the automated unit test was born. All these corner cases had to be accounted for so that future changes to the code won’t break the existing functionality. I think everyone recognizes the importance of testing. It’s just how, when, and where testing happens that people disagree on.
Another problem that often gets swept under the rug is code slipping in that doesn’t need to be there. How often have you tracked down a problem only to discover that the culprit was code that was never supposed to exist? You remove the offending code, and voilà, it works! The code could have been left over from old legacy requirements that no longer apply, or it could be the result of an inexperienced developer overthinking a problem. Either way, the code is there and it is causing problems.
If you adopt a clean-as-you-go philosophy to developing software, you had better make sure you don’t accidentally break anything as you refactor your code. Even with “safe” refactoring, there is an inherent risk that you can inadvertently break something you didn’t think you would. If only there were a way to make sure all the important functionality keeps working…. Oh yeah, you do have a full test suite, don’t you? Oh, it didn’t get written because you were under the gun and on a roll? Too bad.
Test Driven Development (TDD) was designed to not only address these issues, but to instill a discipline so that your unit tests would get written. Let’s face it, all people are lazy. It’s in our nature to not do anything we think is a waste of time. Sure we’ll do a little work now to avoid a bunch of work later, but we need to know it really is going to avoid a bunch of work later. What some people fail to realize about TDD is that the time will be spent somewhere. Either you write your tests up front or you spend time in a debugger later. Either you prove that your approach will work now, or you do it later. Either you break up your work into testable chunks now, or you attempt to do it later and fail at it.
OK, So How Does It Work?
At its heart, TDD is pretty simple: you perform a prove, fix, prove cycle. First you prove it doesn’t work. Then you fix the problem. Finally, you prove your fix worked. If you are lucky, your first “prove” step will show that the code already handles what you were thinking about. Write the test anyway, because you need to make sure that any future changes will still support that test case. The mechanics of TDD are fairly easy to see, but its design implications are less obvious without a little more explanation.
One of the side-effects of writing your tests before you write code is that it makes your code easier to test. Remember, people are lazy? Since a proper unit test has to set up the environment and then check the effect of the call, we naturally make the environment easy to set up. In fact, the less we rely on network connectivity or environment variables, the better. If a bit of code can be tested just by passing objects into it and examining the return value, then you aren’t going to be overly clever in your implementation. Easily testable code also happens to be more modular. If you can pass in a mock object in place of something that would normally talk to the network, and have the mock imitate the situations you would encounter there, you can more easily test just the small unit you are working on.
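Here is a sketch of that idea in Ruby. The class names (StatusChecker, FakeFetcher) and the fetch method are hypothetical. Instead of a mocking library, it uses a hand-written fake object: because StatusChecker is handed its fetcher rather than opening a connection itself, the test can pass in something that imitates a healthy server or a dropped connection without touching the network.

```ruby
# A hypothetical unit under test. It never opens a connection itself;
# it is handed something that responds to fetch(url).
class StatusChecker
  def initialize(fetcher)
    @fetcher = fetcher
  end

  # Returns :up if the page mentions "OK", :down otherwise (including I/O errors).
  def status(url)
    body = @fetcher.fetch(url)
    body.include?('OK') ? :up : :down
  rescue IOError
    :down
  end
end

# A hand-written fake that imitates the network without touching it.
class FakeFetcher
  def initialize(response = nil, error: nil)
    @response = response
    @error = error
  end

  def fetch(_url)
    raise @error if @error   # imitate a failed connection when asked to
    @response                # otherwise hand back the canned page body
  end
end

healthy = StatusChecker.new(FakeFetcher.new('Status: OK'))
broken  = StatusChecker.new(FakeFetcher.new(error: IOError.new('connection reset')))

puts healthy.status('http://example.com/health')   # up
puts broken.status('http://example.com/health')    # down
```

Both calls exercise StatusChecker’s own logic in isolation, which is what a unit test wants; checking a real server is a job for an integration test.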
Unit Tests vs. Integration Tests
In the best of all worlds, a project would have both. Unit tests make sure that the smallest unit (such as a method on a class) is doing what we expect it to do. A proper unit test also tests only one aspect of that method. It’s not uncommon to have several tests for the same method to make sure that all the corner cases are taken care of. Integration tests make sure all the different units work together like we thought they would. A unit test uses mock objects to isolate the thing we are testing from everything else. An integration test sets up the complete environment and runs the test against that environment. They are different tools for different problems. To do TDD, unit tests are required, but additional testing is a plus.
More often than not, code that is properly unit tested will work just fine. However, there are some issues that only crop up when the whole system is put together. Perhaps there is some race condition, or some complicated event loop that only appears in certain conditions. Once you track down the cause of the problem, you can write a unit test to reproduce the condition that caused the mess in the first place. With the new unit test, you fix the code, and everything should work in integration again.
Many times, the integration tests are all done manually. The challenge is that it is hard to set up a complete environment, test the user interface or system messaging, and evaluate the results automatically. The problem is that over time the number of tests that people have to do for a release becomes daunting. People are less picky than machines. If there is a tool to support integration testing like Selenium or some other testing framework, use it to at least catch regression issues. You know those issues where you accidentally break something that used to work when you add a new feature? Anything that was working and gets broken needs to be tested every time. Machines are good at doing rote repetition like that.
Battle Rhythm
When you sit down and start doing TDD, you’ll develop a battle rhythm. Most IDEs these days support running your unit tests without going through the whole build process. It’s pretty convenient, and it makes TDD a lot easier to do. So, you’ve got a new requirement and you need to get it working. It’s good to have a general idea of where you want to go, or how it is supposed to work, but don’t be a slave to that idea. Start writing the unit test wherever it is easiest to begin. I personally find it is best to start with the happy path (the path where everything works as expected). For example, look at the pseudo code below:
Test String is a URL
--------------------
1. Use the string "http://bloritsch.d-haven.net"
2. Pass the string to the String Utility "isURL" method
3. Assert the response is "true"
Of course, it’s pretty easy to make this one test pass. All we have to do is write the StringUtility.isURL() method to simply return true. No evaluation or anything. When we run the test, we prove that solution is good enough for now. So we need to start thinking about the next test. What if the string is not a URL? So we add the next test:
Test String is NOT a URL
------------------------
1. Use the string "I'm not a URL, ignoramus!"
2. Pass the string to the String Utility "isURL" method
3. Assert the response is "false"
Now we’ve proven we have some work to do. So we change StringUtility to return true if the string starts with “http:”. It’s the simplest thing, right? But what if we want to include SSL encrypted URLs, or mailto URLs? So you add tests for them, and make them pass without breaking the other tests you’ve written. All these tests are cumulative, so while they may have been extra work at the beginning, they can save your bacon later. Don’t forget about the corner cases, either: what if the string is null?
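Translated into Ruby, the tests from this walkthrough might end up looking something like the sketch below, using Minitest. StringUtility and url? are hypothetical names (Ruby style favors a trailing ? over isURL), and the prefix check is only as smart as the tests written so far; it is not a complete URL validator. Running the file runs all five tests.

```ruby
require 'minitest/autorun'

# Hypothetical utility grown one test at a time. It only knows about the
# cases the tests below describe: http, https, mailto, and nil.
module StringUtility
  URL_PREFIXES = ['http://', 'https://', 'mailto:'].freeze

  def self.url?(text)
    return false if text.nil?   # corner case: no string at all
    URL_PREFIXES.any? { |prefix| text.start_with?(prefix) }
  end
end

class StringUtilityTest < Minitest::Test
  def test_string_is_a_url             # the happy path
    assert StringUtility.url?('http://bloritsch.d-haven.net')
  end

  def test_string_is_not_a_url
    refute StringUtility.url?("I'm not a URL, ignoramus!")
  end

  def test_ssl_url
    assert StringUtility.url?('https://bloritsch.d-haven.net')
  end

  def test_mailto_url
    assert StringUtility.url?('mailto:someone@example.com')
  end

  def test_nil_is_not_a_url
    refute StringUtility.url?(nil)
  end
end
```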
By the time you are done with this method, you’ll have done the following things without even realizing it:
- Documented what you consider a URL and what is not a URL (design documentation side-effect)
- Proven the design works (design proof side-effect)
- Tested the implementation (implementation proof)
- Provided a safety net to do refactoring (supporting implementation malleability)
Sure the method is just a part of the overall design, but you’ve thought about and decided how the method is going to be used from the perspective of someone using the method. It’s a shift in thinking from being the “implementer” of the method, which usually results in code that is easier to use elsewhere. You’ve also introduced a level of trust in this method that you wouldn’t have if you just reached for that regular expression you found on the net to determine if something is a URL or not. You’ve also introduced a boundary where the implementation can be as simple or complex as you want—but code that uses the method won’t care about those details.
The battle rhythm of proving, fixing, and proving actually improves your development speed. It may not seem like it at first, but by testing all along the way we’ve minimized the time we will have to spend in a debugger. The ramp-up time is a little slower, but once you get your battle rhythm going, you stay at a constant pace. Without TDD, I find myself working in bursts of productivity interrupted by long periods in a debugger finding out exactly where I went wrong. With TDD, I find myself working at a steady pace, and on the occasions when I do miss something, I spend much less time in the debugger. Once I’ve discovered the culprit, I add a test case that reproduces the error condition and then make it pass. Now, when I refactor code I can make sure I don’t reintroduce the problem accidentally.
Silver Bullet?
There is no silver bullet, and no golden hammer that will make things work perfectly the first time. There are only tools that help you get closer to the ideal. TDD is a tool that helps improve quality from the start. Most detractors of TDD dismiss the claim that tests document your design as false, or say that such documentation is unreadable to normal people. I may give them that argument, but TDD isn’t about documenting design; it’s about building a better quality product with the minimal amount of investment. It’s about improving your productivity over the course of a software project. It’s about reducing the number of “doh!” bugs to virtually none, saving your brain cells for the more complex problems. Finally, it’s about minimizing the risk involved in refactoring or even rewriting your software.
All of these benefits are things that the textbooks say are good. TDD also delivers them in a way that is less painful to developers. Writing documentation is a pain in the butt; writing test cases, however, directly benefits the developer. It also benefits the project over its period of performance. Bottom line? More bang for the buck.
