Java and C# suffer from the same ailment

Posted by Berin Loritsch Tue, 29 Jun 2010 16:46:00 GMT

I have an interest in language design, even though I have no direct outlet for it at the moment. So as I’ve been contemplating what I like and what I don’t like about the languages I have been exposed to, I’ve realized that both Java and C# are suffering from the same core ailment. That ailment is the conceptual complexity underlying these platforms. I have to say platform because both Java and C# use a virtual machine that has been used to host other languages as well. C# without the CLR is like Java without the JVM: useless. This is in stark contrast to the almost sublime conceptual simplicity of Lisp, Smalltalk, and even Ruby.

Both C# and Java have bolted on several different features to deal with the underlying complexities, much like the English language has imported words from several different languages. English, technically a Germanic language, borrows heavily from Latin and the Romance languages, and even some Greek. We won’t mention some imported words from vastly different languages like Japanese (kimono, karaoke, katana, kanji). So it is with Java and C#. A short list of concepts shared by both languages includes:

  • Autoboxing
  • Attributes/Annotations
  • Dynamic binding (.NET 4.0 has the DLR and Java 7 adds the invokedynamic JVM opcode for this purpose)
  • For-each style iteration
  • API document generation
  • and more…

The problem isn’t so much the features in and of themselves. The problem is more subtle than that. In order to deal with the complexity of the language itself, these features are necessary. In some ways, a language like Lisp has conceptual appeal, even though its syntax is hard to wrap your head around. If everything is a list, from parameters passed in to a function to data values, and the language is built around set theory, it maps pretty well to a discipline of math. Heck, with Lisp a function is just a list of operations. Although perhaps in some ways Lisp is too conceptually simple.

The problem I’m getting at is being able to form a reasonable hypothesis of how the software is addressing your problems. I remember reading a PR piece on how Java was better than C# that had a small snippet of code asking how many method invocations there were. The two or three line snippet actually ended up invoking an unexpectedly large number of methods, from attribute accessors to delegates and some other magic. The intent of the developer was clear, although the impact of the code was unexpectedly complex. That’s not to say that C# is bad. The article was a PR piece to help Java developers still feel good about themselves. However, Java is just as guilty. Have you ever tried to debug dynamic proxy code? Have you worked with features that injected functionality into your code for you (Spring/Hibernate comes to mind)?

Other than the general second law of thermodynamics, what is it that drives languages to be more complex? Rather than truly seeking simplicity, both Java and C# have progressively moved toward sweeping the inherent complexity under the rug, essentially moving the problem from something the developer has to worry about to something the platform has to worry about. To paraphrase my wife’s favorite movie:

There are three kinds of pipe. You have nickel, and you can see where that’s gotten you. You have bronze, which is very good… until something goes wrong. Something always goes wrong. And then you have copper, which is the only kind I use. from Moonstruck

Programming is a complex process. Translating the sometimes conflicting desires of a human into something a computer can understand is not easy. That complexity is further compounded by the moving parts we need to work together to accomplish our goals. My goal in exploring the world of language design is to find the right path for true simplicity. While we are approaching that ideal from different programming paradigms, we haven’t quite reached it yet. It feels like we live in a world where there is only nickel and bronze, and copper has yet to be discovered. I’m not the only one thinking about this for sure.

Concurrent Programming Lessons, and some abstract thought

Posted by Berin Loritsch Thu, 17 Jun 2010 16:21:00 GMT

Taking a break from my self-serving blogging about my machine geekery, I’d like to jot down my thoughts on building a concurrent language that merges some powerful concepts from other existing languages. Based on lessons from Erlang, Ruby, and JavaScript, I think it is possible to approach a working model for how to do safe concurrency in an object oriented manner.

First, let’s examine some concepts from Erlang, or Concurrency Oriented Programming:

  • The world around us is concurrent
  • Each cognizant being maintains their own state
  • Exchange of ideas is performed by passing messages
  • By responding to messages, each cognizant being may change their state

Based on these concepts, Erlang makes a few restrictions. Variables are write-once (i.e. immutable once set). Because of this, Erlang does not need locks, mutexes, semaphores, or other fancy concurrency control mechanisms the popular languages use. All reads will be the same, regardless of timing issues. Processes are first class citizens, and no memory is shared between processes. Again, the same evils are avoided. If information is passed from one process to another, it is done by sending messages. The approach echoes observations smarter people than myself have made in using SAX vs. DOM for XML parsing. The event (message) based architecture of SAX was less memory demanding and made it easier to maintain throughput in highly concurrent systems (i.e. web servers) than the DOM alternative. There are a few more things in here that deal with reliability, such as the VM’s ability to monitor and restart processes that fail. The end result is programs that scale easily with the processing nodes available, both local CPU cores and remote machines.
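
To make the message passing idea concrete, here is a rough sketch in Ruby rather than Erlang. It assumes a thread with a private Queue can stand in for an Erlang process. The name spawn_counter and the message symbols are made up for illustration, and unlike Erlang nothing here stops threads from sharing memory; the isolation is by convention only.

```ruby
require 'thread'

# A stand-in for an Erlang process: a thread that owns its state and
# only learns about the outside world through its mailbox.
def spawn_counter
  mailbox = Queue.new
  worker = Thread.new do
    count = 0                      # state owned by this worker alone
    loop do
      message, reply_to = mailbox.pop
      case message
      when :increment then count += 1
      when :report    then reply_to << count
      when :stop      then break
      end
    end
  end
  [mailbox, worker]
end

mailbox, worker = spawn_counter
3.times { mailbox << [:increment, nil] }

replies = Queue.new
mailbox << [:report, replies]
puts replies.pop   # prints 3: the mailbox is processed in order

mailbox << [:stop, nil]
worker.join
```

No lock appears anywhere in the calling code. The only way to read or change the count is to send a message and, for reads, wait for the reply.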

Next, let’s examine some concepts from Ruby, or Object Oriented Programming:

  • The world around us consists of things
  • Things can act on other things, or can be the recipient of actions
  • Each thing should maintain its internal state
  • Things act on other things by sending messages (i.e. calling methods)

Based on this set of basic rules, there is a fair amount of overlap between the highly concurrent Erlang concepts and the object oriented view of Ruby (or Java, or C# if you prefer). In practice there are a few different types of things. First, there are things (objects) that cannot act on their own (i.e. value objects like color, money, or dates) but will respond accordingly when acted on by others. These value objects never change state and are completely passive. Next there are objects that represent the current state of the world. These business objects, as some call them, maintain their own state and respond to messages from other objects. In some cases, the business objects will act on other business objects. Finally, there are things that act, or service objects. Service objects are a little different from the physical representations of the other two types of objects, but they take care of complex logic, workflow, etc.

Finally, let’s examine JavaScript, or Prototype Based Programming:

  • There are no object descriptions (classes), only objects
  • Objects can send or respond to messages (i.e. calling methods)
  • Objects should maintain their internal state

So there is some overlap here as well. The major difference between prototype languages and object languages is the lack of a class. In essence, instead of defining how an object should look and behave using a class, you copy an existing object prototype. In the copy process you can extend the prototype by adding methods (new message receivers), properties, or whatever you like. You can also simply use the object as it is. There is a side effect here: the system tends to have fewer objects overall compared to your object oriented system. That helps with pesky matters like garbage collection. However, the objects tend to be a bit more powerful.
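
Ruby is class based, but it can imitate the copy-and-extend style well enough for a sketch. The helper name derive is hypothetical. It relies on two things Ruby does provide: clone copies an object's singleton methods, and define_singleton_method adds a method to one object only.

```ruby
# Copy a prototype and bolt extra methods onto the copy only.
def derive(prototype, extras = {})
  copy = prototype.clone   # clone (unlike dup) keeps singleton methods
  extras.each { |name, body| copy.define_singleton_method(name, &body) }
  copy
end

# No class of our own: start from a bare object and give it behaviour.
animal = derive(Object.new,
                name:  -> { 'animal' },
                speak: -> { "#{name} makes a sound" })

# The copy overrides one message receiver and adds another.
dog = derive(animal,
             name:  -> { 'dog' },
             fetch: -> { "#{name} fetches the stick" })

puts animal.speak               # animal makes a sound
puts dog.speak                  # dog makes a sound
puts dog.fetch                  # dog fetches the stick
puts animal.respond_to?(:fetch) # false
```

The prototype is untouched by what happens to its copies, which is the property the paragraph above depends on.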

Now, some personal observations based on working with these languages:

  1. I come from an object oriented background. It makes sense to me, so it’s hard to make the mental shift to the other programming approaches.
  2. There is a fair amount of overlap in the concepts, to the point where we can start formulating how to merge them.
  3. Defining a class for an object that will have only one instance seems a bit excessive. The system has to keep the definition of the object and the object resident in memory. Perhaps the prototype approach can help rein that in.
  4. Tying a process to an object gives us the concurrency of Erlang and the familiarity of objects. Essentially, the messages are methods and each object manages itself in its own process. Garbage collection can be much quicker since the collector can be optimized for one process’s data.
  5. Straight value objects don’t necessarily need their own process, they can run within the process (object) that uses them.
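
Observation 4 can be roughed out in plain Ruby. This is only a sketch under stated assumptions: ProcessObject is a hypothetical wrapper, a thread stands in for a process, and error handling is left out (if the wrapped object raises, the worker dies and the caller is left waiting).

```ruby
require 'thread'

# Every method call on the wrapper becomes a message that one dedicated
# thread applies to the wrapped object, one message at a time.
class ProcessObject
  def initialize(target)
    @target = target
    @mailbox = Queue.new
    @worker = Thread.new do
      while (message = @mailbox.pop) != :stop
        name, args, reply = message
        reply << @target.public_send(name, *args)
      end
    end
  end

  # Unknown methods are turned into messages for the worker.
  def method_missing(name, *args)
    return super unless @target.respond_to?(name)
    reply = Queue.new
    @mailbox << [name, args, reply]
    reply.pop   # block until answered; a future could defer this wait
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end

  def stop
    @mailbox << :stop
    @worker.join
  end
end

list = ProcessObject.new([])   # an Array stands in for a stateful object
list.push(10)
list.push(20)
puts list.size   # 2
list.stop
```

Because a single thread drains the mailbox, the wrapped object never sees two messages at once, so it needs no locks of its own.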

I’ve also identified a few challenges with the process/object approach. Processes will have to be monitored to see if they are still in use. A special garbage collector would need to be written for that purpose. The Erlang concept of a write-once variable matches the mathematical ideal well. For example, X=X+1 is a mathematical impossibility but a common programming concept in many languages. Yet, objects need to vary their state over time. A special distinction needs to be made between state that can change and state that cannot. In some ways the concept of a Map for maintaining the internal state of an object is a natural approach. It might be how Erlang programmers maintain state in their processes.

There are a few things that I am concerned with, no matter what the language is or how concurrency is performed:

  • Security. There are bad people out there wanting to do bad things. Unfortunately, most security models are more of a pain to work with and consequently don’t get used.
  • Internationalization. We live on a planet with many languages and cultures. At the very least Unicode (UTF-8, for example) should be the default internal representation of strings. This is still a field the industry is trying to figure out.
  • Robustness. Error handling has to be given special attention. If you get it right, you will help people create software that won’t easily break. If you get it wrong, you will help people create monstrosities that break more easily.
  • Testability. Anyone who has done unit testing seriously has learned that the design of the code affects how easy it is to test. The easier it is to test, the easier it is to catch bugs, and the less likely people will complain about writing unit tests.
  • Scalability. The platform should make it easier to take advantage of new features like multicore processors and remote machines. Ideally, the software performance should scale along with the hardware. I hate jumping through hoops to do what should really be done in the platform.

These are just random thoughts. Please shoot holes in them. I know I’ll have to figure out a lot of details to make something like that work.

Intro to Functions

Posted by Berin Loritsch Fri, 22 Aug 2008 11:58:00 GMT

When you are just writing quick scripts, you can use Ruby all you want and be happy. However, there comes a point where you have to do the same thing in a bunch of places. Functions are a way to organize the logic in your code so that you can re-use it in more than one place. I’ll introduce how to do math at the beginning, but functions aren’t only for numbers as we will show later.

Doing Some Math

As long as you are working with numbers, you will have to remember some symbols. In your math text books you will see symbols that just don’t exist on keyboards and require different key combinations to make them show up. The good news is that the conventions for replacing mathematical symbols in code are pretty standard across languages. You only have to learn them once, which helps.

  • + addition
  • - subtraction
  • * multiplication
  • / division
  • % modulus
  • ** exponent (in Ruby, ^ is bitwise XOR, not exponent)
  • () group expressions

Math expressions are performed in algebraic order. In short, that means that expressions are evaluated in the reverse order from what I listed. Parentheses first, exponents next, then multiplication, division and modulus, finally addition and subtraction. Just to make it clear, look at the following code:

puts 4 + 5 * 6
# 34

puts (4 + 5) * 6
# 54

puts 4 + (5 * 6)
# 34

It’s a good habit to use parentheses to make things clearer. There are a few more symbols that allow you to do bit manipulation, but then I have to explain the math behind it. Let’s focus on this level of math for now. Let’s say we want to do a little geometry and calculate the area of a circle. The mathematical formula for the area of a circle is πr². So how do we get a hold of the value of π? There is a Ruby module called Math that has the value of π and other more advanced functions.

Ok, so what does the expression look like in Ruby?

radius = 5

puts Math::PI * (radius ** 2)
# 78.53981633974483

I added the parentheses to make it clearer that the exponent (raising to the power of two) comes first. So what if we wanted to reuse this function anywhere? We would have to create a function to do it. It’s pretty easy, and you will use the same construct in another post when we talk about creating our own methods. Let’s create our function:

def area radius
    Math::PI * (radius ** 2)
end

So what’s going on here? The word def is a Ruby keyword that tells Ruby that you are creating a function. After that is the name of the function. Finally we have the list of parameters. A parameter is a name we give to a value that you pass to the function. Basically, the function is going to do something with that value, even though it doesn’t know ahead of time what the value is. The next line bears some explaining.

Functions can return a value, which is usually their whole point. However, we don’t see any words that say “return this”. It’s probably the most unintuitive thing you’ll run into with Ruby, but the last expression in a function is the value that’s returned. It’s a carryover from Smalltalk, and once you understand that it becomes a little more understandable. If we had one more line that just had the number 2 on it, then the function would always return the number 2, which is wrong for what we want. What some people do to make things a bit clearer is to use the keyword return. That keyword is designed for letting you leave a method early for some cases, but it works just as well. It’s probably not a bad habit as other languages require you to use it. The method would now look like this:

def area radius
    return Math::PI * (radius ** 2)
end

The keyword end is something we saw already when we were doing loops in the last lesson. This keyword is used to end any block, so you will use it a lot.
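
For completeness, here is how such a function gets called once it is defined. This is a small self-contained sketch. Note that Ruby spells the exponent operator as two asterisks, while the caret is bitwise XOR, so the two give very different answers.

```ruby
# ** raises to a power; ^ is bitwise XOR.
def area(radius)
  Math::PI * (radius ** 2)   # the last expression is the return value
end

puts area(1)    # 3.141592653589793
puts area(5)    # 78.53981633974483
puts 5 ^ 2      # 7 (bitwise XOR, not 25)
```

The function can now be reused anywhere a radius is at hand, which is the whole reason for writing it once.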

Not All Functions Are for Math

I introduced functions with math because that’s where the idea came from. But most problems don’t require the use of heavy math. Ruby isn’t designed to be a math engine anyway. Just for fun, let’s create a function that will turn a number into words, Japanese words to be exact. It’s only fitting as Ruby came from Japan after all. Just to save us some work, we’ll limit ourselves to the range from 0 to 99. To do that we need to use an if statement. The if statement lets us do something if a condition is true, but skips the code inside if it is not true. We also want to raise an issue so that the calling code knows that they asked for something we can’t deliver. The keyword is raise, which is rather convenient. Ruby expects what you raise to be an exception, but if you just give it a string it wraps the string in a RuntimeError for you. The code looks like this:

if not (0..99).include? number
    raise "We can only translate numbers between 0 and 99" 
end
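
The other half of raising an issue is what the calling code does about it. The sketch below wraps the same check in a hypothetical helper called check_range so it can stand alone. Raising a plain string produces a RuntimeError, which the caller can rescue.

```ruby
# Hypothetical helper: returns the number if it is in range, raises otherwise.
def check_range(number)
  if not (0..99).include? number
    raise "We can only translate numbers between 0 and 99"
  end
  number
end

begin
  check_range(150)
rescue RuntimeError => error
  # The string we raised is available as the error's message.
  puts "Could not translate: #{error.message}"
end
```

Without the begin/rescue, the error would travel up to the top of the script and stop it.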

I’ll include the solution below, and just expound on things in comments. Your job is to expand the method to do up to 999, or to change it to another language. I’m using Japanese partly because it’s easy to do with code. Other languages have more exceptions.

def to_japanese(number)
    #
    # Protect our method from trying to work on numbers
    # it doesn't support
    #
    if not (0..99).include? number
        raise  "We can only translate numbers between 0 and 99" 
    end

    #
    # Keep it simple, use the variations of four and
    # nine that work in the tens column as well as
    # the ones column.  These are the numbers from
    # zero to nine.
    # 
    numbers = ['rei', 'ichi', 'ni', 'san', 'yon', 'go',
               'roku', 'nana', 'hachi', 'kyu']

    #
    # Modulus gives the remainder.
    # 12 divided by 10 is 1 with a remainder of 2.
    # It's a good way to get just the ones column.
    # Then we use regular division to get just the tens column
    #
    ones = number % 10
    tens = number / 10

    case tens
        # When we are doing 10 - 19
        when 1
            japanese = (0 == ones) ? 'ju' : 'ju ' + numbers[ones]

        # When we are doing 20 - 99
        when 2..9
            japanese = numbers[tens] + ' ju'

            if (ones > 0)
                japanese = [japanese, numbers[ones]].join(' ')
            end

        # Otherwise we are doing 0-9
        else
            japanese = numbers[ones]
    end

    return japanese
end

So there are a couple things I need to explain above. First is the case, when, else construct. The case statement tells Ruby that we are going to use the following expression (in this example the expression is a variable) with a bunch of comparisons. It’s a little nicer than doing a bunch of if/else statements. The first match is what gets run. Each case that we are checking is marked with the when statement. To translate it to English, it’s like saying “when tens is 1 do this”, “when tens is in the range 2..9 do that”, “otherwise do this”.

The next thing I have to explain is the (something) ? true : false construct. It’s a shorthand for an if/else statement. Essentially, we are saying that if the ones column is 0, just return ‘ju’ otherwise return ‘ju ’ plus the translation of the ones column. Have fun!
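
If it helps, here are both constructs next to their longhand if/else equivalents. The method names are made up for this illustration, and the returned strings are just labels. The when clauses work with ranges because case compares using ===, and a range's === asks whether the value falls inside it.

```ruby
def column_with_case(tens)
  case tens
  when 1    then 'teens'
  when 2..9 then 'twenty or more'
  else           'single digit'
  end
end

def column_with_if(tens)
  if tens == 1
    'teens'
  elsif (2..9).include?(tens)
    'twenty or more'
  else
    'single digit'
  end
end

def ten_with_ternary(ones)
  (0 == ones) ? 'ju' : "ju plus #{ones}"
end

def ten_with_if(ones)
  if 0 == ones
    'ju'
  else
    "ju plus #{ones}"
  end
end

puts column_with_case(1) == column_with_if(1)   # true
puts ten_with_ternary(0)                        # ju
puts ten_with_ternary(3)                        # ju plus 3
```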

What if Programming Languages Followed the Social Paradigm?

Posted by Berin Loritsch Mon, 28 Jan 2008 13:54:00 GMT

Sometimes, all it takes is a subtle shift in your viewpoint to open your eyes to new possibilities. The big problem with many existing programming languages is that they don’t always lend themselves to natural parallelization. Yes, that includes Java, C++, C#, and Ruby. It’s not impossible with those languages, it just doesn’t come for free. The reason it’s a big deal nowadays is that multi-core chips are here to stay. Architectures like the PS3’s Cell architecture are likely to become the norm. The result is like fitting a round peg into a square hole. It’s not impossible, it just requires a lot of work.

So how do social paradigms work to allow an expressive and powerful language with a natural ability to be parallelized? I’m not fully sure, but it might work for the expressive and powerful part. Think about the concept of tagging. A tag is just metadata, and what that metadata means is up to us to decide. Of course, as soon as I use the word metadata, I’m sure I lose a part of my audience. I know my ears turned off the first time I heard that word. It didn’t mean anything to me, and it was an intangible concept that didn’t hold any value. Until, that is, we introduced the concept of tagging.

Metadata is the adjectives your language uses. It’s how you tell a fast car from a slow car.

So if objects are actors, and methods are verbs, how do I make these adjectives work for me? Who assigns these adjectives, and when can they be assigned? Here’s where the social paradigm comes in. Anybody (the developer, the actors or classes, the environment) can assign these adjectives to any other actor or thing in the system. So what does that buy me? The purpose for tags is to find things again. What if we do something special with the tags? If a piece of code tags another piece of code, it’s because it wants to do something with it later. In fact the system can use that same mechanism.

For example, what if the language could tell dynamically whether the flyweight pattern was more applicable for you, and you don’t have to do a thing? If the runtime environment can determine how long it takes to create an instance of an object, it can tag the class as “Fast”, “Slow”, or “Average”. With that information, the environment can determine whether it is worth it to keep creating new instances of the object or switch context with the same memory resident object. Alternatively, it might decide to turn a reference to that object into a Future or asynchronous object. Sure you can send messages to the object, and expect the messages to be answered in the order you need them, but the application doesn’t have to stop in its tracks while you are waiting on an answer from a remote source.
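
Nothing like this is built into Ruby, but the measuring-and-tagging half of the idea can be faked in a few lines. Everything named here (TAGS, speed_tag, tag_by_construction_time, reuse_instance?) is hypothetical, and the thresholds are arbitrary numbers picked for the sketch.

```ruby
# Hypothetical registry: tags are plain symbols attached to any object,
# classes included.
TAGS = Hash.new { |hash, key| hash[key] = [] }

# Arbitrary cut-offs, in seconds, chosen only for illustration.
def speed_tag(seconds)
  if seconds < 0.001
    :fast
  elsif seconds < 0.1
    :average
  else
    :slow
  end
end

# Build an instance, time how long it took, and tag the class accordingly.
def tag_by_construction_time(klass, *args)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  instance = klass.new(*args)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  TAGS[klass] << speed_tag(elapsed)
  instance
end

# The environment could consult the tags before deciding to share an instance.
def reuse_instance?(klass)
  TAGS[klass].include?(:slow)
end

tag_by_construction_time(String, 'hello')
p TAGS[String]
p reuse_instance?(String)
```

A real runtime would average over many constructions rather than trust one timing, but the shape is the same: measure, tag, then let later decisions read the tag.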

OK, so now that we’ve seen something potentially useful, what about the powerful part? Sure it’s pretty cool to use asynchronous calls without declaring that you want something to be asynchronous, but what about other useful things like being able to perform the same function on all objects that were tagged specially? You know, kind of like telling all the stealth enemies to come out of hiding when they’ve been located? Rules engines work this way. You tell the rules to monitor all the objects with certain facts and do something when they match. Oh, isn’t this Functional Programming? Why, how astute of you. Then shouldn’t we use Scala? Scala requires you to be too explicit, and I can’t see any examples of it actually making life easier or code easier to understand.
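
The stealth enemies example is easy to mock up with today’s tools. TagRegistry and Enemy below are invented for this sketch. Anyone holding the registry can tag anything, and later run the same block against everything carrying a given tag.

```ruby
# Hypothetical registry mapping each tag to the objects that carry it.
class TagRegistry
  def initialize
    @tagged = Hash.new { |hash, tag| hash[tag] = [] }
  end

  # Attach one or more tags to an object; returns the object for chaining.
  def tag(object, *tags)
    tags.each do |t|
      already = @tagged[t].any? { |existing| existing.equal?(object) }
      @tagged[t] << object unless already
    end
    object
  end

  def tagged(tag)
    @tagged.fetch(tag, [])
  end

  # Perform the same action on everything carrying the tag.
  def each_tagged(tag, &action)
    tagged(tag).each(&action)
  end
end

Enemy = Struct.new(:name, :hidden)

registry = TagRegistry.new
scout = registry.tag(Enemy.new('scout', true), :stealth)
tank  = registry.tag(Enemy.new('tank', true), :armored)

# The player found the stealth units: bring them all out of hiding.
registry.each_tagged(:stealth) { |enemy| enemy.hidden = false }

puts scout.hidden   # false
puts tank.hidden    # true
```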

One of my frustrations with the Java Virtual Machine is its security model. In most cases, it is too inflexible and difficult to be used, so the application just runs unprotected, relying on the underlying operating system to enforce any security constraints. It might work for Unix based machines, but Windows machines are usually not protected as well by default. Also, if the application is run as the super user, it can cause some serious problems. Sometimes you want to be able to set up a sandbox, set some attributes for it, and run things inside of that. Kind of like setting up a virtual world for a set of components, or plugins. You can allow that plugin to access only the things you want it to access, and nothing more. What’s better to decide this than code you already trust? You can set up a separate work directory, and have the plugin use it as if it were the default system work directory (or temp directory). The code that set up the world for the plugin to do its job can decide, by how often the code bumps into the security constraints, whether the plugin is behaving nicely.
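
As a thought experiment only, the work directory part might look like the sketch below. Sandbox is a hypothetical class, and it is advisory rather than real enforcement: it only polices paths that go through it, it does not follow symlinks, and a hostile plugin could simply call File directly. It does show the bookkeeping, though: trusted code sets the boundary and counts how often the plugin bumps into it.

```ruby
require 'tmpdir'

# Hypothetical sandbox: trusted code picks the work directory and keeps
# score of how often the plugin tries to leave it.
class Sandbox
  attr_reader :work_dir, :violations

  def initialize(work_dir)
    @work_dir = File.expand_path(work_dir)
    @violations = 0
  end

  # Resolve a path relative to the sandbox; refuse anything that escapes it.
  def resolve(path)
    full = File.expand_path(path, @work_dir)
    inside = full == @work_dir || full.start_with?(@work_dir + File::SEPARATOR)
    return full if inside
    @violations += 1
    raise SecurityError, "#{path} is outside the sandbox"
  end

  def write(path, text)
    File.write(resolve(path), text)
  end

  # The limit is an arbitrary number chosen for the sketch.
  def well_behaved?(limit = 3)
    @violations < limit
  end
end

Dir.mktmpdir do |dir|
  sandbox = Sandbox.new(dir)
  sandbox.write('notes.txt', 'hello')
  begin
    sandbox.write('../escape.txt', 'nope')
  rescue SecurityError => error
    puts error.message
  end
  puts sandbox.violations      # 1
  puts sandbox.well_behaved?   # true
end
```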

The concept of the sandboxed little worlds fits well into the “Groups” concept that is present in the social applications. Everyone that is part of a group has a common goal and function. Of course, the same actors can be a member of several groups, and they all have to obey the rules of the group they are in. It’s the same exact instance, it’s just that the context that it is working in is different.

There are some more possibilities, but this is enough to chew on for now. Of course, many of these things can be done without the need for a new programming language; it’s just that a new syntax would make it easier to work with and hide the implementation details. The resulting language shouldn’t be strictly object oriented, or functional. It should be designed in a way where the language can optimize at runtime based on the resources being used. It should also be designed to be readable, understandable, and predictable.

How to Approach a New Language

Posted by Berin Loritsch Wed, 05 Sep 2007 11:54:00 GMT

Whether you are new to a spoken language or a computer language, the principles are similar. There is so much to learn and it can seem so foreign to you that you can easily get overwhelmed. You can always start with survival phrases: little conversational Swiss Army knives that can get you a long way. They are really designed to get you to a point where you can find someone who speaks your language to finish the conversation, so it’s not like you are going to understand what many people are saying back to you most of the time. There are three major parts to understanding a language: the vocabulary, the grammar, and the writing system.

Programming languages are usually easier to learn than spoken languages primarily because the vocabulary is intentionally kept small. The grammar is also kept consistent and simple to keep the parsers sane and predictability high. The writing system usually entails what can be typed from a keyboard and a few rules for mathematical symbols. There is some punctuation that you have to worry about, but not that much. So what makes a language so difficult? In my short experience, it has to do with translating the simple rules into something useful. You need to learn the libraries that come with the language to help you get something done with the operating system. A more subtle problem is more akin to dialects in spoken language, which is finding the standard idioms for doing things.

Spoken languages are usually tougher to learn because of the volume of the vocabulary and the different grammar rules you have to learn. Sure there is a basic grammar that is consistent throughout a language, but there are always exceptions you have to learn. After all that, you’ll invariably get stumped at some phrases and slang. For example if you translate the Arabic phrase for “How are you?” literally into English you would get “What color are you?” For someone who has grown up using the language it’s as natural as breathing, but for someone else it’s not that intuitive. How should you take it if someone wishes you to “be enlarged with fatness”? Should you be insulted or flattered? In many languages and cultures it’s a compliment.

Even though the two types of languages have very different challenges and end goals, there are a few strategies to help you in the process. Surprisingly, these strategies are the same for both endeavors. As someone who has learned Spanish and Classical Greek in a classroom environment, and everything else by self-study and pestering people, I can say that the classroom only gets you so far. This is what I’ve found to be helpful to me:

  1. Use it. What good will a language do you if you don’t use it for something? Even if you don’t know a lot, use what you know. You’ll find out more about what you need to know by stealing Nike’s advertising slogan and Just Do It.
  2. Develop a system or schedule for how you are going to expand your vocabulary. You can only take information in so fast before your brain overloads. It needs time to process what you’ve learned so far.
  3. Review constantly. You’re making mistakes, you just don’t know it yet. Go back over what you’ve done in the past in light of what you know now.
  4. Immerse yourself in the language. Do what you need to do to see and hear the language used properly. Watch shows, listen to podcasts, get involved in an open source project, read books, whatever you can do, do it. You’ll eventually find a good support group that will help you with the difficult stuff, and find new friends in the process.
  5. Learn the slang. Textbooks and classrooms teach you the “correct” way to do things, from an academic standpoint. That might be all well and good for some situations, but the real world is different than the classroom.
  6. Don’t strive for mastery. Strive to be better than you are. Mastery comes with a large amount of personal investment in the process. You’ll get overwhelmed if you try to master what you are studying. Just try to make incremental improvements and you’ll be more productive and happy with the progress.

Ok, so considering everything, how can I make these general guidelines? After all, what have I taken the time to learn, and how good am I with it? Even I’m surprised when I look at the list:

  • English—fluent, my natural language, 14 years of classroom (grade school and college) plus being competent in the idioms and slang.
  • Spanish—somewhat conversant, my second language, 2 years of classroom and a few years of talking to Spanish speaking people. I can speak, read, and write, but I’m still really slow listening. My vocabulary has waned a bit from lack of use.
  • Classic Greek—1 year of classroom, some self study using the Bible and study helps. It’s no longer a spoken language, but I can read and write the characters and I know where to look for answers that I need.
  • Japanese—I just started learning this language mostly out of curiosity. I’ve got some cultural ties to Japan both from my wife’s family and from mine. I’ve got martial arts, culinary, and cultural interests. Imagine my delight to find JapanesePod101.com to aid in my endeavors here.
  • There are some languages I just picked up a smattering of phrases, words, etc. Hardly useful more than to break the ice: Arabic, Punjabi, French, German, Russian, Swahili, Finnish.
  • BASIC—My first computer was a Commodore 64, which came with BASIC and some other language options. I learned BASIC well enough to work on some toy programs. I also learned the IBM and TRS-80 variants.
  • LOGO—I didn’t do more than turtle graphics with this one, but I’m dating myself aren’t I?
  • 6502 Assembly Language—BASIC was too slow, so I figured out how to make things happen at a lower level. Graphics were the same either way, so I found Assembly much more powerful and even expressive than the limited BASIC that was native to the Commodore machine. BTW geoProgrammer was excellent (more later).
  • GameMaker—Game making infrastructure, including sprite editors and rule editors.
  • COMAL—I had a class at school with this one. I never used it for anything more than the classroom. However I did learn some cool tricks to use the modulus operator to handle certain corner cases with leap year handling.
  • C++—I skipped C because I believed that C++ had more of a future. By this time I left my C64 behind, and I was working with PCs. I first learned with GCC, and then with Microsoft Visual C++. When I started, neither was standards compliant (although I don’t think any were). One of my first personal projects involved CORBA. I’m very well conversant in this language, and if I need to I can get right back into it.
  • ColdFusion—I try to block this experience from my mind, but it did drive home the need for a good MVC architecture.
  • Java—I started by developing a data migration tool, which was actually well constructed even for a first project. I then got into Cocoon and Avalon as an answer to the ColdFusion fiasco. I’m fluent and still using this today.
  • D—The first of the languages I explored for sheer curiosity. It touted itself as a successor to C++, having enough influence from Java to correct some memory handling snafus and binary compatibility across machines, yet enough C++ to be able to use local libraries natively. Obviously, binary compatibility has to do with how the methods are bound together, which is something that was left undefined in C++. That means it doesn’t matter which compiler is used or what machine the binary was compiled on, it will work if all the supporting libraries are present.
  • C#—While I was never really sold on anything that is born from Microsoft, ignoring it completely is more of a mistake. I wanted to find out where the utility began and the hype ended. Sadly, it was nothing more than a language very similar to Java. Sure it had some nice conveniences which were adopted fairly quickly in Java, so as an evolution it had some value. Nevertheless, I can’t entirely trust a culture which does nothing more than parrot one voice. Java at least has a diverse world of opinions which provides a healthy base to gain experience from. It’s probably my firm cultural bias from C++ and Java aesthetics that makes me think that C# code looks ugly. C++ had some tools that would extract specially formatted comments into docs, a system that Java took wholesale and extended into the JavaDoc system. C# ignored the precedent and decided to do things differently from Java just to be different.
  • Lisp/JESS—I had to introduce myself to it for a project where we were using the Java Expert System Shell for a part of it. It’s based on a dialect of Lisp, so it took a lot of getting used to for me. I barely developed a working understanding of the tool.
  • Smalltalk—The second of the languages I explored out of sheer curiosity. I wanted to get back to the roots of object oriented programming, and see how these guys did things. It had a profound influence on me, even though I don’t use it every day. I highly recommend introducing yourself to the language even if you have no intention of using it.
  • Ruby—I’d heard so much about how nice this language is to use, and I have to say it is a sheer pleasure to use once you’ve caught on to the Ruby culture. Remember how one of the keys to learning a language is to immerse yourself in it? You’ll be rewarded. Of course, now I love Ruby on Rails, which has greatly influenced how I think web applications should be developed.
  • Perl—I can’t say I’m fluent, but I am conversant enough. I was hired to migrate an application to a more modern architecture (with newer versions of Perl and DBI libraries).

That’s a lot of stuff. I’ve got all these languages, both spoken and programming, rummaging around my head with different levels of proficiency. However, by the time you’ve learned your third language it becomes easier. You start to see similarities between them, and you start thinking about them abstractly. Those notions help you learn the new languages more quickly. When you realize that Greek adjectives are positioned grammatically like Spanish, and that Japanese sentence structure is somewhat like Greek (with the verb at the end) you’re building on concepts you’ve already learned.

I’m not finished yet. There will be new languages I take an interest in down the road. They’ll likely be programming languages, but I’m not ruling out learning another spoken language, or even expanding my knowledge of one of the ones I barely know. Constant learning keeps you sharp, and the good thing is that you can do it a little bit at a time. I don’t have the time, spare money, or the inclination to sit down in a classroom these days. However, if I can fit a little study in here and there, I’m happy. I’ve learned that there are more similarities than differences in all these languages and cultures.

Tag Suggestion from Content

Posted by Berin Loritsch Tue, 28 Aug 2007 12:59:00 GMT

I am researching which companies have technologies that can suggest tags from the content of posts. For example, if I post a blog entry, the technology would automatically tag my content with appropriate tags. The way most link tagging sites like del.icio.us and ma.gnolia.com perform this task is by taking tags that you and other people have already used and suggesting from those. That’s great when you have several different people all tagging and marking links differently. That’s not so great when there is a single source of content, like Flickr or a blog like this one. There’s one copy of the article, one copy of a picture, etc. There just isn’t a wider pool of tags to suggest from. The only way then is to analyze the content, and see how other similar content has been marked.
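To make "analyze the content, and see how other similar content has been marked" a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how any of the sites above work: it compares plain word counts with cosine similarity and lets the most similar already-tagged posts vote for tags. The names `term_counts`, `cosine_similarity`, and `suggest_tags`, the tiny stop-word list, and the sample posts are all made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "for", "on", "with", "that", "this"}


def term_counts(text):
    """Lowercase the text, split it into words, and count the non-stop-words."""
    words = re.findall(r"[a-z0-9#+]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_tags(new_text, tagged_posts, neighbors=3, max_tags=5):
    """Suggest tags for new_text from the most similar already-tagged posts.

    tagged_posts is a list of (text, tags) pairs. Each of the closest
    `neighbors` posts votes for its tags, weighted by its similarity score.
    """
    new_vec = term_counts(new_text)
    scored = [(cosine_similarity(new_vec, term_counts(text)), tags)
              for text, tags in tagged_posts]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    votes = Counter()
    for score, tags in scored[:neighbors]:
        if score <= 0:
            continue  # posts with no words in common get no vote
        for tag in tags:
            votes[tag] += score
    return [tag for tag, _ in votes.most_common(max_tags)]


if __name__ == "__main__":
    posts = [
        ("Ruby on Rails makes web applications pleasant to build", ["ruby", "rails"]),
        ("Java generics and the JVM bytecode", ["java"]),
        ("Cooking pasta with tomato sauce", ["food"]),
    ]
    print(suggest_tags("Building web applications with Rails", posts))
```

Something this simple ignores word order, synonyms, and misspelled tags, so it is nowhere near the natural language processing approach; it only shows the shape of the idea.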

The academic approach involves natural language processing, storing contextual models of both the tag space and the content. You can get pretty accurate with that kind of approach, and even discover when there are tags that are misspelled but mean the same thing, etc. I’ve done some searching around and found Language Computer, which is the research arm of Lymba. I also found a paper from TagAssist about the topic.

No matter how you slice it, this approach is going to take some number crunching and disk space. That means multiple machines to process the content on the way in. For very low volume submission sites like my blog, it might be possible to do everything on one machine. For higher volume submission sites like the one I’m working on, that’s a real problem to work through.

The question I have, and I haven’t been able to find much on the subject, is whether there are low-tech solutions that will get us 50 percent of the way there for a small investment. We may have to do this “cool” integration at a later stage, depending on the costs involved. I need to find a set of alternatives and choose the best match, but this is a relatively new application for this type of technology. If anyone has some clues, please let me know.