Thursday, April 22, 2010

DSLs and Insomnia (a few notes)

Intro
This has nothing to do with Rails on a superficial level.

On a deeper level, since Rails is a DSL (Domain Specific Language), it does have to do with Rails in a tangential sense.

To wit: I'm refactoring the code base for an application I was spending my nights and weekends on over the past year or so. The application has to do with a fairly wide range of numerical models and there is always that point where I have to suck it up when thinking that Ruby doesn't have much to do with Statistics per se.

Or numerical wonkery in general. It seems that the practical application of math ideas has been conquered by Python (witness all the books on Machine Learning written for Python coders and the veritable dearth of any such work for Rubyists).

Anyways, while in the grips of some cold medicine (which never lets me sleep right), I figured I might as well make some proper use of my time and started looking at the 50,000 km view of my codebase and came to the conclusion that I was going to make a DSL to deal with all the funkiness of my particular problem domain.

But, first I needed to deal with some very basic statistical methods. Sums, averages, variances, standard deviations are all methods that I need as a core base of functions.

Furthermore, I would like my application to have the ability to be generic enough that an end user who would know that he or she wants the standard deviation for a set of numbers would be able to type something like out st_dev [1, 2, 3, 4, 5] and come up with the answer (something like 1.58) without having to deal with anything more complicated than the command line.

So, this is how I implemented a small DSL which handles the four methods I've listed above and how one can use it without having to learn Ruby or fire up their spreadsheet.

To start, create a basic class:

    class Stat
      def self.load(filename)
        new.instance_eval(File.read(filename), filename)
      end
    end

Now this simple method was something of a mystery to me when I first encountered it (see the reading list at the bottom of the page), but essentially what it does is read the file into a string and evaluate that string as Ruby code, with self set to a new instance of the class. Any bare word at the start of a line is therefore looked up as a method of that class and called with the rest of the line as its arguments.
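To see instance_eval on its own, here is a minimal sketch. The Greeter class and its greet method are made up purely for illustration and are not part of the Stat code; the point is only that a bare method name inside the string resolves against the instance.

```ruby
# A throwaway class (hypothetical, for illustration only).
class Greeter
  def greet(name)
    "hello, #{name}"
  end
end

# The string is ordinary Ruby source. instance_eval runs it with `self`
# set to the Greeter instance, so the bare call to `greet` finds the
# instance method defined above.
script = 'greet "world"'
result = Greeter.new.instance_eval(script)
puts result
```

The second argument that Stat.load passes to instance_eval (the filename) only affects how errors are reported, so it is left out here.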

This is only slightly different than typing ruby -e "eval %q(puts 'ruby is perfect')" at your command prompt.

What it does allow you to do is build up a small domain language with nothing but methods.

Let me show you what I mean...

Sum
Definition: "Sum" will be a keyword that will take an array of items and then return the sum of the set. Example: sum [1, 2, 3, 4, 5]

Out
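Definition: "Out" will be the keyword which prints out the result of the function. Example: out sum [1, 2, 3, 4, 5] (note the space after sum; without it, Ruby reads sum[...] as indexing into the result of calling sum with no arguments).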

Implementation:

    def sum(elements)
      total = 0
      elements.each { |x| total += x }
      total
    end

    def out(expression)
      puts expression
    end

With these two methods added to the Stat class, I'm able to create a separate file called "one.stat" and type in out sum [1, 2, 3, 4, 5] and nearly be able to get a response from my DSL.

I say nearly because there is a pretty important command that I left off and that is the one which calls the interpreter and then evaluates all of the separate items.

Since the class is called Stat, the last line of stat.rb should be "Stat.load(ARGV.shift)", which in plain speak means "take the first argument typed after the script name as a filename, create an instance of the Stat class, and evaluate the contents of that file inside it".

In other words, now I can save my "one.stat" file and at the command prompt type "ruby stat.rb one.stat" and I will see on the command line that the number "15" has been printed.
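Putting the pieces together, the whole thing can be sketched as a single runnable file. This is an assumption about how the parts fit, not the original source: the temporary file stands in for "one.stat" so the example runs without anything else on disk, and sum uses inject instead of an explicit loop.

```ruby
require "tempfile"

class Stat
  # Read a script file and evaluate it inside a fresh Stat instance.
  def self.load(filename)
    new.instance_eval(File.read(filename), filename)
  end

  # DSL keyword: add up an array of numbers.
  def sum(elements)
    elements.inject(0) { |total, x| total + x }
  end

  # DSL keyword: print the value of an expression.
  def out(expression)
    puts expression
  end
end

# Stand-in for one.stat. In the real stat.rb this whole block would be
# replaced by the single line Stat.load(ARGV.shift).
Tempfile.create(["one", ".stat"]) do |file|
  file.write("out sum [1, 2, 3, 4, 5]\n")
  file.flush
  Stat.load(file.path) # prints 15
end
```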

Conclusion
That's pretty much the beginning and the end of the simplest DSL tutorial you will probably find on the internet. As you can see, there were no state machines, no heaps or stacks to deal with, no Turing machines or any of the other nonsense that gets piled on when Rubyists start pontificating on the excellence of a DSL.

I'll leave the mean, variance, and standard deviation methods up to the reader for their own edification. Also, once I manage to figure out why GitHub and I are not getting along, I will provide the source code to this little excursion into DSL land as an addendum.
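For anyone who wants to check their own attempt, here is one possible sketch of those three methods. It assumes the sample form of the variance (dividing by n - 1), which is what produces the 1.58 quoted earlier for [1, 2, 3, 4, 5]; dividing by n instead would give about 1.41. The method names mean and variance are my guesses, and only st_dev appears in the post.

```ruby
class Stat
  def sum(elements)
    elements.inject(0) { |total, x| total + x }
  end

  # Arithmetic mean; to_f avoids integer division.
  def mean(elements)
    sum(elements) / elements.size.to_f
  end

  # Sample variance (assumption: divide by n - 1, not n).
  # Needs at least two elements to be meaningful.
  def variance(elements)
    m = mean(elements)
    squared_deviations = elements.map { |x| (x - m)**2 }
    sum(squared_deviations) / (elements.size - 1)
  end

  # Sample standard deviation: square root of the sample variance.
  def st_dev(elements)
    Math.sqrt(variance(elements))
  end
end

stat = Stat.new
puts stat.st_dev([1, 2, 3, 4, 5]).round(2) # prints 1.58
```

Worked by hand: the mean is 3, the squared deviations are 4, 1, 0, 1, 4 for a total of 10, dividing by 4 gives 2.5, and the square root of 2.5 is roughly 1.58.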

Remember, a domain specific language can make some problems easier (making a web app with CGI in Ruby is an exercise in poking your eyes out with torn fingernails; believe me, I've tried), but you end up with the sad side effect that you now have two codebases: the one in Ruby and the one in your DSL.

Reading notes:
http://www.valibuk.net/2009/03/domain-specific-languages-in-ruby/ is one of the simpler introductions to creating DSLs in Ruby, but it also gets into creating a mini Turing machine which, while cute, is not a very good example of how practical this technique is.

http://www.artima.com/rubycs/articles/ruby_as_dsl3.html is considered one of the more "definitive" texts on creating DSLs in Ruby and I found it more arcane than the previous one.