Thursday, April 22, 2010

DSLs and Insomnia (a few notes)

Intro
This has nothing to do with Rails on a superficial level.

On a deeper level, since Rails is a DSL (Domain Specific Language), it probably does have something to do with Rails in a tangential sense.

To wit: I'm refactoring the code base for an application I was spending my nights and weekends on over the past year or so. The application has to do with a fairly wide range of numerical models and there is always that point where I have to suck it up when thinking that Ruby doesn't have much to do with Statistics per se.

Or numerical wonkery in general. It seems that the practical application of math ideas has been conquered by Python (witness all the books on Machine Learning written for Python coders and the veritable dearth of any such work for Rubyists).

Anyways, while in the grips of some cold medicine (which never lets me sleep right), I figured I might as well make some proper use of my time and started looking at the 50,000 km view of my codebase and came to the conclusion that I was going to make a DSL to deal with all the funkiness of my particular problem domain.

But, first I needed to deal with some very basic statistical methods. Sums, averages, variances, standard deviations are all methods that I need as a core base of functions.

Furthermore, I would like my application to have the ability to be generic enough that an end user who would know that he or she wants the standard deviation for a set of numbers would be able to type something like out st_dev [1, 2, 3, 4, 5] and come up with the answer (something like 1.58) without having to deal with anything more complicated than the command line.

So, this is how I implemented a small DSL which handles the four methods I've listed above and how one can use it without having to learn Ruby or fire up their spreadsheet.

To start, create a basic class:

class Stat
  def self.load(filename)
    new.instance_eval(File.read(filename), filename)
  end
end

Now this simple method was something of a mystery to me when I first encountered it (see reading list at the bottom of the page) but essentially what it does is read the whole file in as a string and evaluate it as Ruby code in the context of a new Stat instance. Any bare word at the start of a line is therefore looked up as a method of that class and called with the rest of the line as its arguments.
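To make the instance_eval behaviour concrete, here is a minimal sketch that evaluates a string instead of a file. The Greeter class and its shout method are hypothetical names used only for illustration; they are not part of the Stat code.

```ruby
# Hypothetical class, used only to illustrate instance_eval.
class Greeter
  def shout(text)
    text.upcase
  end
end

# The string is evaluated as Ruby with self set to the Greeter instance,
# so the bare word "shout" resolves to Greeter#shout.
puts Greeter.new.instance_eval('shout "dsl"')
```

Reading the string from a file with File.read, as Stat.load does, is the same idea with the script kept outside the Ruby source.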

This is only slightly different from typing ruby -e "eval %q(puts 'ruby is perfect')" at your command prompt.

What it does allow you to do is build up a small domain language with nothing but methods.

Let me show you what I mean...

Sum
Definition: "Sum" will be a keyword that will take an array of items and then return the sum of the set. Example: sum [1, 2, 3, 4, 5]

Out
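Definition: "Out" will be the keyword which prints out the result of the function. Example: out sum [1, 2, 3, 4, 5] (the space after sum matters: without it, Ruby calls sum with no arguments and then tries to index the result)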

Implementation:

def sum(elements)
  total = 0
  elements.each { |x| total += x }
  return total
end

def out(expression)
  puts expression
end

With these two methods inside the Stat class, I'm able to create a separate file called "one.stat" and type in out sum [1, 2, 3, 4, 5] and nearly be able to get a response from my DSL.

I say nearly because there is a pretty important command that I left off and that is the one which calls the interpreter and then evaluates all of the separate items.

Since the class is called Stat, the last line of stat.rb should be "Stat.load(ARGV.shift)" which in plain speak means "take the file name that was typed in, create an instance of the Stat class, and evaluate the contents of that file against it".

In other words, now I can save my "one.stat" file and at the command prompt type "ruby stat.rb one.stat" and I will see on the command line that the number "15" has been printed.
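Putting the pieces above together, stat.rb might look like the sketch below. It assumes sum and out live inside the Stat class, and the guard on the last line is an addition of this sketch so the file does nothing when no file name is supplied.

```ruby
# stat.rb, assembled from the snippets above
class Stat
  # Read the DSL file and evaluate it against a fresh Stat instance.
  def self.load(filename)
    new.instance_eval(File.read(filename), filename)
  end

  def sum(elements)
    total = 0
    elements.each { |x| total += x }
    total
  end

  def out(expression)
    puts expression
  end
end

# The guard is an addition: it skips loading when the file is required
# by other code or when no file name was typed on the command line.
Stat.load(ARGV.shift) if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
```

With one.stat containing the single line out sum [1, 2, 3, 4, 5], running ruby stat.rb one.stat prints 15.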

Conclusion
That's pretty much the beginning and the end of the simplest DSL tutorial you will probably find on the internet. As you can see, there were no state machines, no heaps or stacks to deal with, no Turing machines or any of the other nonsense that gets piled on when Rubyists start pontificating on the excellence of a DSL.

I'll leave the mean, variance, and standard deviation methods up to the reader for their own edification. Also, once I manage to figure out why github and I are not getting along, I will provide the source code to this little excursion into DSL land as an addendum.
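For readers who want a starting point, here is one possible sketch of those three methods. It assumes the sample variance (dividing by n - 1), since that is the convention that gives roughly 1.58 for [1, 2, 3, 4, 5]; the method names mean, variance and st_dev are a guess at how the keywords might be spelled.

```ruby
class Stat
  def sum(elements)
    elements.inject(0) { |total, x| total + x }
  end

  def mean(elements)
    sum(elements) / elements.size.to_f
  end

  # Sample variance: divides by n - 1, so it assumes at least two elements.
  def variance(elements)
    avg = mean(elements)
    squared_diffs = elements.map { |x| (x - avg)**2 }
    sum(squared_diffs) / (elements.size - 1)
  end

  def st_dev(elements)
    Math.sqrt(variance(elements))
  end
end

puts Stat.new.st_dev([1, 2, 3, 4, 5])
```

If a population standard deviation is wanted instead, the divisor in variance would be elements.size rather than elements.size - 1.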

Remember, a domain specific language can make some problems easier (making a web app with CGI in Ruby is an exercise in poking your eyes out with torn fingernails - believe me: I've tried) but, you end up with a sad side effect that you now have two codebases, the one in Ruby and the one in your DSL.

Reading notes:
http://www.valibuk.net/2009/03/domain-specific-languages-in-ruby/ is one of the simpler introductions to creating DSLs in Ruby but also gets into creating a mini Turing machine which, while cute - is not a very good example of how practical this technique is.

http://www.artima.com/rubycs/articles/ruby_as_dsl3.html is considered one of the more "definitive" texts on creating DSLs in Ruby and I found it more arcane than the previous one.

Tuesday, June 23, 2009

ID3A update

At this point it's all in the hands of the beta testers.
One found an issue with XServer and FXRuby on MacOS because I had set the row header width to 0.

I confirmed on my eeepc and changed it to 1 pixel.
Problem finished.

The expectation is that I have a tech demo tomorrow so I pushed through the big changes version v.02 which boiled down to a restructuring of the layout, more unit tests, and a batch query process.

On the Drails front, I've renamed it to FXCess.

Using the template that I started with PaperTrade, I was able to quickly build some unit tests which confirmed that I can use ActiveRecord to interact with a small sqlite3 db. They also confirmed that I can build using TDD which, for a project where I have a very clear idea of what's going to happen, is the fastest way to build.

So far:

C:\fxcess\test>ruby test_all.rb
Loaded suite test_all
Started
.....
Finished in 0.531 seconds.

5 tests, 7 assertions, 0 failures, 0 errors

Friday, June 19, 2009

ID3A is born

Well, after several weeks of hitting my head against the wall of my imposed deadline, I am releasing ID3A (ID3 Analyser) to beta testers tonight.

There are a lot of lessons learned in this first rush to beta and I'm going to jot them down now so I don't forget:
  • Ruby rules in just about every way. The ease of the programming language provided me the flexibility to bounce the code around and find out where things went wrong faster than any other language I've used before.
  • While I was kvetching about the lack of an IDE last time, I have to point out that FXRuby is so straightforward that one only needs to use google and the API to put pieces together. Case in point, I needed a file dialog form and initially I was afraid that I would have to cobble one together but, the FXFileDialog widget was already created and packed more functionality than I really needed.
  • Requires matter - as in where to put the 'require' statement. It should be at the top level, and "require 'fox16'" should be the top line. Otherwise you'll have a nice mess in the stdout as FXRuby reloads your requirements a second time.
  • Ocra is a funny beast. It's a handy replacement for rubyscript2exe now that the latter is broken due to some changes in the Ruby internals; however, it has some of its own "gotchas", such as: location matters when compiling the Ruby application. Since I was coding on a virtual Windows XP system, I found that if I compiled on the desktop, I could only run it on that desktop. Finally, I compiled in the root folder (C:\) and found that it would work on my Vista system's desktop.
I'm releasing a beta first as I want to take another whack at the code and "pretty it up" before releasing the actual source. Since I'm using three OSS apps (ai4r, fxruby, ocra) it feels correct to release ID3A's source code at some point.

I'm grateful that one of my beta testers is none other than Sergio Fierens (author of ai4r) and I found him to be a great resource for a functionality that I wanted to add to my app which wasn't explicitly built into ai4r.

Right now the majority of the beta testers are stock market traders so I expect that there is a strong probability that there may be changes to the application skewed towards that contingent.

While this might seem far afield from 'rails' it's not - I am beginning to expect that it would be a very simple thing to 'port' RoR onto the desktop.

I hereby dub that project "Drails" (Desktop Rails).

Wednesday, June 17, 2009

Beta Cut 1 of ID3 app

Getting laid off was something that really impacted my development process as I've had to focus the majority of my time on finding another job.

While that has not borne fruit as of yet, I have completed two drafts of a white paper on creating classification trees (a la ID3) and have been thinking about how to build a Windows app in Ruby.

Thankfully, there is FXRuby: once one gets over the annoyance of there not being a sensible IDE to build the interface, the actual coding is pretty straightforward. Coming from a background of creating GUI interfaces in VBA for 8 years, it's a new experience to have to manually type out the location, properties, and relationships for each widget.

Fortunately, Ruby gets the backend portion right making anything cobbled together in Visual Basic (including .NET) to be fairly lacking in robustness.

Having looked at a few of the GUI kits in Java, I am also thankful that the author of FXRuby has adhered to Ruby standards and not twisted it into some Ruby looking, Java accented monstrosity to build around. There are enough Java coders out there doing nasty stuff to Ruby already as it is.

The actual compilation to a Windows app is much easier with Ocra than the previous rubyscript2exe and allows for some niceties such as bundling in specific dlls. Once this project is out to the beta testers, I'm going to take a whack at creating a generic database creation tool with FXRuby and sqlite3. Sort of a poor man's Access if you will.

Sunday, February 15, 2009

A slight Change in Dev Environs

Vista 64bit doesn't seem to like me at the moment as I am having issues installing software.

Fortunately, for the project and the timeline I have, I can easily switch to Linux as I am doing currently for the sole purpose of adding graphs.

Now this causes a problem in that I wanted to port this app using the Ruby2exe application, and while that application will create a Windows exe from within Linux - the problem is that the graphing was initially going to be done with Gruff and that requires ImageMagick to be installed in one form or another on the target system.

So, I'll have to root around for another graphing subsystem or perhaps use jRuby (which I would like to avoid at all costs because it's not portable in a way that is simple for the end user).

New Code
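module StatTools
  # Add up every element of the sequence.
  def sum(seq)
    total = 0
    seq.each do |line|
      total += line
    end
    total
  end

  def mean(seq)
    return sum(seq) / seq.size.to_f
  end

  # Moving average over a window of num elements. The first num - 1
  # slots are padded with 0 so the result lines up with seq.
  def moving_average(seq, num)
    total = []
    pad = num - 1
    1.upto(pad) do |line|
      total << 0
    end
    num.upto(seq.size) do |line|
      total << mean(seq.slice(line - num, num))
    end
    total
  end
end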
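As a quick check of the moving average idea, here is a self-contained sketch with a usage line. It uses module_function so the methods can be called directly on the module, which is a choice made for this sketch rather than something the code above does.

```ruby
module StatTools
  module_function

  def mean(seq)
    seq.inject(0) { |total, x| total + x } / seq.size.to_f
  end

  # Pads the first (num - 1) slots with 0 so the result lines up with seq.
  def moving_average(seq, num)
    averages = Array.new(num - 1, 0)
    num.upto(seq.size) do |last|
      averages << mean(seq.slice(last - num, num))
    end
    averages
  end
end

p StatTools.moving_average([1, 2, 3, 4, 5], 3)
# prints [0, 0, 2.0, 3.0, 4.0]
```

Padding with 0 keeps the indexes aligned with the input, at the cost of two values that are not real averages; nil would be an alternative if the graphing library can skip missing points.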

Sunday, February 8, 2009

A "hackish" data loader

If I'm going to be dealing with whacks of data, I want a fairly clean way of loading the data into the database in the format I specified.

So, while the first order of business would be the creation of the PaperTrade class (which will be the first gem I release out of this work), I wanted a quick and dirty way to load the files so I had to do a few things first.

0. Create the table on the database.
  • So, I created a sqlite3 file called "db.db" (because I like palindromes)
  • I used the table definition below (at some point I'll come back and create a migration for this)
  • CREATE TABLE stocks(
    id integer primary key,
    name varchar(20),
    date integer,
    open float,
    high float,
    low float,
    close float,
    volume integer,
    adj_close float);
1. Creation of the Data Loader class.
There are a few notes to this:
  • I use a couple of modules to abstract out the ActiveRecord calls so that the loader script which creates the DataLoad class will not have to deal with anything - the only part that is a bit unwieldy is that there is an "include Tables" to pull in the creation of the abstracted Stock class. On the whole, I believe there should be a more elegant way of dealing with this but for now, it works. The only time I think it will bite me will be if I build more on to this so that multiple tables are created. That could get overly messy.
  • The csv data I receive back from Yahoo! uses the date format DD/MM/YY, which is apparently what it thinks all Canadian localised dates should look like. So, I had to create a twist_date method to deal with that and it's highly idiosyncratic to my uses - you may not get the same mileage if your date data does not comply with that format. Also, there is a bit of a "Y19K" thing going on: two-digit years above 19 are read as 19xx and the rest as 20xx, so don't import any data prior to 1920 or use this class after 2019 ;-)
  • Since I felt so "clever" with my Tables include, I decided to do one for the require statements.
  • module Requires
      # 'active_record' is the file name that keeps working in later versions
      require 'active_record'
    end

    module Tables
      class Stock < ActiveRecord::Base
      end
    end

    class DataLoad
      include Requires

      def initialize(data_file_load, stock_name, db_file_name)
        @file_load = data_file_load
        @name = stock_name
        @db_file_name = db_file_name
        @adapter = 'sqlite3'
      end

      def connect_to_db
        # Connect to a database
        ActiveRecord::Base.establish_connection(
          {:adapter => @adapter,
           :database => @db_file_name})
      end

      # Turn "DD/MM/YY" into [year, month abbreviation, day]
      def twist_date(stringer)
        months = ["", "jan", "feb", "mar", "apr", "may", "jun", "jul",
                  "aug", "sep", "oct", "nov", "dec"]
        split_string = stringer.split('/')
        short_date = split_string.last
        if short_date.to_i > 19 then
          return_date = '19' + short_date
        else
          return_date = '20' + short_date
        end
        return [return_date.to_i, months[split_string[1].to_i], split_string[0].to_i]
      end

      def data_file_load
        @data_load = []
        File.open(@file_load) do |file|
          while line = file.gets
            # chomp rather than chomp!, which returns nil when the
            # last line has no trailing newline
            @data_load << line.chomp
          end
          @headers = @data_load.shift
          @headers.gsub!(/\"/,'')
        end
      end

      def csv_to_db
        @data_load.each do |line|
          fields = line.split(',')
          nter = Stock.new
          nter.name = @name
          date_twisted = twist_date(fields.first)
          nter.date = Time.local(date_twisted.first, date_twisted[1], date_twisted.last).to_i
          nter.open = fields[1].to_f
          nter.high = fields[2].to_f
          nter.low = fields[3].to_f
          nter.close = fields[4].to_f
          nter.volume = fields[5].to_i
          nter.adj_close = fields[6].to_f
          nter.save
        end
      end
    end #class End
At this point in time, I haven't written any unit tests but that should be in an upcoming post.
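Since the date handling is the most idiosyncratic part of the loader, here it is sketched in isolation so it can be tried without a database. This is a standalone rewrite of the same pivot rule (two-digit years above 19 become 19xx, the rest 20xx); the MONTHS constant is a name introduced for this sketch.

```ruby
MONTHS = %w[jan feb mar apr may jun jul aug sep oct nov dec].freeze

# Turn "DD/MM/YY" into [year, month abbreviation, day].
# Two-digit years above 19 are read as 19xx, the rest as 20xx.
def twist_date(text)
  day, month, short_year = text.split('/')
  century = short_year.to_i > 19 ? '19' : '20'
  [(century + short_year).to_i, MONTHS[month.to_i - 1], day.to_i]
end

year, month, day = twist_date('23/06/09')
# Time.local accepts a three-letter month abbreviation.
puts Time.local(year, month, day).to_i
```

The printed unixtime depends on the local time zone, which is worth keeping in mind if the database is ever moved between machines.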

2. The data loader script:
  • # This is a script to load tables to the database

    require 'data_load_class'

    # gets.chomp rather than gets.chomp!, which returns nil when
    # there is nothing to strip
    puts "enter the name of the csv file to load"
    data_file_load = gets.chomp

    puts "enter the name of the identifier for the stock ex: USO"
    stock_name = gets.chomp

    puts "enter the name of the database file"
    db_file_name = gets.chomp

    a_data_loader = DataLoad.new(data_file_load, stock_name, db_file_name)

    a_data_loader.connect_to_db

    # We can only create the tables after we've connected
    # to the database
    include Tables

    # Get the data
    a_data_loader.data_file_load

    # Load 'em up
    a_data_loader.csv_to_db
Since I'm blogging the code as it exists today, there are a bunch of unpolished bits.

The parts I'm most unhappy about are:
  • The whole creating the ActiveRecord Stock class. I did quite a bit of reading on how to do this and wasn't pleased with any of the current solutions. I think Rails has its own way of doing this so that's a study task for me.
  • Creating my own Y19K bug wasn't too pleasing but, I wanted to deal with the data as I was receiving it and not have to 'normalise' the data through OOCalc or Excel if at all possible.
  • Lack of unit tests - I recognize this is a bit askew from my last post but, the fact of the matter is that I have not yet internalised the whole TDD because I have not made up my mind as to whether or not BDD (a la RSpec) makes more sense to me.
See you soon.

Wednesday, February 4, 2009

A (thumbnail) functional design

See-ell-o (I keep toying with the name) will have two fronts but with one core.

The two fronts will be:
  1. Utilizing Rails framework aimed at the inter(tubes)
  2. Utilizing FXRuby and focused on Windows(tm) and Wine(tm) targets
The back end will start out as sqlite3 but may morph into mysql on the web end, depending on how the development goes.

Rationale
The functional purpose is to leverage as much of the ActiveRecord goodness there is because, for the most part there will be one gargantuan table to pull from as I do not intend to over normalize one bit.

DB Schema (version 0.0)
Let me explain: since the application will be creating various options for reversal-to-mean genomes in a GA solution candidate, there will not be any real need to create a table for each equity. So, for all purposes there will be one table with the following layout:
  • id
  • name
  • date (in unixtime)
  • open
  • high
  • low
  • close
  • volume
  • adjusted close
The first release will be primarily a wizard which walks the user through the process of importing a csv file (from either google or yahoo) and loading up the database with that information.

Then selections such as:
  • population size
  • generations
  • cross over
  • mutation
  • reproduction selection (how much % of qualified candidates will breed)
and then the application will rip into the data, creating all the necessary ancillary data (moving day averages) on the fly, and calculate the fitness function for each candidate.

Fitness function will operate as follows:
  1. A class called "Paper Trade" will be created using data from test portion of data range
  2. The candidate will take the starting equity and using buy and sell rules trade the "tape"
  3. If the candidate finishes the "tape" with a positive equity then it will survive, if it is negative, then that is immediate grounds for removal as a breeder
Finally, (either using Ruport or Gruff) a report will be generated with various levels of performance metrics.
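The three fitness steps above could be sketched roughly as follows. Everything here is hypothetical: the method names, the rule signature (callables taking the current and previous close), and the reading of "positive equity" as finishing with more than the starting equity are all assumptions of this sketch, not the eventual PaperTrade design.

```ruby
# Hypothetical sketch: class, rule, and parameter names are illustrative.
class PaperTrade
  def initialize(closes, starting_equity)
    @closes = closes
    @starting_equity = starting_equity
  end

  # Walk the "tape". buy_rule and sell_rule are callables taking
  # (price, previous_price) and returning true or false.
  def run(buy_rule, sell_rule)
    cash = @starting_equity
    shares = 0
    @closes.each_cons(2) do |previous, price|
      if shares.zero? && buy_rule.call(price, previous)
        shares = (cash / price).floor
        cash -= shares * price
      elsif shares > 0 && sell_rule.call(price, previous)
        cash += shares * price
        shares = 0
      end
    end
    # Value any open position at the last close.
    cash + shares * @closes.last
  end

  # Assumption: a candidate survives if it ends above its starting equity.
  def survives?(buy_rule, sell_rule)
    run(buy_rule, sell_rule) > @starting_equity
  end
end

buy_dips = lambda { |price, previous| price < previous }
sell_rises = lambda { |price, previous| price > previous }
puts PaperTrade.new([10.0, 9.0, 12.0], 100.0).survives?(buy_dips, sell_rises)
```

In a GA setting the two lambdas would be decoded from a candidate's genome, and survives? would decide whether it is allowed to breed.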

Testing

I will be using rcov to identify where to build the tests.

So, I intend to provide unit tests to keep my skills sharp in that endeavor but I will not be bound by them except at the point where I will be releasing this application as a gem (for desktop) or a Rails app (for the burgeoning Web 2.x).

Development Tools
My primary development environment will be my Windows Vista 64bit Home Premium system. While I realise this might not be the most sexy environment (aka a Macintosh) or the most robust (Linux Mint excellence) - the fact of the matter is that there are specific challenges I mean to address for those Ruby on Windows users.

Editor => irb, Notepad++
Database => Sqlite3

notes:
I use irb as my main development tool. By adding the irb-history module (notes here) and overworking the Marshal.dump and Marshal.load features - I am almost able to reach the Lisp goodness that allows the entire state to be saved and loaded for development.
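The Marshal half of that workflow can be sketched as below. The helper names save_state and load_state and the file name are made up for this example, and the state is just a hash standing in for whatever is being worked on in irb.

```ruby
require 'tmpdir'

# Hypothetical helpers for saving and restoring working state.
def save_state(state, path)
  File.open(path, 'wb') { |file| Marshal.dump(state, file) }
end

def load_state(path)
  File.open(path, 'rb') { |file| Marshal.load(file) }
end

working_state = { :prices => [1.5, 2.5], :name => 'USO' }
state_path = File.join(Dir.tmpdir, 'irb_state.dump')
save_state(working_state, state_path)
p load_state(state_path) == working_state
```

Marshal cannot serialise everything (procs and open IO objects, for instance), so this only approximates an image-style save of the whole session.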

Since irb is my main battleground and joy, it would be overkill to actually install Eclipse/Netbeans/Aptana Studio on my system and I would find it very counterproductive.

sqlite3 on Vista 64 bit:
If you found this blog by using the above line as a search criteria, then something has probably gone very wrong with the Google Spider.

What I can tell you, hapless wanderer, is that what worked for me was to create a directory on the C: drive called sqlite and put both the .dll and the executable inside it.

Then I simply added the folder to the path and voila - instant sqlite goodness on my Windows machine. (It took me the better part of an evening to finally figure that one out)

Why Windows?
Persistent bugger aren't you?
Well, here are a few thoughts on the whole Ruby/Windows mess:
  1. Everyone knows that Ruby runs very slow on Windows.
  2. Except me - the first production quality app I developed was deployed on Windows and it was a major fail. It wasn't until much later that I discovered this is a long standing problem with Ruby.
  3. At the end of the day, there are an order of magnitude more Windows users than users of the other platforms combined, so not optimising the code for that platform would be akin to biting off my nose to spite my face.
  4. Knowing how Ruby works on Windows should make me more adept at working in multiple environments. With the current economic circumstances, it seems prudent to understand as many possible variations as I can.
  5. Also, by supporting Wine as a platform, I can write one set of code and deploy on either Mac or *nix at will.
  6. I'm not an OS elitist. Case in point, when I was spending my days as a graphic artist on Macs, I was spending my evenings tinkering with Windows 95/98 shareware to do similar work and on weekends was taking classes that used Amigas.
Back to the lab.