Sunday, February 15, 2009

A slight Change in Dev Environs

Vista 64bit doesn't seem to like me at the moment as I am having issues installing software.

Fortunately, for the project and the timeline I have, I can easily switch to Linux as I am doing currently for the sole purpose of adding graphs.

Now this causes a problem in that I wanted to port this app using the Ruby2exe application, and while that application will create a Windows exe from within Linux - the problem is that the graphing was initially going to be done with Gruff and that requires ImageMagick to be installed in one form or another on the target system.

So, I'll have to root around for another graphing subsystem or perhaps use jRuby (which I would like to avoid at all costs because it's not portable in a way that is simple for the end user).

New Code
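module StatTools
  # Sum of all values in seq
  def sum(seq)
    total = 0
    seq.each do |value|
      total += value
    end
    total
  end

  # Arithmetic mean of seq, as a Float
  def mean(seq)
    sum(seq) / seq.size.to_f
  end

  # Moving average over a window of num values.
  # The first num - 1 slots are padded with 0 so the result lines up with seq.
  def moving_average(seq, num)
    averages = []
    pad = num - 1
    pad.times { averages << 0 }
    num.upto(seq.size) do |last|
      averages << mean(seq.slice(last - num, num))
    end
    averages
  end
end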
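For comparison, here is a minimal sketch of the same padded moving average written with Enumerable#each_cons, which yields each run of num consecutive values. It assumes Ruby 1.8.7 or later; the name padded_moving_average and the sample numbers are mine and not part of StatTools.

```ruby
# Hypothetical standalone variant of moving_average (not the StatTools code).
def padded_moving_average(seq, num)
  # Pad the first num - 1 slots with 0, as the module version does
  padding = Array.new(num - 1, 0)
  # each_cons(num) yields every window of num consecutive values
  averages = seq.each_cons(num).map do |window|
    window.inject(:+) / num.to_f
  end
  padding + averages
end

sample = [1, 2, 3, 4, 5]             # made-up values
p padded_moving_average(sample, 3)   # prints [0, 0, 2.0, 3.0, 4.0]
```

If the window is longer than the data, each_cons yields nothing and only the padding comes back.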

Sunday, February 8, 2009

A "hackish" data loader

If I'm going to be dealing with whacks of data, I want a fairly clean way of loading the data into the database in the format I specified.

So, while the first order of business would be the creation of the PaperTrade class (which will be the first gem I release out of this work), I wanted a quick and dirty way to load the files so I had to do a few things first.

0. Create the table on the database.
  • So, I created a sqlite3 file called "db.db" (because I like palindromes)
  • I used the table definition below (at some point I'll come back and create a migration for this)
  • CREATE TABLE stocks(
    id integer primary key,
    name varchar(20),
    date integer,
    open float,
    high float,
    low float,
    close float,
    volume integer,
    adj_close float);
1. Creation of the Data Loader class.
There are a few notes to this:
  • I use a couple of modules to abstract out the ActiveRecord calls so that the loader script, which creates the DataLoad object, does not have to deal with them directly. The only part that is a bit unwieldy is the "include Tables" needed to pull in the abstracted Stock class. On the whole, I believe there should be a more elegant way of dealing with this but, for now, it works. The only time I think it will bite me is if I build on this so that multiple tables are created. That could get overly messy.
  • The csv data I receive back from Yahoo! uses the date format DD/MM/YY, which is apparently what it thinks all Canadian localised dates should look like. So, I had to create a twist_date method to deal with that, and it's highly idiosyncratic to my uses - you may not get the same mileage if your date data does not comply with that format. Also, there is a bit of a "Y19K" thing going on, so don't import any data prior to 1920 or use this class after 2019 ;-)
  • Since I felt so "clever" with my Tables include, I decided to do one for the require statements.
  • module Requires
      require 'active_record'
    end

    module Tables
      # Maps to the "stocks" table by ActiveRecord naming convention
      class Stock < ActiveRecord::Base
      end
    end

    class DataLoad
      include Requires

      def initialize(data_file_load, stock_name, db_file_name)
        @file_load = data_file_load
        @name = stock_name
        @db_file_name = db_file_name
        @adapter = 'sqlite3'
      end

      def connect_to_db
        # Connect to a database
        ActiveRecord::Base.establish_connection(
          :adapter => @adapter,
          :database => @db_file_name)
      end

      # Turn "DD/MM/YY" into [year, month abbreviation, day]
      def twist_date(stringer)
        months = ["", "jan", "feb", "mar", "apr", "may", "jun", "jul",
                  "aug", "sep", "oct", "nov", "dec"]
        split_string = stringer.split('/')
        short_date = split_string.last
        # Two-digit years above 19 are read as 19xx, the rest as 20xx
        if short_date.to_i > 19
          return_date = '19' + short_date
        else
          return_date = '20' + short_date
        end
        [return_date.to_i, months[split_string[1].to_i], split_string[0].to_i]
      end

      def data_file_load
        @data_load = []
        File.open(@file_load) do |file|
          while line = file.gets
            # chomp, not chomp!, which returns nil when there is nothing to strip
            @data_load << line.chomp
          end
          # The first line holds the column headers
          @headers = @data_load.shift
          @headers.gsub!(/\"/,'')
        end
      end

      def csv_to_db
        @data_load.each do |line|
          fields = line.split(',')
          nter = Stock.new
          nter.name = @name
          date_twisted = twist_date(fields.first)
          nter.date = Time.local(date_twisted.first, date_twisted[1], date_twisted.last).to_i
          nter.open = fields[1].to_f
          nter.high = fields[2].to_f
          nter.low = fields[3].to_f
          nter.close = fields[4].to_f
          nter.volume = fields[5].to_i
          nter.adj_close = fields[6].to_f
          nter.save
        end
      end
    end #class End
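The two-digit-year rule inside twist_date is easier to see in isolation. Below is a rough sketch, assuming the same DD/MM/YY input; the helper name parse_dmy and its pivot parameter are hypothetical, and it returns a Date from the standard library rather than the array that twist_date builds.

```ruby
require 'date'

# Hypothetical helper: same two-digit-year rule as twist_date, but the
# pivot is a parameter and the result is a Date.
def parse_dmy(text, pivot = 19)
  day, month, short_year = text.split('/').map { |part| part.to_i }
  # Years above the pivot are read as 19xx, the rest as 20xx
  century = short_year > pivot ? 1900 : 2000
  Date.new(century + short_year, month, day)
end

puts parse_dmy('15/02/09')   # prints 2009-02-15
puts parse_dmy('31/12/99')   # prints 1999-12-31
puts parse_dmy('01/01/20')   # prints 1920-01-01, the "Y19K" edge
```

Making the pivot a parameter only moves the cut-off year; any two-digit-year scheme still has a window somewhere.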
At this point in time, I haven't written any unit tests but that should be in an upcoming post.

2. The data loader script:
  • # This is a script to load tables to the database

    require 'data_load_class'

    puts "enter the name of the csv file to load"
    data_file_load = gets.chomp

    puts "enter the identifier for the stock, ex: USO"
    stock_name = gets.chomp

    puts "enter the name of the database file"
    db_file_name = gets.chomp

    a_data_loader = DataLoad.new(data_file_load, stock_name, db_file_name)

    a_data_loader.connect_to_db

    # We can only create the tables after we've connected
    # to the database
    include Tables

    # Get the data
    a_data_loader.data_file_load

    # Load 'em up
    a_data_loader.csv_to_db
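As a side sketch, the per-line work that csv_to_db does can be tried out without a database by splitting a line once and naming the fields. The helper name parse_row and the sample line are made up for illustration; the column order is assumed from the indices used in the class above.

```ruby
# Hypothetical helper: split one csv line once and name the fields.
# Column order follows the indices used in csv_to_db.
def parse_row(line)
  date, open_price, high, low, close, volume, adj_close = line.chomp.split(',')
  {
    :date      => date,
    :open      => open_price.to_f,
    :high      => high.to_f,
    :low       => low.to_f,
    :close     => close.to_f,
    :volume    => volume.to_i,
    :adj_close => adj_close.to_f
  }
end

# Made-up sample line in DD/MM/YY format
row = parse_row("06/02/09,40.10,41.25,39.80,40.95,1234500,40.95\n")
puts row[:date]     # prints 06/02/09
puts row[:volume]   # prints 1234500
```

This does not handle quoted fields or embedded commas; for anything beyond simple numeric rows the standard csv library is the safer route.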
Since I'm blogging the code as it exists today, there are a bunch of unpolished bits.

The parts I'm most unhappy about are:
  • The whole business of creating the ActiveRecord Stock class. I did quite a bit of reading on how to do this and wasn't pleased with any of the current solutions. I think Rails has its own way of doing this so that's a study task for me.
  • Creating my own Y19K bug wasn't too pleasing but I wanted to deal with the data as I was receiving it and not have to 'normalise' the data through OOCalc or Excel if at all possible.
  • Lack of unit tests - I recognize this is a bit askew from my last post but, the fact of the matter is that I have not yet internalised the whole TDD because I have not made up my mind as to whether or not BDD (a la RSpec) makes more sense to me.
See you soon.

Wednesday, February 4, 2009

A (thumbnail) functional design

See-ell-o (I keep toying with the name) will have two fronts but with one core.

The two fronts will be:
  1. Utilizing Rails framework aimed at the inter(tubes)
  2. Utilizing FXRuby and focused on Windows(tm) and Wine(tm) targets
The back end will start out as sqlite3 but may morph into mysql on the web end, depending on how the development goes.

Rationale
The functional purpose is to leverage as much of the ActiveRecord goodness as there is because, for the most part, there will be one gargantuan table to pull from, as I do not intend to over-normalize one bit.

DB Schema (version 0.0)
Let me explain: since the application will be creating various options for reversal to mean genomes in a GA solution candidate, there will not be any real need to create a table for each equity. So, for all purposes, there will be one table with the following layout:
  • id
  • name
  • date (in unixtime)
  • open
  • high
  • low
  • close
  • volume
  • adjusted close
The first release will be primarily a wizard which walks the user through the process of importing a csv file (from either google or yahoo) and loading up the database with that information.

Then the user makes selections such as:
  • population size
  • generations
  • cross over
  • mutation
  • reproduction selection (how much % of qualified candidates will breed)
and then the application will rip into the data, creating all the necessary ancillary data (moving day averages) on the fly, and calculate the fitness function for each candidate.

Fitness function will operate as follows:
  1. A class called "Paper Trade" will be created using data from test portion of data range
  2. The candidate will take the starting equity and using buy and sell rules trade the "tape"
  3. If the candidate finishes the "tape" with a positive equity then it will survive, if it is negative, then that is immediate grounds for removal as a breeder
Finally, a report will be generated (using either Ruport or Gruff) with various levels of performance metrics.
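To make the fitness steps above concrete, here is a rough sketch of one way they could look. None of this is the eventual PaperTrade class: the function names, the all-in/all-out position sizing, the lambda-based rules and the sample prices are all assumptions of mine, and "positive equity" is read here as finishing with more than the starting equity.

```ruby
# Hypothetical sketch of the paper-trading fitness idea (not the PaperTrade gem).
# closes: array of closing prices (the "tape")
# rules:  hash with :buy and :sell lambdas taking (tape, index)
def paper_trade(closes, starting_equity, rules)
  cash = starting_equity.to_f
  shares = 0.0
  return cash if closes.empty?
  closes.each_with_index do |price, i|
    if shares.zero? && rules[:buy].call(closes, i)
      shares = cash / price        # go all in
      cash = 0.0
    elsif shares > 0 && rules[:sell].call(closes, i)
      cash = shares * price        # sell everything
      shares = 0.0
    end
  end
  # Value any open position at the last close
  cash + shares * closes.last
end

# Assumed survival test: finish the tape ahead of where you started
def survives?(closes, starting_equity, rules)
  paper_trade(closes, starting_equity, rules) > starting_equity
end

tape = [10.0, 8.0, 12.0, 9.0]      # made-up closing prices
rules = {
  :buy  => lambda { |t, i| t[i] < 9 },
  :sell => lambda { |t, i| t[i] > 11 }
}
puts paper_trade(tape, 1000, rules)   # prints 1500.0
puts survives?(tape, 1000, rules)     # prints true
```

In a GA setting the thresholds inside the buy and sell rules would come from the candidate's genome rather than being hard-coded.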

Testing

I will be using rcov to identify where to build the tests.

So, I intend to provide unit tests to keep my skills sharp in that endeavor but I will not be bound by them except at the point where I will be releasing this application as a gem (for desktop) or a Rails app (for the burgeoning Web 2.x).

Development Tools
My primary development environment will be my Windows Vista 64bit Home Premium system. While I realise this might not be the most sexy environment (aka a Macintosh) or the most robust (Linux Mint excellence) - the fact of the matter is that there are specific challenges I mean to address for those Ruby on Windows users.

Editor => irb, Notepad++
Database => Sqlite3

notes:
I use irb as my main development tool. By adding the irb-history module (notes here) and overworking the Marshal.dump and Marshal.load features - I am almost able to reach the Lisp goodness that allows the entire state to be saved and loaded for development.
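A minimal sketch of the Marshal round trip, assuming the state worth keeping fits in plain Ruby objects. The helper names save_state and load_state, the file name and the sample data are hypothetical; Marshal.dump and Marshal.load are the standard calls.

```ruby
require 'tmpdir'

# Hypothetical helpers for saving and restoring session data
def save_state(state, path)
  File.open(path, 'wb') { |file| Marshal.dump(state, file) }
end

def load_state(path)
  File.open(path, 'rb') { |file| Marshal.load(file) }
end

state = { :closes => [10.0, 8.0, 12.0], :window => 3 }   # made-up session data
path = File.join(Dir.tmpdir, 'irb_state.dump')
save_state(state, path)
p load_state(path) == state   # prints true
```

Marshal cannot dump everything (procs and open files, for example), so this covers data rather than the whole interpreter state.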

Since irb is my main battleground and joy, it would be overkill to actually install Eclipse/Netbeans/Aptana Studio on my system and I would find it very counterproductive.

sqlite3 on Vista 64 bit:
If you found this blog by using the above line as a search criteria, then something has probably gone very wrong with the Google Spider.

What I can tell you, hapless wanderer, is that what worked for me was to create a directory on the C: drive called sqlite and inside it put both the .dll and the executable.

Then I simply added the folder to the path and voila - instant sqlite goodness on my Windows machine. (It took me the better part of an evening to finally figure that one out)

Why Windows?
Persistent bugger aren't you?
Well, here are a few thoughts on the whole Ruby/Windows mess:
  1. Everyone knows that Ruby runs very slow on Windows.
  2. Except me - the first production quality app I developed was deployed on Windows and it was a major fail. It wasn't until much later that I discovered this is a long standing problem with Ruby.
  3. At the end of the day, there are an order of magnitude more Windows users than on the other platforms combined, so not optimising the code for that platform would be akin to cutting off my nose to spite my face.
  4. Knowing how Ruby works on Windows should make me more adept at working in multiple environments. With the current economic circumstances, it seems prudent to understand as many possible variations as I can.
  5. Also, by supporting Wine as a platform, I can write one set of code and deploy on either Mac or *nix at will.
  6. I'm not an OS elitist. Case in point, when I was spending my days as a graphic artist on Macs, I was spending my evenings tinkering with Windows 95/98 shareware to do similar work and on weekends was taking classes that used Amigas.
Back to the lab.