patrick on rails: A "hackish" data loader

If I'm going to be dealing with whacks of data, I want a fairly clean way of loading the data into the database in the format I specified.

So, while the first order of business would be the creation of the PaperTrade class (which will be the first gem I release out of this work), I wanted a quick and dirty way to load the files so I had to do a few things first.

0. Create the table on the database.

So, I created a sqlite3 file called "db.db" (because I like palindromes)
I used the table definition below (at some point I'll come back and create a migration for this
CREATE TABLE stocks(
id integer primary key,
name varchar(20),
date integer,
open float,
high float,
low float,
close float,
volume integer,
adj_close float);

1. Creation of the Data Loader class.
There are a few notes to this:

I use a couple of modules to abstract out the ActiveRecord calls so that the loader script which creates the DataLoad class will not have to deal with anything - the only part that is a bit unwieldy is that there is an "include Tables" to pull in the creation of the abstracted Stocks class. On the whole, I believe there should be a more elegant way of dealing with this but for now, it works. The only time I think it will bite me will be if I build more on to this so that multiple tables are created. That could get overly messy
The date format that the csv data I receive back from Yahoo! is DD/MM/YY which apparently is what it thinks all Canadian localised date formats should be. So, I had to create a twist date method to deal with that and it's highly idiosyncratic to my uses - you may not get the same mileage if your date data does not comply with that format. Also, there is a bit of a "Y19K" thing going on so don't import any data prior to 1919 or use this class after 2019 ;-)
Since I felt so "clever" with my Tables include, I decided to do one for the require statements.
module Requires
require 'activerecord'
end

module Tables
class Stock < ActiveRecord::Base
end
end

class DataLoad
include Requires
def initialize(data_file_load, stock_name, db_file_name)
@file_load = data_file_load
@name = stock_name
@db_file_name = db_file_name
@adapter = 'sqlite3'
end

def connect_to_db
# Connect to a database
ActiveRecord::Base.establish_connection(
{:adapter => @adapter,
:database => @db_file_name})
end

def twist_date(stringer)
months = ["", "jan", "feb", "mar", "apr", "may", "jun", "jul",
"aug", "sep", "oct", "nov", "dec"]
split_string = stringer.split('/')
short_date = split_string.last
if short_date.to_i > 19 then
return_date = '19' + short_date
else
return_date = '20' + short_date
end
return [return_date.to_i, months[split_string[1].to_i], split_string[0].to_i]
end

def data_file_load
@data_load = []
File.open(@file_load) do |file|
while line = file.gets
@data_load << line.chomp!
end
@headers = @data_load.shift
@headers.gsub!(/\"/,'')
end
end

def csv_to_db
@data_load.each do |line|
nter = Stock.new
nter.name = @name
date_twisted = twist_date(line.split(',').first)
nter.date = Time.local(date_twisted.first, date_twisted[1], date_twisted.last).to_i
nter.open = line.split(',')[1].to_f
nter.high = line.split(',')[2].to_f
nter.low = line.split(',')[3].to_f
nter.close = line.split(',')[4].to_f
nter.volume = line.split(',')[5].to_i
nter.adj_close = line.split(',')[6].to_f
nter.save
end
end
end #class End

At this point in time, I haven't written any unit tests but that should be in an upcoming post.

2. The data loader script:

# This is a script to load tables to the database

require 'data_load_class'

puts "enter the name of the csv file to load"
data_file_load = gets.chomp!

puts "enter the name of the identifer for the stock ex: USO"
stock_name = gets.chomp!

puts "enter the name of the database file"
db_file_name = gets.chomp!

a_data_loader = DataLoad.new(data_file_load, stock_name, db_file_name)

a_data_loader.connect_to_db

# We can only create the tables after we've connected
# to the database
include Tables

# Get the data
a_data_loader.data_file_load

# Load 'em up
a_data_loader.csv_to_db

Since I'm blogging the code as it exists today, there are a bunch of unpolished bits.

The parts I'm most unhappy about are:

The whole creating the ActiveRecord Stock class. I did quite a bit of reading on how to do this and wasn't pleased with any of the current solutions. I think Rails has it's own way of doing this so that's a study task for me.
Creating my own Y19K bug wasn't too pleasing but, I wanted to deal with the data as I was receiving it and not have ot 'normalise' the data through OOCalc or Excel if at all possible.
Lack of unit tests - I recognize this is a bit askew from my last post but, the fact of the matter is that I have not yet internalised the whole TDD because I have not made up my mind as to whether or not BDD (a la RSpec) makes more sense to me.

See you soon.

patrick on rails

Sunday, February 8, 2009

A "hackish" data loader

No comments:

Post a Comment

Followers

Blog Archive

About Me