Wednesday, January 27, 2016

ML based prediction with PostgreSQL & PL/R in four rounds - I

By means of PL/R,  PostgreSQL can execute R code inside the database server since 2008 or so (welcome MS SQL Server 2016. ;->).

I tried to teach PostgreSQL a new trick lately with the help of PL/R and here are the results...

Round 1:

Let's start with a very simple example of supervised machine learning, using a Support vector machine (SVM) from the e1071 package that I stole from here.

When you run this, the output is something like this

predclasses

ab
a40
b06

This is the confusion matrix of the SVM, and it tells us, that it predicted all classes correctly from the numerical input.

Which in this case is not surprising, because we cross validated with the very same data used for training. Usually you don't do this but split the data into a training and a validation set, but for the sake of brevity this model will do.

The model is now ready to predict classes from numerical input, like so:

And  this is already all we need for a naive implementation of a SVM based predictor function in PostgreSQL.

Let's try...

select r_predict1(7);

a

select r_predict1(2);

b

Whoa, it lives! :-) But what about performance? Let's run this statement three times:

select s.*, r_predict1(s.*) from generate_series(1,1000) s;

1.4 seconds for each run. Average: 1.4 s.

That's not exactly stellar.

Can we do better?

2 comments:

  1. If declaring library, classes ... is expensive, I would suggest to pass all values at once calling and use a loop inside function.

    ReplyDelete