Friday, February 12, 2016

Fun with a integrarelational DBMS: SoilGrids

PostgreSQL has many ways to extend it's capabilities in well defined ways: Custom datatypes, custom functions, custom operators, even custom indexes.

And then there's the Foreign Data Wrapper, an API to pull (almost) any kind of data into PostgreSQL and treat it (almost) like a native table. There is already an impressive list of them, but sometimes you have to write your own one. Like yesterday, when I wanted to integrate pH data from the SoilGrids1km project into an already existing PostgreSQL/PostGIS system.

The data from SoilGrids is freely available, so I just could have downloaded it and put it into PostGIS. But the data set is a) huge and b) constantly updated and c) they have a REST API, so why not tap into it directly?

Maybe because the native language of the Foreign Data Wrapper is C and that's not exactly well suited for a fast prototype. :-) But then there is Multicorn, a bridge between the FDW API and Python, so I gave it a try...

After two hours, including the installation of Multicorn itself, I had this up and running:

CREATE SERVER soilgrids_srv
   FOREIGN DATA WRAPPER multicorn
  OPTIONS (wrapper 'soilgrids_fdw.SoilGridsForeignDataWrapper');

CREATE FOREIGN TABLE public.soilgrids1km
   (latitude real ,
    longitude real ,
    ph_l real ,
    ph_m real ,
    ph_u real ,
    depth real ,
    publication_date date )
   SERVER soilgrids_srv;

select * from public.soilgrids1km where latitude = '51.57' and longitude = '5.39'

latitude longitude ph_l ph_m ph_u depth  publication_date
51.57    5.39      4.2  5.8  7.4  -0.025 2014-04-02 

I bet a more pythonic person than me could write something like this in under one hour or so.

And that level of extensibility is one of the reasons why I think that PostgreSQL is one awesome DBMS. Integrarelational even...

No comments:

Post a Comment