Recently, I’ve had a chance to play with R’s plumber library and used it to run scripts on a schedule. This post shows how to bring together R, Docker and the Google Cloud Scheduler service to run an R script on a schedule.

The R Script: Pre-Plumber

Before wrapping the R code in the plumber functions, let’s look at the R script we want to schedule.

# Connect to postgres database 
con <- RPostgres::dbConnect(RPostgres::Postgres(),
                            dbname = Sys.getenv('dbname'),
                            host = Sys.getenv('host'),
                            port = Sys.getenv('port'),
                            user = Sys.getenv('user'),
                            password = Sys.getenv('password'))

# Read data
input_data <- RPostgres::dbReadTable(con, 'input_data_ozone')

# Load Random Forest Quantile Regression model
qrf <- readRDS(file = "qrf.rds")

# Select the feature columns used for prediction
X <- input_data[, c('Solar.R', 'Wind', 'Temp', 'Month', 'Day')]

# Produce predictions for quantiles
predictions <- predict(qrf, X, what = c(0.25, 0.50, 0.75))

# Add a column for time to check if the table has been updated
full_predictions <- cbind.data.frame(time = Sys.time(), predictions)  

# Write to DB
RPostgres::dbWriteTable(conn = con, name = "Model_Predictions", value = full_predictions, overwrite = TRUE)  

# Close connection
RPostgres::dbDisconnect(conn = con)

This script connects to a Postgres database (I’ve stored the credentials as environment variables in a .env file), loads a Random Forest Quantile Regression model (saved as an RDS file), uses this model to generate predictions and writes these predictions back to the database. In a production environment, we would want to update an existing table instead of overwriting the whole table, but this suffices for illustration purposes.
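For reference, the .env file is just a plain-text file of NAME=value pairs sitting next to the script. The names below match the Sys.getenv() calls above; every value shown is a placeholder to replace with your own:

dbname=ozone_db
host=localhost
port=5432
user=postgres
password=replacewithyourpassword
plumber_token_auth=replacewithcorrecttoken

In the plumber version of the script below, attaching the dotenv package takes care of loading this file into the environment so that Sys.getenv() can pick the values up.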

The R Script: Plumber Version

We can wrap this R script with plumber functions so that we can run it via an HTTP request. We save this script as plumber.R:

library(RPostgres)
library(DBI)
library(quantregForest)
library(dplyr)
library(lubridate)
library(jsonlite)
library(dotenv)

# Load RFQR model
qrf <- readRDS(file = "qrf.rds")

#* Check token - Request body should contain a token to authenticate HTTP request
#* @filter checkAuth
function(req, res){
  
  request <- jsonlite::fromJSON(req$postBody)
  
  if (request$token != Sys.getenv('plumber_token_auth')) {
    return(list(message = "token is incorrect"))
  } else {
    plumber::forward()
  }
}

#* Update a table in Postgres -- Batch Deployment
#* @post /run
function(){
  
  # Connect to postgres database
  con <- RPostgres::dbConnect(RPostgres::Postgres(),
                              dbname = Sys.getenv('dbname'),
                              host = Sys.getenv('host'),
                              port = Sys.getenv('port'),
                              user = Sys.getenv('user'),
                              password = Sys.getenv('password'))
  
  # Read data
  input_data <- RPostgres::dbReadTable(con, 'input_data_ozone')
  
  # Select the feature columns used for prediction
  X <- input_data[, c('Solar.R', 'Wind', 'Temp', 'Month', 'Day')]
  
  # Target variable (not used here; the model qrf is already fitted and loaded above)
  y <- input_data[, c('Ozone')]
  
  # Produce predictions for quantiles
  predictions <- predict(qrf, X, what = c(0.25, 0.50, 0.75))
  
  # Add a column for time to check if the table has been updated
  full_predictions <- cbind.data.frame(time = Sys.time(), predictions)
  
  # Write to DB
  RPostgres::dbWriteTable(conn = con, name = "Model_Predictions", value = full_predictions, overwrite = TRUE)
  
  # Close connection
  RPostgres::dbDisconnect(conn = con)
  
  return(list(message = "Update Successful"))
}

The plumber.R script has two functions: the first is a filter that checks the HTTP request body for an access token; if the token matches, the request is forwarded to the second function, which updates the table.
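Before putting this in a container, we can sanity-check the API locally. A minimal sketch, assuming plumber.R, qrf.rds and the .env file are in the working directory and the database is reachable from your machine:

# Start the API locally on port 8000
Rscript -e 'plumber::plumb("plumber.R")$run(port = 8000, host = "0.0.0.0")'

# In another terminal, hit the endpoint with the token from the .env file
curl -X POST --data '{"token":"replacewithcorrecttoken"}' "http://localhost:8000/run"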

Docker and Deploy

We want to deploy this plumber.R script to an endpoint that we can send HTTP requests to from any computer. The first step is to build a Docker image; the Dockerfile I used to construct the image is:

FROM rocker/verse:4.0.2

RUN R -e "install.packages(c('RPostgres', 'DBI', 'dplyr' ,'lubridate', 'plumber', 'httr', 'quantregForest', 'jsonlite', 'dotenv'), repos = 'http://cran.us.r-project.org')"

COPY plumber.R plumber.R
COPY .env .env
COPY qrf.rds qrf.rds

EXPOSE 8000

ENTRYPOINT ["R", "-e", "library(plumber); library(RPostgres); library(jsonlite); library(quantregForest); library(dotenv); plumb('plumber.R')$run(port=8000, host='0.0.0.0')"]

Once the Docker image is pushed to Docker Hub, we can deploy it via a cloud provider; I used Microsoft Azure, and I will cover the details of deploying a Docker image in a different post. The image has been deployed to the endpoint https://ozone-batch-dep.azurewebsites.net/run. We can check that the deployment worked by triggering the error message in the plumber.R file with the following cURL request:
curl -X POST --data '{"token":"foobar"}' "https://ozone-batch-dep.azurewebsites.net/run" 

This yields the error message we set up to be returned for an incorrect token: {"message":["token is incorrect"]}

The correct cURL request to update the table in the database is:

plumber_token_auth="replacewithcorrecttoken"
curl -X POST --data '{"token":"'"$plumber_token_auth"'"}' "https://ozone-batch-dep.azurewebsites.net/run"

This will yield a message indicating that the update has been successful:
{"message":["Update Successful"]}

Scheduling the Updates

So far, our table updates whenever we send a cURL request with the correct token in the message body to the appropriate endpoint. To update the table on a particular schedule, we need to schedule the HTTP requests themselves, sending one each time we want the table refreshed. This is where Google Cloud Scheduler helps: it automates the scheduling and sending of HTTP requests to a specified endpoint.

This also highlights the importance of having an authentication layer over our endpoint. Since the endpoint is exposed to the world, bad actors could send HTTP requests to it; without authentication, they would be able to update the database table.

After signing up for Google Cloud’s free trial, we go to Google Cloud Scheduler here. From the console, we can create a job (after defining a project) that automates the sending of HTTP requests; the job replicates the cURL request we sent earlier.
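The same job can also be sketched from the command line with the gcloud CLI; the job name update-ozone-predictions is a placeholder, the schedule is standard cron syntax for every minute, and depending on your project settings you may also need to pass --location:

gcloud scheduler jobs create http update-ozone-predictions \
  --schedule="* * * * *" \
  --uri="https://ozone-batch-dep.azurewebsites.net/run" \
  --http-method=POST \
  --message-body='{"token":"replacewithcorrecttoken"}'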

After clicking Create, we have successfully scheduled the sending of HTTP requests; our database table will now update every minute.

References

  1. https://github.com/JunaidMB/ozone_batch_deployment
  2. https://crontab.guru/ (if you forget crontab syntax as often as I do)
  3. https://cloud.google.com/scheduler
  4. Plumber docs: https://www.rplumber.io/index.html
