Today we will be using a feature available in SQL Server 2017 and later: machine learning via native Python script execution. In this tutorial we will cover:
1) How to load a pickle file into a SQL Server database, stored as binary data.
2) How to retrieve that file and load it as a machine learning model.
3) How to apply the model's predictions to our SQL Server data.
The steps to achieve our predictions are as follows:
I. Create a table to hold our binary models.
II. Create a table to hold our machine learning (ML) data.
III. Create a stored procedure to load our pickled model into the models table.
IV. Execute the stored procedure to load the pickle file.
V. Create another stored procedure that takes in our SQL Server data and model and returns a result set.
VI. Compare the result set from the model to our data.
Download the zip file below to follow along with this tutorial. The files include:
Iris.csv – the raw data from the Iris dataset.
IrisClassifier.pkl – the pickle file that holds the machine learning model.
ML Init.sql – the SQL Server initialization script to set up the database and table structure.
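Before diving into the SQL, here is a sketch of how a model ends up in a pickle file in the first place. The real IrisClassifier.pkl was trained beforehand in a Jupyter notebook; the tiny nearest-centroid classifier below is a hypothetical stand-in so the round trip is reproducible with only the Python standard library.

```python
# Minimal sketch of the pickle round trip. The stand-in classifier is
# illustrative only -- the tutorial's IrisClassifier.pkl holds a model
# trained separately in Jupyter.
import pickle
import statistics


class NearestCentroidClassifier:
    """Toy classifier: predict the label of the closest class centroid."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        # One centroid (per-feature mean) per class label.
        self.centroids_ = {
            label: [statistics.mean(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda lbl: sq_dist(row, self.centroids_[lbl]))
            for row in X
        ]


# A few rows in the shape of the Iris data (sepal/petal measurements).
X = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2],
     [6.7, 3.1, 4.7, 1.5], [6.3, 2.5, 4.9, 1.5]]
y = ["setosa", "setosa", "versicolor", "versicolor"]

model = NearestCentroidClassifier().fit(X, y)

# Persist to disk -- this is the kind of file we will later load into SQL Server.
with open("IrisClassifier_demo.pkl", "wb") as f:
    pickle.dump(model, f)

# Round trip: read it back and predict, exactly as the stored procedure will.
with open("IrisClassifier_demo.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict([[5.0, 3.4, 1.5, 0.2]]))  # → ['setosa']
```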
I. The first step is to create a database and a table to store our binary models.
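A sketch of that setup is below. The ML_Models table name comes from this article; the database name and column names are illustrative, and the key point is the VARBINARY(MAX) column that will hold the raw pickle bytes.

```sql
-- Create the tutorial database and a table for serialized (pickled) models.
CREATE DATABASE IrisML;
GO

USE IrisML;
GO

CREATE TABLE dbo.ML_Models (
    ModelID     INT IDENTITY(1,1) PRIMARY KEY,
    ModelName   VARCHAR(100)   NOT NULL,
    Model       VARBINARY(MAX) NOT NULL,  -- the raw pickle bytes
    CreatedDate DATETIME       NOT NULL DEFAULT GETDATE()
);
```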
II. Next, we need to load the Iris data from our CSV file into a table in our new database. Let's create the table in T-SQL and use a BULK INSERT command.
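Something like the following, assuming the standard Iris columns; the file path and CSV options will depend on where you unzipped the download (FORMAT = 'CSV' requires SQL Server 2017+, which matches this tutorial):

```sql
-- Table matching the columns of Iris.csv.
CREATE TABLE dbo.Iris (
    SepalLength FLOAT,
    SepalWidth  FLOAT,
    PetalLength FLOAT,
    PetalWidth  FLOAT,
    Species     VARCHAR(50)
);
GO

BULK INSERT dbo.Iris
FROM 'C:\ML\Iris.csv'          -- adjust to where you unzipped the files
WITH (
    FORMAT = 'CSV',
    FIRSTROW = 2,              -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
```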
III. Next, we must load our persisted model from a pickle file. pickle is a Python standard-library module that lets you serialize Python objects to a file on your workstation. In this case, I trained the model beforehand in a Jupyter notebook and created the pickle file already.
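A sketch of the loading procedure is below, assuming the ML_Models table from step I. OPENROWSET(BULK ..., SINGLE_BLOB) reads the whole file as one VARBINARY(MAX) value; since OPENROWSET does not accept a variable for the path, the procedure builds dynamic SQL. The SQL Server service account needs read access to the file.

```sql
-- Read a .pkl file from disk as a single binary blob and store it.
CREATE OR ALTER PROCEDURE dbo.LoadModel
    @ModelName VARCHAR(100),
    @FilePath  NVARCHAR(260)
AS
BEGIN
    DECLARE @Sql NVARCHAR(MAX) = N'
        INSERT INTO dbo.ML_Models (ModelName, Model)
        SELECT @ModelName, BulkColumn
        FROM OPENROWSET(BULK ''' + @FilePath + N''', SINGLE_BLOB) AS f;';

    EXEC sp_executesql @Sql, N'@ModelName VARCHAR(100)', @ModelName;
END
GO

-- Fire the procedure to load the pickle file (path is illustrative):
EXEC dbo.LoadModel @ModelName = 'IrisClassifier',
                   @FilePath  = N'C:\ML\IrisClassifier.pkl';
```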
IV. Next, we need a stored procedure that picks up the binary stored in our table and applies the model's predictions to our data.
To do this we use a combination of the @input_data_1, @input_data_1_name, and @params parameters of sp_execute_external_script.
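A sketch of that procedure follows. The names (PredictIris, ML_Models, dbo.Iris, and the column list) follow the tables assumed earlier; adjust them to your schema. The binary model is passed into the Python script through @params, unpickled, and used to score the rows supplied by @input_data_1.

```sql
-- Fetch the pickled model, unpickle it in Python, and score the Iris rows.
CREATE OR ALTER PROCEDURE dbo.PredictIris
    @ModelName VARCHAR(100)
AS
BEGIN
    DECLARE @Model VARBINARY(MAX) =
        (SELECT TOP (1) Model FROM dbo.ML_Models WHERE ModelName = @ModelName);

    EXEC sp_execute_external_script
        @language = N'Python',
        @script = N'
import pickle

model = pickle.loads(model_bytes)
iris_data["PredictedSpecies"] = model.predict(
    iris_data[["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]])
OutputDataSet = iris_data
',
        @input_data_1 = N'SELECT SepalLength, SepalWidth, PetalLength,
                                 PetalWidth, Species
                          FROM dbo.Iris',
        @input_data_1_name = N'iris_data',
        @params      = N'@model_bytes VARBINARY(MAX)',
        @model_bytes = @Model
    WITH RESULT SETS ((
        SepalLength FLOAT, SepalWidth FLOAT, PetalLength FLOAT,
        PetalWidth FLOAT, Species VARCHAR(50), PredictedSpecies VARCHAR(50)
    ));
END
GO
```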
V. Now we create a temp table to hold the results from our new prediction stored procedure.
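For example, with a temp table whose columns mirror the procedure's result set (names assumed from the sketch above):

```sql
-- Capture the procedure's result set for later comparison.
CREATE TABLE #Predictions (
    SepalLength      FLOAT,
    SepalWidth       FLOAT,
    PetalLength      FLOAT,
    PetalWidth       FLOAT,
    Species          VARCHAR(50),
    PredictedSpecies VARCHAR(50)
);

INSERT INTO #Predictions
EXEC dbo.PredictIris @ModelName = 'IrisClassifier';
```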
VI. Finally, join the results back to the original data to assess model performance!
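Because the sketch above passes the true Species label through the procedure, the comparison can be a simple aggregate over the temp table; if your result set does not carry the label, join back to dbo.Iris on the feature columns instead.

```sql
-- Overall accuracy: share of rows where the prediction matches the label.
SELECT
    COUNT(*) AS TotalRows,
    SUM(CASE WHEN Species = PredictedSpecies THEN 1 ELSE 0 END) AS Correct,
    CAST(SUM(CASE WHEN Species = PredictedSpecies THEN 1 ELSE 0 END) AS FLOAT)
        / COUNT(*) AS Accuracy
FROM #Predictions;

-- Or inspect the misclassified rows directly:
SELECT * FROM #Predictions WHERE Species <> PredictedSpecies;
```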
There we have it: our first ML model trained in Python, stored in SQL Server, and executed via a stored procedure! A further interesting application of this approach would be to build an SSRS report to present the model's predictions to stakeholders.
Also, as we continue to improve our models and "check them in" to the SQL database, we can easily keep a running tally of model performance, because all of the models are stored in the ML_Models table.
Ian Fogelman
My name is Ian Fogelman. I like to develop data-driven solutions with SQL Server, Python, .NET, and predictive analytics.