Aleo ZKML on the Boston Housing dataset [PART 2]
This blog is a continuation of the Aleo ZKML on the Boston Housing dataset series. Please follow the link to check out the first part.
In this blog we go further: the aim is to train a model on the Boston Housing dataset and run it with Aleo in order to preserve the privacy of machine learning.
Get Started
Import the libraries you will need for the analysis
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd
Load and split the data
First, build a function to load the data into your workspace; we will then split it into train and test sets.
def load_data():
    # Read the dataset and separate the features from the target column
    data = pd.read_csv('./BostonHousing.csv')
    X = data.drop('medv', axis=1)
    y = data['medv']
    # Round everything to integers, since Leo works with integer types
    X = np.round(X).astype(int)
    y = np.round(y).astype(int)
    return X, y
Load the data and return X and y
X, y = load_data()
Split the data into train and test sets in a ratio of 80:20
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
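To confirm the split (an optional check, not in the original notebook), print the shapes: with the 506-row Boston Housing data, an 80:20 split yields 404 training and 102 test samples, which is why the prediction loop later runs 102 iterations.

# Verify the 80:20 split: 404 train rows and 102 test rows
print(X_train.shape, X_test.shape)  # expected: (404, 13) (102, 13)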
Convert the training data to lists
X_train_leo = X_train.values.tolist()
y_train_leo = y_train.values.tolist()
Next we create our linear regression model and fit it to the training data
model = LinearRegression()
model.fit(X_train, y_train)
Next we extract the model's coefficients and intercept and assign them to the variables weights and bias
weights = model.coef_
bias = model.intercept_
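If you want to inspect the learned parameters (an optional check), LinearRegression stores one coefficient per feature, 13 for Boston Housing, plus a single intercept.

# One coefficient per feature column, plus the intercept
print(weights.shape)  # expected: (13,)
print(bias)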
Scale the weights
We need to convert the weights and bias into integers, since the Leo program operates on integer types; this quantizes the model's parameters.
# Scale weights and bias
weights_scaled = [int(w) for w in weights]
bias_scaled = int(bias)
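A caveat worth noting: int() truncates toward zero, so any coefficient with an absolute value below 1 collapses to 0. This is exactly why the feature-scaling step below drops the corresponding columns. A small illustration:

# int() truncates toward zero: small coefficients quantize to 0
print(int(0.7), int(-0.3), int(3.9))  # prints: 0 0 3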
Build the linear function
Build the prediction function for linear regression
# Perform the linear prediction
def linear_regression_predict(weights, features, bias):
    prediction = 0
    for i in range(len(weights)):
        prediction += weights[i] * features[i]
    prediction += bias
    return prediction
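As a quick sanity check (a hypothetical snippet, not part of the original walkthrough), you can compare the quantized pure-Python predictor against sklearn's floating-point prediction on one training sample; the two should land in the same ballpark:

# Compare the integer model against sklearn on a single training sample
sample = X_train.values[0]
print(linear_regression_predict(weights_scaled, sample, bias_scaled))
print(model.predict(X_train.iloc[[0]])[0])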
Feature scaling
Next we need to find the columns whose scaled weights are 0; after integer truncation these features no longer contribute to the prediction, so we drop them from the X_test dataset and convert the result to a list. We also filter the scaled weights to keep only the non-zero ones.
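Note that X_test.applymap(float_to_int) below relies on the float_to_int helper from Part 1 of this series. If it is not in your workspace, a minimal rounding version along these lines should work (an assumption; check the exact definition in Part 1):

# Hypothetical stand-in for Part 1's float_to_int helper
def float_to_int(value):
    # Round a feature value to the nearest integer for Leo
    return int(round(value))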
# Columns whose quantized weight is 0 no longer contribute to the prediction
zero_weight_columns = X.columns[np.array(weights_scaled) == 0]

# Scale features for prediction
X_test = X_test.drop(zero_weight_columns, axis=1)
X_test_scaled = X_test.applymap(float_to_int).values.tolist()
weights_scaled = [weight for weight in weights_scaled if weight != 0]
Linear Regression with Aleo
Build the function that generates Aleo code for linear regression
# Utility function to generate the Leo code for linear regression
def generate_aleo_code(weights, integer_type='i32'):
    num_features = len(weights)
    function_str = "transition linear_regression_predict("
    # Add weight and feature arguments
    for i in range(num_features):
        function_str += f"weight{i}: {integer_type}, feature{i}: {integer_type}, "
    # Add bias argument
    function_str += f"bias: {integer_type}) -> {integer_type} {{\n"
    # Add body of function
    function_str += f"let prediction: {integer_type} = "
    for i in range(num_features):
        function_str += f"weight{i} * feature{i} + "
    function_str += "bias;\n"
    function_str += "return prediction;\n"
    function_str += "}"
    return function_str
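To see what the generator produces, you can print the Leo source for a small example; for two weights it emits a transition of the following shape:

# Inspect the generated Leo transition for two weights
print(generate_aleo_code([3, -1]))
# transition linear_regression_predict(weight0: i32, feature0: i32, weight1: i32, feature1: i32, bias: i32) -> i32 {
# let prediction: i32 = weight0 * feature0 + weight1 * feature1 + bias;
# return prediction;
# }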
# Generate a new Leo project
!./leo new linear_regression
Next we wrap the output of the Aleo code generator defined earlier in the linear_regression.aleo program declaration, store it in the variable content_program_aleo, and write it to the file 'linear_regression/src/main.leo'.
# Fill code in main.leo
content_program_aleo = "program linear_regression.aleo { %code% }".replace('%code%', generate_aleo_code(weights_scaled))
with open('linear_regression/src/main.leo', 'w') as file:
    file.write(content_program_aleo)
Import the necessary libraries
import os
from interp_leo.leo_uitls import convert_from_leo_type
from interp_leo.leo_program import LeoProgram
Run the Model
Here we run the Leo program over the 102 test samples, one call per sample, to obtain the predicted values for the target variable.
leo_program = LeoProgram(path=os.getcwd() + '/linear_regression')

y_pred_scaled = []
iter = 1
for x in X_test_scaled:
    print('iteration', f"{iter}/{len(X_test_scaled)}")
    # Interleave weights and features (weight0, feature0, ...) and append the bias
    value = leo_program.linear_regression_predict(*([val for pair in zip(weights_scaled, x) for val in pair] + [bias_scaled]))
    y_pred_scaled.append(convert_from_leo_type(value))
    iter += 1
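A note on the call above: the generated transition expects its arguments in the order (weight0, feature0, weight1, feature1, ..., bias), and the zip-and-flatten expression produces exactly that interleaving. A small illustration of the idiom:

# Interleaving [w0, w1] with [f0, f1] yields [w0, f0, w1, f1]
ws, fs = [3, -1], [10, 20]
print([val for pair in zip(ws, fs) for val in pair])  # -> [3, 10, -1, 20]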
Inspect our predicted values.
y_pred_scaled
Convert y_pred_scaled to a NumPy array
y_pred = np.array(y_pred_scaled)
Evaluation
Next we will evaluate our model using the root mean squared error (RMSE) metric. Note that, as its docstring states, the function below actually computes a root mean square percentage error: the residuals are divided by y_true before squaring.
def rmse(y_true, y_pred):
    '''
    Compute Root Mean Square Percentage Error between two arrays.
    '''
    loss = np.sqrt(np.mean(np.square((y_true - y_pred) / y_true), axis=0))
    return loss
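A quick numeric check of the metric (a hypothetical example): with true values [10, 20] and predictions [9, 22], both relative errors are 10%, so the function returns 0.1.

# Tiny worked example: both residuals are 10% of the true value
print(rmse(np.array([10, 20]), np.array([9, 22])))  # -> 0.1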
import numpy as np

y_test = np.array(y_test.apply(float_to_int))
print('🔥 RMSE error:', rmse(y_test, y_pred))
From the output we can see that our RMSE error is 0.43035.
This is higher than the RMSE error of 0.28489 we got in Part 1. The Aleo ZKML initiative is under continuous improvement and updates, so results should keep getting better.
Thank you for following along with this write-up on Aleo ZKML.
Website: https://www.aleo.org/
Twitter: https://twitter.com/AleoHQ