October 28, 2023

Aleo ZKML on the Boston Housing dataset [PART 2] 

This post continues the Aleo ZKML on the Boston Housing dataset series. Please follow the link to check out the first part.

In this part we go a step further and work with the Boston Housing dataset using Aleo, with the aim of preserving privacy in machine learning training.

Get Started

Import the libraries you will need for the analysis.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd

Build the function to load data and split the data into test and train datasets

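Build a function that loads the data into your workspace. It separates the target column medv from the features and rounds every value to an integer, since the Leo program generated later works with integer types.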

def load_data():
  data = pd.read_csv('./BostonHousing.csv')
  X = data.drop('medv', axis=1)    
  y = data['medv']    
  X = np.round(X).astype(int)    
  y = np.round(y).astype(int)
  return X, y

Load the data and return X and y

X, y = load_data()

Split the data into train and test sets in a ratio of 80:20

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
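To make the 80:20 split concrete, here is a small sketch of the row counts it produces. It assumes the CSV is the standard Boston Housing file with 506 rows (an assumption, since the file itself is not shown here), and it relies on scikit-learn rounding the test count up and giving the remaining rows to the training set.

```python
import math

n_samples = 506   # assumed row count of the standard Boston Housing CSV
test_size = 0.2

# scikit-learn rounds the test count up; the remaining rows form the training set
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 404 102
```

Under that assumption the test set holds 102 rows, which matches the number of prediction iterations later in this post.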

Convert the training data to lists

X_train_leo = X_train.values.tolist()
y_train_leo = y_train.values.tolist()

Next we create our linear regression model and fit it to the training data

model = LinearRegression()
model.fit(X_train, y_train)

Next we get the model's weights and intercept and assign them to the variables weights and bias

weights = model.coef_
bias = model.intercept_

Scale the weights

The Leo program works with integers, so we need to convert the weights and the bias to integers. Note that int() truncates toward zero, so any weight with an absolute value below 1 becomes 0.

# Scale weights and bias
weights_scaled = [int(w) for w in weights]
bias_scaled = int(bias)
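To see what this conversion does to small coefficients, here is a minimal sketch. The weights are hypothetical values chosen for illustration, not the fitted model's coefficients, and the fixed-point variant with a factor of 1000 is an assumed alternative, not the method used in this post.

```python
# Hypothetical coefficients, for illustration only (not the fitted model's values)
example_weights = [-0.11, 0.03, 2.78, -17.2, 4.44]

# int() truncates toward zero, as in the snippet above
truncated = [int(w) for w in example_weights]
print(truncated)  # [0, 0, 2, -17, 4]

# Assumed alternative: fixed-point scaling with a hypothetical factor of 1000
SCALE = 1000
fixed_point = [round(w * SCALE) for w in example_weights]
print(fixed_point)  # [-110, 30, 2780, -17200, 4440]
```

With plain truncation the first two weights vanish. A fixed-point factor would keep them, at the cost of scaling the bias by the same factor and dividing the final prediction by it.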

Build the linear function

Build the prediction function for linear regression

# Perform the linear prediction
def linear_regression_predict(weights, features, bias):
    prediction = 0
    # Accumulate weight * feature for every feature
    for i in range(len(weights)):
        prediction += weights[i] * features[i]
    # Add the bias once, after the loop
    prediction += bias
    return prediction
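As a quick sanity check, the same computation can be sketched more compactly with zip and tried on a tiny example. The input values below are hypothetical and only illustrate the arithmetic.

```python
def linear_regression_predict(weights, features, bias):
    # Sum of the element-wise products, plus the bias term
    return sum(w * f for w, f in zip(weights, features)) + bias

# Hypothetical integer inputs: 2*3 + (-1)*4 + 5
prediction = linear_regression_predict([2, -1], [3, 4], 5)
print(prediction)  # 7
```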
    

Feature scaling

Because the weights were converted to integers, some of them are now 0 and their columns no longer contribute to the prediction. We find the columns whose scaled weight is 0, drop them from the X_test dataset, and convert the result to a list. We also keep only the weights that are not equal to 0, so that the weights and the remaining features stay aligned.

zero_weight_columns = X.columns[np.array(weights_scaled) == 0]
# Scale features for prediction
X_test = X_test.drop(zero_weight_columns, axis=1)
X_test_scaled = X_test.applymap(float_to_int).values.tolist()
weights_scaled = [weight for weight in weights_scaled if weight != 0]
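The snippet above calls float_to_int, which is not defined anywhere in this part. A minimal stand-in is sketched below, assuming the helper only rounds a value to the nearest integer. The original helper may behave differently (for example, it may apply a fixed-point factor), so treat this as a placeholder that would need to be defined before the snippet runs.

```python
def float_to_int(value):
    # Assumed behaviour: round to the nearest integer and return a plain int.
    # This is a stand-in, not the original helper.
    return int(round(float(value)))

print(float_to_int(2.6))   # 3
print(float_to_int(-1.2))  # -1
print(float_to_int(7))     # 7
```

Since load_data already rounds the features to integers, a helper with this behaviour would leave the values unchanged.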

Linear Regression with Aleo

Build the function that generates Aleo code for linear regression

# Utility function that generates the Leo code for linear regression
def generate_aleo_code(weights, integer_type='i32'):
    num_features = len(weights)
    function_str = "transition linear_regression_predict("
    # Add weight and feature arguments
    for i in range(num_features):
        function_str += f"weight{i}: {integer_type}, feature{i}: {integer_type}, "
    # Add bias argument
    function_str += f"bias: {integer_type}) -> {integer_type} {{\n"
    # Add body of function
    function_str += f"let prediction: {integer_type} = "
    for i in range(num_features):
        function_str += f"weight{i} * feature{i} + "
    function_str += "bias;\n"

    function_str += "return prediction;\n"
    function_str += "}"
    return function_str
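To show what kind of Leo source this generator produces, here is a compact sketch that builds the same transition with str.join and prints it for two features. The name build_leo_transition is hypothetical, the indentation of the generated body is added for readability, and the sketch assumes at least one feature.

```python
def build_leo_transition(num_features, integer_type="i32"):
    # Hypothetical helper: assumes num_features >= 1
    args = ", ".join(
        f"weight{i}: {integer_type}, feature{i}: {integer_type}"
        for i in range(num_features)
    )
    terms = " + ".join(f"weight{i} * feature{i}" for i in range(num_features))
    return (
        f"transition linear_regression_predict({args}, bias: {integer_type})"
        f" -> {integer_type} {{\n"
        f"    let prediction: {integer_type} = {terms} + bias;\n"
        f"    return prediction;\n"
        f"}}"
    )

leo_source = build_leo_transition(2)
print(leo_source)
```

For two features, the body of the generated transition is the single expression weight0 * feature0 + weight1 * feature1 + bias.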

Generate new project

#generate new project aleo
!./leo new linear_regression

Next we generate the Leo code with the generate_aleo_code function defined earlier, wrap it in a program linear_regression.aleo block, store the result in the string content_program_aleo, and write it to 'linear_regression/src/main.leo', replacing the program that was generated with the new project.

#Fill code in main.leo
content_program_aleo = "program linear_regression.aleo { %code% }".replace('%code%', generate_aleo_code(weights_scaled))
with open('linear_regression/src/main.leo', 'w') as file:    
    file.write(content_program_aleo)

Import the necessary libraries

import os
from interp_leo.leo_uitls import convert_from_leo_type
from interp_leo.leo_program import LeoProgram

Train the Model

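Here we run the generated Leo program once for each of the 102 rows in the test set to obtain the predicted value of the target variable. Each call passes the weights and the features interleaved, followed by the bias.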

leo_program = LeoProgram(path=os.getcwd() + '/linear_regression')
y_pred_scaled = []
iter = 1
for x in X_test_scaled:  
    print('iteration', f"{iter}/{len(X_test_scaled)}")  
    value = leo_program.linear_regression_predict(*([val for pair in zip(weights_scaled, x) for val in pair] + [bias_scaled]))  
    y_pred_scaled.append(convert_from_leo_type(value))  
    iter += 1
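The argument list in the loop above has to match the order of the generated transition's parameters: weight0, feature0, weight1, feature1, and so on, with the bias last. The sketch below isolates that interleaving step. The helper name interleave_arguments and the input values are hypothetical.

```python
def interleave_arguments(weights, features, bias):
    # Pair each weight with its feature, flatten the pairs, then append the bias
    return [value for pair in zip(weights, features) for value in pair] + [bias]

# Hypothetical values for illustration
args = interleave_arguments([1, 2], [10, 20], 5)
print(args)  # [1, 10, 2, 20, 5]
```

Because zip stops at the shorter input, the weights and the feature row must have the same length. That is why the zero-weight columns are dropped from both before this step.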

Inspect the predicted values.

y_pred_scaled

Convert y_pred_scaled to a NumPy array

y_pred = np.array(y_pred_scaled)

Evaluation

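Next we evaluate our model. This write-up refers to the metric as root mean squared error [RMSE]. Note that the function below divides each error by the true value, so it actually computes the root mean square percentage error, as its docstring says.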

Build the RMSE function

def rmse(y_true, y_pred):    
    '''    
    Compute Root Mean Square Percentage Error between two arrays.    
    '''    
    loss = np.sqrt(np.mean(np.square(((y_true - y_pred) / y_true)), axis=0))
    return loss
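The function above is named rmse, but it divides each error by the true value, so its result is a relative error. The sketch below puts the two metrics side by side on hypothetical values. The name rmspe is an assumption introduced here for clarity, and the sketch uses only the standard library.

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error, in the units of the target
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmspe(y_true, y_pred):
    # Root mean square percentage error: each error is divided by the true value,
    # so every true value must be non-zero
    return math.sqrt(sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical values for illustration
y_true = [10, 20]
y_pred = [12, 18]
print(rmse(y_true, y_pred))   # 2.0
print(rmspe(y_true, y_pred))  # roughly 0.158
```

A relative value such as 0.158 reads as an error of about 16% of the true values, whereas the plain RMSE of 2.0 is expressed in the units of medv.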

Print the RMSE error value

import numpy as np
y_test = np.array(y_test.apply(float_to_int))
print('🔥 RMSE error:', rmse(y_test, y_pred))

From the output we can see that our RMSE is 0.43035.

This is higher than the RMSE of 0.28489 that we got in Part 1. The Aleo ZKML initiative is being continuously improved and updated.

Thank you for following through with this write-up on Aleo ZKML


Website: https://www.aleo.org/

Twitter: https://twitter.com/AleoHQ

GitHub: https://github.com/AleoHQ

Discord: https://discord.com/invite/aleohq