Aleo ZKML on the Boston Housing dataset [PART 1]
Zero-knowledge machine learning (ZKML) is a relatively new field that focuses on building and running machine learning models without revealing the underlying data. Aleo provides a framework with built-in privacy-preserving properties, which makes it a natural fit for ZKML.
In this post we explain how Aleo can be applied to ZKML using the Boston housing dataset, which is widely used in statistics and machine learning. The dataset describes houses in Boston, Massachusetts, in the United States through a set of features. With 506 data points, it is usually used in regression tasks that predict house prices from those features.
We will fit a linear regression model to the Boston dataset to establish a linear relationship between these features and the housing prices. Because several features are used at once, this is multiple linear regression rather than simple (single-feature) linear regression.
This project was done in a Jupyter notebook.
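For reference, the linear relationship being fitted can be written as below. The notation is ours and does not come from the notebook: x_i are the feature values of one house, w_i the learned weights, b the bias (intercept), and n the number of features (13, assuming the standard BostonHousing.csv column set).

```latex
\hat{y} = b + \sum_{i=1}^{n} w_i x_i
```

Ordinary least squares, which scikit-learn's LinearRegression uses, picks the w_i and b that minimize the squared difference between \hat{y} and the recorded price over the training set.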
Set up Aleo and Leo
First, install Aleo and Leo and set up the required environment. The commands below download Aleo and Leo onto your system or server.
!wget
!unzip
!rm -rf
Wait for the process to complete. You should see output similar to this:
Get the dataset
Obtain the Boston dataset which we will be using for processing.
# Get the Boston housing data
!wget
Clone the Aleo zk-ML initiative
!git clone && mv aleo-zkml-initiative-1/interp_leo interp_leo && rm -rf aleo-zkml-initiative-1
Without ZK
Import the libraries you will need for the analysis.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd
Build a function to load the data into your workspace.
def load_data():
    data = pd.read_csv('./BostonHousing.csv')
    # 'medv' (median home value) is the target; every other column is a feature
    X = data.drop('medv', axis=1)
    y = data['medv']
    # Round features and target to whole numbers and store them as integers
    X = np.round(X).astype(int)
    y = np.round(y).astype(int)
    return X, y
Load the data and return X and y
X, y = load_data()
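Rounding every column to whole numbers loses information for features whose values are small fractions. The reason for the rounding is not spelled out here; a likely one is that the later Leo step works with integer types, but that is our assumption. The sketch below contrasts plain rounding with a simple fixed-point scaling. The helper to_fixed_point, the scale of 1000, and the sample values are all hypothetical and are not part of the notebook.

```python
def to_fixed_point(value, scale=1000):
    """Hypothetical helper: store a real number as an integer count of 1/scale units."""
    return int(round(value * scale))

# Illustrative fractional values (not read from the dataset)
sample_values = [0.469, 0.538, 0.871]

# Plain rounding, as in load_data(): every value collapses to 0 or 1
rounded = [int(round(v)) for v in sample_values]

# Fixed-point scaling: still integers, but three decimal places survive
fixed = [to_fixed_point(v) for v in sample_values]

print(rounded)
print(fixed)
```

If a scaled representation were used, the weights and the prediction would need to be scaled consistently as well, so this is only a pointer to the trade-off and not a drop-in change.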
Split the data into train and test sets in a ratio of 80:20
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
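To make the 80:20 split concrete, here is a minimal standard-library sketch of what such a split does to 506 rows. split_80_20 is a hypothetical helper written for illustration. It is not scikit-learn's implementation, and its shuffle will select different rows than train_test_split does with the same seed. Rounding the test share up is our understanding of how scikit-learn sizes the test set.

```python
import math
import random

def split_80_20(rows, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle row indices, then cut off the test share."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # Round the test share up; the remainder becomes the training set
    n_test = math.ceil(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(506))  # stand-in for the 506 data points
train_rows, test_rows = split_80_20(rows)
print(len(train_rows), len(test_rows))  # 404 102
```

Fixing the seed makes the split reproducible, which matters later when the same test rows need to be evaluated again.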
Convert the train data to a list
X_train_leo = X_train.values.tolist()
y_train_leo = y_train.values.tolist()
print(X_train_leo)
print(y_train_leo)
We use a linear model for this analysis, so we create the model and fit it on the training data (X_train, y_train).
model = LinearRegression()
model.fit(X_train, y_train)
Next, we read the fitted coefficients and intercept from the model and assign them to the variables weights and bias.
weights = model.coef_
bias = model.intercept_
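To give a feel for what a weight and a bias mean, the sketch below fits a line to a single feature using the closed-form least-squares solution. This is a simplified, one-feature illustration and not how scikit-learn computes coef_ and intercept_ internally. fit_line and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Hypothetical helper: one-feature least squares, returns (weight, bias)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    weight = cov / var
    # The fitted line always passes through the point (mean_x, mean_y)
    bias = mean_y - weight * mean_x
    return weight, bias

# Toy data generated from y = 2x + 1
toy_x = [1, 2, 3, 4, 5]
toy_y = [3, 5, 7, 9, 11]
toy_weight, toy_bias = fit_line(toy_x, toy_y)
print(toy_weight, toy_bias)  # 2.0 1.0
```

With several features, as in the Boston data, there is one weight per feature and a single shared bias.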
View the weights by evaluating the variable weights in a notebook cell.
Next, we use the linear model to predict house prices for the test set.
Build the prediction function for linear regression
# Perform the linear prediction
def linear_regression_predict(weights, features, bias):
    prediction = 0
    # Weighted sum of the features
    for i in range(len(weights)):
        prediction += weights[i] * features[i]
    # Add the intercept once, after the loop
    prediction += bias
    return prediction
Get the predicted values of y and assign the values as y_pred.
y_pred = [linear_regression_predict(weights, x, bias) for x in X_test.values.tolist()]
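As a quick sanity check, the prediction function can be traced by hand on a small example. The weights, features, and bias below are made up for illustration and have nothing to do with the fitted model. predict_with_zip is a hypothetical, more compact equivalent.

```python
def linear_regression_predict(weights, features, bias):
    # Same logic as the notebook function: weighted sum plus intercept
    prediction = 0
    for i in range(len(weights)):
        prediction += weights[i] * features[i]
    prediction += bias
    return prediction

def predict_with_zip(weights, features, bias):
    """Hypothetical compact variant using zip instead of indexing."""
    return sum(w * x for w, x in zip(weights, features)) + bias

toy_weights = [2.0, -1.0, 0.5]
toy_features = [3, 4, 10]
toy_bias = 1.5

# By hand: 2.0*3 + (-1.0)*4 + 0.5*10 + 1.5 = 6 - 4 + 5 + 1.5 = 8.5
print(linear_regression_predict(toy_weights, toy_features, toy_bias))  # 8.5
print(predict_with_zip(toy_weights, toy_features, toy_bias))           # 8.5
```

Writing the prediction as an explicit loop of multiplications and additions keeps it close to what an arithmetic circuit would compute.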
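Next, we evaluate the model. Note that although the function below is named rmse, it divides each residual by the true value before squaring, so what it computes is the root mean square percentage error (RMSPE) and not the plain root mean squared error (RMSE).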
def rmse(y_true, y_pred):
    '''
    Compute Root Mean Square Percentage Error between two arrays.
    '''
    loss = np.sqrt(np.mean(np.square(((y_true - y_pred) / y_true)), axis=0))
    return loss
import numpy as np
print('🔥 RMSE error:', rmse(np.array(y_test), np.array(y_pred)))
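From the image we can see that the reported error is 0.28489. Because the function divides each residual by the true value, this is a relative error: in the root-mean-square sense, predictions are off by roughly 28% of the true price.
We will continue with Part II in the next update.
Follow this link to access Part II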
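The difference between plain RMSE and the percentage-based variant used above is easiest to see on a toy example. The numbers below are made up, and rmse_plain and rmspe are hypothetical standard-library helpers, not the notebook's function.

```python
import math

def rmse_plain(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def rmspe(y_true, y_pred):
    """Root mean square percentage error: residuals are divided by the true value."""
    squared = [((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

toy_true = [10, 20, 40]
toy_pred = [12, 18, 44]

plain = rmse_plain(toy_true, toy_pred)   # sqrt((4 + 4 + 16) / 3) = sqrt(8)
relative = rmspe(toy_true, toy_pred)     # sqrt((0.04 + 0.01 + 0.01) / 3) = sqrt(0.02)
print(round(plain, 4), round(relative, 4))
```

The plain value is expressed in price units, while the relative value is a unitless fraction, so the two should not be compared directly.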