Decision Tree with Leo Transpiler: Demonstration
Introduction
In this blog post, we demonstrate the Leo transpiler using the Iris flower dataset.
The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.
It includes three iris species with 50 samples each, along with measurements of each flower's sepals and petals. One species is linearly separable from the other two, but the other two are not linearly separable from each other.
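In the scikit-learn version used here, each sample has four feature columns, all measured in centimeters: sepal length, sepal width, petal length, and petal width. The target column gives the species: setosa, versicolor, or virginica.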
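The class balance and the claim that one species is linearly separable from the other two can both be checked directly. In this minimal sketch, the 2.5 cm petal-length cutoff is our own illustrative choice; any value between the largest setosa petal length and the smallest non-setosa petal length would work.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Samples per species: setosa (0), versicolor (1), virginica (2)
print(np.bincount(y))

# Column 2 is petal length (cm); 2.5 cm is an illustrative cutoff, not part of the dataset
petal_length = X[:, 2]
setosa = y == 0
print("max setosa petal length:", petal_length[setosa].max())
print("min non-setosa petal length:", petal_length[~setosa].min())

# A threshold on a single feature is a linear boundary, so a clean split
# means setosa is linearly separable from the other two species
separates = np.all(petal_length[setosa] < 2.5) and np.all(petal_length[~setosa] >= 2.5)
print("setosa separated by petal length < 2.5:", separates)
```

No comparable single threshold separates versicolor from virginica, whose petal measurements overlap.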
Python is widely used in machine learning, AI, and data science, so there needs to be a way to bridge the gap and let Python developers build smart contracts and decentralized applications on the Aleo blockchain. The Leo transpiler does this by converting Python machine learning models into Leo programs that can run on the Aleo virtual machine.
The zkML transpiler is an open-source SDK that bridges Python — one of the most popular programming languages for machine learning developers — and zero-knowledge cryptography.
Developers can train their machine learning model as normal, then use the transpiler to convert the model into Leo, a ZK-friendly programming language compatible with Aleo’s zero-knowledge layer 1 solution.
The transpiler is currently implemented for decision tree models, a common type of machine learning algorithm that can create both classification and regression models. Eventually, it may be expanded to include random forest ML models, simple neural networks, linear regression models, and others.
Below, we walk through training a decision tree classifier and transpiling it into Leo with the Leo transpiler.
Load the Iris dataset and explore the data
First, we load and explore the dataset used in this demonstration. The Iris dataset is bundled with scikit-learn, so it can be loaded directly with load_iris.
```python
# Import the key libraries
import logging
import os

import matplotlib.pyplot as plt
import numpy as np  # used later to compute accuracies
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

from leotranspiler import LeoTranspiler

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Take a look at the data
print(f"Data shape: {X.shape}")
print(f"Feature names: {iris.feature_names}")
print(f"Label names: {iris.target_names}")
print(f"First row: {X[0]}")
print(f"First label: {y[0]}")
```
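The output shows that the dataset has 150 samples with four features each, followed by the feature names, the three label names, and the first sample with its label.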
Split the data
Split the data into train and test sets in a ratio of 80:20
```python
# Split the dataset into a training and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```
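With 150 samples and test_size=0.2, the split yields 120 training and 30 test samples. The sketch below confirms this and also shows an optional variant that is not used in this tutorial: passing stratify=y keeps the three classes equally represented in the test set. Note that stratifying changes which samples land in each split, so the results later in this post would differ.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as the tutorial: 80% train, 20% test, fixed seed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Optional variant: stratify on y so each class gets the same share of the test set
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print("test class counts (random):", np.bincount(y_test, minlength=3))
print("test class counts (stratified):", np.bincount(y_test_strat, minlength=3))
```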
Create the decision tree classifier
```python
# Create and train a decision tree classifier
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
```
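Before transpiling, it can help to check how large the trained tree is and how well it does on held-out data. get_depth(), get_n_leaves(), and score() are standard scikit-learn methods; the intuition that a deeper tree with more leaves leads to a larger Leo program with more branches is our assumption, not something this tutorial measures.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Size of the tree: each internal node is one threshold comparison
print("depth:", clf.get_depth())
print("leaves:", clf.get_n_leaves())
print("internal nodes:", clf.tree_.node_count - clf.get_n_leaves())

# Mean accuracy on the held-out test set
test_accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```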
Visualize the decision tree
Next, we plot the trained decision tree. Each node shows its split condition, impurity, number of samples, and predicted class.
```python
# Visualize the decision tree
plt.figure(figsize=(15, 7.5))
plot_tree(
    clf,
    filled=True,
    feature_names=iris.feature_names,
    class_names=iris.target_names.tolist(),
)
plt.show()
```
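If you prefer a text view of the same tree (for example, when working without a display), scikit-learn's export_text prints the splits as nested rules. This if/else shape is roughly what a transpiled program has to encode, although the exact structure of the generated Leo code is decided by the transpiler.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Each "|---" line is one side of a threshold split; "class:" lines are leaves
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```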
Transpile the decision tree into Leo code
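First, we set the root logger's level to INFO, then transpile the decision tree into Leo code. The Leo project is written to a tmp folder in the current working directory under the project name tree1.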
```python
# Set the logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Transpile the decision tree into Leo code
lt = LeoTranspiler(model=clf, validation_data=X_train)
leo_project_path = os.path.join(os.getcwd(), "tmp")
leo_project_name = "tree1"
lt.to_leo(path=leo_project_path, project_name=leo_project_name)
```
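To build intuition for what transpiling a decision tree involves, the sketch below walks scikit-learn's fitted tree_ arrays (children_left, children_right, feature, threshold, value) and prints nested if/else pseudocode. The helper tree_to_branches and the x0 to x3 input names are hypothetical; this is not the Leo transpiler's code generator, which also has to handle details such as fixed-point numbers and actual Leo syntax.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def tree_to_branches(clf, node=0, indent=0):
    """Hypothetical helper: render a fitted tree as nested if/else pseudocode lines."""
    tree = clf.tree_
    pad = "    " * indent
    left = tree.children_left[node]
    right = tree.children_right[node]
    if left == right:
        # Leaf node: scikit-learn marks both children as -1
        label = clf.classes_[tree.value[node][0].argmax()]
        return [f"{pad}return {int(label)};"]
    # Internal node: scikit-learn sends a sample left when x[feature] <= threshold
    feature = tree.feature[node]
    lines = [f"{pad}if x{feature} <= {tree.threshold[node]:.4f} {{"]
    lines += tree_to_branches(clf, left, indent + 1)
    lines.append(f"{pad}}} else {{")
    lines += tree_to_branches(clf, right, indent + 1)
    lines.append(f"{pad}}}")
    return lines


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

code = "\n".join(tree_to_branches(clf))
print(code)
```

Each internal node becomes one comparison and each leaf becomes one return, so the size of the generated branch structure grows with the number of nodes in the tree.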
Check out the transpiled Leo code
Let's print the generated Leo program, which is saved as src/main.leo inside the project folder.
```python
# Take a look at the transpiled code
leo_code_path = os.path.join(leo_project_path, leo_project_name, "src", "main.leo")
with open(leo_code_path, "r") as f:
    leo_code = f.read()
print(leo_code)
```
Prove and compare the different predictions
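Let's run a single test sample through both the Leo and Python models and compare their predictions with the true label.
First, we generate a proof and a prediction with the Leo program using lt.execute. The output also reports the number of circuit constraints and the proof itself.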
```python
# Prove and compare the Leo prediction with the Python prediction and the label
zkp = lt.execute(input_sample=X_test[0])
python_prediction = clf.predict([X_test[0]])

print(f"Circuit constraints: {zkp.circuit_constraints}")
print(f"Leo prediction in fixed-point notation: {zkp.output[0]}")
print(f"Leo prediction in decimal notation: {zkp.output_decimal[0]}")
print(f"Python prediction: {python_prediction[0]}")
print(f"Label: {y_test[0]}")
print(f"Proof: {zkp.proof}")
```
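The output distinguishes fixed-point and decimal notation because the Leo program works with integers rather than floating-point numbers. The sketch below shows one common fixed-point scheme: scale by 2**F and round. The choice of F = 8 fractional bits and the helper names to_fixed and from_fixed are assumptions for illustration; the transpiler chooses its own representation.

```python
import numpy as np

FRACTIONAL_BITS = 8  # hypothetical precision; the transpiler picks its own
SCALE = 2 ** FRACTIONAL_BITS


def to_fixed(x):
    """Encode float(s) as integers by scaling by 2**FRACTIONAL_BITS and rounding."""
    return np.round(np.asarray(x, dtype=np.float64) * SCALE).astype(np.int64)


def from_fixed(q):
    """Decode fixed-point integers back to floats."""
    return np.asarray(q, dtype=np.float64) / SCALE


sample = np.array([5.1, 3.5, 1.4, 0.2])  # first row of the Iris data
encoded = to_fixed(sample)
print(encoded)              # [1306  896  358   51]
print(from_fixed(encoded))  # close to the original values, within 1 / (2 * SCALE)

# Rounding is monotonic, so a threshold test x <= t can only change outcome
# when x and t round to the same integer (a precision limit)
threshold = 2.45
print(sample[2] <= threshold, encoded[2] <= to_fixed(threshold))
```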
Next, we compute the Leo program's accuracy on the whole test set, this time calling lt.run instead of lt.execute for each sample.

```python
# Compute the accuracy of the Leo program on the test set
num_test_samples = len(X_test)
leo_predictions = np.zeros(num_test_samples)
for i in range(num_test_samples):
    leo_predictions[i] = lt.run(input_sample=X_test[i]).output_decimal[0]

# The Leo accuracy
leo_accuracy = np.sum(leo_predictions == y_test) / num_test_samples
```
Then we compute the Python model's predictions and accuracy on the same test set:
```python
# Make the predictions using Python
python_predictions = clf.predict(X_test)

# The Python accuracy
python_accuracy = np.sum(python_predictions == y_test) / num_test_samples

print(f"Leo accuracy: {100*leo_accuracy} %")
print(f"Python accuracy: {100*python_accuracy} %")
```
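Matching accuracies show that the two models score the same, not that they agree on every sample. A small helper such as the hypothetical compare_predictions below reports the agreement rate and the indices where the Leo and Python predictions differ. The demo uses toy arrays; in this tutorial you would pass leo_predictions, python_predictions, and y_test.

```python
import numpy as np


def compare_predictions(leo_preds, py_preds, labels):
    """Hypothetical helper: summarize how two prediction arrays agree with each other and the labels."""
    leo_preds = np.asarray(leo_preds)
    py_preds = np.asarray(py_preds)
    labels = np.asarray(labels)
    return {
        "leo_accuracy": float(np.mean(leo_preds == labels)),
        "python_accuracy": float(np.mean(py_preds == labels)),
        "agreement": float(np.mean(leo_preds == py_preds)),
        # Indices of samples where the two models disagree
        "mismatch_indices": np.flatnonzero(leo_preds != py_preds).tolist(),
    }


# Toy arrays for illustration only, not results from this tutorial.
# The Leo values are floats because leo_predictions is created with np.zeros.
leo = np.array([0.0, 1.0, 2.0, 2.0])
py = np.array([0, 1, 2, 1])
y_true = np.array([0, 1, 2, 1])

report = compare_predictions(leo, py, y_true)
print(report)
```

Comparing floats with integer labels is safe here because the class indices are small whole numbers that floats represent exactly.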
Thank you for following along with this demonstration.
Website: https://www.aleo.org/
Twitter: https://twitter.com/AleoHQ