Decision Tree with Leo Transpiler: Demonstration

Introduction

In this post, we demonstrate the use of the Leo transpiler with the Iris flower dataset.

The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.

It includes three iris species with 50 samples each, along with several properties of each flower. One species is linearly separable from the other two, but the other two are not linearly separable from each other.

The columns in this dataset are:

Id
SepalLengthCm
SepalWidthCm
PetalLengthCm
PetalWidthCm
Species

Python is widely used in machine learning, AI, and data science, so there needs to be a way to bridge the gap and allow Python developers to create smart contracts and decentralized applications on the Aleo blockchain. The Leo transpiler converts Python code into a format compatible with the Aleo virtual machine.

The zkML transpiler is an open-source SDK that bridges Python, one of the most popular programming languages for machine learning developers, and zero-knowledge cryptography.

Developers can train their machine learning model as normal, then use the transpiler to convert the model into Leo, a ZK-friendly programming language compatible with Aleo’s zero-knowledge layer 1 solution.

The transpiler is currently implemented for decision tree models, a common type of machine learning algorithm that can create both classification and regression models. Eventually, it may be expanded to include random forest ML models, simple neural networks, linear regression models, and others.
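
To make the idea concrete, a trained decision tree boils down to a chain of threshold comparisons on the input features, which is the kind of logic that can be expressed as a fixed program. The sketch below is only an illustration: the function name classify_iris is hypothetical and the thresholds are assumed values, not the ones learned by the model trained later in this post.

```python
def classify_iris(sepal_length, sepal_width, petal_length, petal_width):
    """Toy decision tree: returns 0 (setosa), 1 (versicolor) or 2 (virginica).

    The thresholds are illustrative assumptions, not values taken from
    the classifier trained in this post.
    """
    if petal_width <= 0.8:
        return 0  # setosa
    if petal_width <= 1.75:
        return 1  # versicolor
    return 2  # virginica


# A flower with a very small petal width falls into the first branch
print(classify_iris(5.1, 3.5, 1.4, 0.2))  # 0
```

Only the petal width is used here to keep the sketch short; a real tree may branch on any of the features.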

We will demonstrate the implementation of a decision tree model using the Leo transpiler.

Load the Iris dataset and explore the data

We first load and inspect the dataset used in this demonstration. This dataset is built into the sklearn library.

# Import the key libraries
import logging
import os
import matplotlib.pyplot as plt
import numpy as np  # needed later to compute the accuracies
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from leotranspiler import LeoTranspiler

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# take a look at the data
print(f"Data shape: {X.shape}")
print(f"Feature names: {iris.feature_names}")
print(f"Label names: {iris.target_names}")
print(f"First row: {X[0]}")
print(f"First label: {y[0]}")

The output will look like this

Split the data

Split the data into training and test sets in an 80:20 ratio.

# Split the dataset into a training and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
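
For intuition about what an 80:20 split does with the 150 Iris samples, here is a minimal sketch using only the standard library. It is not how sklearn implements train_test_split, and the helper name split_indices is hypothetical; it simply shuffles the sample indices with a fixed seed and cuts them into two disjoint groups.

```python
import random


def split_indices(n_samples, test_size=0.2, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split reproducible
    n_test = round(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(150)
print(len(train_idx), len(test_idx))  # 120 30
```

With 150 samples, this leaves 120 for training and 30 for testing, and no sample appears in both groups.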

Create the decision tree classifier

# Create and train a decision tree classifier
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

Output will look like this

Visualize the decision tree

Here we plot the trained decision tree.

# visualize the decision tree
plt.figure(figsize=(15, 7.5))
plot_tree(
    clf,
    filled=True,
    feature_names=iris.feature_names,
    class_names=iris.target_names.tolist(),
)
plt.show()

Output will look like this

Transpile the Python code into Leo code

First we set up the logger, and then we transpile the decision tree into Leo code.

# Set the logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Transpile the decision tree into Leo code
lt = LeoTranspiler(model=clf, validation_data=X_train)
leo_project_path = os.path.join(os.getcwd(), "tmp")
leo_project_name = "tree1"
lt.to_leo(path=leo_project_path, project_name=leo_project_name)

Output will look like this

Check out the transpiled Leo code

Let's take a look at the Leo code that the transpiler generated

# take a look at the transpiled code
leo_code_path = os.path.join(leo_project_path, leo_project_name, "src", "main.leo")
with open(leo_code_path, "r") as f:
    leo_code = f.read()
print(leo_code)

Output will look like this

Prove and compare the different predictions

Let's run one instance of the test data through both the Leo and Python models and compare the predictions with the label.

For Leo model

We first generate the proof and the prediction using Leo

# prove and compare the Leo prediction with the Python prediction and the label
zkp = lt.execute(input_sample=X_test[0])
python_prediction = clf.predict([X_test[0]])

print(f"Circuit constraints: {zkp.circuit_constraints}")
print(f"Leo prediction in fixed-point notation: {zkp.output[0]}")
print(f"Leo prediction in decimal notation: {zkp.output_decimal[0]}")
print(f"Python prediction: {python_prediction[0]}")
print(f"Label: {y_test[0]}")
print(f"Proof: {zkp.proof}")

Output will look like this
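
The code above prints the Leo prediction twice, once in fixed-point notation and once in decimal notation. As a rough sketch of the relationship between the two, a fixed-point encoding stores a decimal value as an integer by multiplying it by a scaling factor. The scaling factor and function names below are assumptions for illustration only; the transpiler picks its own encoding.

```python
SCALE = 2 ** 7  # hypothetical scaling factor, not the transpiler's actual value


def to_fixed_point(value, scale=SCALE):
    """Encode a decimal value as an integer by scaling and rounding."""
    return round(value * scale)


def from_fixed_point(encoded, scale=SCALE):
    """Decode a fixed-point integer back into a decimal value."""
    return encoded / scale


encoded = to_fixed_point(2.0)
print(encoded)                    # 256
print(from_fixed_point(encoded))  # 2.0
```

Values that are not exact multiples of 1/SCALE are rounded during encoding, so decoding recovers them only approximately.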

Next, compute the accuracy of the Leo model on the test set

# Compute the accuracy of the Leo program on the test set
num_test_samples = len(X_test)
leo_predictions = np.zeros(num_test_samples)
for i in range(num_test_samples):
    leo_predictions[i] = lt.run(input_sample=X_test[i]).output_decimal[0]

# Accuracy of the Leo program
leo_accuracy = np.sum(leo_predictions == y_test) / num_test_samples

For Python model

Then, we make the predictions using Python:

# Make the predictions using Python
python_predictions = clf.predict(X_test)
# Accuracy of the Python model
python_accuracy = np.sum(python_predictions == y_test) / num_test_samples

Compare the two accuracies

print(f"Leo accuracy: {100*leo_accuracy} %")
print(f"Python accuracy: {100*python_accuracy} %")

Output will look like this
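
Two models can reach the same accuracy while disagreeing on individual samples, so it can also be useful to compare the predictions one by one. The following is a minimal sketch with a hypothetical helper, agreement_report, and made-up prediction lists rather than results from the run above.

```python
def agreement_report(leo_preds, python_preds):
    """Return the fraction of matching predictions and the indices that differ."""
    if len(leo_preds) != len(python_preds) or len(leo_preds) == 0:
        raise ValueError("prediction lists must be non-empty and of equal length")
    mismatches = [
        i
        for i, (leo, py) in enumerate(zip(leo_preds, python_preds))
        if int(leo) != int(py)
    ]
    rate = 1 - len(mismatches) / len(leo_preds)
    return rate, mismatches


# Made-up predictions for illustration only
rate, mismatches = agreement_report([0, 1, 2, 1], [0, 1, 1, 1])
print(rate, mismatches)  # 0.75 [2]
```

In the demonstration itself, the same helper could be called with leo_predictions and python_predictions to list any test samples where the two models differ.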

Thank you for following along with this demonstration.

Website: https://www.aleo.org/

Twitter: https://twitter.com/AleoHQ

GitHub: https://github.com/AleoHQ

Discord: https://discord.com/invite/aleohq