A Guide to IoT Rules Engines: Decision Trees
Decision trees are a popular way of capturing the complexity of conditional rules in many use cases. Are they the best option for IoT devices? Are they scalable (spoiler: no)? Let's learn more about these logical structures.
What Are Decision Trees?
A popular way of capturing the complexity of conditional rules is by using decision trees, which are graphs that use a branching method to illustrate every possible outcome of a decision.
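To make the structure concrete, here is a minimal sketch in Java of what such a branching graph looks like in code. The `Node`, `Leaf`, and `Branch` types and the temperature/battery attributes are purely illustrative assumptions, not taken from any particular engine: an internal node tests one variable and routes to a child branch, while a leaf carries the final outcome.

```java
import java.util.Map;
import java.util.function.Predicate;

// Minimal decision tree: internal nodes test one attribute of the input,
// leaves carry the final outcome. All names here are hypothetical.
interface Node { String decide(Map<String, Object> input); }

record Leaf(String outcome) implements Node {
    public String decide(Map<String, Object> input) { return outcome; }
}

record Branch(Predicate<Map<String, Object>> test, Node ifTrue, Node ifFalse) implements Node {
    public String decide(Map<String, Object> input) {
        return test.test(input) ? ifTrue.decide(input) : ifFalse.decide(input);
    }
}

public class DecisionTreeExample {
    public static void main(String[] args) {
        // "Is the temperature high?" -> "Is the battery low?" -> outcome
        Node tree = new Branch(in -> (double) in.get("temperature") > 30.0,
                new Branch(in -> (boolean) in.get("batteryLow"),
                        new Leaf("notify-and-throttle"),
                        new Leaf("start-cooling")),
                new Leaf("no-action"));

        Map<String, Object> reading = Map.of("temperature", 32.5, "batteryLow", false);
        System.out.println(tree.decide(reading)); // -> start-cooling
    }
}
```

Every possible outcome of the decision corresponds to one root-to-leaf path in this structure.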
How Are Decision Trees Used in IoT?
Drools, mostly known for its forward-chaining rules engine, also supports decision tables: an Excel spreadsheet defines the rules, combined with snippets of embedded code to accommodate any additional logic or required thresholds.
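As a rough sketch of how such a spreadsheet-backed rule set is typically loaded and fired through the Drools KIE API. The session name, the `TemperatureReading` fact, and the kmodule.xml/spreadsheet configuration are assumptions for illustration and are not shown here:

```java
import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class DecisionTableRunner {

    // Hypothetical fact class; in the spreadsheet, CONDITION columns would
    // reference its fields and ACTION columns would call methods on it.
    public static class TemperatureReading {
        private final String deviceId;
        private final double celsius;
        private String decision;

        public TemperatureReading(String deviceId, double celsius) {
            this.deviceId = deviceId;
            this.celsius = celsius;
        }
        public String getDeviceId() { return deviceId; }
        public double getCelsius() { return celsius; }
        public String getDecision() { return decision; }
        public void setDecision(String decision) { this.decision = decision; }
    }

    public static void main(String[] args) {
        // Assumes a kmodule.xml defining a session named "iot-dtable-session"
        // and an .xlsx decision table under src/main/resources.
        KieServices ks = KieServices.Factory.get();
        KieContainer container = ks.getKieClasspathContainer();
        KieSession session = container.newKieSession("iot-dtable-session");
        try {
            TemperatureReading reading = new TemperatureReading("sensor-42", 87.5);
            session.insert(reading);
            session.fireAllRules();   // all matching spreadsheet rows fire at once
            System.out.println(reading.getDeviceId() + " -> " + reading.getDecision());
        } finally {
            session.dispose();
        }
    }
}
```

The point is that the conditions and actions live in the spreadsheet, while the application code only inserts facts and fires the rules.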
Should You Use Decision Trees to Model Complex Logic?
Decision trees are useful when the number of states per variable is limited (such as binary YES/NO states) but can become overwhelming when the number of states increases. This is because the depth of the tree grows linearly with the number of variables, while the number of branches grows exponentially: with s possible states per variable and d variables, a full tree ends in s^d leaves.
With 6 Boolean variables (true/false), there are 2^(2^6) = 2^64 = 18,446,744,073,709,551,616 distinct Boolean functions a tree could encode (in the literature, this is referred to as the size of the hypothesis space for decision trees).
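A quick back-of-the-envelope check of both figures in plain Java (no rules engine involved):

```java
import java.math.BigInteger;

public class TreeCombinatorics {
    public static void main(String[] args) {
        // Leaves of a full tree: states^variables
        int states = 2, variables = 6;
        System.out.println("Leaves for 6 binary variables: "
                + BigInteger.valueOf(states).pow(variables)); // 64

        // Distinct Boolean functions over n Boolean variables: 2^(2^n)
        int n = 6;
        BigInteger hypothesisSpace = BigInteger.valueOf(2).pow(1 << n); // 2^64
        System.out.println("Hypothesis space: " + hypothesisSpace);
        // -> 18446744073709551616

        // With 3 states per variable instead of 2, leaves jump from 64 to 729
        System.out.println("Leaves with 3 states: " + BigInteger.valueOf(3).pow(variables));
    }
}
```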
Majority voting isn’t possible unless we branch even further, so that multiple distinct outcomes become part of the tree structure. Conditional execution, on the other hand, comes out of the box: as the name suggests, decision trees are all about conditional execution.
Decision trees are never implemented as such in an IoT context. In expert systems, where decisions are the outcomes of Q&A scenarios, the logic follows conditional execution, as new data (the answers to successive questions) is served to the decision tree engine. In an IoT context, we feed rules engines with data and expect decisions to come back as a result. In that case, we talk about decision tables: we feed data into the table and the results (decisions) come back at once. More about this not-so-subtle difference between tables and trees can be found here.
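The difference can be sketched in a few lines of Java (hypothetical rule and field names): a decision table is essentially a flat list of condition/action rows that is evaluated against the incoming data in one pass, rather than walked interactively branch by branch.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class DecisionTableSketch {

    // One row of a decision table: conditions on the left, action on the right.
    record Row(String name, Predicate<Map<String, Object>> condition, String action) {}

    public static void main(String[] args) {
        List<Row> table = List.of(
            new Row("overheat", in -> (double) in.get("temperature") > 80.0, "shutdown"),
            new Row("warm",     in -> (double) in.get("temperature") > 60.0, "throttle"),
            new Row("lowBatt",  in -> (double) in.get("battery") < 10.0,     "notify")
        );

        // IoT style: feed one reading in, get all applicable decisions back at once.
        Map<String, Object> reading = Map.of("temperature", 85.0, "battery", 7.0);
        table.stream()
             .filter(row -> row.condition().test(reading))
             .forEach(row -> System.out.println(row.name() + " -> " + row.action()));
        // -> overheat -> shutdown, warm -> throttle, lowBatt -> notify
    }
}
```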
Decision trees are easily interpretable, which makes them attractive for use cases where interpretability is essential (healthcare, among others).
Should You Use Decision Trees to Model Uncertainty?
Decision trees use a white-box model. Important insights can be generated based on domain experts describing a situation and their preferences for outcomes. But decision trees are unstable, meaning that a small change in the data can lead to a big change in the structure of the optimal decision tree.
They are also often relatively inaccurate. Calculations can get very complex, particularly if many values are uncertain and/or if many outcomes are linked.
Decision trees cannot model uncertainty or utility functions unless, just as with time information, we add these within the tree as extra nodes (chance nodes for probabilities, value nodes for utilities), which complicates the tree, and any decision tables derived from it, even further.
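To make that concrete, embedding uncertainty means every uncertain variable becomes an extra chance node whose branches carry probabilities, and outcomes need explicit utility values so that an expected utility can be computed per branch. A hypothetical sketch, with made-up probabilities and utilities:

```java
public class ChanceNodeSketch {
    public static void main(String[] args) {
        // Chance node: "will the gateway be reachable?" with two uncertain branches.
        double pReachable = 0.9;

        // Utilities of the outcomes below each branch (made-up values).
        double utilityIfReachable   = 100.0;  // command delivered
        double utilityIfUnreachable = -40.0;  // command lost, retry cost

        // Expected utility of taking this branch of the decision tree.
        double expectedUtility = pReachable * utilityIfReachable
                               + (1 - pReachable) * utilityIfUnreachable;

        System.out.println("Expected utility: " + expectedUtility); // 86.0
        // Every uncertain variable adds another layer of nodes like this one.
    }
}
```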
Are Decision Trees Explicable?
Decision trees are easy to understand and interpret. People are able to understand decision tree models after just a brief explanation. Still, once a rule is instantiated, the individual decisions can no longer be seen or inspected; they are only represented as labeled “arrows” in the graph during the design phase.
When decision trees are implemented as decision tables, explicability drops further: each row in the table is a rule, and each column in that row is either a condition or an action for that rule. The overall evaluation sequence becomes unclear, because decision tables give no overall picture.
Are Decision Trees Adaptable?
Decision trees are mostly used for graphical knowledge representation. It’s extremely hard to build a rules engine on decision trees, and even harder to build applications on top of one. They are also hard to integrate with third-party systems. On top of that, any small change in the training data can lead to a big change in the structure of the optimal decision tree.
How Easy Is It to Operate With Decision Trees?
Applying the same decision tree rule across multiple devices in the IoT domain is close to impossible. Most decision tree implementations mix logic residing in decision tables with actions defined separately in code, which makes the complete process extremely difficult to manage.
Are Decision Trees Scalable?
Short answer: no. Decision tree rules are stateless, which means that, in theory, it should be easy to run multiple rules in parallel. However, within one instance of a rule, you cannot distribute the load across different processes while that particular rule is executing. The fact that the depth of the tree grows linearly with the number of variables, while the number of branches grows exponentially with the depth, makes decision trees hard, if not impossible, to scale. And, as noted above, calculations get very complex when many values are uncertain or many outcomes are linked.
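A sketch of what “stateless, so parallel across devices” means in practice (hypothetical rule and reading names): the same rule can be applied to many readings concurrently, but evaluating one rule against one reading is a single sequential branch walk that cannot itself be split across processes.

```java
import java.util.List;
import java.util.function.Function;

public class ParallelRuleSketch {
    record Reading(String deviceId, double temperature) {}

    public static void main(String[] args) {
        // Stateless rule: each evaluation depends only on its own input.
        Function<Reading, String> rule =
            r -> r.temperature() > 80.0 ? "shutdown" : "ok";

        List<Reading> readings = List.of(
            new Reading("sensor-1", 85.0),
            new Reading("sensor-2", 55.0),
            new Reading("sensor-3", 91.0)
        );

        // Easy to parallelize across readings...
        readings.parallelStream()
                .map(r -> r.deviceId() + " -> " + rule.apply(r))
                .forEach(System.out::println);

        // ...but a single rule.apply(r) call is one sequential branch walk:
        // there is no way to split that one traversal across workers.
    }
}
```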