Decision Tree

ID3 (Iterative Dichotomiser 3)

Posted by Pritish Roy on November 20, 2020

decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but are also a popular tool in machine learning.

A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from the root to leaf represent classification rules. 

Some of the advantages of using a decision tree; They are simple to understand and interpret. Easy to understand decision tree models and interpret for a small-scaled tree. Have value even with little hard data. Important insights can be generated based on experts describing a situation (its alternatives, probabilities, and costs) and their preferences for outcomes. Help determine worst, best and expected values for different scenarios. Use a white-box model. If a given result is provided by a model.Can be combined with other decision techniques.

In order to select the optimal splitting feature a* from the dataset A, as the splitting process proceeds, we want samples within each node to have an increasing tendency to belong to the same class, i.e increasing purity. The three measurements of purity are information gain, gain ratio and Gini index.

Information gain is a synonym for Kullback–Leibler divergence; the amount of information gained about a random variable or signal from observing another random variable. However, in the context of decision trees, the term is sometimes used synonymously with mutual information, which is the conditional expected value of the Kullback–Leibler divergence of the univariate probability distribution of one variable from the conditional distribution of this variable given the other one. 

Information Entropy: Ent(D)

Entropy: The entropy of a dataset, is a measure the impurity, of the dataset Entropy can also be thought, as a measure of uncertainty. We should try to minimize, the Entropy. The goal of machine learning models is to reduce uncertainty or entropy, as far as possible.
  • Assume there are m classes in total: C1, C2, ....., Cm
  • p_{i}: the ratio of examples belonging to class C_{i} in the dataset D
  • Information entropy (if p=0, p*log_{2} p = 0)
  • Ent(D) = - sum of p_{i} * log_{2} p_{i} from i=1 to m
  • less Ent(D) purer node
Information Gain: Gain(D, a)

Information gain is a measure of, how much information, a feature gives us about the classes. Decision Trees algorithm, will always try, to maximize information gain. Feature, that perfectly partitions the data, should give maximum information. A feature, with the highest Information gain, will be used for split first.

  • Assume for feature a, there are V possible values: {V1, V2, ......, Vv},
  • make branches according to different values of a
  • D_{v}_{j}: all samples in D with the {a} feature value equal to v_{j}
  • Gain(D,a) = Ent(D) - sum of |D_{v}_{j}| * Ent(D_{v}_{j}) / |D| from j=1 to V, |D| measures the number of elements in D
  • Select the optimal splitting feature by the one having the largest information gain
In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4. 5 algorithm, and is typically used in the machine learning and natural language processing domains.

The ID3 algorithm begins with the original set S as the root node. On each iteration of the algorithm, it iterates through every unused attribute of the set S and calculates the entropy H(S) or the information gain IG(S) of that attribute. It then selects the attribute which has the smallest entropy (or largest information gain) value. The set S is then split or partitioned by the selected attribute to produce subsets of the data. The algorithm continues to recurse on each subset, considering only attributes never selected before.

Stopping Criteria:

  • every element in the subset belongs to the same class; in which case the node is turned into a leaf node and labelled with the class of the examples.
  • if there are no more attributes to be selected, but the examples still do not belong to the same class. In this case, the node is made a leaf node and labelled with the most common class of the examples in the subset.
  • if there are no examples in the subset, which happens when no example in the parent set was found to match a specific value of the selected attribute. Then a leaf node is created and labelled with the most common class of the examples in the parent node's set.


ID3 (Examples, Target_Attribute, Attributes)
    Create a root node for the tree
    If all examples are positive, Return the single-node tree Root, with label = +.
    If all examples are negative, Return the single-node tree Root, with label = -.
    If number of predicting attributes is empty, then Return the single node tree Root,
    with label = most common value of the target attribute in the examples.
    Otherwise Begin
        A <- The Attribute that best classifies examples.
        Decision Tree attribute for Root = A.
        For each possible value, V_{i}, of A,
Add a new tree branch below Root, corresponding to the test A = V_{i}.
Let Examples(V_{i}) be the subset of examples that have the value V_{i} for A
If Examples(V_{i}) is empty
Then below this new branch add a leaf node with label = most common target value in the examples Else below this new branch add the subtree ID3 (Examples(V_{i}), Target_Attribute, Attributes – {A})
End Return Root
Decision Tree Implementation in Python:

@author: roycek

import csv
import sys
from math import log2

label = -1 # decision label index
decision = 'yes' # decision
max_depth = 999 # default max depth of decision tree
delimiter = '\t' # data file separated by

class Tree:
def __init__(self):
self.parent = {}
self.child = {}

def __repr__(self):
return f'{self.parent}'

def read_csv(file) -> tuple:
"""input file reader and returns rows and columns in list"""
with open(file, newline='') as line:
data_file = csv.reader(line, delimiter=delimiter)
data = list(data_file)
return data[1:], data[:1][0]

def get_leaf_decision(l_yes, l_no, parent_decision) -> int:
"""returns leaf decision with majority decision if not equal else returns it's parent decision"""
leaf_value = 0
if l_yes > l_no:
leaf_value = 1
elif l_yes == l_no:
leaf_value = 1 if parent_decision else 0
return leaf_value

def calculate_entropy_info(label_yes, label_no, parent_decision, info=False, attribute_dict=None, node=None,
data=None, info_gain=None) -> tuple:
"""takes the decision label's and return's information gain / entropy depending on callback parameter"""
entropy = 0.0
if label_yes != 0 and label_no != 0:
entropy = - label_yes * log2(label_yes) - label_no * log2(label_no)
if info:
info_gain -= (attribute_dict[node] * entropy) / len(data)
return info_gain if info else entropy, get_leaf_decision(label_yes, label_no, parent_decision)

def get_label_value(data, rows, columns, node=None) -> tuple:
"""returns tuple with count of total number of decisions"""
yes, no = 0, 0
for row in rows:
if not node:
if data[row][columns] == decision:
yes += 1
no += 1
if data[row][columns] == node:
if data[row][label] == decision:
yes += 1
no += 1
return yes, no

def get_entropy(data, rows, headers, parent_decision) -> tuple:
"""returns calculated entropy and leaf value"""
yes, no = get_label_value(data, rows, len(headers))
entropy, leaf_value = calculate_entropy_info(yes / (yes + no), no / (yes + no), parent_decision)
return entropy, leaf_value

def get_attribute(data, rows, column, attribute_structure, return_list=False) -> dict or list:
"""returns list of unique attributes of columns if flagged true else returns dictionary of the column attributes
and the number of decision occurrence"""
for row in rows:
node = data[row][column]
if return_list and node not in attribute_structure:
if not return_list:
if node not in attribute_structure:
attribute_structure[node] = 1
attribute_structure[node] += 1
return attribute_structure

def get_information_gain(data, rows, columns, headers, parent_decision) -> tuple:
"""returns the maximum information gained by column, its index and leaf value"""
information_gain = 0.0
index = -999
entropy, leaf_value = get_entropy(data, rows, headers, parent_decision)
if entropy == 0:
return information_gain, index, leaf_value

for column in columns:
info = entropy
attribute_dict = get_attribute(data, rows, column, attribute_structure={})
for node in attribute_dict:
yes, no = get_label_value(data, rows, column, node)
info = calculate_entropy_info(yes / (yes + no), no / (yes + no), parent_decision, info=True,
attribute_dict=attribute_dict, node=node, data=data,
if info >= information_gain:
information_gain, index = info, column
return information_gain, index, leaf_value

def input_decision(root, leaf_value):
"""inserts leaf value as either true or false"""
root.parent = True if leaf_value == 1 else False

def get_major(parent_decision) -> bool:
"""returns the major parent decision in boolean form"""
return True if parent_decision[0] > parent_decision[1] else False

def decision_tree_builder(data, rows, columns, headers, maximum_depth, parent_decision, depth=0):
"""builds decision tree in recursive manner by splitting the dataset and with respect to maximum depth based on
information gain and terminating when information gain is 0 or when we have reached the maximum depth"""
information_gain, index, leaf_value = get_information_gain(data, rows, columns, headers, parent_decision)

dt_id3 = Tree()
if information_gain == 0 or depth == maximum_depth:
input_decision(dt_id3, leaf_value)
return dt_id3

new_dataset_header = columns.copy()
attr_list = get_attribute(data, rows, index, attribute_structure=[], return_list=True)
for node in attr_list:
major_parent = get_major(get_label_value(data, rows, len(headers)))
new_dataset_split = []
for row in rows:
if data[row][index] == node:
tree_builder = decision_tree_builder(data, new_dataset_split, new_dataset_header, headers,
maximum_depth, major_parent, depth + 1)
dt_id3.parent[headers[index]] = dt_id3.child
dt_id3.child[node] = tree_builder
return dt_id3

def get_decision(input_data, leaf_answer):
"""returns decision if boolean decision found else calls function recursively to classify until answer is found"""
return classify_data(input_data, leaf_answer) if type(leaf_answer.parent) != bool else leaf_answer.parent

def get_leaf(decision_tree, key, input_data):
"""returns internal nodes from decision tree"""
return decision_tree.parent[key][input_data[key]]

def classify_data(input_data, decision_tree):
"""returns decision to classify input data based on the decision tree"""
answer = None
for key in list(input_data.keys()):
if key in list(decision_tree.keys()):
answer = decision_tree[key][input_data[key]]
except AttributeError:
answer = get_leaf(decision_tree, key, input_data)
return get_decision(input_data, answer)
except AttributeError:
if key in decision_tree.parent:
answer = get_leaf(decision_tree, key, input_data)
return get_decision(input_data, answer)

def accuracy_metric(actual, predicted, correct=0) -> float:
"""calculate accuracy of the dataset comparing the actual decision from the dataset vs the predicted
from the decision tree"""
for i in range(len(actual)):
actual[i] = True if actual[i] == decision else False
if actual[i] == predicted[i]:
correct += 1
return round(correct / len(actual) * 100.0, 2)

def classify_dataset(data, column, dt_id3) -> tuple:
"""classifier accepts training and test dataset, puts the actual decisions in a list and calls the classify_data
to get predicted decisions"""
predicted, actual_decision = [], []
for row in data:
predicted.append(classify_data(dict(zip(column[:label], row)), dt_id3.parent))
return actual_decision, predicted

def id3(argv, input_depth):
"""main id3 function calls the decision tree builder and prints, also calls other functions to classify
training and test dataset"""
data, column = read_csv(argv[:1][0])
if len(argv) > 1:
if isinstance(int(argv[1]), int):
input_depth = int(argv[1])
print(f'Max Depth of Decision Tree: {argv[1]}')
if input_depth == 0:
print(f'\nMax Depth: {input_depth} -> NOT ALLOWED!!!\ni.e max_depth > 0')
raise Exception
except ValueError:
decision_tree_id3 = decision_tree_builder(data, [i for i in range(0, len(data))],
[i for i in range(0, len(column) - 1)],
column[:label], input_depth, parent_decision=False)
print(f'Decision Tree: --> {decision_tree_id3.__repr__()}')
actual_decision, predicted = classify_dataset(data, column, decision_tree_id3)
accuracy_training = accuracy_metric(actual_decision, predicted)

print(f'Training Accuracy on {argv[:1][0]}: {accuracy_training}')

if len(argv) > 2:
test_data, test_column = read_csv(argv[label])
actual_decision, predicted = classify_dataset(test_data, test_column, decision_tree_id3)
accuracy_test = accuracy_metric(actual_decision, predicted)
print(f'Test Accuracy on {argv[label]}: {accuracy_test}')
except Exception as e:
print(f'{str(e).upper()}\nFollow command:\npython <training file> --mandatory '
f'<maximum depth of tree> --optional <test file> --optional')

if __name__ == '__main__':
id3(sys.argv[1:], max_depth)


Decision Tree: --> {'outlook': {'sunny': {'humidity': {'high': False, 'normal': True}}, 'overcast': True, 'rain': {'wind': {'weak': True, 'strong': False}}}}