# Decision Tree

## ID3 (Iterative Dichotomiser 3)

Posted by Pritish Roy on November 20, 2020

A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal, but are also a popular tool in machine learning.

A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from the root to leaf represent classification rules.

Some of the advantages of using a decision tree:
• They are simple to understand and interpret, especially for a small-scale tree.
• They have value even with little hard data; important insights can be generated from experts describing a situation (its alternatives, probabilities, and costs) and their preferences for outcomes.
• They help determine worst, best, and expected values for different scenarios.
• They use a white-box model, so the reasoning behind a given result is easy to inspect.
• They can be combined with other decision techniques.

To select the optimal splitting feature a* from the feature set A, we want the samples within each node, as the splitting process proceeds, to have an increasing tendency to belong to the same class, i.e. increasing purity. Three common measures of purity are information gain, gain ratio, and the Gini index.

Information gain is a synonym for Kullback–Leibler divergence: the amount of information gained about a random variable or signal from observing another random variable. However, in the context of decision trees, the term is sometimes used synonymously with mutual information, which is the conditional expected value of the Kullback–Leibler divergence of the univariate probability distribution of one variable from the conditional distribution of this variable given the other one.

Information Entropy: Ent(D)

Entropy: The entropy of a dataset is a measure of its impurity; it can also be thought of as a measure of uncertainty. The goal of a machine learning model is to reduce uncertainty (entropy) as far as possible, so we try to minimize it.
• Assume there are m classes in total: C_{1}, C_{2}, ..., C_{m}
• p_{i}: the ratio of examples belonging to class C_{i} in the dataset D
• Information entropy (with the convention that p * log_{2} p = 0 when p = 0):
• Ent(D) = - sum of p_{i} * log_{2} p_{i} from i=1 to m
• The smaller Ent(D), the purer the node
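The definition above can be checked with a short snippet. This is a minimal sketch; the `entropy` helper is illustrative, not part of any library:

```python
from math import log2

def entropy(labels):
    """Shannon entropy Ent(D) of a list of class labels, in bits."""
    n = len(labels)
    ent = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        ent -= p * log2(p)  # classes with p == 0 never appear in set(labels)
    return ent

# A pure node has entropy 0; a 50/50 split of two classes has entropy 1.
print(entropy(['yes', 'yes', 'yes']))       # 0.0
print(entropy(['yes', 'no', 'yes', 'no']))  # 1.0
```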
Information Gain: Gain(D, a)

Information gain measures how much information a feature gives us about the classes. The decision tree algorithm always tries to maximize information gain: a feature that perfectly partitions the data gives the maximum information, so the feature with the highest information gain is used for the first split.

• Assume feature a has V possible values: {v_{1}, v_{2}, ..., v_{V}}
• Make branches according to the different values of a
• D_{v_{j}}: all samples in D whose value of feature a equals v_{j}
• Gain(D, a) = Ent(D) - sum of (|D_{v_{j}}| / |D|) * Ent(D_{v_{j}}) from j=1 to V, where |D| is the number of elements in D
• Select the optimal splitting feature as the one with the largest information gain
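The Gain(D, a) formula can be computed directly. A minimal sketch; the toy 'outlook'/'play' data and the helper names are made up for illustration:

```python
from math import log2

def entropy(labels):
    """Shannon entropy Ent(D) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(values, labels):
    """Gain(D, a) = Ent(D) - sum over v of (|D_v| / |D|) * Ent(D_v)."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        # D_v: labels of the samples whose feature value equals v
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

outlook = ['sunny', 'sunny', 'overcast', 'rain']
wind    = ['weak', 'strong', 'weak', 'strong']
play    = ['no', 'no', 'yes', 'yes']

print(information_gain(outlook, play))  # 1.0 -> a perfect split
print(information_gain(wind, play))     # 0.0 -> wind tells us nothing here
```

Note how a feature that perfectly partitions the data removes all uncertainty (gain equals the full entropy), while a feature whose partitions are as mixed as the original set gains nothing.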

In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is typically used in the machine learning and natural language processing domains.

The ID3 algorithm begins with the original set S as the root node. On each iteration of the algorithm, it iterates through every unused attribute of the set S and calculates the entropy H(S) or the information gain IG(S) of that attribute. It then selects the attribute which has the smallest entropy (or largest information gain) value. The set S is then split or partitioned by the selected attribute to produce subsets of the data. The algorithm continues to recurse on each subset, considering only attributes never selected before.

Stopping Criteria:

• every element in the subset belongs to the same class; in which case the node is turned into a leaf node and labelled with the class of the examples.
• if there are no more attributes to be selected, but the examples still do not belong to the same class. In this case, the node is made a leaf node and labelled with the most common class of the examples in the subset.
• if there are no examples in the subset, which happens when no example in the parent set was found to match a specific value of the selected attribute. Then a leaf node is created and labelled with the most common class of the examples in the parent node's set.

Pseudocode:

```
ID3(Examples, Target_Attribute, Attributes)
    Create a root node for the tree
    If all examples are positive, return the single-node tree Root, with label = +
    If all examples are negative, return the single-node tree Root, with label = -
    If the list of predicting attributes is empty, then return the single-node tree Root,
        with label = most common value of the target attribute in the examples
    Otherwise begin
        A <- the attribute that best classifies the examples
        Decision tree attribute for Root = A
        For each possible value V_{i} of A:
            Add a new tree branch below Root, corresponding to the test A = V_{i}
            Let Examples(V_{i}) be the subset of examples that have the value V_{i} for A
            If Examples(V_{i}) is empty
                Then below this new branch add a leaf node with label = most common target value in the examples
            Else below this new branch add the subtree ID3(Examples(V_{i}), Target_Attribute, Attributes - {A})
        End
    Return Root
```
Decision Tree Implementation in Python:

`"""@author: roycek"""import csvimport sysfrom math import log2label = -1  # decision label indexdecision = 'yes'  # decisionmax_depth = 999  # default max depth of decision treedelimiter = '\t'  # data file separated byclass Tree:    def __init__(self):        self.parent = {}        self.child = {}    def __repr__(self):        return f'{self.parent}'def read_csv(file) -> tuple:    """input file reader and returns rows and columns in list"""    with open(file, newline='') as line:        data_file = csv.reader(line, delimiter=delimiter)        data = list(data_file)        return data[1:], data[:1]        def get_leaf_decision(l_yes, l_no, parent_decision) -> int:    """returns leaf decision with majority decision if not equal else returns it's parent decision"""    leaf_value = 0    if l_yes > l_no:        leaf_value = 1    elif l_yes == l_no:        leaf_value = 1 if parent_decision else 0    return leaf_valuedef calculate_entropy_info(label_yes, label_no, parent_decision, info=False, attribute_dict=None, node=None,                           data=None, info_gain=None) -> tuple:    """takes the decision label's and return's information gain / entropy depending on callback parameter"""    entropy = 0.0    if label_yes != 0 and label_no != 0:        entropy = - label_yes * log2(label_yes) - label_no * log2(label_no)        if info:            info_gain -= (attribute_dict[node] * entropy) / len(data)    return info_gain if info else entropy, get_leaf_decision(label_yes, label_no, parent_decision)def get_label_value(data, rows, columns, node=None) -> tuple:    """returns tuple with count of total number of decisions"""    yes, no = 0, 0    for row in rows:        if not node:            if data[row][columns] == decision:                yes += 1            else:                no += 1        else:            if data[row][columns] == node:                if data[row][label] == decision:                    yes += 1                else:                    no += 1    
return yes, nodef get_entropy(data, rows, headers, parent_decision) -> tuple:    """returns calculated entropy and leaf value"""    yes, no = get_label_value(data, rows, len(headers))    entropy, leaf_value = calculate_entropy_info(yes / (yes + no), no / (yes + no), parent_decision)    return entropy, leaf_valuedef get_attribute(data, rows, column, attribute_structure, return_list=False) -> dict or list:    """returns list of unique attributes of columns if flagged true else returns dictionary of the column attributes    and the number of decision occurrence"""    for row in rows:        node = data[row][column]        if return_list and node not in attribute_structure:            attribute_structure.append(node)        if not return_list:            if node not in attribute_structure:                attribute_structure[node] = 1            else:                attribute_structure[node] += 1    return attribute_structuredef get_information_gain(data, rows, columns, headers, parent_decision) -> tuple:    """returns the maximum information gained by column, its index and leaf value"""    information_gain = 0.0    index = -999    entropy, leaf_value = get_entropy(data, rows, headers, parent_decision)    if entropy == 0:        return information_gain, index, leaf_value    for column in columns:        info = entropy        attribute_dict = get_attribute(data, rows, column, attribute_structure={})        for node in attribute_dict:            yes, no = get_label_value(data, rows, column, node)            info = calculate_entropy_info(yes / (yes + no), no / (yes + no), parent_decision, info=True,                                          attribute_dict=attribute_dict, node=node, data=data,                                          info_gain=info)        if info >= information_gain:            information_gain, index = info, column    return information_gain, index, leaf_valuedef input_decision(root, leaf_value):    """inserts leaf value as either true or false"""    
root.parent = True if leaf_value == 1 else Falsedef get_major(parent_decision) -> bool:    """returns the major parent decision in boolean form"""    return True if parent_decision > parent_decision else Falsedef decision_tree_builder(data, rows, columns, headers, maximum_depth, parent_decision, depth=0):    """builds decision tree in recursive manner by splitting the dataset and with respect to maximum depth based on    information gain and terminating when information gain is 0 or when we have reached the maximum depth"""    information_gain, index, leaf_value = get_information_gain(data, rows, columns, headers, parent_decision)    dt_id3 = Tree()    if information_gain == 0 or depth == maximum_depth:        input_decision(dt_id3, leaf_value)        return dt_id3    new_dataset_header = columns.copy()    new_dataset_header.remove(index)    attr_list = get_attribute(data, rows, index, attribute_structure=[], return_list=True)    for node in attr_list:        major_parent = get_major(get_label_value(data, rows, len(headers)))        new_dataset_split = []        for row in rows:            if data[row][index] == node:                new_dataset_split.append(row)        tree_builder = decision_tree_builder(data, new_dataset_split, new_dataset_header, headers,                                             maximum_depth, major_parent, depth + 1)        dt_id3.parent[headers[index]] = dt_id3.child        dt_id3.child[node] = tree_builder    return dt_id3def get_decision(input_data, leaf_answer):    """returns decision if boolean decision found else calls function recursively to classify until answer is found"""    return classify_data(input_data, leaf_answer) if type(leaf_answer.parent) != bool else leaf_answer.parentdef get_leaf(decision_tree, key, input_data):    """returns internal nodes from decision tree"""    return decision_tree.parent[key][input_data[key]]def classify_data(input_data, decision_tree):    """returns decision to classify input data based on the 
decision tree"""    answer = None    for key in list(input_data.keys()):        try:            if key in list(decision_tree.keys()):                try:                    answer = decision_tree[key][input_data[key]]                except AttributeError:                    answer = get_leaf(decision_tree, key, input_data)                finally:                    return get_decision(input_data, answer)        except AttributeError:            if key in decision_tree.parent:                answer = get_leaf(decision_tree, key, input_data)                return get_decision(input_data, answer)def accuracy_metric(actual, predicted, correct=0) -> float:    """calculate accuracy of the dataset comparing the actual decision from the dataset vs the predicted    from the decision tree"""    for i in range(len(actual)):        actual[i] = True if actual[i] == decision else False        if actual[i] == predicted[i]:            correct += 1    return round(correct / len(actual) * 100.0, 2)def classify_dataset(data, column, dt_id3) -> tuple:    """classifier accepts training and test dataset, puts the actual decisions in a list and calls the classify_data    to get predicted decisions"""    predicted, actual_decision = [], []    for row in data:        actual_decision.append(row[label])        predicted.append(classify_data(dict(zip(column[:label], row)), dt_id3.parent))    return actual_decision, predicteddef id3(argv, input_depth):    """main id3 function calls the decision tree builder and prints, also calls other functions to classify    training and test dataset"""    try:        data, column = read_csv(argv[:1])        if len(argv) > 1:            try:                if isinstance(int(argv), int):                    input_depth = int(argv)                    print(f'Max Depth of Decision Tree: {argv}')                    if input_depth == 0:                        print(f'\nMax Depth: {input_depth} -> NOT ALLOWED!!!\ni.e max_depth > 0')                        raise 
Exception            except ValueError:                pass        decision_tree_id3 = decision_tree_builder(data, [i for i in range(0, len(data))],                                                  [i for i in range(0, len(column) - 1)],                                                  column[:label], input_depth, parent_decision=False)        print('-------------------------------------------------')        print(f'Decision Tree: --> {decision_tree_id3.__repr__()}')        print('-------------------------------------------------')        actual_decision, predicted = classify_dataset(data, column, decision_tree_id3)        accuracy_training = accuracy_metric(actual_decision, predicted)        print(f'Training Accuracy on {argv[:1]}: {accuracy_training}')        if len(argv) > 2:            test_data, test_column = read_csv(argv[label])            actual_decision, predicted = classify_dataset(test_data, test_column, decision_tree_id3)            accuracy_test = accuracy_metric(actual_decision, predicted)            print(f'Test Accuracy on {argv[label]}: {accuracy_test}')    except Exception as e:        print(f'{str(e).upper()}\nFollow command:\npython decisiontreeassignment.py <training file> --mandatory '              f'<maximum depth of tree> --optional <test file> --optional')if __name__ == '__main__':    id3(sys.argv[1:], max_depth)`

OUTPUT:

`Decision Tree: --> {'outlook': {'sunny': {'humidity': {'high': False, 'normal': True}}, 'overcast': True, 'rain': {'wind': {'weak': True, 'strong': False}}}}` 
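The printed nested dict reads as a set of root-to-leaf classification rules. As an illustrative sketch (the `classify` helper is hypothetical, not part of the implementation above), such a dict could be walked to classify a new sample:

```python
def classify(tree, sample):
    """Walk the nested-dict tree until a boolean leaf decision is reached."""
    if not isinstance(tree, dict):
        return tree  # leaf: a boolean decision
    attribute = next(iter(tree))  # each internal node tests exactly one attribute
    return classify(tree[attribute][sample[attribute]], sample)

tree = {'outlook': {'sunny': {'humidity': {'high': False, 'normal': True}},
                    'overcast': True,
                    'rain': {'wind': {'weak': True, 'strong': False}}}}

print(classify(tree, {'outlook': 'sunny', 'humidity': 'normal'}))  # True
print(classify(tree, {'outlook': 'rain', 'wind': 'strong'}))       # False
```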