Decision Tree (ID3 / CART)
Pseudo-code
function build_tree(X, y):
    if all values in y are the same then
        return leaf node with that value
    if no features left then
        return leaf node with most common value in y
    best_feature <- choose feature with highest information gain
    create node with best_feature
    for each possible value v of best_feature:
        X_sub, y_sub <- subset of data where feature == v
        if X_sub is empty then
            child <- leaf node with most common value in y
        else
            child <- build_tree(X_sub, y_sub)
        attach child to node with condition (feature == v)
    return node

function predict(x, tree):
    while tree is not a leaf:
        feature <- tree.feature
        value <- x[feature]
        move to the child node matching value
    return tree.value
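The pseudocode above can be turned into a minimal runnable sketch. This version is an assumption-laden illustration, not the canonical ID3: it assumes categorical features stored as dicts, represents internal nodes as plain dicts, and handles the "empty subset" case by falling back to the majority label (stored as a `default`). The names `build_tree`, `information_gain`, and `predict` mirror the pseudocode but are otherwise my own.

```python
import math
from collections import Counter

def entropy(labels):
    # H(y) = -sum p * log2(p) over the classes present in `labels`.
    n = len(labels)
    return 0.0 if len(set(labels)) <= 1 else -sum(
        (c / n) * math.log2(c / n) for c in Counter(labels).values()
    )

def information_gain(rows, labels, feature):
    # Entropy before the split minus the weighted entropy after
    # splitting on `feature`.
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - after

def build_tree(rows, labels, features):
    if len(set(labels)) == 1:            # all labels the same -> leaf
        return labels[0]
    if not features:                     # no features left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {
        "feature": best,
        "children": {},
        # Majority label here, used when prediction meets an unseen value.
        "default": Counter(labels).most_common(1)[0][0],
    }
    remaining = [f for f in features if f != best]
    for v in set(row[best] for row in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == v]
        node["children"][v] = build_tree(
            [r for r, _ in sub], [l for _, l in sub], remaining
        )
    return node

def predict(x, tree):
    # Walk down until we hit a leaf (anything that is not a dict node).
    while isinstance(tree, dict):
        tree = tree["children"].get(x[tree["feature"]], tree["default"])
    return tree

# Tiny toy dataset: color alone determines the answer.
rows = [
    {"color": "red", "size": "big"},
    {"color": "red", "size": "small"},
    {"color": "green", "size": "big"},
    {"color": "green", "size": "small"},
]
labels = ["yes", "yes", "no", "no"]
tree = build_tree(rows, labels, ["color", "size"])
print(predict({"color": "red", "size": "small"}, tree))   # yes
print(predict({"color": "green", "size": "big"}, tree))   # no
```

The tree picks `color` as the root because splitting on it yields two pure groups (maximum information gain), while `size` yields no gain at all.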
- root: the starting point of the tree, like the first question you ask.
- node: a point in the tree where a decision is made or a final answer is given.
- feature: something you can ask about the data, like 'age', 'color', or 'size'.
- value: the specific answer to a feature, like 'age = 20' or 'color = red'.
- target: what we want to predict, like 'yes/no' or a number.
- X: the input data (all features). Think of it as a table of information.
- y: the correct answers (labels) for the data.
- None: represents missing or unknown data.
- entropy: measures how mixed or messy the data is. High entropy means very mixed; low entropy means very clean.
- gain: tells us how good a feature is at splitting the data into cleaner groups.
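The entropy and gain entries above have short formulas behind them: entropy is `-sum(p * log2(p))` over the class proportions, and gain is the parent's entropy minus the weighted entropy of the groups a split produces. A small self-contained sketch (the example label lists are made up for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    # -sum p * log2(p); a single-class list is perfectly clean (0.0).
    n = len(labels)
    return 0.0 if len(set(labels)) <= 1 else -sum(
        (c / n) * math.log2(c / n) for c in Counter(labels).values()
    )

print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 -> perfectly clean
print(entropy(["yes", "yes", "no", "no"]))    # 1.0 -> maximally mixed (2 classes)

# Gain: parent entropy minus the size-weighted entropy of the split groups.
parent = ["yes", "yes", "no", "no"]
left, right = ["yes", "yes"], ["no", "no"]
weighted = (len(left) / len(parent)) * entropy(left) \
         + (len(right) / len(parent)) * entropy(right)
print(entropy(parent) - weighted)  # 1.0 -> a perfect split
```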
Recipe Steps
Think of this like a guessing game: you want to guess something by asking smart questions.

1. Start with all your data and ask: which question (feature) splits the data best into clean groups?
2. Pick the best question and split the data into smaller groups based on the answers.
3. Repeat the same idea for each group: keep asking the best question for that smaller group.
4. Stop when a group becomes pure (all answers are the same) or there is nothing left to split.
5. To predict, start at the top and follow the questions like a path until you reach an answer.
Questions & Answers
- Choosing the best split feature at each step is the heart of the algorithm: ID3 uses information gain, while CART uses Gini impurity (or variance reduction for regression).
- If the tree grows too deep, it can memorize the training data (overfitting) instead of learning general patterns; pruning or a depth limit helps.
- Missing values need to be handled carefully, for example by substituting the most common value for that feature.
- Decision trees are easy to understand because they read like simple if-then rules.
- They can be used for both classification and regression tasks.
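The classification/regression point comes down to what a leaf stores: the same splitting machinery applies, but a classification leaf returns the majority class while a regression leaf returns the mean of the numeric targets. A minimal sketch of the two leaf rules (the function names are my own, chosen for illustration):

```python
from collections import Counter

def classification_leaf(labels):
    # Majority vote: the most common class label wins.
    return Counter(labels).most_common(1)[0][0]

def regression_leaf(values):
    # Mean of the numeric targets that fell into this leaf.
    return sum(values) / len(values)

print(classification_leaf(["yes", "no", "yes"]))  # yes
print(regression_leaf([2.0, 4.0, 6.0]))           # 4.0
```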
Related Algorithms
Gradient Descent
Gradient Descent is a way for machines to learn from data by adjusting their predictions to get closer to the correct answer. It's like walking down a hill in small steps until you reach the bottom, which is the optimal solution.
K-Means Clustering
K-Means Clustering groups data points into k clusters by repeatedly assigning points to the nearest center and updating those centers.
Longest Common Subsequence (LCS)
The Longest Common Subsequence (LCS) finds the longest sequence of characters that appear in the same order in both sequences (not necessarily next to each other).