Decision Tree (ID3 / CART)
Pseudo-code
function build_tree(X, y):
    if all values in y are the same then
        return leaf node with that value
    if no features left then
        return leaf node with most common value in y
    best_feature <- choose feature with highest information gain
    create node with best_feature
    for each possible value v of best_feature:
        X_sub, y_sub <- subset of data where feature == v
        if X_sub is empty then
            child <- leaf node with most common value in y
        else
            child <- build_tree(X_sub, y_sub)
        attach child to node with condition (feature == v)
    return node

function predict(x, tree):
    while tree is not a leaf:
        feature <- tree.feature
        value <- x[feature]
        move to the child node matching value
    return tree.value
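The pseudocode above can be turned into a minimal runnable sketch. This version is an assumption-laden illustration, not the canonical ID3: it assumes categorical features stored as dicts, represents internal nodes as plain dicts, and handles the "empty subset" case by falling back to the majority label (stored as a `default`). The names `build_tree`, `information_gain`, and `predict` mirror the pseudocode but are otherwise my own.

```python
import math
from collections import Counter

def entropy(labels):
    # H(y) = -sum p * log2(p) over the classes present in `labels`.
    n = len(labels)
    return 0.0 if len(set(labels)) <= 1 else -sum(
        (c / n) * math.log2(c / n) for c in Counter(labels).values()
    )

def information_gain(rows, labels, feature):
    # Entropy before the split minus the weighted entropy after
    # splitting on `feature`.
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - after

def build_tree(rows, labels, features):
    if len(set(labels)) == 1:            # all labels the same -> leaf
        return labels[0]
    if not features:                     # no features left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {
        "feature": best,
        "children": {},
        # Majority label here, used when prediction meets an unseen value.
        "default": Counter(labels).most_common(1)[0][0],
    }
    remaining = [f for f in features if f != best]
    for v in set(row[best] for row in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == v]
        node["children"][v] = build_tree(
            [r for r, _ in sub], [l for _, l in sub], remaining
        )
    return node

def predict(x, tree):
    # Walk down until we hit a leaf (anything that is not a dict node).
    while isinstance(tree, dict):
        tree = tree["children"].get(x[tree["feature"]], tree["default"])
    return tree

# Tiny toy dataset: color alone determines the answer.
rows = [
    {"color": "red", "size": "big"},
    {"color": "red", "size": "small"},
    {"color": "green", "size": "big"},
    {"color": "green", "size": "small"},
]
labels = ["yes", "yes", "no", "no"]
tree = build_tree(rows, labels, ["color", "size"])
print(predict({"color": "red", "size": "small"}, tree))   # yes
print(predict({"color": "green", "size": "big"}, tree))   # no
```

The tree picks `color` as the root because splitting on it yields two pure groups (maximum information gain), while `size` yields no gain at all.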
- root: the starting point of the tree, like the first question you ask.
- node: a point in the tree where a decision is made or a final answer is given.
- feature: something you can ask about the data, like 'age', 'color', or 'size'.
- value: the specific answer to a feature, like 'age = 20' or 'color = red'.
- target: what we want to predict, like 'yes/no' or a number.
- X: the input data (all features). Think of it as a table of information.
- y: the correct answers (labels) for the data.
- None: represents missing or unknown data.
- entropy: measures how mixed or messy the data is. High entropy means very mixed; low entropy means very clean.
- gain: tells us how good a feature is at splitting the data into cleaner groups.
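The entropy and gain entries above have short formulas behind them: entropy is `-sum(p * log2(p))` over the class proportions, and gain is the parent's entropy minus the weighted entropy of the groups a split produces. A small self-contained sketch (the example label lists are made up for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    # -sum p * log2(p); a single-class list is perfectly clean (0.0).
    n = len(labels)
    return 0.0 if len(set(labels)) <= 1 else -sum(
        (c / n) * math.log2(c / n) for c in Counter(labels).values()
    )

print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 -> perfectly clean
print(entropy(["yes", "yes", "no", "no"]))    # 1.0 -> maximally mixed (2 classes)

# Gain: parent entropy minus the size-weighted entropy of the split groups.
parent = ["yes", "yes", "no", "no"]
left, right = ["yes", "yes"], ["no", "no"]
weighted = (len(left) / len(parent)) * entropy(left) \
         + (len(right) / len(parent)) * entropy(right)
print(entropy(parent) - weighted)  # 1.0 -> a perfect split
```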
Recipe Steps
Think of this like a guessing game: you want to guess something by asking smart questions.

1. Start with all your data and ask: which question (feature) splits the data best into clean groups?
2. Pick the best question and split the data into smaller groups based on the answers.
3. Repeat the same idea for each group: keep asking the best question for that smaller group.
4. Stop when a group becomes pure (all answers are the same) or there is nothing left to split.
5. To predict, start at the top and follow the questions like a path until you reach an answer.
Questions & Answers
- Choosing the best split feature at each step is the heart of the algorithm: ID3 uses information gain, while CART uses Gini impurity (or variance reduction for regression).
- If the tree grows too deep, it can memorize the training data (overfitting) instead of learning general patterns; pruning or a depth limit helps.
- Missing values need to be handled carefully, for example by substituting the most common value for that feature.
- Decision trees are easy to understand because they read like simple if-then rules.
- They can be used for both classification and regression tasks.
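The classification/regression point comes down to what a leaf stores: the same splitting machinery applies, but a classification leaf returns the majority class while a regression leaf returns the mean of the numeric targets. A minimal sketch of the two leaf rules (the function names are my own, chosen for illustration):

```python
from collections import Counter

def classification_leaf(labels):
    # Majority vote: the most common class label wins.
    return Counter(labels).most_common(1)[0][0]

def regression_leaf(values):
    # Mean of the numeric targets that fell into this leaf.
    return sum(values) / len(values)

print(classification_leaf(["yes", "no", "yes"]))  # yes
print(regression_leaf([2.0, 4.0, 6.0]))           # 4.0
```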
Related Algorithms
Gradient Descent
Gradient Descent is a way for machines to learn from data by adjusting their predictions to get closer to the correct answer. It's like walking down a hill in small steps until you reach the bottom, which is the optimal solution.
K-Means Clustering
K-Means Clustering groups data points into k clusters by repeatedly assigning points to the nearest center and updating those centers.
Longest Common Subsequence (LCS)
The Longest Common Subsequence (LCS) finds the longest sequence of characters that appear in the same order in both sequences (not necessarily next to each other).