
# Decision Tree Solved | Id3 Algorithm (concept and numerical) | Machine Learning (2019)

A decision tree is a supervised learning method used for classification and regression. As the name suggests, it is a tree that assists us in decision-making.

A decision tree builds classification or regression models in the form of a tree structure. It breaks a data set down into smaller and smaller subsets while the associated tree is developed incrementally. The final result is a tree with decision nodes and leaf nodes. A decision node has two or more branches. A leaf node represents a classification or decision and cannot be split any further.

The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
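The structure described above can be sketched in a few lines of Python. This is only an illustration: the class names `Leaf` and `DecisionNode`, the `predict` helper, and the hand-built weather tree are all assumptions made for the example, not part of any library.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds the final class label and has no branches."""
    label: str


@dataclass
class DecisionNode:
    """Decision node: tests one attribute and has one branch per attribute value."""
    attribute: str
    branches: Dict[str, Union["DecisionNode", Leaf]] = field(default_factory=dict)


def predict(node, example):
    """Walk from the root down to a leaf, following the branch that matches the example."""
    while isinstance(node, DecisionNode):
        node = node.branches[example[node.attribute]]
    return node.label


# Hypothetical hand-built tree; the root node tests "outlook".
root = DecisionNode("outlook", {
    "sunny": DecisionNode("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rainy": DecisionNode("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

print(predict(root, {"outlook": "sunny", "humidity": "normal", "windy": "false"}))
```

Here the root and the two inner nodes are decision nodes with two or more branches each, and every path ends in a leaf that carries the decision.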
Common terms used with Decision trees:

Root Node: It represents the entire population or sample, which is further divided into two or more homogeneous sets.

Splitting: The process of dividing a node into two or more sub-nodes.

Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.

Leaf / Terminal Node: A node that does not split is called a leaf or terminal node.

Pruning: Removing the sub-nodes of a decision node is called pruning. It is the opposite of splitting.

Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.

Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are the children of the parent node.

How does a Decision Tree work?

A decision tree is a type of supervised learning algorithm (one with a pre-defined target variable) that is mostly used in classification problems. It works for both categorical and continuous input and output variables. In this technique, we split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter / differentiator among the input variables.
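To make the idea of splitting into homogeneous sets concrete, here is a minimal sketch. The toy `rows` data and the `split_by` helper are made up for illustration; they only show how different input variables separate the target labels more or less cleanly.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each row is (input variables, target label).
rows = [
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "overcast", "windy": "false"}, "yes"),
    ({"outlook": "overcast", "windy": "true"}, "yes"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
]


def split_by(rows, attribute):
    """Group the target labels by the value each row takes for `attribute`."""
    subsets = defaultdict(list)
    for features, label in rows:
        subsets[features[attribute]].append(label)
    return dict(subsets)


# Show the label counts inside each subset for both candidate splitters.
for attribute in ("outlook", "windy"):
    counts = {value: dict(Counter(labels))
              for value, labels in split_by(rows, attribute).items()}
    print(attribute, counts)
```

On this toy data, splitting on `outlook` gives two subsets that contain a single class each, while splitting on `windy` leaves every subset mixed, so `outlook` is the more significant splitter here.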

Advantages of Decision trees:

1. Easy to understand: Decision tree output is very easy to understand, even for people from a non-analytical background. It does not require any statistical knowledge to read and interpret. Its graphical representation is very intuitive, and users can easily relate it to their hypotheses.
2. Useful in data exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relations between two or more variables. With the help of decision trees, we can create new variables / features that have better power to predict the target variable. This makes them useful in the data exploration stage. For example, when we are working on a problem where information is spread across hundreds of variables, a decision tree helps identify the most significant ones.
3. Decision trees implicitly perform variable screening or feature selection.
4. Decision trees require relatively little effort from users for data preparation.
5. Less data cleaning required: They require less data cleaning than some other modeling techniques. To a fair degree, they are not influenced by outliers or missing values.
6. Data type is not a constraint: They can handle both numerical and categorical variables, and can also handle multi-output problems.

ID3 Algorithm

Key Factors:
Entropy: It is the measure of randomness or ‘impurity’ in the dataset.
Information Gain: It is the measure of the decrease in entropy after the dataset is split.
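
As a rough sketch of how these two quantities are computed, the code below uses the standard definitions: entropy is the negative sum of p * log2(p) over the class proportions p, and information gain is the entropy of the whole set minus the size-weighted entropy of the subsets produced by a split. The function names and the toy `rows` data are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute):
    """Entropy before the split minus the weighted entropy of the subsets after it."""
    subsets = defaultdict(list)
    for features, label in rows:
        subsets[features[attribute]].append(label)
    all_labels = [label for _, label in rows]
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(all_labels) - remainder


# Hypothetical toy data: each row is (input variables, target label).
rows = [
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "overcast", "windy": "false"}, "yes"),
    ({"outlook": "overcast", "windy": "true"}, "yes"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
]

# ID3 splits on the attribute with the highest information gain.
gains = {a: information_gain(rows, a) for a in ("outlook", "windy")}
best = max(gains, key=gains.get)
print(gains, best)
```

On this toy data the full set has 3 "yes" and 3 "no" labels, so its entropy is 1 bit. Splitting on `outlook` gains about 0.667 bits and splitting on `windy` gains about 0.082 bits, so `outlook` would be chosen as the root node.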
