Summary of Boosting Algorithm – Machine learning

The AdaBoost algorithm performs better than other algorithms by generating a hypothesis with a low training error derived through the combination of many different hypotheses into a single hypothesis. In addition, the combined hypothesis has a smaller variance compared to variance produced by the weak learner, WeakLearn. AdaBoost is effective in solving hard learning problems by promoting the weak learning algorithm, WeakLearn, performance on the difficult parts of the sample space. It also influences WeakLearn to alter its hypothesis in order to solve harder learning problems.

AdaBoost is of two distinct versions depending on their performance. AdaBoost.M1 is simpler and works best when the weak hypothesis has an error of less than ½, whereby the error of the combined hypothesis, hfin, becomes insignificant. AdaBoost.M1 works by providing sample distribution, Dt, to the WeakLearn, which then computes the combined hypothesis. The combined hypothesis is the weighted sum of the individual weak hypothesis predicting the label. The AdaBoost.M1 places emphasis on weak hypotheses with a lower error thus making it only slightly better than random guessing. The major limitation of AdaBoost.M1 is that, it cannot compute hypothesis with an error that is more than ½. It works within an error range of 1-1/k, where k is the number of possible labels.

The AdaBoost.M2 on the other hand uses the pseudo-loss instead of normal prediction error used by AdaBoost.M2. Pseudo-loss is a better measure of error computed by involving both the distribution examples and the incorrect labels. This gives this AdaBoost algorithm the ability to identify incorrect labels by focusing the WeakLearn on both the harder learning problems and the incorrect labels.

The AdaBoost.M2 works by providing the WeakLearner with a mislabel distribution, Dt, identified as, correct label, y1, or incorrect label, y. The WeakLearn then computes the weak hypothesis, ht, that is interpreted as correct and incorrect labels depending on the value of ht. From the interpretation, the pseudo-loss of the combined hypothesis is determined. The pseudo-loss of the hypothesis is minimized by assigning a value that is near 1 for the correct labels and a value near zero for the incorrect ones. The final hypothesis, hfn, is weighted mean of the weak hypothesis ht, values.

Three weak learning algorithms can work with the AdaBoost algorithm. The Find Attr Test, select a value from the distribution set given by the booster that has a specific attribute and has the minimum error or pseudo-loss. The other algorithm that works with AdaBoost is the FindDecRule, which works in two phases. In the first phase, the training sample is classified into the growing set (70%) which forms the grown list. In the second phase, the list is pruned by selecting examples with less error. The last algorithm is the Quinlan’s C4.5 decision tree algorithm that requires un-weighted training set and can only work with AdaBoost.M1 to reduce ordinary prediction error.

Another alternative method to AdaBoost is the ‘bagging’ algorithm that works in a similar way to AdaBoost. However, the difference arises in the mode of combining the many hypotheses into a single composite hypothesis. Bagging considers all the hypotheses in forming the final hypothesis while boosting only takes into account the hypotheses with minimal error or pseudo-loss. In addition, bagging uses independent random samples from the training set while boosting uses the distribution list directly, a method known as reweighting.