Bagging aims to build a diverse set of ensemble members by varying the training data.

We use a single machine learning algorithm, trained each time on a different sample of the same training dataset. The predictions made by the ensemble members are then combined using simple statistics, such as voting (for classification) or averaging (for regression).
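As a rough illustration of the combination step, the sketch below assumes each member's prediction is already available as a plain Python value. The helper names majority_vote and average are made up for this example and are not taken from any library.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return the label predicted by the most members.

    Ties go to the label that was seen first, which is how
    Counter.most_common orders equal counts.
    """
    return Counter(labels).most_common(1)[0][0]


def average(values):
    """Return the arithmetic mean of the members' numeric predictions."""
    return mean(values)


# Hypothetical predictions from three ensemble members for one input.
class_predictions = ["cat", "dog", "cat"]
value_predictions = [2.0, 3.0, 4.0]

print(majority_vote(class_predictions))  # classification: voting
print(average(value_predictions))        # regression: averaging
```

Voting suits class labels, while averaging suits numeric predictions; which one applies depends on the task rather than on the ensemble itself.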

Dataset Preparation

This section focuses on how the samples are prepared so that each model is trained on its own sample of the dataset.

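We draw rows from the dataset at random with replacement, so a given row may appear more than once in a sample, or not at all. A sample drawn this way is called a bootstrap sample.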
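Drawing with replacement can be sketched with the standard library alone. This example assumes each sample has the same number of rows as the original dataset, which is a common choice but is not stated above; bootstrap_sample is a hypothetical helper name.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random, with replacement.

    Assumption: the sample is the same size as the dataset.
    """
    return rng.choices(rows, k=len(rows))


rows = list(range(10))   # stand-in for the rows of a training dataset
rng = random.Random(1)   # seeded so the draws are repeatable

# One bootstrap sample per ensemble member.
samples = [bootstrap_sample(rows, rng) for _ in range(3)]
for sample in samples:
    # Sorted only to make repeated and missing rows easy to spot.
    print(sorted(sample))
```

Each printed sample will typically contain some rows several times and omit others, which is what makes the members differ from one another.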

Bagging can be broken down into three steps:

  1. Draw bootstrap samples of the training dataset
  2. Fit an unpruned decision tree on each sample
  3. Combine the predictions by voting or averaging
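The three parts listed above can be put together in a small end-to-end sketch. To stay self-contained it makes several assumptions that are not in the text: the data has a single numeric feature with integer class labels, the tree is a simplified hand-written one that picks splits by misclassification count (real libraries usually use Gini impurity or entropy), and the names fit_tree, predict_tree, fit_bagging and predict_bagging are all hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def errors(side):
    """Rows misclassified if this side predicts its majority label."""
    labels = [y for _, y in side]
    return len(labels) - Counter(labels).most_common(1)[0][1]


def fit_tree(rows):
    """Grow a tree on (x, label) rows with no pruning.

    Splitting continues until a node is pure or has one distinct x.
    A leaf is a bare label; an internal node is (threshold, left, right).
    """
    labels = [y for _, y in rows]
    xs = sorted({x for x, _ in rows})
    if len(set(labels)) == 1 or len(xs) == 1:
        return majority(labels)
    best = None
    for t in xs[:-1]:  # every threshold leaves both sides non-empty
        left = [r for r in rows if r[0] <= t]
        right = [r for r in rows if r[0] > t]
        cost = errors(left) + errors(right)
        if best is None or cost < best[0]:
            best = (cost, t, left, right)
    _, t, left, right = best
    return (t, fit_tree(left), fit_tree(right))


def predict_tree(tree, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def fit_bagging(rows, n_members, seed=0):
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        sample = rng.choices(rows, k=len(rows))  # 1. bootstrap sample
        members.append(fit_tree(sample))         # 2. unpruned tree
    return members


def predict_bagging(members, x):
    votes = [predict_tree(m, x) for m in members]
    return majority(votes)                       # 3. voting


# Toy data (an assumption for this sketch): label is 1 when x >= 5.
train = [(float(x), int(x >= 5)) for x in range(10)]
ensemble = fit_bagging(train, n_members=25)
print(predict_bagging(ensemble, 1.0), predict_bagging(ensemble, 8.0))
```

On this cleanly separable toy data, the ensemble should label points well below 5 as class 0 and points well above 5 as class 1. For a regression task, the voting in predict_bagging would be replaced by an average of the members' numeric predictions.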