Thematic Session 1

Poly-Bagging Predictors for Classification Modeling on Large Datasets

Francisco Louzada Neto (Des–UFSCar)*

On large datasets, classification modeling is one of the leading formal tools for supporting decision making. In financial studies, the core objective is to generate a score by which potential clients can be ranked by their probability of default. In the biomedical area, it is important to determine whether a patient has a disease. In the industrial area, defective components must be detected.

A critical factor is whether a classification model is accurate enough to provide correct classifications: for instance, of a client as a good or a bad payer, or of a patient as a healthy or a diseased individual.

In this context, the concept of bootstrap aggregating (bagging) arises. The basic idea, given the abundance of data, is to generate multiple classifiers by fitting the model to several bootstrap replicates of the dataset, obtaining the predicted values from each fitted model, and then combining these predictions into a single classification in order to improve classification accuracy.
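As an illustration of plain bagging, here is a minimal sketch, assuming a binary response, a single numeric predictor, and a depth-one decision tree (a "stump") as the base learner. The stump is a simplified stand-in for the regression trees mentioned in the abstract. The function names `fit_stump` and `bagging` and the default of 25 replicates are hypothetical choices for illustration only. Each model is fitted to a bootstrap replicate (n rows drawn with replacement), and the predictions are combined by majority vote.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Hypothetical base learner: a one-split tree on a single feature (labels 0/1)."""
    best = None
    for t in sorted(set(xs)):
        for left in (0, 1):
            right = 1 - left
            # Count misclassifications when x <= t gets `left` and x > t gets `right`.
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging(xs, ys, n_models=25, seed=0):
    """Fit n_models stumps on bootstrap replicates; combine them by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap replicate: n row indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        # Majority vote; an odd n_models avoids ties for binary labels.
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    xs = list(range(20))
    ys = [0 if x < 10 else 1 for x in xs]
    predict = bagging(xs, ys)
    print([predict(x) for x in (2, 17)])
```

With real data, the stump would be replaced by a full tree fitted to many predictors. The aggregation step stays the same.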

In this presentation, we discuss a new bagging-type variant, which we call the poly-bagging procedure, that combines predictors over a succession of resamplings. The proposed poly-bagging procedure was applied to several artificial datasets and to some real datasets, with up to three successive resamplings, using regression tree models. We observed better classification accuracy for the two-bagged and three-bagged models in all the setups considered. These results give a strong indication that the poly-bagging approach may improve modeling performance measures while keeping a flexible and straightforward bagging-type structure.
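The abstract does not specify exactly how the successive resamplings are nested. The sketch below is only one plausible reading, offered as an assumption and not as the authors' method. It treats a k-bagged predictor as ordinary bagging applied to the (k-1)-bagged predictor: every bootstrap replicate at level k is itself resampled to build a (k-1)-bagged model, and level 0 is a single base learner. The depth-one stump (a stand-in for a regression tree), the name `poly_bag`, and the default of 5 replicates per level are hypothetical. Note that the cost grows as `n_models ** level` base fits.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Hypothetical base learner: a one-split tree on a single feature (labels 0/1)."""
    best = None
    for t in sorted(set(xs)):
        for left in (0, 1):
            right = 1 - left
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def poly_bag(xs, ys, level, n_models=5, rng=None):
    """ASSUMED reading of poly-bagging: a level-k model bags level-(k-1) models.

    level 0 -> a single stump; level 1 -> ordinary bagging;
    level 2 -> "two-bagged"; level 3 -> "three-bagged".
    """
    if rng is None:
        rng = random.Random(0)
    if level == 0:
        return fit_stump(xs, ys)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Resample the current dataset, then build a (level-1)-bagged model on it.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            poly_bag([xs[i] for i in idx], [ys[i] for i in idx], level - 1, n_models, rng)
        )

    def predict(x):
        # Majority vote over the sub-models (odd n_models avoids binary ties).
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    xs = list(range(20))
    ys = [0 if x < 10 else 1 for x in xs]
    for k in (1, 2, 3):
        predict = poly_bag(xs, ys, level=k)
        print(k, [predict(x) for x in (2, 17)])
```

Other nestings are equally compatible with the description "combining predictors over a succession of re-samplings", for example aggregating predicted probabilities instead of votes at each level.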

* Joint work with Osvaldo Anacleto Jr – Banco Itaú