About J48 Decision Tree Pruning In Weka
I think I finally have my head around how J48 works and how the various switches interact. Sorry if this is all obvious to everyone else on the list, but it might help some newbie like me.
J48's default is like C4.5: post-pruning with a pruning confidence of 0.25, using subtree raising as well as subtree replacement. Subtree raising can be turned off with -S, which leaves subtree replacement as the only post-pruning operation, and the pruning confidence can be adjusted with -C.
Reduced error pruning (-R) is a post-pruning method that uses a hold out set for error estimates. It also turns off subtree raising (and of course subtree replacement), and will throw an exception if used in conjunction with -C, as it doesn't use confidence for pruning.
The proportion of data held back for pruning is controlled by the number of folds (an unfortunately overloaded term if you're also doing k-fold cross-validation). The number of folds is set with -N (default 3), with one fold used for pruning and the rest for training. Setting -N will throw an exception unless -R is also set.
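To make the fold arithmetic concrete, here's a rough Java sketch of how a fold count N turns into a grow/prune split with one fold held back. It's a plain round-robin, unstratified split made up for illustration: the `PruningFoldSplit` class and its `split` method are hypothetical, not Weka code, and Weka's own implementation may shuffle and stratify the data differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Weka code): how a fold count N becomes a
 * grow/prune split when one of the N folds is held back for
 * reduced-error pruning. Round-robin and unstratified, for illustration only.
 */
public class PruningFoldSplit {

    /** Instance indices used to grow the tree and to prune it. */
    public static final class Split {
        public final List<Integer> grow = new ArrayList<>();
        public final List<Integer> prune = new ArrayList<>();
    }

    /**
     * Assigns instance i to fold (i % numFolds) and holds back the last
     * fold (index numFolds - 1) as the pruning set.
     */
    public static Split split(int numInstances, int numFolds) {
        if (numFolds < 2) {
            // With a single fold there would be nothing left to grow the tree on.
            throw new IllegalArgumentException("need at least 2 folds");
        }
        Split s = new Split();
        for (int i = 0; i < numInstances; i++) {
            if (i % numFolds == numFolds - 1) {
                s.prune.add(i);   // held-out fold, used only for error estimates
            } else {
                s.grow.add(i);    // remaining N - 1 folds, used to build the tree
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // With N = 3, a third of the data is held back for pruning.
        Split s = split(9, 3);
        System.out.println("grow=" + s.grow.size() + " prune=" + s.prune.size());
        // prints "grow=6 prune=3"
    }
}
```

So roughly 1/N of the data goes to pruning and (N - 1)/N to building the tree; raising N gives the tree more training data but a smaller, noisier pruning set.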
All pruning can be turned off with -U, which will throw an exception if any of the above pruning switches are also set.
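Putting the rules above together, here's a hypothetical Java model of which pruning mode a set of switches selects and which combinations get rejected. The flag names are J48's, but `J48PruningRules`, `resolve` and the `Mode` names are invented for illustration. It only encodes the behaviour described in this post, and the exception type and messages are mine, not Weka's.

```java
/**
 * Hypothetical model (not Weka's API) of the J48 pruning-switch rules
 * described above. Each boolean says whether that switch was given.
 */
public class J48PruningRules {

    /** Which post-pruning approach a combination of switches selects. */
    public enum Mode { UNPRUNED, REDUCED_ERROR, REPLACEMENT_ONLY, REPLACEMENT_AND_RAISING }

    /**
     * @param unpruned  -U given
     * @param noRaising -S given
     * @param rep       -R given
     * @param cfSet     -C given
     * @param foldsSet  -N given
     * @throws IllegalArgumentException for combinations the post says J48 rejects
     */
    public static Mode resolve(boolean unpruned, boolean noRaising, boolean rep,
                               boolean cfSet, boolean foldsSet) {
        // -U conflicts with every pruning switch. -U with -N is caught by the
        // -N check below, since -N without -R is rejected anyway.
        if (unpruned && (noRaising || rep || cfSet)) {
            throw new IllegalArgumentException("-U cannot be combined with -S, -R or -C");
        }
        // Reduced-error pruning uses a hold-out set, not a confidence value.
        if (rep && cfSet) {
            throw new IllegalArgumentException("-C makes no sense with -R");
        }
        // The fold count only matters when a hold-out set is used.
        if (foldsSet && !rep) {
            throw new IllegalArgumentException("-N requires -R");
        }
        if (unpruned) return Mode.UNPRUNED;
        if (rep) return Mode.REDUCED_ERROR;  // -S is redundant here
        return noRaising ? Mode.REPLACEMENT_ONLY : Mode.REPLACEMENT_AND_RAISING;
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, false, false, false, false)); // defaults
        System.out.println(resolve(false, true, false, true, false));   // -S -C 0.1
        System.out.println(resolve(false, false, true, false, true));   // -R -N 5
        try {
            resolve(true, false, true, false, false);                    // -U -R
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running `main` should print `REPLACEMENT_AND_RAISING`, `REPLACEMENT_ONLY`, `REDUCED_ERROR` and then the rejection message for -U with -R.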
Seems pretty obvious now, must have been coffee deficient ...
-----
Michael C. Harris, School of CS&IT, RMIT University
http://twofishcreative.com/michael/blog
IRC: michaeltwofish #habari