An empirical analysis of binary transformation strategies and base algorithms for multi-label learning

Authors: Adriano Rivolli, Jesse Read, Carlos Soares, Bernhard Pfahringer and André de Carvalho

Abstract: The investigation of strategies that can deal efficiently with multi-label classification tasks is an active research topic in machine learning. Many such strategies have been proposed, which makes selecting the most suitable one a challenging issue. Starting from this premise, this paper presents an extensive empirical analysis of binary transformation strategies and base algorithms for multi-label learning. This subset of strategies is algorithm independent and uses the one-versus-all approach to transform the original data, generating at least one binary data set per label. Because many empirical studies have ignored the influence of the base algorithm on the predictive performance of these strategies, distinct base algorithms were investigated. Thus, this study covers a family of multi-label strategies and employs a diverse range of base algorithms, exploring their relationship from different and new perspectives. Experimental results show that, although the strategies perform similarly, the base algorithm used has a stronger impact on their predictive performance. These findings have significant implications for the evaluation methodology adopted in multi-label experiments involving binary transformation strategies, given the influence of the base algorithms on their results. In addition, strategies and base algorithms are recommended according to different performance criteria.
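
As an illustration of the one-versus-all transformation described in the abstract, the following is a minimal sketch of its simplest form, in which each label yields exactly one binary data set. The function name `binary_relevance_transform` and the toy data are assumptions made for this example; they are not taken from the paper or its code.

```python
def binary_relevance_transform(X, Y):
    """Split a multi-label data set into one binary data set per label.

    X: list of feature vectors (one per instance).
    Y: list of 0/1 label vectors, all of the same length.
    Returns a list of (X, y_j) pairs, where y_j is the j-th label column.
    """
    n_labels = len(Y[0])
    datasets = []
    for j in range(n_labels):
        # Binary target for label j: 1 if the instance has the label, else 0.
        y_j = [row[j] for row in Y]
        # The features are shared; only the target column changes.
        datasets.append((X, y_j))
    return datasets


# Toy data (hypothetical): 3 instances, 2 features, 3 labels.
X = [[0.1, 1.0], [0.7, 0.2], [0.4, 0.9]]
Y = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]

binary_sets = binary_relevance_transform(X, Y)
```

Any binary base algorithm can then be trained on each `(X, y_j)` pair independently, which is why this family of strategies is described as algorithm independent.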

Additional Resources

Multi-label Predictive Performance:

Measure	Performance	Statistical
F1
Hamming-loss
Macro-F1
Macro-precision
Macro-recall
One-error
Ranking-loss
Subset-accuracy
MLP		-
WLP		-
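
For readers unfamiliar with the measures listed above, the sketch below computes two of them, Hamming-loss and Subset-accuracy, using their standard definitions. The function names and the toy predictions are assumptions made for this example and are not results from the paper.

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of instance-label pairs that are predicted incorrectly."""
    n_instances = len(Y_true)
    n_labels = len(Y_true[0])
    errors = sum(
        t != p
        for y_t, y_p in zip(Y_true, Y_pred)
        for t, p in zip(y_t, y_p)
    )
    return errors / (n_instances * n_labels)


def subset_accuracy(Y_true, Y_pred):
    """Fraction of instances whose whole label set is predicted exactly."""
    exact = sum(y_t == y_p for y_t, y_p in zip(Y_true, Y_pred))
    return exact / len(Y_true)


# Toy data (hypothetical): 3 instances, 3 labels.
Y_true = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
Y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]

hl = hamming_loss(Y_true, Y_pred)      # 2 wrong labels out of 9
sa = subset_accuracy(Y_true, Y_pred)   # 1 exact match out of 3
```

Lower is better for Hamming-loss, while higher is better for Subset-accuracy, so the two measures can rank the same predictions differently.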

Acknowledgements

This work was financially supported by CNPq (processes 305291/2017-3 and 152098/2016-0), FAPESP (processes 2016/18615-0, 2013/07375-0 and 2012/22608-8), CAPES and Intel. The experiments were performed using the computational resources of CeMEAI-FAPESP, Proc. 13/07375-0.