TY - JOUR
T1 - Large-dimensionality small-instance set feature selection
T2 - A hybrid bio-inspired heuristic approach
AU - Zawbaa, Hossam M.
AU - Emary, E.
AU - Grosan, Crina
AU - Snasel, Vaclav
N1 - Funding Information:
This work was supported by the IPROCOM Marie Curie initial training network, funded through the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme FP7/2007-2013/under REA grant agreement No. 316555.
Publisher Copyright:
© 2018 The Authors
PY - 2018/10
Y1 - 2018/10
N2 - Selection of a representative set of features is still a crucial and challenging problem in machine learning. The complexity of the problem increases when either of the following situations occurs: a very large number of attributes (large dimensionality) or a very small number of instances or time points (small-instance set). The first situation poses problems for machine learning algorithms, as the search space for selecting a combination of relevant features becomes impossible to explore in a reasonable time and with reasonable computational resources. The second poses the problem of having insufficient data to learn from (insufficient examples). In this work, we approach both issues at the same time. The methods we propose are heuristics inspired by nature (in particular, by biology). We propose a hybrid of two methods that has the advantage of learning well from fewer examples and making a fair selection of features from a very large set, all while ensuring high classification accuracy. The methods used are antlion optimization (ALO), grey wolf optimization (GWO), and a combination of the two (ALO-GWO). We test their performance on datasets having almost 50,000 features and fewer than 200 instances. The results look promising when compared with other methods such as genetic algorithms (GA) and particle swarm optimization (PSO).
AB - Selection of a representative set of features is still a crucial and challenging problem in machine learning. The complexity of the problem increases when either of the following situations occurs: a very large number of attributes (large dimensionality) or a very small number of instances or time points (small-instance set). The first situation poses problems for machine learning algorithms, as the search space for selecting a combination of relevant features becomes impossible to explore in a reasonable time and with reasonable computational resources. The second poses the problem of having insufficient data to learn from (insufficient examples). In this work, we approach both issues at the same time. The methods we propose are heuristics inspired by nature (in particular, by biology). We propose a hybrid of two methods that has the advantage of learning well from fewer examples and making a fair selection of features from a very large set, all while ensuring high classification accuracy. The methods used are antlion optimization (ALO), grey wolf optimization (GWO), and a combination of the two (ALO-GWO). We test their performance on datasets having almost 50,000 features and fewer than 200 instances. The results look promising when compared with other methods such as genetic algorithms (GA) and particle swarm optimization (PSO).
KW - Antlion optimization
KW - Bio-inspired optimization
KW - Feature selection
KW - Grey wolf optimization
KW - Hybrid ALO-GWO
KW - Swarm optimization
UR - http://www.scopus.com/inward/record.url?scp=85043393542&partnerID=8YFLogxK
U2 - 10.1016/j.swevo.2018.02.021
DO - 10.1016/j.swevo.2018.02.021
M3 - Article
AN - SCOPUS:85043393542
SN - 2210-6502
VL - 42
SP - 29
EP - 42
JO - Swarm and Evolutionary Computation
JF - Swarm and Evolutionary Computation
ER -