Authors: Sreevarsha Sreejith, Sergiy Pereverzyev Jr., Lee S. Kelvin, Francine Marleau, Markus Haltmeier, Judith Ebner, Joss Bland-Hawthorn, Simon P. Driver, Alister W. Graham, Benne W. Holwerda, A. M. Hopkins, J. Liske, Jon Loveday, Amanda J. Moffett, K. A. Pimbblet, Edward N. Taylor, Lingyu Wang, Angus H. Wright
Abstract: We apply four statistical learning methods to a sample of 7941 galaxies at z < 0.06 from the Galaxy and Mass Assembly (GAMA) survey to test the feasibility of using automated algorithms to generate Hubble type estimates for galaxy datasets. Using 10 characteristic features measured for each galaxy, we apply Support Vector Machines (SVM), Classification Trees (CT), Classification Trees with Random Forests (CTRF) and Neural Networks (NN) to our sample, returning True Prediction Ratios (TPRs) of 75.8%, 69.0%, 76.2% and 76.0%, respectively. Cases in which all four algorithms agree with each other yet disagree with the corresponding visual classification (‘unanimous disagreement’) serve as an indicator of human error in visual classification, occurring in ∼9% of ellipticals, ∼9% of Little Blue Spheroids, ∼14% of early-type spirals, ∼21% of intermediate-type spirals and ∼4% of late-type spirals and irregulars. Considering the simplicity of its formulation and implementation, we find the CTRF method to be the optimal estimator of galaxy Hubble type when applied to contemporary galaxy datasets. Adopting the CTRF algorithm, the TPRs of the five galaxy types are as follows: E, 70.1%; LBS, 75.6%; S0-Sa, 63.6%; Sab-Scd, 56.4%; and Sd-Irr, 88.9%. Further, we train a binary classifier that divides galaxies into spheroid-dominated (E, LBS and S0-Sa) and disk-dominated (Sab-Scd and Sd-Irr) types, achieving an overall classification accuracy of 89.8%. This translates into an accuracy of 84.9% for spheroid-dominated systems and 92.5% for disk-dominated systems.
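As an illustration of the two diagnostics quoted above, the following minimal Python sketch computes a per-class TPR and the per-class rate of 'unanimous disagreement'. It assumes that the TPR of a class is the fraction of its visually classified members that an algorithm assigns to the same class, which may differ in detail from the definition used in the paper. The function names and the toy labels are hypothetical and are not GAMA data.

```python
from collections import Counter


def per_class_tpr(visual, predicted):
    """Fraction of each visual class that the algorithm assigns to the same class.

    Assumed definition of the True Prediction Ratio, for illustration only.
    """
    totals = Counter(visual)
    hits = Counter(v for v, p in zip(visual, predicted) if v == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


def unanimous_disagreement(visual, predictions_by_method):
    """Per visual class, fraction of galaxies for which every method returns
    the same label and that label differs from the visual classification."""
    totals = Counter(visual)
    flagged = Counter()
    for i, v in enumerate(visual):
        votes = {preds[i] for preds in predictions_by_method.values()}
        if len(votes) == 1 and v not in votes:
            flagged[v] += 1
    return {cls: flagged[cls] / totals[cls] for cls in totals}


# Hypothetical toy labels for six galaxies (not real survey data).
visual = ["E", "E", "LBS", "Sd-Irr", "Sd-Irr", "S0-Sa"]
predictions = {
    "SVM":  ["E", "S0-Sa", "LBS", "Sd-Irr", "Sd-Irr", "E"],
    "CT":   ["E", "S0-Sa", "E", "Sd-Irr", "Sab-Scd", "E"],
    "CTRF": ["E", "S0-Sa", "LBS", "Sd-Irr", "Sd-Irr", "E"],
    "NN":   ["E", "S0-Sa", "LBS", "Sd-Irr", "Sd-Irr", "E"],
}

tpr_ctrf = per_class_tpr(visual, predictions["CTRF"])
disagreement = unanimous_disagreement(visual, predictions)
print(tpr_ctrf)
print(disagreement)
```

In this toy sample the second and sixth galaxies are flagged, because all four methods return the same label and it differs from the visual one. Galaxies on which the methods disagree among themselves are not counted.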