Submitted November 2023, Accepted April 2024, Published May 2024 | International Journal of Data Science and Analytics, Springer Nature | 10.1007/s41060-024-00552-7
João Pimentel, Joana Amorim and Frank Rudzicz
Detecting potentially habitable planets outside of our solar system has been a growing field of study. Among several other topics, this field aims to classify stars using the transit method, i.e., using their light intensity measured over time to spot the moment when an orbiting planet passes in front of the star and blocks part of its light as seen by a satellite. We propose a novel approach to such classification, using a set of features extracted from individual time series that cover three different domains: temporal, statistical, and spectral. These features are filtered based on relevance measures and used to train and evaluate models on Kepler data. The results were compared to state-of-the-art methods evaluated on the same data set and surpassed existing approaches for some data transformations. All of these transformations are related to making the time series naïvely stationary before feature extraction. Using principal components extracted from the feature set during model training did not have a considerable impact on results. To better evaluate the results, a cross-validation process was performed to mitigate data set bias. During this step, the best model achieved 100% recall and a 98.82% F1-score for the minority class. In the future, testing additional feature selection methods and assessing feature importance using more explainable metrics will be crucial to further understand the distinctions that separate stars with exoplanets from those without.
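To make the pipeline described above more concrete, the following is a minimal sketch in pure Python, not the implementation used in this work. It assumes that first-order differencing is one acceptable way to make a light curve naïvely stationary, and it computes one illustrative feature per domain (temporal, statistical, spectral). The function names `first_difference`, `extract_features`, and `minority_recall_f1`, the specific features chosen, and the synthetic light curve are all hypothetical and are not taken from the paper.

```python
import cmath
import math
import statistics


def first_difference(flux):
    """Assumed stationarity transform: x[t] - x[t-1] (one of many options)."""
    return [b - a for a, b in zip(flux, flux[1:])]


def extract_features(series):
    """Return one illustrative feature per domain; needs at least 2 samples."""
    n = len(series)
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)

    # Temporal domain: mean absolute change between consecutive samples.
    mean_abs_diff = statistics.fmean(
        [abs(b - a) for a, b in zip(series, series[1:])]
    )

    # Statistical domain: skewness (third standardized moment).
    if std > 0:
        skewness = statistics.fmean([((x - mean) / std) ** 3 for x in series])
    else:
        skewness = 0.0

    # Spectral domain: frequency (cycles per sample) of the strongest
    # non-zero bin of a naive O(n^2) discrete Fourier transform.
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(series)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power

    return {
        "mean_abs_diff": mean_abs_diff,
        "skewness": skewness,
        "dominant_frequency": best_k / n,
    }


def minority_recall_f1(y_true, y_pred, positive=1):
    """Recall and F1-score for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall == 0:
        return recall, 0.0
    return recall, 2 * precision * recall / (precision + recall)


# Synthetic light curve (hypothetical data): a weak periodic signal
# with a period of 8 samples around a baseline flux of 1.0.
flux = [1.0 + 0.01 * math.sin(2 * math.pi * t / 8) for t in range(65)]
features = extract_features(first_difference(flux))
```

In practice, a feature dictionary like this would be computed for each star, filtered, and passed to a classifier evaluated with cross-validation; reporting recall and F1 for the minority class, as `minority_recall_f1` does, keeps the evaluation focused on the rare stars that host exoplanets.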