Exploration of machine learning techniques: PCA and sparse PCA coupled with random forest and Gaussian naive Bayes classifiers for breast cancer diagnosis
Shalaby, Shefaa ; Khawfani, Ruba ; Almuhaya, Rahaf ; Alharbi, Wejdan
Date
2026-01-17
Collections
Research Projects
Abstract
Machine learning (ML) applications require several steps before the model itself is built. They begin with a data exploration phase, which itself comprises several steps, such as preprocessing, pruning, and feature reduction. In this chapter, we focus on feature reduction techniques and use a breast cancer dataset to evaluate the models. Principal Component Analysis (PCA) is a widely used technique for dimensionality reduction. Still, its reliance on linear combinations of all original variables can hinder interpretability, particularly in high-dimensional datasets. Sparse Principal Component Analysis (SPCA), a variant of PCA that introduces sparsity by building each component from only a subset of the most informative variables, addresses this limitation. This study investigates the effectiveness of SPCA in enhancing the clarity and usefulness of PCA analysis. We apply SPCA to a range of real-world and simulated datasets, focusing on breast cancer diagnosis using the Wisconsin breast cancer (diagnostic) dataset. We demonstrate improve
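To illustrate the comparison the abstract describes, here is a minimal sketch, not the authors' implementation. It assumes scikit-learn, whose bundled `load_breast_cancer` data is the Wisconsin diagnostic dataset. The number of components (5), the SPCA penalty `alpha=1.0`, the 70/30 split, and the random seeds are illustrative assumptions that do not come from the chapter.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA, SparsePCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Wisconsin breast cancer (diagnostic) data: 569 samples, 30 features.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumed settings: 5 components and alpha=1.0 are illustrative choices only.
reducers = {
    "PCA": lambda: PCA(n_components=5),
    "SPCA": lambda: SparsePCA(n_components=5, alpha=1.0, random_state=0),
}
classifiers = {
    "RF": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "GNB": lambda: GaussianNB(),
}

scores = {}
zero_loadings = {}
for r_name, make_reducer in reducers.items():
    for c_name, make_clf in classifiers.items():
        reducer = make_reducer()
        # Standardize, reduce dimensionality, then classify.
        model = make_pipeline(StandardScaler(), reducer, make_clf())
        model.fit(X_train, y_train)
        scores[(r_name, c_name)] = model.score(X_test, y_test)
        # Count loadings that are exactly zero: a rough proxy for sparsity.
        zero_loadings[r_name] = int((reducer.components_ == 0).sum())

for (r_name, c_name), acc in scores.items():
    print(f"{r_name} + {c_name}: test accuracy = {acc:.3f}")
for r_name, n_zero in zero_loadings.items():
    print(f"{r_name}: {n_zero} zero loadings out of {5 * X.shape[1]}")
```

The zero-loading count is one simple way to see the interpretability argument: PCA components generally load on every variable, whereas SPCA components can drop variables entirely, depending on `alpha`.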
Department
Publisher
Sponsor
Effat University
Copyright
Book title
Mathematical Modeling for Big Data Analytics
