A Comparison of Imputative Capability on Algorithms for fitting the PARAFAC model to Biological Data
- Hodzic, Enio
- Advisor(s): Meyer, Aaron S
Abstract
Researchers often find it challenging to simplify complex data sets for downstream analysis. The combination of multiple variables can lead to complexity, and organizing such data sets into a higher-dimensional structure can be more intuitive. For data sets with an inherent multi-modal structure, a variety of dimensionality reduction techniques have allowed researchers to explore and infer biological interactions more effectively. Higher-order dimensionality reduction techniques all serve the same purpose: to reduce the original data set and recover meaningful, interpretable patterns. The CANDECOMP/PARAFAC (CP) model, a frequent choice among researchers for its interpretability, still requires metrics for validating its performance and ensuring that an appropriate model complexity is selected. While the common benchmark for validating these methods is the total residual error, imputation error (prediction error) can serve as a more trusted alternative. We describe an algorithm for fitting the PARAFAC model, censored alternating least squares, that innately handles missing values, and we compare it against alternating least squares and direct optimization on simulated and real data sets with varying degrees of missing values, using these performance metrics. While each method has its own benefits, censored alternating least squares appears best suited for handling missing values, which are commonly present in the data that researchers look to investigate.
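To make the idea of fitting around missing values concrete, the following is a minimal sketch of one way a censored alternating least squares update and an imputation-error check could look for a three-way tensor. It is an illustration under stated assumptions, not the thesis's implementation: missing entries are assumed to be marked with NaN, each factor row is assumed to be solved by least squares over only its observed entries, and the names `khatri_rao`, `reconstruct`, `censored_als`, and `imputation_error` are hypothetical.

```python
import numpy as np


def khatri_rao(B, C):
    # Column-wise Kronecker product: row (j * K + k) holds B[j, :] * C[k, :].
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def reconstruct(factors):
    # Rebuild the full tensor from the three CP factor matrices.
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def censored_als(X, rank, n_iter=200, seed=0):
    # Assumption: X is a 3-way array with NaN marking missing entries.
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    observed = ~np.isnan(X)
    for _ in range(n_iter):
        for mode in range(3):
            # Unfold so that `mode` indexes the rows; the other modes keep their order.
            Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            Om = np.moveaxis(observed, mode, 0).reshape(X.shape[mode], -1)
            others = [factors[m] for m in range(3) if m != mode]
            KR = khatri_rao(others[0], others[1])
            for i in range(X.shape[mode]):
                cols = Om[i]
                if cols.any():
                    # Least squares restricted to the observed entries of this row.
                    factors[mode][i] = np.linalg.lstsq(
                        KR[cols], Xm[i, cols], rcond=None
                    )[0]
    return factors


def imputation_error(X_true, X_hat, held_out):
    # Relative error measured only on entries hidden from the fit.
    diff = X_true[held_out] - X_hat[held_out]
    return np.linalg.norm(diff) / np.linalg.norm(X_true[held_out])


# Simulated example: a noise-free rank-2 tensor with roughly 20% of entries held out.
rng = np.random.default_rng(1)
true_factors = [rng.standard_normal((n, 2)) for n in (8, 7, 6)]
X_true = reconstruct(true_factors)
held_out = rng.random(X_true.shape) < 0.2
X_obs = X_true.copy()
X_obs[held_out] = np.nan

factors = censored_als(X_obs, rank=2)
err = imputation_error(X_true, reconstruct(factors), held_out)
print(f"relative imputation error: {err:.2e}")
```

Because the held-out entries never enter the fit, the error computed on them reflects prediction rather than reconstruction, which is the distinction the abstract draws between imputation error and total residual error.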