Fig. 4 | BMC Bioinformatics

Fig. 4

From: Feature selection for RNA cleavage efficiency at specific sites using the LASSO regression model in Arabidopsis thaliana

Data processing in the LASSO and Ridge regression models. Sequences (nucleotide, codon, or corresponding amino acid sequence), stability of secondary structures, and ribosome occupancy information were obtained, and features showing multicollinearity with other explanatory variables or no correlation with the objective variable were removed during the feature extraction process. Cleavage sites (CSsite values) were divided into training and test datasets, and the input data were formatted. Subsequently, the LASSO or Ridge regression model was constructed using the training dataset. Finally, model performance was evaluated using the test data, and features with non-zero coefficients were assessed according to their importance scores (the coefficients in the LASSO or Ridge regression model).
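
The workflow in this caption can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The synthetic data, the collinearity threshold of 0.95, the penalty `lam=0.1`, and the helper names (`pearson`, `drop_collinear`, `fit_lasso`) are all assumptions made for illustration, and the filter for features uncorrelated with the objective variable is omitted for brevity. LASSO is fitted here by coordinate descent with soft-thresholding, which is one common way to solve it.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def drop_collinear(X, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    cols = list(zip(*X))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept

def soft_threshold(rho, lam):
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def fit_lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1 on centered data."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w

# Synthetic stand-in for the feature table (hypothetical data, not from the paper).
random.seed(0)
rows, targets = [], []
for _ in range(200):
    x = [random.gauss(0, 1) for _ in range(4)]
    x.append(x[0] + random.gauss(0, 0.01))  # near-duplicate of feature 0
    rows.append(x)
    targets.append(3.0 * x[0] - 2.0 * x[2] + random.gauss(0, 0.1))

# 1. Remove multicollinear features.
kept = drop_collinear(rows)
X = [[r[j] for j in kept] for r in rows]

# 2. Split into training and test sets (rows are already in random order).
n_train = 160
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = targets[:n_train], targets[n_train:]

# 3. Center with training-set means, then fit LASSO on the training set.
x_mean = [sum(c) / n_train for c in zip(*X_train)]
y_mean = sum(y_train) / n_train
Xc = [[v - m for v, m in zip(r, x_mean)] for r in X_train]
yc = [v - y_mean for v in y_train]
w = fit_lasso(Xc, yc, lam=0.1)

# 4. Evaluate on the test set with R^2.
preds = [y_mean + sum(wj * (v - m) for wj, v, m in zip(w, r, x_mean)) for r in X_test]
y_test_mean = sum(y_test) / len(y_test)
ss_res = sum((t - p) ** 2 for t, p in zip(y_test, preds))
ss_tot = sum((t - y_test_mean) ** 2 for t in y_test)
r2 = 1.0 - ss_res / ss_tot

# 5. Rank features with non-zero coefficients by absolute coefficient size.
ranked = sorted(
    ((kept[j], wj) for j, wj in enumerate(w) if wj != 0.0),
    key=lambda item: abs(item[1]),
    reverse=True,
)
print("kept features:", kept)
print("test R^2:", round(r2, 3))
print("ranked non-zero features:", ranked)
```

Because the L1 penalty sets small coefficients exactly to zero, the features that survive in `ranked` are the selected ones. A Ridge model would instead shrink all coefficients without zeroing them, so its importance ranking would cover every retained feature.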
