Fig. 1 | BMC Bioinformatics


From: HostNet: improved sequence representation in deep neural networks for virus-host prediction


The overall architecture of HostNet. a The viral sequences, which can be reads, contigs, or whole genomes, are the original input to the system. b The raw data are pre-processed by replacing each single mixed base in a sequence with N and filtering out sequences that contain consecutive mixed-base fragments. c The data are divided into training, validation, and test datasets according to specified proportions. d The genome sequence is vectorized using the K2V method and a pre-trained model. e The vectorized genome sequence is divided into subsequences by the ASW method. f The deep learning-based sequence analysis model contains Transformer encoder layers, convolutional layers, and BiGRU layers that capture sequence features automatically. g The model is evaluated using metrics such as accuracy, precision, recall, and F1-score.
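Steps b and c can be illustrated with a minimal sketch. It is not HostNet's code, and several details are assumptions. "Mixed bases" are taken to be the IUPAC ambiguity codes (`RYSWKMBDHVN`). "Consecutive mixed-base fragments" are taken to mean two or more such codes in a row. The split uses an illustrative 80/10/10 ratio with a fixed shuffle seed. The names `preprocess` and `split_dataset` are hypothetical.

```python
import random
import re

# Assumption: mixed bases are IUPAC ambiguity codes; the figure does not list them.
MIXED = "RYSWKMBDHVN"
RUN_PATTERN = re.compile(f"[{MIXED}]{{2,}}")  # assumed threshold: 2+ mixed bases in a row
SINGLE_PATTERN = re.compile(f"[{MIXED}]")      # any remaining (isolated) mixed base


def preprocess(seqs):
    """Hypothetical step b: drop sequences with runs of mixed bases, mask isolated ones as N."""
    kept = []
    for seq in seqs:
        seq = seq.upper()
        if RUN_PATTERN.search(seq):
            continue  # filtered out: contains consecutive mixed bases
        kept.append(SINGLE_PATTERN.sub("N", seq))
    return kept


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Hypothetical step c: shuffle, then split into train/validation/test by proportion.

    The 80/10/10 default is illustrative; the figure does not state the proportions.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    items = list(items)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])  # test set takes the remainder


raw = ["ACGTRACGT", "ACGTYYACGT", "acgtacgt"]
clean = preprocess(raw)  # ["ACGTNACGT", "ACGTACGT"]
train, val, test = split_dataset(range(10))
```

Here the isolated `R` in the first sequence becomes `N`, and the second sequence is dropped because of the `YY` run. In practice, a virus-host split would also need to avoid near-duplicate genomes leaking across partitions. That concern is outside this sketch.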
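The evaluation in step g uses accuracy, precision, recall, and F1-score. For a multi-class host label these can be computed as in the sketch below. The figure does not say how per-class scores are combined, so macro averaging (an unweighted mean over classes) is an assumption here. The function name `evaluate` and the host labels in the example are hypothetical.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (macro averaging is assumed)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equal-length, non-empty label lists")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted p wrongly
            fn[t] += 1  # missed the true class t

    def safe_div(a, b):
        return a / b if b else 0.0

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = safe_div(tp[c], tp[c] + fp[c])
        rec = safe_div(tp[c], tp[c] + fn[c])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


scores = evaluate(["human", "human", "bat", "bat"], ["human", "bat", "bat", "bat"])
```

In this example, accuracy is 0.75 and macro recall is 0.75. Macro precision and F1 come out at about 0.833 and 0.733. Macro averaging gives rare hosts the same weight as common ones. A weighted average would instead follow class frequencies.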
