Fig. 2 | BMC Bioinformatics

Fig. 2

From: LAITOR4HPC: A text mining pipeline based on HPC for building interaction networks


Complete text mining pipeline using NLProt and LAITOR4HPC. a MEDLINE files are downloaded as XML files from the NCBI FTP server; b a Python parser converts the XML files into input files for NLProt, which are then c transferred to the interactive (head) node of the HPC system. d A job is then started, and one process per input file, indexed by $i \in \{i \in \mathbb{Z} \mid 0 < i < 1305\}$, is launched in parallel on 60 computing cores. e On each core, the corresponding $i$-th MEDLINE input file is tagged by NLProt, which generates f the $i$-th NLProt output file; this file is then placed back on the head node together with the other outputs. g These files, together with the DB file, are used as input for the LAITOR4HPC job, h which loads an in-memory database before i tagging the bioentities and biointeractions present in the corpus. j After completion, the results are placed back on the head node and made available for downstream applications.
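Steps d to f spread the 1304 input files ($0 < i < 1305$) over 60 cores. The caption does not say how the scheduler maps files to cores. The minimal sketch below assumes a simple round-robin policy, and the function name `assign_files_to_cores` is hypothetical. Under that assumption, each core would process either 21 or 22 NLProt runs.

```python
from collections import defaultdict


def assign_files_to_cores(n_files: int, n_cores: int) -> dict[int, list[int]]:
    """Map file indices i (1 <= i <= n_files) to cores 0..n_cores-1.

    Hypothetical round-robin policy: the real batch scheduler's
    placement rule is not given in the figure caption.
    """
    assignment: dict[int, list[int]] = defaultdict(list)
    for i in range(1, n_files + 1):  # i in Z with 0 < i < n_files + 1
        assignment[(i - 1) % n_cores].append(i)
    return dict(assignment)


if __name__ == "__main__":
    # 1304 MEDLINE input files distributed over 60 computing cores
    plan = assign_files_to_cores(n_files=1304, n_cores=60)
    loads = [len(files) for files in plan.values()]
    print(len(plan), min(loads), max(loads))  # 60 21 22
    print(plan[0][:3])  # [1, 61, 121]
```

Because 1304 = 60 × 21 + 44, this policy would give 44 cores 22 files each and the remaining 16 cores 21 files each.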
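Steps h and i load a database into memory and then tag bioentities and biointeractions in the corpus. The caption does not describe LAITOR4HPC's matching rules, so the following is only a simplified stand-in, not the tool's method. The hypothetical sets `PROTEINS` and `INTERACTIONS` play the role of the in-memory DB. A sentence-level co-occurrence rule pairs each interaction term with the nearest entity on either side of it.

```python
import re

# Hypothetical in-memory dictionaries standing in for the DB file;
# the actual LAITOR4HPC database contents and schema are not shown here.
PROTEINS = {"p53", "mdm2", "brca1"}
INTERACTIONS = {"binds", "activates", "inhibits"}


def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Return (entity, interaction, entity) triples found within sentences."""
    triples = []
    # Naive sentence split on ., ! or ? followed by whitespace
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = re.findall(r"[a-z0-9-]+", sentence.lower())
        for k, tok in enumerate(tokens):
            if tok not in INTERACTIONS:  # set lookup: O(1) on average
                continue
            left = [t for t in tokens[:k] if t in PROTEINS]
            right = [t for t in tokens[k + 1:] if t in PROTEINS]
            if left and right:
                # Nearest tagged entity on each side of the interaction term
                triples.append((left[-1], tok, right[0]))
    return triples


if __name__ == "__main__":
    corpus = "MDM2 binds p53. BRCA1 is expressed in breast tissue."
    print(extract_triples(corpus))  # [('mdm2', 'binds', 'p53')]
```

Holding the dictionaries as sets keeps each lookup at constant average time. That is the usual motivation for loading such a database into memory before scanning a large corpus.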
