Erratum to: A novel procedure on next generation sequencing data analysis using text mining algorithm

Zhao, Weizhong; Chen, James J.; Perkins, Roger; Wang, Yuping; Liu, Zhichao; Hong, Huixiao; Tong, Weida; Zou, Wen

doi:10.1186/s12859-016-1156-9

Erratum
Open access
Published: 03 August 2016

Weizhong Zhao¹,², James J. Chen¹, Roger Perkins¹, Yuping Wang¹, Zhichao Liu¹, Huixiao Hong¹, Weida Tong¹ & Wen Zou¹

BMC Bioinformatics volume 17, Article number: 301 (2016)

The Original Article was published on 13 May 2016

Erratum

After publication of the original article [1], it was brought to our attention that the following passage was incorrectly placed under subheading ‘3. Classification analysis and comparison’ of subsection ‘Evaluation of topic modeling performance’ of the ‘Methods’ section:

Topic model-derived clustering method [33] was applied, in which LDA was utilized as a feature reduction approach for cluster analysis. The LDA-derived topics were considered as the new features of datasets. The sample-topic matrix (Fig. 1(f)) was treated as a new representation of the original dataset. Based on the sample-topic matrix (topic number was chosen as 5 and 30, respectively), conventional clustering algorithms, such as k-means, was used for the clustering analysis. The number of clusters was set as 7 in the k-means method due to 7 different serotypes in the dataset. While in comparison, k-means algorithm was also applied on VSM matrix using Hamming Distance similarities. For further comparison, due to the dimension reduction of topic modeling approach, the traditional tool of PCA was used to reduce features (Numbers of 2, 5, 10 and 30 were randomly selected as the reduced features, respectively) of VSM matrix followed by the k-means cluster analysis. Moreover, clustering by only LDA referred as “highest probable topic assignment” [33] (5 and 30 topics were used) was also used for comparison. In “highest probable topic assignment”, the LDA-derived topics were made as the clusters of the dataset. Then, each sample was assigned to the cluster (Topic) with the highest probability in the row of the sample-topic matrix. To interpret the clustering results obtained by the k-means algorithm, samples in each cluster were labeled as the dominant serotype of the samples in the cluster. The predicted labels of samples were compared with the true labels (serotypes) to evaluate the clustering quality. The clustering results were evaluated by Normalized mutual information (NMI) [34] and Adjusted Rand Index (ARI) [35]. NMI and ARI are two external validation metrics to evaluate the quality of clustering results with respect to the given true labels of datasets. The range of NMI and ARI values is 0–1. In general, the larger the value is, the better the clustering quality is.

This passage belongs under subheading ‘2. Cluster analysis and result comparison’ of subsection ‘Evaluation of topic modeling performance’ of the ‘Methods’ section.
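
For readers who want a concrete picture of the evaluation steps named in the relocated passage, the following is a minimal sketch and not the authors' implementation. It illustrates "highest probable topic assignment", dominant-serotype labeling of clusters, and the two external metrics (NMI and ARI) using only the Python standard library. The function names, the toy sample-topic matrix, and the serotype labels are hypothetical. The NMI normalization used here (arithmetic mean of the two entropies) is an assumption, since the passage does not state which variant was used.

```python
from collections import Counter
from math import comb, log


def highest_probable_topic(sample_topic):
    """Assign each sample (row) to the topic with the largest probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in sample_topic]


def dominant_labels(clusters, true_labels):
    """Label every sample with the dominant true label of its cluster."""
    counts = {}
    for c, t in zip(clusters, true_labels):
        counts.setdefault(c, Counter())[t] += 1
    dominant = {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}
    return [dominant[c] for c in clusters]


def _entropy(sizes, n):
    """Shannon entropy (natural log) of a partition given its group sizes."""
    return -sum((k / n) * log(k / n) for k in sizes if k)


def nmi(true_labels, clusters):
    """Mutual information divided by the mean of the two entropies.

    The arithmetic-mean normalization is an assumption of this sketch.
    """
    n = len(true_labels)
    joint = Counter(zip(true_labels, clusters))
    a, b = Counter(true_labels), Counter(clusters)
    mi = sum((nij / n) * log(n * nij / (a[t] * b[c]))
             for (t, c), nij in joint.items())
    denom = (_entropy(a.values(), n) + _entropy(b.values(), n)) / 2
    return mi / denom if denom > 0 else 1.0


def ari(true_labels, clusters):
    """Adjusted Rand Index computed from pair counts of the contingency table."""
    n = len(true_labels)
    if n < 2:
        return 1.0
    joint = Counter(zip(true_labels, clusters))
    sum_ij = sum(comb(k, 2) for k in joint.values())
    sum_a = sum(comb(k, 2) for k in Counter(true_labels).values())
    sum_b = sum(comb(k, 2) for k in Counter(clusters).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical toy data: 6 samples, 3 topics, 3 serotypes (not from the article).
sample_topic = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.2, 0.6],
]
serotypes = ["A", "A", "B", "B", "C", "C"]

clusters = highest_probable_topic(sample_topic)
predicted = dominant_labels(clusters, serotypes)
print(clusters, predicted, nmi(serotypes, clusters), ari(serotypes, clusters))
```

In this toy case each topic coincides with one serotype, so both metrics reach their maximum. ARI as commonly defined is corrected for chance and can fall slightly below 0 when agreement is worse than chance, so values near 0 should be read as chance-level agreement.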

References

1. Zhao et al. A novel procedure on next generation sequencing data analysis using text mining algorithm. BMC Bioinformatics 2016;17:213. doi:10.1186/s12859-016-1075-9

Author information

Authors and Affiliations

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, HFT-20, Jefferson, AR, 72079, USA
Weizhong Zhao, James J. Chen, Roger Perkins, Yuping Wang, Zhichao Liu, Huixiao Hong, Weida Tong & Wen Zou
College of Information Engineering, Xiangtan University, Xiangtan, Hunan Province, China
Weizhong Zhao

Corresponding author

Correspondence to Wen Zou.

Additional information

The online version of the original article can be found under doi:10.1186/s12859-016-1075-9.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Zhao, W., Chen, J.J., Perkins, R. et al. Erratum to: A novel procedure on next generation sequencing data analysis using text mining algorithm. BMC Bioinformatics 17, 301 (2016). https://doi.org/10.1186/s12859-016-1156-9
