Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts
© The Author(s) 2016
Published: 6 October 2016
The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotation of miRNAs.
For approximately 900 human miRNAs indexed in miRBase, text documents were created by concatenating titles and abstracts of MEDLINE citations which refer to the miRNAs. The documents were parsed and a weighted term-by-miRNA frequency matrix was created, which was subsequently factorized via singular value decomposition to extract pair-wise cosine values between the term (keyword) and miRNA vectors in reduced rank semantic space. LSI enables derivation of both explicit and implicit associations between entities based on word usage patterns. Using miR2Disease as a gold standard, we found that LSI identified keyword-to-miRNA relationships with high accuracy. In addition, we demonstrate that pair-wise associations between miRNAs can be used to group them into categories which are functionally aligned. Finally, term ranking by querying the LSI space with a group of miRNAs enabled annotation of the clusters with functionally related terms.
LSI modeling of MEDLINE abstracts provides a robust and automated method for miRNA related knowledge discovery. The latest collection of miRNA abstracts and LSI model can be accessed through the web tool miRNA Literature Network (miRLiN) at http://bioinfo.memphis.edu/mirlin.
There is growing recognition that miRNAs regulate various diseases and biological processes [1–4], as evidenced by the rapidly growing body of literature related to miRNAs (Additional file 1: Figure S1). Manually curated repositories such as miRBase and miR2Disease catalog miRNAs in several organisms and summarize their associations with diseases and other biological processes. However, it is generally accepted that manual curation is unable to keep up with the rapidly growing genomic information. For instance, miRBase has not been updated since 2014 and miR2Disease has not been updated since 2009. It is therefore imperative to devise automated methods that can keep pace with the functional information on miRNAs that is deposited in the biomedical literature.
Information retrieval (IR) is a key component of text mining. IR models fall into three types: set-theoretic (Boolean), probabilistic, and algebraic (vector space). Documents are retrieved based on Boolean logic, probability of relevance to the query, and degree of similarity to the query, respectively. The concept of literature-based discovery was introduced by Swanson and has since been extended to many different areas of research. In the gene space, several approaches have focused on mining both explicit associations based on co-occurrence and implicit associations based on higher order co-occurrence and indirect relationships.
Several IR approaches have focused on mining miRNA specific associations. miRCancer, miRSel and miRTex use co-occurrence and sentence level natural language processing to automatically extract direct relationships between miRNAs and genes or diseases from text. While useful, these tools may miss miRNA interactions where direct relationships are not explicitly stated. In such cases, automated extraction of semantic relationships would be useful to associate genes and miRNAs based on shared biological processes. Also, explicit relationships such as those based on co-occurrence counts between miRNAs and genes may be harder to prioritize if they have exactly the same score. In contrast, semantic associations that take into account other relationships could be useful for prioritization of miRNA and gene associations.
Aside from exploring miRNA to gene associations, semantic analysis could be useful for other research scenarios. For example, investigators may want to prioritize candidate miRNAs for specific diseases or phenotypes. Alternatively, investigators may want to understand the functional pathways shared between different miRNAs. To address these needs, we developed and evaluated an LSI based text mining approach. Previously, we applied LSI to extract functional relationships amongst genes as well as relationships between genes and transcription factors from MEDLINE abstracts. LSI uses Singular Value Decomposition (SVD) [17, 18], a dimensionality reduction technique that decomposes the original term-by-document weighted frequency matrix into a set of factor matrices that can be used to represent both terms and documents in a lower-dimensional subspace. We demonstrated that LSI can extract both explicit (direct) and implicit (indirect) semantic relationships amongst genes. In addition, LSI allows genes to be prioritized based on keyword queries as well as gene-abstract queries with better accuracy than term co-occurrence methods. Here, we applied this approach to miRNAs and demonstrate its utility to prioritize, cluster and functionally annotate miRNAs. The accompanying web based tool, miRNA Literature Network (miRLiN), available at http://bioinfo.memphis.edu/mirlin, provides an automated framework for interactively extracting and discovering functional information on human miRNAs based on up to date biomedical literature.
miRNA document collection
For 1881 human miRNAs indexed in the miRBase repository, 3 different abstract collections were built. Firstly, a curated collection limited to manually assigned abstracts was constructed. A total of 8110 unique abstracts (citations) cross referenced in the linkouts from miRBase as well as Entrez Gene were collected. These citations (identified by unique PubMed identifiers or PMIDs) have been assigned by professional staff at the National Library of Medicine, by the scientific research community via the Gene Reference into Function (GeneRIF) portal, or by curators of miRBase. Because these abstracts are manually curated, they are expected to tag correct citations to miRNAs with very high precision. At the same time, the number of citations referenced for each miRNA is a small proportion of the total number of relevant citations in MEDLINE for that miRNA, resulting in low recall.
In order to increase the information content for the miRNAs, a retrieved collection was built by querying the PubMed repository. A single miRNA can be referenced in the literature under several spelling variants, e.g., mir19a, mir-19a, microRNA19a, microRNA-19a. For each miRNA, all such tentative synonyms with and without hyphens were constructed, and a PubMed query of the form ‘synonym #1 OR synonym #2 OR ... OR synonym #n’ was submitted using the NCBI efetch utility to retrieve relevant citations that have at least one synonym present in either the title or the abstract. Further restrictions were added to the query to limit the search to abstracts relevant to humans and miRNAs. A total of 19191 unique citations were retrieved.
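The synonym expansion described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names `mirna_synonyms` and `pubmed_query` are hypothetical, only the two prefixes that appear in the example variants are used, and restricting each synonym with the PubMed `[Title/Abstract]` field tag is an assumption about how "present in either the title or the abstract" might be expressed. The additional human and miRNA restrictions are not reproduced because the text does not specify them.

```python
def mirna_synonyms(suffix):
    """Build spelling variants for a miRNA from its identifier suffix, e.g. '19a'.

    The prefix list is an assumption based on the variants named in the text.
    """
    variants = []
    for prefix in ("mir", "microRNA"):
        variants.append(f"{prefix}{suffix}")    # variant without a hyphen
        variants.append(f"{prefix}-{suffix}")   # variant with a hyphen
    return variants


def pubmed_query(synonyms):
    """Join synonyms into a 'synonym #1 OR synonym #2 OR ...' query string."""
    # Assumption: each synonym is limited to the title/abstract fields.
    return " OR ".join(f'"{s}"[Title/Abstract]' for s in synonyms)


if __name__ == "__main__":
    print(pubmed_query(mirna_synonyms("19a")))
```

The resulting string would then be passed to whatever retrieval client is in use; no network call is made here.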
The two collections were merged to get 19527 unique citations. We further filtered the nonspecific citations by removing PMIDs that referred to 7 or more miRNAs. Typically, these citations described sequencing experiments which mentioned a large number of miRNAs without substantive biological or mechanistic information. The threshold of 7 was derived as the smallest right outlier in the distribution of the number of miRNAs linked to each unique citation. The outlier calculation was based on the interquartile range (IQR), defined as Q3 (75th percentile) − Q1 (25th percentile); values greater than Q3 + 1.5 × IQR were designated outliers. After filtering, 17076 unique citations and 878 active miRNAs (the ones referenced by at least one citation) remained in the collection, fewer than half of the original 1881 miRNAs. Thus a large number of miRNAs were excluded from our collection because they lacked a specific citation. The number of citations assigned to the active miRNAs ranged from 1 (28 % of the collection) to 1451. The mean and median numbers of citations per miRNA were 38 and 4, respectively. For each of the 878 active miRNAs, a miRNA document was created by concatenating the titles and abstracts of all citations referenced by the miRNA.
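The IQR-based threshold can be illustrated with a small sketch. The function names and the toy counts are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not say how the percentiles were interpolated.

```python
import statistics


def smallest_right_outlier(counts):
    """Return the smallest value above Q3 + 1.5 * IQR, or None if there is none."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    upper_fence = q3 + 1.5 * (q3 - q1)
    outliers = [c for c in counts if c > upper_fence]
    return min(outliers) if outliers else None


def drop_nonspecific(citation_to_mirnas, threshold):
    """Keep only citations that refer to fewer than `threshold` miRNAs."""
    return {
        pmid: mirnas
        for pmid, mirnas in citation_to_mirnas.items()
        if len(mirnas) < threshold
    }


if __name__ == "__main__":
    # Toy data: number of miRNAs linked to each citation (not the study's data).
    toy_counts = [1, 1, 1, 2, 2, 3, 3, 4, 12]
    print(smallest_right_outlier(toy_counts))
```

On the toy counts the quartiles are 1 and 3, so the upper fence is 6 and only the value 12 lies beyond it. In the study the same kind of calculation yielded a threshold of 7.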
Construction of the LSI model
where f_ij is the frequency of the i-th term in the j-th miRNA document, p_ij is the probability of the i-th term occurring in the j-th miRNA document, and n is the number of miRNA documents in the collection. The log-entropy weighting scheme is based on information-theoretic concepts, takes into account the distribution of terms over miRNA documents, and has been found to be more useful in extracting implied relationships.
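As an illustration of the symbols defined above, the sketch below implements the commonly cited form of log-entropy weighting: a local weight log2(1 + f_ij) multiplied by a global weight 1 + (Σ_j p_ij log2 p_ij) / log2 n, with p_ij = f_ij / Σ_j f_ij. That this is exactly the variant used in the study (including the base of the logarithm) is an assumption, and the function name `log_entropy_weights` is hypothetical.

```python
import math


def log_entropy_weights(freq):
    """Apply log-entropy weighting to a term-by-document count matrix.

    freq: list of rows, one row per term, one column per miRNA document.
    Requires at least two documents so that log2(n) is non-zero.
    """
    n_docs = len(freq[0])
    weighted = []
    for row in freq:
        total = sum(row)
        # Sum of p_ij * log2(p_ij) over the documents containing the term.
        entropy_sum = 0.0
        for f in row:
            if f > 0:
                p = f / total
                entropy_sum += p * math.log2(p)
        # Global weight: 0 for a term spread evenly, 1 for a term in one document.
        global_weight = 1.0 + entropy_sum / math.log2(n_docs)
        weighted.append([global_weight * math.log2(1 + f) for f in row])
    return weighted


if __name__ == "__main__":
    toy = [[2, 2, 2, 2],   # evenly distributed term, carries no information
           [4, 0, 0, 0]]   # term concentrated in a single document
    for row in log_entropy_weights(toy):
        print(row)
```

A term that occurs equally often in every document receives a weight of zero, while a term confined to one document keeps its full local weight, which is the behaviour that makes the scheme sensitive to the distribution of terms over documents.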
where ′ indicates the transpose of a matrix, obtained by permuting the modes, i.e., transforming rows into columns and vice versa; U is n × r, S is r × r, and V is m × r (V′ is r × m). Both U and V have orthonormal columns, i.e., U′U = I and V′V = I, where I is the identity matrix. S is a diagonal matrix with non-negative and non-increasing entries σ_1, σ_2, ..., σ_r, which are known as singular values. r is the rank of the matrix A, i.e., the number of linearly independent rows or columns of A. In practice, r = m is observed for most datasets. The third matrix is written as the transpose V′ so that the rows of U and V correspond to terms and miRNAs, respectively.
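For reference, the standard thin SVD that is consistent with the dimensions stated in this paragraph can be written out as below. This is a reconstruction from those dimensions, with A denoting the n × m weighted term-by-miRNA matrix, and is not a formula quoted from the original article.

$$
A = U S V', \qquad U \in \mathbb{R}^{n \times r}, \quad S = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_r), \quad V \in \mathbb{R}^{m \times r}
$$

$$
U'U = V'V = I_r, \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r \ge 0
$$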
The rows of A can be interpreted as term coordinates in an m-dimensional space. The axes of this space can be interpreted as rows of I (the identity matrix). The SVD transforms the term coordinates to rows of U and the axes to the rows of SV′. The matrix V′ acts as the rotation matrix for the original axes and the diagonal of matrix S contains the scaling factor for each axis. The U matrix can now be construed as a new transformed dataset whose rows still correspond to the original n terms but the miRNAs are transformed into r eigen miRNAs (factors) that are linear combinations of the original miRNAs.
A′ reverses the roles of terms and miRNAs. V plays the role originally played by U and U plays the role originally played by V. Since S is diagonal, S = S′. The SVD transforms the miRNA coordinates to rows of V and the axes to the rows of SU′. The matrix U′ acts as the rotation matrix for the original axes and the diagonal of matrix S contains the scaling factor for each axis. The V matrix can now be construed as a new transformed dataset whose rows still correspond to the original m miRNAs but the terms are transformed into r eigen terms (factors) that are linear combinations of the original terms.
The new scaled and rotated axes and the coordinates tend to better fit the data than the original axes and coordinates. The singular values in S determine the relative importance of each axis. The first few axes capture the maximum variation in the data and the subsequent ones less so. Only the first k (where k<r) factors corresponding to k largest singular values may be used to represent the data. There are two potential benefits of performing this truncation. Firstly, for large datasets (with many attributes), this translates into savings in memory space as well as analysis time, as vectors in k dimensions can be compared in less time than vectors in m dimensions. Secondly, SVD reveals the true dimensionality present in the data, where the bulk of the information content in the original m-dimensional data may be captured in a lower dimensional manifold, after axis rotation and scaling.
Entropy measures the amount of disorder in the set of variations captured in the r dimensions. The magnitude of the entropy may vary from 0 (all variation is captured in the first dimension) to 1 (all dimensions are equally important). k is calculated as E × r. For the term-by-miRNA matrix, k was computed to be 560.
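A sketch of how such an entropy-based choice of k might be computed is given below. It assumes the normalized Shannon entropy of the squared singular values, as described for SVD of genome-wide expression data (Alter et al. in the reference list); whether the study used exactly this definition, and how E × r was rounded, are assumptions. The function name `entropy_rank` is hypothetical.

```python
import math


def entropy_rank(singular_values):
    """Return (E, k) where E is the normalized entropy and k = round(E * r).

    Assumes each dimension's share of variation is its squared singular value
    divided by the sum of all squared singular values. Requires r > 1.
    """
    r = len(singular_values)
    energy = [s * s for s in singular_values]
    total = sum(energy)
    fractions = [e / total for e in energy]
    # Normalizing by log(r) maps the entropy onto the range 0..1.
    entropy = -sum(p * math.log(p) for p in fractions if p > 0) / math.log(r)
    # Assumption: E * r is rounded to the nearest integer.
    return entropy, round(entropy * r)


if __name__ == "__main__":
    print(entropy_rank([1.0, 1.0, 1.0, 1.0]))  # all dimensions equally important
    print(entropy_rank([1.0, 0.0, 0.0, 0.0]))  # all variation in the first dimension
```

The two toy inputs reproduce the extremes described above: equal singular values give an entropy of 1 and keep every dimension, while a single non-zero singular value gives an entropy of 0. If r equals the 878 miRNA documents, the reported k of 560 would correspond to an entropy of roughly 0.64.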
The association between any pair of entities (term-term, term-miRNA, miRNA-miRNA) can be calculated as the cosine of the angle between the respective k-dimensional vectors. The association scores can theoretically fall between −1 and 1, but in practice were observed to occur between −0.2472 and 1. A higher association score between a pair of entities indicates a stronger relationship in literature.
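The cosine association can be sketched in a few lines. The vectors here are toy two-dimensional examples rather than rows of the actual 560-dimensional model, and the names `cosine` and `rank_by_cosine` are hypothetical. Whether the compared vectors are scaled by the singular values is not stated in the text, so the sketch simply takes the k-dimensional vectors as given.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_by_cosine(query_vector, candidates):
    """Sort candidate entities by decreasing cosine with the query vector.

    candidates: dict mapping an entity name to its k-dimensional vector.
    """
    scored = [(name, cosine(query_vector, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    toy = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for name, score in rank_by_cosine([1.0, 0.0], toy):
        print(name, round(score, 4))
```

The same ranking function serves term-to-miRNA, miRNA-to-term and miRNA-to-miRNA queries, since all entities live in the same reduced space.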
Information Gain calculation
miR2Disease was used for evaluating LSI performance. It is a comprehensive database containing descriptions of more than 100 diseases and their associated miRNAs.
The term-to-miRNA and miRNA-to-term prioritizations were evaluated against gold standards by generating Receiver Operating Characteristics (ROC) curves which display recall and false positive rates at each rank. The area under the curve (AUC) can be used as a measure of ranking quality [24, 25]. The AUC will have the value of 1 for perfect ranking (all relevant entities at the top), 0.5 for randomly generated ranking, and 0 for the worst possible ranking (all relevant entities at the bottom).
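The AUC of a ranked list can be computed directly from the ranks, as in the sketch below. It uses the equivalence between the area under the ROC curve and the fraction of (relevant, non-relevant) pairs in which the relevant item is ranked higher. The function name `ranking_auc` is hypothetical and ties are not handled, which is a simplification.

```python
def ranking_auc(ranked_items, relevant):
    """AUC of a ranking given the set of relevant (gold standard) items.

    ranked_items: items ordered from best to worst score.
    relevant: collection of items considered correct.
    """
    relevant = set(relevant)
    n_pos = sum(1 for item in ranked_items if item in relevant)
    n_neg = len(ranked_items) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one relevant and one non-relevant item")
    correct_pairs = 0
    negatives_seen = 0
    for item in ranked_items:
        if item in relevant:
            # Every non-relevant item not yet seen is ranked below this one.
            correct_pairs += n_neg - negatives_seen
        else:
            negatives_seen += 1
    return correct_pairs / (n_pos * n_neg)


if __name__ == "__main__":
    ranking = ["a", "b", "c", "d"]
    print(ranking_auc(ranking, {"a", "b"}))  # relevant items at the top
    print(ranking_auc(ranking, {"c", "d"}))  # relevant items at the bottom
    print(ranking_auc(ranking, {"a", "d"}))  # one at the top, one at the bottom
```

The three toy calls correspond to the perfect, worst and chance-level cases described above.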
The cohesion for a set of miRNAs was calculated as described in [11, 26]. Given a set of n miRNAs for a disease, n AUCs were calculated. Each miRNA was treated as a query and the rest of the n−1 miRNAs were treated as gold standard. The set of all miRNAs (for all diseases) in miR2Disease were prioritized against the query miRNA using the cosine between the miRNA vectors as the similarity measure, and an AUC was calculated. The median AUC out of n AUCs was treated as the cohesion. If a set of miRNAs for a disease are closely related, then the miRNAs in the set would ideally have high cosine association with each other compared to remaining miRNAs that are not in the set, signifying a highly cohesive set.
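The cohesion calculation can be sketched as a leave-one-out loop over the miRNAs in a disease set. The toy vectors and all function names are hypothetical, and excluding the query miRNA from its own ranked list is an assumption, since the text does not say how the query itself is treated.

```python
import math
import statistics


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranking_auc(ranked_items, relevant):
    """Fraction of (relevant, non-relevant) pairs ranked in the right order."""
    n_pos = sum(1 for item in ranked_items if item in relevant)
    n_neg = len(ranked_items) - n_pos
    correct_pairs, negatives_seen = 0, 0
    for item in ranked_items:
        if item in relevant:
            correct_pairs += n_neg - negatives_seen
        else:
            negatives_seen += 1
    return correct_pairs / (n_pos * n_neg)


def cohesion(group, vectors):
    """Median AUC over queries, each group member being used as the query once.

    group: miRNAs assigned to one disease.
    vectors: dict of every miRNA in the gold standard to its vector.
    """
    aucs = []
    for query in group:
        # Assumption: the query is left out of the list it is ranked against.
        others = [m for m in vectors if m != query]
        ranked = sorted(others, key=lambda m: cosine(vectors[query], vectors[m]),
                        reverse=True)
        gold = set(group) - {query}
        aucs.append(ranking_auc(ranked, gold))
    return statistics.median(aucs)


if __name__ == "__main__":
    toy_vectors = {"m1": [1.0, 0.0], "m2": [0.9, 0.1], "m3": [0.8, 0.2],
                   "x1": [0.0, 1.0], "x2": [0.1, 0.9]}
    print(cohesion(["m1", "m2", "m3"], toy_vectors))  # tightly related set
    print(cohesion(["m1", "x1"], toy_vectors))        # unrelated pair
```

Using the median rather than the mean keeps a single poorly connected miRNA from dominating the score of an otherwise cohesive set.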
miRNA Literature Landscape
Evaluation of the LSI model
LSI is a robust approach to extract both explicit and implicit relationships between terms and miRNAs directly from the biomedical literature. In this study, the performance of the LSI model was evaluated based on three different use-case scenarios as described below.
miRNA ranking by term query
Table: Performance of the LSI model on disease or physiology term queries against expert determined gold standards culled from review papers. Surviving column headers: "# Terms in ...", "# miRNAs in ...". Queries: Cholesterol Homeostasis; Endothelium (endothelial); HDL (high density lipoprotein); COPD (chronic obstructive pulmonary disease); Cystic Fibrosis; Idiopathic pulmonary fibrosis.
Term ranking by miRNA query
Another use-case for researchers would be to functionally annotate groups of miRNAs. This is relevant to genomic experiments, which generally yield many differentially expressed miRNAs. Here, the miRNAs are treated as the query and the relevant terms are rank ordered. miR2Disease was used to select groups of miRNAs that were assigned to specific diseases. To evaluate the performance of the LSI model, the top 300 ranked terms associated with the group of miRNAs were compared to the disease descriptors in the miR2Disease database. A threshold of 300 terms was chosen because it would be impractical for users to consider the entire prioritized list of 68596 terms, and also to reduce the computational burden. The list of diseases and their respective term AUCs are available in Additional file 2: Table S1B. The AUCs for 59 diseases could not be obtained as none of the constituent terms in the names of these diseases were found amongst the top 300 ranked terms. Among the queries which returned at least one disease term in the top 300 ranked terms, 27 (46 %) produced an AUC above 0.8. Surprisingly, the average AUC for the gold standard II list was 0.54 and none of the disease queries produced an AUC above 0.8 (Additional file 2: Table S2A). These results suggest that the top 300 terms extracted from the LSI model may be related to topics other than diseases, such as molecular functions.
miRNA ranking by miRNA query
Clustering and functional annotation of miRNAs
The terms were manually examined and used to label each cluster in Fig. 6. For instance, the largest cluster, containing 73 miRNAs, is associated with Alzheimer disease. This number is slightly different from the number (64 miRNAs) of Alzheimer related miRNAs in the miR2Disease database. Interestingly, the largest miR2Disease group of miRNAs (152) was associated with hepatocellular carcinoma. It is important to note that the top nine miR2Disease categories, containing between 114 and 152 miRNAs, were all associated with some form of cancer. This suggests that there was a large bias in the miRNA databases as of 2009. By comparison, we found that the LSI-based clusters contained smaller numbers of miRNAs that were associated with more specific, functionally aligned terms. These results indicate that LSI based clustering allows for more robust functional clustering and more specific functional annotation beyond simply assigning miRNAs to diseases.
miRLiN web tool
For benchmarking, we compared the performance of our web tool with two existing web tools, miRCancer and miRiaD. While both tools are disease focused, miRLiN is more flexible and can accept any type of query. Also, both tools rely on databases with binary associations between miRNAs and diseases. In contrast, miRLiN ranks miRNAs based on their functional relevance to the query and also enables a genome-wide network view of miRNAs with multiple associations to one another. Additional file 2: Tables S4A and S4B compare the results from the 3 web tools for ‘choriocarcinoma’ and ‘meningioma’ queries. For the ‘choriocarcinoma’ query, miRCancer listed 3 miRNAs (hsa-mir-199b, hsa-mir-218, hsa-mir-34a) while miRiaD listed 2 additional miRNAs (hsa-mir-141, hsa-mir-126). Importantly, miRLiN retrieved all 5 miRNAs within the top 15 ranked miRNAs (Additional file 2: Table S4A). We manually evaluated the 10 additional miRNAs retrieved by miRLiN. We found that miRNAs hsa-mir-378a, hsa-mir-371b, hsa-mir-371a, hsa-let-7g, hsa-mir-373, hsa-mir-141 and hsa-mir-15a were co-mentioned with the query term in the same abstracts, but not in the same sentences. It appears that these miRNAs were found to be differentially expressed in choriocarcinoma cell lines. One miRNA, hsa-mir-145, was co-mentioned with the query term in the same sentence, which suggests a direct link. Interestingly, the association of hsa-mir-585 with choriocarcinoma appeared to be indirect, via its association with hsa-mir-218. In addition, the abstract for hsa-mir-141 in miRLiN was different from that in the other two web tools, suggesting that our abstract retrieval approach differs slightly from the other two methods. Lastly, hsa-mir-624 did not appear to be related to choriocarcinoma or any other type of cancer, and thus appears to be a false discovery.
For the ‘meningioma’ query, miRCancer retrieved 4 miRNAs (hsa-mir-128, hsa-mir-200a, hsa-mir-224, hsa-mir-335) and miRiaD retrieved 4 additional miRNAs (hsa-mir-145, hsa-mir-190, hsa-mir-219, and hsa-mir-29). Only two meningioma related miRNAs overlapped between miRiaD and miRCancer. In comparison, miRLiN retrieved all but one (hsa-mir-145, ranked 25th) amongst the top 12 ranked miRNAs (Additional file 2: Table S4B). Moreover, miRLiN identified two additional miRNAs (hsa-mir-4417 and hsa-mir-185). Manual examination found that hsa-mir-185 is in fact negatively associated with meningioma: the citation explicitly negates its involvement in meningioma. This result reveals a shortcoming of our method, which does not take into account negations and other parts of speech that are considered in NLP based approaches. Lastly, manual examination did not find an association between hsa-mir-4417 and meningioma, although it is associated with other types of cancer.
We have developed an LSI based approach to prioritize, cluster and functionally annotate miRNAs. LSI enables representation of miRNAs and terms as vectors in a low dimensional space that can be compared against each other. LSI provides an advantage over co-occurrence based methods because semantic associations between entities take into account not only the entities being compared but also indirect associations amongst all other related entities in the collection. Several choices made in the construction of the model affect its performance. The rationale behind these choices and the potential ramifications of the alternatives are discussed below.
While building the miRNA document collection, citations that referenced more than 7 miRNAs were filtered out. Manual examination of citations revealed that certain high throughput screening papers were associated with many miRNAs but these papers did not describe any functional information about the specific miRNAs. For instance, many citations described sequencing experiments that identified several miRNAs. Inclusion of such citations in the model would create strong semantic associations between pairs of miRNAs that are otherwise remotely related. Better automated methods are needed to identify and filter such abstracts that do not describe any functional relationships.
Our results suggest that parsing of terms from miRNA documents still needs improvement. We found that many of the top 300 terms associated with groups of miRNAs were too specific, relating to gene symbols or non-standard abbreviations used in the papers. For the current LSI model, only designated stopwords were removed prior to factorization. Automated methods that can filter out additional non-useful terms may need to be investigated. Stemming of the terms to their roots may also be useful for reducing the dictionary size, although strategies for expanding the roots to the most relevant expansion will need to be devised if the terms are to be used for functional annotation. Currently, the selection of interesting functional annotation terms is still manual but could be automated by restricting to MeSH, GO and KEGG. However, this filtering strategy may result in loss of interesting terms such as gene or transcription factor names or phrases like ‘acaa-deletion’ that may indirectly link the miRNAs to a physiology, a biological function or a disease.
Several other methods may need to be investigated in the future to improve the performance of the LSI approach. For instance, different types of normalization methods for the term-by-miRNA matrix, in addition to the log-entropy method, may need to be investigated. In the current study, an entropy based method was used to select the k highest magnitude singular values. Other strategies have been discussed in the literature that may improve performance. The web tool currently displays the top 50 miRNAs and 300 terms in response to a query. Automated methods, such as the one used for determining the singular value threshold, may also be useful in devising a prioritization threshold for cosines. Finally, adding collections for other model organisms such as mouse and rat will make a more comprehensive text mining database for miRNAs.
Altogether, we have demonstrated that an LSI based approach provides a robust and automated method to interrogate the large amount of literature that is accumulating with respect to miRNAs. The approach enables rapid prioritization of miRNAs in relation to keyword or miRNA queries. Furthermore, the LSI based approach allows for global clustering of all miRNAs based on functional information in the literature and provides a method for annotating groups of miRNAs with highly specific terms and concepts.
The authors would like to thank the University of Memphis High Performance Computing facility for providing the needed computational resources for this study.
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 13, 2016: Proceedings of the 13th Annual MCBIOS conference. The full contents of the supplement are available online at http://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-17-supplement-13.
This work and its publication were supported in part by the Memphis Research Consortium and the University of Memphis Center for Translational Informatics.
Availability of data and materials
The LSI model described in the paper can be accessed through the web tool miRNA Literature Network (miRLiN) at http://bioinfo.memphis.edu/mirlin.
SR and RH designed the research and wrote the manuscript. SR and BM conducted data analysis. BC developed the web tool. RH supervised the research and assisted with interpretation of results. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Urbich C, Kuehbacher A, Dimmeler S. Role of micrornas in vascular diseases, inflammation and angiogenesis. Cardiovascular research. 2008; 156:581–88. doi:10.3389/fphys.2016.00021.
- Nelson PT, Wang WX, Rajeev BW. Micrornas (mirnas) in neurodegenerative diseases. Brain Pathology. 2008; 18(1):130–138.
- Garzon R, Calin GA, Croce CM. Micrornas in cancer. Annual review of medicine. 2009; 60:167–179.
- Huang Y, Shen XJ, Zou Q, Wang SP, Tang SM, Zhang GZ. Biological functions of micrornas: a review. J Physiol Biochem. 2011; 67(1):129–139.
- Kozomara A, Griffiths-Jones S. mirbase: annotating high confidence micrornas using deep sequencing data. Nucleic acids Res. 2014; 42(D1):68–73.
- Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y. mir2disease: a manually curated database for microrna deregulation in human disease. Nucleic acids Res. 2009; 37(suppl 1):98–104.
- Baumgartner WA, Cohen KB, Fox LM, Acquaah-Mensah G, Hunter L. Manual curation is not sufficient for annotation of genomic databases. Bioinformatics. 2007; 23(13):41–48.
- Baeza-Yates R, Ribeiro-Neto B, et al. Modern Information Retrieval, vol. 463. New York: ACM Press; 1999.
- Swanson DR. Fish oil, raynaud’s syndrome, and undiscovered public knowledge. Perspect Biol Med. 1986; 30(1):7–18.
- Chen H, Sharp BM. Content-rich biological network constructed by mining pubmed abstracts. BMC Bioinformatics. 2004; 5(1):1.
- Burkart MF, Wren JD, Herschkowitz JI, Perou CM, Garner HR. Clustering microarray-derived gene lists through implicit literature relationships. Bioinformatics. 2007; 23(15):1995–2003.
- Xie B, Ding Q, Han H, Wu D. mircancer: a microrna–cancer association database constructed by text mining on literature. Bioinformatics. 2013; 29(5):638–44.
- Naeem H, Küffner R, Csaba G, Zimmer R. mirsel: automated extraction of associations between micrornas and genes from the biomedical literature. BMC Bioinformatics. 2010; 11(1):135.
- Li G, Ross KE, Arighi CN, Peng Y, Wu CH, Vijay-Shanker K. mirtex: A text mining system for mirna-gene relation extraction. PLoS Comput Biol. 2015; 11(9):1004391.
- Roy S, Heinrich K, Phan V, Berry MW, Homayouni R. Latent semantic indexing of pubmed abstracts for identification of transcription factor candidates from microarray derived gene sets. BMC Bioinforma. 2011; 12(10):1.
- Homayouni R, Heinrich K, Wei L, Berry MW. Gene clustering by latent semantic indexing of medline abstracts. Bioinformatics. 2005; 21(1):104–15.
- Golub GH, Van Loan CF. Matrix Computations, vol. 3. Baltimore: JHU Press; 2012.
- Skillicorn D. Understanding Complex Datasets: Data Mining with Matrix Decompositions. Abingdon: CRC press; 2007.
- Entrez Gene. http://www.ncbi.nlm.nih.gov/gene. Accessed 1 Mar 2016.
- Zeimpekis D, Gallopoulos E. Tmg: A matlab toolbox for generating term-document matrices from text collections. In: Grouping Multidimensional Data. Heidelberg: Springer; 2006. p. 187–210.
- Salton G. The smart document retrieval project. In: Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM; 1991. p. 356–358.
- Berry MW, Browne M. Understanding Search Engines: Mathematical Modeling and Text Retrieval, vol. 17. Philadelphia: SIAM; 2005.
- Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci. 2000; 97(18):10101–10106.
- Metz CE. Basic principles of roc analysis. In: Seminars in Nuclear Medicine. Amsterdam: Elsevier; 1978. p. 283–298.
- Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology. 1982; 143(1):29–36.
- Jelier R, Jenster G, Dorssers LC, van der Eijk CC, van Mulligen EM, Mons B, Kors JA. Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes. Bioinformatics. 2005; 21(9):2049–2058.
- Nascimento MC, De Carvalho AC. Spectral methods for graph clustering–a survey. Eur J Oper Res. 2011; 211(2):221–231.
- Gupta S, Ross KE, Tudor CO, Wu CH, Schmidt CJ, Vijay-Shanker K. miriad: A text mining tool for detecting associations of micrornas with diseases. J Biomed semant. 2016; 7(1):1.
- Lipscomb CE. Medical subject headings (mesh). Bull Med Libr Assoc. 2000; 88(3):265.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. Nature Genet. 2000; 25(1):25–29.
- Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, et al. Kegg for linking genomes to life and the environment. Nucleic Acids Res. 2008; 36(suppl 1):480–484.
- Zhu M, Ghodsi A. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis. 2006; 51(2):918–30.
- Jeon TI, Osborne TF. mirna and cholesterol homeostasis. Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids. 2016. doi:10.1016/j.bbalip.2016.01.005.
- Araldi E, Suárez Y. Micrornas as regulators of endothelial cell functions in cardiometabolic diseases. Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids. 2016. doi:10.1016/j.bbalip.2016.01.013.
- Price NL, Fernández-Hernando C. mirna regulation of white and brown adipose tissue differentiation and function. Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids. 2016. doi:10.1016/j.bbalip.2016.02.010.
- Karunakaran D, Rayner KJ. Macrophage mirnas in atherosclerosis. Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids. 2016. doi:10.1016/j.bbalip.2016.02.006.
- Baldán Á, de Aguiar Vallim TQ. mirnas and high-density lipoprotein metabolism. Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids. 2016. doi:10.1016/j.bbalip.2016.01.021.
- Maltby S, Plank M, Tay HL, Collison A, Foster PS. Targeting microrna function in respiratory diseases: mini-review. Front Physiol. 2016; 7:21.