Osmoprotectants are among the most fundamental solutes in living organisms, occurring from bacteria and fungi to higher plants and animals. The main plant osmoprotectants are amino acids or carbohydrates, but they share common features such as low molecular weight and lack of toxicity even at high concentrations, and they play vital roles in plants during abiotic stresses such as salinity, drought and chilling.
To face such constraints, many plants accumulate organic osmolytes, or compatible solutes, in response to the resulting osmotic stress, thereby maintaining cell turgor and the driving gradient for water uptake. These solutes include sugars (mainly fructose and sucrose), sugar alcohols (such as Myo-inositol), complex sugars (such as trehalose and fructans) and charged metabolites (such as glycine betaine, proline and ectoine) [1, 3].
Osmolytes can also act as free-radical scavengers or as chemical chaperones, directly stabilizing membranes and/or proteins. Their accumulation may thus protect plants against damage both by scavenging reactive oxygen species and through chaperone-like activities that maintain protein structure and function. Plant cells respond to stress by modulating the expression of osmolyte-related genes according to the type and severity of the stress and the developmental stage of the plant.
Most previous studies focused on expression assays of a single osmoprotectant, as in Chen et al., on searches in EST databases, as in Barros et al., or on expression in transgenic plants [9, 10]. To date, no in-depth evaluation has been carried out of transcriptomic databases generated under stress with Next Generation Sequencing (NGS). In the present work, an in silico annotation workflow was developed that integrates high-throughput transcriptomic data from soybean (Glycine max) plants under water deficit and biotic stress, obtained with HT-SuperSAGE, and compares them with traditional transcriptomics and with the genome distribution of plant osmoprotectants.
The present approach focused on seven genes related to the biosynthesis of four classes of the most important plant osmoprotectants: Proline (genes P5CS and P5CR), Trehalose (TPS1 and TPPB), Glycine betaine (BADH and CMO) and Myo-inositol (INPS1).
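The gene set described above can be written down as a small lookup table, which is convenient when annotating search hits by osmoprotectant class. The following is only an illustrative sketch: the class and gene names come from the text, while the dictionary layout and the helper name `class_of` are assumptions introduced here.

```python
# Osmoprotectant classes and their biosynthesis genes, as listed in the text.
# The data layout and the helper below are illustrative assumptions.
OSMOPROTECTANT_GENES = {
    "proline": ["P5CS", "P5CR"],
    "trehalose": ["TPS1", "TPPB"],
    "glycine betaine": ["BADH", "CMO"],
    "myo-inositol": ["INPS1"],
}


def class_of(gene):
    """Return the osmoprotectant class of a gene symbol, or None if unknown."""
    wanted = gene.upper()
    for osmoprotectant, genes in OSMOPROTECTANT_GENES.items():
        if wanted in genes:
            return osmoprotectant
    return None


# Flat list of all target genes, in the order given in the text.
ALL_GENES = [g for genes in OSMOPROTECTANT_GENES.values() for g in genes]
```

A call such as `class_of("CMO")` would then return the class under which a matched gene should be reported.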
Proline - A proteinogenic amino acid that is essential for primary metabolism in plants during drought and salt stress. It plays a molecular chaperone role owing to its stabilizing action, whether as a buffer that maintains cytosolic pH and the redox status of the cell, as an antioxidant involved in the scavenging of highly reactive free radicals, or as a singlet oxygen quencher. In higher plants, proline biosynthesis may proceed either via glutamate, through successive reductions catalyzed by Δ1-pyrroline-5-carboxylate synthase (P5CS) and Δ1-pyrroline-5-carboxylate reductase (P5CR), or via the ornithine pathway, through ornithine δ-aminotransferase (OAT). Proline is generally the first osmoprotectant to be activated after stress perception [14, 15].
Trehalose - In plants this sugar participates mainly in the response to dehydration. It was first described in the so-called resurrection plants Myrothamnus flabellifolius and Selaginella tamariscina, both of which are able to recover after almost complete dehydration. Its ability to stabilize proteins and membranes, together with its role in ROS scavenging, may account for its cellular function under the non-ideal conditions that plants encounter. Its synthesis normally proceeds through the formation of trehalose-6-phosphate (T6P) from UDP-glucose and glucose-6-phosphate, a reaction catalyzed by trehalose-6-phosphate synthase (TPS). T6P is then dephosphorylated by trehalose-6-phosphate phosphatase (TPP), yielding free trehalose. In a transgenic assay, Agrobacterium-mediated transfer of the yeast TPS1 gene into tomato plants resulted in higher chlorophyll and starch content, as well as pronounced tolerance to drought, salinity and oxidative stress, despite some pleiotropic changes.
Glycine betaine (GB) - A quaternary ammonium compound (QAC) occurring in plants, animals and microorganisms. According to Chen and Murata, GB accumulates in chloroplasts and other plastids, especially in halotolerant plants but also in other plants under high salinity, drought and cold stress, and has a recognized role in antioxidative responses. In most organisms GB is synthesized either by the oxidation (or dehydrogenation) of choline or by the N-methylation of glycine. In plants, however, the pathway from choline is the main route of GB accumulation. In this pathway choline is converted by choline monooxygenase (CMO) to betaine aldehyde, which is then converted to GB by betaine aldehyde dehydrogenase (BADH).
Myo-inositol - This osmoprotectant is an important cellular component that forms the basis of a significant number of lipid signaling molecules involved in diverse pathways, including stress responses. Myo-inositol is the most abundant of the nine stereoisomers existing in nature. It is a cyclohexanehexol, that is, a cyclic carbohydrate with six hydroxyl groups, one on each carbon of the ring. It acts as a substrate in the biosynthesis of many compounds, especially the raffinose family oligosaccharides (RFOs), which accumulate in plants under stress conditions. In multicellular eukaryotes, Myo-inositol is incorporated into phosphatidylinositol phosphate (PtdInsP), Myo-inositol phosphate (InsP) and certain sphingolipid signaling molecules that act in diverse processes, including the regulation of gene expression. It is synthesized by a two-step pathway: (1) conversion of D-glucose-6-P to D-Myo-inositol 1-monophosphate (1D-MI-1-P), catalyzed by an L-Myo-inositol 1-phosphate synthase (MIPS), and (2) specific dephosphorylation to free Myo-inositol by the Mg2+-dependent L-Myo-inositol 1-phosphate phosphatase (IMP).
Considering the potential of these molecules for plant biotechnology, the present work generated a curated list of osmoprotectants, osmoprotectant-related sequences and important regulatory elements, and indicates the most adequate tools for their identification and annotation. To evaluate the sensitivity of the proposed approach, the generated seed sequences and the proposed workflow were used to search for osmoprotectant-related sequences among short tags (26 bp) generated by HT-SuperSAGE and deposited in the GENOSOJA (Brazilian Soybean Genome Consortium) databank. A significant number of tags matched known osmoprotectant-related sequences, showing the effectiveness of the approach. It should therefore be useful for searches in other, currently very abundant, genomic and transcriptomic databanks comprising second-generation sequences produced by high-performance sequencing platforms [e.g. Pyrosequencer (454 Roche®), Solexa (Illumina®) and SOLiD (Applied Biosystems®)].
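The idea of matching 26 bp tags against seed sequences can be illustrated with a minimal sketch. It assumes exact substring matching on either strand, which is a deliberate simplification and not the tool or parameters used in this work; alignment programs that tolerate mismatches would normally be used for such searches. All sequences and names below (`toy_P5CS`, `toy_BADH`, `match_tags`) are invented for illustration.

```python
TAG_LENGTH = 26  # tag length reported for HT-SuperSAGE in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def match_tags(tags, seeds):
    """Map each 26 bp tag to the seed sequences that contain it exactly.

    Exact matching on both strands is an assumption of this sketch.
    Tags of any other length are skipped.
    """
    hits = {}
    for tag in tags:
        tag = tag.upper()
        if len(tag) != TAG_LENGTH:
            continue
        rc = reverse_complement(tag)
        matched = [
            name
            for name, seq in seeds.items()
            if tag in seq.upper() or rc in seq.upper()
        ]
        if matched:
            hits[tag] = matched
    return hits


# Toy seed sequences (hypothetical, not real gene sequences).
seeds = {
    "toy_P5CS": "ATGGCTAGCTAGGATCCGATCGATCGGCTAAGCTTACGGATCCA",
    "toy_BADH": "GGCATTCGATCGGATACCGTTAGCCATGCAATCGGTACCA",
}

# One tag taken from the forward strand, one from the reverse strand,
# and one that matches nothing.
tag_fwd = seeds["toy_P5CS"][5:5 + TAG_LENGTH]
tag_rev = reverse_complement(seeds["toy_BADH"][:TAG_LENGTH])
tag_none = "A" * TAG_LENGTH

hits = match_tags([tag_fwd, tag_rev, tag_none], seeds)
for tag, names in hits.items():
    print(tag, "->", ", ".join(names))
```

Because SuperSAGE tags are short, exact matching keeps the example simple, but it would miss tags that differ from the seed by a sequencing error or a polymorphism.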
The present work also represents the first overall evaluation of osmoprotectants in a higher plant, comparing the prevalence of genes encoding enzymes of osmoprotectant biosynthetic pathways across sequence databanks that differ in tissue, developmental stage, stress condition and the molecular approach used to generate the transcripts (ESTs, subtractive libraries, full-length cDNA, HT-SuperSAGE, BACs, etc.). In this respect, soybean offers one of the most abundant data sources among legumes for such an evaluation (see Benko-Iseppon et al.), owing to its worldwide importance as a source of food and oil.