Bioluminescence is the process by which light is produced by a chemical reaction within an organism [1, 2]. It occurs in a wide range of organisms, including ctenophores, bacteria, certain annelids, fungi, fish, insects, algae, and squid, which occupy marine, freshwater, and terrestrial habitats [2–4]. The bioluminescence mechanism involves two chemicals: luciferin, a substrate, and the enzyme luciferase [1, 5]. Luciferase catalyzes the oxidation of luciferin, resulting in light and an intermediate called oxyluciferin. In some cases, the luciferin, the catalyzing protein (the equivalent of a luciferase), and a co-factor such as oxygen are bound together to form a single unit called a photoprotein. This molecule is triggered to produce light when a particular type of ion is added to the system. Photoproteins and luciferases are distinguished by what the light emission is proportional to: a photoprotein emits light in proportion to the amount of the catalyzing protein, whereas in a luciferase-catalyzed reaction the amount of light emitted is proportional to the concentration of the substrate luciferin.
Different organisms produce different colors of light, from violet through red [3, 6]. The color produced often depends on the role the light plays, the organism that produces it, and the chemicals involved. On land, the dominant color is green, because it reflects best against green plants. In the ocean, the most common bioluminescent color is blue, which transmits best through sea water, a medium that can scatter or absorb light.
Bioluminescence serves a variety of functions, many of which are still unknown. The known functions include camouflage, finding food, attraction of prey, attraction of mates, repulsion by way of confusion, signaling other members of the same species, confusing potential predators, communication between bioluminescent bacteria (quorum sensing), illumination of prey, and acting as a burglar alarm [3–5].
The application of bioluminescence holds great promise for medical and commercial advances. Bioluminescent proteins serve as invaluable biochemical tools in a variety of fields, including gene expression analysis, drug discovery, the study of protein dynamics, the mapping of signal transduction pathways, bioluminescent imaging, toxicity determination, DNA sequencing studies, and the estimation of metal ions such as calcium [7–14].
Detailed analysis of bioluminescent proteins can help to uncover functions that are still unknown and to design new medical and commercial applications. Owing to advances in sequencing technologies, a huge amount of sequence data is available in various databases. Despite tremendous progress in protein annotation, no online tools exist for predicting bioluminescent proteins from primary protein sequences.
A Support Vector Machine (SVM) is a supervised learning algorithm that has been found useful for recognizing and discriminating hidden patterns in complex datasets. SVMs have been successfully applied in various fields of computational biology, e.g., protein sequence/structure analysis and microarray and gene expression analysis [16–18].
In this work, we present a novel prediction method that uses an SVM and physicochemical properties to predict bioluminescent proteins. So far, bioinformatics and statistical learning methods such as SVM and Random Forest have not been explored for this task. We report an SVM approach that identifies bioluminescent proteins from sequence information, irrespective of sequence similarity.
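As a minimal sketch of what turning a primary sequence into physicochemical features can look like, the Python snippet below converts a protein sequence into a fixed-length numeric vector of the kind an SVM could take as input. The feature set is an assumption chosen for illustration (amino acid composition plus mean Kyte-Doolittle hydropathy) and is not the feature set used in this work; the function name `sequence_features` is likewise hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (illustrative choice of property).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(sequence):
    """Return 21 features: 20 residue fractions plus mean hydropathy.

    Hypothetical helper for illustration only. Non-standard residue
    codes (e.g. X, B, Z) are ignored.
    """
    residues = [r for r in sequence.upper() if r in KD_HYDROPATHY]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    counts = Counter(residues)
    # Fraction of each amino acid, independent of sequence length.
    composition = [counts[aa] / n for aa in AMINO_ACIDS]
    # Average hydropathy over all residues in the sequence.
    mean_hydropathy = sum(KD_HYDROPATHY[r] for r in residues) / n
    return composition + [mean_hydropathy]


if __name__ == "__main__":
    features = sequence_features("MKTAYIAKQR")
    print(len(features))
```

Because every sequence maps to a vector of the same length regardless of how long the protein is, and without any alignment step, features of this kind can be computed for proteins that share little sequence similarity.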