A preliminary study on automated freshwater algae recognition and classification system
© Mosleh et al.; licensee BioMed Central Ltd. 2012
Published: 13 December 2012
Freshwater algae can be used as indicators to monitor the condition of freshwater ecosystems. Algae react quickly and predictably to a broad range of pollutants, and thus provide early signals of a worsening environment. This study was carried out to develop a computer-based image processing technique to automatically detect, recognize, and identify algae genera from the divisions Bacillariophyta, Chlorophyta and Cyanobacteria in Putrajaya Lake. The literature shows that most automated analyses and identification of algae images have been limited to only one type of algae. Automated identification systems for tropical freshwater algae are virtually non-existent, and this study is partly intended to fill this gap.
The development of the automated freshwater algae detection system involved image preprocessing, segmentation, feature extraction and classification using artificial neural networks (ANN). Image preprocessing was used to improve contrast and remove noise. Image segmentation using the Canny edge detection algorithm was then carried out on the binary image to detect the algae and their boundaries. Feature extraction was applied to obtain shape and texture parameters of the selected algae from each image, namely shape, area, perimeter, and minor and major axes; finally, the Fourier spectrum with principal component analysis (PCA) was applied to extract texture features. An ANN was used to classify algae images based on the extracted features. A feed-forward multilayer perceptron network was initialized with the back propagation error algorithm and trained with a database of features extracted from algae image samples. The system's accuracy rate was obtained by comparing the results of the manual and automated classification methods. The developed system was able to identify 93 images of selected freshwater algae genera from a total of 100 tested images, which yielded an accuracy rate of 93%.
This study demonstrated the application of automated recognition to five genera of freshwater algae. The results indicated that a multilayer perceptron (MLP) is sufficient and can be used for classification of freshwater algae. However, for future studies, support vector machines (SVM) and radial basis function (RBF) networks should be considered for better classification as the number of algae species studied increases.
Algae have long been used to assess environmental conditions in aquatic habitats throughout the world. Algae respond to a wide range of pollutants and provide an early warning signal of worsening ecological conditions. They are highly sensitive to changes in their environment and are therefore good indicators. Shifts in the abundance of algal species can be used to detect environmental changes, and also to indicate the trophic status and nutrient problems of a lake. Nutrient stimulation of algal growth makes algae part of the problem in the eutrophication of lakes, and the trophic status of lakes can be monitored through the algal taxa found in them.
Algae from the divisions Bacillariophyta and Chlorophyta, especially the desmids (e.g., Scenedesmus), are highly sensitive to changes in environmental parameters and can be considered bio-indicators for monitoring water quality [4–6]. However, several species of algae are capable of producing potentially harmful toxins as well as unpleasant taste and odour. Chlorophytes are often abundant in eutrophic lakes. Blooms of Staurastrum have created grassy odour problems. Navicula is a member of the group of algae called Bacillariophyta. The hard cell walls of Navicula do not decompose even when the cells die. The remaining skeletons of the cells create problems when they clog the filters at water treatment plants. Cyanobacteria are known to produce nuisance blooms in eutrophic waters. Furthermore, some species of cyanobacteria contribute to toxin, taste, and odour problems in water. Some types of cyanobacteria, such as Microcystis and Anabaena, produce toxins and odours. Cyanobacteria have become a critical problem worldwide because of their toxicity and because they are widespread in eutrophic lakes. Surveys carried out in different countries demonstrated that about 75% of lake water samples contain toxic cyanobacteria [7, 8]. Moreover, cyanobacteria have been included as a control parameter for water quality, and recommended as a factor in risk assessment plans and safety levels, by the World Health Organization (WHO) and several national authorities worldwide [9–11].
However, identification of algae presents a problem both in their taxonomy and in the application of the organisms in environmental studies. Several studies have reported that conventional identification of algae from microscopy images is time consuming, and that the number of competent algae taxonomists is in general decline. This has led many researchers to develop systems that automate the analysis and classification of algae images [12, 13]. An automated computer-based recognition and classification system for rapid identification of microorganisms such as algae will certainly reduce the burden of routine identifications borne by taxonomists whose services are needed in biodiversity studies. ANN-based automated algae recognition is advantageous because an ANN learns from a given dataset and does not require a rule base to determine the outcome. An ANN is also capable of performing an arbitrary mapping between inputs and outputs. It can be used in a wide variety of domains for classification, prediction, approximation, and clustering, and it is resistant to noise in the input data. ANN has been successfully applied to the classification of two co-occurring species of Ceratium by applying the back propagation learning method with three hidden layers [15–17]. ANN has also been used widely to identify different types of algae species in lake water samples, as well as other microorganisms. Several researchers extracted a set of suitable features from algae images, such as Fourier descriptors, geometrical features, and features characterizing the grey level distribution in a region, and used them to train an ANN [18, 19]. Different types of classifier have been employed to classify algae images, such as feed-forward multilayer networks trained with back propagation of error, radial basis function networks, and support vector machines. For example, a support vector machine (SVM) with a radial basis function kernel has been used to distinguish between 241 species of marine phytoplankton with 89% accuracy.
Research has reported that the recognition accuracy rate depends mainly on the image segmentation process, the features selected for extraction, and the type of classifier or ANN. Many segmentation methods have been used to detect algae objects in microscope images, and a large variety of features has been extracted to enhance the recognition process, including geometrical, colour, and texture features. Geometrical features give measurements of the object shape such as size, length, and width, while texture features describe the image through quantities such as moments, the image histogram, image texture, and the image spectrum.
However, most efforts at automated analysis and identification of algae images have been limited to a specific algal division. This is because it is difficult to implement an application that can detect all divisions of algae, owing to the variation in algal shapes, properties, and colours. So far, only a few studies exist on automated identification of tropical freshwater algae.
Therefore, this study is an early attempt to devise an automated recognition and classification system for several common algae. A combination of image processing and ANN approaches was used for automatic detection and recognition of selected freshwater algae genera. These algae were from the divisions Bacillariophyta (Navicula), Chlorophyta (Scenedesmus) and Cyanobacteria (Chroococcus, Microcystis and Oscillatoria) found in tropical Putrajaya Lake. Although this lake is mesotrophic, there is a need to monitor changes in its water quality as socio-economic development takes place in the surrounding areas. An automated recognition and classification system for algae will be one of several tools to be developed for monitoring the algal diversity of the lake and hence changes in its water quality. This study is also an extension of previous studies by other workers who focused on certain algal taxa only.
Putrajaya Lake is a man-made freshwater lake. The lake, which covers an area of 650 ha, is located in Putrajaya, the new capital city of Malaysia. The lake was constructed to provide a landscape feature and varied recreational activities for the city population, as well as to create wildlife habitats. Putrajaya Lake is warm polymictic, oligotrophic to mesotrophic, and lies south of the densely inhabited Klang Valley, Malaysia. Major inflows from upstream areas outside the lake surroundings contain a certain level of pollutants. Nutrient loading in the lake comes mainly from non-point sources. These include the use of agrochemicals and fertilizer, land clearing, and soil leveling in the surrounding areas. The freshwater algae images used in this work were captured from water samples collected at different locations in Putrajaya Lake, Malaysia. Water samples were examined using an electronic microscope manufactured by Thermo Fisher Scientific (model MTC#B1-220ASA). Freshwater algae images were transferred to digital storage using a Dino-Eye eyepiece camera manufactured by Dutech Scientific (model AM423X), which was attached to the microscope lens and connected to a personal computer via a USB port for image acquisition.
Captured images were uploaded to the system through a graphical user interface (GUI).
Contrast enhancement was performed on the uploaded images to remove dark areas, increase image brightness, and make the images clearer. Histogram equalization was applied to enhance the contrast of the colour image intensity before the image was converted to a gray scale image. The frequency of occurrence of the pixel intensities was given by the histogram and mapped to a uniform distribution. This step was performed to improve the appearance of the images in terms of contrast.
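As a rough illustration of the mapping described above, the following Python sketch equalizes a single-channel intensity image using only the standard library. It is not the original Matlab implementation: the function name `equalize_histogram`, the assumption of 256 intensity levels, and the tiny sample image are all illustrative.

```python
def equalize_histogram(gray, levels=256):
    """Remap intensities so their cumulative distribution is roughly uniform.

    `gray` is a list of rows of integer intensities in [0, levels - 1].
    (Assumption: a single intensity channel; the paper applies the step
    to the colour image intensity before gray scale conversion.)
    """
    flat = [p for row in gray for p in row]

    # Histogram: frequency of occurrence of each intensity.
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution of the histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in gray]

    # Lookup table mapping each old intensity to its equalized value.
    lut = [
        round((c - cdf_min) / (total - cdf_min) * (levels - 1)) if c >= cdf_min else 0
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in gray]


# A dark, low-contrast patch is stretched over the full intensity range.
dark = [[50, 50, 51], [51, 52, 52], [53, 53, 54]]
bright = equalize_histogram(dark)
```

In this toy example the input occupies only five adjacent intensities, while the output spans 0 to 255 and keeps the original ordering of the pixels.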
The image was converted from gray scale to a binary image, and the image complement was then taken so that the background was black and the objects were white.
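A minimal Python sketch of this step is given below. The threshold value of 128 is an assumption, since the text does not state how the binarization threshold was chosen, and the function names are hypothetical.

```python
def to_binary(gray, threshold=128):
    """Binarize a gray scale image: 1 where intensity >= threshold, else 0.

    The threshold is an assumed value, not one reported in the study.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


def complement(binary):
    """Invert a binary image so dark objects become white (1) on black (0)."""
    return [[1 - p for p in row] for row in binary]


# A dark object pixel on a bright background ends up white on black.
gray = [
    [200, 200, 200],
    [200,  40, 200],
    [200, 200, 200],
]
objects = complement(to_binary(gray))
```

With bright-field images the algae are darker than the background, so the complement is what makes the object pixels the foreground for the later steps.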
A median filter (size 3 × 3) was used to reduce image noise while preserving edges. Applying the median filter also removed some unwanted areas and small objects.
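The behaviour described above can be sketched in a few lines of Python. The border handling (replicating edge pixels) is an assumption, as the text does not say how image borders were treated.

```python
from statistics import median


def median_filter_3x3(img):
    """Replace each pixel by the median of its 3 x 3 neighbourhood.

    Pixels outside the image are taken from the nearest edge pixel
    (an assumed border rule).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = median(window)
    return out


# An isolated noise pixel is removed ...
speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 1
cleaned = median_filter_3x3(speck)

# ... while a straight vertical edge is left where it was.
edge = [[0, 0, 1, 1] for _ in range(4)]
kept = median_filter_3x3(edge)
```

The two toy inputs show why the median is preferred over a mean here: a single outlier never reaches the middle of the sorted window, yet a step edge keeps its position.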
A Gaussian filter with a specified standard deviation, σ, was applied to smooth the image and reduce noise.
Non-maximal suppression was then applied to the gradient magnitude image to give a thin line along the ridge of the edge points. The ridge pixels were then thresholded into strong and weak pixels.
Finally, the algorithm performed edge linking by incorporating the weak pixels that were connected to the strong pixels.
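The final thresholding and edge-linking stage of the Canny detector can be sketched as follows. The sketch assumes the gradient magnitude has already been smoothed and thinned by non-maximal suppression, and the two threshold values `low` and `high` are hypothetical parameters rather than values reported in the study.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Keep strong pixels (>= high) and weak pixels (>= low) linked to them.

    `magnitude` is assumed to be a thinned gradient magnitude image.
    Linking uses 8-connectivity.
    """
    h, w = len(magnitude), len(magnitude[0])
    edges = [[0] * w for _ in range(h)]
    queue = deque()

    # Seed the search with every strong pixel.
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] >= high:
                edges[y][x] = 1
                queue.append((y, x))

    # Grow outwards: a weak pixel is accepted only if it touches an edge pixel.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (
                    0 <= ny < h
                    and 0 <= nx < w
                    and not edges[ny][nx]
                    and magnitude[ny][nx] >= low
                ):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges


ridge = [
    [0, 0, 0, 0, 0],
    [9, 5, 5, 0, 5],   # one strong pixel, two linked weak pixels, one isolated weak pixel
    [0, 0, 0, 0, 0],
]
linked = hysteresis(ridge, low=4, high=8)
```

In the example the weak pixels next to the strong one are kept, while the isolated weak pixel at the end of the row is dropped as noise.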
Essential morphological operations were then performed on the binary images, such as image border removal, filling of bounded areas, and exclusion of any small regions of fewer than 50 pixels. Morphological operations are a set of image processing operations that process images based on shapes. In our system, we used dilation and erosion, which are considered the most basic morphological operations. Because the image analysis process counted each connected object as a single item, overlapping objects had to be separated into individual objects. Regions whose enclosing rectangle was more than 50 pixels long, with a perpendicular dimension of at least 50 pixels, were copied to a new binary image. These regions typically represented overlapping objects, and the process separated them from isolated objects and from other regions.
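The exclusion of small regions can be illustrated with a simple connected-component pass. This is a sketch under stated assumptions: 8-connectivity is assumed, the function name is hypothetical, and only the 50-pixel limit comes from the text.

```python
from collections import deque


def remove_small_regions(binary, min_pixels=50):
    """Drop connected regions of a binary image with fewer than `min_pixels` pixels.

    Regions are found by breadth-first search with 8-connectivity (assumed).
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]

    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Collect every pixel of the region containing (y, x).
            seen[y][x] = True
            region = [(y, x)]
            queue = deque(region)
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (
                            0 <= ny < h
                            and 0 <= nx < w
                            and binary[ny][nx]
                            and not seen[ny][nx]
                        ):
                            seen[ny][nx] = True
                            region.append((ny, nx))
                            queue.append((ny, nx))
            # Copy the region to the output only if it is large enough.
            if len(region) >= min_pixels:
                for ry, rx in region:
                    out[ry][rx] = 1
    return out


# One 8 x 8 object (64 pixels) and one 2 x 2 speck (4 pixels).
mask = [[0] * 12 for _ in range(12)]
for r in range(8):
    for c in range(8):
        mask[r][c] = 1
for r in (10, 11):
    for c in (10, 11):
        mask[r][c] = 1
filtered = remove_small_regions(mask)
```

Only the 64-pixel object survives; the 4-pixel speck falls below the limit and is removed.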
Feature extraction was used to transform the binary and colour images from the pre-processing stage into a set of parameters that describe the algae features. The features extracted from the pre-processed algae images, using both the binary and colour images, include shape, area, minor and major axes, perimeter, and the Fourier spectrum with principal component analysis (PCA). The details of each extracted feature are described as follows:
This routine is designed to align a rotated shape with the horizontal axis, which eases the feature extraction process and also improves the accuracy and performance of the recognition process.
where Rc is the ratio at column c, Wc is the width of the object at column c, and L is the object length, as shown in Figure 5(b). The object width factor results were then normalized to obtain five features only.
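The following Python sketch shows one way such width features could be computed. It rests on two assumptions that are not confirmed by the text: that the ratio is Rc = Wc / L, as the symbol definitions suggest, and that the reduction to five features is done by averaging the ratios over five equal strips along the object length. The function name `width_factors` is hypothetical.

```python
def width_factors(mask, strips=5):
    """Summarize an object's width profile as `strips` averaged ratios.

    `mask` is a binary image of one horizontally aligned object.
    Assumed formula: R_c = W_c / L, with W_c the number of object pixels
    in column c and L the object length in columns.
    """
    widths = [sum(column) for column in zip(*mask)]
    occupied = [c for c, wc in enumerate(widths) if wc > 0]
    if not occupied:
        return [0.0] * strips

    first, last = occupied[0], occupied[-1]
    length = last - first + 1
    ratios = [widths[c] / length for c in range(first, last + 1)]

    # Average the per-column ratios inside each strip (assumed normalization).
    features = []
    for i in range(strips):
        chunk = ratios[i * length // strips:(i + 1) * length // strips]
        features.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return features


# A 2-pixel-high, 10-pixel-long bar: every column has ratio 2 / 10.
bar = [
    [0] * 12,
    [0] + [1] * 10 + [0],
    [0] + [1] * 10 + [0],
    [0] * 12,
]
profile = width_factors(bar)
```

A uniform bar gives five equal values, whereas an object that tapers towards its ends would give smaller values in the first and last strips.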
Perimeter: The perimeter of an object was the sum of the distances between each adjoining pair of pixels around the object border, shown as red pixels in Figure 6(b). It was included in our features because it gives an indication of the object size.
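As a worked illustration of this definition, the sketch below sums the distances between consecutive border pixels. It assumes the border pixels are already available as an ordered, closed contour (for example from a boundary-tracing step), which the text does not describe.

```python
from math import hypot


def perimeter(border):
    """Sum the distances between adjoining pixels of a closed border.

    `border` is an ordered list of (row, column) border pixels; the last
    pixel is joined back to the first. The ordering is assumed to be given.
    """
    following = border[1:] + border[:1]
    return sum(hypot(y2 - y1, x2 - x1) for (y1, x1), (y2, x2) in zip(border, following))


# Border of a 3 x 3 square: eight horizontal or vertical steps of length 1.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
square_perimeter = perimeter(square)

# Border of a small diamond: four diagonal steps of length sqrt(2).
diamond = [(0, 1), (1, 2), (2, 1), (1, 0)]
diamond_perimeter = perimeter(diamond)
```

Diagonal steps contribute √2 rather than 1, so the measure does not overstate the size of objects with slanted borders.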
Fourier spectrum with PCA feature extraction: The Fourier spectrum was applied to extract texture features in order to increase the accuracy of image detection. The Fourier spectrum is ideally suited to describing the directionality of periodic or almost periodic two-dimensional patterns. The spectrum features are expressed in polar coordinates to yield a function S(r, θ). The radius function P1(r) and the angle function P2(θ), obtained by annular sampling of S(r, θ), are one-dimensional functions. The radius function P1(r) reveals how energy is distributed over different frequencies.
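The sketch below illustrates how the two one-dimensional functions can be obtained from a spectrum. It uses a direct discrete Fourier transform so that it runs without external libraries, which is only practical for very small images. The number of angle bins, the rounding of radii to integers, and the exclusion of the zero-frequency term are assumptions.

```python
import cmath
import math


def dft2_magnitude(img):
    """Magnitude of the 2-D discrete Fourier transform (direct, slow form)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    total += img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
            mag[u][v] = abs(total)
    return mag


def radial_angular(mag, n_angles=8):
    """Sum the spectrum over rings, P1(r), and over angle bins, P2(theta)."""
    h, w = len(mag), len(mag[0])
    max_r = min(h, w) // 2
    p1 = [0.0] * (max_r + 1)
    p2 = [0.0] * n_angles
    for u in range(h):
        for v in range(w):
            # Convert array indices to signed frequencies.
            fu = u if u <= h // 2 else u - h
            fv = v if v <= w // 2 else v - w
            r = round(math.hypot(fu, fv))
            if r == 0 or r > max_r:
                continue  # skip the zero-frequency term and the corners
            p1[r] += mag[u][v]
            # The magnitude spectrum is symmetric, so fold angles into [0, pi).
            theta = math.atan2(fu, fv) % math.pi
            p2[int(theta / math.pi * n_angles) % n_angles] += mag[u][v]
    return p1, p2


# Vertical stripes with a period of 2 pixels on an 8 x 8 image.
stripes = [[1 if x % 2 == 0 else 0 for x in range(8)] for _ in range(8)]
p1, p2 = radial_angular(dft2_magnitude(stripes))
```

For the striped test pattern all of the non-constant energy falls at radius 4 and in the first angle bin, which is how P1(r) and P2(θ) expose the period and the direction of a texture.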
Feature sets extracted from an algae object may contain some redundant features. PCA is widely used in image processing applications to reduce the number of features. Its de-correlation ability serves to de-correlate redundant features, and its energy packing property serves to compact useful information into a few dominant features. The PCA algorithm was used here to reduce and summarize the features extracted by the Fourier spectrum method by removing redundancies. Eight eigenvalues were extracted and included in our feature extraction process.
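To make the de-correlation and energy packing properties concrete, the sketch below computes the leading eigenvalues of a feature covariance matrix using power iteration with deflation. This is a didactic stand-in, not the routine used in the study, and the two-feature dataset is made up for illustration.

```python
def covariance(rows):
    """Sample covariance matrix of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenvalues(matrix, k, iterations=500):
    """Largest `k` eigenvalues of a symmetric matrix (power iteration + deflation)."""
    d = len(matrix)
    m = [row[:] for row in matrix]
    values = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # arbitrary start vector
        for _step in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        values.append(lam)
        # Deflation: remove the component just found before the next pass.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return values


# Two perfectly correlated (redundant) features: the second is twice the first.
samples = [[1, 2], [2, 4], [3, 6], [4, 8]]
eigenvalues = top_eigenvalues(covariance(samples), k=2)
```

Because the second feature carries no new information, all of the variance is packed into the first eigenvalue and the second is effectively zero.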
A multilayer perceptron (MLP) network trained with the back propagation error algorithm was used to classify the extracted feature vectors. This type of ANN is widely used for pattern recognition and classification. In this study, a feed-forward neural network with one hidden layer was chosen, mainly because it has been proven that such a topology can approximate any continuous function [30–32]. De Villiers and Barnard found that the use of two hidden layers was only justified for the most esoteric applications. The hyperbolic tangent transfer function was used, as recommended by most researchers. The ANN architecture consists of three layers: an input layer with 21 nodes, a hidden layer with 8 nodes, and an output layer with 5 nodes. The standard root mean squared error (RMSE) function was used to assess network performance, and a momentum value of 0.05 was set based on trial and error. With the above parameters fixed, the step sizes taken in weight space were determined by a learning rate of 0.05 with an epoch size of 400.
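The architecture described above can be sketched as a forward pass plus a single momentum update. Only the layer sizes (21, 8, 5), the hyperbolic tangent transfer function, and the learning rate and momentum of 0.05 come from the text. The weight initialization range, the use of tanh on the output layer, and all function names are assumptions.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights for one layer; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward_layer(weights, inputs):
    """tanh(w . x + b) for every node in the layer."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + row[-1])
        for row in weights
    ]


def forward(network, features):
    """Propagate a feature vector through all layers of the network."""
    activations = features
    for layer in network:
        activations = forward_layer(layer, activations)
    return activations


def momentum_step(weight, gradient, velocity, learning_rate=0.05, momentum=0.05):
    """One gradient-descent update with momentum; returns (weight, velocity)."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


rng = random.Random(0)
network = [init_layer(21, 8, rng), init_layer(8, 5, rng)]  # 21 inputs, 8 hidden, 5 outputs

outputs = forward(network, [0.1] * 21)    # placeholder feature vector
predicted = outputs.index(max(outputs))   # index of the winning output node
```

Each of the five output nodes corresponds to one genus, so the index of the largest output is taken as the predicted class. The weights here are untrained, which means the prediction is arbitrary until back propagation has adjusted them.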
Table: Extracted features of algae used in this study.
F1, F2, F3: Shape index, major axis, minor axis
F6, F7, F8: Minor/major, area/major, perimeter/major
Object width factor strips (five features)
Fourier spectrum normalized by PCA (eight eigenvalues)
During the training phase, the input data and desired responses were fed into the network. The network used a momentum learning algorithm to determine its weights, and after each presentation the weights were adjusted to minimize the error between the desired and actual output. As training progressed, the error between the desired response and the network output dropped towards zero. Because an MLP with hidden layers can approximate virtually any input-output map, it was possible for the network to be over-trained, i.e. to classify the training data perfectly but be unable to generalize and classify new 'unseen' data. To improve generalization, 10% of the input data was set aside for cross validation. Training was stopped when the error on the cross validation dataset began to increase. A testing dataset was then used to avoid bias in the results. This was a set of images that were not used for training the ANN.
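The hold-out and stopping rule described above can be sketched as follows. The random shuffling, the seed, and the `patience` parameter are assumptions added for illustration, and the validation error sequence is made up.

```python
import random


def split_validation(samples, fraction=0.10, seed=0):
    """Set aside a fraction of the samples for cross validation.

    Returns (training_samples, validation_samples). Shuffling with a
    fixed seed is an assumption; the text only gives the 10% figure.
    """
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * fraction))
    return shuffled[n_val:], shuffled[:n_val]


def early_stopping_epoch(validation_errors, patience=1):
    """Epoch with the lowest validation error before it starts to rise.

    Scanning stops once the error has failed to improve for `patience`
    consecutive epochs (hypothetical parameter).
    """
    best_error = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_error, best_epoch, bad_epochs = error, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch


train, validation = split_validation(list(range(50)))

# Validation error falls, then rises at epoch 3: training stops and epoch 2 is kept.
stop_at = early_stopping_epoch([0.9, 0.5, 0.3, 0.35, 0.2])
```

Note that with `patience=1` the later drop to 0.2 is never seen. A larger patience value trades longer training for robustness against small fluctuations in the validation error.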
Table: Comparison between manual and automated classification for the testing dataset.
Table: Confusion matrix for the testing dataset (column headings: No. of test samples, System recognition results).
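A confusion matrix and accuracy rate of this kind can be computed from paired manual and automated labels as sketched below. The four labelled objects are made-up placeholders used only to exercise the code; they are not the results of this study, which reported 93 correct identifications out of 100 test images.

```python
def confusion_matrix(manual, automated, classes):
    """Count objects by (manual label, automated label).

    Rows follow the manual (reference) label, columns the automated label.
    """
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(manual, automated):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of objects on the diagonal, i.e. where both methods agree."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


classes = ["Navicula", "Oscillatoria", "Scenedesmus"]

# Placeholder labels for illustration only.
manual = ["Navicula", "Navicula", "Oscillatoria", "Scenedesmus"]
automated = ["Navicula", "Oscillatoria", "Oscillatoria", "Scenedesmus"]

matrix = confusion_matrix(manual, automated, classes)
rate = accuracy(matrix)
```

Off-diagonal entries show which genera are confused with each other, which is more informative than the overall rate alone.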
In this study, we selected specific freshwater algae that strongly affect water quality. For example, studies performed in Malaysia to assess the eutrophication status of 90 lakes reported that 56 lakes (62%) were eutrophic or in poor condition, requiring immediate rehabilitation and restoration, while the other 34 lakes (38%) were classified as mesotrophic [34–36].
The main objective of this study was to develop a computer system to identify and classify some types of algae. The system was designed and implemented in the Matlab environment with user-friendly interfaces. System accuracy and performance were calculated by comparing the automated and manual classification of the testing dataset, and by measuring the time taken by the training and recognition processes. The automated training process takes approximately 5 minutes, and the time required to identify and classify an input image varies between 1 and about 1.5 minutes. Manual and automatic classifications were compared for each object identified and extracted from a given image, and overlapping images were discarded. The highest accuracy rate was achieved for identification of Scenedesmus, as this alga has the most distinct features compared with the other algae genera used in this study. Meanwhile, Chroococcus had the lowest classification rate because the separation process produced some short, irregularly shaped image regions representing the algae. Microcystis, which is circular in shape, is difficult to distinguish because these algae exist in colonies and the captured images are prone to overlapping, which causes the MLP to misclassify the algae as unidentified. Navicula and Oscillatoria can be misclassified as each other by the automated system because their spiral shapes appear similar to the classifier and some of their extracted feature parameters match. MLP was used in this study instead of SVM or RBF because the data used were limited to a small number of algae and a limited number of extracted features.
A limited number of features was used in this study because the selected features are sufficient to detect and classify the selected algae with a considerably high accuracy rate. Furthermore, the MLP performs faster than the other types of ANN when data volume is not an issue; as the number of algae and the number of extracted features increase, SVM and RBF become more suitable options. The overall accuracy of the developed system depends essentially on its ability to detect objects within the input image and on the ability of the classifier to identify each detected object from the extracted features. The accuracy rate achieved in this study is acceptable and is considered high when compared with other similar studies. The system was developed essentially to support water quality monitoring by detecting selected freshwater algae in Putrajaya Lake. The results showed that the system is able to achieve this task by providing the necessary data about the density and genus of the selected algae.
In this paper, we presented an image processing technique combined with an ANN approach to identify and classify selected genera of freshwater algae from three different divisions, which vary in size and shape. This study illustrated that a computational recognition approach is important for freshwater algae, and showed that automatic identification of the selected freshwater algae is feasible. The good accuracy obtained was due to the preprocessing techniques used and to the specific features selected during feature extraction. In addition, system reliability depended largely on the combination of approaches used for image pre-processing and segmentation, on well-selected features, and on the training dataset. Testing results also showed that the developed system was reliable enough to be used for monitoring the water quality of Putrajaya Lake. The main limitation of our system is its inability to work well with images that include a very large number of objects. We would like to address this limitation and make the system more robust in future studies.
This study was funded by University of Malaya UMRG grant RG241-12AFR.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 17, 2012: Eleventh International Conference on Bioinformatics (InCoB2012): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S17.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.