An effective and robust method for tracking multiple fish in video image based on fish head detection
© The Author(s). 2016
Received: 31 May 2015
Accepted: 13 June 2016
Published: 23 June 2016
Fish tracking is an important step for video based analysis of fish behavior. Due to severe body deformation and mutual occlusion of multiple swimming fish, accurate and robust fish tracking from video image sequence is a highly challenging problem. The current tracking methods based on motion information are not accurate and robust enough to track the waving body and handle occlusion. In order to better overcome these problems, we propose a multiple fish tracking method based on fish head detection.
The shape and gray scale characteristics of the fish image are employed to locate the fish head position. For each detected fish head, we utilize the gray distribution of the head region to estimate the fish head direction. Both the position and direction information from fish detection are then combined to build a cost function of fish swimming. Based on the cost function, global optimization method can be applied to associate the target between consecutive frames. Results show that our method can accurately detect the position and direction information of fish head, and has a good tracking performance for dozens of fish.
The proposed method can successfully obtain the motion trajectories for dozens of fish so as to provide more precise data to accommodate systematic analysis of fish behavior.
KeywordsFish detection Fish tracking Global optimization Occlusion
Video based fish behavior analysis has become a hot research topic thanks to recent advances in computer vision methods [1–6]. To achieve such goal, it is necessary to first obtain the trajectory data for each fish by tracking their waving bodies, and then perform various statistical computing to discover interesting motion patterns and underlying rules. The robustness and accuracy of the tracking system can directly influence the effectiveness of behavior analysis. Therefore, fish tracking is the key step in the analysis of fish behavior. Because the fish body is not rigid, its shape changes during swimming; in addition, fish often mutually occlude, which has brought great difficulties for the fish tracking in video image.
Existing fish tracking methods are mainly based on motion information [7–10]. It predicts the position of the fish at the next moment by analyzing the motion state of each detected fish. Such method can track a large number of fish at the same time, but the tracking accuracy and stability are not good. In order to solve this problem, Pérez-Escudero et al.  put forward a tracking method based on appearance information. They conduct an appearance analysis for each detected fish to obtain the feature of “fish fingerprints”, and then associate with the targets that have the same “fish fingerprints” in different frames to obtain their motion trajectories. This method can correctly identify each individual even after crossings or occlusions, and can be applied to track different kinds of animals; but when the number of tracked objects is large, the identification error may occur due to the similarity of appearance between objects. Therefore, it is not suitable for tracking a large number of fish. There are usually lots of targets in the analysis of group behavior, so it is better to choose the tracking method based on motion information.
Detection error, motion prediction and mutual occlusion are the three most challenging problems for tracking methods based on motion information. Detection error usually includes two categories: missing detection and error detection, and they can directly affect the accuracy of follow-up tracking. The missing detection is inevitable because there is occlusion among fish in the process of their movements. In order to improve the tracking performance, the error detection rate must be minimized. Furthermore, due to the randomness of fish movements, it is difficult to analyze all of their motion state accurately by using a single motion model, and if a mixture motion model is used, though it can improve the tracking performance to some degree, meanwhile it also increases the tracking difficulty and complexity, which is not conducive to the realization of tracking. Finally, the mutual occlusion of fish can result in missing detection, and the longer the occlusion time is, the longer the missing detection time will be, which can cause the fragmentation of motion trajectory and thus degrade the tracking performance.
Our observation indicates that although fish movements are random, there is a good motion consistency to maintain the continuity of position and direction of the same target between consecutive frames. As long as the error detection rate in the detection phase can be decreased and meanwhile the direction of fish movements can be gotten, the targets between consecutive frames can be associated according to the position and direction information even without the use of motion prediction. Based on the above analysis, we put forward a multiple fish tracking method based on fish head detection. It has the following characteristics: (1) It can simultaneously detect the position and direction information of fish head and has a low error detection rate; (2) Without the use of motion model to conduct the motion prediction for fish, it greatly simplifies tracking processes; (3) It can better solve the occlusion problem in fish swimming and improves the tracking stability. The experimental results show that it can conduct a motion tracking for dozens of fish and has a good tracking performance.
Moving region segmentation
Centerline extraction procedures will be performed on moving regions obtained in the previous step. There are a number of existing methods for extracting the centerline of a region. In order to efficiently describe belt-like fish body, the augmented fast marching method (AFMM)  is adopted to extract the centerline. The basic idea of the AFMM is to construct an active narrowband in the peripheral image region. The arrival time U of the internal points of the active narrowband is undefined; the current spreading boundary transmits inward by using a reverse difference scheme and as the point spread, they are frozen at arrival time U, and then construct a new moving narrowband.
The centerline can be regarded as points that collapse when the edge of regions is propagated in the AFMM. Each propagated point has a source point at the edge. Hence, centerline of region can be extracted by locating points at edge corresponding to each of the propagated points.
When tracking occlusions, if head branches are removed, the head cannot be detected in centerline. In this case, based on the association rule (see “Global optimization association” section), tracking of this head is halted and the state of the head from the previous frame is maintained. Head branches will grow larger in subsequent frames when the fish moves. The head will continue to be tracked when the head branch occurs in centerline.
Head endpoint determination
Centerline characterizes the primary shape structure of a top view fish image and the endpoints of the curve indicate the positions of the fish head and tail. No matter how the centerline changes its shape, the obtained endpoints are usually located in the region of fish head or tail. However, because occlusions will cause complex shape variations during movement of fish, in rare cases, not only the head and tail of the fish, but other body parts may also have branch endpoints in centerlines. In order to remove these endpoints, we set a length threshold T l to filter all endpoints. When the distance between the endpoint and nearest intersection point is greater than T l , the endpoint will be regarded as the endpoint of the fish head or tail. Otherwise, this endpoint should be removed.
Head direction estimation
The previous step obtains the endpoint of the fish head which provides position information. In order to better analyze fish’s motion behavior, we estimate fish head direction by performing multiscale analysis of the Hessian matrix.
The Hessian matrix of the image function describes local structural information. Its eigenvalues and eigenvectors can be used to indicate the curvature and direction in the regional orthogonal direction [14, 15]. With this characteristic, the Hessian matrix of the head endpoint is used to estimate the direction of the head region.
After the detection of fish head, it becomes possible to track the fish based on the detected information. However due to the randomness and the frequent occlusion when fish swims, it’s very hard to obtain accurately the motion trajectory of fish based on the common motion prediction method that has been used in the multi-target tracking. In order to solve this problem, we propose a target association method according to the continuity of motion information between consecutive frames. First we construct a cost function of fish swimming according to the position and direction information of fish head. Then the cost function values are calculated based on the cost function for different targets between consecutive frames. Finally based on the global optimization of cost function value, the targets in the consecutive frames can be associated directly to get their motion trajectory.
Cost function calculation
Global optimization association
m = n: Two consecutive frames have the same number of targets and are fully matched.
m > n: The current frame has more targets than the previous frame. This implies that new targets occurred, resulting in more associated targets. In this case, among the m targets in the current frame, n targets are chosen based on Equation (10) to associate with the previous frame. The remaining m-n targets are ignored to ensure the number of associated targets in the current frame equals n.
m < n: The current frame has fewer targets than the previous frame. This means that tracked targets disappeared, resulting in fewer associated targets. In this case, all targets of the current frame are first associated with the previous frame based on Equation (10). As for the non-associated n-m targets in the previous frame, we maintain the state of these targets from the previous frame for the current frame to ensure the number of associated targets in the current frame equals n.
The above steps can be followed to guarantee that two consecutive frames have the same number of associated targets. In the tracking process, consider that the number of targets N stays constant, if the number of targets in the initial frame is equal to N, the number of associated targets for all subsequent frames remains the same. This method solves the problem of trajectory breaks that occur due to track merging and splitting when fish occlude.
Results and discussion
D1: 20 zebrafish are put in strong indoor lighting condition and the tank size is 20 cm × 20 cm filled with water of 3 cm deep. The video frame rate is 40 frames per second.
D2: 40 zebrafish are put in ordinary indoor lighting condition and the tank size is 30 cm × 30 cm filled with water of 3 cm deep. The video frame rate is 30 frames per second.
The shooting equipment is Flare 4 M180-CL high-speed camera with the image resolution being 2048 × 2040 pixels. All videos are top-view shooting with the length of zebrafish being 1.5-3 cm. From each group of video sequence, 2000 frames of images in which fish behaviors are active and the phenomenon of occlusion exists, are selected as the final test sets.
The average correct detection rate (ACDR), average error detection rate (AEDR), average occlusion detection rate (AODR) and average direction error (ADE) are used as detection evaluation criteria, and their definitions are as follows: (1) Average correct detection rate: a ratio of the total number of correctly detected targets to the total number of targets; (2) Average error detection rate: a ratio of the total number of wrongly detected targets to the total number of targets; (3) Average occlusion detection rate: a ratio of the total number of correctly detected occlusion targets to the total number of occlusion targets; (4) Average direction error: the average error between the detected direction of correctly detected targets and the direction of the reference line. (The vertexes of fish head where all test sets are calibrated manually. The straight line which connects the detected position and the vertex is considered as the reference line.)
The mostly tracked trajectories (MTT), partially tracked trajectories (PTT) and times of identity switches (TIS) are used as tracking evaluation criteria , and their definitions are as follows: (1) Mostly tracked trajectories: the number of trajectories which are tracked for more than 80 % of ground-truth trajectories; (2) Partially tracked trajectories: the number of trajectories which are tracked between 20 % and 80 % of ground-truth trajectories; (3) Times of identity switches: the times of the identity exchange of the targets in the tracking trajectories.
There are four threshold values to be set for detection, which are T g , T u , T l and T w respectively. For the threshold values T g and T l , their value is mainly determined by the size of the fish body in the image. The more pixels the fish body includes, the larger the value is and the fewer pixels the body includes, the smaller the value is. For the threshold T u , the value reflects how the centerline can describe the regional structures and the larger the value is, the less details can be obtained, the weaker the ability to handle the occlusion is, and the stronger the robustness is; the smaller the value is, the more details can be obtained, the stronger the ability to handle the occlusion is and the weaker the robustness is. For the threshold T w , its value is determined according to the average value of the width between the head and tail endpoints in the unblocked image.
Five parameters need to be set for tracking, which are ω, pc max, dc max, T o and N. As for ω, its value is determined according to the influence of the position and direction change rate of the fish head between consecutive frames on fish movements. As for pc max and dc max, their values are determined according to the max change of the position and direction of fish head between consecutive frames. As for T o , its value is determined according to the maximum occlusion distance of the target in images. The longer the distance is, the bigger its value will be, and the smaller on the contrary. As for N, its value is equal to the number of fish in each group.
Parameter settings in the test process
D1 (20 fish)
D2 (40 fish)
Detection performance on different groups
Number of occlusions
D1 (20 fish)
D2 (40 fish)
Blob detection and filtering 
D1 (20 fish)
D2 (40 fish)
Tracking performance on different groups (GT: ground truth)
In the nearest neighbor association method, as the direction information of fish is not considered, its ability to deal with occlusion is weak and many trajectory fragments occur. Furthermore, there is also a huge increase in the times of identity switches, which greatly degrades its tracking performance.
In the method in , as the error detection rate is relatively high and in order to reduce the influence of error detection on the tracking performance, the strategy to combine motion prediction with feature matching is used for tracking. However, this not only increases the complexity of tracking, but also may have the problem of mismatching. While in the proposed method, as the error detection rate is relatively low, the position and direction information of targets can be directly used to conduct data association, which not only simplifies the process of tracking, but also improves the tracking performance. Thus, it can be seen that as long as the detection performance can be ensured and the direction information of fish movements can be gotten, the strategy of direct tracking without motion prediction is also feasible and effective.
IdTracker performs better than the proposed method in D1, especially when there are only two identity switches, which shows the excellent tracking performance of idTracker. In D2, the increased number of fish adds to the difficulty of identity recognition. In this scenario, idTracker degrades, while the proposed method performs better and its performance changes slightly compared with idTracker. Therefore, the proposed method can still track targets effectively even when there are more tracked targets.
A video image fish tracking method based on fish head detection is proposed in this paper. Its contributions can be summarized as follows: (1) It takes full advantage of the most significant appearance characteristic of fish in images, that is, it analyzes fish head, and the position and direction information of fish head can be detected accurately and reliably; (2) According to the position and direction information of fish head and without motion prediction, the data association can be done directly for targets through the use of global optimization method. The experimental results show that the proposed method has a good tracking performance. Moreover, it can be seen from the experimental results that although fish movements are random, there are still some rules between the position and direction for the same target in adjacent images. A simple relationship is obtained by using only a small amount of data in this paper. Next, we will reveal these rules through more experiments.
ACDR, average correct detection rate; ADE, average direction error; AEDR, average error detection rate; AFMM, augmented fast marching method; AODR, average occlusion detection rate; DoH, Determinant of Hessian; GT, ground truth; IACUC, Institutional Animal Care and Use Committee; MTT, mostly tracked trajectories; PTT, partially tracked trajectories; TIS, times of identity switches
We are grateful to the anonymous reviewers whose suggestions and comments contributed to the significant improvement of this paper.
The research work presented in this paper was supported by National Natural Science Foundation of China (Grant No. 61175036 and No. 61363023). The funding bodies had no role in the design of the study and collection, analysis and interpretation of data and in writing the manuscript.
Availability of data and materials
The results supporting the conclusions of this article are included in the Additional files.
ZMQ and YQC conceived the method. ZMQ, SHW and XEC performed the experiments and analyzed the data. ZMQ and YQC wrote the paper. All authors have read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
All experimental procedures were in compliance with the Institutional Animal Care and Use Committee (IACUC) of Shanghai Research Center for Model Organisms (Shanghai, China) with approval ID 2010–0010, and all efforts were made to minimize suffering. This study was approved by the IACUC.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Delcourt J, Becco C, Vandewalle N, Poncin P. A video multitracking system for quantification of individual behavior in a large fish shoal: advantages and limits. Behav Res Methods. 2009;41(1):228–35.View ArticlePubMedGoogle Scholar
- Butail S, Paley DA. Three-dimensional reconstruction of the fast-start swimming kinematics of densely schooling fish. J R Soc Interface. 2012;9(66):77–88.View ArticlePubMedGoogle Scholar
- Delcourt J, Denoël M, Ylieff M, Poncin P. Video multitracking of fish behaviour: a synthesis and future perspectives. Fish Fish. 2013;14(2):186–204.View ArticleGoogle Scholar
- Martineau PR, Mourrain P. Tracking zebrafish larvae in group–Status and perspectives. Methods. 2013;62(3):292–303.View ArticlePubMedPubMed CentralGoogle Scholar
- Tunstrøm K, Katz Y, Ioannou CC, Huepe C, Lutz MJ, Couzin ID. Collective states, multistability and transitional behavior in schooling fish. PLoS Comp Biol. 2013;9(2), e1002915.View ArticleGoogle Scholar
- Dell AI, Bender JA, Branson K, Couzin ID, de Polavieja GG, Noldus LP, et al. Automated image-based tracking and its application in ecology. Trends Ecol Evol. 2014;29(7):417–28.View ArticlePubMedGoogle Scholar
- Kalman RE. A new approach to linear filtering and prediction problems. J Fluids Eng. 1960;82(1):35–45.Google Scholar
- Reid DB. An algorithm for tracking multiple targets. IEEE Trans Autom Control. 1979;24(6):843–54.View ArticleGoogle Scholar
- Fortmann TE, Bar-Shalom Y, Scheffe M. Sonar tracking of multiple targets using joint probabilistic data association. IEEE J Ocean Eng. 1983;8(3):173–84.View ArticleGoogle Scholar
- Fisher JL, Casasent DP. Fast JPDA multitarget tracking algorithm. Appl Opt. 1989;28(2):371–6.View ArticlePubMedGoogle Scholar
- Pérez-Escudero A, Vicente-Page J, Hinz RC, Arganda S, de Polavieja GG. idTracker: tracking individuals in a group by automatic identification of unmarked animals. Nat Methods. 2014;11(7):743–8.View ArticlePubMedGoogle Scholar
- Cucchiara R, Grana C, Piccardi M, Prati A. Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans Pattern Anal Mach Intell. 2003;25(10):1337–42.View ArticleGoogle Scholar
- Telea A, Van Wijk JJ. An augmented fast marching method for computing skeletons and centerlines. In: Proceedings of the symposium on Data Visualisation 2002. Aire-la-Ville, Switzerland: Eurographics Association; 2002. p. 251–ff.Google Scholar
- Lindeberg T. Feature detection with automatic scale selection. Int J Comput Vis. 1998;30(2):79–116.View ArticleGoogle Scholar
- Bay H, Ess A, Tuytelaars T, Van Gool L. Speeded-up robust features (SURF). Comput Vis Image Underst. 2008;110(3):346–59.View ArticleGoogle Scholar
- Kuhn HW. The Hungarian method for the assignment problem. Naval Res Logist Q. 1955;2(1–2):83–97.View ArticleGoogle Scholar
- Wu B, Nevatia R. Tracking of multiple, partially occluded humans based on static body part detection. In: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference. Washington, DC, USA: IEEE Computer Society; 2006. p. 951–8.Google Scholar
- Qian Z-M, Cheng XE, Chen YQ. Automatically Detect and Track Multiple Fish Swimming in Shallow Water with Frequent Occlusion. PLoS One. 2014;9(9):e106506.View ArticlePubMedPubMed CentralGoogle Scholar
- Bar-Shalom Y. Tracking and data association: Academic Press Professional, Inc. 1987.Google Scholar