CosinorPy: a python package for cosinor-based rhythmometry
BMC Bioinformatics volume 21, Article number: 485 (2020)
Abstract
Background
Even though several computational methods for rhythmicity detection and analysis of biological data have been proposed in recent years, classical trigonometric regression based on cosinor still has several advantages over these methods and is still widely used. Different software packages for cosinor-based rhythmometry exist, but they lack certain functionalities and require data in different, non-unified input formats.
Results
We present CosinorPy, a Python implementation of cosinor-based methods for rhythmicity detection and analysis. CosinorPy merges and extends the functionalities of existing cosinor packages. It supports the analysis of rhythmic data using single- or multi-component cosinor models, automatic selection of the best model, population-mean cosinor regression, and differential rhythmicity assessment. Moreover, it implements functions that can be used in the design of experiments, a synthetic data generator, and import and export of data in different formats.
Conclusion
CosinorPy is an easy-to-use Python package for straightforward detection and analysis of rhythmicity that requires minimal statistical knowledge and produces publication-ready figures. Its code, examples, and documentation are available to download from https://github.com/mmoskon/CosinorPy. CosinorPy can be installed manually or by using pip, the package manager for Python packages. The implementation reported in this paper corresponds to the software release v1.1.
Background
Many biological processes display oscillations that are under the control of different biological clocks. For example, circadian clocks display daily oscillations, i.e., with a periodicity of approximately 24 h [1], and may regulate nearly half of all genes in the genome of an organism [2, 3]. Disrupted circadian rhythms might have several health implications, such as cardiovascular diseases, diabetes, and immune deficiencies [4]. Analysis of circadian data, especially in combination with different omics approaches, thus increases our understanding of disease occurrence and progression. A vast amount of research has been devoted to the analysis of circadian rhythms in recent years [5]. We should also strive towards the integration of such analyses into clinical work for disease diagnostics, treatment, and prevention [6].
Detection and analysis of rhythmicity requires designated computational approaches. These approaches are focused on the identification of rhythmic datasets that correspond to specific biological entities (e.g., genes) and evaluation of their rhythmicity parameters. Several non-parametric methods for circadian data analysis have been proposed recently, such as JTK_CYCLE and its extensions [7,8,9] and RAIN [10]. Even though non-parametric methods have several benefits, e.g., robustness to noise in the data, classical harmonic regression should still be used when rhythmicity parameters, such as oscillation amplitudes and acrophases, need to be evaluated, or when the noise in the data is non-Gaussian [10]. Moreover, non-parametric methods often fail or do not perform well when data are (1) collected at irregular intervals, (2) without replicates, (3) unbalanced (i.e., more samples are collected at one time of a day compared to others), (4) full of outliers, and (5) very large. However, cosinor has been successfully applied even in such cases (see, e.g., [11,12,13]).
Cosinor is a fundamental method for rhythmicity detection and analysis using cosine curve fitting [14, 15]. It is based on a trigonometric regression model
\[ y(t) = M + \sum_{i=1}^{N} \left( A_{i,1} \sin \left( \frac{t}{P/i} \cdot 2 \pi \right) + A_{i,2} \cos \left( \frac{t}{P/i} \cdot 2 \pi \right) \right) + e(t), \tag{1} \]
where t corresponds to the observed time points in the time series, N is the number of components in the model, i.e., the number of cosine curves, and \(A_{i,1}\), \(A_{i,2}\), M and P are the parameters of the model, with M being the MESOR (Midline Statistic Of Rhythm), P the period of the observed rhythm, and e(t) the error term [15]. When the period is known, the model can be converted to a linear regression model
\[ y(t) = M + \sum_{i=1}^{N} \left( A_{i,1} x_{i,1} + A_{i,2} x_{i,2} \right) + e(t), \tag{2} \]
where \(x_{i,1} = \sin \left( \frac{t}{P/i} \cdot 2 \pi \right)\) and \(x_{i,2} = \cos \left( \frac{t}{P/i} \cdot 2 \pi \right)\). When the period is not known, different periods within the feasible period range can be tested, or period detection methods, such as periodograms, can be used for period estimation [14].
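The linear form of the model can be illustrated with a small, dependency-free sketch. This is not CosinorPy code: the function names (`design_row`, `solve`, `fit_cosinor`) and the synthetic data are assumptions made for illustration, and ordinary least squares is solved here through the normal equations only to keep the example self-contained.

```python
import math


def design_row(t, period, n_components):
    """Return [1, x_{1,1}, x_{1,2}, ..., x_{N,1}, x_{N,2}] for one time point."""
    row = [1.0]
    for i in range(1, n_components + 1):
        angle = 2.0 * math.pi * t / (period / i)
        row += [math.sin(angle), math.cos(angle)]
    return row


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_cosinor(times, values, period, n_components=1):
    """Least-squares estimates [M, A_{1,1}, A_{1,2}, ...] for a known period."""
    rows = [design_row(t, period, n_components) for t in times]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve(xtx, xty)


# Noise-free synthetic series sampled every 2 h over two 24 h periods.
times = [float(h) for h in range(0, 48, 2)]
values = [
    10.0
    + 3.0 * math.sin(2 * math.pi * t / 24)
    + 1.5 * math.cos(2 * math.pi * t / 24)
    for t in times
]
mesor, a_sin, a_cos = fit_cosinor(times, values, period=24.0)
```

When the period is unknown, the same fit could be repeated over a grid of candidate periods and the candidate with the smallest residual sum of squares retained.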
Cosinor has been widely applied to rhythmicity detection and to the evaluation of rhythmicity parameters in time series data. Different software packages for cosinor-based rhythmometry, such as cosinor [16], cosinor2 [17], and DiscoRhythm [18], have been introduced in recent years. However, these packages lack certain functionalities, such as multi-component cosinor regression and analysis, and do not support automatic identification of the best regression model. Moreover, the user must format the input data for each of these tools in a different manner. Herein, we describe CosinorPy, a Python package that merges and extends the functionalities of existing software packages. CosinorPy can also be used to generate synthetic data, and supports different input and output formats compatible with other software packages for rhythmicity detection and analysis. It provides all functionalities required for the analysis of rhythmic data, from data import and preprocessing to removal of outliers, identification of oscillation periods, assessment of the most suitable models and their statistics, analysis of differential rhythmicity, and data plotting, reporting, and export. Moreover, it implements functions to estimate the number of samples required to obtain results with a predefined statistical significance and can thus also be used to guide experimental work [19]. CosinorPy is currently the only cosinor package that is implemented in Python, which has become increasingly important in data science, as well as in the field of bioinformatics, in recent years. A comparison of features of currently available cosinor-based software packages is provided in Table 1.
Implementation
CosinorPy is implemented in Python and relies on state-of-the-art Python packages for data management, scientific computing, visualisation, and statistical modelling, namely pandas, NumPy, SciPy, Matplotlib, and statsmodels. CosinorPy comprises three Python modules, namely file_parser, cosinor, and cosinor1. The file_parser module implements reading and writing of xlsx and csv files and generating synthetic data. The cosinor module implements the functionalities based on a single- or multi-component cosinor model, which include model fitting, identification of the most suitable model, and analysis of differential rhythmicity. The cosinor1 module implements similar functionalities, which are adapted to a single-component cosinor model and thus provide more exhaustive results and additional statistics, such as the significance of acrophase shifts within the differential rhythmicity analysis. The implementation of specific functionalities within the package is described below. Thorough documentation of the package is available at https://github.com/mmoskon/CosinorPy/blob/master/docs/docs.md.
Single-component cosinor
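When the observed data can be accurately described with a single harmonic component, a single-component cosinor model can be used:
\[ y(t) = M + A_{1,1} \sin \left( \frac{t}{P} \cdot 2 \pi \right) + A_{1,2} \cos \left( \frac{t}{P} \cdot 2 \pi \right) + e(t). \tag{3} \]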
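Using this model, amplitude (A) and acrophase (\(\phi\)) can be estimated directly from the assessed parameter values as:
\[ A = \sqrt{A_{1,1}^2 + A_{1,2}^2} \tag{4} \]
and, writing the rhythmic component as \(A \cos \left( \frac{t}{P} \cdot 2 \pi + \phi \right)\) as in [15],
\[ \phi = \arctan \left( -\frac{A_{1,1}}{A_{1,2}} \right), \tag{5} \]
with the quadrant of \(\phi\) determined by the signs of \(A_{1,1}\) and \(A_{1,2}\).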
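As a minimal sketch of this conversion (not the package's own routine), the snippet below turns the sine and cosine coefficients of a single-component fit into an amplitude and an acrophase. The sign convention is an assumption: the rhythmic part is written as A·cos(2πt/P + φ), as in [15], so the acrophase is negative for a peak that occurs after t = 0. The function names are hypothetical.

```python
import math


def amplitude_acrophase(a_sin, a_cos):
    """Convert sine/cosine coefficients to (amplitude, acrophase in radians).

    Assumed convention: a_sin*sin(w*t) + a_cos*cos(w*t) == A*cos(w*t + phi),
    which gives a_cos = A*cos(phi) and a_sin = -A*sin(phi).
    """
    amplitude = math.hypot(a_sin, a_cos)
    # atan2 resolves the quadrant from the signs of both coefficients.
    acrophase = math.atan2(-a_sin, a_cos)
    return amplitude, acrophase


def acrophase_to_time(acrophase, period=24.0):
    """Time of the first peak at or after t = 0, in the units of the period."""
    return (-acrophase * period / (2.0 * math.pi)) % period


# A pure sine wave with amplitude 3 and a 24 h period peaks at t = 6 h.
amp, acr = amplitude_acrophase(3.0, 0.0)
peak_time = acrophase_to_time(acr, period=24.0)
```

Using `atan2` rather than a plain arctangent avoids a separate quadrant correction and handles a zero cosine coefficient.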
The statistical significance of the model is assessed with an F-test on the basis of the model sum of squares and the residual sum of squares [15]. The significance of rhythmicity is evaluated with the zero-amplitude test, which is also based on the F-statistic [15, 19]. Moreover, amplitude and acrophase significance and confidence intervals are assessed directly from the model and its underlying data [19]. The adequacy of the model can be assessed using different regression diagnostic tests. When replicates are available or when the measurements are performed for several periods, the goodness of fit of the model is evaluated with an F-test comparing the pure error and the lack-of-fit sum of squares [15].
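The F-statistic for the overall model test can be sketched as follows. This is an illustration under stated assumptions rather than CosinorPy's implementation: the function name and the toy numbers are hypothetical, the model sum of squares is taken around the mean of the observations, and the model is assumed to have three parameters (MESOR plus two coefficients). The p-value is not computed here; it would follow from the upper tail of an F distribution with the returned degrees of freedom.

```python
def model_f_statistic(observed, fitted, n_params=3):
    """Return (F, df_model, df_residual) for a fitted regression model.

    F = (MSS / (n_params - 1)) / (RSS / (n - n_params)), where MSS is the
    model sum of squares and RSS the residual sum of squares.
    """
    n = len(observed)
    if n != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    if n <= n_params:
        raise ValueError("not enough observations for the residual term")
    mean_obs = sum(observed) / n
    mss = sum((f - mean_obs) ** 2 for f in fitted)
    rss = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    df_model = n_params - 1
    df_resid = n - n_params
    return (mss / df_model) / (rss / df_resid), df_model, df_resid


# Toy numbers chosen so that MSS = 5 and RSS = 1.
observed = [2.0, 4.0, 2.0, 0.0, 3.0, 1.0]
fitted = [2.0, 3.5, 2.0, 0.5, 2.5, 1.5]
f_value, df1, df2 = model_f_statistic(observed, fitted)
```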
When collecting circadian data, experimentalists should follow specific guidelines to obtain statistically significant results [20]. However, if certain requirements regarding the precision of the assessment of rhythmicity parameters, e.g., a maximal acceptable length of a confidence interval, can be specified in advance, we can approximate the minimal sample size necessary to achieve such precision [19]. CosinorPy implements these functionalities and can thus also be used in the design of experiments.
Multi-component cosinor
When a single-component cosinor model is not able to describe our data satisfactorily, e.g., when the goodness of fit test rejects the model, a multi-component cosinor model can be considered [15]. A multi-component cosinor model is able to describe more complex oscillatory dynamics, e.g., peak asymmetry or multiple peaks within one period, which cannot be described with a single harmonic component. Rhythmicity parameters cannot be calculated analytically from this model, but are evaluated from the fitted curve.
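One way to read rhythmicity parameters off a fitted multi-component curve is to evaluate it on a dense grid over one period. The sketch below is an assumption-laden illustration, not the package's procedure: the function names are hypothetical, and defining the amplitude as half the peak-to-trough distance is one common choice among several.

```python
import math


def evaluate_curve(t, mesor, coeffs, period):
    """Evaluate a cosinor curve; coeffs holds one (a_sin, a_cos) pair per harmonic."""
    y = mesor
    for i, (a_sin, a_cos) in enumerate(coeffs, start=1):
        angle = 2.0 * math.pi * t / (period / i)
        y += a_sin * math.sin(angle) + a_cos * math.cos(angle)
    return y


def curve_parameters(mesor, coeffs, period, n_points=1000):
    """Numerically estimate amplitude and peak time over one period."""
    grid = [period * k / n_points for k in range(n_points)]
    fitted = [evaluate_curve(t, mesor, coeffs, period) for t in grid]
    peak, trough = max(fitted), min(fitted)
    return {
        "amplitude": (peak - trough) / 2.0,  # half the peak-to-trough distance
        "peak_time": grid[fitted.index(peak)],
        "peak_value": peak,
    }


# Sanity check on a single cosine: amplitude 2, peak at t = 0.
params = curve_parameters(mesor=5.0, coeffs=[(0.0, 2.0)], period=24.0)
```

The grid resolution bounds the precision of the peak time, so a finer grid or a local refinement would be needed when acrophases must be compared closely.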
Additional components will always increase a model’s accuracy, but at the expense of a reduced number of degrees of freedom. This might cause overfitting of the model to the observed data. Automatic selection of the best model regarding the number of components is performed using the extra sum-of-squares F-test:
\[ F = \frac{(SSR_1 - SSR_2) / (DoF_1 - DoF_2)}{SSR_2 / DoF_2}, \tag{6} \]
where \(SSR_1\) and \(SSR_2\) are the sums of squared residuals (SSR) of a simpler and a more complex model, respectively, and where \(DoF_1\) and \(DoF_2\) are the degrees of freedom (DoF) of a simpler and a more complex model, respectively. The more complex model, i.e., the model with a smaller DoF, is selected as more appropriate when the obtained p-value is lower than a predefined threshold. Moreover, the model selection process can be guided with the goodness of fit measures as for a single-component cosinor model.
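The extra sum-of-squares statistic is simple enough to sketch directly. The function name and the example numbers below are hypothetical; the p-value is left out to keep the snippet dependency-free, and would be the upper-tail probability of an F distribution with \(DoF_1 - DoF_2\) and \(DoF_2\) degrees of freedom (available, for instance, as `scipy.stats.f.sf`).

```python
def extra_sum_of_squares_f(ssr_simple, dof_simple, ssr_complex, dof_complex):
    """F statistic comparing a simpler nested model with a more complex one.

    The simpler model has the larger residual degrees of freedom.
    """
    if not dof_simple > dof_complex > 0:
        raise ValueError("expected dof_simple > dof_complex > 0")
    numerator = (ssr_simple - ssr_complex) / (dof_simple - dof_complex)
    denominator = ssr_complex / dof_complex
    return numerator / denominator


# Example: 24 observations; a 1-component model (3 parameters, DoF 21)
# against a 2-component model (5 parameters, DoF 19).
f_stat = extra_sum_of_squares_f(
    ssr_simple=120.0, dof_simple=21, ssr_complex=60.0, dof_complex=19
)
```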
An additional advantage of our implementation of the multi-component cosinor regression is that it allows the user to fit a cosinor model to count data. Here, a generalised Poisson model with a logarithmic link can be used in combination with a cosinor model to handle over- as well as under-dispersed data [21]. Moreover, CosinorPy also allows the user to select Poisson or negative-binomial models for the analysis of rhythmicity of count data.
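With a logarithmic link, the cosinor terms form the linear predictor and the expected count is its exponential. The sketch below only illustrates that relationship; the function name and parameter values are hypothetical, and fitting such a model would in practice be delegated to a GLM routine.

```python
import math


def expected_count(t, intercept, coeffs, period):
    """Mean count under a log link: log(mu) = intercept + harmonic terms.

    coeffs holds one (a_sin, a_cos) pair per harmonic i = 1..N.
    """
    eta = intercept
    for i, (a_sin, a_cos) in enumerate(coeffs, start=1):
        angle = 2.0 * math.pi * t / (period / i)
        eta += a_sin * math.sin(angle) + a_cos * math.cos(angle)
    return math.exp(eta)  # always positive, as required for counts


# At t = 0 only the cosine term contributes: exp(log 5 + log 2) = 10.
mu_peak = expected_count(0.0, math.log(5.0), [(0.0, math.log(2.0))], period=24.0)
```

Because the oscillation acts on the log scale, the coefficients describe multiplicative rather than additive changes in the expected count.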
Population-mean models
When dealing with at least three individuals, and when each individual produces a series of dependent measurements which can be used to establish the individual’s cosinor model, a population-mean cosinor should be used [15]. In this case a cosinor model is fitted to each individual. The response of the whole population is described and analysed as a mean of all individual cosinor models (population-mean cosinor). When using a single-component population-mean cosinor, confidence intervals of rhythmicity parameters are assessed as described in [19]. Moreover, a p-value for the null hypothesis of the amplitude of oscillations being zero is evaluated with an F-test for a single-component population-mean cosinor as described in [15]. The statistical significance and the goodness of fit of a single- or multi-component population-mean cosinor model are assessed in a similar way as for the basic cosinor models [15].
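The averaging step of a single-component population-mean cosinor can be sketched as below. This is a simplified illustration with hypothetical names and made-up parameter values: each individual is assumed to have already been fitted separately, the linear parameters are averaged across individuals, and the population amplitude is derived from the averaged coefficients. Confidence intervals and significance tests are not shown.

```python
import math
from statistics import mean


def population_mean_cosinor(individual_fits):
    """Average per-individual (mesor, a_sin, a_cos) estimates.

    Returns (mesor, a_sin, a_cos, amplitude) for the population.
    """
    if len(individual_fits) < 3:
        raise ValueError("a population-mean cosinor needs at least three individuals")
    mesor = mean(fit[0] for fit in individual_fits)
    a_sin = mean(fit[1] for fit in individual_fits)
    a_cos = mean(fit[2] for fit in individual_fits)
    # Amplitude of the mean curve, not the mean of individual amplitudes.
    return mesor, a_sin, a_cos, math.hypot(a_sin, a_cos)


fits = [(10.0, 1.0, 0.0), (12.0, 0.0, 1.0), (11.0, 2.0, 2.0)]
pop_mesor, pop_sin, pop_cos, pop_amp = population_mean_cosinor(fits)
```

Averaging the coefficients rather than the individual amplitudes means that individuals with opposing phases partially cancel, so the population amplitude can be smaller than any individual amplitude.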
Analysis of differential rhythmicity
Cosinor models can be used to assess the difference in the rhythmic response of two groups of measurements. Each group either corresponds to a different variable (e.g., two different genes) or to the same variable in different conditions (e.g., the same gene before and after a perturbation). We are usually interested in amplitude changes and acrophase shifts between the groups. Several different methods, which we use in our implementation, have been proposed to assess these differences.
If the data describing both groups can be modelled with single-component cosinor models, a single-component cosinor can also be used to assess the differential rhythmicity of these two groups. This model is implemented as
\[ y(t) = M_a + M_b g + \left( A_{1,1,a} + A_{1,1,b} g \right) \sin \left( \frac{t}{P} \cdot 2 \pi \right) + \left( A_{1,2,a} + A_{1,2,b} g \right) \cos \left( \frac{t}{P} \cdot 2 \pi \right) + e(t), \tag{7} \]
where g equals 0 if the data belong to the group a, and 1 if the data belong to the group b. Based on the assessed parameter values, we can estimate the acrophases and amplitudes of each of the groups, as well as the differences between these values and their significance [19]. Moreover, a population-mean single-component cosinor model is adapted to analyse the differential rhythmicity in a similar way [17, 19].
LimoRhyde [22] presents a similar approach that uses a cosinor model and can be adapted to use an arbitrary number of components in the following form
\[ y(t) = M_a + M_b g + \sum_{i=1}^{N} \left( \left( A_{i,1,a} + A_{i,1,b} g \right) \sin \left( \frac{t}{P/i} \cdot 2 \pi \right) + \left( A_{i,2,a} + A_{i,2,b} g \right) \cos \left( \frac{t}{P/i} \cdot 2 \pi \right) \right) + e(t). \tag{8} \]
The significance of each parameter in this model is assessed using a t-test, where the null hypothesis is that a parameter equals zero. When this hypothesis is rejected for any of the rhythmicity parameters of the group b, namely \(A_{i,1,b}\) or \(A_{i,2,b}\), the two groups are considered differentially rhythmic. While a single-component cosinor can be used to assess the significance of acrophase shift and amplitude change, LimoRhyde is only able to assess whether the difference in rhythmicity between the groups is significant or not.
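Both differential models are linear in their parameters once the group indicator g is multiplied into the harmonic terms. The sketch below builds one row of such a design matrix; the function names, the column ordering, and the column labels are assumptions for illustration only. The coefficients of the `g*` columns are the group-b terms whose departure from zero would then be tested.

```python
import math


def differential_design_row(t, g, period, n_components=1):
    """One design-matrix row for a two-group cosinor with interaction terms.

    Columns: 1, g, then for each harmonic i: sin_i, g*sin_i, cos_i, g*cos_i.
    g is 0 for group a and 1 for group b.
    """
    if g not in (0, 1):
        raise ValueError("g must be 0 (group a) or 1 (group b)")
    row = [1.0, float(g)]
    for i in range(1, n_components + 1):
        angle = 2.0 * math.pi * t / (period / i)
        s, c = math.sin(angle), math.cos(angle)
        row += [s, g * s, c, g * c]
    return row


def column_names(n_components=1):
    """Illustrative labels matching the column order of differential_design_row."""
    names = ["M_a", "M_b"]
    for i in range(1, n_components + 1):
        names += [f"A_{i}_1_a", f"A_{i}_1_b", f"A_{i}_2_a", f"A_{i}_2_b"]
    return names


row_a = differential_design_row(t=6.0, g=0, period=24.0)
row_b = differential_design_row(t=6.0, g=1, period=24.0)
```

For group a every interaction column is zero, so only the baseline parameters are informed by those observations; group b observations inform both sets.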
Nonlinear regression can also be used to evaluate the differential rhythmicity parameters and their confidence intervals, as described in CircaCompare [23]. This is implemented as the following model
\[ y(t) = M_a + M_b g + \left( A_a + A_b g \right) \cos \left( \frac{t}{P} \cdot 2 \pi + \phi_a + \phi_b g \right) + e(t), \tag{9} \]
where \(A_a\), \(\phi _a\) and \(M_a\) are the amplitude, acrophase, and MESOR of the group a, respectively, and \(A_a + A_b\), \(\phi _a + \phi _b\) and \(M_a + M_b\) are the amplitude, acrophase, and MESOR of the group b, respectively. CosinorPy also implements differential rhythmicity assessments based on nonlinear regression. However, this approach does not provide any information beyond the approach described in Equation 7.
Results
We demonstrate the application of selected CosinorPy functionalities on two typical case studies using four groups of synthetically generated time series data with the attributes presented in Table 2. The whole analysis is available as interactive Python notebooks (IPYNB) at https://github.com/mmoskon/CosinorPy.
Case study 1: independent measurements
In our first case study, we presume that the measurements in each group are independent. This scenario complies with a transcriptomics data analysis or an analysis of qPCR data. CosinorPy successfully identifies the most suitable model, namely a 1-component model in the first two scenarios and a 3-component model in the last two scenarios. If the rhythmicity period is not known, the user could also use the automatic identification of the best fitting period together with the best fitting model, or could rely on the period values assessed using periodograms (see https://github.com/mmoskon/CosinorPy/blob/master/demo_independent.ipynb). Results obtained with the multi-component cosinor regression are presented in Fig. 1.
Even though a 3-component model is more appropriate for the last two scenarios, a good fit is also obtained with a 1-component model, with slightly higher SSR values than for a 3-component model (see Additional file 1: Table 1 and Additional file 2: Table 2). We thus opted to perform a differential rhythmicity analysis using a 1-component model to obtain more informative results, namely the significance of amplitude change and acrophase shift. CosinorPy is able to produce different plots visualising the difference between fits (see the upper part of Fig. 2), as well as acrophase shifts in a polar coordinate system (see the lower part of Fig. 2). Moreover, results of the analysis are reported in a tabular form, i.e., as a pandas DataFrame, which can be easily stored to Excel or CSV format. These results are available in Additional file 3: Table 3 and summarised in Table 3.
The same data were used in combination with the cosinor and cosinor2 R packages [16, 17] to validate the obtained results. These two packages support only single-component cosinor analyses. Moreover, the cosinor2 package builds upon the cosinor package, which unfortunately reports incorrect acrophase values [17]. Even though the cosinor2 package provides a function to correct these values, their corresponding p-values are not updated accordingly. The analyses performed with the CosinorPy package produce the same results as the cosinor and cosinor2 packages with the above-mentioned exception (see Additional file 7: Table 7 and Additional file 8: Table 8).
Case study 2: population-based measurements
In our second case study, we presume that measurements in each group belong to the same individual, which means that population-mean models should be used. This complies with, e.g., bioluminescence data, where the same cell is observed throughout the whole experiment. We can also refer to such measurements as dependent measurements.
We again use CosinorPy to identify the most suitable model, and assess the rhythmicity parameters and significance of periodicity in the data (see https://github.com/mmoskon/CosinorPy/blob/master/demo_dependent.ipynb). As in the first case study, CosinorPy is able to identify the most suitable model for each dataset (see Fig. 3). Complete results of the fitting process are available as Additional file 4: Table 4 and Additional file 5: Table 5. We again opted to use a single-component cosinor to perform the comparison analysis. Results of this analysis are available in Additional file 6: Table 6 and summarised in Table 4.
We validated the obtained results using the cosinor2 R package [17]. The population-based tests implemented within this package do not rely on the cosinor R package. The reported acrophases and their corresponding p-values thus fully comply with the results obtained with the CosinorPy package (see Additional file 9: Table 9 and Additional file 10: Table 10).
Case study 3: additional benefits of the multi-component cosinor
To further investigate the benefits of multi-component cosinor models, we applied CosinorPy to a larger dataset downloaded from the JTK_CYCLE repository [7]. We analysed these data with both single-component cosinor models and multi-component cosinor models with up to three components (see https://github.com/mmoskon/CosinorPy/blob/master/multi_vs_single.ipynb). Among 250 measurements, 95 measurements were identified as circadian using multi-component cosinor models. Among these, 11 measurements were not identified as circadian using a single-component cosinor model. In all these 11 cases the data reflected multiple peaks within a 24-hour period, which cannot be fitted with a single-component model. Multiple peaks were successfully incorporated into multi-component models. However, in some of these cases the statistical significance was marginal, and additional data should be collected to confirm the circadian nature of the observed measurements. In the future, large-scale analyses that were performed with single-component cosinor models in the past should be revisited using multi-component cosinor models. This could enable us to detect additional rhythmic genes and would thus provide novel insights into the circadian dynamics of selected genes.
Conclusion
CosinorPy provides all the functionalities required for a rhythmicity analysis of experimental data. Its features merge and extend the functionalities of existing cosinor-based software packages. These range from data import and preprocessing to identification of the most suitable models, evaluation of rhythmicity parameters and their significance, and assessment of differential rhythmicity between groups of measurements. Moreover, CosinorPy produces publication-ready figures, visualising the results of the fitting process as well as assessed parameter values, e.g., acrophase values in a polar coordinate system. With the vast scope of functionalities, as well as ease of use, the package presents an attractive alternative to other software packages for rhythmicity detection and analysis.
Availability and requirements

Project name: CosinorPy

Project home page: https://github.com/mmoskon/CosinorPy

Operating system(s): Platform independent

Programming language: Python

Other requirements: pandas, Matplotlib, NumPy, SciPy, statsmodels and openpyxl Python libraries

License: MIT license

Any restrictions to use by nonacademics: none
Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files available at https://github.com/mmoskon/CosinorPy.
Abbreviations
CSV: Comma separated values
DoF: Degrees of freedom
FDR: False discovery rate
IPYNB: Interactive python notebook
JTK: Jonckheere–Terpstra–Kendall
MESOR: Midline statistic of rhythm
qPCR: Quantitative polymerase chain reaction
RAIN: Rhythmicity analysis incorporating nonparametric methods
SSR: Sum of squared residuals
References
1. Ramsey KM, Affinati AH, Peek CB, Marcheva B, Hong HK, Bass J. Circadian measurements of sirtuin biology. In: Sirtuins. Berlin: Springer; 2013. p. 285–302.
2. Zhang R, Lahens NF, Ballance HI, Hughes ME, Hogenesch JB. A circadian gene expression atlas in mammals: implications for biology and medicine. Proc Nat Acad Sci. 2014;111(45):16219–24.
3. Andreani TS, Itoh TQ, Yildirim E, Hwangbo DS, Allada R. Genetics of circadian rhythms. Sleep Med Clin. 2015;10(4):413–21.
4. Brainard J, Gobel M, Scott B, Koeppen M, Eckle T. Health implications of disrupted circadian rhythms and the potential for daylight as therapy. Anesthesiol. 2015;122(5):1170–5.
5. Xie Y, Tang Q, Chen G, Xie M, Yu S, Zhao J, et al. New insights into the circadian rhythm and its related diseases. Front Physiol. 2019;10:682.
6. Seifalian A, Hart A. Circadian rhythms: will it revolutionise the management of diseases? J Lifestyle Med. 2019;9(1):1.
7. Hughes ME, Hogenesch JB, Kornacker K. JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. J Biol Rhythms. 2010;25(5):372–80.
8. Hutchison AL, Maienschein-Cline M, Chiang AH, Tabei SA, Gudjonson H, Bahroos N, et al. Improved statistical methods enable greater sensitivity in rhythm detection for genome-wide data. PLoS Comput Biol. 2015;11(3):e1004094.
9. Hutchison AL, Allada R, Dinner AR. Bootstrapping and empirical Bayes methods improve rhythm detection in sparsely sampled data. J Biol Rhythms. 2018;33(4):339–49.
10. Thaben PF, Westermark PO. Detecting rhythms in time series with RAIN. J Biol Rhythms. 2014;29(6):391–400.
11. Anafi RC, Francey LJ, Hogenesch JB, Kim J. CYCLOPS reveals human transcriptional rhythms in health and disease. Proc Nat Acad Sci. 2017;114(20):5312–7.
12. Ruben MD, Wu G, Smith DF, Schmidt RE, Francey LJ, Lee YY, et al. A database of tissue-specific rhythmically expressed human genes has potential applications in circadian medicine. Sci Transl Med. 2018;10(458).
13. Ruben MD, Francey LJ, Guo Y, Wu G, Cooper EB, Shah AS, et al. A large-scale study reveals 24-h operational rhythms in hospital treatment. Proc Nat Acad Sci. 2019;116(42):20953–8.
14. Refinetti R, Cornélissen G, Halberg F. Procedures for numerical analysis of circadian rhythms. Biol Rhythm Res. 2007;38(4):275–325.
15. Cornelissen G. Cosinor-based rhythmometry. Theoret Biol Med Modell. 2014;11(1):16.
16. Sachs M. Cosinor: tools for estimating and predicting the cosinor model; 2014. R package version 1.1. https://CRAN.R-project.org/package=cosinor.
17. Mutak A. Cosinor2: Extended tools for cosinor analysis of rhythms; 2018. R package version 0.2.1. https://CRAN.R-project.org/package=cosinor2.
18. Carlucci M, Kriščiūnas A, Li H, Gibas P, Koncevičius K, Petronis A, et al. DiscoRhythm: an easy-to-use web application and R package for discovering rhythmicity. Bioinformatics. 2019.
19. Bingham C, Arbogast B, Guillaume GC, Lee JK, Halberg F. Inferential statistical methods for estimating and comparing cosinor parameters. Chronobiologia. 1982;9(4):397–439.
20. Hughes ME, Abruzzi KC, Allada R, Anafi R, Arpat AB, Asher G, et al. Guidelines for genome-scale analysis of biological rhythms. J Biol Rhythms. 2017;32(5):380–93.
21. Ismail N, Jemain AA. Handling overdispersion with negative binomial and generalized Poisson regression models. In: Casualty actuarial society forum. vol. 2007. Citeseer; 2007. p. 103–58.
22. Singer JM, Hughey JJ. LimoRhyde: a flexible approach for differential analysis of rhythmic transcriptome data. J Biol Rhythms. 2019;34(1):5–18.
23. Parsons R, Parsons R, Garner N, Oster H, Rawashdeh O. CircaCompare: a method to estimate and statistically support differences in mesor, amplitude, and phase, between circadian rhythms. Bioinformatics. 2020;36(4):1208–12.
Acknowledgements
I would like to thank Dr. Urša Kovač and Dr. Damjana Rozman for their collaborative work on the analysis of circadian data, which also inspired the development of CosinorPy. I would as well like to thank Dr. Miha Mraz and his group members for their support, and Robert McKenzie for proofreading the paper.
Funding
This work has been partially supported by the scientific-research program P2-0359, and by the basic research projects J1-9176 and J5-1798, all financed by the Slovenian Research Agency. The funding body had no role in the design of the study and collection, analysis, and interpretation of data nor in writing the manuscript.
Author information
Contributions
MM wrote and tested the code, performed the analyses and wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The author declares that he has no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1: Supplementary Table 1. Results of the fitting process for the first case study using 1-, 2- and 3-component cosinor models with the cosinor module. The results are presented in a CSV format as reported by CosinorPy.
Additional file 2: Supplementary Table 2. Results of the fitting process for the first case study using 1-component cosinor models with the cosinor1 module. The results are presented in a CSV format as reported by CosinorPy.
Additional file 3: Supplementary Table 3. Results of the comparison analysis for the first case study using 1-component cosinor models with the cosinor1 module. The results are presented in a CSV format as reported by CosinorPy.
Additional file 4: Supplementary Table 4. Results of the fitting process for the second case study using 1-, 2- and 3-component cosinor models with the cosinor module. The results are presented in a CSV format as reported by CosinorPy.
Additional file 5: Supplementary Table 5. Results of the fitting process for the second case study using 1-component cosinor models with the cosinor1 module. The results are presented in a CSV format as reported by CosinorPy.
Additional file 6: Supplementary Table 6. Results of the comparison analysis for the second case study using 1-component cosinor models with the cosinor1 module. The results are presented in a CSV format as reported by CosinorPy.
Additional file 7: Supplementary Table 7. Results of the fitting process for the first case study using cosinor and cosinor2 R packages.
Additional file 8: Supplementary Table 8. Results of the comparison analysis for the first case study using cosinor and cosinor2 R packages.
Additional file 9: Supplementary Table 9. Results of the fitting process for the second case study using cosinor2 R package.
Additional file 10: Supplementary Table 10. Results of the comparison analysis for the second case study using cosinor2 R package.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Moškon, M. CosinorPy: a python package for cosinor-based rhythmometry. BMC Bioinformatics 21, 485 (2020). https://doi.org/10.1186/s12859-020-03830-w
Keywords
 Cosinor
 Rhythmicity analysis
 Circadian analysis
 Regression
 Python