CosinorPy: a python package for cosinor-based rhythmometry

Background Even though several computational methods for rhythmicity detection and analysis of biological data have been proposed in recent years, classical trigonometric regression based on cosinor still has several advantages over these methods and is still widely used. Different software packages for cosinor-based rhythmometry exist, but lack certain functionalities and require data in different, non-unified input formats. Results We present CosinorPy, a Python implementation of cosinor-based methods for rhythmicity detection and analysis. CosinorPy merges and extends the functionalities of existing cosinor packages. It supports the analysis of rhythmic data using single- or multi-component cosinor models, automatic selection of the best model, population-mean cosinor regression, and differential rhythmicity assessment. Moreover, it implements functions that can be used in a design of experiments, a synthetic data generator, and import and export of data in different formats. Conclusion CosinorPy is an easy-to-use Python package for straightforward detection and analysis of rhythmicity requiring minimal statistical knowledge, and produces publication-ready figures. Its code, examples, and documentation are available to download from https://github.com/mmoskon/CosinorPy. CosinorPy can be installed manually or by using pip, the package manager for Python packages. The implementation reported in this paper corresponds to the software release v1.1.

strive towards the integration of such analyses into the clinical work for disease diagnostics, treatment, and prevention [6].
Detection and analysis of rhythmicity requires designated computational approaches. These approaches are focused on the identification of rhythmic datasets that correspond to specific biological entities (e.g., genes) and evaluation of their rhythmicity parameters. Several non-parametric methods for circadian data analysis have been proposed recently, such as JTK CYCLE and its extensions [7][8][9] and RAIN [10]. Even though nonparametric methods have several benefits, e.g., robustness to noise in the data, classical harmonic regression should still be used when rhythmicity parameters, such as oscillation amplitudes and acrophases, need to be evaluated, or when the noise in the data is non-Gaussian [10]. Moreover, non-parametric methods often fail or do not perform well when data are (1) collected at irregular intervals, (2) without replicates, (3) unbalanced (i.e., more samples are collected at one time of a day compared to others), (4) full of outliers, and (5) very large. However, cosinor has been successfully applied even in such cases (see, e.g., [11][12][13]).
Cosinor presents a fundamental method for rhythmicity detection and analysis using cosine curve fitting [14,15]. It is based on a trigonometric regression model where t corresponds to the observed time points in the time series, N is the number of components in the model, namely number of cosine curves, and A i,1 , A i,2 , C and P are the parameters of the model, with M being the MESOR (Midline Statistic Of Rhythm), P the period of the observed rhythm, and e(t) the error term [15]. When the period is known the model can be converted to a linear regression model where x i,1 = sin t P/i · 2π and x i,2 = cos t P/i · 2π . When the period is not known, different periods within the feasible period ranges can be tested, or period detection methods such as periodograms, can be used for period estimation [14].
Cosinor has been widely applied to the analysis of rhythmicity detection and evaluation of rhythmicity parameters in time series data. Different software packages for cosinor-based rhythmometry, such as cosinor [16], cosinor2 [17], and DiscoRhythm [18], have been introduced in recent years. However, these packages lack certain functionalities, such as multi-component cosinor regression and analysis, and do not support automatic identification of the best regression model. Moreover, the user must format the input data for each of these tools in a different manner. Herein, we describe CosinorPy, a Python package that merges and extends the functionalities of existing software packages. CosinorPy can as well be used to generate synthetic data, and supports different input and output formats compatible with other software packages for rhythmicity detection and analysis. It provides all functionalities required for the analysis of rhythmic data from data import and preprocessing to removal of outliers, identification of oscillation periods, assessment of the most suitable models and their statistics, analysis of differential rhythmicity, and data plotting, reporting, and export. Moreover, it implements functions to estimate the required number of samples to obtain the results with a predefined statistical significance and can thus as well be used to guide experimental work [19]. CosinorPy is currently the only cosinor package that is implemented in Python, which has become increasingly important in data science, as well as in the field of bioinformatics in recent years. A comparison of features of currently available cosinor-based software packages is provided in Table 1.

Implementation
CosinorPy is implemented in Python and relies on the state-of-the-art Python packages for data management, scientific computing, visualisation, and statistical modelling, namely pandas, NumPy, SciPy, Matplotlib, and statsmodels. CosinorPy is comprised of three Python modules, namely file_parser, cosinor, and cosinor1. The file_ parser module implements reading and writing of xlsx and csv files and generating synthetic data. The cosinor module implements the functionalities based on a single-or multi-component cosinor model that include model fitting, identification of the most suitable model, and analysis of differential rhythmicity. The cosinor1 module implements similar functionalities, which are adapted to a single-component cosinor model, and thus provide more exhaustive results and additional statistics such as the significance of acrophase shifts within the differential rhythmicity analysis. The implementation of specific functionalities within the package are described below. Thorough documentation of the package is available at https ://githu b.com/mmosk on/Cosin orPy/ blob/maste r/docs/docs.md.

Single-component cosinor
When the observed data can be accurately described with a single harmonic component, a single-component cosinor model can be used: Using this model, amplitude (A) and acrophase ( φ ) can be estimated directly from the assessed parameter values as:

Table 1 Currently available software packages for rhythmicity detection and analysis based on cosinor method and its extensions
Package: package name; language: the programming language that is used in a combination with a specific package; differential rhythmicity: support for the assessment of a differential rhythmicity among two groups of measurement; linear regression and non-linear regression: what kind of regression the package supports; multiple components: support for multicomponent cosinor analysis; count data: support for count data analysis; population-mean: support for population-mean cosinor analysis; design of experiments: support for design of experiments to approximate the minimal number of required samples; reference: reference to the package and The statistical significance of the model is assessed with an F-test on the basis of the model sum of squares and sum of square residuals [15]. The significance of rhythmicity is evaluated with the zero amplitude test, which is as well based on the F-statistic [15,19]. Moreover, period and acrophase significance and confidence intervals are assessed directly from the model and its underlying data [19]. The adequacy of the model can be assessed using different regression diagnostic tests. When replicates are available or when the measurements are performed for several periods, the goodness of fit of the model is evaluated with an F-test comparing the pure error and the lack of fit sum of squares [15]. When collecting circadian data, experimentalists should follow specific guidelines to obtain statistically significant results [20]. However, if certain requirements regarding the precision of the assessment of rhythmicity parameters, e.g. a maximal acceptable length of a confidence interval, can be specified in advance, we can approximate the minimal sample size necessary to achieve such precision [19]. CosinorPy implements these functionalities and can thus be used as well during a design of experiments.

Multi-component cosinor
When a single-component cosinor model is not able to describe our data satisfactorily, e.g., when the goodness of fit test rejects the model, a multi-component cosinor model can be considered [15]. A multi-component cosinor model is able to describe more complex oscillatory dynamics, e.g. peak asymmetry or multiple peaks within one period, which cannot be described with a single harmonic component. Rhythmicity parameters cannot be calculated analytically from this model, but are evaluated from the fitted curve.
Additional components will always increase a model's accuracy, but on the account of a reduced number of degrees of freedom. This might cause the over-fitting of a model to the observed data. Automatic selection of the best model regarding the number of components is performed using the extra sum-of-squares F-test: where SSR 1 and SSR 2 present the sum of squared residuals (SSR) for a simpler and a more complex model, respectively, and where DoF 1 and DoF 2 present the degrees of freedom (DoF) of a simpler and a more complex model, respectively. The more complex model, i.e., the model with a smaller DoF, is selected as more appropriate when the obtained p-value is lower than a predefined threshold. Moreover, the model selection process can be guided with the goodness of fit measures as for a single-component cosinor model. Additional advantage of our implementation of the multi-component cosinor regression is that it allows the user to fit a cosinor model to count data. Here, a generalised Poisson model with a logarithmic link can be used in a combination with a cosinor model to handle over-as well as under-dispersed data [21]. Moreover, CosinorPy allows the user to as well select Poisson or negative-binomial models for the analysis of rhythmicity of count data.

Population-mean models
When dealing with at least three individuals, and when each individual produces a series of dependant measurements which can be used to establish the individual's cosinor model, a population-mean cosinor should be used [15]. In this case a cosinor model is fitted to each individual. The response of the whole population is described and analysed as a mean of all individual cosinor models (population-mean cosinor). When using a single-component population-mean cosinor, confidence intervals of rhythmicity parameters are assessed as described in [19]. Moreover, a p-value for the null hypothesis of the amplitude of oscillations being zero is evaluated with an F-test for a single-component population-mean cosinor as described in [15]. The statistical significance and the goodness of fit of a single-or multi-component population-mean cosinor model are assessed in a similar way as for the basic cosinor models [15].

Analysis of differential rhythmicity
Cosinor models can be used to assess the difference in the rhythmic response of two groups of measurements. Each group either corresponds to a different variable (e.g., two different genes) or to the same variable in different conditions (e.g., the same gene before and after a perturbation). We are usually interested in amplitude changes and acrophase shifts between the groups. Several different methods, which we use in our implementation, have been proposed to assess these differences.
If the data describing both groups can be modelled with single-component cosinor models, a single-component cosinor can as well be used to assess the differential rhythmicity of these two groups. This model is implemented as where g equals 0 if the data belong to the group a, and 1 if the data belong to the group b. Based on the assessed parameter values, we can estimate the acrophases and amplitudes of each of the groups, as well as the differences between these values and their significance [19]. Moreover, a population-mean single-component cosinor model is adapted to analyse the differential rhythmicity in a similar way [17,19].
LimoRhyde [22] presents a similar approach that uses a cosinor model and can be adapted to use an arbitrary number of components in the following form The significance of each parameter in this model is assessed using a T-test, where the null hypothesis is that a parameter equals zero. When this hypothesis is rejected for any of the rhythmicity parameters of the group b, namely A i,1,b or A i,2,b , these two groups should reflect differential rhythmicity. While a single-component cosinor can be used to assess the significance of acrophase shift and amplitude change, LimoRhyde is only able to assess whether the difference in rhythmicity between the groups is significant or not.
Non-linear regression might as well be used to evaluate the differential rhythmicity parameters and their confidence intervals as described in CircaCompare [23]. This is implemented as the following model where A a , φ a and M a present the amplitude, acrophase, and MESOR of the group a, respectively, and A a + A b , φ a + φ b and M a + M b present the amplitude, acrophase, and MESOR of the group b, respectively. CosinorPy as well implements differential rhythmicity assessments based on non-linear regression. However, this approach does not provide any additional information to the approach described in Equation 7.

Results
We demonstrate the application of selected CosinorPy functionalities on two typical case studies using four groups of synthetically generated time series data with the attributes presented in Table 2. The whole analysis is available as interactive Python notebooks (IPYNB) at https ://githu b.com/mmosk on/Cosin orPy.

Case study 1: independent measurements
In our first case study, we presume that the measurements in each group are independent. This scenario complies with a transcriptomics data analysis or an analysis of qPCR data. CosinorPy successfully identifies the most suitable model, namely a 1-component model in the first two scenarios and a 3-component model in the last two scenarios. If the rhythmicity period is not known, the user could as well use the automatic identification of the best fitting period together with the best fitting model, or could rely on the period values assessed using periodograms (see https :// (8)  Fig. 1.
Even though a 3-component model is more appropriate for the last two scenarios, a good fit is obtained as well with a 1-component model with slightly higher SSR values than a 3-component model (see Additional file 1: Table 1 and Additional file 2: Table 2). We thus opted to perform a differential rhythmicity analysis using a 1-component model to obtain more informative results, namely the significance of amplitude change and acrophase shift. CosinorPy is able to produce different plots visualising the difference between fits (see the upper part of Fig. 2), as well as acrophase shifts in a polar coordinate system (see the lower part of Fig. 2). Moreover, results of the analysis are reported in a tabular form, i.e., as a pandas DataFrame, which can be easily stored to Excel or CSV format. These results are available in Additional file 3: Table 3 and summarised in Table 3.
The same data were used in a combination with the cosinor and cosinor2 R packages [16,17] to validate the obtained results. These two packages support only singlecomponent cosinor analyses. Moreover, the cosinor2 package builds upon the cosinor package, which unfortunately reports incorrect acrophase values [17]. Even though the cosinor2 package provides a function to correct these values, their corresponding p-values are not updated accordingly. The analyses performed with the CosinorPy package produce the same results as cosinor and cosinor2 packages with the above mentioned exception (see Additional file 7: Table 7 and Additional file 8: Table 8).

Case study 2: population-based measurements
In our second case study, we presume that measurements in each group belong to the same individual, which means that population-mean models should be used. This complies with, e.g., bioluminescence data, where the same cell is observed throughout the whole experiment. We can also refer to such measurements as dependent measurements.
We again use CosinorPy to identify the most suitable model, and assess the rhythmicity parameters and significance of periodicity in the data (see https ://githu b.com/ mmosk on/Cosin orPy/blob/maste r/demo_depen dent.ipynb ). As in the first case study, CosinorPy is able to identify the most suitable model for each dataset (see Fig. 3). Complete results of the fitting process are available as Additional file 4: Table 4 and Additional file 5: Table 5. We again opted to use a single-component cosinor to Table 3 Summary of the differential rhythmicity assessment for the first case study The differential rhythmicity was assessed using a single-component cosinor. We are usually interested in the significance of amplitude changes, as well as acrophase shifts. When multiple tests are performed, CosinorPy adjusts the significance values using the false discovery rate (FDR) method (reported as q-values). The results here indicate that while the amplitude changes are not significant, acrophase shifts are ( q < 0.05)  Table 6 and summarised in Table 4. We validated the obtained results using the cosinor2 R package [17]. The populationbased tests implemented within this package do not rely on the cosinor R package. The reported acrophases and their corresponding p-values thus fully comply with the results obtained with the CosinorPy package (see Additional file 9: Table 9 and Additional file 10: Table 10).

Case study 3: additional benefits of the multi-component cosinor
To additionally investigate the benefits of multi-component cosinor models we applied CosinorPy to a larger dataset downloaded from the JTK Cycle repository [7]. We analysed these data with both, single-component cosinor, as well as multi-component cosinor models with up to three components (see https ://githu b.com/mmosk on/Cosin orPy/blob/maste r/multi _vs_singl e.ipynb ). Among 250 measurements 95 measurement Table 4 Summary of the differential rhythmicity assessment for the second case study (population-mean cosinor) The differential rhythmicity was assessed using a single-component population-mean cosinor. When multiple tests are performed, CosinorPy adjusts the significance values using the false discovery rate (FDR) method (reported as q-values). The results of the analysis indicate that while the amplitude changes are not significant, acrophase shifts are ( q < 0.05)   Table 3). CosinorPy is able to produce publication-ready figures illustrating a comparative analysis of time series data (upper part of the figure), as well as an analysis of acrophase shifts (the lower part of the figure) were identified to be circadian using multi-component cosinor models. Among these, 11 measurement were not identified to be circadian using a single-component cosinor model. In all these 11 cases data reflected multiple peaks within a 24-hour period, which cannot be fitted with a single-component model. Multiple peaks were successfully incorporated into multi-component models. However, in some of these cases the statistical significance was marginal, and additional data should be collected to confirm the circadian nature of observed measurements. In the future, large-scale analyses that were performed with single-component cosinor models in the past should be revised using multi-component cosinor models. This could enable us to detect additional rhythmic genes and would thus provide novel insights into circadian dynamics of selected genes.

Conclusion
CosinorPy provides all the functionalities required for a rhythmicity analysis of experimental data. Its features merge and extend the functionalities of existing cosinor-based software packages. These range from data import and pre-processing to identification of the most suitable models, evaluation of rhythmicity parameters and their significance, and assessment of differential rhythmicity between groups of measurements. Moreover, CosinorPy produces publication-ready figures, visualising the results of the fitting process as well as assessed parameter values, e.g., acrophase values in a polar coordinate system. With the vast scope of functionalities, as well as ease of use, the package presents an attractive alternative to other software packages for rhythmicity detection and analysis.