PartSeg, a Tool for Quantitative Feature Extraction From 3D Microscopy Images for Dummies

Background: Bioimaging techniques offer a robust tool for studying molecular pathways and morphological phenotypes of cell populations subjected to various conditions. As modern high-resolution 3D microscopy provides access to an ever-increasing amount of high-quality images, there arises a need for their analysis in an automated, unbiased and simple way. Segmentation of structures within the cell nucleus, which is the focus of this paper, presents a new layer of complexity in the form of dense packing and significant signal overlap. At the same time, the available segmentation tools present a steep learning curve for new users with a limited technical background. This is especially apparent in bulk processing of image sets, which requires the use of some form of programming notation.
Results: In this paper, we present PartSeg, a tool for segmentation and reconstruction of 3D microscopy images, optimised for the study of the cell nucleus. PartSeg integrates refined versions of several state-of-the-art algorithms, including a new multi-scale approach for segmentation and quantitative analysis of 3D microscopy images. The features and user-friendly interface of PartSeg were carefully planned with biologists in mind, based on analysis of multiple use cases and difficulties encountered with other tools, to offer an ergonomic interface with a minimal entry barrier. Bulk processing in an ad-hoc manner is possible without the need for programmer support. As the size of datasets of interest grows, such bulk processing solutions become essential for proper statistical analysis of results. Advanced users can use PartSeg components as a library within Python data processing and visualisation pipelines, for example within Jupyter notebooks. The tool is extensible so that new functionality and algorithms can be added by the use of plugins. For biologists, the utility of PartSeg is presented in several scenarios, showing the quantitative analysis of nuclear structures.
Conclusions: In this paper, we have presented PartSeg, which is a tool for precise and verifiable segmentation and reconstruction of 3D microscopy images. PartSeg is optimised for cell nucleus analysis and offers multiscale segmentation algorithms best suited for this task. PartSeg can also be used for bulk processing of multiple images and its components can be reused in other systems or computational experiments.
Contact: g.bokota@cent.uw.edu.pl, a.magalska@nencki.edu.pl, d.plewczynski@cent.uw.edu.pl


Background
For a decade, high-throughput bioimaging techniques have offered a robust tool for studying molecular pathways and morphological phenotypes of cell populations subjected to various conditions [1,2]. Due to recent advances in light and electron microscopy, a large number of input images can be produced in a relatively short time span. Therefore, it becomes critical to extract numerical features from imaging data in an automated, unbiased and simple way.
The cell nucleus is a highly organised and crowded organelle, composed of many functional and structural domains (Fig. 1). The past two decades of research show that changes in higher-order chromatin structures, that is, spatial and temporal rearrangements of chromatin, are involved in transcriptional control and other cellular functions [3,4]. The 3C-based methods, which decipher folding of chromatin, quantify the probability of interaction between two genomic fragments, such as promoters and enhancers, that are in close spatial proximity but may be dispersed far across the genome. To refine these biochemical data, which show the outcome averaged over millions of cells, precise microscopic analysis of nuclear components like genes, chromatin and nuclear domains is necessary. Therefore, microscopy is often used for verification, at scales of individual cells or even genetic copies.
Automatic segmentation of nuclear structures poses many challenges because of confinement boundaries imposed by the nucleus, dense packing and significant background noise, resulting in signal overlap. Optimal segmentation of boundaries via intensity-threshold based methods is difficult, if not impossible, especially for conjoined structures. Yet segmentation of regions of interest (ROI) is essential for expert evaluation and serves as a basis for calculating numerical descriptors of data.
Because of this, it is not possible to create a parameterless, universal algorithm for processing data from innovative experiments. Rather, a flexible methodology for tuning data processing methods to current datasets is preferable. We take this task one step further by facilitating the process for users without programming knowledge.
In this paper, we present PartSeg, which offers a novel segmentation algorithm for high-throughput imaging data and is specially optimised for analysis of the cell nucleus. PartSeg relies on 3D segmentation of regions of interest (ROI) and subsequent analysis of a large number of morphological features of the segmented structures. It is equipped with an easy-to-use graphical user interface (GUI) and can alternatively be utilised as a Python library in composite data processing pipelines. Therefore, it meets the expectations of both biologists and bioinformaticians. Moreover, PartSeg allows for batch processing of datasets coming from different sources, e.g. 2D/3D images from light or electron microscopy, which can be managed without the support of a programmer by the use of rapid prototyping on sample data.
The main design goal of PartSeg was to make it simple to learn for new users. To achieve this, we determined typical use cases for users interested in studying cell nuclei (see Subsection 2.1) and planned the GUI so that proficiency with the toolkit can be achieved quickly. It is consistent, ergonomic and does not introduce unnecessary notions or mechanisms. For example, compatibility is maintained by supporting a diverse set of 3D image formats stored in the filesystem, enabling interoperability with all popular platforms. This way no importing or preprocessing is needed and the user can focus on the task at hand, rather than the software's inner workings. This follows Human Computer Interaction studies [5], which show that users expect systems to work and will choose those that are easier to use.
The current revision of the PartSeg interface includes support for 2D and 3D multichannel data. It consists of two separate tools named Mask Segmentation and ROI Analysis. The former is intended for extraction of objects of interest from the data set and can store the basic information on those objects in a separate mask file. The latter can be used for the measurement of morphological parameters of specific structures within the extracted objects.
Finally, some ergonomic improvements are incorporated following examination of typical usage patterns. For example, segmentation parameters are automatically saved during and in between sessions, and specific segmentation profiles can be saved by users. The segmented area is graphically presented on top of the data, which simplifies visual validation of the parameters used. PartSeg is also equipped with a synchronised view, where two windows are used simultaneously, permitting the user to compare the current segmentation with multichannel raw data or with another segmentation computed with different parameters. Last but not least, parallel batch processing is available, with the number of processes adjusted by the user depending on available system resources.

Implementation
To optimise the PartSeg GUI, we identified the typical steps of a nucleus analysis workflow by examining several real-life use cases. In subsection 3.2 we provide four examples of such workflows.

Typical steps of nucleus analysis workflows
The steps are performed manually on a small subset of the data. This confirms the selection of the method and that the parameters are adjusted appropriately before processing of the whole dataset.
Preselection of nuclei in 3D images: In order to quantitatively analyse structures within the cell nucleus, it is necessary to segment individual nuclei from the 3D image. Nuclei can be segmented based on any nuclear staining and thresholding method. However, in the case of heterogeneous samples with diverse staining intensities, one threshold is not enough to properly segment all nuclei. The possibility of limiting the selection of nuclei from the population for analysis provides an opportunity to study cells in a particular cell cycle phase or those of a specific type, e.g. single cancerous cells embedded in normal tissue.
Selection and verification of parameters for extraction of complex ROI inside the nucleus' territory: Next, segmentation of ROI within nuclei is necessary in order to obtain numerical values for nuclear assemblies. Extraction of complex structures from 3D images usually requires testing of several algorithms in order to select optimal parameters. Visualisation of the performed segmentation on top of raw data allows the user to select methods and settings facilitating the extraction process. As mentioned before, the nuclear space is very crowded. For example, pairs of chromosome territories (CTs) are often localised close to each other, which makes segmentation of a single copy very difficult even for an experienced eye. This step is tuned on a sample subset of nuclei.
Measurement set: A choice of numerical parameters (measurements) needs to be made. Values for those parameters are calculated for each image as a result of manual or batch segmentation processing and presented to the user in the form of a spreadsheet. The user then performs some simple computations and plots the results. Results should be based on the initial picture resolution and expressed in physical units.
Batch processing: When single image analysis is considered satisfactory, it can be repeated by executing batch processing on all input images to collect data for statistical analysis. Results are shown for each individual nucleus and structure. This allows conclusions to be drawn for the entire cell population based on data acquired from individual nuclei, simplifying the subsequent tasks of categorising nuclei or structures and performing statistical analysis. In comparison, the aforementioned biochemical techniques show an averaged outcome from large populations of cells, which does not necessarily reflect the biological heterogeneity of individual components.
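The last two steps can be illustrated with a minimal sketch, which is not PartSeg code. It assumes a hypothetical saved profile holding a single threshold, represents each image as a nested list indexed [z][y][x] with its voxel size in nanometres, and reports volumes in physical units rather than voxel counts. The names measure_volume_um3 and run_batch are illustrative only, and a thread pool stands in for the process-based parallelism described in the text.

```python
from concurrent.futures import ThreadPoolExecutor


def measure_volume_um3(image, threshold, voxel_size_nm):
    """Volume of all voxels brighter than `threshold`, in cubic micrometres.

    `image` is a nested list indexed as image[z][y][x]; `voxel_size_nm` is the
    (z, y, x) voxel size in nanometres taken from the image metadata.
    """
    dz, dy, dx = voxel_size_nm
    voxel_volume_um3 = dz * dy * dx / 1e9  # 1 um^3 = 1e9 nm^3
    voxel_count = sum(
        value > threshold for plane in image for row in plane for value in row
    )
    return voxel_count * voxel_volume_um3


def run_batch(images, profile, workers=2):
    """Apply one saved profile to every image and return one row per image."""

    def process(item):
        name, (image, voxel_size_nm) = item
        volume = measure_volume_um3(image, profile["threshold"], voxel_size_nm)
        return {"name": name, "volume_um3": volume}

    items = sorted(images.items(), key=lambda item: item[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so rows follow the sorted image names.
        return list(pool.map(process, items))


# Hypothetical profile tuned on a sample image, then reused for the whole set.
profile = {"threshold": 5}
images = {
    "nucleus_a": ([[[0, 9], [9, 9]], [[9, 0], [0, 0]]], (300, 100, 100)),
    "nucleus_b": ([[[9, 9], [0, 0]]], (200, 100, 100)),
}
rows = run_batch(images, profile)
for row in rows:
    print(f'{row["name"]}: {row["volume_um3"]:.3f} um^3')
```

Each row corresponds to one line of the result spreadsheet. Keeping the voxel size next to the image is what lets the same profile produce comparable physical measurements across acquisitions with different resolutions.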
Although several general and mature bioimaging tools like ImageJ [6], Icy [7], CellProfiler [8] and ImagePy [9] exist, none of these provide support for all of the aforementioned steps nor do they possess an interface optimised for the workflow as a whole. A detailed comparison with other systems is provided in Section 3.1.

Algorithms for ROI Analysis
The main feature of PartSeg is ROI analysis. An ROI is a set of voxels that can be distinguished and measured. PartSeg provides several algorithms for this purpose, which we list below, and more can be added as plugins. Here, we categorise the algorithms into two groups. The first contains algorithms designed for ROI extraction from an initial 3D image, outputting a collection of ROI. The second contains algorithms for measuring various ROI features. These algorithms take a collection of ROI as input and use it to compute numerical or aggregate features.

ROI extraction
The canonical group of PartSeg algorithms designed for finding collections of ROI relies on threshold segmentation. These include:
• Thresholding: ROI is defined by thresholding followed by identification of connected components and minimum size filtering. The upper or lower threshold level can be set manually by the user or calculated with common methods such as Otsu, Shanbhag, etc.
• Range thresholding: allows the user to set a specific threshold range marking ROI, which can be useful e.g. to eliminate background staining.
• Thresholding with watershed: ROI is found by two-stage thresholding. First, core objects are identified as in Thresholding, after which a second thresholding marks the whole area of interest. Voxels from this area are assigned to core objects using selected watershed-like methods initiated from the core objects.
• Multiscale opening: a type of watershed-like thresholding that is unique to PartSeg. Further details are given in Subsection 2.3.
• Multiple threshold Otsu: a generalisation of the histogram-based Otsu method (see [10]) which identifies multiple types of ROI using a set of thresholds.
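The first method in the list can be sketched in a few lines. This is an illustration of the three stages named above (thresholding, connected components, minimum size filtering), not PartSeg's implementation. It assumes a lower threshold, 6-connectivity between voxels and an image stored as a nested list indexed [z][y][x]; the function name threshold_rois is hypothetical.

```python
from collections import deque

# Face neighbours only (6-connectivity); this choice is an assumption.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def threshold_rois(image, threshold, min_size):
    """Return ROIs as sorted lists of (z, y, x) voxels, largest ROI first."""
    # Stage 1: thresholding, keep voxels brighter than the given level.
    unvisited = {
        (z, y, x)
        for z, plane in enumerate(image)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value > threshold
    }
    rois = []
    while unvisited:
        # Stage 2: grow one connected component with a breadth-first search.
        seed = unvisited.pop()
        component, queue = [seed], deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                voxel = (z + dz, y + dy, x + dx)
                if voxel in unvisited:
                    unvisited.remove(voxel)
                    component.append(voxel)
                    queue.append(voxel)
        # Stage 3: minimum size filtering.
        if len(component) >= min_size:
            rois.append(sorted(component))
    rois.sort(key=lambda roi: (-len(roi), roi[0]))
    return rois


image = [
    [[9, 9, 0, 0],
     [0, 0, 0, 7],
     [0, 0, 0, 0]],
    [[9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 5, 0, 0]],
]
rois = threshold_rois(image, threshold=4, min_size=2)
print(rois)
```

With min_size=2 the two isolated bright voxels are discarded as noise and only the three-voxel object spanning both planes remains. Lowering min_size to 1 keeps all three components.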

Plugins
PartSeg has a plugin system which allows it to be expanded with additional features. This is convenient if one wants to add experiment-specific computational methods and include tailored dependencies that need to be selected to match a specific computer configuration. As an example, we have provided plugins incorporating deep learning algorithms. It is well recognised that deep learning gives very good results when applied to nucleus segmentation. There are many published models (both network topologies and their software implementations), for example Stardist [11,12] and Cellpose [13]. Typically, such highly optimised statistical methodologies have specific configurations that depend on processor type, graphics card and installed drivers, hence we implement them as plugins.
Our plugins for these two models are available at: https://pypi.org/project/PartSeg-stardist/ and https://pypi.org/project/PartSeg-cellpose/.
ROI selection can be performed multiple times in an incremental manner. After determining the first collection, selected ROI can be converted to a mask and used as an input for the next level of extraction. When the final collection is obtained, its numerical description can be determined using algorithms described in the next section.
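Incremental extraction can be sketched as follows, as a minimal illustration rather than PartSeg's own API. It assumes two channels of the same image stored as nested lists indexed [z][y][x]: the first level thresholds a DNA channel to obtain a nucleus, the result is reused as a mask, and the second level thresholds a probe channel only inside that mask. The helper voxels_above and the channel names are hypothetical.

```python
def voxels_above(image, threshold, mask=None):
    """Return the set of (z, y, x) voxels brighter than `threshold`.

    When `mask` (a set of coordinates) is given, only voxels inside the mask
    are considered, which restricts the search to a previously extracted ROI.
    """
    selected = set()
    for z, plane in enumerate(image):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value > threshold and (mask is None or (z, y, x) in mask):
                    selected.add((z, y, x))
    return selected


dna_channel = [[[0, 5, 5, 5, 0, 0],
                [0, 5, 5, 5, 0, 0]]]
probe_channel = [[[0, 1, 9, 1, 0, 8],
                  [0, 1, 1, 9, 0, 0]]]

# Level 1: the nucleus is extracted from the DNA channel and becomes a mask.
nucleus_mask = voxels_above(dna_channel, threshold=3)
# Level 2: bright probe signal is searched for only inside the nucleus.
foci = voxels_above(probe_channel, threshold=6, mask=nucleus_mask)
print(sorted(foci))
```

The bright probe voxel at x=5 lies outside the nucleus and is ignored at the second level, which is the practical benefit of converting the first ROI to a mask.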

Measuring of ROI features
Measurements can be performed on an area defined with an ROI, a collection of ROI or a Mask. Several common measurement methods that require no explanation, like Volume or Diameter, are available.
PartSeg also provides more sophisticated methods of ROI measurement on a Mask. They can, for example, be used to measure gene positions in the nucleus or to measure the relative difference of concentrations of proteins within regions of the nucleus. These methods include:
• Border rim: measures the total volume or pixel brightness of the part of the selected ROI that is located within a given distance from the border of the mask. An example application would be to identify a gene or the portion of a chromosome territory positioned in close proximity to the nuclear rim (see Figure 1 A.d. and [14]).
• Mask distance splitting: splits the mask into concentric regions of increasing distance from the mask centre, which can be of equal radius or equal volume. It allows the volume and pixel brightness of ROI found within the designated regions to be measured. For example, it shows the radial position of CTs within the nucleus (see Figure 1 A.c. and [15]).
• Mask-ROI distance: the distance from ROI to mask is calculated based on their mass centre (taking brightness into account), geometrical centre, or border distance. An example application would be to identify gene positioning within the nucleus (see Figure 1 A.d. and [14]).
Some other noteworthy measurement types include:
• Moment: one of the possible measurements of mass (pixel brightness) distribution inside ROI. It allows us to determine if the structural mass is concentrated or distributed evenly. The formula is $\sum_{v \in \mathrm{ROI}} m_v r_v^2$, where $m_v$ is the brightness of a voxel and $r_v$ is its distance from the ROI's centre of mass. The interpretation is similar to the classical moment of inertia. This measure assumes that voxel brightness corresponds one-to-one to object density.
• First/Second/Third principal axis: aligns ROI using weighted PCA, then calculates ROI length along the corresponding axis. The weights correspond to voxel brightness and the position vectors are taken relative to the ROI's centre of mass. This measurement can be used to determine the basic shape of ROI.
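Two of the measurements above, Moment and Border rim, can be sketched in plain Python. This is an illustration under stated assumptions, not PartSeg's implementation: voxels are (z, y, x) tuples, brightness is given as a dictionary, the voxel size is in micrometres, and the rim is approximated as the ROI voxels whose distance to the nearest mask border voxel does not exceed the rim width. PartSeg's own definitions may differ in detail, and all function names are hypothetical.

```python
import math

FACE_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def brightness_moment(roi, brightness, voxel_size):
    """Sum of m_v * r_v**2, with r_v measured in physical units from the
    brightness-weighted centre of mass of the ROI."""
    positions = [tuple(c * s for c, s in zip(v, voxel_size)) for v in roi]
    weights = [brightness[v] for v in roi]
    total = sum(weights)
    centre = [
        sum(w * p[axis] for w, p in zip(weights, positions)) / total
        for axis in range(3)
    ]
    return sum(
        w * sum((p[axis] - centre[axis]) ** 2 for axis in range(3))
        for w, p in zip(weights, positions)
    )


def rim_fraction(roi, mask, voxel_size, rim_width):
    """Fraction of ROI voxels within `rim_width` of the mask border."""
    # Border voxels are mask voxels with at least one face neighbour outside.
    border = [
        (z, y, x)
        for z, y, x in mask
        if any((z + dz, y + dy, x + dx) not in mask for dz, dy, dx in FACE_NEIGHBOURS)
    ]

    def distance(a, b):
        return math.sqrt(sum(((i - j) * s) ** 2 for i, j, s in zip(a, b, voxel_size)))

    in_rim = sum(
        1 for voxel in roi if min(distance(voxel, b) for b in border) <= rim_width
    )
    return in_rim / len(roi)


voxel_size = (0.3, 0.1, 0.1)  # (z, y, x) in micrometres, an assumed resolution
line = [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
even = brightness_moment(line, {line[0]: 1.0, line[1]: 2.0, line[2]: 1.0}, voxel_size)
peaked = brightness_moment(line, {line[0]: 0.5, line[1]: 3.0, line[2]: 0.5}, voxel_size)

mask = {(z, y, x) for z in range(5) for y in range(5) for x in range(5)}
roi = [(2, 2, 0), (2, 2, 1), (2, 2, 2)]
fraction = rim_fraction(roi, mask, voxel_size, rim_width=0.15)
print(even, peaked, fraction)
```

Both brightness profiles have the same total brightness, yet the peaked one yields a smaller moment, which is how the measure distinguishes concentrated from evenly distributed signal. In the rim example two of the three ROI voxels lie within 0.15 µm of the mask border.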

Multiscale Opening
Often in the analysis of imaging data, the distance between two ROIs is smaller than the angular resolution of the imaging method used. Historically, the first approach to separating such ROIs was the watershed method [16]. However, if an object has a diverse morphology, i.e. exhibits regions of higher brightness, classical watershed will incorrectly split such an object into multiple ROIs. To overcome this problem, the watershed transform was developed [17]. Unfortunately, the watershed transform works best for objects that are spherical in a chosen metric. Because many nuclear domains are densely packed and non-spherical, it was important to develop methods capable of reliably segmenting them. Therefore, we implemented a novel Multiscale Opening algorithm [18]. It takes into account both the change in brightness and the physical distance along the joining path. The main difference between other watershed-like algorithms and Multiscale Opening is its iterative voxel labelling. Voxels can be labelled only if they are closer to any object than to the background. This approach produces better results for stretched and non-symmetrical objects. An example is presented in Figure 3.
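The idea of combining brightness change and physical distance along a path can be illustrated with a deliberately simplified seeded growth. This sketch is not the Multiscale Opening algorithm of [18] and omits its multiscale, iterative labelling. It only assumes that each step between neighbouring voxels costs its physical length plus a weighted absolute brightness difference, and assigns every voxel to the seed reachable at the lowest total cost (a multi-source Dijkstra search). The function name grow_labels and the parameter brightness_weight are hypothetical.

```python
import heapq


def grow_labels(image, seeds, voxel_size, brightness_weight, threshold=0):
    """Label foreground voxels with the seed that reaches them most cheaply.

    `image` is indexed [z][y][x]; `seeds` maps a label to its seed voxel.
    """
    depth, height, width = len(image), len(image[0]), len(image[0][0])
    steps = []
    for axis in range(3):
        for sign in (1, -1):
            offset = [0, 0, 0]
            offset[axis] = sign
            steps.append((tuple(offset), voxel_size[axis]))
    best = {seed: 0.0 for seed in seeds.values()}
    heap = [(0.0, seed, label) for label, seed in seeds.items()]
    heapq.heapify(heap)
    labels = {}
    while heap:
        cost, voxel, label = heapq.heappop(heap)
        if voxel in labels:
            continue  # already claimed through a cheaper path
        labels[voxel] = label
        z, y, x = voxel
        for (dz, dy, dx), length in steps:
            nz, ny, nx = z + dz, y + dy, x + dx
            if not (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width):
                continue
            neighbour = (nz, ny, nx)
            if neighbour in labels or image[nz][ny][nx] <= threshold:
                continue
            change = abs(image[nz][ny][nx] - image[z][y][x])
            new_cost = cost + length + brightness_weight * change
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour, label))
    return labels


# One object fades gradually from the left seed; a second starts abruptly.
image = [[[9, 8, 7, 6, 5, 4, 9, 9]]]
seeds = {1: (0, 0, 0), 2: (0, 0, 7)}
size = (1.0, 1.0, 1.0)
by_distance = grow_labels(image, seeds, size, brightness_weight=0.0)
by_both = grow_labels(image, seeds, size, brightness_weight=1.0)
print(by_distance[(0, 0, 4)], by_both[(0, 0, 4)])
```

With distance alone, the voxel at x=4 is assigned to the nearer right-hand seed. Once brightness change is included, it stays with the left object, because reaching it from the right requires crossing a sharp brightness jump. This is the kind of behaviour that helps with stretched, non-spherical objects.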

Tutorials
PartSeg can be used as a standalone program or as a Python library, for example as part of a larger pipeline. We provide two tutorials, which show how to use PartSeg from the biologist's perspective with the graphical user interface (GUI) and from the bioinformatician's perspective as a library.

Using PartSeg GUI
https://github.com/4DNucleome/PartSeg/blob/master/tutorials/tutorial-chromosome-1/tutorial-chromosome1_16.md
In this tutorial we present how PartSeg can be used to segment nuclei from 3D confocal images and subsequently analyse several parameters of chromosomal territories in chromosomes 1 and 16. This use case can be broken down into three parts. First, the segmentation of nuclei is performed using the DNA signal. Segmented nuclei are cut from the original pictures and the mask files containing segmentation parameters are created.
Second, in order to quantify features of chromosome 1 territories (CT1), segmentation of its specific signal is carried out. The parameters for segmentation are adjusted to accurately cover the chromosome's staining. Next, the method for measuring several morphological parameters of CT1 is presented. These parameters are calculated based on a fixed threshold value. Additionally, the volume ratio of CT1 and its nucleus is calculated.
Finally, the previously established settings profile is used in batch mode to measure features of the nuclei and CT1s. After repeating the process for chromosome 16, a comparison of size between chromosomes 1 and 16 can be done in a fully automated way (see Fig. 1 A). In Figure 2 we utilise the PartSeg GUI to perform steps from the typical workflow in Section 2.1.

Using PartSeg as a Python library
https://github.com/4DNucleome/PartSeg/blob/master/tutorials/tutorial_neuron_types/Neuron_types_example.ipynb
In this tutorial we present how PartSeg components can be used as a Python library. Images of cells immunostained for markers Prox1 and CamKII and counterstained with DNA dye were acquired with confocal microscopy. Initially, segmentation of hippocampal neuron nuclei is shown. Next, a combination of PartSeg components and custom code is presented in order to assign segmented nuclei to 4 different classes based on the aforementioned markers. Finally, segmentation of chromatin based on DNA staining is performed to obtain a set of measurements. In the final step, Matplotlib plots of the obtained data are generated. The tutorial shows how to create the aforementioned pipeline using API methods for segmentation, or simply load it from a file exported from the PartSeg GUI.

Comparison to existing tools
There exist several general tools for image processing and analysis, like ImageJ [6], Icy [7] and CellProfiler [8]. These are robust programs that have been developed over years and contain a wealth of options and plugins for many applications. For inexperienced users such a multiplicity of options is difficult to navigate, and the general nature of these tools does not enforce a workflow adapted to nucleus analysis.
The main workflow of ImageJ, Icy and ImagePy revolves around atomic operations like thresholding or filtering. In those tools, presenting processing results in the context of input data requires many additional steps. For example, Icy allows for the creation of a pipeline in a graphical setting, but does not feature a simplified view or export of key parameters. This type of work resembles graphical programming, where the user needs to think in terms of atomic operations rather than semantic meaning. For now, such an editor is unavailable for ImageJ and ImagePy; both offer only scripting, which demands basic programming knowledge.
CellProfiler provides a tool for pipeline creation and execution, however it does not have an exploration mode. Although it offers the possibility to preview intermediate results, the implemented viewer was not designed for 3D data and is therefore restricted to only a few layers automatically selected by the tool. The user is expected to define a multi-step computational pipeline themselves. This often proves too difficult without programming experience and results in an interface that is less ergonomic in this respect than the one offered by PartSeg.
We have taken an alternative approach, where the GUI is simple, compact and provides easy exploration of various algorithms and their wide range of parameters with immediate visualisation of results. The GUI is organised around the workflow, which we have defined with our users (see subsection 2). Even though our central focus is on analysis of cell nuclei, PartSeg is general enough to cover various other use cases. The modular structure of PartSeg allows saving of ROI segmentation parameter settings at many levels of complexity: from simple profiles, through pipelines, to projects containing entire segmentation and imaging data. This enables easy collaboration and facilitates repeatable and verifiable research, as the entire segmentation can be published as supplementary material. On the other hand, power users can integrate PartSeg in their analysis pipelines with the use of the Python API that we provide.
Of the two other tools dedicated specifically to analysis of the cell nucleus, NEMO [19] and TANGO [20], only the latter can still be obtained. TANGO, reviewed in [21], requires installation of external dependencies to work at full efficiency. Such installation might be challenging for a typical user and cannot be done without administrative privileges. Moreover, TANGO's GUI seems to be highly unintuitive. It seems that all the aforementioned tools are better suited for bioinformaticians than for wet lab scientists. Yet it is the wet lab scientist who best understands the whole experimental setup and visual outcome and can assess whether ROI is segmented properly. The philosophy behind PartSeg is to provide a GUI equipped with high-level operations with semantic descriptions, as well as a Python library with an API. Implementation details are hidden from users, but at the same time they are empowered with a multichannel 3D viewer, capable of showing ROI in the context of input data and recording the analysis parameters for reuse on other datasets. Users with more advanced programming knowledge can use the Python API and easily combine PartSeg with other data analysis libraries in their processing pipelines.

Examples of PartSeg Application on Real Data
This section contains four examples of the workflow described in subsection 2, applied to 3D confocal images of rat hippocampal neurons cultured in vitro (Fig. 1 A-C) and a mouse neuroblastoma cell line (Fig. 1 D). Graphics and values shown in Fig. 1 were obtained using PartSeg.
First, we analysed the surface, relative volume and distribution of territories of chromosomes 1 and 16 in nuclei of rat hippocampal neurons (Fig. 1 A). The chromosomes were stained using FISH with chromosome 1 and 16 paint probes. The measured volume of both chromosome territories (CTs) roughly correlates with their size in Mbp: 267.9 Mbp for chromosome 1 and 90.2 Mbp for chromosome 16, which is 10% and 3.5% of the whole genome, respectively. Some discrepancies are expected due to varying distributions of heterochromatin and euchromatin in both chromosomes, as well as conditions of hybridisation, which require DNA heat denaturation. PartSeg allows us to analyse the radial distribution of segmented structures within the nucleus (Fig. 1 A, c), together with their proximity to the nuclear border (Fig. 1 A, d and B, b). We checked the localisation of both CTs in 3D nuclear space. Both CTs were in close proximity to the nuclear periphery (Fig. 1 A, c and d) and on average most of the CT volume was located within 1500 nm of the nucleus boundary (50% of the volume of CT1 and 70% of CT16). It has been shown that chromatin of mouse and human cells contains lamina associated domains (LADs) distributed along all chromosomes, which cover around one-third of the whole genome; therefore, all chromosomes are in contact with the nuclear lamina located at the nuclear border [22]. However, the spatial organisation of CTs is flexible, permitting many local and long-range contacts of genes and regulatory elements, which influence their function [23].
Data suggest that radial positioning of genes often correlates with their transcriptional state, with actively transcribed genes located in the interior and silent genes at the periphery of the nucleus. Also, the gene position within a CT reflects its state of expression, where active genes tend to localise at the CT boundary [24]. Therefore, we have analysed the distribution of the Npas4 gene in rat hippocampal neurons subjected to sequential FISH with a chromosome 1 paint probe and an Npas4 probe (Fig. 1 B). Npas4 is a transcription factor involved in structural and functional plasticity, actively transcribed in neurons [25]. In mature rat hippocampal neurons, the Npas4 gene was located close to the border of chromosome 1 (Fig. 1 B, c) while remaining relatively far from the nuclear periphery (Fig. 1 B, b), which is in agreement with the aforementioned observations for active genes.
Next, we looked at nuclei and chromatin of mature rat hippocampal neurons (Fig. 1 C). For 750 nuclei we determined an average volume of 556 µm³ and a diameter of 12.5 µm (Fig. 1 C, b). DNA stained with Hoechst dye occupied on average 70% of the nucleus volume (Fig. 1 C, c). We have also calculated the average number and volume of chromocenters, which contain highly condensed and constitutively silenced pericentromeric chromatin (Fig. 1 C, d). We found an average number of 20 chromocenters per nucleus, which occupied around 2% of the nuclear volume, with 50% of them localised within 800 nm of the nuclear border. Quantification of e.g. chromatin volume is interesting as its condensation is a dynamic process dependent on many physiological and environmental factors [26]. Chromatin compaction changes accompany cell differentiation, cell division, cell death, senescence, ischaemia and oxidative stress as well as constitutive expression silencing [27].
As the last example, we calculated the number, diameter and volume of PML bodies in mouse neuroblastoma cell lines (Fig. 1 D). PML (promyelocytic leukaemia) bodies are matrix-associated, spherical nuclear bodies with a diameter of 0.1-1.0 µm, which can be found in most cell lines and many tissues. Nuclei of asynchronous neuroblastoma cell lines had on average 40 PML bodies per nucleus, with a diameter of 0.8 µm and a volume of 0.17 µm³. It has been shown that the size and number of PML bodies depend heavily on cell cycle phase, cell type and stage of differentiation. Changes in the number and volume of these nuclear domains accompany induction of stress, senescence and tumorigenesis [28]. Figure 1 presents some examples of nuclei and nuclear domains, like chromosome territories, genes, PML bodies, chromatin and chromocenters, which can be quantitatively analysed in PartSeg. However, PartSeg can easily be adapted to quantify other cellular structures like mitochondria or the Golgi apparatus.

Conclusions
In this paper, we have presented PartSeg, which is a tool for precise and verifiable segmentation and reconstruction of 3D microscopy images. PartSeg is optimised for cell nucleus analysis and offers multi-scale segmentation algorithms best suited for this task. PartSeg can also be used for bulk processing of multiple images and its components can be reused programmatically in other systems or computational experiments.
Furthermore, simple storing of the whole segmentation process with settings and data empowers cooperation and independent verification of results in the spirit of data provenance [29] and open and verifiable science.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
GB and DP proposed the project and the software concept; GB performed the code development; DP, AM, JS provided the guidance and the project management; GB, AM, JS and DP were involved in software testing; PT, AG, YY and AM performed the biological experiments; SB and ND developed the multiscale opening (MSO) segmentation algorithm; GB implemented the MSO in PartSeg; JS, AM, PT, DP, GB were responsible for the writing of the manuscript. All authors read and approved the final manuscript.