This section outlines the three-dimensional PIV pipeline implemented in quickPIV. The workflow of a PIV analysis in quickPIV is illustrated in Fig. 1. This figure shows input volumes containing Gaussian particles to ease the visualization of the underlying translation. To accommodate all possible labeling schemes of biological samples, we generally refer to structures or intensity patterns in the analyzed data.

The input to a 3D PIV analysis is a pair of 3D volumes taken at consecutive time points, \(V_{t}[x,y,z]\) and \(V_{t+1}[x,y,z]\), where (*x*, *y*, *z*) corresponds to the unique 3D coordinates of each voxel. Both input volumes are assumed to have the same dimensions. First, \(V_{t}\) is subdivided into a 3D grid of cubic sub-regions known as interrogation volumes, *IV*[*i*, *j*, *k*], each specified by its position in the grid, (*i*, *j*, *k*). The dimensions of the grid subdivision are determined by the interrogation volume size and the overlap between adjacent interrogation volumes, see Fig. 1a. For each interrogation volume, a corresponding search volume, *SV*[*i*, *j*, *k*], can be defined in \(V_{t+1}\).

Structures moving inside *IV*[*i*, *j*, *k*] by a translation \({\mathbf {s}} = (s_x, s_y, s_z)\) are expected to be found \(\Vert {\mathbf {s}}\Vert\) voxels away in the direction of the translation in *SV*[*i*, *j*, *k*]. The underlying translation, \({\mathbf {s}}\), of the structures contained in *IV*[*i*, *j*, *k*] and *SV*[*i*, *j*, *k*] is recovered through a cross-correlation analysis [27]. The cross-correlation between a pair of interrogation and search volumes results in a 3D cross-correlation matrix. In the absence of other transformations, the vector from the center to the maximum peak of the cross-correlation matrix reflects the underlying translation of the structures contained in *IV*[*i*, *j*, *k*] and *SV*[*i*, *j*, *k*]. The structures visible in *IV*[*i*, *j*, *k*] may move outside the borders of the corresponding *SV*[*i*, *j*, *k*]. This is known as out-of-frame loss, and it limits the ability of cross-correlation to match the spatial intensity distributions between the pair of interrogation and search volumes. This can be compensated by enlarging the search volumes by a given margin along all dimensions, designated as search margin in quickPIV. The search margin should not be much larger than the expected translation strength of the structures, as enlarging the search volumes comes at the expense of performance. Figure 1b depicts the cross-correlation of the central interrogation and search volumes in Fig. 1a, including a search margin of 5 voxels around the search volume.

By computing a displacement vector for each pair of interrogation and search volumes, PIV analyses generate a vector field that describes the velocity distribution of the structures contained in the input volumes. The components of the PIV-computed vector field are returned separately in three 3D matrices: *U*, *V* and *W*. It should be noted that the resolution of the final vector field is decided by the size of the interrogation volumes and their overlap, which determine the grid subdivision of \(V_{t}\) and \(V_{t+1}\). Multi-pass is implemented to overcome this trade-off between resolution and the interrogation size of the PIV analysis.

### Cross-correlation

The cross-correlation of two one-dimensional real-valued functions is defined as:

$$\begin{aligned}{}[ f \star g ](s) = \int _{-\infty }^{\infty } f(x)g(x + s) \mathrm {d}x \ , \end{aligned}$$

(1)

where *s* has the effect of shifting *g*(*x*) along the x-axis. Cross-correlation involves computing the dot product of *f*(*x*) and \(g(x+s)\) for all possible values of *s*. Since the dot product entails a basic measure of similarity, the value of *s* that achieves the highest dot product represents the translation that best aligns the two functions.

The form of cross-correlation in Eq. (1) is known as spatial cross-correlation. Discrete implementations of spatial cross-correlation have a 1D complexity of \(O(N^2)\). Taking advantage of the convolution theorem, cross-correlation can be computed in the frequency domain through Fourier transforms of *f*(*x*) and *g*(*x*):

$$\begin{aligned} f \star g = {\mathcal {F}}^{-1}\{ \overline{{\mathcal {F}}\{f\}} \cdot {\mathcal {F}}\{g\}\} \ , \end{aligned}$$

(2)

where \({\mathcal {F}}\) and \({\mathcal {F}}^{-1}\) denote the Fourier and inverse Fourier transforms, respectively. Each Fourier and inverse Fourier transform in Eq. (2) can be computed efficiently with the Fast Fourier Transform (FFT) algorithm [37], which has a 1D complexity of \(O( N \log {N} )\). Since Eq. (2) does not involve any operations with higher complexities than FFT’s, the overall complexity of 1D cross-correlation in the frequency domain is \(O( N \log {N} )\). For this reason, cross-correlation in quickPIV is computed in the frequency domain. We rely on a Julia wrapper around the mature and optimized Fastest Fourier Transform of the West (FFTW) C library [38] to compute all Fourier and inverse Fourier transforms. FFTW implementations of FFT generalize to multi-dimensional data, enabling the efficient three-dimensional computation of cross-correlation.

To tackle the bias of the dot product towards high intensities, we implemented zero-normalized cross-correlation (ZNCC). Considering *IV* and *SV* as a pair of 3D interrogation and search volumes, ZNCC is calculated at each translation of *IV* by:

$$\begin{aligned} ZNCC[\mathbf{s }] = \sum _{x,y,z} \frac{ (IV[\mathbf{x }]-\mu _{IV})(SV[\mathbf{x } + \mathbf{s } ]-\mu _{SV}) }{ \sqrt{ \sum _{x,y,z}( IV[\mathbf{x }] - \mu _{IV} )^2 \sum _{x,y,z}( SV[\mathbf{x } + \mathbf{s }] - \mu _{SV} )^2 } } \, \end{aligned}$$

(3)

where **x** is a 3D index (*x*, *y*, *z*) running over all voxels of *IV*, **s** is the displacement vector \((s_x,s_y,s_z)\), and \(\mu _{IV}\) and \(\mu _{SV}\) are the average intensity values of the interrogation and search volumes, respectively. Zero-normalized cross-correlation is implemented efficiently in quickPIV following the work of Lewis, who noted that the numerator in Eq. (3) can be computed efficiently in the frequency domain, while each sum in the denominator can be calculated with eight operations from an integral array of the search volume [39].

To further improve the pattern-matching robustness of cross-correlation, quickPIV also offers normalized squared error cross-correlation (NSQECC). At each translation of *IV*, NSQECC is computed as [40]:

$$\begin{aligned} NSQECC[\mathbf{s }] = \sum _{x,y,z} \frac{ (IV[\mathbf{x }]-SV[\mathbf{x } + \mathbf{s } ])^2 }{ \sqrt{\sum _{x,y,z}(IV[\mathbf{x }])^2 \sum _{x,y,z}(SV[\mathbf{x } + \mathbf{s }])^2 } } \ , \end{aligned}$$

(4)

where **x** is a 3D index (*x*, *y*, *z*) running over all voxels of *IV*, and **s** is the displacement vector \((s_x,s_y,s_z)\). Following the example of [39], Eq. (4) is implemented efficiently in quickPIV by expressing the numerator and denominator in terms of three components: \(\sum (IV[\mathbf{x }])^2\), which is constant, \(\sum (SV[\mathbf{x }+\mathbf{s }])^2\), which is computed efficiently for each translation from an integral array, and \(-2\sum (IV[\mathbf{x }]SV[\mathbf{x }+\mathbf{s }])\), which can be computed as an unnormalized cross-correlation in the frequency domain. For convenience, quickPIV implements the inverse of Eq. (4), \(1 / ( 1 + NSQECC[\mathbf{s }])\), to obtain a maximum peak at the translation that minimizes the differences between the interrogation and search volumes.

### Peak sub-voxel approximation

In order to detect non-integer translations, two sub-voxel interpolation methods are included in quickPIV: the centroid-based and the 3-point Gaussian sub-voxel approximations [41]. In both methods, sub-voxel refinements are computed by considering the direct neighboring values around the maximum peak of the cross-correlation matrix. The centroid-based sub-voxel refinements, \(\Delta\), are computed by

$$\begin{aligned} \Delta [\mathbf{d }] = \frac{C[\mathbf{x }+\mathbf{d }] - C[\mathbf{x }-\mathbf{d }]}{C[\mathbf{x }+\mathbf{d }] + C[\mathbf{x }] + C[\mathbf{x }-\mathbf{d }]} \ , \end{aligned}$$

(5)

where *C* refers to the cross-correlation matrix, \(\mathbf{x }\) are the voxel coordinates of the maximum peak in the cross-correlation matrix, and \(\mathbf{d }\) is the standard basis vector for each dimension, e.g. (1, 0, 0) for the first dimension. Following the same notation, the 3-point Gaussian sub-voxel refinement of the integer displacement is given by

$$\begin{aligned} \Delta [\mathbf{d }] = \frac{ \ln { (C[\mathbf{x }+\mathbf{d }]) } - \ln { (C[\mathbf{x }-\mathbf{d }]) } }{ 2\;\ln {(C[\mathbf{x }+\mathbf{d }])} - 4\;\ln {(C[\mathbf{x }])} + 2\;\ln {(C[\mathbf{x }-\mathbf{d }])} } \ . \end{aligned}$$

(6)

To acquire sub-voxel precision, the interpolated \(\Delta\) is added to the integer displacement vector from the maximum peak to the center of the cross-correlation matrix. QuickPIV defaults to the 3-point Gaussian sub-voxel approximation, which performs particularly well when the input volumes contain Gaussian particles, as the convolution of Gaussians produces another Gaussian distribution [42].

### Multi-pass

We implemented a multi-pass procedure to increase the accuracy of the PIV analysis and to extend its dynamic range, i.e., the range of detectable displacements. While a search margin can be added to increase the dynamic range of a standard PIV analysis, it does not eliminate the dependence on small interrogation volumes to achieve high resolutions, which limits the specificity and enhances the noise of the intensity patterns contained in the interrogation volumes [43]. Alternatively, high resolutions with good dynamic ranges can be achieved by combining large interrogation volumes with high overlaps. However, this approach is computationally expensive and increases the final resolution by adding redundancy between consecutive cross-correlation computations [44].

The multi-pass algorithm starts the PIV analysis with up-scaled interrogation and search volumes, followed by iterative rounds of PIV analyses with gradually smaller interrogation size and search volumes. Additionally, the displacements calculated during previous rounds are used to offset the sampling of the search volumes at future rounds [45]. The multi-pass factor *f* defines the number of total rounds that will be conducted. Therefore, multi-pass is enabled by setting *f* larger than 1. At each multi-pass round, the interrogation size, search margin and overlap parameters are scaled with respect to their user-defined values. The value of these parameters in each round *r* is computed as follows:

$$\begin{aligned} \kappa _r = ( 1 + f - r )\ *\ \kappa _0 \ , \end{aligned}$$

(7)

where \(\kappa _0\) designates the user-defined value for interrogation size, search margin or overlap, \(\kappa _r\) is the up-scaled value of these parameters at round *r*, and *f* is the multi-pass factor. The final round is performed with a factor of 1, i.e., the initial interrogation sizes.

### Post-processing

Some of the post-processing features explained below include local information around the vector being processed. In such cases, a square (2D) or cubic (3D) region is sampled around each post-processed vector. For instance, \(r_x\) and \(r_y\) define a square area around an arbitrary vector in a 2D vector field, \(v_{i,j}\), given by \(L = \{ v_{i+r_x,j+r_y} \ | \ -r \le r_x \le r \ and \ -r \le r_y \le r \}\).

#### Filtering

A PIV-computed vector is considered unreliable if it was computed from a cross-correlation matrix containing multiple peaks with similar heights as the maximum peak. This reveals uncertainty about the underlying displacement, which might be caused by unspecific structures, background noise and/or loss of structure pairs [46, 47]. QuickPIV adopts the primary peak ratio, *PPR*, to measure the specificity of each computed vector,

$$\begin{aligned} \mathrm {PPR} = \frac{C_{\max1}}{C_{\max2}} \ , \end{aligned}$$

(8)

where \(C_{\rm max}{1}\) is the height of the primary peak in the cross-correlation matrix and \(C_{\rm max}{2}\) is the height of the secondary peak. Vectors with high *PPR* values are considered to have high signal-to-noise ratios [48]. Therefore, quickPIV offers filtering of unreliable vectors by discarding those vectors with a *PPR* value lower than a given threshold, \(th_{\rm PPR}\) [48].

Additionally, quickPIV includes both global and local filtering in terms of vector magnitudes. Currently, quickPIV offers low pass and high pass filters of vector magnitudes, which can be concatenated to perform band-pass filtering. Global magnitude filtering can also be performed on those vectors whose magnitude is more than a certain number of standard deviations away from the mean magnitude of the vector field. Local magnitude filtering is implemented by discarding vectors whose magnitude is at least *n* standard deviations away from the mean magnitude, computed in a radius *r* around each vector.

All filtering functions in quickPIV accept an optional argument that is used to determine the replacement scheme of the filtered vectors. Currently, quickPIV offers three replacement functions: zero-replacement, mean replacement and median replacement. The former sets all components of the filtered vectors to zero. Both the mean and median replacement schemes are parametrized by the radius of the neighboring region used to compute the mean or median vector.

#### Spatial and temporal averaging

Spatial and spatio-temporal averaging of the computed vector fields are included in quickPIV. Spatial averaging depends on one parameter: the radius, \(r_s\), of the considered neighboring region around each vector. Different radii for each dimension can be provided by passing an array of values, \([ r_x, r_y, r_z ]\). Spatio-temporal averaging considers two parameters: the averaging radius in space and the number, \(n_t\), of adjacent vectors along the time axis considered in the temporal averaging, e.g. \(\{ v_{i,j,k,t+r} | -n_t \le r \le n_t \}\).

#### Similarity-selective spatial averaging

Spatial averaging tends to dissolve vectors adjacent to the background and creates artifactual vectors in regions containing dissimilar vectors. A similarity-selective spatial averaging has been developed to overcome these limitations, and to enhance the visualization of collective migration. Two vectors are considered to be similar if they point in the same direction, which is established if their normalized dot product is greater than a user-defined threshold. Given any vector in the PIV-computed vector field, \(\mathbf {v}[i,j,k]\), an average vector is built by considering only those neighboring vectors at a radius *r* that are similar to \(\mathbf {v}[i,j,k]\). The averaged vector is then normalized to unit length, and its magnitude is further re-scaled by the ratio between the number of similar neighboring vectors and the total number of neighboring vectors. Therefore, the effect of similarity-selective averaging is to average the direction of each vector among similar neighboring vectors, and to re-scale the magnitude of each vector by the local collectiveness.

#### Mappings

QuickPIV provides functions for extracting several relevant quantities from the PIV-computed vector fields. Velocity maps are generated by returning the magnitude of each vector from a given vector field. QuickPIV implements convergence/divergence mappings to detect the presence of sinks and sources in the PIV-computed vector fields. This is done by generating a cube of normalized vectors that either converge (sink) or diverge (source) from the center of the cube, and cross-correlating this cube with the normalized vector field. This mapping is parametrized by the size of the cube, which determines the scale of the convergence/divergence map. Collectiveness maps are built by computing the number of neighboring vectors at a radius *r* from each vector in the vector field \(v_{i,j}\) whose normalized dot product is greater than a threshold.

#### Pseudo-trajectories

Pseudo-trajectories can be generated with quickPIV to visualize the approximate paths of cells and tissues from the PIV-computed vector fields. When computing pseudo-trajectories, a user-defined number of particles is randomly distributed within the dimensions of the vector field. The position of each particle is rounded to integer coordinates in order to sample a displacement from the vector field, which shifts the particle from its current position. By repeating this process, a three-dimensional path is obtained for each simulated particle. It is possible to constrain the computation of pseudo-trajectories to a period of interest by specifying the start and end time points. Moreover, spatially interesting regions can be selected by specifying the spatial range over which to initialize the positions of the particles.

#### Conversion to physical units

Last but not least, to convert voxel displacements into physically meaningful velocities both the frame rate and the physical units of each voxel dimension need to be taken into account. These values can be provided during the creation of the PIV-parameter object and quickPIV will automatically re-scale the resulting vector field after the analysis.

### QuickPIV accuracy evaluation

The correct implementation of a PIV analysis depends on its ability to detect translations. Accordingly, the accuracy of quickPIV is assessed by generating pairs of artificial images and volumes containing synthetic particles related by a known translation. Synthetic particles are rendered according to [49]. The bias and random errors are computed to evaluate the agreement of quickPIV predictions to the known translations [49]:

$$\begin{aligned}&\epsilon _{\rm bias} = \frac{1}{n}\sum _{i=1}^{n} | d_{PIV,i} - d_{\rm true} | \end{aligned}$$

(9)

$$\begin{aligned} \epsilon_{\rm rand} = \sqrt{ \frac{1}{n} \sum _{i=1}^{n} {(d_{{\rm PIV},i}-\overline{d_{\rm PIV}})^2}} \end{aligned}$$

(10)

where \(d_{{\rm PIV},i}\) is the \(i^{\mathrm {th}}\) PIV-computed displacement, \(d_{\rm true}\) is the known translation, \(\overline{d_{\rm PIV}}\) is the average PIV-computed displacement and *n* is the number of repeats. The bias and random errors represent the accuracy and the precision of quickPIV’s approximation of the underlying translation, respectively. The effect of the following parameters on the accuracy of quickPIV are evaluated, both in 2D and 3D: interrogation size, particle density, particle diameter, 3-point Gaussian sub-pixel approximation and the use of a search margin to correct for out-of-frame loss.

### QuickPIV performance evaluation

The performance of our software is evaluated by comparing the execution times of quickPIV with those of the C++ and Python implementations hosted in openPIV. First, we analyzed the time required to compute cross-correlation in the frequency domain with the three packages. By comparing the execution times of quickPIV and the C++ implementation, we can determine whether calling the FFTW C-library from Julia adds any noticeable overhead compared to C++. Since the Python implementation uses the NumPy library to compute the Fourier and inverse Fourier transforms, this test also reveals any performance differences between FFTW and NumPy. On the other hand, we compare the execution time of complete 2D and 3D PIV analyses between the three PIV packages. The set of parameters used in these PIV analyses are listed in the description of Table 1.

For the sake of using a common benchmarking pipeline, language-specific packages for measuring the execution times are avoided. Each execution time measurement shown in Fig. 2e corresponds to the minimum execution time from 1000 repeated measurements. Taking the minimum execution time filters out random delays originating from background processes [50]. The left panel in Fig. 2e illustrates the interference of background processes in the distribution of 1000 execution measurements of FFT cross-correlation. All measurements presented below were performed on a machine with an Intel Core i5-8300H processor \(4\times 2.3\) GHz. All PIV analyses were executed on a single thread.

### QuickPIV on the embryogenesis of *Tribolium castaneum*

To test the accuracy of quickPIV on biological data, we analyzed three 3D time-lapse data sets of the embryonic development of *T. castaneum*: (1) two embryos from a hemizygous transgenic line that ubiquitously expresses nuclear-localized mEmerald and (2) one embryo from a double hemizygous transgenic line that expresses nuclear-localized mRuby2 ubiquitously and actin-binding Lifeact-mEmerald only in the serosa [22]. Using LSFM, the embryos were recorded at intervals of (1) 30 minutes or (2) 20 minutes along 4 directions in rotation steps of 90\(^{\circ }\) around the anterior-posterior axis in (1) one or (2) two fluorescence channels [21]. The four directions were fused according to Preibisch et al. [51] to generate evenly illuminated volumes with isotropic resolution. The fused volumes were cropped to \(1000\times 600 \times 600\) voxels (height,width,depth), the embryos were manually placed in the center of the volumes and their anterior–posterior axis was manually aligned with the vertical axis.

Three time points during gastrulation were analyzed with quickPIV in the two embryos of data set (i). Two time points of the double hemizygous transgenic line (ii) were analyzed in both channels, allowing to compare the vector fields obtained from the Lifeact-mEmerald actin signal with those from the nuclear-localized mRuby2 marker. The PIV analyses were performed on both data sets with NSQECC. The vector fields resulting from these analyses are shown in Figs. 3 and 4, post-processed with similarity-selective averaging with an averaging radius of 2 neighboring vectors and a similarity threshold of 0.5. The visualization of the embryo volumes and the computed vector fields has been done in Paraview 5.7.0.