This section is divided into two parts: In the first part, we describe the method for simulating tomograms of crowded cellular environments. In the second part, we describe how we assessed the DoG particle picking method on simulated tomograms at various crowding levels and signal to noise ranges.

### Simulating tomograms of cell-like environments

#### Generating cell-like environments

### Selection of benchmark set

To represent the crowded cellular environment we collected a total of 21 abundant macromolecular complexes of varying sizes and shapes from the Protein Data Bank (PDB) [18] (Methods, Fig. 1a). The electron optical density of a complex is proportional to its electrostatic potential, which is determined by its atomic structure [15, 19]. For each complex, density maps are generated at 4 nm resolution and with voxel size of 1 nm using the PDB2VOL program of the Situs2.0 package [20].

### Generating crowded mixtures with random positions and orientations of all complexes

We then generated a composite density map of a crowded mixture of randomly placed and oriented complexes at high crowding levels, which mimicked the environment found in crowded cellular cytoplasm (Fig. 1). This density map then served as the input sample for simulating the cryo-electron tomography imaging process at different SNR levels.

To generate a crowded random mixture of complexes, we first represented each complex by its bounding sphere, which enclosed each complex. Then each complex was given a random copy number to define the composition of complexes in the mixture. After randomly positioning the corresponding spheres in a volume we used molecular dynamics simulations and simulated annealing for packing the crowded sphere mixture in a volume while preventing sphere-sphere overlaps. Then density maps of complexes were positioned in the corresponding spheres at a random orientation. The resulting composite density map of the crowded mixture was then used as the input sample to simulate the tomographic imaging of micrographs at different tilt angles followed by the reconstruction of the 3D density map to generate realistically simulated cryo-electron tomograms. These simulated tomograms contained imaging distortions from noise, missing wedge effects and effects from the Contrast Transfer Function (CTF). The computational details for each step are described in following subsections.

#### Minimum spherical bounding

We defined a minimum bounding sphere as the sphere with the smallest radius that entirely encloses the density map of the macromolecular complex at a given contour level. The contour level is a threshold to define a volume region of the complex [21]. We defined the contour level threshold as a proportion of the maximum density value in a density map. By inspection of the initial density maps for the 21 complexes, we empirically set the contour level ratio as *L* = 0.2, which resulted in a contour volume that best matches the van der Waals volume of the complexes. We then defined a subset of voxels (*R*) with density values larger than the contour level defined as:

$$ R\left(\boldsymbol{x}\right)=\left\{\forall \boldsymbol{x}\ \in\ {\mathrm{\mathbb{N}}}^{\mathbf{3}}\Big|D\left(\boldsymbol{x}\right)\ \ge\ Lm\right\} $$

Where *D*(*x*) with *x* = (*x*
_{
1
}, *x*
_{
2
}, *x*
_{
3
}) ∈ ℕ^{3} is the density map, *m* = max(*D*(*x*)|*x* ∈ ℕ^{3}) is the maximum density value of *D*, *Lm* is the contour level. *L* is the contour level ratio (i.e., the fraction of the maximum density value that defines the contour level).

Next, we calculated the convex hull for points located at all voxel locations with *D*(*x*) > **0** in *R* using the QuickHull algorithm [22]. The voxels in the interior of the convex hull regions were then used to calculate the minimum bounding sphere of the complex. The Emo Welzl’s algorithm was adapted to calculate the minimum bounding sphere for the set of voxels defined by the convex hull of the complex [23]. The minimum bounding sphere was used to simulate crowded mixtures of complexes. A minimum spherical bounding model has several advantages in comparison to other geometric bounding models such as cubic or cylinder models [24, 25]. The spherical bounding model is defined by only two descriptive parameters, the center and radius of the sphere, which simplifies the scoring function in the subsequent molecular dynamic simulations to minimize sphere-sphere overlaps. Also, in the subsequent replacement step, complexes can be placed at any random orientation within the sphere volume.

#### Generating macromolecular complex mixtures

The total volume occupancy of cell cytoplasms varies in different cells, and ranges between 5 % and 40 % in mammalian and between 34 % and 44 % in bacterial cells [26–29]. We defined the crowding level *C* as the ratio of the total volume occupied by all instances of macromolecular complexes and the total 3D volume of the tomogram.

$$ C=\frac{{\displaystyle {\sum}_{k=1}^n}{V}_k{N}_k}{V_T} $$

$$ {N}^{tot}={\displaystyle {\sum}_{k=1}^n{N}_k} $$

Where N_{
k
} is the copy number of macromolecular complex of type *k*, *N*
^{tot} is the total copy number of all complexes; *n* = 21 is total number of different types of macromolecular complexes, *V*
_{
k
} is the volume of the k-th macromolecular complex type, which is estimated from region *R* defined in section 2.1.2 and *V*
_{
T
} is the total volume of the tomogram defined by the length of its three principal.

In our study, each type of macromolecular complex is randomly assigned a copy number N_{k}, following a multinomial distribution with parameter *N*
^{tot} and *f* = (*f*
_{1}, … *f*
_{
n
}), where *f*
_{
i
} is a randomly selected frequency. We chose a random set of copy numbers because structures of many complexes and also their copy numbers in cells are still not known. It is challenging to determine the exact protein compositions in cells, which can differ even for the same cells under different growth conditions. To assess particle picking we therefore decided to have an entirely random mixture with variable sizes and shapes and copy numbers. Each instance of a macromolecular complex was also assigned a random orientation. To generate cellular environments at a defined crowding level we randomly positioned the bounding spheres of all complexes into a rectangular box volume. We then used molecular dynamics simulations and simulated annealing to optimize the packing of the crowded sphere mixture and remove any sphere-sphere overlaps. In our simulations the scoring function *S*
^{tot} consisted of two terms: First, a box volume restraint *S*
^{V}_{
i
}
, which enforced each sphere to lie within the volume of the simulation box, and second an excluded volume restraint *S*
^{EX}_{
ij
}
, which prevented any overlap between spheres.

$$ {S}^{tot}={\displaystyle \sum_{\mathrm{i}}^{{\mathrm{N}}^{tot}}}{S}_i^V+{\displaystyle \sum_{\mathrm{i}}^{{\mathrm{N}}^{tot}-1}}{\displaystyle \sum_{\mathrm{i}<j}}{S}_{ij}^{EX} $$

with

$$ {S}_i^V=\left\{\begin{array}{c}\hfill \frac{1}{2}{k}_d\mathrm{d}{\left(\mathrm{i}\right)}^2,\ \mathrm{if}\kern0.5em \mathrm{sphere}\ \mathrm{is}\ \mathrm{outside}\ \mathrm{the}\ \mathrm{container}\hfill \\ {}\hfill\ 0,\ \mathrm{if}\ \mathrm{sphere}\ \mathrm{is}\ \mathrm{inside}\ \mathrm{the}\ \mathrm{container}\kern0.5em \hfill \end{array}\right. $$

$$ {S}_{ij}^{EX}=\left\{\begin{array}{c}\hfill \frac{1}{2}{k}_p{\left[\mathrm{d}\left(\mathrm{i},\mathrm{j}\right)-\left({\mathrm{r}}_{\mathrm{i}}+{\mathrm{r}}_{\mathrm{j}}\right)\right]}^2,\ \mathrm{if}\ \mathrm{d}\left(\mathrm{i},\mathrm{j}\right)<\left({\mathrm{r}}_{\mathrm{i}}+{\mathrm{r}}_{\mathrm{j}}\right)\hfill \\ {}\hfill 0,\ \mathrm{if}\ \mathrm{d}\left(\mathrm{i},\mathrm{j}\right)>{\mathrm{r}}_{\mathrm{i}}+{\mathrm{r}}_{\mathrm{j}}\ \hfill \end{array}\right. $$

where *N*
^{tot} is the total number of spheres; *k*
_{
d
} is the spring constant and *d*(*i*) is the smallest distance between the center of sphere i and the container border; *d*(*i*, *j*) is the distance between the centers of i-th and j-th spheres, *r*
_{
i
}, *r*
_{
j
} are radius of the spheres. We used the IMP software package [30] to implement the scoring function and optimized the scoring function to a score of ~0. The initial velocities of all spheres were assigned based on a Maxwell-Boltzmann distribution at a given temperature. After starting from relatively high temperatures, an annealing process gradually reduced the temperature to relax the model.

Where *T*(*t*) indicates the system temperature at iteration step (time) t and *T*
_{0} = 3000 is the initial temperature, *c* is a constant for gradually reducing the system temperature. We set *c* = 100. Finally a conjugate gradient optimization reduced the score to ~0. After generating crowded mixtures of spheres, we placed the randomly oriented density map of each complex into their corresponding bounding sphere. This procedure produced a composite density map of a crowded mixture of complexes. We generated several different density maps at various crowding levels (see below).

#### Generating simulated cryo-electron tomograms

For a reliable particle-picking assessment, cryo-electron tomograms must be generated by simulating the actual tomographic image reconstruction process, which allows for the inclusion of noise, tomographic distortions due to missing wedge effects, and electron optical factors such as Contrast Transfer Function (CTF) and Modulation Transfer Function (MTF) [8]. CTF and MTF describe distortions from interactions between electrons and the specimen and the distortions due to the image detector [8, 13, 31, 32]. The so-called missing wedge effect leads to image distortions due the limited the tilt angle range. A typical tilt angle range is ±60 or ±70 degrees, with step increments of 1 or 2 degrees [5, 33]. We follow a previously applied protocol and simulated 2D projection electron micrographs of our crowded macromolecular sample using a tilt angle range from -60 to 60 degrees with step increments of 2 degrees, which is a typical procedure for experimental tomograms [8, 13, 11]. For the simulated tomogram, we set typical acquisition parameters used in actual experimental measurements of whole cell tomograms: voxel size = 1 nm, the spherical aberration= 2 × 10^{− 3} m, the defocus value= − 4 × 10^{− 6} m, the voltage = 300 kV, the MTF corresponded to a realistic electron detector [34, 35], defined as sinc(πω/2) where ω is the fraction of the Nyquist frequency. Finally 3D tomograms were reconstructed via a back projection algorithm [11, 31] from 2D micrographs at various tilt angles.

Signal to noise ratio (SNR) is an important factor to control the level of distortions of a simulated tomogram [5]. The SNR was defined as the quotient of the variance of signal and the variance of noise [12].

$$ SNR=\frac{\sigma_{signal}^2}{\sigma_{noise}^2} $$

In the process of generating simulated tomograms, noise was added at two stages: one fraction was added to the signal before convolution with CTF and another fraction added after it was convoluted with CTF [12]. We simulated cryo electron tomograms at various SNR levels (i.e. SNR = [50,20,10,1]).

### Assessment of DoG particle picking

Our simulated tomograms of crowded mixtures of macromolecular complexes served as the ground truth for the assessment of the template-free Difference-of-Gaussian (DoG) particle picking method.

#### Background: Difference of Gaussian (DoG) filtering

A number of particle-picking methods have been proposed for cryo-electron microscopy images and adapted to cryo electron tomography [2, 8, 14, 17, 32, 36, 37]. Reference-based methods use information from a template in the search process to detect potential particle positions in the tomogram. Potential particle positions are detected as peaks in a cross-correlation function between the target tomogram and a template [2, 14, 32]. However, when the structure of a complex is unknown, reference-based methods cannot be applied. Unbiased visual proteomics approaches must rely on reference-free particle picking methods that are also applicable in the crowded environment of whole cell tomograms.

The reference-free DoG particle picking method is based on the Difference of Gaussian (DoG) image transform. A DoG map is created via subtraction of two versions of Gaussian filtered images and peaks detected in the DoG map are potential particles [17]. Previous studies tested the reliability of the DoG method for 2D cryo-EM images [17, 37]. However, no study exists that assessed the performance of the DoG method and performed parameter optimizations for reference-free particle picking in highly crowded tomograms of whole cells.

The Gaussian blurred map was obtained through a convolution of the Gaussian function *G*(*σ*) with the original map I and defined as:

$$ {I}_G\left(\sigma \right)=I\ast G\left(\sigma \right) $$

$$ G\left(\sigma \right) = \frac{1}{\sigma \sqrt{2\pi }}\ {e}^{-\frac{r^2}{2{\sigma}^2}} $$

Where *σ* is referred to as the scaling factor of the Gaussian function and r is the position vector in the image. A DoG map was built from subtracting two versions of the same map blurred through two Gaussian kernels with different scaling factors *σ*. The DoG map, for two different values of *σ*, was then defined as:

$$ {I}_{DoG}\left({\sigma}_1,{\sigma}_2\right)=I\ast G\left({\sigma}_1\right)-I\ast G\left({\sigma}_2\right) $$

In our study, we followed the DoG Picker design and defined the ratios between the two scaling factors as the k-factor.

$$ {\sigma}_2=k{\sigma}_1 $$

We set *k* = 1.1, which had been shown to be a reasonable value for applications in single particle cryo electron tomography [17]. We refer to *σ*
_{1} as the DoG scaling factor and refer to it as *σ* from here on. The DoG scaling factor σ influences the performance of picking complexes of different sizes and the particle picking performance for different complexes will be evaluated for different scaling factors [17].

In our study, we first assessed the DoG particle picking performance with respect to different scaling factors, to identify an optimal setting. Then using the optimal scaling factor, we assessed the effects of noise and macromolecular crowding for the performance of the particle picking method.

#### Selection of local density peaks

To detect particle locations in a tomogram, we identified local density peaks in the DoG filtered tomograms (referred to as the set *P)* [38]. However, not all local density maxima correspond to complexes. Local density maxima can also result from noise. These maxima typically have lower density values than those of real complexes. We therefore used a lower density threshold *T* to define the set of local density maxima that likely correspond to particles *P*
_{
t
}. The density threshold *T* and the set *P*
_{
t
} are defined as:

$$ T=m+t\cdot \frac{M-m}{K} $$

$$ {P}_t=\left\{v\ \in\ P\Big|D(v)\ \ge\ T\right\} $$

Where *M* is the maximum density value of all local maxima in *P*, *m* is the smallest density value for all local maxima, *K* = 20 is the number of bins, *t* = 0, 1, 2, …, *K* is the threshold level, *P*
_{
t
} is the set of local density peaks at threshold level t, which had density values larger than the threshold *T*. In this paper we assessed the particle picking performance with respect to the threshold level *t* and determined the optimal value of *t* that maximizes the detection of complexes in the crowded environment.

#### Evaluating the particle picking performance

### Assessment of true positives

To evaluate the particle picking performance, we need to determine correctly and falsely detected particles. We assume two conditions to define a true positive particle detection: First, the detected density peak should be close to the center of the true particle, i.e. the peak should be within a threshold radius from the true particle center. Second, we only count a true positive if only one local maximum is detected within the bounding sphere of the true particle. Multiple maxima within the bounding sphere would be counted as a false particle detection. Every local density peak can be assigned to at most one nearest particle.

To determine if a local density maximum is a true positive detection, we first defined the relative shift ratio *S* as the quotient of the distance between a detected local density peak to the center of its nearest particle and the radius of the minimum bounding sphere of the corresponding complex.

$$ S=\frac{\left|{x}_p-{C}_g\right|}{R_g} $$

Where *x*
_{
p
} is the location of a local density peak, *C*
_{
g
} and *R*
_{
g
} are the center and radius of the minimum bounding sphere of its nearest complex. We set *S* ≤ 0.5 as a threshold to select local density peaks that are relatively near to the center of the ground truth complex. We can then determine how many particles are reliably detected with the DoG particle picking method.

### Statistical Analysis of particle picking performance

Precision and recall is widely used as an assessment of information retrieval and is used to evaluate particle picking performances in cryo electron microscopy [8, 37]. The precision is defined as the fraction of the correctly detected versus all the detected peaks whereas the recall is defined as the fraction of the correctly detected peaks to the total number of particles in the ground truth dataset:

$$ precision = \frac{\#TP}{\#TP+\#FP} $$

$$ recall = \frac{\#TP}{\#TP+\#FN} $$

With #TP as the number of true positives, #FP is the number of false positives, and #FN is the number of false negatives in the particle detection.

In addition to precision and recall, we also use the F-score to evaluate the overall particle picking performance [37]. The F-score is defined as the harmonic mean of precision and recall.

$$ F- score = \frac{2\cdot precision\cdot recall}{precision+ recall} $$

By calculating the harmonic mean of precision and recall, we can compare the particle picking performance for different parameter settings and determine the optimal setting for a given tomogram.