Mutual information analysis in homolmapper. The process used to evaluate and report mutual information in homolmapper is shown from the MSA (1) to the final analysis (6), using an alignment of 75 heme oxygenases and the crystal structure of rat heme oxygenase (1DVE) for illustrative purposes. A portion of the MSA is shown in (1), with residues 136 (blue) and 140 (red) highlighted. The matched sequence is ho1rat, and the total alignment is 684 positions long. Calculation of mutual information begins with calculation of the Shannon entropies $H_i$ for all single positions i, j, k in the alignment. Next, following the method of Gloor and co-workers, joint entropies $H_{ij}$ for all pairs of positions are calculated from the distribution of paired outcomes (2). Diagonal elements in this joint-entropy matrix are set to zero. The raw mutual information values are then calculated (3) by subtracting the joint entropy at each pair of positions from the sum of the single-position entropies ($H_i + H_j$), with the diagonal elements being kept at zero. Next, the raw mutual information scores can be normalized (4) by dividing by the joint entropy, by the sum of the position entropies (redundancy), or by neither. The resulting scores are converted to Z-scores (distance from the mean in standard deviations) for analysis. The maximum Z-score for each residue is reported to the B-factor field of the output PDB file (5). If this maximum Z-score is below a threshold value (by default 5, but user-controllable), a SegID of 'nast' (nothing above significance threshold) is assigned, as is seen for residues 137-139 in the example. Residues that exhibit a maximum Z-score above the cutoff value have the residue number associated with that score reported in SegID. Such residues are considered to belong to mutually informative groups, and the remaining homolmapper output fields (element and occupancy) are used to provide information about the group. The number of residues in the group is reported to element, and the sum of their residue numbers is reported to occupancy. Thus, in this example, residues 136 and 140 are mutually informative and are the only members of the group. The Z-score is reported to B-factor (5.29), and each residue has the other residue number reported to SegID. The element field for these two residues is 2, because there are two residues in the mutually informative group, and the occupancy field is 276 (= 136 + 140). This reporting scheme permits information about mutually informative positions in the alignment that fall outside of the structure to be reported nevertheless. It is also possible to punch out the final matrix of Z-scores and the normalized matrix of mutual information values for the full alignment for further analysis. The joint-entropy matrix is punched out by default to permit rapid reruns with different threshold values or different normalizations. In (6), the output PDB file is shown at a cutoff of 5 (left).
Residues 136 (blue) and 140 (red) are colored by SegID and are immediately adjacent. If the threshold is lowered to 3.75 (center), additional residues are detected. The mutually informative residues in this case are colored by occupancy. By examining the significant interactions in the structure or in the text file that details all significant hits, one can construct a diagram of the interactions and their Z-scores (right). Residues 136 and 140 are part of a larger network at the lower threshold. VMD, Stride, and homolmapper were used to prepare the structural panels.
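
The calculation in steps (2) to (5) can be illustrated with a minimal Python sketch. This is not homolmapper's own code: the function names (`entropy`, `mutual_information_zscores`, `report`) are hypothetical, and several details are assumptions rather than statements from the caption. In particular, the sketch treats gap characters as ordinary symbols, uses base-2 logarithms, computes the mean and (population) standard deviation over off-diagonal pairs only, and indexes alignment columns from 0 rather than by residue number. It also leaves out the group-size (element) and residue-number-sum (occupancy) fields.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def entropy(outcomes):
    """Shannon entropy in bits of a list of hashable outcomes."""
    n = len(outcomes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(outcomes).values())


def mutual_information_zscores(msa, normalize="joint"):
    """Return {(i, j): Z-score} for every column pair i < j of an alignment.

    msa       -- list of equal-length aligned sequences (strings)
    normalize -- "joint" (divide by H_ij), "redundancy" (divide by
                 H_i + H_j), or anything else for no normalization
    """
    ncol = len(msa[0])
    cols = [[seq[i] for seq in msa] for i in range(ncol)]
    h = [entropy(col) for col in cols]          # single-position entropies H_i

    scores = {}
    for i in range(ncol):
        for j in range(i + 1, ncol):
            # Joint entropy from the distribution of paired outcomes.
            h_ij = entropy(list(zip(cols[i], cols[j])))
            mi = h[i] + h[j] - h_ij             # raw mutual information
            if normalize == "joint":
                denom = h_ij
            elif normalize == "redundancy":
                denom = h[i] + h[j]
            else:
                denom = 1.0
            # Assumption: pairs with a zero denominator score 0.
            scores[(i, j)] = mi / denom if denom > 0 else 0.0

    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {pair: 0.0 for pair in scores}
    return {pair: (s - mu) / sigma for pair, s in scores.items()}


def report(zscores, ncol, threshold=5.0):
    """Per column: (maximum Z-score, partner column or 'nast')."""
    best = [(float("-inf"), None)] * ncol
    for (i, j), z in zscores.items():
        if z > best[i][0]:
            best[i] = (z, j)
        if z > best[j][0]:
            best[j] = (z, i)
    # Columns whose best score falls below the threshold are labelled 'nast'.
    return [(z, p if z >= threshold else "nast") for z, p in best]
```

For a toy alignment such as `["AAGA", "AAGC", "CCGA", "CCGC"]`, columns 0 and 1 covary perfectly and are the only pair with nonzero mutual information, so that pair receives the highest Z-score and each column is reported as the other's partner once the threshold is set below that score.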