The multigraph/metagraph structure underlying a GPM. (A) Each position in the sequence, or each distinct feature in the set, can be modeled as a node, while each category observed at that position or feature can be modeled as a subnode of that node. The weight of each subnode encodes the probability of finding that subnode’s category at that position in the training data. (B) Between every pair of nodes there exists a complete bipartite graph of (potential) edges connecting the subnodes of one node to those of the other. Each edge encodes the probability of that pair of subnodes co-occurring in the training data. While it is easy to build this structure from the training data, it is almost always computationally intractable to use it directly to build a functional GPM. To create a tractably trainable GPM, the possible edges in (B) (and all other possible edges between each pair of columns) must be reduced to only those edges representing functionally important dependencies in the data.
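The construction described above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: the function name `build_multigraph` and the toy alignment are hypothetical, and the "edges" here are simply empirical joint frequencies over every pair of positions, before any pruning to functionally important dependencies.

```python
from collections import Counter
from itertools import combinations

def build_multigraph(sequences):
    """Build subnode weights (per-position category frequencies) and
    edge weights (pairwise joint frequencies) from aligned sequences.

    Hypothetical sketch: each position is a node, each observed
    category at a position is a subnode, and every pair of subnodes
    at different positions gets a (potential) edge weighted by its
    co-occurrence frequency in the training data.
    """
    n = len(sequences)
    length = len(sequences[0])

    # Subnode weights: empirical P(category at position i)
    per_position = [Counter(seq[i] for seq in sequences) for i in range(length)]
    subnode_w = [{cat: k / n for cat, k in cnt.items()} for cnt in per_position]

    # Edge weights: empirical P(category a at i AND category b at j),
    # for every pair of positions i < j (the complete bipartite graphs in panel B)
    edge_w = {}
    for i, j in combinations(range(length), 2):
        pair_counts = Counter((seq[i], seq[j]) for seq in sequences)
        edge_w[(i, j)] = {pair: k / n for pair, k in pair_counts.items()}

    return subnode_w, edge_w

# Toy alignment: three sequences over two positions
seqs = ["AB", "AB", "CB"]
sub, edges = build_multigraph(seqs)
```

Even for this trivial example the number of pairwise edge tables grows quadratically with sequence length, which illustrates why the full structure is easy to build but intractable to use directly for training.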