Related works¶
PBG was designed with the expertise gained from many previous works in the knowledge base completion literature, integrating what has been shown to work well over time. In the sections below, we describe the models that inspired some of the operators and features of PBG.
TransE¶
TransE is a popular model in knowledge base completion due to its simplicity: when two embeddings are compared to calculate the score of an edge between them, the right-hand side one is first translated by a vector \(v_r\) (of the same dimension as the embeddings) that is specific to the relation type. Contrary to PBG, TransE aims at giving lower scores to entities that are nearby, hence the score of a triple \((x, r, y)\) is computed as:
where \(d\) is a dissimilarity function such as the \(L_1\) or \(L_2\) norm.
PBG can be configured to operate like TransE by using the translation
operator
and by introducing a new comparator based on the desired
dissimilarity function. However, contrary to the dot product or the cosine distance,
the comparison between all pairs of vectors from two sets using the \(L_1\) and
\(L_2\) norms cannot be expressed as a matrix multiplication and thus would be
challenging to implement as efficiently. One could consider using the cosine distance
instead of the \(L_2\) norm since the former measures (the cosine of) the angle
between two vectors which, when small, is approximately their \(L_2\) distance.
RESCAL¶
RESCAL is a restriction on the Tucker factorization. Relations are represented as matrices \(M_r\), and the score of a triple \((x, r, y)\) is computed as:
This corresponds to PBG’s linear
operator. The original paper suggests to use
weight-decay on the parameters of the model. Such regularization is not available
in PBG currently, which relies instead on early stopping and control of the maximum
norm of the embeddings which scales more easily.
The need for weight decay stems from each relation having a lot of parameters, which could lead them to overfit and to not perform well on, for example, FB15k. RESCAL should only be considered for models with a large number of edges for each relation type, where overfitting is not an issue.
DistMult¶
DistMult is a special case of RESCAL, in which relations are limited to diagonal matrices represented as vectors \(v_r\). The score of a triple \((x, r, y)\) is thus:
This is the diagonal
operator in PBG. Notice that with the same embedding
space on the left and right-hand side, this operator is limited to representing
symmetric relations. This restriction however leads to less over-fitting and good
performances on several benchmarks.
ComplEx¶
ComplEx is similar to DistMult, but uses embeddings in \(\mathbb{C}\) and represents the right-hand side embeddings as complex conjugates of the left-hand side ones. This allows to represent non-symmetric relations. The score of a triple \((x, r, y)\) is computed as:
The complex_diagonal
operator in PBG interprets a \(D\)-dimensional real
embedding as a \(D/2\)-dimensional complex one, with the first \(D/2\)
values representing the real part and the remaining ones for the imaginary part.
As shown in the original paper, the ComplEx score can then be written as a dot
product in \(\mathbb{R}^{D}\), hence replicated in PBG using the dot
operator.
Reciprocal Relations¶
Two papers ([1] and [2])
simultaneously suggested to explicitly train on reciprocal relations, i.e., for each triple
\((x, r, y)\) in the training set add another one \((y, r', x)\). This can
be done implicitly in PBG with dynamic relations. Jointly with
the complex_diagonal
operator, this allows reproducing state of the art results on FB15K
with PBG.