CategoricalDistribution.compute_gradient(g, u, phi)

Compute the Euclidean gradient.

In order to compute the Euclidean gradient, we first need to derive the gradient of the moments with respect to the variational parameters:

\mathrm{d}\overline{u}_i
= N \cdot \frac {e^{\phi_i} \mathrm{d}\phi_i \sum_j e^{\phi_j}}
                {(\sum_k e^{\phi_k})^2}
  - N \cdot \frac {e^{\phi_i} \sum_j e^{\phi_j} \mathrm{d}\phi_j}
                  {(\sum_k e^{\phi_k})^2}
= \overline{u}_i \mathrm{d}\phi_i
  - \overline{u}_i \sum_j \frac{\overline{u}_j}{N} \mathrm{d}\phi_j
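The differential of the moments can be checked numerically. The following is a minimal NumPy sketch, not the library's implementation: it assumes the moments are \overline{u}_i = N e^{\phi_i} / \sum_k e^{\phi_k} with a scalar number of trials N, and the helper names `moments` and `moments_jacobian` are hypothetical. It compares the closed-form Jacobian \partial\overline{u}_i / \partial\phi_j = \overline{u}_i \delta_{ij} - \overline{u}_i \overline{u}_j / N against central finite differences.

```python
import numpy as np


def moments(phi, N=1.0):
    """Hypothetical helper: u_i = N * exp(phi_i) / sum_k exp(phi_k)."""
    z = np.exp(phi - np.max(phi))  # shift by the maximum for numerical stability
    return N * z / np.sum(z)


def moments_jacobian(phi, N=1.0):
    """Closed form: J[i, j] = d u_i / d phi_j = u_i * delta_ij - u_i * u_j / N."""
    u = moments(phi, N)
    return np.diag(u) - np.outer(u, u) / N


# Arbitrary illustrative values (assumptions, not taken from the source)
phi = np.array([0.2, -1.0, 0.5])
N = 3.0

J = moments_jacobian(phi, N)

# Central finite differences: column j holds d u / d phi_j
eps = 1e-6
J_fd = np.column_stack([
    (moments(phi + eps * e, N) - moments(phi - eps * e, N)) / (2 * eps)
    for e in np.eye(phi.size)
])

assert np.allclose(J, J_fd, atol=1e-7)
```

Each column of the Jacobian sums to zero, because the moments always sum to N regardless of \phi.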

Now we can apply the chain rule. Given the Riemannian gradient \tilde{\nabla} of the variational lower bound \mathcal{L} with respect to the variational parameters \phi, substitute the above result for the differential \mathrm{d}\overline{u}_i and reorganize the terms to obtain the Euclidean gradient \nabla:

\mathrm{d}\mathcal{L}
= \tilde{\nabla}^T \mathrm{d}\overline{\mathbf{u}}
= \sum_i \tilde{\nabla}_i \mathrm{d}\overline{u}_i
= \sum_i \tilde{\nabla}_i \left(
      \overline{u}_i \mathrm{d}\phi_i
      - \overline{u}_i \sum_j \frac {\overline{u}_j} {N} \mathrm{d}\phi_j \right)
= \sum_i \left(\tilde{\nabla}_i \overline{u}_i \mathrm{d}\phi_i
  - \frac{\overline{u}_i}{N} \mathrm{d}\phi_i \sum_j \tilde{\nabla}_j \overline{u}_j \right)
\equiv \nabla^T \mathrm{d}\phi

Thus, the Euclidean gradient is:

\nabla_i = \tilde{\nabla}_i \overline{u}_i - \frac{\overline{u}_i}{N}
           \sum_j \tilde{\nabla}_j \overline{u}_j
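The final formula can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the method's actual code: `euclidean_gradient` and `moments` are hypothetical helpers, N is treated as a scalar, and the toy objective \mathcal{L} = \mathbf{c}^T \overline{\mathbf{u}} is chosen only so that the gradient with respect to the moments is the known vector \mathbf{c}. The result is compared with a finite-difference gradient with respect to \phi.

```python
import numpy as np


def moments(phi, N=1.0):
    """Hypothetical helper: u_i = N * exp(phi_i) / sum_k exp(phi_k)."""
    z = np.exp(phi - np.max(phi))  # shift by the maximum for numerical stability
    return N * z / np.sum(z)


def euclidean_gradient(g, u, N=1.0):
    """nabla_i = g_i * u_i - (u_i / N) * sum_j g_j * u_j."""
    gu = g * u
    return gu - (u / N) * np.sum(gu)


# Toy objective L(phi) = c^T u(phi); its gradient w.r.t. the moments is c.
# All values below are illustrative assumptions.
c = np.array([1.0, -2.0, 0.5])
phi = np.array([0.2, -1.0, 0.5])
N = 3.0

u = moments(phi, N)
grad = euclidean_gradient(c, u, N)

# Central finite differences of L with respect to each phi_i
eps = 1e-6
grad_fd = np.array([
    (c @ moments(phi + eps * e, N) - c @ moments(phi - eps * e, N)) / (2 * eps)
    for e in np.eye(phi.size)
])

assert np.allclose(grad, grad_fd, atol=1e-7)
```

The components of the resulting gradient sum to zero, which reflects that adding a constant to every \phi_i leaves the moments unchanged.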

See also

Computes the moments \overline{\mathbf{u}} given the variational parameters \phi.