Implementing nodes¶
The main goal of BayesPy is to provide a package which enables easy and flexible construction of simple and complex models with efficient inference. However, users may sometimes be unable to construct their models because the built-in nodes do not implement some specific features. Thus, one may need to implement new nodes in order to construct the model. BayesPy aims to make the implementation of new nodes both simple and fast. Probably, a large complex model can be constructed almost completely with the built-in nodes and the user needs to implement only a few nodes.
Moments¶
In order to implement nodes, it is important to understand the messaging framework of the nodes. A node is a unit of calculation which communicates to its parent and child nodes using messages. These messages have types that need to match between nodes, that is, the child node needs to understand the messages its parents are sending and vice versa. Thus, a node defines which message type it requires from each of its parents, and only nodes that have that type of output message (i.e., the message to a child node) are valid parent nodes for that node.
The message type is defined by the moments of the parent node. The moments are
a collection of expectations: . The functions (and the number of
the functions) define the message type and they are the sufficient statistic as
discussed in the previous section. Different message types are represented by
Moments
class hierarchy. For instance, GaussianMoments
represents a message type with parent moments and WishartMoments
a message
type with parent moments .
Let us give an example: Gaussian
node outputs GaussianMoments
messages and Wishart
node outputs WishartMoments
messages.
Gaussian
node requires that it receives GaussianMoments
messages from the mean parent node and WishartMoments
messages from the
precision parent node. Thus, Gaussian
and Wishart
are valid
node classes as the mean and precision parent nodes of Gaussian
node.
Note that several nodes may have the same output message type and some message
types can be transformed to other message types using deterministic converter
nodes. For instance, Gaussian
and GaussianARD
nodes both
output GaussianMoments
messages, deterministic SumMultiply
also outputs GaussianMoments
messages, and deterministic converter
_MarkovChainToGaussian
converts GaussianMarkovChainMoments
to
GaussianMoments
.
Each node specifies the message type requirements of its parents by
Node._parent_moments
attribute which is a list of Moments
sub-class instances. These moments objects have a few purpose when creating the
node: 1) check that parents are sending proper messages; 2) if parents use
different message type, try to add a converter which converts the messages to
the correct type if possible; 3) if given parents are not nodes but numeric
arrays, convert them to constant nodes with correct output message type.
When implementing a new node, it is not always necessary to implement a new moments class. If another node has the same sufficient statistic vector, thus the same moments, that moments class can be used. Otherwise, one must implement a simple moments class which has the following methods:
Moments.compute_fixed_moments()
Computes the moments for a known value. This is used to compute the moments of constant numeric arrays and wrap them into constant nodes.
Moments.compute_dims_from_values()
Given a known value of the variable, return the shape of the variable dimensions in the moments. This is used to solve the shape of the moments array for constant nodes.
Distributions¶
In order to implement a stochastic exponential family node, one must first write
down the log probability density function of the node and derive the terms
discussed in section Terms. These terms are implemented and
collected as a class which is a subclass of Distribution
. The main
reason to implement these methods in another class instead of the node class
itself is that these methods can be used without creating a node, for instance,
in Mixture
class.
For exponential family distributions, the distribution class is a subclass of
ExponentialFamilyDistribution
, and the relation between the terms in
section Terms and the methods is as follows:
ExponentialFamilyDistribution.compute_phi_from_parents()
Computes the expectation of the natural parameters in the prior distribution given the moments of the parents.
ExponentialFamilyDistribution.compute_cgf_from_parents()
Computes the expectation of the negative log normalizer of the prior distribution given the moments of the parents.
ExponentialFamilyDistribution.compute_moments_and_cgf()
Computes the moments and the negative log normalizer of the posterior distribution given the natural parameters .
ExponentialFamilyDistribution.compute_message_to_parent()
Computes the message from the node to its parent node given the moments of the node and the other parents.
ExponentialFamilyDistribution.compute_fixed_moments_and_f()
Computes and for given observed value . Without this method, variables from this distribution cannot be observed.
For each stochastic exponential family node, one must write a distribution class which implements these methods. After that, the node class is basically a simple wrapper and it also stores the moments and the natural parameters of the current posterior approximation. Note that the distribution classes do not store node-specific information, they are more like static collections of methods. However, sometimes the implementations depend on some information, such as the dimensionality of the variable, and this information must be provided, if needed, when constructing the distribution object.
In addition to the methods listed above, it is necessary to implement a few more methods in some cases. This happens when the plates of the parent do not map to the plates directly as discussed in section Irregular plates. Then, one must write methods that implement this plate mapping and apply the same mapping to the mask array:
ExponentialFamilyDistribution.plates_from_parent()
Given the plates of the parent, return the resulting plates of the child.
ExponentialFamilyDistribution.plates_to_parent()
Given the plates of the child, return the plates of the parent that would have resulted them.
ExponentialFamilyDistribution.compute_mask_to_parent()
Given the mask array of the child, apply the plate mapping.
It is important to understand when one must implement these methods, because the default implementations in the base class will lead to errors or weird results.
Stochastic exponential family nodes¶
After implementing the distribution class, the next task is to implement the node class. First, we need to explain a few important attributes before we can explain how to implement a node class.
Stochastic exponential family nodes have two attributes that store the state of the posterior distribution:
phi
The natural parameter vector of the posterior approximation.
u
The moments of the posterior approximation.
Instead of storing these two variables as vectors (as in the mathematical
formulas), they are stored as lists of arrays with convenient shapes. For
instance, Gaussian
node stores the moments as a list consisting of a
vector and a matrix instead of reshaping and concatenating these into a
single vector. The same applies for the natural parameters phi
because it
has the same shape as u
.
The shapes of the arrays in the lists u
and phi
consist of the shape
caused by the plates and the shape caused by the variable itself. For instance,
the moments of Gaussian
node have shape (D,)
and (D, D)
, where
D
is the dimensionality of the Gaussian vector. In addition, if the node
has plates, they are added to these shapes. Thus, for instance, if the
Gaussian
node has plates (3, 7)
and D
is 5, the shape of
u[0]
and phi[0]
would be (3, 7, 5)
and the shape of u[1]
and
phi[1]
would be (3, 7, 5, 5)
. This shape information is stored in the
following attributes:
plates
: a tupleThe plates of the node. In our example,
(3, 7)
.
dims
: a list of tuplesThe shape of each of the moments arrays (or natural parameter arrays) without plates. In our example,
[ (5,), (5, 5) ]
.
Finally, three attributes define VMP for the node:
_moments
:Moments
sub-class instanceAn object defining the moments of the node.
_parent_moments
: list ofMoments
sub-class instancesA list defining the moments requirements for each parent.
_distribution
:Distribution
sub-class instanceAn object implementing the VMP formulas.
Basically, a node class is a collection of the above attributes. When a node is
created, these attributes are defined. The base class for exponential family
nodes, ExponentialFamily
, provides a simple default constructor which
does not need to be overwritten if dims
, _moments
, _parent_moments
and _distribution
can be provided as static class attributes. For instance,
Gamma
node defines these attributes statically. However, usually at
least one of these attributes cannot be defined statically in the class. In
that case, one must implement a class method which overloads
ExponentialFamily._constructor()
. The purpose of this method is to define
all the attributes given the parent nodes. These are defined using a class
method instead of __init__
method in order to be able to use the class
constructors statically, for instance, in Mixture
class. This
construction allows users to create mixtures of any exponential family
distribution with simple syntax.
The parents of a node must be converted so that they have a correct message
type, because the user may have provided numeric arrays or nodes with incorrect
message type. Numeric arrays should be converted to constant nodes with correct
message type. Incorrect message type nodes should be converted to correct
message type nodes if possible. Thus, the constructor should use
Node._ensure_moments
method to make sure the parent is a node with correct
message type. Instead of calling this method for each parent node in the
constructor, one can use ensureparents
decorator to do this automatically.
However, the decorator requires that _parent_moments
attribute has already
been defined statically. If this is not possible, the parent nodes must be
converted manually in the constructor, because one should never assume that the
parent nodes given to the constructor are nodes with correct message type or
even nodes at all.
Deterministic nodes¶
Deterministic nodes are nodes that do not correspond to any probability
distribution but rather a deterministic function. It does not have any moments
or natural parameters to store. A deterministic node is implemented as a
subclass of Deterministic
base class. The new node class must
implement the following methods:
Deterministic._compute_moments()
Computes the moments given the moments of the parents.
Deterministic._compute_message_to_parent()
Computes the message to a parent node given the message from the children and the moments of the other parents. In some cases, one may want to implement
Deterministic._compute_message_and_mask_to_parent()
orDeterministic._message_to_parent()
instead in order to gain more control over efficient computation.
Similarly as in Distribution
class, if the node handles plates
irregularly, it is important to implement the following methods:
Deterministic._plates_from_parent()
Given the plates of the parent, return the resulting plates of the child.
Deterministic._plates_to_parent()
Given the plates of the child, return the plates of the parent that would have resulted them.
Deterministic._compute_weights_to_parent()
Given the mask array, convert it to a plate mask of the parent.
Converter nodes¶
Sometimes a node has incorrect message type but the message can be converted
into a correct type. For instance, GaussianMarkovChain
has
GaussianMarkovChainMoments
message type, which means moments . These moments can
be converted to GaussianMoments
by ignoring the third element and
considering the time axis as a plate axis. Thus, if a node requires
GaussianMoments
message from its parent, GaussianMarkovChain
is a valid parent if its messages are modified properly. This conversion is
implemented in _MarkovChainToGaussian
converter class. Converter nodes
are simple deterministic nodes that have one parent node and they convert the
messages to another message type.
For the user, it is not convenient if the exact message type has to be known and
an explicit converter node needs to be created. Thus, the conversions are done
automatically and the user will be unaware of them. In order to enable this
automatization, when writing a converter node, one should register the converter
to the moments class using Moments.add_converter()
. For instance, a class
X
which converts moments A
to moments B
is registered as
A.add_conveter(B, X)
. After that, Node._ensure_moments()
and
Node._convert()
methods are used to perform the conversions automatically.
The conversion can consist of several consecutive converter nodes, and the least
number of conversions is used.