.. Copyright (C) 2014 Jaakko Luttinen

   This file is licensed under the MIT License. See LICENSE for a text of
   the license.

.. testsetup::

    import numpy
    numpy.random.seed(1)

Principal component analysis
============================

This example uses a simple principal component analysis to find a
two-dimensional latent subspace in a higher-dimensional dataset.

Data
----

Let us create a Gaussian dataset with a latent space dimensionality of two and
some observation noise:

>>> M = 20
>>> N = 100
>>> import numpy as np
>>> x = np.random.randn(N, 2)
>>> w = np.random.randn(M, 2)
>>> f = np.einsum('ik,jk->ij', w, x)
>>> y = f + 0.1*np.random.randn(M, N)

Model
-----

We will use a 10-dimensional latent space in our model and let it learn the
true dimensionality:

>>> D = 10

Import the relevant nodes:

>>> from bayespy.nodes import GaussianARD, Gamma, SumMultiply

The latent states:

>>> X = GaussianARD(0, 1, plates=(1,N), shape=(D,))

The loading matrix with an automatic relevance determination (ARD) prior:

>>> alpha = Gamma(1e-5, 1e-5, plates=(D,))
>>> C = GaussianARD(0, alpha, plates=(M,1), shape=(D,))

Compute the dot product of the latent states and the loading matrix:

>>> F = SumMultiply('d,d->', X, C)

The observation noise:

>>> tau = Gamma(1e-5, 1e-5)

The observed variable:

>>> Y = GaussianARD(F, tau)

Inference
---------

Observe the data:

>>> Y.observe(y)

We do not have missing data in this example, but missing values could easily
be handled with the ``mask`` keyword argument of ``observe``.
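For illustration only, a hypothetical boolean mask (``True`` for observed
values, ``False`` for missing ones) could be passed when observing the data.
This sketch is not executed as part of the example on this page::

    mask = np.random.rand(M, N) > 0.1  # mark roughly 10% of the values as missing
    Y.observe(y, mask=mask)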
Construct the variational Bayesian (VB) inference engine:

>>> from bayespy.inference import VB
>>> Q = VB(Y, X, C, alpha, tau)

Initialize the latent subspace randomly, otherwise both ``X`` and ``C`` would
converge to zero:

>>> C.initialize_from_random()

.. currentmodule:: bayespy.inference

Now we could use :func:`VB.update` to run the inference. However, let us first
create a parameter expansion to speed up the inference. The expansion is based
on rotating the latent subspace optimally. This is optional but will usually
improve the speed of the inference significantly, especially in
high-dimensional problems:

>>> from bayespy.inference.vmp.transformations import RotateGaussianARD
>>> rot_X = RotateGaussianARD(X)
>>> rot_C = RotateGaussianARD(C, alpha)

By giving ``alpha`` to ``rot_C``, the rotation will also optimize ``alpha``
jointly with ``C``. Now that we have defined the rotations for our variables,
we need to construct an optimizer:

>>> from bayespy.inference.vmp.transformations import RotationOptimizer
>>> R = RotationOptimizer(rot_X, rot_C, D)

In order to apply the rotations automatically, we set them as a callback
function:

>>> Q.set_callback(R.rotate)

For more information about the rotation parameter expansion, see
:cite:`Luttinen:2010` and :cite:`Luttinen:2013`.

Now we can run the actual inference until convergence:

>>> Q.update(repeat=1000)
Iteration 1: loglike=-2.33...e+03 (... seconds)
...
Iteration ...: loglike=6.500...e+02 (... seconds)
Converged at iteration ...

Results
-------

The results can be visualized, for instance, by plotting the Hinton diagram of
the loading matrix:

>>> import bayespy.plot as bpplt
>>> bpplt.pyplot.figure()
>>> bpplt.hinton(C)

.. plot::

    import numpy
    numpy.random.seed(1)
    M = 20
    N = 100
    import numpy as np
    x = np.random.randn(N, 2)
    w = np.random.randn(M, 2)
    f = np.einsum('ik,jk->ij', w, x)
    y = f + 0.1*np.random.randn(M, N)
    D = 10
    from bayespy.nodes import GaussianARD, Gamma, SumMultiply
    X = GaussianARD(0, 1, plates=(1,N), shape=(D,))
    alpha = Gamma(1e-5, 1e-5, plates=(D,))
    C = GaussianARD(0, alpha, plates=(M,1), shape=(D,))
    F = SumMultiply('d,d->', X, C)
    tau = Gamma(1e-5, 1e-5)
    Y = GaussianARD(F, tau)
    Y.observe(y)
    from bayespy.inference import VB
    Q = VB(Y, X, C, alpha, tau)
    C.initialize_from_random()
    from bayespy.inference.vmp.transformations import RotateGaussianARD
    rot_X = RotateGaussianARD(X)
    rot_C = RotateGaussianARD(C, alpha)
    from bayespy.inference.vmp.transformations import RotationOptimizer
    R = RotationOptimizer(rot_X, rot_C, D)
    Q.set_callback(R.rotate)
    Q.update(repeat=1000)
    import bayespy.plot as bpplt
    bpplt.hinton(C)

The method has been able to prune out the unnecessary latent dimensions and
keep only two components, which is the true number of components.

>>> bpplt.pyplot.figure()
>>> bpplt.plot(F)
>>> bpplt.plot(f, color='r', marker='x', linestyle='None')

.. plot::

    import numpy
    numpy.random.seed(1)
    M = 20
    N = 100
    import numpy as np
    x = np.random.randn(N, 2)
    w = np.random.randn(M, 2)
    f = np.einsum('ik,jk->ij', w, x)
    y = f + 0.1*np.random.randn(M, N)
    D = 10
    from bayespy.nodes import GaussianARD, Gamma, SumMultiply
    X = GaussianARD(0, 1, plates=(1,N), shape=(D,))
    alpha = Gamma(1e-5, 1e-5, plates=(D,))
    C = GaussianARD(0, alpha, plates=(M,1), shape=(D,))
    F = SumMultiply('d,d->', X, C)
    tau = Gamma(1e-5, 1e-5)
    Y = GaussianARD(F, tau)
    Y.observe(y)
    from bayespy.inference import VB
    Q = VB(Y, X, C, alpha, tau)
    C.initialize_from_random()
    from bayespy.inference.vmp.transformations import RotateGaussianARD
    rot_X = RotateGaussianARD(X)
    rot_C = RotateGaussianARD(C, alpha)
    from bayespy.inference.vmp.transformations import RotationOptimizer
    R = RotationOptimizer(rot_X, rot_C, D)
    Q.set_callback(R.rotate)
    Q.update(repeat=1000)
    import bayespy.plot as bpplt
    bpplt.plot(F)
    bpplt.plot(f, color='r', marker='x', linestyle='None')

.. currentmodule:: bayespy.nodes

The reconstruction of the noiseless function values is practically perfect in
this simple example. Larger noise variance, more latent space dimensions and
missing values would make this problem more difficult. The model construction
could also be improved by having, for instance, ``C`` and ``tau`` in the same
node without factorizing between them in the posterior approximation. This can
be achieved by using the :class:`GaussianGammaISO` node.
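Finally, numerical posterior summaries can also be read directly from the
nodes instead of plotting them. The following is a minimal sketch (the
variable names are illustrative); it relies on the nodes' ``get_moments``
method, which returns the expected sufficient statistics::

    C_mean = C.get_moments()[0]          # posterior mean of the loading matrix
    tau_mean = tau.get_moments()[0]      # posterior mean of the noise precision
    alpha_mean = alpha.get_moments()[0]  # ARD precisions; pruned components get large values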