A Bayesian value for $\alpha$ may be found simply by treating it as another parameter in our hypothesis space. This procedure is outlined for the case of real images in Skilling (1989) and Gull & Skilling (1990), and we modify their treatment here in order to accommodate complex images $\mathbf{h}$.
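After including $\alpha$ in our hypothesis space, the full joint probability distribution can be expanded as

$$\Pr(\mathbf{h},\alpha,\mathbf{d}) = \Pr(\alpha)\,\Pr(\mathbf{h}\,|\,\alpha)\,\Pr(\mathbf{d}\,|\,\mathbf{h},\alpha) = \Pr(\alpha)\,\Pr(\mathbf{h}\,|\,\alpha)\,\Pr(\mathbf{d}\,|\,\mathbf{h}), \tag{39}$$

where in the last factor we can drop the conditioning on $\alpha$, since it is $\mathbf{h}$ alone that induces the data $\mathbf{d}$; we recognise this factor as the likelihood. Furthermore, the second factor can be identified as the entropic prior, and so (39) becomes

$$\Pr(\mathbf{h},\alpha,\mathbf{d}) = \Pr(\alpha)\,\frac{1}{Z_S(\alpha)}\exp[\alpha S(\mathbf{h})]\,\frac{1}{Z_L}\exp[-\chi^2(\mathbf{h})], \tag{40}$$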
where $Z_S(\alpha)$ and $Z_L$ are respectively the normalisation constants for the entropic prior and the likelihood, chosen such that each probability density function integrates to unity. For convenience we have dropped the explicit dependence of the cross entropy $S(\mathbf{h}) \equiv S(\mathbf{h},\mathbf{m}_u,\mathbf{m}_v)$ on the models $\mathbf{m}_u$ and $\mathbf{m}_v$.
Since we have assumed the instrumental noise on the data to be Gaussian, the likelihood function is also Gaussian, and so the normalisation factor $Z_L$ is easily found. Evaluating the appropriate Gaussian integral over the real and imaginary parts of the data gives

$$Z_L = \pi^{N_d}\det\mathbf{N},$$

where $N_d$ is the dimension of the complex data vector $\mathbf{d}$, equal to the number of observing frequencies that make up the Planck Surveyor data set, and $\det\mathbf{N}$ is the determinant of the noise covariance matrix defined in (7).
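As a quick check of the complex Gaussian normalisation, the sketch below evaluates $\ln Z_L = N_d\ln\pi + \ln\det\mathbf{N}$ with NumPy and compares it, for a single frequency ($N_d = 1$), with a brute-force integral over the complex plane. It assumes that the likelihood is proportional to $\exp[-\chi^2]$ with $\chi^2 = \mathbf{r}^\dagger\mathbf{N}^{-1}\mathbf{r}$ for the data residual $\mathbf{r}$ (no factor of one half, as appropriate for complex Gaussian noise); the function name `log_z_l` and all numerical values are illustrative only.

```python
import numpy as np


def log_z_l(noise_cov):
    """ln Z_L = N_d ln(pi) + ln det N for a complex Gaussian likelihood
    proportional to exp(-r^H N^{-1} r) (assumed convention, no factor 1/2)."""
    n_d = noise_cov.shape[0]
    sign, logdet = np.linalg.slogdet(noise_cov)
    if not np.isclose(sign, 1.0):
        raise ValueError("noise covariance must be Hermitian positive definite")
    return n_d * np.log(np.pi) + logdet


# Brute-force check for N_d = 1: integrate exp(-|r|^2 / sigma2) over the
# real and imaginary parts of r on a fine uniform grid.
sigma2 = 1.7
x = np.linspace(-12.0, 12.0, 1201)
dx = x[1] - x[0]
re, im = np.meshgrid(x, x)
z_numeric = np.sum(np.exp(-(re**2 + im**2) / sigma2)) * dx * dx
z_formula = np.exp(log_z_l(np.array([[sigma2]])))
print(z_numeric, z_formula)

# A hypothetical 2 x 2 Hermitian noise covariance (determinant 1.75).
noise_cov = np.array([[2.0, 0.5j], [-0.5j, 1.0]])
print(log_z_l(noise_cov))
```

Working with `slogdet` rather than `det` avoids overflow or underflow when $N_d$ is large or the noise levels are very different across frequencies.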
The normalisation factor $Z_S$ for the entropic prior is more difficult to calculate, since this prior is not Gaussian in shape. Nevertheless, we find that a reasonable approximation to $Z_S$ for all $\alpha$ may be obtained by making a Gaussian approximation to the prior about its maximum, which occurs at $\mathbf{h} = \mathbf{m}_u - \mathbf{m}_v$. As discussed in Appendix A, the Hessian matrix of the entropy at this point is $-\mathbf{M}^{-1}$, where $\mathbf{M}$ is the metric on image space evaluated at the maximum of the prior; the metric matrix is real and diagonal. Remembering that $\mathbf{h}$ is complex, so that the integral runs over both the real and imaginary parts of each component, the Gaussian approximation gives

$$Z_S(\alpha) \approx \left(\frac{2\pi}{\alpha}\right)^{N_h}\det\mathbf{M}, \tag{41}$$

where $N_h$ is the dimension of the complex (hidden) image vector $\mathbf{h}$, equal to the number of physical components present in the simulations.
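The quality of the Gaussian approximation (41) can be probed numerically for a single component. The sketch below assumes the standard cross entropy for images that may take both signs, $S(h) = \psi - m_u - m_v - h\ln[(\psi + h)/(2m_u)]$ with $\psi = \sqrt{h^2 + 4m_u m_v}$, applied separately to the real and imaginary parts, so that the complex normalisation is the square of the real one. For one real degree of freedom this entropy has its maximum $S = 0$ at $h = m_u - m_v$ with curvature $-1/(m_u + m_v)$, so the Gaussian approximation is $\sqrt{2\pi(m_u + m_v)/\alpha}$. The function names and model values are hypothetical.

```python
import numpy as np


def pos_neg_entropy(h, m_u, m_v):
    """Assumed cross entropy of a real value h of either sign,
    relative to the models m_u and m_v."""
    psi = np.sqrt(h**2 + 4.0 * m_u * m_v)
    # ln[(psi + h) / (2 m_u)] written via arcsinh, which is stable for large negative h
    log_term = np.arcsinh(h / (2.0 * np.sqrt(m_u * m_v))) + 0.5 * np.log(m_v / m_u)
    return psi - m_u - m_v - h * log_term


def z_s_exact_real(alpha, m_u, m_v, n_grid=200_001):
    """Brute-force integral of exp(alpha * S(h)) over one real degree of freedom."""
    h0 = m_u - m_v                          # maximum of the entropy
    width = np.sqrt((m_u + m_v) / alpha)    # standard deviation of the Gaussian approximation
    half = 40.0 * width + 10.0 * (m_u + m_v)
    h = np.linspace(h0 - half, h0 + half, n_grid)
    dh = h[1] - h[0]
    return np.sum(np.exp(alpha * pos_neg_entropy(h, m_u, m_v))) * dh


def z_s_gaussian_real(alpha, m_u, m_v):
    """Gaussian approximation using the curvature -1/(m_u + m_v) at the maximum."""
    return np.sqrt(2.0 * np.pi * (m_u + m_v) / alpha)


for alpha in (1.0, 10.0, 100.0):
    ratio = z_s_exact_real(alpha, 1.0, 1.0) / z_s_gaussian_real(alpha, 1.0, 1.0)
    print(f"alpha = {alpha:6.1f}: exact / Gaussian = {ratio:.4f}")
```

Because the exact prior has heavier tails than its Gaussian approximation, the ratio exceeds unity for small $\alpha$ and approaches unity as $\alpha$ grows and the prior becomes more sharply peaked.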
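Now, returning to (40), in order to investigate more closely the role of $\alpha$, we begin by considering the joint probability distribution $\Pr(\alpha,\mathbf{d})$, which may be obtained by integrating out $\mathbf{h}$ in (40):

$$\Pr(\alpha,\mathbf{d}) = \int \Pr(\mathbf{h},\alpha,\mathbf{d})\,d\mathbf{h} = \Pr(\alpha)\,\frac{Z_\Phi(\alpha)}{Z_S(\alpha)\,Z_L}, \tag{42}$$

where we have defined the normalisation integral $Z_\Phi(\alpha) = \int \exp[-\Phi(\mathbf{h})]\,d\mathbf{h}$, with $\Phi(\mathbf{h}) = \chi^2(\mathbf{h}) - \alpha S(\mathbf{h})$. To calculate $Z_\Phi$, we follow an approach similar to that used to calculate $Z_S$ and make a Gaussian approximation to $\exp[-\Phi(\mathbf{h})]$ about its maximum at $\mathbf{h} = \hat{\mathbf{h}}$. The required Hessian matrix is given by (38) evaluated at $\hat{\mathbf{h}}$. Let us, however, define a new matrix $\mathbf{B}$ that is given by

$$\mathbf{B} = \mathbf{M}^{1/2}\,\nabla\nabla\Phi(\hat{\mathbf{h}})\,\mathbf{M}^{1/2}. \tag{43}$$

The integral $Z_\Phi$ is then approximated by

$$Z_\Phi(\alpha) \approx (2\pi)^{N_h}\,\frac{\det\mathbf{M}}{\det\mathbf{B}}\,\exp[-\Phi(\hat{\mathbf{h}})]. \tag{44}$$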
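Thus, substituting into (42) the expressions for $Z_S$ and $Z_\Phi$ given by (41) and (44) respectively, we find that in the Gaussian approximation the joint probability distribution has the form

$$\Pr(\alpha,\mathbf{d}) \approx \Pr(\alpha)\,\frac{\alpha^{N_h}}{Z_L\,\det\mathbf{B}}\,\exp[\alpha S(\hat{\mathbf{h}}) - \chi^2(\hat{\mathbf{h}})].$$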
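In principle, to obtain a Bayesian estimate for $\alpha$ we should choose an appropriate form for the prior $\Pr(\alpha)$. For realistically large data sets, however, the distribution $\Pr(\mathbf{d}\,|\,\alpha) = \Pr(\alpha,\mathbf{d})/\Pr(\alpha)$ is so strongly peaked that it overwhelms any reasonable prior on $\alpha$, and so we assign the Bayesian value of the regularisation constant to be that which maximises $\Pr(\mathbf{d}\,|\,\alpha)$. Taking logarithms we obtain

$$\ln\Pr(\mathbf{d}\,|\,\alpha) \approx \text{constant} + N_h\ln\alpha + \alpha S(\hat{\mathbf{h}}) - \chi^2(\hat{\mathbf{h}}) - \ln\det\mathbf{B}.$$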
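Differentiating with respect to $\alpha$, and noting that the $\hat{\mathbf{h}}$-derivatives cancel (since $\hat{\mathbf{h}}$ is a stationary point of $\alpha S(\mathbf{h}) - \chi^2(\mathbf{h})$), we find

$$\frac{\partial}{\partial\alpha}\ln\Pr(\mathbf{d}\,|\,\alpha) = \frac{N_h}{\alpha} + S(\hat{\mathbf{h}}) - \mathrm{Tr}\left(\mathbf{B}^{-1}\frac{\partial\mathbf{B}}{\partial\alpha}\right), \tag{45}$$

where we have used the identity

$$\frac{\partial}{\partial\alpha}\ln\det\mathbf{B} = \mathrm{Tr}\left(\mathbf{B}^{-1}\frac{\partial\mathbf{B}}{\partial\alpha}\right),$$

which is valid for any non-singular matrix $\mathbf{B}$. From (43), since $\Phi = \chi^2 - \alpha S$ and the Hessian of the entropy is $-\mathbf{M}^{-1}$, we see that $\partial\mathbf{B}/\partial\alpha = \mathbf{I}$. Substituting this relation into (45) and equating the result to zero, we find that, in order to maximise $\Pr(\mathbf{d}\,|\,\alpha)$, the parameter $\alpha$ must satisfy

$$-\alpha S(\hat{\mathbf{h}}) = N_h - \alpha\,\mathrm{Tr}(\mathbf{B}^{-1}).$$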
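The stationarity condition can be checked on a toy problem in which every Gaussian approximation above is exact. The sketch below rests on loudly hypothetical assumptions and is not the reconstruction code used for the simulations: the entropy is replaced by its quadratic expansion $S(\mathbf{h}) = -\tfrac12(\mathbf{h}-\mathbf{h}_0)^\dagger\mathbf{M}^{-1}(\mathbf{h}-\mathbf{h}_0)$ with a constant diagonal metric, the data are linear, $\mathbf{d} = \mathbf{R}\mathbf{h} + \text{noise}$, with an invented random response matrix `R`, and $\chi^2 = (\mathbf{d}-\mathbf{R}\mathbf{h})^\dagger\mathbf{N}^{-1}(\mathbf{d}-\mathbf{R}\mathbf{h})$. With these conventions $\mathbf{B} = 2\mathbf{M}^{1/2}\mathbf{R}^\dagger\mathbf{N}^{-1}\mathbf{R}\,\mathbf{M}^{1/2} + \alpha\mathbf{I}$. The code finds $\alpha$ by bisection on $N_h/\alpha + S(\hat{\mathbf{h}}) - \mathrm{Tr}(\mathbf{B}^{-1})$, and compares this expression with a finite-difference derivative of the exact log-evidence, for which $\mathbf{d}$ is complex Gaussian with covariance $\mathbf{N} + 2\mathbf{R}\mathbf{M}\mathbf{R}^\dagger/\alpha$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_d, n_h = 6, 3  # hypothetical: 6 observing frequencies, 3 physical components


def crandn(*shape):
    """Circular complex normal samples with E|z|^2 = 1."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)


R = crandn(n_d, n_h)                     # hypothetical response matrix
noise_var = rng.uniform(0.5, 2.0, n_d)   # diagonal noise covariance N
N = np.diag(noise_var).astype(complex)
m = rng.uniform(0.5, 3.0, n_h)           # diagonal of the (constant) metric M
h0 = np.zeros(n_h, dtype=complex)        # prior maximum, standing in for m_u - m_v

# Simulate data: under the quadratic entropy the prior covariance of h is 2M/alpha.
alpha_true = 0.01
h_true = h0 + np.sqrt(2.0 * m / alpha_true) * crandn(n_h)
d = R @ h_true + np.sqrt(noise_var) * crandn(n_d)

N_inv = np.linalg.inv(N)
A = R.conj().T @ N_inv @ R               # alpha-independent part of the curvature
M_sqrt = np.diag(np.sqrt(m))


def entropy(h):
    """Quadratic stand-in for S: -1/2 (h - h0)^H M^{-1} (h - h0)."""
    dh = h - h0
    return -0.5 * np.real(np.vdot(dh, dh / m))


def h_hat(alpha):
    """Maximiser of alpha*S - chi^2; a linear solve because both are quadratic."""
    lhs = A + 0.5 * alpha * np.diag(1.0 / m)
    rhs = R.conj().T @ N_inv @ d + 0.5 * alpha * h0 / m
    return np.linalg.solve(lhs, rhs)


def matrix_b(alpha):
    """B = 2 M^{1/2} A M^{1/2} + alpha I, so that dB/dalpha = I."""
    return 2.0 * M_sqrt @ A @ M_sqrt + alpha * np.eye(n_h)


def condition(alpha):
    """N_h/alpha + S(h_hat) - Tr(B^{-1}); zero at the Bayesian alpha."""
    b_inv = np.linalg.inv(matrix_b(alpha))
    return n_h / alpha + entropy(h_hat(alpha)) - np.real(np.trace(b_inv))


def log_evidence(alpha):
    """Exact ln Pr(d | alpha) for this toy: d ~ CN(R h0, N + R (2M/alpha) R^H)."""
    C = N + R @ np.diag(2.0 * m / alpha) @ R.conj().T
    r = d - R @ h0
    _, logdet = np.linalg.slogdet(C)
    return -n_d * np.log(np.pi) - logdet - np.real(np.vdot(r, np.linalg.solve(C, r)))


def bayesian_alpha(lo=1e-6, hi=1e6, iters=200):
    """Bisection in log(alpha) on the stationarity condition."""
    if not condition(lo) > 0.0 > condition(hi):
        raise RuntimeError("no sign change of the condition in the bracket")
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if condition(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)


alpha_b = bayesian_alpha()
print(f"Bayesian alpha = {alpha_b:.4g}, condition there = {condition(alpha_b):.2e}")
```

Writing $\mathbf{B} = \mathbf{B}_0 + \alpha\mathbf{I}$, with $\mathbf{B}_0$ independent of $\alpha$ and having eigenvalues $\lambda_i$, the right-hand side of the condition becomes $N_h - \alpha\,\mathrm{Tr}(\mathbf{B}^{-1}) = \sum_i \lambda_i/(\lambda_i + \alpha)$, which is a convenient form once $\mathbf{B}_0$ has been diagonalised. Bisection in $\ln\alpha$ is used here only because $\alpha$ may range over many orders of magnitude; any robust one-dimensional root finder would serve.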