Nested sampling modelling of Star-Formation History of the Universe

As an exercise to put through the new InMin library through its paces, I have looked recently at applying Nested Sampling ( Skilling2006) algorithm to the fitting of star-formation history as function of redshift. Here is a complete guide on how to do this.

The first step is download the InMin library from here http://www.bnikolic.co.uk/inmin/inmin-library.html and build it. The parts that are required for this example are the main library and the SWIG python wrappers, and so to successfully build it you will need to have installed Eigen, Boost and SWIG libraries/programs.

The data I used are from Figure 1 of Yuksel08 but I use only the grey data paints, which are just the data originally from Hopkins2006. You can download my version of these data from here, but please double check them if you use them for any detailed scientific investigation. The data from the text file can be loaded simply by:

def loaddata(fnamein):
    return numpy.array([[float(x) for x in  l.split(",")] for l in open(fnamein) if l[0] != '#'])

The next step is to define the function which is the model for the history of the SFR density. I used the same function as used by Yuksel08, i.e.:

def yuksil08(z, p):
    """
    Model given in 2008ApJ...683L...5Y. Eta is the smoothing parameter
    """
    rho0, a, b, c, eta, B, C = p
    return rho0*( (1+z)**(a*eta)+
                  ( (1+z)/ B)**(b*eta) +
                  ( (1+z)/ C)**(c*eta))**(1.0/eta)

I’ve arranged this function so that it takes two parameters, where the first one is the “independent” variable and the second is the list of the the actual parameters of the model. The reason for this makes it easy to fit this same model using InMin’s least-squares solver.

Returning to the Nested Sampling, the next step is to define the likelihood function. This function, which is passed to the Nested Sampling algorithm, must take one argument only, which is the list of the parameters of the model at which the likelihood is to be evaluated. To avoid global variables, I’ve written a function that returns the likelihood function with all the ancillary data captured in the closure:

1 def mkLLklFunc(errscale=1.0):
2     """Make function that evaluates the log-likelihod"""
3     d=loaddata("data/2008ApJ...683L...5Y.f1.grayonly.txt")
4     def llfn(p):
5         ym=numpy.array([yuksil08(x[0],p) for x in d])
6         res= 0.5 * (((ym-d[:,1])/(d[:,3]*errscale))**2).sum() + 0.5 * (numpy.log(2*numpy.pi*d[:,3]**2)).sum()
7         return -res
8     return llfn

For this example, I have assumed independent Gaussian errors and ignored the uncertainty in redshift. (All three of these assumptions are probably poor.) Therefore the log-likelihood is proportional to the sum of error-weighted squares. In order to get meaningful evidence value to likelihood needs to be normalised which is implemented in the second part of line 6 of the above code example.

This completes the setup of the problem, and the next step is running the nested sampler:

 1 ll=mkLLklFunc()
 2     box=[0.01, 0.03,  # Constraint for 1st parameter  (rho0)
 3          3, 4,        # Constraint for 2nd parameter  (a)
 4          -1, 1,       # Etc...
 5          -5, 1,      
 6          -20, -10,
 7          0, 1e4, 
 8          0, 100]
 9     oo=inmin.nested_box(ll, box, 
10                         nss=200, 
11                         trace=True, 
12                         nsample=8000)

The box parameter defines prior space for the problem: the prior is assumed uniform inside the box and zero probability outside the box. The parameter “nss” defines the number of points in the live set and therefore controls the accuracy of the nested sampling: larger values of “nss” increase the accuracy but at the cost of longer run time. Finally, parameter nsample controls the number of samples to make.

After the execution is completed, the Bayesian Evidence can be calculated using:

1 inmin.pyinmin.inmin_evidence_rect(10, oo.llv, oo.wv)

While the fan-diagram can be computed using:

1 fhist=inmin.nested_fan(ff, oo, 0.01, 8, -1.5, -0.5)

Unlike the Radiospec package, in order to simplify the installation and setup, InMin does not provide any built-in graphing routines. But for example the raw histogram of the fan diagram can be plotted simply by:

1 pylab.matshow((fhist)[::-1,]); pylab.show()

The inversion of the first axis (i.e., the “::-1” index spec) is necessary so that rows increase in the up direction instead of the default down direction that matplotlib uses. The marginal probability distributions of each parameter can be plotted as follows:

1 xx=oo.g_x()
2 p=numpy.exp(oo.g_ll()) * oo.g_w()
3 for i in range(6):
4         pylab.clf()
5         pylab.hist(xx[:,i], weights=p, normed=True, bins=50)
6         pylab.savefig("hist-par%i.png"%i)

Which produces to following plots:

rho0 marginal distribution a marginal distribution b marginal distribution c marginal distribution eta marginal distribution B marginal distribution C marginal distribution

blog comments powered by Disqus