Please Note: the e-mail address(es) and any external links in this paper were correct when it was written in 1995, but may no longer be valid.


The use of Positive Matrix Factorization in the analysis of molecular line spectra from the Thumbprint Nebula

Mika Juvela, Kimmo Lehtinen and Pentti Paatero

Helsinki University Observatory, Tähtitorninmäki, 00014 Helsinki, FINLAND
University of Helsinki, Department of Physics, Siltavuorenpenger 20 D, SF-00170 Helsinki, FINLAND


We present the first results of the application of Positive Matrix Factorization (PMF) to the analysis of molecular line spectra. PMF is a new computational method which tries to find the underlying basic spectral components whose sum explains the measured spectra. The principles of the method are discussed, and some results from the analysis of -spectra from the Thumbprint Nebula are shown.


1. Introduction

The analysis of molecular line spectra often concentrates on extracting parameters from each spectrum separately, usually by making Gaussian fits. When several velocity components are present, this is feasible only if the signal-to-noise ratio is good and the components are clearly separated. The general velocity structure can be studied more easily either by drawing channel maps or by constructing space-velocity charts. These give, however, only a qualitative picture, and space-velocity charts are moreover limited to one spatial dimension. In the case of multiple components, positive matrix factorization (Paatero & Tapper 1994) may extract more information from the data.

2. The Positive Matrix Factorization

Positive matrix factorization (PMF) analyzes the matrix containing the measured spectra by calculating a small number of basic spectral profiles and the weights with which each of these basic components or factors is present in each individual spectrum.

Factor analysis (FA) and principal component analysis (PCA) are older methods which, applied to a matrix of spectra, are in principle also capable of this. Their results are, however, often ambiguous and difficult to interpret, since the basic profiles may contain many negative values. A physically meaningful representation can be found only after a series of transformations called rotations.

By requiring non-negativity of both the weights and the spectral profiles, PMF produces results that are far easier to interpret. Another new aspect of PMF is the optimal use of error estimates: PMF computes the solution by minimizing the least squares error of the fit, weighted with the error estimates.

2.1. The Principles of PMF

For the analysis, the measured spectra are placed as rows in a matrix $X$, so that the columns of the matrix correspond to different channels of the spectrum analyzer. Let the dimensions of $X$ be $n \times m$, i.e. there are $n$ rows and $m$ columns. In addition to the emission peaks, the spectra also contain a random noise component. Error estimates of individual channels can be calculated from the amount of noise in the channels around the peak and placed in another matrix $S$ of the same dimensions.
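The construction of the error matrix can be sketched as follows. This is only an illustration under assumptions not stated in the paper: the spectra are baseline-subtracted, the noise is the same in every channel of a given spectrum, and the emission lies inside a known channel window. The names `error_matrix` and `line_window` are hypothetical.

```python
import math


def error_matrix(spectra, line_window):
    """Build an error matrix S with the same shape as the spectra matrix X.

    Assumptions (illustrative, not from the paper): spectra are
    baseline-subtracted, the noise is constant across each spectrum, and
    the hypothetical `line_window` = (first, last) is the channel range
    containing the emission. At least one channel must lie outside it.
    """
    first, last = line_window
    S = []
    for row in spectra:
        # Channels outside the emission window are taken to contain only noise.
        baseline = [x for j, x in enumerate(row) if j < first or j > last]
        # RMS about zero, since the baseline is assumed already subtracted.
        rms = math.sqrt(sum(x * x for x in baseline) / len(baseline))
        # The same error estimate is used for every channel of this spectrum.
        S.append([rms] * len(row))
    return S


X = [
    [1.0, -1.0, 2.0, 5.0, 2.0, 1.0, -1.0],
    [0.5, -0.5, 1.0, 3.0, 1.0, -0.5, 0.5],
]
S = error_matrix(X, (2, 4))
```

A per-channel estimate (for example, a noise level that varies across the band) would only change how each row of `S` is filled.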

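Given the matrices $X$ and $S$ and the so-called rank of the factorization $p$, PMF solves the bilinear factorization problem

$$X = GF + E$$

by calculating the factor matrices $G$ and $F$ of dimensions $n \times p$ and $p \times m$, where $E$ is the matrix of residuals. This is a least squares solution which minimizes the error

$$Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{e_{ij}}{s_{ij}} \right)^2 ,$$

where $e_{ij}$ and $s_{ij}$ are the elements of $E$ and $S$.
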
An additional restriction on the solution is provided by a positivity constraint: every element of the matrices $G$ and $F$ is required to be non-negative. This reflects the desire to find positive basic components from which the spectra can be reconstructed by addition.
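As an illustration of the quantity being minimized, the following sketch evaluates $Q$ for given $X$, $S$, $G$ and $F$ and checks the positivity constraint. It does not search for the minimizing $G$ and $F$, which is the actual task of PMF; the function name `pmf_objective` is hypothetical.

```python
def pmf_objective(X, S, G, F):
    """Weighted least squares error Q for the factorization X = GF + E.

    X, S: n x m lists of lists (spectra and their error estimates)
    G:    n x p weights; F: p x m basic profiles.
    Raises ValueError if G or F violates the positivity constraint.
    """
    for M in (G, F):
        if any(v < 0 for row in M for v in row):
            raise ValueError("PMF requires non-negative G and F")
    p = len(F)
    Q = 0.0
    for i, (x_row, s_row) in enumerate(zip(X, S)):
        for j, (x, s) in enumerate(zip(x_row, s_row)):
            fitted = sum(G[i][k] * F[k][j] for k in range(p))
            e = x - fitted  # element e_ij of the residual matrix E
            Q += (e / s) ** 2  # residual scaled by its error estimate
    return Q


X = [[1.0, 2.0], [2.0, 4.5]]
S = [[1.0, 1.0], [1.0, 0.5]]
G = [[1.0], [2.0]]
F = [[1.0, 2.0]]
Q = pmf_objective(X, S, G, F)
```

Here only the last channel of the second spectrum is misfitted, by 0.5, and its error estimate of 0.5 makes that residual contribute exactly 1 to $Q$.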

2.2. The rank of the factorization

An approximate rank $p$ of the factorization can be determined with the help of the singular value decomposition (SVD), $X = U \Sigma V^T$. Plotting the singular values of the matrix $X$ gives a graph like that in Figure 1. The number $p$ of linearly independent basic components can be seen from the first singular values: the majority of the singular values fall on a sloping line, and if the first $p$ singular values lie above this line, we may assume that there are $p$ independent components. In Figure 1 we have $p = 4$. The final test for $p$ is, however, the comparison of results obtained from factorizations of different rank.
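The "sloping line" criterion can be made explicit in a few lines. The sketch below assumes the singular values have already been computed (for example with `numpy.linalg.svd(X, compute_uv=False)`), fits a straight line to the logarithms of the trailing values, and counts the leading values that lie above it by more than a chosen factor. The logarithmic scale, the `tail_start` index and the factor 2 are illustrative assumptions, not the procedure used for Figure 1; `estimate_rank` is a hypothetical name.

```python
import math


def estimate_rank(singular_values, tail_start, factor=2.0):
    """Count the leading singular values lying clearly above the straight
    line followed by the remaining, noise-dominated values.

    Assumptions (illustrative): the line is a least squares fit to log10
    of the values from index `tail_start` onward, and a value counts as
    "above the line" if it exceeds the line by more than `factor`.
    """
    xs = list(range(tail_start, len(singular_values)))
    ys = [math.log10(singular_values[k]) for k in xs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    p = 0
    for k, sv in enumerate(singular_values):
        # Stop at the first value that is not clearly above the fitted line.
        if sv > factor * 10 ** (intercept + slope * k):
            p += 1
        else:
            break
    return p


# Synthetic example: four strong values followed by a geometric noise tail.
sv = [50.0, 30.0, 20.0, 10.0] + [0.9 ** k for k in range(16)]
p = estimate_rank(sv, tail_start=4)
```

Such a count only suggests a candidate rank; factorizations of neighbouring ranks still have to be compared.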

Figure 1: a) The first 20 singular values of a matrix containing the 92 -spectra used in the analysis. b) An example of a factorization of rank one

Let us consider the simplest case, a factorization of rank 1. In this case $G$ is a column vector with as many elements as there are measured spectra, and $F$ is a row vector with as many elements as there are channels in the measurements. PMF approximates the matrix $X$ with the product $GF$, so that every row of $X$ should be approximately equal to $F$ (giving the shape of the peak) multiplied by the element of $G$ corresponding to that individual measurement. In this case the result of PMF is simple: the peak shape in $F$ is a weighted average of all spectra, and $G$ gives the relative intensities. Usually the spectra are more complicated and contain several basic components, each of which requires an increase in the rank of the factorization. Every factorization can be thought of as a sum of factorizations of rank one, $GF = \sum_{k=1}^{p} \mathbf{g}_k \mathbf{f}_k$, where $\mathbf{g}_k$ is the $k$th column of $G$ and $\mathbf{f}_k$ the $k$th row of $F$; i.e. one simply adds more rows to $F$ to represent new basic components and more columns to $G$ to give the corresponding weights.
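For rank 1 the weighted problem is small enough to solve by alternating updates: with $F$ fixed, each element of $G$ is a one-dimensional weighted least squares problem whose constrained minimum is the unconstrained one clipped at zero, and likewise for $F$ with $G$ fixed. The sketch below is a minimal illustration of this idea, not the algorithm of the PMF program; `rank_one_pmf` and the fixed iteration count are assumptions.

```python
def rank_one_pmf(X, S, iterations=200):
    """Fit X ~ G F with G an n x 1 column (list g) and F a 1 x m row
    (list f), both non-negative, minimizing
    sum_ij ((x_ij - g_i f_j) / s_ij)^2 by alternating exact updates.
    """
    n, m = len(X), len(X[0])
    # Start from the mean spectrum (clipped at zero) as the peak shape.
    f = [max(sum(X[i][j] for i in range(n)) / n, 0.0) for j in range(m)]
    g = [0.0] * n
    for _ in range(iterations):
        # Each g_i: weighted 1-D least squares, then clip at zero.
        for i in range(n):
            num = sum(X[i][j] * f[j] / S[i][j] ** 2 for j in range(m))
            den = sum(f[j] ** 2 / S[i][j] ** 2 for j in range(m))
            g[i] = max(num / den, 0.0) if den > 0 else 0.0
        # Each f_j: a weighted average of the spectra in channel j.
        for j in range(m):
            num = sum(X[i][j] * g[i] / S[i][j] ** 2 for i in range(n))
            den = sum(g[i] ** 2 / S[i][j] ** 2 for i in range(n))
            f[j] = max(num / den, 0.0) if den > 0 else 0.0
    # Remove the scale ambiguity (g*c, f/c) by scaling the profile peak to 1.
    peak = max(f)
    if peak > 0:
        f = [v / peak for v in f]
        g = [v * peak for v in g]
    return g, f


g_true = [1.0, 2.0, 3.0]
f_true = [0.0, 1.0, 4.0, 1.0, 0.0]
X = [[gi * fj for fj in f_true] for gi in g_true]
S = [[1.0] * len(f_true) for _ in g_true]
g, f = rank_one_pmf(X, S)
```

With noise-free data of exact rank 1 the product of the returned `g` and `f` reproduces `X`; with real spectra the channels with small $s_{ij}$ dominate the fit.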

2.3. Some complications

Given any factor matrices $G$ and $F$, new factor matrices can be formed by mathematical operations called rotations: for any invertible $p \times p$ matrix $T$, the pair $G' = GT$ and $F' = T^{-1}F$ gives the same product, $G'F' = GF$, and hence the same error $Q$. The possibility of rotations is one of the main reasons why PCA and FA are not readily applicable to the analysis of molecular spectra, and why the positivity constraint of PMF turns out to be useful. All rotations can be performed as a sequence of elementary operations, each of which contains one subtraction between columns of $G$ or rows of $F$. Since subtraction tends to produce negative elements, the positivity constraint decisively reduces the number of possible rotations or prevents them altogether. The results of PMF are thus generally less ambiguous and, since the profiles consist of positive peaks, easier to interpret.
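An elementary rotation, and the way positivity limits it, can be sketched with plain Python lists. The choice $T = I + \alpha E_{kl}$ (with $E_{kl}$ the matrix having a single 1 at position $(k, l)$) and the helper names `matmul`, `rotate` and `max_positive_alpha` are illustrative assumptions; how the PMF program itself handles rotations is not shown.

```python
def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rotate(G, F, k, l, alpha):
    """Apply T = I + alpha*E_kl: G' = G T, F' = T^-1 F, so that G'F' = GF.

    G T adds alpha times column k of G to column l; T^-1 = I - alpha*E_kl,
    so T^-1 F subtracts alpha times row l of F from row k (k != l).
    """
    G2 = [row[:] for row in G]
    for row in G2:
        row[l] += alpha * row[k]
    F2 = [row[:] for row in F]
    F2[k] = [a - alpha * b for a, b in zip(F[k], F[l])]
    return G2, F2


def max_positive_alpha(F, k, l):
    """Largest alpha >= 0 keeping row k of F' non-negative (F >= 0 assumed).

    For alpha >= 0 and G >= 0, G' stays non-negative automatically,
    so only the subtraction in F can violate the positivity constraint.
    """
    ratios = [a / b for a, b in zip(F[k], F[l]) if b > 0]
    return min(ratios) if ratios else float("inf")


G = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
F_separate = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # non-overlapping peaks
F_overlap = [[2.0, 2.0, 1.0, 1.0], [0.0, 1.0, 1.0, 0.0]]   # overlapping peaks

alpha_sep = max_positive_alpha(F_separate, 0, 1)  # no positive rotation allowed
alpha_ovl = max_positive_alpha(F_overlap, 0, 1)   # rotations up to alpha = 1 allowed
G2, F2 = rotate(G, F_overlap, 0, 1, alpha_ovl)
```

With separated peaks no positive rotation survives, whereas overlapping profiles leave a whole range $0 \le \alpha \le 1$ of solutions with identical $GF$, which is exactly the residual rotational freedom discussed next.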

Rotations are not always totally eliminated, however, and different rotations may yield different solutions. PMF provides several methods for eliminating rotations by setting additional restrictions, thus ensuring a physically meaningful solution. In some cases one must settle for a range of possible solutions. Rotations are not a deficiency of the program; they indicate only that the data do not contain enough information to derive a unique solution with the given model.

Velocity gradients make peaks appear in different channels of $X$. Since a single component cannot represent such a set of spectra, PMF represents them with several basic components: spectra with one shifted peak are approximated by several close-lying peaks in the matrix $F$, whose relative intensities change according to the gradient. In such cases the gradient can be deduced from this gradual change of the weights, and even the central velocity of each spectrum can be calculated from the factorization. On the other hand, because the model (fixed profiles with varying weights) cannot describe a shifting peak directly, the results must be interpreted with great care.
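One way to obtain the central velocities is to take the intensity-weighted mean velocity (first moment) of each reconstructed spectrum, i.e. of each row of $GF$. The sketch below assumes this definition and uses hypothetical channel velocities and two triangular profiles; the paper does not state which estimator was used, and `central_velocities` is a hypothetical name.

```python
def central_velocities(G, F, velocities):
    """Intensity-weighted mean velocity of each reconstructed spectrum.

    Row i of GF is the model spectrum at position i; its first moment
    sum_j v_j * x_ij / sum_j x_ij is taken as the central velocity.
    """
    p = len(F)
    result = []
    for g_row in G:
        model = [sum(g_row[k] * F[k][j] for k in range(p))
                 for j in range(len(velocities))]
        total = sum(model)
        if total > 0:
            result.append(sum(v * x for v, x in zip(velocities, model)) / total)
        else:
            result.append(float("nan"))  # no emission in the model spectrum
    return result


velocities = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical channel velocities (km/s)
F = [[1.0, 2.0, 1.0, 0.0, 0.0],         # component centred at 1 km/s
     [0.0, 0.0, 1.0, 2.0, 1.0]]         # component centred at 3 km/s
# Weights shifting gradually from the first component to the second.
G = [[1.0, 0.0], [3.0, 1.0], [1.0, 1.0], [0.0, 1.0]]
v = central_velocities(G, F, velocities)
```

The gradual change of the weights from the first to the second component translates into a smooth shift of the central velocity from 1 to 3 km/s across the four positions.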

3. Thumbprint Nebula

The Thumbprint Nebula is an isolated and highly symmetrical Bok globule situated in the Chamaeleon III region. The globule is in or near virial equilibrium and shows no signs of star formation. Some -measurements from the nebula were analyzed using PMF. The spectra and the channel maps are shown by Lehtinen et al. (1995).

Based on SVD, the maximum feasible rank of the factorization was found to be 4. To illustrate some features of PMF, we nevertheless present two factorizations, of rank 3 and rank 4. In each figure, the first frame shows the profiles of the basic components and the other frames show maps of the weights. The diameter of the dots is proportional to the weight, whose maximum was scaled to one separately for each factor.

Figure 2: The results of a factorization of rank 3. The first frame shows the three basic profiles and the other frames the maps of the weights of these

Figure 2 shows the results of a factorization of rank 3. Two of the calculated profiles are nearly Gaussian, while the component with the lowest radial velocity shows a blue wing. Owing to a limited rotational freedom, all profiles show some wrinkles. These could at least partly be smoothed away by suitable rotations, giving a more realistic representation.

Figure 3: The results of a factorization of rank 4. The first frame shows the four basic profiles and the other frames maps of the weights in an order of increasing radial velocity

A factorization of rank 4 is given in Figure 3. PMF uses the new component to represent the higher velocities more accurately, while the low-velocity component is unaltered. A clear change in velocity is seen in the weight maps of the first three factors, and in the western part of the cloud the strongest component has a well-defined intensity peak. The two components at the highest velocities originate from the dense material in the globule itself, while the two components at lower velocities originate from the diffuse material around the globule. The emission from the globule is split into two components because of a velocity gradient over the globule caused by its rotation (here a physical rotation of the cloud, not a rotation of the factor matrices). The rotation is also evident from Gaussian fits.

4. Conclusions

PMF can be a valuable tool when studying complex regions where several velocity components may be present. If these are blended together, channel maps do not necessarily reveal all components. PMF not only reveals the existence of hidden components but also gives their spectral shapes and spatial distributions. When individual spectra are too noisy to allow separate fitting of Gaussian components, PMF can help by fitting a number of spectra simultaneously. The possibility of rotations can sometimes make the interpretation of the results difficult. In extreme cases PMF can be used simply as a data compression tool, without assuming any deeper meaning for the results.


YERAC 94 Account
Wed Feb 22 19:11:39 GMT 1995