2MASS Spring 1999 Explanatory Supplement: Data Processing

3. Overview of Extended Source Processor

The last major subsystem to run in the 2MASS quasi-linear data reduction pipeline is the extended source processor, referred to as GALWORKS. The primary role of the processor is to characterize each detected source and decide which sources are "extended", that is, resolved with respect to the point spread function. Sources that are deemed "extended" are measured further (mostly photometry) and the information is output to a separate table. In addition to tabulated source information, a small "postage stamp" image is extracted for each extended source from the corresponding J, H and Ks atlas images. The source lists and image data are stored in the 2MASS extended source database; see flow schematic, Figure 3.

The extended source database contains several classes of "extended objects", including real galaxies, galactic nebulae and pieces of large angular-size sources, galactic H II regions, multiple stars (mostly double stars), artifacts (pieces of bright stars, meteor streaks, etc.) and faint (mostly point-like) sources with uncertain classifications. For extended sources, the ultimate goal of the 2MASS project is to produce a reliable catalog of real extended sources, predominantly galaxies. Additional ‘post-processing’ steps are therefore necessary to eliminate artifacts and confusing objects like double stars. In section 4, we discuss in detail how the star-galaxy separation process is performed. For the GALWORKS processor, the emphasis is placed primarily on completeness; that is, we want to comprehensively detect and identify extended sources (especially galaxies) brighter than the level-1 specification limits of K ~ 13.5, H ~ 14.3 and J ~ 15.0. Later, in the post-processing operations phase, the galaxy completeness is relaxed (but still within the level-1 specifications) in order to achieve the desired reliability in our galaxy catalog.

2MASS is an all-sky project that will acquire over 10 Tb of data over its lifetime. This places severe runtime restrictions on the pipeline reduction software; consequently, one important caveat is that most of the GALWORKS algorithms and flow structures were designed specifically to run as fast and as efficiently as possible.

By the time GALWORKS is run in the 2MASS pipeline, point sources have been fully measured with refined positions and photometry, band-merged, coordinate positions calibrated, coadd images (Atlas Images) constructed, and the time-dependent PSF characterized. GALWORKS, however, does generate PSF ridgelines (see section 2.4) on finer time scales depending on the number of sources available from the scan (i.e., the stellar number density), which are then used to parameterize sources and perform basic star-galaxy discrimination. The high-level steps that GALWORKS encompasses include: (1) removal of bright stars (and their associated features), (2) removal of large (>5’) cataloged galaxies, (3) atlas image background removal, (4) star counting and measurement of the confusion noise, (5) source parameterization and attribute measurements, (6) star-galaxy discrimination (discussed at length in the next section, 4), (7) refined photometric measurements, and (8) extraction; see flow schematic, Figure 4.

The background removal operation is a particularly crucial step since both star-galaxy discrimination and photometry rely upon accurate zeroing, smoothing and flattening of the image background. This operation is described in detail below (section 3.1). Steps 4-6 are designed to isolate ‘normal’ galaxies and other relatively high surface brightness extended sources. There are, however, other kinds of extended sources that 2MASS is capable of detecting, including bright Galactic young stellar objects (H II regions, T-Tauri stars, etc.; Figure 5), faint nebulae and low surface brightness galaxies (Figure 6). These objects tend to be very rare or constrained to relatively small angular-sized fields toward the Galactic plane (e.g., molecular clouds) and as such there are no set requirements for their detection completeness or reliability. A separate catalog of bright extended stars and faint LSB galaxies will be released at a later date. A description of the algorithm to detect stars with associated extended emission is given in Appendix ?, and the algorithm to detect low surface brightness galaxies in Appendix ?. The remainder of this paper will focus exclusively on the detection, characterization and extraction of ‘normal’ extended sources (mostly galaxies).

3.1 Atlas Image Background Removal

In the near-infrared, the background "sky" emission has structure at all size scales, primarily due to upper atmospheric aerosol and hydroxyl emission (the so-called "airglow" emission; cf. Ramsey et al. 1992, MNRAS, 259, 751). The OH emission is the dominant component of the J (1.3 μm) and H-band (1.7 μm) backgrounds, while thermal continuum emission comprises the bulk of the K (2.2 μm) background. The J and H images tend to have more background "structure", and at times of severe airglow the background can have high frequency features on scales of tens of arcseconds that can trigger false extended source detections. However, for the most part, the background variation in a given image (size 8.5’ × 17’) is smooth and can be modeled with a cubic polynomial. A third order polynomial is a good compromise between a planar fit (too stiff) and spline waves (too yielding). For extended sources, the primary objective of the 2MASS project is to find and characterize galaxies (and other extended objects) smaller than ~3’ in diameter. We therefore attempt to remove airglow features slightly larger than this limiting size scale to minimize random and systematic photometric error from non-zero background structure. For the case in which the airglow frequency is higher than we can adequately remove, the resultant photometry (particularly at H band) is severely compromised. These data are given a lower quality score and in some instances are scheduled for re-observation. At the least, the Poisson-derived uncertainties in the photometry are increased to more properly gauge the actual photometric quality of the data.

Using a least-squares technique, a cubic polynomial is iteratively fit with 3σ rejection to each line (of the 512 × 512 block, with 8 × 8 median filtering). The line solutions are used as input to the next step, where we fit a cubic polynomial to each column, thereby coupling the line and column background solutions. The three block solution images are combined with a (1/Δr) weighting scheme. Here Δr refers to the relative radial ("in-scan") difference between any two given block solutions from some reference point. There are three "in-scan" reference points: 256, 512 and 768. So, for example, combining the lower and central blocks at some point, Y’, gives the respective weights [1 / (256 - Y’)] and [1 / (512 - Y’)]. With this technique we are able to smoothly combine the three independent solutions per coadd image. Note, however, that the "boundary" solutions for the upper and lower blocks are better constrained near the center of the image due to the weighted addition of the central block solution image. Conversely, the background solutions are not as well determined at the upper (>900) and lower (<124) "in-scan" image extremes. The fitting schematic is illustrated in Figure 7. The 512 × 1024 coadd is represented by a thick-lined rectangle. As explained, cubic fits are applied to the lower half, 512 × [1:512] pixels, the upper half, 512 × [513:1024] pixels, and the central half, 512 × [257:768] pixels, where we have first resampled the data with an 8 × 8 median filter.
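As a rough illustration of the line fit described above, the following Python sketch fits a cubic with iterative 3σ rejection to a single line of a median-filtered block. It is not the GALWORKS code: the function name, the iteration cap and the convergence test are assumptions made for the example, and NumPy's polyfit stands in for whatever least-squares solver the pipeline uses.

```python
import numpy as np

def fit_cubic_clipped(x, y, nsigma=3.0, max_iter=10):
    """Least-squares cubic fit with iterative n-sigma rejection (illustrative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(y)                      # start from all usable points
    coeffs = np.zeros(4)
    for _ in range(max_iter):                  # max_iter is an assumed safety cap
        coeffs = np.polyfit(x[keep], y[keep], 3)
        resid = y - np.polyval(coeffs, x)
        sigma = np.std(resid[keep])            # scatter of the points currently kept
        new_keep = np.isfinite(y) & (np.abs(resid) <= nsigma * sigma)
        if np.array_equal(new_keep, keep):     # no change in the rejected set
            break
        keep = new_keep
    return coeffs, keep
```

The same routine could then be applied along each column of the line-solution image, which is one way to couple the two directions as the text describes.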

The background removal process is applied separately to the J, H, and K coadd images (512 × 1024 pixels each). Given the "cross-scan" size of one coadd image, a cubic polynomial, ax³ + bx² + cx + d, provides an effective model for smooth background variations larger than ~3’. In the "in-scan" direction the larger size (1024") allows a cubic fit to each half of the coadd (lower 512", upper 512"), and we also apply a fit to the "central" 512 × 512 pixels in order to smoothly ‘join’ the boundaries of the two background solution fits. The final 512 × 1024 solution fit is generated from a weighted average of each 512 × 512 block solution. The fitting procedure is preceded by an image "clean" operation. Stars and catalogued galaxies are masked from the image. Very bright stars (K < 6) require more complicated masking, including removal of their bright internal reflection halo, diffraction spikes, horizontal streaks, filter glints and persistence ghosts. Finally, in order to minimize contamination from faint stars and objects that escaped the masking procedure, we median filter the coadd with an 8 × 8 pixel filter (thus degrading the resolution of each pixel to 8" chunks).
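The masking and 8 × 8 median filtering step can be sketched as below. This is a minimal example under stated assumptions: the mask is a boolean array that is True for pixels covered by stars or catalogued galaxies, the image dimensions are exact multiples of the box size (true for 512 × 1024), and the function name is hypothetical.

```python
import numpy as np

def median_bin(image, mask, box=8):
    """Median of the unmasked pixels in each box x box cell (illustrative).

    A cell whose pixels are all masked comes back as NaN.
    """
    ny, nx = image.shape
    data = np.where(mask, np.nan, image.astype(float))   # hide masked pixels
    # Regroup so that the last axis holds the box*box pixels of one cell.
    cells = data.reshape(ny // box, box, nx // box, box).swapaxes(1, 2)
    cells = cells.reshape(ny // box, nx // box, box * box)
    return np.nanmedian(cells, axis=2)
```

Because the median ignores a minority of outlying pixels, a faint star that escaped masking has little effect on the cell value, which is the motivation given in the text.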

Representative performance of the background removal operation is shown in Figures 8-10. The image data come from a fairly typical ‘photometric’ Northern Hemisphere night, although the "airglow" emission was fairly severe during the period in which these data were acquired (see H-band, Fig 9). The figures show the raw image coadd, resultant background solution and residual (background subtracted) image. The gray-scale stretch ranges from -2σ to 5σ of the mean background level. The J and K raw images (Figs 8, 10) reveal fairly low level (smooth, but non-linear) background variations, while the corresponding residual images show very little (if any) background structure. However, airglow emission is much more prevalent in the H-band (Fig 9), with size scales smaller than ~1-2’, as evident in the residual image. It is this residual structure in the background (with amplitude >10% of the mean background noise) that can induce systematics in the photometry, parameterization (e.g., azimuthal ellipse fitting), and reliability. In the future we will apply higher order fits (e.g., using splines) in an attempt to fit out these airglow features. The projected success of this procedure is altogether unknown at this time.

3.2 Source Parameterization and Shape-Attribute Measurements

Preliminary flux estimates come from the point source processor, which uses a characteristic PSF to derive total fluxes (assuming a point-like flux distribution). These measures systematically underestimate the flux of extended sources. Hence, one of the first tasks for GALWORKS is to deduce the nature of a source using some simple radial profile attributes. The median radial shape, or "msh", is both easy to compute (i.e., fast runtime) and a robust discriminator between stars/double stars and galaxies (see section 4 for more details). Applying a threshold to the "msh" measure for each source (per band) eliminates a large fraction of the total number of sources that require more exhaustive testing for star-galaxy separation. It also provides a measure of the "extendedness" of a source: if the source is very likely a galaxy (large "msh" score) then its total flux is re-estimated using a fixed R=10" circular aperture.

Before the more time-consuming image attribute measurements are performed on each source (e.g., elliptical shape fitting; adaptive aperture photometry) it is necessary to perform additional star-galaxy separation tests, particularly when the stellar number density is very high (i.e., |glat| < 10 deg). Thresholds on the "sh", "wsh", "r1", and "r23" radial shape attributes (see section 4) are applied to eliminate most non-extended sources (namely stars and double stars) from the scan/tile source list. For high glat fields, the remaining sources (in a typical scan) are mostly real galaxies intermixed with a few double stars, one or two isolated stars and low S/N objects of uncertain nature. In quantitative terms, the reliability is from 50 to 80% at this juncture, and thus the star-galaxy separation process has reduced the ratio of stars to galaxies from 10:1 to approximately 1:1.

The orientation of disk spiral and spheroid elliptical galaxies is estimated using a 2-D ellipse fit to a single isophote, which is used to compute various forms of aperture photometry (e.g., Kron, isophotal, etc.) and symmetry parameters used for star-galaxy separation. Although galaxies can change orientation (e.g., ellipticity and position angle) with radius, the 2MASS pixel undersampling and runtime constraints limit measurements to one or two isophotes. Moreover, most 2MASS galaxies are small in size (<15"), so for our ~2" angular resolution multiple fits are impractical. It is to our advantage that in the near-infrared most galaxies appear to have relatively consistent orientations and axis ratios at different radii, so fitting one fiducial isophote is usually sufficient. To minimize the effect of PSF elongation, the fiducial isophote is fit at roughly a 3σ level: 20.09 mag/arcsec² at J, 19.34 mag/arcsec² at H and 18.55 mag/arcsec² at K. A cautionary note: like all isophotes used in 2MASS pipeline processing, the magnitudes are uncalibrated and may be adjusted by ~0.1 to 0.2 mag in the later calibration processing step. Consequently, the isophote at which the 2-D elliptical parameters are derived can vary from (in background noise units) ~2.6σ to 3.7σ, depending on the calibration correction.

The ellipse-fitting method was designed to run fast and to minimize confusion from nearby sources (i.e., stars) and correlated noise features that form ‘extended’ limbs and other ‘disconnected’ extended features. Three elliptical parameters are derived from the image isophote: axis ratio (b/a), position angle φ (standard reference frame, east of north), and a goodness of fit metric. The goodness of fit is defined as follows:

goodness of fit = Δr_semi / <r_semi>

where r_semi is the semi-major axis corresponding to some point along the isophote and a given (axis ratio, φ) solution, Δr_semi is the population standard deviation and <r_semi> is the population mean. That is to say, for a given solution ellipse (described by the axis ratio and position angle) the resultant shape of the distribution of r_semi along the isophote tells us the goodness of the fit. If the ellipse (b/a, φ) is perfectly matched to the isophote, the variance in r_semi is identically zero. If the match is poor, then the variance is large while the population mean can be large or small. Therefore, by minimizing the ratio of the standard deviation to the mean radius in the distribution, we arrive at the best-fit ellipse solution. In this fashion, the elliptical parameters were derived for each band.
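To make the minimization concrete, here is a small Python sketch that evaluates the ratio of the standard deviation to the mean of r_semi for a trial (b/a, angle) and picks the best pair by brute-force grid search. The grid search, the grid spacing and the function names are assumptions for illustration; the text does not say how GALWORKS searches the parameter space. For simplicity the angle here is measured counter-clockwise from the +x pixel axis rather than east of north.

```python
import numpy as np

RATIOS = np.arange(0.10, 1.001, 0.05)   # trial axis ratios (assumed grid)
ANGLES = np.arange(0.0, 180.0, 5.0)     # trial angles in degrees (assumed grid)

def ellipse_gof(dx, dy, axis_ratio, phi_deg):
    """std(r_semi) / mean(r_semi) for isophote points offset (dx, dy) from the center."""
    phi = np.radians(phi_deg)
    u = dx * np.cos(phi) + dy * np.sin(phi)     # coordinate along the major axis
    v = -dx * np.sin(phi) + dy * np.cos(phi)    # coordinate along the minor axis
    r_semi = np.hypot(u, v / axis_ratio)        # implied semi-major axis per point
    return np.std(r_semi) / np.mean(r_semi)     # population std (ddof=0) over mean

def fit_ellipse(dx, dy):
    """Return (goodness of fit, b/a, angle) minimizing the ratio over the grid."""
    return min((ellipse_gof(dx, dy, q, p), q, p) for q in RATIOS for p in ANGLES)
```

For points that lie exactly on an ellipse, the returned goodness of fit is essentially zero at the matching (b/a, angle) and grows as the trial shape departs from it.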

An additional fit was performed on the combined (J+H+K_s) "super" image. The "super" coadd represents the optimum signal to noise representation of the galaxy, assuming normal galaxy colors and minimal reddening. Accordingly, the derived "super" coadd ellipse serves as the "default" shape for cases in which the individual band flux is fainter than 14.4 mag at J, 13.9 mag at H and 13.5 mag at K, or the SNR of the galaxy is less than 5, based on the R=10" fixed circular aperture photometry. For the case in which the derived semi-major axis is less than 5" or greater than 70", the source is assumed to be round and the parameters are set accordingly. For the case in which the derived axial ratio is less than 0.10, the ellipse fit parameters are set to the corresponding fit from the "super" coadd. Finally, the "super" coadd values are also used when the individual band fit is not possible for one reason or another (e.g., when masked pixels are present within 1" of the peak pixel).
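The fallback rules in this paragraph can be summarized in a short decision function. The sketch below is only one reading of the text: the order in which the rules are checked, the representation of a "round" source (b/a = 1, angle 0), and all names (EllipseFit, choose_ellipse, BAND_MAG_LIMIT) are assumptions rather than anything taken from the pipeline.

```python
from dataclasses import dataclass
from typing import Optional

# Per-band magnitude limits quoted in the text (R=10" fixed aperture photometry).
BAND_MAG_LIMIT = {"J": 14.4, "H": 13.9, "K": 13.5}

@dataclass
class EllipseFit:
    axis_ratio: float   # b/a
    phi_deg: float      # position angle in degrees
    semi_major: float   # arcsec

def choose_ellipse(band: str, mag: float, snr: float,
                   band_fit: Optional[EllipseFit],
                   super_fit: EllipseFit) -> EllipseFit:
    """Pick the ellipse used for one band (illustrative rule ordering)."""
    if band_fit is None:                                  # band fit not possible
        fit = super_fit
    elif mag > BAND_MAG_LIMIT[band] or snr < 5.0:         # too faint or low SNR
        fit = super_fit
    elif band_fit.axis_ratio < 0.10:                      # implausibly thin fit
        fit = super_fit
    else:
        fit = band_fit
    if fit.semi_major < 5.0 or fit.semi_major > 70.0:     # treat as round
        return EllipseFit(1.0, 0.0, fit.semi_major)
    return fit
```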

A final note regarding the ellipse fitting operation relates to stellar masking. Bright galaxies (K < 12.5) with large inclination (>40 deg) are apt to produce multiple point source detections strung across the disk of the galaxy, falsely induced by the sharp intensity gradient of the disk with respect to the sky. Consequently, we do not perform any stellar masking or subtraction specific to the ellipse fitting step, except when the stellar number density is high (>2000 stars deg^-2 for K < 14), in which case it is more favorable to mask out nearby stars given the high probability of contamination. This ellipse-fitting detail should not be confused with the general GALWORKS procedure of near-neighbor masking prior to photometry or symmetry measurements.

Once the general orientation of the galaxy is derived, various ‘symmetry’ measures are performed. The radial/azimuthal symmetry of an object is a good indicator of its true nature. Double stars appear asymmetric across the minor axis -- that is to say, with the ellipse centered on the primary component of the double star, the resultant profile is highly asymmetric comparing one side of the major axis to the other. This is also generally the case for triple stars, although there are configurations of ≥ 3 stars in which the alignment is symmetric across both the minor and major axes.

One way to measure the "symmetry" of an object is to perform a bi-symmetric spatial autocorrelation. Divide the object across the minor axis. The integrated flux in each half gives the gross bi-symmetric flux ratio. Rotate one side 180 degrees with respect to the other and multiply the resultant pieces. The autocorrelation is then normalized by the original galaxy (squared). To minimize the effects of noise and the shape of the PSF, very low SNR points (< 1.5) and the inner 3" core are avoided in this procedure. In addition to the autocorrelation, we also compute a bi-symmetric cross-correlation reduced chi-square,

where p and p* are the points 180 deg apart that are being compared, N is the number of points being compared, and sigma is the pixel noise. This χ² measure has the advantages that its distribution is well understood statistically with tabulated confidence ranges, there are no asymmetries in the distribution like those introduced in a ratio comparison, and it is insensitive to low SNR or data points near zero.
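A bi-symmetric reduced chi-square of this kind can be sketched in Python as follows. The exact normalization used by GALWORKS is not recoverable from the text, so the form below is an assumption: it averages (p - p*)² over the compared pixels and divides by 2σ², the variance of the difference of two independent pixels each with noise σ. The function name, the choice to exclude only the inner 3" core, and the use of a stamp centered on the source (1" pixels, odd dimensions) are likewise illustrative.

```python
import numpy as np

def bisymmetric_chi2(stamp, sigma, core_radius=3.0):
    """Mean of (p - p*)^2 / (2 sigma^2) over pixel pairs 180 deg apart (assumed form)."""
    ny, nx = stamp.shape
    yy, xx = np.indices(stamp.shape)
    # Radius of each pixel from the stamp center, in pixels (1 pixel = 1 arcsec here).
    r = np.hypot(xx - (nx - 1) / 2.0, yy - (ny - 1) / 2.0)
    rotated = stamp[::-1, ::-1]          # the stamp rotated by 180 deg about its center
    use = r > core_radius                # skip the PSF-dominated core
    diff = stamp[use] - rotated[use]
    return float(np.mean(diff**2) / (2.0 * sigma**2))
```

A symmetric galaxy gives a value near or below 1 (pure noise), while a double star centered on its primary gives a much larger value because the secondary has no counterpart on the opposite side.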

Finally, we perform an ellipse fit to the 5σ isophote (per band and "super" coadd). Comparison between the default 3σ and 5σ fit parameters may indicate either a real asymmetry due to stellar contamination or orientation changes as a function of radius. Likewise, the goodness of fit metric can indicate either problems with the fit (due to stellar contamination or noise in the case of faint sources) or real asymmetry in the object.

3.3 Photometry

Given the assorted shapes, sizes and surface brightnesses that galaxies exhibit in the near-infrared, a correspondingly diverse array of apertures is used to compute the integrated fluxes. Contamination from stars within or near the aperture boundary is minimized with pixel masking -- but still remains significant when the confusion noise is high. Flux from masked pixels is ‘recovered’ with isophotal substitution, where the mean value of the elliptical isophote (based on the elliptical shape parameters, b/a and φ) replaces the given masked pixel that the isophote passes through. More detailed discussion of stellar contamination and rectification thereof in 2MASS galaxy photometry can be found in Jarrett et al. (1996).
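Isophotal substitution can be illustrated with the short Python sketch below. It replaces each masked pixel by the mean of the unmasked pixels lying on roughly the same elliptical isophote. The ring half-width of 0.5 pixel, the angle convention (counter-clockwise from the +x pixel axis), and the function name are assumptions for the example, not details from the pipeline.

```python
import numpy as np

def isophotal_substitute(image, mask, x0, y0, axis_ratio, phi_deg, width=0.5):
    """Fill masked pixels with the mean of unmasked pixels on the same ellipse (illustrative)."""
    yy, xx = np.indices(image.shape)
    phi = np.radians(phi_deg)
    dx, dy = xx - x0, yy - y0
    u = dx * np.cos(phi) + dy * np.sin(phi)       # along the major axis
    v = -dx * np.sin(phi) + dy * np.cos(phi)      # along the minor axis
    r_ell = np.hypot(u, v / axis_ratio)           # elliptical radius of every pixel
    out = image.astype(float).copy()
    for j, i in zip(*np.nonzero(mask)):
        # Unmasked pixels whose elliptical radius is within `width` of this pixel's.
        ring = (~mask) & (np.abs(r_ell - r_ell[j, i]) <= width)
        if ring.any():
            out[j, i] = image[ring].mean()
    return out
```

Here `mask` is a boolean array that is True where a star was masked; pixels with no unmasked neighbors on their isophote are left unchanged.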

The simplest, and therefore most robust, measures come from fixed circular apertures. The set of fixed circular apertures includes the following radii: 5, 7, 10, 15, 20, 25, 30, 40, 50, 60, and 70", centered on the peak pixel based on the J-band image. We report both the integrated flux within the aperture (with fractional pixel boundaries) and the estimated uncertainty in the integrated flux. The magnitude uncertainty is primarily based upon the measured noise in the coadd, which includes both the read-noise component and background Poisson component, as well as the confusion noise component (only relevant when the stellar source density is high). The magnitude uncertainty, however, does not incorporate real errors due to source contamination, background gradients (e.g., airglow ridges with a higher spatial frequency than the background removal process can handle; see section 3.1), zero-point calibration error, and uncertainties in the adaptive apertures (e.g., isophotal photometry, see below). A more detailed discussion of the 2MASS galaxy photometry error tree can be found in Appendix A. Contamination or confusion flags are also attached to each flux measurement (see Appendix B).
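A fixed circular aperture sum with fractional pixel boundaries can be approximated as in the Python sketch below. The sub-pixel sampling used to estimate the fraction of each boundary pixel inside the aperture is an assumed technique, chosen for brevity; the text does not say how the pipeline computes the fractions. The function name and the `subsample` parameter are hypothetical.

```python
import numpy as np

# Fixed aperture radii from the text, in arcsec (1" coadd pixels).
RADII = (5, 7, 10, 15, 20, 25, 30, 40, 50, 60, 70)

def aperture_flux(image, x0, y0, radius, subsample=5):
    """Sum of image within a circle, weighting edge pixels by their covered fraction."""
    ny, nx = image.shape
    # Offsets of the sub-pixel sample points relative to each pixel center.
    offs = (np.arange(subsample) + 0.5) / subsample - 0.5
    yy, xx = np.mgrid[0:ny, 0:nx]
    frac = np.zeros(image.shape)
    for oy in offs:
        for ox in offs:
            frac += np.hypot(xx + ox - x0, yy + oy - y0) <= radius
    frac /= subsample**2                 # fraction of each pixel inside the aperture
    return float(np.sum(image * frac))
```

On a uniform image of ones, the result for R=10 comes out close to the circle area π × 10², which is a quick sanity check on the edge weighting.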

For most galaxies in the 2MASS catalog, small fixed circular apertures give adequate "total" flux measurements. In particular, the R=7" aperture appears to be the best match to the combined effects of the 2MASS undersampling and PSF elongation, to the H and K background noise, and to the size of galaxies fainter than K~13 mag.

Adaptive aperture photometry includes isophotal and Kron metrics. The isophotal measurements are set at the 20 mag per arcsec² isophote at K and the 21 mag per arcsec² at J, using both circular and elliptically shape-fit apertures. Kron aperture photometry (Kron 1980) employs a method in which the aperture is controlled/adapted to the first image moment radius. The Kron radius, which is frequently used in galaxy photometry as a "total" measure of the integrated flux (cf. Koo 1986), turns out to roughly correspond to the 20 mag per arcsec² isophotal radius under typical observing conditions. The minimum radius is set at R=7" due to the rapidly increasing (PSF shape and background noise driven) uncertainty in the isophotal or Kron radial measurement for radii smaller than this limit.
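The first image moment radius that drives the Kron aperture can be sketched as follows. The intensity-weighted mean radius is the standard first-moment definition; the multiplicative `scale` applied to it is a hypothetical parameter (the text does not give the factor used by 2MASS), while the 7" floor comes from the paragraph above. Function names are illustrative.

```python
import numpy as np

def first_moment_radius(image, x0, y0):
    """Intensity-weighted mean radius about (x0, y0), in pixels."""
    yy, xx = np.indices(image.shape)
    r = np.hypot(xx - x0, yy - y0)
    flux = np.clip(image, 0.0, None)     # ignore negative (noise) pixels
    return float(np.sum(r * flux) / np.sum(flux))

def kron_radius(image, x0, y0, scale=2.5, r_min=7.0):
    """Aperture radius adapted to the first moment, with the R=7" minimum.

    `scale` is an assumed value, not taken from the 2MASS documentation.
    """
    return max(scale * first_moment_radius(image, x0, y0), r_min)
```

For an exponential profile with scale length h the first moment radius is 2h, so a galaxy with h = 3" gives a first moment of about 6", while a nearly point-like source falls back to the 7" minimum.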

For purposes of computing colors, two classes of adaptive photometry are carried out: individual and fiducial. ‘Individual’ photometry refers to the use of a different adapted aperture per band, which is useful for single-band limited studies. However, the real power of 2MASS data is having simultaneous J-K, J-H and H-K colors. Colors require a consistent aperture size and shape for all three bands, based on either the J or K isophotes, respectively referred to as the "J fiducial" and "K fiducial" photometry. For the brighter galaxies in the catalog, K < 13 mag, the "K" fiducial isophotal elliptical aperture photometry appears to give the most precise measurement (based on repeatability tests), but errors in the ellipse fit to the isophote (see section 3.2) result in an unknown uncertainty in the measurement (see for example, Appendix A). The adaptive circular apertures reduce some of that uncertainty, but do increase the overall noise due to additional sky noise within the non-optimized aperture – thus, resulting in a less precise measurement.

Additional flux measures include the central surface brightness (peak pixel flux) and the "core" surface brightness (integrated flux within 5" radius of peak). Finally, a "system" measurement is carried out such that no stellar masking is performed, nor any masking of flux from neighboring galaxies. The idea is that the "system" flux indicates the total flux in and about a galaxy, so it will include the total light in closely interacting systems. A set of contamination flags accompanies the system measurements: one indicating stellar contamination and the other neighboring galaxy ‘contamination’.

3.4 Source Positions

In addition to the coordinate position based on the PSF-fit operation, two additional "extended source" positions are computed. The first is based upon the peak pixel from the J-band image. The second is based upon the intensity-weighted centroid of the J+H+K_s "super" coadd image. The absolute precision of the peak-pixel coordinate is limited by the 2" resolution and convolution method used to construct/resample "coadd" images (1" pixels) from raw frames (with 2" pixels). These positions possess an absolute uncertainty ~0.5". The "super" centroid coordinate position is more precise since it applies a 2-D centroid to higher SNR data (the "super" image possesses effectively sqrt(2) lower noise relative to a single band image). The estimated uncertainty of the "super" centroid position is ~0.3 to 0.5" for normal surface brightness galaxies.
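An intensity-weighted centroid of the kind used for the "super" coadd position can be written in a few lines of Python. This is a generic sketch: the clipping of negative pixels and the function name are assumptions, and the real pipeline presumably restricts the sum to pixels belonging to the source.

```python
import numpy as np

def weighted_centroid(image):
    """Return the intensity-weighted (x, y) centroid of a background-subtracted stamp."""
    yy, xx = np.indices(image.shape)
    w = np.clip(image, 0.0, None)        # negative pixels carry no weight
    total = w.sum()
    return float((xx * w).sum() / total), float((yy * w).sum() / total)
```

Unlike the peak-pixel position, which is quantized to the 1" coadd grid, the centroid can land between pixels, which is why it can be more precise for well-detected galaxies.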

3.5 Source Extraction

Sources that pass the star-galaxy discrimination tests and have an integrated flux brighter than the magnitude limits (J = 15.5, H = 14.8, K = 14.3) minus the confusion noise (which is zero mag for low stellar number density fields) are extracted to the 2MASS extended source database. The source information includes coordinate positions, elliptical shape parameters, star-galaxy discrimination parametric scores, photometry and various flags indicating stellar contamination, cross-identification (with known galaxy catalogs derived from the NASA Extragalactic Database) and processing status. A list of the ‘standard’ extended source parameters can be found in Appendix B.
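The flux cut can be expressed as a simple comparison, sketched below in Python. The text does not say whether a source must pass in one band or in all bands; this example assumes that passing in any single band is sufficient, and the names are hypothetical.

```python
# Extraction magnitude limits quoted in the text.
MAG_LIMITS = {"J": 15.5, "H": 14.8, "K": 14.3}

def passes_flux_cut(mags, confusion_mag=0.0):
    """True if the source is brighter than (limit - confusion noise) in any band.

    `mags` maps band name to integrated magnitude; the any-band rule is an assumption.
    """
    return any(mags[band] < MAG_LIMITS[band] - confusion_mag for band in mags)
```

With zero confusion noise a source at 15.0 mag in all bands passes through the J limit; with 0.6 mag of confusion noise the J limit brightens to 14.9 and the same source is rejected.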

For each extended source, a small "postage stamp" image is clipped from the larger Atlas (coadd) Image. The stamp images are in fact data cubes, with each band (J, H and Ks) forming a plane of the cube. The image/cube size is set by the final size of the galaxy, taken as the larger of the Kron and isophotal radii, with a minimum diameter of 21" and a maximum diameter of 101". The dynamic image size (and boundary conditions) reflects the real-world limitations of the finite storage capability of the 2MASS database. The stamp images provide all of the information that is needed to extract image information (e.g., photometry, positions, shapes, morphology, etc.), except that the larger-area environment was used to fit and remove a local background (section 3.1) and to evaluate contamination from large-scale structure objects. Finally, the stamp images include (within the header) the photometric zero-point calibration values.