2MASS Spring 1999 Explanatory Supplement: Data Processing

6. Extended Source Catalog & Basic Results

The incremental 2MASS public data release takes the form of an extended source catalog derived from the pipeline working survey database (WSD). The catalog is constructed to meet a pre-determined set of general requirements, including sensitivity, completeness and reliability limits, designed to achieve the science goals of the survey. The first major public release (spring 1999) of the 2MASS extended source catalog will contain some 80,000 sources. A smaller "sampler" catalog was released in the fall of 1998 (see First 2MASS Public Release Extended Source Catalog and the 2MASS Data Sampler).

The requirements are germane to regions of the sky where confusion is not a major impediment, corresponding to a stellar number density less than ~1200 stars deg^-2 brighter than 13.5 mag at K -- which is most of the observable sky (see Figure C.1). The key level-1 requirements for unconfused regions are: the (differential) completeness limit: >90% for sensitivity limits of 15.0 mag at J, 14.2 mag at H and 13.5 mag at K; the reliability limit: >98% for the same sensitivity limits; and the photometric limit: ~13.2 mag at K (SNR 10:1), corresponding to 0.5 mag brighter than the survey limit. The reliability requirement is sufficiently difficult to achieve that special care must be taken to eliminate high-probability contaminant (false) sources from the catalog, including those in regions of close proximity to bright stars and transient events.

For more confused regions, 1300 to 4000 stars deg^-2 brighter than 14th mag at K, corresponding to galactic latitudes between 5-10° and 20° (see Figure C.1), the survey requirements are relaxed to a minimum of 80% internal completeness. Finally, for the highest stellar density fields, >4000 stars deg^-2 brighter than 14th mag at K (or |glat| < 5°), there are no requirements. In practice, real extended sources are identified in regions with stellar number densities as high as 10000 stars deg^-2 brighter than 14th mag at K, but at higher densities stellar source confusion overwhelms any ability to positively identify galaxies, and the reliability dives to zero.

Below we describe the basic steps in catalog generation, followed by representative tables and summary figures of the expected completeness and reliability of the catalog, as well as other basic properties of the extended sources.

6.1 Catalog Generation

Converting the 2MASS working survey database into final products for public use involves a number of steps: (1) artifact and false source removal, (2) consolidation or elimination of duplicate observations, (3) calibration adjustments, (4) supplementary contamination flagging, (5) catalog source selections, (6) catalog source associations, (7) star-galaxy classification scoring, and finally, (8) verification or validation of catalog and image products. A detailed explanation of these operations is given in the impending release of the 2MASS Explanatory Supplement, and can also be found at Recipe For Making An Extended Source Catalog For The First Release. For the extended source catalog, the most important steps are artifact removal, classification and verification.

Image artifacts, which induce false source detections, are produced by bright stars and transient phenomena such as meteor streaks (cf. section 5.3 and Appendix H, Data Anomalies and Artifacts; see also Data Artifacts: Overview of Known Artifacts and Data Artifacts: General Overview). For bright stars, the removal process is straightforward if the position and approximate brightness of the star are known. The predicted locations of diffraction spikes, halo emission, horizontal stripes, glints and ghosts may be masked accordingly. In the "scan" or "tile" pipeline reduction process, only bright stars within the scan itself have this kind of masking applied. For bright stars located in an adjacent scan or on the edge of a scan, artifacts may be present within the scan being processed, and no pixel masking is possible if the location of the bright star is unknown. The working survey database, however, will usually contain the necessary information to catch these artifacts after the pipeline reduction process.

Identification and removal of meteor streaks is a more complicated matter. Their footprint ‘signature’ mimics real astronomical sources (e.g., edge-on galaxies), while their transient nature defies any prediction of their coming and going. It is possible, however, to identify highly probable meteor streaks using the working survey database to find position-correlated point sources (and to a lesser extent, extended sources) with no optically-selected counterparts. Another method is to use sources detected in only one out of 5 or 6 apparitions. Recall that the 2MASS scanning technique images each piece of sky a total of six times in a time period of less than 10 seconds, with each "frame" slightly dithered in position to minimize the deleterious effects of the large 2MASS pixel scale; transient phenomena such as meteor streaks will appear in only one or two frames at most. A meteor streak will induce "false" sources along the trajectory of the streak, increasing the number density of sources in a small vectored area, and these sources have no counterparts in other all-sky surveys. Once these regions of highly probable meteor streaks are identified, all sources (including extended ones) within some distance of the streak are either eliminated from the catalog or flagged as probable artifact sources.
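The apparition test described above can be sketched in a few lines. This is only an illustration, not the pipeline code: the function name, the input layout (one boolean per frame covering the source position) and the default cutoff of two frames are assumptions made for this example, the cutoff being taken from the statement that transients appear in one or two frames at most.

```python
def is_probable_transient(frame_detections, max_frames=2):
    """Return True if a source was seen in at most `max_frames` frames.

    frame_detections: one boolean per frame that covered the source
    position (up to six for the scanning pattern described in the text).
    The name and the default cutoff are hypothetical choices.
    """
    n_detected = sum(1 for seen in frame_detections if seen)
    # A source with zero detections is not a source at all, so require
    # at least one detection before calling it a transient candidate.
    return 0 < n_detected <= max_frames


# A streak caught in a single frame versus a source seen in all six.
streak = [True, False, False, False, False, False]
galaxy = [True] * 6
print(is_probable_transient(streak), is_probable_transient(galaxy))
```

A real implementation would also need the spatial test (a line of such sources along a common trajectory) before flagging a region as a streak.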

The artifact removal process greatly improves the reliability of the point and extended source catalogs, but the >98% requirement for extended sources necessitates additional pruning to eliminate double and triple stars. A simple but powerful set of star-galaxy pruning tests, or "separation nodes", uses the "sh", "wsh" and "r23" parameters (see sections 4.1 & 4.2 for details). Here we apply an oblique threshold in the parameter vs. integrated mag space (cf. Figure 24). A source must ‘pass’ this threshold in all three tests or nodes to be classified as an extended source (otherwise it is classified as non-extended). A combined "score", referred to as the "E" score, may be created by performing a weighted average of the J, H and Ks classification measures; note, however, that this will tend to favor the J-band classification, since the sensitivity at J is superior to that of H and K for typical galaxy colors. The weights correspond to the signal to noise ratio as measured in the separate bands using the fixed circular R=7" aperture photometry (which is the most robust flux measure). In Figure 6-0, upper panel, the "E" score for ~2000 extended source candidates from the WSD is shown as a function of the integrated J mag. Real galaxies are denoted by filled circles, double stars by triangles and higher multiples of stars by cross symbols. Galaxies cluster in two places, either at a score of 1.0 or around 1.1 to 1.4 at the faint end (J > 14^th, K < 13^th). The latter clustering is due to the weighted-average nature of the scores: for each band separately the score is either 1 or 2, so with weighted averaging, where the weight corresponds to the SNR of the source, the score value jumps from 1 to some intermediate value like 1.3. False galaxies are predominantly located at 2.0, with clustering around 1.5 to 1.8. Hence, thresholding between 1.3 and 1.5 eliminates most false detections while retaining most real galaxies. In fact, the optimum "E" score dividing extended sources from point sources, balancing differential completeness against reliability, appears to be ~1.4 for each band, as demonstrated in Figure 6-1A.
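The SNR-weighted averaging described above can be illustrated with a short sketch. The function names, the dictionary layout and the choice of "less than or equal to" at the threshold are assumptions for this example; only the per-band scores of 1 or 2, the SNR weights and the ~1.4 threshold come from the text.

```python
def combined_score(band_scores, band_snr):
    """SNR-weighted average of per-band scores (1 = extended, 2 = point-like).

    band_scores, band_snr: dicts keyed by band name, e.g. "J", "H", "K".
    Bands with no positive SNR are skipped (an assumption of this sketch).
    """
    bands = [b for b in band_scores if band_snr.get(b, 0.0) > 0.0]
    if not bands:
        raise ValueError("no band with positive SNR")
    total_weight = sum(band_snr[b] for b in bands)
    return sum(band_scores[b] * band_snr[b] for b in bands) / total_weight


def passes_threshold(score, threshold=1.4):
    """True if the combined score falls on the extended side of the cut."""
    return score <= threshold


# Extended in J and H, point-like in the lower-SNR K band:
# (1*12 + 1*8 + 2*5) / (12 + 8 + 5) = 30 / 25 = 1.2
score = combined_score({"J": 1, "H": 1, "K": 2}, {"J": 12.0, "H": 8.0, "K": 5.0})
print(round(score, 3), passes_threshold(score))
```

The example shows how a single discrepant band with modest SNR moves the score from 1 to an intermediate value, which is the faint-end clustering noted above.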

The "E" score is designed to select both galaxies and Galactic extended sources (e.g., planetary nebulae) at a reliability rate of >95%. Note, however, that even more accurate star-galaxy classification is needed in order to meet the level-1 reliability specifications. Given the large parameter space that is needed for star-galaxy separation (see section 4), well beyond the three parameters used in the "E" or "extended score", we employ a decision tree algorithm (section 4.4) to generate two distinct classifications: galaxy and non-extended (i.e., point-like objects and stars). This new score, referred to as the "G" score, is optimized to select galaxies (i.e., sources beyond the confines of the Milky Way). The decision tree classifier is run on each band (J, H & K_s) separately, thus generating three independent measures of the "galaxyness". A combined "score" is derived from the weighted average of the three classification measures; the weights correspond to the signal to noise ratio as measured in the separate bands using the fixed circular R=7" aperture. Analogous to the "E" score, the "G" or "galaxy score" has values that range from 1 to 2, with unity being the most probable score of an extended object and 2 being the most probable value of a point source (see Figure 6-0, lower panel). The "optimum" score appears to be ~1.3 to 1.4 (see Figure 6-1B) for both low and moderate stellar number density fields.

The final catalog generation step is to test and validate the catalog and to confirm that the level-1 requirements are satisfied. The completeness and reliability are computed directly from ‘training’ sets or from data sets within the catalog that have been visually inspected (using both 2MASS image data and independent data, such as the Digital Sky Survey) and classified accordingly. The sensitivity limits (including completeness) are derived from the differential source counts. Plots of these basic measurements are given in the next section below.

6.2 Basic Results

Most of the following information is derived from a sample of ~16,000 candidate extended sources coming from five separate nights of data taken in November and December of 1997, and January of 1998. Each source has been visually inspected using the 2MASS image data and the Digital Sky Survey (DSS) in order to assign a "classification": (1) galaxy or extended object, (2) isolated star, (3) double star, (4) triple star, (5) artifact, or (6) unknown. Here "unknown" refers to our inability to distinguish one class from another for the object. In this way we construct a "training set" with which we may test the automated classification schemes (e.g., the decision tree classifier) and verify the quality of the 2MASS extended source catalog. In addition to visual inspection, cross-referencing with previously catalogued galaxies (obtained from the NASA Extragalactic Database) is used to further refine the classification of the "training set". Finally, we have obtained both higher resolution imaging data and spectroscopic redshift information for a number of fields scattered across the sky to refine and check the reliability of the "training set" itself. We also show results for ~80,000 sources in the first 2MASS extended source catalog.

6.2.1 Internal Completeness and Reliability

The probability of a source being extended is derived from two separate "scores", the "E" score and the "G" score (see section 6.1). It is the "G" score that is relevant to the 2MASS level-1 requirements, which translates "extended source" to mean a galaxy. The following results are derived from application of a "G" score threshold equal to 1.4, the optimum balance between completeness and reliability.

The internal completeness is defined to be the ratio of the total number of real (verified) galaxies that pass the "G" threshold to the total number of real (verified) galaxies detected and extracted with 2MASS. The reliability is defined to be the ratio of the total number of real (verified) galaxies that pass the "G" threshold to the sum of the total number of "false" galaxies (e.g., double stars) and real galaxies that pass the "G" threshold. The completeness and reliability computations are segregated into flux bins, with a typical bin size of 0.5 mag. The completeness and reliability results corresponding to fields with low stellar number density, <1200 stars deg^-2 brighter than 14th mag at K, are shown in Figure 6-2. The first set (6-2a) shows the results with no restriction on the radial size of the extended source candidates, while the second set (6-2b) shows the results for sources satisfying the criterion: "sh" value (Eq. 1) > 0.5". The level-1 requirements include the "sh" > 0.5" boundary condition, as well as the flux limits of J < 15 mag, H < 14.2 mag and K < 13.5 mag. Both the completeness and reliability range from 97 to 100% up to these flux levels.
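The two definitions above translate directly into a per-bin tally. The sketch below is illustrative only: the function name, the tuple layout of the input and the treatment of empty bins (returning None) are assumptions, while the 1.4 threshold and 0.5 mag bin width are the values quoted in the text.

```python
from collections import defaultdict


def completeness_reliability(sources, threshold=1.4, bin_width=0.5):
    """Tabulate completeness and reliability per magnitude bin.

    sources: iterable of (mag, g_score, is_real_galaxy) tuples taken from
    a visually verified training set (hypothetical layout).
    Returns {bin_lower_edge: (completeness, reliability)}.
    """
    # Per bin: [real galaxies, real galaxies passing, false sources passing]
    tallies = defaultdict(lambda: [0, 0, 0])
    for mag, g_score, is_real in sources:
        edge = bin_width * (mag // bin_width)
        passed = g_score <= threshold
        tally = tallies[edge]
        if is_real:
            tally[0] += 1
            if passed:
                tally[1] += 1
        elif passed:
            tally[2] += 1

    result = {}
    for edge, (real_total, real_pass, false_pass) in sorted(tallies.items()):
        n_pass = real_pass + false_pass
        completeness = real_pass / real_total if real_total else None
        reliability = real_pass / n_pass if n_pass else None
        result[edge] = (completeness, reliability)
    return result


training = [
    (13.1, 1.0, True), (13.2, 1.0, True), (13.3, 1.8, True),  # one real galaxy missed
    (13.4, 1.2, False), (13.4, 2.0, False),                   # one double star leaks in
]
print(completeness_reliability(training))
```

In this toy bin, two of three real galaxies pass (completeness 2/3) and two of the three passing sources are real (reliability 2/3).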

At higher stellar number density, >1200 (Fig. 6-3) and >4000 (Fig. 6-4) stars deg^-2 brighter than 14th mag at K, the completeness and reliability range from 80 to 100%, with typical values around 90%. Here double and, in particular, triple+ stars are the primary contaminant. For number densities greater than 10,000 stars deg^-2 brighter than 14th mag at K, confusion from foreground stars completely overwhelms any reliable detection and extraction of real galaxies. Finally, note that both measures roll off 0.5 to 1 mag brighter than in the low stellar number density case, again due to significant source confusion resulting in reduced sensitivity.

6.2.2 Source Counts

The low density fields comprise an area of some 200 square degrees containing mostly field galaxies, but also a few nearby clusters (e.g., Abell 262 and parts of the Virgo cluster); hence, the counts are representative of the field. The differential galaxy counts, in units of sources mag^-1 deg^-2, are given in Figure 6-5 (low density). The completeness limits are evidently ~15.2, 14.4, and 14.0, for J, H and K, respectively, with the K limits driven primarily by detections at J-band (for normal galaxy colors of J-K ~ 1.1). For comparison, galaxy counts coming from the deep (but limited area) K-band surveys of Glazebrook et al. (1994) and Gardner et al. (1997) are also included in Figure 6-5. The 2MASS counts are in close correspondence with the Glazebrook counts for K < 13.5, but at the faint end the 2MASS counts are relatively more abundant. The latter effect may be due to flux overestimation, where intrinsically faint (but most abundant) sources scatter into the faintest 2MASS flux bins.
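Differential counts in sources mag^-1 deg^-2 amount to a histogram normalized by bin width and survey area. The following sketch shows the arithmetic; the function name and the 0.5 mag default bin width are assumptions for illustration, not values taken from the counts figures.

```python
def differential_counts(mags, area_sq_deg, bin_width=0.5):
    """Return {bin_lower_edge: counts per mag per square degree}.

    mags: integrated magnitudes of the galaxies in one band.
    area_sq_deg: sky area covered by the sample, in square degrees.
    """
    raw = {}
    for mag in mags:
        edge = bin_width * (mag // bin_width)
        raw[edge] = raw.get(edge, 0) + 1
    # Normalize by both the bin width (mag) and the area (deg^2).
    return {edge: n / (bin_width * area_sq_deg) for edge, n in sorted(raw.items())}


# Three galaxies over 2 square degrees:
# bin 13.0 holds two sources -> 2 / (0.5 * 2) = 2.0 mag^-1 deg^-2
# bin 13.5 holds one source  -> 1 / (0.5 * 2) = 1.0 mag^-1 deg^-2
print(differential_counts([13.1, 13.2, 13.6], area_sq_deg=2.0))
```

The magnitude at which such counts stop rising along the expected slope gives a rough indication of the completeness limit.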

For the moderate density fields, ~100 square degrees of area, the counts are shown in Figure 6-6. Due to the increased confusion noise (cf. Figure C.1 and section 6.2.1), the completeness has now dropped about 0.5 mag relative to the low density fields, with 2 to 3 times fewer sources per mag interval. For the high density fields, ~25 square degrees of area, the confusion noise is >1 mag, and the galaxy counts (Figure 6-7) show the corresponding decrease in total numbers, with few galaxies appearing beyond 13^th at K.

6.2.3 Angular Size Distribution

The 2MASS survey is designed to detect and characterize galaxies larger than ~5" and smaller than ~80" in radius. The lower limit is driven by the 2" plate-scale resolution and the throughput point spread function, typically 3 to 5" in full width. For sources smaller than ~5", the ability to distinguish extended sources from point sources rapidly degrades. Moreover, for galaxies smaller than ~7" it is not possible to reliably isolate a 3-sigma, much less a 1-sigma, isophote from which a characteristic elliptical shape is derived. Hence, a conservative minimum radius at which isophotal photometry and ellipse fitting is performed is set at 7". The upper limit is driven by both the cross-scan angular size of the 2MASS Atlas images, 8.5', and the 51" scan to scan overlap. Galaxies larger than ~50" are subject to incompleteness due to chance positions near the edge of the Atlas images (which overlaps fail to compensate for), while galaxies larger than ~3 to 5' adversely affect the background fitting procedure (due to their relatively large size with respect to the 8.5' X 16' Atlas images).

Five different radial measures characterize the size of a galaxy: 3-sigma isophote radius (corresponding to ~18.8 mag arcsec^-2 in K-band surface brightness), 20 mag arcsec^-2 K-band isophotal radius, 21 mag arcsec^-2 J-band isophotal radius and a Kron radius. Each measure has an elliptical (semi-major axis radius, axis ratio and position angle; see sec 6.2.6 below) and circular shape version. In order to compute colors, a J-band and a K-band "fiducial" measure are used to fix radii between bands. There is also an equivalent "3-sigma" isophote measurement from the combined J+H+Ks image. Three of the elliptical semi-major axis radii measures are shown in Figure 6-8. For galaxies brighter than K ~11.5 to 12 the J and K-band isophotal radial size diverges from the minimum 7" limit, following roughly the linear relation:

6.2.4 Surface Brightness

The "peak" and "central" mean surface brightness for 2MASS galaxies is shown in Fig. 6-9. The "peak" refers to the brightest pixel in the galaxy radial profile (generally located at Δr = 0"), while the "central" mean surface brightness is computed within a radial area of 5" from the center of the galaxy, roughly 1× to 2× the size of the 2MASS point spread function. For most 2MASS galaxies, K > 13^th mag, the mean K-band central surface brightness ranges from 18 to 20 mag arcsec^-2, roughly corresponding to an equivalent optical central surface brightness of 21 to 24 mag arcsec^-2, placing them in a category that is historically referred to as "low surface brightness" galaxies (cf. Bothun et al. 1991). 2MASS will detect even fainter surface brightness galaxies (see Appendix B for a discussion of the LCSB detector), with release of this special catalog at a later (to be determined) date.

6.2.5 Galaxy Colors

Most 2MASS galaxies, with Kmag < 13.75, have J-K colors greater than 1.0 (cf. Fig. 17), which is redder than most stars (due in part to the stellar populations that near-infrared imaging is sensitive to and in part to the redshift), an effect exploited to derive the "G" classification score. In color-color space, J-H vs. H-K, 2MASS galaxies roughly follow the expected K-correction track; see Fig. 6-10, showing the colors of 2MASS galaxies found in low stellar density fields. Note that the highest SNR galaxies scatter around the redshift z = 0 point in the K-correction track (consistent with their being nearby galaxies), while the lowest SNR galaxies are found both nearby and along distant (in redshift space) parts of the K-correction track. The 2MASS survey can in fact be used to compute photometric redshifts for clusters of galaxies as distant as z ~ 0.1. For galaxies located within or near the Galactic plane, Figs. 6-11 and 6-12, the reddening effect due to dust absorption is the dominant component of the scatter; moreover, due to the rapidly increasing confusion noise, there is a loss of sensitivity for faint and therefore distant galaxies, greatly reducing their numbers.

6.2.6 Ellipse Fit Parameters

To characterize the non-circular shape of a galaxy, an ellipse is fit to the ~3-sigma isophote. Three parameters are derived: semi-major axis radius, position angle and axis ratio. The "quality" of the fit is quantified with a parameter referred to as "%echi" (see Sec 3.4). Figs. 6-13 and 6-14 show the distribution of axis ratio for bright galaxies and faint galaxies, respectively. The distribution increases linearly from edge-on galaxies (b/a < 0.2) to nearly face-on galaxies, b/a ~0.8, with the peak axis ratio centered at ~0.75. Between bands, the J and K axis ratio measurements agree consistently when the fitting quality is high (as measured with the "goodness" of fit score, %echi); see the bottom panels of Figs. 6-13 and 6-14.

For the position angle distribution, the sample of galaxies is separated into two groups: those with highly inclined disks, b/a < 0.5, and the rounder galaxies, 0.5 < b/a < 0.75 (but not fully face-on, where the position angle is meaningless or poorly measured). Figs. 6-15 and 6-16 show the position angle distributions for highly inclined galaxies. For the brighter galaxies, J < 13.5, the distribution is flat across all angles, while the J to K comparison (bottom panels) is very good (scatter < 10°). For faint galaxies, J > 13th, K > 12^th, there is indication of a slight position angle bias centered at -35° and +55°, possibly related to the non-circular (slightly out of focus) shape of the point spread function. Faint galaxies are typically smaller than 7" (see Fig. 6-8) and thus most of their flux is confined to an area associated with the point spread function. A slight (few degrees) bias in the position angle of the non-circular PSF can also induce a slight (few degree) bias in the galaxy 3-sigma isophote position angle. For more circular galaxies, Figs. 6-17 and 6-18 (bright and faint galaxies, respectively), not surprisingly the position angle is more uncertain (see bottom panels), with a dispersion of ~10 to 20°, and the faint galaxy position angle bias is even more apparent (see top panel of Fig. 6-18) for these rounder galaxies.

6.2.7 Comparison with Previously Catalogued Galaxies

Cross-matching galaxies (with accurate coordinate positions) found in the NASA Extragalactic Database (NED) with the first public release 2MASS extended source catalog, for galaxies brighter than 15^th at J and 13.5 at K, we find an average match rate of ~14%. This rate is mostly representative of galaxies found in the field, whereas cluster galaxies (e.g., Virgo cluster) would have a much higher match rate since they are well studied and mostly complete within NED. Fig. 6-19 shows the NED-2MASS match J-band flux distribution (top panel), coordinate position difference (middle panel) and redshift distributions (bottom panel). Most NED/2MASS galaxies have J-band fluxes between 13 and 15 mag, with redshifts mostly dominated by nearby (z < 0.05) galaxies. The equatorial coordinate difference shows both a dispersion (~0.5") and a slight systematic offset in right ascension, ~-0.4", and declination, ~0.02", in the sense that the 2MASS positions are systematically further east and south compared to the NED sample. The nature and origin of this position bias is unknown at this time, but we note that comparing the uniform-by-design 2MASS positions with the heterogeneous sample coming from NED is inevitably complex. Finally, a skyplot (Fig. 6-20) of the 2MASS catalog sources and the NED sources demonstrates the areal coverage of 2MASS (crossing the Galactic plane) and the match rate. Note that the match rate is very high (100%) for a few small areas: these NED data correspond to the 2MASS Sampler Catalog, which is a subset of the larger incremental 2MASS Extended Source Catalog.
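Coordinate differences of the kind quoted above are typically computed per matched pair and then averaged. The sketch below shows one common small-angle form; the function name, the sign convention (first position minus second) and the use of the mean declination for the cos(dec) factor are all assumptions of this example, since the text does not state how the differences were formed.

```python
import math


def offset_arcsec(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Return (d_ra, d_dec) in arcsec, position 1 minus position 2.

    Uses the small-angle approximation, scaling the right ascension
    difference by cos(dec) so that it is an on-sky angle.
    """
    # Wrap the RA difference into [-180, 180) degrees to handle the 0/360 seam.
    d_ra_deg = (ra1_deg - ra2_deg + 180.0) % 360.0 - 180.0
    mean_dec = math.radians(0.5 * (dec1_deg + dec2_deg))
    d_ra = d_ra_deg * math.cos(mean_dec) * 3600.0
    d_dec = (dec1_deg - dec2_deg) * 3600.0
    return d_ra, d_dec


# Two positions on the equator separated by one arcsecond in RA.
d_ra, d_dec = offset_arcsec(10.0, 0.0, 10.0 + 1.0 / 3600.0, 0.0)
print(round(d_ra, 6), round(d_dec, 6))
```

Averaging these per-pair offsets over the matched sample gives the systematic term, and their scatter gives the dispersion.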