Appendix 6. Merged Working Databases

2. The Merging Process

Source merging within each of the individual 2MASS Working Database (WDB) tables first required finding all available detections of objects in regions of the sky that were observed multiple times. Once groups of associated detections were identified, then the independent measurements were combined to provide refined photometry and astrometry.

WDB merge processing for the 2MASS Extended Mission was performed using the Working Auto-Correlation software (WAX; Monkewitz and Wheelock 2005, in "Astronomical Data Analysis Software and Systems (ADASS) XIV," ASP Conference Series Vol. 347, p148; P.L. Shopbell, M.C. Britton and R. Ebert, eds.). WAX was designed specifically for efficient operation on very large data sets, partitioning the sky into declination-ordered bands to enable parallel operation. Detailed documentation for WAX is available on-line.

a. Associating Groups of Detections

Independent measurements of the position of an inertial source will be spread over a small area because of natural measurement errors. The degree of spread is primarily a function of source signal-to-noise ratio, but can also be increased by systematic astrometric calibration errors, by confusion with nearby sources, and by transients such as cosmic rays and hot pixel events impinging on a source image. If the true position of a source were known a priori, then finding all available detections of it within a given WDB table would simply require searching a region around that position that was large enough to cover the expected spread in the individual sightings. Of course, a priori positions are not known, so WAX employed a detection density-directed search to identify spatially associated groups of extractions.

WAX processing began by making a pass through a WDB table to compute for each entry the centroid and density of a provisional group of extractions that lie within a specified angular distance from the initial entry, r_i. The density is defined as the number of other extractions that fall within the search region, and the centroid is the average position of all provisional group members. For this step, relatively small matching radii, r_i, were used: 1" for point source tables and 2.5" for extended source tables. These limits produced high reliability groupings at the expense of missing some real group members.
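The first pass described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WAX implementation: the function names are hypothetical, positions are (ra, dec) pairs in degrees, separations use a small-angle approximation that ignores the RA wrap at 0°/360°, the initial entry is counted as a member when forming the centroid, and the search is a brute-force loop rather than the banded, spatially ordered access that WAX used.

```python
import math

def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcsec between two positions given in degrees."""
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    ddec = dec1 - dec2
    return math.hypot(dra, ddec) * 3600.0

def provisional_groups(extractions, r_i):
    """Return (density, centroid_ra, centroid_dec) for each (ra, dec) entry.

    density  = number of OTHER extractions within r_i arcsec of the entry
    centroid = average position of all provisional members (entry included,
               which is an assumption of this sketch)
    """
    results = []
    for ra, dec in extractions:
        members = [(ra2, dec2) for ra2, dec2 in extractions
                   if ang_sep_arcsec(ra, dec, ra2, dec2) <= r_i]
        density = len(members) - 1  # exclude the entry itself
        cra = sum(m[0] for m in members) / len(members)
        cdec = sum(m[1] for m in members) / len(members)
        results.append((density, cra, cdec))
    return results
```

Calling provisional_groups(points, 1.0) would correspond to the point source matching radius r_i = 1", and provisional_groups(points, 2.5) to the extended source radius.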

Next, the WDB entries were sorted into decreasing density order forming a queue of possible seeds. Final groups were then constructed from the queue of seeds as follows:

1. Generate a group g from the seed s at the head of the queue that contains all WDB extractions within a maximum separation, r_f, from the centroid of s.
2. Extractions assigned to g are removed from the queue of seeds, but are allowed to be associated with other groups. Note that s at the head of the queue is always removed from the seed queue.
3. While the queue is not empty, repeat steps 1 and 2.
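The three steps above can be sketched as follows. This is a minimal illustration of the basic in-density-order procedure, not the WAX code. It assumes that positions are (x, y) offsets in arcsec on a local tangent plane, that the first-pass density and centroid are already available for every extraction, and that ties in density are broken by input order (the source does not specify a tie-breaking rule). The function name is hypothetical.

```python
import math

def swiss_cheese(points, seeds, r_f):
    """Build final groups from a density-ordered queue of seeds.

    points: list of (x, y) positions in arcsec (tangent-plane assumption)
    seeds:  list of (density, cx, cy), one per point, from the first pass
    r_f:    final matching radius in arcsec
    Returns a list of (seed_index, member_indices).
    """
    # Queue of seeds in decreasing density order (ties: input order).
    queue = sorted(range(len(points)), key=lambda i: (-seeds[i][0], i))
    still_seed = set(queue)
    groups = []
    for s in queue:
        if s not in still_seed:
            continue  # already removed as a member of an earlier group
        _, cx, cy = seeds[s]
        # Step 1: every extraction within r_f of the seed centroid joins the
        # group, even if it already belongs to another group.
        members = [j for j, (x, y) in enumerate(points)
                   if math.hypot(x - cx, y - cy) <= r_f]
        groups.append((s, members))
        # Step 2: members stop being seeds; the seed itself is always removed.
        still_seed.difference_update(members)
        still_seed.discard(s)
    return groups
```

Because members are removed only from the seed queue and not from the pool of extractions, an extraction lying within r_f of two seed centroids ends up in both groups.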

The density-ordered use of seeds is akin to peak finding in source detection algorithms, but results in extremely poor WDB access patterns (both to disk and to in-memory data). The algorithm therefore traverses the input data multiple times in spatial order, generating groups around the seeds encountered according to the following rules:

A group is generated around a seed centroid if and only if the seed cannot be a member of any group generated from a denser seed.
When a group is generated around a seed s, then an extraction assigned to s is discarded from the set of seeds if and only if it is less dense than s.

These rules result in identical output to the basic in-density-order algorithm described above, regardless of the order in which seeds are actually considered.

In some circumstances, the merging algorithm assigned an extraction to more than one group. These extractions, and the groups containing them, are said to be confused. The grouping algorithm employed is conservative: if a detection can be associated with more than one group, then it is allowed to do so, and that detection and all of its containing groups are flagged as confused. To avoid introducing biases, no attempt was made to resolve confused groups.
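The confusion rule can be illustrated with a short sketch. It assumes that the group memberships are already available as a mapping from a group identifier to a list of extraction identifiers; the function name and data layout are hypothetical and are not taken from WAX.

```python
from collections import Counter

def flag_confusion(groups):
    """Identify confused extractions and groups.

    groups: dict mapping group id -> list of extraction ids
    Returns (counts, confused_groups) where counts[e] is the number of groups
    containing extraction e, and confused_groups is the set of group ids that
    contain at least one extraction shared with another group.
    """
    counts = Counter(e for members in groups.values() for e in set(members))
    confused_groups = {g for g, members in groups.items()
                       if any(counts[e] > 1 for e in members)}
    return counts, confused_groups
```

An extraction is confused when its count exceeds 1, and every group containing such an extraction is flagged, with no attempt to decide which group is the better match.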

The group identification process lays down a set of sometimes overlapping circular regions on the sky, and is thus figuratively called the "swiss cheese" algorithm. Figure 1 illustrates the "swiss cheese" algorithm, and shows how confusion can arise. Panel (a) in the upper left shows the distribution of source extractions in a small region of the sky that was observed multiple times. Each one of these extractions has an associated density and centroid computed from the first pass of WAX processing. The first group is identified in panel (b) having the highest density of any of the extractions, and all of the extractions in that group are removed from the list of seeds. The second group is found among the remaining seeds in panel (c), and its members are removed from the seed list. In panels (d)-(f), groups are generated around the remaining seeds that can incorporate extractions already assigned to another group.

Figure 1 - The "swiss cheese" algorithm for identifying spatially associated groups of extractions in the 2MASS WDBs operating on a small region of the sky. (a) The raw extractions. (b-c) The first and second groups are generated. Unprocessed extractions are drawn in black, unconfused extractions and groups in light grey. (d-f) As more groups are generated, confusion, drawn in dark grey, appears. (from Monkewitz and Wheelock 2005)

The angular separation limits used to generate the final associated groups, r_f, were 2" for the point source tables and 5" for the extended source tables. These limits are large enough to recover virtually all possible detections of sources, but not so large as to be badly contaminated by detections of nearby sources. Figures 2 and 3 show details of the distributions of the nearest-neighbor distances (prox) from the 2MASS All-Sky PSC and XSC, respectively. The r_f values were selected to be slightly less than half of the size defined by the separations at which the prox distributions turn over sharply. For extended sources, r_f was selected to be equal to the point at which the prox distribution drops to a constant, near zero level. A slightly more conservative (i.e. smaller) limit was adopted for point sources because the internal position residuals measured in scan overlap regions are typically <1" (see Figure 14 of Section I.6.b). The numbers of single (non-merged) extractions, unconfused merged groups and confused merged groups produced by merging a section of the 6x point source WDB using a range of matching radius values, r_f, are shown in Figure 4. The broad minimum in the number of confused merged groups seen in the bottom panel shows that a matching search radius of 2" provides the best balance between source detection completeness and nearby source confusion.

Figure 2 - The nearest neighbor distribution (prox) for all sources in the 2MASS PSC.

Figure 3 - The nearest neighbor distribution (prox) for all sources in the 2MASS XSC.

Figure 4 - The number of point source single extractions (top), unconfused merged groups (middle) and confused groups (bottom) produced as a function of merging separation radius, r_f.

The WAX spatial auto-correlation process described above produced a set of associated groups of extractions for each WDB table. Each associated group was assigned a group identifier (gcntr) that is unique within a given WDB table. The 2MASS Merged Point and Extended Source Cross-Reference tables provide a link between the group and the individual WDB extractions that comprise it. For each associated group, the Cross-Reference tables contain a set of lines that give the group identifier followed by the identification number of the individual extraction in the respective WDB table (pts_key/cntr or ext_key/cntr) and the number of groups of which that particular extraction is a member. The number of lines in the set is equal to the number of extractions in the group. The format of the Merged Source Cross-Reference tables is described in A6.3.c.
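As an illustration of how the Cross-Reference lines might be used, the sketch below collects the extractions belonging to each group and notes which extractions are shared between groups. It assumes each line has already been parsed into a (gcntr, cntr, n_groups) tuple following the order described above; the function name and the parsed layout are hypothetical, and the actual table format is the one given in A6.3.c.

```python
from collections import defaultdict

def read_xref(rows):
    """Collect group membership from parsed Cross-Reference lines.

    rows: iterable of (gcntr, cntr, n_groups) tuples, one per table line
    Returns (members, shared) where members maps gcntr -> list of extraction
    identifiers and shared is the set of extractions that belong to more
    than one group.
    """
    members = defaultdict(list)
    shared = set()
    for gcntr, cntr, n_groups in rows:
        members[gcntr].append(cntr)
        if n_groups > 1:
            shared.add(cntr)
    return dict(members), shared
```

The length of each list in members equals the number of lines in the set for that group, i.e. the number of extractions in the group.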

b. Pre-filtering the WDB Tables

In regions of the sky covered repeatedly by many 2MASS scans, such as the equatorial poles and the calibration fields, there is a high probability of chance associations between spurious noise extractions and between real detections and noise extractions. To minimize the potential of contamination by spurious extractions in the merging process, only WDB entries with >50% probability of being real source detections (rel MATCHES '[A-D]') were input to WAX.
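A minimal sketch of this pre-filter is given below. It assumes WDB entries are held as dictionaries with a single-character rel field; that in-memory layout and the function name are hypothetical, and only the rel MATCHES '[A-D]' condition comes from the text above.

```python
import re

# Reliability scores accepted for merging, per the rel MATCHES '[A-D]' criterion.
REL_OK = re.compile(r"[A-D]")

def prefilter(entries):
    """Keep only entries whose reliability score is A, B, C or D."""
    return [e for e in entries
            if e.get("rel") is not None and REL_OK.fullmatch(e["rel"])]
```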

The impact of pre-filtering of the WDBs before merging is illustrated in Figure 5. This is an animated GIF that cycles through five panels showing the J-band image of a 5'x5' region near the north celestial pole (00^h+89°54') from a single 2MASS survey scan. The first panel shows just the image. In the second panel, red points show the location of all Survey point source WDB extractions in a 6' radius circular region centered on the field. The depth of coverage of survey scans ranges from ~45 near the lower right of the image to over 400 in the upper left, and is reflected in the gradient of the essentially randomly distributed noise extractions. The third panel shows in green the location of the extractions with >50% reliability probabilities. Note that the filtering does not remove all spurious extractions, but greatly reduces the apparent surface density. The resulting merged groups are shown by the large blue circles in the fourth and fifth panels. There are a few merged groups that do not correspond to any obvious source in the J-band image. Most of these are unreliable merges characterized by very low sdet/spos ratios. In a few cases, there are real sources that are detected on the H and K_s images.

Limiting the merge processing to WDB entries with >50% reliability probability has two consequences. First, not all unreliable extractions are filtered out, so there can still be a few chance associations resulting in merged sources that are not real objects on the sky. The merged groups that do not correspond to any obvious source in the last panel of Figure 5 are examples of this. Most of these are characterized by very low sdet/spos ratios. Second, pre-filtering the WDBs excludes some real but low SNR source detections from the merge processing. Consequently, some multiply-detected faint sources may have artificially low confirmation rates (sdet/spos), or they may be omitted from the Merged Source output tables altogether. It is always a good idea to search the full WDB tables in the vicinity of faint sources of interest to determine if there are other detections available that were not captured in the merge.

Figure 5 - Animated GIF image showing J-band image from a single 2MASS Survey scan of a 5'x5' region near the north celestial pole. (A) shows just the image. (B) shows in red the location of all survey point source WDB extractions in the region. (C) shows in green the WDB extractions with >50% probability of being reliable, (D) same as (C), but adding the Merged survey point sources as blue circles. (E) shows just the merged point source locations.

c. Computing Merged Source Parameters

The second phase of WAX processing computed the refined or merged positions and brightnesses and an assortment of statistics for each associated group that contained more than one extraction, using the individual measurements of the group members. The refined parameters for each merged group are contained in the point and extended source Merged WDB Source Information tables. A detailed listing and description of the columns in the Merged Information tables is given in A6.3. The following sections describe how refined parameters were computed.

i. Group Information and Sky Coverage

Group identifier (gcntr) - Each associated group of extractions is assigned this identifier that is unique within a single WDB merge. The value of gcntr is set to the identifier (pts_key/cntr, ext_key/cntr) of the seed extraction from which the group was generated.
The group identifier has also been added to the WDB record for each group member so that it is easy to see when additional measurements of an object may be available. In confused cases where a WDB extraction has been associated with more than one group, the value of gcntr assigned to the WDB entry corresponds to the group with mean position closest to the position of the extraction.

Number of possible scan coverages (spos) - This indicates the number of scans that covered the mean position of the merged group within a given data set (i.e. within the Survey observations for the Survey Merged WDBs, within the 6x observations for the 6x Merged WDBs, etc). Scan boundaries are computed using a great-circle interpolation between the astrometrically-reconstructed positions of the scan's corners. The actual scan path of the telescopes sometimes deviated slightly from a great circle, so the true scan boundaries can fall a few arcseconds inside or outside the interpolated boundaries (5.3.a).
This value is carried in the WDB record of each group member.

Minimum/Maximum Scan Coverage (smin, smax) - Because the footprint of each merged group is defined by a circular region rather than a point, the number of coverages may differ from spos depending on what point in the region is considered. The values of smin and smax give the minimum and maximum number of scans within a data set that overlap the region covered by the merged group.

Total number of extractions (napp) - The total number of extractions associated with the merged group. This number is always ≥2 because single source detections were not included in the Merged WDB output. This number can be larger than spos or smax if the group contains extractions near scan edges, because of the approximations used to define scan boundaries, and because a group may contain more than one extraction from a given scan within the small angular search area used in the merge, due to missed bandmerges or other confusion (see Figure 2).

Number of unique detections (sdet) - The number of unique scans in which there were detections in one or more bands associated with this group. The value of sdet is always ≤napp. It may be equal to "1" in cases where all of the extractions associated with the group come from one scan.
This value is carried in the WDB record of each group member.
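The relationship between napp, sdet and the confirmation rate sdet/spos can be illustrated with a short sketch. It assumes that each group member carries some identifier of the scan it came from; the scan identifier values and the function name are hypothetical, and spos is taken as given rather than derived from scan boundaries.

```python
def detection_stats(member_scan_ids, spos):
    """Return (napp, sdet, confirmation_rate) for one merged group.

    member_scan_ids: one scan identifier per extraction in the group
    spos:            number of scans covering the mean group position
    """
    napp = len(member_scan_ids)       # all extractions in the group
    sdet = len(set(member_scan_ids))  # unique scans with a detection
    rate = sdet / spos if spos > 0 else None
    return napp, sdet, rate
```

For example, a group of four extractions drawn from three distinct scans in a region covered six times has napp = 4, sdet = 3 and a confirmation rate of 0.5.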

ii. Position Refinement and Statistics

Refined position estimates for the groups were computed using the equatorial positions and uncertainties (where available) of all group members.

Mean group positions and uncertainties (ra, dec, emj_mrg, emn_mrg, ean_mrg) - The merged position and uncertainty of each group were computed in one of two ways, depending on the quality and availability of position error ellipses for each of the group members.
For the 2MASS Survey and 6x point source Merged WDBs, updated positions and uncertainties were evaluated using Gaussian refinement incorporating the covariance matrix of the individual extractions, Σ_i:

(Eq. A6.2.b.1)

where X and Y correspond to right ascension and declination. The refined coordinates of the merged group, X_r, Y_r, are given by:

(Eq. A6.2.b.2)

where Σ_r is the merged position covariance:

(Eq. A6.2.b.3)

from which the components of the merged position error ellipse, emj_mrg, emn_mrg and ean_mrg, are derived.
Position uncertainties are not available for the 2MASS extended source WDBs, and those in the 2MASS Calibration point source WDB are not suitable for computing a reliable covariance because of the addition of a constant "floor" value (A4.4.b). For these WDBs, the merged group position is the simple average of the positions of the individual extractions. The dimensions of the merged position uncertainty ellipse are given by the standard deviation of the mean positions on each axis. The semi-major axis is assigned the value of the larger of the uncertainties on each axis, and the semi-minor axis the smaller of the two values. The position angle is set to 90° if the ra uncertainty is larger or 0° if the dec uncertainty is larger.
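The Gaussian refinement used for the Survey and 6x point source tables can be illustrated with the standard inverse-covariance weighting of 2-D positions. The sketch below is an assumption about the general form of such a refinement and may differ in detail from the equations of this section: it treats X and Y as local offsets (for example in arcsec), accumulates the inverse covariances of the members, and inverts the sum to obtain the merged covariance. The function names are hypothetical.

```python
def inv2(m):
    """Invert a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def gaussian_refine(positions, covariances):
    """Inverse-covariance weighted mean of 2-D positions.

    positions:   list of (x, y) member positions
    covariances: list of 2x2 covariance matrices, one per member
    Returns ((x_r, y_r), cov_r), where cov_r is the inverse of the summed
    inverse covariances.
    """
    w = [[0.0, 0.0], [0.0, 0.0]]  # running sum of inverse covariances
    wx = [0.0, 0.0]               # running sum of (inverse covariance) x position
    for (x, y), cov in zip(positions, covariances):
        inv = inv2(cov)
        for r in range(2):
            for c in range(2):
                w[r][c] += inv[r][c]
            wx[r] += inv[r][0] * x + inv[r][1] * y
    cov_r = inv2((tuple(w[0]), tuple(w[1])))
    x_r = cov_r[0][0] * wx[0] + cov_r[0][1] * wx[1]
    y_r = cov_r[1][0] * wx[0] + cov_r[1][1] * wx[1]
    return (x_r, y_r), cov_r
```

In this form, members with small error ellipses dominate the merged position, and the merged covariance shrinks as more members are added. The error ellipse components would then follow from the eigenvalues and eigenvectors of cov_r.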

Radial position residuals (sep_avg, sep_sig, sep_mxfmrg, sep_jdmax) - The average and standard deviation of radial distance between the mean position and each group member are given by sep_avg and sep_sig, respectively.
For the merged point source WDBs, the maximum radial separation from the mean position of any group member is given by sep_mxfmrg. The radial distance separating the two members of a group having the earliest and latest observation dates, respectively, is given by sep_jdmax. These two parameters may be useful for identifying merged groups that exhibit source motion or that otherwise have aberrant position distributions.
The extended source Merged Source Information tables do not contain the sep_mxfmrg and sep_jdmax columns.

Position chi-squared statistics (chisq_grp2d, chisq_mx2d, sep_mxrad) - The two-dimensional χ² statistic computed using the ensemble of group member positions is given by chisq_grp2d. The radial separation from the mean group position of the group member that has the largest individual positional χ² value, chisq_mx2d, is given by sep_mxrad.
Note that the chi-squared statistics are computed only for the 2MASS Survey and 6x point source Merged Source WDBs. Those values are always "null" in the Calibration point source WDBs. The extended source Merged Source Information tables do not contain these columns.

iii. Flux Refinement and Statistics

The photometry for each associated group was combined band-by-band. Within a given band, the mean and weighted mean brightnesses were computed using the subset of detections that satisfied the following quality criteria:

Point sources:

Valid detection in the band considered (rd_flg=1,2,3,4); no upper limits included
Photometric uncertainties in the range 0<[jhk]_msig<5
Detection not based on a single R1 frame
Extraction position is >10" from a scan edge
Not identified as an image artifact detection in the band, and is not marked as being affected by a latent image or diffraction spike (cc_flg[1,2,3] NOT MATCHES '[A-Zdp]').
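The point source criteria above can be expressed as a single per-band predicate. This is a sketch only: the argument names single_r1_frame and edge_dist_arcsec are hypothetical stand-ins for whatever WDB quantities encode the single-frame and scan-edge conditions, and cc_flg is taken to be the one-character contamination flag for the band being considered.

```python
import re

# Characters that exclude an extraction: artifact codes (upper case),
# diffraction spike ('d') and latent/persistence image ('p').
BAD_CC = re.compile(r"[A-Zdp]")

def usable_point_source(rd_flg, msig, single_r1_frame, edge_dist_arcsec, cc_flg):
    """Return True if one band of one extraction passes the flux-merge criteria."""
    return (rd_flg in (1, 2, 3, 4)            # valid detection, no upper limits
            and msig is not None
            and 0 < msig < 5                  # usable photometric uncertainty
            and not single_r1_frame           # not based on a single R1 frame
            and edge_dist_arcsec > 10         # more than 10" from a scan edge
            and BAD_CC.search(cc_flg) is None)
```

Note that a photometric confusion flag of 'c' does not match the exclusion pattern, so confused extractions pass this test.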

Extended sources:

Valid measurement in either a 7" circular aperture or the K_s=20 mag arcsec^-2 elliptical isophotal aperture ([jhk]_m_7 IS NOT NULL OR [jhk]_m_k20fe IS NOT NULL); no upper limits included
Photometric uncertainties in the range 0<[jhk]_msig<20
Extraction position is >15" from a scan edge
Photometric quality flags indicating other measurement contamination ([jhk]_flg_7 != 0 OR [jhk]_flg_k20fe != 0).

Because upper limits for non-detections were not included in the flux refinements for each band, the merged brightness for faint sources near the 2MASS detection threshold will be systematically overestimated. This essentially "institutionalizes" the flux-overestimation bias that exists in the single measurements of low SNR sources in the WDB.

All brightness computations were performed in flux units. Extraction magnitudes were converted into flux units, mean fluxes and statistics were evaluated, and then the results were converted back into magnitudes, where appropriate.
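The magnitude-to-flux round trip can be sketched as follows. The example assumes relative fluxes of the form 10^(-0.4 m), with an arbitrary zero point that cancels when the mean is converted back to a magnitude; the function name is hypothetical.

```python
import math

def mean_mag_via_flux(mags):
    """Average magnitudes in flux space and return the result as a magnitude.

    Returns None when no measurements are available.
    """
    if not mags:
        return None
    fluxes = [10.0 ** (-0.4 * m) for m in mags]  # relative flux units
    mean_flux = sum(fluxes) / len(fluxes)
    return -2.5 * math.log10(mean_flux)
```

Averaging in flux units gives more weight to the brighter measurements than a direct average of magnitudes would, so the result for magnitudes 10 and 12 is brighter than 11.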

Mean magnitude and standard deviation ([jhk]_mavg, [jhk]_mstdev, [jhk]_m_[7,k20fe]_avg, [jhk]_mstdev_[7,k20fe]) - The mean and standard deviation of the mean magnitude of all detections in each band. For point sources, this is computed using the "default magnitudes" ([jhk]_m). For extended sources, this is computed using the brightnesses in the 7" circular aperture ([jhk]_m_7) and the K_s=20 mag arcsec^-2 elliptical fiducial isophotal apertures ([jhk]_m_k20fe). If there is only one detection available in a band, then its magnitude and uncertainty are reported. This column is null if there are no measurements available in a band.

Inverse variance-weighted mean magnitude and uncertainty ([jhk]_mwavg, [jhk]_mwunc, [jhk]_m_[7,k20fe]_wavg, [jhk]_m_[7,k20fe]_wunc) - Inverse variance-weighted mean of all detections in each band. The refined uncertainty is computed from the inverse variances of each available detection ( σ_r^-2 = sum[σ_i^-2] ). For point sources, this is computed using the "default magnitudes" and their uncertainties ([jhk]_m). For extended sources, this is computed using the photometry and uncertainties in the 7" circular aperture ([jhk]_m_7) and the K_s=20 mag arcsec^-2 elliptical fiducial isophotal apertures ([jhk]_m_k20fe). If there is only one detection available in a band, then its magnitude and uncertainty are reported. This column is null if there are no measurements available in a band.
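The inverse variance-weighted combination can be sketched in flux units as below. The sketch assumes relative fluxes 10^(-0.4 m) and the first-order conversion between magnitude and flux uncertainties (σ_f = f σ_m ln(10)/2.5); how the actual processing propagated uncertainties between magnitudes and fluxes is not stated here, so that step is an assumption. The function name is hypothetical.

```python
import math

K = math.log(10.0) / 2.5  # d(ln f)/dm magnitude-to-flux scale, about 0.921

def weighted_mean_mag(mags, sigs):
    """Inverse variance-weighted mean magnitude and its uncertainty.

    mags, sigs: magnitudes and their 1-sigma uncertainties (same length)
    Returns (m_w, sig_w), or (None, None) if there are no measurements.
    """
    if not mags:
        return None, None
    fluxes = [10.0 ** (-0.4 * m) for m in mags]
    flux_sigs = [f * K * s for f, s in zip(fluxes, sigs)]  # first-order error
    weights = [1.0 / sf ** 2 for sf in flux_sigs]
    wsum = sum(weights)                                    # sigma_r^-2
    f_w = sum(w * f for w, f in zip(weights, fluxes)) / wsum
    sig_f = math.sqrt(1.0 / wsum)
    return -2.5 * math.log10(f_w), sig_f / (K * f_w)
```

With a single measurement the function returns that measurement's magnitude and uncertainty, consistent with the single-detection behavior described above.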

Number of extractions going into the mean flux computations - ([jhk]_n, [jhk]_n_7, [jhk]_n_k20fe) - These columns give the number of measurements that met the quality criteria given above and were used in the respective average and weighted average brightness computations. This number can be zero.

Flux chi-squared statistics ([jhk]_m_chisq, [jhk]_mchisq_7, [jhk]_mchisq_k20fe) - Chi-squared statistic of the flux distribution of detections in each band that were used in the mean and weighted-mean computations. Values >>1 indicate that the flux distributions are non-Gaussian, which can be a sign of merge confusion or flux variability. These columns are null if there are fewer than 2 usable extractions in a band.

Min and max flux deviations ([jhk]_mndev, [jhk]_mxdev, [jhk]_mndev_[7,k20fe], [jhk]_mxdev_[7,k20fe]) - The largest positive (mxdev) and negative (mndev) deviations from the weighted average magnitude, scaled by the photometric uncertainty, among all selected extractions in each band. These columns indicate whether there are significant deviations from the weighted mean brightness in n-sigma units. These columns are null if there are fewer than 2 usable extractions in a band.
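A sketch of the scaled deviations is shown below. It assumes the deviations are formed directly in magnitudes as (m_i - m_w)/σ_i, with the sign taken in the magnitude sense; whether the actual processing formed them in magnitude or flux units is not stated here, so both the formula and the function name should be read as illustrative.

```python
def scaled_deviations(mags, sigs, m_w):
    """Return (mndev, mxdev): the most negative and most positive deviations
    from the weighted mean magnitude m_w, in units of each extraction's own
    uncertainty. Returns (None, None) with fewer than 2 usable extractions.
    """
    if len(mags) < 2:
        return None, None
    devs = [(m - m_w) / s for m, s in zip(mags, sigs)]
    return min(devs), max(devs)
```

A group whose members scatter by several sigma in either direction is a candidate for variability or merge confusion.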

Stellar confusion flag (Merged extended source WDBs only) (max_[7,k20fe]_flg) - Maximum value of the J, H or K_s photometry confusion flag for the 7" circular or K_s=20 mag arcsec^-2 elliptical isophotal aperture measurements for all selected extractions in this group. Non-zero values indicate some level of stellar contamination in the extended source measurements for this object.

Figures 6-8 illustrate how the combined photometric statistics can be used to identify candidate variable sources. The figures show distributions of selected J-band merged source statistics taken from the Survey Merged point source table for a 5 deg² area in the Orion Trapezium region. Carpenter, Hillenbrand & Skrutskie (2001 AJ, 121, 3160) identified 1,235 variable stars from an analysis of the 18-26 repeated 2MASS survey observations of this region. These variables are indicated with white points in Figures 6-8, and are clear outliers in each of the distributions.

Figure 6 - Standard deviation of the mean J-band brightness (j_mstdev) plotted versus average J-band magnitude for stars in the Orion Trapezium region scanned 18-26 times during the 2MASS survey. The white points highlight variable stars that were identified in the study of Carpenter et al. (2001 AJ, 121, 3160).

Figure 7 - J-band chi-squared (j_m_chisq) plotted versus average J-band magnitude for the same Orion stars shown in Figure 6. Color-coding is the same.

Figure 8 - Largest negative n-sigma deviations from the weighted average J-band magnitudes (j_mndev) plotted versus weighted average J-band magnitude for the same Orion stars as shown in Figure 6. Color-coding is the same.

iv. Combined Shape Information (Merged extended source WDBs only)

The Merged extended source Information tables contain refined elliptical isophote fit parameters that are computed using the ellipse parameters of each individual extraction in the associated groups. The ellipse size, shape and orientation values for a given extraction were used only if they are all not null.

Mean and standard deviation of K_s ellipse parameters (r_k20fe_avg, k_ba_avg, k_phi_avg, r_k20fe_rms, k_ba_rms, k_phi_rms) - The mean and standard deviation of the mean of the semi-major axis (r_k20fe), axial ratio (k_ba), and position angle (k_phi) of the ellipses fit to the K_s=20 mag arcsec^-2 isophotes of each extended source extraction in the group.

Mean and standard deviation of "super" ellipse parameters (sup_ba_avg, sup_phi_avg, sup_ba_rms, sup_phi_rms) - The mean and standard deviation of the mean of the axial ratio (sup_ba) and position angle (sup_phi) of the ellipses fit to isophotes of the combined J+H+K_s images of each extended source extraction in the group.

Number of extractions in average shape (k_n_shape, sup_n_shape) - The number of extended source extractions that were used in the computation of the average K_s=20 mag arcsec^-2 isophotal and "super" isophotal ellipse parameters.

v. Merge Quality Flags

The following parameters and flags are provided with each merged source to help with assessment of the accuracy of the association of extractions and the ensuing combined positions and photometry.

Confusion flag (gcnf) - Merged groups that contain extractions that were associated with more than one group are said to be confused. This flag indicates the absence or presence of merge confusion.
- "0,1" - Extractions in this group associated with this group only
- "2,3" - One or more extractions in this group were associated with one or more other groups
The combined photometry and/or astrometry of merged sources with gcnf>1 should be used with caution. You are strongly encouraged to examine each of the detections comprising the merged group, and the surrounding groups that are available in the respective WDB tables, to determine for yourself if there are better possible combinations.
This flag is carried in the WDB record of each group member.

Average reliability (rel_avg) - The average reliability score of all members of an associated group. This parameter is intended to reflect the quality of the extractions comprising each merged source in the same sense as the reliability score. It is computed by converting the reliability score (rel) to an integer between 1 and 5 (rel=F and A, respectively), averaging the integers, and converting back into a letter score. Averages with fractional parts >0.5 were rounded up to the nearest integer, otherwise they were rounded down.

Scan edge proximity (n_dist) - The number of extractions in an associated group that fall within 10" (point sources) or 15" (extended sources) of a scan edge. These extractions are not used in flux refinement for the group, but they are used in computation of coverage and detection confirmation statistics.

Point source specific quality flags

Number of artifacts ([jhk]_nart) - The number of extractions in an associated group identified as artifacts in each band (cc_flg[band] MATCHES '[A-Z]'). These extractions are not used in the flux refinement, but they are used in the computation of detection confirmation statistics.
Number of photometrically confused extractions ([jhk]_nconf) - The number of extractions in the group that are flagged as photometrically confused, by band (cc_flg[band]='c'). These extractions are used in the flux refinement and detection confirmation statistics.
Number of single-frame R_1 extractions ([jhk]_nr1fr1) - The number of extractions in the group that have photometry taken from a single Read_1 frame, by band. Such measurements can exhibit biases of up to 10% relative to well-sampled R_1 or R_2-R_1 measurements. These extractions are not used in the flux refinement, but they are used in the computation of detection confirmation statistics.
Number of extractions affected by latent images (n_pers) - The number of extractions that are flagged in any band as being affected by latent or persistence images from brighter sources (cc_flg MATCHES 'p'). These extractions are used in the flux refinement and detection confirmation statistics.
Number of extractions affected by diffraction spikes (n_spike) - The number of extractions in the associated group flagged as being affected by a diffraction spike in any band (cc_flg MATCHES 'd'). These extractions are used in the flux refinement and detection confirmation statistics.
Merge photometry caution flag (ce_flg) - Flag indicating overall quality of combined point source photometry. This is a single-digit flag that rolls up possible contamination due to array edge proximity, artifacts, confusion, single-frame biases, etc. The possible flag values and their meaning are:
- "0": All extractions in the associated group fall well away from frame edges, and are not flagged as artifacts or confused.
- "1": One or more extractions in the associated group has a non-zero value of [jhk]_nart, [jhk]_nconf, [jhk]_nr1fr1, n_dist, n_pers or n_spike.
Users are encouraged to examine the individual quality flags and photometry of the individual extractions in the group for any merged source entry that has a non-zero value for ce_flg.
Number of extractions affected by extended sources (n_galcontam) - The number of extractions in the associated group that have non-zero values of gal_contam, indicating that the extractions are themselves resolved or are contaminated by nearby extended source emission. The accuracy of the point source photometry of such extractions may be poor, resulting in higher than expected dispersions in the merged source photometry. These extractions are used in the flux refinement and detection confirmation statistics.
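The ce_flg roll-up described above amounts to testing whether any of the contributing counters is non-zero. A minimal sketch follows, assuming the counters for one merged source are available as a dictionary keyed by column name; that layout and the function name are hypothetical.

```python
# Counter columns that feed the caution flag, per the ce_flg description.
CE_COUNTERS = (
    "j_nart", "h_nart", "k_nart",
    "j_nconf", "h_nconf", "k_nconf",
    "j_nr1fr1", "h_nr1fr1", "k_nr1fr1",
    "n_dist", "n_pers", "n_spike",
)

def compute_ce_flg(row):
    """Return 1 if any contributing counter is non-zero, otherwise 0.

    Missing or null counters are treated as zero in this sketch.
    """
    return 1 if any((row.get(name) or 0) != 0 for name in CE_COUNTERS) else 0
```

A value of 1 does not say which condition was triggered, so the individual counters still need to be inspected for any source with a non-zero flag.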

[Last Updated: 2006 November 13; by R. Cutri, S. Wheelock, S. Monkewitz, J. Carpenter]
