IPAC 2MASS Working Group Meeting #65 Minutes, 6/20/95

Attendees: T. Conrow, T. Evans, J. Fowler, L. Fullmer, T. Jarrett, G. Kopan, B. Light, J. White

AGENDA

  1. EXEC/PCP Review

DISCUSSION

  1. EXEC/PCP Review -- T. Conrow presented an overview of the processing implemented in his test environment for EXEC/PCP development. Code functionally very similar to the 2MAPPS prototype code is already being used in the proto-pipeline processing of the April '95 data, so practical experience has been gained and continues to accumulate. The key areas of similarity are the automated multi-CPU, multi-host client/server launching and the monitoring of standard error and standard output.

    It should be noted that T. Conrow's work to date has focused on the part of EXEC that will service the parallel processing of separate scan pipelines, not the entire EXEC subsystem, most of which can be implemented via standard scripts. What has been developed is the capability to launch client/server pairs that will run the PCP portion of the 2MAPPS processing as parallel pipelines. Launching is controlled by a separately maintained resource file that lists the hosts available for processing, the number of client/server pairs each host can handle, and the disk drive IDs for each host.
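
    As an illustration only, a resource file of the kind described above might be read as in the following Python sketch. The file layout (one host per line: host name, number of client/server pairs, comma-separated disk IDs), the host and disk names, and the function names are all assumptions made for this example; the minutes do not record the actual format.

```python
from dataclasses import dataclass

# Hypothetical resource file contents; the real format is not given in the minutes.
EXAMPLE_RESOURCE_FILE = """\
# host   pairs   disk IDs
hostA    2       d0,d1
hostB    1       d2
"""


@dataclass
class HostResource:
    host: str    # host available for processing
    pairs: int   # number of client/server pairs the host can handle
    disks: list  # disk drive IDs attached to the host


def parse_resources(text):
    """Parse the assumed three-column layout, ignoring comments and blank lines."""
    resources = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        host, pairs, disks = line.split()
        resources.append(HostResource(host, int(pairs), disks.split(",")))
    return resources


def launch_plan(resources):
    """Return one (host, slot) entry per client/server pair to be launched."""
    return [(r.host, slot) for r in resources for slot in range(r.pairs)]


resources = parse_resources(EXAMPLE_RESOURCE_FILE)
plan = launch_plan(resources)
```

    Keeping the host list in a separate file means that the set of machines can be changed without touching the launching code.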

    When the first client is launched by the EXEC subsystem, it reads the resource file and spawns enough copies of itself to utilize all the CPUs available for 2MAPPS. Before this, however, EXEC activates a parent server daemon to service requests from the clients; when each client contacts the parent server, the latter starts a dedicated child server to handle that client's requests. These requests are to run command lines obtained from the "run file", which is a file set up earlier by EXEC (simulated so far). These command lines will invoke scripts delivered by the subsystem cognizant engineers to run the subsystem software with appropriate command-line parameters. In this way, each client/server pair runs the PCP software that processes a single scan. The client/server pairs operate in parallel, so that parallel pipelines operate simultaneously to do the scan-oriented part of the 2MAPPS processing.
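
    The parallel-pipeline idea can be sketched in Python as follows. This is a simplified model under stated assumptions, not the EXEC/PCP code: threads stand in for the client/server pairs, an in-memory queue stands in for the run file, and each command line simply invokes the Python interpreter to print a message where the real system would invoke a subsystem script. All names are hypothetical.

```python
import queue
import subprocess
import sys
import threading

# Hypothetical run file: one command line per scan. Each command here only
# prints a message; the real command lines would invoke subsystem scripts.
RUN_FILE = [
    [sys.executable, "-c", f"print('processed scan {n}')"] for n in range(1, 5)
]


def child_server(name, tasks, results, lock):
    """Take command lines from the queue until none remain, running each one."""
    while True:
        try:
            cmd = tasks.get_nowait()
        except queue.Empty:
            return
        done = subprocess.run(cmd, capture_output=True, text=True)
        with lock:
            results.append((name, done.returncode, done.stdout.strip()))


def run_pipelines(run_file, n_pairs):
    """Run every command line once, using n_pairs pipelines in parallel."""
    tasks = queue.Queue()
    for cmd in run_file:
        tasks.put(cmd)
    results, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=child_server,
                         args=(f"pair-{i}", tasks, results, lock))
        for i in range(n_pairs)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


results = run_pipelines(RUN_FILE, n_pairs=2)
```

    Each scan is processed exactly once, by whichever pipeline becomes free first, so the order of completion is not fixed.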

    In addition to the client/server pairs, EXEC starts up the "monitor" program. This program runs continuously, but can be interrupted to service requests from the operator. Each child server sends its standard output and standard error information back to its client, and this communication is visible to the monitor program, which channels relevant information to a display in the operator's room. Furthermore, the parent server and all child servers maintain shared-memory areas in which status information is kept. These areas are also read periodically by the monitor program, and relevant status information is sent to the operator's display.

    When the monitor is interrupted, it will accept interactive commands from the operator. For example, the operator can instruct the monitor to lock the run file, which prevents the child servers from accessing any new command lines; processing then pauses once each currently running subsystem finishes its current task. The monitor can also be told to reset run-file command lines and reinsert them at the top of the queue. Resetting clears all information added by the corresponding scan-processing software to indicate progress in processing that scan, and reinsertion at the top of the queue makes the reset task the next one taken by a server looking for its next task. Other utility functions for the monitor are also being considered.
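
    The lock and reset operations can be modeled with a short Python sketch. It is a single-process illustration under assumptions: the class and method names are hypothetical, scan names stand in for command lines, and a real run file shared among servers on several hosts would need inter-process locking that is not shown here.

```python
from collections import deque


class RunFile:
    """Toy model of the run file: a queue of command lines plus progress notes."""

    def __init__(self, command_lines):
        self.pending = deque(command_lines)
        self.progress = {}    # command line -> list of progress notes
        self.locked = False

    def next_task(self):
        """Called by a child server; returns None when locked or empty."""
        if self.locked or not self.pending:
            return None
        task = self.pending.popleft()
        self.progress[task] = []
        return task

    def note_progress(self, task, note):
        """Record progress information for a task that has been taken."""
        self.progress[task].append(note)

    def lock(self):
        """Operator command: stop handing out new command lines."""
        self.locked = True

    def unlock(self):
        self.locked = False

    def reset(self, task):
        """Operator command: clear progress and put the task at the top of the queue."""
        self.progress.pop(task, None)
        if task in self.pending:
            self.pending.remove(task)
        self.pending.appendleft(task)


rf = RunFile(["scan001", "scan002", "scan003"])
first = rf.next_task()              # a server takes the first command line
rf.note_progress(first, "step 1 done")
rf.lock()
blocked = rf.next_task()            # nothing is handed out while locked
rf.unlock()
rf.reset(first)                     # progress cleared, task back at the top
again = rf.next_task()              # the reset task is the next one taken
```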

    It was noted that FORTRAN programs that write to the standard error device (unit zero), intending the monitor to pick up the information for immediate display, should flush the standard-error buffer if execution is to continue; this ensures that the information goes out immediately. If the error message accompanies the aborting of execution, there is no need to flush any buffers, since FORTRAN I/O does that automatically at termination.
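
    The same point can be shown by analogy in Python; this is not FORTRAN and not the 2MAPPS code. In the sketch below a child process writes a message to standard error, flushes it, and keeps running, while the parent (standing in for the monitor) reads the message before the child has finished. The message text and variable names are made up for the example. Recent Python versions usually line-buffer standard error in any case, so the explicit flush is shown mainly to mirror the advice above.

```python
import subprocess
import sys

# Child program: report a problem on standard error, then keep working.
CHILD = r"""
import sys, time
sys.stderr.write("WARNING: hypothetical problem in scan\n")
sys.stderr.flush()      # push the message out now; execution continues
time.sleep(1.0)         # stand-in for further processing
"""

proc = subprocess.Popen([sys.executable, "-c", CHILD],
                        stderr=subprocess.PIPE, text=True)

# The parent sees the line as soon as it is flushed, while the child still runs.
line = proc.stderr.readline().rstrip()
still_running = proc.poll() is None

proc.wait()
proc.stderr.close()
```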

    The members present were favorably impressed, and the consensus was that no significant ambiguity remained in the understanding of how the subsystems will be operated by EXEC/PCP. This implementation approach sheds light on how the entirety of 2MAPPS can be implemented, and the workability of the system design was felt to have been demonstrated.