Job Monitor Utility

This document explains how to monitor your submitted jobs and how to identify and report an error that occurred during the submission process. The documentation can be subdivided into three sections:

 

Monitoring the job execution

When you submit a job, you pass through two different steps: i) the submission procedure and ii) the simulation process. In the former, the Queue System of ICODE handles your request by assigning the job to one of the available slots, each representing a CPU node, usually the one with the lowest load. The Queue Manager also checks that all the components needed by the simulation procedure (the application server, the executable file of the simulator, the storage area, etc.) are present and accessible. Only if this check succeeds do you pass to the latter step, in which your job runs and provides results.
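The submission step described above can be pictured with a small sketch. It is only an illustration of the logic in the text (pick the least loaded slot, then verify that the required components are accessible), not the actual Queue System code; the names Slot, choose_slot, missing_resources and submit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """A queue slot representing one CPU node (hypothetical model)."""
    name: str
    load: float  # current load of the node


def choose_slot(slots):
    """Return the slot with the lowest load, the usual choice per the text."""
    if not slots:
        raise ValueError("no slots available")
    return min(slots, key=lambda s: s.load)


def missing_resources(resources):
    """Return the names of required components that are not accessible."""
    return [name for name, ok in resources.items() if not ok]


def submit(slots, resources):
    """Step i: assign a slot and check the components.

    Returns (accepted, detail). The job moves on to step ii
    (the simulation) only when accepted is True.
    """
    slot = choose_slot(slots)
    missing = missing_resources(resources)
    if missing:
        return False, "missing: " + ", ".join(missing)
    return True, slot.name
```

For example, with two slots of load 0.9 and 0.2 and all components accessible, submit accepts the job and reports the second slot; if the storage area is not accessible, the job is rejected before any simulation starts.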
 
[Figure: job control]
After the job has been accepted (step ii), you can monitor its execution by clicking the Job Info link in the system menu.
 
[Figure: no job]
 
When you click the link, a Job Monitor Window opens. If there are no running jobs, you get the system warning shown in the figure at the left. If you submitted a job and it is not listed, either it has already finished or an error occurred (see the .o or .e files for details).
 
[Figure: job status]
For each running job present in the system you get the following information (see figure at the left): Job Id, which is the Job Identification Number, and Type, indicating which type of simulator you are using. The Status link points to the Job Status Info Window (figure down right), which contains some useful information about the running process.
   
This information includes (see right figure) the name of the submitter, the working directory where the output is saved, and the submission date and time. At the bottom of the window there are two buttons: Kill Job and Resources.
[Figure: status info]
[Figure: resource]
 

The former allows a user to destroy the running process (use it with great care). The latter gives access to important information about the hardware resources used by the selected process (see the figure down, Resource Info Window): cpu, the current accumulated cpu usage; memory, the current accumulated memory usage; io, the current accumulated input/output; vmem, the current virtual memory usage; maxvmem, the maximum virtual memory usage reached so far.

 

The notification/information files

If the submission procedure succeeds, some notification files will be created in your working directory. The file jobinfo is a text file containing the Job Identification Number (Job Id) of your simulation. If you have submitted more than one process at the same time, you need to read the Job Id in this text file (by simply clicking on it or opening it with the on-line editor) to correctly match any reference to the different simulations.

When a simulation has started, a folder named job is created in your working directory. In this folder you can find one line for each process you are running. Each entry is identified by its Job Id and is associated with a Status link, which opens the corresponding Job Status Info Window, as described in the previous section.

During the simulation two new files are created in your working directory and updated as necessary. The former is identified by the syntax: "username"_"program-name".xxx"o", the latter by: "username"_"program-name".xxx"e".

Here username and program-name stand for the name of the user and of the simulator, respectively; xxx is the Job Id of the process; the extension o denotes the standard output, while e denotes the standard error. These files are very important when reporting any kind of error to the I-CODE Staff for support.
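When several jobs have been submitted, it can help to pick out the output and error files of one Job Id before contacting support. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and because the exact position of the o/e letter relative to the Job Id is not certain from the description above, the pattern accepts the letter either before or after the Job Id.

```python
import re


def classify_job_file(filename, username, program, job_id):
    """Return "stdout", "stderr", or None for a file name.

    Assumption: names look like <username>_<program>.<letter><jobid>
    or <username>_<program>.<jobid><letter>, with letter o or e.
    """
    prefix = re.escape(f"{username}_{program}.")
    jid = re.escape(str(job_id))
    pattern = re.compile(
        rf"^{prefix}(?:(?P<before>[oe]){jid}|{jid}(?P<after>[oe]))$"
    )
    match = pattern.match(filename)
    if match is None:
        return None
    letter = match.group("before") or match.group("after")
    return "stdout" if letter == "o" else "stderr"


def files_to_report(filenames, username, program, job_id):
    """Collect the standard output and standard error files of one job."""
    found = {}
    for name in filenames:
        kind = classify_job_file(name, username, program, job_id)
        if kind is not None:
            found[kind] = name
    return found
```

Given a listing of the working directory, files_to_report returns a small dictionary with the names of the two files to attach to an error report; files belonging to other users, programs, or Job Ids are ignored.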