Administrator Handbook

Process Surveyor

Overview of process monitoring

With Process Surveyor you can monitor the status of system processes and trigger alarms on missing or failed processes. The Process Surveyor Plug-in for LoriotPro follows the status of the processes running on hosts and servers. Process monitoring gives you a regular, permanent check on application availability at the lowest level of the system. If a process status changes, you are notified by the LoriotPro event manager and subsequent actions can be triggered.

This Plug-in works with the SNMP agent of Windows 2000 and later versions, with Unix systems running the NET-SNMP agent, and with any system or equipment that responds to requests on the objects of the HOST-RESOURCES-MIB defined in RFC 2790.

iso(1).org(3).dod(6).internet(1).mgmt(2).mib-2(1).host(25)
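
Most SNMP tools expect this subtree in dotted numeric form. The short Python sketch below is only an illustration and is not part of LoriotPro; the function name is hypothetical. It keeps the numbers found in parentheses in the named path above.

```python
import re

def named_path_to_oid(path: str) -> str:
    """Turn 'iso(1).org(3)...' into '1.3...' using the numbers in parentheses."""
    numbers = re.findall(r"\((\d+)\)", path)
    if not numbers:
        raise ValueError("no sub-identifiers found in: " + path)
    return ".".join(numbers)

# Root of the Host Resources MIB, as given in the named path above.
HOST_SUBTREE = named_path_to_oid(
    "iso(1).org(3).dod(6).internet(1).mgmt(2).mib-2(1).host(25)"
)
print(HOST_SUBTREE)
```

A walk started at this numeric OID covers the same objects as a walk started at the host node of the MIB tree.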

To install and activate the SNMP agent on Windows 2000, refer to our HOW TO on www.loriotpro.com

To install and activate the SNMP agent on Linux, refer to our HOW TO on www.loriotpro.com

Preliminary checking

Before using this Plug-in, check that the host you want to monitor supports the HOST-RESOURCES-MIB.

The host must be in your Directory and must respond to simple ping and SNMP requests. Double-click on it to open the Common SNMP Query Tool.

snmp query

The host is responding to SNMP requests. The next point to check is support for the Host Resources MIB, which is not implemented in all SNMP agents. Run the SNMP walker tool and start a walk at the mib-2 level of the MIB tree.

Select the host to monitor and open the SNMP walker tool from the contextual menu

 

Check resource mib

The result here is good: the HOST-RESOURCES-MIB entry displayed in the left pane of the window shows that the MIB has been detected.

Installation of process monitor

The Process Surveyor Plug-in is installed from the directory workspace.
Select a host in your directory, then select Insert Task and Plugin from the contextual menu.

The main screen of the Plug-in is displayed.

process monitor
Process surveyor and monitored process list

If the host is responding to SNMP, the description field is immediately completed with Host information.

In the directory, the Plug-in now runs attached to the host.

Configuration of Process Monitor

The configuration is performed from the main screen. The window contains two tables, the Process List table and the monitored Process List table.

The Process List table displays the list of current processes, their parameters and statuses. This table is refreshed at a regular interval defined by the Polling Interval value and when the start button is pressed.

Process monitor
Process monitor window

The current processes on this host are displayed.

To hide the window without stopping the Plug-in, click the button in the upper right corner. This closes the window but does not stop the Plug-in: if it is currently polling a host, it continues to do so.

The first thing to do is to configure the polling interval.

process monitor polling interval

The polling interval is the same for all configured processes.

Two options allow you to send a predefined alarm when a new process appears in the list or when a process disappears from the list.

A message with number 5555 and level 4 is received by the event manager.
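
Detecting a new or a disappeared process comes down to comparing two successive snapshots of the process list. The Python sketch below illustrates the idea under the assumption that processes are compared by name; it is not the Plug-in's actual code, and the process names and message texts are hypothetical.

```python
def diff_process_lists(previous, current):
    """Return (appeared, disappeared) process names between two polls."""
    prev, cur = set(previous), set(current)
    return sorted(cur - prev), sorted(prev - cur)

# Two successive snapshots of the process list (hypothetical names).
appeared, disappeared = diff_process_lists(
    ["init", "sendmail", "sshd"],
    ["init", "sshd", "find"],
)

# Number 5555 and level 4 come from the text above; the wording is illustrative.
for name in appeared:
    print(f"event 5555 level 4: new process {name}")
for name in disappeared:
    print(f"event 5555 level 4: process {name} disappeared")
```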

The monitored Process table contains the processes that you want to monitor. For each process you have to define the condition on which an event has to be sent.

To add a process to the monitored Process table, you can either pick it from the Process List table or specify its name manually.

The second option is useful if you want to detect when an unwanted process is started. If that process is not currently running on the system, it does not appear in the current list and you cannot pick it from there.

To add a process to the monitored list, specify the following options:

The Process Name

First stop the polling by clicking the stop button. Then either pick the process in the Process List by clicking on it, or type its name in the Process field.

Event Number

The number that will be displayed in the LoriotPro Event Manager. This number allows you to identify specific events and to create filters and actions on them.

Level

The level defines the severity by colouring the event message in the Event Log display.

If Process

Specify the test that the system applies to the condition. The possible tests are:

Is

The process status currently satisfies the condition. At each polling interval, an event is generated if the condition is satisfied. This choice keeps generating events until the condition is no longer satisfied.

Is not

The process status does not currently satisfy the condition. At each polling interval, an event is generated if the condition is not satisfied.

Change to

The process status changes to the condition. This generates an event only when the status changes.

Change to not

The process status changes to something that is not the condition.

Status

The status is either one of the SNMP statuses (invalid, not runnable, runnable, running) or missing.
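
The four tests can be summarised in a few lines of Python. This is a minimal sketch of one possible reading of the descriptions above, not the Plug-in's implementation. The function name is hypothetical, and both the behaviour on the very first poll and the reading of Change to not as "leaving the condition" are assumptions.

```python
def condition_fires(action, condition, previous, current):
    """Decide whether an event is generated for one polling cycle.

    action    -- "is", "is not", "change to" or "change to not"
    condition -- the status chosen in the Status field
    previous  -- status seen at the previous poll (None on the first poll)
    current   -- status seen at this poll ("missing" if not in the list)
    """
    if action == "is":
        return current == condition
    if action == "is not":
        return current != condition
    if previous is None:
        return False  # assumption: no change can be detected on the first poll
    if action == "change to":
        return previous != condition and current == condition
    if action == "change to not":
        # assumption: fires when the status leaves the condition
        return previous == condition and current != condition
    raise ValueError(f"unknown action: {action}")

# Statuses seen for one process over four successive polls.
polls = ["running", "missing", "missing", "running"]
pairs = list(zip([None] + polls, polls))
print([condition_fires("is", "missing", p, c) for p, c in pairs])
print([condition_fires("change to", "missing", p, c) for p, c in pairs])
```

With Is, an event is produced at every poll where the process is missing; with Change to, only the poll where it becomes missing produces one.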

The screenshot below displays an example of settings. We want to monitor the sendmail process on the ulysse host. We first start and then stop the process polling. We select the “sendmail” process in the list, the “is” action and the “missing” condition. Event number 78456 at level 4 will be sent at each polling interval for as long as the process is not present in the SNMP process list, whatever status it would otherwise have.

process monitor
Process monitor configuration

The result in the event manager is:

We now manually add a condition on the find process.

On the Unix server we run the command “find / -name 'a*' -print &”

We get an event, displayed in green, reporting that the process changed to the runnable status.

Warning: a change in the status of a process can be short-lived, and the polling could miss it. Remember that Process Surveyor compares the status gathered during the last polling against the status condition, or detects the change between two pollings.
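
The effect of the polling interval on short-lived statuses can be illustrated with a small simulation. The Python sketch below uses a hypothetical timeline of one status per second; it only shows why a long interval can miss a brief change and says nothing about how the Plug-in polls internally.

```python
def sample(timeline, interval):
    """Return the statuses a poller would see when reading every `interval` steps."""
    return timeline[::interval]

# One status per second: the process is runnable only during seconds 3 and 4.
timeline = ["missing"] * 3 + ["runnable"] * 2 + ["missing"] * 5

seen_slow = sample(timeline, 5)  # polls at seconds 0 and 5
seen_fast = sample(timeline, 2)  # polls at seconds 0, 2, 4, 6 and 8

print("runnable" in seen_slow)  # the 5-second poller never sees the process
print("runnable" in seen_fast)  # the 2-second poller catches it at second 4
```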

Instead of checking that the process is runnable, we could check that the process is not runnable. The process is run in background mode; it first takes the runnable status and then the not runnable status until the administrator kills it.

On Windows systems, the only status reported is running.

About the Process Status

The process status is read from the hrSWRunStatus object of the HOST-RESOURCES-MIB.

host resource mib  

The status can have four states.

Running

The process is currently running on the host and is using processor time.

Runnable

The process is waiting for a resource such as the processor, memory or I/O (standby).

Not runnable

The process is loaded but is waiting for an event before it can run again.

Invalid

The process is not loaded and is not able to run; something failed.

These are SNMP statuses, set by the monitored host and not by the Plug-in. You can see the status in the last column of the Process List table. On Unix systems, processes change between the running and runnable statuses frequently. An application is considered to be working when its process is in one of these two states.
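
On the wire, hrSWRunStatus is an integer. The Python sketch below lists the values and column OIDs defined by the HOST-RESOURCES-MIB (RFC 2790) and applies the "working" rule stated above. The constant and function names are hypothetical and the sketch is independent of LoriotPro.

```python
# Values of hrSWRunStatus as enumerated in the HOST-RESOURCES-MIB (RFC 2790).
HR_SW_RUN_STATUS = {1: "running", 2: "runnable", 3: "notRunnable", 4: "invalid"}

# Column OIDs in hrSWRunTable, under the host(25) subtree.
HR_SW_RUN_NAME_OID = "1.3.6.1.2.1.25.4.2.1.2"
HR_SW_RUN_STATUS_OID = "1.3.6.1.2.1.25.4.2.1.7"

def is_working(status_value: int) -> bool:
    """An application counts as working when its process is running or runnable."""
    return HR_SW_RUN_STATUS.get(status_value) in ("running", "runnable")

for value, name in HR_SW_RUN_STATUS.items():
    print(value, name, "working" if is_working(value) else "not working")
```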

The missing status

One more status is available in the condition list: the missing status. It is set locally by the Plug-in when a process is not in the process list. This status is not displayed in the status column of the process list, but it is displayed in the status column of the monitored process list.
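
The way the missing status complements the SNMP statuses can be sketched as follows. This Python example assumes, for simplicity, that each process name appears at most once in the polled list; the function name and the sample data are hypothetical.

```python
def monitored_statuses(monitored, process_list):
    """Give each monitored name its polled status, or "missing" if it is absent.

    monitored    -- names entered in the monitored process list
    process_list -- mapping of process name to SNMP status from the last poll
    """
    return {name: process_list.get(name, "missing") for name in monitored}

last_poll = {"init": "runnable", "sshd": "running"}
statuses = monitored_statuses(["sendmail", "sshd"], last_poll)
print(statuses)
```

Here sendmail is reported as missing because it is absent from the polled list, while sshd keeps the status reported by the host.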

Process control and status on Unix

UNIX is a multi-tasking system that allows you to run multiple jobs, programs, or processes at the same time. There are some simple commands to control your foreground and background jobs as well as determine the status of your jobs and processes. A foreground job is one with which you are currently interacting. A background job is any job with which you are not currently interacting.

The following graphic shows the main process statuses and control commands.


Checking on the Status of your Jobs

The jobs command is used to check on the status (and number) of your jobs. If you have jobs in the background, its output may look something like this:

ulysse:~> jobs

  [1]    Done          emacs -g 80x20 textfile
  [2]    Running       xterm -g 80x20
  [3]    Running       xv
  [4]    Suspended     sed 's/.$//' < winf.txt > ufile
  [5]    Running       find ~ -name '*.txt' > txtlist
  [6]  - Suspended     vi .forward
  [7]  + Suspended     vi sas_data.sas
  [8]    Running       lynx http://www.uwyo.edu

The jobs command lists information on jobs currently running in the shell. 

The job number

The first number, in brackets, enumerates your background jobs. Note that job numbers are not always consecutive.

[+ or -]

The plus (+) indicates the current job. Job control commands use the current job as a default argument. The minus (-) indicates the previous job. The previous job will become the current job when the current job exits.

The status

The status can be Running, Done, Stopped or Suspended.

Done indicates that a job has completed. Done may be followed by an exit status returned by the job. For example, Done(2) means the job exited with a status of two.

Stopped indicates that the job was suspended. Stopped may be followed by a signal name. For example, Stopped(SIGSTOP) indicates that the job was suspended by the SIGSTOP signal. Some shells display Suspended in place of Stopped; the meaning is the same.

The command line

The command associated with the job, with its parameters.

Options

-l    Include the process ID (PID) of each job in the output.
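
The fields described above can be picked out of a line of jobs output with a regular expression. The Python sketch below assumes the simple layout of the sample listing (bracketed number, optional + or - marker, one-word status, command); the exact layout varies from shell to shell, and the function name is hypothetical.

```python
import re

# [number], optional +/- marker, one-word status, then the command line.
JOB_LINE = re.compile(r"^\[(\d+)\]\s*([+-])?\s*(\S+)\s+(.*)$")

def parse_job_line(line):
    """Split one line of jobs output into its four fields."""
    match = JOB_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a jobs line: {line!r}")
    number, marker, status, command = match.groups()
    return {
        "number": int(number),
        "marker": marker,  # "+", "-" or None when no marker is shown
        "status": status,
        "command": command,
    }

job = parse_job_line("  [2]  + Running       xterm -g 80x20")
print(job)
```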

Putting a Job in the Background

It is possible to start any job in the background and return control to the terminal by appending the & character to the command. For example:

ulysse:~> lynx http://www.loriotpro.com &

You can also move a job that is currently running in the foreground to the background. Type <Ctrl>+z to suspend the job, then type bg to let it continue in the background. Note that this control character may be changed; see the document "How to Use UNIX Control Characters" for more information.

Moving a Job to the Foreground

To bring a job to the foreground, type fg %jobnumber. For example, to move the vi sas_data.sas job listed above to the foreground, type:

ulysse:~> fg %7

Replacing the digit 7 in this example with a - (dash) brings the job indicated by the minus (-) in the display to the foreground. Using a % character in place of the 7 brings the job indicated by the plus (+) in the display to the foreground.

Killing a Job

To kill a job is to stop the job right where it is. If you have not saved the output of that particular job, killing a job will not save anything. Killing a job is not an elegant way to end the job. Instead, it is much better to bring a job to the foreground and exit it naturally.

To kill a job, type kill %jobnumber. Here is an example that kills the vi .forward job listed above:

ulysse:~> kill %6

If a job will not die, you may need to add the -9 switch to the kill command, as in the example below. <Ctrl>+c sends an interrupt signal to the job running in the foreground, which usually has the effect of killing it. Note that this control character can be changed; see the UNIX Control Characters How To for more information.

ulysse:~> kill -9 %6

Process Priority

All processes have a priority assigned to them. By default, your jobs and processes have priority 10. A higher priority number means a lower priority. Any process found running on the system with a total of more than 5 minutes of system time used will be niced (its priority set down). Processes found with over 60 minutes of system time used that are not niced will be killed.

The command syntax is nice command arguments. For example, to run a program called bigjob with the command option -o at a lower priority:

ulysse:~> nice bigjob -o


Checking on Processes

The ps command is used to check on processes. An example with the results looks like:

ulysse:~> ps
  PID TTY      TIME CMD
 3427 pts/12   0:00 zsh

The first column, PID, is the process ID. The second column, TTY, is the terminal (port) from which the process was started. The third column, TIME, is the amount of CPU time the process has used. The fourth column, CMD, is the name of the command being run.
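
The four columns can be read into named fields with a few lines of Python. The sketch below parses the sample output shown above rather than running ps itself. The function name is hypothetical and the column layout is assumed to match the sample.

```python
def parse_ps(output: str):
    """Turn ps output (header line plus one line per process) into dictionaries."""
    lines = output.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # CMD may contain spaces, so split at most len(header) - 1 times.
        fields = line.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows

sample = """\
  PID TTY      TIME CMD
 3427 pts/12   0:00 zsh
"""
rows = parse_ps(sample)
print(rows[0]["PID"], rows[0]["CMD"])
```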

Running Processes While Logged Out

Normally, upon log out a hang-up signal (HUP) is sent to all processes started during that session. It is possible, however, to continue a process even after you have logged out. This may be beneficial for particularly lengthy processes. The shell command used to continue a process is nohup. Again, we ask you to nice any computationally intensive process. An example would be:

ulysse:~> nohup nice bigjob &

The output from a nohup'ed command will be sent to a file called nohup.out, unless otherwise redirected. You can no longer have command line interaction with a job that has been nohup'ed. The nohup'ed process will run to completion or until it is killed. To find such jobs use the ps -ef | grep username command.

 

