Technote (troubleshooting)
Problem(Abstract)
While investigating a problem with an application or server, you may need to watch for events like error log messages and take action when they occur.
Resolving the problem
The trapit script provides an easy way to perform actions based on events such as error messages written to log files, or data being written to new diagnostic files such as WebSphere MQ FDC or WebSphere Application Server ffdc files. If you need to monitor SystemOut.log files from WebSphere Application Server, use the TrapIt.ear tool, which is specifically optimized for that case.
Using trapit
In order to use trapit, you must first download the script to your system and make it executable, for example by running: chmod a+x trapit
Syntax
trapit -?
trapit -e Error... -f File... [-i Interval] [-t Trigger]...
Required Parameters
-e Error
- The error message or other pattern which you want trapit to find. You can ask trapit to look for a simple string or you can use an extended regular expression (ERE) instead. Use quotation marks around the error string if it contains special characters.
You can repeat this parameter to specify as many error strings or regular expressions as you need.
-f File
- The file names which trapit should watch. You can give trapit a simple file name, or you can provide a wildcard pattern. Use quotation marks around the file name if it contains special characters like wildcards.
You can repeat this parameter to specify as many file names or wildcards as you need.
Optional Parameters
-i Interval
- How long trapit should wait between scans when watching for errors (default: 5 seconds). A shorter interval means trapit may find errors more quickly, at the expense of efficiency. A longer interval may be better if you are watching large files which are frequently updated.
-t Trigger
- A command or script which trapit should run when it finds the error message or pattern. By default, trapit will simply end successfully when it finds the error, but you can ask trapit to run commands or scripts instead. Be sure to use quotation marks around each trigger command and arguments.
You can repeat this parameter to specify as many trigger commands as you like. However, it is probably easier to put all your commands into a simple script and tell trapit to run the script.
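As a rough illustration of the suggestion above, a trigger script might look like the following sketch. The script name /tmp/on_error.sh, the OUTDIR variable, and the choice of commands are assumptions made for this example; they are not part of trapit.

```shell
#!/bin/sh
# Hypothetical trigger script, for example saved as /tmp/on_error.sh.
# OUTDIR is an illustrative variable, not something trapit sets.
outdir=${OUTDIR:-/tmp/trapit_diag.$$}
mkdir -p "$outdir"

# Record when the trigger fired.
date > "$outdir/triggered_at.txt"

# Gather whatever data you need; stderr is kept alongside the output.
ps -ef > "$outdir/ps.txt" 2>&1
netstat -an > "$outdir/netstat.txt" 2>&1

printf 'Diagnostic data written to %s\n' "$outdir"
```

After making the script executable, you could pass it to trapit as a single trigger, for example: trapit -e ZX159002 -f "/var/mqm/errors/*.FDC" -t "/tmp/on_error.sh"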
Usage Notes
The trapit script can look for any text in any file, provided you have authority to read the files. Although trapit was written by the WebSphere MQ team, anyone can use it to watch for errors in:
- Application logs
- Operating system logs
- Product logs, such as the WebSphere MQ error log files (AMQERRxx.LOG)
- Files created to record information about a specific occurrence of an error, such as WebSphere MQ FDC files and WebSphere Application Server ffdc files
If the error message or pattern you are looking for already exists, then trapit will trigger immediately as you run it. Your only option is to delete or archive the files which already show the error, then start trapit and let it watch for new occurrences of the error.
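One way to archive the files which already show the error is sketched below. The function name archive_matches and the archive directory are assumptions for this example, and grep -E is used here only as a stand-in for the pattern matching which trapit performs.

```shell
#!/bin/sh
# archive_matches PATTERN ARCHIVE_DIR FILE...
# Moves every listed file that already contains PATTERN (an ERE) into
# ARCHIVE_DIR. Hypothetical helper for illustration; not part of trapit.
archive_matches() {
    pattern=$1
    archive=$2
    shift 2
    mkdir -p "$archive" || return 1
    for f in "$@"; do
        # Skip anything that is not a regular file (e.g. an unmatched wildcard).
        [ -f "$f" ] || continue
        if grep -q -E -e "$pattern" "$f"; then
            mv "$f" "$archive"/ || return 1
        fi
    done
}
```

For example, you might call archive_matches ZX159002 /var/mqm/errors/old /var/mqm/errors/*.FDC before starting trapit with the same pattern.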
Trapit is particularly efficient with things like WebSphere MQ FDC and WebSphere Application Server ffdc files, where new problems are usually recorded in new files rather than appending to a single log. For example, if you ask trapit to watch "/var/mqm/errors/*.FDC" for a particular message, trapit will start by scanning every one of your FDC files (which could be thousands, if you have not cleaned them up recently). Thereafter, trapit will only scan new and updated FDC files (which might be none, or just a few).
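The idea of scanning everything once and then only new and updated files can be sketched with a timestamp file and find -newer. This is not taken from the trapit source; it is a minimal sketch of the general technique, and the function name scan_new_files and the stamp file are assumptions for this example. Unlike a simple wildcard, find also descends into subdirectories.

```shell
#!/bin/sh
# scan_new_files DIR NAME_GLOB PATTERN STAMP
# Prints the names of files under DIR matching NAME_GLOB that contain PATTERN
# (an ERE). The first call scans every matching file; later calls scan only
# files modified after the previous call, as recorded by the STAMP file.
scan_new_files() {
    dir=$1
    glob=$2
    pattern=$3
    stamp=$4
    next="$stamp.next"

    # Take the new timestamp before scanning, so that files updated while
    # the scan is running are picked up by the following call.
    : > "$next"

    if [ -f "$stamp" ]; then
        find "$dir" -type f -name "$glob" -newer "$stamp" \
            -exec grep -l -E -e "$pattern" {} +
    else
        find "$dir" -type f -name "$glob" \
            -exec grep -l -E -e "$pattern" {} +
    fi

    mv "$next" "$stamp"
}
```

A polling loop would call this function, sleep for the chosen interval, and call it again. On file systems with coarse timestamps, a file updated within the same second as the stamp could be missed, so treat this as an illustration rather than a replacement for trapit.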
While the trapit script has often been used to turn off tracing when a problem occurs, WebSphere MQ V7.0 and later can actually turn tracing off automatically when FDC files with specified Probe Id values are generated. Use this feature instead of trapit if you need to turn off tracing when an FDC Probe Id occurs, for example:
- sh> strmqtrc -m PROD.QMGR -c FDC=XC308010,XC307040
Examples
Example 1
Ask trapit to check every 10 seconds for messages AMQ7466 and AMQ7469 in the error logs for the WebSphere MQ queue manager MY.QMGR:
sh> trapit -e AMQ7466 -e AMQ7469 -f "/var/mqm/qmgrs/MY!QMGR/errors/*.LOG" -i 10
Or using a regular expression, you could run the same command a few other ways:
sh> trapit -e "AMQ7466|AMQ7469" -f "/var/mqm/qmgrs/MY!QMGR/errors/*.LOG" -i 10
sh> trapit -e "AMQ746[69]" -f "/var/mqm/qmgrs/MY!QMGR/errors/*.LOG" -i 10
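If you want to convince yourself that the three forms in Example 1 select the same lines, you can try the patterns against a small sample file. The sketch below uses grep -E as a stand-in for the extended regular expression matching described above; that substitution, the sample file, and its contents are assumptions for this illustration.

```shell
#!/bin/sh
# Illustration only: grep -E stands in for trapit's ERE matching.
sample=$(mktemp)
printf 'AMQ7466: sample line\nAMQ7469: sample line\nAMQ7467: unrelated\n' > "$sample"

# Alternation, bracket expression, and two separate patterns.
count_alt=$(grep -c -E 'AMQ7466|AMQ7469' "$sample")
count_set=$(grep -c -E 'AMQ746[69]' "$sample")
count_two=$(grep -c -e 'AMQ7466' -e 'AMQ7469' "$sample")

# Each form matches the same two lines and skips AMQ7467.
printf 'alternation=%s bracket=%s two-patterns=%s\n' \
    "$count_alt" "$count_set" "$count_two"
# prints: alternation=2 bracket=2 two-patterns=2

rm -f "$sample"
```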
Example 2
To run the stackit script against a queue manager when a WebSphere MQ FDC file showing Probe Id ZX159002 or error code xecL_W_LONG_LOCK_WAIT is generated:
sh> trapit -e ZX159002 -e xecL_W_LONG_LOCK_WAIT -f "/var/mqm/errors/*.FDC" -t "stackit -o All -m PROD.QMGR > /tmp/stackit.log"
sh> trapit -e "ZX159002|xecL_W_LONG_LOCK_WAIT" -f "/var/mqm/errors/*.FDC" -t "stackit -o All -m PROD.QMGR > /tmp/stackit.log"
Example 3
To gather full WebSphere MQ diagnostic information from the system with the runmqras command when your application records a message in its own log files:
sh> trapit -e "com.example.MyAppUnexpectedException" -e "JMSCMQ0002: The method 'MQCTL' failed." -f "/var/MyApp/MyApp-*.log" -t "/opt/mqm75/bin/runmqras -section all 1>/tmp/runmqras.txt 2>&1"
Example 4
Enclose each trigger command and its arguments in quotation marks. If the command you want to trigger also uses quotation marks, you have two choices: use backslashes to escape the quotation marks inside the command, or use double quotes to enclose the trigger and single quotes inside the command. Be aware that the shell expands variables inside double quotes, but does not expand variables inside single quotes:
sh> trapit -e "AMQ7305" -f "/var/mqm/qmgrs/QMA/errors/AMQERR01.LOG" -t "echo 'DISPLAY QSTATUS(SYSTEM.CHANNEL.INITQ)' | runmqsc QMA > /tmp/mqsc.txt"
sh> trapit -e "AMQ7305" -f "/var/mqm/qmgrs/QMB/errors/AMQERR01.LOG" -t "echo \"DISPLAY QSTATUS(${QNAME})\" | runmqsc QMB > /tmp/mqsc.txt"
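To see what trigger string the shell actually hands to trapit under each quoting style, you can assign the same text to a variable and print it. This sketch only demonstrates standard shell quoting; the QNAME value and the variable names are assumptions for the illustration, and no MQ commands are run.

```shell
#!/bin/sh
# QNAME is a hypothetical variable used only for this illustration.
QNAME=SYSTEM.CHANNEL.INITQ

# Outer double quotes with escaped inner double quotes: ${QNAME} is expanded
# by the calling shell before the trigger string is passed on.
escaped="echo \"DISPLAY QSTATUS(${QNAME})\" | runmqsc QMB"

# Outer double quotes with inner single quotes: the outer quotes still
# decide, so ${QNAME} is expanded here as well.
inner_single="echo 'DISPLAY QSTATUS(${QNAME})' | runmqsc QMA"

# Outer single quotes: the text is passed through literally.
literal='echo "DISPLAY QSTATUS(${QNAME})" | runmqsc QMB'

printf '%s\n' "$escaped"
# prints: echo "DISPLAY QSTATUS(SYSTEM.CHANNEL.INITQ)" | runmqsc QMB
printf '%s\n' "$inner_single"
# prints: echo 'DISPLAY QSTATUS(SYSTEM.CHANNEL.INITQ)' | runmqsc QMA
printf '%s\n' "$literal"
# prints: echo "DISPLAY QSTATUS(${QNAME})" | runmqsc QMB
```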
Example 5
To run an operating system command and a custom script (provided by IBM or one you have written) when a particular symptom appears in a WebSphere Application Server ffdc file:
sh> trapit -e "DSRA8100E: Unable to get a PooledConnection from the DataSource" -e "ERRORCODE=-1042, SQLSTATE=58004" -f "/usr/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/ffdc/*.txt" -t "netstat -an > /tmp/diag_netstat.txt" -t "/tmp/diag_script.sh -s server1 -p 297364 > /tmp/diag_script.log"
Example 6
As an alternative, you could call trapit from within a diagnostic script and use it to pause until an error occurs. Because the trapit script exits with return code 0 when it finds a match, you could run a script like the one below:
sh> /tmp/diag_script.sh > /tmp/diag_script.log
diag_script.sh
#!/bin/sh
# Wait until trapit finds the error, then gather diagnostic data.
printf "Diagnostic script started at %s\n" "`date`"
printf "Waiting for the system to run low on memory...\n"

# trapit exits with return code 0 on a match; treat anything else as a failure.
trapit -e SIGDANGER -f "/var/mqm/errors/*.LOG" || {
    printf "Trapit ended with return code %s: Exiting\n" "$?"
    exit 1
}

printf "WebSphere MQ reported SIGDANGER at %s\n" "`date`"
printf " * Process listing:\n"
ps -eo pid,ppid,nlwp,s,vsz,pmem,pcpu,start,time,user,egroup,args
printf " * System V IPC listing:\n"
ipcs -a
printf " * Network connections:\n"
netstat -an
printf "Finished gathering data\n"
exit 0
Tail and Run Batch File
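This tool can run a batch command immediately when a specific MQ error is detected.
By putting the command that stops MQ trace into the batch file, you can stop MQ trace automatically as soon as the error appears, which makes MQ trace collection more efficient.
This makes up for the limitation on the Windows platform, where MQ trace can be stopped automatically only when a specific FDC occurs.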
https://sourceforge.net/projects/tailandrunbatch/
Description
Tail for Windows is used to monitor changes to files; displaying the changed lines in realtime. This makes Tail ideal for watching log files.
Tail searches for a text string and runs a batch file when it finds a match.
Features:
- Detect keyword matches, and run a Batch file.
- Autostart function