Wednesday, December 28, 2016

Using the trapit script to watch for events

http://www-01.ibm.com/support/docview.wss?uid=swg21590151

Technote (troubleshooting)


Problem(Abstract)

While investigating a problem with an application or server, you may need to watch for events like error log messages and take action when they occur.

Resolving the problem

The trapit script provides an easy way to perform actions based on events such as error messages written to log files, or data being written to new diagnostic files such as WebSphere MQ FDC or WebSphere Application Server ffdc files. If you need to monitor SystemOut.log files from WebSphere Application Server, use the TrapIt.ear tool, which is specifically optimized for that case.


Using trapit

In order to use trapit, you must first download the script to your system and make it executable, for example by running: chmod a+x trapit

    Syntax

    trapit -?

    trapit -e Error... -f File... [-i Interval] [-t Trigger]...

    Required Parameters

    -e Error
      The error message or other pattern which you want trapit to find. You can ask trapit to look for a simple string or you can use an extended regular expression (ERE) instead. Use quotation marks around the error string if it contains special characters.

      You can repeat this parameter to specify as many error strings or regular expressions as you need.


    -f File
      The file names which trapit should watch. You can give trapit a simple file name, or you can provide a wildcard pattern. Use quotation marks around the file name if it contains special characters like wildcards.

      You can repeat this parameter to specify as many file names or wildcards as you need.


    Optional Parameters

    -i Interval
      How long trapit should wait between scans when watching for errors, in seconds (default: 5 seconds). A shorter value means trapit may find errors more quickly, at the expense of efficiency. A longer value may be better if you are watching large files which are frequently updated.


    -t Trigger
      A command or script which trapit should run when it finds the error message or pattern. By default, trapit will simply end successfully when it finds the error, but you can ask trapit to run commands or scripts instead. Be sure to use quotation marks around each trigger command and arguments.

      You can repeat this parameter to specify as many trigger commands as you like. However, it is probably easier to put all your commands into a simple script and tell trapit to run the script.
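The parameters above fit together as a simple polling loop. The sketch below is a minimal, hypothetical illustration of that idea in POSIX sh, not the trapit script itself. It assumes three things: patterns are matched as EREs with grep -E, the quoted wildcard is expanded again on every scan (so files created after start-up are seen), and a trigger is run with sh -c. The function name mini_trapit and its MAX_SCANS limit are invented for the example. To keep it short, it takes one ERE, one wildcard and one trigger, so several -e strings have to be joined with | first.

```sh
#!/bin/sh
# mini_trapit ERE GLOB INTERVAL TRIGGER MAX_SCANS
# Hypothetical sketch of a trapit-style watch loop; NOT the real trapit.
# Assumptions: ERE is matched with grep -E, GLOB is expanded again on
# every scan, TRIGGER (may be empty) is run with sh -c, and MAX_SCANS is
# an invented limit so the sketch cannot run forever.
mini_trapit() {
    pattern=$1 glob=$2 interval=$3 trigger=$4 max_scans=$5
    scans=0
    while [ "$scans" -lt "$max_scans" ]; do
        scans=$((scans + 1))
        # $glob is deliberately unquoted so the shell expands the wildcard
        # now, picking up files created since the previous scan.
        for f in $glob; do
            [ -f "$f" ] || continue          # no match: glob stays literal
            if grep -E -q -e "$pattern" "$f"; then
                if [ -n "$trigger" ]; then
                    sh -c "$trigger"         # run the trigger command
                fi
                return 0                     # found: end successfully
            fi
        done
        sleep "$interval"                    # wait before the next scan
    done
    return 1                                 # gave up after MAX_SCANS scans
}
```

Because POSIX sh has no arrays, handling repeated -e, -f and -t options would need more argument parsing. This sketch also rereads every matching file on each scan, whereas a real tool would remember which files it has already checked.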



Usage Notes

The trapit script can look for any text in any file, provided you have authority to read the files. Although trapit was written by the WebSphere MQ team, anyone can use it to watch for errors in:
  • Application logs
  • Operating system logs
  • Product logs, such as the WebSphere MQ error log files (AMQERRxx.LOG)
  • Files created to record information about a specific occurrence of an error, such as WebSphere MQ FDC files and WebSphere Application Server ffdc files


If the error message or pattern you are looking for already appears in the files being watched, trapit will trigger as soon as you run it. Your only option is to delete or archive the files which already show the error, then start trapit and let it watch for new occurrences of the error.
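Archiving only the files that already contain the pattern can be scripted. This is a minimal sketch with a hypothetical function name, archive_matching, and an archive directory of your choosing. It is aimed at files that are written once, such as FDC or ffdc files. It uses grep -E so that the pattern behaves like an ERE passed to trapit -e, which is an assumption about how trapit matches.

```sh
#!/bin/sh
# archive_matching ERE GLOB ARCHIVE_DIR
# Hypothetical helper: move every file matched by GLOB that already
# contains ERE into ARCHIVE_DIR, so a watcher started afterwards only
# triggers on new occurrences. Files without a match are left in place.
archive_matching() {
    pattern=$1 glob=$2 archive_dir=$3
    mkdir -p "$archive_dir" || return 1
    for f in $glob; do                       # unquoted: expand the wildcard
        [ -f "$f" ] || continue
        if grep -E -q -e "$pattern" "$f"; then
            mv "$f" "$archive_dir/" || return 1
        fi
    done
    return 0
}
```

For example, `archive_matching 'ZX159002|xecL_W_LONG_LOCK_WAIT' '/var/mqm/errors/*.FDC' /var/tmp/fdc_archive` could be run before starting a watch for those symptoms. The archive path is only an illustration.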

Trapit is particularly efficient with things like WebSphere MQ FDC and WebSphere Application Server ffdc files, where new problems are usually recorded in new files rather than appended to a single log. For example, if you ask trapit to watch "/var/mqm/errors/*.FDC" for a particular message, trapit will start by scanning every one of your FDC files (which could be thousands, if you have not cleaned them up recently). Thereafter, trapit will only scan new and updated FDC files (which might be none, or just a few).
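The "scan everything once, then only new or updated files" behaviour can be approximated with a timestamp marker and find -newer. The sketch below shows one assumed way to do this. It is not a description of how trapit tracks files, and the function name changed_since and the marker file location are hypothetical.

The marker is touched before each scan starts. As a result, a file updated while find is running is reported again on the next scan rather than missed. Keep the marker outside the scanned directory, or give it a name that does not match. Note that find also descends into subdirectories.

```sh
#!/bin/sh
# changed_since MARKER DIR NAME_GLOB
# Hypothetical sketch: print files in DIR whose names match NAME_GLOB and
# that changed since the previous call with the same MARKER. The first
# call (no MARKER yet) prints every matching file.
changed_since() {
    marker=$1 dir=$2 name_glob=$3
    next=$marker.next
    touch "$next" || return 1        # remember when this scan started
    if [ -f "$marker" ]; then
        # Only files modified after the previous scan started.
        find "$dir" -type f -name "$name_glob" -newer "$marker"
    else
        # First scan: every matching file, however many there are.
        find "$dir" -type f -name "$name_glob"
    fi
    mv "$next" "$marker"             # the next scan starts from here
}
```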

While the trapit script has often been used to turn off tracing when a problem occurs, WebSphere MQ V7.0 and later can turn tracing off automatically when FDC files with specified Probe Id values are generated. Use this feature instead of trapit if you need to turn off tracing when an FDC Probe Id occurs, for example:

    sh> strmqtrc -m PROD.QMGR -c FDC=XC308010,XC307040


Examples


    Example 1

    Ask trapit to check every 10 seconds for messages AMQ7466 and AMQ7469 in the error logs for the WebSphere MQ queue manager MY.QMGR:

    sh> trapit -e AMQ7466 -e AMQ7469 -f "/var/mqm/qmgrs/MY!QMGR/errors/*.LOG" -i 10


    Or, using a regular expression, you could run the same command in either of these ways:


    sh> trapit -e "AMQ7466|AMQ7469" -f "/var/mqm/qmgrs/MY!QMGR/errors/*.LOG" -i 10

    sh> trapit -e "AMQ746[69]" -f "/var/mqm/qmgrs/MY!QMGR/errors/*.LOG" -i 10
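The three forms of Example 1 should select the same lines. This can be checked with grep -E, assuming trapit treats -e values as extended regular expressions in the same way. The sample lines below are made up. If that assumption holds, keep in mind that in an ERE a dot matches any character and parentheses group, so punctuation in a "simple string" is still interpreted as pattern syntax.

```sh
#!/bin/sh
# Made-up sample lines; only the AMQ numbers matter here.
sample='line with AMQ7466
line with AMQ7467
line with AMQ7469'

# Repeated -e options, as in the first command of Example 1.
by_options=$(printf '%s\n' "$sample" | grep -E -e AMQ7466 -e AMQ7469)
# Alternation, as in the second command.
by_alternation=$(printf '%s\n' "$sample" | grep -E 'AMQ7466|AMQ7469')
# Bracket expression, as in the third command.
by_bracket=$(printf '%s\n' "$sample" | grep -E 'AMQ746[69]')

printf '%s\n' "$by_bracket"   # prints the AMQ7466 and AMQ7469 lines only
```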
     

    Example 2

    To run the stackit script against a queue manager when a WebSphere MQ FDC file showing Probe Id ZX159002 or error code xecL_W_LONG_LOCK_WAIT is generated:

    sh> trapit -e ZX159002 -e xecL_W_LONG_LOCK_WAIT -f "/var/mqm/errors/*.FDC" -t "stackit -o All -m PROD.QMGR > /tmp/stackit.log"

    Or, using a regular expression:

    sh> trapit -e "ZX159002|xecL_W_LONG_LOCK_WAIT" -f "/var/mqm/errors/*.FDC" -t "stackit -o All -m PROD.QMGR > /tmp/stackit.log"

    Example 3

    To gather full WebSphere MQ diagnostic information from the system with the runmqras command when your application records a message in its own log files:

    sh> trapit -e "com.example.MyAppUnexpectedException" -e "JMSCMQ0002: The method 'MQCTL' failed." -f "/var/MyApp/MyApp-*.log" -t "/opt/mqm75/bin/runmqras -section all 1>/tmp/runmqras.txt 2>&1"
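The trigger in Example 3 ends with 1>/tmp/runmqras.txt 2>&1. The shell processes redirections from left to right, so this order sends both standard output and standard error to the file; the reverse order does not. The small demonstration below uses a hypothetical function, noisy, in place of runmqras.

```sh
#!/bin/sh
# noisy is a hypothetical stand-in for a command such as runmqras that
# writes to both standard output and standard error.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

log=$(mktemp)

# Order used in Example 3: stdout is sent to the file first, then stderr
# is pointed at the same place, so the file receives both lines.
noisy 1>"$log" 2>&1

# Reversed order: stderr is pointed at the original stdout before stdout
# is redirected, so only "to stdout" reaches the file.
noisy 2>&1 1>"$log"

rm -f "$log"
```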

    Example 4

    Trigger commands and their arguments should be enclosed in quotation marks. If the command you want to trigger also uses quotation marks, you have two choices: use backslashes to escape the quotation marks inside the command, or use double quotes to enclose the trigger and single quotes inside the command. Be aware that the shell you start trapit from expands any variables inside a double-quoted trigger, even those inside the inner single quotes. Only a trigger enclosed in single quotes reaches trapit with its variables unexpanded:

    sh> trapit -e "AMQ7305" -f "/var/mqm/qmgrs/QMA/errors/AMQERR01.LOG" -t "echo 'DISPLAY QSTATUS(SYSTEM.CHANNEL.INITQ)' | runmqsc QMA > /tmp/mqsc.txt"

    sh> trapit -e "AMQ7305" -f "/var/mqm/qmgrs/QMB/errors/AMQERR01.LOG" -t "echo \"DISPLAY QSTATUS(${QNAME})\" | runmqsc QMB > /tmp/mqsc.txt"
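The difference between the quoting styles in Example 4 comes down to when a variable is expanded. The sketch below assumes that trapit hands each trigger to a shell, modelled here with sh -c. QNAME is a hypothetical variable that is set only in the shell you launch from.

```sh
#!/bin/sh
# QNAME is set in this shell only; it is not exported to child shells.
QNAME=SYSTEM.CHANNEL.INITQ

# Double quotes outside: this shell substitutes $QNAME now, even though
# it sits inside the inner single quotes.
trigger_now="echo 'DISPLAY QSTATUS(${QNAME})'"

# Single quotes outside: the text ${QNAME} is kept as-is and is only
# expanded later, by whichever shell runs the trigger.
trigger_later='echo "DISPLAY QSTATUS(${QNAME})"'

printf '%s\n' "$trigger_now"     # echo 'DISPLAY QSTATUS(SYSTEM.CHANNEL.INITQ)'
printf '%s\n' "$trigger_later"   # echo "DISPLAY QSTATUS(${QNAME})"

sh -c "$trigger_now"             # DISPLAY QSTATUS(SYSTEM.CHANNEL.INITQ)
sh -c "$trigger_later"           # QSTATUS() is empty unless QNAME is exported
```

In other words, use double quotes when the value is known at the moment you start trapit. Use single quotes only if the variable will exist in the environment where the trigger actually runs.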

    Example 5

    To run an operating system command and a custom script (provided by IBM or one you have written) when a particular symptom appears in a WebSphere Application Server ffdc file:

    sh> trapit -e "DSRA8100E: Unable to get a PooledConnection from the DataSource" -e "ERRORCODE=-1042, SQLSTATE=58004" -f "/usr/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/ffdc/*.txt" -t "netstat -an > /tmp/diag_netstat.txt" -t "/tmp/diag_script.sh -s server1 -p 297364 > /tmp/diag_script.log"


    Example 6

    As an alternative, you could call trapit from within a diagnostic script and use it to pause until an error occurs. Because trapit exits with status 0 when it finds a match, a script like the one below can wait for the error and then gather data:

    sh> /tmp/diag_script.sh > /tmp/diag_script.log

      diag_script.sh


      #!/bin/sh

        printf "Diagnostic script started at %s\n" "`date`"
        printf "Waiting for the system to run low on memory...\n"

        trapit -e SIGDANGER -f "/var/mqm/errors/*.LOG" || {
          printf "Trapit ended with return code %s: Exiting\n" "$?"
          exit 1
        }

        printf "WebSphere MQ reported SIGDANGER at %s\n" "`date`"

        printf " * Process listing:\n"
        ps -eo pid,ppid,nlwp,s,vsz,pmem,pcpu,start,time,user,egroup,args

        printf " * System V IPC listing:\n"
        ipcs -a

        printf " * Network connections:\n"
        netstat -an

        printf "Finished gathering data\n"
        exit 0
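diag_script.sh relies only on trapit's exit status. To try that control flow without trapit, the sketch below uses a hypothetical stand-in, fake_trapit, that simply returns the status it is given. It also shows that $?, used at the start of the || { ... } block, still holds the status of the command that failed.

```sh
#!/bin/sh
# fake_trapit is a hypothetical stand-in that exits with the status given
# as its first argument, so the script's control flow can be tried
# without trapit installed.
fake_trapit() {
    return "$1"
}

wait_then_collect() {
    fake_trapit "$1" || {
        # "$?" is expanded before printf runs, so it is still the
        # stand-in's exit status.
        printf "Trapit ended with return code %s: Exiting\n" "$?"
        return 1
    }
    printf "Pattern found: gathering data\n"
}
```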
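However, trapit can only be used on UNIX platforms. On Windows, consider the following tool instead:
Tail and Run Batch File
It can run a batch command as soon as a specific MQ error is detected.
You can put the command that stops MQ trace into the batch file, so that MQ trace stops automatically the moment the error appears. This makes MQ trace collection more efficient.
It makes up for the limitation on Windows that MQ trace can otherwise be stopped automatically only when a specific FDC is generated.
https://sourceforge.net/projects/tailandrunbatch/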

Description
Tail for Windows monitors changes to files, displaying the changed lines in real time. This makes Tail ideal for watching log files.
Tail searches for a text string and runs a batch file.

Features:
- Detect keyword matches, and run a batch file.
- Autostart function

Tuesday, December 27, 2016

WebSphere Application Server Security configuration changes done with wsadmin are not activated immediately.

Problem(Abstract)

Some administrative actions (like mapping administrative users or groups to security roles) might not get activated immediately and require a restart of the JVM.

For example, suppose you want to map the group called "wasadmins" to the Administrator role:

AdminTask.mapGroupsToAdminRole('[-roleName administrator -accessids [group:defaultWIMFileBasedRealm/cn=wasadmins,cn=groups,dc=mycompany,dc=com ] -groupids [wasadmins@defaultWIMFileBasedRealm ]]')

AdminConfig.save()

Symptom

Although the configuration change has been saved with AdminConfig.save(), you cannot log in immediately as a member of the "wasadmins" group.
If you log in to the Administrative Console as the primary administrative user and open the "Administrative group roles" page, the new group mapping is listed.
If you then log out of the console, you can log in as a member of the newly mapped group.

Cause

Some WAS configuration changes require a restart of the JVM, or at least a refresh of the configuration in the running instances.
Opening the "Administrative group roles" page in the Integrated Solutions Console (ISC) performs this refresh.

Resolving the problem

When the configuration changes are completed and saved, you can force a refresh of the security configuration by invoking the "refreshAll" operation of the AuthorizationGroupManager MBean through AdminControl:
authGrpMgr = AdminControl.completeObjectName('WebSphere:type=AuthorizationGroupManager,*')
AdminControl.invoke(authGrpMgr, 'refreshAll')

A newly mapped user can now log in.

The commands above work for the deployment manager (DMgr), or for a base (standalone) instance, that wsadmin is connected to.
However, if you want to perform tasks as the newly mapped user on federated nodes (for example, starting application server JVMs), the node agents also need to refresh their security configuration.

This means you need to extend the script with one refresh per node agent, replacing node1 and node2 with your own node names:

authGrpMgr = AdminControl.queryNames('type=AuthorizationGroupManager,process=nodeagent,node=node1,*')
AdminControl.invoke(authGrpMgr, 'refreshAll')

authGrpMgr = AdminControl.queryNames('type=AuthorizationGroupManager,process=nodeagent,node=node2,*')
AdminControl.invoke(authGrpMgr, 'refreshAll')