What is oak?
Oak is a program that monitors syslogs from a collection of servers
and notifies operators when problem conditions arise. In addition to
providing immediate notification of critical problems, oak batches
less critical problems into summary messages that can be sent less
often and via any medium. For example, you may wish to have oak page
you on critical events while sending a summary of less important
messages to your terminal once an hour. In addition, you could send a
daily email message summarizing all events.
How does oak work?
Oak runs as a daemon and monitors a syslog file for events. A common
way to run oak is on a server that receives syslogs forwarded from
other servers. Based on a series of configurable regular expressions,
each message is placed into one or more user-defined queues. Each
queue is configured by the user to send out its messages after
waiting some period of time.
Oak keeps its messages succinct in a number of ways. Oak knows that
some information in log messages is unnecessary and would otherwise
produce needlessly repeated messages. For example, process id's and
sendmail queue id's can be automatically filtered out, thereby
condensing hundreds of messages into one short notification. If oak
does not know about a kind of log that can be condensed, the
configuration file can specify the custom information to be removed.
Finally, for each medium being used to send messages, the user can
specify limits on the length of the message, the line length, the
number of hosts being reported on, the number of messages per host,
etc. This helps ensure that a runaway message won't overwhelm its
recipient.
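The condensing described above can be pictured with a short Python
sketch. This is not oak's implementation: the pid pattern, the helper
names strip_pid and blank_groups, and the sample log lines are all
assumptions made for illustration. Only the default replacement string
"___" and the rule that parenthesised parts of a regular expression are
blanked out come from this document.

```python
import re
from collections import Counter

REPLACESTR = "___"  # default replacement string documented for oak

# Assumed shape of a syslog process id, e.g. "sshd[1234]:".
# The exact text oak removes is not specified here.
PID_RE = re.compile(r"\[\d+\]")


def strip_pid(line, replacestr=REPLACESTR):
    """Replace a bracketed numeric pid with the replacement string."""
    return PID_RE.sub("[" + replacestr + "]", line)


def blank_groups(pattern, line, replacestr=REPLACESTR):
    """Blank out the text matched by each parenthesised group of pattern.

    Assumes the groups are not nested. Lines that do not match are
    returned unchanged.
    """
    match = re.search(pattern, line)
    if match is None:
        return line
    spans = [match.span(i) for i in range(1, match.re.groups + 1)]
    spans = [s for s in spans if s != (-1, -1)]  # skip groups that did not take part
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end in sorted(spans, reverse=True):
        line = line[:start] + replacestr + line[end:]
    return line


# Hypothetical log lines that differ only in their pid.
lines = [
    "host1 sshd[101]: connection refused",
    "host1 sshd[202]: connection refused",
    "host1 sshd[303]: connection refused",
]
condensed = Counter(strip_pid(line) for line in lines)
```

Once the pids are blanked, the three lines become identical, so they
can be reported as a single entry with a count instead of three
separate entries.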
Download oak
The latest release of oak is
oak-1.3.5.
Other versions are available
here.
Who wrote oak and where do I report bugs?
Oak was written by James Kretchmar at MIT and bugs can be reported to
<oak@mit.edu>. Also send mail
to this address if you would like to receive future announcements
about oak releases.
How do I use it?
Oak is run as:
oak -c <config>
where <config> is the name of the configuration file.
The configuration file controls the entire operation of oak.
A sample configuration file can be found
here.
The general concepts are these. The configuration file first defines
a number of queues. Each queue takes a certain action at a certain
time interval. For example, you might define a queue called
"daily-mail" that fires once a day and sends a piece of email. Or you
might define a queue called "immediate-page" that pages you as soon
as a problem is noticed.
Next, the configuration file specifies a list of regular expressions,
each associated with one or more of the queues defined earlier.
Incoming messages are compared against the regular expressions in
order. A message is placed in the queues associated with the first
regular expression that matches it, and no later expressions are
tried.
Note that a trash queue is defined for you by default and any
messages queued to it are discarded. Because oak stops at the first
regular expression that matches, sending unwanted messages to the
trash queue early in the list discards them while still allowing all
other messages to fall through to later expressions.
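The first-match rule and the trash queue can be sketched in Python as
follows. This is an illustration, not oak's code. The rule list, the
queue names other than trash, and the sample messages are hypothetical.
The sketch also assumes that a regular expression may match anywhere in
the line and that a message matching no expression is simply dropped;
this document does not state either behaviour.

```python
import re

# Hypothetical rules in configuration order: (regex, queue names).
RULES = [
    (r"last message repeated", ["trash"]),
    (r"file system full", ["immediate-page", "daily-mail"]),
    (r".*", ["daily-mail"]),
]


def route(message, rules=RULES):
    """Return the queues of the first rule that matches, or [] if none do."""
    for pattern, queues in rules:
        if re.search(pattern, message):
            return queues  # first match wins; later rules are not tried
    return []


def dispatch(messages, rules=RULES):
    """Spool each message into its queues, discarding anything sent to trash."""
    spooled = {}
    for message in messages:
        for name in route(message, rules):
            if name == "trash":
                continue
            spooled.setdefault(name, []).append(message)
    return spooled
```

With these rules, a "last message repeated" line is discarded even
though the final catch-all expression would also match it, because the
trash rule appears first.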
Config Syntax
- set infile <file>
- Set the file being monitored to
<file>. If this option is not specified in the config then it
will default to /var/adm/messages.
- set nukepid
- Automatically remove process id's from logs. This option is on by
default and is strongly recommended.
- set no nukepid
- Don't automatically remove process id's from logs.
- set nukeciscoid
- Automatically remove log id numbers from cisco syslogs. This
option is on by default and is recommended if you are processing logs
from cisco equipment.
- set no nukeciscoid
- Don't automatically remove log id numbers from cisco syslogs.
- set nukesmqid
- Automatically remove sendmail queue id numbers from logs. This
option is on by default and is recommended if you are processing logs
from sendmail.
- set no nukesmqid
- Don't automatically remove sendmail queue id numbers from logs.
- set ignorehosts <host> [ <host> ... ]
- Ignore logs from the hosts in the list. Make sure each host is
listed exactly as it will appear in the log (i.e. exactly as it will
be resolved by the local syslogd). This command cannot be used at
the same time as the set onlyhosts command.
- set onlyhosts <host> [ <host> ... ]
- Process logs only from the hosts in the list. Make sure each
host is listed exactly as it will appear in the log (i.e. exactly as
it will be resolved by the local syslogd). This command cannot be
used at the same time as the set ignorehosts command.
- set replacestr <string>
- Set the string that replaces a section of log that is blanked out,
such as the pid. By default the string is "___". Anything in
parentheses in a regular expression is blanked out, as described
below.
- define queue <queue>
- Define a new queue whose name is <queue>. The
following subcommands can be issued after defining a queue. They
pertain to the most recent queue defined.
- action <action> [ <arg> ... ]
- Direct the queue to take the specified <action> when it
receives messages. You may use multiple action commands
to specify more than one action. Currently supported values for
<action> are mail, zwrite, and
exec. The arguments for each are as follows:
- action mail <to> <from> <subject>
- action zwrite <class> <instance> <recipient>
- action exec <program> [ <arg> ... ]
In the case of the exec command, the messages are piped to the stdin
of the named program.
- action-limits <numlines> <linelen> <numhosts> <hostents>
- Set limits on the size of messages sent by this queue.
<numlines> is the maximum total number of lines in the
message. <linelen> is the maximum length of a line.
<numhosts> is the maximum number of hosts in a
message. <hostents> is the maximum number of logs
per host. If the limits set by numlines,
numhosts, or hostents are exceeded then the
message will be truncated appropriately and a note stating that
fact will be included. If a line exceeds linelen, the excess
characters at the end of the line are silently stripped off.
- fire <time>
- Specify how often the queue should send
messages. <time> can be in one of three formats.
- *<num>[m|h|s]
This specifies a
repeated interval. For example *5m means to fire
every 5 minutes.
- <hour>:<min>
This specifies a
static time to fire at, using a 24 hour clock. 17:00
would fire every day at 5pm.
- now
This indicates that messages should be
sent immediately. This option should almost always be used in
conjunction with the locking command described below.
- locking <time>
- This option specifies how long a queue should wait after
sending a message before it will send another message that matches
the same regular expression as the first. This is typically used
with queues that fire immediately or at very short intervals. For
example, if a queue were set to page someone on a "file system
full" message, it would be undesirable to receive a page on each
successive log of the error; there would be a flood of
pages. If the queue were set to be locking 30m then a
"file system full" page would be sent at most once every thirty
minutes.
- header <text>
- Set text to be sent at the beginning of the message.
- prescan
- This option indicates that the queue should include messages
that are already in the log file. Normally a queue will only pick
up new messages after oak has been started. This option is useful
if you want to restart the oak daemon, but not lose messages for a
daily report. It is not recommended for queues that send frequent
messages since with the prescan option set those messages
will all be sent when oak is started.
- on <regex>
- Specify a regular expression to match messages against. The
subcommands following the on command indicate what to do when
the expression is matched. Anything in the regular expression that
falls between parentheses will be blanked out.
- queues <queuename> [ <queuename> ... ]
- Spool the message being matched into the queues named by
<queuename>.
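The locking subcommand described above can be pictured with the
following Python sketch. It is an illustration under stated
assumptions, not oak's implementation: the class name LockingQueue, the
function parse_duration, and the use of plain seconds for timestamps
are hypothetical, and the sketch assumes that locking accepts a number
followed by one of the unit letters s, m, or h, as in the "locking 30m"
example.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(spec):
    """Convert a string such as "30m" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", spec)
    if match is None:
        raise ValueError(f"unrecognised duration: {spec!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


class LockingQueue:
    """Suppress repeat sends for the same regular expression while locked."""

    def __init__(self, locking):
        self.lock_seconds = parse_duration(locking)
        self.last_sent = {}  # regex -> time of the last send, in seconds

    def should_send(self, regex, now):
        """Return True and record the send if regex is not currently locked."""
        last = self.last_sent.get(regex)
        if last is not None and now - last < self.lock_seconds:
            return False  # still inside the locking window
        self.last_sent[regex] = now
        return True
```

A queue created with LockingQueue("30m") sends the first "file system
full" message, suppresses further matches of the same expression for
the next thirty minutes, and still sends messages that match a
different expression during that time.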
Oak todo list
- features to add:
- a facility for dumping messages on SIGUSR1 or similar
- a "replace" option to an "on" command that will replace the line with a string
- a general regex replacement section in the config
- should we let users make oak not stop looking for matches once a line matches?
- an option to an action to cause it to fire even without any messages
- be able to fire on *:00 instead of *1h, if you like.
- it would be nice to apply a matchline to only a given set of hosts.
- queue length limits (actual msg queue length, not displayed?)
- things to do
- put the pid in the oak tktfile, but move the tkt stuff to the zephyr action.
- features maybe to add
- a default config?
- some kind of critical messages marking? (this probably doesn't make sense)
- bugs
- the user should be warned on startup if an action name isn't valid
- there needs to be a free of the queuelist removed in queuelist_remove_element_n (but don't free the queue)
- because of the way the fires work, a static time will be off by
an hour after a daylight saving time change.