Bischeck
-
Installation and administration guide
Version 1.1.0
2014-06-16
Legal Notice
Copyright © 2013-2014 Ingenjörsbyn AB.
This document is licensed by Ingenjörsbyn AB under the Creative Commons Attribution-ShareAlike 3.0 Unported License,
http://creativecommons.org/licenses/by-sa/3.0/. If you distribute this document, or a modified version of it, you must provide attribution to Ingenjörsbyn AB and provide a link to the original.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
Nagios® is an official trademark of Nagios Enterprise Inc.
All other trademarks are the property of their respective owners.
Abstract
This guide provides information on the installation and administration of Bischeck. For more in-depth configuration options, please see the “Bischeck - configuration guide”.
1 Getting help
More information about Bischeck and related projects can be found on
www.bischeck.org.
2 Installation and upgrading
2.1 Installation
Download the distribution file and follow the steps below to install. Make sure you have
root privileges when running the installation if you want the init scripts to be installed for automatic startup and shutdown. Before installing, please see the
System requirements ↓ chapter.
# tar xzvf bischeck-x.y.z.tar.gz
# cd bischeck-x.y.z
# chmod 755 install
# ./install -u #Get usage
# ./install #Install default
# service bischeckd start #Redhat/Centos
# /etc/init.d/bischeckd start #Debian/Ubuntu
To get a full list of available options to the install script use -u.
# ./install -u
Welcome to bischeck 1.0.0_RC2 installer
====================================
Copyright Anders Håål, Ingenjörsbyn AB 2011-2013
Licensed under GPL version 2
Usage for the bischeck installer
-u show usage.
-U the user name to install bischeck as and run the daemon as - default nagios.
-J java home directory, no default value.
-I installation directory for bischeck - default /opt/socbox/addons/bischeck.
-R uninstall - permanently remove the installation.
-X upgrade from current version - if possible.
-d set the linux distribution name if the installer can not detected. Supported are
rh, rhel, redhatenterpriseserver, centos, debian and ubuntu
-p the port number for the JMX RMI server, for example 3333. No default value.
-i the IP address where the RMI port should run, for example 127.0.0.1. No default value.
-a if authentication should be used applied on JMX connection. Default is false.
By default, the install script installs Bischeck in the directory /opt/socbox/addons/bischeck, referred to as $BISHOME in this documentation, owned by the user nagios. Make sure that the user exists before running install.
The last commands start the Bischeck daemon with the effective user id set during the installation, by default the user nagios. The installation configures bischeckd to start automatically in run levels 3, 4 and 5.
The process id of the Java process running Bischeck in daemon mode is stored in a file, by default /var/tmp/bischeck.pid. This file is used by the bischeckd script to stop the Java process running Bischeck and to ensure that only one instance of Bischeck is started on the server.
If the installation is done as a non-root user, that is, the effective user id is not root, Bischeck will be installed as the user running the install script. In this case the init script, bischeckd, will not be installed in /etc/init.d. To start Bischeck, the command line utility must be used, see Command line utilities↓.
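Because the installer expects the target user to exist already, a small pre-flight check can save a failed run. The following is a minimal sketch and not part of the Bischeck distribution; the function name user_exists is hypothetical, and nagios is simply the default user documented in the installer usage text.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given user exists on this system.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# nagios is the default user according to the installer usage text;
# pass another name as the first argument when installing with -U.
install_user="${1:-nagios}"
if user_exists "$install_user"; then
    echo "user $install_user exists"
else
    echo "user $install_user is missing, create it before running ./install"
fi
```

The check only reports the result; creating the user is left to the administrator, since the right command and options differ between distributions.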
2.1.1 Installation directory structure
When the installation is complete, the following directory structure exists:
$BISHOME -
|- /bin - Scripts and init scripts
|- /customlib - Custom jar files, like jdbc drivers
|- /etc - Bischeck XML based configuration files
|- /lib - All jar files required for Bischeck
| including the Bischeck jar
|- /resources - Different resources files, like XML schema
| files, logback.xml, etc.
|- svninfo.txt - Subversion commit number for the version
|- version.txt - Bischeck version
2.2 Upgrading
Download the new version as described in the previous chapter and run the install script with the option -X to upgrade:
# ./install -I /usr/local/bischeck -X
The upgrade will automatically stop the currently running Bischeck and save the current installation in a directory parallel to the new version named bischeck_x.y.z, where x.y.z is the version of the old installation.
The -I switch must be used if the currently installed version of Bischeck is not in the default installation directory.
The file migrationpath.txt describes the supported upgrade paths and the migration scripts that will be run by the install script.
If the upgrade is successful Bischeck can be started manually by executing:
# service bischeckd start
2.2.1 Rollback upgrade
In the event of an upgrade failure, or if the new version of Bischeck does not work properly, a simple rollback can be done with the following steps. The process assumes that the installation directory is /opt/socbox/addons/bischeck:
-
Stop the new version if running:
-
Go to the parent directory of where Bischeck has been installed:
-
Remove the newly installed Bischeck:
-
Move the old version, saved by the upgrade as bischeck_x.y.z where x.y.z is the version number of the previously installed version, to the install directory:
# mv bischeck_x.y.z bischeck
-
Replace the init script bischeckd with the old version:
# cp bischeck/bin/bischeckd /etc/init.d/bischeckd
# chmod 755 /etc/init.d/bischeckd
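The rollback steps above can be collected into one reviewable plan. The sketch below only prints the commands and does not execute anything. The function name rollback_plan is hypothetical, and the stop, cd and rm commands are assumptions derived from the step descriptions, since the guide shows explicit commands only for the last two steps.

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) the rollback commands.
# $1: version of the previous installation, e.g. 1.0.0
# $2: installation directory, defaults to the documented default
rollback_plan() {
    old_version="$1"
    install_dir="${2:-/opt/socbox/addons/bischeck}"
    parent_dir=$(dirname "$install_dir")
    name=$(basename "$install_dir")

    echo "service bischeckd stop"        # assumption: stop via the init script
    echo "cd $parent_dir"                # parent of the installation directory
    echo "rm -rf $name"                  # assumption: remove the new version
    echo "mv bischeck_$old_version $name"
    echo "cp $name/bin/bischeckd /etc/init.d/bischeckd"
    echo "chmod 755 /etc/init.d/bischeckd"
}

rollback_plan 1.0.0
```

Printing the plan first makes it possible to review the destructive rm step before running the commands by hand as root.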
2.3 Getting started
In the $BISHOME/etc directory there are working examples of all the configuration files. These files serve as examples to get started with a simple Bischeck setup that monitors the response time of a tcp ping on the ssh port, port 22, on localhost. This will look familiar if you have used Nagios before, since the example uses the well-known Nagios check command check_tcp. The example assumes that you have the Nagios plugins installed. To get the example running, perform the following steps:
-
Go to the $BISHOME/etc directory, by default /opt/socbox/addons/bischeck/etc.
-
Edit the bischeck.xml at line 16 with the path to the check_tcp command. If you have check_tcp installed in /usr/lib/nagios/plugins you do not need to edit the file.
-
Edit the servers.xml file to add the correct host name, line 12, and password, line 22, for your NSCA server. Even if you do not have an NSCA server defined, Bischeck will still run, but with a connection error when trying to send data to NSCA.
-
Now you are ready to start Bischeck and check what happens in the log file. See Logging↓ for more information about Bischeck logging.
$ sudo /etc/init.d/bischeckd restart
$ tail -f /var/tmp/bischeck.log
Maybe you ask yourself why you should use Bischeck to do a check_tcp? This is just a simple example, but when you have this running you can check out the 24thresholds.xml file to get some more understanding about the power of dynamic and adaptive thresholds.
3 Customization
3.1 Jar customization
To enable support for custom jar files, please place them in the directory $BISHOME/customlib. This can typically be jdbc drivers, custom threshold classes, etc. Any jar file in the $BISHOME/customlib will automatically be class loaded by Bischeck.
3.2 Logging
Bischeck uses logback,
http://logback.qos.ch/ for log management. The logback configuration is described in the logback.xml file located in the
$BISHOME/resources directory. By default Bischeck writes log information at level INFO to file
/var/tmp/bischeck.log.
3.3 Integration with pnp4nagios
pnp4nagios can create graph layouts depending on the check command used for the service on the Nagios server. Since Bischeck uses a passive check, we need to create a unique check command that matches the pnp4nagios layout for Bischeck. Create a link in the libexec directory on the Nagios server:
nagios$ ln -s check_dummy check_bischeck
When describing the service, always use the check_bischeck as the check command in the Nagios configuration. The check_bischeck.php that controls the pnp4nagios layout must be copied to the directory pnp4nagios/share/templates on the Nagios server.
4 Command line utilities
There are a number of command line utilities available in Bischeck, all of which can be run through the bischeck script located in the $BISHOME/bin directory. To use the bischeck script, add its directory to your PATH variable.
$ PATH=$BISHOME/bin:$PATH
4.1 Run Bischeck
The normal way to run Bischeck is as a daemon using the init.d script bischeckd, but it is also possible to start Bischeck in continuous running mode by executing:
Running Bischeck this way has limitations: the process is not automatically placed in the background, and the effective user id is that of the user starting the process, who may not have all the permissions the installation requires. The pid file will not be updated correctly either. For production systems, always use the init.d script.
$ sudo /etc/init.d/bischeckd start
or
# service bischeckd start
For testing purposes, it can be beneficial to run Bischeck once and make sure that everything is executing as expected. This is done by executing:
This will override all scheduling definitions and execute everything directly, but only once.
To show the pid file used by the running Bischeck daemon:
$ bischeck configuration.ConfigurationManager -p
This command is used in the init script bischeckd to retrieve the current pid.
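The pid file can also be used to check from a script whether the daemon is alive. The following is a minimal sketch, assuming the default pid file location /var/tmp/bischeck.pid; the function name pid_alive is hypothetical and this is not how the bischeckd script itself is implemented.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the pid stored in the given file
# belongs to a live process. kill -0 sends no signal, it only tests
# whether the process can be signalled, so it may also fail when the
# process belongs to another user.
pid_alive() {
    pid_file="$1"
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Default pid file location according to this guide.
if pid_alive /var/tmp/bischeck.pid; then
    echo "bischeck is running"
else
    echo "bischeck is not running"
fi
```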
4.2 Validating configuration files
To validate the correctness of the xml configuration files, use the following command, which will return 0 if files are correct. Use $? to display return status.
$ bischeck configuration.ConfigurationManager -v; echo $?
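Since the validation command reports its result through the exit status, it can guard a restart so that a broken configuration never reaches the running daemon. The sketch below is one way to do this; the function name validate_then is hypothetical, and the restart command in the comment is the one used elsewhere in this guide.

```shell
#!/bin/sh
# Hypothetical helper: run the follow-up command ($2) only if the
# validation command ($1) exits with status 0.
validate_then() {
    if sh -c "$1"; then
        sh -c "$2"
    else
        echo "configuration is not valid, skipping: $2" >&2
        return 1
    fi
}

# Intended use (not executed here, requires an installed Bischeck):
#   validate_then "bischeck configuration.ConfigurationManager -v" \
#                 "sudo /etc/init.d/bischeckd restart"
```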
4.3 Threshold testing
[Enhanced 1.1.0]
Thresholds can be verified with the command line utility bin/bischeck threshold.Twenty4HourThreshold. The utility provides different levels of information depending on the input parameters. Mandatory parameters are host, service and serviceitem. The utility also accepts a date and time of day to verify the exact threshold rule to use, and if a metric value is provided it will calculate the state. For a complete list of options, run the utility with -u.
$ bischeck threshold.Twenty4HourThreshold -h erphost -s orders -i ediorders -d 20141207 -m 899
899@12:57 State=WARNING Threshold=1000 (>) warning=900(0.9) critical=800(0.8)
The above command displays the result of the threshold processing if the metric value is 899. The output displays the state, threshold, warning and critical levels for the current time of day, 12:57.
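The numbers in the sample output can be reproduced by hand: the warning and critical levels are the threshold multiplied by the respective factors (1000 × 0.9 = 900 and 1000 × 0.8 = 800), and 899 falls between the two. The sketch below repeats that arithmetic. It assumes, based only on the sample outputs in this section, that for the (>) comparison a metric below a level triggers the corresponding state; how Bischeck treats a value exactly equal to a level is not covered here, and the function name threshold_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper reproducing the arithmetic of the sample output.
# $1: metric, $2: threshold, $3: warning factor, $4: critical factor
# Assumption: for the (>) comparison, a metric below the critical level
# is CRITICAL, below the warning level is WARNING, otherwise OK.
threshold_state() {
    awk -v m="$1" -v t="$2" -v w="$3" -v c="$4" 'BEGIN {
        warn = t * w
        crit = t * c
        if (m < crit)      state = "CRITICAL"
        else if (m < warn) state = "WARNING"
        else               state = "OK"
        printf "State=%s warning=%g critical=%g\n", state, warn, crit
    }'
}

threshold_state 899 1000 0.9 0.8
```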
The time of the day can also be set:
$ bischeck threshold.Twenty4HourThreshold -h erphost -s orders -i ediorders -d 20141101 -m 899 -H 16 -M 30
899@16:02 State=CRITICAL Threshold=1000 (>) warning=950(0.95) critical=900(0.9)
If the date, -d, is not set the current date will be used.
Additional information about which rule set was selected and the full hours configuration can be displayed by increasing the verbosity level:
$ bischeck threshold.Twenty4HourThreshold -h erphost -s orders -i ediorders -d 20141101 -m 899 -v2
Rule 1 - month is 2 and day is 25 - hourid: 101
Hour 00 threshold=1000.0 warning=0.9 critical=0.8
Hour 01 threshold=1000.0 warning=0.9 critical=0.8
Hour 02 threshold=2000.0 warning=0.8 critical=0.7
Hour 03 threshold=2000.0 warning=0.8 critical=0.7
Hour 04 threshold=1000.0 warning=0.9 critical=0.8
Hour 05 threshold=1000.0 warning=0.9 critical=0.8
Hour 06 threshold=2000.0 warning=0.8 critical=0.7
Hour 07 threshold=2000.0 warning=0.8 critical=0.7
Hour 08 threshold=1000.0 warning=0.9 critical=0.8
Hour 09 threshold=1000.0 warning=0.9 critical=0.8
Hour 10 threshold=2000.0 warning=0.8 critical=0.7
Hour 11 threshold=2000.0 warning=0.8 critical=0.7
Hour 12 threshold=1000.0 warning=0.9 critical=0.8
Hour 13 threshold=1000.0 warning=0.9 critical=0.8
Hour 14 threshold=2000.0 warning=0.8 critical=0.7
Hour 15 threshold=2000.0 warning=0.8 critical=0.7
Hour 16 threshold=1000.0 warning=0.9 critical=0.8
Hour 17 threshold=1000.0 warning=0.9 critical=0.8
Hour 18 threshold=2000.0 warning=0.8 critical=0.7
Hour 19 threshold=2000.0 warning=0.8 critical=0.7
Hour 20 threshold=1000.0 warning=0.9 critical=0.8
Hour 21 threshold=1000.0 warning=0.9 critical=0.8
Hour 22 threshold=2000.0 warning=0.8 critical=0.7
Hour 23 threshold=2000.0 warning=0.8 critical=0.7
899@13:18 State=CRITICAL Threshold=1300 (>) warning=1170(0.9) critical=1040(0.8)
For thresholds that are based on a cached expression, the threshold will be calculated if the data is available in the cache.
4.4 Cache browser cli
[1.1.0]
The command line tool supports all Bischeck expressions for calculating on cached metric data. This enables exploring and testing when configuring new thresholds and virtual services. The utility supports common readline functions like history.
$ bischeck cli.CacheCli
cachecli> avg(host0-sshport-response[0:4])
[2/1/3 ms] avg(0.000087,0.000097,0.000092,0.000087,0.000096) = 0.0000918
The output is the parsed expression populated with the Bischeck cache data and the calculated result. It also shows the time in milliseconds (ms) to parse and calculate the expression and the total time.
When the utility is executed with the -p option, CacheCli reads expressions from stdin.
$ echo "avg(host0-sshport-response[0:4])" | bischeck cli.CacheCli -p
avg(0.000087,0.000097,0.000092,0.000087,0.000096) = 0.0000918
For all options please use -u for usage.
Use help inside the cli to get a list of available commands.
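Reading from stdin makes it possible to generate expressions from a script. The following is a minimal sketch: the function name avg_expr and the key host1-sshport-response are hypothetical, and whether -p accepts more than one expression per invocation is an assumption that this guide does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: build an avg() expression over an index range
# for a host-service-serviceitem cache key.
# $1: cache key, $2: first index, $3: last index
avg_expr() {
    printf 'avg(%s[%s:%s])\n' "$1" "$2" "$3"
}

avg_expr host0-sshport-response 0 4

# Intended use (not executed here; assumes bischeck is on the PATH and
# that -p reads one expression per line):
#   for key in host0-sshport-response host1-sshport-response; do
#       avg_expr "$key" 0 4
#   done | bischeck cli.CacheCli -p
```

Because printf emits the expression as data rather than as shell syntax, the parentheses and brackets need no extra quoting when piped.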
5 Bischeck internal surveillance
Bischeck uses the Java JMX standard for internal monitoring. Bischeck exposes a number of JMX MBeans that enable control and instrumentation of the running Bischeck daemon. For example, Bischeck can be shut down or reloaded through JMX. A reload operation forces Bischeck to re-read and deploy its configuration files without restarting the process.
JMX is only enabled when the bischeck script is called with the argument "Execute -d", which is how the bischeckd init script calls the bischeck script to start Bischeck in daemon mode.
5.1 JMX over RMI
To allow remote JMX connections the following JMX settings are used by the bischeck script located in the $BISHOME/bin directory.
jmxport=-Dcom.sun.management.jmxremote.port=<set by the installer script>
jmxrmiserver=-Djava.rmi.server.hostname=<set by the installer script>
jmxauth=-Dcom.sun.management.jmxremote.authenticate=<set by the installer script>
jmxssl=-Dcom.sun.management.jmxremote.ssl=false
jmxpasswd=-Dcom.sun.management.jmxremote.password.file=$bishome/etc/jmxremote.password
jmxaccess=-Dcom.sun.management.jmxremote.access.file=$bishome/etc/jmxremote.access
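With these settings in place, a JMX client connects to the host and port chosen at installation. The sketch below builds the standard JMX service URL for an RMI connector; the function name jmx_service_url is hypothetical, and 127.0.0.1 and 3333 are only the example values from the installer usage text, not defaults.

```shell
#!/bin/sh
# Hypothetical helper: build the standard JMX service URL for an RMI
# connector. $1: host or IP address, $2: JMX RMI port
jmx_service_url() {
    printf 'service:jmx:rmi:///jndi/rmi://%s:%s/jmxrmi\n' "$1" "$2"
}

# Example values taken from the installer usage text (-i and -p).
jmx_service_url 127.0.0.1 3333

# Intended use (not executed here): jconsole also accepts the short
# host:port form, for example
#   jconsole 127.0.0.1:3333
```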
5.2 JMX over HTTP/JSON
[1.1.0]
Jolokia is a JMX agent that supports HTTP/JSON access and avoids the problems of the standard JMX agent that uses RMI. RMI is especially problematic in network environments with firewalls. With Jolokia it is simple to tunnel the JMX connection over ssh. Jolokia also provides fine-grained security and access control.
The RMI based JMX agent is still the default, but that will probably change in future releases of Bischeck.
Two additional configuration files have been added to the $BISHOME/resources directory to control the behavior of Jolokia:
5.3 JMX supported attributes and operations
Bischeck enables a number of attributes and operations that can be managed over JMX. Please read the javadoc for the MBean classes to understand the methods available or just start jconsole, hawtio (Jolokia agent) or equivalent tool to browse available MBeans.
The key MBeans are:
-
com.ingby.socbox.bischeck:type=Execute
Includes information about the Bischeck version, configuration directory, reload count, reload time and class cache. The operations provided are shutdown and reload, where reload restarts Bischeck without creating a new process, but the configuration is re-read.
-
com.ingby.socbox.bischeck.configuration:name=configuration,type=ConfigurationManager
Display different configuration details.
-
com.ingby.socbox.bischeck.cache.provider.redis:name=stats,type=LastStatusCache
Display different statistics about cache usage.
-
com.ingby.socbox.bischeck.servers:name=XYZ,type=CircuitBreak
For server classes that implement a circuit break, this bean will display information about the current state, the number of times the circuit break has been opened, etc. The circuit break can also be enabled/disabled and the timeout and fail count can be changed.
-
com.ingby.socbox.bischeck.service:type=ExecuteServiceOnDemand
[1.1.0]
Operation to force the execution of a host service on demand.
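When the Jolokia agent is used, the MBeans listed above can be read with any HTTP client through Jolokia's read request, which has the form <base URL>/read/<MBean name>. The sketch below builds such a URL. The base URL http://localhost:8778/jolokia is an assumption (8778 is the default port of Jolokia's standalone JVM agent); the address that applies to a Bischeck installation depends on its Jolokia configuration. The function name jolokia_read_url is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a Jolokia read URL.
# $1: Jolokia base URL (assumption, depends on the agent configuration)
# $2: MBean object name as listed in this guide
jolokia_read_url() {
    printf '%s/read/%s\n' "$1" "$2"
}

jolokia_read_url http://localhost:8778/jolokia com.ingby.socbox.bischeck:type=Execute

# Intended use (not executed here, requires a running agent):
#   curl -s "$(jolokia_read_url http://localhost:8778/jolokia \
#       com.ingby.socbox.bischeck:type=Execute)"
```

A read request without an attribute name returns all attributes of the MBean as JSON, which is a convenient way to discover what is available.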
5.4 Timers
Different execution times of the Bischeck daemon are also exposed through JMX. All timers are based on the Metrics library,
http://metrics.codahale.com/, and show the following parameters for each defined timer:
-
FifteenMinuteRate
-
FiveMinuteRate
-
OneMinuteRate
-
MeanRate
-
Count
-
Max
-
Min
-
Mean
-
Standard deviation (StdDev)
-
50th Percentile
-
75th Percentile
-
95th Percentile
-
98th Percentile
-
99th Percentile
-
999th Percentile (the 99.9th percentile)
The timers used in Bischeck are:
-
com.ingby.socbox.bischeck.service:name=executeTotalTimer,type=ServiceJob
This is the most important metric since it shows the total count and execution time to process a service definition (a host, service(s) and serviceitem(s)) every time it is scheduled. The execution time includes connecting and retrieving data, execution of the serviceitem logic, cache writes and threshold calculation. It does not include the time to send data to the defined servers.
-
com.ingby.socbox.bischeck.service:name=executeServiceTimer,type=ServiceJob
This timer shows the count and execution time to process a service definition. The execution time includes connecting and retrieving data, execution of the serviceitem logic and cache writes. This timer is part of the timer executeTotal.
-
com.ingby.socbox.bischeck.service:name=executeThresholdTimer,type=ServiceJob
This timer shows the count and execution time to evaluate the thresholds for the result of executing the service definition. This timer is part of the timer executeTotal.
-
com.ingby.socbox.bischeck.service:name=publishTimer,type=ServiceJob
The time to publish the measured and threshold data to the configured servers. This does not include the time it actually takes to send to the remote servers; that is included in the respective XYZ_sendTimer. The timer is part of the executeTotal.
-
com.ingby.socbox.bischeck.threshold:name=recalculateTimer,type=Twenty4HourThreshold
The time it takes to recalculate the threshold value. Can be part of the executeThresholdTimer if recalculation is needed.
-
com.ingby.socbox.bischeck.configuration:name=purgeTimer,type=CachePurgeJob
The time it took for the cache purger to run.
-
com.ingby.socbox.bischeck.cache.provider.redis:name=writeTimer,type=LastStatusCache
The time it took to write a single data entry to the redis cache.
-
com.ingby.socbox.bischeck.jepext:name=calulateTimer,type=ExecuteJEP
The execution time to calculate any mathematical expressions like sum, average, etc used in serviceitem and threshold logic.
-
com.ingby.socbox.bischeck.configuration:name=initializationTimer,type=ConfigurationManager
The time to read and setup the configuration of Bischeck. This occurs on start up and on every reload.
-
com.ingby.socbox.bischeck.cache:name=parseTimer,type=CacheEvaluator
The execution time to parse and retrieve data from the fast cache and/or the redis cache. This includes all different ways to retrieve data based on index, index range, time and time range. Standard deviation could be high on this timer depending on difference in logic and query pattern.
-
com.ingby.socbox.bischeck.servers:name=XYZ_sendTimer,type=<Server class>
The time it took to send data to server XYZ. One timer exists for each server configured in servers.xml.
6 Building Bischeck from source
To build Bischeck from source, check out the Bischeck trunk from gforge.ingby.com:
$ svn checkout --username anonymous http://gforge.ingby.com/svn/bischeck/bischeck/trunk bischeck
To build a Bischeck distribution, run ant from the directory where you checked out the Bischeck code:
This will create a compressed tar file in the
target directory, named bischeck-x.y.z.tar.gz where x.y.z is the version number. Different versions of Bischeck can be checked out from the tags directory located in
http://gforge.ingby.com/svn/bischeck/bischeck/tags
The dist directive will also convert all documentation in lyx format to pdf. This requires the lyx and imagemagick packages to be installed.
To run all unit tests:
To set the Bischeck version number, the build.xml file must be edited or the ant variable app.version can be overridden:
$ ant -Dapp.version=X.Y.Z dist
6.1 Developing with Bischeck
It is simple to develop your own service, serviceitem, threshold and server classes. To develop your own you must follow the interface that exists for each type. For service and serviceitems, an abstract class exists with default implementation of most of the methods described in the interfaces. For more information, check out the Bischeck javadoc, that is generated using ant and that is part of the distribution in the $BISHOME/docs/javadoc directory.
6.2 Developing custom JEP functions
To develop custom mathematical JEP functions that operate on cached data, please see the JEP documentation,
http://www.cse.msu.edu/SENS/Software/jep-2.23/doc/website/doc/doc_usage.htm. To make a function recognized by Bischeck, it should be put in a jar and copied to the
$BISHOME/customlib directory. The class must then be mapped to a name that will be used in Bischeck. This is done in the file
jepextfunctions.xml located in the
$BISHOME/resources directory. The
jepextfunctions.xml file is a standard Java XML property file where the key is the name of the function and the value is the class name that implements it.
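To illustrate the mapping, the sketch below prints a Java XML properties document with one entry, where the key is the function name and the value is the implementing class. The function name jepext_properties, the JEP function name mymedian and the class com.example.jep.MyMedian are all hypothetical. The file shipped with Bischeck may already contain entries, so the output is meant as a template for a new entry rather than as a replacement for the file.

```shell
#!/bin/sh
# Hypothetical helper: print a Java XML properties document with a
# single entry. $1: JEP function name (key), $2: implementing class
jepext_properties() {
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <entry key="$1">$2</entry>
</properties>
EOF
}

# mymedian and com.example.jep.MyMedian are hypothetical names.
jepext_properties mymedian com.example.jep.MyMedian
```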
7 System requirements
Bischeck should run on any operating system that supports Java 6 or higher. The installation and init scripts are supported on Redhat and Debian equivalent Linux distributions. Running on Microsoft Windows has been tested successfully but is not supported by the installation scripts.
Since version 1.0.0 of Bischeck, redis,
http://www.redis.io, is the default memory cache. Redis provides a range of persistence options, like point-in-time snapshots and persistence logs. Redis also supports replication and high-availability options.
Bischeck has been tested with redis version 2.6.7. Installation of redis is not part of the Bischeck installation and should be installed prior to installing Bischeck. Redis is not required to run on the same server as Bischeck.
The connection to redis is configured in the $BISHOME/etc/property.xml file. The following properties control the connection:
-
cache.provider.redis.server - the hostname or IP address of the redis server, default is localhost
-
cache.provider.redis.port - the socket port where redis server listens, default is 6379
-
cache.provider.redis.db - the redis database number to use, default is 0
-
cache.provider.redis.auth - the authorization token for redis, default is the empty string
-
cache.provider.redis.timeout - the connection timeout, default is 2000 ms.
-
cache.provider.redis.poolsize - the size of the redis connection pool, default is 50.
An additional property, cache.provider.redis.fastCacheSize, can be used with the redis based cache. This parameter defines the number of cache items that, for each service definition (host-service-serviceitem), are held in the Bischeck heap space in parallel with the redis cache. If Bischeck finds the data in the fast cache on a read, access to redis is not needed. Data writes are always written to redis storage. The default value is 0, meaning that no items are held in the fast cache. Depending on how the cached data is used in your configuration, increasing the fast cache can improve performance but will also increase the heap space used.
With redis it is possible to access the data in the cache at runtime. Redis provides many client libraries for different programming languages,
http://redis.io/clients. The quickest way is just to use the
redis-cli command line utility.
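As a minimal sketch of such access, the helper below assembles a redis-cli command line from the connection properties listed above, using their documented defaults. The function name redis_cli_cmd is hypothetical; -h, -p and -n are the redis-cli options for host, port and database number. How Bischeck names its keys inside redis is not described in this guide, so the example is limited to generic commands.

```shell
#!/bin/sh
# Hypothetical helper: print a redis-cli command line.
# $1: host (default localhost), $2: port (default 6379),
# $3: database number (default 0)
redis_cli_cmd() {
    host="${1:-localhost}"
    port="${2:-6379}"
    db="${3:-0}"
    echo "redis-cli -h $host -p $port -n $db"
}

redis_cli_cmd

# Intended use against a running redis server (not executed here):
#   $(redis_cli_cmd) ping      # connectivity check
#   $(redis_cli_cmd) dbsize    # number of keys in the selected database
```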
The cache used prior to version 1.0.0 should be regarded as deprecated. Many of the new features available as of version 1.0.0, such as aggregations, are not supported in the old implementation.
7.3 Jar dependencies
The following jar packages are distributed as part of the Bischeck distribution. All these packages have their own open source licenses.
All jar files are distributed as part of Bischeck and located in the lib directory.
8 Bischeck license
9 Bug reports and feature requests
Please submit bug reports and feature requests on
www.bischeck.org in the Forge section.
10 Credits
Thanks to all people and organizations who developed all the great open source software that Bischeck depends on. The Bischeck project would like to thank the following companies for sponsoring the project with valuable commercial tools and development environments: