Bischeck
-
Release Notes
Version 1.0.2
2014-04-04
Legal Notice
Copyright © 2013 Ingenjörsbyn AB.
This document is licensed by Ingenjörsbyn AB under the Creative Commons Attribution-ShareAlike 3.0 Unported License,
http://creativecommons.org/licenses/by-sa/3.0/. If you distribute this document, or a modified version of it, you have to provide attribution to Ingenjörsbyn AB and provide a link to the original.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
Nagios® is an official trademark of Nagios Enterprise Inc.
All other trademarks are the property of their respective owners.
1 Release 1.0.2 - 2014-04-04
Release 1.0.2 is a minor bug fix release.
1.1 New features
None
1.2 Bugs fixed and important issues
-
[TR-256] “Error with Null value in the cache”. Upgrading immediately is strongly recommended.
1.3 Upgrading
Release 1.0.2 supports upgrading from releases 0.4.3, 1.0.0 and 1.0.1. If you upgrade from 0.4.3, please follow the upgrade instructions in the upgrading section for 1.0.0 below.
2 Release 1.0.1 - 2014-03-27
Release 1.0.1 is a minor bug fix release, but it fixes a major bug related to threshold management.
2.1 New features
None
2.2 Bugs fixed and important issues
-
[TR-255] “Bug in class Twenty4HourThreshold when configure periods with months and weeks”. This bug requires an immediate upgrade.
2.3 Upgrading
Release 1.0.1 supports upgrading from releases 0.4.3 and 1.0.0. If you upgrade from 0.4.3, please follow the upgrade instructions in the upgrading section for 1.0.0 below.
3 Release 1.0.0 - 2014-03-15
Release 1.0.0 is a major upgrade of Bischeck.
3.1 New features
-
The installation script supports installation as a non-root user.
-
Cache improvements
-
Redis is now the default Bischeck cache. With Redis, the persistence and availability of the cached data are improved. Using Redis also lets third parties easily access the cached data.
-
The number of cached data items can now be set per service definition through the cache template configuration.
-
Automatic aggregation of cached service definition data is implemented with a resolution of hour, day, week and month.
-
The END statement is supported in a list-based cache query like host-service-serviceitem[-24H:END]. This retrieves all data from 24 hours ago to the oldest data that exists in the cache for the service definition.
-
Configuration improvements
-
[FR-245] “Templates for threshold and baseline definitions”.
-
Templates have been implemented for both bischeck.xml and 24thresholds.xml. Templates improve configuration reuse.
-
Template overrides allow definitions in a template to be overridden.
-
Configuration macros enable dynamic configuration.
-
Cache directives to control aggregation and cache sizing per service definition.
-
Inactivation of hosts and services
-
Automatic aggregation of monitored data over hour, day, week and month periods.
-
[FR-243] “Least square method calculation” - new mathematical function to do linear prediction.
-
The execution flow has been changed. Previously it was a synchronous process where each ServiceJob thread executed the whole process, from data collection to sending the monitoring result to the servers. The new architecture is based on an asynchronous design that separates the ServiceJob execution from the server integration. Jetlang, https://code.google.com/p/jetlang/, is used to pass messages between the independent threads.
-
Server integration improvements:
-
NSCA and NRDP workers - A worker pool enables parallel access to the NSCA and NRDP server. This enables better concurrency and throughput especially for “slow” servers like NSCA in daemon mode.
-
Circuit breaks - an optional configuration for NSCA and NRDP. If the remote server connection fails or times out, the circuit break goes to an OPEN state and stops sending data to the remote server for a specific time period before retrying. This prevents Bischeck from overloading the remote server.
-
[FR-250] “Adding warning and critical level values in the performance data”.
-
Logback has replaced log4j as the logging framework. Configuration of logback is done in the $BISHOME/resources/logback.xml file.
-
JEP custom development has been improved.
-
New manual set. The manuals are now divided into “Bischeck installation and administration guide”, “Bischeck configuration guide” and “Bischeck release notes”.
-
Improved JMX monitoring especially for monitoring the different execution steps with timers and counters.
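The circuit break behaviour described above for NSCA and NRDP can be illustrated with a minimal Java sketch. It is not Bischeck's implementation: the class name, the method names and the policy of opening after a fixed number of consecutive failures are assumptions made for the example. Only the OPEN state and the wait period before retrying come from the description above.

```java
import java.util.function.LongSupplier;

/** Illustrative circuit break; class and method names are hypothetical, not Bischeck's API. */
final class CircuitBreakSketch {
    enum State { CLOSED, OPEN }

    private final int failureLimit;   // consecutive failures that trip the break (assumed policy)
    private final long openMillis;    // how long to stay OPEN before sending is retried
    private final LongSupplier clock; // time source in milliseconds, injectable for testing
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0L;

    CircuitBreakSketch(int failureLimit, long openMillis, LongSupplier clock) {
        this.failureLimit = failureLimit;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Returns true if data may be sent to the remote server right now. */
    boolean allowSend() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            // The wait period has passed: close the break so the next send is a retry.
            state = State.CLOSED;
            failures = 0;
        }
        return state == State.CLOSED;
    }

    /** A successful send clears the failure count. */
    void recordSuccess() {
        failures = 0;
    }

    /** A failed or timed-out send counts towards opening the break. */
    void recordFailure() {
        failures++;
        if (failures >= failureLimit) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        CircuitBreakSketch cb = new CircuitBreakSketch(3, 30_000L, System::currentTimeMillis);
        for (int i = 0; i < 3; i++) {
            cb.recordFailure(); // simulate three failed sends in a row
        }
        System.out.println("send allowed: " + cb.allowSend());
    }
}
```

A sender would call allowSend() before each transmission and report the outcome with recordSuccess() or recordFailure(). While the break is OPEN the remote server receives no traffic.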
3.2 Bugs fixed and important issues
-
[TR-248] “If bischeck output contains a decimal mark the systems locale settings affect the bischeck output to NSCA”
-
[TR-249] “Aggregated data can not be retrieved from the cache”
-
[TR-251] “Bug in the formatting of the threshold values”
-
All configuration management classes have been moved to a separate Java package.
-
Bisconf 0.3.1 does not support Bischeck 1.0.0. A later version of Bisconf is targeted for Bischeck 1.0.0.
-
The DocManager utility has been removed.
-
New Java interface for Service and ServiceItem. This may break custom-developed Service and ServiceItem classes. Review the Javadoc to see the changes.
3.3 Upgrading
Release 1.0.0 supports upgrading from release 0.4.3. The install script's upgrade option will not migrate the 0.4.3 cache to Redis automatically. After the upgrade has been run, the following steps must be carried out to migrate the cached Bischeck data to Redis.
-
First run the migration according to the procedure in the “Bischeck installation and administration guide”.
-
Make sure Bischeck is not started, but that Redis is started when doing the following steps.
-
Make sure the Redis-related properties in the properties.xml of the new installation are correct for your Redis installation. The data migration program MoveCache2Redis uses the settings in properties.xml or, if they are not set, the default values. Please see the “Bischeck installation and administration guide” for more information.
-
Run the data migration program supplied with Bischeck in the following way:
$BISHOME/bin/bischeck migration.MoveCache2Redis
or, if the cache file is located in a different location than the default:
$BISHOME/bin/bischeck migration.MoveCache2Redis -f /tmp/lastStatusCacheDump
$BISHOME should be replaced with the path to the newly installed Bischeck installation directory. The data migration program can be run with the -v option to show the data migration in verbose mode. You can also run the data migration in test mode only by using the -c flag. If the migration of cache data fails it can be rerun, but the Redis data store should be flushed with the Redis FLUSHDB command before rerunning. For more information about Redis please visit http://redis.io/.
4 Release 0.4.3 - 2013-03-27
This release is a minor upgrade.
4.1 New features
-
[FR-231] “Add Livestatus as a new server integration alongside nsca and nrdp”.
-
[FR-236] “Extended configuration of threshold for threshold class Twenty4HourThreshold”. This includes the new hours format with a to-from definition.
-
The installer script was updated to manage the configuration of JMX. The installer adds switches for which JMX port to use (-p), which JMX RMI server IP address to bind the port to (-i) and whether authentication should be enforced (-a). This is related to [TR-238].
If Bisconf will be used, it is required to configure JMX RMI support by defining hostname/IP and port.
-
The internal surveillance of Bischeck that is based on JMX MBeans was cleaned up.
-
The initial configuration files have been changed to an example using a Nagios check command instead of the previous example that used a database connection to MySQL. This makes it easier to get started since no JDBC driver for MySQL is required.
4.2 Bugs fixed and important issues
-
[TR-235] “NullPointer in management of exponential data”
-
[TR-237] “Faulty behavior for property lastStatusCacheDumpDir”
-
[TR-238] “Bischeck will not start without hostname in /etc/hosts”. This is not a bug in Bischeck, but rather a matter of configuring JMX to work with remote connections. The install script has been updated to manage this in a better way.
4.3 Upgrading
Release 0.4.3 supports upgrading from release 0.4.2.
5 Release 0.4.2 - 2012-12-21
Besides the new features introduced and the bugs fixed in 0.4.2, major work has gone into making Bischeck more stable and into improving performance when running configurations with many hosts, services and service items. Our goal has been to secure stable resource utilization of the Java virtual machine (JVM) when running Bischeck over a long period of time.
5.1 New features
-
[FR-234] Distribute the start time when running interval-based scheduling. This will increase the distribution of service executions, especially if Bischeck is configured with many services that are executed on the same interval.
-
[FR-233] Support for server integration with Graphite. See section.
-
[FR-232] Support for executing check commands. The check commands that can be used must support an output of performance data. Bischeck will not care about the status that comes from a check command. Instead it will only use the performance data to evaluate its own threshold. This includes the new service class ShellService and serviceitem class CheckCommandServiceItem.
-
[FR-224] Support for NRDP through the new server class NRDPServer.
-
Related to bug [TR-227], the naming of hosts, services and serviceitems has been improved. This includes quoting of a dash (-) if used in the name of a host, service or serviceitem.
-
Execution statements and threshold hour specifications where cache data is retrieved as a list, as in functions like avg(x-y-z[4:10]) and max(x-y-z[-5M:-15M]), can now be configured to return a value as long as at least one index in the range is not null. To support backwards compatibility the new functionality is only used if the property notFullListParse is set to true in properties.xml. The default value is false.
-
There has been some discussion about what Nagios state should be sent if the execute statement of a service item returns null. In previous releases this was hard coded to OK, but now it is possible to define it by setting the property stateOnNull. The property can be set to an integer 0, 1, 2 or 3 or to a string OK, WARNING, CRITICAL or UNKNOWN. The default is UNKNOWN.
-
When a service class got an exception while creating a connection, previous versions did not save any data to the cache. If the property saveNullOnConnectionError is set to true, a null value is now inserted into the cache when a connection exception is thrown. For backwards compatibility the default value of the property is false.
-
More mathematical functions like multNull and divNull that support null values as part of the calculation. They can be used with functions that take a list of numbers to manage calculations where cache data may be null.
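The effect of the notFullListParse property on a list function such as avg can be sketched in Java as follows. This is an illustration only: the class and method are hypothetical, and the strict behaviour shown for notFullListParse=false (any null in the range gives a null result) is an assumption inferred from the description above rather than taken from the Bischeck source.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper showing null-tolerant averaging over cached values. */
final class NullTolerantAverage {

    /**
     * Averages a list of cached values that may contain nulls.
     * notFullListParse=false: any null makes the result null (assumed legacy behaviour).
     * notFullListParse=true: nulls are skipped; the result is null only if all values are null.
     */
    static Double avg(List<Double> values, boolean notFullListParse) {
        double sum = 0.0;
        int count = 0;
        for (Double v : values) {
            if (v == null) {
                if (!notFullListParse) {
                    return null; // strict mode: an incomplete range yields no value
                }
                continue; // tolerant mode: ignore the missing element
            }
            sum += v;
            count++;
        }
        if (count == 0) {
            return null; // nothing to average
        }
        return sum / count;
    }

    public static void main(String[] args) {
        List<Double> cached = Arrays.asList(2.0, null, 4.0);
        System.out.println("strict:   " + avg(cached, false));
        System.out.println("tolerant: " + avg(cached, true));
    }
}
```

With the values 2.0, null and 4.0 the tolerant mode averages the two available values to 3.0, while the strict mode returns null.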
5.2 Bugs fixed and important issues
-
[TR-227] “Cache parser do not work for host, service or serviceitems if the name include 0 (zero)” has been resolved.
-
[TR-228] “Threshold factory return wrong threshold definition if service and serviceitem name is the same for different hosts” has been resolved.
-
[TR-229] “When using service ShellService the number of open files limit will be reached” has been resolved.
-
[TR-230] “NRDP submissions all come in as OK” has been resolved.
-
Fixed migration script from 0.4.0 to copy etc directory content correctly. Changes in the file urlservices.xml will be overwritten. Existing 0.4.0 configuration will still be available in the previous version backup directory, bischeck_0.4.0.
5.3 Upgrading
Releases 0.3.3, 0.4.0 and 0.4.1 are supported for upgrade to 0.4.2. Upgrading is NOT applicable for release candidates.
6 Release 0.4.1 - 2012-10-01
6.1 New features
-
[FR-224] Beta support for NRDP through the new server class NRDPServer.
-
Beta support for executing check commands. The check commands that can be used must support an output of performance data. Bischeck will not care about the status that comes from a check command. Instead it will only use the performance data to evaluate its own threshold. This includes the new service class ShellService and serviceitem class CheckCommandServiceItem.
-
[FR-223] Instrumentation metrics provided by http://metrics.codahale.com/ have been implemented to enable fine-grained real-time process measuring through JMX. This is implemented for:
-
All external interfaces to measure response time
-
Service execution time
-
Threshold processing time, etc.
6.2 Bugs fixed and important issues
-
[TR-225] “A service url that has no equivalent in urlservices.xml generate a NullPointerException” has been resolved.
-
[TR-226] “For thresholds with more the two decimals will not be correctly validated” has been fixed. The bug occurred when Bischeck was used with data points with many decimals, like the result from a ping. Bischeck would strip off all decimals except two, which caused small values to become 0. Now Bischeck determines the number of decimals used by the collected data point and uses it when formatting the calculated threshold, warning and critical values. This will hopefully make data presented in Nagios UIs and graphs like pnp4nagios look better.
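The idea behind the [TR-226] fix, formatting calculated values with the same number of decimals as the collected data point, can be sketched in Java with java.math.BigDecimal. The class and method names are hypothetical and the rounding mode is an assumption. The sketch shows the principle, not the code used in Bischeck.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper: format a calculated value with the precision of the measured value. */
final class DecimalScaleSketch {

    /** Number of decimals in the measured value as it was collected, e.g. "0.00123" has 5. */
    static int decimalsOf(String measured) {
        return Math.max(0, new BigDecimal(measured).scale());
    }

    /** Formats a calculated value (threshold, warning or critical) like the measured value. */
    static String formatLike(String measured, double value) {
        return BigDecimal.valueOf(value)
                .setScale(decimalsOf(measured), RoundingMode.HALF_UP) // rounding mode is assumed
                .toPlainString();
    }

    public static void main(String[] args) {
        // A ping-like result with many decimals keeps its precision in the threshold.
        System.out.println(formatLike("0.00123", 0.0015));
        // A value collected without decimals gives a threshold without decimals.
        System.out.println(formatLike("42", 3.14159));
    }
}
```

With a fixed two-decimal format the threshold 0.0015 would have been shown as 0.00. Using the five decimals of the measured value it is shown as 0.00150.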
6.3 Upgrading
Release 0.4.0 is supported for upgrade to 0.4.1.
7 Release 0.4.0 - 2012-08-31
7.1 New features
-
[FR-197] Support for multiple integrations with different surveillance and monitoring systems. With version 0.4.0 Bischeck is no longer limited to sending data to Nagios. It can now send data to multiple Nagios servers and to other servers like OpenTSDB. This is done by moving server formatting and protocol handling to server integration classes that implement the interface com.ingby.socbox.bischeck.servers.Server. The server integration is described in the XML configuration file servers.xml. This also means that some Nagios NSCA-specific properties previously configured in properties.xml have been moved to the NSCA section of the servers.xml file. The OpenTSDB server class should be regarded as beta.
-
[FR-204] The bischeck cache will be saved when the bischeck daemon is shut down and reloaded on bischeck startup. Keeping the cache persistent between restarts is important since 0.4.0 supports time-based cache retrieval. The current limitation is that if the Bischeck daemon is killed by a signal that cannot be caught, or the daemon crashes, the data will not be saved. This will be improved in future versions.
-
[FR-202] The implementation of running bischeck once, in a non-daemon mode, is changed so that the same code is used as when running in daemon mode. The only difference is that the triggers are initialized differently so that all service items are run directly and just once.
-
[FR-218] The bischeck daemon can now reload the configuration without a process restart. This is supported through the JMX operation “reload”. The feature will limit the need for operating system access and authorization.
-
[FR-219] Bischeck can now retrieve state and performance data from a Nagios server supporting livestatus. With the service class LivestatusService a connection is set up over livestatus, and with the serviceitem class LivestatusServiceItem state and/or performance data can be retrieved from a Nagios service. This can be useful when creating virtual services in Bischeck or when used in complex thresholds.
-
[FR-220] Bischeck now supports one additional scheduling method where scheduling can be defined to run a service after a different service has executed. This can be useful when a service depends on data from another service for its thresholds or execution statement.
-
[FR-221] Cache retrieval now supports using a time offset to find the cache element nearest to that offset.
-
Cache data can be retrieved as a list of elements based both on index and time.
-
Support for additional mathematical functions like average, min and max calculations on list of elements.
-
Bischeck now supports the use of cached data in the execution statement of a serviceitem. This is typically useful when a serviceitem execute statement depends on other service data. For example in a SQL query string:
select value from table1 where id = host1-web-state[0] and createdate = '%%yyyy-MM-dd%%'
-
Added support for Linux distributions other than Red Hat based ones. Bischeck should now install on Debian 6 and Ubuntu 10/11.
-
Configuration listing. The configuration listing has been moved from the ConfigurationManager class to the DocManager class. Currently html and text listings are supported. The generated configuration data is by default placed in the bischeckdoc directory.
-
A service can be configured not to send its data to the configured monitoring servers like Nagios. This can be useful if the service is only used to create virtual services or only used in thresholds.
-
The bischeck script now supports JMX authentication. The authentication files are located in the etc directory and named jmxremote.password and jmxremote.access. By default, authentication is disabled by the system property
“-Dcom.sun.management.jmxremote.authenticate=false”. To enable authentication set the property to true. For more info about JMX see
http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html.
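The time-based cache retrieval introduced with [FR-221] above, finding the cache element nearest to a time offset, can be sketched in Java with a sorted map keyed by timestamp. This is a minimal illustration under assumptions: the class name, the use of a TreeMap and the tie-breaking rule (the older element wins) are hypothetical and not taken from the Bischeck cache implementation.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical lookup of the cache element closest in time to a requested timestamp. */
final class NearestCacheLookup {

    /** Returns the value whose timestamp is nearest to targetMillis, or null if the cache is empty. */
    static <V> V nearest(TreeMap<Long, V> cache, long targetMillis) {
        Map.Entry<Long, V> before = cache.floorEntry(targetMillis);  // newest entry at or before target
        Map.Entry<Long, V> after = cache.ceilingEntry(targetMillis); // oldest entry at or after target
        if (before == null) {
            return after == null ? null : after.getValue();
        }
        if (after == null) {
            return before.getValue();
        }
        long distanceBefore = targetMillis - before.getKey();
        long distanceAfter = after.getKey() - targetMillis;
        // On a tie the older element is returned (an arbitrary choice for this sketch).
        return distanceBefore <= distanceAfter ? before.getValue() : after.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> cache = new TreeMap<>();
        long now = System.currentTimeMillis();
        cache.put(now - 600_000L, "10 minutes ago");
        cache.put(now - 300_000L, "5 minutes ago");
        // Ask for the element nearest to "6 minutes ago".
        System.out.println(nearest(cache, now - 360_000L));
    }
}
```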
7.2 Bugs fixed and important issues
-
The Twenty4Thresholds class did not, in previous versions, perform a correct linear equation calculation if an expression-based threshold was defined. Let us illustrate the errors with this example from the 24thresholds.xml configuration file, which has a mix of static and expression-based thresholds.
....
<!-- 12:00 -->
<hour>7000</hour>
<!-- 13:00 -->
<hour>testhost-testservice-testitem[1] / 3</hour>
<!-- 14:00 -->
<hour>testhost-testservice-testitem[1] / 2</hour>
<!-- 15:00 -->
<hour>testhost-testservice-testitem[1] + 1000 </hour>
<!-- 16:00 -->
<hour>12000</hour>
....
In the previous version the threshold value between 12:00 and 13:00 would be null since it was a mix of static and expression based thresholds. And between 15:00 and 16:00 the threshold would have been calculated as “testhost-testservice-testitem[1] + 1000” independent of the time between 15:00 and 16:00.
Now the linear equation will correctly be calculated with any mix of static and expression based definitions. In the above example the calculated threshold for 12:20 will now be:
20*((testhost-testservice-testitem[1]/3) - 7000)/60 + 7000
This fix will improve the correctness and also the capability of threshold adaptivity.
-
The Service interface has a number of new methods that should have been there from the beginning. If you have developed any service class you need to add these, but if you just inherited ServiceAbstract it is handled for you. The new methods are:
public NAGIOSSTAT getLevel();
public void setLevel(NAGIOSSTAT level);
public boolean isConnectionEstablished();
public void setConnectionEstablished(boolean connected);
public Boolean isSendServiceData();
public void setSendServiceData(Boolean sendServiceData);
-
Property cacheclear is renamed to thresholdCacheClear.
-
All the nsca-related properties have been moved from properties.xml to servers.xml when used for the NSCAServer class. The new property names have also gone through some minor changes. When upgrading, the servers.xml file must be manually updated with the current settings of the nsca-related properties in properties.xml. It is recommended that these are later removed from properties.xml.
-
All JAXB generated configuration classes now support serialization.
-
Quartz jar is upgraded from 2.0.1 to 2.1.5.
-
[TR-216] “Shutdown is automatic triggered”
-
[TR-217] “Configuration Manager initialization failed with java.lang.NullPointerException”
-
[TR-207] “sudo in bischeckd script cause problem at boot”
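The corrected linear calculation for the hourly thresholds, described in the first item of this section, can be sketched in Java as below. The class and method names are hypothetical, and the cached value of 30000 for testhost-testservice-testitem[1] is an assumed number chosen only to make the arithmetic concrete.

```java
/** Hypothetical helper: linear interpolation between two hourly threshold values. */
final class HourlyThresholdInterpolation {

    /**
     * Threshold at a given minute between the value defined for the current hour
     * and the value defined for the next hour:
     * minute * (atNextHour - atHour) / 60 + atHour
     */
    static double interpolate(double atHour, double atNextHour, int minute) {
        if (minute < 0 || minute > 59) {
            throw new IllegalArgumentException("minute must be in the range 0-59");
        }
        return minute * (atNextHour - atHour) / 60.0 + atHour;
    }

    public static void main(String[] args) {
        double cached = 30000.0;          // assumed value of testhost-testservice-testitem[1]
        double at12 = 7000.0;             // static threshold defined for 12:00
        double at13 = cached / 3;         // expression-based threshold defined for 13:00
        System.out.println(interpolate(at12, at13, 20)); // threshold at 12:20
    }
}
```

With the assumed cached value the 13:00 threshold evaluates to 10000, so the threshold at 12:20 becomes 20*(10000 - 7000)/60 + 7000 = 8000.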
7.3 Upgrading
Release 0.3.3 and 0.4.0_RC2 are supported for upgrade to 0.4.0.
8 Release 0.3.3 - 2011-11-14
8.1 New features
-
Bischeck is no longer limited to integration with a single Nagios server over the NSCA protocol. It is now possible to integrate with multiple monitoring servers over different protocols. Currently Nagios/NSCA and OpenTSDB are supported. To enable this, a new class component called Server has been introduced. The class is responsible for communication and formatting for the specific monitoring server it integrates with. A new configuration file, server.xml, is used for configuration of server integration.
8.2 Bugs fixed and important issues
-
[TR-207] “sudo in bischeckd script cause problem at boot”
-
[TR-214] “Threshold object cache is no cleared”
9 Release 0.3.2 - 2011-07-29
9.1 New features
-
The configuration system has been completely rewritten and now uses XML-based configuration files. Each configuration file has a corresponding xsd file that can be used for verification. The dependency on sqlite3 has been deprecated and is only part of this release to support upgrades.
-
The scheduling of services and their related serviceitem(s) has been rewritten to support different scheduling policies per service instead of the fixed interval scheduling of earlier versions. With 0.3.2 each service can have one to many schedule tags in the bischeck.xml configuration file. For more info please see the chapter on page 1↓.
9.2 Bugs fixed and important issues
-
The active attribute on Hosts, Services and Serviceitem has been removed.
-
The interface com.ingby.socbox.bischeck.threshold.Threshold has a new signature on the method init(). This method now throws Exception.
-
The Service interface has two additional methods, setSchedules() and getSchedules().
-
The Service interface has changed the signature of getServicesItems() to return Map instead of HashMap.
-
buildr has been replaced by ant as the build management system.
10 Release 0.3.1 - 2011-04-08
10.1 New features
-
The ServiceFactory class now uses a property table, urlservice, to map which Service class should be instantiated for a specific url schema. The url schema is the key. The current default mappings are:
-
jdbc -> JDBCService
-
bischeck -> LastCacheService
-
The ServiceItemFactory class uses an additional field, serviceitemclass, in the items configuration table to determine which ServiceItem class to instantiate.
-
Calendar in Bischeck follows the ISO 8601 date standard by default. This means that the first day of the week is Monday, day 2 according to java.util.Calendar, and that the first week of the year must have a minimum of 4 days. This matters for getting the week numbering correct in the configuration of the Twenty4HourThreshold class, but day one (1) of the week is still Sunday when defining the tag dayofweek in 24thresholds.xml. The setting can be overridden by setting the properties “mindaysinfirstweek” (default 4) and “firstdayofweek” (default 2) in the properties.xml file.
-
If no threshold class has been specified, null in the thresholdclass field in the items table, Bischeck will instantiate the “empty” class DummyThreshold.
-
For all class configuration of Service, ServiceItem and Threshold it is now possible to specify the class name without the package path if the class is part of the Bischeck distribution.
-
Clean up of the exception handling process when starting Bischeck. Now the execution should not start if there are configuration issues with missing classes for Service, ServiceItem and Threshold.
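The ISO 8601 calendar defaults mentioned above correspond to two settings on java.util.Calendar. The following Java sketch shows how they affect week numbering. The class and method names are hypothetical, and the way Bischeck itself applies the properties “firstdayofweek” and “mindaysinfirstweek” is not shown here.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper showing how week numbering depends on the two calendar settings. */
final class IsoWeekSketch {

    /**
     * Week of year for a date. month is zero-based as in java.util.Calendar
     * (Calendar.JANUARY == 0); firstDayOfWeek uses Calendar.SUNDAY (1) to Calendar.SATURDAY (7).
     */
    static int weekOfYear(int year, int month, int day, int firstDayOfWeek, int minDaysInFirstWeek) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
        cal.clear();                                        // drop the current time set by the constructor
        cal.setFirstDayOfWeek(firstDayOfWeek);              // ISO 8601: Calendar.MONDAY (2)
        cal.setMinimalDaysInFirstWeek(minDaysInFirstWeek);  // ISO 8601: 4
        cal.set(year, month, day);
        return cal.get(Calendar.WEEK_OF_YEAR);
    }

    public static void main(String[] args) {
        // 1 January 2011 was a Saturday.
        System.out.println("ISO 8601 settings: "
                + weekOfYear(2011, Calendar.JANUARY, 1, Calendar.MONDAY, 4));
        System.out.println("Sunday-first, 1 day minimum: "
                + weekOfYear(2011, Calendar.JANUARY, 1, Calendar.SUNDAY, 1));
    }
}
```

With the ISO 8601 settings 1 January 2011 belongs to week 52 of the previous year, while a Sunday-first calendar with a one-day minimum places it in week 1. This is why the settings matter for week-based threshold configuration.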
10.2 Bugs fixed and important issues
10.3 Upgrade issues
-
Upgrade by doing a fresh installation, but first save the old installation directory. After saving the old installation do a new install. Then copy the files bischeck.conf and 24threshold.conf from old to new installation dir.
-
The field serviceitemclass (varchar(256)) in the table items in the configuration database bischeck.conf must be manually added and populated with the right ServiceItem class name. If the corresponding service is jdbc://, set the field serviceitemclass to SQLServiceItem, and if the service is bischeck://, set the field to CalculateOnCache.
-
To add the column:
$ sqlite3 bischeck.conf
sqlite> ALTER TABLE items ADD COLUMN serviceitemclass varchar(256);
-
Update the serviceitemclass for all rows in items:
-
sqlite> update items set serviceitemclass='SQLServiceItem' where .... ....
-
sqlite> update items set serviceitemclass='CalculateOnCache' where ..... ....
-
Add the new table urlservice in the database bischeck.conf.
$ cat << EOF | sqlite3 bischeck.conf
drop table IF EXISTS urlservice;
create table urlservice(key varchar(128), value varchar(256));
insert into urlservice values ("jdbc","JDBCService");
insert into urlservice values ("bischeck","LastCacheService");
EOF
-
Copy all files located in the customlib directory of the old installation to the customlib directory of the new installation.
11 Release 0.3.0 - 2011-03-03
11.1 New features
-
This is the first binary distribution, but should be regarded as a beta version.
11.2 Bugs fixed and important issues