Netways OSMC was an awesome conference with lots of excellent talks. If you are interested in our talk, you can check out the presentation on our favorite topic: bischeck and what dynamic and adaptive thresholds can enable in your monitoring environment.
All posts by Anders Håål
Prediction based monitoring
With the upcoming version 1.0.0 of Bischeck we have added some new capabilities for prediction based monitoring. Prediction means that we use historical data to estimate future values, a technique commonly called regression analysis.
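To give a feel for the idea, here is a minimal sketch of prediction by simple linear regression. It is not the Bischeck implementation. The function names and the sample data are hypothetical, and an ordinary least-squares line is only one of several possible regression models.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(xs: Sequence[float], ys: Sequence[float], x_future: float) -> float:
    """Extrapolate the fitted line to a future x value."""
    slope, intercept = fit_line(xs, ys)
    return slope * x_future + intercept


# Hypothetical history: disk usage in percent, sampled once per day.
days = [0, 1, 2, 3, 4]
usage = [50.0, 52.0, 54.0, 56.0, 58.0]

# Estimated usage on day 10, based on the trend in the history.
predicted = predict(days, usage, 10)
print(predicted)
```

A monitoring check could compare such a predicted value against a threshold, so that an alarm is raised before the limit is actually reached.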
Bisdw – a simple ETL tool
Bisdw is a simple ETL tool that we developed for a monitoring use case that required us to retrieve data from different sources and put it into a local database. As a tool it can be used independently of Bischeck. Most of the ETL logic is provided by the Scriptella project. What we have added is functionality for scheduling, FTP integration, init scripts, etc.
You can download Bisdw from gforge.ingby.com. Documentation is available here.
Bischeck 1.0.0 RC1 is available
After a little longer than expected, we finally have RC1 of version 1.0.0 available. This is not a production-ready version and should only be used for testing. We hope to get feedback and bug reports from all of you who take the time to test.
RC1 does not support upgrade from 0.4.3, but should run with your current configuration files. If you would like to use the existing cached data from your bischeck 0.4.3, you need to migrate it to the redis cache as explained in the full post.
Highlight features for the future Bischeck 1.0.0
Bischeck 1.0.0 will include a number of new features and enhancements. The plan is to get a release candidate out at the end of the summer. Thanks to all the people who have used Bischeck, given feedback and provided feature requests.
Netways Open Source Monitoring Conference 2013
Do not miss the Netways Open Source Monitoring Conference in Nuremberg, Germany, from 22 to 24 October. We will be there talking about our favorite topic in monitoring.
New white paper about Bischeck
A new white paper describes some of the unique capabilities of Bischeck. You can check it out in the Documentation section.
Bisconf 0.3.1 released
A minor release of Bisconf fixing three bugs. Please upgrade your 0.3.0 installation to this release. See the Bisconf documentation for details about the bug fixes and download the release from this link.
Bischeck distributed with OP5 Monitor 6.1
OP5 will include Bischeck in the upcoming release of Monitor 6.1. Thanks to everyone at OP5 who worked on the release.
bischeck 0.4.3 and bisconf 0.3.0 released
We are pleased to release bischeck 0.4.3 and bisconf 0.3.0. For all the details about new features and bug fixes, please see the documentation for Bischeck and Bisconf.
As usual any feedback is appreciated.
Icinga writes about Bischeck
The Icinga project has published a short Q&A about bischeck on the Icinga blog.
bisconf 0.2.0 released
Bischeck 0.4.2 performance testing
Introduction
Performance testing is key to ensuring that your software can handle the load and to verifying its robustness. With server based software, running as a daemon, it is especially important to verify that the software is stable during a long period of continuous uptime, without decreased throughput and without leaking resources like memory.
Since bischeck is designed to do advanced service checks with dynamic and adaptive thresholds, we know that CPU and memory will be important resources when running mathematical algorithms over historical collected data.
The test setup starts with a baseline that is scaled in two dimensions: increasing the load by increasing the number of service jobs, and increasing the load by decreasing the interval between service job schedules.
Read the full benchmark report
bischeck 0.4.2 released
We are pleased to release bischeck 0.4.2. To get all the details about 0.4.2 please read more about the features in the documentation.
As usual any feedback is appreciated.
bischeck 0.4.2 RC2 is released – test now!
We are pleased to release the second release candidate for bischeck 0.4.2. The major change in this release candidate is how null values are managed for mathematical functions that take a list of arguments, like sum and avg. Read more about this feature in the documentation.
This release includes the following features and fixes:
New feature
• Related to bug [TR-227], the naming of host, service and serviceitem names has been improved.
• In execution statements and threshold hour specifications where cache data is retrieved as a list, as in functions like avg(x-y-z[4:10]) and max(x-y-z[-5M:-15M]), the function can now be configured to return a value as long as at least one index in the range is not null. For backward compatibility, the new functionality is only used if the property notFullListParse is set to true in properties.xml. The default value is false.
• There has been some discussion about what Nagios state should be sent if the returned execute statement of a service item is null. In previous releases this was hard coded to OK, but now it is possible to define it by setting the property stateOnNull. The property can be set to an integer 0, 1, 2 or 3 or to a string OK, WARNING, CRITICAL or UNKNOWN. The default is UNKNOWN.
• When a service class got an exception while creating a connection, previous versions did not save any data to the cache. If the property saveNullOnConnectionError is set to true, a null value is now inserted into the cache when a connection exception is thrown. For backward compatibility the default value of the property is false.
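The effect of notFullListParse on a list function can be sketched roughly as below. This is an illustration only, not bischeck code. The function name and the flag argument are hypothetical, and we assume that null entries are simply skipped when the average is calculated.

```python
from typing import Optional, Sequence


def avg(values: Sequence[Optional[float]],
        not_full_list_parse: bool = False) -> Optional[float]:
    """Average a list of cached values where None represents a null entry.

    With not_full_list_parse False (the default), any null in the list
    makes the whole result null. With True, the nulls are skipped
    (our assumption) and a value is returned as long as at least one
    entry is not null.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    if len(present) < len(values) and not not_full_list_parse:
        return None
    return sum(present) / len(present)


cached = [1.0, None, 3.0]  # hypothetical cache range with one null entry
print(avg(cached))                             # default behaviour
print(avg(cached, not_full_list_parse=True))   # new optional behaviour
```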
Bugs fixed and important issues
• [TR-227] “Cache parser do not work for host, service or serviceitems if the name include 0 (zero)” has been resolved.
• [TR-228] “Threshold factory return wrong threshold definition if service and serviceitem name is the same for different hosts” has been resolved.
• [TR-229] “When using service ShellService the number of open files limit will be reached” has been resolved.
• [TR-230] “NRDP submissions all come in as OK” has been resolved.
• Fixed migration script from 0.4.0 to copy etc directory content correctly. Changes in the file urlservices.xml will be overwritten. Existing 0.4.0 configuration will still be available in the previous version backup directory, bischeck_0.4.0.
Bad directory location
Bischeck uses the directory /var/tmp to store log files, the pid file and persistent cache data. For logs this is not a bad location, but for the pid file and cache data it is not a very smart one. The main reason is that if your bischeck process runs for a very long time, which it should, there is a risk that your pid file and cache data will be removed. Distributions like CentOS have a cron script that runs the command tmpwatch, which removes files in different “tmp” directories if they have not been updated for a long time. This can be fixed by changing the cron script, /etc/cron.daily/tmpwatch on CentOS, or by changing the directory location through the properties in the bischeck configuration file properties.xml.
The properties to change are:
- pidfile – default is /var/tmp/bischeck.pid
- lastStatusCacheDumpDir – default is /var/tmp/
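To see which files might be at risk, a small sketch like the one below can list files that have not been modified for a number of days. The function name is hypothetical, and looking only at modification time is a simplifying assumption. The real tmpwatch decides based on its own options and on which timestamps it is told to check.

```python
import os
import time
from typing import List, Optional


def stale_files(directory: str, max_age_days: float,
                now: Optional[float] = None) -> List[str]:
    """Return regular files in directory not modified for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)


if __name__ == "__main__":
    # 30 days is an arbitrary example age, not a tmpwatch default.
    for path in stale_files("/var/tmp", 30):
        print(path)
```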
bischeck 0.4.2 RC1 is released – test now!
We are pleased to release bischeck 0.4.2 release candidate 1. This release includes the following features and fixes:
New feature
- Related to bug [TR-227], the naming of host, service and serviceitem names has been improved. For more info please see 8.1.
- In execution statements and threshold hour specifications where cache data is retrieved as a list, as in avg(x-y-z[4:10]) and avg(4,6,8), the function can now be configured not to return a null value if at least the first index in the list definition has a cached value. For the example, this means that if at least index 4 has a value for x-y-z, an average will be calculated. For backward compatibility, the new functionality is only used if the property notFullListParse is set to true in properties.xml. The default value is false.
- There has been some discussion about what Nagios state should be sent if the returned execute statement of a service item is null. In previous releases this was hard coded to OK, but now it is possible to define it by setting the property stateOnNull. The property can be set to an integer 0, 1, 2 or 3 or to a string OK, WARNING, CRITICAL or UNKNOWN. The default is UNKNOWN.
- When a service class got an exception while making a connection, previous versions did not save any data to the cache. If the property saveNullOnConnectionError is set to true, a null value is now inserted into the cache when a connection exception is thrown. For backward compatibility the default value of the property is false.
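As an illustration of the two accepted forms of stateOnNull, the sketch below maps a property value to a Nagios state name using the standard Nagios codes 0 to 3. The function name is hypothetical and this is not how bischeck itself parses the property. Accepting lower case names is our assumption.

```python
from typing import Optional

# Standard Nagios return codes: index 0..3 maps to the state name.
NAGIOS_STATES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def parse_state_on_null(raw: Optional[str]) -> str:
    """Translate a stateOnNull property value to a Nagios state name."""
    if raw is None or not raw.strip():
        return "UNKNOWN"  # documented default when the property is not set
    text = raw.strip()
    if text in ("0", "1", "2", "3"):
        return NAGIOS_STATES[int(text)]
    name = text.upper()  # assumption: names are matched case-insensitively
    if name in NAGIOS_STATES:
        return name
    raise ValueError("unsupported stateOnNull value: %r" % raw)


print(parse_state_on_null("2"))
print(parse_state_on_null("WARNING"))
print(parse_state_on_null(None))
```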
Bugs fixed and important issues
- [TR-227] “Cache parser do not work for host, service or serviceitems if the name include 0 (zero)” has been resolved.
- [TR-228] “Threshold factory return wrong threshold definition if service and serviceitem name is the same for different hosts” has been resolved.
- [TR-229] “When using service ShellService the number of open files limit will be reached” has been resolved.
- [TR-230] “NRDP submissions all come in as OK” has been resolved.
- Fixed migration script from 0.4.0 to copy etc directory content correctly. Changes in the file urlservices.xml will be overwritten. Existing 0.4.0 configuration will still be available in the previous version backup directory, bischeck_0.4.0.
bischeck quick start is available
Check out our new quick start for bischeck.
Limitation in host, service and serviceitem naming
Currently we have a limitation in the naming of hosts, services and serviceitems. The issue is seen when using dynamic thresholds that do calculations on cached entries. When describing a cache entry in an hour tag in the 24thresholds.xml file, you should use the format host-service-serviceitem, for example erphost-erpOrders-weborders. The problem with the current format is that the names can only consist of letters, upper or lower case, and the digits 1-9. Yes, the missing 0 is a major bug. Except for the 0 bug, the format has the following limitations:
- Dash (-) is used as the separator between the host, service and serviceitem names, which means that using a dash in a name is a problem.
- Other characters like dot (.), plus (+), underscore (_) or any character other than those described above are not supported. This is a major weakness, since many will use, for example, dot and underscore in their existing Nagios host and service names.
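The limitation can be illustrated with a regular expression that accepts only what is described above. The pattern below is our own reconstruction for illustration, not the parser used in bischeck, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Letters in upper or lower case and the digits 1-9. Note the missing 0.
NAME = r"[A-Za-z1-9]+"
CACHE_KEY = re.compile(r"(%s)-(%s)-(%s)" % (NAME, NAME, NAME))


def split_cache_key(key: str) -> Optional[Tuple[str, str, str]]:
    """Split host-service-serviceitem, or return None if a name is unsupported."""
    match = CACHE_KEY.fullmatch(key)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


print(split_cache_key("erphost-erpOrders-weborders"))  # accepted
print(split_cache_key("host10-erpOrders-weborders"))   # rejected: contains 0
print(split_cache_key("erp.host-erp_Orders-web"))      # rejected: dot, underscore
print(split_cache_key("erp-host-erpOrders-weborders")) # rejected: dash in name
```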
bischeck 0.4.1 released – upgrade now!
We have made a quick new release of bischeck due to a bug that caused truncation of all measured and threshold values with more than 2 decimals. This caused some obvious problems, especially when measuring things like network times. So if you are monitoring this kind of data, please upgrade asap.
Since we had some new stuff in the trunk we chose to include it too, but it should be regarded as beta functionality. The new functionality is:
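To show why this matters for small values, here is a tiny sketch of truncation to two decimals. The function name and the sample values are made up for illustration, and the code is not taken from bischeck.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_2(value: str) -> Decimal:
    """Cut a decimal number off after two decimals without rounding."""
    return Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


measured = truncate_2("0.00345")   # hypothetical network time in seconds
threshold = truncate_2("0.0089")   # hypothetical threshold in seconds

# Both values collapse to zero, so they can no longer be told apart.
print(measured, threshold, measured == threshold)
```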
- Sending passive checks over NRDP as an alternative to NSCA
- New service and serviceitem classes that support execution of local check commands. With this functionality any Nagios check command that outputs performance data can now be executed through bischeck. The state is of course ignored, since bischeck will do its own threshold calculation on the performance data. Thanks to Eric Loyd at Bitnetix (www.bitnetix.com), who gave me the idea during Nagios World 2012.
For more information about this new functionality please check out the 0.4.1 README. Feedback on 0.4.1 is more than welcome.
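As a rough idea of what extracting performance data from a check command involves, the sketch below reads the label=value pairs that follow the | character in plugin output. It is our own simplified illustration and not the bischeck implementation. The function name is hypothetical, only the first value of each item is read, and the unit, warn, crit, min and max fields are ignored.

```python
import re
from typing import Dict

# A label is either quoted with single quotes or free of spaces, = and quotes.
PERF_ITEM = re.compile(r"(?:'([^']+)'|([^\s=']+))=(-?\d+(?:\.\d+)?)")


def parse_perfdata(plugin_output: str) -> Dict[str, float]:
    """Return {label: value} for the performance data after the | character."""
    _, separator, perfdata = plugin_output.partition("|")
    if not separator:
        return {}  # the plugin printed no performance data
    values = {}
    for match in PERF_ITEM.finditer(perfdata):
        label = match.group(1) or match.group(2)
        values[label] = float(match.group(3))
    return values


# Hypothetical output from a ping style check command.
output = ("PING OK - Packet loss = 0%, RTA = 0.80 ms"
          "|rta=0.800000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0")
print(parse_perfdata(output))
```

The values extracted this way are what a threshold calculation would work on, while the OK, WARNING or CRITICAL state reported by the command itself is left unused.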
To download bischeck 0.4.1 please visit our download area.