Hopefully this is the final release candidate before it is time for the real version 1.0.0 of Bischeck. We are still lacking the new manuals, but we will do our best to get them out as soon as possible.
Thanks to everybody who has tested RC1. Special thanks to Pasquale Settanni at Eutelsat Broadband for his testing effort and valuable feedback.
Bischeck 1.0.0 RC2 can be downloaded from here. To read about what is already in RC1, check out this post.
We hope you will enjoy the final release candidate and, as always, feedback is appreciated.
Installing Bischeck 1.0.0 RC2
RC2 is installed in the same way as RC1, but with some new functionality. The install script now supports the following additional features:
- If the installation is done as a non-root user (the effective user id is not root), Bischeck will be installed as the user running the install script. This means that the init script, bischeckd, will not be installed in /etc/init.d.
- Additional validation has been added for installation and upgrade.
- Better information about the installation steps.
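The non-root rule above can be pictured with a small sketch. It is not the install script itself; the function name and the returned keys are made up for illustration, and the only facts taken from the list are that a non-root install runs as the current user and skips the init script in /etc/init.d.

```python
import os


def install_plan(euid):
    """Describe the install mode for a given effective user id (illustrative only)."""
    is_root = (euid == 0)
    return {
        # Non-root: Bischeck is installed as the user running the script.
        "install_as_current_user": not is_root,
        # Non-root: bischeckd is not placed in /etc/init.d.
        "install_init_script": is_root,
    }


if __name__ == "__main__":
    # os.geteuid() is available on Unix-like systems.
    print(install_plan(os.geteuid()))
```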
New configuration options
The new template configuration has some additional functionality in RC2.
Template overrides
Template overrides are intended to further increase the flexibility of templates. Overrides are supported for service and serviceitem templates.
```xml
. . .
<host>
  <name>host1</name>
  <alias>127.0.0.1</alias>
  <desc>Host host1</desc>

  <service>
    <template>sshporttemplate</template>
    <serviceoverride>
      <name>myssh</name>
      <alias>10.10.10.10</alias>
      <schedule>20S</schedule>
      <schedule>30S</schedule>
    </serviceoverride>
  </service>

</host>
. . .
```
A serviceoverride can override all attributes in a service template, and a serviceitemoverride can override all attributes in a serviceitem template.
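One way to think about an override is as a merge where the override wins. The sketch below is only a mental model, not Bischeck code: the template values are hypothetical, and it assumes that an overridden attribute (including a list of schedules) replaces the template value rather than being appended to it.

```python
def apply_override(template, override):
    """Return a copy of the template where every overridden attribute wins."""
    merged = dict(template)   # leave the template itself untouched
    merged.update(override)   # assumption: override values replace template values
    return merged


# Hypothetical content of the sshporttemplate service template.
ssh_template = {
    "name": "sshport",
    "desc": "SSH port check",
    "schedule": ["5M"],
}

# Values taken from the serviceoverride example above.
override = {
    "name": "myssh",
    "alias": "10.10.10.10",
    "schedule": ["20S", "30S"],
}

service = apply_override(ssh_template, override)
```

Attributes that the override does not mention, such as desc here, keep the value from the template.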
Cache templates
The cache tag located in the serviceitem and in the serviceitem template can now refer to a cache template.
```xml
. . .
<serviceitemtemplate templatename="sshresponsetimetemplate1">
  <name>response1</name>
  <desc>Response time for tcp check</desc>
  <execstatement>{"check":"/usr/lib/nagios/plugins/check_tcp -H $$HOSTALIAS$$ -p 22","label":"time"}</execstatement>
  <thresholdclass>Twenty4HourThreshold</thresholdclass>
  <serviceitemclass>CheckCommandServiceItem</serviceitemclass>
  <cache>
    <template>defCache</template>
  </cache>

</serviceitemtemplate>

<cachetemplate templatename="defCache">
  <aggregate>
    <!-- Aggregate using average -->
    <method>avg</method>
    <!-- Include weekend data in the aggregation -->
    <useweekend>false</useweekend>
    <!--
      Define retention for the aggregated periods.
      If no retention is defined for a period, no retention will be done.
      Periods that can be defined are (H)our, (D)ay, (W)eek and (M)onth
    -->
    <retention>
      <!-- Purge hours after 7 days (24*7) -->
      <period>H</period>
      <offset>168</offset>
    </retention>
    <retention>
      <!-- Purge days after 60 days -->
      <period>D</period>
      <offset>60</offset>
    </retention>
  </aggregate>
  <!--
    Define purge rules for the data that is collected with this serviceitem
  -->
  <purge>
    <!-- The max number of items -->
    <maxcount>1000</maxcount>
  </purge>
</cachetemplate>
. . .
```
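To show how the template reference ties the two elements together, here is a small sketch that follows the name in the cache tag to the matching cachetemplate and reads its settings. It is not how Bischeck loads its configuration; the wrapping config element and the function name are assumptions made so the fragment can be parsed on its own.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example above, wrapped in a hypothetical <config> root.
FRAGMENT = """
<config>
  <serviceitemtemplate templatename="sshresponsetimetemplate1">
    <name>response1</name>
    <cache>
      <template>defCache</template>
    </cache>
  </serviceitemtemplate>
  <cachetemplate templatename="defCache">
    <aggregate>
      <method>avg</method>
      <useweekend>false</useweekend>
      <retention><period>H</period><offset>168</offset></retention>
      <retention><period>D</period><offset>60</offset></retention>
    </aggregate>
    <purge><maxcount>1000</maxcount></purge>
  </cachetemplate>
</config>
"""


def resolve_cache(root, item_template_name):
    """Follow a serviceitem template's cache reference to its cache template."""
    item = root.find(f"serviceitemtemplate[@templatename='{item_template_name}']")
    cache_name = item.findtext("cache/template")
    cache = root.find(f"cachetemplate[@templatename='{cache_name}']")
    # Map each retention period (H, D, W, M) to its offset.
    retention = {
        r.findtext("period"): int(r.findtext("offset"))
        for r in cache.iter("retention")
    }
    return {
        "method": cache.findtext("aggregate/method"),
        "retention": retention,
        "maxcount": int(cache.findtext("purge/maxcount")),
    }


settings = resolve_cache(ET.fromstring(FRAGMENT), "sshresponsetimetemplate1")
```

The hour offset of 168 corresponds to 24*7, that is, hourly aggregates are kept for 7 days.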
Inactivation of hosts and services
With RC2 it is possible to set a host or a service in an inactive state. Setting it on a host means that none of the related services for the host will be configured. Setting it on a service means that the specified service will not be configured. This is easier than commenting out or deleting hosts and services that should temporarily not be configured. The inactivation can also be defined as an override option, but not in a service template.
```xml
. . .
<host>
  <name>host0</name>
  <inactive>false</inactive>
  <alias>127.0.0.1</alias>
  <desc>Host host0</desc>

  <service>
    <template>sshporttemplate</template>
  </service>
  <service>
    <template>webporttemplate</template>
    <serviceoverride>
      <name>webmain</name>
      <inactive>true</inactive>
      <alias>10.10.10.10</alias>
    </serviceoverride>
  </service>

</host>
. . .
```
By default inactive is set to false.
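The rules can be summarized in a short sketch that lists which services of a host would be configured. This is an illustration of the semantics described above, not Bischeck's own logic; the function names are made up, and a service is labeled by its override name when it has one and by its template name otherwise.

```python
import xml.etree.ElementTree as ET

HOST_XML = """
<host>
  <name>host0</name>
  <inactive>false</inactive>
  <alias>127.0.0.1</alias>
  <desc>Host host0</desc>
  <service>
    <template>sshporttemplate</template>
  </service>
  <service>
    <template>webporttemplate</template>
    <serviceoverride>
      <name>webmain</name>
      <inactive>true</inactive>
      <alias>10.10.10.10</alias>
    </serviceoverride>
  </service>
</host>
"""


def is_inactive(element):
    """A missing inactive tag counts as false, which is the default."""
    return (element.findtext("inactive") or "false").strip().lower() == "true"


def active_services(host):
    """Return labels of the services that would be configured for this host."""
    if is_inactive(host):
        return []  # an inactive host disables all of its services
    active = []
    for service in host.findall("service"):
        override = service.find("serviceoverride")
        if is_inactive(service):
            continue
        if override is not None and is_inactive(override):
            continue
        name = override.findtext("name") if override is not None else None
        active.append(name or service.findtext("template"))
    return active


print(active_services(ET.fromstring(HOST_XML)))
```

With the example configuration only the sshporttemplate service remains, since webmain is inactivated through its override.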
For more examples of all the new configuration options, please see the bischeck.xml file located in the etc directory of the installation.
NSCA and NRDP workers
In RC2 the NSCAServer and NRDPServer classes have been implemented with a worker pool to increase concurrency. This is especially useful for slow servers, like NSCA in daemon mode. Bischeck will dynamically scale the pool size depending on the need. The idea is to implement the same pattern for the other Server classes in the future.
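The general idea of a worker pool in front of a slow server can be sketched as follows. This is not the Bischeck implementation: slow_send is a hypothetical stand-in for a slow remote server, and the pool here only grows on demand up to a fixed cap, whereas Bischeck scales its pool size dynamically.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_send(message):
    """Hypothetical stand-in for sending one result to a slow remote server."""
    time.sleep(0.05)
    return f"sent:{message}"


def send_all(messages, max_workers=4):
    """Send messages concurrently; worker threads are started as needed, up to the cap."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the input.
        return list(pool.map(slow_send, messages))


print(send_all(["a", "b", "c"]))
```

With several workers, one slow send no longer holds up the messages queued behind it.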
Circuit break pattern
The NSCA and NRDP server classes have also been improved with a circuit breaker, inspired by Michael Nygard’s circuit breaker pattern in the excellent book “Release It!”. The circuit breaker can detect if the remote server is not available and stop sending data, so that the remote server is not overloaded. The circuit breaker is configured through properties in server.xml.
```xml
<server name="NSCA-1">
  <class>NSCAServer</class>
  <property>
    <key>hostAddress</key>
    <value>172.25.1.212</value>
  </property>
  <property>
    <key>encryptionMode</key>
    <value>XOR</value>
  </property>
  <property>
    <key>password</key>
    <value>nsca</value>
  </property>
  <property>
    <key>port</key>
    <value>5667</value>
  </property>
  <property>
    <key>connectionTimeout</key>
    <value>5000</value>
  </property>

  <property>
    <key>cbEnable</key>
    <value>true</value>
  </property>

</server>
```
The cbEnable property, the last property in the configuration above, enables the circuit breaker for this NSCA server. The circuit breaker is not enabled by default. Other properties are:
- cbAttempts – the number of times the connection must fail before the breaker is opened, default is 5.
- cbTimeout – the number of milliseconds the breaker stays open before the connection is tested again, default is 60000.
The circuit breaker properties, including whether it is enabled, can also be set through the Bischeck JMX interface.
It’s important to understand that Bischeck will drop any data that could not be sent to a specific server. This behavior is not changed by using the circuit breaker. Even if data is dropped because a server is not running, the data is always available in the Bischeck cache.
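The behavior described in this section can be illustrated with a minimal circuit breaker sketch. It is a generic version of the pattern, not the Bischeck code: the class and method names are made up, the defaults mirror cbAttempts and cbTimeout, and it assumes that a successful send closes the breaker and resets the failure count.

```python
import time


class CircuitBreaker:
    """Generic circuit breaker sketch; names and details are illustrative."""

    def __init__(self, attempts=5, timeout_ms=60000, clock=time.monotonic):
        self.attempts = attempts            # failures before the breaker opens
        self.timeout = timeout_ms / 1000.0  # seconds the breaker stays open
        self.clock = clock
        self.failures = 0
        self.opened_at = None               # None means the breaker is closed

    def allow(self):
        """True if a send may be attempted (closed, or open long enough to retest)."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.timeout

    def send(self, sender, data):
        """Return True if the data was sent, False if it was dropped."""
        if not self.allow():
            return False                    # open: drop without contacting the server
        try:
            sender(data)
        except OSError:                     # covers connection errors and timeouts
            self.failures += 1
            if self.failures >= self.attempts:
                self.opened_at = self.clock()
            return False
        self.failures = 0                   # success closes the breaker
        self.opened_at = None
        return True
```

Note that send returns False both when the server fails and when the breaker is open. In both cases the data is dropped, which matches the point above that the circuit breaker does not change the drop behavior.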
Nagios performance data output [FR-250]
It has been requested that the Nagios performance data should include the warning and critical values as their own data entities, so they can be stored in RRD and plotted through pnp4nagios or equivalent solutions. The extended format looks like the following example:
response=0.000192;0.000167;0.000158;0; threshold=0.000176;0;0;0; warning=0.000167;0;0;0; critical=0.000158;0;0;0; avg-exec-time=13ms
To get this extended format the following property must be set in the properties.xml file:
```xml
. . .
<property>
  <key>NagiosUtil.extendedformat</key>
  <value>true</value>
</property>
. . .
```
The default is false. Setting this to true enables the extended format for all performance data created by Bischeck.
It’s important to understand that this does not work well if the threshold method used by the serviceitem is set to interval (=). It will also be confusing for RRD and the graph generation if a service includes multiple serviceitems. Input from anyone with expertise in RRD and the pnp4nagios PHP scripts would be welcome.
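To make the structure of the extended format concrete, the example performance data string from this section can be split into its entries with a few lines of code. This is only a reading aid, not part of Bischeck; the function name is made up, and the field order (value, warn, crit, min, max) follows the standard Nagios plugin performance data layout.

```python
FIELDS = ("value", "warn", "crit", "min", "max")


def parse_perfdata(text):
    """Split a Nagios performance data string into label -> field dictionary."""
    result = {}
    for token in text.split():
        label, _, rest = token.partition("=")
        parts = rest.split(";")
        parts += [""] * (len(FIELDS) - len(parts))  # pad missing trailing fields
        result[label] = dict(zip(FIELDS, parts))
    return result


line = ("response=0.000192;0.000167;0.000158;0; threshold=0.000176;0;0;0; "
        "warning=0.000167;0;0;0; critical=0.000158;0;0;0; avg-exec-time=13ms")

perf = parse_perfdata(line)
```

In the parsed result the warning and critical entries carry, as their own values, the same numbers that appear in the warn and crit fields of the response entry, which is what allows them to be stored and plotted as separate data series.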
Bugs fixed
[TR-249] – Aggregated data can not be retrieved from the cache
[TR-251] – “Bug in the formatting of the threshold values”