Over the last few weeks I have been working on a script that automatically installs patches in our various vSphere environments.
While working on that script, I realized that I had to disable monitoring while the components the script patches are being rebooted. That meant I needed access to our monitoring system. We use check_mk (https://checkmk.com).
After reading the docs and searching the forums, I found out that there seems to be no easy way to get what I want. My wishlist is:
- Check if the host I want to patch is actually present in check_mk
- Check if the host in check_mk is shown as healthy
- Set a downtime for the host in check_mk
- After patching the host, remove the downtime for the host in check_mk
- Check again if the host in check_mk is shown as healthy
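The wishlist maps onto a simple control flow. Here is a minimal sketch of it in PowerShell. The function name `Invoke-PatchWorkflow` and its script-block parameters are hypothetical placeholders for the real check_mk calls and the actual patching, so the flow can be tried without a server. The `finally` block is a design choice: it removes the downtime even if patching throws, so check_mk starts reporting on the host again.

```powershell
# Hypothetical sketch of the wishlist as a control flow.
# Each step is passed in as a script block so the flow can be
# exercised without a real check_mk server.
function Invoke-PatchWorkflow {
    param(
        [Parameter(Mandatory)] [string]$HostName,
        [Parameter(Mandatory)] [scriptblock]$TestPresent,    # wish #1
        [Parameter(Mandatory)] [scriptblock]$TestHealthy,    # wish #2 and #5
        [Parameter(Mandatory)] [scriptblock]$SetDowntime,    # wish #3
        [Parameter(Mandatory)] [scriptblock]$RemoveDowntime, # wish #4
        [Parameter(Mandatory)] [scriptblock]$Patch
    )
    if (-not (& $TestPresent $HostName)) { return "host $HostName not found in check_mk" }
    if (-not (& $TestHealthy $HostName)) { return "host $HostName is not healthy before patching" }
    & $SetDowntime $HostName
    try {
        & $Patch $HostName
    } finally {
        # remove the downtime even if patching threw an exception
        & $RemoveDowntime $HostName
    }
    if (-not (& $TestHealthy $HostName)) { return "host $HostName is not healthy after patching" }
    return "ok"
}
```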
My problem with check_mk is that there is not one single API for all of the tasks above. From what I have found out so far, there are three ways to accomplish them:
1.) check_mk Web API
Description: https://checkmk.com/cms_web_api.html
Command reference: https://checkmk.com/cms_web_api_references.html
With this API you can manage what is monitored, just as you would in the WATO part of the check_mk management interface (https://checkmk.com/cms_wato.html). This API helps me find out whether the host I want to patch is configured in check_mk, using the function get_host (https://checkmk.com/cms_web_api_references.html#get_host).
So #1 of my wishlist above is fulfilled.
Sadly, it is not possible to get status information via this API. I can only get configuration information about the monitored hosts.
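A minimal sketch of the `get_host` call, split into building the request and evaluating the answer so the second part can be tried without a server. The form fields are the ones used in the script further down; the function names are hypothetical. The sketch assumes that the web API answers with a JSON object whose `result_code` is 0 on success.

```powershell
# Hypothetical helper: build the POST request for the web API call get_host
function New-CheckmkGetHostRequest {
    param(
        [Parameter(Mandatory)] [string]$Server,
        [Parameter(Mandatory)] [string]$Instance,
        [Parameter(Mandatory)] [string]$User,
        [Parameter(Mandatory)] [string]$Secret,
        [Parameter(Mandatory)] [string]$HostName,
        [switch]$Https
    )
    $scheme = if ($Https) { "https" } else { "http" }
    @{
        Uri  = "${scheme}://$Server/$Instance/check_mk/webapi.py"
        Body = @{
            action         = "get_host"
            _username      = $User
            _secret        = $Secret
            request_format = "json"
            output_format  = "json"
            hostname       = $HostName
        }
    }
}

# Hypothetical helper: $true if the web API answered with result_code 0
function Test-CheckmkHostConfigured {
    param([Parameter(Mandatory)] [string]$Content)
    try {
        $answer = $Content | ConvertFrom-Json -ErrorAction Stop
    } catch {
        # not JSON at all, e.g. a login page or an error page
        return $false
    }
    return ($answer.result_code -eq 0)
}

# Usage against a real server:
# $req  = New-CheckmkGetHostRequest -Server "monitor.example.com" -Instance "mysite" -User "automation" -Secret "..." -HostName "esx01"
# $resp = Invoke-WebRequest -Uri $req.Uri -Method Post -Body $req.Body
# Test-CheckmkHostConfigured -Content $resp.Content
```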
2.) check_mk Livestatus
Description: https://checkmk.com/cms_livestatus.html
Livestatus is basically a socket, here exposed as a TCP port on the check_mk system, that offers status information about the monitored hosts. It has no authentication mechanism and is not encrypted, so access to it should be restricted.
In my situation this is a problem, because I don't have the necessary firewall rules and permissions to access it. (Security is taken seriously in my company.) So I searched for other ways to get the status of a host.
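For readers who do have access to Livestatus, a query is just a few lines of text sent to the socket. A minimal sketch, assuming Livestatus is exposed on TCP port 6557 (adjust to your site) and using the `hosts` table columns `name`, `state` and `num_services_*`; the function names are hypothetical.

```powershell
# Hypothetical helper: build a Livestatus (LQL) query for the health of one host
function New-LivestatusHostQuery {
    param([Parameter(Mandatory)] [string]$HostName)
    $lines = @(
        "GET hosts"
        "Columns: name state num_services_warn num_services_crit num_services_unknown"
        "Filter: name = $HostName"
        "OutputFormat: json"
    )
    # a query is terminated by an empty line
    ($lines -join "`n") + "`n`n"
}

# Hypothetical helper: send a query to a Livestatus TCP port and return the raw answer.
# Port 6557 is only an assumption; use whatever port your site exposes.
function Invoke-LivestatusQuery {
    param(
        [Parameter(Mandatory)] [string]$Server,
        [Parameter(Mandatory)] [string]$Query,
        [int]$Port = 6557
    )
    $client = [System.Net.Sockets.TcpClient]::new($Server, $Port)
    try {
        $stream = $client.GetStream()
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($Query)
        $stream.Write($bytes, 0, $bytes.Length)
        # signal the end of the request so the server answers and closes
        $client.Client.Shutdown([System.Net.Sockets.SocketShutdown]::Send)
        $reader = [System.IO.StreamReader]::new($stream, [System.Text.Encoding]::UTF8)
        $reader.ReadToEnd()
    } finally {
        $client.Close()
    }
}
```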
3.) check_mk view.py
After extensive searching I learned that it is possible to send requests to view.py, the same way a user does with a browser. The tricky part is knowing the right parameters to submit.
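Because view.py takes the same parameters the browser sends, one way to collect them is to copy the URL from the address bar after performing the action and split its query string. A minimal sketch; `ConvertFrom-CheckmkUrl` is a hypothetical helper, and the hashtable it returns can be passed to `Invoke-WebRequest -Body`.

```powershell
# Hypothetical helper: turn a view.py URL copied from the browser into a
# hashtable that can be passed to Invoke-WebRequest -Body
function ConvertFrom-CheckmkUrl {
    param([Parameter(Mandatory)] [string]$Url)
    $params = @{}
    $query = ([uri]$Url).Query.TrimStart('?')
    foreach ($pair in $query -split '&') {
        if ([string]::IsNullOrEmpty($pair)) { continue }
        # split only at the first '=' so values may contain '='
        $key, $value = $pair -split '=', 2
        # browsers may encode spaces in query strings as '+'
        $name = [uri]::UnescapeDataString($key.Replace('+', ' '))
        $params[$name] = [uri]::UnescapeDataString("$value".Replace('+', ' '))
    }
    $params
}
```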
So far I have found no good documentation for this part of check_mk. I only have these snippets of information:
- https://checkmk.com/cms_legacy_multisite_automation.html
A short page about accessing view.py. After an introduction, it simply says that you should open the check_mk web interface, perform the task you want to automate, and analyze the URL in the browser.
This documentation is marked as old and links to the new documentation, but there I found no information about view.py or about what to use instead.
- /opt/omd/versions/<your check_mk version>/share/doc/check_mk/treasures/downtime
Somewhere I found a hint that every check_mk installation includes a script for setting and deleting the downtime of a host. It can be found at /opt/omd/versions/<your check_mk version>/share/doc/check_mk/treasures/downtime.
With this script it is possible to set and remove downtimes on hosts in check_mk. The script calls view.py with the appropriate parameters.
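Put together, the downtime parameters look like this. A minimal sketch of a body builder using exactly the parameter names from the script further down; the function name `New-CheckmkDowntimeBody` is hypothetical, and `_transid = '-1'` is copied from the script rather than taken from any documentation.

```powershell
# Hypothetical helper: build the view.py POST body for setting or removing
# a host downtime (parameter names as used in the script of this post)
function New-CheckmkDowntimeBody {
    param(
        [Parameter(Mandatory)] [string]$User,
        [Parameter(Mandatory)] [string]$Secret,
        [Parameter(Mandatory)] [string]$HostName,
        [Parameter(Mandatory, ParameterSetName = 'set')] [int]$Minutes,
        [Parameter(ParameterSetName = 'set')] [string]$Comment = "",
        [Parameter(Mandatory, ParameterSetName = 'remove')] [switch]$Remove
    )
    # parameters shared by both actions
    $body = @{
        _username   = $User
        _secret     = $Secret
        _transid    = '-1'
        _do_confirm = 'yes'
        _do_actions = 'yes'
        host        = $HostName
        view_name   = 'hoststatus'
    }
    if ($Remove) {
        $body._remove_downtimes = 'Remove'
        $body._down_remove      = 'Remove'
    } else {
        $body._down_from_now = 'yes'
        $body._down_minutes  = $Minutes
        $body._down_comment  = $Comment
    }
    $body
}
```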
With all these snippets of information I started to experiment and pieced together the parameters to pass to view.py. I am now able to set and delete a downtime, and I also found a way to get the status of a host via view.py.
So #2, #3, #4 and #5 of my wishlist are fulfilled, and I could start coding.
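The status part boils down to a request against the `allhosts` view and a check of three counters. A minimal sketch; both function names are hypothetical. Anchoring and escaping `host_regex` (so that `esx01` does not also match `esx010`) is an assumption about how the view filter treats regular expressions, not something documented.

```powershell
# Hypothetical helper: POST body for the 'allhosts' view in JSON format
function New-CheckmkHostViewBody {
    param([string]$User, [string]$Secret, [string]$HostName)
    @{
        _username     = $User
        _secret       = $Secret
        _transid      = '-1'
        # assumption: an anchored, escaped regex matches only this exact host
        host_regex    = '^' + [regex]::Escape($HostName) + '$'
        view_name     = 'allhosts'
        output_format = 'json'
    }
}

# Hypothetical helper: decide whether a host is healthy from the view's
# columns (values may arrive as strings such as "0", hence the [int] cast)
function Test-CheckmkHostHealthy {
    param([Parameter(Mandatory)] [hashtable]$HostStatus)
    foreach ($column in 'num_services_warn', 'num_services_crit', 'num_services_unknown') {
        # a missing column is treated as unhealthy rather than silently passing
        if (-not $HostStatus.ContainsKey($column)) { return $false }
        if ([int]$HostStatus[$column] -ne 0) { return $false }
    }
    return $true
}
```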
The script
I only have a very basic understanding of web requests and how to handle them. I normally use PowerShell modules like PowerCLI, so I never needed to access an API directly. Still, I tried my best to do some error handling, and in my test environment it worked very well. It may not be the best way to get the job done, but it works!
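A note on error handling: `Invoke-WebRequest` throws an exception on connection failures and on HTTP error status codes instead of returning nothing. A minimal sketch of a wrapper that turns both cases into the same `status`/`msg` hashtable the functions in this post return; `Invoke-CheckmkSafely` is a hypothetical name, and the request is passed as a script block so the wrapper can be tried without a server.

```powershell
# Hypothetical wrapper: run a web request and convert exceptions and
# non-200 answers into a status/msg hashtable
function Invoke-CheckmkSafely {
    param([Parameter(Mandatory)] [scriptblock]$Request)
    try {
        $response = & $Request
    } catch {
        return @{ status = 1; msg = "error while connecting the check_mk api: $($_.Exception.Message)" }
    }
    if ($null -eq $response -or $response.StatusCode -ne 200) {
        return @{ status = 1; msg = "error in the response from the check_mk api." }
    }
    return @{ status = 0; msg = "ok"; response = $response }
}

# Usage with a real request:
# $r = Invoke-CheckmkSafely { Invoke-WebRequest -Uri $URI -Method Post -Body $postParams }
```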
To keep the requests easy to handle, I created two PowerShell functions. You can include their definitions in your script and use them as shown in the example at the end of the script.
```powershell
#!/usr/bin/env pwsh

# Declaration of a function to check if a host is healthy in check_mk
function checkmkhealth {
    # parameter definition
    param(
        [Parameter(Position=0, Mandatory=$true)] [string]$checkmkserver,
        [Parameter(Position=1, Mandatory=$true)] [string]$checkmkinstance,
        [Parameter(Position=2, Mandatory=$true)] [string]$checkmkuser,
        [Parameter(Position=3, Mandatory=$true)] [string]$checkmksecret,
        [Parameter(Position=4, Mandatory=$true)] [string]$hostname,
        [Parameter(Mandatory=$false)] [bool]$https = $false
    )

    #######################
    # variable definition #
    #######################

    # declare return variable for the function as hashtable
    $return = @{}

    # decide whether to use https or not
    if ($https) { $URI = "https://" } else { $URI = "http://" }
    $URI += "$checkmkserver/$checkmkinstance/check_mk/view.py"

    # build the web request for displaying the host with its status
    $postParams = @{
        _username     = $checkmkuser
        _secret       = $checkmksecret
        _transid      = '-1'
        host_regex    = $hostname  # the hostname to search for
        view_name     = 'allhosts'
        output_format = 'json'
    }

    # execute the web request
    # (Invoke-WebRequest throws on connection errors and HTTP error codes)
    try {
        $result = Invoke-WebRequest -Uri $URI -Method Post -Body $postParams
    } catch {
        Write-Host "error while connecting the check_mk api."
        Write-Host $_
        $return.status = 1
        $return.msg = "error while connecting the check_mk api."
        # return the error
        return $return
    }

    # check if the web request was successful
    if ($result.StatusCode -ne 200) {
        Write-Host "error in the response from the check_mk api."
        Write-Host
        Write-Host $result
        $return.status = 1
        $return.msg = "error in the response from the check_mk api."
        # return the error
        return $return
    }

    # convert the string content of the response into powershell objects
    $json = $result.Content | ConvertFrom-Json

    # check if the json table has exactly 2 rows:
    # 1st row are the property names
    # 2nd row are the values
    if ($json.Count -ne 2) {
        Write-Host "found not exactly one host object in check_mk."
        Write-Host
        Write-Host $result
        $return.status = 1
        $return.msg = "found not exactly one host object in check_mk."
        # return the error
        return $return
    }

    # build a proper hashtable from all property names and values
    $hoststatus = @{}
    for ($i = 0; $i -lt $json[0].Count; $i++) {
        $hoststatus[$json[0][$i]] = $json[1][$i]
    }

    # check if anything is wrong with the host
    if ( ($hoststatus.num_services_warn -ne 0) -or
         ($hoststatus.num_services_crit -ne 0) -or
         ($hoststatus.num_services_unknown -ne 0) ) {
        # something is wrong with this host says check_mk
        Write-Host "something is wrong with this host says check_mk."
        Write-Host
        Write-Host "Warnings: " $hoststatus.num_services_warn
        Write-Host "Critical: " $hoststatus.num_services_crit
        Write-Host "Unknown: " $hoststatus.num_services_unknown
        Write-Host
        Write-Host $result
        $return.status = 1
        $return.msg = "something is wrong with this host says check_mk."
    } else {
        # no errors found on host
        Write-Host "found no errors with host $hostname."
        Write-Host
        Write-Host $result
        $return.status = 0
        $return.msg = "found no errors with host $hostname."
    }

    # return the error or the success message
    return $return
}

# Declaration of a function to set and remove a downtime
function checkmkdowntime {
    # parameter definition
    param(
        [Parameter(Position=0, Mandatory=$true)] [string]$checkmkserver,
        [Parameter(Position=1, Mandatory=$true)] [string]$checkmkinstance,
        [Parameter(Position=2, Mandatory=$true)] [string]$checkmkuser,
        [Parameter(Position=3, Mandatory=$true)] [string]$checkmksecret,
        [Parameter(Position=4, Mandatory=$true)] [string]$hostname,
        [Parameter(Position=5, ParameterSetName='setdowntime', Mandatory=$true)] [switch]$setdowntime,
        [Parameter(Position=5, ParameterSetName='releasedowntime', Mandatory=$true)] [switch]$releasedowntime,
        [Parameter(Position=6, ParameterSetName='setdowntime', Mandatory=$true)] [int]$downtimeminutes,
        [Parameter(Position=7, ParameterSetName='setdowntime', Mandatory=$true)] [string]$downtimecomment,
        [Parameter(Mandatory=$false)] [bool]$https = $false
    )

    #######################
    # variable definition #
    #######################

    # declare return variable for the function as hashtable
    $return = @{}

    # decide whether to use https or not
    if ($https) { $URI = "https://" } else { $URI = "http://" }
    $URI += "$checkmkserver/$checkmkinstance/check_mk/webapi.py"

    # build the web request for checking if the host exists in check_mk
    $Body = @{
        action         = "get_host"
        _username      = $checkmkuser
        _secret        = $checkmksecret
        request_format = "json"
        output_format  = "json"
        hostname       = $hostname
    }

    # execute the web request
    try {
        $result = Invoke-WebRequest -Uri $URI -Method Post -Body $Body
    } catch {
        Write-Host "error while connecting the check_mk api."
        Write-Host $_
        $return.status = 1
        $return.msg = "error while connecting the check_mk api."
        # return the error
        return $return
    }

    # check if the web request was successful
    if ($result.StatusCode -ne 200) {
        Write-Host "could not connect to the check_mk api."
        Write-Host
        Write-Host $result
        $return.status = 1
        $return.msg = "could not connect to the check_mk api."
        return $return
    }

    # check if a json object was returned
    if ($result.Content[0] -ne "{") {
        Write-Host "no json object was returned. error occurred."
        Write-Host
        Write-Host $result.Content
        $return.status = 1
        $return.msg = "no json object was returned. error occurred."
        return $return
    }

    # check if the check_mk api returned an error
    $answer = $result.Content | ConvertFrom-Json
    if ($answer.result_code -ne 0) {
        Write-Host "error returned from within check_mk."
        Write-Host
        Write-Host $answer.result
        $return.status = 1
        $return.msg = "error returned from within check_mk."
        return $return
    }

    # check if a downtime has to be set and build the appropriate request body
    if ($setdowntime) {
        Write-Host "set downtime on $hostname for $downtimeminutes minutes."
        $postParams = @{
            _username      = $checkmkuser
            _secret        = $checkmksecret
            _transid       = '-1'
            _do_confirm    = 'yes'
            _do_actions    = 'yes'
            host           = $hostname
            _down_from_now = 'yes'
            _down_minutes  = $downtimeminutes
            _down_comment  = $downtimecomment
            view_name      = 'hoststatus'
        }
    }

    # check if a downtime has to be deleted and build the appropriate request body
    if ($releasedowntime) {
        Write-Host "delete downtime on $hostname."
        $postParams = @{
            _username         = $checkmkuser
            _secret           = $checkmksecret
            _transid          = '-1'
            _do_confirm       = 'yes'
            _do_actions       = 'yes'
            host              = $hostname
            _remove_downtimes = 'Remove'
            _down_remove      = 'Remove'
            view_name         = 'hoststatus'
        }
    }

    # clear the result variable
    $result = $null

    # decide whether to use https or not
    if ($https) { $URI = "https://" } else { $URI = "http://" }
    $URI += "$checkmkserver/$checkmkinstance/check_mk/view.py"

    # execute the web request
    try {
        $result = Invoke-WebRequest -Uri $URI -Method Post -Body $postParams
    } catch {
        Write-Host "error while connecting the check_mk api."
        Write-Host $_
        $return.status = 1
        $return.msg = "error while connecting the check_mk api."
        # return the error
        return $return
    }

    # check if the web request was successful
    if ($result.StatusCode -ne 200) {
        Write-Host "could not connect to the check_mk api."
        Write-Host
        Write-Host $result
        $return.status = 1
        $return.msg = "could not connect to the check_mk api."
        # return the error
        return $return
    }

    # check if the check_mk gui put an error box into its html answer
    if (-not ($result.Content | Select-String -Pattern '<div class="error">' -SimpleMatch)) {
        Write-Host "execution successful"
        $return.status = 0
        $return.msg = "execution successful"
        # return the message
        return $return
    } else {
        Write-Host "execution was not successful"
        $return.status = 1
        $return.msg = "execution was not successful"
        # return the error
        return $return
    }
}

###########End of function declarations#######################

############
# Example  #
############

# connection variables
$mycheckmkserver   = "<your-checkmk-server>"
$mycheckmkinstance = "<the-name-of-your-checkmk-instance>"
$mycheckmkuser     = "<your-checkmk-automation-user>"
$mycheckmksecret   = "<insert-your-automation-secret-here>"

# host to be checked, put into downtime, patched, released from downtime and checked again
$myhost = "<insert-the-hostname-which-has-to-be-patched-here>"

####################################
# Step 1: check if host is healthy #
####################################

# execute function to check the health of the host
$result = checkmkhealth -checkmkserver $mycheckmkserver -checkmkinstance $mycheckmkinstance -checkmkuser $mycheckmkuser -checkmksecret $mycheckmksecret -hostname $myhost

# if host was not healthy, write error and exit
if ($result.status -ne 0) {
    Write-Host "check_mk says the host $myhost is not healthy. Stopping script."
    Write-Output $result
    exit 1
}

#######################################
# Step 2: set a downtime for the host #
#######################################

# execute function to set downtime
$result = checkmkdowntime -checkmkserver $mycheckmkserver -checkmkinstance $mycheckmkinstance -checkmkuser $mycheckmkuser -checkmksecret $mycheckmksecret -hostname $myhost -setdowntime -downtimeminutes 30 -downtimecomment "Don't worry. I'm just being patched."

# if execution was not successful, write error and exit
if ($result.status -ne 0) {
    Write-Host "setting the downtime for $myhost was not successful. Stopping script."
    Write-Output $result
    exit 1
}

#######################
# Step 3: Maintenance #
#######################

# Do your thing with the host.
# Patch it, reboot it or whatever.
# It's up to you.
# For demonstration we simply run a sleep statement at this step.
Start-Sleep -Seconds 60

###########################
# Step 4: delete downtime #
###########################

# execute function to delete downtime
$result = checkmkdowntime -checkmkserver $mycheckmkserver -checkmkinstance $mycheckmkinstance -checkmkuser $mycheckmkuser -checkmksecret $mycheckmksecret -hostname $myhost -releasedowntime

# if execution was not successful, write error and exit
if ($result.status -ne 0) {
    Write-Host "deleting the downtime for $myhost was not successful. Stopping script."
    Write-Output $result
    exit 1
}

############################
# Step 5: check host again #
############################

# execute function to check the health of the host
$result = checkmkhealth -checkmkserver $mycheckmkserver -checkmkinstance $mycheckmkinstance -checkmkuser $mycheckmkuser -checkmksecret $mycheckmksecret -hostname $myhost

# if host was not healthy, write error and exit
if ($result.status -ne 0) {
    Write-Host "check_mk says the host $myhost is not healthy after the change. Stopping script. Good luck with the repair."
    Write-Output $result
    exit 1
}

#######
# end #
#######
```
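The JSON returned by view.py is a table: the first row holds the column names and every following row one host. A minimal sketch of a hypothetical helper, `ConvertFrom-CheckmkViewJson`, that turns any number of rows into objects. This would make it possible to report every host matched by `host_regex` instead of stopping when there is more than one.

```powershell
# Hypothetical helper: convert the table-like JSON of a check_mk view
# (first row = column names, following rows = values) into objects
function ConvertFrom-CheckmkViewJson {
    param([Parameter(Mandatory)] [string]$Content)
    $table = $Content | ConvertFrom-Json
    # no data rows (header only, empty, or unexpected shape): return nothing
    if ($table.Count -lt 2 -or $table[0] -isnot [array]) { return }
    $header = $table[0]
    for ($row = 1; $row -lt $table.Count; $row++) {
        # map every column name to the value in the same position
        $properties = [ordered]@{}
        for ($col = 0; $col -lt $header.Count; $col++) {
            $properties[$header[$col]] = $table[$row][$col]
        }
        [pscustomobject]$properties
    }
}
```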
Acknowledgments
Last but not least, I want to thank Karl Widmer (Twitter: https://twitter.com/widmerkarl).
After a short tweet on Twitter, he asked me to write a blog post about my findings. This pushed me to start blogging again. The fun part was that while I collected all the websites I got my information from, I found a solution for getting the status of a host via view.py, a problem I had not been able to solve until then. So Karl, as my drill instructor, pushed me to write things down and to find a solution for the problem. Thank you very much, Karl!
Great script! I had problems with the $https boolean and removed the option.
Could you describe your problems in more detail?
Would be interested to know what I can improve.
Good Job! Very useful
Found a wrong variable name, corrected it and updated the post.