My way to access check_mk monitoring software via powershell script

Over the last few weeks I have been working on a script that automatically installs patches in our various vSphere environments.

While working on that script, I realized that I had to disable monitoring while the components being patched are rebooted. That was the point at which I needed access to our monitoring system. We use check_mk (https://checkmk.com).

After reading the docs and searching the forums, I found that there does not seem to be an easy way to do what I want. My wishlist is:

  1. Check if the host I want to patch is actually present in check_mk
  2. Check if the host is shown as healthy in check_mk
  3. Set a downtime for the host in check_mk
  4. After patching the host, remove the downtime for the host in check_mk
  5. Check again if the host is shown as healthy in check_mk

My problem with check_mk is that no single API covers all of the tasks above. From what I have found so far, there are three ways to accomplish these tasks:

1.) check_mk web api

Description: https://checkmk.com/cms_web_api.html

Command reference: https://checkmk.com/cms_web_api_references.html

With this API you can manage what is monitored, just as you would in the WATO part of the check_mk management interface (https://checkmk.com/cms_wato.html). Its get_host function (https://checkmk.com/cms_web_api_references.html#get_host) lets me find out whether the host I want to patch is configured in check_mk.

So #1 of my wishlist above is fulfilled.

Sadly, it is not possible to get status information via this API. I can only get configuration information about the monitored hosts.
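
To make the get_host check concrete, here is a minimal sketch that separates building the request from judging the response. The helper names New-CheckmkGetHostRequest and Test-CheckmkHostExists, and the server, site and user values in the usage comment, are made up for illustration. The request fields mirror the ones the script further down sends to webapi.py, and the response check relies on the result_code field that the script also evaluates.

```powershell
# Hypothetical helper: builds the URI and form body for a get_host call.
function New-CheckmkGetHostRequest {
    param(
        [Parameter(Mandatory=$true)][string]$Server,
        [Parameter(Mandatory=$true)][string]$Site,
        [Parameter(Mandatory=$true)][string]$User,
        [Parameter(Mandatory=$true)][string]$Secret,
        [Parameter(Mandatory=$true)][string]$HostName,
        [switch]$UseHttps
    )

    $scheme = if ($UseHttps) { 'https' } else { 'http' }

    return @{
        Uri  = "${scheme}://$Server/$Site/check_mk/webapi.py"
        Body = @{
            action         = 'get_host'
            _username      = $User
            _secret        = $Secret
            request_format = 'json'
            output_format  = 'json'
            hostname       = $HostName
        }
    }
}

# Hypothetical helper: returns $true only if the content is JSON with result_code 0.
function Test-CheckmkHostExists {
    param([Parameter(Mandatory=$true)][string]$Content)

    try {
        $parsed = $Content | ConvertFrom-Json -ErrorAction Stop
    }
    catch {
        # not JSON at all, for example an HTML login or error page
        return $false
    }

    return ($parsed.result_code -eq 0)
}

# Usage (needs a reachable check_mk server, so it is commented out here):
# $req = New-CheckmkGetHostRequest -Server 'monitor.example.com' -Site 'mysite' -User 'automation' -Secret '<secret>' -HostName 'esx01'
# $response = Invoke-WebRequest -Uri $req.Uri -Method Post -Body $req.Body
# Test-CheckmkHostExists -Content $response.Content
```

Keeping the network call out of both helpers means the logic can be tried with canned response strings before pointing it at a real server.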

2.) check_mk Livestatus

Description: https://checkmk.com/cms_livestatus.html

For remote access, Livestatus is simply a TCP port on the check_mk system that offers status information about the monitored hosts. It has no authentication mechanism and is not encrypted, so access to it should be restricted.

In my situation this is a problem, because I don’t have the necessary firewall rules and permissions to access it. (Security is taken seriously in my company.) So I searched for other ways to get the status of a host.
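
For readers who do have access to the Livestatus port, a query for a single host could look roughly like the sketch below. It is only a sketch: the helper names are hypothetical, port 6557 is assumed as the commonly used Livestatus TCP port, and the column selection is just an example. In the Livestatus hosts table a state of 0 means the host is UP.

```powershell
# Hypothetical helper: builds a Livestatus query for the state of one host.
function New-LivestatusHostQuery {
    param([Parameter(Mandatory=$true)][string]$HostName)

    $lines = @(
        'GET hosts'
        'Columns: name state'
        "Filter: name = $HostName"
        'OutputFormat: json'
    )

    # Livestatus expects LF line endings and an empty line to end the query
    return (($lines -join "`n") + "`n`n")
}

# Hypothetical helper: sends a query over plain TCP and returns the raw answer.
# The default port 6557 is an assumption, adjust it to your installation.
function Invoke-LivestatusQuery {
    param(
        [Parameter(Mandatory=$true)][string]$Server,
        [Parameter(Mandatory=$true)][string]$Query,
        [int]$Port = 6557
    )

    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $client.Connect($Server, $Port)
        $stream = $client.GetStream()
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($Query)
        $stream.Write($bytes, 0, $bytes.Length)

        # tell the server that the request is complete
        $client.Client.Shutdown([System.Net.Sockets.SocketShutdown]::Send)

        $reader = New-Object System.IO.StreamReader($stream)
        return $reader.ReadToEnd()
    }
    finally {
        $client.Close()
    }
}

# Usage (needs network access to the Livestatus port, so it is commented out here):
# $query = New-LivestatusHostQuery -HostName 'esx01'
# Invoke-LivestatusQuery -Server 'monitor.example.com' -Query $query | ConvertFrom-Json
```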

3.) check_mk view.py

After extensive searching I learned that it is possible to send requests to view.py, the same way a browser does when a user works with the web interface. The tricky part is knowing the right parameters to submit.

So far I have found no good documentation for this part of check_mk. I only have these snippets of information:

  • https://checkmk.com/cms_legacy_multisite_automation.html
    Short documentation about how to access view.py. After an introduction it simply states that I should use the check_mk web interface and analyze the browser URL after doing the task I want to automate.
    This documentation is marked as outdated and links to the new documentation, but there I found no information about view.py or what to use instead.
  • /opt/omd/versions/<your check_mk version>/share/doc/check_mk/treasures/downtime
    Somewhere I found a hint that every check_mk installation ships a script for setting and deleting the downtime of a host. It can be found at /opt/omd/versions/<your check_mk version>/share/doc/check_mk/treasures/downtime.

With this script it is possible to set and remove downtimes on hosts in check_mk. The script calls the view.py with the appropriate parameters.

With all these snippets of information I started to play around and cobbled together the parameters that need to be passed to view.py. I’m now able to set and delete a downtime, and I also found a way to get the status of a host via view.py.

So #2, #3, #4 and #5 of my wishlist are fulfilled and I could start coding.
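
To show what the status request to view.py gives back, here is a small sketch with made-up sample data. With output_format set to json, the view answers with a list of rows: the first row holds the column names, and each following row holds the values for one host. The function name ConvertFrom-CheckmkViewJson and the sample row are assumptions for illustration; only the num_services_* column names are taken from the script below.

```powershell
# Hypothetical helper: turns the two-row JSON answer of a view into a hashtable.
function ConvertFrom-CheckmkViewJson {
    param([Parameter(Mandatory=$true)][string]$Content)

    $rows = $Content | ConvertFrom-Json

    # expect exactly two rows: the column names and the values of one host
    if (($rows.Count -ne 2) -or ($rows[0] -isnot [array]) -or ($rows[1] -isnot [array])) {
        throw "expected a header row and exactly one host row"
    }

    # pair every column name with the value at the same position
    $hostStatus = @{}
    for ($i = 0; $i -lt $rows[0].Count; $i++) {
        $hostStatus[$rows[0][$i]] = $rows[1][$i]
    }

    return $hostStatus
}

# Made-up sample answer, not captured from a real check_mk server
$sample = '[["host", "num_services_warn", "num_services_crit", "num_services_unknown"], ["esx01", "0", "2", "0"]]'

$status = ConvertFrom-CheckmkViewJson -Content $sample
Write-Host "Critical services:" $status.num_services_crit
```

Once the row is in a hashtable, the health check can read the counters by name instead of relying on column positions.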

The script

I have only a very basic understanding of web requests and how to handle them. I normally use PowerShell modules like PowerCLI, so I never needed to access an API directly. But I tried my best to do some error handling, and in my test environment it worked very well. It may not be the best way to get the job done, but it works!

To keep the request handling simple, I created two PowerShell functions. You can include the definitions of these functions in your script and use them as shown in the example at the end of the script.

#!/usr/bin/env pwsh

# Declaration of a function to check if a host is healthy in check_mk
function checkmkhealth
{
    # parameter definition
    param(
        [Parameter(Position=0, Mandatory=$true)]    
        [string]$checkmkserver,
        [Parameter(Position=1, Mandatory=$true)]
        [string]$checkmkinstance,
        [Parameter(Position=2, Mandatory=$true)]
        [string]$checkmkuser,
        [Parameter(Position=3, Mandatory=$true)]
        [string]$checkmksecret,
        [Parameter(Position=4, mandatory=$true)]
        [string]$hostname,
        [Parameter(mandatory=$false)]
        [bool]$https = $false
    )

    #######################
    # variable definition #
    #######################

    # declare return variable for the function as hashtable
    $return = @{}

    # decide whether to use https or not
    if ($https)
    {
        $URI = "https://"
    }
    else
    {
        $URI = "http://"
    }
    
    $URI += "$checkmkserver/$checkmkinstance/check_mk/view.py"


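    # build the web request for displaying the host with its status
    # note: host_regex is a regular expression, so a name that is part of
    # other host names matches those hosts too; the row count check below catches that
    $postParams = @{
        _username     = $checkmkuser
        _secret       = $checkmksecret
        _transid      = '-1'
        host_regex    = $hostname # the hostname to search for
        view_name     = 'allhosts'
        output_format = 'json'
    }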

    # execute the web request
    $result = Invoke-WebRequest -Uri $URI -Method Post -Body $postParams

    # check if a response came back
    if (!$result)
    {

        Write-Host "error while connecting the check_mk api."
        Write-Host
        $return.status = 1
        $return.msg = "error while connecting the check_mk api."
        
        # return the error
        Return $return

    }

    # check if the web request was successful
    if ($result.StatusCode -ne 200)
    {

        Write-Host "error in the response from the check_mk api."
        Write-Host
        Write-Host $result
        $return.status = 1
        $return.msg = "error in the response from the check_mk api."
        
        # return the error
        Return $return

    }

    # convert the string content of the response into a powershell json object
    $json = $result.content | convertfrom-json

    # check if the json object has only 2 lines.
    # 1st line are the property names
    # 2nd line are the values
    if ($json.count -ne 2)
    {

        Write-Host "found not exactly one host object in check_mk."
        Write-Host
        Write-Host $result
        $return.status = 1
        $return.msg = "found not exactly one host object in check_mk."

        # return the error
        Return $return

    }

    # create an empty hashtable
    $hoststatus = @{}

    # loop through all entries of the first line of the json object to build a proper hashtable
    # (the original bound of length - 1 skipped the last column)
    for ($i=0; $i -lt $json[0].length; $i++ )
    {

        # add the property name and the value to the hashtable object
        $hoststatus.($json[0][$i]) = $json[1][$i]

    }

    # check if anything is wrong with the host
    if ( ($hoststatus.num_services_warn -ne 0) -or ($hoststatus.num_services_crit -ne 0) -or ($hoststatus.num_services_unknown -ne 0) )
    {

        # something is wrong with this host says check_mk
        Write-Host "something is wrong with this host says check_mk."
        Write-Host
        Write-Host "Warnings: " $hoststatus.num_services_warn
        Write-Host "Critical: " $hoststatus.num_services_crit
        Write-Host "Unknown: " $hoststatus.num_services_unknown
        Write-Host
        Write-Host $result
        $return.status = 1
        $return.msg = "something is wrong with this host says check_mk."

    }
    else {
        
        # no errors found on host
        Write-Host "found no errors with host $hostname."
        Write-Host
        Write-Host $result
        $return.status = 0
        $return.msg = "found no errors with host $hostname."
    }
    
    # return the error or the success message
    Return $return
}


# Declaration of a function to set and remove a downtime
function checkmkdowntime
{
    
    # parameter definition
    param(
        [Parameter(Position=0, Mandatory=$true)]    
        [string]$checkmkserver,
        [Parameter(Position=1, Mandatory=$true)]
        [string]$checkmkinstance,
        [Parameter(Position=2, Mandatory=$true)]
        [string]$checkmkuser,
        [Parameter(Position=3, Mandatory=$true)]
        [string]$checkmksecret,
        [Parameter(Position=4, mandatory=$true)]
        [string]$hostname,
        [Parameter(Position=5,ParameterSetName='setdowntime', mandatory=$true)]
        [switch]$setdowntime,
        [Parameter(Position=5,ParameterSetName='releasedowntime', mandatory=$true)]
        [switch]$releasedowntime,
        [Parameter(Position=6,ParameterSetName='setdowntime', mandatory=$true)]
        [int]$downtimeminutes,
        [Parameter(Position=7,ParameterSetName='setdowntime', mandatory=$true)]
        [string]$downtimecomment,
        [Parameter(mandatory=$false)]
        [bool]$https = $false
    )

    #######################
    # variable definition #
    #######################

    # declare return variable for the function as hashtable
    $return = @{}

    # decide whether to use https or not
    if ($https)
    {
        $URI = "https://"
    }
    else
    {
        $URI = "http://"
    }
    
    $URI += "$checkmkserver/$checkmkinstance/check_mk/webapi.py"

    # build the web request for checking if the host exists in check_mk
    $Body = @{
        action         = "get_host"
        _username      = $checkmkuser
        _secret        = $checkmksecret
        request_format = "json"
        output_format  = "json"
        hostname       = $hostname
    }

    # execute the web request
    $result = Invoke-WebRequest -Uri $URI -Method Post -Body $Body

    # check if a response came back
    if (!$result)
    {

        Write-Host "error while connecting the check_mk api."
        Write-Host
        $return.status = 1
        $return.msg = "error while connecting the check_mk api."
        
        # return the error
        Return $return

    }

    # check if the web request was successful
    if ($result.StatusCode -ne 200)
    {

        Write-Host "could not connect to the check_mk api."
        Write-Host
        Write-Host $result
        $return.status = 1
        $return.msg = "could not connect to the check_mk api."
        
        Return $return

    }

    # check if a json object was returned
    if ( $result.content[0] -ne "{" )
    {

        Write-Host "no json object was returned. error occurred."
        Write-Host
        Write-Host $result.content
        $return.status = 1
        $return.msg = "no json object was returned. error occurred."

        Return $return

    }


    # check if the check_mk api returned an error
    if ( ($result.content | ConvertFrom-Json).result_code -ne 0 )
    {

        Write-Host "error returned from within check_mk."
        Write-Host
        Write-Host ($result.Content | ConvertFrom-Json).result
        
        $return.status = 1
        $return.msg = "error returned from within check_mk."
        
        Return $return

    }


    # check if a downtime has to be set and build the appropriate request body
    if ($setdowntime)
    {
        Write-Host "set downtime on $hostname for $downtimeminutes minutes."

        $postParams = @{
            _username      = $checkmkuser
            _secret        = $checkmksecret
            _transid       = '-1'
            _do_confirm    = 'yes'
            _do_actions    = 'yes'
            host           = $hostname
            _down_from_now = 'yes'
            _down_minutes  = $downtimeminutes
            _down_comment  = $downtimecomment
            view_name      = 'hoststatus'
        }
    }

    # check if downtime has to be deleted and build the appropriate request body
    if ($releasedowntime)
    {
        Write-Host "delete downtime on $hostname."

        $postParams = @{
            _username         = $checkmkuser
            _secret           = $checkmksecret
            _transid          = '-1'
            _do_confirm       = 'yes'
            _do_actions       = 'yes'
            host              = $hostname
            _remove_downtimes = 'Remove'
            _down_remove      = 'Remove'
            view_name         = 'hoststatus'
        }
    }

    # clear the result variable before the second request
    $result = $null

    # decide whether to use https or not
    if ($https)
    {
        $URI = "https://"
    }
    else
    {
        $URI = "http://"
    }

    $URI += "$checkmkserver/$checkmkinstance/check_mk/view.py"

    # execute the web request
    $result = Invoke-WebRequest -Uri $URI -Method Post -Body $postParams

    # check if a response came back
    if (!$result)
    {

        Write-Host "error while connecting the check_mk api."
        Write-Host
        $return.status = 1
        $return.msg = "error while connecting the check_mk api."
        
        # return the error
        Return $return

    }

    # check if the web request was successful
    If ( $result.StatusCode -ne 200 )
    {
	
        Write-Host "could not connect to the check_mk api."
        Write-Host
        Write-Host $result
        $return.status = 1
        $return.msg = "could not connect to the check_mk api."
        
        # return the error
        Return $return

    }

    # check if the check_mk api returned an error
    if ( !($result.Content | Select-String -Pattern "<div class=""error"">") )
    {

        Write-Host "execution successful"
        $return.status = 0
        $return.msg = "execution successful"

        # return the message
        Return $return

    }
    else
    {

        Write-Host "execution was not successful"
        $return.status = 1
        $return.msg = "execution was not successful"

        # return the error
        Return $return

    }

}

###########End of function declarations#######################

############
# Example #
############

# connection variables
$mycheckmkserver = "<your-checkmk-server>"
$mycheckmkinstance = "<the-name-of-your-checkmk-instance>"
$mycheckmkuser = "<your-checkmk-automation-user>"
$mycheckmsecret = "<insert-your-automation-secret-here>"

# host to be checked, set downtime, patched, removed downtime and checked again
$myhost = "<insert-the-hostname-which-has-to-be-patched-here>"

####################################
# Step 1: check if host is healthy #
####################################

# execute function to check the health of the host
$result = checkmkhealth -checkmkserver $mycheckmkserver -checkmkinstance $mycheckmkinstance -checkmkuser $mycheckmkuser -checkmksecret $mycheckmsecret -hostname $myhost

# if host was not healthy, write error and exit
if ($result.status -ne 0)
{
    Write-Host "check_mk says the host $myhost is not healthy. Stopping script."
    
    Write-Output $result

    exit
}

#######################################
# Step 2: set a downtime for the host #
#######################################

# execute function to set downtime
$result = checkmkdowntime -checkmkserver $mycheckmkserver -checkmkinstance $mycheckmkinstance -checkmkuser $mycheckmkuser -checkmksecret $mycheckmsecret -hostname $myhost -setdowntime -downtimeminutes 30 -downtimecomment "Don't worry. I'm just being patched."

# if execution was not successful, write error and exit
if ($result.status -ne 0)
{
    Write-Host "setting the downtime for $myhost was not successful. Stopping script."
    
    Write-Output $result

    exit
}

#######################
# Step 3: Maintenance #
#######################

# Do your thing with the host.
# Patch it, reboot it or whatever.
# It's up to you.
# But for demonstration we can run a simple sleep statement at this step

Start-Sleep -Seconds 60

###########################
# Step 4: delete downtime #
###########################

# execute function to delete downtime
$result = checkmkdowntime -checkmkserver $mycheckmkserver -checkmkinstance $mycheckmkinstance -checkmkuser $mycheckmkuser -checkmksecret $mycheckmsecret -hostname $myhost -releasedowntime

# if execution was not successful, write error and exit
if ($result.status -ne 0)
{
    Write-Host "deleting the downtime for $myhost was not successful. Stopping script."
    
    Write-Output $result

    exit
}

############################
# Step 5: check host again #
############################

# execute function to check the health of the host
$result = checkmkhealth -checkmkserver $mycheckmkserver -checkmkinstance $mycheckmkinstance -checkmkuser $mycheckmkuser -checkmksecret $mycheckmsecret -hostname $myhost

# if host was not healthy, write error and exit
if ($result.status -ne 0)
{
    Write-Host "check_mk says the host $myhost is not healthy after the change. Stopping script. Good luck with the repair."
    
    Write-Output $result

    exit
}


#######
# end #
#######
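
A side note on the example above: every call repeats the same four connection parameters. PowerShell splatting can shorten this by collecting them in a hashtable once and passing it with @ instead of $. The sketch below uses a stand-in function, Show-CheckmkCall, that only echoes its input so the snippet runs without a check_mk server. The stand-in and all values are made up; the parameter names match those of checkmkhealth and checkmkdowntime.

```powershell
# Stand-in with the same connection parameters as checkmkhealth.
# It only formats its input and does not contact any server.
function Show-CheckmkCall {
    param(
        [string]$checkmkserver,
        [string]$checkmkinstance,
        [string]$checkmkuser,
        [string]$checkmksecret,
        [string]$hostname
    )

    return ('{0}@{1}/{2} -> {3}' -f $checkmkuser, $checkmkserver, $checkmkinstance, $hostname)
}

# collect the connection parameters once; the keys must match the parameter names
$connection = @{
    checkmkserver   = 'monitor.example.com'
    checkmkinstance = 'mysite'
    checkmkuser     = 'automation'
    checkmksecret   = '<secret>'
}

# splat the hashtable with @ and add the per-call parameters as usual
Show-CheckmkCall @connection -hostname 'esx01'
```

With the real functions the calls would then shrink to something like checkmkhealth @connection -hostname $myhost.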

Acknowledgments

Last but not least, I want to thank Karl Widmer. (Twitter Link -> https://twitter.com/widmerkarl)

After a short exchange on Twitter he asked me to write a blog post about my findings. This pushed me to start blogging again. The fun part was that while I was collecting all the websites I got my information from, I found a way to get the status of a host via view.py, a problem I had not been able to solve until then. So Karl, as my drill instructor, pushed me to write things down and find a solution to the problem. Thank you very much, Karl!
