ChipChop Forum



Gizmo
19 Jul 2024

New Feature - Device off-line monitor

The Off-line Device Monitor is a new feature that provides a simple and straightforward way to get notified if one of your devices has been off-line for too long.
This is quite useful if you have a power cut or your internet connection goes down while you are away from home (we don't want that freezer full of expensive food to start defrosting) :-)

It can also be used for devices that spend most of their time hibernating to preserve power and only occasionally send a heartbeat during the day, which makes them hard to keep an eye on.

This utility is not intended to be a highly accurate and responsive monitoring tool (there are other ways to do that, which I will explain later). Rather, it's intended to give you a general warning that something is wrong with your device and a rough idea of when it happened.

How to use

To activate the Off-line Monitor you simply have to create an Action and specify a couple of things.

1. Select the Trigger Device (the device that you will be monitoring)

2. As the Trigger Component, select the ENTIRE DEVICE option at the bottom of the list

3. Select how long the device has to stay off-line before the action is triggered (from 1 up to 23 hours)

4. Add a new "target" and choose what you would like to happen. Most likely you will want to receive a notification on your phone, but you can also send a command to another device, for example a logging device if you have one

That's it. Make sure to set the action to Active and to save it. To test it, start the device and give it a minute or so until it clearly shows as on-line in the ChipChop app, note the time and then unplug the device. You should receive a notification within 30 minutes after the monitoring time you specified has passed.

How it works

- On every device heartbeat (meaning the device is still alive), if the device has this type of action it registers itself in the next notification time slot, as specified in the action. The time slots are at half-hour intervals and the ChipChop engine always picks the first available slot after the specified monitoring time.

- All ChipChop API servers (from now on) run a check routine every half hour. Any devices that have subscribed to off-line monitoring are checked to see whether their connection is still alive, and if it isn't, the action is executed.

- The mechanism works a bit like a "dead man's switch": while the device keeps sending heartbeats its monitoring slot keeps getting pushed further ahead, but once it goes off-line the slot stops moving, the check routine eventually reaches it and the action gets triggered.

- Once the action is executed, the device is removed from the monitoring list, so you only get one notification instead of being flooded with messages every half hour. Monitoring restarts only once the device is back on-line.
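To make the slot logic concrete, here is a rough C++ model of it. It is only a sketch under stated assumptions, not ChipChop's server code: `nextSlot`, `OfflineMonitor`, `onHeartbeat` and `runCheck` are hypothetical names, times are plain minute counts (7:15 AM = 435), a missed slot is treated as "off-line", and a due time landing exactly on a half-hour boundary is assumed to use that boundary.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the off-line monitor; not the real ChipChop engine.
constexpr int kSlotMinutes = 30;  // the check routine runs every half hour

// First half-hour slot at or after last_heartbeat + monitor_minutes.
int nextSlot(int last_heartbeat, int monitor_minutes) {
    assert(monitor_minutes >= 60 && monitor_minutes <= 23 * 60);  // 1 to 23 hours
    int due = last_heartbeat + monitor_minutes;
    return ((due + kSlotMinutes - 1) / kSlotMinutes) * kSlotMinutes;
}

class OfflineMonitor {
public:
    // Every heartbeat (re)registers the device, pushing its slot further ahead.
    void onHeartbeat(const std::string& device, int now, int monitor_minutes) {
        slot_[device] = nextSlot(now, monitor_minutes);
    }

    // The half-hourly check: any device whose slot has been reached without a
    // newer heartbeat is reported once and removed from the monitoring list.
    std::vector<std::string> runCheck(int now) {
        std::vector<std::string> offline;
        for (auto it = slot_.begin(); it != slot_.end();) {
            if (it->second <= now) {
                offline.push_back(it->first);
                it = slot_.erase(it);
            } else {
                ++it;
            }
        }
        return offline;
    }

    bool isMonitored(const std::string& device) const {
        return slot_.count(device) > 0;
    }

private:
    std::map<std::string, int> slot_;  // device name -> slot time in minutes
};
```

A second heartbeat before the slot is reached simply overwrites the slot (the "pushed ahead" behaviour), and erasing the entry in `runCheck` is what limits you to one notification per outage until the next heartbeat registers the device again.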

Example

- You have set the monitoring time to 1 hour

- Your device sends a heartbeat at 7:15 AM. If it were to go off-line at that point, the check would need to happen at 8:15 AM

- As the off-line check routine runs every half hour "globally", the next slot that gives at least 1 hour of being off-line is the one at 8:30 AM (the 8:00 AM slot would only give 45 minutes)

- You would get notified at 8:30 AM, meaning that your device had been off-line for 1 hour and 15 minutes. You can easily work out the last time the device was alive by checking the last status update of one of its components in the ChipChop app.
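Working backwards can also help: from the slot the notification arrived in and the monitoring time, the last heartbeat must fall inside a 30-minute window. The C++ sketch below computes that window. `lastHeartbeatWindow` and `hhmm` are hypothetical helpers, times are minutes since midnight, and it assumes the slot is the first half-hour boundary at or after the last heartbeat plus the monitoring time.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper for reading an off-line notification backwards.
struct HeartbeatWindow {
    int after_minute;   // the last heartbeat was strictly after this time...
    int latest_minute;  // ...and no later than this time
};

HeartbeatWindow lastHeartbeatWindow(int notified_at, int monitor_minutes) {
    assert(notified_at % 30 == 0);  // notifications only arrive on half-hour slots
    int latest = notified_at - monitor_minutes;
    return HeartbeatWindow{latest - 30, latest};
}

// Format a minute count as "HH:MM", wrapping around midnight.
std::string hhmm(int minutes) {
    int m = ((minutes % 1440) + 1440) % 1440;
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02d:%02d", m / 60, m % 60);
    return std::string(buf);
}
```

For the example above, `lastHeartbeatWindow(8 * 60 + 30, 60)` gives a last heartbeat after 07:00 and no later than 07:30, which contains the 7:15 AM heartbeat.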

Alternative

If this automatic feature isn't accurate enough for you, it's pretty simple to create your own monitor by using a separate device as the monitor, and it can work for any number of devices that you have.

How to do it

Let's take two devices, ESP1 & ESP2. ESP2 will act as the monitor device.

ESP1 - add an extra component called "online" or "alive" or something like that and set it to type "State View" (it takes the values "OK" or "ERROR").
In the Arduino code, make sure to send its status:

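void setup(){

    //.... whatever code you have

    // report the "online" component as soon as the device starts
    ChipChop.updateStatus("online","OK");

}

void loop(){

    //.... whatever code you have

    // keep "online" set to "OK" so it goes out with every heartbeat
    ChipChop.updateStatus("online","OK");

}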

This will keep sending the "online" status "OK" with every ESP1 heartbeat.

ESP2 - this is the "monitor" device
Add a "fake" component called "ESP1_online" and set it to type "text value".

Create a new Action

Action Trigger
- Trigger Device > ESP1
- Trigger Component - ESP1 > online
- Value Is: equal == "OK"

Action Target
- target device > ESP2
- target component > ESP1_online
- set value to > "OK"

Create another Action

Action Trigger
- Trigger Device > ESP2
- Trigger Component - ESP2 > ESP1_online
- Value Is: equal == "OFF"

Action Target
- target device > Notification
- notification text: "ESP1 is offline"

Do something like this in the ESP2 code:

unsigned long esp1_online = 0; // millis() time of the last "OK" received for ESP1
bool can_notify = false;       // armed once ESP1 has been seen at least once

void ChipChop_onCommandReceived(String target_component, String command_value, String command_source, int command_age){
    Serial.println(target_component);
    Serial.println(command_value);

    // this is our "dead man's switch"
    // we reset the timer every time we get a message from the action that was triggered on ESP1 heartbeat
    if(target_component == "ESP1_online" && command_value == "OK"){
        esp1_online = millis(); // reset the timer
        can_notify = true; // we can send an event if the timer fires in the main loop
    }

}

void loop(){

    //.... whatever code you have

    if(can_notify){

        // unsigned subtraction keeps this correct even when millis() rolls over;
        // make the timeout longer than the interval between ESP1 heartbeats
        if(millis() - esp1_online > 60000){ // ESP1 has been off-line for more than 1 minute

            ChipChop.triggerEvent("ESP1_online","OFF");

            can_notify = false; // <<< ensure that we don't flood ourselves with notifications

        }
    }

}

So, what happens here is that every time ESP1 sends a heartbeat that includes the "online" status "OK", the first action sets ESP2's "ESP1_online" component to "OK", which resets the monitoring timer.
Once ESP1 goes off-line, there is nothing left to reset the monitoring timer, so ESP2 fires the "OFF" event and the second action sends the notification.

This is quite a simplified example and you can easily extend it to monitor as many devices as you want.
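One way to extend it is to keep a small table with one timer per watched device. The sketch below is plain C++ rather than Arduino code so it can be tested off-device; `MultiMonitor`, `WatchedDevice`, `onCommand` and `poll` are hypothetical names, and it assumes each extra device gets its own "<name>_online" component and pair of actions set up exactly like ESP1.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical multi-device version of the ESP2 monitor logic.
// Times are milliseconds, e.g. values returned by millis().
struct WatchedDevice {
    std::string component;      // e.g. "ESP1_online", one per watched device
    std::uint32_t last_seen_ms; // time of the last "OK" command
    bool can_notify;            // armed by the first "OK", cleared after firing
};

class MultiMonitor {
public:
    MultiMonitor(const std::vector<std::string>& components, std::uint32_t timeout_ms,
                 std::function<void(const std::string&)> on_offline)
        : timeout_ms_(timeout_ms), on_offline_(std::move(on_offline)) {
        assert(timeout_ms > 0);  // a zero timeout would fire straight away
        for (const std::string& c : components) {
            devices_.push_back(WatchedDevice{c, 0, false});
        }
    }

    // Call from ChipChop_onCommandReceived(): an "OK" resets that device's timer.
    void onCommand(const std::string& component, const std::string& value,
                   std::uint32_t now_ms) {
        if (value != "OK") return;
        for (WatchedDevice& d : devices_) {
            if (d.component == component) {
                d.last_seen_ms = now_ms;
                d.can_notify = true;
            }
        }
    }

    // Call from loop(): reports each device at most once per outage.
    void poll(std::uint32_t now_ms) {
        for (WatchedDevice& d : devices_) {
            // Unsigned subtraction stays correct across a millis() rollover.
            if (d.can_notify && now_ms - d.last_seen_ms > timeout_ms_) {
                d.can_notify = false;
                on_offline_(d.component);
            }
        }
    }

private:
    std::uint32_t timeout_ms_;
    std::function<void(const std::string&)> on_offline_;
    std::vector<WatchedDevice> devices_;
};
```

On the ESP2 you would call `onCommand` from `ChipChop_onCommandReceived()` with `millis()` as the time, call `poll(millis())` from `loop()`, and have the callback call `ChipChop.triggerEvent(component, "OFF")`, swapping `std::string` for Arduino's `String` if that is more convenient. Each device's notification action then triggers on its own component.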

If you want to implement this method and are unsure about anything, just post a question and I will help you.

Enjoy

Gizmo ✌️
