ChipChop Forum

Gizmo
05 Jul 2024

Ah, that's a super tricky one!

Unfortunately this can't be done on the ChipChop engine level, well at least not on the Community engine.

I'll try to explain why not as best as I can but I'll also give you a potential solution so bear with me :-)

The ChipChop engine is "event driven", meaning that your device is the boss pushing the data flow. The ChipChop engine is aware of every socket connection at all times and runs a keep alive cycle every 30 seconds (ping/pong) but if your device closes the connection then the engine forgets about it completely and frees the resources.

On a server where thousands of devices are constantly connecting & disconnecting and they can belong to many many many users it's impossible to figure out what is going to be a valid "long lived" connection and what is someone testing and re-compiling the code every minute so the socket is flicking in and out of existence.

Also, timers are a no-go on ChipChop, it's way, way too much to process. I know it's maybe confusing as you can set the time in the Actions, but that only works when a device sends a heartbeat and an action is triggered; if the device doesn't send a heartbeat the action won't do anything.
The same goes for notifications: they are "push" based, so again they are initiated by a device being alive and doing something.

Now, it's not all doom and gloom and I get why you would need something like that so I will give you a potential solution 🙃

What you would need is another ESP device to act as a sort of "connection monitor" for the battery operated one. It can be any old ESP32/8266 chip or some other device you've made for something else...it just needs to be connected to ChipChop on a more regular heartbeat.

This is something I do for similar purposes

ESP1 - this will be the battery operated device
add another component called "health" or "alive" and set it to type "State View" (that takes values "OK" or "ERROR")
in the Arduino code make sure when the device wakes up to do ChipChop.updateStatus("health","OK")

ESP2 - this will be the monitoring device
add another "fake" component called "ESP1_health" and set it to type "text value"

Create a new Action

Action Trigger
- Trigger Device > ESP1
- Trigger Component - ESP1 > health
- Value Is: equal == "OK"

Action Target
- target device > ESP2
- target component > ESP1_health
- set value to > "OK"

So, what will happen here is that every time the battery ESP1 sends a heartbeat and includes the "health" status "OK", the ESP2 will get the status of its "ESP1_health" component changed to "OK".
All you need to do in the ESP2 Arduino code is keep track of when the last change happened and run a timer from that point; when that timer goes over 6 hours you can send a triggerEvent from ESP2 setting ESP1_health to "ERROR" and have another Action that will send you a notification.
You just need to make sure that you cancel that timer so you don't flood yourself with hundreds of notifications :-)
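
A rough sketch of that ESP2-side logic, written in Python only to show the flow (on the Arduino you would read the clock with millis() instead of passing `now` in). The class name and the `send_error` callback are made up for illustration; the callback stands in for whatever ChipChop call you use to raise the "ERROR" event.

```python
class HealthWatchdog:
    """Remembers the last "OK" seen for ESP1 and raises one alert per outage."""

    def __init__(self, timeout_s, send_error):
        self.timeout_s = timeout_s    # e.g. 6 * 3600 for six hours
        self.send_error = send_error  # hypothetical callback that reports "ERROR"
        self.last_ok = None           # nothing received yet
        self.alerted = False          # True once the alert for this outage was sent

    def on_ok(self, now):
        """Call whenever ESP1_health changes to "OK"."""
        self.last_ok = now
        self.alerted = False          # re-arm the watchdog

    def poll(self, now):
        """Call regularly from the main loop; returns True when an alert is sent."""
        if self.last_ok is None or self.alerted:
            return False
        if now - self.last_ok > self.timeout_s:
            self.alerted = True       # "cancel the timer" so there is no flood
            self.send_error()
            return True
        return False
```

The `alerted` flag is what stops the flood: after the first alert `poll()` stays quiet until another "OK" arrives and re-arms it.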

If you have a spare ESP and think that this would work for you and you are unsure about the code on the Arduino side just let me know and I'll help you out...don't have the time right now but it's super easy, I may post it later anyway if someone else needs it.

Let me know what you think

TomaszB
05 Jul 2024

Hi Gizmo

Thanks for the reply and the explanation of the technical details. The workaround certainly makes sense, but it requires additional hardware just for monitoring purposes.
Maybe it would be useful to have such connection monitoring implemented on the ChipChop backend, even in a very limited and simplified form (for example once a day and limited to a few such monitors per account).
The additional hardware also needs its own connectivity monitoring. Certainly, ESP1 can monitor ESP2 and vice versa, but it makes the whole setup way more complex, whereas simplicity is a huge advantage of the ChipChop platform.

Kind regards
Tomasz

Gizmo
05 Jul 2024

Hey, I'm always up for improving things :-)

Can you sort of give me a description of how you think it should work? You know, where it should be set up (is it a Dev Console thing or phone app) and just in general what the logic should be?

I'm having trouble visualising a mechanism without involving timers and loops through thousands and thousands of records on every processor cycle so maybe having a little brainstorm here will shift my brain in a different direction.

I am going to have another look at the phone app background process anyway, as it could be a perfect method for something like that

TomaszB
07 Jul 2024

Hi Gizmo

I don't know the architecture of your solution, so the idea below is very, very naive.... Anyway, let me think about how to minimize CPU usage:

1) The action "connection lost" can be defined per device rather than per component (the list of devices is way shorter...)

2) There is a defined minimum, quite long, time interval for the "connection lost" action (let's presume 1 hour, but it can be chosen by you). I'll explain it later.

3) The user defines a "connection lost" action and chooses an integer multiple of the minimum interval (so it can be 1 hour ago, 2 hours ago, 3 hours ago....). The user cannot select an arbitrary time (2:30:15), only integer multiples of the minimum interval. If 2 hours is selected, it means that a device must be silent for at least 2 hours to trigger a "connection lost" action. It also means that the first alert is triggered sometime between the second and third hour after the connection was lost rather than exactly after two hours.

4) Maybe you can further limit capabilities (save cpu) to only one "connection lost" action per device but I don't think that it will be needed.

5) In your backend you have all "connection lost" actions (for all devices) stored in one list. Every hour the backend goes through the list and compares each device's last communication timestamp (it is already in the db) with the current time and the defined multiple (point 3). If the condition is met, the action is triggered. It means that if a device remains silent longer, the action will be triggered on every iteration until the device connects again.
Even though such an iteration is a CPU-consuming process, it is done quite rarely and it only goes through the devices where a "connection lost" action is defined.
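
Points 1 to 5 can be sketched roughly like this. It is only an illustration of the idea, not how the ChipChop backend is built; the function name, the two dictionaries and the 1-hour constant are assumptions.

```python
MIN_INTERVAL_S = 3600  # assumed minimum interval (1 hour), see point 2


def scan(monitors, last_seen, now):
    """Return the devices whose "connection lost" action is due.

    monitors:  {device_id: multiple of MIN_INTERVAL_S chosen by the user}
    last_seen: {device_id: timestamp of the last communication, in seconds}
    """
    due = []
    for device_id, multiple in monitors.items():
        seen = last_seen.get(device_id)
        if seen is None:
            continue  # the device has never connected, nothing to compare
        if now - seen >= multiple * MIN_INTERVAL_S:
            due.append(device_id)
    return due


# ESP1 was last heard at 0:30 and the user picked "2 hours".
monitors = {"esp1": 2}
last_seen = {"esp1": 1800}
print(scan(monitors, last_seen, 2 * 3600))  # scan at 2:00, silent for only 1.5 h
print(scan(monitors, last_seen, 3 * 3600))  # scan at 3:00, silent for 2.5 h
```

In this example the alert comes 2.5 hours after the last contact, which is the "sometime between the second and third hour" behaviour from point 3, and the same device keeps being reported on every later scan until it connects again.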

Yes - it is not an exact timeout functionality, but it provides good value for the majority of use cases where something goes totally wrong. It handles all situations where connectivity to your home is broken due to a tripped circuit breaker, a hung router or hung ESP code.

Tomasz

Gizmo
08 Jul 2024

Ok, we are brainstorming here so let's see :-)

Everything you've described is ok and sounds simple enough although in no particular order:

1. With a separate database, running the check over, let's say, 10,000 devices would take around 2 milliseconds; add executing the notifications and it could take 10-20 milliseconds to send what would probably be a bulk instruction to the notifications server (depends on the KB of data + latency to the notification server).
That's kinda borderline too slow, as ChipChop handles approx 10,000 requests/heartbeats per 2-3 milliseconds, so there would be a backlog.

Still, it's doable but I would have to introduce a separate CPU process or maybe even a "sub-server" within each API server to handle that.

2. Everything on ChipChop is device driven, if a device sends a heartbeat then it's assumed "real" and Actions get executed. Actions are a part of each device database record (if you delete a device all Actions for it are gone and there is no processing waste). Introducing a "global" Actions database would require re-sync and re-indexing every time a device is added/removed in the Dev Console.
The way it happens in real life is: you add a new device, add actions to it, change your mind and delete it, add it again, do some tests and delete it again and repeat that who knows how many times...and a thousand users can do that at the same time....all the time ! :-)
At the moment ChipChop handles that ok, but I know what it costs to re-sync/re-index a database thousands of times a day.

But I'm not against it!

3. How do we know that your "dormant" device is "real" or still operational? There is no difference if you create a new device in the Dev Console, maybe test it once so some heartbeat gets recorded and then you abandon it for weeks or months until you have the time to play with it again?
Actually, it's not a question of whether a device is real, it's more a question of whether your account is still alive. Your devices' heartbeat is your account's "pulse"....no pulse...no life...account dead, more resources for everyone

Basically I have to make ChipChop to be fair to everyone and any dead accounts take resources from those that need them.

Don't worry, I am still thinking as I'm writing this :-)

4. You are kinda looking for a "dead-man's switch", if it's pressed nothing happens and if it's not pressed something goes booom 💣 ChipChop works in reverse, things only happen if something is pressed "live-man's switch" :-)

5. If I can know 100% that your account is alive, I mean you can have 10 devices all on batteries sending a heartbeat once a month and that's fine as ChipChop will only act on a live heartbeat....but...if nothing is sent...hmmm
I guess setting up an action like that would have to be considered a "contract" a "promise" by you that your device is still alive and maybe only execute the action once and if the device doesn't come on-line after certain period of time dispose of the action?

6. As I've mentioned, this is all easily solvable with another device that is constantly live but then we get into a logical loop as you've said, if all your devices are in your house and you get a power cut everything will go dead...damn man, get a mini nuclear reactor or something

Ok, fuck it, give me this week to tinker with it. I may have to re-activate API26 and give you access for testing...leave it with me and I'll ping you a message when I have a prototype

TomaszB
08 Jul 2024

Hi Gizmo

Let's further save cpu cycles:

1) The action "connection lost" is executed only once (not continuously until the connection is restored). In this way you cut off unused devices.
2) The list of devices that need to be iterated will then be significantly shorter. In the end it will only contain devices where a "no connection" action has been defined and not yet executed. If the action for such a device has already been triggered, the device is removed from this list. When the device appears online, it is added back to the list for periodic iteration (presuming that it still has a "no connection" action defined).
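
A small sketch of that one-shot list, again just to illustrate the idea. The class and attribute names are invented, and the thresholds are plain seconds for simplicity.

```python
class OneShotMonitor:
    """Keeps only devices whose "no connection" action has not fired yet."""

    def __init__(self, thresholds):
        self.thresholds = dict(thresholds)  # device_id -> allowed silence (s)
        self.last_seen = {}                 # device_id -> last heartbeat time
        self.pending = set()                # devices still to be checked

    def on_heartbeat(self, device_id, now):
        self.last_seen[device_id] = now
        if device_id in self.thresholds:
            # A device that comes back online is put back on the list.
            self.pending.add(device_id)

    def scan(self, now):
        """Periodic check: fire each due action once and drop the device."""
        fired = [d for d in sorted(self.pending)
                 if now - self.last_seen[d] >= self.thresholds[d]]
        self.pending.difference_update(fired)
        return fired
```

A device that is abandoned after its alert never returns to `pending`, so it costs nothing in later scans.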

As you mentioned, unused devices can kill the platform, so they must be eliminated by the algorithm.

Best regards
Tomasz

Gizmo
08 Jul 2024

haaa...you've described pretty much exactly what I had in mind !!! :-))) (sorry, couldn't reply earlier)

needs to be a tiny bit more complex but ultimately it's an event list like a stack where the device adds itself on a heartbeat and when the timer/action gets executed it takes the device off the stack.

What I need to do is just to reduce the number of times the device adds itself, I know you are sensible and need this for legit reasons with a device that is dormant but I can see someone doing it on a 10 sec heartbeat, that's just too much shifting the timestamps in the stack!

I think it will end up being something that works on an "in-between" accuracy so the action will get executed at some point within 30 minutes after the time you've specified

So, if you say that you want a warning when the device has been off-line for 3 hours, you will get a notification at some point between 3 and 3.5 hours later, depending on when the device's heartbeat has hit the checking routine cycle, which I think I can do on a 30-minute basis
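
One way to picture this is a set of 30-minute buckets. This is only a guess at the shape of it, not the actual engine code; all names here are invented. A device is filed under the first check cycle at or after its deadline, and a new heartbeat only moves it when the bucket actually changes, so a 10 sec heartbeat does not keep shuffling the list.

```python
SLOT_S = 30 * 60  # assumed length of one check cycle (30 minutes)


def slot_for(deadline):
    """Index of the first check cycle at or after the deadline."""
    return -(-deadline // SLOT_S)  # ceiling division on integers


class SlotMonitor:
    def __init__(self):
        self.buckets = {}  # slot index -> set of device ids
        self.slot_of = {}  # device id -> slot index it currently sits in
        self.moves = 0     # how many times a device was actually (re)filed

    def on_heartbeat(self, device_id, now, timeout_s):
        new_slot = slot_for(now + timeout_s)
        old_slot = self.slot_of.get(device_id)
        if new_slot == old_slot:
            return  # same bucket as before, nothing to shift
        if old_slot is not None:
            self.buckets[old_slot].discard(device_id)
        self.buckets.setdefault(new_slot, set()).add(device_id)
        self.slot_of[device_id] = new_slot
        self.moves += 1

    def run_cycle(self, now):
        """Called every SLOT_S seconds; returns the devices that stayed silent."""
        due = self.buckets.pop(now // SLOT_S, set())
        for device_id in due:
            del self.slot_of[device_id]
        return sorted(due)
```

With a 3 hour timeout, a device whose last heartbeat arrived at 0:19:50 has its deadline at 3:19:50 and is picked up by the 3:30 cycle, so the alert comes roughly 3 hours 10 minutes after the last contact.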

I've just re-fired the testing rig so will need a bit of time to write all that and test but will let you know when you can join in and have a play

TomaszB
08 Jul 2024

Thanks Gizmo. It looks very promising.

I'm going on a short vacation this week, so I will not be able to respond, but I'm looking forward to testing it.

Tomasz

Gizmo
08 Jul 2024

Hey enjoy your vacation! ☀️🏖️🍹

TomaszB
18 Jul 2024

Hi Gizmo

The "no connection" action works like a charm. Simple but very, very useful. In my case it just sends a notification to my mobile, which is exactly what I needed.
Thanks a lot!

Regards
Tomasz

Gizmo
19 Jul 2024

Hey, you are back from your vacation! Whilst you were drinking cocktails and getting your ass tanned some of us were hard at work you know!

Why the f*** did I get persuaded to make this? It was an absolute ball ache to implement and took me almost 4 days of messing around with the Actions panels and a gazillion tests (wife is asking for divorce from the constant notification pings :-)

I still have a couple of tweaks to do but in essence I've managed to squeeze the execution between the processor ticks so it doesn't block the main thread. The ChipChop engine runs a check routine every half an hour (human time) and if there are any devices in the list for that particular time slot it will process the first device, remove it from the stack and then release control to the main thread so something else can be processed, and then on the next tick it will do the next one in the list.

Depending on the workload between the processor cycles it may take a millisecond or so per device, but even if there were 10,000 devices to send notifications for it would all happen in about 10 seconds, and once finished the engine can chill for 30 mins.
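
The "one device per tick" idea can be sketched with a generator. This is a toy illustration, not the engine's code; the function names and the tick loop are made up. Each `yield` is the point where control goes back to the main loop.

```python
from collections import deque


def drain_one_per_tick(devices, notify):
    """Process a time slot's devices one at a time, pausing after each."""
    queue = deque(devices)
    while queue:
        notify(queue.popleft())  # handle the first device and remove it
        yield                    # release control until the next tick


def run(job, handle_requests, max_ticks):
    """Toy main loop: normal work first, then one step of the job per tick."""
    done = object()  # sentinel returned by next() once the job is exhausted
    for tick in range(1, max_ticks + 1):
        handle_requests()
        if next(job, done) is done:
            return tick  # the tick on which the job was found to be finished
    return None


log = []
run(drain_one_per_tick(["a", "b", "c"], log.append),
    lambda: log.append("req"), max_ticks=10)
print(log)  # normal requests and notifications end up interleaved
```

At roughly 1 ms per tick, 10,000 devices × 1 ms is about 10 seconds of interleaved work, with no long pause for normal requests.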

I will write a proper post today on how to use it, but in essence the notifications will happen in the time slot that falls within 30 minutes after the monitoring time you've specified (the cycle goes every 30 minutes). My logic was that if I have a power cut and the power comes back quickly (within 1 hour) I don't particularly care (the freezer won't defrost :-), and if you have short power cuts in succession it would be annoying getting a ton of notifications.

Ultimately if you need a lot more precise monitoring you will need a secondary device living physically somewhere that is not your house (friends, family, office) and implement the mechanism I've explained.

Let me know if you notice any problems or irregularities, it's really difficult to debug this.

p.s. As this was such a pain to build whilst I was twiddling my thumbs waiting for notifications to happen I've implemented a QR Code scanner, QR Code component and done some small design changes to the app, will also post about it today :-)