@boris, I will try to add some notes and comments that may help you investigate further. You have multiple BRs in your setup within the same mesh, and several BRs can act as SRP servers within a mesh. Devices in the mesh may choose to register their services with any one of the available SRP servers. There are rules governing that choice, which I won't go into here. The error itself indicates that the SRP server that received the request could not accept the service because the same name is already owned and being advertised (over mDNS) by another entity. Some common scenarios that cause this:
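As a rough illustration of the rejection rule described above, here is a toy model of a server that refuses a name already held by a different owner. It is only a sketch under stated assumptions: `ToySrpServer`, its methods, the status strings, and the `bulb-01`/`key-A`/`key-B` values are all hypothetical and are not part of the OpenThread API or of the logs in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToySrpServer:
    """Toy model of one server's view of names (NOT the OpenThread API)."""

    # Maps a host/service name to the identity (e.g. a key) that owns it.
    owners: Dict[str, str] = field(default_factory=dict)

    def register(self, name: str, owner: str) -> str:
        """Accept the name unless a different owner already holds it."""
        current: Optional[str] = self.owners.get(name)
        if current is not None and current != owner:
            return "NAME_CONFLICT"  # hypothetical status label
        self.owners[name] = owner
        return "SUCCESS"

    def release(self, name: str) -> None:
        """Forget a name, e.g. when the entity advertising it goes away."""
        self.owners.pop(name, None)


server = ToySrpServer()
results = [
    server.register("bulb-01", "key-A"),  # first claim is accepted
    server.register("bulb-01", "key-B"),  # different owner: rejected
    server.register("bulb-01", "key-A"),  # same owner refreshing: accepted
]
server.release("bulb-01")
results.append(server.register("bulb-01", "key-B"))  # name is free again
print(results)
```

In this model a rejected requester keeps failing for as long as the other owner's entry exists, and succeeds once that entry is released.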
@abtink,
If you are facing the same or a similar issue with devices from the same vendor, it may make sense to investigate whether there is an issue/bug in that vendor's firmware. I know some vendors add extra customizations or optimizations, which can unintentionally lead to incorrect behavior.
It is configurable on the SRP client (per client and/or per individual registered service). The default values (if not specified) are defined in openthread/src/core/config/srp_client.h, lines 152 to 169 at commit 111db8a.
@abtink, given issue #12088, the fact that it happens more often than I initially thought (#12088 (comment)), and your remark that the observations in this conversation could also indicate issues with mDNS, I may well wait to see what happens once issue #12088 is resolved. That said, the OTBR is currently my only observability tool into my Thread network. It has helped me a number of times, e.g. in choosing a good RF channel. I usually make my OTBR the leader of the herd (for some reason it calculates its leader weight as 65, which is one higher than the Nest hubs. I don't know if that is the reason, but they never refuse to hand over the leader role when I
@abtink, is there any magic btw. to I had hoped to see some key material as described in README_MDNS.md
@abtink, with #12128 pulled in I still see: But also: Regarding comment 14916689, I cannot actually tell whether the recordquerier can retrieve keys only for registrations made locally on the OTBR, or whether it should be able to show them for registrations on other routers as well. But for a successful registration: Key retrieval: and next: pull in #12129
@abtink, I have a hypothesis (more or less a guess, since I didn't look into the code very deeply): Behavior without #12128:
Behavior with #12128:
Now looking at There is a small window, where This would explain why, before the patch, we saw devices continuously trying to register their names while no other router was holding these names any longer. Of course, the ot mDNS module doesn't serve names that are in conflict. It would also explain why, after the patch, registering/updating these poisoned names became possible but the ot mDNS module doesn't start to serve them. What do you think?
@abtink, thanks again for your continued support. I am starting a new thread here. Host names go into conflict right after registration. I captured 2.7 GB of logs and identified at least two cases where registration goes well, and straight after that (within a second) mDNS puts the host in conflict. The two cases are: and Client Then, as observed before, the successfully registered clients stick to this SRP server with continued successful update requests. I have stripped the log file down to a time range that I thought might be relevant. If it is not sufficient, I can send more. EDIT:
Hi @abtink, it has been a while, mostly because I could not always force my OTBR into the SRP server role when I wanted to. In general, the OTBR works much better as an SRP server with your modifications applied. Today, though, I have a scenario that (from what I have learned from you) I didn't even think was possible: I have a device that is shown as registered in the mDNS module, but not in SRP: Is that by any chance an expected state? It is also actively responding to mDNS queries:
Hi @abtink, I have a question: Today My assumption was that in this situation registered devices are being announced by the ot_mdns module, so they are probably in conflict on all currently active SRP servers and cannot re-register somewhere else...
@abtink, thanks again for going through this together.
Describe the bug
Yesterday, 11 of my 125 Thread devices became unavailable in my Matter controller, so I power-cycled them.
At that point, the following started to show in the OTBR logs:
Over the course of the day, the light bulbs did not come back online in Matter, while these messages continued to appear.
Today I restarted the border router, at which point another border router took over the SRP server role, which immediately solved the issue.
I quickly did a search and found: #8056
While I understand the explanation in general, I do not understand why this can suddenly become a mass phenomenon affecting about 10% of my light bulbs, as if all of them lost their SRP keys within a short time frame (while most others did not). Is that somehow rooted in the design of SRP?
Additional information
OTBR_OPTIONS:
IEEE 802.15.4 hardware platform: EFR32MG21
Git hash for OTBR is ca4ba9ae63e963f62277a2d847c9e3a5dee5ab20
Network topology: