Caching Problem with Broadcast Pings
Normally, a broadcast ping results in responses from all nodes connected to the same layer 2 (Ethernet) network, also referred to as the link-local network.
In some situations, we observed the following:
1) A broadcast ping from node A arrives at node B, but node B does not respond
2) If node A (or other nodes? need to check) subsequently sends a unicast ping to node B, node B does respond
3) After that, node B responds normally to broadcast pings
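The three steps above can be turned into a small reproduction script. The following is only a sketch, assuming node A runs Linux with the iputils `ping` (which provides `-b` for broadcast targets, `-n` for numeric output, `-c` for the packet count and `-W` for the reply timeout). The function names and the addresses are hypothetical and not part of our original setup.

```python
import subprocess


def build_ping_sequence(broadcast_addr, unicast_addr, count=3, timeout_s=2):
    """Return the labelled ping commands for steps 1 to 3.

    Assumes the iputils ping on Linux; other ping implementations use
    different flags.
    """
    base = ["ping", "-n", "-c", str(count), "-W", str(timeout_s)]
    return [
        ("broadcast before unicast", base + ["-b", broadcast_addr]),
        ("unicast", base + [unicast_addr]),
        ("broadcast after unicast", base + ["-b", broadcast_addr]),
    ]


def replied(ping_output, addr):
    """True if the ping output contains a reply line from addr.

    Relies on the numeric reply format "... bytes from <addr>: ...".
    """
    return ("from " + addr + ":") in ping_output


def run_sequence(steps, node_b_addr):
    """Run each command in order and record whether node B answered."""
    results = []
    for label, cmd in steps:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((label, replied(proc.stdout, node_b_addr)))
    return results
```

Calling `run_sequence(build_ping_sequence("192.0.2.255", "192.0.2.20"), "192.0.2.20")` on node A (with placeholder addresses replaced by real ones) would be expected to show no reply from node B in the first step and a reply in the other two, if the problem is present. The exit status of `ping` is deliberately not used, because a broadcast ping succeeds as soon as any host answers.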
In particular, we experienced this problem when there was a large number of switches between node A and node B. In this case, the distance between the two nodes was 18,000 km (A in Amsterdam, B in San Diego), and there were switches in San Diego (2), Seattle (unknown), Chicago (1) and Amsterdam (1). It is unclear whether the long RTT, the number of switches, the behaviour of a particular switch, or a particular host causes the problem.
ARP cache
The above is very strange. Further investigation revealed that node B does try to respond to the ARP request in all cases. However, on the first attempt, it gives the following message to itself: "host not reachable". This indicates an ARP caching problem. A further indication is that if we 'fix' this by sending a unicast ping and then leave the hosts alone for 5 minutes, the same problem returns. The default ARP cache timeout is indeed exactly 5 minutes.
What we do not understand is why ARP has that information in its cache, or how it gets there. Nor do we understand why it does not simply try to send out a packet to see whether the host is reachable now.
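To see what the ARP cache on node B holds before and after the unicast ping, the cache can be dumped and checked for unresolved entries. The following is a minimal sketch, assuming node B runs Linux, where the kernel exposes the cache as text in /proc/net/arp and marks resolved entries with the ATF_COM flag (0x02). We did not use this script ourselves; the function names are hypothetical.

```python
ATF_COM = 0x02  # "completed entry" flag, as defined in <linux/if_arp.h>


def parse_arp_table(text):
    """Parse the text of /proc/net/arp into a list of dicts.

    Expected columns: IP address, HW type, Flags, HW address, Mask, Device.
    """
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore malformed or truncated lines
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        entries.append(
            {"ip": ip, "flags": int(flags, 16), "mac": mac, "device": device}
        )
    return entries


def incomplete_entries(entries):
    """Return the entries whose hardware address has not been resolved."""
    return [e for e in entries if not (e["flags"] & ATF_COM)]


def read_arp_table(path="/proc/net/arp"):
    """Read and parse the live ARP cache (Linux only)."""
    with open(path) as f:
        return parse_arp_table(f.read())
```

Running `incomplete_entries(read_arp_table())` on node B just after a failed broadcast ping, and again after the unicast ping, would show whether an unresolved entry for node A is present in the first case and gone in the second.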
Categories
CategoryZeroconf