Discussion:
disable ARP gleaning
Stuart Kendrick
2009-08-12 21:59:54 UTC
Hi folks,

Is there a way to disable ARP gleaning under Solaris?

I have broken end-stations which emit confused ARP responses, mixing up their
MAC addresses with other end-stations' IP addresses.

e.g.
Sender MAC address: 00:11:22:33:44:55 (correct)
Sender IP address: 10.1.2.9 (INCORRECT)
Target MAC address: 00:aa:bb:cc:dd:ee
Target IP address: 10.1.2.20

In this case, the sender's actual IP address is, say, 10.1.2.3, *NOT* 10.1.2.9.
10.1.2.9 in fact belongs to a legitimate end-station. But not this one.
Solaris gleans the IP address / MAC address mapping from watching this traffic,
updates its ARP cache with this incorrect entry ... and then starts addressing
frames to 10.1.2.9 using MAC address 00:11:22:33:44:55 ... and this of course
doesn't work too well.
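
For readers who want to inspect such frames programmatically, here is a minimal sketch that decodes the four address fields of an Ethernet/IPv4 ARP payload. The names `ArpPacket`, `parse_arp`, and `build_arp` are hypothetical helpers invented for illustration, and the sketch assumes the 28-byte ARP payload has already been separated from its Ethernet header.

```python
import socket
import struct
from typing import NamedTuple

# Hypothetical helper type; field names follow RFC 826 (sha/spa/tha/tpa).
class ArpPacket(NamedTuple):
    op: int    # 1 = request, 2 = reply
    sha: str   # sender hardware (MAC) address
    spa: str   # sender protocol (IP) address
    tha: str   # target hardware (MAC) address
    tpa: str   # target protocol (IP) address

ARP_FORMAT = "!HHBBH6s4s6s4s"  # 28 bytes for Ethernet + IPv4

def _mac_to_str(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)

def _mac_to_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))

def parse_arp(payload: bytes) -> ArpPacket:
    """Decode an Ethernet/IPv4 ARP payload (without the Ethernet header)."""
    hrd, pro, hln, pln, op, sha, spa, tha, tpa = struct.unpack(
        ARP_FORMAT, payload[:28])
    if (hrd, pro, hln, pln) != (1, 0x0800, 6, 4):
        raise ValueError("not an Ethernet/IPv4 ARP packet")
    return ArpPacket(op, _mac_to_str(sha), socket.inet_ntoa(spa),
                     _mac_to_str(tha), socket.inet_ntoa(tpa))

def build_arp(op: int, sha: str, spa: str, tha: str, tpa: str) -> bytes:
    """Encode an ARP payload; used here only to create sample data."""
    return struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, op,
                       _mac_to_bytes(sha), socket.inet_aton(spa),
                       _mac_to_bytes(tha), socket.inet_aton(tpa))

if __name__ == "__main__":
    # The confused frame from the example above.
    raw = build_arp(1, "00:11:22:33:44:55", "10.1.2.9",
                    "00:aa:bb:cc:dd:ee", "10.1.2.20")
    print(parse_arp(raw))
```

The decoded `sha`/`spa` pair is exactly what a receiver would glean, so comparing it against a trusted list of addresses is enough to spot a frame like the one above.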

I have a number of these confused boxes, and I am gradually hunting them down.
In the meantime, I'm wanting to harden my Solaris boxes against gleaning these
addresses. Actually, even once I've cleaned up my confused end-stations, I'd
like to harden Solaris against this kind of experience ... this smells like a
classic man-in-the-middle vulnerability to me. If Solaris wants a MAC address,
let it ARP for it ... I don't want it trying to save a little work by gleaning.

?

--sk

Stuart Kendrick
Fred Hutchinson Cancer Research Center
Seattle, WA USA
Peter Memishian
2009-08-12 22:46:09 UTC
Post by Stuart Kendrick
Is there a way to disable ARP gleaning under Solaris?
Solaris will not track the ARP mappings for random on-link hosts -- the IP
address will need to already be in its cache for some reason. My guess
would be that the Solaris box was already communicating with 10.1.2.9 and
then got the bogus ARP packet from 10.1.2.3 and dutifully updated its
cache with the bogus information. There's really nothing Solaris can do
about this case -- it is core to many failover technologies that Solaris
updates its ARP cache in this situation.
--
meem
Stuart Kendrick
2009-08-13 12:58:10 UTC
Ahh. I've been staring at a lot of ARP frames these last few days. But I
hadn't thought through this.

I see three classes of ARP gleaning, all of which rely on the entry already
existing in Solaris' cache, per your point.

(1) Remote station emits a gratuitous broadcast ARP: Solaris updates its cache
(2) Remote station emits a gratuitous unicast ARP: Solaris updates its cache
(3) Remote station emits an ARP Request for the Solaris box's address: Solaris
responds *and* uses the information in the ARP Request to update its cache

#3 is where I'm encountering confused machines: the source IP address in these
ARP Requests is accurate, but the source MAC address is not.

And I can see how failover schemes might use any or all of these techniques to
propagate a change in their IP address <==> MAC address mappings. Dang. I
don't see a way to harden against this, at the host level, not without getting
into static ARP mappings, which looks like a swamp to me.

Thanx for the explanation,

--sk
James Carlson
2009-08-13 13:36:24 UTC
Post by Stuart Kendrick
Ahh. I've been staring at a lot of ARP frames these last few days. But
I hadn't thought through this.
I see three classes of ARP gleaning, all of them rely on the entry
already existing in Solaris' cache, per your point.
(1) Remote station emits a gratuitous broadcast ARP: Solaris updates its cache
(2) Remote station emits a gratuitous unicast ARP: Solaris updates its cache
(3) Remote station emits an ARP Request for the Solaris box's address: Solaris
responds *and* uses the information in the ARP Request to update its cache
Actually, as of my ARP RFC 826 fixes in OpenSolaris and Solaris 10,
there's a fourth case to consider:

(4) Remote station emits an ARP message of any type with either the
Solaris box or broadcast as the destination, and the Solaris box
has a mapping for the ar$spa IP address in its cache. The Solaris
system will update its mapping based on ar$sha.

What the RFC says is that when you receive an ARP message, no matter how
you got it, you first look at ar$spa and ar$sha, before looking at the
type (request or response) of the message. If ar$spa matches something
in your cache, then update.
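
The reception steps described above can be sketched as follows. This is a simplified model of the RFC 826 packet-reception algorithm, not the Solaris implementation: it skips the hardware-type and protocol-type checks, models the cache as a plain dict, and uses hypothetical names (`Arp`, `receive`). The "legitimate" MAC address for 10.1.2.9 in the demo is invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

REQUEST, REPLY = 1, 2

@dataclass
class Arp:
    op: int    # REQUEST or REPLY
    sha: str   # ar$sha: sender hardware address
    spa: str   # ar$spa: sender protocol address
    tha: str   # ar$tha: target hardware address
    tpa: str   # ar$tpa: target protocol address

def receive(cache: Dict[str, str], my_ip: str, my_mac: str,
            pkt: Arp) -> Optional[Arp]:
    """Apply the RFC 826 reception steps; return a reply to send, or None."""
    merged = False
    # Step 1: the sender fields are examined before the opcode.
    if pkt.spa in cache:
        cache[pkt.spa] = pkt.sha      # existing entry is overwritten
        merged = True
    # Step 2: only packets addressed to us can create a new entry.
    if pkt.tpa != my_ip:
        return None
    if not merged:
        cache[pkt.spa] = pkt.sha
    # Step 3: only now does the opcode matter.
    if pkt.op == REQUEST:
        return Arp(REPLY, my_mac, my_ip, pkt.sha, pkt.spa)
    return None

if __name__ == "__main__":
    # Assumed starting state: the host already talks to 10.1.2.9.
    cache = {"10.1.2.9": "00:de:ad:be:ef:09"}  # hypothetical correct MAC
    bogus = Arp(REQUEST, "00:11:22:33:44:55", "10.1.2.9",
                "00:aa:bb:cc:dd:ee", "10.1.2.20")
    receive(cache, "10.1.2.20", "00:aa:bb:cc:dd:ee", bogus)
    print(cache)  # the entry for 10.1.2.9 now holds the sender's MAC
```

Because the cache update happens before the opcode is inspected, a request, a reply, or a gratuitous ARP all poison an existing entry in the same way, which is why the separate cases collapse into one.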
Post by Stuart Kendrick
#3 is where I'm encountering confused machines: the source IP address
in these ARP Requests is accurate, but the source MAC address is not.
Yep; that's toxic.

Besides defenestrating the bad boxes -- which would be my first choice
-- it'd probably be helpful if there were Ethernet level filtering
available for Solaris as a temporary work-around. The negative aspect
of that "fix," besides the fact that it's not yet available, is that the
Solaris system is not likely to be the only system on the network that's
confused by bad messages, and teaching all of the possible listeners to
ignore the idiot may be prohibitively difficult to achieve.
--
James Carlson 42.703N 71.076W <carlsonj-dlRbGz2WjHhmlEb+***@public.gmane.org>
Stuart Kendrick
2009-08-13 18:21:51 UTC
So, in fact, all these cases collapse into RFC 826-compliant behavior: as you
say, no matter how I receive the ARP or which type of ARP frame it is, I should
glean information from it.

Yes, I can see how Ethernet filtering would allow the administrator to defend
the host against such behavior, in an ad hoc way. Still a hack.

I understand that some Ethernet switches allow one to lock down IP address /
port mappings. But this just moves the administrative problem to someone else,
it doesn't solve it.

OK, I'm going hunting for confused machines.

Thanx for the discussion,

--sk
Peter Memishian
2009-08-13 17:10:21 UTC
Post by Stuart Kendrick
I see three classes of ARP gleaning, all of them rely on the entry already
existing in Solaris' cache, per your point.
(1) Remote station emits a gratuitous broadcast ARP: Solaris updates its cache
(2) Remote station emits a gratuitous unicast ARP: Solaris updates its cache
(3) Remote station emits an ARP Request for the Solaris box's address: Solaris
responds *and* uses the information in the ARP Request to update its cache
#3 is where I'm encountering confused machines: the source IP address in these
ARP Requests is accurate, but the source MAC address is not.
And I can see how failover schemes might use any or all of these techniques to
propagate a change in their IP address <==> MAC address mappings. Dang. I
don't see a way to harden against this, at the host level, not without getting
into static ARP mappings, which looks like a swamp to me.
Precisely -- and it would be a swamp. Seems you'll have to either fix
those toxic boxes or isolate them on another LAN/VLAN.

--
meem
Jason King
2009-08-13 17:29:04 UTC
Do you by any chance have boxes running Windows with Broadcom NICs that
are using Broadcom's teaming software?
_______________________________________________
networking-discuss mailing list
Stuart Kendrick
2009-08-13 18:22:09 UTC
Yes

Many

Why?

--sk
Jason King
2009-08-13 18:27:41 UTC
I would strongly recommend upgrading all the Broadcom drivers and
teaming software as soon as you possibly can, or at minimum disabling
the teaming software.

There is a nasty (understatement) bug where certain driver versions plus
their teaming software cause random ARP cache poisoning on a subnet.
I ran into this at work, and others have hit it as well. I think the
profanities are still lingering in the air around here from when I figured
out what was going on :) Updating to the latest drivers and teaming
software should fix it; of course, you need to do that for all of the
boxes on a given subnet before the problem goes away.
Stuart Kendrick
2009-08-14 00:41:18 UTC
OK, I have a little over a hundred servers equipped with Broadcom NICs plus
TEAMing (well, BASP in Broadcom-speak). I picked five, pointed my sniffer at
them, filtered for ARP queries which originated from their MAC addresses and for
which the sender IP address is *not* set to their actual IP address ...

Four of the five are incorrectly setting sender IP address to something other
than their own. In other words, four of the five are inflicting ARP cache
poisoning on their neighbors. [Intriguingly, all the 'poisoned' ARP Requests
are unicast ... thank goodness!]

I'm puzzled as to why my data center isn't flat on its face already. I suppose
the subset of end-stations being poisoned is small and not generally talking to
each other.

Have any tips for identifying pathological stations? Sniffing on every single
Broadcom box is going to be tedious ... I have 'arpwatch' running, but it only
records ARP responses; it doesn't "glean".
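
One way to narrow the hunt, sketched here under stated assumptions: collect (sender MAC, sender IP) pairs from a capture taken anywhere the ARP traffic is visible, then compare each pair against an inventory of the address every known MAC is supposed to use. The function name `find_suspects` and the inventory format are hypothetical, and the capture step itself is outside this sketch. ARP probes with sender IP 0.0.0.0 are skipped because they are legitimate address-conflict checks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def find_suspects(observations: Iterable[Tuple[str, str]],
                  inventory: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return {sender MAC: set of sender IPs it wrongly claimed}.

    observations: (sender_mac, sender_ip) pairs taken from captured ARP frames.
    inventory:    expected IP for each known MAC (an assumed, trusted list).
    """
    suspects: Dict[str, Set[str]] = defaultdict(set)
    for sender_mac, sender_ip in observations:
        if sender_ip == "0.0.0.0":        # ARP probe, carries no mapping
            continue
        expected = inventory.get(sender_mac.lower())
        if expected is not None and sender_ip != expected:
            suspects[sender_mac.lower()].add(sender_ip)
    return dict(suspects)

if __name__ == "__main__":
    # Sample data built from the addresses used earlier in this thread.
    inventory = {"00:11:22:33:44:55": "10.1.2.3"}
    seen = [
        ("00:11:22:33:44:55", "10.1.2.3"),   # correct pairing
        ("00:11:22:33:44:55", "10.1.2.9"),   # claims a neighbor's IP
    ]
    for mac, ips in find_suspects(seen, inventory).items():
        print(mac, "claimed", sorted(ips))
```

Hosts that legitimately own several addresses would need every address listed in the inventory, so the single-IP mapping used here is a simplification. Unicast poisoned requests will only be seen by a capture point on their path, so this does not remove the need for a mirror port or similar.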

Any tips for engaging someone at Broadcom? [To identify which driver revisions
contain this bug?]

Thanx again for offering your insights.

--sk
Jason King
2009-08-14 01:43:41 UTC
Offhand, no. We just updated to the latest (in our instance they were
Dell boxes) and that seems to have resolved the issue. Unfortunately,
I can't find the specific driver versions searching now. I might
still have the links (for Dell at least) back at work that I could try
to dig up tomorrow.