Discussion:
Internet shuts down. Bug in e1000 driver?
Orvar Korvar
2011-08-14 09:15:23 UTC
Sometimes my internet connection just dies. I cannot ping anything or do anything else on the network. This goes on for half an hour or so, then everything suddenly works again. What is going on? Is there a bug in the e1000 driver? Are there Solaris commands I can type that show what is going on? The strange thing is that my Windows PC and iPad work fine; I can surf the web on them without problems. So the problem is somewhere on the Solaris 11 Express PC. Here is some output from while the Solaris 11 Express PC's network is hung:



$ ping 192.168.1.1 //this is my router
ping: sendto No route to host



$ ipadm show-addr
ADDROBJ TYPE STATE ADDR
lo0/v4 static ok 127.0.0.1/8
e1000g0/_a static duplicate 192.168.1.3/24
vboxnet0/_a static ok 192.168.56.1/24
lo0/v6 static ok ::1/128



$ ipadm show-ifprop
IFNAME PROPERTY PROTO PERM CURRENT PERSISTENT DEFAULT POSSIBLE
lo0 arp ipv4 rw on -- on on,off
lo0 forwarding ipv4 rw off -- off on,off
lo0 metric ipv4 rw 0 -- 0 --
lo0 mtu ipv4 rw 8232 -- 8232 68-8232
lo0 exchange_routes ipv4 rw on -- on on,off
lo0 usesrc ipv4 rw none -- none --
lo0 forwarding ipv6 rw off -- off on,off
lo0 metric ipv6 rw 0 -- 0 --
lo0 mtu ipv6 rw 8252 -- 8252 1280-8252
lo0 nud ipv6 rw on -- on on,off
lo0 exchange_routes ipv6 rw on -- on on,off
lo0 usesrc ipv6 rw none -- none --
e1000g0 arp ipv4 rw on -- on on,off
e1000g0 forwarding ipv4 rw off -- off on,off
e1000g0 metric ipv4 rw 0 -- 0 --
e1000g0 mtu ipv4 rw 1500 -- 1500 68-1500
e1000g0 exchange_routes ipv4 rw on -- on on,off
e1000g0 usesrc ipv4 rw none -- none --
e1000g0 forwarding ipv6 rw -- -- off on,off
e1000g0 metric ipv6 rw -- -- 0 --
e1000g0 mtu ipv6 rw -- -- -- --
e1000g0 nud ipv6 rw -- -- on on,off
e1000g0 exchange_routes ipv6 rw -- -- on on,off
e1000g0 usesrc ipv6 rw -- -- none --
vboxnet0 arp ipv4 rw on -- on on,off
vboxnet0 forwarding ipv4 rw off -- off on,off
vboxnet0 metric ipv4 rw 0 -- 0 --
vboxnet0 mtu ipv4 rw 1500 -- 1500 68-1500
vboxnet0 exchange_routes ipv4 rw on -- on on,off
vboxnet0 usesrc ipv4 rw none -- none --
vboxnet0 forwarding ipv6 rw -- -- off on,off
vboxnet0 metric ipv6 rw -- -- 0 --
vboxnet0 mtu ipv6 rw -- -- -- --
vboxnet0 nud ipv6 rw -- -- on on,off
vboxnet0 exchange_routes ipv6 rw -- -- on on,off
vboxnet0 usesrc ipv6 rw -- -- none --



$ ipadm show-if
IFNAME STATE CURRENT PERSISTENT
lo0 ok -m-v------46 ---
e1000g0 down bm--------4- ---
vboxnet0 ok bm--------4- ---



$ netstat -rn
Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
-------------------- -------------------- ----- ----- ---------- ---------
default 192.168.1.1 UG 17 431912
127.0.0.1 127.0.0.1 UH 7 712 lo0
192.168.56.0 192.168.56.1 U 2 0 vboxnet0
Routing Table: IPv6
Destination/Mask Gateway Flags Ref Use If
--------------------------- --------------------------- ----- --- ------- -----
::1 ::1 UH 2 28 lo0



$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=4001000942<BROADCAST,RUNNING,PROMISC,MULTICAST,IPv4,DUPLICATE> mtu 1500 index 2
inet 192.168.1.3 netmask ffffff00 broadcast 192.168.1.255
vboxnet0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
inet 192.168.56.1 netmask ffffff00 broadcast 192.168.56.255
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
inet6 ::1/128



Snooping the traffic is very slow. Nothing happens for a minute, then I see some messages, then there is another one-minute pause, then more messages, and so on.
$ pfexec snoop
Using device e1000g0 (promiscuous mode)
OLD-BROADCAST -> BROADCAST DHCP/BOOTP DHCPDISCOVER
192.168.1.1 -> (broadcast) ARP C Who is 192.168.1.4, 192.168.1.4 ?
OLD-BROADCAST -> BROADCAST DHCP/BOOTP DHCPREQUEST
192.168.1.1 -> (broadcast) ARP C Who is 192.168.1.4, 192.168.1.4 ?
192.168.1.4 -> BROADCAST DHCP/BOOTP DHCPINFORM
192.168.1.4 -> BROADCAST DHCP/BOOTP DHCPINFORM
192.168.1.4 -> BROADCAST DHCP/BOOTP DHCPINFORM
192.168.1.4 -> (broadcast) ARP C Who is 192.168.1.1, 192.168.1.1 ?
192.168.1.4 -> 192.168.1.255 UDP D=7009 S=7009 LEN=19
OLD-BROADCAST -> (broadcast) ARP C Who is 192.168.1.3, frasse ?
frasse -> OLD-BROADCAST ARP R 192.168.1.3, frasse is a4:67:6:59:25:82
192.168.1.1 -> 192.168.1.255 NBT Datagram Service Type=17 Source=READYSHARE[0]
192.168.1.1 -> 192.168.1.255 NBT Datagram Service Type=17 Source=READYSHARE[0]
^C







Ok, now it seems to work again:
$ ping 192.168.1.1
192.168.1.1 is alive


$ pfexec snoop
Using device e1000g0 (promiscuous mode)
192.168.1.5 -> (broadcast) ARP C Who is 192.168.1.1, 192.168.1.1 ?
192.168.1.1 -> 192.168.1.5 ARP R 192.168.1.1, 192.168.1.1 is 0:26:f2:94:8a:68
192.168.1.5 -> 192.168.1.1 TCP D=5555 S=1059 Syn Seq=1044951384 Len=0 Win=64240 Options=<mss 1460,nop,nop,sackOK>
192.168.1.1 -> 192.168.1.5 TCP D=1059 S=5555 Syn Ack=1044951385 Seq=4274814826 Len=0 Win=5840 Options=<mss 1460,nop,nop,sackOK>
192.168.1.5 -> 192.168.1.1 TCP D=5555 S=1059 Ack=4274814827 Seq=1044951385 Len=0 Win=64240
192.168.1.5 -> 192.168.1.1 TCP D=5555 S=1059 Push Ack=4274814827 Seq=1044951385 Len=279 Win=64240
192.168.1.1 -> 192.168.1.5 TCP D=1059 S=5555 Ack=1044951664 Seq=4274814827 Len=0 Win=6432
192.168.1.1 -> 192.168.1.5 TCP D=1059 S=5555 Push Ack=1044951664 Seq=4274814827 Len=145 Win=6432
192.168.1.1 -> 192.168.1.5 TCP D=1059 S=5555 Fin Ack=1044951664 Seq=4274814972 Len=0 Win=6432
192.168.1.5 -> 192.168.1.1 TCP D=5555 S=1059 Ack=4274814973 Seq=1044951664 Len=0 Win=64095
192.168.1.5 -> 192.168.1.1 TCP D=5555 S=1059 Rst Ack=4274814973 Seq=1044951664 Len=0 Win=0
192.168.1.5 -> 192.168.1.1 TCP D=5555 S=1060 Syn Seq=1537609271 Len=0 Win=64240 Options=<mss 1460,nop,nop,sackOK>
192.168.1.1 -> 192.168.1.5 TCP D=1060 S=5555 Syn Ack=1537609272 Seq=4271432688 Len=0 Win=5840 Options=<mss 1460,nop,nop,sackOK>
192.168.1.5 -> 192.168.1.1 TCP D=5555 S=1060 Ack=4271432689 Seq=1537609272 Len=0 Win=64240
192.168.1.5 -> 192.168.1.1 TCP D=5555 S=1060 Push Ack=4271432689 Seq=1537609272 Len=274 Win=64240
192.168.1.1 -> 192.168.1.5 TCP D=1060 S=5555 Ack=1537609546 Seq=4271432689 Len=0 Win=6432
192.168.1.1 -> 192.168.1.5 TCP D=1060 S=5555 Push Ack=1537609546 Seq=4271432689 Len=145 Win=6432
192.168.1.1 -> 192.168.1.5 TCP D=1060 S=5555 Fin Ack=1537609546 Seq=4271432834 Len=0 Win=6432
192.168.1.5 -> 192.168.1.1 TCP D=5555 S=1060 Ack=4271432835 Seq=1537609546 Len=0 Win=64095
192.168.1.5 -> 192.168.1.1 TCP D=5555 S=1060 Rst Ack=4271432835 Seq=1537609546 Len=0 Win=0
OLD-BROADCAST -> BROADCAST DHCP/BOOTP DHCPDISCOVER
192.168.1.1 -> (broadcast) ARP C Who is 192.168.1.4, 192.168.1.4 ?
OLD-BROADCAST -> BROADCAST DHCP/BOOTP DHCPREQUEST
192.168.1.1 -> (broadcast) ARP C Who is 192.168.1.4, 192.168.1.4 ?
192.168.1.4 -> BROADCAST DHCP/BOOTP DHCPINFORM
192.168.1.4 -> BROADCAST DHCP/BOOTP DHCPINFORM
192.168.1.4 -> BROADCAST DHCP/BOOTP DHCPINFORM
192.168.1.4 -> (broadcast) ARP C Who is 192.168.1.1, 192.168.1.1 ?
192.168.1.4 -> 192.168.1.255 UDP D=7009 S=7009 LEN=19
frasse -> (broadcast) ARP C Who is 192.168.1.1, 192.168.1.1 ?
192.168.1.1 -> frasse ARP R 192.168.1.1, 192.168.1.1 is 0:26:f2:94:8a:68
frasse -> 192.168.2.2 ICMP Echo request (ID: 5011 Sequence number: 0)
OLD-BROADCAST -> (broadcast) ARP C Who is 192.168.1.3, frasse ?
OLD-BROADCAST -> (broadcast) ARP C Who is 192.168.1.3, frasse ?
OLD-BROADCAST -> (broadcast) ARP C Who is 192.168.1.3, frasse ?
frasse -> (broadcast) ARP C Who is 192.168.1.3, frasse ?
frasse -> 192.168.2.2 ICMP Echo request (ID: 5011 Sequence number: 1)
frasse -> 192.168.1.1 DNS C 2.2.168.192.in-addr.arpa. Internet PTR ?
192.168.1.1 -> frasse DNS R Error: 3(Name Error)
frasse -> 192.168.1.1 DNS C 2.2.168.192.in-addr.arpa. Internet PTR ?
192.168.1.1 -> frasse DNS R Error: 3(Name Error)
frasse -> 192.168.1.255 NBT NS Unknown Request for FRASSE[0], Success
frasse -> (broadcast) ARP C Who is 192.168.1.3, frasse ?
frasse -> 192.168.1.255 NBT NS Unknown Request for FRASSE[0], Success
frasse -> 192.168.1.1 ICMP Echo request (ID: 5014 Sequence number: 0)
192.168.1.1 -> frasse ICMP Echo reply (ID: 5014 Sequence number: 0)
frasse -> 192.168.1.255 NBT NS Unknown Request for FRASSE[20], Success
frasse -> (broadcast) ARP C Who is 192.168.1.3, frasse ?
frasse -> 192.168.1.255 NBT NS Unknown Request for FRASSE[20], Success
frasse -> (broadcast) ARP C Who is 192.168.1.3, frasse ?
frasse -> (broadcast) ARP C Who is 192.168.1.3, frasse ?
frasse -> 224.101.101.101 UDP D=7009 S=7009 LEN=325
frasse -> 192.168.1.1 DNS C 101.101.101.224.in-addr.arpa. Internet PTR ?
192.168.1.1 -> frasse DNS R Error: 3(Name Error)
frasse -> 192.168.1.1 DNS C 101.101.101.224.in-addr.arpa. Internet PTR ?
192.168.1.1 -> frasse DNS R Error: 3(Name Error)
frasse -> 192.168.1.1 DNS C yui.yahooapis.com. Internet AAAA ?
192.168.1.1 -> frasse DNS R yui.yahooapis.com. Internet CNAME geoycs-l.gy1.b.yahoodns.net.
frasse -> 192.168.1.1 DNS C yui.yahooapis.com. Internet Addr ?
192.168.1.1 -> frasse DNS R yui.yahooapis.com. Internet CNAME geoycs-l.gy1.b.yahoodns.net.
192.168.1.1 -> * ARP C Who is 192.168.1.3, frasse ?
frasse -> 192.168.1.1 ARP R 192.168.1.3, frasse is 0:1b:21:1e:2e:d0
frasse -> 192.168.1.1 ICMP Echo request (ID: 5276 Sequence number: 0)
192.168.1.1 -> frasse ICMP Echo reply (ID: 5276 Sequence number: 0)
frasse -> 224.101.101.101 UDP D=7009 S=7009 LEN=325
192.168.1.5 -> 192.168.1.255 NBT Datagram Service Type=17 Source=VB-WINXP[20]
frasse -> 224.101.101.101 UDP D=7009 S=7009 LEN=325
frasse -> 192.168.1.1 ICMP Echo request (ID: 5281 Sequence number: 0)
192.168.1.1 -> frasse ICMP Echo reply (ID: 5281 Sequence number: 0)




The next time the internet connection dies, what can I do? Are there any commands I can use to restore the connection? What is the problem? Is there a bug?
--
This message posted from opensolaris.org
James Carlson
2011-08-14 20:16:00 UTC
Post by Orvar Korvar
e1000g0/_a static duplicate 192.168.1.3/24
That pretty much tells the entire story: you have a duplicate address on
your network. Some other device is configured with 192.168.1.3. Rather
than just destroy your network by using someone else's address, Solaris
shuts the interface down and then tries to bring it back up periodically.
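
As a rough illustration only (not a Solaris tool), the check described above can be sketched in Python. The sample text is copied from the `ipadm show-addr` output earlier in this thread; the function name `duplicate_addresses` is made up for this example, and it assumes the four whitespace-separated columns shown there.

```python
# Sample copied from the `ipadm show-addr` output posted in this thread.
SAMPLE = """\
ADDROBJ TYPE STATE ADDR
lo0/v4 static ok 127.0.0.1/8
e1000g0/_a static duplicate 192.168.1.3/24
vboxnet0/_a static ok 192.168.56.1/24
lo0/v6 static ok ::1/128
"""


def duplicate_addresses(show_addr_output):
    """Return (addrobj, addr) pairs whose STATE column reads 'duplicate'.

    Hypothetical helper: assumes the ADDROBJ TYPE STATE ADDR layout above.
    """
    found = []
    for line in show_addr_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 4 and fields[2] == "duplicate":
            found.append((fields[0], fields[3]))
    return found


print(duplicate_addresses(SAMPLE))
```

Running it against the posted output picks out e1000g0/_a with 192.168.1.3/24, which is the line quoted above.
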
Post by Orvar Korvar
e1000g0: flags=4001000942<BROADCAST,RUNNING,PROMISC,MULTICAST,IPv4,DUPLICATE> mtu 1500 index 2
inet 192.168.1.3 netmask ffffff00 broadcast 192.168.1.255
That shows the same problem. The interface is down (no "UP" flag
present) and is marked as a "DUPLICATE".
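
The same reading of the flags can be written down as a small Python sketch. It only parses the text of an `ifconfig -a` header line like the one quoted above; `interface_flags` is a name invented for this example, and the regular expression assumes the `name: flags=HEX<A,B,...>` layout seen in this thread.

```python
import re

# Header lines copied from the `ifconfig -a` output posted in this thread.
E1000G0 = ("e1000g0: flags=4001000942<BROADCAST,RUNNING,PROMISC,MULTICAST,"
           "IPv4,DUPLICATE> mtu 1500 index 2")
LO0 = ("lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> "
       "mtu 8232 index 1")


def interface_flags(line):
    """Return (interface name, set of flag names), or None if no match.

    Hypothetical helper; it reads only the symbolic names between < and >.
    """
    m = re.match(r"(\S+): flags=[0-9a-fA-F]+<([^>]*)>", line)
    if m is None:
        return None
    names = set(m.group(2).split(",")) if m.group(2) else set()
    return m.group(1), names


for header in (E1000G0, LO0):
    name, flags = interface_flags(header)
    # An interface is suspect when it lacks UP and carries DUPLICATE.
    print(name, "UP" in flags, "DUPLICATE" in flags)
```

For the two lines above, e1000g0 comes out as not UP and DUPLICATE, while lo0 is UP and not DUPLICATE.
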
Post by Orvar Korvar
The next time the internet connection dies, what can I do? Are there any commands I can use to restore the connection? What is the problem? Is there a bug?
Watch for ARP messages with your address (192.168.1.3).
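
One way to do that watching offline is to save the snoop output and group the ARP replies by IP address. The Python below is a sketch under that assumption; the three sample lines are copied from the captures posted earlier in this thread, the function names are invented for the example, and the regular expression matches only the summary-line format shown there. Note that in those captures two different MAC addresses answer for 192.168.1.3; presumably one of them is the e1000g0 interface itself, though the thread does not confirm which.

```python
import re
from collections import defaultdict

# ARP reply lines copied from the two snoop captures in this thread.
CAPTURE = """\
frasse -> OLD-BROADCAST ARP R 192.168.1.3, frasse is a4:67:6:59:25:82
192.168.1.1 -> 192.168.1.5 ARP R 192.168.1.1, 192.168.1.1 is 0:26:f2:94:8a:68
frasse -> 192.168.1.1 ARP R 192.168.1.3, frasse is 0:1b:21:1e:2e:d0
"""

ARP_REPLY = re.compile(r"ARP R (\d+\.\d+\.\d+\.\d+), \S+ is ([0-9a-fA-F:]+)")


def normalize_mac(mac):
    """Pad each octet to two lowercase hex digits (snoop drops leading zeros)."""
    return ":".join(part.lower().zfill(2) for part in mac.split(":"))


def conflicting_macs(snoop_text):
    """Map each IP answered by more than one MAC to its sorted MAC list."""
    seen = defaultdict(set)
    for line in snoop_text.splitlines():
        m = ARP_REPLY.search(line)
        if m:
            seen[m.group(1)].add(normalize_mac(m.group(2)))
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


print(conflicting_macs(CAPTURE))
```

Any IP address that shows up in the result has been claimed by more than one MAC address in the capture, which is the situation duplicate address detection reacts to.
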

You can also look at your system logs (/var/adm/messages). The MAC
address of the conflicting machine should be in there.

Do you happen to have any systems on your network using Broadcom "MAC
Teaming"? If so, then those could easily be causing a problem like
this. Those devices are known to have bugs that generate bogus ARP
messages.
--
James Carlson 42.703N 71.076W <carlsonj-dlRbGz2WjHhmlEb+***@public.gmane.org>
Orvar Korvar
2011-08-15 09:07:57 UTC
Ok, so there is a duplicate address on my network.

I am running SunRay server on my home PC, which I configured myself. So maybe I did not configure it correctly?

The weird thing is that my network normally works fine. About once a month it stops working, with the symptoms I described. If I had a duplicate address, shouldn't the network fail all the time? Instead it works fine for the rest of the month.

So, I hope this configuration problem is repairable? Or should I reinstall everything?
--
This message posted from opensolaris.org
James Carlson
2011-08-15 12:26:42 UTC
Post by Orvar Korvar
Ok, so there is a duplicate address. I have SunRay server v5.2.1
installed, and I had to use a static IP address for SunRay server to
install. So maybe I configured my static IP wrong?
Possibly, yes. Have you looked at the addresses ... ?
Post by Orvar Korvar
The thing is, most of the time there are no problems at all. Once every
month, my network just dies for half an hour. The rest of the month
everything works fine. That is weird? If there is a problem, the problem
should be present all the time, and not just sporadically?
Yes, that's a bit weird. The normal expectation for a duplicate address
is that it's detected fairly quickly and at least N-1 of the duplicates
are shut down automatically and kept that way until the problem is resolved.

But I can't do any diagnostics on your network to determine what might
be interfering with that. It's possible that there's a bug that makes
the detection intermittent; I seem to recall problems in S10 that may
have caused behavior like that. It's also possible that there's some
condition on the network itself that plays into it.
Post by Orvar Korvar
What can I do? It seems to be overkill to reinstall everything? My
configuration problem should be reparable? Should I wait until the same
problems occur, and then do what?
Goodness, no. Just change the address on this system or the other one.

If you're using DHCP for some systems and static addresses for others,
then make sure that your DHCP server either doesn't have entries for the
static addresses at all, or that it has entries indicating that the
addresses must not be leased.
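
A quick sanity check of that rule can be sketched in Python with the standard `ipaddress` module. The pool boundaries below are made-up values for illustration (the thread never states the router's actual DHCP range), and `statics_inside_pool` is a hypothetical helper name.

```python
import ipaddress


def statics_inside_pool(static_addrs, pool_start, pool_end):
    """Return the static addresses that fall inside the DHCP lease range.

    Hypothetical helper: pool_start and pool_end are inclusive bounds.
    """
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return [a for a in static_addrs
            if start <= ipaddress.ip_address(a) <= end]


# 192.168.1.3 is the static address from this thread; the pool range
# 192.168.1.2 to 192.168.1.50 is an assumed example, not a known setting.
print(statics_inside_pool(["192.168.1.3"], "192.168.1.2", "192.168.1.50"))
```

If the list is not empty, the DHCP server is free to hand one of those addresses to another device, so either the pool should be moved or the static addresses should be reserved.
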
--
James Carlson 42.703N 71.076W <carlsonj-dlRbGz2WjHhmlEb+***@public.gmane.org>
sowmini varadhan
2011-08-15 12:45:42 UTC
Post by Orvar Korvar
The thing is, most of the time there are no problems at all. Once every
month, my network just dies for half an hour. The rest of the month
everything works fine. That is weird? If there is a problem, the problem
should be present all the time, and not just sporadically?
Are you sure that the other machine that usurps the address is not something
like a laptop that connects to the network sporadically and gets the address,
either by static configuration or by other means?

--Sowmini
