Schneider Combox installation triggers virus warnings

Dusty · August 2018

I then repeated the same test (using the same Cat 5e cables) with the MSI laptop (192.168.0.197), which has no problem communicating with the Combox. It's running Windows 10 Home 64-bit and has a Broadwell i7 CPU, as opposed to the Windows 10 Pro 64-bit Alienware laptop's Kaby Lake i7 CPU. The MSI has a Killer E2200 Ethernet adapter (the same adapter as in the Alienware Area-51s that don't work with the Combox), and the Alienware laptop has a Killer E2500 adapter.

This recording was only 27 seconds long, because that's all it took to reach the same screen that took the Alienware laptop 5 minutes to reach.

20 9.882009 192.168.0.197 192.168.0.198 TCP 66 53394 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1

21 9.883931 192.168.0.198 192.168.0.197 TCP 64 80 → 53394 [SYN, ACK] Seq=0 Ack=1 Win=4096 Len=0 MSS=1460 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

22 9.883969 192.168.0.197 192.168.0.198 TCP 54 53394 → 80 [ACK] Seq=1 Ack=1 Win=64240 Len=0

23 9.884053 192.168.0.197 192.168.0.198 HTTP 570 POST /login.cgi HTTP/1.1 (application/x-www-form-urlencoded)

24 9.887762 192.168.0.198 192.168.0.197 TCP 64 80 → 53394 [ACK] Seq=1 Ack=517 Win=3580 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

25 9.902411 192.168.0.198 192.168.0.197 HTTP 376 HTTP/1.1 301 Moved Permanently

26 9.902598 192.168.0.197 192.168.0.198 TCP 54 53394 → 80 [FIN, ACK] Seq=517 Ack=323 Win=63918 Len=0

27 9.904432 192.168.0.198 192.168.0.197 TCP 64 80 → 53394 [ACK] Seq=323 Ack=518 Win=3580 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

28 9.905774 192.168.0.198 192.168.0.197 TCP 64 80 → 53394 [FIN, ACK] Seq=323 Ack=518 Win=3580 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

29 9.905810 192.168.0.197 192.168.0.198 TCP 54 53394 → 80 [ACK] Seq=518 Ack=324 Win=63918 Len=0

The RTT to ACK for this segment was 0.000036000 seconds.

***Not a single retransmission.*** But there were still plenty of [ETHERNET FRAME CHECK SEQUENCE INCORRECT] messages coming from the combox. They just didn't seem to faze the MSI laptop.
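For anyone who wants to repeat this count on their own capture, here is a minimal Python sketch that tallies the flagged frames from Wireshark's exported packet-summary text. It assumes the column order shown above (No., Time, Source, Destination, Protocol, Length, Info). The name `summarize_capture` is made up for illustration, and the function only does string matching on the Info column, not real packet dissection. The two sample rows are frames 28 and 29 from the capture above.

```python
from decimal import Decimal

FCS_NOTE = "[ETHERNET FRAME CHECK SEQUENCE INCORRECT]"

def summarize_capture(lines, combox_ip):
    """Tally FCS warnings and retransmissions in Wireshark summary lines."""
    stats = {"frames": 0, "fcs_bad_from_combox": 0, "retransmissions": 0}
    times = {}  # frame number -> capture timestamp
    for line in lines:
        # Assumed columns: No., Time, Source, Destination, Protocol, Length, Info
        parts = line.split(None, 6)
        if len(parts) < 7 or not parts[0].isdigit():
            continue  # skip blank lines and anything that is not a packet row
        number, time, src, _dst, _proto, _length, info = parts
        stats["frames"] += 1
        times[int(number)] = Decimal(time)
        if src == combox_ip and FCS_NOTE in info:
            stats["fcs_bad_from_combox"] += 1
        if "Retransmission" in info:  # Wireshark tags these in the Info column
            stats["retransmissions"] += 1
    return stats, times

capture = [
    "28 9.905774 192.168.0.198 192.168.0.197 TCP 64 80 → 53394 [FIN, ACK] "
    "Seq=323 Ack=518 Win=3580 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]",
    "29 9.905810 192.168.0.197 192.168.0.198 TCP 54 53394 → 80 [ACK] "
    "Seq=518 Ack=324 Win=63918 Len=0",
]

stats, times = summarize_capture(capture, combox_ip="192.168.0.198")
print(stats)                 # {'frames': 2, 'fcs_bad_from_combox': 1, 'retransmissions': 0}
print(times[29] - times[28]) # 0.000036
```

The last line reproduces the 0.000036 second gap between the combox's FIN, ACK (frame 28) and the laptop's ACK (frame 29). Timestamps are kept as `Decimal` so the subtraction doesn't pick up floating-point noise.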

Dusty · August 2018

Estragon said:

Do you by any chance have an old-school dumb hub?

What I'm thinking is to get a non-working client, the combox, and a working client all on the same collision domain (one switched router port). The working client box would do the Wireshark sniffing, and should see packets between the non-working client and the combox with no router/switch in between. Running the sniffer on a box separate from the send/receive boxes mostly eliminates issues (for diagnostic purposes) around offloading, in which things like checksums are filled in by the network adapter after the point where a sniffer on the sending box captures the packet. A third-party box should just see traffic on the segment, not anything internal to the boxes being tested.

If so, put all 3 devices on the hub connected to one (likely switched) port on the router. The router should leave traffic between boxes on the hub on that port, and likely not mess with it.

No, I never had a dumb hub. Only unmanaged switches.

Dusty · August 2018

Estragon said:

It isn't actually as much of a stretch as you might think. From what I've observed, much of the internet only works because of a principle similar to the "swiss cheese" approach to transportation safety.

In transportation safety, every "slice" has holes (features/tweaks/bugs), but stack enough slices, the holes all get filled, and the system works. If you remove or change a slice, unexpected holes can appear.

Applying this to your processor theory: these processors have been out for some time, and neither internal nor real-world testing turned up a big hole. There have to be enough of these processors used with enough comboxes that, if the hole opened with that change alone, it would be evident by now. That implies there's a tertiary (or more) slice opening the hole.

In other words, there could (for example) be a change (eg. ordering through multiple cpu cores) in threading that occurs only rarely in some code under certain circumstances, but in which some TCP implementations, in certain circumstances, rely on an obsolete/newer RFC, which the combox can't deal with.

Wow. When did the Combox first come out? 2013? That could explain why the Combox is having communication issues with the Skylake i7-6820HK (launched Q3 2015), the Kaby Lake i7-7820HK (launched Q1 2017), and the Haswell Extreme i7-5960X (Q3 2014).

But the latest firmware for the combox was Jan 2018, so I would hope that any "swiss cheese holes" by newer CPUs would have been detected by someone at Schneider by now.

The MSI laptop that works with the combox has a Broadwell CPU (i7-5950HQ) that came out 6 months after the Haswell-E. And every computer that I have with a CPU older than the Haswell-E works with the combox.

Dave Angelini · August 2018

We beta tested in 2011 and shortly after. The latest firmware will have all the issues that are addressed. I did not see anything about swiss cheese

If you can sum this up in a couple short paragraphs which include what support has said and a case number I will send it up to Engineering in Barnaby. No guarantee as everyone is out on vacation this time of year. You can e-mail me if you want

For my clients I have a guru who I pay to take care of the few problems over the years. I would much rather go to the Dentist than go deeper into this than a quick-start manual.

Much nicer outside for me!

Estragon · August 2018

If it was simply an issue with recent cpus and the combox, I suspect Dave is right, and the issue would have been addressed in a firmware update.

That it hasn't suggests there's a third (or more) factor causing the issue for you, but occurring together rarely enough that it hasn't been recognized more widely as an issue. The third factor could be as simple as a local source of electrical noise, or some really complex series of factors.

If Schneider engineering could duplicate the issue, that would certainly help.

Dusty · August 2018

Dave Angelini said:

We beta tested in 2011 and shortly after. The latest firmware will have all the issues that are addressed. I did not see anything about swiss cheese

If you can sum this up in a couple short paragraphs which include what support has said and a case number I will send it up to Engineering in Barnaby. No guarantee as everyone is out on vacation this time of year. You can e-mail me if you want

For my clients I have a guru who I pay to take care of the few problems over the years. I would much rather go to the Dentist than go deeper into this than a quick-start manual. Much nicer outside for me!

Thank you Dave. I've sent an email to Nathanael at Schneider Solar Support, and attached copies of the Wireshark files of the MSI laptop that communicates with the combox, as well as a Wireshark file of one of the Alienware laptops that won't.

The case numbers are: 51030870 and 51138987.

I would be happy to email you a copy with the WireShark file attachments. Is there a way to PM it to you? Never mind...I sent it to the email address at the bottom of your tagline. I hope that's okay.

Thank you again.

Dusty · August 2018

Estragon said:

If it was simply an issue with recent cpus and the combox, I suspect Dave is right, and the issue would have been addressed in a firmware update.

That it hasn't suggests there's a third (or more) factor causing the issue for you, but occurring together rarely enough that it hasn't been recognized more widely as an issue. The third factor could be as simple as a local source of electrical noise, or some really complex series of factors.

If Schneider engineering could duplicate the issue, that would certainly help.

I agree with you, but I've been able to duplicate it on so many different Intel platforms (Alienware and non-Alienware now), with different comboxes and now 3 different routers, that I don't believe it's a hardware issue on my end. Why it would be "plug and play" with my older computers and not with any of my newest ones is a mystery.

Estragon · August 2018

Hopefully whatever it is will turn up if/when Schneider tries to duplicate, and it isn't something in your specific environment (which might not turn up trying to duplicate elsewhere).

Dusty · August 2018

Estragon said:

Hopefully whatever it is will turn up if/when Schneider tries to duplicate, and it isn't something in your specific environment (which might not turn up trying to duplicate elsewhere).

I really hope so, because I've changed all the equipment in my environment several times (3 routers, 2 comboxes, 10 computers) with the same result.

Dusty · August 2018

Just successfully tested an Asus gaming laptop running Windows 10 Pro 64-bit with a mobile Haswell i7-4870HQ CPU (launched by Intel in Q3 2014) on my system.

Dusty · August 2018

Estragon said:

Some context to my thinking...

The issue occurs with both wireless and wired, suggesting it's likely not a hardware issue per se.

With the same adapter in different machines, one of which works, it might be worth checking to see if the driver versions are identical. I doubt this is the issue, but worth checking on the remote chance the alienware uses some sort of tweaked version.

Having tried two different routers, it's unlikely the router is mangling flow control in packets, but if they have any sort of QOS setting, it should be disabled for now.

The devices are communicating, albeit with errors, so not likely a routing issue (bad subnetting, etc.). The ARP response suggests proper MAC address to IP address.

You've run Win10pro okay on a non-alienware machine successfully, which suggests either it's not an OS level issue, or the alienware boxes are doing (or not doing) something different at that level.

The "rules" around networking generally, and TCP in particular (the layer at which the issue appears to be), are set out in RFCs (Requests for Comments). The thing is, because of the history of internet development, these "rules" really aren't. They become "rules" only if and to the extent they get adopted in the real world. There's RFC 1149 (https://tools.ietf.org/html/rfc1149), "A Standard for the Transmission of IP Datagrams on Avian Carriers". This one wasn't widely adopted, being a suggested method for using carrier pigeons.

The RFCs use terms like "Must", "Should", and "May" in describing various parts of a protocol. Companies can and do implement the protocol with minor differences, which often leads to good things, but sometimes to incompatibilities.

I'll look into the combox sequence thing a bit more, but on its face it's not surprising. The combox will have much less memory and processor resources than a full-on computer (e.g. the small Win=4096 in the SYN, ACK response). If RAM buffers fill, for example, it can't take more until the queue gets processed. TCP has means to deal with this, but they're apparently not working properly in this case. When it goes wrong, the problem can compound exponentially. A well-known denial of service attack exploits this.

So that you see the sequence message on both computers isn't surprising. The issue is likely in how the alienware is responding. Using a human example, say I say "Hi there" to you just as a loud truck goes by. You could respond with "Can you repeat that", and the conversation goes from there. If instead, your reply was "Parlez vous francais?", and I don't speak french, we have a problem. I might repeat the Hi louder, and you might shout louder and more often, to the point one of us gives up and walks away. That's sort of what I think may be happening here.

I believe the CWR in the AW(AlienWare) SYN packet suggests it doesn't see network congestion, and ECN offers a way of handling dropped packets that the combox may not support. Not necessarily the problem, but it could be if AW insists on using it.

Success!!! Here's the "Ah-Hah!" moment you've all been waiting for.

@Estragon, you hit it when you said, "I believe the CWR in the AW(AlienWare) SYN packet suggests it doesn't see network congestion, and ECN offers a way of handling dropped packets that the combox may not support. Not necessarily the problem, but it could be if AW insists on using it."

Now, I've never monkeyed around with those settings before (and hope to never again!), but I searched the Internet for how to enable/disable ECN in Windows 10, since on the MSI laptop that communicates properly with the Combox, Wireshark doesn't show ECN or CWR flags in the TCP lines to the combox.
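For reference, the difference shows up in the flags byte of the very first SYN. The sketch below decodes that byte using the standard TCP flag bit positions, with ECE and CWR as defined in RFC 3168, where a SYN carrying both is a request to negotiate ECN. The helper names are made up for illustration, and the 0xC2 value is an assumed example of a SYN with ECE and CWR set, not a byte copied from the Alienware capture.

```python
# Bit positions of the TCP header flags (RFC 793, plus ECE/CWR from RFC 3168).
TCP_FLAGS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
    "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80,
}

def decode_flags(flag_byte):
    """Return the names of the flags set in a TCP flags byte."""
    return [name for name, bit in TCP_FLAGS.items() if flag_byte & bit]

def is_ecn_setup_syn(flag_byte):
    """True for an initial SYN that asks for ECN: SYN + ECE + CWR, no ACK."""
    wanted = TCP_FLAGS["SYN"] | TCP_FLAGS["ECE"] | TCP_FLAGS["CWR"]
    return (flag_byte & (wanted | TCP_FLAGS["ACK"])) == wanted

plain_syn = 0x02  # SYN only, like the MSI laptop's [SYN] above
ecn_syn = 0xC2    # assumed example: SYN with ECE and CWR also set

print(decode_flags(plain_syn), is_ecn_setup_syn(plain_syn))  # ['SYN'] False
print(decode_flags(ecn_syn), is_ecn_setup_syn(ecn_syn))      # ['SYN', 'ECE', 'CWR'] True
```

Depending on the Wireshark version, the ECE bit may be labeled "ECN" or "ECE" in the Info column.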

I've only tried this on one machine so far--an Alienware Desktop--that wouldn't communicate with the combox:

Under Windows PowerShell (Admin), I typed:

netsh int tcp reset

I'm not sure why any of those TCP parameters would not have been default, but after doing this, I was able to successfully log onto the combox with no sluggishness, and no disconnects. Eureka!

I'll have to power up the other machines that also wouldn't work with the combox and run netsh int tcp reset on them, but I have high hopes that's the problem on all of them.

Thanks, EVERYONE, for offering suggestions. Really, it was the only reason I persevered through this very frustrating problem. I love this forum!!!

Estragon · August 2018

Excellent! A head scratcher for sure.

Dusty · August 2018

This is a good article that explains what I experienced on 5 of my computers when having problems communicating with the combox:

https://community.rackspace.com/products/f/public-cloud-forum/4848/disabling-ecn-explicit-congestion-notification-on-windows-servers-having-network-issues

This says that ECN has been set to "enabled" since Windows Server 2012. I'm not sure why only a handful of my computers had this switch enabled in the OS, but it would have been difficult for Schneider to detect this during their firmware testing--especially if there were only a couple people testing it, and ECN was disabled on their system's OS while testing.

On the second computer (the KabyLake Alienware laptop), I didn't reset everything to default like I did on the first one. This time, I only disabled ECN by using the command:

netsh int tcp set global ecncapability=disabled

This worked as well.
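To check which machines have the setting turned on before changing anything, `netsh int tcp show global` prints the current TCP global parameters, including an "ECN Capability" row. Here is a small Python sketch that pulls that value out of the text. The function name is made up, the sample below is a trimmed illustration rather than output copied from one of my machines, and the row label is assumed to be the English-language one.

```python
import re

def ecn_capability(netsh_output):
    """Extract the ECN Capability value from `netsh int tcp show global` text."""
    match = re.search(
        r"^\s*ECN Capability\s*:\s*(\S+)",
        netsh_output,
        re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1).lower() if match else None

# Illustrative sample only; real output has more rows and varies by Windows build.
sample = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
ECN Capability                      : enabled
RFC 1323 Timestamps                 : disabled
"""

print(ecn_capability(sample))  # enabled
```

On a Windows machine, the text could come from `subprocess.run(["netsh", "int", "tcp", "show", "global"], capture_output=True, text=True).stdout` instead of the sample string.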

I'll send this on to Schneider and close my case numbers.

Dusty · August 2018

All 5 computers that wouldn't talk to the combox are now working. I ended up using the reset command first and then followed up by disabling ECN. It seemed to me that the combox was just a little bit "snappier" in response after manually disabling ECN.

Dusty · August 2018

stmar said:

I have nothing to add but this is one of the most interesting threads I have followed. And to think I was looking at Alienware machines. Wonder if there are any Alienware users on this forum that are running the Combox? Keep us informed and sure hope you find a solution.

@stmar Just so you know, I found the problem and it wasn't related to the Alienware brand. Just a TCP ECN setting in Windows that needed to be disabled. So it's no smear on Alienware.

stmar · August 2018

Thanks, good that you found the issue. Your persistence paid off and will probably help others.

Dusty · August 2018

stmar said:

Thanks, good that you found the issue. Your persistence paid off and will probably help others.

Thanks. I learned a lot. The Schneider tech I sent the problem description and Wireshark files to was very interested in the problem, so maybe there will be a firmware upgrade in the future that supports TCP ECN. But what makes me happy the most is that it wasn't an Alienware hardware/firmware issue.

Dusty · August 2018

Dusty said:

This is a good article that explains what I experienced on 5 of my computers when having problems communicating with the combox:

https://community.rackspace.com/products/f/public-cloud-forum/4848/disabling-ecn-explicit-congestion-notification-on-windows-servers-having-network-issues

This says that ECN has been set to "enabled" since Windows Server 2012. I'm not sure why only a handful of my computers had this switch enabled in the OS, but it would have been difficult for Schneider to detect this during their firmware testing--especially if there were only a couple people testing it, and ECN was disabled on their system's OS while testing.

On the second computer (the KabyLake Alienware laptop), I didn't reset everything to default like I did on the first one. This time, I only disabled ECN by using the command:

netsh int tcp set global ecncapability=disabled

This worked as well.

I'll send this on to Schneider and close my case numbers.

I think I know why the older machines didn't have a problem with ECN. All of those machines originally had the Windows 7 OS, so when they were updated to Windows 10, it's very possible that the ECN function was never enabled. All of my newer Windows 10 Pro machines came with the OS already installed and therefore had ECN enabled by default. The MSI laptop also came with Windows 10, but the Home Edition, not Windows 10 Pro, so the default settings might be different on the Home Edition, which could explain why that OS worked with the combox right away. But the ASUS laptop also had Windows 10 Pro factory-installed, and it worked with the combox fine. Go figure.

Although that seems plausible, I'm not sure why the NETSH INT TCP RESET command, which sets everything back to default, worked, since that would also have enabled ECN. So there could be more going on in the Windows 10 Pro TCP settings than I realize, especially since manually disabling TCP ECN in the OS also worked.

But I'm very happy that all the systems are now working with the combox like they should. And I learned a whole lot in the process. I also got a very humbling lesson about how little I know about TCP/IP.
