Tuesday, October 16, 2012

Exploring the Cat3560 System MTU Command

A friend on Facebook asked the following question in response to my first post on QinQ tagging:
Hey Jay, just curious if you can explain a bit further, but if there are two 802.1q tags, taking up 4 bytes each, should the system MTU be set to 1508? Also, say you have two customers each using VLAN 12 and they both want to be trunked across the service provider network, you would have to use different "access" VLAN's for the QinQ on the the customer facing interfaces, right? Say VLAN 99 for customer 1 and VLAN 100 for customer 2?
If I can answer the second question first, yes, that is absolutely correct.

For the first question though my first inclination was to reply saying that the system mtu command is used to specify the layer 3 payload, not the layer 2 payload (since the default is 1500 bytes).  This line of thought was due to the logic of the L2 MTU being 1518 (1500 plus an 18 byte Ethernet header/trailer) and thinking that switches allow an extra 4 bytes (1522 total) when a Dot 1Q tag is added.

But then I thought about this and the L3 MTU line of thought doesn't add up...  If it's the L3 MTU why are we adjusting it for the extra Dot 1Q tag at layer 2?  And if the system mtu command is for layer 2 then why is it only 1500 (1504 for QinQ) and not 1518, 1522, or even 1526 for a double tagged frame?
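
To put numbers on those candidate values, here is a quick sketch in plain Python (no Scapy needed). The function name and constants are my own, purely for illustration; the only facts baked in are the 14 byte Ethernet header, the 4 byte FCS trailer, and the 4 bytes each Dot1Q tag adds.

```python
ETH_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
ETH_FCS = 4      # frame check sequence (the Ethernet trailer)
DOT1Q_TAG = 4    # TPID (2) + TCI (2)


def wire_size(l3_bytes, tags=0):
    """On-the-wire frame size for an L3 packet carrying `tags` Dot1Q tags."""
    return ETH_HEADER + tags * DOT1Q_TAG + l3_bytes + ETH_FCS


# A full 1500 byte L3 packet with zero, one, and two tags.
for tags in (0, 1, 2):
    print(tags, "tag(s):", wire_size(1500, tags), "bytes")
```

That gives 1518, 1522, and 1526 bytes, none of which look anything like the 1500 or 1504 we actually type into the command.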

Quick!  To the DocCD!! There isn't a moment to lose!

I think that this one needs to start with the command reference.


The abstract is fairly innocuous sounding.
Use the system mtu global configuration command to set the maximum packet size or maximum transmission unit (MTU) size for Gigabit Ethernet ports, for routed ports, or for Fast Ethernet (10/100) ports
And if we were to look quickly at the options we have in IOS help, we see something that lines up pretty well with that description:

Sw1(config)#system mtu ?
  <1500-1998>  MTU size in bytes
  jumbo        Set Jumbo MTU value for GigabitEthernet or TenGigabitEthernet
  routing      Set the Routing MTU for the system

But what exactly does this modify?

Duh!  The MTU!

But how?

The setup today is simple.

The switch is a 3560, the receiver is a Windows host running Wireshark to verify that each packet arrives, and the sender is my Raspberry Pi with Scapy installed. (If you're unfamiliar with Scapy, I encourage you to visit the Scapy site.  I'll also drop a couple of links to Stretch's Intro to Scapy and CheatSheet on the subject.) The interface facing the RaspPi is set as a trunk so I can send tagged frames, and the interface facing the host is an access port in VLAN 2.

I'm going to start with the default system MTU of 1500 bytes, and I'm going to generate a 1500 byte packet (1518 bytes including the Ethernet header and trailer, or 1522 bytes once a Dot1Q tag is added) and send it to Wireshark.

Here's how I'm generating this in Scapy:

>>> send(Dot1Q(vlan=2)/IP(dst="")/Raw(RandString(size=1480)))
Sent 1 packets.

Fairly straightforward.  I've added a Dot1Q tag for VLAN 2, an IP destination, and a random string of 1480 bytes.

Ethernet header/trailer = 18 bytes
Dot1Q tag = 4 bytes
IP header = 20 bytes
Random string = 1480 bytes
Total = 1522 bytes
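
If you want to see where those bytes physically sit, here is a rough sketch that packs the frame by hand with Python's struct module instead of Scapy. The helper names, the placeholder MAC addresses, and the all-zero stand-in for the IP packet are my own assumptions for illustration. It also assumes every tag uses TPID 0x8100, which is what stacking Dot1Q layers produces here.

```python
import struct

FCS_LEN = 4  # the trailer is appended by the NIC, so we only count it


def dot1q_tag(vlan, pcp=0, dei=0):
    """Pack one 802.1Q tag: TPID 0x8100 followed by the 16 bit TCI."""
    if not 0 <= vlan <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp << 13) | (dei << 12) | vlan
    return struct.pack("!HH", 0x8100, tci)


def build_frame(vlans, l3_packet):
    """Ethernet header, then one tag per VLAN (outermost first), then the packet."""
    dst = b"\xff" * 6            # placeholder destination MAC
    src = b"\x02" + b"\x00" * 5  # placeholder source MAC
    tags = b"".join(dot1q_tag(v) for v in vlans)
    ethertype = struct.pack("!H", 0x0800)  # IPv4
    return dst + src + tags + ethertype + l3_packet


packet = bytes(1500)  # stand-in for the 20 byte IP header + 1480 byte string
frame = build_frame([2], packet)
print(dot1q_tag(2).hex())    # the four tag bytes
print(len(frame) + FCS_LEN)  # size on the wire
```

Passing [2, 99] instead of [2] stacks a second tag and grows the frame by another 4 bytes.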

So, did our "full size" frame make it to Wireshark?  Yes, it did.

OK, that works.  It should... So that's a good thing.

But why is it only 1514 bytes?  Well, first the switch removes the 4 byte Dot1Q tag when it forwards the frame out the access port, which takes us from 1522 down to 1518.  The rest is a hardware thing.  My NIC is stripping off the 4 byte Ethernet trailer (the FCS) and not passing it up to be seen by Wireshark.  I haven't yet figured out a way to force my NIC to pass the trailer on, so if you know a way let me know.  I'm sure that like the VLAN tag problem it may be hardware dependent.

To compare, I am going to quickly make the RaspPi interface an access port in VLAN 2, and send an untagged frame at 1518 bytes to make sure that works as well.

>>> send(IP(dst="")/Raw(RandString(size=1480)))
Sent 1 packets.

Similar to the last time, except no VLAN tag.

I realize that there isn't any way to tell these two frames apart...  You really have to take my word here that when I send them they are being received.  After all, why would I lie?

Trust me ;)

While we have the RaspPi facing port in access mode, let's see what happens if we send a frame that's larger than 1518 bytes.  Or more to the point, a packet that's larger than 1500 bytes, our supposed "system MTU".  I'm only going to increase it by one byte.

>>> send(IP(dst="")/Raw(RandString(size=1481)))
Sent 1 packets.

Well... I'm not really sure how to show you nothing.  I could do a screen shot of a blank Wireshark window, but that seems kind of silly.  Let's just say the packet didn't make it, and call it a day shall we?  Good.

This lines up with our "system MTU" in that we have now unsuccessfully sent a frame that exceeds that value, which I think is to be expected.  This also lines up with my initial assumption that we were setting a layer 3 MTU value for the switch.  But what happens if we start adding in some Dot1Q tags?  Can we still get our frames through?

At this point I'm going to set my RaspPi/Scapy facing port back to trunking mode.  I'm also going to set the native VLAN to 2.  This combination will allow me to send both tagged and untagged frames to my Wireshark host without having to change it back and forth any further (please see Chris Marget's Native VLAN post for more info on this sneaky bit of work).

Baseline time.  Here are two frames, one untagged, one tagged, to set the stage.

>>> send(IP(dst="")/Raw(RandString(size=1480)))
Sent 1 packets.
>>> send(Dot1Q(vlan=2)/IP(dst="")/Raw(RandString(size=1480)))
Sent 1 packets.

And a double whammy shows up in Wireshark.

You'll have to ignore the "hop-by-hop" interpretation Wireshark gives.  It's doing its best to dissect my frames, but since I'm crafting them manually it isn't completely correct.

And for those of you keeping track, that was a 1518 byte frame, and a 1522 byte frame.  Both worked just fine with our "system MTU" of 1500 bytes.

At this point my initial assumption is holding fast...

I think the next step is to send some tagged frames that exceed our 1500 byte value and see what happens. Again, I'm only going to increase it by one byte.

>>> send(Dot1Q(vlan=2)/IP(dst="")/Raw(RandString(size=1481)))
Sent 1 packets.

And...  A whole lot of nothing!

This is also in line with my initial assumption.  I just sent a frame with 1523 bytes, and it failed.  But, I added the extra byte in the payload of layer 3, which means I sent 1501 bytes at layer 3 (including the IP header).

But am I really as smart as I think I am?

Let's instead keep our layer 3 at 1500 bytes and add another Dot1Q tag to the frame.

>>> send(Dot1Q(vlan=2)/Dot1Q(vlan=99)/IP(dst="")/Raw(RandString(size=1480)))
Sent 1 packets.

This frame, unfortunately, does not make it to the destination either.  This now confuses things, and means that I'm nowhere near as smart as I think I am (disclaimer: I already knew that I wasn't as smart as I think I am. This is proven on a regular basis, but I still keep thinking I'm smart).

Let's now change the system MTU, and see if we can get both our 1501 byte layer 3 packet and our double tagged 1526 byte layer 2 frame through this bloody switch.

Sw1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
Sw1(config)#system mtu 1504
Changes to the system MTU will not take effect until the next reload is done
Sw1(config)#do reload

System configuration has been modified. Save? [yes/no]: y
Building configuration...
Proceed with reload? [confirm]

*Mar  1 01:14:29.114: %SYS-5-RELOAD: Reload requested by console. Reload Reason: Reload command.

Now, let's try both of those previous tests again shall we?

>>> send(Dot1Q(vlan=2)/IP(dst="")/Raw(RandString(size=1481)))
Sent 1 packets.
>>> send(Dot1Q(vlan=2)/Dot1Q(vlan=99)/IP(dst="")/Raw(RandString(size=1480)))
Sent 1 packets.

And the results please...

Well well well... This is an interesting turn of events, wouldn't you say?  Both frames made it through this time. You'll even notice that the second Dot1Q tag I added in the second frame made it through (Yes kids, this is how we do a VLAN hopping attack!).

At the end of the day where the hell does all of this leave us?  Well I think we can draw the following conclusions:
  • The system mtu command doesn't directly affect either layer 2 or layer 3
  • The system mtu command affects the entire payload after the first Ethernet/Dot1Q tag combination (this blog post doesn't take into account multiple Ethernet headers existing in a frame)
So back to the original question: yes, the QinQinQ switch that I had set up in that earlier post most certainly should have been set to a system MTU of 1508 for proper operation.  The two edge switches were fine at their system MTU of 1504.
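
As a sanity check on that second conclusion, here is a small sketch that encodes it as a rule and replays every test from this post against it. To be clear, this is only my model of what the switch appeared to do in these experiments, not anything taken from Cisco documentation, and the function names are made up for illustration.

```python
DOT1Q_TAG = 4  # bytes added by each Dot1Q tag


def counted_bytes(l3_bytes, tags):
    """Bytes the switch appears to count: everything after the first tag."""
    extra_tags = max(tags - 1, 0)
    return extra_tags * DOT1Q_TAG + l3_bytes


def forwards(system_mtu, l3_bytes, tags):
    """True if the model predicts the frame gets through."""
    return counted_bytes(l3_bytes, tags) <= system_mtu


# (system mtu, L3 bytes, Dot1Q tags, did it arrive in Wireshark?)
observations = [
    (1500, 1500, 1, True),   # tagged, 1480 byte string
    (1500, 1500, 0, True),   # untagged, 1480 byte string
    (1500, 1501, 0, False),  # untagged, 1481 byte string
    (1500, 1501, 1, False),  # tagged, 1481 byte string
    (1500, 1500, 2, False),  # double tagged, 1480 byte string
    (1504, 1501, 1, True),   # after the reload
    (1504, 1500, 2, True),   # after the reload
]

for mtu, l3_bytes, tags, arrived in observations:
    assert forwards(mtu, l3_bytes, tags) == arrived

print("model matches all", len(observations), "observations")
```

By the same rule, a full size packet carrying three tags counts as 1508 bytes, which is where the 1508 for the QinQinQ switch comes from.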