NSX-T East-West Data Path Part I

NSX is full of amazing features. However, if you do not understand the data path, it quickly becomes difficult to troubleshoot. In this post we’re going to analyze the east-west data path between two VMs on separate hosts and segments. To get a clear understanding, we will capture traffic at various points between both VMs. In addition, we will use some of the new NSX CLI commands that make our life a lot easier. Without further ado, let’s get started.

For this walkthrough we’ll use the following topology. We will trace traffic from a VM on the Green Segment to a VM on the Blue Segment. Note that no Service Router (SR) is configured. In this example all routing is done by the Distributed Routers (DR) at the host level. (We will cover the SR packet walk in part II.)

Source VM: Cent-01
Source IP: 192.168.100.10
Segment: Green-Segment
Host: ESX-04 (TEPs 192.168.141.13, 192.168.141.14)
Destination VM: Cent-02
Destination IP: 192.168.200.25
Segment: Blue-Segment
Host: ESX-05 (TEPs 192.168.141.11, 192.168.141.12)

As always, the first step is to identify the port the source VM is connected to. To do this we run net-stats on host ESX-04. Here we see Cent-01 is connected to port 67108882.

[root@esx-04:~] net-stats -l | grep Cent
67108882            5       9 DvsPortset-0     00:50:56:9d:8d:ad  Cent-01.eth0
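If you need to do this lookup repeatedly, the port ID can be pulled out of saved `net-stats -l` output with a few lines of Python. This is only a sketch: the function name `find_port_ids` is made up for illustration, and it assumes the port number is the first column and the client name is the last column with no spaces in it, as in the rows shown in this post.

```python
def find_port_ids(net_stats_output, name_fragment):
    """Map client name -> port ID for rows whose client name contains name_fragment."""
    ports = {}
    for line in net_stats_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric port number; the client name is the
        # last column (assumed to contain no spaces).
        if len(fields) >= 6 and fields[0].isdigit() and name_fragment in fields[-1]:
            ports[fields[-1]] = int(fields[0])
    return ports


# Rows copied from the net-stats output shown in this post.
SAMPLE = """\
67108882            5       9 DvsPortset-0     00:50:56:9d:8d:ad  Cent-01.eth0
67108880            0       0 DvsPortset-0     02:50:56:56:44:52  vdr-vdrPort
"""

print(find_port_ids(SAMPLE, "Cent"))
```

The returned port ID is the value passed to `pktcap-uw --switchport`.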

Capturing on the VM switchport, we see the ICMP request.

[root@esx-04:~] pktcap-uw --switchport 67108882 --dir 0 --stage 0 -o - | tcpdump-uw -enr -
20:04:00.141541 00:50:56:9d:8d:ad > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3761, seq 1, length 64
20:04:01.142973 00:50:56:9d:8d:ad > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3761, seq 2, length 64

Although we cannot capture on the DR interfaces within the host, we can review the forwarding table of each to understand the traffic flow. Our configuration has 3 routers: Green T1, Blue T1 and a Gateway T0. Running the following command returns all 3 logical routers that reside on the host.

[root@esx-04:~] nsxcli -c get logical-routers
Mon Apr 26 2021 UTC 21:07:42.655
                                  Logical Routers Summary
 ------------------------------------------------------------------------------------------
               VDR UUID                LIF num  Route num  Max Neighbors  Current Neighbors
 7de6917b-5261-4e1a-8f37-09ea0a4a0503     3         4          50000              12
 5cd6608e-ac7d-4f67-8bad-cc9d3c9133dc     3         3          50000              17
 bec2152a-8257-4617-8d53-265547d7c191     3       65530        50000              12

As an example, here’s the forwarding table for the Green logical router. As can be seen, we have 3 routes and a default route. Just to note, 100.64.0.0/16 and 169.254.0.0/28 are preconfigured ranges: T1 to T0 uses 100.64.0.0/16 (Routing Link) and DR to SR uses 169.254.0.0/28 (Routing Backplane). We have a default gateway of 100.64.32.2, so we know that our next hop is the T0 DR.

[root@esx-04:~] nsxcli -c get logical-router bec2152a-8257-4617-8d53-265547d7c191 forwarding ipv4
Mon Apr 26 2021 UTC 21:17:09.561
                                   Logical Routers Forwarding Table - IPv4
--------------------------------------------------------------------------------------------------------------
Flags Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
[H: Host], [R: Reject], [B: Blackhole], [F: Soft Flush], [E: ECMP]

                   Network                               Gateway                Type               Interface UUID
==============================================================================================================
0.0.0.0/0                                              100.64.32.2              UGE     39306823-7038-48cd-8f88-7885ebef7465
100.64.32.2/31                                           0.0.0.0                UCI     39306823-7038-48cd-8f88-7885ebef7465
169.254.0.0/28                                           0.0.0.0                UCI     25385a1a-ca25-42d3-be3c-a704172b31bb
192.168.100.0/24                          
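To see why the default route is chosen for our destination, here is a small longest-prefix-match sketch in Python using the routes from the table above. It is an illustration of the lookup logic only, not how the DR is implemented. The `ROUTES` list and `lookup` function are names made up for this example, and because the 192.168.100.0/24 row is cut off in the output above, its gateway is left as `None`.

```python
import ipaddress

# Routes copied from the Green T1 forwarding table above.
# The 192.168.100.0/24 row is truncated in the capture, so its gateway
# is unknown here and left as None.
ROUTES = [
    ("0.0.0.0/0", "100.64.32.2"),
    ("100.64.32.2/31", "0.0.0.0"),
    ("169.254.0.0/28", "0.0.0.0"),
    ("192.168.100.0/24", None),
]


def lookup(destination, routes=ROUTES):
    """Return (network, gateway) for the most specific route covering destination."""
    dest = ipaddress.ip_address(destination)
    candidates = []
    for net, gateway in routes:
        network = ipaddress.ip_network(net)
        if dest in network:
            candidates.append((network, gateway))
    if not candidates:
        return None
    best_network, best_gateway = max(candidates, key=lambda item: item[0].prefixlen)
    return str(best_network), best_gateway


print(lookup("192.168.200.25"))  # only the default route covers Cent-02
print(lookup("192.168.100.10"))  # Cent-01 is on the connected /24
```

Since 192.168.200.25 matches nothing more specific on the Green T1, the lookup falls through to 0.0.0.0/0 with gateway 100.64.32.2.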

Next we identify the VDR switchport ID.

[root@esx-04:~] net-stats -l | grep vdr
67108880            0       0 DvsPortset-0     02:50:56:56:44:52  vdr-vdrPort

On the VDR port we see the ICMP request. In this example, traffic captured at this point has already been routed onto the Blue Segment.

[root@esx-04:~] pktcap-uw --switchport 67108880 --dir 0 --stage 0 -o - | tcpdump-uw -enr -
20:22:14.767646 02:50:56:56:44:55 > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3761, seq 1094, length 64
20:22:14.767662 02:50:56:56:44:52 > 02:50:56:56:44:55, ethertype IPv4 (0x0800), length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3761, seq 1095, length 64

As per our topology, the destination VM resides on another host. As a result, the ICMP request packet will be encapsulated using the Geneve protocol and sent across the physical network. Before capturing traffic on the physical link we must first identify the NIC in use. To do this we obtain the VM’s world ID using the following command.

[root@esx-04:~] esxcli vm process list | grep -A 1 -m 1 Cent
Cent-01
World ID: 650442

The world ID is then used to identify the NIC.

[root@esx-04:~] esxcli network vm port list -w 650442
   Port ID: 67108882
   vSwitch: RegionA01-VDS7
   Portgroup:
   DVPort ID: 11f7d066-9dfc-4d5e-ada8-fd2d4d64508a
   MAC Address: 00:50:56:9d:8d:ad
   IP Address: 0.0.0.0
   Team Uplink: vmnic1 <---------- VM traffic is flowing over vmnic1
   Uplink Port ID: 2214592518
   Active Filters: vmware-sfw

Capturing on vmnic1, we see the ICMP request packet egressing the host. Just to note, --stage must be set to 0 here in order to see the traffic. With --stage set to 1 the traffic will already be encapsulated.

[root@esx-04:~] pktcap-uw --uplink vmnic1 --dir 1 --stage 0 --ip 192.168.100.10 -o - | tcpdump-uw -enr -
20:50:49.529078 02:50:56:56:44:52 > 00:50:56:9d:c7:59, ethertype IPv4 (0x0800), length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3761, seq 2806, length 64
20:50:50.531200 02:50:56:56:44:52 > 00:50:56:9d:c7:59, ethertype IPv4 (0x0800), length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3761, seq 2807, length 64
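Comparing this capture with the first one taken on the VM switchport shows what routing did to the frame: the IP addresses are unchanged, while both MAC addresses have been rewritten. The Python sketch below makes that comparison explicit. The regular expression and the `parse_frame` name are my own and only cover the `tcpdump-uw -e` line format shown in this post.

```python
import re

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
IPV4 = r"\d+\.\d+\.\d+\.\d+"

# Matches the link-level and IP part of a 'tcpdump -e' line for plain IPv4.
FRAME = re.compile(
    rf"(?P<src_mac>{MAC}) > (?P<dst_mac>{MAC}), ethertype IPv4 \(0x0800\), "
    rf"length \d+: (?P<src_ip>{IPV4}) > (?P<dst_ip>{IPV4}):"
)


def parse_frame(line):
    """Return src/dst MAC and IP from one capture line, or None if it does not match."""
    match = FRAME.search(line)
    return match.groupdict() if match else None


# First line: capture on the Cent-01 switchport. Second line: capture on vmnic1.
AT_VM = ("20:04:00.141541 00:50:56:9d:8d:ad > 02:50:56:56:44:52, ethertype IPv4 (0x0800), "
         "length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3761, seq 1, length 64")
AT_UPLINK = ("20:50:49.529078 02:50:56:56:44:52 > 00:50:56:9d:c7:59, ethertype IPv4 (0x0800), "
             "length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3761, seq 2806, length 64")

before, after = parse_frame(AT_VM), parse_frame(AT_UPLINK)
changed = sorted(field for field in before if before[field] != after[field])
print(changed)  # ['dst_mac', 'src_mac']
```

After routing, the source MAC is the router MAC 02:50:56:56:44:52 and the destination MAC is 00:50:56:9d:c7:59, the same destination MAC seen later on the destination switchport on ESX-05.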

The packet traverses the physical network. On ESX-05 we see the ICMP packets ingressing the host on vmnic0. Just to note, trying to filter with the pktcap-uw --ip flag won’t work here, as it checks the IP address of the outer packet. To work around this we just pipe the output into grep. Not very efficient, but it’ll do the job.

[root@esx-05:~] pktcap-uw --uplink vmnic0 --dir 0 --stage 0 -o - | tcpdump-uw -enr - | grep -i "192.168.200.25"
20:50:50.684523 00:50:56:60:a9:7e > 00:50:56:69:06:e9, ethertype IPv4 (0x0800), length 156: 192.168.141.14.61859 > 192.168.141.12.6081: Geneve, Flags [C], vni 0x12007, proto TEB (0x6558), options [8 bytes]: 02:50:56:56:44:52 > 00:50:56:9d:c7:59, ethertype IPv4 (0x0800), length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3221, seq 792, length 64
20:50:50.986583 00:50:56:60:a9:7e > 00:50:56:69:06:e9, ethertype IPv4 (0x0800), length 156: 192.168.141.14.61859 > 192.168.141.12.6081: Geneve, Flags [C], vni 0x12007, proto TEB (0x6558), options [8 bytes]: 02:50:56:56:44:52 > 00:50:56:9d:c7:59, ethertype IPv4 (0x0800), length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3221, seq 793, length 64
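The outer header in this capture tells us which TEPs carried the packet and which overlay network it belongs to. As a rough sketch, the Python below pulls those fields out of one capture line and converts the hexadecimal VNI to decimal so it can be compared with the "Overlay VNI" values that `get logical-router <uuid> interfaces` prints. The regular expression and the `parse_geneve` name are assumptions for this example, and the TEP-to-host mapping is taken from the topology list at the top of the post.

```python
import re

# TEP addresses per host, from the topology at the top of this post.
TEP_OWNER = {
    "192.168.141.13": "ESX-04",
    "192.168.141.14": "ESX-04",
    "192.168.141.11": "ESX-05",
    "192.168.141.12": "ESX-05",
}

IPV4 = r"\d+\.\d+\.\d+\.\d+"
GENEVE = re.compile(
    rf"(?P<src>{IPV4})\.(?P<sport>\d+) > (?P<dst>{IPV4})\.(?P<dport>\d+): "
    r"Geneve, Flags \[\w*\], vni (?P<vni>0x[0-9a-f]+)"
)


def parse_geneve(line):
    """Extract the outer TEPs, UDP destination port and VNI from one capture line."""
    match = GENEVE.search(line)
    if match is None:
        return None
    return {
        "src_host": TEP_OWNER.get(match["src"], "unknown"),
        "dst_host": TEP_OWNER.get(match["dst"], "unknown"),
        "udp_dport": int(match["dport"]),
        "vni": int(match["vni"], 16),  # tcpdump prints the VNI in hex
    }


LINE = ("20:50:50.684523 00:50:56:60:a9:7e > 00:50:56:69:06:e9, ethertype IPv4 (0x0800), length 156: "
        "192.168.141.14.61859 > 192.168.141.12.6081: Geneve, Flags [C], vni 0x12007, proto TEB (0x6558), "
        "options [8 bytes]: 02:50:56:56:44:52 > 00:50:56:9d:c7:59, ethertype IPv4 (0x0800), length 98: "
        "192.168.100.10 > 192.168.200.25: ICMP echo request, id 3221, seq 792, length 64")

print(parse_geneve(LINE))
```

VNI 0x12007 is 73735 in decimal. That differs from the Green LIF’s Overlay VNI of 73734 shown at the end of this post, which is consistent with the packet having been routed off the Green Segment before it was encapsulated.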

Finally we hit the destination VM. Here we see the ICMP request on the switchport.

[root@esx-05:~] pktcap-uw --switchport 67108882 --dir 1 --stage 0 -o - | tcpdump-uw -enr -
20:50:53.713238 02:50:56:56:44:52 > 00:50:56:9d:c7:59, ethertype IPv4 (0x0800), length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3221, seq 1709, length 64
20:50:53.718375 02:50:56:56:44:52 > 00:50:56:9d:c7:59, ethertype IPv4 (0x0800), length 98: 192.168.100.10 > 192.168.200.25: ICMP echo request, id 3221, seq 1714, length 64

As with the ICMP request, the reply traffic is routed locally back onto the Green Segment before egressing ESX-05. To demonstrate this we capture on the VDR port on ESX-05. Here we see the ICMP reply traffic.

[root@esx-05:~] pktcap-uw --switchport 67108880 --dir 0 --stage 0 -o - | tcpdump-uw -enr -
20:54.337945 02:50:56:56:44:52 > 02:50:56:56:44:55, ethertype IPv4 (0x0800), length 98: 192.168.200.25 > 192.168.100.10: ICMP echo reply, id 3221, seq 2068, length 64
20:54:12.337 02:50:56:56:44:52 > 02:50:56:56:44:55, ethertype IPv4 (0x0800), length 98: 192.168.200.25 > 192.168.100.10: ICMP echo reply, id 3221, seq 2067, length 64

We can see the reply traffic has a source MAC address of 02:50:56:56:44:52. A quick check on the Green T1 logical router shows that this MAC address belongs to the Green Segment’s default gateway. From here the packet is sent back across the physical network to the source VM (Cent-01).

[root@esx-05:~] nsxcli -c get logical-router bec2152a-8257-4617-8d53-265547d7c191 interfaces
Fri Apr 30 2021 UTC 20:37:04.051
                         Logical Router Interfaces
---------------------------------------------------------------------------
IPv6 DAD Status Legend:  [A: DAD_Sucess], [F: DAD_Duplicate], [T: DAD_Tentative], [U: DAD_Unavailable]

LIF UUID                 : 9c4ecdf6-e172-4d39-a9f4-7fb1ea980ecc
Mode                     : [b'Routing']
Overlay VNI              : 73734
IP/Mask                  : 192.168.100.1/24
Mac                      : 02:50:56:56:44:52
Connected DVS            : RegionA01-VDS7
Control plane enable     : True
Replication Mode         : 0.0.0.1
Multicast Routing        : [b'Enabled', b'Oper Down']
State                    : [b'Enabled']
Flags                    : 0x80388
DHCP relay               : Not enable
DAD-mode                 : ['LOOSE']
RA-mode                  : ['UNKNOWN']
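The same check can be scripted against saved interface output. The Python sketch below splits the output into one record per LIF, finds the LIF that owns a given MAC, and confirms that Cent-01 sits in that LIF’s subnet. `parse_lifs` is a hypothetical helper, and the sample contains only a few of the fields shown above.

```python
import ipaddress


def parse_lifs(output):
    """Split 'Key : Value' interface output into one dict per LIF."""
    lifs = []
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "LIF UUID":
            lifs.append({})  # each 'LIF UUID' line starts a new record
        if lifs:             # ignore header lines before the first LIF
            lifs[-1][key] = value
    return lifs


# A subset of the fields from the interface output shown in this post.
SAMPLE = """\
LIF UUID                 : 9c4ecdf6-e172-4d39-a9f4-7fb1ea980ecc
Overlay VNI              : 73734
IP/Mask                  : 192.168.100.1/24
Mac                      : 02:50:56:56:44:52
"""

lif = next(l for l in parse_lifs(SAMPLE) if l["Mac"] == "02:50:56:56:44:52")
gateway = ipaddress.ip_interface(lif["IP/Mask"])
print(gateway.ip, ipaddress.ip_address("192.168.100.10") in gateway.network)
```

With this sample the LIF address is 192.168.100.1 and Cent-01 (192.168.100.10) falls inside its /24, matching the Green Segment’s default gateway.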

This concludes part I. In part II we will do another packet walk, this time including the T1 Service Router.
