We had to investigate the operation of one of our OpenStack compute nodes because it was exhibiting some unusual behaviour. We quickly determined that there was some unexpected packet loss, and we had reason to believe that this could have been due to packet processing within the node. Investigating this problem required a deeper exploration of how packets are processed in the node, particularly the interplay between OVS bridges, Linux bridges and iptables. It turns out that this is rather complex, and clear, detailed information describing how it all fits together is not readily available. Here, we note what we learnt from this exploration.
There is some excellent information from Rackspace here (thanks for the pointer, Bruno!), some good, if slightly higher-level, content from Red Hat here, and nice troubleshooting documentation from Nick Jones here.
Our suspicion was that iptables was somehow causing packet loss, as some packets were received while others mysteriously disappeared. Consequently, we wanted to understand how iptables works in the context of an OpenStack compute node.
The specific mechanisms by which iptables is triggered in an OpenStack compute node are non-trivial. In the olden days, iptables was triggered when a packet arrived on any operating system interface and typically applied a fairly basic ruleset to it. In a modern compute node, the number of interfaces known to the system can be large (a few hundred is common, mostly the taps and bridges dedicated to VMs), with packets being passed between these interfaces via both OVS and Linux bridges. Also, packets passing through the system can be modified (the most obvious modification being encapsulation/decapsulation), meaning that the applicable iptables rules can differ as the packet moves through the node. Further, a look at iptables on the host (iptables --list) shows a large number of chain-based rules arising from neutron, which mostly implement security group functionality.
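To get a feel for this scale on a particular node, a small script can count the per-VM interfaces and the number of rules in each iptables chain. This is a minimal sketch rather than something we ran during the investigation. It assumes the device-name prefixes visible in the kern.log trace (tap, qbr, qvb), plus qvo, which is our assumption for the OVS end of the veth pair, and it parses the output of iptables -S rather than --list, because -S prints one "-A CHAIN ..." line per rule, which is easier to parse. The function names are hypothetical.

```python
from __future__ import annotations

import os
from collections import Counter

# Per-VM device prefixes: tap/qbr/qvb appear in the trace; "qvo" (the OVS
# side of the veth pair) is an assumption.
VM_PREFIXES = ("tap", "qbr", "qvb", "qvo")


def count_vm_interfaces(names: list[str]) -> Counter:
    """Count interface names by per-VM prefix; anything else counts as 'other'."""
    counts: Counter = Counter()
    for name in names:
        prefix = next((p for p in VM_PREFIXES if name.startswith(p)), "other")
        counts[prefix] += 1
    return counts


def rules_per_chain(iptables_s: str) -> Counter:
    """Count '-A CHAIN ...' lines in `iptables -S` output, keyed by chain name."""
    counts: Counter = Counter()
    for line in iptables_s.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "-A":
            counts[parts[1]] += 1
    return counts


if __name__ == "__main__":
    # /sys/class/net has one entry per network interface on a Linux host.
    if os.path.isdir("/sys/class/net"):
        print(count_vm_interfaces(os.listdir("/sys/class/net")))
```

On a compute node, the rule listing can be captured as root with subprocess.run(["iptables", "-S"], capture_output=True, text=True).stdout and passed to rules_per_chain. Calling most_common() on the result then shows which neutron chains hold the most rules.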
To understand a little more about what was going on, we used the basic iptables tracing capability, inserting a TRACE rule into the two chains of the raw table: PREROUTING (packets arriving on an interface) and OUTPUT (locally generated packets). For example: iptables -t raw -A OUTPUT -p icmp -s x.y.z.0/24 -j TRACE. The TRACE target is only accepted in the raw table, so -t raw is required. We observed that packets destined for the host go through the full set of iptables rules, many of which pertain specifically to neutron. We then modified the trace rules to capture VM traffic and generated traffic destined for VMs. Of course, on the compute node, the rules must match the internal IP addresses assigned to the VMs rather than their public-facing IP addresses. In this way, we were able to see how iptables processes VM traffic. The packets arrived at iptables on the qbre interface after OVS had already removed the GRE encapsulation.
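Trace rules are expensive, because every matching packet produces one log line per rule or chain it traverses. In the excerpt below, a single packet produced thirteen lines. For this reason, it helps to generate the add and delete commands together and remove the rules as soon as you are done. The sketch below only builds the argument lists and does not execute anything. The function name is hypothetical, and the example address is the VM address from the kern.log excerpt, used purely for illustration. According to the iptables-extensions documentation, on hosts running the nftables-based iptables, trace events are delivered via netlink and viewed with xtables-monitor --trace rather than appearing in kern.log.

```python
from __future__ import annotations

import shlex

# TRACE is only valid in the raw table. PREROUTING sees packets arriving on
# any interface; OUTPUT sees packets generated locally on the host.
TRACE_CHAINS = ("PREROUTING", "OUTPUT")


def trace_commands(addr: str, proto: str = "icmp", match_dest: bool = False):
    """Return (add, delete) iptables argv lists for TRACE rules matching addr."""
    flag = "-d" if match_dest else "-s"
    match = ["-p", proto, flag, addr, "-j", "TRACE"]
    add = [["iptables", "-t", "raw", "-A", chain, *match] for chain in TRACE_CHAINS]
    # -D must repeat the exact same rule spec to delete the rule added by -A.
    delete = [["iptables", "-t", "raw", "-D", chain, *match] for chain in TRACE_CHAINS]
    return add, delete


if __name__ == "__main__":
    # Illustrative VM address (from the trace); substitute the VM's internal IP.
    add, delete = trace_commands("192.168.113.2", proto="tcp", match_dest=True)
    for cmd in add + delete:
        print(shlex.join(cmd))
```

Matching on the destination (match_dest=True) fits the case described here, where traffic destined for a VM is traced. The default source match mirrors the command in the text.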
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049025] TRACE: raw:PREROUTING:policy:3 IN=qbre14aa959-d6 OUT= PHYSIN=qvbe14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049042] TRACE: mangle:PREROUTING:policy:1 IN=qbre14aa959-d6 OUT= PHYSIN=qvbe14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049057] TRACE: mangle:FORWARD:policy:1 IN=qbre14aa959-d6 OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049070] TRACE: filter:FORWARD:rule:1 IN=qbre14aa959-d6 OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049084] TRACE: filter:neutron-filter-top:rule:1 IN=qbre14aa959-d6 OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049109] TRACE: filter:neutron-openvswi-local:return:1 IN=qbre14aa959-d6 OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049122] TRACE: filter:neutron-filter-top:return:2 IN=qbre14aa959-d6 OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049135] TRACE: filter:FORWARD:rule:2 IN=qbre14aa959-d6 OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049151] TRACE: filter:neutron-openvswi-FORWARD:rule:55 IN=qbre14aa959-d6 OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049185] TRACE: filter:neutron-openvswi-sg-chain:rule:55 IN=qbre14aa959-d6 OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049206] TRACE: filter:neutron-openvswi-ie14aa959-d:rule:2 IN=qbre14aa959-d6 OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049239] TRACE: filter:neutron-openvswi-sg-chain:return:77 IN=qbre14aa959-d6 OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 MAC=fa:16:3e:cf:81:aa:fa:16:3e:23:c4:5f:08:00 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
<4>Nov 10 11:23:11 node-5 kernel: [11821685.049251] TRACE: mangle:POSTROUTING:policy:2 IN= OUT=qbre14aa959-d6 PHYSIN=qvbe14aa959-d6 PHYSOUT=tape14aa959-d6 SRC=192.168.113.4 DST=192.168.113.2 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=10770 DF PROTO=TCP SPT=5672 DPT=41943 SEQ=1808903081 ACK=2600615076 WINDOW=1450 RES=0x00 ACK URGP=0 OPT (0101080AEA673046EA67B69E)
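Reading entries like these by eye is tedious. A short parser can extract the table:chain:type:number hop from each TRACE line and follow a single packet by its IP ID field (10770 in the excerpt). This is a minimal sketch assuming the line format shown above, with one entry per line. The default path /var/log/kern.log is an assumption and varies by distribution, and the function names are hypothetical. IP IDs are only unique over a short window, so following a packet by ID only makes sense for brief traces.

```python
from __future__ import annotations

import os
import re
import sys
from typing import NamedTuple

# Matches "TRACE: <table>:<chain>:<rule|return|policy>:<number>".
TRACE_RE = re.compile(
    r"TRACE: (?P<table>\w+):(?P<chain>[^:\s]+):(?P<kind>rule|return|policy):(?P<num>\d+)"
)
# KEY=value fields of interest; values may be empty (e.g. "OUT= ").
FIELD_RE = re.compile(r"\b(IN|OUT|PHYSIN|PHYSOUT|SRC|DST|ID)=(\S*)")


class TraceStep(NamedTuple):
    table: str
    chain: str
    kind: str
    num: int
    fields: dict[str, str]


def parse_trace(lines) -> list[TraceStep]:
    """Return one TraceStep per TRACE line; all other lines are skipped."""
    steps = []
    for line in lines:
        m = TRACE_RE.search(line)
        if m:
            fields = dict(FIELD_RE.findall(line))
            steps.append(TraceStep(m["table"], m["chain"], m["kind"], int(m["num"]), fields))
    return steps


def follow(steps: list[TraceStep], packet_id: str) -> list[str]:
    """Ordered 'table:chain:kind:num' hops for the packet with this IP ID."""
    return [f"{s.table}:{s.chain}:{s.kind}:{s.num}" for s in steps
            if s.fields.get("ID") == packet_id]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/kern.log"  # assumed path
    if os.path.exists(path):
        with open(path, errors="replace") as fh:
            for hop in follow(parse_trace(fh), "10770"):
                print(hop)
```

Applied to the excerpt above, follow(..., "10770") lists the same thirteen hops in order, from raw:PREROUTING through the neutron chains to mangle:POSTROUTING.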
The kern.log output above gives some insight into how the packet traverses the system. A packet destined for the internal IP address 192.168.113.2 first appears on the qvbe14aa959-d6 interface, a port of the Linux bridge qbre14aa959-d6 (IN=qbre14aa959-d6, PHYSIN=qvbe14aa959-d6). The bridge then identifies the tap interface of the appropriate VM, tape14aa959-d6, as the destination for the packet (PHYSOUT=tape14aa959-d6). Since IN and OUT are both the bridge, this is a bridging decision rather than a routing one. Next, the packet is passed through iptables in the FORWARD stage, where we can see that the neutron-openvswi-sg-chain:rule:55 rule is triggered, directing control to the chain specific to this VM, neutron-openvswi-ie14aa959-d. As this packet is not blocked by the VM’s security groups, it is ultimately passed on to the POSTROUTING stage and into the VM’s tap interface.
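The device and chain names in this trace appear to follow a pattern. The names qbr, qvb and tap each carry the same port-ID fragment, and the per-VM chain is neutron-openvswi-i followed by a truncated copy of that fragment. The sketch below encodes this pattern so that you can go from a VM’s port to the chain worth inspecting. The naming rule is our inference from the names above rather than something taken from the neutron source. Specifically, we assume the first 11 characters of the Neutron port ID, an i/o letter for ingress/egress (the "o" is assumed), and a 28-character iptables chain-name limit. All helper names are hypothetical. The sketch also reads the net.bridge.bridge-nf-call-iptables sysctl, which is what causes bridged packets like this one to be handed to iptables at all. Finally, it treats the presence of PHYSIN/PHYSOUT in a log line as the marker of bridged traffic.

```python
from __future__ import annotations

from pathlib import Path

# Assumed conventions, inferred from the names in the trace:
#   devices: "tap"/"qbr"/"qvb" + first 11 chars of the Neutron port ID
#   chains:  "neutron-openvswi-" + "i" (ingress) or "o" (egress) + same
#            fragment, cut to a 28-character iptables chain-name limit
CHAIN_PREFIX = "neutron-openvswi-"
MAX_CHAIN_NAME = 28


def port_names(port_id: str) -> dict[str, str]:
    """Derive (hypothetical) device and chain names from a port ID or its prefix."""
    short = port_id[:11]
    return {
        "tap": "tap" + short,
        "linux_bridge": "qbr" + short,
        "bridge_veth": "qvb" + short,
        "ingress_chain": (CHAIN_PREFIX + "i" + short)[:MAX_CHAIN_NAME],
        "egress_chain": (CHAIN_PREFIX + "o" + short)[:MAX_CHAIN_NAME],
    }


def is_bridged(fields: dict[str, str]) -> bool:
    """Bridged packets seen by iptables carry PHYSIN/PHYSOUT in TRACE lines."""
    return "PHYSIN" in fields or "PHYSOUT" in fields


def bridge_calls_iptables() -> bool | None:
    """Read net.bridge.bridge-nf-call-iptables; None if it is unavailable."""
    path = Path("/proc/sys/net/bridge/bridge-nf-call-iptables")
    try:
        return path.read_text().strip() == "1"
    except OSError:  # br_netfilter not loaded, or no permission
        return None


if __name__ == "__main__":
    # Port-ID fragment from the trace; substitute your VM's Neutron port ID.
    for key, name in port_names("e14aa959-d6").items():
        print(f"{key:14} {name}")
    print("bridge-nf-call-iptables:", bridge_calls_iptables())
```

With the ingress chain name in hand, running iptables -S neutron-openvswi-ie14aa959-d as root lists the rules that decide whether traffic reaches this VM.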
Using the above, we gained a better understanding of how iptables operates in a compute node. We learnt enough to debug our basic problem (which ultimately turned out to be a mix of firewall and routing issues elsewhere), and it gave us pause for thought regarding the overall system design. More work would be required to fully explain the relationship between these moving parts, but we could not dedicate more time to this at that point; I’m sure we’ll revisit it in more detail in future.
Generally, this system design is complex, and it certainly looks like a less-than-ideal compromise built from the partial solutions that were available at the time. It struck us as unusual that there is not greater separation between host and guest traffic, exemplified by the fact that traffic destined for the host goes through the neutron security rules. Further, it seems a more natural design to put the responsibility for enforcing security group rules closer to the VM (e.g., could this be a domain responsibility rather than something that sits within the networking functions of the host OS?). Of course, imagining alternative solutions is easy; figuring out whether they make sense in practice and actually turning them into reality is an entirely different story!