Open vSwitch – Stateful Functions

Open vSwitch is a software switch, commonly seen in OpenStack Networking, that connects physical interfaces to virtual ones. It uses virtual bridges and flow rules to forward packets. A typical deployment consists of several bridges, including the provider, integration, and tunnel bridges. Each has a different role in the network: the tunnel bridge creates the overlay, and the integration bridge is the main connectivity bridge. The terms bridge and switch are used interchangeably in Neutron networking.

Open vSwitch splits its work between userspace and the Linux kernel. User actions are issued in userspace, while the kernel holds a set of flows, each with match criteria and actions. The kernel module is where the fast-path packet processing occurs, similar to an ASIC on a physical hardware switch. The vSwitch daemon (ovs-vswitchd) is the userspace element and controls how the kernel gets programmed. Configuration is held in a database server, the Open vSwitch Database Server (ovsdb-server), which is accessed through the OVSDB management protocol.


Note on Open vSwitch Performance

Initially, Open vSwitch performed well with steady-state traffic. The kernel datapath was multi-threaded, so established flows were forwarded quickly. However, certain traffic patterns gave Open vSwitch a headache and degraded performance. For example, peer-to-peer applications that open a large number of connections in quick succession would hit it badly. The reason is that the kernel only held a cache of recently seen flows. When a packet arrived that was not an exact match for a cached entry, the result was a cache miss and the packet was sent to userspace. Continuous interaction between user space and kernel space kills performance. Unlike the kernel, userspace was single-threaded and could not process large volumes of packets or set up connections quickly.

The developers needed to improve connection setup performance. They added megaflows, which are wildcarded entries in the kernel, made userspace multithreaded, and introduced various enhancements to the classifier. Having spent a lot of time putting megaflows in the kernel, they do not want to undo that good work. This is a foundational design principle for the new implementation of stateful services and connection tracking: anything added to Open vSwitch must not affect performance. Network Heresy has a great article on OVS performance.
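
A toy model can show why wildcarded kernel entries help with bursts of short connections. The sketch below is not Open vSwitch code: the Packet fields, the FlowCache class, and the single "match on destination IP, protocol, and destination port" policy are all assumptions made for illustration.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src_ip dst_ip proto src_port dst_port")


class FlowCache:
    """Toy kernel flow cache; `fields` lists the header fields that are matched."""

    def __init__(self, fields):
        self.fields = fields
        self.entries = set()
        self.upcalls = 0  # packets that missed the cache and went to userspace

    def _key(self, pkt):
        # Fields not listed in self.fields are wildcarded (ignored) in the lookup.
        return tuple(getattr(pkt, f) for f in self.fields)

    def process(self, pkt):
        key = self._key(pkt)
        if key not in self.entries:
            self.upcalls += 1      # cache miss: userspace decides, then installs
            self.entries.add(key)


# 1000 short connections from one client to one web server,
# each using a different source port.
packets = [Packet("10.0.0.1", "10.0.0.2", "tcp", 40000 + i, 80) for i in range(1000)]

exact = FlowCache(Packet._fields)                     # exact-match cache
mega = FlowCache(("dst_ip", "proto", "dst_port"))     # wildcards the source side

for p in packets:
    exact.process(p)
    mega.process(p)

print(exact.upcalls, mega.upcalls)  # 1000 1
```

With an exact-match cache every new source port is a miss, while the wildcarded entry covers all 1000 connections after the first packet.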


Stateless vs Stateful functionality

Open vSwitch is a good stateless flow-forwarding device and supports fine-grained flow fields, but there is a gap when it comes to stateful services. The developers are working to expand its feature set to include stateful connection tracking, stateful firewalling, and deep packet inspection services. The current matching lets you match on IP addresses and port numbers. Nothing higher up the application stack, such as an application ID, is used. Stateful services offer better protection than stateless services because they delve deeper into the packet.


What is a stateless function?

Stateless means that once a packet arrives, the device can only act on what is currently in that packet. It simply looks at the headers and bases the policy on the headers it has just inspected. Each packet is evaluated on its own contents, with no awareness of the packets that came before it or of any pattern in the data.

Typically, a stateless function inspects the following elements of a packet: source and destination IP address, source and destination port, and protocol type. No further inspection of Layer 3 or Layer 4 fields, such as TCP control flags, sequence numbers, or ACK fields, is carried out. For example, if the requirement involves the TCP window, a stateless function cannot track whether packets fall within a certain window. In terms of Network Address Translation (NAT), it is possible to perform stateless translation from one IP address to another, adjusting the MAC address for external forwarding, but it cannot handle anything more complicated.
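
As a minimal sketch of the idea, a stateless filter can be written as a pure function of one packet's headers. The rule table, addresses, and the stateless_filter name below are hypothetical and only illustrate the "headers in, verdict out, nothing remembered" behaviour.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src_ip dst_ip proto src_port dst_port")

# Hypothetical rule table: None means "any value". The first match wins.
RULES = [
    (Packet(None, "10.0.0.2", "tcp", None, 80), "allow"),
    (Packet(None, None, None, None, None), "drop"),
]


def stateless_filter(pkt):
    """Decide using only the headers of this one packet; no memory is kept."""
    for pattern, action in RULES:
        if all(want is None or want == got for want, got in zip(pattern, pkt)):
            return action
    return "drop"


web = stateless_filter(Packet("10.0.0.1", "10.0.0.2", "tcp", 40000, 80))
ssh = stateless_filter(Packet("10.0.0.1", "10.0.0.2", "tcp", 40000, 22))
print(web, ssh)  # allow drop
```

Because the function keeps no state, the same packet always gets the same verdict, regardless of what was seen earlier.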


What is a stateful function?

Today’s security requires more advanced filtering than just Layer 3 and Layer 4 headers. A stateful function watches a connection end-to-end and knows what stage the TCP connection is in. This provides more detailed information than source and destination IP addresses or port numbers alone.

Connection tracking is a fundamental part of stateful firewalling and of supporting enhanced NAT functionality. We need to take into account that traffic is based on sessions and filter according to other parameters, such as the state of a connection. Stateful inspection goes a couple of levels deeper and tracks every connection, examining both the packet headers and the application layer information in the payload. Stateful devices can determine whether a connection has been negotiated, established, reset, or closed. This protects against many different types of high-level attacks by allowing administrators to get specific with their filtering, for example, not allowing a peer-to-peer (P2P) application to be carried over HTTP tunnels.
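
To make "knowing what stage the TCP connection is in" concrete, here is a deliberately simplified state machine. It is an illustration only: the state names and the next_state and track helpers are assumptions, and the sketch ignores sequence numbers, windows, timeouts, and the full FIN handshake that a real tracker would check.

```python
TERMINAL = {"RESET", "CLOSED", "INVALID"}


def next_state(state, flags, from_initiator):
    """Return the next connection state for one observed TCP segment.

    `flags` is a set such as {"SYN"} or {"SYN", "ACK"}.
    """
    if state in TERMINAL:
        return state
    if state == "NONE":
        ok = flags == {"SYN"} and from_initiator
        return "SYN_SENT" if ok else "INVALID"
    if "RST" in flags:
        return "RESET"
    if state == "SYN_SENT":
        ok = flags == {"SYN", "ACK"} and not from_initiator
        return "SYN_RECV" if ok else "INVALID"
    if state == "SYN_RECV":
        ok = flags == {"ACK"} and from_initiator
        return "ESTABLISHED" if ok else "INVALID"
    # state == "ESTABLISHED": simplified close on the first FIN
    return "CLOSED" if "FIN" in flags else "ESTABLISHED"


def track(segments):
    """Feed (flags, from_initiator) pairs through the state machine."""
    state = "NONE"
    for flags, from_initiator in segments:
        state = next_state(state, flags, from_initiator)
    return state


handshake = [({"SYN"}, True), ({"SYN", "ACK"}, False), ({"ACK"}, True)]
print(track(handshake))                         # ESTABLISHED
print(track([({"ACK"}, True)]))                 # INVALID
print(track(handshake + [({"RST"}, False)]))    # RESET
```

The second call is the interesting one: a lone ACK with no preceding handshake is rejected, which a header-only filter cannot do.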


Traditionally, Open vSwitch has two stateless approaches to firewalling:

  • Match on TCP flags

The first approach is the ability to match on TCP flags and enforce policy on SYN packets while permitting ALL ACK and RST packets. It performs well because the cached entries live in the kernel, and keeping as much as possible in the kernel limits cache misses and user space interaction. What it gains in performance, it lacks in security. It is not very secure because it allows ANY packet that has the ACK or RST bit set, including packets from non-established flows. An attacker could easily probe with a standard TCP port scanning tool, sending an ACK in and examining the responses received.
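
The weakness of the flag-based approach can be sketched in a few lines. The flag_based_policy function and its allowed_ports parameter are hypothetical; they only mimic the "police SYN, pass ACK and RST" policy described above.

```python
def flag_based_policy(flags, dst_port, allowed_ports=frozenset({80})):
    """Stateless policy: police only the initial SYN, pass anything with ACK or RST."""
    if "ACK" in flags or "RST" in flags:
        return "allow"      # assumed to belong to an existing connection
    if flags == {"SYN"}:
        return "allow" if dst_port in allowed_ports else "drop"
    return "drop"


print(flag_based_policy({"SYN"}, 22))   # drop
print(flag_based_policy({"ACK"}, 22))   # allow
```

The SYN to port 22 is dropped, but a bare ACK to the same port is let through even though no connection exists, which is exactly what an ACK scan exploits.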


  • Use the “learn” action

By default, the Open vSwitch ovs-vswitchd process acts like a standard bridge and learns MAC addresses. It keeps trying to connect to the controller in the background and, when it succeeds, stops acting like a traditional MAC-learning switch. The user space element maintains MAC tables and generates flows with matches and actions, which allows new OpenFlow rules to be inserted from user space. When a packet arrives, it gets pushed to user space, and the user space function uses the “learn” action to create the reverse of the 5-tuple and insert a new flow into the OpenFlow table. This comes at a performance cost, because every new flow is forced to user space, so connection setup is not as quick as forwarding on an existing connection.
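
The reverse 5-tuple idea can be sketched as follows. This is a model of the behaviour, not of the OpenFlow “learn” action syntax: the LearningTable class, the allowed_outbound predicate, and the slow_path_hits counter are assumptions made for illustration.

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip proto src_port dst_port")


def reverse(t):
    """Swap source and destination to get the return direction of a flow."""
    return FiveTuple(t.dst_ip, t.src_ip, t.proto, t.dst_port, t.src_port)


class LearningTable:
    def __init__(self, allowed_outbound):
        self.allowed_outbound = allowed_outbound  # predicate on a FiveTuple
        self.learned = set()                      # reverse flows installed so far
        self.slow_path_hits = 0                   # new flows handled in user space

    def process(self, t):
        if t in self.learned:
            return "allow"                        # return traffic of a learned flow
        if self.allowed_outbound(t):
            if reverse(t) not in self.learned:
                self.slow_path_hits += 1          # first packet of a new flow
                self.learned.add(reverse(t))
            return "allow"
        return "drop"


table = LearningTable(lambda t: t.src_ip.startswith("10.0.0."))
out = FiveTuple("10.0.0.1", "192.0.2.7", "tcp", 40000, 80)

before = table.process(reverse(out))   # no outbound packet seen yet
first = table.process(out)             # reverse flow is learned here
after = table.process(reverse(out))    # return traffic now matches
print(before, first, after)            # drop allow allow
```

Each new outbound flow costs one trip through the slow path, which is the performance penalty the paragraph above describes.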

These methods are sufficient for some network requirements, but they do not carry out any deep checks on TCP to make sure that, for example, there are no overlapping segments. They also cannot inspect related flows to support complex protocols like FTP and SIP, which use different flows for control and data. The control channel negotiates with the remote end how the data flow will be set up. For example, with FTP, the client initiates a control connection to TCP port 21. In active mode, the remote FTP server then opens a data connection from port 20.
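
To see why related flows need payload inspection, consider active-mode FTP, where the client announces its data endpoint on the control channel with a PORT h1,h2,h3,h4,p1,p2 command (port = p1 * 256 + p2). The sketch below parses that command and derives the data connection a helper would have to expect; the function names and the returned dictionary layout are hypothetical.

```python
def parse_port_command(line):
    """Parse an FTP 'PORT h1,h2,h3,h4,p1,p2' command into (ip, port), or None."""
    verb, _, args = line.strip().partition(" ")
    if verb.upper() != "PORT":
        return None
    try:
        parts = [int(x) for x in args.split(",")]
    except ValueError:
        return None
    if len(parts) != 6 or not all(0 <= x <= 255 for x in parts):
        return None
    ip = ".".join(str(x) for x in parts[:4])
    port = parts[4] * 256 + parts[5]
    return ip, port


def expected_data_flow(server_ip, port_line):
    """Describe the related data connection announced on the control channel."""
    target = parse_port_command(port_line)
    if target is None:
        return None
    client_ip, client_port = target
    return {"src_ip": server_ip, "src_port": 20,
            "dst_ip": client_ip, "dst_port": client_port}


flow = expected_data_flow("192.0.2.7", "PORT 10,0,0,1,195,80")
print(flow["dst_ip"], flow["dst_port"])  # 10.0.0.1 50000
```

The destination port only exists inside the control channel payload, so a filter that looks at headers alone has no way to know this data connection is legitimate.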


Conntrack integration with OVS

To enable stateful services, the team at Open vSwitch proposes to use the conntrack module in Linux. This proposal is an alternative to using Linux Bridge with iptables.

Conntrack stores the state of all connections and informs the Netfilter framework of connection state. Transit packets are connection tracked in the PRE_ROUTING chain, and anything locally generated is tracked in the OUTPUT chain. Packets may have four different userland states: NEW, ESTABLISHED, RELATED, and INVALID. Outside of the userland states, there are protocol states in the kernel; for example, TCP SYN_SENT tells us that a TCP SYN has been seen in one direction only. If conntrack sees one SYN packet, it considers the connection NEW. Once it sees the return TCP SYN/ACK, it considers the connection established and data can be transmitted. When that return packet is received, the state changes to ESTABLISHED in the PRE_ROUTING chain.
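
The four userland states can be illustrated with a toy tracker. This is not the kernel's algorithm: the ToyConntrack class, its expected set (standing in for helper-created expectations), and the choice to treat any untracked non-SYN packet as INVALID are simplifying assumptions.

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip proto src_port dst_port")


def reverse(t):
    return FiveTuple(t.dst_ip, t.src_ip, t.proto, t.dst_port, t.src_port)


class ToyConntrack:
    def __init__(self):
        self.seen_reply = {}   # original-direction tuple -> True once a reply is seen
        self.expected = set()  # tuples announced by a helper, e.g. an FTP data flow

    def classify(self, t, syn=False):
        if t in self.seen_reply:
            return "ESTABLISHED" if self.seen_reply[t] else "NEW"
        if reverse(t) in self.seen_reply:
            self.seen_reply[reverse(t)] = True     # traffic in the reply direction
            return "ESTABLISHED"
        if t in self.expected:
            self.expected.discard(t)
            self.seen_reply[t] = False
            return "RELATED"
        if syn:
            self.seen_reply[t] = False             # first SYN creates the entry
            return "NEW"
        return "INVALID"


ct = ToyConntrack()
flow = FiveTuple("10.0.0.1", "192.0.2.7", "tcp", 40000, 80)
states = [
    ct.classify(flow, syn=True),      # first SYN
    ct.classify(reverse(flow)),       # reply from the server
    ct.classify(flow),                # later packet in the original direction
    ct.classify(FiveTuple("203.0.113.9", "10.0.0.1", "tcp", 1234, 22)),
]
print(states)  # ['NEW', 'ESTABLISHED', 'ESTABLISHED', 'INVALID']
```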

Open vSwitch can call into the kernel connection tracker. This allows stateful tracking of flows and also supports Application Layer Gateways (ALGs), which punch holes for the related “data” channels needed by protocols like FTP and SIP.


Netfilter Framework

A fundamental part of connection tracking is the Netfilter framework, which provides a variety of functions: packet selection, packet filtering, connection tracking, and NAT. The framework lets callbacks be registered on packets traversing the network stack. These callbacks are known as netfilter hooks, and each one enables an operation on the packet. The essence of Netfilter is the ability to activate hooks.

Hooks are called at distinct points along a packet’s traversal of the kernel. The five points in the network stack where you can implement hooks are NF_INET_PRE_ROUTING, NF_INET_LOCAL_IN, NF_INET_FORWARD, NF_INET_POST_ROUTING, and NF_INET_LOCAL_OUT (older kernels and documentation use the NF_IP_ prefix for the same hooks). Once a packet comes in and passes the initial tests (checksum etc.), it is passed to the NF_INET_PRE_ROUTING hook. After that, a routing decision is made. If the packet is locally destined, the framework calls the NF_INET_LOCAL_IN hook; if it is to be forwarded, it calls the NF_INET_FORWARD hook. Forwarded packets, and locally generated packets that have passed NF_INET_LOCAL_OUT, finally go through the NF_INET_POST_ROUTING hook before being placed on the wire for transmission.
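
The traversal order can be summarised in a small helper. The hooks_traversed function and its arguments are hypothetical; only the hook names and their ordering come from the description above.

```python
def hooks_traversed(origin, destined_locally=False):
    """Return the Netfilter hooks a packet passes, in order.

    origin: "wire" for a received packet, "local" for one generated by this host.
    """
    if origin == "local":
        return ["NF_INET_LOCAL_OUT", "NF_INET_POST_ROUTING"]
    if origin != "wire":
        raise ValueError("origin must be 'wire' or 'local'")
    # The routing decision is made after PRE_ROUTING.
    if destined_locally:
        return ["NF_INET_PRE_ROUTING", "NF_INET_LOCAL_IN"]
    return ["NF_INET_PRE_ROUTING", "NF_INET_FORWARD", "NF_INET_POST_ROUTING"]


for label, path in [
    ("forwarded", hooks_traversed("wire")),
    ("local in", hooks_traversed("wire", destined_locally=True)),
    ("local out", hooks_traversed("local")),
]:
    print(label, "->", " -> ".join(path))
```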


Netfilter Conntrack integration

Packets arrive at the Open vSwitch flow table and get sent to Netfilter connection tracking. This is the original Linux connection tracker, and no changes have been made to it. The connection tracker checks the flow and the TCP window sizes and then makes the state of the flow (NEW, ESTABLISHED, etc.) available to the Open vSwitch table. The packet is then sent back to the Open vSwitch flow tables with the connection bits set.

Connection tracking allows sets of 5-tuples to be tracked and some information about them to be stored within the datapath. The tracker exposes generic concepts about those connections, such as whether a packet is part of a related flow, as with FTP or SIP. This functionality enables micro flows to be steered based on whether the packet is part of a NEW or ESTABLISHED flow, rather than simply applying a policy based on IP address or port number.
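
The two-pass flow (flow table, then connection tracker, then flow table again with the state set) can be modelled roughly as below. Everything here is an assumption for illustration: the Pipeline class, the inside_prefix policy, and the explicit "commit" step that stores a connection, which the text above does not describe.

```python
class Pipeline:
    """Toy model of a flow table that consults a connection tracker."""

    def __init__(self, inside_prefix="10.0.0."):
        self.inside_prefix = inside_prefix
        self.committed = set()   # connections remembered by the tracker

    def conntrack(self, pkt):
        """Stand-in for the kernel tracker: return the state for this packet."""
        src, dst = pkt
        if (src, dst) in self.committed or (dst, src) in self.committed:
            return "ESTABLISHED"
        return "NEW"

    def process(self, pkt):
        # First pass: the packet is untracked, so it is handed to conntrack.
        state = self.conntrack(pkt)
        # Second pass: the packet re-enters the flow table with its state set,
        # and policy is applied on the state rather than on IP or port alone.
        src, _ = pkt
        if state == "ESTABLISHED":
            return "forward"
        if state == "NEW" and src[0].startswith(self.inside_prefix):
            self.committed.add(pkt)   # remember the connection
            return "forward"
        return "drop"


pipe = Pipeline()
inside = ("10.0.0.1", 40000)
outside = ("192.0.2.7", 80)

results = [
    pipe.process((outside, inside)),   # NEW from outside
    pipe.process((inside, outside)),   # NEW from inside, committed
    pipe.process((outside, inside)),   # reply to a committed connection
]
print(results)  # ['drop', 'forward', 'forward']
```

The same outside-to-inside packet is dropped the first time and forwarded the second time; the only thing that changed is the connection state.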




About Matt Conran

Matt Conran has created 165 entries.
