Proper isolation of a Linux bridge

Vincent Bernat April 12, 2017

TL;DR

When configuring a Linux bridge, use the following commands to enforce isolation:

# bridge vlan del dev br0 vid 1 self
# ip link set dev br0 type bridge vlan_filtering 1

A network bridge (also commonly called a “switch”) brings several Ethernet segments together. It is a common element in most infrastructures. Linux provides its own implementation.

A typical use of a Linux bridge is shown below. The hypervisor is running three virtual hosts. Each virtual host is attached to the br0 bridge (represented by the horizontal segment). The hypervisor has two physical network interfaces:

eth0 is attached to a public network providing various services for the virtual hosts (DHCP, DNS, NTP, routers to the Internet, …). It is also part of the br0 bridge.
eth1 is attached to an infrastructure network providing various services to the hypervisor (DNS, NTP, configuration management, routers to the Internet, …). It is not part of the br0 bridge.

Figure: Typical use of Linux bridging on a hypervisor with virtual hosts

The main expectation of such a setup is that while the virtual hosts should be able to use resources from the public network, they should not be able to access resources from the infrastructure network (including resources hosted on the hypervisor itself, like an SSH server). In other words, we expect total isolation between the green domain and the purple one.

That’s not the case. From any virtual host:

# ip route add 192.168.14.3/32 dev eth0
# ping -c 3 192.168.14.3
PING 192.168.14.3 (192.168.14.3) 56(84) bytes of data.
64 bytes from 192.168.14.3: icmp_seq=1 ttl=59 time=0.644 ms
64 bytes from 192.168.14.3: icmp_seq=2 ttl=59 time=0.829 ms
64 bytes from 192.168.14.3: icmp_seq=3 ttl=59 time=0.894 ms

--- 192.168.14.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2033ms
rtt min/avg/max/mdev = 0.644/0.789/0.894/0.105 ms

Why?
Workarounds
- Protocol-independent workarounds
- Protocol-dependent workarounds
  - ARP
  - IPv4
  - IPv6
About the example
About MACVLAN interfaces

Why?#

Two main factors explain this behavior:

A bridge can accept IP traffic. This is a useful feature if you want Linux to act as a bridge and provide some IP services to bridge users (a DHCP relay or a default gateway). This is usually done by configuring the IP address on the bridge device: ip addr add 192.0.2.2/25 dev br0.
An interface doesn’t need an IP address to process incoming IP traffic. Additionally, by default, Linux answers ARP requests regardless of the incoming interface.

Bridge processing#

After turning an incoming Ethernet frame into a socket buffer, the network driver transfers the buffer to the netif_receive_skb() function. The following actions are executed:

evaluate the XDP receive hooks,
copy the frame to any registered global or per-device taps (e.g. tcpdump),
evaluate the ingress policy (configured with tc),
hand over the frame to the device-specific receive handler, if any,
hand over the frame to a global or device-specific protocol handler (e.g. IPv4, ARP, IPv6).

For a bridged interface, the kernel has configured a device-specific receive handler, br_handle_frame(). This function won’t allow any additional processing in the context of the incoming interface, except for STP and LLDP frames, or if “brouting” is enabled.¹ Therefore, the protocol handlers are never executed in this case.

After a few additional checks, Linux will decide if the frame has to be locally delivered:

the entry for the target MAC in the FDB is marked for local delivery; or
the target MAC is a broadcast or a multicast address.

In this case, the frame is passed to the br_pass_frame_up() function. A VLAN-related check is optionally performed. The socket buffer is attached to the bridge interface (br0) instead of the physical interface (eth0). It is evaluated by Netfilter and sent back to netif_receive_skb(). It will go through the steps listed above a second time, this time in the context of br0.
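To see which unicast MAC addresses the bridge considers local, one can look at the FDB. The helper below is only a sketch: its name, local_fdb_macs, is made up for this example, and it assumes that `bridge fdb show` prints one entry per line, MAC address first, with bridge entries carrying the `master` keyword and local ones the `permanent` keyword, as iproute2 usually does.

```shell
# Hypothetical helper: read `bridge fdb show` output on stdin and print the
# MAC addresses of bridge entries ("master") flagged as local ("permanent").
# The output format is an assumption and may vary between iproute2 versions.
local_fdb_macs() {
    awk '{
        m = 0; p = 0
        for (i = 2; i <= NF; i++) {
            if ($i == "master") m = 1
            if ($i == "permanent") p = 1
        }
        if (m && p) print $1
    }' | sort -u
}

# Usage on a live system:
#   bridge fdb show br br0 | local_fdb_macs
```

A frame whose destination MAC appears in this list is a candidate for local delivery through br_pass_frame_up().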

IPv4 processing#

When a device doesn’t have a protocol-independent receive handler, a protocol-specific handler will be used:

# cat /proc/net/ptype
Type Device      Function
0800          ip_rcv
0011          llc_rcv [llc]
0004          llc_rcv [llc]
0806          arp_rcv
86dd          ipv6_rcv

Therefore, if the Ethernet type of the incoming frame is 0x800, the socket buffer is handled by ip_rcv(). Among other things, the following three steps will happen:

If the frame destination address is not the MAC address of the incoming interface, not a multicast one, and not a broadcast one, the frame is dropped (“not for us”).
Netfilter gets a chance to evaluate the packet (in a PREROUTING chain).
The routing subsystem will decide the destination of the packet in ip_route_input_slow(): is it a local packet, should it be forwarded, should it be dropped, should it be encapsulated? Notably, the reverse-path filtering is done during this evaluation in fib_validate_source().
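The destination check in the first step can be modeled with a few lines of shell. This is a simplified illustration, not the kernel's code: classify_dst_mac is a hypothetical helper, and both MAC addresses are assumed to be lowercase, colon-separated hex.

```shell
# Hypothetical helper modeling the "not for us" check.
# $1: destination MAC of the frame, $2: MAC of the receiving interface.
classify_dst_mac() {
    dst=$1
    own=$2
    first_octet=${dst%%:*}
    if [ "$dst" = "ff:ff:ff:ff:ff:ff" ]; then
        echo broadcast
    elif [ $(( 0x$first_octet & 1 )) -eq 1 ]; then
        # The least-significant bit of the first octet is the group bit.
        echo multicast
    elif [ "$dst" = "$own" ]; then
        echo host
    else
        # Neither ours nor a group address: such a frame is dropped.
        echo otherhost
    fi
}

# Example: classify_dst_mac ff:ff:ff:ff:ff:ff 50:54:33:00:00:04
```

Only the last case leads to a drop, which is why a sender that uses the broadcast address as the destination MAC passes this check.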

Reverse-path filtering (also known as uRPF, or unicast reverse-path forwarding, RFC 3704) enables Linux to reject traffic arriving on an interface it should never have come from: the source address is looked up in the routing tables and, if the outgoing interface for that address differs from the incoming one, the packet is rejected.

ARP processing#

When the Ethernet type of the incoming frame is 0x806, the socket buffer is handled by arp_rcv().

As with IPv4, if the frame is not for us, it is dropped.
If the incoming device has the NOARP flag, the frame is dropped.
Netfilter gets a chance to evaluate the packet (configuration is done with arptables).
For an ARP request, the values of arp_ignore and arp_filter may trigger a drop of the packet.

IPv6 processing#

When the Ethernet type of the incoming frame is 0x86dd, the socket buffer is handled by ipv6_rcv().

As with IPv4, if the frame is not for us, it is dropped.
If IPv6 is disabled on the interface, the packet is dropped.
Netfilter gets a chance to evaluate the packet (in a PREROUTING chain).
The routing subsystem will decide the destination of the packet. However, unlike IPv4, there is no reverse-path filtering.²

Workarounds#

There are various methods to fix the situation.

We can completely ignore the bridged interfaces: as long as they are attached to the bridge, they cannot process any upper layer protocol (IPv4, IPv6, ARP). Therefore, we can focus on filtering incoming traffic from br0.

It should be noted that for IPv4, IPv6, and ARP protocols, the MAC address check can be circumvented by using the broadcast MAC address.

Protocol-independent workarounds#

The following fixes will drop IPv4, ARP, and IPv6 packets indiscriminately.

Using VLAN-aware bridge#

Linux 3.9 introduced the ability to use VLAN filtering on bridge ports. This can be used to prevent any local traffic:³

# ip link set dev br0 type bridge vlan_filtering 1
# bridge vlan del dev br0 vid 1 self
# bridge vlan show
port    vlan ids
eth0     1 PVID Egress Untagged
eth2     1 PVID Egress Untagged
eth3     1 PVID Egress Untagged
eth4     1 PVID Egress Untagged
br0     None

This is the most efficient method since the frame is dropped directly in br_pass_frame_up().

Using XDP receive hooks#

It’s also possible to drop the bridged frame early after it has been re-delivered to netif_receive_skb() by br_pass_frame_up(). The XDP hooks attached to an interface are evaluated before taps, ingress policy, and handlers. A very simple xdp_drop_all.c program would look like this:

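#include <linux/bpf.h>

/* Drop every frame unconditionally. Without a section annotation, the
 * function lands in the default .text section of the ELF object. */
int main()
{
    return XDP_DROP;
}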

It can be compiled and installed with:

$ clang -O2 -target bpf -c xdp_drop_all.c -o xdp_drop_all.o
$ sudo ip link set dev br0 xdp obj xdp_drop_all.o sec .text

This requires Linux 4.14. It is the second most efficient method, but a bit inconvenient as it needs a compiler.⁴

Using ingress policy#

The ingress policy of an interface is evaluated just a bit later but before any handler. Therefore, the following commands will ensure no local delivery happens:

# tc qdisc add dev br0 handle ffff: ingress
# tc filter add dev br0 parent ffff: matchall action drop

This is the third most efficient method. It is more convenient than the previous one, at the cost of a tiny additional overhead.

Update (2020-01)

Since Linux 4.2, it is possible to use Netfilter to set up the ingress policy with the same efficiency. This can be done through nftables:

# nft -f - <<EOF
> table netdev firewall {
>   chain ingress {
>     type filter hook ingress device br0 priority 0; policy drop;
>     log
>   }
> }
> EOF

Using `ebtables`#

Just before re-delivering the frame to netif_receive_skb(), Netfilter gets a chance to issue a decision. It’s easy to configure it to drop the frame:

# ebtables -A INPUT --logical-in br0 -j DROP

However, to the best of my knowledge, this part of Netfilter is inefficient.

Using namespaces#

Isolation can also be obtained by moving all the bridged interfaces into a dedicated network namespace and configuring the bridge inside this namespace:

# ip netns add bridge0
# ip link set netns bridge0 eth0
# ip link set netns bridge0 eth2
# ip link set netns bridge0 eth3
# ip link set netns bridge0 eth4
# ip link del dev br0
# ip netns exec bridge0 brctl addbr br0
# for i in 0 2 3 4; do
>    ip netns exec bridge0 brctl addif br0 eth$i
>    ip netns exec bridge0 ip link set up dev eth$i
> done
# ip netns exec bridge0 ip link set up dev br0

The frame will still wander a bit inside the IP stack, wasting some CPU cycles and increasing the possible attack surface. But ultimately, it will be dropped.

Update (2023-08)

A more lightweight alternative is to move the bridge into a VRF with ip link set br0 master vrf0. The result is similar to the solution with namespaces, but the configuration is simpler.
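The VRF device has to exist before the bridge can be enslaved to it. As a sketch, the helper below prints the commands that might be used, without running them. The helper name, the VRF name vrf0, and the routing table number 10 are assumptions made for this example; any unused table number should do.

```shell
# Hypothetical helper: print (do not run) the commands creating a VRF and
# moving the bridge into it.
# $1: bridge name, $2: VRF name, $3: routing table ID (arbitrary, unused).
vrf_isolation_commands() {
    bridge=${1:-br0}
    vrf=${2:-vrf0}
    table=${3:-10}
    echo "ip link add $vrf type vrf table $table"
    echo "ip link set dev $vrf up"
    echo "ip link set dev $bridge master $vrf"
}

# Review the output, then apply it as root:
#   vrf_isolation_commands br0 vrf0 10 | sh -e
```

Printing the commands first gives a chance to review them before changing the network configuration of a live hypervisor.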

Protocol-dependent workarounds#

Unless you require multiple layers of security, if one of the previous workarounds is already applied, there is no need to apply one of the protocol-dependent fixes below. They are still worth knowing because it is not uncommon to already have them in place.

ARP#

The easiest way to disable ARP processing on a bridge is to set the NOARP flag on the device. The ARP packet will be dropped as the very first step of the ARP handler.

# ip link set arp off dev br0
# ip l l dev br0
8: br0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 50:54:33:00:00:04 brd ff:ff:ff:ff:ff:ff

arptables can also drop the packet quite early:

# arptables -A INPUT -i br0 -j DROP

Another way is to set arp_ignore to 2 for the given interface. The kernel will only answer ARP requests whose target IP address is configured on the incoming interface. Since the bridge interface doesn’t have an IP address, no ARP requests will be answered.

# sysctl -qw net.ipv4.conf.br0.arp_ignore=2

Disabling ARP processing is not a sufficient workaround for IPv4. A user can still insert the appropriate entry in their neighbor cache:

# ip neigh replace 192.168.14.3 lladdr 50:54:33:00:00:04 dev eth0
# ping -c 1 192.168.14.3
PING 192.168.14.3 (192.168.14.3) 56(84) bytes of data.
64 bytes from 192.168.14.3: icmp_seq=1 ttl=49 time=1.30 ms

--- 192.168.14.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.309/1.309/1.309/0.000 ms

Because the check on the target MAC address is quite loose, they don’t even need to guess the MAC address:

# ip neigh replace 192.168.14.3 lladdr ff:ff:ff:ff:ff:ff dev eth0
# ping -c 1 192.168.14.3
PING 192.168.14.3 (192.168.14.3) 56(84) bytes of data.
64 bytes from 192.168.14.3: icmp_seq=1 ttl=49 time=1.12 ms

--- 192.168.14.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.129/1.129/1.129/0.000 ms

IPv4#

The earliest place to drop an IPv4 packet is with Netfilter:⁵

# iptables -t raw -I PREROUTING -i br0 -j DROP

If Netfilter is disabled, another possibility is to enable strict reverse-path filtering for the interface. In this case, since there is no IP address configured on the interface, the packet will be dropped during the route lookup:

# sysctl -qw net.ipv4.conf.br0.rp_filter=1
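One caveat is worth checking. According to the kernel's ip-sysctl documentation, the mode applied to an interface is the maximum of the "all" value and the per-interface value, so a loose setting (2) on "all" would take precedence over the strict setting (1) on br0. The sketch below computes the effective value. The helper name and its optional second argument (an alternate sysctl root, handy for testing) are inventions for this example.

```shell
# Hypothetical helper: effective rp_filter mode for an interface, i.e. the
# maximum of conf/all/rp_filter and conf/<iface>/rp_filter.
# $1: interface name, $2: optional sysctl root (defaults to the real one).
effective_rp_filter() {
    iface=$1
    root=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$root/all/rp_filter")
    own=$(cat "$root/$iface/rp_filter")
    if [ "$all" -gt "$own" ]; then
        echo "$all"
    else
        echo "$own"
    fi
}

# Usage: effective_rp_filter br0    # 1 means strict, 2 means loose
```

If the result is 2, the strict check described above is not the one being applied to br0.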

Another option is the use of a dedicated routing rule. Compared to the reverse-path filtering option, the packet will be dropped a bit earlier, still during the route lookup.

# ip rule add iif br0 blackhole

IPv6#

Linux provides a way to completely disable IPv6 on a given interface. The packet will be dropped as the very first step of the IPv6 handler:

# sysctl -qw net.ipv6.conf.br0.disable_ipv6=1

As with IPv4, it’s possible to use Netfilter or a dedicated routing rule.
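As a sketch, the IPv6 counterparts of the IPv4 commands might look like the output of the helper below. It only prints the commands, and either one should be enough on its own. The helper name is hypothetical; the commands mirror the IPv4 ones, with ip6tables in place of iptables and the -6 flag for ip rule.

```shell
# Hypothetical helper: print (do not run) the IPv6 equivalents of the IPv4
# Netfilter rule and routing rule. $1: bridge name.
ipv6_isolation_commands() {
    bridge=${1:-br0}
    echo "ip6tables -t raw -I PREROUTING -i $bridge -j DROP"
    echo "ip -6 rule add iif $bridge blackhole"
}

# Review the output, then run the chosen command as root:
#   ipv6_isolation_commands br0
```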

About the example#

In the above example, the virtual host gets ICMP replies because they are routed through the infrastructure network to the Internet (e.g. the hypervisor has a default gateway which also acts as a NAT router to the Internet). This may not be the case.

If you want to check if you are “vulnerable” despite not getting an ICMP reply, look at the guest neighbor table to check if you got an ARP reply from the host:

# ip route add 192.168.14.3/32 dev eth0
# ip neigh show dev eth0
192.168.14.3 lladdr 50:54:33:00:00:04 REACHABLE

If you didn’t get a reply, you could still have issues with IP processing. Add a static neighbor entry before checking the next step:

# ip neigh replace 192.168.14.3 lladdr ff:ff:ff:ff:ff:ff dev eth0

To check if IP processing is enabled, check the bridge host’s network statistics:

# netstat -s | grep "ICMP messages"
    15 ICMP messages received
    15 ICMP messages sent
    0 ICMP messages failed

If the counters are increasing, it is processing incoming IP packets.
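Comparing the counter before and after a test ping can be scripted. The helper below is a minimal sketch with a made-up name; it assumes `netstat -s` prints the counter in the same format as the output shown above.

```shell
# Hypothetical helper: read `netstat -s` output on stdin and print the value
# of the "ICMP messages received" counter.
icmp_received() {
    awk '/ICMP messages received/ { print $1; exit }'
}

# Usage on the bridge host:
#   before=$(netstat -s | icmp_received)
#   ... send a ping from a virtual host ...
#   after=$(netstat -s | icmp_received)
#   [ "$after" -gt "$before" ] && echo "host is processing incoming IP packets"
```

Other ICMP traffic reaching the host also increments this counter, so the comparison is more reliable on a quiet system.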

One-way communication still allows a lot of bad things, like DoS attacks. Additionally, if the hypervisor happens to also act as a router, the reach is extended to the whole infrastructure network, potentially exposing weak devices (e.g. a PDU running an SNMP agent). If one-way communication is all that’s needed, the attacker can also spoof its source IP address, bypassing IP-based authentication.

About MACVLAN interfaces#

When source-address learning is not needed, bridges can be replaced by MACVLAN or MACVTAP interfaces. Have a look at the following article for more details: “Bridge vs Macvlan.”

In this case, the packet processing is different and asymmetric. When a frame exits the MACVLAN interface, it is directly transmitted to the appropriate interface (either another MACVLAN interface or the lower device). In this direction, communication with the host is not possible.

However, when a frame enters the lower device, if the destination MAC is not one of the sub-interfaces, the frame is processed by the host using the appropriate protocol handler (ARP, IPv4, IPv6). Therefore, isolation can only be obtained by using the protocol-dependent methods.

¹ A frame can be forcibly routed (L3) instead of bridged (L2) by “brouting” the packet. This action can be triggered using ebtables.

² For IPv6, reverse-path filtering needs to be implemented with Netfilter, using the rpfilter match.

³ With a kernel older than Linux 4.3, you’ll have to use the following command instead of ip link:
```
# echo 1 > /sys/class/net/br0/bridge/vlan_filtering
```

⁴ The XDP program above is so simple it could be written without a compiler:

$ ip -d l l dev br0 | tail -1
    prog/xdp id 1 tag 57cd311f2e27366b
$ bpftool prog dump xlated id 1
   0: (b7) r0 = 1
   1: (95) exit
$ readelf -x .text xdp_drop_all.o
b7000000 01000000 95000000 00000000

However, ip link requires an ELF object.

⁵ If the br_netfilter module is loaded, the net.bridge.bridge-nf-call-iptables sysctl has to be set to 0. Otherwise, you also need to use the physdev match to avoid dropping IPv4 packets going through the bridge.
