Proper isolation of a Linux bridge
Vincent Bernat
TL;DR
When configuring a Linux bridge, use the following commands to enforce isolation:
# bridge vlan del dev br0 vid 1 self # ip link set dev br0 type bridge vlan_filtering 1
A network bridge (also commonly called a “switch”) brings several Ethernet segments together. It is a common element in most infrastructures. Linux provides its own implementation.
A typical use of a Linux bridge is shown below. The hypervisor is
running three virtual hosts. Each virtual host is attached to the
br0
bridge (represented by the horizontal segment). The hypervisor
has two physical network interfaces:
eth0
is attached to a public network providing various services for the virtual hosts (DHCP, DNS, NTP, routers to Internet, …). It is also part of thebr0
bridge.eth1
is attached to an infrastructure network providing various services to the hypervisor (DNS, NTP, configuration management, routers to Internet, …). It is not part of thebr0
bridge.
The main expectation of such a setup is that while the virtual hosts should be able to use resources from the public network, they should not be able to access resources from the infrastructure network (including resources hosted on the hypervisor itself, like a SSH server). In other words, we expect total isolation between the green domain and the purple one.
That’s not the case. From any virtual host:
# ip route add 192.168.14.3/32 dev eth0 # ping -c 3 192.168.14.3 PING 192.168.14.3 (192.168.14.3) 56(84) bytes of data. 64 bytes from 192.168.14.3: icmp_seq=1 ttl=59 time=0.644 ms 64 bytes from 192.168.14.3: icmp_seq=2 ttl=59 time=0.829 ms 64 bytes from 192.168.14.3: icmp_seq=3 ttl=59 time=0.894 ms --- 192.168.14.3 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2033ms rtt min/avg/max/mdev = 0.644/0.789/0.894/0.105 ms
Why?#
There are two main factors of this behavior:
- A bridge can accept IP traffic. This is a useful feature if
you want Linux to act as a bridge and provide some IP services to
bridge users (a DHCP relay or a default gateway). This is usually
done by configuring the IP address on the bridge device:
ip addr add 192.0.2.2/25 dev br0
. - An interface doesn’t need an IP address to process incoming IP traffic. Additionally, by default, Linux accepts to answer ARP requests independently from the incoming interface.
Bridge processing#
After turning an incoming Ethernet frame into a socket buffer, the
network driver transfers the buffer to the netif_receive_skb()
function. The following actions are executed:
- evaluate the XDP receive hooks,
- copy the frame to any registered global or per-device taps
(e.g.
tcpdump
), - evaluate the ingress policy (configured with
tc
), - hand over the frame to the device-specific receive handler, if any,
- hand over the frame to a global or device-specific protocol handler (e.g. IPv4, ARP, IPv6).
For a bridged interface, the kernel has configured a device-specific
receive handler, br_handle_frame()
. This function won’t allow any
additional processing in the context of the incoming interface, except
for STP and LLDP frames, or if “brouting” is
enabled.1 Therefore, the protocol handlers are never
executed in this case.
After a few additional checks, Linux will decide if the frame has to be locally delivered:
- the entry for the target MAC in the FDB is marked for local delivery; or
- the target MAC is a broadcast or a multicast address.
In this case, the frame is passed to the br_pass_frame_up()
function. A VLAN-related check is optionally performed. The socket
buffer is attached to the bridge interface (br0
) instead of the
physical interface (eth0
). It is evaluated by Netfilter and sent
back to netif_receive_skb()
. It will go through the four steps a
second time.
IPv4 processing#
When a device doesn’t have a protocol-independent receive handler, a protocol-specific handler will be used:
# cat /proc/net/ptype Type Device Function 0800 ip_rcv 0011 llc_rcv [llc] 0004 llc_rcv [llc] 0806 arp_rcv 86dd ipv6_rcv
Therefore, if the Ethernet type of the incoming frame is 0x800
, the
socket buffer is handled by ip_rcv()
. Among other things, the three
following steps will happen:
- If the frame destination address is not the MAC address of the incoming interface, not a multicast one, and not a broadcast one, the frame is dropped (“not for us”).
- Netfilter gets a chance to evaluate the packet (in a
PREROUTING
chain). - The routing subsystem will decide the destination of the packet
in
ip_route_input_slow()
: is it a local packet, should it be forwarded, should it be dropped, should it be encapsulated? Notably, the reverse-path filtering is done during this evaluation infib_validate_source()
.
Reverse-path filtering (also known as uRPF, or unicast reverse-path forwarding, RFC 3704) enables Linux to reject traffic on interfaces which it should never have originated: the source address is looked up in the routing tables and if the outgoing interface is different from the current incoming one, the packet is rejected.
ARP processing#
When the Ethernet type of the incoming frame is 0x806
, the socket
buffer is handled by arp_rcv()
.
- As with IPv4, if the frame is not for us, it is dropped.
- If the incoming device has the
NOARP
flag, the frame is dropped. - Netfilter gets a chance to evaluate the packet (configuration is
done with
arptables
). - For an ARP request, the values of
arp_ignore
andarp_filter
may trigger a drop of the packet.
IPv6 processing#
When the Ethernet type of the incoming frame is 0x86dd
, the socket
buffer is handled by ipv6_rcv()
.
- As with IPv4, if the frame is not for us, it is dropped.
- If IPv6 is disabled on the interface, the packet is dropped.
- Netfilter gets a chance to evaluate the packet (in a
PREROUTING
chain). - The routing subsystem will decide the destination of the packet. However, unlike IPv4, there is no reverse-path filtering.2
Workarounds#
There are various methods to fix the situation.
We can completely ignore the bridged interfaces: as long as they are
attached to the bridge, they cannot process any upper layer protocol
(IPv4, IPv6, ARP). Therefore, we can focus on filtering incoming
traffic from br0
.
It should be noted that for IPv4, IPv6, and ARP protocols, the MAC address check can be circumvented by using the broadcast MAC address.
Protocol-independent workarounds#
The four following fixes will indistinctly drop IPv4, ARP, and IPv6 packets.
Using VLAN-aware bridge#
Linux 3.9 introduced the ability to use VLAN filtering on bridge ports. This can be used to prevent any local traffic:3
# ip link set dev br0 type bridge vlan_filtering 1 # bridge vlan del dev br0 vid 1 self # bridge vlan show port vlan ids eth0 1 PVID Egress Untagged eth2 1 PVID Egress Untagged eth3 1 PVID Egress Untagged eth4 1 PVID Egress Untagged br0 None
This is the most efficient method since the frame is dropped directly
in br_pass_frame_up()
.
Using XDP receive hooks#
It’s also possible to drop the bridged frame early after it has been
re-delivered to netif_receive_skb()
by br_pass_frame_up()
. The XDP
hooks attached to an interface are evaluated before taps, ingress
policy, and handlers. A very simple xdp_drop_all.c
program would
look like this:
#include <linux/bpf.h> int main() { return XDP_DROP; }
It can be compiled and installed with:
$ clang -O2 -target bpf -c xdp_drop_all.c -o xdp_drop_all.o $ sudo ip link set dev br0 xdp obj xdp_drop_all.o sec .text
This requires Linux 4.14 and this is the second most efficient method but a bit inconvenient as it needs a compiler.4
Using ingress policy#
The ingress policy of an interface is evaluated just a bit later but before any handler. Therefore, the following commands will ensure no local delivery happens:
# tc qdisc add dev br0 handle ffff: ingress # tc filter add dev br0 parent ffff: matchall action drop
This is the third most efficient method but is more convenient than the previous one for a very tiny additional overhead.
Update (2020-01)
Since Linux 4.2, it is possible to use Netfilter to setup the ingress policy with the same efficiency. This can be done through nftables:
# nft -f - <<EOF > table netdev firewall { > chain ingress { > type filter hook ingress device br0 priority 0; policy drop; > log > } > } > EOF
Using ebtables
#
Just before re-delivering the frame to netif_receive_skb()
,
Netfilter gets a chance to issue a decision. It’s easy to configure it
to drop the frame:
# ebtables -A INPUT --logical-in br0 -j DROP
However, to the best of my knowledge, this part of Netfilter is known to be inefficient.
Using namespaces#
Isolation can also be obtained by moving all the bridged interfaces into a dedicated network namespace and configure the bridge inside this namespace:
# ip netns add bridge0 # ip link set netns bridge0 eth0 # ip link set netns bridge0 eth2 # ip link set netns bridge0 eth3 # ip link set netns bridge0 eth4 # ip link del dev br0 # ip netns exec bridge0 brctl addbr br0 # for i in 0 2 3 4; do > ip netns exec bridge0 brctl addif br0 eth$i > ip netns exec bridge0 ip link set up dev eth$i > done # ip netns exec bridge0 ip link set up dev br0
The frame will still wander a bit inside the IP stack, wasting some CPU cycles and increasing the possible attack surface. But ultimately, it will be dropped.
Update (2023-08)
A more lightweight alternative is to move the bridge into
a VRF with ip link set br0 master vrf0
. The result is similar to the
solution with namespaces, but the configuration is simpler.
Protocol-dependent workarounds#
Unless you require multiple layers of security, if one of the previous workarounds is already applied, there is no need to apply one of the protocol-dependent fix below. It’s still interesting to know them because it is not uncommon to already have them in place.
ARP#
The easiest way to disable ARP processing on a bridge is to set the
NOARP
flag on the device. The ARP packet will be dropped as the very
first step of the ARP handler.
# ip link set arp off dev br0 # ip l l dev br0 8: br0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 50:54:33:00:00:04 brd ff:ff:ff:ff:ff:ff
arptables
can also drop the packet quite early:
# arptables -A INPUT -i br0 -j DROP
Another way is to set arp_ignore
to 2 for the given interface. The
kernel will only answer to ARP requests whose target IP address is
configured on the incoming interface. Since the bridge interface
doesn’t have an IP address, no ARP requests will be answered.
# sysctl -qw net.ipv4.conf.br0.arp_ignore=2
Disabling ARP processing is not a sufficient workaround for IPv4. A user can still insert the appropriate entry in its neighbor cache:
# ip neigh replace 192.168.14.3 lladdr 50:54:33:00:00:04 dev eth0 # ping -c 1 192.168.14.3 PING 192.168.14.3 (192.168.14.3) 56(84) bytes of data. 64 bytes from 192.168.14.3: icmp_seq=1 ttl=49 time=1.30 ms --- 192.168.14.3 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 1.309/1.309/1.309/0.000 ms
Because the check on the target MAC address is quite loose, they don’t even need to guess the MAC address:
# ip neigh replace 192.168.14.3 lladdr ff:ff:ff:ff:ff:ff dev eth0 # ping -c 1 192.168.14.3 PING 192.168.14.3 (192.168.14.3) 56(84) bytes of data. 64 bytes from 192.168.14.3: icmp_seq=1 ttl=49 time=1.12 ms --- 192.168.14.3 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 1.129/1.129/1.129/0.000 ms
IPv4#
The earliest place to drop an IPv4 packet is with Netfilter:5
# iptables -t raw -I PREROUTING -i br0 -j DROP
If Netfilter is disabled, another possibility is to enable strict reverse-path filtering for the interface. In this case, since there is no IP address configured on the interface, the packet will be dropped during the route lookup:
# sysctl -qw net.ipv4.conf.br0.rp_filter=1
Another option is the use of a dedicated routing rule. Compared to the reverse-path filtering option, the packet will be dropped a bit earlier, still during the route lookup.
# ip rule add iif br0 blackhole
IPv6#
Linux provides a way to completely disable IPv6 on a given interface. The packet will be dropped as the very first step of the IPv6 handler:
# sysctl -qw net.ipv6.conf.br0.disable_ipv6=1
As with IPv4, it’s possible to use Netfilter or a dedicated routing rule.
About the example#
In the above example, the virtual host gets ICMP replies because they are routed through the infrastructure network to Internet (e.g. the hypervisor has a default gateway which also acts as a NAT router to Internet). This may not be the case.
If you want to check if you are “vulnerable” despite not getting an ICMP reply, look at the guest neighbor table to check if you got an ARP reply from the host:
# ip route add 192.168.14.3/32 dev eth0 # ip neigh show dev eth0 192.168.14.3 lladdr 50:54:33:00:00:04 REACHABLE
If you didn’t get a reply, you could still have issues with IP processing. Add a static neighbor entry before checking the next step:
# ip neigh replace 192.168.14.3 lladdr ff:ff:ff:ff:ff:ff dev eth0
To check if IP processing is enabled, check the bridge host’s network statistics:
# netstat -s | grep "ICMP messages" 15 ICMP messages received 15 ICMP messages sent 0 ICMP messages failed
If the counters are increasing, it is processing incoming IP packets.
One-way communication still allows a lot of bad things, like DoS attacks. Additionally, if the hypervisor happens to also act as a router, the reach is extended to the whole infrastructure network, potentially exposing weak devices (e.g. PDU) exposing an SNMP agent. If one-way communication is all that’s needed, the attacker can also spoof its source IP address, bypassing IP-based authentication.
About MACVLAN interfaces#
When source-address learning is not needed, bridges can be replaced by MACVLAN or MACVTAP interfaces. Have a look at the following article for more details: “Bridge vs Macvlan.”
In this case, the packet processing is different and asymmetric. When a frame exits the MACVLAN interface, it is directly transmitted to the appropriate interface (either another MACVLAN interface or the lower device). In this direction, communication with the host is not possible.
However, when a frame enters the lower device, if the destination MAC is not one of the sub-interfaces, the frame is processed by the host using the appropriate protocol handler (ARP, IPv4, IPv6). Therefore, isolation can only be obtained by using the protocol-dependent methods.
-
A frame can be forcibly routed (L3) instead of bridged (L2) by “brouting” the packet. This action can be triggered using
ebtables
. ↩︎ -
For IPv6, reverse-path filtering needs to be implemented with Netfilter, using the
rpfilter
match. ↩︎ -
With a kernel older than Linux 4.3, you’ll have to use the following command instead of
ip link
:↩︎# echo 1 > /sys/class/net/br0/bridge/vlan_filtering
-
The above program is so simple it could be written without a compiler:
$ ip -d l l dev br0 | tail -1 prog/xdp id 1 tag 57cd311f2e27366b $ bpftool prog dump xlated id 1 0: (b7) r0 = 1 1: (95) exit $ readelf -x .text xdp_drop_all_kern.o b7000000 01000000 95000000 00000000
However,
ip link
requires an ELF object. ↩︎ -
If the
br_netfilter
module is loaded,net.bridge.bridge-nf-call-ipatbles
sysctl has to be set to 0. Otherwise, you also need to use thephysdev
match to not drop IPv4 packets going through the bridge. ↩︎