Securing BGP on the host with origin validation
Vincent Bernat
An increasingly popular design for a datacenter network is BGP on the host: each host ships with a BGP daemon to advertise the IPs it handles and receives the routes to its fellow servers. Compared to an L2-based design, it is very scalable, resilient, cross-vendor and safe to operate.1 Take a look at “L3 routing to the hypervisor with BGP” for a usage example.
While routing on the host eliminates the security problems related to
Ethernet networks, a server may announce any IP prefix. In the above
picture, two of them are announcing 2001:db8:cc::/64. This could be
a legit use of anycast or a prefix hijack. BGP offers several
solutions to improve this aspect and one of them is to reuse the
features around the RPKI.
Short introduction to the RPKI
On the Internet, BGP mostly relies on trust. This contributes to various incidents caused by operator errors, like the one that affected Cloudflare a few months ago, or by malicious attackers, like the hijack of Amazon DNS to steal cryptocurrency wallets. RFC 7454 explains the best practices to avoid such issues.
IP addresses are allocated by five Regional Internet Registries (RIR). Each of them maintains a database of the assigned Internet resources, notably the IP addresses and the associated AS numbers. These databases may not be entirely reliable but are widely used to build ACLs to ensure peers only announce the prefixes they are expected to. Here is an example of ACLs generated by bgpq3 when peering directly with Apple:2
$ bgpq3 -l v6-IMPORT-APPLE -6 -R 48 -m 48 -A -J -E AS-APPLE
policy-options {
 policy-statement v6-IMPORT-APPLE {
replace:
  from {
    route-filter 2403:300::/32 upto /48;
    route-filter 2620:0:1b00::/47 prefix-length-range /48-/48;
    route-filter 2620:0:1b02::/48 exact;
    route-filter 2620:0:1b04::/47 prefix-length-range /48-/48;
    route-filter 2620:149::/32 upto /48;
    route-filter 2a01:b740::/32 upto /48;
    route-filter 2a01:b747::/32 upto /48;
  }
 }
}
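To make the three match types in this output concrete (upto, prefix-length-range and exact), here is a minimal Python model of the generated filter. It is a sketch, not Junos code: FILTERS restates the bgpq3 output as (prefix, minimum length, maximum length) tuples, and the hypothetical accepted() helper only evaluates the most specific route-filter covering the route, which is how we understand Junos to behave. None of these prefixes overlap, so this detail does not change the result here.

```python
import ipaddress

# Route-filters from the bgpq3 output, as (covering prefix, min length, max length):
#   "upto /48"                    -> (prefix length, 48)
#   "prefix-length-range /48-/48" -> (48, 48)
#   "exact"                       -> (prefix length, prefix length)
FILTERS = [
    (ipaddress.ip_network("2403:300::/32"), 32, 48),
    (ipaddress.ip_network("2620:0:1b00::/47"), 48, 48),
    (ipaddress.ip_network("2620:0:1b02::/48"), 48, 48),
    (ipaddress.ip_network("2620:0:1b04::/47"), 48, 48),
    (ipaddress.ip_network("2620:149::/32"), 32, 48),
    (ipaddress.ip_network("2a01:b740::/32"), 32, 48),
    (ipaddress.ip_network("2a01:b747::/32"), 32, 48),
]


def accepted(route: str) -> bool:
    """Return True if the route is accepted by the route-filter list (hypothetical helper)."""
    net = ipaddress.ip_network(route)
    # Keep the filters whose prefix covers the route.
    covering = [f for f in FILTERS
                if f[0].version == net.version and net.subnet_of(f[0])]
    if not covering:
        return False
    # Only the most specific covering filter has its length range checked.
    _, min_len, max_len = max(covering, key=lambda f: f[0].prefixlen)
    return min_len <= net.prefixlen <= max_len
```

With this model, 2620:0:1b01::/48 is accepted, while 2620:0:1b00::/47 (too short for prefix-length-range /48-/48) and 2620:149::/56 (longer than /48) are not.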
The RPKI (RFC 6480) adds public-key cryptography on top of these databases to sign the authorization for an AS to be the origin of an IP prefix. Such a record is a Route Origination Authorization (ROA). You can browse the databases of these ROAs through RIPE’s RPKI Validator instance.
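The check itself is specified in RFC 6811: a route is valid when a ROA covers its prefix, allows its length and names its origin AS; invalid when some ROAs cover it but none matches; not found otherwise. Here is a minimal Python sketch of that classification. VRP (for “validated ROA payload”) and origin_validation() are illustrative names, not the API of any BGP daemon, and corner cases such as an AS path ending with an AS_SET are ignored.

```python
import ipaddress
from typing import Iterable, NamedTuple


class VRP(NamedTuple):
    """Validated ROA payload: a prefix, a maximum length and an origin ASN."""
    prefix: str
    max_length: int
    asn: int


def origin_validation(route: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Classify a route as "valid", "invalid" or "notfound" (RFC 6811)."""
    net = ipaddress.ip_network(route)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        if vrp_net.version != net.version or not net.subnet_of(vrp_net):
            continue  # this VRP does not cover the route
        covered = True
        if net.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
            return "valid"  # covered and matched
    # Covered by at least one VRP but matched by none: invalid.
    return "invalid" if covered else "notfound"
```

Note that a ROA with a maximum length equal to its prefix length makes any more specific announcement invalid, even from the right AS.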
BGP daemons do not have to download the databases or to check digital signatures to validate the received prefixes. Instead, they offload these tasks to a local RPKI validator implementing the “RPKI-to-Router Protocol” (RTR, RFC 6810).
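What the RTR protocol carries is quite simple. As an illustration, here is a Python sketch encoding and decoding the IPv6 Prefix PDU from section 5.7 of RFC 6810: 32 octets holding a header, an announce/withdraw flag, the prefix length, the maximum length, the prefix and the origin ASN. The function names are ours, and the rest of the protocol (session and serial handling, the other PDU types) is left out. RFC 8210, which defines version 1 of the protocol, keeps the same layout for this PDU.

```python
import ipaddress
import struct

IPV6_PREFIX_PDU = 6
# Version, PDU type, zero, length, flags, prefix length, max length, zero,
# 16-octet IPv6 prefix, ASN: 32 octets in network byte order.
PDU_FORMAT = "!BBHIBBBB16sI"
PDU_LENGTH = struct.calcsize(PDU_FORMAT)  # 32


def encode_ipv6_prefix(prefix: str, max_length: int, asn: int,
                       announce: bool = True, version: int = 0) -> bytes:
    """Build an IPv6 Prefix PDU for one ROA."""
    net = ipaddress.IPv6Network(prefix)
    flags = 1 if announce else 0  # lowest bit: 1 = announcement, 0 = withdrawal
    return struct.pack(PDU_FORMAT, version, IPV6_PREFIX_PDU, 0, PDU_LENGTH,
                       flags, net.prefixlen, max_length, 0,
                       net.network_address.packed, asn)


def decode_ipv6_prefix(pdu: bytes) -> dict:
    """Parse an IPv6 Prefix PDU back into its fields."""
    if len(pdu) != PDU_LENGTH:
        raise ValueError("an IPv6 Prefix PDU is 32 octets long")
    (version, pdu_type, _, length, flags, prefix_len, max_len, _,
     address, asn) = struct.unpack(PDU_FORMAT, pdu)
    if pdu_type != IPV6_PREFIX_PDU or length != PDU_LENGTH:
        raise ValueError("not an IPv6 Prefix PDU")
    return {
        "version": version,
        "announce": bool(flags & 1),
        "prefix": str(ipaddress.IPv6Network((address, prefix_len))),
        "max_length": max_len,
        "asn": asn,
    }
```

For example, the ROA for 2001:db8:aa::/64 with a maximum length of 128 and AS 65005 encodes to 32 octets ending with the ASN in network byte order, and decoding them gives back the same values.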
For more details, have a look at “RPKI and BGP: our path to securing Internet Routing.”
Using origin validation in the datacenter
While it is possible to create our own RPKI for use inside the datacenter, we can take a shortcut and use a validator implementing RTR, like GoRTR, and accepting another source of truth. Let’s work on the following topology:
We assume we have a place to maintain a mapping between the private AS numbers used by each host and the allowed prefixes:3
ASN | Allowed prefixes |
---|---|
AS 65005 | 2001:db8:aa::/64 |
AS 65006 | 2001:db8:bb::/64, 2001:db8:11::/64 |
AS 65007 | 2001:db8:cc::/64 |
AS 65008 | 2001:db8:dd::/64 |
AS 65009 | 2001:db8:ee::/64, 2001:db8:11::/64 |
AS 65010 | 2001:db8:ff::/64 |
From this table, we build a JSON file for GoRTR, assuming each host
can announce the provided prefixes or longer ones (like
2001:db8:aa::42:d9ff:fefc:287a/128
for AS 65005):
{
  "roas": [
    {
      "prefix": "2001:db8:aa::/64",
      "maxLength": 128,
      "asn": "AS65005"
    },
    {
      "…": "…"
    },
    {
      "prefix": "2001:db8:ff::/64",
      "maxLength": 128,
      "asn": "AS65010"
    },
    {
      "prefix": "2001:db8:11::/64",
      "maxLength": 128,
      "asn": "AS65006"
    },
    {
      "prefix": "2001:db8:11::/64",
      "maxLength": 128,
      "asn": "AS65009"
    }
  ]
}
This file is deployed to all validators and served by a web server. GoRTR is configured to fetch it and update it every 10 minutes:
$ gortr -refresh=600 \
>       -verify=false -checktime=false \
>       -cache=http://127.0.0.1/rpki.json
INFO[0000] New update (7 uniques, 8 total prefixes). 0 bytes. Updating sha256 hash -> 68a1d3b52db8d654bd8263788319f08e3f5384ae54064a7034e9dbaee236ce96
INFO[0000] Updated added, new serial 1
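Rather than editing the JSON file by hand, it can be generated from the source of truth. A minimal Python sketch, assuming the mapping table is available as a dictionary; ALLOWED and build_roas() are hypothetical names, and every ROA gets the maxLength of 128 used above:

```python
import ipaddress
import json

# Hypothetical copy of the mapping table (ASN -> allowed prefixes).
ALLOWED = {
    65005: ["2001:db8:aa::/64"],
    65006: ["2001:db8:bb::/64", "2001:db8:11::/64"],
    65007: ["2001:db8:cc::/64"],
    65008: ["2001:db8:dd::/64"],
    65009: ["2001:db8:ee::/64", "2001:db8:11::/64"],
    65010: ["2001:db8:ff::/64"],
}


def build_roas(allowed: dict, max_length: int = 128) -> dict:
    """Turn the ASN-to-prefixes mapping into the JSON structure read by GoRTR."""
    roas = []
    for asn, prefixes in sorted(allowed.items()):
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)  # raises ValueError on a typo
            if not net.prefixlen <= max_length <= net.max_prefixlen:
                raise ValueError(f"invalid maxLength for {net}")
            roas.append({"prefix": str(net), "maxLength": max_length,
                         "asn": f"AS{asn}"})
    return {"roas": roas}


if __name__ == "__main__":
    print(json.dumps(build_roas(ALLOWED), indent=2))
```

For the table above, it produces 8 entries for 7 distinct prefixes, which matches the “7 uniques, 8 total prefixes” logged by GoRTR. Writing the result where the web server can serve it is left to the deployment tooling.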
The refresh time could be lowered, but GoRTR can also be notified of an
update with the SIGHUP signal. Clients are then immediately notified of
the change.
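When the file is regenerated, it should be replaced atomically so the web server never serves a partial document, and GoRTR can then be signaled. A minimal Python sketch, assuming the script runs on the validator itself and already knows the PID of the GoRTR process (finding it, through a pidfile or the service manager, is left open); publish() and notify() are hypothetical helpers:

```python
import json
import os
import signal
import tempfile


def publish(document: dict, path: str) -> None:
    """Atomically replace path with the JSON document."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the target: readers see either the old or the new file, never a mix.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(document, f)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, 0o644)  # mkstemp() creates the file with mode 0600
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def notify(pid: int) -> None:
    """Ask the GoRTR process to fetch the file again."""
    os.kill(pid, signal.SIGHUP)
```

Renaming a file over another one within the same directory is the usual way to get an atomic replacement on a POSIX filesystem.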
The next step is to configure the leaf routers to validate the received prefixes using the farm of validators. Most vendors support RTR:
Platform | Over TCP? | Over SSH? |
---|---|---|
Juniper Junos | ✔️ | ❌ |
Cisco IOS XR | ✔️ | ✔️ |
Cisco IOS XE | ✔️ | ❌ |
Cisco IOS | ✔️ | ❌ |
Arista EOS | ✔️ | ❌ |
BIRD | ✔️ | ✔️ |
FRR | ✔️ | ✔️ |
GoBGP | ✔️ | ❌ |
Configuring Junos
Junos only supports plain-text TCP. First, let’s configure the connections to the validation servers:
routing-options {
    validation {
        group RPKI {
            session validator1 {
                hold-time 60;         # session is considered down after 1 minute
                record-lifetime 3600; # cache is kept for 1 hour
                refresh-time 30;      # cache is refreshed every 30 seconds
                port 8282;
            }
            session validator2 { /* OMITTED */ }
            session validator3 { /* OMITTED */ }
        }
    }
}
By default, at most two sessions are randomly established at the same time. This provides a good way to load-balance them among the validators while maintaining good availability. The second step is to define the policy for route validation:
policy-options {
    policy-statement ACCEPT-VALID {
        term valid {
            from {
                protocol bgp;
                validation-database valid;
            }
            then {
                validation-state valid;
                accept;
            }
        }
        term invalid {
            from {
                protocol bgp;
                validation-database invalid;
            }
            then {
                validation-state invalid;
                reject;
            }
        }
    }
    policy-statement REJECT-ALL {
        then reject;
    }
}
The policy statement ACCEPT-VALID
turns the validation state of a
prefix from unknown
to valid
if the ROA database says it is valid.
It also accepts the route. If the prefix is invalid, the prefix is
marked as such and rejected. We have also prepared a REJECT-ALL
statement to reject everything else, notably unknown prefixes.
A ROA only certifies the origin of a prefix. A malicious actor can
therefore forge an AS path ending with the expected AS number to
circumvent the validation. For example, AS 65007 could announce
2001:db8:dd::/64, a prefix allocated to AS 65008, by advertising it
with the AS path 65007 65008. To avoid that, we define an additional
policy statement to reject AS paths with more than one ASN:4
policy-options {
    as-path EXACTLY-ONE-ASN "^.$";
    policy-statement ONLY-DIRECTLY-CONNECTED {
        term exactly-one-asn {
            from {
                protocol bgp;
                as-path EXACTLY-ONE-ASN;
            }
            then next policy;
        }
        then reject;
    }
}
The last step is to configure the BGP sessions:
protocols {
    bgp {
        group HOSTS {
            local-as 65100;
            type external;
            # export [ … ];
            import [ ONLY-DIRECTLY-CONNECTED ACCEPT-VALID REJECT-ALL ];
            enforce-first-as;
            neighbor 2001:db8:42::a10 {
                peer-as 65005;
            }
            neighbor 2001:db8:42::a12 {
                peer-as 65006;
            }
            neighbor 2001:db8:42::a14 {
                peer-as 65007;
            }
        }
    }
}
The import policy rejects any AS path longer than one AS, accepts any
validated prefix and rejects everything else. The enforce-first-as
directive is also pretty important: it ensures the first (and, here,
only) AS in the AS path matches the peer AS. Without it, a malicious
neighbor could inject a prefix using an AS different than its own,
defeating our purpose.5
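Each piece of this configuration catches a different trick. The following Python model of the HOSTS import chain makes this visible. It is only a sketch: hosts_import() is a hypothetical function, and validate() stands in for the router's ROA lookup, returning valid, invalid or unknown for a prefix and an origin ASN.

```python
from typing import Callable, Sequence

# Stand-in for the router's ROA lookup: (prefix, origin ASN) -> state,
# one of "valid", "invalid" or "unknown".
Validate = Callable[[str, int], str]


def hosts_import(peer_as: int, prefix: str, as_path: Sequence[int],
                 validate: Validate, enforce_first_as: bool = True) -> bool:
    """Model of the HOSTS group import chain; True means accepted."""
    # enforce-first-as: the first ASN must be the neighbor's ASN.
    if enforce_first_as and (not as_path or as_path[0] != peer_as):
        return False
    # ONLY-DIRECTLY-CONNECTED: "^.$" only lets one-ASN paths through.
    if len(as_path) != 1:
        return False
    # ACCEPT-VALID: origin validation uses the last ASN of the path.
    if validate(prefix, as_path[-1]) == "valid":
        return True
    # Invalid routes are rejected by ACCEPT-VALID, unknown ones by REJECT-ALL.
    return False
```

With a validate() knowing that 2001:db8:dd::/64 belongs to AS 65008, a route from the AS 65007 neighbor with the path 65007 65008 is dropped by the one-ASN check, while the path 65008 alone is only stopped by enforce-first-as: with enforce_first_as=False, the model accepts it.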
Let’s check the state of the RTR sessions and the database:
> show validation session
Session                                  State   Flaps     Uptime #IPv4/IPv6 records
2001:db8:4242::10                        Up          0   00:16:09 0/9
2001:db8:4242::11                        Up          0   00:16:07 0/9
2001:db8:4242::12                        Connect     0            0/0

> show validation database
RV database for instance master

Prefix                 Origin-AS Session                                 State   Mismatch
2001:db8:11::/64-128       65006 2001:db8:4242::10                       valid
2001:db8:11::/64-128       65006 2001:db8:4242::11                       valid
2001:db8:11::/64-128       65009 2001:db8:4242::10                       valid
2001:db8:11::/64-128       65009 2001:db8:4242::11                       valid
2001:db8:aa::/64-128       65005 2001:db8:4242::10                       valid
2001:db8:aa::/64-128       65005 2001:db8:4242::11                       valid
2001:db8:bb::/64-128       65006 2001:db8:4242::10                       valid
2001:db8:bb::/64-128       65006 2001:db8:4242::11                       valid
2001:db8:cc::/64-128       65007 2001:db8:4242::10                       valid
2001:db8:cc::/64-128       65007 2001:db8:4242::11                       valid
2001:db8:dd::/64-128       65008 2001:db8:4242::10                       valid
2001:db8:dd::/64-128       65008 2001:db8:4242::11                       valid
2001:db8:ee::/64-128       65009 2001:db8:4242::10                       valid
2001:db8:ee::/64-128       65009 2001:db8:4242::11                       valid
2001:db8:ff::/64-128       65010 2001:db8:4242::10                       valid
2001:db8:ff::/64-128       65010 2001:db8:4242::11                       valid

  IPv4 records: 0
  IPv6 records: 18
Here is an example of accepted route:
> show route protocol bgp table inet6 extensive all
inet6.0: 11 destinations, 11 routes (8 active, 0 holddown, 3 hidden)
2001:db8:bb::42/128 (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Next hop type: Router, Next hop index: 0
                Address: 0xd050470
                Next-hop reference count: 4
                Source: 2001:db8:42::a12
                Next hop: 2001:db8:42::a12 via em1.0, selected
                Session Id: 0x0
                State: <Active NotInstall Ext>
                Local AS: 65006 Peer AS: 65000
                Age: 12:11
                Validation State: valid
                Task: BGP_65000.2001:db8:42::a12+179
                AS path: 65006 I
                Accepted
                Localpref: 100
                Router ID: 1.1.1.1
A rejected route would look similar, with the reason “rejected by import
policy” shown in the details and a validation state of invalid.
Configuring BIRD
BIRD supports both plain-text TCP and SSH. Let’s configure it to use
SSH. We need to generate keypairs for both the leaf router and the
validators (they can all share the same keypair). We also have to
create a known_hosts
file for BIRD:
(validatorX)$ ssh-keygen -qN "" -t rsa -f /etc/gortr/ssh_key
(validatorX)$ echo -n "validatorX:8283 " ; \
>             cat /etc/gortr/ssh_key.pub
validatorX:8283 ssh-rsa AAAAB3[…]Rk5TW0=
(leaf1)$ ssh-keygen -qN "" -t rsa -f /etc/bird/ssh_key
(leaf1)$ echo 'validator1:8283 ssh-rsa AAAAB3[…]Rk5TW0=' >> /etc/bird/known_hosts
(leaf1)$ echo 'validator2:8283 ssh-rsa AAAAB3[…]Rk5TW0=' >> /etc/bird/known_hosts
(leaf1)$ cat /etc/bird/ssh_key.pub
ssh-rsa AAAAB3[…]byQ7s=
(validatorX)$ echo 'ssh-rsa AAAAB3[…]byQ7s=' >> /etc/gortr/authorized_keys
GoRTR needs additional flags to allow connections over SSH:
$ gortr -refresh=600 -verify=false -checktime=false \
>       -cache=http://127.0.0.1/rpki.json \
>       -ssh.bind=:8283 \
>       -ssh.key=/etc/gortr/ssh_key \
>       -ssh.method.key=true \
>       -ssh.auth.user=rpki \
>       -ssh.auth.key.file=/etc/gortr/authorized_keys
INFO[0000] Enabling ssh with the following authentications: password=false, key=true
INFO[0000] New update (7 uniques, 8 total prefixes). 0 bytes. Updating sha256 hash -> 68a1d3b52db8d654bd8263788319f08e3f5384ae54064a7034e9dbaee236ce96
INFO[0000] Updated added, new serial 1
Then, we can configure BIRD to use these RTR servers:
roa6 table ROA6;
template rpki VALIDATOR {
  roa6 { table ROA6; };
  transport ssh {
    user "rpki";
    remote public key "/etc/bird/known_hosts";
    bird private key "/etc/bird/ssh_key";
  };
  refresh keep 30;
  retry keep 30;
  expire keep 3600;
}
protocol rpki VALIDATOR1 from VALIDATOR {
  remote validator1 port 8283;
}
protocol rpki VALIDATOR2 from VALIDATOR {
  remote validator2 port 8283;
}
Unlike Junos, BIRD doesn’t have a feature to only use a subset of validators. Therefore, we only configure two of them. As a safety measure, if both connections become unavailable, BIRD will keep the ROAs for one hour.
We can query the state of the RTR sessions and the database:
> show protocols all VALIDATOR1
Name       Proto      Table      State  Since         Info
VALIDATOR1 RPKI       ---        up     17:28:56.321  Established
  Cache server:     rpki@validator1:8283
  Status:           Established
  Transport:        SSHv2
  Protocol version: 1
  Session ID:       0
  Serial number:    1
  Last update:      before 25.212 s
  Refresh timer   : 4.787/30
  Retry timer     : ---
  Expire timer    : 3574.787/3600
  No roa4 channel
  Channel roa6
    State:          UP
    Table:          ROA6
    Preference:     100
    Input filter:   ACCEPT
    Output filter:  REJECT
    Routes:         9 imported, 0 exported, 9 preferred
    Route change stats:     received   rejected   filtered    ignored   accepted
      Import updates:              9          0          0          0          9
      Import withdraws:            0          0        ---          0          0
      Export updates:              0          0          0        ---          0
      Export withdraws:            0        ---        ---        ---          0

> show route table ROA6
Table ROA6:
2001:db8:11::/64-128 AS65006  [VALIDATOR1 17:28:56.333] * (100)
                              [VALIDATOR2 17:28:56.414] (100)
2001:db8:11::/64-128 AS65009  [VALIDATOR1 17:28:56.333] * (100)
                              [VALIDATOR2 17:28:56.414] (100)
2001:db8:aa::/64-128 AS65005  [VALIDATOR1 17:28:56.333] * (100)
                              [VALIDATOR2 17:28:56.414] (100)
2001:db8:bb::/64-128 AS65006  [VALIDATOR1 17:28:56.333] * (100)
                              [VALIDATOR2 17:28:56.414] (100)
2001:db8:cc::/64-128 AS65007  [VALIDATOR1 17:28:56.333] * (100)
                              [VALIDATOR2 17:28:56.414] (100)
2001:db8:dd::/64-128 AS65008  [VALIDATOR1 17:28:56.333] * (100)
                              [VALIDATOR2 17:28:56.414] (100)
2001:db8:ee::/64-128 AS65009  [VALIDATOR1 17:28:56.333] * (100)
                              [VALIDATOR2 17:28:56.414] (100)
2001:db8:ff::/64-128 AS65010  [VALIDATOR1 17:28:56.333] * (100)
                              [VALIDATOR2 17:28:56.414] (100)
As in the Junos case, a malicious actor could try to work around the validation by building an AS path where the last AS number is the legitimate one. BIRD is flexible enough to let us use any AS to check the IP prefix. Instead of checking the origin AS, we ask it to check the peer AS with this function, without looking at the AS path:
function validated(int peeras) {
  if (roa_check(ROA6, net, peeras) != ROA_VALID) then {
    print "Ignore invalid ROA ", net, " for ASN ", peeras;
    reject;
  }
  accept;
}
The BGP instance is then configured to use the above function as the import policy:
protocol bgp PEER1 {
  local as 65100;
  neighbor 2001:db8:42::a10 as 65005;
  connect delay time 30;
  ipv6 {
    import keep filtered;
    import where validated(65005);
    # export …;
  };
}
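Since validated() takes the peer ASN as an argument, each BGP instance has to repeat the right number, which is easy to get wrong when there are many hosts. A minimal Python sketch rendering one instance per host from an inventory; PEERS, TEMPLATE and bird_peers() are hypothetical, the neighbor addresses come from the Junos example, and the export statement is left out:

```python
# Hypothetical inventory: (neighbor address, peer ASN), taken from the
# Junos example.
PEERS = [
    ("2001:db8:42::a10", 65005),
    ("2001:db8:42::a12", 65006),
    ("2001:db8:42::a14", 65007),
]

# Doubled braces are literal braces for str.format().
TEMPLATE = """\
protocol bgp PEER{index} {{
  local as {local_as};
  neighbor {address} as {asn};
  connect delay time 30;
  ipv6 {{
    import keep filtered;
    import where validated({asn});
  }};
}}
"""


def bird_peers(peers, local_as=65100):
    """Render one BGP instance per host, each validating against its own ASN."""
    return "\n".join(
        TEMPLATE.format(index=index, local_as=local_as, address=address, asn=asn)
        for index, (address, asn) in enumerate(peers, start=1))


if __name__ == "__main__":
    print(bird_peers(PEERS), end="")
```

With BIRD 2.0.8 or later, the template would use import table yes instead of import keep filtered, as explained in the update at the end of this section.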
You can view the rejected routes with show route filtered, but
BIRD does not store information about the validation state in the
routes. You can also watch the logs:
2019-07-31 17:29:08.491 <INFO> Ignore invalid ROA 2001:db8:bb::40:/126 for ASN 65005
Currently, BIRD does not reevaluate the filters when the ROAs
are updated. There is work in progress to fix this. If this
feature is important to you, have a look at FRR instead: it also
supports the RTR protocol and triggers a soft reconfiguration of the
BGP sessions when ROAs are updated.
Update (2021-03)
From version 2.0.8, BIRD reevaluates the
filters when the ROAs are updated. You need to replace import keep filtered
with import table yes in the BGP instance configuration.
You can also drop the connect delay time
directive in the proposed
configuration. Its purpose was to ensure the ROAs are loaded before
the BGP connection is established.
-
Notably, the data flow and the control plane are separated. A node can remove itself by notifying its peers without losing a single packet.
-
People often use AS sets, like AS-APPLE in this example, as they are convenient if you have multiple AS numbers or customers. However, there is currently nothing preventing a rogue actor from adding arbitrary AS numbers to their AS set.
-
We are using 16-bit AS numbers for readability. Because we need to assign a different AS number for each host in the datacenter, in an actual deployment, we would use 32-bit AS numbers.
-
This restriction also prevents the peer from prepending its own ASN to deprioritize a path. A modern alternative is to use the graceful shutdown community.
-
Cisco routers and FRR enforce the first AS by default. It is a tunable setting to allow the use of route servers, which distribute prefixes on behalf of other routers.