ClickHouse SF Bay Area Meetup: Akvorado
Here are the slides I presented for a ClickHouse SF Bay Area Meetup in July 2022, hosted by Altinity. They are about Akvorado, a network flow collector and visualizer, and notably on how it relies on ClickHouse, a column-oriented database.
The meetup was recorded and available on YouTube. Here is the part relevant to my presentation, with subtitles:1
I got a few questions about how to get information from the higher layers, like HTTP. As my use case for Akvorado was at the network edge, my answers were mostly negative. However, as sFlow is extensible, when collecting flows from Linux servers instead, you could embed additional data and they could be exported as well.
I also got a question about doing aggregation in a single table.
ClickHouse can automatically aggregate data using TTL. My answer for
not doing that is partial. There is another reason: the retention
periods of the various tables may overlap. For example, the main table
keeps data for 15 days, but even in these 15 days, if I do a query on
a 12-hour window, it is faster to use the flows_1m0s aggregated
table, unless I request something about ports and IP addresses.