Here are the slides I presented at a ClickHouse SF Bay Area Meetup in July 2022, hosted by Altinity. They are about Akvorado, a network flow collector and visualizer, and in particular about how it relies on ClickHouse, a column-oriented database.
I got a few questions about how to get information from the higher layers, like HTTP. As my use case for Akvorado was at the network edge, my answers were mostly negative. However, as sFlow is extensible, when collecting flows from Linux servers instead, you could embed additional data and have it exported as well.
I also got a question about doing aggregation in a single table. ClickHouse can automatically aggregate data using TTL. My answer on why I am not doing that was only partial. There is another reason: the retention periods of the various tables may overlap. For example, the main table keeps data for 15 days, but even within these 15 days, if I run a query on a 12-hour window, it is faster to use an aggregated table, unless I request something about ports and IP addresses.
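To make the overlap argument more concrete, here is a minimal sketch in Python of how a query could be routed to one of several tables covering the same time range. This is not Akvorado's actual code: the table names, the aggregation intervals, the retention periods of the aggregated tables, and the `min_points` threshold are all assumptions made for illustration. Only the 15-day retention of the main table comes from the text above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Table:
    name: str                # hypothetical table name
    interval: timedelta      # aggregation interval (zero for raw flows)
    retention: timedelta     # how long rows are kept
    has_ports_and_ips: bool  # whether ports and IP addresses are kept


# Hypothetical layout. The retention periods overlap on purpose:
# the last 7 days are present in all three tables.
TABLES = [
    Table("flows", timedelta(0), timedelta(days=15), True),
    Table("flows_1m", timedelta(minutes=1), timedelta(days=7), False),
    Table("flows_1h", timedelta(hours=1), timedelta(days=365), False),
]


def pick_table(age, window, needs_ports_or_ips, min_points=200):
    """Pick the coarsest table able to answer a query.

    age: how far in the past the queried window starts.
    window: duration of the queried window.
    needs_ports_or_ips: whether the query filters or groups on ports/IPs.
    min_points: assumed minimum number of data points wanted in the result.
    """
    # Keep only the tables that still hold the data and the needed columns.
    candidates = [
        t for t in TABLES
        if t.retention >= age
        and (t.has_ports_and_ips or not needs_ports_or_ips)
    ]
    if not candidates:
        raise ValueError("no table can answer this query")
    # Try the coarsest tables first: they contain fewer rows to scan.
    candidates.sort(key=lambda t: t.interval, reverse=True)
    for t in candidates:
        if t.interval == timedelta(0) or window / t.interval >= min_points:
            return t
    # No table provides enough points: fall back to the finest one.
    return candidates[-1]
```

With this layout, a 12-hour window starting 12 hours ago goes to the hypothetical `flows_1m` table, while the same window with a filter on ports goes to the main `flows` table. A single table aggregated through TTL could not offer both views of the same time range.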
To generate the subtitles, I used Amazon Transcribe, the speech-to-text solution from AWS. Unfortunately, there is no en-FR language available, which would have been useful for my terrible accent. While the subtitles were 100% accurate when the host, Robert Hodges from Altinity, was speaking, the success rate on my talk was much lower. I had to rewrite almost all sentences. However, using speech-to-text is still useful to get the timings, as doing them manually also requires a lot of work.