DIY Linux Router Part 8: Netflow / IPFIX

The following is the eighth part of a multipart series describing how I build (software, not hardware) my own Linux router from scratch, based on Debian 11.

By now, we already have a fully functional router, so the next parts are optional. In this part we configure NetFlow, which allows us to see all network connections and traffic flows going through our router. This setup requires Elasticsearch, but setting that up is not covered in this blog post.

NetFlow, or in our case IPFIX (which is based on NetFlow version 9, the latest NetFlow version), is a protocol that exports traffic flows and network connections to external software for visualization and analysis. This includes source and destination IP addresses and ports, bytes transmitted, and other layer 3 and 4 related information.

To do this we will use an iptables module to create the flow logs, Filebeat to receive them, and Elasticsearch to visualize them.

Even though we use IPFIX, I will keep calling it NetFlow since this seems to be the more common name.

Iptables module

Iptables does not come with NetFlow capabilities out of the box, but this can be added via a kernel module which is included in the Debian package repository.

The module is called ipt-netflow and the documentation can be found on their GitHub: https://github.com/aabc/ipt-netflow

To install and load the module run this as root:

apt install iptables-netflow-dkms
modprobe ipt_NETFLOW
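
Depending on your setup the module may not be loaded automatically after a reboot. One way to make the load persistent on a systemd-based Debian is a modules-load.d entry (the file name below is my choice, anything ending in .conf works):

echo ipt_NETFLOW > /etc/modules-load.d/ipt_netflow.conf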

Then we can configure the module. Create the file /etc/modprobe.d/ipt_NETFLOW.conf and add the following line:

options ipt_NETFLOW destination=127.0.0.1:2055 protocol=10 natevents=1 inactive_timeout=15 active_timeout=15
/etc/modprobe.d/ipt_NETFLOW.conf
  • The destination will be the software that collects the flows; for us this is a local Filebeat instance, which we will configure next.
  • The protocol specifies the NetFlow version. 10 means IPFIX, since it is based on NetFlow version 9.
  • natevents will allow the logging of NAT events like a replaced source address.
  • inactive_timeout is the time in seconds the module waits before exporting finished flows (downloads, uploads, ...).
  • active_timeout is the time in seconds the module waits before exporting still running flows. The default is 30 minutes. If you have a download that takes one minute, the default will only create one log event at the end which includes the full amount of data transmitted. Setting it to 15 seconds instead creates four log events, each containing the same connection information but splitting the transmitted data into smaller chunks. This way we get a live view of running downloads, but it also increases the number of logs generated.
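
After loading the module you can check that it actually picked up these options. According to the ipt-netflow README the module exposes its settings via sysctl under net.netflow and its statistics under /proc/net/stat/ipt_netflow; the exact parameter names can differ between module versions, so treat this as a sketch:

sysctl net.netflow.protocol net.netflow.active_timeout net.netflow.inactive_timeout
cat /proc/net/stat/ipt_netflow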

Now we create some iptables rules to specify which flows we want to log. I want to log all flows, so we use the following rules:

    [...]
    
    iptables -I INPUT -j NETFLOW
    iptables -I OUTPUT -j NETFLOW
    iptables -I FORWARD -j NETFLOW
    
    # Allow localhost
    [...]
/usr/local/sbin/firewall.sh

We put these rules into our base firewall script from https://www.sherbers.de/diy-linux-router-part-4-firewall-and-port-forwards/. Put the rules above the "# Allow localhost" comment. They should be the first rules in our iptables chains, so no packets are accepted or rejected before reaching them.
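
Once the firewall script has been run, a quick sanity check is to look for the NETFLOW targets and their packet counters at the top of each chain:

iptables -vnL INPUT | grep NETFLOW
iptables -vnL OUTPUT | grep NETFLOW
iptables -vnL FORWARD | grep NETFLOW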

Filebeat

I assume you already have a working Filebeat agent on your device, since setting up Filebeat and Elasticsearch is out of scope for this blog post. You can then enable the Filebeat NetFlow module by adding this to the configuration file:

- module: netflow
  log:
    enabled: true
    var:
      netflow_host: 127.0.0.1
      netflow_port: 2055
/etc/filebeat/filebeat.yml
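
If you prefer Filebeat's modules.d layout over adding the module block directly to filebeat.yml, roughly the same result can be achieved like this (paths and commands as provided by the Filebeat package; put the settings above into the enabled netflow.yml instead):

filebeat modules enable netflow
systemctl restart filebeat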

That's it for Filebeat. We now get NetFlow events into our filebeat-* index. This is an example of what a single event can look like:

Elasticsearch / Filebeat add some extra fields that are not included in the original NetFlow event, like geo coordinates and AS names for IP addresses.

Storage requirements

On my router this generates around 20 GB of logs per month. If you want to reduce this, increase the active_timeout setting.
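
To keep an eye on how much space the NetFlow events actually use, you can ask Elasticsearch for the size of the Filebeat indices. This assumes Elasticsearch is reachable on localhost:9200 without authentication; adjust the URL and credentials to your setup:

curl -s 'http://localhost:9200/_cat/indices/filebeat-*?v&h=index,docs.count,store.size'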

Elasticsearch

We now have all NetFlow event logs in our Elasticsearch and can start creating some visualizations. I created a dashboard and built my visualizations, using the Lens tool, directly there. What follows are a few examples of the visualizations I created and the settings needed. With these you should be able to create the other visualizations as well.

All visualizations have one filter in common: event.module: netflow to filter out all non-NetFlow logs inside our filebeat-* index.
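
Before building the visualizations you can quickly verify that these events are arriving at all, for example by counting them directly in Elasticsearch (same assumptions as above: Elasticsearch on localhost:9200 without authentication):

curl -s 'http://localhost:9200/filebeat-*/_count?q=event.module:netflow'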

Traffic inbound per AS

Filter: network.direction: inbound
Type: Table

Traffic inbound

Filter: network.direction: inbound
Type: Bar vertical stacked

Traffic inbound per IP

This one is my favorite. I can see all my clients and the inbound (download) traffic they generate over time.

Filter: network.direction: inbound
Type: Line

Full dashboard

The "DNS queries" and "Blocked ad domains" visualizations were created from the log events of my Unbound DNS server, which I also send to my Elasticsearch.