October 15, 2022

HAQQ validator monitoring, notification and protection system

This article explains how to set up monitoring and alerting for a Cosmos SDK node, and how to improve the security of your setup.
HAQQ is used as the example project throughout the guide.


This guide consists of two parts:

  1. PANIC Bot
    1. Validator Server
    2. Monitoring Server
  2. Grafana + Prometheus
    1. Validator Server
    2. Monitoring Server

PANIC Bot

PANIC is an open source monitoring and alerting solution for Cosmos-SDK, Substrate and Chainlink based nodes by Simply VC. The tool was built with user-friendliness in mind, and comes with numerous features such as phone calls for critical alerts, a UI Dashboard, a Web-UI installation process and Telegram/Slack commands for increased control over your alerts.

If you want to dive into the design and feature set of PANIC, see the project documentation.

Official guide – https://github.com/SimplyVC/panic.

Validator Server

Open RPC endpoint

Edit the node config (HAQQ in our case). For other projects, use the project-specific path, which is usually $HOME/.project/config/config.toml.

nano ~/.haqqd/config/config.toml

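In the RPC Server Configuration Options block ([rpc] section), change the laddr value so the RPC server listens on all interfaces instead of only on localhost. Restart the node afterwards for the change to take effect.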

laddr = "tcp://0.0.0.0:26657"
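config.toml contains more than one laddr key (the [p2p] section has its own), so a blind search-and-replace can change the wrong one. If you prefer to script this step instead of editing the file in nano, here is a minimal sketch that rewrites laddr only inside the [rpc] section. The function name set_rpc_laddr is hypothetical, and it assumes the standard config.toml layout in which every section starts with a [section] header line.

```bash
# Hypothetical helper (not part of HAQQ or Tendermint): rewrite the laddr key
# only inside the [rpc] section of config.toml; other sections are left alone.
set_rpc_laddr() {
  local file="$1" addr="$2" tmp rc
  tmp="$(mktemp)" || return 1
  awk -v addr="$addr" '
    /^\[/ { section = $0 }                          # track the current [section] header
    section == "[rpc]" && /^laddr[ \t]*=/ {         # plain "laddr" key in [rpc] only
      print "laddr = \"" addr "\""
      next
    }
    { print }                                       # every other line is copied as-is
  ' "$file" > "$tmp" && cat "$tmp" > "$file"        # cat keeps the original owner/permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, with the HAQQ home directory used in this guide:

set_rpc_laddr ~/.haqqd/config/config.toml "tcp://0.0.0.0:26657"

Keys such as pprof_laddr and grpc_laddr in the same section are not touched, because only lines starting with laddr match.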

Install node-exporter

wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-*.*-amd64.tar.gz
sudo mv node_exporter-*.*-amd64/node_exporter /usr/local/bin/
rm node_exporter-* -rf
sudo useradd -rs /bin/false node_exporter

Then we need to create a systemd service for our app:

sudo tee <<EOF >/dev/null /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
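To confirm that node-exporter is actually serving metrics on its default port 9100, you can fetch its /metrics page and look for a metric it exposes about itself, such as node_exporter_build_info. The helper below is a hypothetical sketch (has_metric is not part of node-exporter): it reads Prometheus text-format output from stdin and succeeds only if the metric name appears on a sample line, ignoring # HELP and # TYPE comments.

```bash
# Hypothetical helper: succeed if the named metric appears on a sample line of
# Prometheus text-format input read from stdin; comment lines are skipped.
has_metric() {
  local name="$1"
  # A sample line starts with the metric name followed by "{" (labels) or a space.
  grep -v '^#' | grep -Eq "^${name}(\{| )"
}
```

For example:

curl -s http://localhost:9100/metrics | has_metric node_exporter_build_info && echo "node-exporter is up"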

Monitoring Server

Check if Python3 is available on your server. Python is preinstalled in most distributions.

python3 --version

Install dependencies:

sudo apt-get install python3-pip redis-server -y
sudo pip3 install pipenv
sudo systemctl enable redis-server.service

Create a user and a directory for PANIC, then clone the repository:

sudo adduser panic_alerter
sudo mkdir /opt/panic_alerter
sudo chown -R panic_alerter:panic_alerter /opt/panic_alerter
sudo su panic_alerter
cd /opt/panic_alerter
git clone https://github.com/SimplyVC/panic_cosmos.git
cd panic_cosmos
git checkout master

Run PANIC installer

pipenv update
pipenv run python run_setup.py

During the interactive setup, you will be prompted to enter a number of values.
The first one is a unique identifier:

The next step is setting up alerts. We'll use Telegram alerts as an example.
When prompted to configure them, press Y.

Creating a Telegram bot (done through @BotFather) and obtaining the required data is not explained in detail in this guide. The screenshot shows an example of a test alert.

Setting up monitoring

The installer will also prompt you for the details of the node you want to monitor.

Create Service file

The panic_alerter user cannot write to /etc/systemd/system, so first type exit to return to your sudo-enabled user, then create the unit file:

exit
sudo tee /etc/systemd/system/panic.service > /dev/null <<EOF
[Unit]
Description=P.A.N.I.C.
After=network.target
StartLimitIntervalSec=0

[Service]
Type=simple
Restart=always
User=panic_alerter
TimeoutStopSec=90s
WorkingDirectory=/opt/panic_alerter/panic_cosmos
ExecStart=/usr/local/bin/pipenv run python /opt/panic_alerter/panic_cosmos/run_alerter.py

[Install]
WantedBy=multi-user.target
EOF

Reload systemd, then enable and start the service and follow its logs:

sudo systemctl daemon-reload
sudo systemctl enable panic
sudo systemctl start panic
sudo journalctl -u panic -f

Example

For demonstration purposes, I stopped one of the nodes. The result is in the screenshots below.


Grafana + Prometheus

For monitoring we will use Grafana + Prometheus. You will need a separate server to act as the monitoring server, in addition to the server running the validator node.

Changes are needed on both sides: on the validator server, which exposes the metrics, and on the monitoring server, which collects and displays them.

Let's start with the validator server.

Validator Server

Enable Tendermint Metrics

sed -i 's/prometheus = false/prometheus = true/g' "$HOME/.haqqd/config/config.toml"

After restarting your node, the Tendermint metrics should be available on port 26660 (the default): http://localhost:26660
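To confirm that the sed command took effect and to see which address Tendermint will serve metrics on, you can print the relevant keys from the [instrumentation] section of config.toml. show_instrumentation is a hypothetical helper; it assumes the standard layout in which prometheus and prometheus_listen_addr are set under [instrumentation].

```bash
# Hypothetical helper: print the prometheus and prometheus_listen_addr keys
# found in the [instrumentation] section of a Tendermint config.toml.
show_instrumentation() {
  awk '
    /^\[/ { section = $0 }                            # track the current [section] header
    section == "[instrumentation]" && /^prometheus(_listen_addr)?[ \t]*=/ { print }
  ' "$1"
}
```

For example:

show_instrumentation ~/.haqqd/config/config.toml

On an otherwise default config this should list prometheus = true and prometheus_listen_addr = ":26660".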

Make sure the following ports on the validator server are reachable from the monitoring server:

  • 9100 (node-exporter)
  • 9300 (cosmos-exporter)
  • 26660 (Tendermint Prometheus metrics)
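Since the goal is also to improve security, it is better to open these ports only to the monitoring server rather than to the whole internet. The sketch below assumes the ufw firewall is enabled with its default policy of denying incoming connections; print_ufw_rules is a hypothetical helper that only prints the commands, so you can review them before running anything. If you opened the RPC port 26657 for PANIC, consider restricting it the same way.

```bash
# Hypothetical helper: print (not run) ufw rules that allow the given TCP
# ports only from the monitoring server's IP address.
print_ufw_rules() {
  if [ "$#" -lt 2 ]; then
    echo "usage: print_ufw_rules MONITORING_IP PORT..." >&2
    return 1
  fi
  local monitor_ip="$1" port
  shift
  for port in "$@"; do
    echo "sudo ufw allow from ${monitor_ip} to any port ${port} proto tcp"
  done
}
```

For example, print the rules first and pipe them to bash once they look right:

print_ufw_rules MONITORING_SERVER_IP 9100 9300 26660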

Install cosmos-exporter

wget https://github.com/solarlabsteam/cosmos-exporter/releases/download/v0.3.0/cosmos-exporter_0.3.0_Linux_x86_64.tar.gz
tar xvfz cosmos-exporter*
sudo cp ./cosmos-exporter /usr/bin
rm cosmos-exporter* -rf
sudo useradd -rs /bin/false cosmos_exporter

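Set up the network variables in the same shell session you will use for the next step, because their values are written into the unit file when it is created: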

BOND_DENOM=aISLM
BENCH_PREFIX=haqq
RPC_PORT=26657
GRPC_PORT=9090
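The heredoc in the next step substitutes these variables when the unit file is written, so an empty or misspelled variable silently produces a broken ExecStart line. A quick guard can catch that first; require_vars is a hypothetical helper and relies on bash indirect expansion.

```bash
# Hypothetical helper: report every listed variable that is unset or empty
# and return non-zero if any are, so a broken unit file is not written.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name
    if [ -z "${!name}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example:

require_vars BOND_DENOM BENCH_PREFIX RPC_PORT GRPC_PORT && echo "all set"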

Then we need to create a systemd service for our app:

sudo tee <<EOF >/dev/null /etc/systemd/system/cosmos-exporter.service
[Unit]
Description=Cosmos Exporter
After=network-online.target

[Service]
User=cosmos_exporter
Group=cosmos_exporter
TimeoutStartSec=0
CPUWeight=95
IOWeight=95
# aISLM has 18 decimals (1 ISLM = 10^18 aISLM), hence the denom coefficient
ExecStart=/usr/bin/cosmos-exporter --denom ${BOND_DENOM} --denom-coefficient 1000000000000000000 --bech-prefix ${BENCH_PREFIX} --tendermint-rpc http://localhost:${RPC_PORT} --node localhost:${GRPC_PORT}
Restart=always
RestartSec=2
LimitNOFILE=800000
KillSignal=SIGTERM

[Install]
WantedBy=multi-user.target
EOF

Then reload systemd, add the service to autostart and run it:

sudo systemctl daemon-reload
sudo systemctl enable cosmos-exporter
sudo systemctl start cosmos-exporter
sudo systemctl status cosmos-exporter # validate it's running

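Install node-exporter

If you already installed node-exporter on the validator server in the PANIC part of this guide, skip to the next section.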

wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-*.*-amd64.tar.gz
sudo mv node_exporter-*.*-amd64/node_exporter /usr/local/bin/
rm node_exporter-* -rf
sudo useradd -rs /bin/false node_exporter

Then we need to create a systemd service for our app:

sudo tee <<EOF >/dev/null /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Monitoring Server

Prometheus

Download the latest Prometheus release. The GitHub API call finds the linux-amd64 archive, and wget downloads it:

mkdir -p /tmp/prometheus
cd /tmp/prometheus
curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest | grep browser_download_url | grep linux-amd64 | cut -d '"' -f 4 | wget -qi -
tar xvf prometheus*.tar.gz -C . --strip-components=1
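Prometheus publishes a sha256sums.txt file with every release. As an extra safety check, you can download it next to the archive (for example by changing grep linux-amd64 above to grep -E 'linux-amd64|sha256sums') and verify the tarball. verify_download is a hypothetical helper; it relies on GNU coreutils sha256sum, whose --ignore-missing option skips checksum entries for files you did not download.

```bash
# Hypothetical helper: check downloaded release files in DIR against the
# sha256sums.txt published with the release (GNU coreutils sha256sum).
verify_download() {
  local dir="$1"
  # Run in a subshell so the caller's working directory is not changed;
  # --ignore-missing skips entries for archives that were not downloaded.
  ( cd "$dir" && sha256sum --ignore-missing --check sha256sums.txt )
}
```

For example, verify_download /tmp/prometheus should report the archive as OK; a FAILED line means the download should be discarded.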

Next, move the prometheus and promtool binaries to /usr/local/bin, and the default config and console files to /etc/prometheus:

cd /tmp/prometheus
sudo mv prometheus promtool /usr/local/bin/
sudo mkdir /etc/prometheus
sudo mv prometheus.yml /etc/prometheus/prometheus.yml
sudo mv consoles/ console_libraries/ /etc/prometheus/

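Replace the contents of /etc/prometheus/prometheus.yml with the configuration below. The cosmos, node, validators, validator and wallet jobs start with empty target lists; the add_validator.sh script later in this guide fills them in: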

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: prometheus
    metrics_path: /metrics
    static_configs:
      - targets: [localhost:9090]
  - job_name: cosmos
    metrics_path: /metrics
    static_configs: []
  - job_name: node
    metrics_path: /metrics
    static_configs: []
  - job_name: validators
    metrics_path: /metrics/validators
    static_configs: []
  - job_name: validator
    metrics_path: /metrics/validator
    relabel_configs:
      - source_labels: [address]
        target_label: __param_address
    static_configs: []
  - job_name: wallet
    metrics_path: /metrics/wallet
    relabel_configs:
      - source_labels: [address]
        target_label: __param_address
    static_configs: []
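The relabel_configs in the validator and wallet jobs copy each target's address label into __param_address, and Prometheus turns __param_<name> labels into URL query parameters. A scrape of the validator job therefore requests a URL like the one built below. exporter_url is a hypothetical helper; port 9300 is cosmos-exporter's port from the list of validator server ports above. Passing its output to curl shows by hand what Prometheus will receive.

```bash
# Hypothetical helper: build the cosmos-exporter URL that Prometheus requests
# for the "validator" or "wallet" job once the address label becomes ?address=.
exporter_url() {
  local host="$1" job="$2" address="$3"
  printf 'http://%s:9300/metrics/%s?address=%s\n' "$host" "$job" "$address"
}
```

For example:

curl -s "$(exporter_url VALIDATOR_IP validator VALOPER_ADDRESS)"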

Create a prometheus user and the required directories, and make the prometheus user the owner of those directories and of the binaries:

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /var/lib/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool

Next, create a systemd service for Prometheus:

sudo tee /etc/systemd/system/prometheus.service<<EOF
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP \$MAINPID
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.external-url=
SyslogIdentifier=prometheus
Restart=always
[Install]
WantedBy=multi-user.target
EOF

Configure Prometheus to scrape validator metrics

Install jq and yq (a Python wrapper around jq that can edit YAML files in place):

sudo apt install jq -y
sudo apt install python3-pip -y
sudo pip3 install yq

Create the add_validator.sh script with the following content:

#!/bin/bash
yq -i -y '.scrape_configs[] |= (.job_name as $name | .static_configs += if $name == "prometheus" then [] else ([{targets:["'$1'" + (if $name == "node" then ":9100" elif $name == "cosmos" then ":'$2'" else ":9300" end)], labels:(if $name == "validator" then {address: "'$3'"} elif $name == "wallet" then {address: "'$4'"} elif $name == "node" then {instance: "'$5'"} else {} end)}]) end)' /etc/prometheus/prometheus.yml

Give rights for script execution:

chmod +x add_validator.sh

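Run the script with your validator's data. Use sudo, because /etc/prometheus/prometheus.yml is owned by the prometheus user:

sudo ./add_validator.sh VALIDATOR_IP PROM_PORT VALOPER_ADDRESS WALLET_ADDRESS PROJECT_NAME

Example with actual data:

sudo ./add_validator.sh 1.2.3.4 26660 haqqvaloper1ul80t0j9kg5qjum9at7tpuu7w9eu3rd0x7hwxy haqq1ul80t0j9kg5qjum9at7tpuu7w9eu3rd02umqz9 cosmos

Since yq may rewrite the file, make sure it still belongs to the prometheus user:

sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml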
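The yq one-liner is dense, so here is its port-selection logic restated as a plain bash function, to help review what the script adds to each job. target_for_job is a hypothetical name and is not used by the script itself; it only mirrors the if/elif branches of the jq expression.

```bash
# Hypothetical helper mirroring the if/elif branches of the jq expression in
# add_validator.sh: print the scrape target the script adds to a given job.
target_for_job() {
  local job="$1" ip="$2" prom_port="$3"
  case "$job" in
    prometheus) return 1 ;;                        # the script leaves this job unchanged
    node)       echo "${ip}:9100" ;;               # node-exporter
    cosmos)     echo "${ip}:${prom_port}" ;;       # Tendermint metrics (PROM_PORT)
    *)          echo "${ip}:9300" ;;               # cosmos-exporter: validators, validator, wallet
  esac
}
```

For the example above, the node job gets 1.2.3.4:9100, the cosmos job gets 1.2.3.4:26660, and the validators, validator and wallet jobs get 1.2.3.4:9300. In addition, the script labels the validator target with the valoper address, the wallet target with the wallet address, and the node target with instance set to PROJECT_NAME.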

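Check the resulting config as the prometheus user (this also confirms the file is readable by it), then reload systemd so it registers the new unit, and enable and start Prometheus:

sudo -u prometheus /usr/local/bin/promtool check config /etc/prometheus/prometheus.yml
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus

If you add more validators later, run sudo systemctl reload prometheus to apply the new targets.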

Grafana

Grafana is an open source interactive data-visualization platform, developed by Grafana Labs, which allows users to see their data via charts and graphs that are unified into one dashboard (or multiple dashboards!) for easier interpretation and understanding.


The steps below follow the official Grafana installation guide for Debian/Ubuntu.

sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install grafana-enterprise

Start the Grafana server with systemd:

sudo systemctl daemon-reload
sudo systemctl start grafana-server

Configure the Grafana server to start at boot.

sudo systemctl enable grafana-server.service

Now open the Grafana web interface in your browser: http://<YOUR-HOST-IP>:3000

Grafana login

The default username and password are both admin; Grafana will ask you to set a new password after the first login. If the login page does not load, port 3000 is probably blocked by your firewall, or something went wrong in one of the previous steps.

After logging in, click Configuration and then Data Sources.

Grafana data sources

Now click on Add data source, then select the Prometheus data source.

Prometheus data source

Enter the Prometheus server's address with port 9090 in the URL field. If Prometheus and Grafana run on the same server, simply use http://localhost:9090

Prometheus data source

Scroll to the bottom and click the Save & test button. Then hover over Dashboards and click Import.

Import the dashboard by ID 15991. You should now see a dashboard with a lot of useful information about your validator.

Using the filters at the top of the dashboard, you can choose the network (useful if you monitor several validators or chains from one central place).
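If you prefer to keep the data source in a file instead of clicking through the UI (for example, to rebuild the monitoring server quickly), Grafana can load it from its provisioning directory at startup. Below is a minimal sketch that prints such a file; grafana_datasource_yaml is a hypothetical helper, the data source name Prometheus is arbitrary, and /etc/grafana/provisioning/datasources is assumed to be the provisioning path used by the Debian package.

```bash
# Hypothetical helper: print a Grafana data source provisioning file that
# points at the given Prometheus URL (same setting as the UI steps above).
grafana_datasource_yaml() {
  local url="${1:-http://localhost:9090}"   # default matches a same-server setup
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}
```

For example:

grafana_datasource_yaml http://localhost:9090 | sudo tee /etc/grafana/provisioning/datasources/prometheus.yaml
sudo systemctl restart grafana-server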