Backup

Design docs for Backup

Some of these documents date back to older versions, but an effort is made to keep the most important ones up to date.

LambdaStack backup design document

Affected version: 0.4.x

Goals

Provide backup functionality for LambdaStack - a cluster created using the lambdastack tool.

Backup will cover the following areas:

  1. Kubernetes cluster backup

    1.1 etcd database

    1.2 kubeadm config

    1.3 certificates

    1.4 persistent volumes

    1.5 applications deployed on the cluster

  2. Kafka backup

    2.1 Kafka topic data

    2.2 Kafka index

    2.3 Zookeeper settings and data

  3. Elastic stack backup

    3.1 Elasticsearch data

    3.2 Kibana settings

  4. Monitoring backup

    4.1 Prometheus data

    4.2 Prometheus settings (properties, targets)

    4.3 Alertmanager settings

    4.4 Grafana settings (datasources, dashboards)

  5. PostgreSQL backup

    5.1 All databases from the database instance

  6. RabbitMQ settings and user data

  7. HAProxy settings backup

Use cases

A user, background service, or job is able to back up the whole cluster or selected parts of it, and store the files in a desired location. There are a few possible options for storing the backup:

  • S3
  • Azure file storage
  • local file
  • NFS

The application/tool will create a metadata file that defines the backup - information that can be useful for a restore tool. This metadata file will be stored within the backup file.

The backup is packed into a zip/gz/tar.gz file with a timestamp in the name. If a name collision occurs, name+'_1' will be used.
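The naming scheme can be sketched as follows (a minimal illustration; the `backup_` prefix, the function name, and the optional timestamp argument are assumptions, not part of the tool):

```shell
# Sketch: build a timestamped archive name, appending '_1' on collision.
# The second argument (timestamp) is optional and exists mainly for testing.
make_backup_name() {
  local target_dir="$1"
  local stamp="${2:-$(date -u +%Y%m%dT%H%M%SZ)}"
  local name="backup_${stamp}.tar.gz"
  if [ -e "${target_dir}/${name}" ]; then
    name="backup_${stamp}_1.tar.gz"   # collision: add the '_1' suffix
  fi
  echo "${name}"
}
```

A real implementation would also have to handle a second collision; the document only specifies the first `_1` suffix.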

Example use

lsbackup -b /path/to/build/dir -t /target/location/for/backup

Where -b is the path to the build folder that contains the Ansible inventory, and -t is the target path to store the backup.

Backup Component View

LambdaStack backup component

A user, background service, or job executes the lsbackup (code name) application. The application takes the following parameters:

  • -b: build directory of the existing cluster. The most important content is the Ansible inventory in this directory, so it can be assumed to be the folder containing the Ansible inventory file.
  • -t: target location of the zip/tar.gz file that will contain the backup files and the metadata file.

When executed, the tool looks for the inventory file in the -b location and executes the backup playbooks. All playbooks are optional; in the MVP version the tool can try to back up all components (if they exist in the inventory). Later, some components can be skipped (by providing an additional flag or CLI parameter).

The tool also produces a metadata file that describes the backup: its time, the backed-up components, and their versions.
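A minimal sketch of such a metadata file being produced (the YAML field names are illustrative assumptions; the real format is up to the tool):

```shell
# Sketch: write a backup metadata file with time and component versions.
write_backup_metadata() {
  local out="$1"; shift
  {
    echo "backup_time: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "components:"
    for component in "$@"; do        # entries like "postgresql=10"
      echo "  - ${component}"
    done
  } > "${out}"
}

# Example: write_backup_metadata /tmp/backup/metadata.yml "postgresql=10" "haproxy=1.8"
```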

1. Kubernetes cluster backup

There are a few ways of backing up an existing Kubernetes cluster. Two approaches are taken into further research.

First: Back up the etcd database and kubeadm config of a single master node. Instructions can be found here. This simple solution backs up etcd, which contains all workload definitions and settings.
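For reference, the etcd part of this approach typically looks like the following (a sketch; the endpoint and certificate paths depend on the cluster and are assumptions here):

```shell
# Snapshot etcd on a master node (endpoint and cert paths are placeholders).
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# kubeadm certificates can be copied alongside the snapshot:
cp -r /etc/kubernetes/pki /backup/pki
```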

Second: Use 3rd party software to create a backup like Heptio Velero - Apache 2.0 license, Velero GitHub

2. Kafka backup

Possible options for backing up Kafka broker data and indexes:

  1. Mirror using Kafka MirrorMaker. It requires a second Kafka cluster running independently that replicates all data (including current offsets and consumer groups). It is used mostly for multi-cloud replication.

  2. Kafka Connect – use Kafka Connect to get all topic and offset data from Kafka and save it to a filesystem (NFS, local, S3, ...) using a sink connector.

    2.1 Confluent Kafka connector – licensed under the Confluent Community License Agreement
    2.2 Use another open-source connector like kafka-connect-s3 (BSD) or kafka-backup (Apache 2.0)

  3. File system copy: take the Kafka broker and ZooKeeper data stored in files and copy it to a backup location. It requires the Kafka broker to be stopped. A solution is described in a Digital Ocean post.
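The third option can be sketched as a small helper that archives a data directory (the service names and data paths in the example are typical defaults, not verified against a given installation):

```shell
# Sketch: cold backup of a data directory into a timestamped tar.gz.
# The service owning the directory must be stopped first for consistency.
archive_dir() {
  local src="$1" dest_dir="$2"
  local name
  name="$(basename "${src}")-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
  mkdir -p "${dest_dir}"
  tar -czf "${dest_dir}/${name}" -C "$(dirname "${src}")" "$(basename "${src}")"
  echo "${dest_dir}/${name}"        # print the resulting archive path
}

# Example (assumed default paths):
#   systemctl stop kafka
#   archive_dir /var/lib/kafka /mnt/backup/kafka
#   archive_dir /var/lib/zookeeper /mnt/backup/zookeeper
#   systemctl start kafka
```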

3. Elastic stack backup

Use the built-in snapshot features of Elasticsearch to create a backup, for example:

PUT /_snapshot/my_unverified_backup?verify=false
{
  "type": "fs",
  "settings": {
    "location": "my_unverified_backup_location"
  }
}

More information can be found here.

OpenDistro uses similar way of doing backups - it should be compatible. OpenDistro backups link.

4. Monitoring backup

From version 2.1, Prometheus is able to create a data snapshot via an HTTP request:

curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot

The snapshot will be created in <data-dir>/snapshots/SNAPSHOT-NAME-RETURNED-IN-RESPONSE

More info

Files like targets and Prometheus/Alertmanager settings should also be copied to the backup location.

5. PostgreSQL backup

Relational DB backup mechanisms are the most mature ones. The simplest solution is to use standard PostgreSQL backup functions. Using pg_dump is also a valid option.

6. RabbitMQ settings and user data

RabbitMQ has a standard way of creating backups.

7. HAProxy settings backup

Copy the HAProxy configuration files to the backup location.

1 - Operational

Design docs for Backup Operational

LambdaStack backup design document with details

Affected version: 0.7.x

Goals

This document is an extension of the high-level design doc: LambdaStack backup design document. It describes a more detailed, operational point of view of this case. The document does not cover the Kubernetes and Kafka stacks.

Components

lsbackup application

Example use:

lambdastack backup -b build_dir -t target_path

Where -b is the path to the build folder that contains the Ansible inventory, and -t is the target path to store the backup.

backup runs tasks from the Ansible backup role

build_dir contains the cluster's Ansible inventory

target_path - the location to store the backup; see the Storage section below.

Consider adding a disclaimer for the user to check whether the backup location has enough space to store the whole backup.
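Such a check could be as simple as comparing the free space at the target with an estimated backup size (a sketch; the function name is an assumption, and the size estimate itself has to come from elsewhere):

```shell
# Sketch: verify the target location has at least `needed_bytes` free.
check_backup_space() {
  local target="$1" needed_bytes="$2"
  local avail_kb
  avail_kb=$(df -Pk "${target}" | awk 'NR==2 {print $4}')   # POSIX df, KiB units
  [ $((avail_kb * 1024)) -ge "${needed_bytes}" ]
}

# Example: check_backup_space /mnt/backup 10737418240 || echo "not enough space for backup"
```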

Storage

A location is created on the master node to keep the backup files. This location might be used to mount external storage, such as:

  • Amazon S3
  • Azure blob
  • NFS
  • Any external disk mounted by administrator

In a cloud configuration, blob or S3 storage might be mounted directly on every machine in the cluster and can be configured by LambdaStack. For on-prem installations it is up to the administrator to attach an external disk to the backup location on the master node. This location should be shared with the other machines in the cluster via NFS.

Backup scripts structure:

Role backup

The main backup role contains Ansible tasks that run backups on cluster components.

Tasks:

  1. Elasticsearch & Kibana

    1.1. Create a local location where the snapshot will be stored: /tmp/snapshots

    1.2. Update the elasticsearch.yml file with the backup location:

     ```yaml
     path.repo: ["/tmp/backup/elastic"]
     ```
    

    1.3. Reload the configuration.

    1.4. Register the repository:

    curl -X PUT "https://host_ip:9200/_snapshot/my_backup?pretty" \
    -H 'Content-Type: application/json' -d '
    {
        "type": "fs",
        "settings": {
            "location": "/tmp/backup/elastic"
        }
    }
    '
    

    1.5. Take snapshot:

    curl -X PUT "https://host_ip:9200/_snapshot/my_backup/1?wait_for_completion=true" \
    -H 'Content-Type: application/json'
    

    This command will create a snapshot in the location set in step 1.2.

    1.6. Backup restoration:

    curl -X POST "https://host_ip:9200/_snapshot/my_backup/1/_restore" -H 'Content-Type: application/json'
    

    Consider the options described in the OpenDistro documentation.

    1.7. Backup configuration files:

    /etc/elasticsearch/elasticsearch.yml
    /etc/kibana/kibana.yml
    
  2. Monitoring

    2.1.1 Prometheus data

    Prometheus provides a built-in way to create a data snapshot. Connecting to the application API with admin privileges is required. By default, admin access is disabled and needs to be enabled before snapshot creation. To enable admin access, the --web.enable-admin-api flag needs to be set when starting the service:

    service configuration:
    /etc/systemd/system/prometheus.service
    
    systemctl daemon-reload
    systemctl restart prometheus
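For illustration, the edited unit file might contain an ExecStart similar to the following (the binary path and the first two flags are assumptions based on defaults mentioned below; only the last flag is the addition):

```ini
[Service]
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus \
    --web.enable-admin-api
```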
    

    Snapshot creation:

    curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot
    

    By default, the snapshot is saved in the data directory, which is configured in the Prometheus service configuration file with the flag:

    --storage.tsdb.path=/var/lib/prometheus
    

    This means that the snapshot directory is created under:

    /var/lib/prometheus/snapshots/yyyymmddThhmmssZ-*
    

    After the snapshot is taken, admin access through the API should be disabled again.

    The snapshot restoration process consists of pointing the --storage.tsdb.path parameter to the snapshot location and restarting Prometheus.
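Alternatively, the restoration can be sketched as copying the snapshot contents back into the data directory while Prometheus is stopped (a sketch; the helper name is an assumption):

```shell
# Sketch: restore a Prometheus snapshot by copying it into the data directory.
# Prometheus must be stopped while the copy is in progress.
restore_prometheus_snapshot() {
  local snapshot_dir="$1"   # e.g. /var/lib/prometheus/snapshots/<snapshot-name>
  local data_dir="$2"       # e.g. /var/lib/prometheus
  mkdir -p "${data_dir}"
  cp -r "${snapshot_dir}/." "${data_dir}/"
}

# Example:
#   systemctl stop prometheus
#   restore_prometheus_snapshot /var/lib/prometheus/snapshots/<snapshot-name> /var/lib/prometheus
#   systemctl start prometheus
```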

    2.1.2. Prometheus configuration

    Prometheus configuration files are located in:

    /etc/prometheus
    

    2.2. Grafana backup and restore

    Copy the files from the Grafana home folder to the desired location and set correct permissions:

    location: /var/lib/grafana
    content:
    - dashboards
    - grafana.db
    - plugins
    - png (contains rendered png images - not necessary to back up)
    

    2.3 Alertmanager

    Configuration files are located in:

    /etc/prometheus
    

    The alertmanager.yml file should be copied in step 2.1.2 if it exists.

  3. PostgreSQL

    3.1. PostgreSQL delivers two main tools for backup creation: pg_dump and pg_dumpall.

    pg_dump creates a dump of a selected database:

    pg_dump dbname > dbname.bak
    

    pg_dumpall - creates a dump of all databases of a cluster into one script file. It also dumps global objects that are common to all databases, such as users, groups, tablespaces, and properties like access permissions (pg_dump does not save these objects).

    pg_dumpall > pg_backup.bak
    

    3.2. Database restore: psql or pg_restore:

    psql < pg_backup.bak
    pg_restore -d dbname dbname.bak
    

    3.3. Copy configuration files:

    /etc/postgresql/10/main/* - configuration files
    .pgpass - authentication credentials
    
    
  4. RabbitMQ

    4.1. RabbitMQ definitions might be exported using the API (the rabbitmq_management plugin needs to be enabled):

    rabbitmq-plugins enable rabbitmq_management
    curl -v -X GET http://localhost:15672/api/definitions -u guest:guest -H "content-type:application/json" -o backup.json
    

    Import the backed-up definitions:

    curl -v -X POST http://localhost:15672/api/definitions -u guest:guest -H "content-type:application/json" --data @backup.json
    

    or add the backup location to the configuration file and restart RabbitMQ:

    management.load_definitions = /path/to/backup.json
    

    4.2 Backing up RabbitMQ messages

    To back up messages, RabbitMQ must be stopped. Copy the content of the RabbitMQ mnesia directory:

    RABBITMQ_MNESIA_BASE
    
    ubuntu:
    /var/lib/rabbitmq/mnesia
    

    Restoration: place these files back in the same location.
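The copy-and-restore cycle can be sketched with tar (the helper names are assumptions; paths follow the Ubuntu default shown above):

```shell
# Sketch: cold backup and restore of the RabbitMQ mnesia directory.
# RabbitMQ must be stopped for both operations.
backup_mnesia()  { tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"; }
restore_mnesia() { tar -xzf "$1" -C "$(dirname "$2")"; }

# Example (Ubuntu default paths):
#   systemctl stop rabbitmq-server
#   backup_mnesia /var/lib/rabbitmq/mnesia /mnt/backup/mnesia.tar.gz
#   restore_mnesia /mnt/backup/mnesia.tar.gz /var/lib/rabbitmq/mnesia
#   chown -R rabbitmq:rabbitmq /var/lib/rabbitmq/mnesia   # restore ownership
#   systemctl start rabbitmq-server
```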

    4.3 Backing up configuration:

    Copy the /etc/rabbitmq/rabbitmq.conf file.

  5. HAProxy

Copy /etc/haproxy/ to the backup location.

Copy the certificates stored in /etc/ssl/haproxy/.

2 - Cloud

Design docs for Cloud Backup

LambdaStack cloud backup design document

Affected version: 0.5.x

Goals

Provide backup functionality for LambdaStack - a cluster created using the lambdastack tool.

Use cases

Create snapshots of the disks of all elements in an environment created in the cloud.

Example use

lsbackup --disks-snapshot -f path_to_data_yaml

Where -f is the path to the data YAML file with the configuration of the environment, and --disks-snapshot selects the option that creates snapshots of whole disks.

Backup Component View

A user, background service, or job executes the lsbackup (code name) application. The application takes the following parameters:

  • -f: path to data yaml file with configuration of environment.
  • --disks-snapshot: option to create whole disk snapshot

When executed, the tool takes the resource group from the file provided with the -f flag and creates snapshots of all elements in the resource group.

The tool also produces a metadata file that describes the backup with the time and the names of the disks for which snapshots have been created.