How-To

LambdaStack how-tos

1: Backup
2: Cluster
3: Configuration
4: Databases
5: Helm
6: Istio
7: Konnectivity
8: Kubernetes
9: Logging
10: Maintenance
11: Modules
12: Monitoring
13: OS Patching
14: Persistent Storage
15: Prerequisites
16: Repository
17: Retention
18: Security Groups
19: Security
20: Upgrade

1 - Backup

LambdaStack how-tos - Backup

LambdaStack backup and restore

Introduction

LambdaStack can create full or partial backups of the following components and restore them from those backups:

Load Balancer
Logging
Monitoring
Postgresql
RabbitMQ
Kubernetes (only backup)

The backup is created directly on the machine where the component is running and is then moved to the repository host via rsync. On the repository host, backup files are stored in the /lsbackup/mounted location on the local filesystem. See the How to store backup chapter.

1. How to perform backup

Backup configuration

Copy the default backup configuration from defaults/configuration/backup.yml into a newly created backup.yml config file, and enable backup for the chosen components by setting their enabled parameter to true.

This config may also be attached to cluster-config.yml (or whatever you named your cluster YAML file).

kind: configuration/backup
title: Backup Config
name: default
specification:
  components:
    load_balancer:
      enabled: true
    logging:
      enabled: false
    monitoring:
      enabled: true
    postgresql:
      enabled: true
    rabbitmq:
      enabled: false
    # Kubernetes recovery is not supported at this point.
    # You may create a backup by enabling it below, but recovery must be done manually according to the Kubernetes documentation.
    kubernetes:
      enabled: false

Run the lambdastack backup command:

lambdastack backup -f backup.yml -b build_folder

If the backup config is attached to cluster-config.yml, use that file instead of backup.yml.

2. How to store backup

The backup location is defined in the backup role by backup_destination_host and backup_destination_dir. By default, backups are stored on the repository host in /lsbackup/mounted/. Use the mounted location as a mount point and mount the storage you want to use there. This might be:

Azure Blob Storage
Amazon S3
Google Cloud Storage
NAS
Any other attached storage

Ensure that the mounted location has enough space, is reliable and is well protected against disaster.

NOTE

If you don't attach any storage to the mount point location, be aware that backups will be stored on the local machine. This is not recommended.
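Before running lambdastack backup, it can help to confirm on the repository host that external storage really is mounted at the backup location. The following is a minimal Python sketch, assuming it runs on the repository host with the default /lsbackup/mounted path; the check_backup_location helper and the 10 GiB free-space threshold are hypothetical, not LambdaStack settings.

```python
import os
import shutil
from typing import List

# Default backup location on the repository host, as documented above.
BACKUP_DIR = "/lsbackup/mounted"

# Hypothetical free-space threshold (10 GiB); size it to your own backups.
MIN_FREE_BYTES = 10 * 1024**3


def check_backup_location(path: str = BACKUP_DIR, min_free_bytes: int = MIN_FREE_BYTES) -> List[str]:
    """Return a list of problems with the backup location (empty if none were found)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist"]
    problems = []
    # ismount() is True only if a filesystem is mounted exactly at this path;
    # otherwise the backups land on the repository host's local disk.
    if not os.path.ismount(path):
        problems.append(f"{path} is not a mount point, backups would stay on the local disk")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free in {path}")
    return problems


if __name__ == "__main__":
    for problem in check_backup_location():
        print("WARNING:", problem)
```

An empty result only means the path is a mount point with enough free space; it says nothing about the reliability or disaster protection of the storage behind it.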

3. How to perform recovery

Recovery configuration

Copy the existing default configuration from defaults/configuration/recovery.yml into a newly created recovery.yml config file, and set the enabled parameter to true for each component you want to recover. You can choose a snapshot by passing the date and time part of its name as snapshot_name. If no snapshot name is provided, the latest one is restored.

This config may also be attached to cluster-config.yml.

kind: configuration/recovery
title: Recovery Config
name: default
specification:
  components:
    load_balancer:
      enabled: true
      snapshot_name: latest           #restore latest backup
    logging:
      enabled: true
      snapshot_name: 20200604-150829  #restore selected backup
    monitoring:
      enabled: false
      snapshot_name: latest
    postgresql:
      enabled: false
      snapshot_name: latest
    rabbitmq:
      enabled: false
      snapshot_name: latest
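The snapshot_name handling described above can be sketched in Python as follows. The sketch assumes, based only on the 20200604-150829 example, that the date and time part of a snapshot name has the form YYYYMMDD-HHMMSS; parse_snapshot_name and recovery_plan are hypothetical helpers, not part of LambdaStack.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def parse_snapshot_name(value: Optional[str]) -> Optional[datetime]:
    """Return None for "latest" or a missing value (the newest backup is restored),
    otherwise the timestamp encoded as YYYYMMDD-HHMMSS. Raises ValueError on other input."""
    if value is None or value == "latest":
        return None
    return datetime.strptime(value, "%Y%m%d-%H%M%S")


def recovery_plan(spec: Dict) -> List[Tuple[str, Optional[datetime]]]:
    """List (component, snapshot time) pairs for every component with enabled: true."""
    plan = []
    for component, settings in spec["components"].items():
        if settings.get("enabled"):
            plan.append((component, parse_snapshot_name(settings.get("snapshot_name"))))
    return plan


# The specification section of the recovery.yml example above, as a YAML loader would return it.
example = {
    "components": {
        "load_balancer": {"enabled": True, "snapshot_name": "latest"},
        "logging": {"enabled": True, "snapshot_name": "20200604-150829"},
        "monitoring": {"enabled": False, "snapshot_name": "latest"},
        "postgresql": {"enabled": False, "snapshot_name": "latest"},
        "rabbitmq": {"enabled": False, "snapshot_name": "latest"},
    }
}

for component, when in recovery_plan(example):
    print(component, "latest" if when is None else when.isoformat())
# load_balancer latest
# logging 2020-06-04T15:08:29
```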

Run the lambdastack recovery command:

lambdastack recovery -f recovery.yml -b build_folder

If the recovery config is attached to cluster-config.yml, use that file instead of recovery.yml.

4. How backup and recovery work

Load Balancer

Load balancer backup includes:

Configuration files: /etc/haproxy/
SSL certificates: /etc/ssl/haproxy/

Recovery includes all backed up files.

Logging

Logging backup includes:

Elasticsearch database snapshot
Elasticsearch configuration /etc/elasticsearch/
Kibana configuration /etc/kibana/

Only single-node Elasticsearch backup is supported. A solution for multi-node Elasticsearch clusters will be added in a future release.

Monitoring

Monitoring backup includes:

Prometheus data snapshot
Prometheus configuration /etc/prometheus/
Grafana data snapshot

Recovery includes all backed up configurations and snapshots.

Postgresql

Postgresql backup includes:

Database data and metadata dump using pg_dumpall
Configuration files: *.conf

When a multi-node configuration is used and a failover action has changed the database cluster status (one node down, switchover), it is still possible to create a backup. However, before restoring the database, the cluster must be recovered by running lambdastack apply, followed by lambdastack recovery to restore the database data. By default, restoring the database configuration from a backup is not supported, since this has to be done with lambdastack apply or manually, by copying the backed up files according to the cluster state. The reason is that restoring configuration files across different database cluster configurations is very risky.

RabbitMQ

RabbitMQ backup includes:

Message definitions
Configuration files: /etc/rabbitmq/

Backup does not include RabbitMQ messages.

Recovery includes all backed up files and configurations.

Kubernetes

LambdaStack backup provides:

Etcd snapshot
Public Key Infrastructure /etc/kubernetes/pki
Kubeadm configuration files

The following features are not supported yet (use the related documentation to do them manually):

Kubernetes cluster recovery
Backup and restore of data stored on persistent volumes described in persistent storage documentation

2 - Cluster

LambdaStack how-tos - Cluster

How to enable/disable LambdaStack repository VM

Enable for Ubuntu (default):

Enable "repository" component:
```
repository:
  count: 1
```

Enable for RHEL on Azure:

Enable "repository" component:

repository:
  count: 1
  machine: repository-machine-rhel

Add repository VM definition to main config file:

kind: infrastructure/virtual-machine
name: repository-machine-rhel
provider: azure
based_on: repository-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-LVM
    version: "7.9.2021051701"

Enable for RHEL on AWS:

Enable "repository" component:

repository:
  count: 1
  machine: repository-machine-rhel

Add repository VM definition to main config file:

kind: infrastructure/virtual-machine
title: Virtual Machine Infra
name: repository-machine-rhel
provider: aws
based_on: repository-machine
specification:
  os_full_name: RHEL-7.9_HVM-20210208-x86_64-0-Hourly2-GP2

Enable for CentOS on Azure:

Enable "repository" component:

repository:
  count: 1
  machine: repository-machine-centos

Add repository VM definition to main config file:

kind: infrastructure/virtual-machine
name: repository-machine-centos
provider: azure
based_on: repository-machine
specification:
  storage_image_reference:
    publisher: OpenLogic
    offer: CentOS
    sku: "7_9"
    version: "7.9.2021071900"

Enable for CentOS on AWS:

Enable "repository" component:

repository:
  count: 1
  machine: repository-machine-centos

Add repository VM definition to main config file:

kind: infrastructure/virtual-machine
title: Virtual Machine Infra
name: repository-machine-centos
provider: aws
based_on: repository-machine
specification:
  os_full_name: "CentOS 7.9.2009 x86_64"

Disable:

Disable "repository" component:
```
repository:
  count: 0
```
Prepend the "kubernetes_master" mapping in configuration/feature-mapping (or any other mapping if you don't deploy Kubernetes) with:
```
kubernetes_master:
  - repository
  - image-registry
```
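In each enable example above, the repository component's machine value must match the name of an infrastructure/virtual-machine document in the same file. A minimal Python sketch of that check follows, assuming the YAML documents have already been parsed into dictionaries (for example with PyYAML's yaml.safe_load_all). check_machine_links is a hypothetical helper, not part of LambdaStack, and it only knows about documents present in the file, so a machine tag pointing at one of LambdaStack's built-in defaults would be reported as missing.

```python
from typing import Any, Dict, List


def check_machine_links(components: Dict[str, Dict[str, Any]], docs: List[Dict[str, Any]]) -> List[str]:
    """Return errors for components whose machine tag has no infrastructure/virtual-machine document."""
    # Names of all infrastructure/virtual-machine documents in the cluster file.
    vm_names = {d.get("name") for d in docs if d.get("kind") == "infrastructure/virtual-machine"}
    errors = []
    for component, settings in components.items():
        machine = settings.get("machine")
        # Components with count 0 are not deployed; components without a machine
        # tag fall back to the provider's default VM configuration.
        if settings.get("count", 0) > 0 and machine and machine not in vm_names:
            errors.append(f"{component}: no infrastructure/virtual-machine document named {machine}")
    return errors


# The "Enable for RHEL on Azure" example above.
components = {"repository": {"count": 1, "machine": "repository-machine-rhel"}}
docs = [
    {
        "kind": "infrastructure/virtual-machine",
        "name": "repository-machine-rhel",
        "provider": "azure",
        "based_on": "repository-machine",
    },
]
print(check_machine_links(components, docs))  # []
```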

How to create a LambdaStack cluster on existing infrastructure

Please read the prerequisites related to hostname requirements first.

LambdaStack has the ability to set up a cluster on infrastructure provided by you. These can be either bare metal machines or VMs and should meet the following requirements:

Note: Hardware requirements are not listed since they depend on the use case, component configuration, etc.

The cluster machines/VMs are connected by a network (or virtual network of some sort) and can communicate with each other. At least one of them (with the repository role) has Internet access in order to download dependencies. If there is no Internet access, you can use the air-gap feature (offline mode).
The cluster machines/VMs are running one of the following Linux distributions:
- RedHat 7.6+ and < 8
- CentOS 7.6+ and < 8
- Ubuntu 18.04
The cluster machines/VMs are accessible through SSH with a set of SSH keys you provide and configure on each machine yourself (key-based authentication).
The user used for SSH connection (admin_user) has passwordless root privileges through sudo.
A provisioning machine that:
- Has access to the SSH keys
- Is on the same network as your cluster machines
- Has LambdaStack running. Note: to run LambdaStack, check the Prerequisites.

To set up the cluster do the following steps from the provisioning machine:

First generate a minimal data yaml file:
```
lambdastack init -p any -n newcluster
```
The any provider tells LambdaStack to create a minimal data config that does not contain any cloud-provider-related information. If you want full control, add the --full flag, which gives you a configuration with all parts of a cluster that can be configured.
Open the configuration file and set up the admin_user data:
```
admin_user:
  key_path: id_rsa
  name: user_name
  path: # Dynamically built
```
Here you should specify the path to the SSH keys and the admin user name which will be used by Ansible to provision the cluster machines.
Define the components you want to install and link them to the machines you want to install them on:

Under the components tag you will find a bunch of definitions like this one:
```
kubernetes_master:
  count: 1
  machines:
  - default-k8s-master
```
The count specifies how many machines you want to provision with this component. The machines tag is the array of machine names you want to install this component on. Note that the count must match the number of machines listed. If you don't want to use a component, set the count to 0 and remove the machines tag. Finally, a machine can be used by multiple components, since multiple components can be installed on one machine if desired.

You will also find a bunch of infrastructure/machine definitions like below:
```
kind: infrastructure/machine
name: default-k8s-master
provider: any
specification:
  hostname: master
  ip: 192.168.100.101
```
Each machine name used in the component layout must have such a document, with a name tag matching the name used in the components. The hostname and ip fields must be filled in to match the actual cluster machines you provide. Ansible uses this to match each machine to a component, which in turn determines which roles to install on the machine.
Finally, start the deployment with:
```
lambdastack apply -f newcluster.yml --no-infra
```
This creates the inventory for Ansible based on the component/machine definitions in newcluster.yml and lets Ansible deploy it. Note that --no-infra is important, since it tells LambdaStack to skip the Terraform part.
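The component/machine rules above (the count equals the number of listed machines, and every listed name has an infrastructure/machine document with hostname and ip set) can be checked before running lambdastack apply. The same rules apply to the air-gapped procedure below. A minimal Python sketch follows, assuming the cluster file has already been parsed into dictionaries (for example with PyYAML's yaml.safe_load_all); validate_layout is a hypothetical helper, not a LambdaStack API.

```python
from typing import Any, Dict, List


def validate_layout(components: Dict[str, Dict[str, Any]], docs: List[Dict[str, Any]]) -> List[str]:
    """Check the component/machine rules for the "any" provider; return error messages."""
    # Index infrastructure/machine documents by their name tag.
    machines = {d["name"]: d for d in docs if d.get("kind") == "infrastructure/machine"}
    errors = []
    for component, settings in components.items():
        count = settings.get("count", 0)
        names = settings.get("machines") or []
        # count must equal the number of listed machines (count 0 with no machines tag = unused).
        if count != len(names):
            errors.append(f"{component}: count is {count} but {len(names)} machine(s) are listed")
        for name in names:
            doc = machines.get(name)
            if doc is None:
                errors.append(f"{component}: no infrastructure/machine document named {name}")
                continue
            spec = doc.get("specification", {})
            for field in ("hostname", "ip"):
                if not spec.get(field):
                    errors.append(f"{name}: specification.{field} is not set")
    return errors


# The kubernetes_master and default-k8s-master examples above.
components = {
    "kubernetes_master": {"count": 1, "machines": ["default-k8s-master"]},
    "kubernetes_node": {"count": 0},
}
docs = [
    {
        "kind": "infrastructure/machine",
        "name": "default-k8s-master",
        "provider": "any",
        "specification": {"hostname": "master", "ip": "192.168.100.101"},
    },
]
print(validate_layout(components, docs))  # []
```

Because the check looks machines up by name, one machine listed under several components passes, matching the rule that multiple components can share a machine.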

How to create a LambdaStack cluster on existing air-gapped infrastructure

Please read the prerequisites related to hostname requirements first.

LambdaStack has the ability to set up a cluster on air-gapped infrastructure provided by you. These can be either bare metal machines or VMs and should meet the following requirements:

Note: Hardware requirements are not listed since they depend on the use case, component configuration, etc.

The air-gapped cluster machines/VMs are connected by a network (or virtual network of some sort) and can communicate with each other.
The air-gapped cluster machines/VMs are running one of the following Linux distributions:
- RedHat 7.6+ and < 8
- CentOS 7.6+ and < 8
- Ubuntu 18.04
The cluster machines/VMs are accessible through SSH with a set of SSH keys you provide and configure on each machine yourself (key-based authentication).
The user used for SSH connection (admin_user) has passwordless root privileges through sudo.
A requirements machine that:
- Runs the same distribution as the air-gapped cluster machines/VMs (RedHat 7, CentOS 7, Ubuntu 18.04)
- Has access to the internet. If you don't have access to a similar machine/VM with internet access, you can also try to download the requirements with a Docker container. More information here.
A provisioning machine that:
- Has access to the SSH keys
- Is on the same network as your cluster machines
- Has LambdaStack running. Note: to run LambdaStack, check the Prerequisites.

To set up the cluster do the following steps:

First we need to get the tooling to prepare the requirements. On the provisioning machine run:
```
lambdastack prepare --os OS
```
Where OS is one of centos-7, redhat-7 or ubuntu-18.04. This creates a directory called prepare_scripts with the needed files inside.
The scripts in prepare_scripts are used to download all requirements. To do that, copy the prepare_scripts folder to the requirements machine and run the following command:
```
download-requirements.sh /requirementsoutput/
```
This downloads all requirements into the /requirementsoutput/ folder. Once the script completes successfully, copy /requirementsoutput/ to the provisioning machine for later use.
Then generate a minimal data yaml file on the provisioning machine:
```
lambdastack init -p any -n newcluster
```
The any provider tells LambdaStack to create a minimal data config that does not contain any cloud-provider-related information. If you want full control, add the --full flag, which gives you a configuration with all parts of a cluster that can be configured.
Open the configuration file and set up the admin_user data:
```
admin_user:
  key_path: id_rsa
  name: user_name
  path: # Dynamically built
```
Here you should specify the path to the SSH keys and the admin user name which will be used by Ansible to provision the cluster machines.
Define the components you want to install and link them to the machines you want to install them on:

Under the components tag you will find a bunch of definitions like this one:
```
kubernetes_master:
  count: 1
  machines:
  - default-k8s-master
```
The count specifies how many machines you want to provision with this component. The machines tag is the array of machine names you want to install this component on. Note that the count must match the number of machines listed. If you don't want to use a component, set the count to 0 and remove the machines tag. Finally, a machine can be used by multiple components, since multiple components can be installed on one machine if desired.

You will also find a bunch of infrastructure/machine definitions like below:
```
kind: infrastructure/machine
name: default-k8s-master
provider: any
specification:
  hostname: master
  ip: 192.168.100.101
```
Each machine name used in the component layout must have such a document, with a name tag matching the name used in the components. The hostname and ip fields must be filled in to match the actual cluster machines you provide. Ansible uses this to match each machine to a component, which in turn determines which roles to install on the machine.
Finally, start the deployment with:
```
lambdastack apply -f newcluster.yml --no-infra --offline-requirements /requirementsoutput/
```
This creates the inventory for Ansible based on the component/machine definitions in newcluster.yml and lets Ansible deploy it. Note that --no-infra is important, since it tells LambdaStack to skip the Terraform part. The --offline-requirements flag tells LambdaStack that this is an air-gapped installation and that it should use the /requirementsoutput/ folder prepared in the first two steps as the source for all requirements.

How to create a LambdaStack cluster using a custom system repository and Docker image registry

LambdaStack can use an external repository and image registry during lambdastack apply execution.

Custom URLs must be specified in the configuration/shared-config document, for example:

kind: configuration/shared-config
title: Shared configuration that will be visible to all roles
name: default
specification:
  custom_image_registry_address: "10.50.2.1:5000"
  custom_repository_url: "http://10.50.2.1:8080/lsrepo"
  use_ha_control_plane: true

The repository and image registry implementations must be compatible with the existing Ansible code:

the repository data (including the apt or yum repository) is served from an HTTP server and structured exactly as in the offline package
the image registry data is loaded into and served from a standard Docker registry implementation

Note: If both a custom repository/registry and offline installation are configured, the custom repository/registry is preferred.

Note: You can switch between the custom repository/registry and the offline/online installation methods. Keep in mind that this changes "imageRegistry" in Kubernetes, which in turn may cause a short downtime.

By default, LambdaStack creates a "repository" virtual machine for cloud environments. When a custom repository and registry are used, this additional, otherwise empty VM is not needed. The following config snippet shows how to avoid creating it:

kind: lambdastack-cluster
title: LambdaStack Cluster Config
provider: <provider>
name: default
specification:
  ...
  components:
    repository:
      count: 0
    kubernetes_master:
      count: 1
    kubernetes_node:
      count: 2
---
kind: configuration/feature-mapping
title: "Feature mapping to roles"
provider: <provider>
name: default
specification:
  roles_mapping:
    kubernetes_master:
      - repository
      - image-registry
      - kubernetes-master
      - helm
      - applications
      - node-exporter
      - filebeat
      - firewall
      - vault
---
kind: configuration/shared-config
title: Shared configuration that will be visible to all roles
provider: <provider>
name: default
specification:
  custom_image_registry_address: "<ip-address>:5000"
  custom_repository_url: "http://<ip-address>:8080/lsrepo"

Disable "repository" component:
```
repository:
  count: 0
```
Prepend the "kubernetes_master" mapping in configuration/feature-mapping (or any other mapping if you don't deploy Kubernetes) with:
```
kubernetes_master:
  - repository
  - image-registry
```

Specify custom repository/registry in configuration/shared-config:

specification:
  custom_image_registry_address: "<ip-address>:5000"
  custom_repository_url: "http://<ip-address>:8080/lsrepo"
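The three changes described above (no repository VM, the repository and image-registry roles carried by a deployed component, and the custom locations set in configuration/shared-config) can be cross-checked with a small Python sketch. It assumes the documents are already parsed into dictionaries; check_custom_repo_setup is a hypothetical helper, not part of LambdaStack, and the address values come from the first shared-config example.

```python
from typing import Any, Dict, List


def check_custom_repo_setup(components: Dict[str, Dict[str, Any]],
                            roles_mapping: Dict[str, List[str]],
                            shared_config: Dict[str, Any]) -> List[str]:
    """Cross-check the custom repository/registry setup; return error messages."""
    errors = []
    # Both custom locations are read from the configuration/shared-config specification.
    for key in ("custom_repository_url", "custom_image_registry_address"):
        if not shared_config.get(key):
            errors.append(f"configuration/shared-config: {key} is not set")
    if components.get("repository", {}).get("count", 0) == 0:
        # Without a repository VM, a deployed component must still carry the
        # repository and image-registry roles (prepended to kubernetes_master above).
        deployed = [name for name, settings in components.items() if settings.get("count", 0) > 0]
        for role in ("repository", "image-registry"):
            if not any(role in roles_mapping.get(name, []) for name in deployed):
                errors.append(f"no deployed component has the {role} role")
    return errors


components = {"repository": {"count": 0}, "kubernetes_master": {"count": 1}, "kubernetes_node": {"count": 2}}
roles_mapping = {
    "kubernetes_master": ["repository", "image-registry", "kubernetes-master", "helm",
                          "applications", "node-exporter", "filebeat", "firewall", "vault"],
}
shared_config = {
    "custom_image_registry_address": "10.50.2.1:5000",
    "custom_repository_url": "http://10.50.2.1:8080/lsrepo",
}
print(check_custom_repo_setup(components, roles_mapping, shared_config))  # []
```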

How to create a LambdaStack cluster on a cloud provider

Please read the prerequisites related to hostname requirements first.

LambdaStack has the ability to set up a cluster on one of the following cloud providers:

AWS
Azure
GCP - WIP

Under the hood it uses Terraform to create the virtual infrastructure before it applies our Ansible playbooks to provision the VMs.

You need the following prerequisites:

Access to one of the supported cloud providers: AWS, Azure or GCP.
Adequate resources to deploy a cluster on the cloud provider.
A set of SSH keys you provide.
A provisioning machine that:
- Has access to the SSH keys
- Has LambdaStack running. Note: to run LambdaStack, check the Prerequisites.

To set up the cluster do the following steps from the provisioning machine:

First generate a minimal data yaml file:
```
lambdastack init -p aws/azure -n newcluster
```
The provider flag should be either aws or azure and tells LambdaStack to create a data config that contains the specifics for that cloud provider. If you want full control, add the --full flag, which gives you a config with all parts of a cluster that can be configured.
Open the configuration file and set up the admin_user data:
```
admin_user:
  key_path: id_rsa
  name: user_name
  path: # Dynamically built
```
Here you should specify the path to the SSH keys and the admin user name which will be used by Ansible to provision the cluster machines.

For AWS, the admin name is already specified and depends on the Linux distro image you are using for the VMs:
- Username for Ubuntu Server: ubuntu
- Username for RedHat: ec2-user
On Azure, the name you specify is configured as the admin name on the VMs.

On GCP-WIP, the name you specify is configured as the admin name on the VMs.
Set up the cloud specific data:

To let Terraform access the cloud providers you need to set up some additional cloud configuration.

AWS:
```
cloud:
  region: us-east-1
  credentials:
    key: aws_key
    secret: aws_secret
  use_public_ips: false
  default_os_image: default
```
The region lets you choose the optimal place to deploy your cluster. The key and secret are needed by Terraform and can be generated in the AWS console. More information about that here.

Azure:
```
cloud:
  region: East US
  subscription_name: Subscription_name
  use_service_principal: false
  use_public_ips: false
  default_os_image: default
```
The region lets you choose the optimal place to deploy your cluster. The subscription_name is the Azure subscription under which you want to deploy the cluster.

Terraform will ask you to sign in to your Microsoft Azure subscription when it prepares to build, modify or destroy the infrastructure on Azure. If you need to share cluster management with other people, set the use_service_principal tag to true. This creates a service principal and uses it to manage the resources.

If you already have a service principal and don't want to create a new one, do the following. Make sure the use_service_principal tag is set to true. Then, before you run lambdastack apply -f yourcluster.yml, create the following folder structure in the path you are running LambdaStack from:
```
/build/clustername/terraform
```
Where clustername is the name you specified under specification.name in your cluster YAML. Then, in the terraform folder, add a file named sp.yml and fill it in with the service principal information like so:
```
appId: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
displayName: "app-name"
name: "http://app-name"
password: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
tenant: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
subscriptionId: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
```
LambdaStack reads this file and automatically uses it for authentication when creating and managing resources.

GCP-WIP:

NOTE: GCP-WIP values may or may not be correct until the official GCP release.
```
cloud:
  region: us-east-1
  credentials:
    key: gcp_key
    secret: gcp_secret
  use_public_ips: false
  default_os_image: default
```
The region lets you choose the optimal place to deploy your cluster. The key and secret are needed by Terraform and can be generated in the GCP console.

For AWS, Azure and GCP, the following cloud attributes are shared:
- use_public_ips: When true, the VMs also get a direct interface to the internet. While this is convenient when setting up a cluster for testing, it should not be used in production. A VPN setup should be used instead, which we will document in a different section (TODO).
- default_os_image: Lets you more easily select OS images validated and tested by the LambdaStack team. When one is selected, it is applied to every infrastructure/virtual-machine document in the cluster, regardless of user-defined ones. The following values are accepted:
  - default: Applies user-defined infrastructure/virtual-machine documents when generating a new configuration.
  - ubuntu-18.04-x86_64: Applies the latest validated and tested Ubuntu 18.04 image to all infrastructure/virtual-machine documents on x86_64 on Azure and AWS.
  - redhat-7-x86_64: Applies the latest validated and tested RedHat 7.x image to all infrastructure/virtual-machine documents on x86_64 on Azure and AWS.
  - centos-7-x86_64: Applies the latest validated and tested CentOS 7.x image to all infrastructure/virtual-machine documents on x86_64 on Azure and AWS.
  - centos-7-arm64: Applies the latest validated and tested CentOS 7.x image to all infrastructure/virtual-machine documents on arm64 on AWS. Azure currently doesn't support arm64.

  The images used for these values are updated and tested on a regular basis.
Define the components you want to install:

Under the components tag you will find a bunch of definitions like this one:
```
kubernetes_master:
  count: 1
```
The count specifies how many VMs you want to provision with this component. If you don't want to use a component, set the count to 0.

Note that for each cloud provider LambdaStack already has a default VM configuration for each component. If you need more control over the VMs, generate a config with the --full flag. Each component then has an additional machine tag:
```
kubernetes_master:
  count: 1
  machine: kubernetes-master-machine
  ...
```
This links to an infrastructure/virtual-machine document, which can be found in the same configuration file. It gives you full control over the VM config (size, storage, provisioning image, security, etc.). More details on this will be documented in a different section (TODO).
Finally, start the deployment with:
```
lambdastack apply -f newcluster.yml
```
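If you use an existing Azure service principal, the sp.yml file from the Azure step must be in place before lambdastack apply runs. The following Python sketch creates it. It assumes that the build/clustername/terraform path is relative to the directory you run LambdaStack from, as the Azure step describes; write_service_principal is a hypothetical helper, not part of LambdaStack.

```python
import json
import os
from pathlib import Path
from typing import Dict

# Fields and order follow the sp.yml example above.
SP_FIELDS = ("appId", "displayName", "name", "password", "tenant", "subscriptionId")


def write_service_principal(run_dir: Path, cluster_name: str, sp: Dict[str, str]) -> Path:
    """Create build/<cluster_name>/terraform/sp.yml under run_dir and return its path."""
    missing = [field for field in SP_FIELDS if not sp.get(field)]
    if missing:
        raise ValueError(f"missing service principal fields: {missing}")
    terraform_dir = Path(run_dir) / "build" / cluster_name / "terraform"
    terraform_dir.mkdir(parents=True, exist_ok=True)
    sp_file = terraform_dir / "sp.yml"
    # json.dumps yields a double-quoted, escaped string that YAML parsers also accept,
    # so passwords containing quotes or backslashes stay valid.
    sp_file.write_text("".join(f"{field}: {json.dumps(sp[field])}\n" for field in SP_FIELDS))
    # The file contains a secret, so make it readable by the current user only.
    os.chmod(sp_file, 0o600)
    return sp_file
```

For example, write_service_principal(Path.cwd(), "mycluster", sp), where sp is a dictionary holding the six fields, would create build/mycluster/terraform/sp.yml in the current directory.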

Note for RHEL Azure images

LambdaStack currently supports RHEL 7 LVM partitioned images attached to standard RHEL repositories. For more details, refer to Azure documentation.

LambdaStack uses cloud-init custom data to merge small logical volumes (homelv, optlv, tmplv and varlv) into the rootlv and to extend it (with the underlying filesystem) by the current free space in its volume group. The usrlv LV, which is 10 GB, is not merged since that would require a reboot. The merging is required to deploy a cluster; however, it can be disabled for troubleshooting, since it performs some administrative tasks (such as remounting filesystems or restarting services).

NOTE: RHEL 7 LVM images require at least 64 GB for OS disk.

Example config:

kind: infrastructure/virtual-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: "7-LVM"
    version: "7.9.2021051701"
  storage_os_disk:
    disk_size_gb: 64

Note for CentOS Azure images

LambdaStack supports CentOS 7 images with RAW partitioning (recommended) and LVM as well.

Example config:

kind: infrastructure/virtual-machine
specification:
  storage_image_reference:
    publisher: OpenLogic
    offer: CentOS
    sku: "7_9"
    version: "7.9.2021071900"

How to disable merging LVM logical volumes

To skip merging logical volumes (for troubleshooting), use the following document:

kind: infrastructure/cloud-init-custom-data
title: cloud-init user-data
provider: azure
name: default
specification:
  enabled: false

How to delete a LambdaStack cluster on a cloud provider

LambdaStack has a delete command to remove a cluster from a cloud provider (AWS, Azure). Run the following:

lambdastack delete -b /path/to/cluster/build/folder

Using the information in the given cluster build folder, the command removes the cluster's resources from the cloud provider.

Single machine cluster

Please read the prerequisites related to hostname requirements first.

NOTE

A single machine cluster cannot be scaled up or deployed alongside other types of clusters.

Sometimes it might be desirable to run a LambdaStack cluster on a single machine. For this purpose, LambdaStack ships with a single_machine component configuration. This cluster comes with the following main components:

kubernetes-master: Untainted so pods can be deployed on it
rabbitmq: RabbitMQ for messaging instead of Kafka
applications: For deploying the Keycloak authentication service
postgresql: To provide a database for Keycloak

Note that components like logging and monitoring are missing, since they do not provide much benefit in a single machine scenario. Also, RabbitMQ is included instead of Kafka, since it is much less resource-intensive.

To get started with a single machine cluster you can use the following template as a base. Note that some configurations are omitted:

kind: lambdastack-cluster
title: LambdaStack Cluster Config
name: default
build_path: # Dynamically built
specification:
  prefix: dev
  name: single
  admin_user:
    name: operations
    key_path: id_rsa
    path: # Dynamically built
  cloud:
    ... # add other cloud configuration as needed
  components:
    kubernetes_master:
      count: 0
    kubernetes_node:
      count: 0
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 0
    load_balancer:
      count: 0
    rabbitmq:
      count: 0
    ignite:
      count: 0
    opendistro_for_elasticsearch:
      count: 0
    single_machine:
      count: 1
---
kind: configuration/applications
title: "Kubernetes Applications Config"
name: default
specification:
  applications:
  - name: auth-service
    enabled: yes # set to yes to enable authentication service
    ... # add other authentication service configuration as needed

To create a single machine cluster using the "any" provider (with an extra load_balancer config included), use the following template:

kind: lambdastack-cluster
title: "LambdaStack Cluster Config"
provider: any
name: single
build_path: # Dynamically built
specification:
  name: single
  admin_user:
    name: ubuntu
    key_path: id_rsa
    path: # Dynamically built
  components:
    kubernetes_master:
      count: 0
    kubernetes_node:
      count: 0
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 0
    load_balancer:
      count: 1
      configuration: default
      machines: [single-machine]
    rabbitmq:
      count: 0
    single_machine:
      count: 1
      configuration: default
      machines: [single-machine]
---
kind: configuration/haproxy
title: "HAProxy"
provider: any
name: default
specification:
  logs_max_days: 60
  self_signed_certificate_name: self-signed-fullchain.pem
  self_signed_private_key_name: self-signed-privkey.pem
  self_signed_concatenated_cert_name: self-signed-test.tld.pem
  haproxy_log_path: "/var/log/haproxy.log"

  stats:
    enable: true
    bind_address: 127.0.0.1:9000
    uri: "/haproxy?stats"
    user: operations
    password: your-haproxy-stats-pwd
  frontend:
    - name: https_front
      port: 443
      https: yes
      backend:
      - http_back1
  backend: # example backend config below
    - name: http_back1
      server_groups:
      - kubernetes_node
      # servers: # Definition of the server that hosts the application.
      # - name: "node1"
      #   address: "lambdastack-vm1.domain.com"
      port: 30104
---
kind: infrastructure/machine
provider: any
name: single-machine
specification:
  hostname: x1a1
  ip: 10.20.2.10

How to create custom cluster components

LambdaStack gives you the ability to define custom components. This allows you to define a custom set of roles for a component you want to use in your cluster. It can be useful, for example, when you want to maximize usage of the machines at your disposal.

The first thing you need to do is define it in the configuration/feature-mapping configuration. To get this configuration, run the lambdastack init ... --full command. In the available_roles section you can see all the roles that LambdaStack provides. The roles_mapping section is where all the LambdaStack components are defined and where you need to add your custom components.

Below are parts of an example configuration/feature-mapping where we define a new single_machine_new component. We want to use Kafka instead of RabbitMQ and don't need applications and PostgreSQL, since we don't want a Keycloak deployment:

kind: configuration/feature-mapping
title: Feature mapping to roles
name: default
specification:
  available_roles: # All entries here represent the available roles within LambdaStack
  - name: repository
    enabled: yes
  - name: firewall
    enabled: yes
  - name: image-registry
  ...
  roles_mapping: # All entries here represent the default components provided with LambdaStack
  ...
    single_machine:
    - repository
    - image-registry
    - kubernetes-master
    - applications
    - rabbitmq
    - postgresql
    - firewall
    # Below is the new single_machine_new definition
    single_machine_new:
    - repository
    - image-registry
    - kubernetes-master
    - kafka
    - firewall
  ...
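Before applying, it can help to check that every role listed for a custom component is actually declared in available_roles. The sketch below is a hypothetical helper (`undeclared_roles` is an illustrative name, not part of LambdaStack, which performs its own validation). It assumes the feature-mapping document has already been parsed into Python data. The kubernetes-master and kafka entries stand in for roles elided by "..." in the excerpt above.

```python
from typing import Dict, List


def undeclared_roles(available_roles: List[dict],
                     roles_mapping: Dict[str, List[str]],
                     component: str) -> List[str]:
    """Return roles used by `component` that are missing from available_roles.

    Hypothetical helper: it only compares role names and does not
    reproduce LambdaStack's own validation logic.
    """
    declared = {role["name"] for role in available_roles}
    return [role for role in roles_mapping[component] if role not in declared]


# Excerpt of the feature mapping above as Python data (illustrative only).
available_roles = [
    {"name": "repository", "enabled": True},
    {"name": "firewall", "enabled": True},
    {"name": "image-registry"},
    {"name": "kubernetes-master"},
    {"name": "kafka"},
]
roles_mapping = {
    "single_machine_new": [
        "repository", "image-registry", "kubernetes-master", "kafka", "firewall",
    ],
}

print(undeclared_roles(available_roles, roles_mapping, "single_machine_new"))  # []
```

A misspelled role name in roles_mapping would show up in the returned list before any machines are provisioned.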

Once defined, the new single_machine_new component can be used inside the lambdastack-cluster configuration:

kind: lambdastack-cluster
title: LambdaStack Cluster Config
name: default
build_path: # Dynamically built
specification:
  prefix: new
  name: single
  admin_user:
    name: operations
    key_path: id_rsa
    path: # Dynamically built
  cloud:
    ... # add other cloud configuration as needed
  components:
    ... # other components as needed
    single_machine_new:
      count: x

Note: After defining a new component you might also need to define additional configurations for virtual machines and security rules depending on what you are trying to achieve.

How to scale or cluster components

Not all components support this action. Several related known issues are referenced below in this document.

LambdaStack can automatically scale and cluster certain components on cloud providers (AWS, Azure). To scale a component up or down, increase or decrease its count:

components:
  kubernetes_node:
    count: ...
    ...

When the changed configuration is applied with LambdaStack, VMs will be spawned and configured, or removed, accordingly. The following table shows which operations each component supports:

Component	Scale up	Scale down	HA	Clustered	Known issues
Repository	Yes	Yes	No	No	---
Monitoring	Yes	Yes	No	No	---
Logging	Yes	Yes	No	No	---
Kubernetes master	Yes	No	Yes	Yes	#1579
Kubernetes node	Yes	No	Yes	Yes	#1580
Ignite	Yes	Yes	Yes	Yes	---
Kafka	Yes	Yes	Yes	Yes	---
Load Balancer	Yes	Yes	No	No	---
Open Distro for Elasticsearch	Yes	Yes	Yes	Yes	---
PostgreSQL	No	No	Yes	Yes	#1577
RabbitMQ	Yes	Yes	No	Yes	#1578, #1309
RabbitMQ K8s	Yes	Yes	No	Yes	#1486
Keycloak K8s	Yes	Yes	Yes	Yes	---
PgPool K8s	Yes	Yes	Yes	Yes	---
PgBouncer K8s	Yes	Yes	Yes	Yes	---
Ignite K8s	Yes	Yes	Yes	Yes	---

Additional notes:

Repository:
In a standard LambdaStack deployment, only one repository machine is required.
Scale up: Scaling up the repository component will create a new standalone VM.
Scale down: Scaling down will remove it in LIFO order (Last In, First Out).
However, even if you create more than one VM, by default all other components will use the first one.
Kubernetes master:
Scale up: When increased, this will set up additional control plane nodes, but in the case of a non-HA K8s cluster, the existing control plane node must be promoted first.
Scale down: At the moment there is no ability to downscale.
Kubernetes node:
Scale up: When increased, this will set up an additional node and join it to the Kubernetes cluster.
Scale down: There is no ability to downscale.
Load balancer:
Scale up: Scaling up the load_balancer component will create a new standalone VM.
Scale down: Scaling down will remove it in LIFO order (Last In, First Out).
Logging:
Scale up: Scaling up will create a new VM with both Kibana and ODFE components inside.
ODFE will join the cluster, but Kibana will be a standalone instance.
Scale down: When scaling down, the VM will be deleted.
Monitoring:
Scale up: Scaling up the monitoring component will create a new standalone VM.
Scale down: Scaling down will remove it in LIFO order (Last In, First Out).
PostgreSQL:
Scale up: Scaling up is not supported at the moment. Check known issues.
Scale down: Scaling down is not supported at the moment. Check known issues.
RabbitMQ:
If the instance count is changed, additional RabbitMQ nodes will be added or removed.
Scale up: Creates a new VM and adds it to the RabbitMQ cluster.
Scale down: At the moment, scaling down just removes the VM. All data not processed on this VM will be purged. Check known issues.
Note that clustering requires a change in the configuration/rabbitmq document:
```
kind: configuration/rabbitmq
...
specification:
  cluster:
    is_clustered: true
...
```
RabbitMQ K8s: Scaling is controlled via replicas in the StatefulSet. RabbitMQ on K8s uses the rabbitmq_peer_discovery_k8s plugin to work as a cluster.

Additional known issues:

#1574 - Disks are not removed after downscale of any LambdaStack component on Azure.

Multi master cluster

LambdaStack can deploy HA Kubernetes clusters (since v0.6). To achieve that, it is required that:

the master count must be higher than 1 (proper values are odd numbers such as 3, 5, 7):
```
kubernetes_master:
  count: 3
```

the HA mode must be enabled in configuration/shared-config:

kind: configuration/shared-config
...
specification:
  use_ha_control_plane: true
  promote_to_ha: false

the regular lambdastack apply cycle must be executed
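Why odd master counts: assume etcd runs stacked on the control plane nodes. This is the usual kubeadm HA topology, but the text above does not state it. Under that assumption, the cluster needs a majority (quorum) of masters to stay available. The sketch below computes how many master failures each count tolerates. An even count adds no fault tolerance over the next-lower odd count.

```python
def quorum(masters: int) -> int:
    """Smallest majority of `masters` members."""
    return masters // 2 + 1


def tolerated_failures(masters: int) -> int:
    """How many members can fail while a majority still remains."""
    return masters - quorum(masters)


# Print master count, quorum size and tolerated failures for 1..7 masters.
for count in range(1, 8):
    print(count, quorum(count), tolerated_failures(count))
```

For example, 3 masters tolerate one failure, and 4 masters still tolerate only one.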

LambdaStack can promote / convert older single-master clusters to HA mode (since v0.6). To achieve that, it is required that:

the existing cluster is a legacy single-master cluster
the existing cluster has been upgraded to Kubernetes 1.17 or above first

the HA mode and HA promotion must be enabled in configuration/shared-config:

kind: configuration/shared-config
...
specification:
  use_ha_control_plane: true
  promote_to_ha: true

the regular lambdastack apply cycle must be executed

since it is a one-time operation, HA promotion must be disabled in the config after a successful promotion:

kind: configuration/shared-config
...
specification:
  use_ha_control_plane: true
  promote_to_ha: false

Note: Reversing HA promotion is not supported yet.

LambdaStack can scale-up existing HA clusters (including ones that were promoted). To achieve that, it is required that:

the existing cluster must be already running in HA mode
the master count must be higher than the previous value (proper values are 3, 5, 7):
```
kubernetes_master:
  count: 5
```

the HA mode must be enabled in configuration/shared-config:

kind: configuration/shared-config
...
specification:
  use_ha_control_plane: true
  promote_to_ha: false

the regular lambdastack apply cycle must be executed

Note: Scaling down clusters is not supported yet (the master count cannot be decreased).
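The scale-up requirements listed above can be expressed as a pre-flight check. The sketch below is a hypothetical helper, not part of LambdaStack. `scale_up_problems` and its parameters are illustrative names, and the function only mirrors the documented rules: the cluster already runs in HA mode, the count increases to an odd value, use_ha_control_plane is true and promote_to_ha is false.

```python
from typing import List


def scale_up_problems(current_masters: int, new_masters: int,
                      running_ha: bool, use_ha_control_plane: bool,
                      promote_to_ha: bool) -> List[str]:
    """Hypothetical check mirroring the documented HA scale-up rules."""
    problems = []
    if not running_ha:
        problems.append("the cluster must already run in HA mode")
    if new_masters <= current_masters:
        # Scale-down is not supported, so the count must strictly increase.
        problems.append("the master count must be higher than the previous value")
    if new_masters % 2 == 0:
        problems.append("use an odd master count such as 3, 5 or 7")
    if not use_ha_control_plane:
        problems.append("use_ha_control_plane must be true")
    if promote_to_ha:
        problems.append("promote_to_ha must be false when scaling up")
    return problems


print(scale_up_problems(3, 5, True, True, False))  # []
print(scale_up_problems(3, 2, True, True, False))
```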

Build artifacts

The LambdaStack engine produces build artifacts during each deployment. These artifacts contain:

Generated terraform files.
Generated terraform state files.
Generated cluster manifest file.
Generated ansible files.
Azure login credentials for service principal if deploying to Azure.

Artifacts contain sensitive data, so it is important to keep them in a safe place, like a private Git repository or storage with limited access. The generated build is also important when scaling or updating the cluster: you will need it in the build folder in order to edit your cluster.

LambdaStack creates (or uses, if you did not specify that it should create one) a service principal account that can manage all resources in the subscription, so store build artifacts securely.

Kafka replication and partition setting

When planning a Kafka installation, consider the number of partitions and replicas, since both strongly affect Kafka's throughput and reliability. By default, the number of Kafka replicas is set to 1. Change it in core/src/ansible/roles/kafka/defaults to have partitions replicated across multiple virtual machines.

  ...
  replicas: 1 # Default to at least 1 (1 broker)
  partitions: 8 # 100 x brokers x replicas for reasonable size cluster. Small clusters can be less
  ...

You can read more here about planning number of partitions.

NOTE: LambdaStack does not use Confluent. The above reference is simply for documentation.
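The partitions comment in the defaults gives a rule of thumb of 100 x brokers x replicas. The sketch below turns that comment into arithmetic. It also adds a general Kafka constraint: a replication factor cannot exceed the number of brokers. `partition_guideline` is an illustrative name, and the result is a rough guideline only, not a LambdaStack setting.

```python
def partition_guideline(brokers: int, replicas: int) -> int:
    """Rule of thumb quoted in the Kafka defaults: 100 x brokers x replicas."""
    if brokers < 1 or replicas < 1:
        raise ValueError("brokers and replicas must be at least 1")
    if replicas > brokers:
        # Kafka cannot place more replicas of a partition than there are brokers.
        raise ValueError("replicas cannot exceed the number of brokers")
    return 100 * brokers * replicas


print(partition_guideline(1, 1))  # 100
print(partition_guideline(3, 3))  # 900
```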

RabbitMQ installation and setting

To install RabbitMQ in single mode, add the rabbitmq role to your data.yaml for your server and in the general roles section. All other RabbitMQ configuration, e.g. creating users other than guest, must be performed manually.

How to use Azure availability sets

In your cluster YAML config, declare as many objects of kind infrastructure/availability-set as required, as in the example below, changing the name field as you wish.

---
kind: infrastructure/availability-set
name: kube-node  # Short and simple name is preferred
specification:
# The "name" attribute is generated automatically according to LambdaStack's naming conventions
  platform_fault_domain_count: 2
  platform_update_domain_count: 5
  managed: true
provider: azure

Then also set it in the corresponding components section of the kind: lambdastack-cluster doc.

  components:
    kafka:
      count: 0
    kubernetes_master:
      count: 1
    kubernetes_node:
# This line tells LambdaStack to generate the availability-set Terraform template
      availability_set: kube-node  # Short and simple name is preferred
      count: 2

The example below shows a complete configuration. Note that it's recommended to have a dedicated availability set for each clustered component.

# Test availability set config
---
kind: lambdastack-cluster
name: default
provider: azure
build_path: # Dynamically built
specification:
  name: test-cluster
  prefix: test
  admin_user:
    key_path: id_rsa
    name: di-dev
    path: # Dynamically built
  cloud:
    region: Australia East
    subscription_name: <your subscription name>
    use_public_ips: true
    use_service_principal: true
  components:
    kafka:
      count: 0
    kubernetes_master:
      count: 1
    kubernetes_node:
# This line tells LambdaStack to generate the availability-set Terraform template
      availability_set: kube-node  # Short and simple name is preferred
      count: 2
    load_balancer:
      count: 1
    logging:
      count: 0
    monitoring:
      count: 0
    postgresql:
# This line tells LambdaStack to generate the availability-set Terraform template
      availability_set: postgresql  # Short and simple name is preferred
      count: 2
    rabbitmq:
      count: 0
title: LambdaStack Cluster Config
---
kind: infrastructure/availability-set
name: kube-node  # Short and simple name is preferred
specification:
# The "name" attribute (omitted here) is generated automatically according to LambdaStack's naming conventions
  platform_fault_domain_count: 2
  platform_update_domain_count: 5
  managed: true
provider: azure
---
kind: infrastructure/availability-set
name: postgresql  # Short and simple name is preferred
specification:
# The "name" attribute (omitted here) is generated automatically according to LambdaStack's naming conventions
  platform_fault_domain_count: 2
  platform_update_domain_count: 5
  managed: true
provider: azure
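Every availability_set name referenced by a component should have a matching infrastructure/availability-set document. The sketch below is a hypothetical consistency check, not a LambdaStack feature. It assumes the YAML documents have already been parsed into Python dicts, for example with PyYAML's yaml.safe_load_all, which is not part of the sketch.

```python
from typing import Iterable, Set


def missing_availability_sets(docs: Iterable[dict]) -> Set[str]:
    """Names referenced by components but not declared as availability-set docs."""
    docs = list(docs)
    declared = {d["name"] for d in docs
                if d.get("kind") == "infrastructure/availability-set"}
    referenced = set()
    for d in docs:
        if d.get("kind") != "lambdastack-cluster":
            continue
        components = d.get("specification", {}).get("components", {})
        for settings in components.values():
            name = (settings or {}).get("availability_set")
            if name:
                referenced.add(name)
    return referenced - declared


# Trimmed version of the example above, with the postgresql set left undeclared.
docs = [
    {"kind": "lambdastack-cluster", "specification": {"components": {
        "kubernetes_node": {"availability_set": "kube-node", "count": 2},
        "postgresql": {"availability_set": "postgresql", "count": 2},
        "kafka": {"count": 0},
    }}},
    {"kind": "infrastructure/availability-set", "name": "kube-node"},
]
print(missing_availability_sets(docs))  # {'postgresql'}
```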

Downloading offline requirements with a Docker container

This paragraph describes how to use a Docker container to download the requirements for air-gapped/offline installations. At this time we don't officially support this, and we still recommend using a full distribution which is the same as the air-gapped cluster machines/VMs.

A few points:

This only describes how to set up the Docker containers for downloading. The rest of the steps are similar to those in the paragraph here.
The main reason you might want to try this is to download arm64 requirements on an x86_64 machine. More information on the current state of arm64 support can be found here.

Ubuntu 18.04

For Ubuntu, you can use the following command to launch a container:

docker run -v /shared_folder:/home <--platform linux/amd64 or --platform linux/arm64> --rm -it ubuntu:18.04

As the ubuntu:18.04 image is multi-arch you can include --platform linux/amd64 or --platform linux/arm64 to run the container as the specified architecture. The /shared_folder should be a folder on your local machine containing the required scripts.

When you are inside the container run the following commands to prepare for the running of the download-requirements.sh script:

apt-get update # update the package manager
apt-get install sudo # install sudo so we can make the download-requirements.sh executable and run it as root
sudo chmod +x /home/download-requirements.sh # make the requirements script executable

After this you should be able to run the download-requirements.sh from the home folder.

RedHat 7.x

For RedHat you can use the following command to launch a container:

docker run -v /shared_folder:/home <--platform linux/amd64 or --platform linux/arm64> --rm -it registry.access.redhat.com/ubi7/ubi:7.9

As the registry.access.redhat.com/ubi7/ubi:7.9 image is multi-arch you can include --platform linux/amd64 or --platform linux/arm64 to run the container as the specified architecture. The /shared_folder should be a folder on your local machine containing the requirement scripts.

To run the download-requirements.sh script, you will need a RedHat developer subscription. Use it to register the running container and make sure you can access the official RedHat repositories for the required packages. More information on getting this free subscription here.

When you are inside the container run the following commands to prepare for the running of the download-requirements.sh script:

subscription-manager register # will ask for the credentials of your RedHat developer subscription and set up the container
subscription-manager attach --auto # will enable the RedHat official repositories
chmod +x /home/download-requirements.sh # make the requirements script executable

After this you should be able to run the download-requirements.sh from the home folder.

CentOS 7.x

For CentOS, you can use the following command to launch a container:

arm64:

docker run -v /shared_folder:/home --platform linux/arm64 --rm -it arm64v8/centos:7.9.2009

x86_64:

docker run -v /shared_folder:/home --platform linux/amd64 --rm -it amd64/centos:7.9.2009

The /shared_folder should be a folder on your local machine containing the requirement scripts.

When you are inside the container run the following commands to prepare for the running of the download-requirements.sh script:

chmod +x /home/download-requirements.sh # make the requirements script executable

After this you should be able to run the download-requirements.sh from the home folder.

3 - Configuration

LambdaStack how-tos - Configuration

Configuration file

Named lists

LambdaStack uses a concept called named lists in the configuration YAML. Every item in a named list has the name key, which identifies it and makes it unique for the merge operation:

...
  list:
  - name: item1
    property1: value1
    property2: value2
  - name: item2
    property1: value3
    property2: value4
...

By default, a named list in your configuration file will completely overwrite the defaults that LambdaStack provides. This behaviour is intentional. For example, a list of users for Kafka defined in your configuration completely overwrites the users defined in the Kafka defaults.

In some cases, however, you don't want to overwrite a named list. A good example would be the application configurations.

You don't want to re-define every item just to make sure LambdaStack has all default items needed by the Ansible automation. That is where the _merge metadata tag comes in. It will let you define whether you want to overwrite or merge a named list by setting it to true or false.

For example, suppose you want to enable the auth-service application. Instead of defining the whole configuration/applications configuration, you can do the following:

kind: configuration/applications
title: "Kubernetes Applications Config"
name: default
provider: azure
specification:
  applications:
  - _merge: true
  - name: auth-service
    enabled: true

The _merge item set to true tells LambdaStack to merge the applications list. It changes only the enabled: true setting of auth-service and takes the rest of the configuration/applications configuration from the defaults.
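The overwrite-versus-merge rule can be illustrated with a small sketch. This is not LambdaStack's implementation. It assumes a shallow, per-item key update matched by name, while the real merge may be deeper. `merge_named_list` and the default values in the usage example (such as replicas: 2) are hypothetical.

```python
from typing import List


def merge_named_list(defaults: List[dict], overrides: List[dict]) -> List[dict]:
    """Sketch of the documented named-list behaviour.

    Without a `_merge: true` item, the user's list replaces the defaults.
    With it, items are matched by `name` and their keys are updated
    (shallow update only; LambdaStack's real merge may go deeper).
    """
    merge = any(item.get("_merge") is True for item in overrides)
    items = [item for item in overrides if "_merge" not in item]
    if not merge:
        return [dict(item) for item in items]

    result = [dict(item) for item in defaults]  # copies, defaults stay intact
    index = {item["name"]: item for item in result}
    for item in items:
        if item["name"] in index:
            index[item["name"]].update(item)
        else:
            new_item = dict(item)
            result.append(new_item)
            index[new_item["name"]] = new_item
    return result


# Hypothetical defaults, used only to show the two behaviours.
defaults = [
    {"name": "auth-service", "enabled": False, "replicas": 2},
    {"name": "pgpool", "enabled": False},
]
merged = merge_named_list(defaults, [{"_merge": True},
                                     {"name": "auth-service", "enabled": True}])
replaced = merge_named_list(defaults, [{"name": "auth-service", "enabled": True}])
print(merged)
print(replaced)  # [{'name': 'auth-service', 'enabled': True}]
```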

4 - Databases

LambdaStack how-tos - Databases

How to configure PostgreSQL

To configure PostgreSQL, log in to the server using SSH and switch to the postgres user with the command:

sudo -u postgres -i

Then configure database server using psql according to your needs and PostgreSQL documentation.

PostgreSQL passwords encryption

LambdaStack sets up MD5 password encryption. PostgreSQL 10 and later can use SCRAM-SHA-256 password encryption, but LambdaStack does not support this method. The recommended production configuration uses more than one database host in an HA configuration (repmgr) cooperating with PgBouncer and Pgpool, and Pgpool cannot parse a list of SCRAM-SHA-256 hashes when this encryption is enabled. Due to limited Pgpool authentication options, it is not possible to refresh the pool_passwd file automatically. For this reason, MD5 password encryption is set up, and this is not configurable in LambdaStack.

How to set up PostgreSQL connection pooling

PostgreSQL connection pooling in LambdaStack is provided by the PgBouncer application. It is available as a Kubernetes ClusterIP service or as a standalone package. The Kubernetes-based installation works together with PgPool, so it supports a PostgreSQL HA setup. The standalone installation (described below) is deprecated and will be removed in the next release.

NOTE

PgBouncer extension is not supported on ARM.

In the standalone installation, PgBouncer is installed only on the PostgreSQL primary node. It needs to be enabled in the configuration YAML file:

kind: configuration/postgresql
specification:
  extensions:
    ...
    pgbouncer:
      enabled: yes
    ...

PgBouncer listens on the standard port 6432. For security reasons, the basic configuration is only a template with very limited access to the database. Tailor the configuration according to the component documentation, following security rules and best practices.

How to set up PostgreSQL HA replication with repmgr cluster

NOTE 1

Replication (repmgr) extension is not supported on ARM.

NOTE 2

Changing the number of PostgreSQL nodes after the first apply is not supported by LambdaStack. Before deploying the cluster, decide what kind of configuration you need and how many PostgreSQL nodes will be needed.

This component can be used as a part of PostgreSQL clustering configured by LambdaStack. To configure PostgreSQL HA replication, add a block similar to the one below to the core section of your configuration file:

---
kind: configuration/postgresql
name: default
title: PostgreSQL
specification:
  config_file:
    parameter_groups:
      ...
      # This block is optional, you can use it to override default values
    - name: REPLICATION
      subgroups:
      - name: Sending Server(s)
        parameters:
        - name: max_wal_senders
          value: 10
          comment: maximum number of simultaneously running WAL sender processes
          when: replication
        - name: wal_keep_size
          value: 500
          comment: the size of WAL files held for standby servers (MB)
          when: replication
      - name: Standby Servers
        parameters:
        - name: hot_standby
          value: 'on'
          comment: must be 'on' for repmgr needs, ignored on primary but recommended
            in case primary becomes standby
          when: replication
  extensions:
    ...
    replication:
      enabled: true
      replication_user_name: ls_repmgr
      replication_user_password: PASSWORD_TO_CHANGE
      privileged_user_name: ls_repmgr_admin
      privileged_user_password: PASSWORD_TO_CHANGE
      repmgr_database: ls_repmgr
      shared_preload_libraries:
      - repmgr
    ...

If enabled is set to true for the replication extension, LambdaStack will automatically create a cluster of a primary and a secondary server. The cluster uses a replication user whose name and password are specified in the configuration file. This is only possible for configurations containing two PostgreSQL servers.

The privileged user is used at the beginning to take a full backup of the primary instance and replicate it to the secondary node. After that, only the replication user, which has limited permissions, is used for WAL replication.

How to stop PostgreSQL service in HA cluster

Maintenance work sometimes requires the PostgreSQL service to be stopped. Before doing so, pause the repmgr service (see the repmgr manual page). When the repmgr service is paused, you can follow the steps from the PostgreSQL manual page, or stop PostgreSQL as a regular systemd service.

How to register database standby in repmgr cluster

If one of the database nodes has been recovered to the desired state, you may want to re-attach it to the database cluster. Execute these steps on the node that will be attached as a standby:

Clone data from the current primary node:

repmgr standby clone -h CURRENT_PRIMARY_ADDRESS -U ls_repmgr_admin -d ls_repmgr --force

Register the node as a standby:

repmgr standby register

Use the --force option with the register command if the node was registered in the cluster before. For more options, see the repmgr manual: https://repmgr.org/docs/5.2/repmgr-standby-register.html

How to switchover database nodes

You may want to switch over database nodes, that is, promote the standby to primary and demote the existing primary to standby.

Configure passwordless SSH communication for the postgres user between database nodes.
Test and run an initial login between nodes to authenticate hosts (if host authentication is enabled).

Execute the commands listed below on the current standby node.

Confirm that the standby you want to promote is registered in the repmgr cluster:

repmgr cluster show

Run the switchover:

repmgr standby switchover

Run repmgr cluster show again and check the status. For more details or troubleshooting, see the repmgr manual: https://repmgr.org/docs/5.2/repmgr-standby-switchover.html

How to set up PgBouncer, PgPool and PostgreSQL parameters

This section describes how to set up connection pooling and load balancing for a highly available PostgreSQL cluster. The default configuration provided by LambdaStack is meant for midrange systems but can be customized to scale up or to improve performance.

To adjust the configuration to your needs, you can refer to the following documentation:

Component	Documentation URL
PgBouncer	https://www.pgbouncer.org/config.html
PgPool: Performance Considerations	https://www.pgpool.net/docs/41/en/html/performance.html
PgPool: Server Configuration	https://www.pgpool.net/docs/41/en/html/runtime-config.html
PostgreSQL: connections	https://www.postgresql.org/docs/10/runtime-config-connection.html
PostgreSQL: resources management	https://www.postgresql.org/docs/10/runtime-config-resource.html

Installing PgBouncer and PgPool

NOTE

PgBouncer and PgPool Docker images are not supported for ARM. If these applications are enabled in configuration, installation will fail.

PgBouncer and PgPool are provided as K8s deployments. They are not installed by default. To deploy them, add a configuration/applications document to your configuration YAML file, similar to the example below (the enabled flags must be set to true):

---
kind: configuration/applications
version: 1.2.0
title: "Kubernetes Applications Config"
provider: aws
name: default
specification:
  applications:
  ...

## --- pgpool ---

  - name: pgpool
    enabled: true
    ...
    namespace: postgres-pool
    service:
      name: pgpool
      port: 5432
    replicas: 3
    ...
    resources: # Adjust to your configuration, see https://www.pgpool.net/docs/42/en/html/resource-requiremente.html
      limits:
        # cpu: 900m # Set according to your env
        memory: 310Mi
      requests:
        cpu: 250m # Adjust to your env, increase if possible
        memory: 310Mi
    pgpool:
      # https://github.com/bitnami/bitnami-docker-pgpool#configuration + https://github.com/bitnami/bitnami-docker-pgpool#environment-variables
      env:
        PGPOOL_BACKEND_NODES: autoconfigured # you can use custom value like '0:pg-node-1:5432,1:pg-node-2:5432'
        # Postgres users
        PGPOOL_POSTGRES_USERNAME: ls_pgpool_postgres_admin # with SUPERUSER role to use connection slots reserved for superusers for K8s liveness probes, also for user synchronization
        PGPOOL_SR_CHECK_USER: ls_pgpool_sr_check # with pg_monitor role, for streaming replication checks and health checks
        # ---
        PGPOOL_ADMIN_USERNAME: ls_pgpool_admin # Pgpool administrator (local pcp user)
        PGPOOL_ENABLE_LOAD_BALANCING: false # set to 'false' if there is no replication
        PGPOOL_MAX_POOL: 4
        PGPOOL_CHILD_LIFE_TIME: 300
        PGPOOL_POSTGRES_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_postgres_password
        PGPOOL_SR_CHECK_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_sr_check_password
        PGPOOL_ADMIN_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_admin_password
      secrets:
        pgpool_postgres_password: PASSWORD_TO_CHANGE
        pgpool_sr_check_password: PASSWORD_TO_CHANGE
        pgpool_admin_password: PASSWORD_TO_CHANGE
      # https://www.pgpool.net/docs/42/en/html/runtime-config.html
      pgpool_conf_content_to_append: |
        #------------------------------------------------------------------------------
        # CUSTOM SETTINGS (appended by LambdaStack to override defaults)
        #------------------------------------------------------------------------------
        # num_init_children = 32
        connection_life_time = 600
        reserved_connections = 1        
      # https://www.pgpool.net/docs/41/en/html/auth-pool-hba-conf.html
      pool_hba_conf: autoconfigured

## --- pgbouncer ---

  - name: pgbouncer
    enabled: true
    ...
    namespace: postgres-pool
    service:
      name: pgbouncer
      port: 5432
    replicas: 2
    resources:
      requests:
        cpu: 250m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 128Mi
    pgbouncer:
      env:
        DB_HOST: pgpool.postgres-pool.svc.cluster.local
        DB_LISTEN_PORT: 5432
        MAX_CLIENT_CONN: 150
        DEFAULT_POOL_SIZE: 25
        RESERVE_POOL_SIZE: 25
        POOL_MODE: session
        CLIENT_IDLE_TIMEOUT: 0

Default setup - main parameters

This chapter describes the default setup and main parameters responsible for the performance limitations. The limitations can be divided into 3 layers: resource usage, connection limits and query caching. All the configuration parameters can be modified in the configuration yaml file.

Resource usage

Each of the components has hardware requirements that depend on its configuration, in particular on the number of allowed connections.

PgBouncer

replicas: 2
resources:
  requests:
    cpu: 250m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 128Mi

PgPool

replicas: 3
resources: # Adjust to your configuration, see https://www.pgpool.net/docs/41/en/html/resource-requiremente.html
  limits:
    # cpu: 900m # Set according to your env
    memory: 310Mi
  requests:
    cpu: 250m # Adjust to your env, increase if possible
    memory: 310Mi

By default, each PgPool pod requires 176 MB of memory. This value was determined based on the PgPool docs. However, stress testing showed that several extra megabytes are needed to avoid the "failed to fork a child" issue. You may need to adjust resources after changing the num_init_children or max_pool (PGPOOL_MAX_POOL) settings. Such changes should be synchronized with the PostgreSQL and PgBouncer configuration.

PostgreSQL

Memory related parameters have PostgreSQL default values. If your setup requires performance improvements, you may consider changing values of the following parameters:

shared_buffers
work_mem
maintenance_work_mem
effective_cache_size
temp_buffers

You can override the default settings with the configuration/postgresql doc in the configuration YAML file.

Connection limits

PgBouncer

There are connection limitations defined in PgBouncer configuration. Each of these parameters is defined per PgBouncer instance (pod). For example, having 2 pods (with MAX_CLIENT_CONN = 150) allows for up to 300 client connections.

    pgbouncer:
      env:
        ...
        MAX_CLIENT_CONN: 150
        DEFAULT_POOL_SIZE: 25
        RESERVE_POOL_SIZE: 25
        POOL_MODE: session
        CLIENT_IDLE_TIMEOUT: 0

By default, POOL_MODE is set to session so that pooling is transparent to PgBouncer clients. Adjust this setting to match your desired configuration. The available pooling modes are described in the official PgBouncer documentation.
If your client application doesn't manage sessions, you can use CLIENT_IDLE_TIMEOUT to force a session timeout.
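The per-pod limits above multiply by the number of replicas. The sketch below assumes PgBouncer's documented semantics: MAX_CLIENT_CONN maps to max_client_conn (per instance), and DEFAULT_POOL_SIZE and RESERVE_POOL_SIZE map to default_pool_size and reserve_pool_size (per user/database pair, with the reserve pool used only after reserve_pool_timeout). The server-side figure is therefore an upper bound. `pgbouncer_limits` is an illustrative name.

```python
def pgbouncer_limits(replicas: int, max_client_conn: int,
                     default_pool_size: int, reserve_pool_size: int) -> dict:
    """Upper bounds implied by the PgBouncer env settings across all pods."""
    return {
        # MAX_CLIENT_CONN applies per PgBouncer pod.
        "client_connections": replicas * max_client_conn,
        # Pool sizes apply per user/database pair, per pod.
        "server_connections_per_pair": replicas * (default_pool_size + reserve_pool_size),
    }


print(pgbouncer_limits(2, 150, 25, 25))
# {'client_connections': 300, 'server_connections_per_pair': 100}
```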

PgPool

By default, PgPool service is configured to handle up to 93 active concurrent connections to PostgreSQL (3 pods x 31). This is because of the following settings:

num_init_children = 32
reserved_connections = 1

Each pod can handle up to 32 concurrent connections but one is reserved. This means that the 32nd connection from a client will be refused. Keep in mind that canceling a query creates another connection to PostgreSQL, thus, a query cannot be canceled if all the connections are in use. Furthermore, for each pod, one connection slot must be available for K8s health checks. Hence, the real number of available concurrent connections is 30 per pod.

If you need more active concurrent connections, you can increase the number of pods (replicas), but the total number of allowed concurrent connections should not exceed the value defined by PostgreSQL parameters: (max_connections - superuser_reserved_connections).
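The PgPool arithmetic above can be checked with a short sketch. It interprets "the total number of allowed concurrent connections" as replicas x num_init_children, which is one reading of the text. It ignores max_pool (PGPOOL_MAX_POOL), which PgPool's own Performance Considerations chapter also factors in. The PostgreSQL values used are the usual defaults (max_connections = 100, superuser_reserved_connections = 3) and are assumptions, not LambdaStack settings. Function names are illustrative.

```python
def pgpool_connections(replicas: int, num_init_children: int = 32,
                       reserved_connections: int = 1,
                       health_check_slots: int = 1) -> dict:
    """Connection arithmetic described above for the PgPool deployment."""
    per_pod = num_init_children - reserved_connections  # 31 by default
    usable_per_pod = per_pod - health_check_slots        # 30 by default
    return {
        "nominal": replicas * per_pod,
        "usable": replicas * usable_per_pod,
        "child_processes": replicas * num_init_children,
    }


def fits_postgresql(replicas: int, num_init_children: int = 32,
                    max_connections: int = 100,
                    superuser_reserved_connections: int = 3) -> bool:
    """True if replicas x num_init_children fits into the non-superuser slots."""
    return (replicas * num_init_children
            <= max_connections - superuser_reserved_connections)


print(pgpool_connections(3))  # {'nominal': 93, 'usable': 90, 'child_processes': 96}
print(fits_postgresql(3))     # True
print(fits_postgresql(4))     # False
```

Adding a fourth replica without raising max_connections would exceed the PostgreSQL limit under these assumptions.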

To change PgPool settings (defined in pgpool.conf), edit the pgpool_conf_content_to_append section:

pgpool_conf_content_to_append: |
  #------------------------------------------------------------------------------
  # CUSTOM SETTINGS (appended by LambdaStack to override defaults)
  #------------------------------------------------------------------------------
  connection_life_time = 900
  reserved_connections = 1

The content of the pgpool.conf file is stored in the K8s pgpool-config-files ConfigMap.

For detailed information about connection tuning, see "Performance Considerations" chapter in PgPool documentation.

PostgreSQL

PostgreSQL uses the max_connections parameter to limit the number of client connections to the database server. The default is typically 100 connections. Generally, PostgreSQL on sufficient hardware can support a few hundred connections.

Query caching

Query caching is not available in PgBouncer.

PgPool

Query caching is disabled by default in PgPool configuration.

PostgreSQL

PostgreSQL is installed with default settings.

How to set up PostgreSQL audit logging

Audit logging of database activities is available through the PostgreSQL Audit Extension: PgAudit. It provides session and/or object audit logging via the standard PostgreSQL log.

PgAudit may generate a large volume of logging, which has an impact on performance and log storage. For this reason, PgAudit is not enabled by default.

To install and configure PgAudit, add to your configuration yaml file a doc similar to the following:

kind: configuration/postgresql
title: PostgreSQL
name: default
provider: aws
version: 1.0.0
specification:
  extensions:
    pgaudit:
      enabled: yes
      config_file_parameters:
        ## postgresql standard
        log_connections: 'off'
        log_disconnections: 'off'
        log_statement: 'none'
        log_line_prefix: "'%m [%p] %q%u@%d,host=%h '"
        ## pgaudit specific, see https://github.com/pgaudit/pgaudit/blob/REL_10_STABLE/README.md#settings
        pgaudit.log: "'write, function, role, ddl' # 'misc_set' is not supported for PG 10"
        pgaudit.log_catalog: 'off # to reduce overhead of logging'
        # the following first 2 parameters are set to values that make it easier to access audit log per table
        # change their values to the opposite if you need to reduce overhead of logging
        pgaudit.log_relation: 'on # separate log entry for each relation'
        pgaudit.log_statement_once: 'off'
        pgaudit.log_parameter: 'on'

If the enabled property of the PgAudit extension is set to yes, LambdaStack installs the PgAudit package and adds the PgAudit extension to shared_preload_libraries so that it is loaded at server start. Settings defined in the config_file_parameters section are written to the LambdaStack-managed PostgreSQL configuration file. You can also use this section to set any additional parameter if needed (e.g. pgaudit.role), but keep in mind that these settings are global.
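The nested quoting in config_file_parameters makes more sense once you see each value written out as a `key = value` line. In postgresql.conf, values that contain spaces or commas must be single-quoted, and an unquoted `#` starts a comment. The sketch below assumes that each value is written verbatim after `key = `. That is an assumption about how LambdaStack renders the file, not its confirmed implementation. The values shown are a subset of the example above, as YAML would load them.

```python
def render_postgresql_conf(params: dict) -> str:
    # Assumption: each value is written verbatim after "key = ", so any
    # quoting or trailing "# comment" must already be part of the value.
    return "\n".join(f"{key} = {value}" for key, value in params.items()) + "\n"


# Values as YAML loads them from the document above (subset).
params = {
    "log_connections": "off",
    "log_line_prefix": "'%m [%p] %q%u@%d,host=%h '",
    "pgaudit.log": "'write, function, role, ddl' # 'misc_set' is not supported for PG 10",
    "pgaudit.log_catalog": "off # to reduce overhead of logging",
}

print(render_postgresql_conf(params), end="")
```

This is why pgaudit.log keeps its inner single quotes: without them, the comma-separated list would not be a valid single value.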

To configure PgAudit according to your needs, see PgAudit documentation.

Once the LambdaStack installation is complete, one manual step is required at the database level, in each database. Connect to the database using a client (like psql) and load the PgAudit extension into it by running:

CREATE EXTENSION pgaudit;

To remove the extension from database, run:

DROP EXTENSION IF EXISTS pgaudit;
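Because the extension must be created in every database, it can help to generate one psql call per database. The sketch below only builds the commands and does not run them. It uses `CREATE EXTENSION IF NOT EXISTS` so that re-running it is harmless. The database names are hypothetical, and connection options such as host and user are omitted.

```python
import shlex

SQL = {
    "create": "CREATE EXTENSION IF NOT EXISTS pgaudit;",
    "drop": "DROP EXTENSION IF EXISTS pgaudit;",
}


def pgaudit_commands(databases, action: str = "create") -> list:
    """One psql invocation per database (connection options such as -h/-U omitted)."""
    sql = SQL[action]
    return [f"psql -d {shlex.quote(db)} -c {shlex.quote(sql)}" for db in databases]


# Hypothetical database names.
for command in pgaudit_commands(["appdb", "reporting"]):
    print(command)
```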

How to work with PostgreSQL connection pooling

PostgreSQL connection pooling is described on the design documentation page. To use the fully HA configuration, an application (Kubernetes service) should connect to the PgBouncer Kubernetes service instead of connecting directly to the database host. This configuration provides all the benefits of using PostgreSQL in clustered HA mode, including database failover. Both PgBouncer and PgPool store database users and passwords in configuration files. Their pods therefore need to be restarted whenever PostgreSQL authentication changes, for example when a user is created or a username or password is altered. During the restart, the pods refresh the stored database credentials automatically.
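In practice, "connect to the PgBouncer service" means that the application's connection string uses the service's in-cluster DNS name as its host. The sketch below builds such a libpq URI. The service name, namespace and port are hypothetical placeholders, and the `svc.cluster.local` suffix assumes the default cluster domain. Check the actual service with kubectl in your cluster.

```python
from urllib.parse import quote


def pgbouncer_dsn(user: str, password: str, database: str,
                  service: str = "pgbouncer",       # hypothetical service name
                  namespace: str = "postgres-pool",  # hypothetical namespace
                  port: int = 5432) -> str:          # hypothetical service port
    """Build a libpq connection URI that targets the PgBouncer Kubernetes
    service (cluster DNS name) instead of a database host."""
    host = f"{service}.{namespace}.svc.cluster.local"
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{quote(database, safe='')}")


print(pgbouncer_dsn("app_user", "s3cr3t!", "appdb"))
```

Credentials are percent-encoded so that characters such as `@` or `!` in a password do not break the URI.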

How to configure PostgreSQL replication

Note

PostgreSQL native replication is now deprecated and removed. Use PostgreSQL HA replication with repmgr instead.

How to start working with OpenDistro for Elasticsearch

OpenDistro for Elasticsearch is an Apache 2.0-licensed distribution of Elasticsearch enhanced with enterprise security, alerting and SQL. To start working with OpenDistro, set the machine count to a value greater than 0 in your cluster configuration:

kind: lambdastack-cluster
...
specification:
  ...
  components:
    kubernetes_master:
      count: 1
      machine: aws-kb-masterofpuppets
    kubernetes_node:
      count: 0
    ...
    logging:
      count: 1
    opendistro_for_elasticsearch:
      count: 2

An installation with more than one node is always clustered. Configuring a non-clustered installation of more than one Open Distro node is not supported.

kind: configuration/opendistro-for-elasticsearch
title: OpenDistro for Elasticsearch Config
name: default
specification:
  cluster_name: LambdaStackElastic

By default, Kibana is deployed only for the logging component. If you want to deploy Kibana for opendistro_for_elasticsearch, you have to modify the feature mapping. Use the configuration below in your manifest.

kind: configuration/feature-mapping
title: "Feature mapping to roles"
name: default
specification:
  roles_mapping:
    opendistro_for_elasticsearch:
      - opendistro-for-elasticsearch
      - node-exporter
      - filebeat
      - firewall
      - kibana

Filebeat running on opendistro_for_elasticsearch hosts always points to the centralized logging hosts (see the Logging section).

How to start working with Apache Ignite Stateful setup

Apache Ignite can be installed in LambdaStack if the count property of the ignite feature is greater than 0. Example:

kind: lambdastack-cluster
specification:
  components:
    load_balancer:
      count: 1
    ignite:
      count: 2
    rabbitmq:
      count: 0
    ...

A configuration like the one in this example creates Virtual Machines with an Apache Ignite cluster installed. You can also modify the configuration of Apache Ignite and the plugins it uses:

kind: configuration/ignite
title: "Apache Ignite stateful installation"
name: default
specification:
  version: 2.7.6
  file_name: apache-ignite-2.7.6-bin.zip
  enabled_plugins:
  - ignite-rest-http
  config: |
    <?xml version="1.0" encoding="UTF-8"?>

    <!--
      Licensed to the Apache Software Foundation (ASF) under one or more
      contributor license agreements.  See the NOTICE file distributed with
      this work for additional information regarding copyright ownership.
      The ASF licenses this file to You under the Apache License, Version 2.0
      (the "License"); you may not use this file except in compliance with
      the License.  You may obtain a copy of the License at
          http://www.apache.org/licenses/LICENSE-2.0
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
    -->

    <beans xmlns="http://www.springframework.org/schema/beans"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="
          http://www.springframework.org/schema/beans
          http://www.springframework.org/schema/beans/spring-beans.xsd">

        <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
          <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
              <!-- Set the page size to 4 KB -->
              <property name="pageSize" value="#{4 * 1024}"/>
              <!--
              Sets a path to the root directory where data and indexes are
              to be persisted. It's assumed the directory is on a separated SSD.
              -->
              <property name="storagePath" value="/var/lib/ignite/persistence"/>

              <!--
                  Sets a path to the directory where WAL is stored.
                  It's assumed the directory is on a separated HDD.
              -->
              <property name="walPath" value="/wal"/>

              <!--
                  Sets a path to the directory where WAL archive is stored.
                  The directory is on the same HDD as the WAL.
              -->
              <property name="walArchivePath" value="/wal/archive"/>
            </bean>
          </property>

          <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
              <property name="ipFinder">
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                  <property name="addresses">
                  IP_LIST_PLACEHOLDER
                  </property>
                </bean>
              </property>
            </bean>
          </property>
        </bean>
    </beans>

The enabled_plugins property contains the list of plugin names to enable. The config property contains the XML configuration for Apache Ignite. An important placeholder is IP_LIST_PLACEHOLDER, which the automation replaces with the list of Apache Ignite nodes used for self-discovery.
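To picture what IP_LIST_PLACEHOLDER turns into, the sketch below substitutes a Spring `<list>` of `<value>` elements, which is the form TcpDiscoveryVmIpFinder's addresses property accepts. The exact markup LambdaStack emits is an assumption here. The port range 47500..47509 is the one used in Ignite's documentation examples, and the node IPs are hypothetical.

```python
from xml.sax.saxutils import escape


def render_ip_list(config_xml: str, hosts, port_range: str = "47500..47509") -> str:
    """Replace IP_LIST_PLACEHOLDER with a Spring <list> of discovery addresses.

    Assumptions: the automation emits one <value>host:ports</value> per node,
    and the port range is the one used in Ignite's documentation examples.
    """
    values = "\n".join(f"<value>{escape(host)}:{port_range}</value>" for host in hosts)
    return config_xml.replace("IP_LIST_PLACEHOLDER", f"<list>\n{values}\n</list>")


snippet = '<property name="addresses">\nIP_LIST_PLACEHOLDER\n</property>'
print(render_ip_list(snippet, ["10.1.0.4", "10.1.0.5"]))  # hypothetical node IPs
```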

How to start working with Apache Ignite Stateless setup

The stateless setup of Apache Ignite is done using Kubernetes deployments. This setup uses LambdaStack's standard applications feature (similar to auth-service and rabbitmq). To enable a stateless Ignite deployment, use the following document:

kind: configuration/applications
title: "Kubernetes Applications Config"
name: default
specification:
  applications:
  - name: ignite-stateless
    image_path: "lambdastack/ignite:2.9.1" # it will be part of the image path: {{local_repository}}/{{image_path}}
    namespace: ignite
    service:
      rest_nodeport: 32300
      sql_nodeport: 32301
      thinclients_nodeport: 32302
    replicas: 1
    enabled_plugins:
    - ignite-kubernetes # required to work on K8s
    - ignite-rest-http

Adjust this config to your requirements, in particular the number of replicas and the plugins to enable.
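With the ignite-rest-http plugin enabled, the REST API can be reached from outside the cluster through rest_nodeport on any Kubernetes node. The sketch below only builds the request URL. It assumes that the service maps rest_nodeport (32300 in the example) to Ignite's REST port. `cmd=version` is a standard Ignite REST command, and the node IP is hypothetical.

```python
from urllib.parse import urlencode


def ignite_rest_url(node_ip: str, cmd: str = "version",
                    nodeport: int = 32300, **params) -> str:
    """URL for an Ignite REST command sent through the Kubernetes NodePort."""
    query = urlencode({"cmd": cmd, **params})
    return f"http://{node_ip}:{nodeport}/ignite?{query}"


print(ignite_rest_url("10.0.0.5"))  # hypothetical node IP
```

You can open the printed URL with curl or `urllib.request.urlopen` to check that the deployment answers.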

5 - Helm

LambdaStack how-tos - Helm

Helm "system" chart repository

LambdaStack provides Helm repository for internal usage inside our Ansible codebase. Currently only the "system" repository is available, but it's not designed to be used by regular users. In fact, regular users must not reuse it for any purpose.

LambdaStack developers can find it in roles/helm_charts/files/system. To add a chart to the repository, place the unarchived chart directory tree in that location (in a separate directory) and re-run lambdastack apply.

When the repository Ansible role runs, it copies all unarchived charts to the repository host, creates a Helm repository (index.yaml) and serves all these files from an Apache HTTP server.

Installing Helm charts from the "system" repository

LambdaStack developers can reuse the "system" repository from anywhere inside the Ansible codebase. Each role is responsible for calling the helm upgrade --install command itself.

There is a helper task file that can be reused for that purpose: roles/helm/tasks/install-system-release.yml. It is only responsible for installing existing "system" Helm charts from the "system" repository.

This helper task expects the following parameters/facts:

- set_fact:
    helm_chart_name: <string>
    helm_chart_values: <map>
    helm_release_name: <string>

helm_chart_values is a standard yaml map. The values defined there override the chart's default configuration (values.yaml).

Our standard practice is to place those values inside the specification document of the role that deploys the Helm release in Kubernetes.

Example config:

kind: configuration/<mykind-used-by-myrole>
name: default
specification:
  helm_chart_name: mychart
  helm_release_name: myrelease
  helm_chart_values:
    service:
      port: 8080
    nameOverride: mychart-custom-name

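Helm does not throw away the chart's values.yaml when you supply values. It merges them: nested maps are combined key by key, and scalars or lists replace the default. The sketch below models that merge in a simplified form. It ignores Helm's rule that a null value deletes a default key. The chart defaults are hypothetical, and the override is the service port from the example config.

```python
import copy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into a copy of `defaults`.

    Nested maps are merged key by key; any other value (string, number,
    list) replaces the default. Helm's special handling of null values
    (which deletes a default key) is not modelled here.
    """
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical chart defaults (values.yaml) and the override from the example.
chart_defaults = {"replicaCount": 1, "service": {"type": "ClusterIP", "port": 80}}
helm_chart_values = {"service": {"port": 8080}}

print(merge_values(chart_defaults, helm_chart_values))
```

Only service.port changes. service.type and replicaCount keep their chart defaults, so helm_chart_values only needs the keys you actually want to change.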
Example usage:

- name: Mychart
  include_role:
    name: helm
    tasks_from: install-system-release.yml
  vars:
    helm_chart_name: "{{ specification.helm_chart_name }}"
    helm_release_name: "{{ specification.helm_release_name }}"
    helm_chart_values: "{{ specification.helm_chart_values }}"

By default all installed "system" Helm releases are deployed inside the ls-charts namespace in Kubernetes.

Uninstalling "system" Helm releases

To uninstall a Helm release, use roles/helm/tasks/delete-system-release.yml. For example:

- include_role:
    name: helm
    tasks_from: delete-system-release.yml
  vars:
    helm_release_name: myrelease
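For debugging outside Ansible, it can help to know roughly which Helm commands correspond to the install and uninstall helpers. The sketch below builds those command lines and does not execute them. `system` as a local repository alias and the values file path are hypothetical. The ls-charts namespace comes from the text above, and `helm upgrade --install`, `helm uninstall`, `--namespace` and `--values` are standard Helm 3 usage.

```python
import shlex


def helm_install_argv(release: str, chart: str, values_file: str,
                      repo: str = "system", namespace: str = "ls-charts") -> list:
    """Command roughly equivalent to installing a "system" release.

    `repo` is a hypothetical local alias for the "system" repository and
    `values_file` a hypothetical file holding helm_chart_values.
    """
    return ["helm", "upgrade", "--install", release, f"{repo}/{chart}",
            "--namespace", namespace, "--values", values_file]


def helm_uninstall_argv(release: str, namespace: str = "ls-charts") -> list:
    """Command roughly equivalent to delete-system-release.yml."""
    return ["helm", "uninstall", release, "--namespace", namespace]


print(shlex.join(helm_install_argv("myrelease", "mychart", "/tmp/myrelease-values.yaml")))
print(shlex.join(helm_uninstall_argv("myrelease")))
```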

6 - Istio

LambdaStack how-tos - Istio

Istio

Istio is an open source platform that lets you run a service mesh for a distributed microservice architecture. It lets you connect, manage and secure connections between microservices, and it brings many features such as load balancing, monitoring and service-to-service authentication without any changes in service code. Read more about Istio here.

Installing Istio

Istio in LambdaStack is provided as a K8s application. By default, it is not installed.

Istio is installed using the Istio Operator. An operator is a software extension to the Kubernetes API that has deep knowledge of how Istio deployments should look and how to react if any problem appears. It also makes upgrades easy and automates tasks that would normally be executed by a user/admin.

To deploy Istio, add a "configuration/applications" document to your configuration yaml file, similar to the example below (the enabled flag must be set to true):

---
kind: configuration/applications
version: 0.8.0
title: "Kubernetes Applications Config"
provider: aws
name: default
specification:
  applications:
  ...

## --- istio ---

  - name: istio
    enabled: true
    use_local_image_registry: true
    namespaces:
      operator: istio-operator # namespace where operator will be deployed
      watched: # list of namespaces which operator will watch
        - istio-system
      istio: istio-system # namespace where Istio control plane will be deployed
    istio_spec:
      profile: default # Check all possibilities: https://istio.io/latest/docs/setup/additional-setup/config-profiles/
      name: istiocontrolplane

With this configuration, the operator controller detects the Istio Operator resource in the first of the watched namespaces and installs the Istio components corresponding to the specified profile (default). With the default profile, the Istio control plane and the Istio ingress gateway are deployed in the istio-system namespace.

How to set up service mesh for an application

The default Istio installation uses automatic sidecar injection. You need to label the namespace where the application will be hosted:

kubectl label namespace default istio-injection=enabled

Once the proper namespaces are labeled and Istio is deployed, you can deploy your applications or restart existing ones.

You may need to make an application accessible from outside your Kubernetes cluster. The Istio ingress gateway deployed with the default profile is used for this purpose. Define the ingress by deploying a Gateway and a VirtualService specification. The Gateway specification describes the L4-L6 properties of a load balancer, and the VirtualService specification describes its L7 properties.

Example of the gateway and virtual service specification (You have to adapt the entire specification to the application):

Gateway:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "httpbin.example.com"

Virtual Service:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - "httpbin.example.com"
  gateways:
  - httpbin-gateway
  http:
  - match:
    - uri:
        prefix: /status
    - uri:
        prefix: /delay
    route:
    - destination:
        port:
          number: 8000
        host: httpbin
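The routing decision made by this Gateway and VirtualService pair can be modeled in a few lines. Requests must carry the host httpbin.example.com. The `match` entries of one route are OR-ed, and each `prefix` is a plain string-prefix test on the URI path. This is a simplified model for reasoning about the rules, not Istio code.

```python
from typing import Optional, Tuple

GATEWAY_HOSTS = ("httpbin.example.com",)

# (URI prefixes, destination host, destination port), mirroring the VirtualService.
HTTP_ROUTES = [
    (("/status", "/delay"), "httpbin", 8000),
]


def route(host: str, path: str) -> Optional[Tuple[str, int]]:
    """Return the destination for a request, or None if nothing matches.

    Multiple `match` entries in one route are OR-ed; `prefix` is a plain
    string-prefix comparison on the URI path.
    """
    if host not in GATEWAY_HOSTS:
        return None
    for prefixes, dest_host, dest_port in HTTP_ROUTES:
        if any(path.startswith(prefix) for prefix in prefixes):
            return dest_host, dest_port
    return None


print(route("httpbin.example.com", "/status/200"))
print(route("httpbin.example.com", "/headers"))
```

When you test the real gateway, send the Host header explicitly, for example `curl -H "Host: httpbin.example.com" http://<INGRESS_IP>/status/200`. Paths that match no route are typically answered with 404 by the gateway.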

Warning: Pay attention to network policies in your cluster if you use a CNI plugin that supports them (such as Calico or Canal). In this case, set up secure network policies for communication between microservices, and between the Envoy proxies and the Istio control plane, in your application's namespace. Alternatively, you can simply apply the following NetworkPolicy, which allows all ingress and egress traffic for every pod in the namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  namespace: <your_application_namespace>
  name: allow-istio-communication
spec:
  podSelector: {}
  egress:
  - {}
  ingress:
  - {}
  policyTypes:
  - Egress
  - Ingress

7 - Konnectivity

LambdaStack how-tos - Konnectivity

Konnectivity

Konnectivity replaces SSH tunneling.

This is currently a WIP (Work In Progress). The Ansible playbook roles are still being built and tested.

Server

Agent

RBAC

8 - Kubernetes

LambdaStack how-tos - Kubernetes

Kubernetes

Issues

See Troubleshooting

Kubectl

As described in the Troubleshooting link above, the default security setup requires running kubectl with sudo and passing --kubeconfig=/etc/kubernetes/admin.conf as an additional parameter. Also, by default, this only works on the Control Plane nodes. To make kubectl work on Worker nodes or any other node in the cluster, do the following, and make sure it complies with your security strategy:

# Control Plane node - Option 2 from link above...

mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Once kubectl works as desired for a non-root user, you can:

Copy the ~/.kube/config file from the Control Plane node
Create the ~/.kube directory in the non-root user's home directory on the target node and paste the config file copied in the previous step
Repeat this for any node from which you want to use kubectl for a given cluster

Supported CNI plugins

LambdaStack supports the following CNI plugins:

Flannel
Calico
Canal

Flannel is the default setting in LambdaStack configuration.

NOTE

Calico is not supported on Azure. To use network policies on Azure, choose Canal.

Use the following configuration to set up an appropriate CNI plugin:

kind: configuration/kubernetes-master
name: default
specification:
  advanced:
    networking:
      plugin: flannel

Kubernetes applications - overview

Currently, LambdaStack provides the following predefined applications which may be deployed with lambdastack:

ignite
rabbitmq
auth-service (Keycloak)
pgpool
pgbouncer
istio

All of them have a default configuration. The common parameters are: name, enabled, namespace, image_path and use_local_image_registry. If you set use_local_image_registry to false in the configuration manifest, you have to provide a valid docker image path in image_path, and Kubernetes will try to pull the image from that external location.
To see what version of the application image is in local image registry please refer to components list.

Note: The above link points to the develop branch. Please choose the branch that matches the LambdaStack version you are using.
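The effect of use_local_image_registry on the pulled image can be summarized in a small function. It follows the `{{local_repository}}/{{image_path}}` rule noted in the Ignite stateless example, and it is a sketch rather than LambdaStack's own code. The registry address in the usage line is hypothetical.

```python
def resolve_image(image_path: str, use_local_image_registry: bool,
                  local_repository: str) -> str:
    """Return the image reference Kubernetes would pull (sketch)."""
    if use_local_image_registry:
        # Pulled from the LambdaStack local registry: {{local_repository}}/{{image_path}}
        return f"{local_repository}/{image_path}"
    # Pulled externally: image_path must already be a complete, valid image path.
    return image_path


# Hypothetical local registry address.
print(resolve_image("lambdastack/keycloak:14.0.0", True, "10.0.0.10:5000"))
```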

How to expose service through HA Proxy load balancer

Create a NodePort-type service for your application in Kubernetes.
Make sure your service has a statically assigned nodePort (a number between 30000-32767), for example 31234. More info here.

Add a configuration document for load_balancer/HAProxy to your main config file.

kind: configuration/haproxy
title: "HAProxy"
name: haproxy
specification:
  frontend:
    - name: https_front
      port: 443
      https: yes
      backend:
        - http_back1
  backend:
    - name: http_back1
      server_groups:
        - kubernetes_node
      port: 31234
provider: <your-provider-here-replace-it>

Run lambdastack apply.
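Before running lambdastack apply, you can sanity-check the HAProxy document for two easy mistakes. The first is a frontend that references an undefined backend. The second is a backend port outside the NodePort range 30000-32767 mentioned above. The sketch below checks both on the spec loaded as a Python dict. It assumes that backend ports target Kubernetes NodePorts, which is the case in this example.

```python
NODEPORT_RANGE = range(30000, 32768)  # 30000-32767 inclusive


def check_haproxy(spec: dict) -> list:
    """Return human-readable problems found in a configuration/haproxy spec."""
    errors = []
    backends = {b["name"]: b for b in spec.get("backend", [])}
    for frontend in spec.get("frontend", []):
        for name in frontend.get("backend", []):
            if name not in backends:
                errors.append(f"frontend {frontend['name']}: unknown backend {name}")
    for backend in backends.values():
        if backend["port"] not in NODEPORT_RANGE:
            errors.append(f"backend {backend['name']}: port {backend['port']} "
                          "outside NodePort range 30000-32767")
    return errors


# The specification from the example above.
spec = {
    "frontend": [{"name": "https_front", "port": 443, "https": True, "backend": ["http_back1"]}],
    "backend": [{"name": "http_back1", "server_groups": ["kubernetes_node"], "port": 31234}],
}
print(check_haproxy(spec) or "OK")
```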

How to do Kubernetes RBAC

The Kubernetes cluster that comes with LambdaStack has an admin account created. You should consider creating more roles and accounts, especially when many deployments run in different namespaces.

To learn more about RBAC in Kubernetes, use this link.

How to run an example app

Here we will get a simple app running using Docker through Kubernetes. We assume you are using Windows 10, have a LambdaStack cluster on Azure ready, and have an Azure Container Registry ready. The registry might not exist in early-version LambdaStack clusters. If you don't have one, you can skip ahead to the step where you SSH into the cluster's master node and test the cluster using some public app from the original Docker Registry. Steps marked with an asterisk can be skipped.

Install Chocolatey
Use Chocolatey to install:
- Docker-for-windows (choco install docker-for-windows, requires Hyper-V)
- Azure-cli (choco install azure-cli)
Make sure Docker for Windows is running (run as admin, might require a restart)
Run docker build -t sample-app:v1 . in examples/dotnet/lambdastack-web-app.
*For test purposes, run your image locally with docker run -d -p 8080:80 --name myapp sample-app:v1 and head to localhost:8080 to check if it's working.
*Stop your local docker container with: docker stop myapp and run docker rm myapp to delete the container.
*Now that you have a working docker image we can proceed to the deployment of the app on the LambdaStack Kubernetes cluster.
Run docker login myregistry.azurecr.io -u myUsername -p myPassword to log in to your Azure Container Registry. Credentials are in the Access keys tab of your registry.
Tag your image with: docker tag sample-app:v1 myregistry.azurecr.io/samples/sample-app:v1
Push your image to the repo: docker push myregistry.azurecr.io/samples/sample-app:v1
SSH into your LambdaStack cluster's master node.
*Run kubectl cluster-info and kubectl config view to check if everything is okay.
Run kubectl create secret docker-registry myregistry --docker-server myregistry.azurecr.io --docker-username myusername --docker-password mypassword to create k8s secret with your registry data.

Create sample-app.yaml file with contents:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  selector:
    matchLabels:
      app: sample-app
  replicas: 2
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: sample-app
        image: myregistry.azurecr.io/samples/sample-app:v1
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 64Mi
          limits:
            memory: 128Mi
      imagePullSecrets:
      - name: myregistry

Run kubectl apply -f sample-app.yaml, and after a minute run kubectl get pods to see if it works.
Run kubectl expose deployment sample-app --type=NodePort --name=sample-app-nodeport, then run kubectl get svc sample-app-nodeport and note the second port.
Run kubectl get pods -o wide and check which node the app is running on.
Access the app through [AZURE_NODE_VM_IP]:[PORT] from the two previous points - firewall changes might be needed.

How to set resource requests and limits for Containers

When Kubernetes schedules a Pod, it’s important that the Containers have enough resources to actually run. If you schedule a large application on a node with limited resources, it is possible for the node to run out of memory or CPU resources and for things to stop working! It’s also possible for applications to take up more resources than they should.

When you specify a Pod, it is strongly recommended to specify how much CPU and memory (RAM) each Container needs. Requests are what the Container is guaranteed to get. If a Container requests a resource, Kubernetes will only schedule it on a node that can give it that resource. Limits make sure a Container never goes above a certain value. For more details about the difference between requests and limits, see Resource QoS.
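The combination of requests and limits also determines the Pod's QoS class. The sketch below approximates the standard Kubernetes rules for the cpu and memory resources:

- Guaranteed: every container has cpu and memory limits, and its requests equal those limits.
- Burstable: not Guaranteed, but at least one request or limit is set.
- BestEffort: no requests or limits at all.

It compares quantities as strings, which is a simplification, so normalize them first.

```python
def qos_class(containers: list) -> str:
    """Approximate the QoS class Kubernetes assigns to a Pod.

    Each container is a dict with optional "requests" and "limits" maps.
    Simplification: quantities are compared as strings, so normalize them
    first ("0.1" and "100m" are equal CPU values but different strings).
    """
    any_set = False
    guaranteed = bool(containers)
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # If only a limit is given, Kubernetes uses it as the request too.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if guaranteed:
        return "Guaranteed"
    return "Burstable" if any_set else "BestEffort"


# Resources of the sample-app container from the previous section.
sample_app = [{"requests": {"cpu": "100m", "memory": "64Mi"}, "limits": {"memory": "128Mi"}}]
print(qos_class(sample_app))
```

The sample app is Burstable because it sets no CPU limit and its memory request is lower than its memory limit.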


How to run CronJobs

NOTE: Examples have been moved to their own repo but they are not visible at the moment.

Follow the previous section using examples/dotnet/LambdaStack.SampleApps/LambdaStack.SampleApps.CronApp

Create cronjob.yaml file with contents:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: sample-cron-job
spec:
  schedule: "*/1 * * * *"   # Run once a minute
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sample-cron-job
            image: myregistry.azurecr.io/samples/sample-cron-app:v1
          restartPolicy: OnFailure
          imagePullSecrets:
          - name: myregistry

Run kubectl apply -f cronjob.yaml, and after a minute run kubectl get pods to see if it works.
Run kubectl get cronjob sample-cron-job to get status of our cron job.
Run kubectl get jobs --watch to see job scheduled by the “sample-cron-job” cron job.

How to test the monitoring features

Prerequisites: LambdaStack cluster on Azure with at least a single VM with prometheus and grafana roles enabled.

Copy ansible inventory from build/lambdastack/*/inventory/ to examples/monitoring/
Run ansible-playbook -i NAME_OF_THE_INVENTORY_FILE grafana.yml in examples/monitoring
In the inventory file, find the IP address of the machine that has grafana installed and head over to https://NODE_IP:3000. You might have to go to the Azure Portal and allow traffic to that port in the firewall. Also ignore the possible certificate error in your browser.
Head to Dashboards/Manage on the side panel and select Kubernetes Deployment metrics - here you can see a sample kubernetes monitoring dashboard.
Head to http://NODE_IP:9090 to see Prometheus UI - there in the dropdown you have all of the metrics you can monitor with Prometheus/Grafana.

How to run chaos on LambdaStack Kubernetes cluster and monitor it with Grafana

SSH into the Kubernetes master.
Copy over chaos-sample.yaml file from the example folder and run it with kubectl apply -f chaos-sample.yaml - it takes code from github.com/linki/chaoskube so normal security concerns apply.
Run kubectl create clusterrolebinding chaos --clusterrole=cluster-admin --user=system:serviceaccount:default:default to start the chaos. Random pods will be terminated every 5 seconds; the frequency is configurable inside the yaml file.
Head over to Grafana at https://NODE_IP:3000, open a new dashboard, add a panel, set Prometheus as a data source and put kubelet_running_pod_count in the query field - now you can see how Kubernetes is replacing killed pods and balancing them between the nodes.
Run kubectl get svc nginx-service and note the second port. You can access the nginx page via [ANY_CLUSTER_VM_IP]:[PORT]. It stays accessible even though the pods serving it are constantly killed at random, unless you have more VMs in your cluster than deployed nginx instances and choose the IP of one that is not running it.
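The same metric shown in the Grafana panel can also be read directly from the Prometheus HTTP API, which is handy for scripting checks while the chaos runs. The sketch below targets the standard `/api/v1/query` instant-query endpoint on port 9090, as mentioned in the monitoring section. The node IP is hypothetical, and only `query_prometheus` needs network access. Metric names can differ between Kubernetes versions, so adjust the query if it returns no data.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def prometheus_query_url(node_ip: str, query: str, port: int = 9090) -> str:
    """URL of the Prometheus HTTP API instant-query endpoint."""
    return f"http://{node_ip}:{port}/api/v1/query?{urlencode({'query': query})}"


def query_prometheus(node_ip: str, query: str) -> list:
    """Run an instant query and return the result vector (needs network access)."""
    with urlopen(prometheus_query_url(node_ip, query), timeout=10) as response:
        return json.load(response)["data"]["result"]


print(prometheus_query_url("10.0.0.4", "kubelet_running_pod_count"))  # hypothetical IP
```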

How to test the central logging features

Prerequisites: LambdaStack cluster on Azure with at least a single VM with elasticsearch, kibana and filebeat roles enabled.

Connect to kubectl using kubectl proxy or directly from Kubernetes master server
Apply from LambdaStack repository extras/kubernetes/pod-counter pod-counter.yaml with command: kubectl apply -f yourpath_to_pod_counter/pod-counter.yaml

Paths are system dependent, so be sure to use the correct path separator for your operating system.
In the inventory file, find the IP address of the machine that has kibana installed and head over to http://NODE_IP:5601. You might have to go to the Azure Portal and allow traffic to that port in the firewall.
You can now search log data in the Discover section in Kibana after creating the filebeat-* index pattern. To create the index pattern, click Discover, then in Step 1: Define index pattern enter filebeat-*. Then click Next step. In Step 2: Configure settings, click Create index pattern. Now you can go to the Discover section and look at the output from your logs.
You can verify that CounterPod is sending messages correctly and that filebeat is gathering them by querying for CounterPod in the search field in the Discover section.
For more information, refer to the documentation: https://www.elastic.co/guide/en/kibana/current/index.html

How to tunnel Kubernetes Dashboard from remote kubectl to your PC

SSH into server, and forward port 8001 to your machine ssh -i ls_keys/id_rsa operations@40.67.255.155 -L 8001:localhost:8001 NOTE: substitute IP with your cluster master's IP.
On remote host: get admin token bearer: kubectl describe secret $(kubectl get secrets --namespace=kube-system | grep admin-user | awk '{print $1}') --namespace=kube-system | grep -E '^token' | awk '{print $2}' | head -1 NOTE: save this token for next points.
On the remote host, open a proxy to the dashboard with kubectl proxy.
Now on your local machine navigate to http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
When prompted to put in credentials, use admin token from the previous point.
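The token one-liner above chains grep, awk and head. If you script the login step, the same extraction can be done in Python: take the first line that starts with `token`, then its second whitespace-separated field. The sample text below is made up and only mimics the layout of `kubectl describe secret` output.

```python
from typing import Optional


def extract_token(describe_output: str) -> Optional[str]:
    """Return the second field of the first line starting with 'token'
    (like grep -E '^token' | awk '{print $2}' | head -1)."""
    for line in describe_output.splitlines():
        if line.startswith("token"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


# Made-up sample mimicking `kubectl describe secret` output.
SAMPLE = """Name:         admin-user-token-x7k2p
Namespace:    kube-system
Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1066 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.example
"""

print(extract_token(SAMPLE))
```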

How to run Keycloak on Kubernetes

Enable the Kubernetes master and node, repository and postgresql components in the initial configuration manifest (yaml) by increasing their count values.

kind: lambdastack-cluster
title: LambdaStack Cluster Config
provider: azure
name: default
build_path: '' # Dynamically built
specification:
  components:
    repository:
      count: 1
    kubernetes_master:
      count: 1
    kubernetes_node:
      count: 2
    postgresql:
      count: 2

Enable applications in feature-mapping in initial configuration manifest.

---
kind: configuration/feature-mapping
title: Feature mapping to roles
name: default
specification:
  available_roles:
  - _merge: true
  - name: applications
    enabled: true

Enable required applications by setting enabled: true and adjust other parameters in configuration/applications kind.

The default applications configuration is available here.

Note: To work with PgBouncer, Keycloak requires the PgBouncer configuration parameter POOL_MODE to be set to session (see the Installing Pgbouncer and Pgpool section). The reason is that Keycloak uses SET SQL statements. For details, see the SQL feature map for pooling modes.

---
kind: configuration/applications
title: Kubernetes Applications Config
name: default
specification:
  applications:
  - _merge: true
  - name: auth-service
    enabled: true
    image_path: lambdastack/keycloak:14.0.0
    use_local_image_registry: true
    service:
      name: as-testauthdb
      port: 30104
      replicas: 2
      namespace: namespace-for-auth
      admin_user: auth-service-username
      admin_password: PASSWORD_TO_CHANGE
    database:
      name: auth-database-name
      user: auth-db-user
      password: PASSWORD_TO_CHANGE

To set a specific database host IP address for Keycloak, provide the additional parameter address:

    database:
      address: 10.0.0.2

Note: If database address is not specified, lambdastack assumes that database instance doesn't exist and will create it.

By default, if the database address is not specified and Postgres is in HA mode, Keycloak uses the PgBouncer ClusterIP service name as the database address.
If Postgres is in standalone mode, and database address is not specified, then it uses first Postgres host address from inventory.
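The address-selection rules above can be summarized as a small decision function. This is a sketch of the documented behavior only. The PgBouncer service name and the host IPs used in the example are hypothetical.

```python
from typing import Sequence


def keycloak_db_address(database: dict, postgres_ha: bool,
                        postgres_hosts: Sequence[str],
                        pgbouncer_service: str = "pgbouncer") -> str:
    """Pick the database address Keycloak connects to (sketch of the rules above).

    pgbouncer_service is a hypothetical placeholder for the ClusterIP service name.
    """
    if database.get("address"):
        return database["address"]       # explicitly configured address wins
    if postgres_ha:
        return pgbouncer_service         # HA mode: PgBouncer ClusterIP service name
    return postgres_hosts[0]             # standalone: first Postgres host in inventory


print(keycloak_db_address({}, postgres_ha=False, postgres_hosts=["10.0.0.7", "10.0.0.8"]))
```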

Run lambdastack apply on your configuration manifest.
Log into GUI

Note: Accessing the Keycloak GUI depends on your configuration.

By default, LambdaStack provides the following K8s Services for Keycloak: Headless and NodePort. The simplest way for reaching GUI is to use ssh tunnel with forwarding NodePort.
Example:
ssh -L 30104:localhost:30104 user@target_host -i ssh_key

If you need the GUI to be accessible from outside, you have to change your firewall rules.

GUI should be reachable at: https://localhost:30104/auth

9 - Logging

LambdaStack how-tos - Logging

Centralized logging setup

For centralized logging, LambdaStack uses OpenDistro for Elasticsearch. To enable centralized logging, make sure the count property of the logging feature is greater than 0 in your configuration manifest.

kind: lambdastack-cluster
...
specification:
  ...
  components:
    kubernetes_master:
      count: 1
    kubernetes_node:
      count: 0
    ...
    logging:
      count: 1
    ...

Default feature mapping for logging

...
logging:
  - logging
  - kibana
  - node-exporter
  - filebeat
  - firewall
...

Optional feature (role) available for logging: logstash.

The logging role replaced the elasticsearch role. This change was made to enable Elasticsearch usage for data storage as well, not only for logs as was the case up to version 0.5.0.

The default configuration of the logging and opendistro_for_elasticsearch roles is identical (see How to start working with OpenDistro for Elasticsearch in the Databases section). To modify the configuration of centralized logging, adjust and use the following defaults in your manifest:

kind: configuration/logging
title: Logging Config
name: default
specification:
  cluster_name: LambdaStackElastic
  clustered: True
  paths:
    data: /var/lib/elasticsearch
    repo: /var/lib/elasticsearch-snapshots
    logs: /var/log/elasticsearch

How to manage Opendistro for Elasticsearch data

Elasticsearch stores data as JSON documents, and an index is a collection of documents. As with every database, it's crucial to maintain the data correctly. It's almost impossible to deliver a database configuration that fits every type of project and stored data. LambdaStack deploys a preconfigured Opendistro Elasticsearch, but this configuration may not meet user requirements. Before going to production, the configuration should be tailored to the project's needs. All configuration tips and tricks are available in the official documentation.

The main and most important decisions to take before you deploy the cluster are:

How many nodes are needed
How large the machines and/or data storage disks need to be

These parameters are defined in the YAML file, and it is important to create a large enough cluster.

specification:
  components:
    logging:
      count: 1    #  Choose number of nodes
---
kind: infrastructure/virtual-machine
title: "Virtual Machine Infra"
name: logging-machine
specification:
  size: Standard_DS2_v2    #  Choose machine size

If Elasticsearch has to work in a cluster formation, then in addition to setting up more than one machine in the YAML config file, read the dedicated support article and adjust the Elasticsearch configuration file accordingly.

At the moment Open Distro for Elasticsearch does not support a plugin similar to ILM (Index Lifecycle Management); log rotation is possible only through configuration created in Index State Management.

ISM (Index State Management) is a plugin that provides users with an administrative panel to monitor indices and apply policies at different index stages. ISM lets users automate periodic administrative operations by triggering them based on index age, size, or number of documents. Using the ISM plugin, you can define policies that automatically handle index rollovers or deletions. ISM is installed with Open Distro by default, so users do not have to enable it. Official documentation is available on the Open Distro for Elasticsearch website.

To reduce disk consumption, every index you create should use a well-designed policy.

Among others, these two index actions can keep a machine from filling up its disk:

Index Rollover - rolls an alias over to a new index. Set the maximum index size/age or the minimum number of documents correctly to keep index size within your requirements.

Index Deletion - deletes indexes managed by the policy.

By combining these actions and adapting them to the amount and kind of data, users can create a policy that maintains data in the cluster, for example to prevent a node's disk from filling up.

Below is an example policy. Be aware that this is only an example, and it needs to be adjusted to your environment's needs.

{
    "policy": {
        "policy_id": "ls_policy",
        "description": "Safe setup for logs management",
        "last_updated_time": 1615201615948,
        "schema_version": 1,
        "error_notification": null,
        "default_state": "keep",
        "states": [
            {
                "name": "keep",
                "actions": [],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "14d"
                        }
                    },
                    {
                        "state_name": "rollover_by_size",
                        "conditions": {
                            "min_size": "1gb"
                        }
                    },
                    {
                        "state_name": "rollover_by_time",
                        "conditions": {
                            "min_index_age": "1d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "delete": {}
                    }
                ],
                "transitions": []
            },
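            {
                "name": "rollover_by_size",
                "actions": [
                    {
                        "rollover": {}
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "14d"
                        }
                    }
                ]
            },
            {
                "name": "rollover_by_time",
                "actions": [
                    {
                        "rollover": {}
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "14d"
                        }
                    }
                ]
            }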
            }
        ]
    }
}

The example above rolls an index over daily or when it reaches 1 GB in size. Indexes older than 14 days are deleted. States and conditions can be combined. See the policies documentation for more details.
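If you prefer to create the policy without Kibana, the ISM REST API accepts it at PUT _opendistro/_ism/policies/<policy_id>. The following is a minimal Python sketch using only the standard library; ES_URL, USER and PASSWORD are placeholder assumptions, and the send() helper disables certificate verification, which is only acceptable for test clusters with self-signed certificates. In this sketch the rollover states transition to delete after 14 days, so indexes that have already rolled over are removed as well.

```python
import base64
import json
import ssl
import urllib.request

# Assumptions: replace with the address of a logging node and real credentials.
ES_URL = "https://localhost:9200"
USER, PASSWORD = "admin", "PASSWORD_TO_CHANGE"


def rollover_state(name, delete_after):
    """State that rolls the index over, then moves it to "delete" once old enough."""
    return {
        "name": name,
        "actions": [{"rollover": {}}],
        "transitions": [
            {"state_name": "delete", "conditions": {"min_index_age": delete_after}}
        ],
    }


def build_policy(delete_after="14d", rollover_size="1gb", rollover_age="1d"):
    """Build an ISM policy with the same states as the ls_policy example."""
    keep = {
        "name": "keep",
        "actions": [],
        "transitions": [
            {"state_name": "delete", "conditions": {"min_index_age": delete_after}},
            {"state_name": "rollover_by_size", "conditions": {"min_size": rollover_size}},
            {"state_name": "rollover_by_time", "conditions": {"min_index_age": rollover_age}},
        ],
    }
    delete = {"name": "delete", "actions": [{"delete": {}}], "transitions": []}
    return {
        "policy": {
            "description": "Safe setup for logs management",
            "default_state": "keep",
            "states": [
                keep,
                delete,
                rollover_state("rollover_by_size", delete_after),
                rollover_state("rollover_by_time", delete_after),
            ],
        }
    }


def policy_request(policy_id, policy):
    """Prepare (without sending) PUT _opendistro/_ism/policies/<policy_id>."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        f"{ES_URL}/_opendistro/_ism/policies/{policy_id}",
        data=json.dumps(policy).encode(),
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


def send(request):
    """Send a prepared request. Verification is off for self-signed test certs only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=ctx) as resp:
        return resp.status, json.load(resp)


# Usage: status, body = send(policy_request("ls_policy", build_policy()))
```

Building the policy from small functions keeps the delete age in one place, so the "keep" and rollover states cannot drift apart when you change retention.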

Apply Policy

To apply the policy, use an API request similar to the one below:

PUT _template/template_01
{
  "index_patterns": ["filebeat*"],
  "settings": {
    "opendistro.index_state_management.rollover_alias": "filebeat",
    "opendistro.index_state_management.policy_id": "ls_policy"
  }
}

After applying this template, every new index matching the filebeat* pattern will be managed by the policy. The policy can also be applied to already existing indexes by assigning them to it in the Index Management Kibana panel.
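The template request can also be sent from a script. In the sketch below, the template body is a Python dict serialized with json.dumps, so separators such as commas are always correct; it also prepares a request to the ISM explain API (GET _opendistro/_ism/explain/<index>), which shows the policy that manages an index. ES_URL is an assumed address, and authentication/TLS handling is left out for brevity.

```python
import json
import urllib.request

ES_URL = "https://localhost:9200"  # assumed address of a logging node


def template_body(pattern="filebeat*", alias="filebeat", policy_id="ls_policy"):
    """Index template that attaches the ISM policy and rollover alias to new indexes."""
    return {
        "index_patterns": [pattern],
        "settings": {
            "opendistro.index_state_management.rollover_alias": alias,
            "opendistro.index_state_management.policy_id": policy_id,
        },
    }


def template_request(name, body):
    """Prepare PUT _template/<name>; json.dumps always emits valid JSON."""
    return urllib.request.Request(
        f"{ES_URL}/_template/{name}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def explain_request(index):
    """Prepare GET _opendistro/_ism/explain/<index> to see which policy manages it."""
    return urllib.request.Request(f"{ES_URL}/_opendistro/_ism/explain/{index}")


# Usage: add authentication/TLS as required by your cluster, then pass the
# requests to urllib.request.urlopen(...).
```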

How to export Kibana reports to CSV format

Since v1.0 LambdaStack provides the possibility to export reports from Kibana to CSV, PNG or PDF using the Open Distro for Elasticsearch Kibana reports feature.

More details about the plugin and how to export reports are available in the documentation.

Note: Currently, the following Open Distro for Elasticsearch Kibana plugins are installed and enabled by default: security, alerting, anomaly detection, index management, query workbench, notebooks, reports and gantt chart.

You can check the enabled default plugins for Kibana by running ./bin/kibana-plugin list in the Kibana directory on the logging machine.

How to export Elasticsearch data to CSV format

Since v0.8 LambdaStack provides the possibility to export data from Elasticsearch to CSV using Logstash (logstash-oss) along with logstash-input-elasticsearch and logstash-output-csv plugins.

To install Logstash in your cluster, add logstash to the feature mapping for the logging, opendistro_for_elasticsearch or elasticsearch group.

NOTE

To check plugin versions, the following command can be used:

/usr/share/logstash/bin/logstash-plugin list --verbose
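For scripted checks, the output of this command can be parsed. The sketch below assumes each installed plugin is listed on its own line as name (version), for example logstash-output-csv (3.0.8); that line format is an assumption, so verify it against your Logstash version.

```python
import re
import subprocess

LOGSTASH_PLUGIN = "/usr/share/logstash/bin/logstash-plugin"
REQUIRED = ("logstash-input-elasticsearch", "logstash-output-csv")

# Assumed line format of `list --verbose`: "logstash-output-csv (3.0.8)".
LINE = re.compile(r"^\s*(?P<name>[\w-]+)\s+\((?P<version>[^)]+)\)")


def parse_plugins(output):
    """Map plugin name -> version for every line that matches LINE."""
    plugins = {}
    for line in output.splitlines():
        match = LINE.match(line)
        if match:
            plugins[match.group("name")] = match.group("version")
    return plugins


def missing_plugins(plugins, required=REQUIRED):
    """Required plugins that do not appear in the parsed listing."""
    return [name for name in required if name not in plugins]


def installed_plugins():
    """Run logstash-plugin on this host and parse its output."""
    out = subprocess.run([LOGSTASH_PLUGIN, "list", "--verbose"],
                         capture_output=True, text=True, check=True).stdout
    return parse_plugins(out)


# Usage: print(missing_plugins(installed_plugins()) or "all required plugins installed")
```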

LambdaStack provides a basic configuration file (logstash-export.conf.template) as a template for your data export. This file has to be modified according to your Elasticsearch configuration and the data you want to export.

NOTE

Exporting data is not automated; it has to be invoked manually. The Logstash daemon is disabled by default after installation.

Run Logstash to export data:
/usr/share/logstash/bin/logstash -f /etc/logstash/logstash-export.conf

More details are available in the documentation on configuring the input and output plugins.
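For orientation only, here is a Python sketch that renders a minimal export pipeline with the elasticsearch input and csv output plugins. It is not the content of logstash-export.conf.template: the host, index pattern, query, fields and output path are assumptions, and secured clusters additionally need the input plugin's authentication and TLS options.

```python
import json


def render_export_conf(hosts, index, fields, path, query=None):
    """Render a minimal Logstash pipeline: Elasticsearch input -> CSV output."""
    query = query or {"query": {"match_all": {}}}
    return (
        "input {\n"
        "  elasticsearch {\n"
        f"    hosts => {json.dumps(hosts)}\n"
        f"    index => {json.dumps(index)}\n"
        # The query is passed to the plugin as a JSON string in single quotes.
        f"    query => '{json.dumps(query)}'\n"
        "  }\n"
        "}\n"
        "output {\n"
        "  csv {\n"
        f"    fields => {json.dumps(fields)}\n"
        f"    path => {json.dumps(path)}\n"
        "  }\n"
        "}\n"
    )


# Usage (all values are assumptions):
# print(render_export_conf(["localhost:9200"], "filebeat-*",
#                          ["@timestamp", "message"], "/tmp/export.csv"))
```

You could, for example, write the rendered text to /etc/logstash/logstash-export.conf and run the Logstash command shown above. Because the query is wrapped in single quotes, it must not itself contain single quotes.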

NOTE

At the moment the input plugin doesn't officially support skipping certificate validation for secure connections to Elasticsearch. For non-production environments you can disable it by adding a new line:

ssl_options[:verify] = false right after the other ssl_options definitions in the file:

/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-elasticsearch-*/lib/logstash/inputs/elasticsearch.rb

How to add multiline support for Filebeat logs

To properly handle multiline messages in files harvested by Filebeat, you have to provide a multiline definition in the configuration manifest. It lets you specify which lines are part of a single event.

By default, a postgresql block is provided; you can use it as an example:

postgresql_input:
  multiline:
    pattern: >-
            '^\d{4}-\d{2}-\d{2} '
    negate: true
    match: after

Supported inputs: common_input, postgresql_input, container_input. More details about multiline options can be found in the official documentation.
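To see how negate: true and match: after group lines, the Python sketch below approximates the behaviour for the postgresql_input example: a line that does not match the pattern is appended to the preceding line that does. It uses the pattern without the single quotes that appear in the manifest value, and it is a simplified model, not Filebeat's implementation (for example, it ignores max_lines and timeout).

```python
import re

# postgresql_input pattern without the extra quoting used in the manifest:
# a line that starts with a date such as "2021-03-08 " begins a new event.
PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def group_events(lines, pattern=PATTERN):
    """Approximate Filebeat multiline with negate: true and match: after.

    Lines that do NOT match the pattern are appended to the preceding line
    that does; a matching line starts a new event.
    """
    events = []
    for line in lines:
        if pattern.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events
```

With this rule, a PostgreSQL error line followed by an indented STATEMENT line ends up as one event, while the next timestamped line starts a new one.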

How to deploy Filebeat as Daemonset in K8s

Filebeat can be deployed as a DaemonSet in K8s. To do that, set the k8s_as_cloud_service option to true:

kind: lambdastack-cluster
specification:
  cloud:
    k8s_as_cloud_service: true

How to use default Kibana dashboards

It is possible to configure the setup.dashboards.enabled and setup.dashboards.index Filebeat settings using the specification.kibana.dashboards key in the configuration/filebeat doc. When specification.kibana.dashboards.enabled is set to auto, the corresponding setting in the Filebeat configuration file is set to true only if Kibana is configured to be present on the host. Other possible values are true and false.

Default configuration:

specification:
  kibana:
    dashboards:
      enabled: auto
      index: filebeat-*

Note: Setting specification.kibana.dashboards.enabled to true without providing Kibana will result in a Filebeat crash.
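The auto rule can be summarized in a few lines of Python. The functions below are hypothetical helpers that only model the documented behaviour; they are not part of LambdaStack.

```python
import warnings


def resolve_dashboards_enabled(setting, kibana_on_host):
    """Hypothetical: specification.kibana.dashboards.enabled -> setup.dashboards.enabled.

    "auto" becomes True only when Kibana is configured on the host;
    True/False are passed through unchanged.
    """
    if setting == "auto":
        return kibana_on_host
    if isinstance(setting, bool):
        return setting
    raise ValueError(f"expected auto, true or false, got {setting!r}")


def filebeat_dashboard_settings(setting, kibana_on_host, index="filebeat-*"):
    """Resulting Filebeat settings for one host."""
    enabled = resolve_dashboards_enabled(setting, kibana_on_host)
    if enabled and not kibana_on_host:
        # Filebeat crashes when dashboards are enabled without Kibana.
        warnings.warn("setup.dashboards.enabled is true but no Kibana is configured")
    return {"setup.dashboards.enabled": enabled, "setup.dashboards.index": index}
```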

10 - Maintenance

LambdaStack how-tos - Maintenance

Maintenance

Verification of service state

This part of the documentation covers how to check whether each component is working properly.

- Docker

To verify that Docker is up and running, first check the status of the Docker service with the following command:

systemctl status docker

Additionally, you can check that the command:

docker info

doesn't return any errors. Its output also contains useful information about your Docker configuration.
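The systemctl checks in this section can also be scripted. The sketch below calls systemctl is-active, which prints the unit state (for example active or failed), and reports every unit that is not active. The unit list is taken from this section and is an assumption for your hosts: run it with only the units installed on a given machine, and use the versioned PostgreSQL unit name on Red Hat.

```python
import subprocess

# Unit names used in this section (adjust per host, e.g. postgresql-10 on Red Hat).
UNITS = ["docker", "kubelet", "haproxy", "prometheus", "grafana-server",
         "prometheus-node-exporter", "elasticsearch", "kibana", "filebeat", "postgresql"]


def systemctl_is_active(unit):
    """State printed by `systemctl is-active <unit>`, e.g. "active" or "failed"."""
    result = subprocess.run(["systemctl", "is-active", unit],
                            capture_output=True, text=True)
    return result.stdout.strip() or "unknown"


def inactive_units(units, probe=systemctl_is_active):
    """Return {unit: state} for every unit whose state is not "active"."""
    states = {unit: probe(unit) for unit in units}
    return {unit: state for unit, state in states.items() if state != "active"}


# Usage on a host: print(inactive_units(["docker", "kubelet"]) or "all active")
```

The probe parameter keeps the reporting logic separate from systemctl, so it can be reused with another check (for example over SSH).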

- Kubernetes

First, to check if everything is working fine, we need to verify the status of the Kubernetes kubelet service with the command:

systemctl status kubelet

We can also check the state of Kubernetes nodes using the command:

root@primary01:~# kubectl get nodes --kubeconfig=/etc/kubernetes/admin.conf
NAME                                         STATUS   ROLES    AGE   VERSION
primary01                                    Ready    master   24h   v1.17.7
node01                                       Ready    <none>   23h   v1.17.7
node02                                       Ready    <none>   23h   v1.17.7
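For scripting, the JSON output of kubectl is easier to process than the table. The sketch below treats a node as healthy only when its Ready condition has the status "True"; the kubeconfig path matches the commands above, and the function names are illustrative.

```python
import json
import subprocess

KUBECONFIG = "/etc/kubernetes/admin.conf"


def not_ready_nodes(node_list):
    """Names of nodes whose Ready condition is not "True" in `kubectl get nodes -o json`."""
    bad = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c["status"] for c in conditions if c["type"] == "Ready"), "Unknown")
        if ready != "True":
            bad.append(node["metadata"]["name"])
    return bad


def get_nodes(kubeconfig=KUBECONFIG):
    """Run kubectl and return the decoded node list."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json", f"--kubeconfig={kubeconfig}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


# Usage: print(not_ready_nodes(get_nodes()) or "all nodes Ready")
```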

We can get additional information about Kubernetes components:

root@primary01:~# kubectl cluster-info --kubeconfig=/etc/kubernetes/admin.conf
Kubernetes master is running at https://primary01:6443
CoreDNS is running at https://primary01:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

We can also check status of pods in all namespaces using the command:

kubectl get pods -A --kubeconfig=/etc/kubernetes/admin.conf

We can get additional information about component statuses:

root@primary01:~# kubectl get cs --kubeconfig=/etc/kubernetes/admin.conf
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}

For more detailed information, please refer to the official documentation.

- Keycloak

To check if a Keycloak service deployed on Kubernetes is running, use the command:

kubectl get pods --kubeconfig=/etc/kubernetes/admin.conf --namespace=keycloak_service_namespace --field-selector=status.phase=Running | grep keycloak_service_name

- HAProxy

To check the status of HAProxy we can use the command:

systemctl status haproxy

Additionally, we can check if the application is listening on the ports defined in the haproxy.cfg file by running the netstat command.
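To get the list of ports to look for in the netstat output, the bind lines of haproxy.cfg can be parsed. The sketch below assumes the usual /etc/haproxy/haproxy.cfg location and handles the common bind forms (*:443, :80, 0.0.0.0:9000, comma-separated addresses); port ranges and UNIX sockets are skipped.

```python
import re

CFG_PATH = "/etc/haproxy/haproxy.cfg"  # usual location; adjust if yours differs
BIND = re.compile(r"^\s*bind\s+(\S+)")


def bind_ports(cfg_text):
    """Collect TCP ports from `bind` lines, e.g. "bind *:443 ssl crt ..." -> 443."""
    ports = set()
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = BIND.match(line)
        if not match:
            continue
        for addr in match.group(1).split(","):  # "bind :80,:8080" lists several
            port = addr.rsplit(":", 1)[-1]
            if port.isdigit():
                ports.add(int(port))
    return sorted(ports)


# Usage: print(bind_ports(open(CFG_PATH).read()))
```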

- Prometheus

To check the status of Prometheus we can use the command:

systemctl status prometheus

We can also check if the Prometheus service is listening on port 9090:

netstat -antup | grep 9090
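As an alternative to netstat, a plain TCP connection attempt shows whether something accepts connections on a port. The sketch below uses the default ports named in this section; checking localhost is an assumption, since a service bound only to a specific interface has to be checked on that address.

```python
import socket

# Default ports named in this section.
PORTS = {"prometheus": 9090, "grafana": 3000, "elasticsearch-http": 9200,
         "elasticsearch-transport": 9300, "kibana": 5601, "postgresql": 5432}


def is_listening(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage:
# for name, port in PORTS.items():
#     print(name, port, "open" if is_listening("localhost", port) else "closed")
```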

- Grafana

To check the status of Grafana we can use the command:

systemctl status grafana-server

We can also check if the Grafana service is listening on port 3000:

netstat -antup | grep 3000

- Prometheus Node Exporter

To check the status of Node Exporter we can use the command:

systemctl status prometheus-node-exporter

- Elasticsearch

To check the status of Elasticsearch we can use the command:

systemctl status elasticsearch

We can check if the service is listening on port 9200 (API communication port):

netstat -antup | grep 9200

We can also check if the service is listening on port 9300 (node communication port):

netstat -antup | grep 9300

We can also check the status of the Elasticsearch cluster:

<IP>:9200/_cluster/health

We can do this using curl or any other equivalent tool.
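The same health check in Python, as a minimal sketch: the base URL, credentials and TLS context are placeholders, since the endpoint's protocol and authentication depend on your security configuration.

```python
import base64
import json
import urllib.request


def cluster_health(base_url, user=None, password=None, context=None, timeout=5):
    """GET <base_url>/_cluster/health and return the decoded JSON."""
    req = urllib.request.Request(f"{base_url}/_cluster/health")
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, context=context, timeout=timeout) as resp:
        return json.load(resp)


def summarize(health):
    """One-line summary; status is "green", "yellow" or "red"."""
    return (f"{health['cluster_name']}: {health['status']}, "
            f"{health['number_of_nodes']} node(s), "
            f"{health['unassigned_shards']} unassigned shard(s)")


# Usage (values are placeholders):
# print(summarize(cluster_health("https://<IP>:9200", "admin", "PASSWORD_TO_CHANGE")))
```

On a single-node cluster, a yellow status usually means that replica shards cannot be assigned, because a replica is never placed on the same node as its primary.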

- Kibana

To check the status of Kibana we can use the command:

systemctl status kibana

We can also check if the Kibana service is listening on port 5601:

netstat -antup | grep 5601

- Filebeat

To check the status of Filebeat we can use the command:

systemctl status filebeat

- PostgreSQL

To check the status of PostgreSQL we can use the following commands:

on Ubuntu:

systemctl status postgresql

on Red Hat:

systemctl status postgresql-10

where postgresql-10 is only an example, because the number differs from version to version. Use your own version number when running this command.

We can also check if the PostgreSQL service is listening on port 5432:

netstat -antup | grep 5432

We can also use the pg_isready command to check whether the PostgreSQL server is running and accepting connections:

on Ubuntu:

[user@postgres01 ~]$ pg_isready
/var/run/postgresql:5432 - accepting connections

on Red Hat:

[user@postgres01 ~]$ /usr/pgsql-10/bin/pg_isready
/var/run/postgresql:5432 - accepting connections

where the path /usr/pgsql-10/bin/pg_isready is only an example, because the number differs from version to version. Use your own version number when running this command.

11 - Modules

LambdaStack how-tos - Modules

Modules

Introduction

In version 0.8 of LambdaStack we introduced modules. Modularization of the LambdaStack environment will result in:

smaller code bases for separate areas,
simpler and faster test process,
interchangeability of elements providing similar functionality (e.g. different Kubernetes providers),
faster and more focused release cycle.

These and multiple other factors (e.g. readability, reliability) influence this direction of changes.

User point of view

From a user's point of view, there will be no significant changes in the near future, as it will still be possible to install LambdaStack the "classic way", that is, with a single lambdastack configuration using the whole codebase as a monolith.

For those who want to try the new features or need the newly introduced capabilities, there will be a short transition period, which we consider a kind of "preview stage". During this period, each module has to be run separately by hand in the following order:

moduleA init
moduleA plan
moduleA apply
moduleB init
moduleB plan
moduleB apply
...

The init, plan and apply phases are explained in the next sections of this document. The main point is that during this "preview stage", dependent modules have to be executed one after another. In later releases, a separate mechanism will be introduced to orchestrate module dependencies and their consecutive execution.
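The ordering rule can be expressed as a short Python sketch. The run_phase callback is hypothetical: it stands for however you invoke a module's init, plan or apply, and should return True on success. Execution stops at the first failing step, so a dependent module never runs on top of a failed one.

```python
PHASES = ("init", "plan", "apply")


def preview_stage_plan(modules):
    """(module, phase) steps: each module runs init -> plan -> apply before the next starts."""
    return [(module, phase) for module in modules for phase in PHASES]


def run_in_order(modules, run_phase):
    """Run the steps in order; stop at the first step for which run_phase returns False.

    Returns (completed_steps, failed_step_or_None).
    """
    done = []
    for module, phase in preview_stage_plan(modules):
        if not run_phase(module, phase):
            return done, (module, phase)
        done.append((module, phase))
    return done, None


# Usage with a hypothetical runner: run_in_order(["AzBI", "AzKS"], my_runner)
```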

New scenarios

In 0.8 we offer the possibility to use AKS or EKS as Kubernetes providers. This is introduced through the modules mechanism, with the first four modules:

Azure Basic Infrastructure (AzBI) module
Azure AKS (AzKS) module
AWS Basic Infrastructure (AwsBI) module
AWS EKS (AwsKS) module

These four modules, together with classic LambdaStack used with the "any" provider, allow replacing an on-prem Kubernetes cluster with managed Kubernetes services.

As you can see, there are two paths provided:

Azure related, using AzBI and AzKS modules,
AWS related, using AwsBI and AwsKS modules.

The "... Basic Infrastructure" modules are responsible for providing basic cloud resources (e.g. resource groups, virtual networks, subnets, virtual machines, network security rules, routing, etc.) which will be used by the next modules. In this case, the next ones are the "... KS modules", meant to provide managed Kubernetes services. They use resources provided by the basic infrastructure modules (e.g. subnets or resource groups) and instantiate managed Kubernetes services provided by the cloud providers. The last element in both cloud provider paths is classic LambdaStack installed on top of the resources provided by those modules, using the "any" provider.

Hands-on

Each module comes with a guide on how to use it. Please refer to:

Azure Basic Infrastructure (AzBI) module
Azure AKS (AzKS) module
AWS Basic Infrastructure (AwsBI) module
AWS EKS (AwsKS) module

After deployment of EKS or AKS, you can perform LambdaStack installation on top of it.

Install LambdaStack on top of AzKS or AwsKS

NOTE - Default OS users:

Azure:
    redhat: ec2-user
    ubuntu: operations
AWS:
    redhat: ec2-user
    ubuntu: ubuntu

Create the LambdaStack cluster config file in /tmp/shared/ls.yml. Example:

kind: lambdastack-cluster
title: LambdaStack Cluster Config
name: your-cluster-name # <----- make unified with other places and build directory name
build_path: # Dynamically built
provider: any # <----- use "any" provider
specification:
  name: your-cluster-name # <----- make unified with other places and build directory name
  admin_user:
    name: operations # <----- make sure os-user is correct
    key_path: /tmp/shared/vms_rsa # <----- use generated key file
    path: # Dynamically built
  cloud:
    k8s_as_cloud_service: true # <----- make sure that flag is set, as it indicates usage of a managed Kubernetes service
  components:
    repository:
      count: 1
      machines:
        - default-lambdastack-modules-test-all-0 # <----- make sure that it is correct VM name
    kubernetes_master:
      count: 0
    kubernetes_node:
      count: 0
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 1
      machines:
        - default-lambdastack-modules-test-all-1 # <----- make sure that it is correct VM name
    load_balancer:
      count: 0
    rabbitmq:
      count: 0
---
kind: configuration/feature-mapping
title: Feature mapping to roles
name: your-cluster-name # <----- make unified with other places and build directory name
provider: any
specification:
  roles_mapping:
    repository:
      - repository
      - image-registry
      - firewall
      - filebeat
      - node-exporter
      - applications
---
kind: infrastructure/machine
name: default-lambdastack-modules-test-all-0
provider: any
specification:
  hostname: lambdastack-modules-test-all-0
  ip: 12.34.56.78 # <----- put here public IP attached to machine
---
kind: infrastructure/machine
name: default-lambdastack-modules-test-all-1
provider: any
specification:
  hostname: lambdastack-modules-test-all-1
  ip: 12.34.56.78 # <----- put here public IP attached to machine
---
kind: configuration/repository
title: "LambdaStack requirements repository"
name: default
specification:
  description: "Local repository of binaries required to install LambdaStack"
  download_done_flag_expire_minutes: 120
  apache_lsrepo_path: "/var/www/html/lsrepo"
  teardown:
    disable_http_server: true
    remove:
      files: false
      helm_charts: false
      images: false
      packages: false
provider: any
---
kind: configuration/postgresql
title: PostgreSQL
name: default
specification:
  config_file:
    parameter_groups:
      - name: CONNECTIONS AND AUTHENTICATION
        subgroups:
          - name: Connection Settings
            parameters:
              - name: listen_addresses
                value: "'*'"
                comment: listen on all addresses
          - name: Security and Authentication
            parameters:
              - name: ssl
                value: 'off'
                comment: to have the default value also on Ubuntu
      - name: RESOURCE USAGE (except WAL)
        subgroups:
          - name: Kernel Resource Usage
            parameters:
              - name: shared_preload_libraries
                value: AUTOCONFIGURED
                comment: set by automation
      - name: ERROR REPORTING AND LOGGING
        subgroups:
          - name: Where to Log
            parameters:
              - name: log_directory
                value: "'/var/log/postgresql'"
                comment: to have standard location for Filebeat and logrotate
              - name: log_filename
                value: "'postgresql.log'"
                comment: to use logrotate with common configuration
      - name: WRITE AHEAD LOG
        subgroups:
          - name: Settings
            parameters:
              - name: wal_level
                value: replica
                when: replication
          - name: Archiving
            parameters:
              - name: archive_mode
                value: 'on'
                when: replication
              - name: archive_command
                value: "'test ! -f /dbbackup/{{ inventory_hostname }}/backup/%f &&\
                    \ gzip -c < %p > /dbbackup/{{ inventory_hostname }}/backup/%f'"
                when: replication
      - name: REPLICATION
        subgroups:
          - name: Sending Server(s)
            parameters:
              - name: max_wal_senders
                value: 10
                comment: maximum number of simultaneously running WAL sender processes
                when: replication
              - name: wal_keep_segments
                value: 34
                comment: number of WAL files held for standby servers
                when: replication
  extensions:
    pgaudit:
      enabled: false
      shared_preload_libraries:
        - pgaudit
      config_file_parameters:
        log_connections: 'off'
        log_disconnections: 'off'
        log_statement: 'none'
        log_line_prefix: "'%m [%p] %q%u@%d,host=%h '"
        pgaudit.log: "'write, function, role, ddl' # 'misc_set' is not supported for\
            \ PG 10"
        pgaudit.log_catalog: 'off # to reduce overhead of logging'
        pgaudit.log_relation: 'on # separate log entry for each relation'
        pgaudit.log_statement_once: 'off'
        pgaudit.log_parameter: 'on'
    pgbouncer:
      enabled: false
    replication:
      enabled: false
      replication_user_name: ls_repmgr
      replication_user_password: PASSWORD_TO_CHANGE
      privileged_user_name: ls_repmgr_admin
      privileged_user_password: PASSWORD_TO_CHANGE
      repmgr_database: ls_repmgr
      shared_preload_libraries:
        - repmgr
  logrotate:
    config: |-
      /var/log/postgresql/postgresql*.log {
          maxsize 10M
          daily
          rotate 6
          copytruncate
      # delaycompress is for Filebeat
          delaycompress
          compress
          notifempty
          missingok
          su root root
          nomail
      # to have multiple unique filenames per day when dateext option is set
          dateformat -%Y%m%dH%H
      }      
provider: any
---
kind: configuration/applications
title: "Kubernetes Applications Config"
name: default
specification:
  applications:
    - name: ignite-stateless
      enabled: false
      image_path: "lambdastack/ignite:2.9.1"
      use_local_image_registry: false
      namespace: ignite
      service:
        rest_nodeport: 32300
        sql_nodeport: 32301
        thinclients_nodeport: 32302
      replicas: 1
      enabled_plugins:
        - ignite-kubernetes
        - ignite-rest-http
    - name: rabbitmq
      enabled: false
      image_path: rabbitmq:3.8.3
      use_local_image_registry: false
      service:
        name: rabbitmq-cluster
        port: 30672
        management_port: 31672
        replicas: 2
        namespace: queue
      rabbitmq:
        plugins:
          - rabbitmq_management
          - rabbitmq_management_agent
        policies:
          - name: ha-policy2
            pattern: ".*"
            definitions:
              ha-mode: all
        custom_configurations:
          - name: vm_memory_high_watermark.relative
            value: 0.5
        cluster:
    - name: auth-service
      enabled: false
      image_path: jboss/keycloak:9.0.0
      use_local_image_registry: false
      service:
        name: as-testauthdb
        port: 30104
        replicas: 2
        namespace: namespace-for-auth
        admin_user: auth-service-username
        admin_password: PASSWORD_TO_CHANGE
      database:
        name: auth-database-name
        user: auth-db-user
        password: PASSWORD_TO_CHANGE
    - name: pgpool
      enabled: true
      image:
        path: bitnami/pgpool:4.1.1-debian-10-r29
        debug: false
      use_local_image_registry: false
      namespace: postgres-pool
      service:
        name: pgpool
        port: 5432
      replicas: 3
      pod_spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                      - key: app
                        operator: In
                        values:
                          - pgpool
                  topologyKey: kubernetes.io/hostname
        nodeSelector: {}
        tolerations: {}
      resources:
        limits:
          memory: 176Mi
        requests:
          cpu: 250m
          memory: 176Mi
      pgpool:
        env:
          PGPOOL_BACKEND_NODES: autoconfigured
          PGPOOL_POSTGRES_USERNAME: ls_pgpool_postgres_admin
          PGPOOL_SR_CHECK_USER: ls_pgpool_sr_check
          PGPOOL_ADMIN_USERNAME: ls_pgpool_admin
          PGPOOL_ENABLE_LOAD_BALANCING: true
          PGPOOL_MAX_POOL: 4
          PGPOOL_POSTGRES_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_postgres_password
          PGPOOL_SR_CHECK_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_sr_check_password
          PGPOOL_ADMIN_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_admin_password
        secrets:
          pgpool_postgres_password: PASSWORD_TO_CHANGE
          pgpool_sr_check_password: PASSWORD_TO_CHANGE
          pgpool_admin_password: PASSWORD_TO_CHANGE
        pgpool_conf_content_to_append: |
          #------------------------------------------------------------------------------
          # CUSTOM SETTINGS (appended by LambdaStack to override defaults)
          #------------------------------------------------------------------------------
          # num_init_children = 32
          connection_life_time = 900
          reserved_connections = 1          
        pool_hba_conf: autoconfigured
    - name: pgbouncer
      enabled: true
      image_path: brainsam/pgbouncer:1.12
      init_image_path: bitnami/pgpool:4.1.1-debian-10-r29
      use_local_image_registry: false
      namespace: postgres-pool
      service:
        name: pgbouncer
        port: 5432
      replicas: 2
      resources:
        requests:
          cpu: 250m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 128Mi
      pgbouncer:
        env:
          DB_HOST: pgpool.postgres-pool.svc.cluster.local
          DB_LISTEN_PORT: 5432
          LISTEN_ADDR: "*"
          LISTEN_PORT: 5432
          AUTH_FILE: "/etc/pgbouncer/auth/users.txt"
          AUTH_TYPE: md5
          MAX_CLIENT_CONN: 150
          DEFAULT_POOL_SIZE: 25
          RESERVE_POOL_SIZE: 25
          POOL_MODE: transaction
provider: any

Run the lambdastack tool to install LambdaStack:
```
lambdastack --auto-approve apply --file='/tmp/shared/ls.yml' --vault-password='secret'
```
This will install PostgreSQL on one of the machines and configure PgBouncer, Pgpool and additional services to manage database connections.

Please make sure you disable the applications that you don't need. You can also enable standard LambdaStack services like Kafka or RabbitMQ by increasing the number of virtual machines in the basic infrastructure config and assigning them to the LambdaStack components you want to use.

If you would like to deploy custom resources into managed Kubernetes, the standard kubeconfig YAML document can be found inside the shared state file (you should also be able to get it with vendor tools).

We highly recommend using the Ingress resource in Kubernetes to allow access to web applications inside the cluster. Since the Kubernetes service is managed and fully supported by the cloud platform, the classic HAProxy load-balancer solution should be considered deprecated in this setup.

12 - Monitoring

LambdaStack how-tos - Monitoring

Prometheus:

How to enable provided Prometheus rules
How to enable Alertmanager
How to configure scalable Prometheus setup

Grafana:

How to setup default admin password and user in Grafana
Import and create Grafana dashboards

Kibana:

How to configure Kibana
How to configure default user password in Kibana

Azure:

How to configure Azure additional monitoring and alerting

AWS:

How to configure AWS additional monitoring and alerting

Prometheus

Prometheus is an open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach. For more information about the features, components and architecture of Prometheus please refer to the official documentation.

How to enable provided Prometheus rules

The Prometheus role provides the following rule files:

common.rules (contains basic alerts like CPU load, disk space, memory usage, etc.)
container.rules (contains container alerts like container killed, volume usage, volume IO usage, etc.)
kafka.rules (contains Kafka alerts like consumer lag)
node.rules (contains node alerts like node status, OOM, CPU load, etc.)
postgresql.rules (contains PostgreSQL alerts like PostgreSQL status, exporter errors, deadlocks, etc.)
prometheus.rules (contains additional alerts for monitoring Prometheus itself and Alertmanager)

However, only the common rules are enabled by default. To enable a specific rule set, two conditions have to be met:

Your infrastructure has to have the corresponding component enabled (count > 0)
You have to set the value to "true" in the Prometheus configuration in your manifest:

kind: configuration/prometheus
...
specification:
  alert_rules:
    common: true
    container: false
    kafka: false
    node: false
    postgresql: false
    prometheus: false
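The two conditions can be modelled as follows. This is a hypothetical sketch: RULE_COMPONENT, the mapping from a rule set to the component it needs, is an assumption that only covers the obvious kafka and postgresql cases, and rule sets without an entry are treated as needing no component.

```python
# Hypothetical mapping: rule set -> component whose count must be > 0.
# Only the obvious cases are filled in; extend it to match your setup.
RULE_COMPONENT = {"kafka": "kafka", "postgresql": "postgresql"}


def effective_rule_files(alert_rules, components, rule_component=RULE_COMPONENT):
    """Rule files that are switched on AND whose required component (if any) has count > 0."""
    active = []
    for rule, enabled in alert_rules.items():
        component = rule_component.get(rule)
        count = components.get(component, {}).get("count", 0) if component else 1
        if enabled and count > 0:
            active.append(f"{rule}.rules")
    return active
```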

For more information about how to set up Prometheus alerting rules, refer to the official website.

How to enable Alertmanager

LambdaStack provides Alertmanager configuration via the configuration manifest. To see the default configuration, refer to the default Prometheus configuration file.
To enable Alertmanager, modify the configuration manifest:

Enable Alertmanager
Enable the desired alerting rules
Provide at least one receiver

Example:

...
specification:
...
  alertmanager:
    enable: true
    alert_rules:
      common: true
      container: false
      kafka: false
      node: false
      postgresql: false
      prometheus: false
...
    config:
      route:
        receiver: 'email'
      receivers:
        - name: 'email'
          email_configs:
            - to: "test@domain.com"
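Before deploying, the three requirements listed above can be checked on the parsed alertmanager section. The sketch assumes the section has already been loaded into a Python dict (for example with a YAML library of your choice); the messages are illustrative.

```python
def alertmanager_problems(alertmanager):
    """Return a list of problems found in a parsed `alertmanager` section."""
    problems = []
    if not alertmanager.get("enable"):
        problems.append("alertmanager.enable is not true")
    if not any(alertmanager.get("alert_rules", {}).values()):
        problems.append("no alert rules enabled")
    config = alertmanager.get("config", {})
    receivers = {r.get("name") for r in config.get("receivers", [])}
    if not receivers:
        problems.append("no receivers defined")
    route_receiver = config.get("route", {}).get("receiver")
    if route_receiver not in receivers:
        problems.append(f"route receiver {route_receiver!r} is not defined in receivers")
    return problems
```

Keep in mind that Alertmanager email receivers also need SMTP settings, such as smtp_smarthost and smtp_from in the global section, or smarthost and from in the email config.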

For more details about Alertmanager configuration, please refer to the official documentation.

How to configure scalable Prometheus setup

If you want to create a scalable Prometheus setup, you can use federation. Federation lets one Prometheus instance scrape metrics from other Prometheus instances.

To create a Prometheus federation, add the following to the scrape_configs section in the configuration (for example, the prometheus.yaml file) of the previously created Prometheus instance that should scrape data from the other Prometheus instances:

scrape_configs:
  - job_name: federate
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'
    honor_labels: true
    static_configs:
    - targets:
      - your-prometheus-endpoint1:9090
      - your-prometheus-endpoint2:9090
      - your-prometheus-endpoint3:9090
      ...
      - your-prometheus-endpointn:9090

To check if the Prometheus instance you want to scrape data from is accessible, you can run a command like the one below on the Prometheus instance that does the scraping:

curl -G --data-urlencode 'match[]={job=~".+"}' your-prometheus-endpoint:9090/federate

If everything is configured properly and the Prometheus instance you want to gather data from is up and running, this should return the metrics from that instance.
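The curl check above can be reproduced in Python, for example in a monitoring script. The sketch builds the same URL-encoded match[] parameter that curl -G --data-urlencode sends; the endpoint is a placeholder and plain HTTP is assumed.

```python
import urllib.parse
import urllib.request


def federate_url(endpoint, selectors=('{job=~".+"}',)):
    """Build the /federate URL with one match[] parameter per series selector."""
    query = urllib.parse.urlencode([("match[]", s) for s in selectors])
    return f"http://{endpoint}/federate?{query}"


def federated_sample_count(endpoint, timeout=5):
    """Fetch federated metrics and count sample lines (# HELP/# TYPE lines skipped)."""
    with urllib.request.urlopen(federate_url(endpoint), timeout=timeout) as resp:
        body = resp.read().decode()
    return sum(1 for line in body.splitlines() if line and not line.startswith("#"))


# Usage: print(federated_sample_count("your-prometheus-endpoint:9090"))
```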

Grafana

Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources. For more information about Grafana please refer to the official website.

How to set up the default admin user and password in Grafana

Before setting up Grafana, set a new password and/or name for the admin user in your configuration YAML. Otherwise, the default "admin" user is used with the default password "PASSWORD_TO_CHANGE".

kind: configuration/grafana
specification:
  ...
  # Variables correspond to ones in grafana.ini configuration file
  # Security
  grafana_security:
    admin_user: admin
    admin_password: "YOUR_PASSWORD"
  ...

You can find more information about Grafana security at https://grafana.com/docs/grafana/latest/installation/configuration/#security.

Import and create Grafana dashboards

LambdaStack uses Grafana for monitoring data visualization. The LambdaStack installation creates a Prometheus datasource in Grafana, so the only additional step is to create your dashboards.

There are also many ready-to-use Grafana dashboards created by the community; remember to check the license before importing any of them.

Creating dashboards

You can create your own dashboards; the Grafana getting started page will help you with this. Knowledge of Prometheus is really helpful when creating diagrams, since they use PromQL to fetch data.

Importing dashboards via Grafana GUI

To import existing dashboard:

If you have found a dashboard that suits your needs, you can import it directly into Grafana by going to the menu item Dashboards/Manage on your Grafana web page.
Click the +Import button.
Enter the dashboard ID or load a JSON file with the dashboard definition.
Select the datasource for the dashboard; you should select Prometheus.
Click Import.

Importing dashboards via configuration manifest

To pull a dashboard from the official Grafana website during lambdastack execution, provide dashboard_id, revision_id and datasource in your configuration manifest.

Example:

kind: configuration/grafana
specification:
  ...
  grafana_online_dashboards:
    - dashboard_id: '4271'
      revision_id: '3'
      datasource: 'Prometheus'

Enabling predefined Grafana dashboards

Since v1.1.0, LambdaStack provides predefined Grafana dashboards. These dashboards are available in both online and offline deployment modes. To enable a particular Grafana dashboard, refer to the default Grafana configuration file, copy the kind: configuration/grafana section to your configuration manifest and uncomment the desired dashboards.

Example:

kind: configuration/grafana
specification:
  ...
  grafana_external_dashboards:
  # Kubernetes cluster monitoring (via Prometheus)
    - dashboard_id: '315'
      datasource: 'Prometheus'
  # Node Exporter Server Metrics
    - dashboard_id: '405'
      datasource: 'Prometheus'

Note: The above link points to the develop branch. Choose the branch that matches the LambdaStack version you are using.

Components used for monitoring

LambdaStack deploys many monitoring components whose data you can visualize. Knowing which components are used is important when you look for an appropriate dashboard on the Grafana website or write your own Prometheus queries.

List of monitoring components, the so-called exporters:

cAdvisor
HAProxy Exporter
JMX Exporter
Kafka Exporter
Node Exporter
Zookeeper Exporter

When dashboard creation or import succeeds, you will see the dashboard on your dashboard list.

Note: For some dashboards, there is no data to visualize until there is traffic activity for the monitored component.

Kibana

Kibana is a free and open frontend application that sits on top of the Elastic Stack, providing search and data visualization capabilities for data indexed in Elasticsearch. For more information about Kibana, please refer to the official website.

How to configure Kibana - Open Distro

In order to start viewing and analyzing logs with Kibana, you first need to add an index pattern for Filebeat according to the following steps:

Go to the Management tab.
Select Index Patterns.
In the first step, define filebeat-* as the index pattern and click Next.
Configure the time filter field if desired by selecting @timestamp. This field represents the time that events occurred or were processed. You can choose not to have a time field, but you will not be able to narrow down your data by a time range.

This filter pattern can now be used to query the Elasticsearch indices.

By default Kibana adjusts the UTC time in @timestamp to the browser's local timezone. This can be changed in Management > Advanced Settings > Timezone for date formatting.

How to configure default user passwords for Kibana - Open Distro, Open Distro for Elasticsearch and Filebeat

To configure the admin password for Kibana - Open Distro and Open Distro for Elasticsearch, follow the procedure below. There are separate procedures for the logging and opendistro-for-elasticsearch roles, since in most cases the kibanaserver and logstash users are not required for opendistro-for-elasticsearch.

Logging component

- Logging role

By default, LambdaStack removes the users listed in the demo_users_to_remove section of the configuration/logging doc. By default, the kibanaserver user (needed by the default LambdaStack installation of Kibana) and the logstash user (needed by the default LambdaStack installation of Filebeat) are not removed. If you want LambdaStack to configure them, set kibanaserver_user_active to true for the kibanaserver user or logstash_user_active to true for the logstash user. For the logging role, these settings are already set to true by default. We strongly advise setting a different password for each user.

To change the admin user's password, change the value of the admin_password key. For kibanaserver and logstash, change the values of the kibanaserver_password and logstash_password keys respectively. Changes from the logging role are propagated to the Kibana and Filebeat configuration.

kind: configuration/logging
title: Logging Config
name: default
specification:
  ...
  admin_password: YOUR_PASSWORD
  kibanaserver_password: YOUR_PASSWORD
  kibanaserver_user_active: true
  logstash_password: YOUR_PASSWORD
  logstash_user_active: true
  demo_users_to_remove:
  - kibanaro
  - readall
  - snapshotrestore
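The advice to use a different password for each user can be checked before running lambdastack. Below is a minimal Python sketch, assuming the specification section of the configuration/logging document has already been parsed into a dictionary (for example with a YAML library). find_shared_passwords is a hypothetical helper, not part of LambdaStack. It groups the password keys shown above by value and reports every group of users that share a password.

```python
from collections import defaultdict

# Password keys shown in the configuration/logging example above.
PASSWORD_KEYS = ("admin_password", "kibanaserver_password", "logstash_password")


def find_shared_passwords(spec: dict) -> list:
    """Return groups of password keys that share the same value.

    spec is the already-parsed `specification` mapping of a
    configuration/logging document.
    """
    keys_by_password = defaultdict(list)
    for key in PASSWORD_KEYS:
        value = spec.get(key)
        if value is not None:
            keys_by_password[value].append(key)
    # A group with more than one key means those users share a password.
    return [keys for keys in keys_by_password.values() if len(keys) > 1]
```

With the example above, where every key is still set to YOUR_PASSWORD, all three keys are reported as one group.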

- Kibana role

To set the password of the kibanaserver user, which Kibana uses to communicate with the Open Distro for Elasticsearch backend, follow the procedure described in Logging role.

- Filebeat role

To set the password of the logstash user, which Filebeat uses to communicate with the Open Distro for Elasticsearch backend, follow the procedure described in Logging role.

Open Distro for Elasticsearch component

By default, LambdaStack removes all demo users except the admin user. These users are listed in the demo_users_to_remove section of the configuration/opendistro-for-elasticsearch doc. If you want to keep the kibanaserver user (needed by the default LambdaStack installation of Kibana), remove it from the demo_users_to_remove list and set kibanaserver_user_active to true in order to change the default password. We strongly advise setting a different password for each user.

To change the admin user's password, change the value of the admin_password key. For kibanaserver and logstash, change the values of the kibanaserver_password and logstash_password keys respectively.

kind: configuration/opendistro-for-elasticsearch
title: Open Distro for Elasticsearch Config
name: default
specification:
  ...
  admin_password: YOUR_PASSWORD
  kibanaserver_password: YOUR_PASSWORD
  kibanaserver_user_active: false
  logstash_password: YOUR_PASSWORD
  logstash_user_active: false
  demo_users_to_remove:
  - kibanaro
  - readall
  - snapshotrestore
  - logstash
  - kibanaserver

Upgrade of Elasticsearch, Kibana and Filebeat

During upgrade, LambdaStack takes the kibanaserver (for Kibana) and logstash (for Filebeat) user passwords and re-applies them to the upgraded configuration of Filebeat and Kibana. The LambdaStack upgrade of Open Distro, Kibana or Filebeat fails if the kibanaserver or logstash usernames were changed in the configuration of Kibana, Filebeat or Open Distro for Elasticsearch.

Azure

How to configure Azure additional monitoring and alerting

Setting up additional monitoring on Azure for redundancy is good practice and might catch issues that the LambdaStack monitoring could miss, such as:

Azure issues and resource downtime
Issues with the VM that runs the LambdaStack monitoring and alerting (Prometheus)

You can find more information about Azure monitoring and alerting at the links below:

https://docs.microsoft.com/en-us/azure/azure-monitor/overview

https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-overview-alerts

AWS

How to configure AWS additional monitoring and alerting

TODO

13 - OS Patching

LambdaStack how-tos - OS Patching

Patching OS with running LambdaStack components

This guide describes the steps to patch RHEL and Ubuntu operating systems without interrupting running LambdaStack components.

Disclaimer

We provide a recommended way to patch your RHEL and Ubuntu operating systems. Before proceeding with patching the production environment we strongly recommend patching your test cluster first. This document will help you decide how you should patch your OS. This is not a step-by-step guide.

Requirements

A fresh, up-to-date backup containing all your important data
Verify that the repositories are in the desired state. Details here


AWS

Suggested OS images

For LambdaStack >= v1.2 we recommend the following image (AMI):

RHEL: RHEL-7.9_HVM-20210208-x86_64-0-Hourly2-GP2 (kernel 3.10.0-1160.15.2.el7.x86_64),
Ubuntu: ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20210907 (kernel 5.4.0-1056-aws).

Note: This guide may also be useful for other supported OS versions.

Patching methods

AWS provides Patch Manager that automates the process of patching managed instances. Benefits:

Automate patching
Define approval rules
Create patch baselines
Monitor compliance

This feature is available via:

console: Systems Manager > Instances & Nodes > Patch Manager
AWS CLI

For more information, refer to AWS Systems Manager User Guide.

Azure

Suggested OS images

For LambdaStack >= v1.2 we recommend the following image (urn):

RHEL: RedHat:RHEL:7-LVM:7.9.2021051701 (kernel 3.10.0-1160.el7.x86_64),
Ubuntu: Canonical:UbuntuServer:18.04-LTS:18.04.202109130 (kernel 5.4.0-1058-azure).

Note: This guide may also be useful for other supported OS versions.

Patching methods

Azure provides the Update Management solution in Azure Automation. It gives you visibility into update compliance across Azure, other clouds and on-premises environments. The feature allows you to create scheduled deployments that orchestrate the installation of updates within a defined maintenance window. To manage updates this way, refer to the official documentation.

Patching with OS specific package manager

The following commands can be executed in both clustered and non-clustered environments. When patching a non-clustered environment, you have to schedule a maintenance window because a reboot is required after kernel patching.

Note: Some of the particular patches may also require a system reboot.

If your environment is clustered, patch the hosts one by one. Before proceeding with the next host, make sure that the patched host is up and all its components are running. For information on how to check the state of specific LambdaStack components, see here.

Repositories

LambdaStack uses the repository role to provide all required packages. The role disables all existing repositories and provides a new one. After a successful LambdaStack deployment, the official repositories should be re-enabled and the LambdaStack-provided repository should be disabled.

RHEL

Verify that lsrepo is disabled: yum repolist lsrepo

Verify that the repositories you want to use for the upgrade are enabled: yum repolist all

List installed security patches: yum updateinfo list security installed

List available patches without installing them: yum updateinfo list security available

Grab more details about available patches: yum updateinfo info security available or specific patch: yum updateinfo info security <patch_name>

Install system security patches: sudo yum update-minimal --sec-severity=critical,important --bugfix

Install all patches and updates, not only those flagged as critical or important: sudo yum update

You can also specify the exact bugfix you want to install or even which CVE vulnerability to patch, for example: sudo yum update --cve CVE-2008-0947

Available options:

  --advisory=ADVS, --advisories=ADVS
                        Include packages needed to fix the given advisory, in updates
  --bzs=BZS             Include packages needed to fix the given BZ, in updates
  --cves=CVES           Include packages needed to fix the given CVE, in updates
  --sec-severity=SEVS, --secseverity=SEVS
                        Include security relevant packages matching the severity, in updates

Additional information: Red Hat provides notifications about security flaws that affect its products in the form of security advisories. For more information, see here.

Ubuntu

For automated security patches, Ubuntu uses the unattended-upgrade facility. By default it runs every day. To verify this on your system, execute:

dpkg --list unattended-upgrades
cat /etc/apt/apt.conf.d/20auto-upgrades | grep Unattended-Upgrade

For information on how to change the Unattended-Upgrade configuration, see here.
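If you want to check this setting from a script rather than by reading grep output, the following minimal Python sketch parses lines of the form APT::Periodic::Unattended-Upgrade "1"; from the 20auto-upgrades file. It is a simplified illustration under stated assumptions: it reads only this one file, ignores overrides from other files under /etc/apt/apt.conf.d/, and the function names are hypothetical. The value is an interval in days, so "1" means daily and "0" means disabled.

```python
import re

# Matches lines such as: APT::Periodic::Unattended-Upgrade "1";
# Lines commented out with // do not match, because the option must start the line.
_PERIODIC_SETTING = re.compile(r'^\s*(APT::Periodic::[\w-]+)\s+"([^"]*)"\s*;')


def parse_periodic_settings(text: str) -> dict:
    """Map each APT::Periodic option found in the text to its string value."""
    settings = {}
    for line in text.splitlines():
        match = _PERIODIC_SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def unattended_upgrade_interval(text: str) -> int:
    """Return the Unattended-Upgrade interval in days; 0 means it is disabled."""
    value = parse_periodic_settings(text).get("APT::Periodic::Unattended-Upgrade", "0")
    return int(value)
```

For example, passing the contents of /etc/apt/apt.conf.d/20auto-upgrades to unattended_upgrade_interval should return 1 when the daily schedule is configured.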

The following steps will allow you to perform an upgrade manually.

Update your local repository cache: sudo apt update

Verify that lsrepo is disabled: apt-cache policy | grep lsrepo

Verify that the repositories you want to use for the upgrade are enabled: apt-cache policy

List available upgrades without installing them: apt-get upgrade -s

List available security patches: sudo unattended-upgrade -d --dry-run

Install system security patches: sudo unattended-upgrade -d

Install all patches and updates with dependencies: sudo apt-get dist-upgrade

Verify if your system requires a reboot after an upgrade (check if file exists): test -e /var/run/reboot-required && echo reboot required || echo reboot not required

Additional information: Canonical provides notifications about security flaws that affect its products in the form of security notices. For more information, see here.

Patching with external tools

Some solutions can perform kernel patching without a system reboot:

Red Hat kpatch only for RHEL,
Canonical Livepatch Service only for Ubuntu,
KernelCare - third-party software. Available also in AWS Marketplace in SaaS model.

If you have a valid subscription for any of the above tools, we highly recommend using it to patch your systems.

14 - Persistent Storage

LambdaStack how-tos - Persistent Storage

Kubernetes persistent storage

LambdaStack supports Azure Files and Amazon EFS storage types to use as Kubernetes persistent volumes.

Azure

Infrastructure

LambdaStack creates a storage account with "Standard" tier and locally-redundant storage ("LRS" redundancy option). This storage account contains a file share with the name "k8s".

With the following configuration, it is possible to specify the storage account name and the "k8s" file share quota in GiB.

---
kind: infrastructure/storage-share
name: default
provider: azure
specification:
  quota: 50

Kubernetes

When specification.storage.enable is set to true, a few related K8s objects are created: a PersistentVolume, a PersistentVolumeClaim and the "azure-secret" Secret. You can control the PV/PVC names and the storage capacity/request in GiB with the configuration below.

NOTE

It makes no sense to specify a greater capacity than the Azure file share quota allows. In general, these values should be the same.

---
kind: configuration/kubernetes-master
name: default
provider: azure
specification:
  storage:
    name: lambdastack-cluster-volume
    enable: true
    capacity: 50
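The relationship between the two documents can be checked mechanically. This minimal Python sketch compares specification.quota of infrastructure/storage-share with specification.storage.capacity of configuration/kubernetes-master, both in GiB, assuming both specification sections are already parsed into dictionaries. check_storage_sizes is a hypothetical helper. Following the note above, a capacity larger than the quota is reported as an error and any other mismatch as a warning.

```python
def check_storage_sizes(share_spec: dict, k8s_spec: dict) -> str:
    """Compare the Azure file share quota with the Kubernetes volume capacity (both GiB).

    Returns "disabled", "ok", "warning" (values differ) or "error" (capacity > quota).
    """
    storage = k8s_spec.get("storage", {})
    if not storage.get("enable", False):
        # PV/PVC are only created when specification.storage.enable is true.
        return "disabled"
    quota = int(share_spec["quota"])
    capacity = int(storage["capacity"])
    if capacity > quota:
        return "error"    # the volume would advertise more space than the share allows
    if capacity < quota:
        return "warning"  # allowed, but the values should normally be the same
    return "ok"
```

With the two examples above (quota: 50 and capacity: 50), the result is "ok".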

Additional configuration

You can also use Azure file shares that you created yourself. Check the documentation for details. File shares created this way can be used in different ways; appropriate configuration examples are below.

NOTE

Before applying the configuration, create the storage access secret.

Direct approach

Because LambdaStack always creates a file share when provider: azure is used, a configuration like the following can be used even with specification.storage.enable set to false.

apiVersion: v1
kind: Pod
metadata:
  name: azure1
spec:
  containers:
    - image: busybox
      name: azure
      command: [ "/bin/sh", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      volumeMounts:
        - name: azure
          mountPath: /mnt/azure
  volumes:
    - name: azure
      azureFile:
        secretName: azure-secret
        shareName: k8s
        readOnly: false

Using persistent volumes

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lambdastack-cluster-volume
spec:
  storageClassName: azurefile
  capacity:
    storage: 50Gi
  accessModes:
    - "ReadWriteMany"
  azureFile:
    secretName: azure-secret
    shareName: k8s
    readOnly: false
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: lambdastack-cluster-volume-claim
spec:
  storageClassName: azurefile
  volumeName: lambdastack-cluster-volume
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: azure2
spec:
  containers:
    - image: busybox
      name: azure
      command: [ "/bin/sh", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      volumeMounts:
        - name: azure
          mountPath: /mnt/azure
  volumes:
    - name: azure
      persistentVolumeClaim:
        claimName: lambdastack-cluster-volume-claim

AWS

Infrastructure

Amazon EFS can be configured using the following configuration.

---
kind: infrastructure/efs-storage
provider: aws
name: default
specification:
  encrypted: true
  performance_mode: generalPurpose
  throughput_mode: bursting
  #provisioned_throughput_in_mibps:  # The throughput, measured in MiB/s, that you want to provision for the file system. Only applicable when throughput_mode is set to provisioned

Kubernetes

The configuration for AWS supports an additional parameter, specification.storage.path, that specifies the path on EFS to be accessed by pods. When specification.storage.enable is set to true, a PersistentVolume and a PersistentVolumeClaim are created.

---
kind: configuration/kubernetes-master
name: default
provider: aws
specification:
  storage:
    name: lambdastack-cluster-volume
    path: /
    enable: true
    capacity: 50

Additional configuration

If provider: aws is specified, EFS storage is always created and can be used with persistent volumes created by the user. It is also possible to create a separate EFS and use it. For more information, check the Kubernetes NFS storage documentation. EFS can also be used through the Amazon EFS CSI driver, but this approach is not supported by LambdaStack's AWS provider.

Persistent volume creation example

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lambdastack-cluster-volume
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 100Gi
  mountOptions:
    - hard
    - nfsvers=4.1
    - rsize=1048576
    - wsize=1048576
    - timeo=600
    - retrans=2
  nfs:
    path: /
    server: fs-xxxxxxxx.efs.eu-west-1.amazonaws.com
  storageClassName: defaultfs
  volumeMode: Filesystem

15 - Prerequisites

LambdaStack how-tos - Prerequisites

Run LambdaStack from Docker image

There are two ways to get the image: build it locally yourself or pull it from the LambdaStack Docker registry.

Option 1 - Build LambdaStack image locally

This option also shows how to push the locally built image to Docker Hub.

Install the following dependencies:
- Docker
Open a terminal in the root directory of the LambdaStack source code (it should contain the /cli subdirectory; this is also where the Dockerfile is located) and run one of the commands below. The first option builds the image and applies a specific tag/version; the second option does the same and also applies a 'latest' tag, in case you only want the latest version:

TAG=$(cat version)
docker build --file Dockerfile --tag lambdastack/lambdastack:${TAG} .

TAG=$(cat version)
docker build --file Dockerfile --tag lambdastack/lambdastack:${TAG} --tag lambdastack/lambdastack:latest .

To push the image(s) to the default Docker Hub:
1. Make sure you have a Docker Hub account and are logged in (docker login). If you want more than one repo, create an Organization and add yourself as a member, then select or create the repo name. For example, we use LambdaStack as the organization and lambdastack as a repo (lambdastack/lambdastack). We actually have a number of repos, but you get the point.
2. Push the image(s) to Docker Hub as follows (Note: pushing the 'latest' tag is optional; Docker recognizes that it is the same image and simply creates the latest reference):

TAG=$(cat version)
docker push lambdastack/lambdastack:${TAG}
docker push lambdastack/lambdastack:latest

Option 2a - Pull LambdaStack image from the registry

NOTE: This is the default way. The latest version of LambdaStack is already in Docker Hub, ready to be pulled. If you built the image locally, it is already in your local image store, so there is no need to pull it; you can skip to the docker run step below.

TAG is the specific version tag given to the image. If you don't know the specific version, use the second command, which pulls the latest version.

docker pull lambdastack/lambdastack:TAG

docker pull lambdastack/lambdastack:latest

Check here for the available tags.

Option 2b - Running the LambdaStack image

To run the image:

docker run -it -v LOCAL_DIR:/shared --rm lambdastack/lambdastack:TAG

Where:

LOCAL_DIR should be replaced with the local path to the directory for LambdaStack input (SSH keys, data yaml files) and output (logs, build states),
TAG should be replaced with an existing tag.

Example: docker run -it -v $PWD:/shared --rm lambdastack/lambdastack:latest

The local directory is mounted at /shared inside the LambdaStack container. $PWD means the present working directory, so change to the directory you want to mount before running the command. LambdaStack expects any customized configs, SSH keys or data YAML files to be in that directory. The example above is for Linux-based systems (including macOS). See the Windows method below.

Check here for the available tags.

Let LambdaStack run (it will take a while depending on the options you selected)!

The notes below are only relevant if you run into issues with a corporate proxy or similar, or if you want to do development and add cool new features to LambdaStack :).

LambdaStack development

For setting up the LambdaStack development environment please refer to this dedicated document here.

Important notes

Hostname requirements

LambdaStack supports only hostnames that are valid DNS-1123 subdomains: they must consist of lowercase alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character.
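To check hostnames before deployment, the following minimal Python sketch mirrors the DNS-1123 subdomain rule as Kubernetes validates it: dot-separated labels of lowercase alphanumerics and '-', each starting and ending with an alphanumeric character, with a total length of at most 253 characters. The per-label rule and the length limit come from Kubernetes' validation, not from LambdaStack, and is_dns1123_subdomain is a hypothetical helper.

```python
import re

# One label: lowercase alphanumerics and '-', starting and ending with an alphanumeric.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
# A subdomain is one or more labels separated by dots.
_SUBDOMAIN = re.compile(_LABEL + r"(\." + _LABEL + r")*")
MAX_SUBDOMAIN_LENGTH = 253


def is_dns1123_subdomain(name: str) -> bool:
    """Return True if name is a valid DNS-1123 subdomain."""
    # fullmatch requires the whole string to match, so no ^/$ anchors are needed.
    return len(name) <= MAX_SUBDOMAIN_LENGTH and _SUBDOMAIN.fullmatch(name) is not None
```

For example, "ls-master-vm-0" passes, while "LS-Master" (uppercase) and "vm_1" (underscore) are rejected.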

Note for Windows users

Watch out for the line endings conversion. By default, Git for Windows sets core.autocrlf=true. Mounting such files with Docker results in ^M end-of-line character in the config files. Use: Checkout as-is, commit Unix-style (core.autocrlf=input) or Checkout as-is, commit as-is (core.autocrlf=false). Be sure to use a text editor that can work with Unix line endings (e.g. Notepad++).
Remember to allow Docker Desktop to mount drives in Settings -> Shared Drives

Escape your paths properly:

Powershell example:

docker run -it -v C:\Users\USERNAME\git\LambdaStack:/LambdaStack --rm lambdastack-dev

Git-Bash example:

winpty docker run -it -v C:\\Users\\USERNAME\\git\\LambdaStack:/LambdaStack --rm lambdastack-dev

Mounting NTFS disk folders in a Linux-based image causes permission issues with SSH keys. When running either the development or deploy image:

Copy the SSH keys into the container:

mkdir -p ~/.ssh/lambdastack-operations/
cp /lambdastack/core/ssh/id_rsa* ~/.ssh/lambdastack-operations/

Set the proper permissions on the keys:

chmod 400 ~/.ssh/lambdastack-operations/id_rsa*

Note about proxies

To run LambdaStack behind a proxy, environment variables need to be set.

When running a development container (upper and lowercase are needed because of an issue with the Ansible dependency):

export http_proxy="http://PROXY_SERVER:PORT"
export https_proxy="https://PROXY_SERVER:PORT"
export HTTP_PROXY="http://PROXY_SERVER:PORT"
export HTTPS_PROXY="https://PROXY_SERVER:PORT"

Or when running from a Docker image (upper and lowercase are needed because of an issue with the Ansible dependency):

docker run -it -v POSSIBLE_MOUNTS... -e HTTP_PROXY=http://PROXY_SERVER:PORT -e HTTPS_PROXY=http://PROXY_SERVER:PORT -e http_proxy=http://PROXY_SERVER:PORT -e https_proxy=http://PROXY_SERVER:PORT --rm IMAGE_NAME

Note about custom CA certificates

Some companies use custom CA certificates or CA bundles to provide secure connections. To use these with LambdaStack, do the following:

Devcontainer

Note that for the comments below the filenames of the certificate(s)/bundle do not matter, only the extensions. The certificate(s)/bundle need to be placed here before building the devcontainer.

If you have one CA certificate you can add it here with the crt extension.
If you have multiple certificates in a chain/bundle you need to add them here individually with the crt extension and also add the single bundle with the pem extension containing the same certificates. This is needed because not all tools inside the container accept the single bundle.

LambdaStack release container

If you are running LambdaStack from one of the prebuilt release containers you can do the following to install the certificate(s):

cp ./path/to/*.crt /usr/local/share/ca-certificates/
chmod 644 /usr/local/share/ca-certificates/*.crt
update-ca-certificates

If you plan to deploy on AWS, you also need to add a separate configuration for Boto3, either with a config file or by setting the AWS_CA_BUNDLE environment variable. More information about Boto3 configuration can be found here.

16 - Repository

LambdaStack how-tos - Repository

Repository

Introduction

When installing a cluster, LambdaStack sets up its own internal repository for serving:

OS packages
Files
Docker images

This ONLY applies to airgapped environments (environments without Internet access, such as high-security areas).

This document provides information about the repository lifecycle and how to deal with possible issues that might arise during it.

Repository steps and lifecycle

Below is the lifecycle of the LambdaStack repository:

1. Download requirements (this can be automatic for an online cluster or manual for an airgapped cluster).
2. Set up the LambdaStack repository (create lsrepo and start the HTTP server).
3. For all cluster machines:
- Back up and disable system package repositories
- Enable the LambdaStack repository
4. Install LambdaStack components.
5. For all cluster machines:
- Disable the LambdaStack repository
- Restore original package repositories from the backup
6. Stop the LambdaStack repository (optionally removing data).

Troubleshooting

Downloading requirements progression and logging

Note: This will only cover online clusters

Downloading requirements is one of the most sensitive steps in deploying a new cluster because lots of resources are being downloaded from various sources.

When you see the following output from lambdastack, requirements are being downloaded:

INFO cli.engine.ansible.AnsibleCommand - TASK [repository : Run download-requirements script, this can take a long time
INFO cli.engine.ansible.AnsibleCommand - You can check progress on repository host with: journalctl -f -t download-requirements.sh] ***

As noted, this process can take a long time depending on the connection, and because the requirements are downloaded by a shell script, the Ansible process cannot return any real-time information.

To view the download progress (real-time output from the logs), SSH into the repository machine and run:

journalctl -f -t download-requirements.sh

If download-requirements fails, you can always check the log afterwards on the repository machine here:

/tmp/ls-download-requirements/log

Re-downloading requirements

If for some reason the download requirement step fails and you want to restart, it might be a good idea to delete the following directory first:

/var/www/html/lsrepo

This directory holds all the files being downloaded and removing it makes sure that there are no corrupted or temporary files which might interfere with the restarted download process.

If you want to re-download the requirements but the process previously finished successfully, you might need to remove the following file:

/tmp/ls-download-requirements/download-requirements-done.flag

When this file is present and is not older than a defined amount of time (2 hours by default), re-downloading the requirements is skipped.
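The skip condition can be approximated as follows to decide, before a re-run, whether you need to remove the flag file. This is a minimal Python sketch, not the logic of the actual download-requirements script: it assumes the age of the flag file is measured from its modification time and uses the 2-hour default mentioned above. download_will_be_skipped is a hypothetical helper name.

```python
import os
import time
from typing import Optional

# Flag file path from the section above.
FLAG_FILE = "/tmp/ls-download-requirements/download-requirements-done.flag"


def download_will_be_skipped(flag_path: str = FLAG_FILE,
                             max_age_hours: float = 2.0,
                             now: Optional[float] = None) -> bool:
    """Return True if the flag file exists and is not older than max_age_hours.

    Age is measured from the file's modification time (an assumption).
    """
    if now is None:
        now = time.time()
    try:
        modified = os.path.getmtime(flag_path)
    except FileNotFoundError:
        return False  # no flag file: requirements will be downloaded
    return now - modified <= max_age_hours * 3600
```

If this returns True on the repository host, remove the flag file before re-running to force a fresh download.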

Restoring system repositories

If an issue arises during component installation (e.g. a network issue), the cluster machines might be left in a state where step 5 of the repository lifecycle has not run. This can leave the machines with a broken repository setup, making it impossible to re-run lambdastack apply, as noted in issue #2324.

To restore the original repository setup on a machine, you can execute the following scripts:

# Re-enable system repositories
/tmp/ls-repository-setup-scripts/enable-system-repos.sh
# Disable lsrepo
/tmp/ls-repository-setup-scripts/disable-lsrepo-client.sh

17 - Retention

LambdaStack how-tos - Retention

A LambdaStack cluster has a number of components which log, collect and retain data. To make sure that these do not exceed the usable storage of the machines they are running on, the following configurations are available.

Elasticsearch

TODO

Grafana

TODO

Kafka

There are two types of retention policies that can be configured at the broker or topic levels: based on time or size. LambdaStack defines the same default value for broker size retention policy as Kafka, -1, which means that no size limit is applied.

To define new log retention values, the following configuration can be used:

kind: configuration/kafka
title: "Kafka"
name: default
specification:
    kafka_var:
        partitions: 8
        log_retention_hours: 168
        log_retention_bytes: -1

Configuration parameters

specification.kafka_var.partitions

Sets num.partitions parameter

specification.kafka_var.log_retention_hours

Sets log.retention.hours parameter

specification.kafka_var.log_retention_bytes

Sets log.retention.bytes parameter

NOTE

Since this limit is enforced at the partition level, multiply it by the number of partitions to compute the topic retention in bytes.
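As a worked example of the note above: with partitions: 8 and log_retention_bytes set to 1073741824 (1 GiB), a topic can retain up to 8 * 1 GiB = 8 GiB per replica. The following minimal Python sketch performs that calculation; topic_retention_bytes is a hypothetical helper, and a negative value (the -1 default) is treated as unlimited. Because Kafka deletes whole log segments, actual disk usage can temporarily exceed this figure, and each replica of the topic stores its own copy.

```python
from typing import Optional


def topic_retention_bytes(log_retention_bytes: int, partitions: int) -> Optional[int]:
    """Approximate size limit for one topic, per replica.

    log.retention.bytes is enforced per partition, so the topic-level figure is
    the per-partition limit multiplied by the number of partitions.
    Returns None when the limit is negative (-1, the default), i.e. unlimited.
    """
    if log_retention_bytes < 0:
        return None
    return log_retention_bytes * partitions


GIB = 1024 ** 3
# partitions: 8 with a 1 GiB per-partition limit
print(topic_retention_bytes(1 * GIB, 8) / GIB)  # prints 8.0
```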

Kibana

TODO

Kubernetes

TODO

Prometheus

TODO

Zookeeper

TODO

18 - Security Groups

LambdaStack how-tos - Security Groups

Security Groups

This document describes the Security Groups layout used to deploy LambdaStack in AWS or Azure. You will find the default configuration here, as well as examples of adding your own rules or changing existing ones.

Introduction

By default, the LambdaStack platform creates the security groups required for communication between all components (like PostgreSQL, RabbitMQ, etc.). By default, LambdaStack creates a subnet per component, and each subnet has its own security group with rules that allow communication between them. This enables smooth communication between all components. Please check our security document too. Be aware that whenever you want to add a new rule, you need to copy all default rules from the URL mentioned above. This document is split into two parts, AWS and Azure, because the values used when setting security rules differ between AWS and Azure.

Setting own security groups

Sometimes you need to set additional security rules for an application that you deploy in the LambdaStack Kubernetes cluster. In that case, follow these rules:

Whenever you want to add a new rule, for example to open port "X", COPY all current rules into your deployment .yaml file and add the new rule at the end.
Each component has its own rule set, so be very careful where you put them.
After copying, you can also modify the existing default security groups.
After adding new rules and completing the infrastructure part (Terraform), you can check in the Terraform build directory whether the files contain your port definition.
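The copy-then-append rule above can be illustrated with a short Python sketch. It is not part of LambdaStack. merge_rules is a hypothetical helper that treats rules as plain dictionaries with the same keys as the YAML files. It keeps every default rule in its original order, appends your own rules, and refuses a rule name that is already in use.

```python
import copy
from typing import Dict, List


def merge_rules(default_rules: List[Dict], extra_rules: List[Dict]) -> List[Dict]:
    """Return all default rules (unchanged, in order) followed by extra_rules.

    Raises ValueError when an extra rule reuses an existing rule name, which
    usually means a default rule was pasted twice or a name was mistyped.
    """
    merged = copy.deepcopy(default_rules)  # never mutate the caller's defaults
    names = {rule["name"] for rule in merged}
    for rule in extra_rules:
        if rule["name"] in names:
            raise ValueError(f"duplicate rule name: {rule['name']}")
        merged.append(copy.deepcopy(rule))
        names.add(rule["name"])
    return merged


if __name__ == "__main__":
    defaults = [{"name": "ssh", "destination_port_range": "22"},
                {"name": "node_exporter", "destination_port_range": "9100"}]
    nrpe = {"name": "nrpe-agent-port", "destination_port_range": "5666"}
    for rule in merge_rules(defaults, [nrpe]):
        print(rule["name"])
```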

Security groups diagram

Check the security diagram below, which shows how security groups relate to other components. It is an example of the AWS architecture, but the Azure layout should be almost the same.

Azure Security groups

All security groups and related services in Azure are listed here.

Rules description:

- name:                       "Name of the rule"
  description:                "Short rule description"
  priority:                   "Priority (number); rules with lower numbers are evaluated first"
  direction:                  "Inbound | Outbound - direction of traffic the rule applies to"
  access:                     "Allow | Deny - whether to grant or block access"
  protocol:                   "Tcp | Udp | * - protocol used for connections"
  source_port_range:          "Source port range"
  destination_port_range:     "Destination port or port range"
  source_address_prefix:      "Source network address"
  destination_address_prefix: "Destination network address"

Let's look at an example in which we set a new rule named "nrpe-agent-port" with priority 250. The rule allows access from the local network "10.1.4.0/24" to all hosts in our network on port 5666.

The rule:

     - name: nrpe-agent-port
       description: Allow access all hosts on port 5666 where nagios agent is running.
       priority: 250
       direction: Inbound
       access: Allow
       protocol: Tcp
       source_port_range: "*"
       destination_port_range: "5666"
       source_address_prefix: "10.1.4.0/24"
       destination_address_prefix: "0.0.0.0/0"
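The following Python sketch shows which clients the rule above admits. It checks a source address and destination port against the rule using the standard ipaddress module. rule_admits is a hypothetical helper, and it only handles a single destination port, not ranges or "*".

```python
import ipaddress

NRPE_RULE = {
    "destination_port_range": "5666",
    "source_address_prefix": "10.1.4.0/24",
}


def rule_admits(rule: dict, source_ip: str, dest_port: int) -> bool:
    """True if source_ip lies in the rule's source prefix and dest_port
    equals the rule's single destination port."""
    source_net = ipaddress.ip_network(rule["source_address_prefix"])
    in_source = ipaddress.ip_address(source_ip) in source_net
    return in_source and str(dest_port) == rule["destination_port_range"]


if __name__ == "__main__":
    print(rule_admits(NRPE_RULE, "10.1.4.20", 5666))  # True: monitoring subnet
    print(rule_admits(NRPE_RULE, "10.1.2.20", 5666))  # False: outside 10.1.4.0/24
    print(rule_admits(NRPE_RULE, "10.1.4.20", 22))    # False: different port
```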

Azure Security groups full yaml file

To deploy the rule above, you need a complete YAML configuration file. The example below shows how this file should look. It is a simple LambdaStack setup in Azure with 2 nodes and 1 master VM.

kind: lambdastack-cluster
name: default
provider: azure
title: LambdaStack Cluster Config
build_path: # Dynamically built
specification:
  name: azure
  prefix: azure
  admin_user:
    name: operations
    key_path:  id_rsa
    path: # Dynamically built
  cloud:
    region: East US
    subscription_name: PUT_SUBSCRIPTION_NAME_HERE
    use_public_ips: true
    use_service_principal: true
    network:
      use_network_security_groups: true
  components:
    kafka:
      count: 0
    kubernetes_master:
      count: 1
      machine: kubernetes-master-machine
      configuration: default
    kubernetes_node:
      count: 2
    load_balancer:
      count: 0
    logging:
      count: 0
    monitoring:
      count: 0
    postgresql:
      count: 0
    rabbitmq:
      count: 0
---
kind: infrastructure/virtual-machine
title: "Virtual Machine Infra"
provider: azure
name: kubernetes-master-machine
specification:
  size: Standard_DS3_v2
  security:
    rules:
      - name: ssh
        description: Allow SSH
        priority: 100
        direction: Inbound
        access: Allow
        protocol: Tcp
        source_port_range: "*"
        destination_port_range: "22"
        source_address_prefix: "0.0.0.0/0"
        destination_address_prefix: "0.0.0.0/0"
      - name: out
        description: Allow out
        priority: 101
        direction: Outbound
        access: Allow
        protocol: "*"
        source_port_range: "*"
        destination_port_range: "0"
        source_address_prefix: "0.0.0.0/0"
        destination_address_prefix: "0.0.0.0/0"
      - name: node_exporter
        description: Allow node_exporter traffic
        priority: 200
        direction: Inbound
        access: Allow
        protocol: Tcp
        source_port_range: "*"
        destination_port_range: "9100"
        source_address_prefix: "10.1.0.0/20"
        destination_address_prefix: "0.0.0.0/0"
      - name: subnet-traffic
        description: Allow subnet traffic
        priority: 201
        direction: Inbound
        access: Allow
        protocol: "*"
        source_port_range: "*"
        destination_from_port: 0
        destination_to_port: 65536
        destination_port_range: "0"
        source_address_prefix: "10.1.1.0/24"
        destination_address_prefix: "0.0.0.0/0"
      - name: monitoring-traffic
        description: Allow monitoring subnet traffic
        priority: 203
        direction: Inbound
        access: Allow
        protocol: "*"
        source_port_range: "*"
        destination_from_port: 0
        destination_to_port: 65536
        destination_port_range: "0"
        source_address_prefix: "10.1.4.0/24"
        destination_address_prefix: "0.0.0.0/0"
      - name: node-subnet-traffic
        description: Allow node subnet traffic
        priority: 204
        direction: Inbound
        access: Allow
        protocol: "*"
        source_port_range: "*"
        destination_from_port: 0
        destination_to_port: 65536
        destination_port_range: "0"
        source_address_prefix: "10.1.2.0/24"
        destination_address_prefix: "0.0.0.0/0"
      - name: package_repository
        description: Allow package repository traffic
        priority: 205
        direction: Inbound
        access: Allow
        protocol: Tcp
        source_port_range: "*"
        destination_port_range: "80"
        source_address_prefix: "10.1.0.0/20"
        destination_address_prefix: "0.0.0.0/0"
      - name: image_repository
        description: Allow image repository traffic
        priority: 206
        direction: Inbound
        access: Allow
        protocol: Tcp
        source_port_range: "*"
        destination_port_range: "5000"
        source_address_prefix: "10.1.0.0/20"
        destination_address_prefix: "0.0.0.0/0"
      # Add NRPE AGENT RULE
      - name: nrpe-agent-port
        description: Allow access all hosts on port 5666 where nagios agent is running.
        priority: 250
        direction: Inbound
        access: Allow
        protocol: Tcp
        source_port_range: "*"
        destination_port_range: "5666"
        source_address_prefix: "10.1.4.0/24"
        destination_address_prefix: "0.0.0.0/0"
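A typo in a key name, such as writing estination_address_prefix instead of destination_address_prefix, is easy to miss in a file this long. The following Python sketch is a hypothetical pre-flight check, not a LambdaStack feature, for rules loaded as dictionaries. It reports keys missing from the Azure rule description above. It also flags priorities outside 100-4096 and priorities shared by two rules with the same direction, which Azure network security groups do not accept. Unknown keys are not flagged, because the default rules also use destination_from_port and destination_to_port.

```python
from typing import Dict, List

REQUIRED_KEYS = (
    "name", "description", "priority", "direction", "access", "protocol",
    "source_port_range", "destination_port_range",
    "source_address_prefix", "destination_address_prefix",
)


def lint_azure_rules(rules: List[Dict]) -> List[str]:
    """Return human-readable problems found in a list of Azure rules."""
    problems = []
    seen = {}  # (direction, priority) -> name of the first rule using it
    for rule in rules:
        name = rule.get("name", "<unnamed>")
        for key in REQUIRED_KEYS:
            if key not in rule:
                problems.append(f"{name}: missing {key}")
        priority = rule.get("priority")
        if isinstance(priority, int):
            if not 100 <= priority <= 4096:
                problems.append(f"{name}: priority {priority} is outside 100-4096")
            slot = (rule.get("direction"), priority)
            if slot in seen:
                problems.append(f"{name}: same priority and direction as {seen[slot]}")
            else:
                seen[slot] = name
    return problems


if __name__ == "__main__":
    nrpe = {
        "name": "nrpe-agent-port", "description": "NRPE", "priority": 250,
        "direction": "Inbound", "access": "Allow", "protocol": "Tcp",
        "source_port_range": "*", "destination_port_range": "5666",
        "source_address_prefix": "10.1.4.0/24",
        "estination_address_prefix": "0.0.0.0/0",  # deliberate typo
    }
    print(lint_azure_rules([nrpe]))
    # ['nrpe-agent-port: missing destination_address_prefix']
```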

AWS Security groups

All security groups and related services in AWS are listed here.

Rules description:

- name:                       "Name of the rule"
  description:                "Short rule description"
  direction:                  "Inbound | Egress - direction of traffic the rule applies to"
  protocol:                   "Tcp | Udp | ALL - protocol used for connections"
  destination_port_range:     "Destination port or port range"
  source_address_prefix:      "Source network address"
  destination_address_prefix: "Destination network address"

Let's look at an example in which we set a new rule named "nrpe-agent-port". The rule allows access from the local network "10.1.4.0/24" to all hosts in our network on port 5666.

The rule:

     - name: nrpe-agent-port
       description: Allow access all hosts on port 5666 where nagios agent is running.
       direction: Inbound
       protocol: Tcp
       destination_port_range: "5666"
       source_address_prefix: "10.1.4.0/24"
       destination_address_prefix: "0.0.0.0/0"
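The AWS rule format has fewer keys than the Azure one, so a rule copied from an Azure file needs editing before it works in AWS. The following Python sketch is a hypothetical helper, not part of LambdaStack. It compares a rule against the AWS key list above and reports three kinds of problem: missing keys, keys the AWS format does not list (such as priority, access and source_port_range), and a direction other than Inbound or Egress.

```python
from typing import Dict, List

AWS_KEYS = {
    "name", "description", "direction", "protocol",
    "destination_port_range", "source_address_prefix",
    "destination_address_prefix",
}
AWS_DIRECTIONS = {"Inbound", "Egress"}


def check_aws_rule(rule: Dict) -> List[str]:
    """Return problems that keep the rule from matching the AWS rule format."""
    problems = []
    missing = AWS_KEYS.difference(rule)
    unexpected = set(rule) - AWS_KEYS
    if missing:
        problems.append("missing: " + ", ".join(sorted(missing)))
    if unexpected:
        problems.append("not in the AWS rule format: " + ", ".join(sorted(unexpected)))
    if rule.get("direction") not in AWS_DIRECTIONS:
        problems.append(f"direction must be Inbound or Egress, got {rule.get('direction')!r}")
    return problems


if __name__ == "__main__":
    azure_style = {
        "name": "nrpe-agent-port", "description": "NRPE", "priority": 250,
        "direction": "Inbound", "access": "Allow", "protocol": "Tcp",
        "source_port_range": "*", "destination_port_range": "5666",
        "source_address_prefix": "10.1.4.0/24",
        "destination_address_prefix": "0.0.0.0/0",
    }
    print(check_aws_rule(azure_style))
    # ['not in the AWS rule format: access, priority, source_port_range']
```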

AWS Security groups full yaml file

The example below shows how to set up a basic LambdaStack cluster in AWS with 1 master, 2 nodes and the mandatory repository machine. It also opens access to all hosts on port 5666 from the monitoring network.

kind: lambdastack-cluster
name: default
provider: aws
build_path: # Dynamically built
specification:
  admin_user:
    name: ubuntu
    key_path: id_rsa
    path: # Dynamically built
  cloud:
    region: eu-central-1
    credentials:
      key: YOUR_AWS_KEY
      secret: YOUR_AWS_SECRET
    use_public_ips: true
  components:
    repository:
      count: 1
      machine: repository-machine
      configuration: default
      subnets:
      - availability_zone: eu-central-1a
        address_pool: 10.1.11.0/24
    kubernetes_master:
      count: 1
      machine: kubernetes-master-machine
      configuration: default
      subnets:
      - availability_zone: eu-central-1a
        address_pool: 10.1.1.0/24
      - availability_zone: eu-central-1b
        address_pool: 10.1.2.0/24
    kubernetes_node:
      count: 2
      machine: kubernetes-node-machine
      configuration: default
      subnets:
      - availability_zone: eu-central-1a
        address_pool: 10.1.1.0/24
      - availability_zone: eu-central-1b
        address_pool: 10.1.2.0/24
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 0
    load_balancer:
      count: 0
    rabbitmq:
      count: 0
    ignite:
      count: 0
    opendistro_for_elasticsearch:
      count: 0
    single_machine:
      count: 0
  name: testing
  prefix: 'aws-machine'
title: LambdaStack Cluster Config
---
kind: infrastructure/virtual-machine
title: "Virtual Machine Infra"
provider: aws
name: kubernetes-master-machine
specification:
  size: t3.medium
  authorized_to_efs: true
  mount_efs: true
  security:
    rules:
     - name: ssh
       description: Allow ssh traffic
       direction: Inbound
       protocol: Tcp
       destination_port_range: "22"
       source_address_prefix: "0.0.0.0/0"
       destination_address_prefix: "0.0.0.0/0"
     - name: node_exporter
       description: Allow node_exporter traffic
       direction: Inbound
       protocol: Tcp
       destination_port_range: "9100"
       source_address_prefix: "10.1.0.0/20"
       destination_address_prefix: "0.0.0.0/0"
     - name: subnet-traffic
       description: Allow master subnet traffic
       direction: Inbound
       protocol: ALL
       destination_port_range: "0"
       source_address_prefix: "10.1.1.0/24"
       destination_address_prefix: "0.0.0.0/0"
     - name: monitoring-traffic
       description: Allow monitoring subnet traffic
       direction: Inbound
       protocol: ALL
       destination_port_range: "0"
       source_address_prefix: "10.1.4.0/24"
       destination_address_prefix: "0.0.0.0/0"
     - name: node-subnet-traffic
       description: Allow node subnet traffic
       direction: Inbound
       protocol: ALL
       destination_port_range: "0"
       source_address_prefix: "10.1.2.0/24"
       destination_address_prefix: "0.0.0.0/0"
     - name: out
       description: Allow out
       direction: Egress
       protocol: "all"
       destination_port_range: "0"
       source_address_prefix: "0.0.0.0/0"
       destination_address_prefix: "0.0.0.0/0"
     # New Rule
     - name: nrpe-agent-port
       description: Allow access all hosts on port 5666 where nagios agent is running.
       direction: Inbound
       protocol: Tcp
       destination_port_range: "5666"
       source_address_prefix: "10.1.4.0/24"
       destination_address_prefix: "0.0.0.0/0"
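The following Python sketch checks which of the kubernetes-master-machine inbound rules above accept traffic from each subnet defined in this file. It compares address prefixes using the standard ipaddress module. The prefixes are copied from the example by hand, and the sketch looks only at source networks, ignoring ports and protocols.

```python
import ipaddress
from typing import Dict, List

# Source prefixes of the inbound rules of kubernetes-master-machine above
RULE_SOURCES = {
    "ssh": "0.0.0.0/0",
    "node_exporter": "10.1.0.0/20",
    "subnet-traffic": "10.1.1.0/24",
    "monitoring-traffic": "10.1.4.0/24",
    "node-subnet-traffic": "10.1.2.0/24",
    "nrpe-agent-port": "10.1.4.0/24",
}


def rules_covering(subnet: str, rule_sources: Dict[str, str]) -> List[str]:
    """Names of rules whose source prefix fully contains the given subnet."""
    net = ipaddress.ip_network(subnet)
    return sorted(
        name for name, prefix in rule_sources.items()
        if net.subnet_of(ipaddress.ip_network(prefix))
    )


if __name__ == "__main__":
    # address_pool values from the components section above
    for pool in ("10.1.11.0/24", "10.1.1.0/24", "10.1.2.0/24"):
        print(pool, rules_covering(pool, RULE_SOURCES))
```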

19 - Security

LambdaStack how-tos - Security

How to enable/disable LambdaStack service user

To enable or disable the LambdaStack service user, you can use a tool from our repository: the playbook service-user-disable-enable.yml in the tools/service_user_disable_enable directory.

To use it, you need Ansible installed on the machine from which you run it.

To disable the user, run:

ansible-playbook -i inventory --extra-vars "operation=disable name=your_service_user_name" service-user-disable-enable.yml

To enable the user, run:

ansible-playbook -i inventory --extra-vars "operation=enable name=your_service_user_name" service-user-disable-enable.yml

How to add/remove additional users to/from OS

To add or remove users, add an additional section to the kind: lambdastack-cluster configuration.

Add specification.users in a format similar to the example below:

kind: lambdastack-cluster
name: pg-aws-deb
provider: aws
build_path: '' # Dynamically built
specification:

  ...

  users:
    - name: user01 # name of the user
      sudo: true # whether the user has sudo privileges; if not defined, it is set to false
      state: present # the user will be added if it does not exist
      public_key: "ssh-rsa ..." # public key to add to .ssh/authorized_keys
    - name: user02
      state: absent # the user will be deleted if it exists
      public_key: "ssh-rsa ..."
    - name: user03
      state: present
      public_key: "ssh-rsa ..."

  ...
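The state and sudo semantics described in the comments can be summarised with a small Python sketch. It does not reproduce how LambdaStack applies the changes on the hosts. plan_user_changes is a hypothetical helper that takes the users already on a machine and works out which users would be added or removed. It also reports the sudo setting each present user ends up with (false when sudo is not defined).

```python
from typing import Dict, List, Set, Tuple


def plan_user_changes(existing: Set[str],
                      users: List[Dict]) -> Tuple[List[str], List[str], Dict[str, bool]]:
    """Return (to_add, to_remove, sudo_by_user) for a specification.users list."""
    to_add: List[str] = []
    to_remove: List[str] = []
    sudo_by_user: Dict[str, bool] = {}
    for user in users:
        name, state = user["name"], user["state"]
        if state == "present":
            sudo_by_user[name] = bool(user.get("sudo", False))  # undefined -> false
            if name not in existing:
                to_add.append(name)
        elif state == "absent":
            if name in existing:
                to_remove.append(name)
        else:
            raise ValueError(f"{name}: unknown state {state!r}")
    return to_add, to_remove, sudo_by_user


if __name__ == "__main__":
    spec = [
        {"name": "user01", "sudo": True, "state": "present"},
        {"name": "user02", "state": "absent"},
        {"name": "user03", "state": "present"},
    ]
    print(plan_user_changes({"user02", "user03"}, spec))
```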

How to use TLS/SSL certificate with HA Proxy

TODO

How to use TLS/SSL with Kafka

Currently LambdaStack supports only self-signed certificates, generated automatically and signed by a self-signed CA certificate. If you want to provide your own certificates, you need to configure Kafka manually according to the Kafka documentation.

To use LambdaStack automation and self-signed certificates, you need to provide your own configuration for the kafka role and enable TLS/SSL, as it is disabled by default.

To enable TLS/SSL communication in Kafka, add your own Kafka configuration to your LambdaStack configuration file and set the enabled flag to true in the security/ssl section.

If you also set the client_auth parameter in the ssl section to required, you must also configure authentication and authorization, because this setting enforces identity checks. This option is set to required by default. The values requested and none do not require such setup.

When TLS/SSL is turned on, all communication with Kafka is encrypted and no other communication option is enabled. If you need a different configuration, you need to configure Kafka manually.

When the CA certificate and key are created on the server, they are also downloaded to the host from which LambdaStack was executed. By default LambdaStack downloads them to the ansible/kafka_certs directory in the build output folder. You can change this path in the LambdaStack configuration.

Sample configuration for Kafka with enabled TLS/SSL:

kind: configuration/kafka
title: "Kafka"
name: default
specification:

    ...

    security:
      ssl:
        enabled: True
        port: 9093 # port on which Kafka will listen for encrypted communication
        server:
          local_cert_download_path: kafka-certs # path where generated key and certificate will be downloaded
          keystore_location: /var/private/ssl/kafka.server.keystore.jks # location of keystore on servers
          truststore_location: /var/private/ssl/kafka.server.truststore.jks # location of truststore on servers
          cert_validity: 365 # period of time when certificates are valid
          passwords: # in this section you can define passwords to keystore, truststore and key
            keystore: PasswordToChange
            truststore: PasswordToChange
            key: PasswordToChange

        endpoint_identification_algorithm: HTTPS # this parameter enforces validating of hostname in certificate
        client_auth: required # authentication mode for Kafka - options are: none (no authentication), requested (optional authentication), required (enforce authentication, you need to setup also authentication and authorization then)
      inter_broker_protocol: SSL # must be set to SSL if TLS/SSL is enabled

    ...
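The dependencies described above can be checked before running lambdastack apply: SSL as the inter-broker protocol, and authentication when client_auth is required. check_kafka_security below is a hypothetical Python helper, not part of LambdaStack. It expects the security section as a dictionary. It assumes that an authentication subsection with an enabled flag, described in the next section, is what satisfies client_auth: required.

```python
from typing import Dict, List

VALID_CLIENT_AUTH = {"none", "requested", "required"}


def check_kafka_security(security: Dict) -> List[str]:
    """Return problems in a configuration/kafka specification.security section."""
    problems: List[str] = []
    ssl = security.get("ssl", {})
    if not ssl.get("enabled", False):
        return problems  # TLS/SSL is disabled by default; nothing to check
    if security.get("inter_broker_protocol") != "SSL":
        problems.append("inter_broker_protocol must be SSL when TLS/SSL is enabled")
    client_auth = ssl.get("client_auth", "required")  # documented default
    if client_auth not in VALID_CLIENT_AUTH:
        problems.append(f"client_auth must be none, requested or required; got {client_auth!r}")
    elif client_auth == "required" and not security.get("authentication", {}).get("enabled", False):
        problems.append("client_auth: required also needs authentication and authorization to be configured")
    return problems


if __name__ == "__main__":
    security = {"ssl": {"enabled": True, "client_auth": "required"},
                "inter_broker_protocol": "SSL"}
    print(check_kafka_security(security))
```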

How to use TLS/SSL certificates for Kafka authentication

To configure Kafka authentication with TLS/SSL, first configure the ssl section. Then add an authentication section with the enabled flag set to true and authentication_method set to certificates. Setting authentication_method to sasl is not described in this document yet.

kind: configuration/kafka
title: "Kafka"
name: default
build_path: '' # Dynamically built
specification:

    ...

    security:

      ...

      authentication:
        enabled: True
        authentication_method: certificates

    ...

How to use TLS/SSL certificates for Kafka authorization

To configure Kafka authorization with TLS/SSL, first configure the ssl and authentication sections. If authentication is disabled, authorization will be disabled as well.

To enable authorization, provide an authorization section and set the enabled flag to True.

For authorization you can also provide an authorizer_class_name other than the default. By default kafka.security.auth.SimpleAclAuthorizer is used.

If the allow_everyone_if_no_acl_found parameter is set to False, Kafka denies access to resources to everyone except super users and users who have been granted permissions for the topic.

You can also provide super users that will be added to the Kafka configuration. To do this, provide a list of users, as in the example below. Then generate each certificate yourself with only a CN that matches an entry in the list (do not set OU, DC or any other parameters). The certificate then needs to be signed by the Kafka CA root certificate. LambdaStack downloads the CA root certificate automatically to the location set in ssl.server.local_cert_download_path. You can also find it on the first Kafka host in the ssl.server.keystore_location directory. To access the certificate key, you need root privileges.

kind: configuration/kafka
title: "Kafka"
name: default
build_path: '' # Dynamically built
specification:

    ...

    security:

      ...

      authorization:
        enabled: True
        authorizer_class_name: kafka.security.auth.SimpleAclAuthorizer
        allow_everyone_if_no_acl_found: False
        super_users:
          - tester01
          - tester02
    ...
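In Kafka's own server.properties, super users are listed in the super.users property as semicolon-separated principals. With TLS/SSL client authentication, the principal name is taken from the certificate's distinguished name. That is why the certificates above should carry only a CN: the principal is then simply CN=<name>. The Python sketch below shows the resulting value. It also shows the rule that authorization only takes effect when authentication is enabled. How LambdaStack actually renders super.users is an assumption here.

```python
from typing import Dict, Iterable


def super_users_property(names: Iterable[str]) -> str:
    """Build a super.users value, assuming certificates that carry only a CN."""
    return ";".join(f"User:CN={name}" for name in names)


def authorization_active(security: Dict) -> bool:
    """Authorization is disabled whenever authentication is disabled."""
    authentication = security.get("authentication", {}).get("enabled", False)
    authorization = security.get("authorization", {}).get("enabled", False)
    return bool(authentication and authorization)


if __name__ == "__main__":
    print(super_users_property(["tester01", "tester02"]))
    # User:CN=tester01;User:CN=tester02
```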

How to enable Azure disk encryption

Automatic encryption of storage on Azure is not yet supported by LambdaStack. Guides for manual encryption can be found:

Here for VM storage.
Here for storage shares.

How to use TLS/SSL certificate with RabbitMQ

To configure RabbitMQ TLS support in LambdaStack, set custom_configurations in the configuration file. You also need to manually create certificates with a common CA on your RabbitMQ machines, according to the documentation:

https://www.rabbitmq.com/ssl.html#manual-certificate-generation

or:

https://www.rabbitmq.com/ssl.html#automated-certificate-generation

If the stop_service parameter in configuration/rabbitmq is set to true, RabbitMQ will be installed and then stopped to allow the manual actions required to copy or generate TLS certificates.

NOTE

To complete the installation, run lambdastack apply a second time with stop_service set to false.

The custom_configurations setting in LambdaStack extends the RabbitMQ configuration with your own settings. It can also be used to configure TLS for RabbitMQ. To customize the RabbitMQ configuration, pass a list of parameters in the following format:

- name: rabbitmq.configuration.parameter
  value: rabbitmq.configuration.value

These settings map to the RabbitMQ TLS configuration parameters described in the documentation: https://www.rabbitmq.com/ssl.html

Below you can find an example of TLS/SSL configuration.

kind: configuration/rabbitmq
title: "RabbitMQ"
name: default
build_path: '' # Dynamically built
specification:

  ...

  custom_configurations: 
    - name: listeners.tcp # option that disables non-TLS/SSL support
      value: none
    - name: listeners.ssl.default # port on which TLS/SSL RabbitMQ will be listening for connections
      value: 5671
    - name: ssl_options.cacertfile # file with certificate of CA which should sign all certificates
      value: /var/private/ssl/ca/ca_certificate.pem
    - name: ssl_options.certfile # file with certificate of the server that should be signed by CA
      value: /var/private/ssl/server/server_certificate.pem
    - name: ssl_options.keyfile # file with key to the certificate of the server
      value: /var/private/ssl/server/private_key.pem
    - name: ssl_options.password # password to key protecting server certificate
      value: PasswordToChange
    - name: ssl_options.verify # setting of peer verification
      value: verify_peer
    - name: ssl_options.fail_if_no_peer_cert # parameter that configures behaviour if the peer cannot present a certificate
      value: "false"

  ...

Be careful with boolean values: they must be double-quoted and written in lowercase. Otherwise RabbitMQ startup will fail.
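Each custom_configurations entry corresponds to one key = value line in RabbitMQ's rabbitmq.conf format. The Python sketch below renders the list in that form and rejects values that would not come out as lowercase true or false. It rests on an assumption not confirmed by this document: that an unquoted YAML true or false is read as a boolean and may be written out as True or False. That is one way the startup failure mentioned above could happen. render_rabbitmq_conf is a hypothetical helper, not part of LambdaStack.

```python
from typing import Dict, List


def render_rabbitmq_conf(custom_configurations: List[Dict]) -> str:
    """Render name/value pairs as rabbitmq.conf lines ("key = value")."""
    lines = []
    for item in custom_configurations:
        name, value = item["name"], item["value"]
        if isinstance(value, bool):
            # An unquoted YAML boolean could be rendered as True/False
            raise TypeError(f'{name}: quote booleans as lowercase strings, e.g. "false"')
        text = str(value)
        if text.lower() in ("true", "false") and text != text.lower():
            raise ValueError(f"{name}: boolean values must be lowercase, got {text!r}")
        lines.append(f"{name} = {text}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_rabbitmq_conf([
        {"name": "listeners.tcp", "value": "none"},
        {"name": "listeners.ssl.default", "value": 5671},
        {"name": "ssl_options.fail_if_no_peer_cert", "value": "false"},
    ]), end="")
```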

How to enable AWS disk encryption

EC2 Root volumes

Encryption at rest for EC2 root volumes is turned on by default. To change this, modify the encrypted flag for the root disk inside an infrastructure/virtual-machine document:

...
disks:
  root:
    volume_type: gp2
    volume_size: 30
    delete_on_termination: true
    encrypted: true
...

Additional EC2 volumes

Encryption at rest for additional EC2 volumes is turned on by default. To change this, modify the encrypted flag for each entry in additional_disks inside an infrastructure/virtual-machine document:

...
disks:
  root:
  ...
  additional_disks:
    - device_name: "/dev/sdb"
      volume_type: gp2
      volume_size: 60
      delete_on_termination: true
      encrypted: true
...
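To spot volumes where encryption was switched off, you can walk the disks section of a virtual-machine document. The sketch below is a hypothetical Python helper that takes the disks section as a dictionary. It assumes that a missing encrypted flag means the default described above (encrypted) applies.

```python
from typing import Dict, List


def unencrypted_volumes(disks: Dict) -> List[str]:
    """List volumes whose encrypted flag is explicitly false.

    Assumption: a missing flag means the default (encrypted) applies.
    """
    found: List[str] = []
    root = disks.get("root") or {}  # "root:" with no keys parses as None
    if root.get("encrypted", True) is False:
        found.append("root")
    for disk in disks.get("additional_disks") or []:
        if disk.get("encrypted", True) is False:
            found.append(disk.get("device_name", "<unnamed>"))
    return found


if __name__ == "__main__":
    disks = {
        "root": {"volume_type": "gp2", "volume_size": 30, "encrypted": True},
        "additional_disks": [
            {"device_name": "/dev/sdb", "volume_size": 60, "encrypted": False},
        ],
    }
    print(unencrypted_volumes(disks))  # ['/dev/sdb']
```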

EFS storage

Encryption at rest for EFS storage is turned on by default. To change this, modify the encrypted flag inside the infrastructure/efs-storage document:

kind: infrastructure/efs-storage
title: "Elastic File System Config"
provider: aws
name: default
build_path: '' # Dynamically built
specification:
  encrypted: true
...

Additional information can be found here.

How to use Kubernetes Secrets

Prerequisites: LambdaStack Kubernetes cluster

1. SSH into the Kubernetes master.
2. Run echo -n 'admin' > ./username.txt, echo -n 'VeryStrongPassword!!1' > ./password.txt and kubectl create secret generic mysecret --from-file=./username.txt --from-file=./password.txt
3. Copy the secrets-sample.yaml file from the example folder and apply it with kubectl apply -f secrets-sample.yaml
4. Run kubectl get pods, copy the name of one of the ubuntu pods and run kubectl exec -it POD_NAME -- /bin/bash with it.
5. In the pod's bash, run printenv | grep SECRET. The Kubernetes secret created in step 2 was attached to the pods during creation (take a look at secrets-sample.yaml) and is available inside them as environment variables.
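Kubernetes stores Secret values base64-encoded. kubectl create secret generic --from-file uses each file's name as the key and the file's exact contents as the value. That is why step 2 uses echo -n: without it, the trailing newline becomes part of the secret. The Python sketch below shows the difference in the stored values.

```python
import base64


def secret_data_value(raw: bytes) -> str:
    """Base64 text that Kubernetes keeps in a Secret's data field for raw."""
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    print(secret_data_value(b"admin"))    # YWRtaW4=  (echo -n 'admin')
    print(secret_data_value(b"admin\n"))  # YWRtaW4K  (echo 'admin', newline kept)
```

You can compare these values with the data section of kubectl get secret mysecret -o yaml.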

How to authenticate to Azure AD app

1. Register your application. In the Azure portal, go to Azure Active Directory => App registrations tab.
2. Click the New application registration button, fill in the data and confirm.
3. Deploy the app from examples/dotnet/LambdaStack.SampleApps/LambdaStack.SampleApps.AuthService.

   This is a test service for verifying Azure AD authentication of a registered app. (How to deploy app)
4. Create a secret key in your app's settings => keys. Remember to copy the value of the key after creation.
5. Try to authenticate (e.g. using Postman) by calling the service API <service-url>/api/auth/ with the following application/json body parameters:
```
{
  "TenantId": "<tenant-id>",
  "ClientId": "<client-id>",
  "Resource": "https://graph.windows.net/",
  "ClientSecret": "<client-secret>"
}
```
- TenantId - Directory ID, which you can find in Azure Active Directory => Properties tab.
- ClientId - Application ID, which you can find in the details of the previously registered app in Azure Active Directory => App registrations => your app.
- Resource - https://graph.windows.net is the service root of the Azure AD Graph API. The Azure Active Directory (AD) Graph API provides programmatic access to Azure AD through OData REST API endpoints. You can construct your own Graph API URL. (How to construct a Graph API URL)
- ClientSecret - The secret key created in step 4.
The service should return an access token.
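If you prefer a script to Postman, the request from step 5 can be built with Python's standard library. This is a sketch. The service URL is a placeholder, and sending the body with POST is an assumption, since the method is not stated above.

```python
import json
import urllib.request


def build_auth_request(service_url: str, tenant_id: str, client_id: str,
                       client_secret: str) -> urllib.request.Request:
    """Prepare (but do not send) the call to <service-url>/api/auth/."""
    body = {
        "TenantId": tenant_id,          # Directory ID
        "ClientId": client_id,          # Application ID
        "Resource": "https://graph.windows.net/",
        "ClientSecret": client_secret,  # secret key created in step 4
    }
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/api/auth/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the HTTP method is not stated in this guide
    )


if __name__ == "__main__":
    request = build_auth_request("https://service-url.example", "<tenant-id>",
                                 "<client-id>", "<client-secret>")
    print(request.get_method(), request.full_url)
```

To actually send it, pass the request to urllib.request.urlopen and read the response body, which should contain the access token.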

How to run lambdastack with password

LambdaStack encrypts Kubernetes artifacts (access tokens) stored in the LambdaStack build directory. To do this, it asks the user for a password, which is used to encrypt and decrypt the artifacts. Remember to enter the same password for the same cluster: if the password is different, lambdastack will not be able to decrypt the secrets.

The standard way of executing lambdastack has not changed:

lambdastack apply -f demo.yaml

But you will be asked to enter a password:

Provide password to encrypt vault:

When running lambdastack from a CI pipeline, you can pass the password with the --vault-password parameter:

lambdastack apply -f build/demo/demo.yaml --vault-password MYPWD

How to make kubectl work for non-root user on master node

For security reasons, the access to the admin credentials is limited to the root user. To make a non-root user the cluster administrator, run these commands (as the non-root user):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

See more options in Troubleshooting

How to turn on Hashicorp Vault functionality

In LambdaStack, besides storing secrets in Kubernetes secrets, you can also use secrets stored in HashiCorp Vault. This can provide a much more sophisticated way of using secrets and a higher level of security than the standard Kubernetes secrets implementation. LambdaStack also provides a transparent method for applications running on Kubernetes to access HashiCorp Vault secrets. You can read more about it in the How to turn on Hashicorp Vault integration with k8s section. In the future we also want to provide additional features that currently can be configured manually according to the HashiCorp Vault documentation.

At the moment only installation on the Kubernetes control plane is supported, but we are also planning a separate installation with no other components. We also do not currently provide a clustered option for Vault deployment; this will be part of future releases. For multi-master (HA) Kubernetes, Vault is installed only on the first master defined in the Ansible inventory.

Below you can find sample configuration for Vault with description of all options.

kind: configuration/vault
title: Vault Config
name: default
specification:
  vault_enabled: true # enable Vault install
  vault_system_user: vault # user name under which Vault service will be running
  vault_system_group: vault # group name under which Vault service will be running
  enable_vault_audit_logs: false # turn on audit logs that can be found at /opt/vault/logs/vault_audit.log
  enable_vault_ui: false # enable Vault UI, shouldn't be used at production
  vault_script_autounseal: true # enable automatic unseal vault at the start of the service, shouldn't be used at production
  vault_script_autoconfiguration: true # enable automatic configuration of Hashicorp Vault. It sets the UNSEAL_VAULT variable in script.config
  ...
  app_secret_path: devwebapp # application specific path where application secrets will be mounted
  revoke_root_token: false # not implemented yet (more about in section Root token revocation)
  secret_mount_path: secret # prefix of the path where secrets will be mounted
  vault_token_cleanup: true # should configuration script clean token
  vault_install_dir: /opt/vault # directory where vault will be installed
  vault_log_level: info # logging level that will be set for Vault service
  override_existing_vault_users: false # should users from vault_users override existing users and generate new passwords
  vault_users: # users that will be created with vault
    - name: admin # name of the user that will be created in Vault
      policy: admin # name of the policy that will be assigned to the user (description below)
    - name: provisioner
      policy: provisioner
  vault_helm_chart_values: # helm chart values overwriting the default package (to be able to use internal registry for offline purposes)
    injector:
      externalVaultAddr: https://your-external-address:8200 # external vault address (only if you want to setup address to provide full name to use with signed certificate) [IMPORTANT: switch https->http if tls_disable parameter is set to true]
      image:
        repository: "{{ image_registry_address }}/hashicorp/vault-k8s" # docker image used by vault injector in kubernetes
      agentImage:
        repository: "{{ image_registry_address }}/vault" # docker image used for the Vault agent injected into pods
    server:
      image:
        repository: "{{ image_registry_address }}/vault" # docker image used by the Vault server
  # TLS part
  tls_disable: false # false keeps TLS support enabled; TLS should always be used in production
  certificate_name: fullchain.pem # certificate file name
  private_key_name: privkey.pem # private key file name for certificate
  vault_tls_valid_days: 365 # certificate valid time in days
  selfsigned_certificate: # selfsigned certificate information
    country: US # self-explanatory
    state: state # self-explanatory
    city: city # self-explanatory
    company: company # self-explanatory
    common_name: "*" # self-explanatory
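Several options above carry production caveats in their comments. The scheme of externalVaultAddr must follow tls_disable. enable_vault_ui and vault_script_autounseal should not be used in production, and TLS should always be used in production. check_vault_config is a hypothetical Python helper, not part of LambdaStack, that reports these cases for a configuration/vault specification loaded into a dictionary.

```python
from typing import Dict, List
from urllib.parse import urlparse

DEV_ONLY_FLAGS = ("enable_vault_ui", "vault_script_autounseal")


def check_vault_config(spec: Dict) -> List[str]:
    """Return warnings for a configuration/vault specification section."""
    warnings: List[str] = []
    tls_disabled = bool(spec.get("tls_disable", False))
    injector = spec.get("vault_helm_chart_values", {}).get("injector", {})
    address = injector.get("externalVaultAddr")
    if address:
        expected = "http" if tls_disabled else "https"
        actual = urlparse(address).scheme
        if actual != expected:
            warnings.append(f"externalVaultAddr uses {actual}, expected {expected}")
    for flag in DEV_ONLY_FLAGS:
        if spec.get(flag, False):
            warnings.append(f"{flag} is enabled; not recommended for production")
    if tls_disabled:
        warnings.append("tls_disable is true; TLS should always be used in production")
    return warnings


if __name__ == "__main__":
    spec = {
        "tls_disable": False,
        "enable_vault_ui": False,
        "vault_script_autounseal": True,
        "vault_helm_chart_values": {
            "injector": {"externalVaultAddr": "https://your-external-address:8200"},
        },
    }
    for warning in check_vault_config(spec):
        print(warning)
```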

Below you can find more information about configuring Vault in LambdaStack and some guidance on how to start working with Vault in LambdaStack.

To become more familiar with Vault usage, you can refer to the official getting started guide.

Creation of user using LambdaStack in Vault

To create users with LambdaStack, provide a list of users with the name of the policy that should be assigned to each user. You can use a predefined policy delivered by LambdaStack, a default Vault policy or your own policy. Remember that if you have written your own policy, it must exist before the user is created.

Passwords for users are generated automatically and can be found in the /opt/vault directory in files matching the tokens-*.csv pattern. If a user's password is generated or changed, you will see a corresponding line in the csv file with the username, policy and password. If the password is not updated, you will see ALREADY_EXISTS in place of the password.
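To collect the generated passwords, you can read the tokens-*.csv files, for example with the Python sketch below. It assumes each line holds username, policy and password in that order, as described above, with no header row. When several files mention the same user, a later ALREADY_EXISTS line does not hide a password found earlier. Treat the output as sensitive, because it contains passwords.

```python
import csv
import glob
import os
from typing import Dict, Iterable, Optional

Users = Dict[str, Dict[str, Optional[str]]]


def parse_token_rows(lines: Iterable[str]) -> Users:
    """Map username -> {"policy", "password"}; password is None for ALREADY_EXISTS."""
    users: Users = {}
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip empty or malformed lines
        username, policy, password = (cell.strip() for cell in row[:3])
        users[username] = {
            "policy": policy,
            "password": None if password == "ALREADY_EXISTS" else password,
        }
    return users


def read_vault_user_tokens(directory: str = "/opt/vault") -> Users:
    """Parse every tokens-*.csv file in directory, in file-name order."""
    users: Users = {}
    for path in sorted(glob.glob(os.path.join(directory, "tokens-*.csv"))):
        with open(path, newline="") as handle:
            for name, entry in parse_token_rows(handle).items():
                if entry["password"] is None and name in users:
                    continue  # keep a password recorded by an earlier file
                users[name] = entry
    return users
```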

Predefined Vault policies

Vault policies are used to define Role-Based Access Control that can be assigned to clients, applications and other components that are using Vault. You can find more information about policies here.

Besides the two policies already included in Vault (root and default), LambdaStack provides two additional predefined policies:

admin - policy granting administration privileges, with sudo permission on Vault system endpoints
provisioner - policy granting permissions to create user secrets, add secrets and enable authentication methods, but without access to Vault system endpoints

Manual unsealing of the Vault

By design, HashiCorp Vault starts in sealed mode. This means that Vault data is encrypted and an operator needs to provide unseal keys to be able to access the data.

Vault can be unsealed manually using the command:

vault operator unseal

Run it once for each of the three unseal keys from the /opt/vault/init.txt file. In future releases, the number of keys will be configurable in the LambdaStack configuration. Right now we use the default HashiCorp Vault settings.
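This document implies, but does not show, that /opt/vault/init.txt holds the usual output of vault operator init, with lines such as Unseal Key 1: ... and Initial Root Token: .... If so, the keys can be extracted with the Python sketch below. It only builds the vault operator unseal commands for the first three keys and does not run them. Passing a key as an argument exposes it in the process list and shell history, so typing it at the interactive prompt is safer.

```python
import re
from typing import List, Optional, Tuple

UNSEAL_KEY = re.compile(r"^Unseal Key (\d+): (\S+)$")
ROOT_TOKEN = re.compile(r"^Initial Root Token: (\S+)$")


def parse_init_output(text: str) -> Tuple[List[str], Optional[str]]:
    """Return (unseal keys ordered by number, initial root token or None)."""
    keys = {}
    root_token = None
    for line in text.splitlines():
        line = line.strip()
        match = UNSEAL_KEY.match(line)
        if match:
            keys[int(match.group(1))] = match.group(2)
            continue
        match = ROOT_TOKEN.match(line)
        if match:
            root_token = match.group(1)
    return [keys[number] for number in sorted(keys)], root_token


def unseal_commands(unseal_keys: List[str], threshold: int = 3) -> List[List[str]]:
    """argv lists for vault operator unseal, one per key, up to the threshold."""
    if len(unseal_keys) < threshold:
        raise ValueError(f"need {threshold} unseal keys, found {len(unseal_keys)}")
    return [["vault", "operator", "unseal", key] for key in unseal_keys[:threshold]]
```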

For development purposes you can also use the vault_script_autounseal option in the LambdaStack configuration.

More information about unsealing can be found in the CLI documentation, and about the concepts here.

Configuration with manual unsealing

If you are using manual unsealing or want to perform the configuration manually, you can run the script later from the command line:

/opt/vault/bin/configure-vault.sh \
        -c /opt/vault/script.config \
        -a ip_address_of_vault \
        -p protocol \
        -v helm_chart_values_be_override

where protocol is either http or https.

The values in script.config are generated automatically by LambdaStack and can later be used to perform the configuration.

Log into Vault with token

To log into Vault with token you just need to pass token. You can do this using command:

vault login

Only root token has no expiration date, so be aware that all other tokens can expire. To avoid such situations you need to renew the token. You can assign policy to token to define access.

More information is available in the Vault documentation on logging in with tokens and on tokens in general.

Log into Vault with user and password

Another option is to log into Vault with a username and password. This method avoids having to log in with a different token each time the previous one expires. To log in with a username and password, the userpass auth method must be enabled. Then run:

vault login -method=userpass username=your-username

More information is available in the Vault documentation on logging in and on userpass authentication.

Token Helpers

Vault provides a token helper. By default, vault login creates a .vault-token file in the home directory of the user who runs it, which lets that user run subsequent commands without providing a token. LambdaStack removes this token by default after configuration, but this can be changed with the vault_token_cleanup flag.

More information about token helpers is available in the Vault documentation.

Creating your own policy

To create your own policy using the CLI, refer to the vault policy CLI documentation and the Vault policies documentation.
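As an illustration, the sketch below writes a small read-only policy in HCL and loads it with `vault policy write`. The policy name, the secret path and the choice of `read`/`list` capabilities are hypothetical examples. It assumes you are already logged in with a token that is allowed to manage policies (for example, one carrying the admin policy).

```shell
#!/usr/bin/env bash
# Sketch: create a custom read-only policy with `vault policy write`.
# Policy name and secret path are hypothetical examples.

write_readonly_policy() {
  local name="$1" secret_path="$2" file
  file="$(mktemp)"
  # HCL policy: read/list access to a single path (glob allowed).
  cat > "$file" <<EOF
path "${secret_path}" {
  capabilities = ["read", "list"]
}
EOF
  vault policy write "$name" "$file"
  local rc=$?
  rm -f "$file"            # the policy is stored in Vault, not in this file
  return "$rc"
}
```

For example, `write_readonly_policy app-readonly 'secret/data/yourpath/*'`. Quote the path so the shell does not expand the `*`.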

Creating your own user

To create your own user with username and password login, refer to the Vault userpass documentation. If you configured any user through LambdaStack, the userpass auth method is already enabled. Otherwise, it must be enabled manually.
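The two cases above (userpass already enabled by LambdaStack, or not yet enabled) can be handled in one step. This is a minimal sketch that uses standard vault CLI commands. The function name, user name and policy are hypothetical. Passing the password on the command line exposes it in shell history, so treat this as an illustration only.

```shell
#!/usr/bin/env bash
# Sketch: enable userpass only if needed, then create a user bound to policies.
# Function name, user name and policy names are hypothetical examples.

ensure_userpass_user() {
  local user="$1" password="$2" policies="$3"
  # `vault auth list` prints enabled auth methods by path, e.g. "userpass/".
  if ! vault auth list | grep -q '^userpass/'; then
    vault auth enable userpass || return 1
  fi
  vault write "auth/userpass/users/${user}" \
    password="${password}" policies="${policies}"
}
```

After `ensure_userpass_user alice '<password>' app-readonly`, the user logs in with `vault login -method=userpass username=alice`.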

Root token revocation

In production, it is good practice to revoke the root token. LambdaStack does not implement this yet, but it is planned for a future release.

Be aware that after revoking the root token, you won't be able to use the configuration script until you generate a new root token and replace the old token with the new one in /opt/vault/init.txt (the Initial Root Token field). For generating a new root token, refer to the Vault documentation.
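Replacing the token in init.txt is a one-line edit. The sketch below assumes that the field keeps the `vault operator init` line format `Initial Root Token: <token>`, that GNU sed is available (`sed -i`), and that the token contains no `|` or `&` characters (true for Vault-issued tokens). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: swap the "Initial Root Token" value in init.txt after a new root
# token has been generated. Assumes the line format
# "Initial Root Token: <token>" and GNU sed; the function name is hypothetical.

replace_root_token() {
  local file="$1" new_token="$2"
  cp -p "$file" "${file}.bak" || return 1   # keep a backup of the old file
  # Rewrite only the value that follows the field label.
  sed -i "s|^Initial Root Token: .*|Initial Root Token: ${new_token}|" "$file"
}
```

For example, `replace_root_token /opt/vault/init.txt <new_root_token>`. Remove the `.bak` copy once the configuration script works with the new token, because it still contains the revoked one.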

TLS support

By default, tls_disable is set to false, which means that Vault uses TLS certificates. Certificates can be configured in two ways:

selfsigned

Vault self-signed certificates are generated automatically during Vault setup if no custom certificates are present in the dedicated location.

certificate provided by user

The user can add a certificate (and its private key) in the dedicated location. File names matter: they must match the names set in the configuration, and the .pem extension is required.

Dedicated location of custom certificates: core/src/lambdastack/data/common/ansible/playbooks/roles/vault/files/tls-certs

Certificate file name configuration:

kind: configuration/vault
title: Vault Config
name: default
specification:
...
  certificate_name: fullchain.pem # certificate file name
  private_key_name: privkey.pem # private key file name for certificate
...
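Because a wrong name or extension silently results in self-signed certificates being generated, it can help to check the dedicated directory before running LambdaStack. This is a sketch under the assumption that the names used are the sample defaults above (fullchain.pem, privkey.pem). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that custom certificate files exist in the dedicated
# tls-certs directory under the names set in configuration/vault.
# Defaults mirror the sample config; the function name is hypothetical.

check_vault_certs() {
  local dir="$1" cert="${2:-fullchain.pem}" key="${3:-privkey.pem}" f
  for f in "$cert" "$key"; do
    case "$f" in
      *.pem) ;;                                   # .pem extension is required
      *) echo "not a .pem file: $f" >&2; return 1 ;;
    esac
    [ -f "$dir/$f" ] || { echo "missing: $dir/$f" >&2; return 1; }
  done
}
```

For example: `check_vault_certs core/src/lambdastack/data/common/ansible/playbooks/roles/vault/files/tls-certs fullchain.pem privkey.pem`.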

Production hardening for Vault

LambdaStack applies a number of measures to improve Vault security, for example:

End-to-End TLS
Disable Swap (when running on Kubernetes machine)
Don't Run as Root
Turn Off Core Dumps
Enable Auditing
Restrict Storage Access
Tweak ulimits

To harden Vault further, refer to the Vault production hardening guide.

Troubleshooting

To troubleshoot Vault and find the root cause of a problem, enable audit logs and set vault_log_level to debug. Be aware that audit logs can contain sensitive data.

How to turn on HashiCorp Vault integration with k8s

LambdaStack can also configure integration with Kubernetes automatically. This is done by applying additional settings to the Vault configuration. A sample configuration with descriptions is shown below.

kind: configuration/vault
title: Vault Config
name: default
specification:
  vault_enabled: true
  ...
  vault_script_autounseal: true
  vault_script_autoconfiguration: true
  ...
  kubernetes_integration: true # enable setup kubernetes integration on vault side
  kubernetes_configuration: true # enable setup kubernetes integration on vault side
  enable_vault_kubernetes_authentication: true # enable kubernetes authentication on vault side
  kubernetes_namespace: default # namespace where your application will be deployed
  ...

Vault and Kubernetes integration in LambdaStack relies on the vault-k8s tool. This tool injects secrets into pods through a sidecar, using a Kubernetes mutating admission webhook. This is transparent to your application: you do not need to bind to HashiCorp libraries to use secrets stored in Vault.

You can also configure Vault manually yourself and enable in LambdaStack only the options you need.

More information about the Kubernetes sidecar integration is available in the Vault Agent Injector documentation.

Vault Kubernetes authentication

To use the sidecar integration, you need to enable Kubernetes authentication. Without it, the sidecar cannot access secrets stored in Vault.

If you don't want to use the sidecar integration but still want to access Vault secrets automatically, you can use Kubernetes authentication directly. For more information about its capabilities, refer to the Vault Kubernetes auth method documentation.

Create your secret in Vault

In LambdaStack, key/value secrets can be injected into containers. To do this, create the secrets using the vault CLI.

You can do this with a command similar to the sample below:

vault kv put secret/yourpath/to/secret username='some_user' password='some_password'

LambdaStack uses the kv secrets engine as the backend for Vault secrets. More information is available in the kv secrets engine documentation.
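The path passed to `vault kv put` differs from the path used in the agent-inject annotation (`secret/yourpath/to/secret` vs `secret/data/yourpath/to/secret`). With version 2 of the kv engine, the API path inserts `data/` after the mount point, and the `vault kv` commands add it for you. The sketch below derives the annotation path from the CLI path. It assumes kv version 2 mounted at a single-segment path such as `secret`, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive the KV v2 API path used in the agent-inject annotation
# from the path given to `vault kv put`. Assumes KV version 2 mounted at a
# single-segment path such as "secret"; the function name is hypothetical.

kv2_api_path() {
  local cli_path="${1#/}"            # drop a leading slash, if any
  local mount="${cli_path%%/*}"      # first segment: the mount point
  local rest="${cli_path#*/}"        # remainder: the secret's path
  if [ "$rest" = "$cli_path" ]; then
    echo "expected <mount>/<path>, got: $1" >&2
    return 1
  fi
  printf '%s/data/%s\n' "$mount" "$rest"
}
```

For example, `kv2_api_path secret/yourpath/to/secret` prints `secret/data/yourpath/to/secret`, which is the value used in the vault.hashicorp.com/agent-inject-secret-credentials.txt annotation.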

Kubernetes namespace

LambdaStack creates additional Kubernetes objects to inject secrets automatically through the sidecar. To have access to your application pods, those objects must be deployed in the same namespace as your application (set with kubernetes_namespace).

Annotations

Below is a sample deployment configuration excerpt with annotations. Currently, the value of vault.hashicorp.com/role cannot be changed. This will change in a future release.

  template:
    metadata:
      labels:
        app: yourapp
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "devweb-app"
        vault.hashicorp.com/agent-inject-secret-credentials.txt: "secret/data/yourpath/to/secret"
        vault.hashicorp.com/tls-skip-verify: "true"

vault.hashicorp.com/tls-skip-verify: if true, the Vault Agent skips verification of Vault's TLS certificate. This is required with self-signed certificates, but setting it to true is not recommended in a production environment.

More information about annotations is available in the Vault Agent Injector annotations documentation.

20 - Upgrade

LambdaStack how-tos - Upgrade

Upgrade

Introduction

Starting with lscli 0.4.2, the CLI can upgrade certain components of a cluster. The components it can currently upgrade or add are:

Kubernetes (master and nodes). Supported versions: v1.18.6 (LambdaStack 0.7.1+), v1.20.12 (LambdaStack 1.3.0+)
common: Upgrades all common configurations to match the current LambdaStack version
repository: Adds the repository role needed for component installation in the current LambdaStack version
image_registry: Adds the image_registry role needed for offline installation in the current LambdaStack version

NOTE

Before the upgrade runs, an assertion checks whether the K8s version is supported.

The component upgrade takes the existing Ansible build output and, based on it, upgrades the currently supported components. If you need to re-apply your entire LambdaStack cluster, the input yaml must be adjusted manually to the latest specification and then applied with lambdastack apply.... See the Run apply after upgrade chapter for more details.

Note about upgrade from pre-0.8 LambdaStack:

If you need to upgrade a cluster deployed with a LambdaStack version earlier than 0.8, make sure there is enough disk space on the master host (which is used as the repository). If you didn't extend the OS disk on the master during deployment, you probably have only a 32 GB disk, which is not enough to upgrade the cluster properly (we recommend at least 64 GB). Before you run the upgrade, extend the OS disk on the master machine according to your cloud provider's documentation: AWS, Azure.
If your cluster already has logging machine(s), scale them up before running the upgrade so that there are enough resources to run the newer version of the ELK stack. We recommend at least an Azure DS2_v2 machine (2 CPUs, 7 GB RAM), or its equivalent on AWS and on-prem installations. The actual requirement depends heavily on the amount of data you store. See the logging documentation for more details.

Online upgrade

Online prerequisites

Your existing cluster should meet the following requirements:

The cluster machines/VMs are connected by a network or virtual network of some sort, can communicate with each other, and have access to the internet.
The cluster machines/VMs are upgraded to the following versions:
- RedHat 7.6
- CentOS 7.6
- Ubuntu 18.04
The cluster machines/VMs should be accessible through SSH with a set of SSH keys you provided and configured on each machine yourself.
A provisioning machine that:
- Has access to the SSH keys
- Has access to the build output from when the cluster was first created.
- Is on the same network as your cluster machines
- Has LambdaStack 0.4.2 or later installed. Note: to run LambdaStack, check the Prerequisites

Start the online upgrade

Start the upgrade with:

lambdastack upgrade -b /buildoutput/

This backs up and upgrades the Ansible inventory in the provided build folder /buildoutput/, which is then used to upgrade the components.

Offline upgrade

Offline prerequisites

Your airgapped existing cluster should meet the following requirements:

The airgapped cluster machines/VMs are connected by a network or virtual network of some sort and can communicate with each other.
The airgapped cluster machines/VMs are upgraded to the following versions:
- RedHat 7.6
- CentOS 7.6
- Ubuntu 18.04
The airgapped cluster machines/VMs should be accessible through SSH with a set of SSH keys you provided and configured on each machine yourself.
A requirements machine that:
- Runs the same distribution as the airgapped cluster machines/VMs (RedHat 7, CentOS 7, Ubuntu 18.04)
- Has access to the internet.
A provisioning machine that:
- Has access to the SSH keys
- Has access to the build output from when the cluster was first created.
- Is on the same network as your cluster machines
- Has LambdaStack 0.4.2 or later installed.

NOTE

Before running lambdastack, check the Prerequisites

Start the offline upgrade

To upgrade the cluster components, run the following steps:

First, get the tooling that prepares the requirements for the upgrade. On the provisioning machine, run:
```
lambdastack prepare --os OS
```
where OS is one of centos-7, redhat-7 or ubuntu-18.04. This creates a directory called prepare_scripts that contains the needed files.
The scripts in prepare_scripts download all requirements. To do that, copy the prepare_scripts folder to the requirements machine and run the following command:
```
download-requirements.sh /requirementsoutput/
```
This downloads all requirements into the /requirementsoutput/ folder. Once the script completes successfully, copy /requirementsoutput/ to the provisioning machine for later use.
Finally, start the upgrade with:
```
lambdastack upgrade -b /buildoutput/ --offline-requirements /requirementsoutput/
```
This backs up and upgrades the Ansible inventory in the provided build folder /buildoutput/, which is then used to upgrade the components. The --offline-requirements flag tells LambdaStack where to find the requirements folder (/requirementsoutput/) prepared in the previous two steps, which is needed for the offline upgrade.

Additional parameters

The lambdastack upgrade command has additional flags:

--wait-for-pods. When this flag is added, the Kubernetes upgrade waits until all pods are in the ready state before proceeding. This can be useful when a zero-downtime upgrade is required. Note that this can also cause the upgrade to hang indefinitely.
--upgrade-components. A comma-separated list of component names; the upgrade procedure processes only these components. The list cannot be empty, otherwise execution fails. If this parameter is not provided, the upgrade processes all components.

Example:
```
lambdastack upgrade -b /buildoutput/ --upgrade-components "kafka,filebeat"
```
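When the component list is assembled by a script, it is easy to end up passing an empty value, which makes the upgrade fail. The small wrapper below joins its arguments with commas and refuses an empty list before calling the CLI. It only uses the flags shown above. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: call `lambdastack upgrade` for selected components only,
# refusing an empty list (the CLI fails on an empty --upgrade-components).
# The function name is hypothetical.

upgrade_components() {
  local build_dir="$1"; shift
  if [ "$#" -eq 0 ]; then
    echo "at least one component name is required" >&2
    return 1
  fi
  local list
  list="$(IFS=,; echo "$*")"       # join remaining arguments with commas
  lambdastack upgrade -b "$build_dir" --upgrade-components "$list"
}
```

`upgrade_components /buildoutput/ kafka filebeat` runs the same command as the example above.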

Run apply after upgrade

Currently, LambdaStack does not fully support apply after upgrade. It is possible to re-apply configuration from a newer version of LambdaStack, but this requires some manual work from the administrator. Re-apply on an already upgraded cluster must be called with the --no-infra option to skip the Terraform part of the configuration. When apply after upgrade is run with --no-infra, the system images from the older LambdaStack version are preserved to prevent the destruction of the VMs. If you plan to modify any infrastructure unit (e.g., add a Kubernetes node), you need to create the machine yourself and attach it to the configuration yaml. When running lambdastack apply... on an already upgraded cluster, use the yaml config files generated by the newer version of LambdaStack and apply the changes you had in the older one. If the cluster is upgraded to version 0.8 or newer, you also need to add an additional feature mapping for the repository role, as shown in the example below:

---
kind: lambdastack-cluster
name: clustername
provider: azure
build_path: # Dynamically built
specification:
  admin_user:
    key_path: id_rsa
    name: operations
    path: # Dynamically built
  components:
    repository:
      count: 0  # Set repository to 0 since it's introduced in v0.8
    kafka:
      count: 1
    kubernetes_master:
      count: 1
    kubernetes_node:
      count: 2
    load_balancer:
      count: 1
    logging:
      count: 1
    monitoring:
      count: 1
    postgresql:
      count: 1
    rabbitmq:
      count: 0
    ignite:
      count: 0
    opendistro_for_elasticsearch:
      count: 0
  name: clustername
  prefix: 'prefix'
title: LambdaStack Cluster Config
---
kind: configuration/feature-mapping
title: Feature mapping to roles
provider: azure
name: default
specification:
  roles_mapping:
    kubernetes_master:
      - kubernetes-master
      - helm
      - applications
      - node-exporter
      - filebeat
      - firewall
      - vault
      - repository      # add repository here
      - image-registry  # add image-registry here
...

Kubernetes applications

To upgrade applications on Kubernetes to the desired version after lambdastack upgrade, you have to:

generate a new configuration manifest using lambdastack init
if you generated a minimal configuration manifest (without the --full argument), copy and paste the default configuration into it
run lambdastack apply

NOTE

The link to the default configuration points to the develop branch. Choose the branch that matches the LambdaStack version you are using.

How to upgrade Kafka

Kafka upgrade

Kafka is automatically updated to the latest version supported by LambdaStack, which is listed in the LambdaStack components documentation. Kafka brokers are updated one by one, but the update procedure does not guarantee zero downtime, because that depends on the number of available brokers and on the topic and partition configuration.

ZooKeeper upgrade

A redundant ZooKeeper configuration is also recommended, because the service must be restarted during the upgrade, which can make ZooKeeper unavailable. With at least two ZooKeeper servers in the ensemble, you can upgrade one and then the rest one by one.

More detailed information is available in the ZooKeeper documentation.

Open Distro for Elasticsearch upgrade

NOTE

Before the upgrade procedure, make sure you have a data backup!

LambdaStack v1.0.0 upgrades the elasticsearch-oss package to v7.10.2 and the opendistro-* plugin packages to v1.13.*. The upgrade is performed automatically when the upgrade procedure detects logging, opendistro_for_elasticsearch or kibana hosts.

The Elasticsearch upgrade uses API calls (GET, PUT, POST) that require an admin TLS certificate. By default, LambdaStack generates self-signed certificates for this purpose, but if you use your own, you have to provide the location of the admin certificate. To do that, change cert_path and key_path in the following settings:

logging:
  upgrade_config:
    custom_admin_certificate:
      cert_path: /etc/elasticsearch/custom-admin.pem
      key_path:  /etc/elasticsearch/custom-admin-key.pem

opendistro_for_elasticsearch:
  upgrade_config:
    custom_admin_certificate:
      cert_path: /etc/elasticsearch/custom-admin.pem
      key_path:  /etc/elasticsearch/custom-admin-key.pem

These settings are defined in the defaults of the upgrade role (/usr/local/lambdastack/data/common/ansible/playbooks/roles/upgrade/defaults/main.yml).

Node exporter upgrade

NOTE

Before the upgrade procedure, make sure you have a data backup and that you are familiar with the breaking changes.

Starting from LambdaStack v0.8.0, node exporter can be upgraded to v1.0.1. The upgrade is performed automatically when the upgrade procedure detects node exporter hosts.

RabbitMQ upgrade

NOTE

Before the upgrade procedure, make sure you have a data backup. Check that the node or cluster is in a good state: no alarms are in effect, no queue synchronisation operations are ongoing, and the system is otherwise under reasonable load. For more information, see the RabbitMQ upgrade documentation.

With the latest LambdaStack version, RabbitMQ can be upgraded to v3.8.9. This requires an upgrade of the Erlang system packages, which is done automatically (to v23.1.4). The upgrade is performed in offline mode after all RabbitMQ nodes are stopped. Rolling upgrades are not supported by LambdaStack, and that approach is not advised when Erlang needs to be upgraded.

Kubernetes upgrade

Prerequisites

Before upgrading the K8s version, make sure that deprecated API versions are not used:

v1.18

Upgrade

NOTE

If the K8s cluster that is going to be upgraded has the Istio control plane application deployed, issues can occur. The default profiles we currently support for installing Istio deploy only a single replica of each control service, with a PodDisruptionBudget that allows 0 disruptions. This results in the following error while pods are drained during an upgrade:

Cannot evict pod as it would violate the pod's disruption budget.

As we currently don't support any kind of advanced configuration of the Istio control plane components outside the default profiles, we need to scale up all components manually before the upgrade. This can be done with the following command:

kubectl scale deploy -n istio-system --replicas=2 --all

After the upgrade, the deployments can be scaled down to the original capacity:

kubectl scale deploy -n istio-system --replicas=1 --all

Note: The istio-system namespace value is the default value and should be set to whatever is being used in the Istio application configuration.
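The scale-down command above assumes that every deployment originally ran one replica. If some Istio deployments (for example, a gateway) were scaled differently, you can record the counts first and restore them afterwards. This is a sketch using standard kubectl commands. The state file path and function names are hypothetical, and ISTIO_NS should be set to your Istio namespace if it is not istio-system.

```shell
#!/usr/bin/env bash
# Sketch: record Istio control-plane replica counts, scale up before the
# Kubernetes upgrade, then restore the recorded counts afterwards.
# State file path and function names are hypothetical.
NS="${ISTIO_NS:-istio-system}"
STATE_FILE="${STATE_FILE:-/tmp/istio-replicas.txt}"

# Save "<deployment> <replicas>" pairs, then scale every deployment to 2.
istio_scale_up() {
  kubectl get deploy -n "$NS" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.replicas}{"\n"}{end}' \
    > "$STATE_FILE" || return 1
  kubectl scale deploy -n "$NS" --replicas=2 --all
}

# Restore each deployment to the replica count saved before the upgrade.
istio_restore() {
  local name replicas
  while read -r name replicas; do
    [ -n "$name" ] || continue
    kubectl scale deploy "$name" -n "$NS" --replicas="$replicas" || return 1
  done < "$STATE_FILE"
}
```

Run istio_scale_up once before lambdastack upgrade and istio_restore after it. Running istio_scale_up a second time would overwrite the saved counts with the scaled-up values.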

PostgreSQL upgrade

NOTE

Before the upgrade procedure, make sure you have a data backup.

Versions

LambdaStack upgrades PostgreSQL 10 to 13 with the following extensions (for versions, see COMPONENTS.md):

PgAudit
PgBouncer
PgPool
repmgr

Prerequisites

The prerequisites below are checked by the preflight script before PostgreSQL is upgraded. Nevertheless, it's good to check them manually before any upgrade:

Disk space: When LambdaStack upgrades PostgreSQL 10 to 13, it copies the data directory on each node to allow easy recovery if the data migration fails. It is up to the user to make sure that enough space is available. The rule used is:

total storage used on the data volume + total size of the data directory < 95% of total size of the data volume

The limit is 95% rather than 100% of the volume because some free space is still needed during the upgrade after the data directory has been copied.
Cluster health: Before the upgrade starts, the PostgreSQL cluster must be healthy. This means that executing:
```
repmgr cluster show
```
should not fail and should return 0 as its exit code.
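Both manual checks can be expressed as small functions. The sketch below implements the disk space rule in integer arithmetic (sizes in KiB) and the repmgr exit-code check. The function names, the example data directory and the df/du wiring (which assumes GNU coreutils) are illustrative assumptions. This does not reproduce LambdaStack's preflight script.

```shell
#!/usr/bin/env bash
# Sketch of the two manual prerequisite checks; sizes are in KiB.
# Function names and the df/du wiring below are illustrative assumptions.

# Rule: used_on_volume + data_dir_size < 95% of volume_size
pg_space_ok() {
  local used_kb="$1" datadir_kb="$2" total_kb="$3"
  (( (used_kb + datadir_kb) * 100 < total_kb * 95 ))
}

# Healthy cluster: `repmgr cluster show` must exit with status 0.
pg_cluster_healthy() {
  repmgr cluster show >/dev/null
}

# Example wiring on a node (hypothetical data directory, GNU df/du):
# datadir=/var/lib/postgresql/10/main
# read -r total used < <(df -k --output=size,used "$datadir" | tail -n 1)
# pg_space_ok "$used" "$(du -sk "$datadir" | cut -f1)" "$total" && pg_cluster_healthy
```

For example, a 100 GB volume with 40 GB used and a 40 GB data directory passes (80 < 95), while 45 GB used plus a 50 GB data directory does not, because the rule requires strictly less than 95%.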

Upgrade

The upgrade procedure is based on the PostgreSQL documentation and requires downtime, because the old service(s) must be stopped and the new one(s) started.

You can provide a custom configuration for the upgrade with lambdastack upgrade -f. There are a few limitations related to specifying upgrade parameters:

If non-default values were provided at installation (lambdastack apply), they must be provided again, otherwise they are overwritten by defaults.
The wal_keep_segments replication parameter is replaced by wal_keep_size, with a default value of 500 MB. The previous parameter is not supported.
The archive_command replication parameter is set to /bin/true by default. The intent was to disable archiving, but changes to archive_mode require a full PostgreSQL server restart, while archive_command changes can be applied with a normal configuration reload. See the PostgreSQL documentation.
An extension cannot be disabled after installation, so a specification.extensions.*.enabled: false value is ignored during upgrade if it was set to true during installation.
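If your installation used a custom wal_keep_segments value, PostgreSQL 13's release notes give the equivalent as wal_keep_size = wal_keep_segments * wal_segment_size. The sketch below applies that formula, assuming the default 16 MB WAL segment size unless another size is passed. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: translate an old wal_keep_segments value into wal_keep_size using
# wal_keep_size = wal_keep_segments * wal_segment_size.
# Assumes the default 16 MB WAL segment size unless a second argument is given.

wal_keep_size_mb() {
  local segments="$1" segment_mb="${2:-16}"
  echo "$(( segments * segment_mb ))MB"
}
```

For example, `wal_keep_size_mb 32` prints `512MB`, close to the 500 MB default. Provide the computed value in the custom configuration passed with lambdastack upgrade -f if you need to keep the old retention.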

Manual actions

LambdaStack runs pg_upgrade (on the primary node only) from a dedicated location (pg_upgrade_working_dir). On Ubuntu this is /var/lib/postgresql/upgrade/$PG_VERSION, and on RHEL/CentOS it is /var/lib/pgsql/upgrade/$PG_VERSION. LambdaStack saves the pg_upgrade output there as logs, which should be checked after the upgrade.

Post-upgrade processing

As the "Post-upgrade processing" step in the PostgreSQL documentation states, if any post-upgrade processing is required, pg_upgrade issues warnings as it completes and generates SQL script files that must be run by the administrator. The documentation does not describe exactly when these files are created, so check the logs in pg_upgrade_working_dir after the upgrade to see whether additional steps are required.
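To review the upgrade output, you first need the per-distribution working directory and then the logs and any generated SQL scripts in it. The sketch below maps the OS ID (as found in /etc/os-release, an assumption about how you identify the system) to the paths listed above and lists `*.log` and `*.sql` files. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: locate pg_upgrade_working_dir on the primary node and list files
# that need attention (logs and generated SQL scripts).
# Using the os-release ID is an assumption; function names are hypothetical.

pg_upgrade_working_dir() {
  local os_id="$1" pg_version="$2"
  case "$os_id" in
    ubuntu)      echo "/var/lib/postgresql/upgrade/${pg_version}" ;;
    rhel|centos) echo "/var/lib/pgsql/upgrade/${pg_version}" ;;
    *) echo "unsupported OS: $os_id" >&2; return 1 ;;
  esac
}

# List pg_upgrade logs and any SQL scripts generated for the administrator.
pg_upgrade_review() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f \( -name '*.log' -o -name '*.sql' \) | sort
}
```

On the primary node: `. /etc/os-release; pg_upgrade_review "$(pg_upgrade_working_dir "$ID" "$PG_VERSION")"`, then read each listed log and run any SQL script it asks for.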

Statistics

Because optimizer statistics are not transferred by pg_upgrade, you may need to regenerate that information after the upgrade. For this purpose, consider running the analyze_new_cluster.sh script (created in pg_upgrade_working_dir) as the postgres user.

Delete old cluster

For safety, LambdaStack does not remove old PostgreSQL data. It is the user's responsibility to decide when the data can be removed and to remove it. Once you are satisfied with the upgrade, you can delete the old cluster's data directories by running the delete_old_cluster.sh script (created in pg_upgrade_working_dir on the primary node) on all nodes. The script is not created if you have user-defined tablespaces inside the old data directory. You can also delete the old installation directories (e.g., bin, share). You may delete pg_upgrade_working_dir on the primary node once the upgrade is completely finished.