How-To
- 1: Backup
- 2: Cluster
- 3: Configuration
- 4: Databases
- 5: Helm
- 6: Istio
- 7: Konnectivity
- 8: Kubernetes
- 9: Logging
- 10: Maintenance
- 11: Modules
- 12: Monitoring
- 13: OS Patching
- 14: Persistent Storage
- 15: Prerequisites
- 16: Repository
- 17: Retention
- 18: Security Groups
- 19: Security
- 20: Upgrade
1 - Backup
LambdaStack backup and restore
Introduction
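LambdaStack provides a solution to create full or partial backups of some components and to restore them. The covered components are the load balancer, logging, monitoring, PostgreSQL, RabbitMQ and Kubernetes (backup only).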
The backup is created directly on the machine where the component is running and is then moved to the repository host via rsync. On the repository host, backup files are stored in /lsbackup/mounted, which is mounted on a local filesystem. See the How to store backup chapter.
1. How to perform backup
Backup configuration
Copy the default backup configuration from defaults/configuration/backup.yml into a newly created backup.yml config file, and enable backup for the chosen components by setting the enabled parameter to true.
This config may also be attached to cluster-config.yml or whatever you named your cluster yaml file.
kind: configuration/backup
title: Backup Config
name: default
specification:
  components:
    load_balancer:
      enabled: true
    logging:
      enabled: false
    monitoring:
      enabled: true
    postgresql:
      enabled: true
    rabbitmq:
      enabled: false
    # Kubernetes recovery is not supported at this point.
    # You may create a backup by enabling it below, but recovery should be done manually according to the Kubernetes documentation.
    kubernetes:
      enabled: false
Run the lambdastack backup command:
lambdastack backup -f backup.yml -b build_folder
If the backup config is attached to cluster-config.yml, use that file instead of backup.yml.
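Before running the backup it can help to double-check which components are actually enabled. The following is a minimal sketch, assuming the configuration/backup document has already been loaded into a plain Python dict; the helper name `enabled_components` is hypothetical and not part of LambdaStack.

```python
# Hypothetical helper: `backup_config` mirrors the configuration/backup
# document shown above as a plain dict (no YAML parser is assumed here).
def enabled_components(config):
    """Return the sorted names of components whose `enabled` flag is true."""
    components = config.get("specification", {}).get("components", {})
    return sorted(name for name, opts in components.items() if opts.get("enabled"))


backup_config = {
    "kind": "configuration/backup",
    "specification": {
        "components": {
            "load_balancer": {"enabled": True},
            "logging": {"enabled": False},
            "monitoring": {"enabled": True},
            "postgresql": {"enabled": True},
            "rabbitmq": {"enabled": False},
            "kubernetes": {"enabled": False},
        }
    },
}

# Lists the components that the backup run would include.
print(enabled_components(backup_config))
```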
2. How to store backup
The backup location is defined in the backup role as backup_destination_host and backup_destination_dir. The default backup location is on the repository host in /lsbackup/mounted/. Use the mounted location as a mount point and mount the storage you want to use. This might be:
- Azure Blob Storage
- Amazon S3
- GCP Blob Storage
- NAS
- Any other attached storage
Ensure that mounted location has enough space, is reliable and is well protected against disaster.
NOTE
If you don't attach any storage into the mount point location, be aware that backups will be stored on the local machine. This is not recommended.
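One way to guard against backups silently landing on the local disk is to check that the destination is a real mount point before starting. This is only a sketch to be run on the repository host; the function name is an assumption, while `os.path.ismount` is the standard library call.

```python
import os


def backup_storage_is_mounted(path="/lsbackup/mounted"):
    """Return True if `path` is a mount point.

    Returns False when nothing is mounted there or the path does not exist,
    which is the situation the note above warns about.
    """
    return os.path.ismount(path)


if not backup_storage_is_mounted():
    print("Warning: /lsbackup/mounted is not a mount point; backups would stay on the local machine.")
```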
3. How to perform recovery
Recovery configuration
Copy the existing default configuration from defaults/configuration/recovery.yml into a newly created recovery.yml config file, and set the enabled parameter to true for each component you want to recover. You can choose a snapshot by passing the date and time part of its name. If no snapshot name is provided, the latest one is restored.
This config may also be attached to cluster-config.yml.
kind: configuration/recovery
title: Recovery Config
name: default
specification:
  components:
    load_balancer:
      enabled: true
      snapshot_name: latest # restore latest backup
    logging:
      enabled: true
      snapshot_name: 20200604-150829 # restore selected backup
    monitoring:
      enabled: false
      snapshot_name: latest
    postgresql:
      enabled: false
      snapshot_name: latest
    rabbitmq:
      enabled: false
      snapshot_name: latest
Run the lambdastack recovery command:
lambdastack recovery -f recovery.yml -b build_folder
If the recovery config is attached to cluster-config.yml, use that file instead of recovery.yml.
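The snapshot_name rule described above (an explicit date and time, or the latest snapshot when none is given) can be illustrated with a small sketch. It is not LambdaStack's actual implementation: the function name and the list of available timestamps are assumptions, and only the YYYYMMDD-HHMMSS form is taken from the example config.

```python
def resolve_snapshot(requested, available):
    """Pick a snapshot timestamp following the snapshot_name rule.

    `available` holds hypothetical timestamps in the YYYYMMDD-HHMMSS form;
    this fixed-width form sorts chronologically when compared as text.
    """
    if not available:
        raise ValueError("no snapshots available")
    if requested in (None, "", "latest"):
        return max(available)
    if requested not in available:
        raise ValueError(f"snapshot {requested!r} not found")
    return requested


snapshots = ["20200603-091500", "20200604-150829", "20200605-010203"]
print(resolve_snapshot("latest", snapshots))
print(resolve_snapshot("20200604-150829", snapshots))
```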
4. How backup and recovery work
Load Balancer
Load balancer backup includes:
- Configuration files: /etc/haproxy/
- SSL certificates: /etc/ssl/haproxy/
Recovery includes all backed up files.
Logging
Logging backup includes:
- Elasticsearch database snapshot
- Elasticsearch configuration: /etc/elasticsearch/
- Kibana configuration: /etc/kibana/
Only single-node Elasticsearch backup is supported. A solution for multi-node Elasticsearch clusters will be added in a future release.
Monitoring
Monitoring backup includes:
- Prometheus data snapshot
- Prometheus configuration: /etc/prometheus/
- Grafana data snapshot
Recovery includes all backed up configurations and snapshots.
PostgreSQL
PostgreSQL backup includes:
- Database data and metadata dump using pg_dumpall
- Configuration files: *.conf
When a multi-node configuration is used and a failover action has changed the database cluster status (one node down, switchover), it is still possible to create a backup. Before restoring the database, however, the cluster must first be recovered by running lambdastack apply, followed by lambdastack recovery to restore the database data. By default, recovery of the database configuration from backup is not supported: it has to be done with lambdastack apply or manually, by copying the backed up files according to the cluster state. The reason is that restoring configuration files across different database cluster configurations is very risky.
RabbitMQ
RabbitMQ backup includes:
- Message definitions
- Configuration files: /etc/rabbitmq/
Backup does not include RabbitMQ messages.
Recovery includes all backed up files and configurations.
Kubernetes
LambdaStack backup provides:
- Etcd snapshot
- Public Key Infrastructure: /etc/kubernetes/pki
- Kubeadm configuration files
The following features are not supported yet (use the related documentation to do them manually):
- Kubernetes cluster recovery
- Backup and restore of data stored on persistent volumes described in persistent storage documentation
2 - Cluster
How to enable/disable LambdaStack repository VM
Enable for Ubuntu (default):
- Enable the "repository" component:

    repository:
      count: 1
Enable for RHEL on Azure:
- Enable the "repository" component:

    repository:
      count: 1
      machine: repository-machine-rhel

- Add the repository VM definition to the main config file:

    kind: infrastructure/virtual-machine
    name: repository-machine-rhel
    provider: azure
    based_on: repository-machine
    specification:
      storage_image_reference:
        publisher: RedHat
        offer: RHEL
        sku: 7-LVM
        version: "7.9.2021051701"
Enable for RHEL on AWS:
- Enable the "repository" component:

    repository:
      count: 1
      machine: repository-machine-rhel

- Add the repository VM definition to the main config file:

    kind: infrastructure/virtual-machine
    title: Virtual Machine Infra
    name: repository-machine-rhel
    provider: aws
    based_on: repository-machine
    specification:
      os_full_name: RHEL-7.9_HVM-20210208-x86_64-0-Hourly2-GP2
Enable for CentOS on Azure:
- Enable the "repository" component:

    repository:
      count: 1
      machine: repository-machine-centos

- Add the repository VM definition to the main config file:

    kind: infrastructure/virtual-machine
    name: repository-machine-centos
    provider: azure
    based_on: repository-machine
    specification:
      storage_image_reference:
        publisher: OpenLogic
        offer: CentOS
        sku: "7_9"
        version: "7.9.2021071900"
Enable for CentOS on AWS:
- Enable the "repository" component:

    repository:
      count: 1
      machine: repository-machine-centos

- Add the repository VM definition to the main config file:

    kind: infrastructure/virtual-machine
    title: Virtual Machine Infra
    name: repository-machine-centos
    provider: aws
    based_on: repository-machine
    specification:
      os_full_name: "CentOS 7.9.2009 x86_64"
Disable:
- Disable the "repository" component:

    repository:
      count: 0

- Prepend the "kubernetes_master" mapping (or any other mapping if you don't deploy Kubernetes) with:

    kubernetes_master:
      - repository
      - image-registry
How to create a LambdaStack cluster on existing infrastructure
Please first read the prerequisites related to hostname requirements.
LambdaStack has the ability to set up a cluster on infrastructure provided by you. These can be either bare metal machines or VMs and should meet the following requirements:
Note. Hardware requirements are not listed since this depends on use-case, component configuration etc.
- The cluster machines/VMs are connected by a network (or a virtual network of some sort) and can communicate with each other. At least one of them (with the repository role) has Internet access in order to download dependencies. If there is no Internet access, you can use the air gap feature (offline mode).
- The cluster machines/VMs are running one of the following Linux distributions:
  - RedHat 7.6+ and < 8
  - CentOS 7.6+ and < 8
  - Ubuntu 18.04
- The cluster machines/VMs are accessible through SSH with a set of SSH keys you provide and configure on each machine yourself (key-based authentication).
- The user used for the SSH connection (admin_user) has passwordless root privileges through sudo.
- A provisioning machine that:
  - Has access to the SSH keys
  - Is on the same network as your cluster machines
  - Has LambdaStack running. Note: to run LambdaStack, check the Prerequisites.
To set up the cluster do the following steps from the provisioning machine:
- First generate a minimal data yaml file:

    lambdastack init -p any -n newcluster

  The any provider tells LambdaStack to create a minimal data config which does not contain any cloud provider related information. If you want full control you can add the --full flag, which gives you a configuration with all parts of a cluster that can be configured.

- Open the configuration file and set up the admin_user data:

    admin_user:
      key_path: id_rsa
      name: user_name
      path: # Dynamically built

  Here you should specify the path to the SSH keys and the admin user name which will be used by Ansible to provision the cluster machines.

- Define the components you want to install and link them to the machines you want to install them on:

  Under the components tag you will find a number of definitions like this one:

    kubernetes_master:
      count: 1
      machines:
        - default-k8s-master

  The count specifies how many machines you want to provision with this component. The machines tag is the array of machine names you want to install this component on. Note that the count and the number of machines defined must match. If you don't want to use a component you can set the count to 0 and remove the machines tag. Finally, a machine can be used by multiple components, since multiple components can be installed on one machine if desired.

  You will also find a number of infrastructure/machine definitions like the one below:

    kind: infrastructure/machine
    name: default-k8s-master
    provider: any
    specification:
      hostname: master
      ip: 192.168.100.101

  Each machine name used when setting up the component layout earlier must have such a configuration, where the name tag matches the one defined in the components. The hostname and ip fields must be filled in to match the actual cluster machines you provide. Ansible will use this to match the machine to a component, which in turn determines which roles to install on the machine.

- Finally, start the deployment with:

    lambdastack apply -f newcluster.yml --no-infra

  This will create the inventory for Ansible based on the component/machine definitions made inside newcluster.yml and let Ansible deploy it. Note that --no-infra is important since it tells LambdaStack to skip the Terraform part.
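The rule that count must match the number of machines, and that every listed machine needs an infrastructure/machine document, lends itself to a quick pre-flight check. The sketch below assumes the relevant parts of the config are available as plain Python dicts; the function name and the node name default-k8s-node1 are made up for illustration.

```python
def validate_layout(components, machine_docs):
    """Return a list of problems; an empty list means the layout is consistent."""
    known = {
        doc["name"]
        for doc in machine_docs
        if doc.get("kind") == "infrastructure/machine"
    }
    problems = []
    for name, spec in components.items():
        count = spec.get("count", 0)
        machines = spec.get("machines", [])
        if count != len(machines):
            problems.append(f"{name}: count is {count} but {len(machines)} machines are listed")
        for machine in machines:
            if machine not in known:
                problems.append(f"{name}: no infrastructure/machine document named {machine}")
    return problems


components = {
    "kubernetes_master": {"count": 1, "machines": ["default-k8s-master"]},
    # Deliberately inconsistent: count of 2 with a single, undefined machine.
    "kubernetes_node": {"count": 2, "machines": ["default-k8s-node1"]},
    "kafka": {"count": 0},
}
machine_docs = [
    {
        "kind": "infrastructure/machine",
        "name": "default-k8s-master",
        "specification": {"hostname": "master", "ip": "192.168.100.101"},
    },
]

for problem in validate_layout(components, machine_docs):
    print(problem)
```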
How to create a LambdaStack cluster on existing air-gapped infrastructure
Please first read the prerequisites related to hostname requirements.
LambdaStack has the ability to set up a cluster on air-gapped infrastructure provided by you. These can be either bare metal machines or VMs and should meet the following requirements:
Note. Hardware requirements are not listed since this depends on use-case, component configuration etc.
- The air-gapped cluster machines/VMs are connected by a network (or a virtual network of some sort) and can communicate with each other.
- The air-gapped cluster machines/VMs are running one of the following Linux distributions:
  - RedHat 7.6+ and < 8
  - CentOS 7.6+ and < 8
  - Ubuntu 18.04
- The cluster machines/VMs are accessible through SSH with a set of SSH keys you provide and configure on each machine yourself (key-based authentication).
- The user used for the SSH connection (admin_user) has passwordless root privileges through sudo.
- A requirements machine that:
  - Runs the same distribution as the air-gapped cluster machines/VMs (RedHat 7, CentOS 7, Ubuntu 18.04)
  - Has access to the internet. If you don't have access to a similar machine/VM with internet access, you can also try to download the requirements with a Docker container. More information here.
- A provisioning machine that:
  - Has access to the SSH keys
  - Is on the same network as your cluster machines
  - Has LambdaStack running. Note: to run LambdaStack, check the Prerequisites.
To set up the cluster do the following steps:
- First we need to get the tooling to prepare the requirements. On the provisioning machine run:

    lambdastack prepare --os OS

  Where OS should be one of centos-7, redhat-7 or ubuntu-18.04. This will create a directory called prepare_scripts with the needed files inside.

- The scripts in prepare_scripts will be used to download all requirements. To do that, copy the prepare_scripts folder over to the requirements machine and run the following command:

    download-requirements.sh /requirementsoutput/

  This will start downloading all requirements and put them in the /requirementsoutput/ folder. Once it has run successfully, /requirementsoutput/ needs to be copied to the provisioning machine to be used later on.

- Then generate a minimal data yaml file on the provisioning machine:

    lambdastack init -p any -n newcluster

  The any provider tells LambdaStack to create a minimal data config which does not contain any cloud provider related information. If you want full control you can add the --full flag, which gives you a configuration with all parts of a cluster that can be configured.

- Open the configuration file and set up the admin_user data:

    admin_user:
      key_path: id_rsa
      name: user_name
      path: # Dynamically built

  Here you should specify the path to the SSH keys and the admin user name which will be used by Ansible to provision the cluster machines.

- Define the components you want to install and link them to the machines you want to install them on:

  Under the components tag you will find a number of definitions like this one:

    kubernetes_master:
      count: 1
      machines:
        - default-k8s-master

  The count specifies how many machines you want to provision with this component. The machines tag is the array of machine names you want to install this component on. Note that the count and the number of machines defined must match. If you don't want to use a component you can set the count to 0 and remove the machines tag. Finally, a machine can be used by multiple components, since multiple components can be installed on one machine if desired.

  You will also find a number of infrastructure/machine definitions like the one below:

    kind: infrastructure/machine
    name: default-k8s-master
    provider: any
    specification:
      hostname: master
      ip: 192.168.100.101

  Each machine name used when setting up the component layout earlier must have such a configuration, where the name tag matches the one defined in the components. The hostname and ip fields must be filled in to match the actual cluster machines you provide. Ansible will use this to match the machine to a component, which in turn determines which roles to install on the machine.

- Finally, start the deployment with:

    lambdastack apply -f newcluster.yml --no-infra --offline-requirements /requirementsoutput/

  This will create the inventory for Ansible based on the component/machine definitions made inside newcluster.yml and let Ansible deploy it. Note that --no-infra is important since it tells LambdaStack to skip the Terraform part. The --offline-requirements flag tells LambdaStack that this is an air-gapped installation and to use the /requirementsoutput/ folder prepared in steps 1 and 2 as the source for all requirements.
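When the air-gapped procedure is scripted, it can be useful to assemble the two LambdaStack invocations programmatically and reject an unsupported OS value early. The sketch below only builds argument lists from the commands and flags shown above; the helper names are assumptions and nothing is executed.

```python
# OS values accepted by `lambdastack prepare --os`, as listed in step 1.
SUPPORTED_OS = ("centos-7", "redhat-7", "ubuntu-18.04")


def prepare_args(os_name):
    """Build the argument list for `lambdastack prepare`, rejecting unknown OS values."""
    if os_name not in SUPPORTED_OS:
        raise ValueError(f"unsupported OS {os_name!r}; expected one of {SUPPORTED_OS}")
    return ["lambdastack", "prepare", "--os", os_name]


def offline_apply_args(config_file, requirements_dir):
    """Build the argument list for an air-gapped `lambdastack apply`."""
    return [
        "lambdastack", "apply", "-f", config_file,
        "--no-infra", "--offline-requirements", requirements_dir,
    ]


print(" ".join(prepare_args("ubuntu-18.04")))
print(" ".join(offline_apply_args("newcluster.yml", "/requirementsoutput/")))
```

The resulting lists can be handed to whatever process launcher the provisioning script already uses.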
How to create a LambdaStack cluster using custom system repository and Docker image registry
LambdaStack has the ability to use an external repository and image registry during lambdastack apply execution.
Custom URLs need to be specified inside the configuration/shared-config document, for example:
kind: configuration/shared-config
title: Shared configuration that will be visible to all roles
name: default
specification:
  custom_image_registry_address: "10.50.2.1:5000"
  custom_repository_url: "http://10.50.2.1:8080/lsrepo"
  use_ha_control_plane: true
The repository and image registry implementation must be compatible with already existing Ansible code:
- the repository data (including apt or yum repository) is served from HTTP server and structured exactly as in the offline package
- the image registry data is loaded into and served from standard Docker registry implementation
Note. If both custom repository/registry and offline installation are configured then the custom repository/registry is preferred.
Note. You can switch between custom repository/registry and offline/online installation methods. Keep in mind this will cause "imageRegistry" change in Kubernetes which in turn may cause short downtime.
By default, LambdaStack creates a "repository" virtual machine for cloud environments. When a custom repository and registry are used, there is no need for this additional, otherwise empty VM. The following config snippet illustrates how to avoid creating it:
kind: lambdastack-cluster
title: LambdaStack Cluster Config
provider: <provider>
name: default
specification:
  ...
  components:
    repository:
      count: 0
    kubernetes_master:
      count: 1
    kubernetes_node:
      count: 2
---
kind: configuration/feature-mapping
title: "Feature mapping to roles"
provider: <provider>
name: default
specification:
  roles_mapping:
    kubernetes_master:
      - repository
      - image-registry
      - kubernetes-master
      - helm
      - applications
      - node-exporter
      - filebeat
      - firewall
      - vault
---
kind: configuration/shared-config
title: Shared configuration that will be visible to all roles
provider: <provider>
name: default
specification:
  custom_image_registry_address: "<ip-address>:5000"
  custom_repository_url: "http://<ip-address>:8080/lsrepo"
- Disable the "repository" component:

    repository:
      count: 0

- Prepend the "kubernetes_master" mapping (or any other mapping if you don't deploy Kubernetes) with:

    kubernetes_master:
      - repository
      - image-registry

- Specify the custom repository/registry in configuration/shared-config:

    specification:
      custom_image_registry_address: "<ip-address>:5000"
      custom_repository_url: "http://<ip-address>:8080/lsrepo"
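The "prepend" step can be expressed as a small transformation on the roles_mapping section. This is a sketch under the assumption that the mapping is held as a dict of role lists; the function name is hypothetical. It is written so that applying it twice does not duplicate the roles.

```python
def prepend_repository_roles(roles_mapping, component="kubernetes_master"):
    """Return a copy of `roles_mapping` where `component` starts with the repository roles."""
    extra = ["repository", "image-registry"]
    current = roles_mapping.get(component, [])
    # Keep the existing order, but drop roles that are being moved to the front.
    merged = extra + [role for role in current if role not in extra]
    updated = dict(roles_mapping)
    updated[component] = merged
    return updated


mapping = {"kubernetes_master": ["kubernetes-master", "helm", "applications"]}
print(prepend_repository_roles(mapping)["kubernetes_master"])
```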
How to create a LambdaStack cluster on a cloud provider
Please first read the prerequisites related to hostname requirements.
LambdaStack has the ability to set up a cluster on one of the following cloud providers:
- AWS
- Azure
- GCP - WIP
Under the hood it uses Terraform to create the virtual infrastructure before it applies our Ansible playbooks to provision the VMs.
You need the following prerequisites:
- Access to one of the supported cloud providers: aws, azure or gcp.
- Adequate resources to deploy a cluster on the cloud provider.
- A set of SSH keys you provide.
- A provisioning machine that:
  - Has access to the SSH keys
  - Has LambdaStack running. Note: to run LambdaStack, check the Prerequisites.
To set up the cluster do the following steps from the provisioning machine:
- First generate a minimal data yaml file:

    lambdastack init -p aws/azure -n newcluster

  The provider flag should be either aws or azure and tells LambdaStack to create a data config which contains the specifics for that cloud provider. If you want full control you can add the --full flag, which gives you a config with all parts of a cluster that can be configured.

- Open the configuration file and set up the admin_user data:

    admin_user:
      key_path: id_rsa
      name: user_name
      path: # Dynamically built

  Here you should specify the path to the SSH keys and the admin user name which will be used by Ansible to provision the cluster machines.

  For AWS the admin name is already specified and depends on the Linux distro image you are using for the VMs:

  - Username for Ubuntu Server: ubuntu
  - Username for Redhat: ec2-user

  On Azure the name you specify will be configured as the admin name on the VMs.

  On GCP-WIP the name you specify will be configured as the admin name on the VMs.

- Set up the cloud specific data:

  To let Terraform access the cloud providers you need to set up some additional cloud configuration.

  AWS:

    cloud:
      region: us-east-1
      credentials:
        key: aws_key
        secret: aws_secret
      use_public_ips: false
      default_os_image: default

  The region lets you choose the most suitable place to deploy your cluster. The key and secret are needed by Terraform and can be generated in the AWS console. More information about that here.

  Azure:

    cloud:
      region: East US
      subscription_name: Subscription_name
      use_service_principal: false
      use_public_ips: false
      default_os_image: default

  The region lets you choose the most suitable place to deploy your cluster. The subscription_name is the Azure subscription under which you want to deploy the cluster.

  Terraform will ask you to sign in to your Microsoft Azure subscription when it prepares to build/modify/destroy the infrastructure on azure. In case you need to share cluster management with other people you can set the use_service_principal tag to true. This will create a service principal and use it to manage the resources.

  If you already have a service principal and don't want to create a new one you can do the following. Make sure the use_service_principal tag is set to true. Then, before you run lambdastack apply -f yourcluster.yml, create the following folder structure from the path you are running LambdaStack:

    /build/clustername/terraform

  Where clustername is the name you specified under specification.name in your cluster yaml. Then, in the terraform folder, add a file named sp.yml and fill it with the service principal information like so:

    appId: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
    displayName: "app-name"
    name: "http://app-name"
    password: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
    tenant: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
    subscriptionId: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"

  LambdaStack will read this file and automatically use it for authentication for resource creation and management.

  GCP-WIP:

  NOTE: GCP-WIP values may or may not be correct until the official GCP release.

    cloud:
      region: us-east-1
      credentials:
        key: gcp_key
        secret: gcp_secret
      use_public_ips: false
      default_os_image: default

  The region lets you choose the most suitable place to deploy your cluster. The key and secret are needed by Terraform and can be generated in the GCP console.

  For aws, azure and gcp the following cloud attributes overlap:

  - use_public_ips: When true, the VMs will also have a direct interface to the internet. While this is easy for setting up a cluster for testing, it should not be used in production. A VPN setup should be used, which we will document in a different section (TODO).
  - default_os_image: Lets you more easily select OS images validated and tested by the LambdaStack team. When one is selected, it will be applied to every infrastructure/virtual-machine document in the cluster regardless of user defined ones. The following values are accepted:
    - default: Applies user defined infrastructure/virtual-machine documents when generating a new configuration.
    - ubuntu-18.04-x86_64: Applies the latest validated and tested Ubuntu 18.04 image to all infrastructure/virtual-machine documents on x86_64 on Azure and AWS.
    - redhat-7-x86_64: Applies the latest validated and tested RedHat 7.x image to all infrastructure/virtual-machine documents on x86_64 on Azure and AWS.
    - centos-7-x86_64: Applies the latest validated and tested CentOS 7.x image to all infrastructure/virtual-machine documents on x86_64 on Azure and AWS.
    - centos-7-arm64: Applies the latest validated and tested CentOS 7.x image to all infrastructure/virtual-machine documents on arm64 on AWS. Azure currently doesn't support arm64.

    The images used for these values will be updated and tested on a regular basis.
- Define the components you want to install:

  Under the components tag you will find a number of definitions like this one:

    kubernetes_master:
      count: 1

  The count specifies how many VMs you want to provision with this component. If you don't want to use a component you can set the count to 0.

  Note that for each cloud provider LambdaStack already has a default VM configuration for each component. If you need more control over the VMs, generate a config with the --full flag. Then each component will have an additional machine tag:

    kubernetes_master:
      count: 1
      machine: kubernetes-master-machine
      ...

  This links to an infrastructure/virtual-machine document which can be found inside the same configuration file. It gives you full control over the VM config (size, storage, provision image, security etc.). More details on this will be documented in a different section (TODO).

- Finally, start the deployment with:

    lambdastack apply -f newcluster.yml
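The accepted default_os_image values and their provider coverage from step 3 can be captured in a small lookup, which makes it easy to catch a combination such as centos-7-arm64 on Azure before running apply. This is a sketch: the table is transcribed from the list in step 3, and the function name and the treatment of default as provider-independent are assumptions.

```python
# Providers on which each validated image is described as available.
VALIDATED_IMAGES = {
    "ubuntu-18.04-x86_64": {"aws", "azure"},
    "redhat-7-x86_64": {"aws", "azure"},
    "centos-7-x86_64": {"aws", "azure"},
    "centos-7-arm64": {"aws"},  # Azure is described as not supporting arm64
}


def default_os_image_allowed(provider, image):
    """Return True if `image` is an accepted default_os_image value for `provider`."""
    if image == "default":
        # Assumption: "default" keeps user defined VM documents on any provider.
        return True
    if image not in VALIDATED_IMAGES:
        raise ValueError(f"unknown default_os_image value: {image!r}")
    return provider in VALIDATED_IMAGES[image]


print(default_os_image_allowed("aws", "centos-7-arm64"))
print(default_os_image_allowed("azure", "centos-7-arm64"))
```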
Note for RHEL Azure images
LambdaStack currently supports RHEL 7 LVM partitioned images attached to standard RHEL repositories. For more details, refer to Azure documentation.
LambdaStack uses cloud-init custom data in order to merge small logical volumes (homelv, optlv, tmplv and varlv) into the rootlv and extends it (with the underlying filesystem) by the current free space in its volume group. The usrlv LV, which has 10G, is not merged since it would require a reboot. The merging is required to deploy a cluster; however, it can be disabled for troubleshooting since it performs some administrative tasks (such as remounting filesystems or restarting services).
NOTE: RHEL 7 LVM images require at least 64 GB for OS disk.
Example config:
kind: infrastructure/virtual-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: "7-LVM"
    version: "7.9.2021051701"
  storage_os_disk:
    disk_size_gb: 64
Note for CentOS Azure images
LambdaStack supports CentOS 7 images with RAW partitioning (recommended) and LVM as well.
Example config:
kind: infrastructure/virtual-machine
specification:
  storage_image_reference:
    publisher: OpenLogic
    offer: CentOS
    sku: "7_9"
    version: "7.9.2021071900"
How to disable merging LVM logical volumes
In order to not merge logical volumes (for troubleshooting), use the following doc:
kind: infrastructure/cloud-init-custom-data
title: cloud-init user-data
provider: azure
name: default
specification:
  enabled: false
How to delete a LambdaStack cluster on a cloud provider
LambdaStack has a delete command to remove a cluster from a cloud provider (AWS, Azure). With LambdaStack run the following:
lambdastack delete -b /path/to/cluster/build/folder
From the defined cluster build folder it will take the information needed to remove the resources from the cloud provider.
Single machine cluster
Please first read the prerequisites related to hostname requirements.
NOTE
A single machine cluster cannot be scaled up or deployed alongside other types of cluster.
Sometimes it might be desirable to run a LambdaStack cluster on a single machine. For this purpose LambdaStack ships with a single_machine component configuration. This cluster comes with the following main components:
- kubernetes-master: Untainted so pods can be deployed on it
- rabbitmq: RabbitMQ for messaging instead of Kafka
- applications: For deploying the Keycloak authentication service
- postgresql: To provide a database for Keycloak
Note that components like logging and monitoring are missing since they do not provide much benefit in a single machine scenario. Also, RabbitMQ is included over Kafka since that is much less resource intensive.
To get started with a single machine cluster you can use the following template as a base. Note that some configurations are omitted:
kind: lambdastack-cluster
title: LambdaStack Cluster Config
name: default
build_path: # Dynamically built
specification:
  prefix: dev
  name: single
  admin_user:
    name: operations
    key_path: id_rsa
    path: # Dynamically built
  cloud:
    ... # add other cloud configuration as needed
  components:
    kubernetes_master:
      count: 0
    kubernetes_node:
      count: 0
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 0
    load_balancer:
      count: 0
    rabbitmq:
      count: 0
    ignite:
      count: 0
    opendistro_for_elasticsearch:
      count: 0
    single_machine:
      count: 1
---
kind: configuration/applications
title: "Kubernetes Applications Config"
name: default
specification:
  applications:
    - name: auth-service
      enabled: yes # set to yes to enable authentication service
      ... # add other authentication service configuration as needed
To create a single machine cluster using the "any" provider (with extra load_balancer config included), use the template below:
kind: lambdastack-cluster
title: "LambdaStack Cluster Config"
provider: any
name: single
build_path: # Dynamically built
specification:
  name: single
  admin_user:
    name: ubuntu
    key_path: id_rsa
    path: # Dynamically built
  components:
    kubernetes_master:
      count: 0
    kubernetes_node:
      count: 0
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 0
    load_balancer:
      count: 1
      configuration: default
      machines: [single-machine]
    rabbitmq:
      count: 0
    single_machine:
      count: 1
      configuration: default
      machines: [single-machine]
---
kind: configuration/haproxy
title: "HAProxy"
provider: any
name: default
specification:
  logs_max_days: 60
  self_signed_certificate_name: self-signed-fullchain.pem
  self_signed_private_key_name: self-signed-privkey.pem
  self_signed_concatenated_cert_name: self-signed-test.tld.pem
  haproxy_log_path: "/var/log/haproxy.log"
  stats:
    enable: true
    bind_address: 127.0.0.1:9000
    uri: "/haproxy?stats"
    user: operations
    password: your-haproxy-stats-pwd
  frontend:
    - name: https_front
      port: 443
      https: yes
      backend:
        - http_back1
  backend: # example backend config below
    - name: http_back1
      server_groups:
        - kubernetes_node
      # servers: # Definition of the server that hosts the application.
      #   - name: "node1"
      #     address: "lambdastack-vm1.domain.com"
      port: 30104
---
kind: infrastructure/machine
provider: any
name: single-machine
specification:
  hostname: x1a1
  ip: 10.20.2.10
How to create custom cluster components
LambdaStack gives you the ability to define custom components. This allows you to define a custom set of roles for a component you want to use in your cluster. It can be useful when, for example, you want to maximize usage of the machines you have at your disposal.
The first thing you need to do is define the component in the configuration/feature-mapping configuration. To get this configuration you can run the lambdastack init ... --full command. In the available_roles section you can see all the available roles that LambdaStack provides. The roles_mapping section is where all the LambdaStack components are defined and where you need to add your custom components.
Below are parts of an example configuration/feature-mapping where we define a new single_machine_new component. We want to use Kafka instead of RabbitMQ and don't need applications and postgresql since we don't want a Keycloak deployment:
kind: configuration/feature-mapping
title: Feature mapping to roles
name: default
specification:
  available_roles: # All entries here represent the available roles within LambdaStack
    - name: repository
      enabled: yes
    - name: firewall
      enabled: yes
    - name: image-registry
    ...
  roles_mapping: # All entries here represent the default components provided with LambdaStack
    ...
    single_machine:
      - repository
      - image-registry
      - kubernetes-master
      - applications
      - rabbitmq
      - postgresql
      - firewall
    # Below is the new single_machine_new definition
    single_machine_new:
      - repository
      - image-registry
      - kubernetes-master
      - kafka
      - firewall
    ...
Once defined, the new single_machine_new component can be used inside the lambdastack-cluster configuration:
kind: lambdastack-cluster
title: LambdaStack Cluster Config
name: default
build_path: # Dynamically built
specification:
  prefix: new
  name: single
  admin_user:
    name: operations
    key_path: id_rsa
    path: # Dynamically built
  cloud:
    ... # add other cloud configuration as needed
  components:
    ... # other components as needed
    single_machine_new:
      count: x
Note: After defining a new component you might also need to define additional configurations for virtual machines and security rules depending on what you are trying to achieve.
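Deriving a custom component from an existing one, as done for single_machine_new above, amounts to copying a role list while swapping and dropping a few roles. The sketch below illustrates that idea on a plain dict; the function and its parameters are hypothetical and only the role names come from the example.

```python
def derive_component(roles_mapping, base, new_name, replace=None, remove=()):
    """Return a copy of `roles_mapping` with `new_name` derived from `base`.

    `replace` maps an existing role to the role that takes its place;
    `remove` lists roles to drop. The original order is preserved.
    """
    replace = replace or {}
    roles = []
    for role in roles_mapping[base]:
        if role in remove:
            continue
        roles.append(replace.get(role, role))
    updated = dict(roles_mapping)
    updated[new_name] = roles
    return updated


roles_mapping = {
    "single_machine": [
        "repository", "image-registry", "kubernetes-master",
        "applications", "rabbitmq", "postgresql", "firewall",
    ],
}
result = derive_component(
    roles_mapping,
    base="single_machine",
    new_name="single_machine_new",
    replace={"rabbitmq": "kafka"},          # Kafka instead of RabbitMQ
    remove=("applications", "postgresql"),  # no Keycloak deployment
)
print(result["single_machine_new"])
```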
How to scale or cluster components
Not all components support this action. Known issues are referenced in the table below.
LambdaStack has the ability to automatically scale and cluster certain components on cloud providers (AWS, Azure). To scale a component up or down, increase or decrease its count number:
components:
  kubernetes_node:
    count: ...
  ...
Then, when applying the changed configuration using LambdaStack, additional VMs will be spawned and configured, or removed. The following table shows which operations each component supports:
Component | Scale up | Scale down | HA | Clustered | Known issues |
---|---|---|---|---|---|
Repository | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | --- |
Monitoring | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | --- |
Logging | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | --- |
Kubernetes master | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | #1579 |
Kubernetes node | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | #1580 |
Ignite | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | --- |
Kafka | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | --- |
Load Balancer | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | --- |
Opendistro for elasticsearch | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | --- |
Postgresql | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | #1577 |
RabbitMQ | :heavy_check_mark: | :heavy_check_mark: | :x: | :heavy_check_mark: | #1578, #1309 |
RabbitMQ K8s | :heavy_check_mark: | :heavy_check_mark: | :x: | :heavy_check_mark: | #1486 |
Keycloak K8s | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | --- |
Pgpool K8s | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | --- |
Pgbouncer K8s | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | --- |
Ignite K8s | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | --- |
Additional notes:
- Repository:
  In a standard LambdaStack deployment only one repository machine is required.
  :arrow_up: Scaling up the repository component will create a new standalone VM.
  :arrow_down: Scaling down will remove it in LIFO order (Last In, First Out).
  However, even if you create more than one VM, by default all other components will use the first one.
- Kubernetes master:
  :arrow_up: When increased, this will set up additional control plane nodes, but in the case of a non-HA k8s cluster, the existing control plane node must be promoted first.
  :arrow_down: At the moment there is no ability to downscale.
- Kubernetes node:
  :arrow_up: When increased, this will set up an additional node and join it to the Kubernetes cluster.
  :arrow_down: There is no ability to downscale.
- Load balancer:
  :arrow_up: Scaling up the load_balancer component will create a new standalone VM.
  :arrow_down: Scaling down will remove it in LIFO order (Last In, First Out).
- Logging:
  :arrow_up: Scaling up will create a new VM with both Kibana and ODFE components inside.
  ODFE will join the cluster but Kibana will be a standalone instance.
  :arrow_down: When scaling down, the VM will be deleted.
- Monitoring:
  :arrow_up: Scaling up the monitoring component will create a new standalone VM.
  :arrow_down: Scaling down will remove it in LIFO order (Last In, First Out).
- Postgresql:
  :arrow_up: At the moment does not support scaling up. Check known issues.
  :arrow_down: At the moment does not support scaling down. Check known issues.
- RabbitMQ:
  If the instance count is changed, then additional RabbitMQ nodes will be added or removed.
  :arrow_up: Will create a new VM and add it to the RabbitMQ cluster.
  :arrow_down: At the moment scaling down will just remove the VM. All data not processed on this VM will be purged. Check known issues.
  Note that clustering requires a change in the configuration/rabbitmq document:
  kind: configuration/rabbitmq
  ...
  specification:
    cluster:
      is_clustered: true
  ...
- RabbitMQ K8s: Scaling is controlled via replicas in the StatefulSet. RabbitMQ on K8s uses the rabbitmq_peer_discovery_k8s plugin to work in a cluster.
Additional known issues:
- #1574 - Disks are not removed after downscale of any LambdaStack component on Azure.
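The scale up and scale down columns of the table above can be encoded as a small lookup, which is handy when reviewing a planned change to a count value. This is a sketch under assumptions: the dictionary keys and the function name is_count_change_supported are illustrative, only some components are listed, and LambdaStack itself is not known to ship such a helper.

```python
# (scale_up, scale_down) flags copied from the table above.
# The dictionary keys are illustrative component names.
SCALING_SUPPORT = {
    "repository": (True, True),
    "monitoring": (True, True),
    "logging": (True, True),
    "kubernetes_master": (True, False),
    "kubernetes_node": (True, False),
    "kafka": (True, True),
    "load_balancer": (True, True),
    "postgresql": (False, False),
    "rabbitmq": (True, True),
}


def is_count_change_supported(component, old_count, new_count):
    """Tell whether changing `count` for a component is a supported operation."""
    scale_up, scale_down = SCALING_SUPPORT[component]
    if new_count > old_count:
        return scale_up
    if new_count < old_count:
        return scale_down
    return True  # no change requested


print(is_count_change_supported("kubernetes_node", 2, 3))  # prints True
print(is_count_change_supported("postgresql", 2, 3))       # prints False
```

A True result only reflects the table; the known issues listed for a component still apply.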
Multi master cluster
LambdaStack can deploy HA Kubernetes clusters (since v0.6). To achieve that, it is required that:
- the master count must be higher than 1 (proper values should be 1, 3, 5, 7):
  kubernetes_master:
    count: 3
- the HA mode must be enabled in configuration/shared-config:
  kind: configuration/shared-config
  ...
  specification:
    use_ha_control_plane: true
    promote_to_ha: false
- the regular lambdastack apply cycle must be executed
LambdaStack can promote / convert older single-master clusters to HA mode (since v0.6). To achieve that, it is required that:
- the existing cluster is a legacy single-master cluster
- the existing cluster has been upgraded to Kubernetes 1.17 or above first
- the HA mode and HA promotion must be enabled in configuration/shared-config:
  kind: configuration/shared-config
  ...
  specification:
    use_ha_control_plane: true
    promote_to_ha: true
- the regular lambdastack apply cycle must be executed
- since it is a one-time operation, after successful promotion, the HA promotion must be disabled in the config:
  kind: configuration/shared-config
  ...
  specification:
    use_ha_control_plane: true
    promote_to_ha: false
Note: Reversing HA promotion is not supported yet.
LambdaStack can scale up existing HA clusters (including ones that were promoted). To achieve that, it is required that:
- the existing cluster must be already running in HA mode
- the master count must be higher than the previous value (proper values should be 3, 5, 7):
  kubernetes_master:
    count: 5
- the HA mode must be enabled in configuration/shared-config:
  kind: configuration/shared-config
  ...
  specification:
    use_ha_control_plane: true
    promote_to_ha: false
- the regular lambdastack apply cycle must be executed
Note: It is not supported yet to scale-down clusters (master count cannot be decreased).
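The requirements in this chapter can be summarised as a pre-flight check on the intended settings. The sketch below is an illustration only: check_master_change is a hypothetical helper, not a LambdaStack command, and it covers just the rules stated above (no decrease of the master count, HA mode for more than one master, odd master counts, promotion only together with HA mode).

```python
def check_master_change(previous_count, new_count, use_ha_control_plane, promote_to_ha):
    """Return a list of problems found in a planned kubernetes_master change."""
    problems = []
    if new_count < previous_count:
        problems.append("master count cannot be decreased")
    if new_count > 1 and not use_ha_control_plane:
        problems.append("use_ha_control_plane must be true for more than one master")
    if new_count % 2 == 0:
        problems.append("master count should be odd (1, 3, 5, 7)")
    if promote_to_ha and not use_ha_control_plane:
        problems.append("promote_to_ha requires use_ha_control_plane")
    return problems


# Scaling an HA cluster from 3 to 5 masters raises no objections.
print(check_master_change(3, 5, use_ha_control_plane=True, promote_to_ha=False))  # prints []
# Scaling down is reported as unsupported.
print(check_master_change(5, 3, use_ha_control_plane=True, promote_to_ha=False))
```

An empty list means none of the documented restrictions is violated; it does not replace the regular lambdastack apply cycle.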
Build artifacts
The LambdaStack engine produces build artifacts during each deployment. Those artifacts contain:
- Generated terraform files.
- Generated terraform state files.
- Generated cluster manifest file.
- Generated ansible files.
- Azure login credentials for the service principal if deploying to Azure.
Artifacts contain sensitive data, so it is important to keep them in a safe place such as a private Git repository or storage with limited access. The generated build is also important when scaling or updating the cluster: you will need it in the build folder in order to edit your cluster.
LambdaStack creates a service principal account (or uses an existing one if you did not ask it to create one) which can manage all resources in the subscription, so please store build artifacts securely.
Kafka replication and partition setting
When planning a Kafka installation you have to think about the number of partitions and replicas, since they are strongly related to Kafka's throughput and reliability. By default, Kafka's replicas number is set to 1. You should change it in core/src/ansible/roles/kafka/defaults in order to have partitions replicated to many virtual machines.
...
replicas: 1 # Default to at least 1 (1 broker)
partitions: 8 # 100 x brokers x replicas for reasonable size cluster. Small clusters can be less
...
You can read more here about planning number of partitions.
NOTE: LambdaStack does not use Confluent. The above reference is simply for documentation.
RabbitMQ installation and setting
To install RabbitMQ in single mode, just add the rabbitmq role to your data.yaml for your server and to the general roles section. All RabbitMQ configuration, e.g. creating a user other than guest, should be performed manually.
How to use Azure availability sets
In your cluster yaml config, declare as many objects of kind infrastructure/availability-set as required, like in the example below, and change the name field as you wish.
---
kind: infrastructure/availability-set
name: kube-node # Short and simple name is preferred
specification:
  # The "name" attribute is generated automatically according to LambdaStack's naming conventions
  platform_fault_domain_count: 2
  platform_update_domain_count: 5
  managed: true
provider: azure
Then set it also in the corresponding components section of the kind: lambdastack-cluster doc.
components:
  kafka:
    count: 0
  kubernetes_master:
    count: 1
  kubernetes_node:
    # This line tells LambdaStack to generate the availability-set terraform template
    availability_set: kube-node # Short and simple name is preferred
    count: 2
The example below shows a complete configuration. Note that it's recommended to have a dedicated availability set for each clustered component.
# Test availability set config
---
kind: lambdastack-cluster
name: default
provider: azure
build_path: # Dynamically built
specification:
  name: test-cluster
  prefix: test
  admin_user:
    key_path: id_rsa
    name: di-dev
    path: # Dynamically built
  cloud:
    region: Australia East
    subscription_name: <your subscription name>
    use_public_ips: true
    use_service_principal: true
  components:
    kafka:
      count: 0
    kubernetes_master:
      count: 1
    kubernetes_node:
      # This line tells LambdaStack to generate the availability-set terraform template
      availability_set: kube-node # Short and simple name is preferred
      count: 2
    load_balancer:
      count: 1
    logging:
      count: 0
    monitoring:
      count: 0
    postgresql:
      # This line tells LambdaStack to generate the availability-set terraform template
      availability_set: postgresql # Short and simple name is preferred
      count: 2
    rabbitmq:
      count: 0
title: LambdaStack Cluster Config
---
kind: infrastructure/availability-set
name: kube-node # Short and simple name is preferred
specification:
  # The "name" attribute (omitted here) is generated automatically according to LambdaStack's naming conventions
  platform_fault_domain_count: 2
  platform_update_domain_count: 5
  managed: true
provider: azure
---
kind: infrastructure/availability-set
name: postgresql # Short and simple name is preferred
specification:
  # The "name" attribute (omitted here) is generated automatically according to LambdaStack's naming conventions
  platform_fault_domain_count: 2
  platform_update_domain_count: 5
  managed: true
provider: azure
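Because each availability_set value has to match the name of an infrastructure/availability-set document, a quick cross-check can catch a missing or misspelled declaration. The following sketch assumes the YAML documents have already been loaded into Python dicts; the function name missing_availability_sets is hypothetical and not part of LambdaStack.

```python
def missing_availability_sets(docs):
    """Map component name -> referenced availability set that has no declaration."""
    defined = {
        d["name"] for d in docs if d.get("kind") == "infrastructure/availability-set"
    }
    missing = {}
    for d in docs:
        if d.get("kind") != "lambdastack-cluster":
            continue
        for component, settings in d["specification"]["components"].items():
            ref = settings.get("availability_set")
            if ref is not None and ref not in defined:
                missing[component] = ref
    return missing


# Reduced version of the example above; the postgresql set is left out on purpose.
docs = [
    {
        "kind": "lambdastack-cluster",
        "specification": {
            "components": {
                "kubernetes_node": {"availability_set": "kube-node", "count": 2},
                "postgresql": {"availability_set": "postgresql", "count": 2},
                "kafka": {"count": 0},
            }
        },
    },
    {"kind": "infrastructure/availability-set", "name": "kube-node"},
]

print(missing_availability_sets(docs))  # prints {'postgresql': 'postgresql'}
```

With both availability-set documents present, as in the complete configuration above, the result is an empty dict.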
Downloading offline requirements with a Docker container
This paragraph describes how to use a Docker container to download the requirements for air-gapped/offline installations. At this time we don't officially support this, and we still recommend using a full distribution which is the same as the air-gapped cluster machines/VMs.
A few points:
- This only describes how to set up the Docker containers for downloading. The rest of the steps are similar to those in the paragraph here.
- The main reason why you might want to give this a try is to download arm64 architecture requirements on an x86_64 machine. More information on the current state of arm64 support can be found here.
Ubuntu 18.04
For Ubuntu, you can use the following command to launch a container:
docker run -v /shared_folder:/home <--platform linux/amd64 or --platform linux/arm64> --rm -it ubuntu:18.04
As the ubuntu:18.04 image is multi-arch, you can include --platform linux/amd64 or --platform linux/arm64 to run the container as the specified architecture. The /shared_folder should be a folder on your local machine containing the required scripts.
When you are inside the container, run the following commands to prepare for running the download-requirements.sh script:
apt-get update # update the package manager
apt-get install sudo # install sudo so we can make the download-requirements.sh executable and run it as root
sudo chmod +x /home/download-requirements.sh # make the requirements script executable
After this you should be able to run the download-requirements.sh script from the /home folder.
RedHat 7.x
For RedHat you can use the following command to launch a container:
docker run -v /shared_folder:/home <--platform linux/amd64 or --platform linux/arm64> --rm -it registry.access.redhat.com/ubi7/ubi:7.9
As the registry.access.redhat.com/ubi7/ubi:7.9 image is multi-arch, you can include --platform linux/amd64 or --platform linux/arm64 to run the container as the specified architecture. The /shared_folder should be a folder on your local machine containing the requirement scripts.
For running the download-requirements.sh script you will need a RedHat developer subscription to register the running container and make sure you can access the official RedHat repos for the packages needed. More information on getting this free subscription here.
When you are inside the container, run the following commands to prepare for running the download-requirements.sh script:
subscription-manager register # will ask for the credentials of your RedHat developer subscription and set up the container
subscription-manager attach --auto # will enable the RedHat official repositories
chmod +x /home/download-requirements.sh # make the requirements script executable
After this you should be able to run the download-requirements.sh script from the /home folder.
CentOS 7.x
For CentOS, you can use the following command to launch a container:
arm64:
docker run -v /shared_folder:/home --platform linux/arm64 --rm -it arm64v8/centos:7.9.2009
x86_64:
docker run -v /shared_folder:/home --platform linux/amd64 --rm -it amd64/centos:7.9.2009
The /shared_folder should be a folder on your local machine containing the requirement scripts.
When you are inside the container, run the following commands to prepare for running the download-requirements.sh script:
chmod +x /home/download-requirements.sh # make the requirements script executable
After this you should be able to run the download-requirements.sh script from the /home folder.
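The three docker run commands above differ only in image and platform, so they can be generated from a small table. This is just a convenience sketch: the IMAGES mapping and the docker_run_args function are hypothetical names, while the image references, the /shared_folder:/home mount and the flags are the ones used in the commands above.

```python
# Image references taken from the docker run commands above.
IMAGES = {
    ("ubuntu", "amd64"): "ubuntu:18.04",
    ("ubuntu", "arm64"): "ubuntu:18.04",
    ("redhat", "amd64"): "registry.access.redhat.com/ubi7/ubi:7.9",
    ("redhat", "arm64"): "registry.access.redhat.com/ubi7/ubi:7.9",
    ("centos", "amd64"): "amd64/centos:7.9.2009",
    ("centos", "arm64"): "arm64v8/centos:7.9.2009",
}


def docker_run_args(distro, arch, shared_folder="/shared_folder"):
    """Build the argument list for launching a download container."""
    image = IMAGES[(distro, arch)]
    return [
        "docker", "run",
        "-v", f"{shared_folder}:/home",
        "--platform", f"linux/{arch}",
        "--rm", "-it", image,
    ]


print(" ".join(docker_run_args("centos", "arm64")))
# prints: docker run -v /shared_folder:/home --platform linux/arm64 --rm -it arm64v8/centos:7.9.2009
```

The returned list can be passed to subprocess.run as-is, which avoids shell quoting issues when the shared folder path contains spaces.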
3 - Configuration
Configuration file
Named lists
LambdaStack uses a concept called named lists in the configuration YAML. Every item in a named list has the name key to identify it and make it unique for the merge operation:
...
list:
  - name: item1
    property1: value1
    property2: value2
  - name: item2
    property1: value3
    property2: value4
...
By default, a named list in your configuration file will completely overwrite the defaults that LambdaStack provides. This behaviour is on purpose so when you, for example, define a list of users for Kafka inside your configuration it completely overwrites the users defined in the Kafka defaults.
In some cases, however, you don't want to overwrite a named list. A good example would be the application configurations.
You don't want to re-define every item just to make sure LambdaStack has all default items needed by the Ansible automation. That is where the _merge metadata tag comes in. It lets you define whether you want to merge or overwrite a named list by setting it to true or false, respectively.
For example, you want to enable the auth-service application. Instead of defining the whole configuration/applications configuration you can do the following:
kind: configuration/applications
title: "Kubernetes Applications Config"
name: default
provider: azure
specification:
  applications:
    - _merge: true
    - name: auth-service
      enabled: true
The _merge item set to true tells lambdastack to merge the application list, change only the enabled: true setting inside auth-service, and take the rest of the configuration/applications configuration from the defaults.
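The difference between overwriting and merging a named list can be illustrated with a few lines of Python. This is a simplified sketch of the behaviour described above, not LambdaStack's actual implementation: merge_named_list is a hypothetical function, items are plain dicts, matching items are updated shallowly, and the image property in the sample defaults is made up for illustration.

```python
from copy import deepcopy


def merge_named_list(defaults, overrides):
    """Combine two named lists: merge if an item says _merge: true, else overwrite."""
    merge = any(item.get("_merge") is True for item in overrides if "name" not in item)
    items = [deepcopy(item) for item in overrides if "name" in item]
    if not merge:
        return items  # default behaviour: the user list replaces the defaults
    result = [deepcopy(item) for item in defaults]
    by_name = {item["name"]: item for item in result}
    for item in items:
        if item["name"] in by_name:
            by_name[item["name"]].update(item)  # shallow update of the matching item
        else:
            result.append(item)
            by_name[item["name"]] = item
    return result


# Hypothetical defaults; "image" is an invented property.
defaults = [
    {"name": "auth-service", "enabled": False, "image": "example/auth"},
    {"name": "rabbitmq", "enabled": False},
]

merged = merge_named_list(defaults, [{"_merge": True}, {"name": "auth-service", "enabled": True}])
print(merged[0])    # prints {'name': 'auth-service', 'enabled': True, 'image': 'example/auth'}
print(len(merged))  # prints 2

overwritten = merge_named_list(defaults, [{"name": "auth-service", "enabled": True}])
print(overwritten)  # prints [{'name': 'auth-service', 'enabled': True}]
```

Without the _merge item the rabbitmq entry and the image property are lost, which is exactly why merging is useful for the applications list.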
4 - Databases
How to configure PostgreSQL
To configure PostgreSQL, log in to the server using ssh and switch to the postgres user with the command:
sudo -u postgres -i
Then configure the database server using psql according to your needs and the PostgreSQL documentation.
PostgreSQL passwords encryption
LambdaStack sets up MD5 password encryption. Although PostgreSQL has been able to use SCRAM-SHA-256 password encryption since version 10, LambdaStack does not support this method because the recommended production configuration uses more than one database host in an HA configuration (repmgr) cooperating with PgBouncer and Pgpool. Pgpool is not able to parse the list of SCRAM-SHA-256 hashes when this encryption is enabled, and due to limited Pgpool authentication options it is not possible to refresh the pool_passwd file automatically. For this reason, MD5 password encryption is set up and this is not configurable in LambdaStack.
How to set up PostgreSQL connection pooling
PostgreSQL connection pooling in LambdaStack is served by the PgBouncer application. It is available as a Kubernetes ClusterIP service or as a standalone package.
The Kubernetes based installation works together with PgPool, so it supports the PostgreSQL HA setup.
The standalone installation (described below) is deprecated and will be removed in the next release.
NOTE
PgBouncer extension is not supported on ARM.
PgBouncer is installed only on the PostgreSQL primary node. This needs to be enabled in the configuration yaml file:
kind: configuration/postgresql
specification:
  extensions:
    ...
    pgbouncer:
      enabled: yes
    ...
PgBouncer listens on standard port 6432. The basic configuration is just a template with very limited access to the database, for security reasons. The configuration needs to be tailored according to the component documentation and should follow security rules and best practices.
How to set up PostgreSQL HA replication with repmgr cluster
NOTE 1
Replication (repmgr) extension is not supported on ARM.
NOTE 2
Changing the number of PostgreSQL nodes after the first apply is not supported by LambdaStack. Before cluster deployment, think over what kind of configuration you need and how many PostgreSQL nodes will be needed.
This component can be used as a part of PostgreSQL clustering configured by LambdaStack. To configure PostgreSQL HA replication, add a block similar to the one below to the core section of your configuration file:
---
kind: configuration/postgresql
name: default
title: PostgreSQL
specification:
  config_file:
    parameter_groups:
      ...
      # This block is optional, you can use it to override default values
      - name: REPLICATION
        subgroups:
          - name: Sending Server(s)
            parameters:
              - name: max_wal_senders
                value: 10
                comment: maximum number of simultaneously running WAL sender processes
                when: replication
              - name: wal_keep_size
                value: 500
                comment: the size of WAL files held for standby servers (MB)
                when: replication
          - name: Standby Servers
            parameters:
              - name: hot_standby
                value: 'on'
                comment: must be 'on' for repmgr needs, ignored on primary but recommended
                  in case primary becomes standby
                when: replication
  extensions:
    ...
    replication:
      enabled: true
      replication_user_name: ls_repmgr
      replication_user_password: PASSWORD_TO_CHANGE
      privileged_user_name: ls_repmgr_admin
      privileged_user_password: PASSWORD_TO_CHANGE
      repmgr_database: ls_repmgr
      shared_preload_libraries:
        - repmgr
    ...
If enabled is set to true for the replication extension, LambdaStack will automatically create a cluster consisting of a primary and a secondary server, with a replication user whose name and password are specified in the configuration file. This is only possible for configurations containing two PostgreSQL servers.
The privileged user is used to perform a full backup of the primary instance and replicate it to the secondary node at the beginning. After that, only the replication user with limited permissions is used for WAL replication.
How to stop PostgreSQL service in HA cluster
For maintenance work, the PostgreSQL service sometimes needs to be stopped. Before doing so, the repmgr service needs to be paused (see the manual page). Once the repmgr service is paused, you can follow the steps from the PostgreSQL manual page or stop PostgreSQL as a regular systemd service.
How to register database standby in repmgr cluster
If one of the database nodes has been recovered to the desired state, you may want to re-attach it to the database cluster. Execute these steps on the node which will be attached as standby:
- Clone data from the current primary node:
repmgr standby clone -h CURRENT_PRIMARY_ADDRESS -U ls_repmgr_admin -d ls_repmgr --force
- Register the node as standby:
repmgr standby register
You may use the --force option if the node was registered in the cluster before. For more options, see repmgr manual: https://repmgr.org/docs/5.2/repmgr-standby-register.html
How to switchover database nodes
You may want to switch over database nodes (promote the standby to primary and demote the existing primary to standby).
- Configure passwordless SSH communication for the postgres user between database nodes.
- Test and run an initial login between nodes to authenticate the host (if host authentication is enabled).
Execute the commands listed below on the current standby node:
- Confirm that the standby you want to promote is registered in the repmgr cluster:
repmgr cluster show
- Run switchover:
repmgr standby switchover
- Run repmgr cluster show again and check the status. For more details or troubleshooting, see repmgr manual: https://repmgr.org/docs/5.2/repmgr-standby-switchover.html
How to set up PgBouncer, PgPool and PostgreSQL parameters
This section describes how to set up connection pooling and load balancing for a highly available PostgreSQL cluster. The default configuration provided by LambdaStack is meant for midrange class systems but can be customized to scale up or to improve performance.
To adjust the configuration to your needs, you can refer to the following documentation:
Component | Documentation URL |
---|---|
PgBouncer | https://www.pgbouncer.org/config.html |
PgPool: Performance Considerations | https://www.pgpool.net/docs/41/en/html/performance.html |
PgPool: Server Configuration | https://www.pgpool.net/docs/41/en/html/runtime-config.html |
PostgreSQL: connections | https://www.postgresql.org/docs/10/runtime-config-connection.html |
PostgreSQL: resources management | https://www.postgresql.org/docs/10/runtime-config-resource.html |
Installing PgBouncer and PgPool
NOTE
PgBouncer and PgPool Docker images are not supported for ARM. If these applications are enabled in configuration, installation will fail.
PgBouncer and PgPool are provided as K8s deployments. By default, they are not installed. To deploy them you need to add a configuration/applications document to your configuration yaml file, similar to the example below (the enabled flags must be set to true):
---
kind: configuration/applications
version: 1.2.0
title: "Kubernetes Applications Config"
provider: aws
name: default
specification:
  applications:
  ...
  ## --- pgpool ---
  - name: pgpool
    enabled: true
    ...
    namespace: postgres-pool
    service:
      name: pgpool
      port: 5432
    replicas: 3
    ...
    resources: # Adjust to your configuration, see https://www.pgpool.net/docs/42/en/html/resource-requiremente.html
      limits:
        # cpu: 900m # Set according to your env
        memory: 310Mi
      requests:
        cpu: 250m # Adjust to your env, increase if possible
        memory: 310Mi
    pgpool:
      # https://github.com/bitnami/bitnami-docker-pgpool#configuration + https://github.com/bitnami/bitnami-docker-pgpool#environment-variables
      env:
        PGPOOL_BACKEND_NODES: autoconfigured # you can use custom value like '0:pg-node-1:5432,1:pg-node-2:5432'
        # Postgres users
        PGPOOL_POSTGRES_USERNAME: ls_pgpool_postgres_admin # with SUPERUSER role to use connection slots reserved for superusers for K8s liveness probes, also for user synchronization
        PGPOOL_SR_CHECK_USER: ls_pgpool_sr_check # with pg_monitor role, for streaming replication checks and health checks
        # ---
        PGPOOL_ADMIN_USERNAME: ls_pgpool_admin # Pgpool administrator (local pcp user)
        PGPOOL_ENABLE_LOAD_BALANCING: false # set to 'false' if there is no replication
        PGPOOL_MAX_POOL: 4
        PGPOOL_CHILD_LIFE_TIME: 300
        PGPOOL_POSTGRES_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_postgres_password
        PGPOOL_SR_CHECK_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_sr_check_password
        PGPOOL_ADMIN_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_admin_password
      secrets:
        pgpool_postgres_password: PASSWORD_TO_CHANGE
        pgpool_sr_check_password: PASSWORD_TO_CHANGE
        pgpool_admin_password: PASSWORD_TO_CHANGE
      # https://www.pgpool.net/docs/42/en/html/runtime-config.html
      pgpool_conf_content_to_append: |
        #------------------------------------------------------------------------------
        # CUSTOM SETTINGS (appended by LambdaStack to override defaults)
        #------------------------------------------------------------------------------
        # num_init_children = 32
        connection_life_time = 600
        reserved_connections = 1
      # https://www.pgpool.net/docs/41/en/html/auth-pool-hba-conf.html
      pool_hba_conf: autoconfigured
  ## --- pgbouncer ---
  - name: pgbouncer
    enabled: true
    ...
    namespace: postgres-pool
    service:
      name: pgbouncer
      port: 5432
    replicas: 2
    resources:
      requests:
        cpu: 250m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 128Mi
    pgbouncer:
      env:
        DB_HOST: pgpool.postgres-pool.svc.cluster.local
        DB_LISTEN_PORT: 5432
        MAX_CLIENT_CONN: 150
        DEFAULT_POOL_SIZE: 25
        RESERVE_POOL_SIZE: 25
        POOL_MODE: session
        CLIENT_IDLE_TIMEOUT: 0
Default setup - main parameters
This chapter describes the default setup and main parameters responsible for the performance limitations. The limitations can be divided into 3 layers: resource usage, connection limits and query caching. All the configuration parameters can be modified in the configuration yaml file.
Resource usage
Each of the components has hardware requirements that depend on its configuration, in particular on the number of allowed connections.
PgBouncer
replicas: 2
resources:
  requests:
    cpu: 250m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 128Mi
PgPool
replicas: 3
resources: # Adjust to your configuration, see https://www.pgpool.net/docs/41/en/html/resource-requiremente.html
  limits:
    # cpu: 900m # Set according to your env
    memory: 310Mi
  requests:
    cpu: 250m # Adjust to your env, increase if possible
    memory: 310Mi
By default, each PgPool pod requires 176 MB of memory. This value has been determined based on PgPool docs; however, after stress testing we needed to add several extra megabytes to avoid the "failed to fork a child" issue. You may need to adjust resources after changing the num_init_children or max_pool (PGPOOL_MAX_POOL) settings. Such changes should be synchronized with the PostgreSQL and PgBouncer configuration.
PostgreSQL
Memory related parameters have PostgreSQL default values. If your setup requires performance improvements, you may consider changing values of the following parameters:
- shared_buffers
- work_mem
- maintenance_work_mem
- effective_cache_size
- temp_buffers
The default settings can be overridden by LambdaStack using the configuration/postgresql doc in the configuration yaml file.
Connection limits
PgBouncer
There are connection limitations defined in PgBouncer configuration. Each of these parameters is defined per PgBouncer instance (pod). For example, having 2 pods (with MAX_CLIENT_CONN = 150) allows for up to 300 client connections.
pgbouncer:
  env:
    ...
    MAX_CLIENT_CONN: 150
    DEFAULT_POOL_SIZE: 25
    RESERVE_POOL_SIZE: 25
    POOL_MODE: session
    CLIENT_IDLE_TIMEOUT: 0
By default, POOL_MODE is set to session to be transparent for the PgBouncer client. This section should be adjusted depending on your desired configuration. The connection rotation (pooling) modes are well described in the official PgBouncer documentation.
If your client application doesn't manage sessions, you can use CLIENT_IDLE_TIMEOUT to force a session timeout.
PgPool
By default, PgPool service is configured to handle up to 93 active concurrent connections to PostgreSQL (3 pods x 31). This is because of the following settings:
num_init_children = 32
reserved_connections = 1
Each pod can handle up to 32 concurrent connections but one is reserved. This means that the 32nd connection from a client will be refused. Keep in mind that canceling a query creates another connection to PostgreSQL, thus, a query cannot be canceled if all the connections are in use. Furthermore, for each pod, one connection slot must be available for K8s health checks. Hence, the real number of available concurrent connections is 30 per pod.
If you need more active concurrent connections, you can increase the number of pods (replicas), but the total number of allowed concurrent connections should not exceed the value defined by PostgreSQL parameters: (max_connections - superuser_reserved_connections).
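The arithmetic above can be written down as a short calculation, which makes it easier to re-check the numbers after changing replicas or num_init_children. This is a sketch under assumptions: pgpool_connection_budget is a hypothetical helper, one slot per pod is set aside for K8s health checks as described above, and the PostgreSQL limit uses the stock defaults max_connections = 100 and superuser_reserved_connections = 3, which may differ in your setup.

```python
def pgpool_connection_budget(replicas, num_init_children, reserved_connections,
                             health_check_slots=1):
    """Return (configured, usable) concurrent connection totals across all pods."""
    per_pod = num_init_children - reserved_connections
    configured = replicas * per_pod
    usable = replicas * (per_pod - health_check_slots)
    return configured, usable


configured, usable = pgpool_connection_budget(
    replicas=3, num_init_children=32, reserved_connections=1
)

# PostgreSQL side: max_connections - superuser_reserved_connections.
# 100 and 3 are assumed stock PostgreSQL defaults.
postgres_limit = 100 - 3

print(configured, usable, configured <= postgres_limit)  # prints 93 90 True
```

Adding a fourth replica would raise the configured total to 124, which exceeds the assumed PostgreSQL limit of 97, so max_connections would have to be raised as well.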
In order to change PgPool settings (defined in pgpool.conf), you can edit the pgpool_conf_content_to_append section:
pgpool_conf_content_to_append: |
  #------------------------------------------------------------------------------
  # CUSTOM SETTINGS (appended by LambdaStack to override defaults)
  #------------------------------------------------------------------------------
  connection_life_time = 900
  reserved_connections = 1
The content of the pgpool.conf file is stored in the K8s pgpool-config-files ConfigMap.
For detailed information about connection tuning, see "Performance Considerations" chapter in PgPool documentation.
PostgreSQL
PostgreSQL uses the max_connections parameter to limit the number of client connections to the database server. The default is typically 100 connections. Generally, PostgreSQL on sufficiently sized hardware can support a few hundred connections.
Query caching
Query caching is not available in PgBouncer.
PgPool
Query caching is disabled by default in PgPool configuration.
PostgreSQL
PostgreSQL is installed with default settings.
How to set up PostgreSQL audit logging
Audit logging of database activities is available through the PostgreSQL Audit Extension: PgAudit. It provides session and/or object audit logging via the standard PostgreSQL log.
PgAudit may generate a large volume of logging, which has an impact on performance and log storage. For this reason, PgAudit is not enabled by default.
To install and configure PgAudit, add to your configuration yaml file a doc similar to the following:
kind: configuration/postgresql
title: PostgreSQL
name: default
provider: aws
version: 1.0.0
specification:
  extensions:
    pgaudit:
      enabled: yes
      config_file_parameters:
        ## postgresql standard
        log_connections: 'off'
        log_disconnections: 'off'
        log_statement: 'none'
        log_line_prefix: "'%m [%p] %q%u@%d,host=%h '"
        ## pgaudit specific, see https://github.com/pgaudit/pgaudit/blob/REL_10_STABLE/README.md#settings
        pgaudit.log: "'write, function, role, ddl' # 'misc_set' is not supported for PG 10"
        pgaudit.log_catalog: 'off # to reduce overhead of logging'
        # the following first 2 parameters are set to values that make it easier to access audit log per table
        # change their values to the opposite if you need to reduce overhead of logging
        pgaudit.log_relation: 'on # separate log entry for each relation'
        pgaudit.log_statement_once: 'off'
        pgaudit.log_parameter: 'on'
If the enabled property for the PgAudit extension is set to yes, LambdaStack will install the PgAudit package and add the PgAudit extension to be loaded in shared_preload_libraries. Settings defined in the config_file_parameters section are populated to the LambdaStack managed PostgreSQL configuration file. Using this section, you can also set any additional parameter if needed (e.g. pgaudit.role), but keep in mind that these settings are global.
To configure PgAudit according to your needs, see PgAudit documentation.
Once the LambdaStack installation is complete, there is one manual action at database level (for each database). Connect to your database using a client (like psql) and load the PgAudit extension into the current database by running the command:
CREATE EXTENSION pgaudit;
To remove the extension from database, run:
DROP EXTENSION IF EXISTS pgaudit;
How to work with PostgreSQL connection pooling
PostgreSQL connection pooling is described on the design documentation page. To use the fully HA configuration, an application (Kubernetes service) should be set up to connect to the pgbouncer service (Kubernetes) instead of directly to a database host. This configuration provides all the benefits of using PostgreSQL in clustered HA mode (including database failover). Both pgbouncer and pgpool store database users and passwords in configuration files, so their pods need to be restarted after PostgreSQL authentication changes such as creating a user or altering a username or password. During the restart, the pods refresh the stored database credentials automatically.
How to configure PostgreSQL replication
Note
PostgreSQL native replication is now deprecated and removed. Use PostgreSQL HA replication with repmgr instead.
How to start working with OpenDistro for Elasticsearch
OpenDistro for Elasticsearch is an Apache 2.0-licensed distribution of Elasticsearch enhanced with enterprise security, alerting and SQL support. To start working with OpenDistro, change the machine count to a value greater than 0 in your cluster configuration:
kind: lambdastack-cluster
...
specification:
  ...
  components:
    kubernetes_master:
      count: 1
      machine: aws-kb-masterofpuppets
    kubernetes_node:
      count: 0
    ...
    logging:
      count: 1
    opendistro_for_elasticsearch:
      count: 2
An installation with more than one node is always clustered. Configuring a non-clustered installation of more than one node for Open Distro is not supported.
kind: configuration/opendistro-for-elasticsearch
title: OpenDistro for Elasticsearch Config
name: default
specification:
  cluster_name: LambdaStackElastic
By default, Kibana is deployed only for the logging component. If you want to deploy Kibana for opendistro_for_elasticsearch, you have to modify the feature mapping. Use the configuration below in your manifest.
kind: configuration/feature-mapping
title: "Feature mapping to roles"
name: default
specification:
  roles_mapping:
    opendistro_for_elasticsearch:
      - opendistro-for-elasticsearch
      - node-exporter
      - filebeat
      - firewall
      - kibana
Filebeat running on opendistro_for_elasticsearch hosts will always point to centralized logging hosts (./LOGGING.md).
How to start working with Apache Ignite Stateful setup
Apache Ignite can be installed in LambdaStack if the count property for the ignite feature is greater than 0. Example:
kind: lambdastack-cluster
specification:
  components:
    load_balancer:
      count: 1
    ignite:
      count: 2
    rabbitmq:
      count: 0
    ...
A configuration like the one in this example will create Virtual Machines with an Apache Ignite cluster installed. It is possible to modify the configuration of Apache Ignite and the plugins used.
kind: configuration/ignite
title: "Apache Ignite stateful installation"
name: default
specification:
  version: 2.7.6
  file_name: apache-ignite-2.7.6-bin.zip
  enabled_plugins:
  - ignite-rest-http
  config: |
    <?xml version="1.0" encoding="UTF-8"?>
    <!--
      Licensed to the Apache Software Foundation (ASF) under one or more
      contributor license agreements. See the NOTICE file distributed with
      this work for additional information regarding copyright ownership.
      The ASF licenses this file to You under the Apache License, Version 2.0
      (the "License"); you may not use this file except in compliance with
      the License. You may obtain a copy of the License at
      http://www.apache.org/licenses/LICENSE-2.0
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
    -->
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">
      <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="dataStorageConfiguration">
          <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <!-- Set the page size to 4 KB -->
            <property name="pageSize" value="#{4 * 1024}"/>
            <!--
              Sets a path to the root directory where data and indexes are
              to be persisted. It's assumed the directory is on a separated SSD.
            -->
            <property name="storagePath" value="/var/lib/ignite/persistence"/>
            <!--
              Sets a path to the directory where WAL is stored.
              It's assumed the directory is on a separated HDD.
            -->
            <property name="walPath" value="/wal"/>
            <!--
              Sets a path to the directory where WAL archive is stored.
              The directory is on the same HDD as the WAL.
            -->
            <property name="walArchivePath" value="/wal/archive"/>
          </bean>
        </property>
        <property name="discoverySpi">
          <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
              <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                <property name="addresses">
                  IP_LIST_PLACEHOLDER
                </property>
              </bean>
            </property>
          </bean>
        </property>
      </bean>
    </beans>
The enabled_plugins property contains a list of plugin names that will be enabled. The config property contains the XML configuration for Apache Ignite. The important placeholder variable IP_LIST_PLACEHOLDER will be replaced by the automation with the list of Apache Ignite nodes for self-discovery.
How to start working with Apache Ignite Stateless setup
Stateless setup of Apache Ignite is done using Kubernetes deployments. This setup uses the standard LambdaStack applications feature (similar to auth-service, rabbitmq). To enable a stateless Ignite deployment, use the following document:
kind: configuration/applications
title: "Kubernetes Applications Config"
name: default
specification:
  applications:
  - name: ignite-stateless
    image_path: "lambdastack/ignite:2.9.1" # it will be part of the image path: {{local_repository}}/{{image_path}}
    namespace: ignite
    service:
      rest_nodeport: 32300
      sql_nodeport: 32301
      thinclients_nodeport: 32302
    replicas: 1
    enabled_plugins:
    - ignite-kubernetes # required to work on K8s
    - ignite-rest-http
Adjust the number of replicas and the list of enabled plugins in this config to your requirements.
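For illustration, an adjusted document might look like the sketch below. It reuses only the keys shown in the example above; the replica count of 3 is an arbitrary assumption, not a recommended value.

```yaml
kind: configuration/applications
title: "Kubernetes Applications Config"
name: default
specification:
  applications:
  - name: ignite-stateless
    image_path: "lambdastack/ignite:2.9.1"
    namespace: ignite
    service:
      rest_nodeport: 32300
      sql_nodeport: 32301
      thinclients_nodeport: 32302
    replicas: 3 # assumption: three pods instead of the single default replica
    enabled_plugins:
    - ignite-kubernetes # required to work on K8s
    - ignite-rest-http  # optional plugin; remove the line if it is not needed
```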
5 - Helm
Helm "system" chart repository
LambdaStack provides a Helm repository for internal usage inside our Ansible codebase. Currently only the "system" repository is available, but it's not designed to be used by regular users. In fact, regular users must not reuse it for any purpose.
LambdaStack developers can find it in roles/helm_charts/files/system. To add a chart to the repository, it's enough to put the unarchived chart directory tree inside that location (in a separate directory) and re-run lambdastack apply.
When the repository Ansible role is run, it copies all unarchived charts to the repository host, creates the Helm repository (index.yaml) and serves all these files from the Apache HTTP server.
Installing Helm charts from the "system" repository
LambdaStack developers can reuse the "system" repository from any place inside the Ansible codebase. Moreover, it's the responsibility of a particular role to call the helm upgrade --install command.
There is a helper task file that can be reused for that purpose: roles/helm/tasks/install-system-release.yml. It's only responsible for installing already existing "system" Helm charts from the "system" repository.
This helper task expects the following parameters/facts:
- set_fact:
    helm_chart_name: <string>
    helm_chart_values: <map>
    helm_release_name: <string>
helm_chart_values is a standard yaml map; values defined there replace the default config of the chart (values.yaml).
Our standard practice is to place those values inside the specification document of the role that deploys the Helm release in Kubernetes.
Example config:
kind: configuration/<mykind-used-by-myrole>
name: default
specification:
  helm_chart_name: mychart
  helm_release_name: myrelease
  helm_chart_values:
    service:
      port: 8080
    nameOverride: mychart_custom_name
Example usage:
- name: Mychart
  include_role:
    name: helm
    tasks_from: install-system-release.yml
  vars:
    helm_chart_name: "{{ specification.helm_chart_name }}"
    helm_release_name: "{{ specification.helm_release_name }}"
    helm_chart_values: "{{ specification.helm_chart_values }}"
By default all installed "system" Helm releases are deployed inside the ls-charts namespace in Kubernetes.
Uninstalling "system" Helm releases
To uninstall a Helm release, roles/helm/tasks/delete-system-release.yml can be used. For example:
- include_role:
    name: helm
    tasks_from: delete-system-release.yml
  vars:
    helm_release_name: myrelease
6 - Istio
Istio
Istio is an open source platform that lets you run a service mesh for a distributed microservice architecture. It lets you connect, manage and secure connections between microservices and brings features such as load balancing, monitoring and service-to-service authentication without any changes in service code. Read more about Istio here.
Installing Istio
Istio in LambdaStack is provided as a K8s application and is installed using the Istio Operator. An operator is a software extension to the Kubernetes API that knows what Istio deployments should look like and how to react if a problem appears. It also makes it easy to perform upgrades and to automate tasks that would normally be executed by a user or admin.
By default, Istio is not installed. To deploy it, add a "configuration/applications" document to your configuration yaml file, similar to the example below (the enabled flag must be set to true):
---
kind: configuration/applications
version: 0.8.0
title: "Kubernetes Applications Config"
provider: aws
name: default
specification:
  applications:
  ...
## --- istio ---
  - name: istio
    enabled: true
    use_local_image_registry: true
    namespaces:
      operator: istio-operator # namespace where operator will be deployed
      watched: # list of namespaces which operator will watch
        - istio-system
      istio: istio-system # namespace where Istio control plane will be deployed
    istio_spec:
      profile: default # Check all possibilities https://istio.io/latest/docs/setup/additional-setup/config-profiles/
      name: istiocontrolplane
With this configuration file, the controller will detect the Istio Operator resource in the first of the watched namespaces and will install the Istio components corresponding to the specified profile (default). With the default profile, the Istio control plane and the Istio ingress gateway will be deployed in the istio-system namespace.
How to set up service mesh for an application
The default Istio installation uses automatic sidecar injection. You need to label the namespace where the application will be hosted:
kubectl label namespace default istio-injection=enabled
Once the proper namespaces are labeled and Istio is deployed, you can deploy your applications or restart existing ones.
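If you prefer to keep the label under version control instead of running kubectl label by hand, the same label can be declared in a Namespace manifest. This is a minimal sketch; the namespace name my-app is a hypothetical placeholder.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app # hypothetical namespace that will host the application
  labels:
    istio-injection: enabled # same label as set by the kubectl label command
```

Pods that were already running before the label was applied only get the sidecar after they are restarted.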
You may need to make an application accessible from outside of your Kubernetes cluster. The Istio Gateway deployed with the default profile is used for this purpose. Define the ingress gateway by deploying a Gateway and a VirtualService specification. The Gateway specification describes the L4-L6 properties of a load balancer and the VirtualService specification describes the L7 properties of a load balancer.
Example of the Gateway and VirtualService specification (you have to adapt the entire specification to your application):
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "httpbin.example.com"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - "httpbin.example.com"
  gateways:
  - httpbin-gateway
  http:
  - match:
    - uri:
        prefix: /status
    - uri:
        prefix: /delay
    route:
    - destination:
        port:
          number: 8000
        host: httpbin
Warning: Pay attention to the network policies in your cluster if a CNI plugin is used that supports them (such as Calico or Canal). In this case, you should set up secure network policies for inter-microservice communication and communication between Envoy proxy and Istio control plane in your application's namespace. You can also just apply the following NetworkPolicy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  namespace: <your_application_namespace>
  name: allow-istio-communication
spec:
  podSelector: {}
  egress:
  - {}
  ingress:
  - {}
  policyTypes:
  - Egress
  - Ingress
7 - Konnectivity
Konnectivity
Konnectivity replaces SSH tunneling.
This is currently a WIP (Work In Progress). Ansible playbook roles are being built and tested.
Server
Agent
RBAC
8 - Kubernetes
Kubernetes
Issues
See Troubleshooting
Kubectl
You can see from the Troubleshooting link above that the default security setup for kubectl is to require sudo rights and to pass --kubeconfig=/etc/kubernetes/admin.conf as an additional parameter to kubectl. Also, by default, this only works on the Control Plane nodes. To have it work on Worker nodes or any node in the cluster, do the following. Make sure it complies with your security strategy:
# Control Plane node - Option 2 from link above...
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Once kubectl is working as desired from a non-root user, you can simply:
1. Copy the $HOME/.kube/config file from the Control Plane node
2. Create the .kube directory in the non-root user's home directory on the target node and then paste the config file copied in #1
3. Do this for any node you want to access kubectl on for a given cluster
Supported CNI plugins
LambdaStack supports the following CNI plugins:
- Flannel (default)
- Calico
- Canal
Flannel is the default setting in the LambdaStack configuration.
NOTE
Calico is not supported on Azure. To use network policies on Azure, choose Canal.
Use the following configuration to set up an appropriate CNI plugin:
kind: configuration/kubernetes-master
name: default
specification:
  advanced:
    networking:
      plugin: flannel
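For example, to follow the note about network policies on Azure, the same document could select Canal. This is a sketch under one assumption: the lowercase value canal is inferred by analogy with flannel and is not confirmed by this page.

```yaml
kind: configuration/kubernetes-master
name: default
specification:
  advanced:
    networking:
      plugin: canal # assumed value name, by analogy with "flannel"
```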
Kubernetes applications - overview
Currently, LambdaStack provides the following predefined applications which may be deployed with lambdastack:
- ignite
- rabbitmq
- auth-service (Keycloak)
- pgpool
- pgbouncer
- istio
All of them have a default configuration.
The common parameters are: name, enabled, namespace, image_path and use_local_image_registry.
If you set use_local_image_registry to false in the configuration manifest, you have to provide a valid docker image path in image_path. Kubernetes will try to pull the image from the image_path value externally.
To see what version of the application image is in the local image registry, please refer to the components list.
Note: The above link points to the develop branch. Please choose the branch that matches the LambdaStack version you are using.
How to expose service through HA Proxy load balancer
1. Create a NodePort service type for your application in Kubernetes.
2. Make sure your service has a statically assigned nodePort (a number between 30000-32767), for example 31234. More info here.
3. Add a configuration document for load_balancer/HAProxy to your main config file.
   kind: configuration/haproxy
   title: "HAProxy"
   name: haproxy
   specification:
     frontend:
       - name: https_front
         port: 443
         https: yes
         backend:
           - http_back1
     backend:
       - name: http_back1
         server_groups:
           - kubernetes_node
         port: 31234
   provider: <your-provider-here-replace-it>
4. Run lambdastack apply.
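The NodePort service from the first two steps could be sketched as the manifest below. The service name, the app: my-app selector and the container port 80 are hypothetical and must match your own application; only the nodePort value 31234 is taken from the example above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-nodeport # hypothetical service name
spec:
  type: NodePort
  selector:
    app: my-app # must match the labels of your application's pods
  ports:
  - name: http
    port: 80        # port of the service inside the cluster
    targetPort: 80  # port the application container listens on
    nodePort: 31234 # static port in the 30000-32767 range, used as the HAProxy backend port
```

Pinning nodePort keeps the HAProxy backend configuration valid when the service is recreated; without it Kubernetes picks a random port from the range.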
How to do Kubernetes RBAC
The Kubernetes cluster that comes with LambdaStack has an admin account created. You should consider creating more roles and accounts, especially when many deployments run in different namespaces.
To know more about RBAC in Kubernetes use this link
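As a minimal sketch of what an additional, more restricted account might look like, the manifests below create a ServiceAccount that can only read Deployments and Pods in a single namespace. All names (my-namespace, deploy-viewer, deployment-reader) are hypothetical, and the namespace is assumed to exist already.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deploy-viewer
  namespace: my-namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-reader
  namespace: my-namespace
rules:
# Read-only access to Deployments
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch"]
# Read-only access to Pods and their logs
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-reader-binding
  namespace: my-namespace
subjects:
- kind: ServiceAccount
  name: deploy-viewer
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-reader
```

A Role and RoleBinding are limited to one namespace; cluster-wide permissions would need a ClusterRole and ClusterRoleBinding instead.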
How to run an example app
Here we will get a simple app to run using Docker through Kubernetes. We assume you are using Windows 10, have a LambdaStack cluster on Azure ready and have an Azure Container Registry ready (it might not be created in early-version LambdaStack clusters; if you don't have one, you can skip to step 11 and test the cluster using some public app from the original Docker Registry). Steps marked with an asterisk can be skipped.
1. Install Chocolatey
2. Use Chocolatey to install:
   - Docker-for-windows (choco install docker-for-windows, requires Hyper-V)
   - Azure-cli (choco install azure-cli)
3. Make sure Docker for Windows is running (run as admin, might require a restart)
4. Run docker build -t sample-app:v1 . in examples/dotnet/lambdastack-web-app.
5. *For test purposes, run your image locally with docker run -d -p 8080:80 --name myapp sample-app:v1 and head to localhost:8080 to check if it's working.
6. *Stop your local docker container with docker stop myapp and run docker rm myapp to delete the container.
7. *Now that you have a working docker image, we can proceed to the deployment of the app on the LambdaStack Kubernetes cluster.
8. Run docker login myregistry.azurecr.io -u myUsername -p myPassword to log in to your Azure Container Registry. Credentials are in the Access keys tab in your registry.
9. Tag your image with:
   docker tag sample-app:v1 myregistry.azurecr.io/samples/sample-app:v1
10. Push your image to the repo:
    docker push myregistry.azurecr.io/samples/sample-app:v1
11. SSH into your LambdaStack cluster's master node.
12. *Run kubectl cluster-info and kubectl config view to check if everything is okay.
13. Run the following command to create a k8s secret with your registry data:
    kubectl create secret docker-registry myregistry --docker-server myregistry.azurecr.io --docker-username myusername --docker-password mypassword
14. Create a sample-app.yaml file with contents:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sample-app
    spec:
      selector:
        matchLabels:
          app: sample-app
      replicas: 2
      template:
        metadata:
          labels:
            app: sample-app
        spec:
          containers:
          - name: sample-app
            image: myregistry.azurecr.io/samples/sample-app:v1
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 100m
                memory: 64Mi
              limits:
                memory: 128Mi
          imagePullSecrets:
          - name: myregistry
15. Run kubectl apply -f sample-app.yaml, and after a minute run kubectl get pods to see if it works.
16. Run kubectl expose deployment sample-app --type=NodePort --name=sample-app-nodeport, then run kubectl get svc sample-app-nodeport and note the second port.
17. Run kubectl get pods -o wide and check which node the app is running on.
18. Access the app through [AZURE_NODE_VM_IP]:[PORT] from the two previous points - firewall changes might be needed.
How to set resource requests and limits for Containers
When Kubernetes schedules a Pod, it’s important that the Containers have enough resources to actually run. If you schedule a large application on a node with limited resources, it is possible for the node to run out of memory or CPU resources and for things to stop working! It’s also possible for applications to take up more resources than they should.
When you specify a Pod, it is strongly recommended to specify how much CPU and memory (RAM) each Container needs. Requests are what the Container is guaranteed to get. If a Container requests a resource, Kubernetes will only schedule it on a node that can give it that resource. Limits make sure a Container never goes above a certain value. For more details about the difference between requests and limits, see Resource QoS.
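A minimal sketch of a Pod that sets both requests and limits is shown below. The pod name, image and the concrete values are assumptions chosen for illustration and should be sized for your own workload.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx:1.21 # any image works; used here only as an example
    resources:
      requests:
        cpu: 250m     # the scheduler only picks a node with 0.25 CPU unreserved
        memory: 128Mi # and at least 128 MiB of unreserved memory
      limits:
        cpu: 500m     # CPU usage above this value is throttled
        memory: 256Mi # a container exceeding this value is terminated (OOM killed)
```

Requests influence where the Pod is scheduled, while limits are enforced at runtime on the node.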
For more information, see the links below:
How to run CronJobs
NOTE: Examples have been moved to their own repo but they are not visible at the moment.
1. Follow the steps from How to run an example app using examples/dotnet/LambdaStack.SampleApps/LambdaStack.SampleApps.CronApp
2. Create a cronjob.yaml file with contents:
   apiVersion: batch/v1beta1
   kind: CronJob
   metadata:
     name: sample-cron-job
   spec:
     schedule: "*/1 * * * *" # Run once a minute
     failedJobsHistoryLimit: 5
     jobTemplate:
       spec:
         template:
           spec:
             containers:
             - name: sample-cron-job
               image: myregistry.azurecr.io/samples/sample-cron-app:v1
             restartPolicy: OnFailure
             imagePullSecrets:
             - name: myregistry
3. Run kubectl apply -f cronjob.yaml, and after a minute run kubectl get pods to see if it works.
4. Run kubectl get cronjob sample-cron-job to get the status of our cron job.
5. Run kubectl get jobs --watch to see jobs scheduled by the “sample-cron-job” cron job.
How to test the monitoring features
Prerequisites: LambdaStack cluster on Azure with at least a single VM with prometheus and grafana roles enabled.
1. Copy the ansible inventory from build/lambdastack/*/inventory/ to examples/monitoring/
2. Run ansible-playbook -i NAME_OF_THE_INVENTORY_FILE grafana.yml in examples/monitoring
3. In the inventory file find the IP address of the machine that has grafana installed and head over to https://NODE_IP:3000 - you might have to go to the Azure Portal and allow traffic to that port in the firewall; also ignore the possible certificate error in your browser.
4. Head to Dashboards/Manage on the side panel and select Kubernetes Deployment metrics - here you can see a sample kubernetes monitoring dashboard.
5. Head to http://NODE_IP:9090 to see the Prometheus UI - the dropdown there lists all of the metrics you can monitor with Prometheus/Grafana.
How to run chaos on LambdaStack Kubernetes cluster and monitor it with Grafana
1. SSH into the Kubernetes master.
2. Copy over the chaos-sample.yaml file from the example folder and run it with kubectl apply -f chaos-sample.yaml - it takes code from github.com/linki/chaoskube so normal security concerns apply.
3. Run the following command to start the chaos:
   kubectl create clusterrolebinding chaos --clusterrole=cluster-admin --user=system:serviceaccount:default:default
   Random pods will be terminated with 5s frequency, configurable inside the yaml file.
4. Head over to Grafana at https://NODE_IP:3000, open a new dashboard, add a panel, set Prometheus as a data source and put kubelet_running_pod_count in the query field - now you can see how Kubernetes is replacing killed pods and balancing them between the nodes.
5. Run kubectl get svc nginx-service and note the second port. You can access the nginx page via [ANY_CLUSTER_VM_IP]:[PORT] - it is accessible even though the pods carrying it are constantly killed at random, unless you have more VMs in your cluster than deployed nginx instances and choose the IP of one not carrying it.
How to test the central logging features
Prerequisites: LambdaStack cluster on Azure with at least a single VM with elasticsearch, kibana and filebeat roles enabled.
1. Connect to kubectl using kubectl proxy or directly from the Kubernetes master server.
2. Apply pod-counter.yaml from the extras/kubernetes/pod-counter directory of the LambdaStack repository with the command:
   kubectl apply -f yourpath_to_pod_counter/pod-counter.yaml
   Paths are system dependent, so use the correct separator for your operating system.
3. In the inventory file find the IP address of the machine that has kibana installed and head over to http://NODE_IP:5601 - you might have to go to the Azure Portal and allow traffic to that port in the firewall.
4. You can now search for data from logs in the Discover section in Kibana after creating the filebeat-* index pattern. To create the index pattern click Discover, then in Step 1: Define index pattern enter filebeat-*. Then click Next step. In Step 2: Configure settings click Create index pattern. Now you can go to the Discover section and look at the output from your logs.
5. You can verify that CounterPod is sending messages correctly and filebeat is gathering them correctly by querying for CounterPod in the search field in the Discover section.
6. For more information refer to the documentation: https://www.elastic.co/guide/en/kibana/current/index.html
How to tunnel Kubernetes Dashboard from remote kubectl to your PC
1. SSH into the server and forward port 8001 to your machine:
   ssh -i ls_keys/id_rsa operations@40.67.255.155 -L 8001:localhost:8001
   NOTE: substitute the IP with your cluster master's IP.
2. On the remote host, get the admin bearer token:
   kubectl describe secret $(kubectl get secrets --namespace=kube-system | grep admin-user | awk '{print $1}') --namespace=kube-system | grep -E '^token' | awk '{print $2}' | head -1
   NOTE: save this token for the next steps.
3. On the remote host, open a proxy to the dashboard:
   kubectl proxy
4. Now on your local machine navigate to http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
5. When prompted for credentials, use the admin token from step 2.
How to run Keycloak on Kubernetes
- Enable Kubernetes master & node, repository and postgresql components in the initial configuration manifest (yaml) by increasing the count value.
kind: lambdastack-cluster
title: LambdaStack Cluster Config
provider: azure
name: default
build_path: '' # Dynamically built
specification:
  components:
    repository:
      count: 1
    kubernetes_master:
      count: 1
    kubernetes_node:
      count: 2
    postgresql:
      count: 2
- Enable applications in feature-mapping in the initial configuration manifest.
---
kind: configuration/feature-mapping
title: Feature mapping to roles
name: default
specification:
  available_roles:
    - _merge: true
    - name: applications
      enabled: true
- Enable the required applications by setting enabled: true and adjust other parameters in the configuration/applications kind.
The default applications configuration is available here.
Note: To work with Pgbouncer, Keycloak requires the Pgbouncer configuration parameter POOL_MODE to be set to session; see the Installing Pgbouncer and Pgpool section. The reason is that Keycloak uses SET SQL statements. For details see SQL feature map for pooling modes.
---
kind: configuration/applications
title: Kubernetes Applications Config
name: default
specification:
  applications:
    - _merge: true
    - name: auth-service
      enabled: true
      image_path: lambdastack/keycloak:14.0.0
      use_local_image_registry: true
      service:
        name: as-testauthdb
        port: 30104
        replicas: 2
        namespace: namespace-for-auth
        admin_user: auth-service-username
        admin_password: PASSWORD_TO_CHANGE
      database:
        name: auth-database-name
        user: auth-db-user
        password: PASSWORD_TO_CHANGE
To set a specific database host IP address for Keycloak, provide the additional parameter address:
database:
  address: 10.0.0.2
Note: If the database address is not specified, lambdastack assumes that the database instance doesn't exist and will create it.
By default, if the database address is not specified and Postgres is in HA mode, Keycloak uses the PGBouncer ClusterIP service name as the database address. If Postgres is in standalone mode and the database address is not specified, it uses the first Postgres host address from the inventory.
- Run lambdastack apply on your configuration manifest.
- Log into GUI
Note: Accessing the Keycloak GUI depends on your configuration.
By default, LambdaStack provides the following K8s Services for Keycloak: Headless and NodePort. The simplest way to reach the GUI is to use an ssh tunnel forwarding the NodePort.
Example:
ssh -L 30104:localhost:30104 user@target_host -i ssh_key
If you need the GUI to be accessible from outside, you have to change your firewall rules.
The GUI should be reachable at: https://localhost:30104/auth
9 - Logging
Centralized logging setup
For centralized logging LambdaStack uses OpenDistro for Elasticsearch.
To enable centralized logging, make sure that the count property for the logging feature is greater than 0 in your configuration manifest.
kind: lambdastack-cluster
...
specification:
  ...
  components:
    kubernetes_master:
      count: 1
    kubernetes_node:
      count: 0
    ...
    logging:
      count: 1
    ...
Default feature mapping for logging:
...
logging:
  - logging
  - kibana
  - node-exporter
  - filebeat
  - firewall
...
Optional feature (role) available for logging: logstash. More details here: link
The logging role replaced the elasticsearch role. This change was made to enable Elasticsearch usage also for data storage - not only for logs, as it was until 0.5.0.
The default configuration of the logging and opendistro_for_elasticsearch roles is identical (./DATABASES.md#how-to-start-working-with-opendistro-for-elasticsearch). To modify the configuration of centralized logging, adjust and use the following defaults in your manifest:
kind: configuration/logging
title: Logging Config
name: default
specification:
  cluster_name: LambdaStackElastic
  clustered: True
  paths:
    data: /var/lib/elasticsearch
    repo: /var/lib/elasticsearch-snapshots
    logs: /var/log/elasticsearch
How to manage Opendistro for Elasticsearch data
Elasticsearch stores data as JSON documents, and an Index is a collection of documents. As in every database, it's crucial to maintain the data correctly. It's almost impossible to deliver a database configuration that fits every type of project and every kind of stored data. LambdaStack deploys a preconfigured Opendistro Elasticsearch, but this configuration may not meet user requirements. Before going to production, the configuration should be tailored to the project's needs. All configuration tips and tricks are available in the official documentation.
The main and most important decisions to take before you deploy a cluster are:
- How many nodes are needed
- How big the machines and/or storage data disks need to be
These parameters are defined in the yaml file, and it's important to create a big enough cluster.
specification:
  components:
    logging:
      count: 1 # Choose number of nodes
---
kind: infrastructure/virtual-machine
title: "Virtual Machine Infra"
name: logging-machine
specification:
  size: Standard_DS2_v2 # Choose machine size
If Elasticsearch is required to work in a cluster formation, then besides setting up more than one machine in the yaml config file, please read the dedicated support article and adjust the Elasticsearch configuration file.
At this moment Opendistro for Elasticsearch does not support a plugin similar to ILM; log rotation is possible only through configuration created in Index State Management.
ISM - Index State Management - is a plugin that provides users with an administrative panel to monitor indices and apply policies at different index stages. ISM lets users automate periodic, administrative operations by triggering them based on index age, size, or number of documents. Using the ISM plugin, you can define policies that automatically handle index rollovers or deletions. ISM is installed with Opendistro by default - the user does not have to enable it. Official documentation is available on the Opendistro for Elasticsearch website.
To reduce the consumption of disk resources, every index you create should use a well-designed policy.
Among others, these two index actions might save a machine from filling up its disk space:
- Index Rollover - rolls an alias over to a new index. Correctly set the max index size / age or the minimum number of documents to keep the index size within the required limits.
- Index Deletion - deletes indexes managed by the policy.
By combining these actions and adapting them to the amount and kind of data, users can create a policy that maintains data in the cluster, for example to protect a node from filling up its disk space.
An example policy is shown below. Be aware that this is only an example, and it needs to be adjusted to the needs of your environment.
{
  "policy": {
    "policy_id": "ls_policy",
    "description": "Safe setup for logs management",
    "last_updated_time": 1615201615948,
    "schema_version": 1,
    "error_notification": null,
    "default_state": "keep",
    "states": [
      {
        "name": "keep",
        "actions": [],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "14d"
            }
          },
          {
            "state_name": "rollover_by_size",
            "conditions": {
              "min_size": "1gb"
            }
          },
          {
            "state_name": "rollover_by_time",
            "conditions": {
              "min_index_age": "1d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      },
      {
        "name": "rollover_by_size",
        "actions": [
          {
            "rollover": {}
          }
        ],
        "transitions": []
      },
      {
        "name": "rollover_by_time",
        "actions": [
          {
            "rollover": {}
          }
        ],
        "transitions": []
      }
    ]
  }
}
The example above shows a configuration with a rollover daily or when an index reaches 1 GB in size. Indexes older than 14 days will be deleted. States and conditions can be combined. Please see the policies documentation for more details.
Apply Policy
To apply a policy, use an API request similar to the one below:
PUT _template/template_01
{
  "index_patterns": ["filebeat*"],
  "settings": {
    "opendistro.index_state_management.rollover_alias": "filebeat",
    "opendistro.index_state_management.policy_id": "ls_policy"
  }
}
After applying this template, every new index matching the pattern will be managed by the policy. It is also possible to apply the policy to already existing indices by assigning them to the policy in the Index Management Kibana panel.
How to export Kibana reports to CSV format
Since v1.0 LambdaStack provides the possibility to export reports from Kibana to CSV, PNG or PDF using the Open Distro for Elasticsearch Kibana reports feature.
Check more details about the plugin and how to export reports in the documentation
Note: Currently in Open Distro for Elasticsearch Kibana the following plugins are installed and enabled by default: security, alerting, anomaly detection, index management, query workbench, notebooks, reports, alerting, gantt chart plugins.
To list the plugins enabled in Kibana, run the following command from the Kibana directory on the logging machine:
./bin/kibana-plugin list
How to export Elasticsearch data to CSV format
Since v0.8 LambdaStack provides the possibility to export data from Elasticsearch to CSV using Logstash (logstash-oss) along with logstash-input-elasticsearch and logstash-output-csv plugins.
To install Logstash in your cluster, add logstash to the feature mapping for the logging, opendistro_for_elasticsearch or elasticsearch group.
NOTE
To check plugin versions, use the following command:
/usr/share/logstash/bin/logstash-plugin list --verbose
LambdaStack provides a basic configuration file (logstash-export.conf.template)
as template for your data export. This
file has to be modified according to your Elasticsearch configuration and data you want to export.
NOTE
Exporting data is not automated. It has to be invoked manually. Logstash daemon is disabled by default after installation.
Run Logstash to export data:
/usr/share/logstash/bin/logstash -f /etc/logstash/logstash-export.conf
For more details about configuration, see the documentation of the input and output plugins.
NOTE
At the moment the input plugin doesn't officially support skipping certificate validation for a secure connection to Elasticsearch. For a non-production environment you can disable validation by adding a new line:
ssl_options[:verify] = false
right after other ssl_options definitions in file:
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-elasticsearch-*/lib/logstash/inputs/elasticsearch.rb
How to add multiline support for Filebeat logs
To properly handle multiline entries in files harvested by Filebeat, you have to provide a multiline definition in the configuration manifest. The definition specifies which lines are part of a single event.
By default, a postgresql block is provided. You can use it as an example:
postgresql_input:
  multiline:
    pattern: >-
      '^\d{4}-\d{2}-\d{2} '
    negate: true
    match: after
Supported inputs: common_input, postgresql_input, container_input
More details about multiline options can be found in the official documentation.
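To illustrate what the postgresql block does, here is a small Python sketch that imitates the grouping rule for `match: after`: with `negate: true`, every line that does not match the pattern is appended to the preceding line that does. This is a simplified model for illustration only, not Filebeat's implementation, and the sample log lines are made up.

```python
import re


def group_multiline(lines, pattern, negate=True, match="after"):
    """Group raw lines into events, imitating pattern/negate with match: after."""
    if match != "after":
        raise ValueError("this sketch only covers match: after")
    regex = re.compile(pattern)
    events = []
    for line in lines:
        matched = bool(regex.search(line))
        # With negate: true, lines that do NOT match are continuation lines.
        is_continuation = (not matched) if negate else matched
        if is_continuation and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


# Made-up PostgreSQL-style log lines: the second line belongs to the first event.
sample = [
    '2021-03-08 10:00:00 UTC ERROR:  syntax error at or near "FORM"',
    "    STATEMENT:  SELECT * FORM users",
    "2021-03-08 10:00:01 UTC LOG:  checkpoint complete",
]
events = group_multiline(sample, r"^\d{4}-\d{2}-\d{2} ")

if __name__ == "__main__":
    for number, event in enumerate(events, start=1):
        print(f"event {number}:")
        print(event)
```

Trying a candidate pattern against a few real log lines this way can help before putting it into the manifest.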
How to deploy Filebeat as Daemonset in K8s
It is possible to deploy Filebeat as a DaemonSet in K8s. To do that, set the k8s_as_cloud_service option to true:
kind: lambdastack-cluster
specification:
  cloud:
    k8s_as_cloud_service: true
How to use default Kibana dashboards
It is possible to configure the setup.dashboards.enabled and setup.dashboards.index Filebeat settings using the specification.kibana.dashboards key in the configuration/filebeat doc.
When specification.kibana.dashboards.enabled is set to auto, the corresponding setting in the Filebeat configuration file will be set to true only if Kibana is configured to be present on the host.
Other possible values are true and false.
Default configuration:
specification:
  kibana:
    dashboards:
      enabled: auto
      index: filebeat-*
Note: Setting specification.kibana.dashboards.enabled to true without providing Kibana will result in a Filebeat crash.
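The three possible values can be summarised as a small decision function. The sketch below is only a model of the behaviour described above; the function names and the `kibana_present` argument are assumptions for illustration and do not come from the LambdaStack code base.

```python
def resolve_dashboards_enabled(setting, kibana_present):
    """Return the value that would be written to setup.dashboards.enabled."""
    if setting == "auto":
        # auto: enable dashboards only when Kibana is configured on the host.
        return kibana_present
    if isinstance(setting, bool):
        return setting
    raise ValueError("expected 'auto', True or False")


def risks_filebeat_crash(setting, kibana_present):
    """True for the combination the note warns about: enabled without Kibana."""
    return resolve_dashboards_enabled(setting, kibana_present) and not kibana_present


if __name__ == "__main__":
    for value in ("auto", True, False):
        for present in (True, False):
            print(value, present, resolve_dashboards_enabled(value, present))
```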
10 - Maintenance
Maintenance
Verification of service state
This part of the documentation describes how to check that each component is working properly.
- Docker
To verify that Docker services are up and running you can first check the status of the Docker service with the following command:
systemctl status docker
Additionally, you can check that the following command does not return any error:
docker info
Its output also contains useful information about your Docker configuration.
- Kubernetes
First, verify the status of the Kubernetes kubelet service with the command:
systemctl status kubelet
We can also check state of Kubernetes nodes using the command:
root@primary01:~# kubectl get nodes --kubeconfig=/etc/kubernetes/admin.conf
NAME        STATUS   ROLES    AGE   VERSION
primary01   Ready    master   24h   v1.17.7
node01      Ready    <none>   23h   v1.17.7
node02      Ready    <none>   23h   v1.17.7
We can get additional information about Kubernetes components:
root@primary01:~# kubectl cluster-info --kubeconfig=/etc/kubernetes/admin.conf
Kubernetes master is running at https://primary01:6443
CoreDNS is running at https://primary01:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
We can also check status of pods in all namespaces using the command:
kubectl get pods -A --kubeconfig=/etc/kubernetes/admin.conf
We can get additional information about components statuses:
root@primary01:~# kubectl get cs --kubeconfig=/etc/kubernetes/admin.conf
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}
For more detailed information please refer to the official documentation.
- Keycloak
To check if a Keycloak service deployed on Kubernetes is running, use the command:
kubectl get pods --kubeconfig=/etc/kubernetes/admin.conf --namespace=keycloak_service_namespace --field-selector=status.phase=Running | grep keycloak_service_name
- HAProxy
To check status of HAProxy we can use the command:
systemctl status haproxy
Additionally, we can use the netstat command to check if the application is listening on the ports defined in the haproxy.cfg file.
- Prometheus
To check status of Prometheus we can use the command:
systemctl status prometheus
We can also check if Prometheus service is listening at the port 9090:
netstat -antup | grep 9090
- Grafana
To check status of Grafana we can use the command:
systemctl status grafana-server
We can also check if Grafana service is listening at the port 3000:
netstat -antup | grep 3000
- Prometheus Node Exporter
To check status of Node Exporter we can use the command:
systemctl status prometheus-node-exporter
- Elasticsearch
To check status of Elasticsearch we can use the command:
systemctl status elasticsearch
We can check if service is listening on 9200 (API communication port):
netstat -antup | grep 9200
We can also check if service is listening on 9300 (nodes communication port):
netstat -antup | grep 9300
We can also check status of Elasticsearch cluster:
<IP>:9200/_cluster/health
We can do this using curl or any other equivalent tool.
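As one example of an "equivalent tool", the sketch below fetches the health document with the Python standard library and interprets its `status` field, which Elasticsearch reports as green, yellow or red. The base URL passed to `fetch_health` is an assumption: a cluster secured with TLS and authentication (as Open Distro installations usually are) needs additional options that are not shown here.

```python
import json
from urllib.request import urlopen


def cluster_status(health_json):
    """Extract the status (green, yellow or red) from a _cluster/health response."""
    return json.loads(health_json)["status"]


def is_cluster_usable(health_json, allow_yellow=True):
    """Treat green as healthy; optionally accept yellow (e.g. single-node setups)."""
    status = cluster_status(health_json)
    return status == "green" or (allow_yellow and status == "yellow")


def fetch_health(base_url, timeout=5):
    """Fetch <base_url>/_cluster/health.

    base_url such as "http://<IP>:9200" is an assumption; TLS certificates
    and credentials are deliberately left out of this sketch.
    """
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Made-up response body used instead of a live cluster.
    sample = '{"cluster_name": "example", "status": "yellow", "number_of_nodes": 1}'
    print(cluster_status(sample), is_cluster_usable(sample))
```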
- Kibana
To check status of Kibana we can use the command:
systemctl status kibana
We can also check if Kibana service is listening at the port 5601:
netstat -antup | grep 5601
- Filebeat
To check status of Filebeat we can use the command:
systemctl status filebeat
- PostgreSQL
To check status of PostgreSQL we can use commands:
- on Ubuntu:
systemctl status postgresql
- on Red Hat:
systemctl status postgresql-10
where postgresql-10 is only an example, because the number differs from version to version. Please refer to your version number in case of using this command.
We can also check if PostgreSQL service is listening at the port 5432:
netstat -antup | grep 5432
We can also use the pg_isready command to check if the PostgreSQL server is running and accepting connections:
- on Ubuntu:
[user@postgres01 ~]$ pg_isready
/var/run/postgresql:5432 - accepting connections
- on Red Hat:
[user@postgres01 ~]$ /usr/pgsql-10/bin/pg_isready
/var/run/postgresql:5432 - accepting connections
where the path /usr/pgsql-10/bin/pg_isready is only an example, because the number differs from version to version. Please refer to your version number in case of using this command.
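The netstat checks above have to be run on each host. As a complementary sketch, the Python snippet below tries to open a TCP connection to the ports named in this chapter, which also works from a remote machine. The dictionary keys are illustrative labels, and the result only shows that something accepts connections on the port, not that the service behind it is healthy.

```python
import socket

# Ports mentioned in this chapter; adjust if your configuration differs.
DEFAULT_PORTS = {
    "prometheus": 9090,
    "grafana": 3000,
    "elasticsearch_api": 9200,
    "elasticsearch_nodes": 9300,
    "kibana": 5601,
    "postgresql": 5432,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=DEFAULT_PORTS):
    """Check every named port on one host and return a name -> bool mapping."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_open in check_ports("127.0.0.1").items():
        print(f"{name}: {'listening' if is_open else 'not reachable'}")
```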
11 - Modules
Modules
Introduction
In version 0.8 of LambdaStack we introduced modules. Modularization of LambdaStack environment will result in:
- smaller code bases for separate areas,
- simpler and faster test process,
- interchangeability of elements providing similar functionality (eg.: different Kubernetes providers),
- faster and more focused release cycle.
Those and multiple other factors (eg.: readability, reliability) influence this direction of changes.
User point of view
From a user's point of view, there will be no significant changes in the near future, as it will still be possible to install LambdaStack the "classic way", that is with a single lambdastack configuration using the whole codebase as a monolith.
For those who want to play with new features, or will need newly introduced possibilities, there will be a short transition period which we consider as a kind of "preview stage". In this period there will be a need to run each module separately by hand in the following order:
- moduleA init
- moduleA plan
- moduleA apply
- moduleB init
- moduleB plan
- moduleB apply
- ...
The init, plan and apply phases are explained in the next sections of this document. The main point is that dependent modules have to be executed one after another during what we call the "preview stage". Later, in upcoming releases, a separate mechanism will be introduced to orchestrate module dependencies and their consecutive execution.
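The manual order during the preview stage can be written down as a simple nested loop: all three phases of one module run before the next module starts. The Python sketch below only prints that order; how a module is actually invoked depends on the module's own guide, so no real commands are assumed here.

```python
PHASES = ("init", "plan", "apply")


def execution_order(modules):
    """Return the steps in the order they have to be run by hand."""
    return [f"{module} {phase}" for module in modules for phase in PHASES]


if __name__ == "__main__":
    # Example with the Azure path: basic infrastructure first, then AKS.
    for step in execution_order(["AzBI", "AzKS"]):
        print(step)
```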
New scenarios
In 0.8 we offer the possibility to use AKS or EKS as Kubernetes providers. That is introduced with modules mechanism, so we launched the first four modules:
- Azure Basic Infrastructure (AzBI) module
- Azure AKS (AzKS) module
- AWS Basic Infrastructure (AwsBI) module
- AWS EKS (AwsKS) module
Those 4 modules, together with classic LambdaStack used with the any provider, allow replacing an on-prem Kubernetes cluster with managed Kubernetes services.
As you may have noticed, there are 2 paths provided:
- Azure related, using AzBI and AzKS modules,
- AWS related, using AwsBI and AwsKS modules.
The "... Basic Infrastructure" modules are responsible for providing basic cloud resources (e.g. resource groups, virtual networks, subnets, virtual machines, network security rules, routing, etc.) which are used by the next modules. In this case, the next modules are the "... KS" modules, meant to provide managed Kubernetes services. They use resources provided by the basic infrastructure modules (e.g. subnets or resource groups) and instantiate managed Kubernetes services provided by the cloud providers. The last element in both cloud provider paths is classic LambdaStack, installed on top of the resources provided by those modules using the any provider.
Hands-on
Each module comes with a guide on how to use it. Please refer to:
- Azure Basic Infrastructure (AzBI) module
- Azure AKS (AzKS) module
- AWS Basic Infrastructure (AwsBI) module
- AWS EKS (AwsKS) module
After deployment of EKS or AKS, you can perform LambdaStack installation on top of it.
Install LambdaStack on top of AzKS or AwsKS
NOTE - Default OS users:
Azure:
redhat: ec2-user
ubuntu: operations
AWS:
redhat: ec2-user
ubuntu: ubuntu
- Create the LambdaStack cluster config file in /tmp/shared/ls.yml
Example:
kind: lambdastack-cluster
title: LambdaStack Cluster Config
name: your-cluster-name # <----- make unified with other places and build directory name
build_path: # Dynamically built
provider: any # <----- use "any" provider
specification:
  name: your-cluster-name # <----- make unified with other places and build directory name
  admin_user:
    name: operations # <----- make sure os-user is correct
    key_path: /tmp/shared/vms_rsa # <----- use generated key file
    path: # Dynamically built
  cloud:
    k8s_as_cloud_service: true # <----- make sure that flag is set, as it indicates usage of a managed Kubernetes service
  components:
    repository:
      count: 1
      machines:
        - default-lambdastack-modules-test-all-0 # <----- make sure that it is correct VM name
    kubernetes_master:
      count: 0
    kubernetes_node:
      count: 0
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 1
      machines:
        - default-lambdastack-modules-test-all-1 # <----- make sure that it is correct VM name
    load_balancer:
      count: 0
    rabbitmq:
      count: 0
---
kind: configuration/feature-mapping
title: Feature mapping to roles
name: your-cluster-name # <----- make unified with other places and build directory name
provider: any
specification:
  roles_mapping:
    repository:
      - repository
      - image-registry
      - firewall
      - filebeat
      - node-exporter
      - applications
---
kind: infrastructure/machine
name: default-lambdastack-modules-test-all-0
provider: any
specification:
  hostname: lambdastack-modules-test-all-0
  ip: 12.34.56.78 # <----- put here public IP attached to machine
---
kind: infrastructure/machine
name: default-lambdastack-modules-test-all-1
provider: any
specification:
  hostname: lambdastack-modules-test-all-1
  ip: 12.34.56.78 # <----- put here public IP attached to machine
---
kind: configuration/repository
title: "LambdaStack requirements repository"
name: default
specification:
  description: "Local repository of binaries required to install LambdaStack"
  download_done_flag_expire_minutes: 120
  apache_lsrepo_path: "/var/www/html/lsrepo"
  teardown:
    disable_http_server: true
    remove:
      files: false
      helm_charts: false
      images: false
      packages: false
provider: any
---
kind: configuration/postgresql
title: PostgreSQL
name: default
specification:
  config_file:
    parameter_groups:
      - name: CONNECTIONS AND AUTHENTICATION
        subgroups:
          - name: Connection Settings
            parameters:
              - name: listen_addresses
                value: "'*'"
                comment: listen on all addresses
          - name: Security and Authentication
            parameters:
              - name: ssl
                value: 'off'
                comment: to have the default value also on Ubuntu
      - name: RESOURCE USAGE (except WAL)
        subgroups:
          - name: Kernel Resource Usage
            parameters:
              - name: shared_preload_libraries
                value: AUTOCONFIGURED
                comment: set by automation
      - name: ERROR REPORTING AND LOGGING
        subgroups:
          - name: Where to Log
            parameters:
              - name: log_directory
                value: "'/var/log/postgresql'"
                comment: to have standard location for Filebeat and logrotate
              - name: log_filename
                value: "'postgresql.log'"
                comment: to use logrotate with common configuration
      - name: WRITE AHEAD LOG
        subgroups:
          - name: Settings
            parameters:
              - name: wal_level
                value: replica
                when: replication
          - name: Archiving
            parameters:
              - name: archive_mode
                value: 'on'
                when: replication
              - name: archive_command
                value: "'test ! -f /dbbackup/{{ inventory_hostname }}/backup/%f &&\
                  \ gzip -c < %p > /dbbackup/{{ inventory_hostname }}/backup/%f'"
                when: replication
      - name: REPLICATION
        subgroups:
          - name: Sending Server(s)
            parameters:
              - name: max_wal_senders
                value: 10
                comment: maximum number of simultaneously running WAL sender processes
                when: replication
              - name: wal_keep_segments
                value: 34
                comment: number of WAL files held for standby servers
                when: replication
  extensions:
    pgaudit:
      enabled: false
      shared_preload_libraries:
        - pgaudit
      config_file_parameters:
        log_connections: 'off'
        log_disconnections: 'off'
        log_statement: 'none'
        log_line_prefix: "'%m [%p] %q%u@%d,host=%h '"
        pgaudit.log: "'write, function, role, ddl' # 'misc_set' is not supported for\
          \ PG 10"
        pgaudit.log_catalog: 'off # to reduce overhead of logging'
        pgaudit.log_relation: 'on # separate log entry for each relation'
        pgaudit.log_statement_once: 'off'
        pgaudit.log_parameter: 'on'
    pgbouncer:
      enabled: false
    replication:
      enabled: false
      replication_user_name: ls_repmgr
      replication_user_password: PASSWORD_TO_CHANGE
      privileged_user_name: ls_repmgr_admin
      privileged_user_password: PASSWORD_TO_CHANGE
      repmgr_database: ls_repmgr
      shared_preload_libraries:
        - repmgr
  logrotate:
    config: |-
      /var/log/postgresql/postgresql*.log {
          maxsize 10M
          daily
          rotate 6
          copytruncate
          # delaycompress is for Filebeat
          delaycompress
          compress
          notifempty
          missingok
          su root root
          nomail
          # to have multiple unique filenames per day when dateext option is set
          dateformat -%Y%m%dH%H
      }
provider: any
---
kind: configuration/applications
title: "Kubernetes Applications Config"
name: default
specification:
  applications:
    - name: ignite-stateless
      enabled: false
      image_path: "lambdastack/ignite:2.9.1"
      use_local_image_registry: false
      namespace: ignite
      service:
        rest_nodeport: 32300
        sql_nodeport: 32301
        thinclients_nodeport: 32302
      replicas: 1
      enabled_plugins:
        - ignite-kubernetes
        - ignite-rest-http
    - name: rabbitmq
      enabled: false
      image_path: rabbitmq:3.8.3
      use_local_image_registry: false
      service:
        name: rabbitmq-cluster
        port: 30672
        management_port: 31672
        replicas: 2
        namespace: queue
      rabbitmq:
        plugins:
          - rabbitmq_management
          - rabbitmq_management_agent
        policies:
          - name: ha-policy2
            pattern: ".*"
            definitions:
              ha-mode: all
        custom_configurations:
          - name: vm_memory_high_watermark.relative
            value: 0.5
        cluster:
    - name: auth-service
      enabled: false
      image_path: jboss/keycloak:9.0.0
      use_local_image_registry: false
      service:
        name: as-testauthdb
        port: 30104
        replicas: 2
        namespace: namespace-for-auth
        admin_user: auth-service-username
        admin_password: PASSWORD_TO_CHANGE
      database:
        name: auth-database-name
        user: auth-db-user
        password: PASSWORD_TO_CHANGE
    - name: pgpool
      enabled: true
      image:
        path: bitnami/pgpool:4.1.1-debian-10-r29
        debug: false
      use_local_image_registry: false
      namespace: postgres-pool
      service:
        name: pgpool
        port: 5432
      replicas: 3
      pod_spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                      - key: app
                        operator: In
                        values:
                          - pgpool
                  topologyKey: kubernetes.io/hostname
        nodeSelector: {}
        tolerations: {}
      resources:
        limits:
          memory: 176Mi
        requests:
          cpu: 250m
          memory: 176Mi
      pgpool:
        env:
          PGPOOL_BACKEND_NODES: autoconfigured
          PGPOOL_POSTGRES_USERNAME: ls_pgpool_postgres_admin
          PGPOOL_SR_CHECK_USER: ls_pgpool_sr_check
          PGPOOL_ADMIN_USERNAME: ls_pgpool_admin
          PGPOOL_ENABLE_LOAD_BALANCING: true
          PGPOOL_MAX_POOL: 4
          PGPOOL_POSTGRES_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_postgres_password
          PGPOOL_SR_CHECK_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_sr_check_password
          PGPOOL_ADMIN_PASSWORD_FILE: /opt/bitnami/pgpool/secrets/pgpool_admin_password
        secrets:
          pgpool_postgres_password: PASSWORD_TO_CHANGE
          pgpool_sr_check_password: PASSWORD_TO_CHANGE
          pgpool_admin_password: PASSWORD_TO_CHANGE
        pgpool_conf_content_to_append: |
          #------------------------------------------------------------------------------
          # CUSTOM SETTINGS (appended by LambdaStack to override defaults)
          #------------------------------------------------------------------------------
          # num_init_children = 32
          connection_life_time = 900
          reserved_connections = 1
        pool_hba_conf: autoconfigured
    - name: pgbouncer
      enabled: true
      image_path: brainsam/pgbouncer:1.12
      init_image_path: bitnami/pgpool:4.1.1-debian-10-r29
      use_local_image_registry: false
      namespace: postgres-pool
      service:
        name: pgbouncer
        port: 5432
      replicas: 2
      resources:
        requests:
          cpu: 250m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 128Mi
      pgbouncer:
        env:
          DB_HOST: pgpool.postgres-pool.svc.cluster.local
          DB_LISTEN_PORT: 5432
          LISTEN_ADDR: "*"
          LISTEN_PORT: 5432
          AUTH_FILE: "/etc/pgbouncer/auth/users.txt"
          AUTH_TYPE: md5
          MAX_CLIENT_CONN: 150
          DEFAULT_POOL_SIZE: 25
          RESERVE_POOL_SIZE: 25
          POOL_MODE: transaction
provider: any
- Run the lambdastack tool to install LambdaStack:
lambdastack --auto-approve apply --file='/tmp/shared/ls.yml' --vault-password='secret'
This will install PostgreSQL on one of the machines and configure PgBouncer, Pgpool and additional services to manage database connections.
Please make sure you disable applications that you don't need. Also, you can enable standard LambdaStack services like Kafka or RabbitMQ, by increasing the number of virtual machines in the basic infrastructure config and assigning them to LambdaStack components you want to use.
If you would like to deploy custom resources into managed Kubernetes, then the standard kubeconfig yaml document can be found inside the shared state file (you should be able to use vendor tools as well to get it).
We highly recommend using the Ingress resource in Kubernetes to allow access to web applications inside the cluster. Since this is managed Kubernetes, fully supported by the cloud platform, the classic HAProxy load-balancer solution is most likely not needed here.
12 - Monitoring
Table of contents
Prometheus:
- How to enable provided Prometheus rules
- How to enable Alertmanager
- How to configure scalable Prometheus setup
Grafana:
Kibana:
Azure:
AWS:
Prometheus
Prometheus is an open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach. For more information about the features, components and architecture of Prometheus please refer to the official documentation.
How to enable provided Prometheus rules
Prometheus role provides the following files with rules:
- common.rules (contains basic alerts like cpu load, disk space, memory usage, etc.)
- container.rules (contains container alerts like container killed, volume usage, volume IO usage, etc.)
- kafka.rules (contains kafka alerts like consumer lags)
- node.rules (contains node alerts like node status, oom, cpu load, etc.)
- postgresql.rules (contains postgresql alerts like postgresql status, exporter error, dead locks, etc.)
- prometheus.rules (contains additional alerts for monitoring Prometheus itself + Alertmanager)
However, only common rules are enabled by default. To enable a specific rule you have to meet two conditions:
- Your infrastructure has to have a specific component enabled (count > 0)
- You have to set the value to "true" in Prometheus configuration in a manifest:
kind: configuration/prometheus
...
specification:
  alert_rules:
    common: true
    container: false
    kafka: false
    node: false
    postgresql: false
    prometheus: false
For more information about how to setup Prometheus alerting rules, refer to the official website.
How to enable Alertmanager
LambdaStack provides Alertmanager configuration via configuration manifest. To see default configuration please refer to default Prometheus configuration file.
To enable Alertmanager you have to modify configuration manifest:
- Enable Alertmanager
- Enable desired alerting rules
- Provide at least one receiver
Example:
...
specification:
  ...
  alertmanager:
    enable: true
    alert_rules:
      common: true
      container: false
      kafka: false
      node: false
      postgresql: false
      prometheus: false
    ...
    config:
      route:
        receiver: 'email'
      receivers:
        - name: 'email'
          email_configs:
            - to: "test@domain.com"
For more details about Alertmanager configuration please refer to the official documentation.
How to configure scalable Prometheus setup
If you want to create scalable Prometheus setup you can use federation. Federation lets you scrape metrics from different Prometheus instances on one Prometheus instance.
To create a federation, add the following job to the scrape_configs section of the configuration (for example the prometheus.yaml file) of the previously created Prometheus instance that should scrape data from the other Prometheus instances:
scrape_configs:
  - job_name: federate
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'
    honor_labels: true
    static_configs:
      - targets:
          - your-prometheus-endpoint1:9090
          - your-prometheus-endpoint2:9090
          - your-prometheus-endpoint3:9090
          ...
          - your-prometheus-endpointn:9090
To check if Prometheus from which you want to scrape data is accessible, you can use a command like below (on Prometheus instance where you want to scrape data):
curl -G --data-urlencode 'match[]={job=~".+"}' your-prometheus-endpoint:9090/federate
If everything is configured properly and Prometheus instance from which you want to gather data is up and running, this should return the metrics from that instance.
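The curl call relies on `--data-urlencode` to escape the `match[]` selector. If you want to run the same check from a script, the following Python sketch builds the equivalent URL with the standard library. The plain http scheme and the endpoint name are assumptions; the helper names are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def federate_url(endpoint, matcher='{job=~".+"}', scheme="http"):
    """Build the /federate URL with a URL-encoded match[] parameter."""
    query = urlencode({"match[]": matcher})
    return f"{scheme}://{endpoint}/federate?{query}"


def matchers_in(url):
    """Decode the match[] selectors from a federate URL (useful for checking)."""
    return parse_qs(urlsplit(url).query).get("match[]", [])


if __name__ == "__main__":
    url = federate_url("your-prometheus-endpoint:9090")
    print(url)
    print(matchers_in(url))
```

The resulting URL can be fetched with any HTTP client; the response should contain the metrics of the scraped instance.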
Grafana
Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources. For more information about Grafana please refer to the official website.
How to setup default admin password and user in Grafana
Before setting up Grafana, please set a new password and/or name for your admin user in your configuration yaml. Otherwise, the default "admin" user will be used with the default password "PASSWORD_TO_CHANGE".
kind: configuration/grafana
specification:
  ...
  # Variables correspond to ones in grafana.ini configuration file
  # Security
  grafana_security:
    admin_user: admin
    admin_password: "YOUR_PASSWORD"
  ...
More information about Grafana security you can find at https://grafana.com/docs/grafana/latest/installation/configuration/#security address.
Import and create Grafana dashboards
LambdaStack uses Grafana for monitoring data visualization. LambdaStack installation creates Prometheus datasource in Grafana, so the only additional step you have to do is to create your dashboard.
There are also many ready-to-use Grafana dashboards created by the community. Remember to check the license before importing any of those dashboards.
Creating dashboards
You can create your own dashboards. The Grafana getting started page will help you with it. Knowledge of Prometheus is really helpful when creating diagrams, since Grafana uses PromQL to fetch the data.
Importing dashboards via Grafana GUI
To import an existing dashboard:
- If you have found a dashboard that suits your needs, you can import it directly into Grafana by going to the Dashboards/Manage menu item in your Grafana web page.
- Click the +Import button.
- Enter the dashboard id or load a json file with the dashboard definition.
- Select the datasource for the dashboard - you should select Prometheus.
- Click Import.
Importing dashboards via configuration manifest
In order to pull a dashboard from official Grafana website during lambdastack execution, you have to provide dashboard_id, revision_id and datasource in your configuration manifest.
Example:
kind: configuration/grafana
specification:
  ...
  grafana_online_dashboards:
    - dashboard_id: '4271'
      revision_id: '3'
      datasource: 'Prometheus'
Enabling predefined Grafana dashboards
Since v1.1.0 LambdaStack provides predefined Grafana dashboards. These dashboards are available in online and offline deployment modes.
To enable particular Grafana dashboard, refer to default Grafana configuration file, copy kind: configuration/grafana
section to your configuration manifest and uncomment desired dashboards.
Example:
kind: configuration/grafana
specification:
  ...
  grafana_external_dashboards:
    # Kubernetes cluster monitoring (via Prometheus)
    - dashboard_id: '315'
      datasource: 'Prometheus'
    # Node Exporter Server Metrics
    - dashboard_id: '405'
      datasource: 'Prometheus'
Note: The above link points to the develop branch. Please choose the branch that matches the LambdaStack version you are using.
Components used for monitoring
There are many monitoring components deployed with LambdaStack that you can visualize data from. Knowing which components are used is important when you look for an appropriate dashboard on the Grafana website or create your own Prometheus queries.
List of monitoring components - so called exporters:
- cAdvisor
- HAProxy Exporter
- JMX Exporter
- Kafka Exporter
- Node Exporter
- Zookeeper Exporter
When dashboard creation or import succeeds you will see it on your dashboard list.
Note: For some dashboards, there is no data to visualize until there is traffic activity for the monitored component.
Kibana
Kibana is a free and open frontend application that sits on top of the Elastic Stack, providing search and data visualization capabilities for data indexed in Elasticsearch. For more information about Kibana please refer to the official website.
How to configure Kibana - Open Distro
In order to start viewing and analyzing logs with Kibana, you first need to add an index pattern for Filebeat according to the following steps:
- Go to the Management tab.
- Select Index Patterns.
- On the first step define filebeat-* as the index pattern and click next.
- Configure the time filter field if desired by selecting @timestamp. This field represents the time that events occurred or were processed. You can choose not to have a time field, but you will not be able to narrow down your data by a time range.
This filter pattern can now be used to query the Elasticsearch indices.
By default Kibana adjusts the UTC time in @timestamp to the browser's local timezone. This can be changed in Management > Advanced Settings > Timezone for date formatting.
How to configure default user passwords for Kibana - Open Distro, Open Distro for Elasticsearch and Filebeat
To configure admin password for Kibana - Open Distro and Open Distro for Elasticsearch you need to follow the procedure below.
There are separate procedures for the logging and opendistro-for-elasticsearch roles, since for opendistro-for-elasticsearch the kibanaserver and logstash users are usually not required to be present.
Logging component
- Logging role
By default LambdaStack removes users that are listed in the demo_users_to_remove section of the configuration/logging doc.
By default, the kibanaserver user (needed by the default LambdaStack installation of Kibana) and the logstash user (needed by the default LambdaStack installation of Filebeat) are not removed. If you want LambdaStack to configure these users, set kibanaserver_user_active to true for the kibanaserver user or logstash_user_active to true for the logstash user. For the logging role, those settings are already set to true by default.
We strongly advise setting a different password for each user.
To change the admin user's password, change the value of the admin_password key. For kibanaserver and logstash, change the values of the kibanaserver_password and logstash_password keys respectively. Changes from the logging role will be propagated to the Kibana and Filebeat configuration.
kind: configuration/logging
title: Logging Config
name: default
specification:
  ...
  admin_password: YOUR_PASSWORD
  kibanaserver_password: YOUR_PASSWORD
  kibanaserver_user_active: true
  logstash_password: YOUR_PASSWORD
  logstash_user_active: true
  demo_users_to_remove:
    - kibanaro
    - readall
    - snapshotrestore
- Kibana role
To set the password of the kibanaserver user, which is used by Kibana for communication with the Open Distro Elasticsearch backend, follow the procedure described in Logging role.
- Filebeat role
To set the password of the logstash user, which is used by Filebeat for communication with the Open Distro Elasticsearch backend, follow the procedure described in Logging role.
Open Distro for Elasticsearch component
By default LambdaStack removes all demo users except the admin user. Those users are listed in the demo_users_to_remove section of the configuration/opendistro-for-elasticsearch doc. If you want to keep the kibanaserver user (needed by the default LambdaStack installation of Kibana), you need to remove it from the demo_users_to_remove list and set kibanaserver_user_active to true in order to change the default password.
We strongly advise setting a different password for each user.
To change the admin user's password, change the value of the admin_password key. For kibanaserver and logstash, change the values of the kibanaserver_password and logstash_password keys respectively.
kind: configuration/opendistro-for-elasticsearch
title: Open Distro for Elasticsearch Config
name: default
specification:
  ...
  admin_password: YOUR_PASSWORD
  kibanaserver_password: YOUR_PASSWORD
  kibanaserver_user_active: false
  logstash_password: YOUR_PASSWORD
  logstash_user_active: false
  demo_users_to_remove:
    - kibanaro
    - readall
    - snapshotrestore
    - logstash
    - kibanaserver
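Since the examples use the same placeholder for every user, a quick check before deployment can help to follow the advice about distinct passwords. The Python sketch below is a hypothetical helper, not part of LambdaStack: it takes the specification section as a dictionary (for instance loaded from your manifest with a YAML parser of your choice) and reports placeholders, missing values and shared passwords.

```python
# Placeholder values that appear in the documentation examples.
PLACEHOLDERS = {"PASSWORD_TO_CHANGE", "YOUR_PASSWORD"}
PASSWORD_KEYS = ("admin_password", "kibanaserver_password", "logstash_password")


def password_problems(specification):
    """Return a list of human-readable problems found in the password keys."""
    problems = []
    for key in PASSWORD_KEYS:
        value = specification.get(key)
        if not value:
            problems.append(f"{key} is not set")
        elif value in PLACEHOLDERS:
            problems.append(f"{key} still uses a placeholder value")
    used = [specification[key] for key in PASSWORD_KEYS if specification.get(key)]
    if len(set(used)) < len(used):
        problems.append("some users share the same password")
    return problems


if __name__ == "__main__":
    example = {
        "admin_password": "YOUR_PASSWORD",
        "kibanaserver_password": "YOUR_PASSWORD",
        "logstash_password": "YOUR_PASSWORD",
    }
    for problem in password_problems(example):
        print(problem)
```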
Upgrade of Elasticsearch, Kibana and Filebeat
During an upgrade, LambdaStack takes the kibanaserver (for Kibana) and logstash (for Filebeat) user passwords and re-applies them to the upgraded configuration of Filebeat and Kibana. A LambdaStack upgrade of Open Distro, Kibana or Filebeat will fail if the kibanaserver or logstash usernames were changed in the configuration of Kibana, Filebeat or Open Distro for Elasticsearch.
Azure
How to configure Azure additional monitoring and alerting
Setting up additional monitoring on Azure for redundancy is good practice and might catch issues that the LambdaStack monitoring could miss, like:
- Azure issues and resource downtime
- Issues with the VM which runs the LambdaStack monitoring and Alerting (Prometheus)
More information about Azure monitoring and alerting you can find under links provided below:
https://docs.microsoft.com/en-us/azure/azure-monitor/overview
https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-overview-alerts
AWS
How to configure AWS additional monitoring and alerting
TODO
13 - OS Patching
Patching OS with running LambdaStack components
This guide describes the steps you have to perform to patch RHEL and Ubuntu operating systems without interrupting running LambdaStack components.
Disclaimer
We provide a recommended way to patch your RHEL and Ubuntu operating systems. Before proceeding with patching the production environment we strongly recommend patching your test cluster first. This document will help you decide how you should patch your OS. This is not a step-by-step guide.
Requirements
- A fresh, up-to-date backup containing all your important data
- Repositories verified to be in the desired state. Details here
AWS
Suggested OS images
For LambdaStack >= v1.2 we recommend the following images (AMI):
- RHEL: RHEL-7.9_HVM-20210208-x86_64-0-Hourly2-GP2 (kernel 3.10.0-1160.15.2.el7.x86_64),
- Ubuntu: ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20210907 (kernel 5.4.0-1056-aws).
Note: For different supported OS versions this guide may be useful as well.
Patching methods
AWS provides Patch Manager, which automates the process of patching managed instances.
Benefits:
- Automate patching
- Define approval rules
- Create patch baselines
- Monitor compliance
This feature is available via:
- console: Systems Manager > Instances & Nodes > Patch Manager
- AWS CLI
For more information, refer to AWS Systems Manager User Guide.
Azure
Suggested OS images
For LambdaStack >= v1.2 we recommend the following images (urn):
- RHEL: RedHat:RHEL:7-LVM:7.9.2021051701 (kernel 3.10.0-1160.el7.x86_64),
- Ubuntu: Canonical:UbuntuServer:18.04-LTS:18.04.202109130 (kernel 5.4.0-1058-azure).
Note: For different supported OS versions this guide may be useful as well.
Patching methods
Azure has an Update Management solution in Azure Automation. It gives you visibility into update compliance across Azure, other clouds, and on-premises environments. The feature allows you to create scheduled deployments that orchestrate the installation of updates within a defined maintenance window.
To manage updates this way, please refer to the official documentation.
Patching with OS specific package manager
The following commands can be executed in both clustered and non-clustered environments. When patching a non-clustered environment, you have to schedule a maintenance window because a reboot is required after kernel patching.
Note: Some other patches may also require a system reboot.
If your environment is clustered, hosts should be patched one by one. Before proceeding with the next host, be sure that the patched host is up and all its components are running. For information on how to check the state of specific LambdaStack components, see here.
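The one-by-one procedure for clustered environments can be sketched as a small shell function. This is only an illustration: patch_hosts_one_by_one and the two callbacks it expects (a command that patches one host and a command that checks one host) are hypothetical names, not part of LambdaStack. You would supply your own commands, for example wrappers around ssh.

```shell
# Hypothetical helper: patch hosts strictly one at a time and stop at the
# first host that fails to patch or does not come back healthy.
# $1 - command (or function) that patches one host, called as: CMD HOST
# $2 - command (or function) that checks one host, called as: CMD HOST
# remaining arguments - hosts, in the order they should be patched
patch_hosts_one_by_one() {
  patch_cmd=$1
  check_cmd=$2
  shift 2
  for host in "$@"; do
    if ! "$patch_cmd" "$host"; then
      echo "patching failed on $host, stopping" >&2
      return 1
    fi
    if ! "$check_cmd" "$host"; then
      echo "$host is not healthy after patching, stopping" >&2
      return 1
    fi
    echo "patched $host"
  done
}
```

Stopping at the first failure keeps the remaining hosts untouched, so the cluster never has more than one host in an unknown state.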
Repositories
LambdaStack uses the repository role to provide all required packages. The role disables all existing repositories and provides a new one. After a successful LambdaStack deployment, the official repositories should be re-enabled and the LambdaStack-provided repository should be disabled.
RHEL
Verify if lsrepo is disabled:
yum repolist lsrepo
Verify if repositories you want to use for upgrade are enabled:
yum repolist all
List installed security patches:
yum updateinfo list security installed
List available patches without installing them:
yum updateinfo list security available
Grab more details about available patches:
yum updateinfo info security available
or specific patch: yum updateinfo info security <patch_name>
Install system security patches:
sudo yum update-minimal --sec-severity=critical,important --bugfix
Install all patches and updates, not only those flagged as critical and important:
sudo yum update
You can also specify the exact bugfix you want to install or even which CVE vulnerability to patch, for example:
sudo yum update --cve CVE-2008-0947
Available options:
--advisory=ADVS, --advisories=ADVS
    Include packages needed to fix the given advisory, in updates
--bzs=BZS
    Include packages needed to fix the given BZ, in updates
--cves=CVES
    Include packages needed to fix the given CVE, in updates
--sec-severity=SEVS, --secseverity=SEVS
    Include security relevant packages matching the severity, in updates
Additional information
Red Hat provides notifications about security flaws that affect its products in the form of security advisories. For more information, see here.
Ubuntu
For automated security patches Ubuntu uses the unattended-upgrade facility. By default it runs every day. To verify it on your system, execute:
dpkg --list unattended-upgrades
cat /etc/apt/apt.conf.d/20auto-upgrades | grep Unattended-Upgrade
For information on how to change the Unattended-Upgrade configuration, see here.
The following steps will allow you to perform an upgrade manually.
Update your local repository cache:
sudo apt update
Verify if lsrepo is disabled:
apt-cache policy | grep lsrepo
Verify if repositories you want to use for upgrade are enabled:
apt-cache policy
List available upgrades without installing them:
apt-get upgrade -s
List available security patches:
sudo unattended-upgrade -d --dry-run
Install system security patches:
sudo unattended-upgrade -d
Install all patches and updates with dependencies:
sudo apt-get dist-upgrade
Verify if your system requires a reboot after an upgrade (check if file exists):
test -e /var/run/reboot-required && echo reboot required || echo reboot not required
Additional information
Canonical provides notifications about security flaws that affect its products in the form of security notices. For more information, see here.
Patching with external tools
Solutions are available to perform kernel patching without system reboot.
- Red Hat kpatch only for RHEL,
- Canonical Livepatch Service only for Ubuntu,
- KernelCare - third-party software. Available also in AWS Marketplace in SaaS model.
If you have a valid subscription for any of the above tools, we highly recommend using it to patch your systems.
14 - Persistent Storage
Kubernetes persistent storage
LambdaStack supports Azure Files and Amazon EFS storage types to use as Kubernetes persistent volumes.
Azure
Infrastructure
LambdaStack creates a storage account with "Standard" tier and locally-redundant storage ("LRS" redundancy option). This storage account contains a file share with the name "k8s".
With the following configuration it is possible to specify storage account name and "k8s" file share quota in GiB.
---
kind: infrastructure/storage-share
name: default
provider: azure
specification:
  quota: 50
Kubernetes
A few related K8s objects are created, such as PersistentVolume, PersistentVolumeClaim and the "azure-secret" Secret, when specification.storage.enable is set to true. It is possible to control the pv/pvc names and the storage capacity/request in GiB with the configuration below.
NOTE
It makes no sense to specify a greater capacity than the Azure file share quota allows. In general these values should be the same.
---
kind: configuration/kubernetes-master
name: default
provider: azure
specification:
  storage:
    name: lambdastack-cluster-volume
    enable: true
    capacity: 50
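The relation between the file share quota and the volume capacity can be checked before deployment. A minimal sketch, assuming both values are given in GiB as in the configurations above; check_storage_capacity is a hypothetical helper, not a LambdaStack command.

```shell
# Hypothetical helper: fail when the Kubernetes volume capacity is larger
# than the Azure file share quota (both values in GiB).
# $1 - specification.storage.capacity from configuration/kubernetes-master
# $2 - specification.quota from infrastructure/storage-share
check_storage_capacity() {
  capacity_gib=$1
  quota_gib=$2
  if [ "$capacity_gib" -gt "$quota_gib" ]; then
    echo "capacity ${capacity_gib} GiB exceeds file share quota ${quota_gib} GiB" >&2
    return 1
  fi
}
```

For the values shown above, check_storage_capacity 50 50 succeeds, while a capacity of 100 against a quota of 50 is rejected.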
Additional configuration
It is possible to use Azure file shares that you created yourself. Check the documentation for details. Created file shares may be used in different ways. There are appropriate configuration examples below.
NOTE
Before applying the configuration, the storage access secret should be created.
Direct approach
Because LambdaStack always creates a file share when provider: azure is used, a configuration similar to the one below can be used even with specification.storage.enable set to false.
apiVersion: v1
kind: Pod
metadata:
name: azure1
spec:
containers:
- image: busybox
name: azure
command: [ "/bin/sh", "-c", "--" ]
args: [ "while true; do sleep 30; done;" ]
volumeMounts:
- name: azure
mountPath: /mnt/azure
volumes:
- name: azure
azureFile:
secretName: azure-secret
shareName: k8s
readOnly: false
Using persistent volumes
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: lambdastack-cluster-volume
spec:
storageClassName: azurefile
capacity:
storage: 50Gi
accessModes:
- "ReadWriteMany"
azureFile:
secretName: azure-secret
shareName: k8s
readOnly: false
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: lambdastack-cluster-volume-claim
spec:
storageClassName: azurefile
volumeName: lambdastack-cluster-volume
accessModes:
- ReadWriteMany
resources:
requests:
storage: 50Gi
---
apiVersion: v1
kind: Pod
metadata:
name: azure2
spec:
containers:
- image: busybox
name: azure
command: [ "/bin/sh", "-c", "--" ]
args: [ "while true; do sleep 30; done;" ]
volumeMounts:
- name: azure
mountPath: /mnt/azure
volumes:
- name: azure
persistentVolumeClaim:
claimName: lambdastack-cluster-volume-claim
AWS
Infrastructure
Amazon EFS can be configured using the following configuration.
---
kind: infrastructure/efs-storage
provider: aws
name: default
specification:
  encrypted: true
  performance_mode: generalPurpose
  throughput_mode: bursting
  #provisioned_throughput_in_mibps: # The throughput, measured in MiB/s, that you want to provision for the file system. Only applicable when throughput_mode set to provisioned
Kubernetes
Configuration for AWS supports the additional parameter specification.storage.path, which allows specifying the path on EFS to be accessed by pods. When specification.storage.enable is set to true, a PersistentVolume and a PersistentVolumeClaim are created.
---
kind: configuration/kubernetes-master
name: default
provider: aws
specification:
  storage:
    name: lambdastack-cluster-volume
    path: /
    enable: true
    capacity: 50
Additional configuration
If provider: aws is specified, EFS storage is always created and can be used with persistent volumes created by the user. It is possible to create a separate EFS and use it. For more information check the Kubernetes NFS storage documentation. EFS can also be used through the Amazon EFS CSI driver, but this approach is not supported by LambdaStack's AWS provider.
Persistent volume creation example
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: lambdastack-cluster-volume
spec:
accessModes:
- ReadWriteMany
capacity:
storage: 100Gi
mountOptions:
- hard
- nfsvers=4.1
- rsize=1048576
- wsize=1048576
- timeo=600
- retrans=2
nfs:
path: /
server: fs-xxxxxxxx.efs.eu-west-1.amazonaws.com
storageClassName: defaultfs
volumeMode: Filesystem
15 - Prerequisites
Run LambdaStack from Docker image
There are two ways to get the image: build it locally yourself or pull it from the LambdaStack Docker registry.
Option 1 - Build LambdaStack image locally
This option also shows how to push the locally built image to Docker Hub.
- Install the following dependencies:
  - Docker
- Open a terminal in the root directory of the LambdaStack source code (it should contain the /cli subdirectory; this is also where the Dockerfile is located) and run one of the two options below. The first builds the image and applies a specific tag/version. The second additionally applies a 'latest' tag, for users who only want the latest version:
TAG=$(cat version)
docker build --file Dockerfile --tag lambdastack/lambdastack:${TAG} .
OR
TAG=$(cat version)
docker build --file Dockerfile --tag lambdastack/lambdastack:${TAG} --tag lambdastack/lambdastack:latest .
- To push the image(s) to the default Docker Hub:
  - Make sure to create an account at Docker. If you want to have more than one repo, create an Organization and add yourself as a member. If you use an organization, select or create a repo name. For example, we use LambdaStack as the organization and lambdastack as a repo (lambdastack/lambdastack). We actually have a number of repos, but you get the point.
  - Push the image(s) to Docker Hub as follows (note: the 'latest' tag is optional; Docker will see that it is the same image and simply create a 'latest' reference):
TAG=$(cat version)
docker push lambdastack/lambdastack:${TAG}
docker push lambdastack/lambdastack:latest
Option 2a - Pull LambdaStack image from the registry
NOTE: This is the default way. The latest version of LambdaStack will already be in Docker Hub, ready to be pulled down locally. If you built the image locally, it is already in your local image store, so there is no need to pull it down; you can skip to running the image as shown below.
TAG is the specific version tag given to the image. If you don't know the specific version, use the second option, which pulls the latest version.
docker pull lambdastack/lambdastack:TAG
OR
docker pull lambdastack/lambdastack:latest
Check here for the available tags.
Option 2b - Running the LambdaStack image
To run the image:
docker run -it -v LOCAL_DIR:/shared --rm lambdastack/lambdastack:TAG
Where:
- LOCAL_DIR should be replaced with the local path to the directory for LambdaStack input (SSH keys, data yaml files) and output (logs, build states),
- TAG should be replaced with an existing tag.
Example: docker run -it -v $PWD:/shared --rm lambdastack/lambdastack:latest
The lambdastack Docker image will mount $PWD (the present working directory), so first change to the directory you want to be mounted. LambdaStack will expect any customized configs, SSH keys or data yaml files to be in that directory. The example above is for Linux-based systems (including Macs). See the Windows method below.
Let LambdaStack run (it will take a while depending on the options you selected)!
The notes below are only relevant if you run into issues with a corporate proxy or something like that, or if you want to do development and add cool new features to LambdaStack :).
LambdaStack development
For setting up the LambdaStack development environment please refer to this dedicated document here.
Important notes
Hostname requirements
LambdaStack supports only hostnames that are valid DNS-1123 subdomains: they must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character.
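The hostname rule can be checked before deployment with a regular expression. A minimal sketch, assuming the usual Kubernetes definition of a DNS-1123 subdomain (dot-separated labels that each start and end with an alphanumeric character, at most 253 characters in total); is_dns1123_subdomain is a hypothetical helper and not part of LambdaStack.

```shell
# Hypothetical helper: succeed only if $1 looks like a DNS-1123 subdomain.
# Assumption: the 253-character limit and the per-label rule follow the
# Kubernetes definition; the text above only states the character rules.
is_dns1123_subdomain() {
  name=$1
  [ "${#name}" -le 253 ] || return 1
  printf '%s\n' "$name" |
    grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}
```

For example, is_dns1123_subdomain "kafka-01.example.com" succeeds, while names such as "Master_01" or "-node" are rejected.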
Note for Windows users
- Watch out for the line endings conversion. By default, Git for Windows sets core.autocrlf=true. Mounting such files with Docker results in a ^M end-of-line character in the config files. Use: Checkout as-is, commit Unix-style (core.autocrlf=input) or Checkout as-is, commit as-is (core.autocrlf=false). Be sure to use a text editor that can work with Unix line endings (e.g. Notepad++).
- Remember to allow Docker Desktop to mount drives in Settings -> Shared Drives
- Escape your paths properly:
  - Powershell example:
    docker run -it -v C:\Users\USERNAME\git\LambdaStack:/LambdaStack --rm lambdastack-dev
  - Git-Bash example:
    winpty docker run -it -v C:\\Users\\USERNAME\\git\\LambdaStack:/LambdaStack --rm lambdastack-dev
- Mounting NTFS disk folders in a Linux-based image causes permission issues with SSH keys. When running either the development or deploy image:
  - Copy the certs on the image:
    mkdir -p ~/.ssh/lambdastack-operations/
    cp /lambdastack/core/ssh/id_rsa* ~/.ssh/lambdastack-operations/
  - Set the proper permission on the certs:
    chmod 400 ~/.ssh/lambdastack-operations/id_rsa*
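The line-ending problem from the Windows notes above can be detected before a file is mounted into the container. A minimal sketch: has_crlf is a hypothetical helper that reports whether a file contains carriage return characters (the ^M mentioned above).

```shell
# Hypothetical helper: succeed if the file given as $1 contains at least one
# carriage return character, which is what Windows (CRLF) line endings add.
has_crlf() {
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}
```

It could be used as has_crlf cluster-config.yml && echo "convert line endings first", where cluster-config.yml stands for whatever your configuration file is called.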
Note about proxies
To run LambdaStack behind a proxy, environment variables need to be set.
When running a development container (upper and lowercase are needed because of an issue with the Ansible dependency):
export http_proxy="http://PROXY_SERVER:PORT"
export https_proxy="https://PROXY_SERVER:PORT"
export HTTP_PROXY="http://PROXY_SERVER:PORT"
export HTTPS_PROXY="https://PROXY_SERVER:PORT"
Or when running from a Docker image (upper and lowercase are needed because of an issue with the Ansible dependency):
docker run -it -v POSSIBLE_MOUNTS... -e HTTP_PROXY=http://PROXY_SERVER:PORT -e HTTPS_PROXY=http://PROXY_SERVER:PORT -e http_proxy=http://PROXY_SERVER:PORT -e https_proxy=http://PROXY_SERVER:PORT --rm IMAGE_NAME
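Because all four variables have to be passed, each with its own -e flag, it can help to generate the flags instead of typing them. A minimal sketch: docker_proxy_args is a hypothetical helper, and the proxy URL in the usage line is a placeholder.

```shell
# Hypothetical helper: print the four -e flags for docker run.
# $1 - proxy URL used for HTTP
# $2 - proxy URL used for HTTPS (optional, defaults to $1)
docker_proxy_args() {
  http_url=$1
  https_url=${2:-$1}
  printf '%s\n' "-e HTTP_PROXY=$http_url -e HTTPS_PROXY=$https_url -e http_proxy=$http_url -e https_proxy=$https_url"
}
```

Usage, leaving the command substitution unquoted on purpose so that the shell splits it into separate arguments: docker run -it $(docker_proxy_args http://PROXY_SERVER:PORT) --rm IMAGE_NAME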
Note about custom CA certificates
In some cases it might be that a company uses custom CA certificates or CA bundles for providing secure connections. To use these with LambdaStack you can do the following:
Devcontainer
Note that for the comments below the filenames of the certificate(s)/bundle do not matter, only the extensions. The certificate(s)/bundle need to be placed here before building the devcontainer.
- If you have one CA certificate you can add it here with the crt extension.
- If you have multiple certificates in a chain/bundle you need to add them here individually with the crt extension and also add the single bundle with the pem extension containing the same certificates. This is needed because not all tools inside the container accept the single bundle.
LambdaStack release container
If you are running LambdaStack from one of the prebuilt release containers you can do the following to install the certificate(s):
cp ./path/to/*.crt /usr/local/share/ca-certificates/
chmod 644 /usr/local/share/ca-certificates/*.crt
update-ca-certificates
If you plan to deploy on AWS you also need to add a separate configuration for Boto3, which can be done either through a config file or by setting the AWS_CA_BUNDLE environment variable. More information about Boto3 can be found here.
16 - Repository
Repository
Introduction
When installing a cluster, LambdaStack sets up its own internal repository for serving:
- OS packages
- Files
- Docker images
This ONLY applies to airgapped environments (environments without Internet access, such as high-security areas).
This document provides information about the repository lifecycle and how to deal with issues that might pop up along the way.
Repository steps and lifecycle
Below is the lifecycle of the LambdaStack repository:
1. Download requirements (this can be automatic for an online cluster or manual for an airgapped cluster)
2. Set up LambdaStack repository (create lsrepo and start HTTP server)
3. For all cluster machines:
   - Back up and disable system package repositories
   - Enable the LambdaStack repository
4. Install LambdaStack components
5. For all cluster machines:
   - Disable the LambdaStack repository
   - Restore original package repositories from the backup
6. Stop LambdaStack repository (optionally removing data)
Troubleshooting
Downloading requirements progression and logging
Note: This will only cover online clusters
Downloading requirements is one of the most sensitive steps in deploying a new cluster because lots of resources are being downloaded from various sources.
When you see the following output from lambdastack, requirements are being downloaded:
INFO cli.engine.ansible.AnsibleCommand - TASK [repository : Run download-requirements script, this can take a long time
INFO cli.engine.ansible.AnsibleCommand - You can check progress on repository host with: journalctl -f -t download-requirements.sh] ***
As noted, this process can take a long time depending on the connection. Because the requirements are downloaded by a shell script, the Ansible process cannot return any real-time information.
To view the progress during the download (real-time output from the logs), SSH into the repository machine and run:
journalctl -f -t download-requirements.sh
If download-requirements fails for some reason, you can also check the log afterwards on the repository machine here:
/tmp/ls-download-requirements/log
Re-downloading requirements
If for some reason the download requirement step fails and you want to restart, it might be a good idea to delete the following directory first:
/var/www/html/lsrepo
This directory holds all the files being downloaded and removing it makes sure that there are no corrupted or temporary files which might interfere with the restarted download process.
If you want to re-download the requirements but the process finished successfully before, you might need to remove the following file:
/tmp/ls-download-requirements/download-requirements-done.flag
When this file is present and is not older than a defined amount of time (2 hours by default), re-downloading the requirements is skipped.
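Whether the flag file would currently cause the download to be skipped can be checked on the repository machine. A minimal sketch that only approximates the rule described above (file present and younger than 2 hours); download_would_be_skipped is a hypothetical helper, and it relies on the -mmin option of find, which is available in GNU findutils but is not part of POSIX.

```shell
# Hypothetical helper: succeed if the flag file exists and was modified less
# than the given number of minutes ago (default: 120, i.e. 2 hours).
# $1 - path to the flag file
# $2 - maximum age in minutes (optional)
download_would_be_skipped() {
  flag_file=$1
  max_age_minutes=${2:-120}
  [ -f "$flag_file" ] || return 1
  [ -n "$(find "$flag_file" -mmin "-$max_age_minutes")" ]
}
```

Usage: download_would_be_skipped /tmp/ls-download-requirements/download-requirements-done.flag && echo "remove the flag file to force a re-download"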
Restoring system repositories
If an issue arises during component installation (e.g. a network issue), the cluster machines may be left in a state where step 5 of the repository lifecycle has not been run. This might leave the machines with a broken repository setup, making it impossible to re-run lambdastack apply, as noted in issue #2324.
To restore the original repository setup on a machine, you can execute the following scripts:
# Re-enable system repositories
/tmp/ls-repository-setup-scripts/enable-system-repos.sh
# Disable lsrepo
/tmp/ls-repository-setup-scripts/disable-lsrepo-client.sh
17 - Retention
A LambdaStack cluster has a number of components which log, collect and retain data. To make sure that these do not exceed the usable storage of the machines they are running on, the following configurations are available.
Elasticsearch
TODO
Grafana
TODO
Kafka
There are two types of retention policies that can be configured at the broker or topic levels: based on time or size. LambdaStack defines the same default value for broker size retention policy as Kafka, -1, which means that no size limit is applied.
To define new log retention values, the following configuration can be used:
kind: configuration/kafka
title: "Kafka"
name: default
specification:
  kafka_var:
    partitions: 8
    log_retention_hours: 168
    log_retention_bytes: -1
Configuration parameters
specification.kafka_var.partitions
Sets num.partitions parameter
specification.kafka_var.log_retention_hours
Sets log.retention.hours parameter
specification.kafka_var.log_retention_bytes
Sets log.retention.bytes parameter
NOTE
Since this limit is enforced at the partition level, multiply it by the number of partitions to compute the topic retention in bytes.
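The multiplication from the note can be written out as a small calculation. A minimal sketch: topic_retention_bytes is a hypothetical helper, and a negative log_retention_bytes value is treated as "no size limit", matching the -1 default described above.

```shell
# Hypothetical helper: compute the size-based retention of a whole topic.
# $1 - number of partitions (specification.kafka_var.partitions)
# $2 - per-partition limit in bytes (specification.kafka_var.log_retention_bytes)
topic_retention_bytes() {
  partitions=$1
  retention_bytes_per_partition=$2
  if [ "$retention_bytes_per_partition" -lt 0 ]; then
    echo "unlimited"
  else
    echo $((partitions * retention_bytes_per_partition))
  fi
}
```

With 8 partitions and a limit of 1073741824 bytes (1 GiB) per partition, the topic can retain up to 8589934592 bytes (8 GiB).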
Kibana
TODO
Kubernetes
TODO
Prometheus
TODO
Zookeeper
TODO
18 - Security Groups
Security Groups
This document describes the Security Groups layout which is used to deploy LambdaStack in AWS or Azure. You will find the default configuration here, as well as examples of adding your own rules or changing existing ones.
Introduction
By default the LambdaStack platform creates the security groups required to handle communication between all components (like PostgreSQL, RabbitMQ etc.). LambdaStack creates a subnet per component, and each subnet has its own security group with rules that allow communication between them. This enables smooth communication between all components. Please check our security document too. Be aware that whenever you want to add a new rule, you need to copy all default rules from the URL mentioned above. This document is split into two parts: AWS and Azure. The reason is that AWS and Azure use different values when setting the security rules.
Setting own security groups
Sometimes there is a need to set additional security rules for an application deployed in the LambdaStack Kubernetes cluster. Then we need to stick to the following rules:
- Whenever we want to add a new rule, for example to open port "X", we should COPY all current rules into our deployment .yaml file and add the new rule at the end.
- Each component has its own rule set, so we need to be very careful where we put them.
- After copying, we can also modify the existing default security groups.
- After adding the new rules, and once the infrastructure part is done (Terraform), we can go into the Terraform build directory and check whether the files contain our port definition.
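The last check in the list above can be done with a recursive search. A minimal sketch: files_mentioning_port is a hypothetical helper, and the directory you pass in is whatever Terraform build directory your deployment produced (its exact location is not given here). It relies on grep -r, which is widely available but not part of POSIX.

```shell
# Hypothetical helper: list the files under a directory that mention a port
# number as a whole word (so searching for 5666 does not match 15666).
# $1 - port number to look for
# $2 - directory to search, e.g. the Terraform build directory
files_mentioning_port() {
  port=$1
  dir=$2
  grep -rlw -- "$port" "$dir"
}
```

The function prints one matching file per line and returns a non-zero status when the port does not appear anywhere, which makes it easy to use in a script.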
Security groups diagram
Check the security diagram below, which shows how security groups are related to other components. This is an example of the AWS architecture, but in Azure it should be almost the same.
Azure Security groups
The list of all security groups and related services in Azure is described here.
Rules description:
- name: "Name of the rule"
description: "Short rule description"
priority: "Priority (NUM), which determines which rules are taken into consideration first"
direction: "Inbound || Outbound - the direction in which the rule allows traffic"
access: "Allow|Deny - whether we want to grant or block access"
protocol: "TCP || UDP - which protocol should be used for connections"
source_port_range: "Source port ranges"
destination_port_range: "Destination port/s range"
source_address_prefix: "Source network address"
destination_address_prefix: "Destination network address"
Let's look at an example in which we set a new rule named "nrpe-agent-port", with priority 250, which allows access from the local network "10.1.4.0/24" to all hosts in our network on port 5666.
The rule:
- name: nrpe-agent-port
description: Allow access all hosts on port 5666 where nagios agent is running.
priority: 250
direction: Inbound
access: Allow
protocol: Tcp
source_port_range: "*"
destination_port_range: "5666"
source_address_prefix: "10.1.4.0/24"
destination_address_prefix: "0.0.0.0/0"
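When a new rule is added to the copied defaults, it is easy to reuse a priority by accident, and Azure does not accept two rules with the same priority and direction in one network security group. A minimal sketch of a check: duplicate_priorities is a hypothetical helper that reads a configuration file and prints every priority value used more than once. As a simplification it ignores the rule direction and does not distinguish between multiple documents in the same file, so it may also report values that are legitimately shared.

```shell
# Hypothetical helper: print each "priority:" value that occurs more than
# once in the file given as $1 (one value per line, nothing if all differ).
duplicate_priorities() {
  awk '$1 == "priority:" { print $2 }' "$1" | sort -n | uniq -d
}
```

An empty output means that every priority in the file is used only once.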
Azure Security groups full yaml file
To deploy the previously mentioned rule, we need to set up a complete YAML configuration file. The example below shows what this file should look like. In this configuration we define a simple LambdaStack setup with 2 nodes and 1 master VM in Azure.
kind: lambdastack-cluster
name: default
provider: azure
title: LambdaStack Cluster Config
build_path: # Dynamically built
specification:
name: azure
prefix: azure
admin_user:
name: operations
key_path: id_rsa
path: # Dynamically built
cloud:
region: East US
subscription_name: PUT_SUBSCRIPTION_NAME_HERE
use_public_ips: true
use_service_principal: true
network:
use_network_security_groups: true
components:
kafka:
count: 0
kubernetes_master:
count: 1
machine: kubernetes-master-machine
configuration: default
kubernetes_node:
count: 2
load_balancer:
count: 0
logging:
count: 0
monitoring:
count: 0
postgresql:
count: 0
rabbitmq:
count: 0
---
kind: infrastructure/virtual-machine
title: "Virtual Machine Infra"
provider: azure
name: kubernetes-master-machine
specification:
size: Standard_DS3_v2
security:
rules:
- name: ssh
description: Allow SSH
priority: 100
direction: Inbound
access: Allow
protocol: Tcp
source_port_range: "*"
destination_port_range: "22"
source_address_prefix: "0.0.0.0/0"
destination_address_prefix: "0.0.0.0/0"
- name: out
description: Allow out
priority: 101
direction: Outbound
access: Allow
protocol: "*"
source_port_range: "*"
destination_port_range: "0"
source_address_prefix: "0.0.0.0/0"
destination_address_prefix: "0.0.0.0/0"
- name: node_exporter
description: Allow node_exporter traffic
priority: 200
direction: Inbound
access: Allow
protocol: Tcp
source_port_range: "*"
destination_port_range: "9100"
source_address_prefix: "10.1.0.0/20"
destination_address_prefix: "0.0.0.0/0"
- name: subnet-traffic
description: Allow subnet traffic
priority: 201
direction: Inbound
access: Allow
protocol: "*"
source_port_range: "*"
destination_from_port: 0
destination_to_port: 65536
destination_port_range: "0"
source_address_prefix: "10.1.1.0/24"
destination_address_prefix: "0.0.0.0/0"
- name: monitoring-traffic
description: Allow monitoring subnet traffic
priority: 203
direction: Inbound
access: Allow
protocol: "*"
source_port_range: "*"
destination_from_port: 0
destination_to_port: 65536
destination_port_range: "0"
source_address_prefix: "10.1.4.0/24"
destination_address_prefix: "0.0.0.0/0"
- name: node-subnet-traffic
description: Allow node subnet traffic
priority: 204
direction: Inbound
access: Allow
protocol: "*"
source_port_range: "*"
destination_from_port: 0
destination_to_port: 65536
destination_port_range: "0"
source_address_prefix: "10.1.2.0/24"
destination_address_prefix: "0.0.0.0/0"
- name: package_repository
description: Allow package repository traffic
priority: 205
direction: Inbound
access: Allow
protocol: Tcp
source_port_range: "*"
destination_port_range: "80"
source_address_prefix: "10.1.0.0/20"
destination_address_prefix: "0.0.0.0/0"
- name: image_repository
description: Allow image repository traffic
priority: 206
direction: Inbound
access: Allow
protocol: Tcp
source_port_range: "*"
destination_port_range: "5000"
source_address_prefix: "10.1.0.0/20"
destination_address_prefix: "0.0.0.0/0"
# Add NRPE AGENT RULE
- name: nrpe-agent-port
description: Allow access all hosts on port 5666 where nagios agent is running.
priority: 250
direction: Inbound
access: Allow
protocol: Tcp
source_port_range: "*"
destination_port_range: "5666"
source_address_prefix: "10.1.4.0/24"
destination_address_prefix: "0.0.0.0/0"
AWS Security groups
The list of all security groups and related services in AWS is described here.
Rules description:
- name: "Name of the rule"
description: "Short rule description"
direction: "Inbound || Egress - the direction in which the rule allows traffic"
protocol: "TCP || UDP - which protocol should be used for connections"
destination_port_range: "Destination port/s range"
source_address_prefix: "Source network address"
destination_address_prefix: "Destination network address"
Let's look at an example in which we set a new rule named "nrpe-agent-port", which allows access from the local network "10.1.4.0/24" to all hosts in our network on port 5666.
The rule:
- name: nrpe-agent-port
description: Allow access all hosts on port 5666 where nagios agent is running.
direction: Inbound
protocol: Tcp
destination_port_range: "5666"
source_address_prefix: "10.1.4.0/24"
destination_address_prefix: "0.0.0.0/0"
AWS Security groups full yaml file
Please check the example below, which shows how to set up a basic LambdaStack cluster in AWS with 1 master, 2 nodes, the mandatory repository machine, and open access to all hosts on port 5666 from the monitoring network.
kind: lambdastack-cluster
name: default
provider: aws
build_path: # Dynamically built
specification:
admin_user:
name: ubuntu
key_path: id_rsa
path: # Dynamically built
cloud:
region: eu-central-1
credentials:
key: YOUR_AWS_KEY
secret: YOUR_AWS_SECRET
use_public_ips: true
components:
repository:
count: 1
machine: repository-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.11.0/24
kubernetes_master:
count: 1
machine: kubernetes-master-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-central-1b
address_pool: 10.1.2.0/24
kubernetes_node:
count: 2
machine: kubernetes-node-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-central-1b
address_pool: 10.1.2.0/24
logging:
count: 0
monitoring:
count: 0
kafka:
count: 0
postgresql:
count: 0
load_balancer:
count: 0
rabbitmq:
count: 0
ignite:
count: 0
opendistro_for_elasticsearch:
count: 0
single_machine:
count: 0
name: testing
prefix: 'aws-machine'
title: LambdaStack Cluster Config
---
kind: infrastructure/virtual-machine
title: "Virtual Machine Infra"
provider: aws
name: kubernetes-master-machine
specification:
size: t3.medium
authorized_to_efs: true
mount_efs: true
security:
rules:
- name: ssh
description: Allow ssh traffic
direction: Inbound
protocol: Tcp
destination_port_range: "22"
source_address_prefix: "0.0.0.0/0"
destination_address_prefix: "0.0.0.0/0"
- name: node_exporter
description: Allow node_exporter traffic
direction: Inbound
protocol: Tcp
destination_port_range: "9100"
source_address_prefix: "10.1.0.0/20"
destination_address_prefix: "0.0.0.0/0"
- name: subnet-traffic
description: Allow master subnet traffic
direction: Inbound
protocol: ALL
destination_port_range: "0"
source_address_prefix: "10.1.1.0/24"
destination_address_prefix: "0.0.0.0/0"
- name: monitoring-traffic
description: Allow monitoring subnet traffic
direction: Inbound
protocol: ALL
destination_port_range: "0"
source_address_prefix: "10.1.4.0/24"
destination_address_prefix: "0.0.0.0/0"
- name: node-subnet-traffic
description: Allow node subnet traffic
direction: Inbound
protocol: ALL
destination_port_range: "0"
source_address_prefix: "10.1.2.0/24"
destination_address_prefix: "0.0.0.0/0"
- name: out
description: Allow out
direction: Egress
protocol: "all"
destination_port_range: "0"
source_address_prefix: "0.0.0.0/0"
destination_address_prefix: "0.0.0.0/0"
# New Rule
- name: nrpe-agent-port
description: Allow access all hosts on port 5666 where nagios agent is running.
direction: Inbound
protocol: Tcp
destination_port_range: "5666"
source_address_prefix: "10.1.4.0/24"
destination_address_prefix: "0.0.0.0/0"
19 - Security
How to enable/disable LambdaStack service user
To enable/disable the LambdaStack service user you can use a tool from our repository. You can find it in the directory tools/service_user_disable_enable under the name service-user-disable-enable.yml.
To use it you need to have Ansible installed on the machine from which you want to execute it.
To disable the user, run:
ansible-playbook -i inventory --extra-vars "operation=disable name=your_service_user_name" service-user-disable-enable.yml
To enable the user, run:
ansible-playbook -i inventory --extra-vars "operation=enable name=your_service_user_name" service-user-disable-enable.yml
How to add/remove additional users to/from OS
To add or remove users, add a specification.users section to the kind: lambdastack-cluster configuration, in a format similar to the example below:
kind: lambdastack-cluster
name: pg-aws-deb
provider: aws
build_path: '' # Dynamically built
specification:
...
users:
- name: user01 # name of the user
sudo: true # whether the user has sudo privileges; defaults to false if not defined
state: present # user will be added if it does not exist
public_key: "ssh-rsa ..." # public key to add to .ssh/authorized_keys
- name: user02
state: absent # user will be deleted if it exists
public_key: "ssh-rsa ..."
- name: user03
state: present
public_key: "ssh-rsa ..."
...
How to use TLS/SSL certificate with HA Proxy
TODO
How to use TLS/SSL with Kafka
Currently LambdaStack supports only certificates that it generates itself and signs with its own self-signed CA certificate. If you want to provide your own certificates, you need to configure Kafka manually according to the Kafka documentation.
To use LambdaStack automation with these certificates, provide your own configuration for the kafka role and enable TLS/SSL, which is disabled by default. To do so, add the Kafka configuration to your LambdaStack configuration file and set the enabled flag to true in the security/ssl section.
If you also set the client_auth parameter in the ssl section to required, you must also configure authentication and authorization, because this setting enforces identity checking. client_auth is set to required by default. The values requested and none do not require such setup.
When TLS/SSL is turned on, all communication with Kafka is encrypted and no unencrypted option remains available. If you need a different configuration, configure Kafka manually.
When the CA certificate and key are created on the server, they are also downloaded to the host from which LambdaStack was executed. By default LambdaStack downloads them to the ansible/kafka_certs directory in the build output folder. You can change this path in the LambdaStack configuration.
Sample configuration for Kafka with enabled TLS/SSL:
kind: configuration/kafka
title: "Kafka"
name: default
specification:
...
security:
ssl:
enabled: True
port: 9093 # port on which Kafka will listen for encrypted communication
server:
local_cert_download_path: kafka-certs # path where generated key and certificate will be downloaded
keystore_location: /var/private/ssl/kafka.server.keystore.jks # location of keystore on servers
truststore_location: /var/private/ssl/kafka.server.truststore.jks # location of truststore on servers
cert_validity: 365 # period of time when certificates are valid
passwords: # in this section you can define passwords to keystore, truststore and key
keystore: PasswordToChange
truststore: PasswordToChange
key: PasswordToChange
endpoint_identification_algorithm: HTTPS # this parameter enforces validating of hostname in certificate
client_auth: required # authentication mode for Kafka - options are: none (no authentication), requested (optional authentication), required (enforce authentication, you need to setup also authentication and authorization then)
inter_broker_protocol: SSL # must be set to SSL if TLS/SSL is enabled
...
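Once the broker listens on the TLS port, clients need matching settings. The sketch below writes a client properties file using standard Kafka client options; the function name, the file paths, and the single shared password are illustrative assumptions and not LambdaStack defaults. The keystore entries are only needed when client_auth is set to required.

```shell
#!/usr/bin/env bash
# Sketch: write a Kafka client properties file for TLS with client certificates.
# Arguments are placeholders chosen by the caller (hypothetical, not LambdaStack defaults):
#   $1 output file, $2 client truststore, $3 client keystore, $4 password for both stores
write_kafka_client_properties() {
  local out_file="$1" truststore="$2" keystore="$3" store_password="$4"
  cat > "$out_file" <<EOF
security.protocol=SSL
ssl.truststore.location=${truststore}
ssl.truststore.password=${store_password}
ssl.keystore.location=${keystore}
ssl.keystore.password=${store_password}
ssl.key.password=${store_password}
ssl.endpoint.identification.algorithm=HTTPS
EOF
}
```

The resulting file can then be handed to a Kafka client that connects to the port configured in the ssl section (9093 in the sample above).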
How to use TLS/SSL certificates for Kafka authentication
To configure Kafka authentication with TLS/SSL, first configure the ssl section.
Then add an authentication section with the enabled flag set to true and authentication_method set to certificates. Setting authentication_method to sasl is not yet described in this document.
kind: configuration/kafka
title: "Kafka"
name: default
build_path: '' # Dynamically built
specification:
...
security:
...
authentication:
enabled: True
authentication_method: certificates
...
How to use TLS/SSL certificates for Kafka authorization
To configure Kafka authorization with TLS/SSL, first configure the ssl and authentication sections.
If authentication is disabled, authorization is disabled as well.
To enable authorization, add an authorization section and set the enabled flag to True.
You can also provide an authorizer_class_name other than the default. By default kafka.security.auth.SimpleAclAuthorizer is used.
If the allow_everyone_if_no_acl_found parameter is set to False, Kafka denies access to resources to everyone except super users and users that have been granted permission to access the topic.
You can also define super users that will be added to the Kafka configuration. To do this, provide a list of users, as in the example below, and generate a certificate for each of them on your own, with only a CN that matches an entry in the list (do not set OU, DC or any other parameters). The certificate then needs to be signed by the CA root certificate for Kafka.
The CA root certificate is downloaded automatically by LambdaStack to the location set in ssl.server.local_cert_download_path, and can also be found on the first Kafka host in the ssl.server.keystore_location directory. To access the certificate key, you need root privileges.
kind: configuration/kafka
title: "Kafka"
name: default
build_path: '' # Dynamically built
specification:
...
security:
...
authorization:
enabled: True
authorizer_class_name: kafka.security.auth.SimpleAclAuthorizer
allow_everyone_if_no_acl_found: False
super_users:
- tester01
- tester02
...
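A super user certificate with only a CN in its subject could be produced with openssl roughly as sketched below. This is an illustration under assumptions: the function name and all file locations are hypothetical, the CA certificate and key are assumed to be available as PEM files, and the key is assumed not to be password protected (otherwise openssl prompts for the password). Note that -CAcreateserial writes a serial file next to the CA certificate.

```shell
#!/usr/bin/env bash
# Sketch: create a key and a certificate whose subject contains only a CN,
# signed by an existing CA. All paths are supplied by the caller (hypothetical).
#   $1 CA certificate (PEM), $2 CA key (PEM), $3 user name used as CN, $4 output directory
make_cn_only_cert() {
  local ca_cert="$1" ca_key="$2" user_name="$3" out_dir="$4"
  mkdir -p "$out_dir" || return 1
  # Private key and signing request; the subject holds the CN and nothing else.
  openssl req -new -newkey rsa:2048 -nodes \
    -keyout "$out_dir/$user_name.key" \
    -out "$out_dir/$user_name.csr" \
    -subj "/CN=$user_name" || return 1
  # Sign the request with the CA certificate and key.
  openssl x509 -req -in "$out_dir/$user_name.csr" \
    -CA "$ca_cert" -CAkey "$ca_key" -CAcreateserial \
    -out "$out_dir/$user_name.crt" -days 365 || return 1
}
```

For example, make_cn_only_cert ca.crt ca.key tester01 ./certs would produce ./certs/tester01.key and ./certs/tester01.crt for the first super user in the list. Java-based Kafka clients usually also need the key and certificate imported into a keystore, which is not covered here.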
How to enable Azure disk encryption
Automatic encryption of storage on Azure is not yet supported by LambdaStack. Guides to encrypt manually can be found:
How to use TLS/SSL certificate with RabbitMQ
To configure RabbitMQ TLS support in LambdaStack, set custom_configurations in the configuration file and manually create certificates with a common CA on your RabbitMQ machines, according to the documentation:
https://www.rabbitmq.com/ssl.html#manual-certificate-generation
or:
https://www.rabbitmq.com/ssl.html#automated-certificate-generation
If the stop_service parameter in configuration/rabbitmq is set to true, RabbitMQ is installed and then stopped, to allow the manual actions required to copy or generate TLS certificates.
NOTE
To complete the installation, lambdastack apply must be executed a second time with stop_service set to false.
The custom_configurations setting in LambdaStack extends the RabbitMQ configuration with custom parameters. It can also be used to perform the TLS configuration of RabbitMQ.
To customize the RabbitMQ configuration, pass a list of parameters in the format:
- name: rabbitmq.configuration.parameter
  value: rabbitmq.configuration.value
These settings map to the RabbitMQ TLS configuration parameters described in the documentation: https://www.rabbitmq.com/ssl.html
Below you can find an example of TLS/SSL configuration.
kind: configuration/rabbitmq
title: "RabbitMQ"
name: default
build_path: '' # Dynamically built
specification:
...
custom_configurations:
- name: listeners.tcp # option that disables non-TLS/SSL support
value: none
- name: listeners.ssl.default # port on which TLS/SSL RabbitMQ will be listening for connections
value: 5671
- name: ssl_options.cacertfile # file with certificate of CA which should sign all certificates
value: /var/private/ssl/ca/ca_certificate.pem
- name: ssl_options.certfile # file with certificate of the server that should be signed by CA
value: /var/private/ssl/server/server_certificate.pem
- name: ssl_options.keyfile # file with key to the certificate of the server
value: /var/private/ssl/server/private_key.pem
- name: ssl_options.password # password to key protecting server certificate
value: PasswordToChange
- name: ssl_options.verify # setting of peer verification
value: verify_peer
- name: ssl_options.fail_if_no_peer_cert # parameter that configures behaviour when a peer cannot present a certificate
value: "false"
...
Please be careful about boolean values as they need to be double quoted and written in lowercase form. Otherwise RabbitMQ startup will fail.
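To see what such a list corresponds to on the RabbitMQ side, the sketch below prints name/value pairs in rabbitmq.conf syntax (name = value). It assumes that each custom_configurations entry ends up as one such line; the function is a hypothetical helper for illustration and is not part of LambdaStack.

```shell
#!/usr/bin/env bash
# Sketch: print name/value pairs as rabbitmq.conf lines ("name = value").
# Assumption: each custom_configurations entry is rendered as one such line.
render_rabbitmq_conf() {
  while [ "$#" -ge 2 ]; do
    printf '%s = %s\n' "$1" "$2"
    shift 2
  done
}

render_rabbitmq_conf \
  listeners.tcp none \
  listeners.ssl.default 5671 \
  ssl_options.verify verify_peer \
  ssl_options.fail_if_no_peer_cert false
```

The last line printed is ssl_options.fail_if_no_peer_cert = false, with the boolean in the lowercase form that rabbitmq.conf expects.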
How to enable AWS disk encryption
EC2 Root volumes
Encryption at rest for EC2 root volumes is turned on by default. To change this, modify the encrypted flag for the root disk inside an infrastructure/virtual-machine document:
...
disks:
root:
volume_type: gp2
volume_size: 30
delete_on_termination: true
encrypted: true
...
Additional EC2 volumes
Encryption at rest for additional EC2 volumes is turned on by default. To change this, modify the encrypted flag for each entry in additional_disks inside an infrastructure/virtual-machine document:
...
disks:
root:
...
additional_disks:
- device_name: "/dev/sdb"
volume_type: gp2
volume_size: 60
delete_on_termination: true
encrypted: true
...
EFS storage
Encryption at rest for EFS storage is turned on by default. To change this, modify the encrypted flag inside the infrastructure/efs-storage document:
kind: infrastructure/efs-storage
title: "Elastic File System Config"
provider: aws
name: default
build_path: '' # Dynamically built
specification:
encrypted: true
...
Additional information can be found here.
How to use Kubernetes Secrets
Prerequisites: LambdaStack Kubernetes cluster
1. SSH into the Kubernetes master.
2. Run echo -n 'admin' > ./username.txt, echo -n 'VeryStrongPassword!!1' > ./password.txt and kubectl create secret generic mysecret --from-file=./username.txt --from-file=./password.txt
3. Copy over the secrets-sample.yaml file from the example folder and apply it with kubectl apply -f secrets-sample.yaml
4. Run kubectl get pods, copy the name of one of the ubuntu pods and run kubectl exec -it POD_NAME -- /bin/bash with it.
5. In the pod's bash run printenv | grep SECRET. The Kubernetes secret created in step 2 was attached to the pods during creation (take a look at secrets-sample.yaml) and is available inside them as environment variables.
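The echo -n in the commands above matters: Kubernetes stores secret values base64-encoded, and a trailing newline written by a plain echo would become part of the stored value. A minimal sketch of the difference, using a hypothetical helper function:

```shell
#!/usr/bin/env bash
# Sketch: compare the encoded form of a value with and without a trailing newline.
encode_secret_value() {
  printf '%s' "$1" | base64
}

encode_secret_value 'admin'   # prints YWRtaW4=  (as written by echo -n 'admin')
printf 'admin\n' | base64     # prints YWRtaW4K  (as written by plain echo 'admin')
```

To inspect what was actually stored, a command such as kubectl get secret mysecret -o jsonpath='{.data.username\.txt}' | base64 --decode can be used on the master.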
How to authenticate to Azure AD app
1. Register your application. In the Azure portal, go to the Azure Active Directory => App registrations tab.
2. Click the New application registration button, fill in the data and confirm.
3. Deploy the app from examples/dotnet/LambdaStack.SampleApps/LambdaStack.SampleApps.AuthService. This is a test service for verifying Azure AD authentication of the registered app. (How to deploy app)
4. Create a secret key for your app in settings => keys. Remember to copy the value of the key after creation.
5. Try to authenticate (e.g. using Postman) by calling the service API <service-url>/api/auth/ with the following parameters in an application/json body:
   { "TenantId": "<tenant-id>", "ClientId": "<client-id>", "Resource": "https://graph.windows.net/", "ClientSecret": "<client-secret>" }
   - TenantId - Directory ID, which you can find in the Azure Active Directory => Properties tab.
   - ClientId - Application ID, which you can find in the details of the previously registered app: Azure Active Directory => App registrations => your app
   - Resource - https://graph.windows.net is the service root of the Azure AD Graph API. The Azure Active Directory (AD) Graph API provides programmatic access to Azure AD through OData REST API endpoints. You can construct your own Graph API URL. (How to construct a Graph API URL)
   - ClientSecret - The secret key created in step 4.
6. The service should return an access token.
How to run lambdastack with password
LambdaStack encrypts Kubernetes artifacts (access tokens) stored in the LambdaStack build directory. To do so, the user is asked for a password that is used to encrypt and decrypt the artifacts. Remember to enter the same password for the same cluster; if the password differs, lambdastack will not be able to decrypt the secrets.
The standard way of executing lambdastack has not changed:
lambdastack apply -f demo.yaml
But you will be asked to enter a password:
Provide password to encrypt vault:
When running lambdastack from a CI pipeline, you can pass the password with the --vault-password parameter:
lambdastack apply -f build/demo/demo.yaml --vault-password MYPWD
How to make kubectl work for non-root user on master node
For security reasons, the access to the admin credentials is limited to the root user. To make a non-root user the cluster administrator, run these commands (as the non-root user):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
See more options in Troubleshooting
How to turn on Hashicorp Vault functionality
In LambdaStack, besides storing secrets in Kubernetes secrets, you can also use secrets stored in Hashicorp Vault. This provides a more sophisticated solution for using secrets and a higher level of security than the standard Kubernetes secrets implementation. LambdaStack also provides a transparent method for applications running on Kubernetes to access Hashicorp Vault secrets. You can read more about it in the How to turn on Hashicorp Vault integration with k8s section. In the future we also want to provide additional features that currently can only be configured manually according to the Hashicorp Vault documentation.
At the moment only installation on the Kubernetes Control Plane is supported, but we are also planning a separate installation with no other components. A clustered option for Vault deployment is not provided at this moment either, but it will be part of future releases. For multi-master (HA) Kubernetes, Vault is installed only on the first master defined in the Ansible inventory.
Below you can find a sample configuration for Vault with a description of all options.
kind: configuration/vault
title: Vault Config
name: default
specification:
vault_enabled: true # enable Vault install
vault_system_user: vault # user name under which Vault service will be running
vault_system_group: vault # group name under which Vault service will be running
enable_vault_audit_logs: false # turn on audit logs that can be found at /opt/vault/logs/vault_audit.log
enable_vault_ui: false # enable Vault UI, shouldn't be used in production
vault_script_autounseal: true # enable automatic unsealing of Vault at the start of the service, shouldn't be used in production
vault_script_autoconfiguration: true # enable automatic configuration of Hashicorp Vault. It sets the UNSEAL_VAULT variable in script.config
...
app_secret_path: devwebapp # application specific path where application secrets will be mounted
revoke_root_token: false # not implemented yet (more about in section Root token revocation)
secret_mount_path: secret # start of the path where secrets will be mounted
vault_token_cleanup: true # should configuration script clean token
vault_install_dir: /opt/vault # directory where vault will be installed
vault_log_level: info # logging level that will be set for Vault service
override_existing_vault_users: false # should a user from vault_users override an existing user and generate a new password
vault_users: # users that will be created with vault
- name: admin # name of the user that will be created in Vault
policy: admin # name of the policy that will be assigned to user (description below)
- name: provisioner
policy: provisioner
vault_helm_chart_values: # helm chart values overwriting the default package (to be able to use internal registry for offline purposes)
injector:
externalVaultAddr: https://your-external-address:8200 # external vault address (only if you want to setup address to provide full name to use with signed certificate) [IMPORTANT: switch https->http if tls_disable parameter is set to true]
image:
repository: "{{ image_registry_address }}/hashicorp/vault-k8s" # docker image used by vault injector in kubernetes
agentImage:
repository: "{{ image_registry_address }}/vault" # docker image used by vault injector in kubernetes
server:
image:
repository: "{{ image_registry_address }}/vault" # docker image used by vault injector in kubernetes
# TLS part
tls_disable: false # enable TLS support, should be used always in production
certificate_name: fullchain.pem # certificate file name
private_key_name: privkey.pem # private key file name for certificate
vault_tls_valid_days: 365 # certificate valid time in days
selfsigned_certificate: # selfsigned certificate information
country: US # selfexplanatory
state: state # selfexplanatory
city: city # selfexplanatory
company: company # selfexplanatory
common_name: "*" # selfexplanatory
More information about the configuration of Vault in LambdaStack, and some guidance on how to start working with Vault in LambdaStack, can be found below.
To become more familiar with Vault usage, you can refer to the official getting started guide.
Creation of user using LambdaStack in Vault
To create users with LambdaStack, provide a list of users, each with the name of the policy that should be assigned to it. You can use the predefined policies delivered by LambdaStack, the default Vault policies, or your own policy. Remember that a policy you have written yourself must exist before the user is created.
The password for each user is generated automatically and can be found in the /opt/vault directory in files matching the tokens-*.csv pattern. If a user's password is generated or changed, you will see a corresponding line in the csv file with username, policy and password. If the password is not updated, you will see ALREADY_EXISTS in place of the password.
Predefined Vault policies
Vault policies are used to define Role-Based Access Control that can be assigned to clients, applications and other components that use Vault. You can find more information about policies here.
Besides the two policies already included in Vault (root and default), LambdaStack provides two additional predefined policies:
- admin - policy granting administration privileges; it has sudo permission on Vault system endpoints
- provisioner - policy granting permissions to create user secrets, add secrets and enable authentication methods, but without access to Vault system endpoints
Manual unsealing of the Vault
By design Hashicorp Vault starts in sealed mode. This means that Vault data is encrypted and an operator needs to provide unseal keys to be able to access the data.
Vault can be unsealed manually using the command:
vault operator unseal
and passing three unseal keys from the /opt/vault/init.txt file. In future releases the number of keys will be configurable in the LambdaStack configuration. Right now the default Hashicorp Vault settings are used.
For development purposes you can also use the vault_script_autounseal option in the LambdaStack configuration.
More information about unsealing can be found in the CLI documentation, and about the concepts here.
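The manual unseal can be scripted along the lines of the sketch below. It assumes that /opt/vault/init.txt has the layout printed by vault operator init (lines such as "Unseal Key 1: <key>"); the function names are hypothetical. Passing keys as command-line arguments may expose them in the shell history or process list, so treat this as a development aid only.

```shell
#!/usr/bin/env bash
# Sketch: read unseal keys from a file shaped like `vault operator init` output
# and print the first three, one per line.
# Assumption: the file contains lines of the form "Unseal Key N: <key>".
read_unseal_keys() {
  local init_file="$1"
  grep '^Unseal Key' "$init_file" | awk '{print $NF}' | head -n 3
}

# Feed each key to `vault operator unseal`.
# Requires the vault CLI and a VAULT_ADDR that points at the sealed Vault.
unseal_vault() {
  local init_file="${1:-/opt/vault/init.txt}" key
  read_unseal_keys "$init_file" | while read -r key; do
    vault operator unseal "$key" || return 1
  done
}
```

Calling unseal_vault without arguments uses /opt/vault/init.txt; reading that file typically requires elevated privileges.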
Configuration with manual unsealing
If you use manual unsealing or want to perform the configuration manually, you can run the configuration script later from the command line:
/opt/vault/bin/configure-vault.sh \
-c /opt/vault/script.config \
-a ip_address_of_vault \
-p http \
-v helm_chart_values_be_override
For -p, pass either http or https. The values in script.config are generated automatically by LambdaStack and can be used later to perform the configuration.
Log into Vault with token
To log into Vault with a token, you only need to pass the token. You can do this using the command:
vault login
Only the root token has no expiration date, so be aware that all other tokens can expire. To avoid this, renew the token before it expires. You can assign a policy to a token to define its access.
More information about logging in with tokens can be found here, and about tokens here.
Log into Vault with user and password
Another option is to log into Vault with a user/password pair. This method avoids having to log in with a different token each time one expires. To log in with a user/password pair, the userpass method must be enabled; then log in with the command:
vault login -method=userpass username=your-username
More information about logging in with tokens can be found here, and about userpass authentication here.
Token Helpers
Vault provides the option to use a token helper. By default Vault creates a .vault-token file in the home directory of the user running the vault login command, which lets the user run subsequent commands without providing a token. This token is removed by default after LambdaStack configuration, but this can be changed using the vault_token_cleanup flag.
More information about the token helper can be found here.
Creating your own policy
To create your own policy using the CLI, please refer to the CLI documentation and documentation.
Creating your own user
To create your own user with user and password login, please refer to the documentation. If you have configured any user using LambdaStack, userpass authentication is already enabled; if not, it needs to be enabled manually.
Root token revocation
In production it is good practice to revoke the root token. This option is not yet implemented by LambdaStack, but will be implemented in future releases.
Be aware that after revoking the root token you won't be able to use the configuration script without generating a new token and replacing the old token with the new one in /opt/vault/init.txt (field Initial Root Token). For new root token generation, please refer to the documentation accessible here.
TLS support
By default tls_disable is set to false, which means that Vault uses certificates. There are two ways to configure certificates:
- selfsigned
Vault self-signed certificates are generated automatically during Vault setup if no custom certificates are present in the dedicated location.
- certificate provided by user
The user can place a certificate (and private key) in the dedicated location. File names matter: they have to be the same as provided in the configuration, and the .pem file extension is required.
Dedicated location of custom certificates:
core/src/lambdastack/data/common/ansible/playbooks/roles/vault/files/tls-certs
Certificate files names configuration:
kind: configuration/vault
title: Vault Config
name: default
specification:
...
certificate_name: fullchain.pem # certificate file name
private_key_name: privkey.pem # private key file name for certificate
...
Production hardening for Vault
LambdaStack applies a number of measures to improve Vault security, e.g.:
- End-to-End TLS
- Disable Swap (when running on Kubernetes machine)
- Don't Run as Root
- Turn Off Core Dumps
- Enable Auditing
- Restrict Storage Access
- Tweak ulimits
However, if you want to harden Vault further, please refer to this guide.
Troubleshooting
To troubleshoot Vault and find the root cause of a problem, enable audit logs and set vault_log_level to debug. Be aware that audit logs can contain sensitive data.
How to turn on Hashicorp Vault integration with k8s
LambdaStack can also configure the integration with Kubernetes automatically. This is achieved by applying additional settings to the Vault configuration. A sample config with descriptions can be found below.
kind: configuration/vault
title: Vault Config
name: default
specification:
vault_enabled: true
...
vault_script_autounseal: true
vault_script_autoconfiguration: true
...
kubernetes_integration: true # enable setup kubernetes integration on vault side
kubernetes_configuration: true # enable setup kubernetes integration on vault side
enable_vault_kubernetes_authentication: true # enable kubernetes authentication on vault side
kubernetes_namespace: default # namespace where your application will be deployed
...
Vault and Kubernetes integration in LambdaStack relies on the vault-k8s tool. This tool enables sidecar injection of secrets into pods using a Kubernetes Mutating Admission Webhook. This is transparent for your application, and you do not need to bind to any Hashicorp libraries to use secrets stored in Vault.
You can also configure Vault manually on your own, enabling through LambdaStack only the options that you need.
More about Kubernetes sidecar integration can be found at the link.
Vault Kubernetes authentication
To work with the Vault sidecar integration you need to enable Kubernetes authentication. Without it the sidecar won't be able to access secrets stored in Vault.
If you don't want to use the sidecar integration but still want to access Vault secrets automatically, you can use Kubernetes authentication on its own. To find more information about the capabilities of Kubernetes authentication, please refer to the documentation.
Create your secret in Vault
In LambdaStack you can use the integration of key-value secrets to inject them into containers. To do this you need to create the secrets using the vault CLI.
You can do this by running a command similar to the sample below:
vault kv put secret/yourpath/to/secret username='some_user' password='some_password'
LambdaStack uses the kv secrets engine as the backend for Vault secrets. More information about the kv secrets engine can be found here.
Kubernetes namespace
LambdaStack creates additional Kubernetes objects to inject secrets automatically using the sidecar. To have access to your application pods, those objects need to be deployed in the same namespace as the application.
Annotations
Below you can find a sample excerpt of a deployment configuration with annotations. At the moment vault.hashicorp.com/role cannot be changed, but this will change in a future release.
template:
metadata:
labels:
app: yourapp
annotations:
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/role: "devweb-app"
vault.hashicorp.com/agent-inject-secret-credentials.txt: "secret/data/yourpath/to/secret"
vault.hashicorp.com/tls-skip-verify: "true"
vault.hashicorp.com/tls-skip-verify
If true, configures the Vault Agent to skip verification of Vault's TLS certificate.
It is required for self-signed certificates; setting this value to true is not recommended in a production environment.
More information about annotations can be found here.
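Note that the sample annotation refers to secret/data/yourpath/to/secret, while the secret was written with vault kv put secret/yourpath/to/secret. This matches the way version 2 of the kv secrets engine exposes secret data under a data/ segment after the mount path. The sketch below shows the conversion, assuming a KV version 2 mount whose mount path is a single segment (such as secret); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert the path used with `vault kv put` on a KV version 2 mount
# into the path used by the agent-inject-secret annotation.
# Assumption: the mount path is the first path segment (e.g. "secret").
kv2_api_path() {
  local kv_path="$1"
  local mount="${kv_path%%/*}"   # first path segment, e.g. "secret"
  local rest="${kv_path#*/}"     # everything after the mount
  printf '%s/data/%s\n' "$mount" "$rest"
}

kv2_api_path secret/yourpath/to/secret   # prints secret/data/yourpath/to/secret
```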
20 - Upgrade
Upgrade
Introduction
From lscli 0.4.2 and up, the CLI can perform upgrades of certain components of a cluster. The components it can currently upgrade or add are:
- Kubernetes (master and nodes). Supported versions: v1.18.6 (LambdaStack 0.7.1+), v1.20.12 (LambdaStack 1.3.0+)
- common: Upgrades all common configurations to match the current LambdaStack version
- repository: Adds the repository role needed for component installation in the current LambdaStack version
- image_registry: Adds the image_registry role needed for offline installation in the current LambdaStack version
NOTE
There is an assertion that checks whether the K8s version is supported before running the upgrade.
The component upgrade takes the existing Ansible build output and, based on that, upgrades the currently supported components. If you need to re-apply your entire LambdaStack cluster, the input yaml must be manually adjusted to the latest specification and then applied with lambdastack apply... (see the Run apply after upgrade chapter for more details).
Notes about upgrading from pre-0.8 LambdaStack:
- If you need to upgrade a cluster deployed with lambdastack in a version earlier than 0.8, make sure that you have enough disk space on the master host (which is used as the repository). If you didn't extend the OS disk on the master during the deployment process, you probably have only a 32 GB disk, which is not enough to properly upgrade the cluster (we recommend at least 64 GB). Before you run the upgrade, extend the OS disk on the master machine according to the cloud provider documentation: AWS, Azure.
- If you already use logging machine(s) in your cluster, you need to scale up those machines before running the upgrade to ensure you have enough resources to run the ELK stack in the newer version. We recommend at least the DS2_v2 Azure size (2 CPUs, 7 GB RAM), or its equivalent on AWS and on-prem installations. The required size depends heavily on the amount of data you store. Please see the logging documentation for more details.
Online upgrade
Online prerequisites
Your existing cluster should meet the following requirements:
- The cluster machines/VMs are connected by a network or virtual network of some sort, can communicate with each other, and have access to the internet.
- The cluster machines/VMs are upgraded to the following versions:
  - RedHat 7.6
  - CentOS 7.6
  - Ubuntu 18.04
- The cluster machines/VMs should be accessible through SSH with a set of SSH keys you provided and configured on each machine yourself.
- A provisioning machine that:
  - Has access to the SSH keys
  - Has access to the build output from when the cluster was first created.
  - Is on the same network as your cluster machines
  - Has LambdaStack 0.4.2 or up running. Note: before running LambdaStack, check the Prerequisites
Start the online upgrade
Start the upgrade with:
lambdastack upgrade -b /buildoutput/
This will back up and upgrade the Ansible inventory in the provided build folder /buildoutput/, which will be used to perform the upgrade of the components.
Offline upgrade
Offline prerequisites
Your existing airgapped cluster should meet the following requirements:
- The airgapped cluster machines/VMs are connected by a network or virtual network of some sort and can communicate with each other.
- The airgapped cluster machines/VMs are upgraded to the following versions:
  - RedHat 7.6
  - CentOS 7.6
  - Ubuntu 18.04
- The airgapped cluster machines/VMs should be accessible through SSH with a set of SSH keys you provided and configured on each machine yourself.
- A requirements machine that:
  - Runs the same distribution as the airgapped cluster machines/VMs (RedHat 7, CentOS 7, Ubuntu 18.04)
  - Has access to the internet.
- A provisioning machine that:
  - Has access to the SSH keys
  - Has access to the build output from when the cluster was first created.
  - Is on the same network as your cluster machines
  - Has LambdaStack 0.4.2 or up running.
NOTE
Before running lambdastack, check the Prerequisites
Start the offline upgrade
To upgrade the cluster components, run the following steps:
1. First, get the tooling to prepare the requirements for the upgrade. On the provisioning machine run:
   lambdastack prepare --os OS
   where OS should be centos-7, redhat-7 or ubuntu-18.04. This will create a directory called prepare_scripts with the needed files inside.
2. The scripts in prepare_scripts will be used to download all requirements. To do that, copy the prepare_scripts folder over to the requirements machine and run the following command:
   download-requirements.sh /requirementsoutput/
   This will start downloading all requirements and put them in the /requirementsoutput/ folder. Once it has run successfully, /requirementsoutput/ needs to be copied to the provisioning machine to be used later on.
3. Finally, start the upgrade with:
   lambdastack upgrade -b /buildoutput/ --offline-requirements /requirementsoutput/
   This will back up and upgrade the Ansible inventory in the provided build folder /buildoutput/, which will be used to perform the upgrade of the components. The --offline-requirements flag tells LambdaStack where to find the folder with requirements (/requirementsoutput/) prepared in steps 1 and 2, which is needed for the offline upgrade.
Additional parameters
The lambdastack upgrade command has additional flags:
- --wait-for-pods. When this flag is added, the Kubernetes upgrade waits until all pods are in the ready state before proceeding. This can be useful when a zero-downtime upgrade is required. Note that this can also cause the upgrade to hang indefinitely.
- --upgrade-components. Specify comma-separated component names so that the upgrade procedure processes only those components. The list cannot be empty, otherwise execution will fail. If this parameter is not provided, the upgrade processes all components. Example:
  lambdastack upgrade -b /buildoutput/ --upgrade-components "kafka,filebeat"
Run apply after upgrade
Currently, LambdaStack does not fully support apply after upgrade. It is possible to re-apply the configuration from a newer version of LambdaStack, but this requires some manual work from the administrator. Re-apply on an already upgraded cluster needs to be called with the --no-infra option to skip the Terraform part of the configuration. If apply after upgrade is run with --no-infra, the system images used by the older LambdaStack version are preserved to prevent the destruction of the VMs. If you plan to modify any infrastructure unit (e.g., add a Kubernetes node), you need to create the machine yourself and attach it to the configuration yaml. When running lambdastack apply... on an already upgraded cluster, you should use the yaml config files generated by the newer version of LambdaStack and apply the changes you had in the older one. If the cluster is upgraded to version 0.8 or newer, you also need to add a feature mapping for the repository role, as shown in the example below:
---
kind: lambdastack-cluster
name: clustername
provider: azure
build_path: # Dynamically built
specification:
  admin_user:
    key_path: id_rsa
    name: operations
    path: # Dynamically built
  components:
    repository:
      count: 0 # Set repository to 0 since it's introduced in v0.8
    kafka:
      count: 1
    kubernetes_master:
      count: 1
    kubernetes_node:
      count: 2
    load_balancer:
      count: 1
    logging:
      count: 1
    monitoring:
      count: 1
    postgresql:
      count: 1
    rabbitmq:
      count: 0
    ignite:
      count: 0
    opendistro_for_elasticsearch:
      count: 0
  name: clustername
  prefix: 'prefix'
title: LambdaStack Cluster Config
---
kind: configuration/feature-mapping
title: Feature mapping to roles
provider: azure
name: default
specification:
  roles_mapping:
    kubernetes_master:
      - kubernetes-master
      - helm
      - applications
      - node-exporter
      - filebeat
      - firewall
      - vault
      - repository # add repository here
      - image-registry # add image-registry here
...
Kubernetes applications
To upgrade applications on Kubernetes to the desired version after lambdastack upgrade
you have to:
- generate new configuration manifest using
lambdastack init
- if you generate a minimal configuration manifest (without the --full argument), copy and paste the default configuration into it
- run
lambdastack apply
NOTE
The above link points to the develop branch. Please choose the branch that matches the LambdaStack version you are using.
How to upgrade Kafka
Kafka upgrade
Kafka will be automatically updated to the latest version supported by LambdaStack. You can check the latest supported version here. Kafka brokers are updated one by one - but the update procedure does not guarantee "zero downtime" because it depends on the number of available brokers, topic, and partitioning configuration.
ZooKeeper upgrade
A redundant ZooKeeper configuration is also recommended, since a service restart is required during the upgrade and can cause ZooKeeper unavailability. With at least two ZooKeeper services in the ZooKeeper ensemble, you can upgrade one and then continue with the rest one by one.
More detailed information about ZooKeeper can be found in the ZooKeeper documentation.
Open Distro for Elasticsearch upgrade
NOTE
Before the upgrade procedure, make sure you have a data backup!
LambdaStack v1.0.0 provides an upgrade of the elasticsearch-oss package to v7.10.2 and of the opendistro-* plugin packages to v1.13.*. The upgrade is performed automatically when the upgrade procedure detects your logging, opendistro_for_elasticsearch, or kibana hosts.
The Elasticsearch upgrade uses API calls (GET, PUT, POST), which require an admin TLS certificate. By default, LambdaStack generates self-signed certificates for this purpose, but if you use your own, you have to provide the admin certificate's location. To do that, edit the following settings, changing cert_path and key_path.
logging:
  upgrade_config:
    custom_admin_certificate:
      cert_path: /etc/elasticsearch/custom-admin.pem
      key_path: /etc/elasticsearch/custom-admin-key.pem
opendistro_for_elasticsearch:
  upgrade_config:
    custom_admin_certificate:
      cert_path: /etc/elasticsearch/custom-admin.pem
      key_path: /etc/elasticsearch/custom-admin-key.pem
These settings are available in the defaults of the upgrade role (/usr/local/lambdastack/data/common/ansible/playbooks/roles/upgrade/defaults/main.yml).
Node exporter upgrade
NOTE
Before upgrade procedure, make sure you have a data backup, and you are familiar with breaking changes.
Starting from LambdaStack v0.8.0 it's possible to upgrade node exporter to v1.0.1. Upgrade will be performed automatically when the upgrade procedure detects node exporter hosts.
RabbitMQ upgrade
NOTE
Before upgrade procedure, make sure you have a data backup. Check that the node or cluster is in a good state: no alarms are in effect, no ongoing queue synchronisation operations and the system is otherwise under a reasonable load. For more information visit RabbitMQ site.
With the latest LambdaStack version it's possible to upgrade RabbitMQ to v3.8.9. This requires an upgrade of the Erlang system packages to v23.1.4, which is done automatically. The upgrade is performed in offline mode after stopping all RabbitMQ nodes. Rolling upgrade is not supported by LambdaStack, and it is advised not to use this approach when Erlang needs to be upgraded.
Kubernetes upgrade
Prerequisites
Before K8s version upgrade make sure that deprecated API versions are not used:
Upgrade
NOTE
If the K8s cluster that is going to be upgraded has the Istio control plane application deployed, issues can occur. The default profiles we currently support for installing Istio only deploy a single replica for the control services with a PodDisruptionBudgets value of 0. This results in the following error while draining pods during an upgrade:
Cannot evict pod as it would violate the pod's disruption budget.
As we currently don't support any kind of advanced configuration of the Istio control plane components outside the default profiles, we need to scale up all components manually before the upgrade. This can be done with the following command:
kubectl scale deploy -n istio-system --replicas=2 --all
After the upgrade, the deployments can be scaled down to the original capacity:
kubectl scale deploy -n istio-system --replicas=1 --all
Note: The istio-system
namespace value is the default value and should be set to whatever is being used in the
Istio application configuration.
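If some deployments in the namespace do not run with a single replica, scaling everything back to 1 would not restore the original capacity. The sketch below records each deployment's replica count before scaling up and restores it afterwards. It is an illustration, not part of LambdaStack: the function names and the STATE_FILE location are hypothetical, and NS should match the namespace used in your Istio application configuration.

```shell
# Sketch: remember replica counts, scale up for the upgrade, restore afterwards.
NS=${NS:-istio-system}                             # adjust to your Istio namespace
STATE_FILE=${STATE_FILE:-/tmp/istio-replicas.txt}  # hypothetical location

save_and_scale_up() {
  # Write one "name<TAB>replicas" pair per line.
  kubectl get deploy -n "$NS" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.replicas}{"\n"}{end}' \
    > "$STATE_FILE"
  # Same command as in the text: every deployment in the namespace gets 2 replicas.
  kubectl scale deploy -n "$NS" --replicas=2 --all
}

restore_replicas() {
  # Scale each deployment back to the count recorded before the upgrade.
  while read -r name replicas; do
    [ -n "$name" ] || continue
    kubectl scale deploy "$name" -n "$NS" --replicas="$replicas" </dev/null
  done < "$STATE_FILE"
}
```

Run save_and_scale_up before lambdastack upgrade and restore_replicas once the upgrade has finished. Note that --replicas=2 --all also reduces any deployment that already runs more than two replicas until it is restored.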
PostgreSQL upgrade
NOTE
Before upgrade procedure, make sure you have a data backup.
Versions
LambdaStack upgrades PostgreSQL 10 to 13 with the following extensions (for versions, see COMPONENTS.md):
- PgAudit
- PgBouncer
- PgPool
- repmgr
Prerequisites
The prerequisites below are checked by the preflight script before upgrading PostgreSQL. Nevertheless, it's good to check them manually before doing any upgrade:
-
Disk space: When LambdaStack upgrades PostgreSQL 10 to 13, it makes a copy of the data directory on each node to ensure easy recovery in the case of a failed data migration. It is up to the user to make sure there is enough space available. The rule used is:
total storage used on the data volume + total size of the data directory < 95% of total size of the data volume
The limit for used storage after the data directory copy is 95% because some space is needed during the upgrade.
-
Cluster health: Before starting the upgrade the state of the PostgreSQL cluster needs to be healthy. This means that executing:
repmgr cluster show
should not fail and should return 0 as the exit code.
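Both prerequisites can be checked by hand with a few lines of shell. The sketch below is an illustration, not the preflight script itself: the function names are hypothetical, sizes are taken in kilobytes from df and du, and the data directory path in the usage note is an assumption that depends on your installation.

```shell
# has_enough_space USED_KB DATA_DIR_KB TOTAL_KB
# Succeeds when: used + data directory size < 95% of the volume size.
has_enough_space() {
  # Integer form of the rule: (used + data_dir) * 100 < total * 95
  [ $(( ($1 + $2) * 100 )) -lt $(( $3 * 95 )) ]
}

# check_data_dir DIR: collect the three numbers for DIR and apply the rule.
check_data_dir() {
  used_kb=$(df -Pk "$1" | awk 'NR==2 {print $3}')   # used space on the volume
  total_kb=$(df -Pk "$1" | awk 'NR==2 {print $2}')  # total size of the volume
  dir_kb=$(du -sk "$1" | awk '{print $1}')          # size of the data directory
  has_enough_space "$used_kb" "$dir_kb" "$total_kb"
}

# cluster_is_healthy COMMAND [ARGS...]: succeeds when the command exits with 0.
cluster_is_healthy() {
  "$@" >/dev/null 2>&1
}
```

For example, check_data_dir /var/lib/pgsql/10/data && cluster_is_healthy repmgr cluster show succeeds only when both prerequisites hold. Depending on the setup, repmgr may need to run as the postgres user or with an explicit configuration file.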
Upgrade
Upgrade procedure is based on PostgreSQL documentation and requires downtime as there is a need to stop old service(s) and start new one(s).
A custom configuration for the upgrade can be provided with lambdastack upgrade -f, subject to a few limitations on specifying parameters:
-
If non-default values were provided for installation (lambdastack apply), they have to be provided again so that they are not overwritten by defaults.
-
The wal_keep_segments parameter for replication is replaced by wal_keep_size with the default value of 500 MB. The previous parameter is not supported.
-
The archive_command parameter for replication is set to /bin/true by default. It was planned to disable archiving, but changes to archive_mode require a full PostgreSQL server restart, while archive_command changes can be applied via a normal configuration reload. See documentation.
-
An extension cannot be disabled after installation, so a specification.extensions.*.enabled: false value is ignored during upgrade if it was set to true during installation.
Manual actions
LambdaStack runs pg_upgrade (on the primary node only) from a dedicated location (pg_upgrade_working_dir). For Ubuntu, this is /var/lib/postgresql/upgrade/$PG_VERSION, and for RHEL/CentOS it is /var/lib/pgsql/upgrade/$PG_VERSION.
LambdaStack saves the output from pg_upgrade there as logs, which should be checked after the upgrade.
Post-upgrade processing
As the "Post-upgrade processing" step in the PostgreSQL documentation states, if any post-upgrade processing is required, pg_upgrade will issue warnings as it completes. It will also generate SQL script files that must be run by the administrator. There is no clear description of the cases in which they are created, so please check the logs in pg_upgrade_working_dir after the upgrade to see if additional steps are required.
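One quick way to spot generated SQL scripts is to list the *.sql files in the working directory. The sketch below assumes that pg_upgrade wrote its scripts directly into pg_upgrade_working_dir; the function name is hypothetical, and reading the logs is still necessary.

```shell
# list_post_upgrade_scripts DIR: print every *.sql file found directly in DIR.
list_post_upgrade_scripts() {
  for f in "$1"/*.sql; do
    # When nothing matches, the glob stays literal, so skip paths that do not exist.
    if [ -e "$f" ]; then
      echo "$f"
    fi
  done
}
```

An empty result from, for example, list_post_upgrade_scripts /var/lib/pgsql/upgrade/$PG_VERSION suggests that no SQL scripts were generated in that directory.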
Statistics
Because optimizer statistics are not transferred by pg_upgrade, you may need to run a command to regenerate that information after the upgrade. For this purpose, consider running the analyze_new_cluster.sh script (created in pg_upgrade_working_dir) as the postgres user.
Delete old cluster
For safety, LambdaStack does not remove old PostgreSQL data. It is the user's responsibility to decide when the data is ready to be removed and to remove it. Once you are satisfied with the upgrade, you can delete the old cluster's data directories by running the delete_old_cluster.sh script (created in pg_upgrade_working_dir on the primary node) on all nodes. The script is not created if you have user-defined tablespaces inside the old data directory.
You can also delete the old installation directories (e.g., bin, share). You may delete pg_upgrade_working_dir on the primary node once the upgrade is completely over.