Deployment Checklist
Deployment checklist
The checklist below provides a list of prerequisites to ensure a successful installation process. Before you contact Rockfish Data to complete the installation or troubleshoot installation issues, make sure to complete the checklist. General checklist for Kubernetes clusters
Compute (CPU and memory) requirements
Make sure that your Kubernetes cluster meets the compute and memory requirements.
Storage requirements
A supported block storage class for shared storage, PostgreSQL and RabbitMQ volumes, and to store the license file.
Networking requirements
Your Kubernetes cluster can access the container image repository.
By default, Rockfish Data serves the container images from the Rockfish Data image repository at nexus.test.mostlylab.com. To deploy the images, your Kubernetes cluster needs to have Internet access.
If your internal IT policies require that you pull the images from an internal repository, ensure that your Kubernetes cluster has access to it. For more information, Configure an internal image repository.
Your Kubernetes cluster has network access to the defined storage classes and to the data sources (databases and cloud object storage providers) from which you want to pull original data.
Collaborate with your IT department and Customer Experience Engineer to configure an domain SSL certificate for your Kubernetes cluster and for the Rockfish Data Helm chart.
Access and permissions requirements
Your Kubernetes cluster user has permissions to read and write into the storage class.
On the AI worker nodes where Rockfish Data jobs run, you should have no taints
defined that might not allow pods to be created with the minimum and maximum resource requirements specified in the values.yaml for the engine. Otherwise, Rockfish Data jobs will fail to run.
If you already have taints on your worker nodes, you need to add tolerations on the Rockfish Data pods in the values.yaml file, under agent.tolerations and engine.tolerations.
values.yaml
...
tolerations:
# Replace with the actual key label of the taint
# For example: `Tainted-worker:NoSchedule`
- key: "Tainted-worker"
operator: "Exists"
effect: "NoSchedule"
...
On the AI worker nodes where Rockfish Data jobs run, make sure that no other pods belonging to other applications can run so as not to interfere with Rockfish Data workloads. You can apply taints
on nodes dedicated to Rockfish Data workloads so as to prevent other workloads from running. Verify that any resource quotas
created for your namespace allow Rockfish Data to successfully run worker nodes based on their requirements.
If you have specific username requirements to access databases or other resources, update the Helm chart values.yaml file. In specific cases, due to Oracle security policies you might need to allowlist container users in Oracle.
Other requirements
Disable any tools or service mesh services in the Rockfish Data namespace that intercept communications between pods and require manual approval to proceed. If you have enabled such services, such as Linkerd or Istio, Rockfish Data jobs might be prevented from starting and completing successfully.
Work with your IT team to enable backups of the PostgreSQL database.
AWS EKS Kubernetes cluster checklist
AWS EKS cluster running Kubernetes 1.23 or higher. For more information, see Amazon EKS documentation
. Compute resources. The AWS EKS cluster has at least six worker nodes of type m5.xlarge. For more information, see Amazon EC2 instance types . Storage (required for single- and multi-node). Integrate with Amazon Elastic Block Store (EBS) with your EKS cluster by installing and configuring the aws-ebs-csi-driver. For more information, Amazon EBS CSI driver . Networking. Create a Virtual Private Cloud (VPC) with a /16 subnet netmask. This provides up to 65,536 private IPv4 addresses. For more information, see the AWS VPC documentation.
How Amazon VPC works
Get started with Amazon VPCCreate a VPC Networking. Integrate with Amazon Elastic Load Balancing (ALB) to automatically distribute your incoming traffic across multiple targets, such as EC2 instances, containers, and IP addresses, in one or more Availability Zones. Install the aws-load-balancer-controller in your EKS cluster to manage your ALBs and create an Ingress that uses this controller. For more information, see Installing the AWS Load Balancer Controller add-on . Networking. Create a Domain name in Route 53 that will point to your ALB. Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service. To configure a domain registered in Route 53 to point to AWS ALB, you can use the Amazon Route 53 Documentation . For specific configurations and support, contact AWS Support. Security. AWS Certificate Manager (ACM) helps you to provision, manage, and renew publicly trusted TLS certificates on AWS based websites. Create an SSL certificate in ACM that can be used with your ALB. For more information, see AWS Certificate Manager > Requesting a public certificate
.
Rockfish Data Helm charts checklist
Obtain the Rockfish Data Helm charts from your Customer Experience Engineer.
Obtain a Docker pull image secret from your Customer Experience Engineer.
Ensure Internet connectivity to pull the Docker images from the Rockfish Data repository.
Get acquainted with the default configuration values.yaml file in the Rockfish Data Helm charts.
Make sure that the nodes in your Kubernetes cluster can accommodate the container resource requirements defined in the values.yaml file.