JupyterHub is a multi-user notebook server that enables multiple users to develop, research, and create. In this post, I am going to cover deploying JupyterHub to Amazon EKS with single-user persistent storage backed by Amazon EBS and TLS termination using AWS Certificate Manager (ACM).
Before we dive in, make sure you have eksctl, kubectl, and Helm installed on your local machine. We will use these tools to deploy the Kubernetes cluster and JupyterHub. I installed them using Homebrew on macOS.
brew install helm kubernetes-cli eksctl
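If you want to confirm the installations, each tool can print its version:

# verify the tools are available
eksctl version
kubectl version --client
helm version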
Note: This guide is based on the Zero to JupyterHub with Kubernetes guide.
Now that we have the necessary tools, it is time to deploy the cluster. We will use eksctl, the official CLI for Amazon EKS, to deploy a managed Kubernetes cluster on AWS. The configuration below defines the cluster, a set of managed node groups, and IAM Roles for Service Accounts so that the Amazon EBS CSI Driver and Cluster Autoscaler can run with least-privilege permissions. Feel free to modify this file to meet your needs.
We create a node group in each availability zone so that we always have capacity in every zone. This is important when using Amazon EBS, because volumes are specific to an availability zone. The cluster is pre-configured for Cluster Autoscaler, but we will not cover its deployment in this post. To deploy Cluster Autoscaler after following this guide, see the Amazon EKS documentation.
# file: cluster.yml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: jupyterhub
  region: us-east-2

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: cluster-autoscaler
        namespace: kube-system
        labels:
          aws-usage: "cluster-ops"
          app.kubernetes.io/name: cluster-autoscaler
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - "autoscaling:DescribeAutoScalingGroups"
              - "autoscaling:DescribeAutoScalingInstances"
              - "autoscaling:DescribeLaunchConfigurations"
              - "autoscaling:DescribeTags"
              - "autoscaling:SetDesiredCapacity"
              - "autoscaling:TerminateInstanceInAutoScalingGroup"
              - "ec2:DescribeLaunchTemplateVersions"
            Resource: '*'
    - metadata:
        name: ebs-csi-controller-sa
        namespace: kube-system
        labels:
          aws-usage: "cluster-ops"
          app.kubernetes.io/name: aws-ebs-csi-driver
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - "ec2:AttachVolume"
              - "ec2:CreateSnapshot"
              - "ec2:CreateTags"
              - "ec2:CreateVolume"
              - "ec2:DeleteSnapshot"
              - "ec2:DeleteTags"
              - "ec2:DeleteVolume"
              - "ec2:DescribeInstances"
              - "ec2:DescribeSnapshots"
              - "ec2:DescribeTags"
              - "ec2:DescribeVolumes"
              - "ec2:DetachVolume"
            Resource: '*'

managedNodeGroups:
  - name: ng-us-east-2a
    instanceType: m5.large
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-east-2a
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"
  - name: ng-us-east-2b
    instanceType: m5.large
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-east-2b
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"
  - name: ng-us-east-2c
    instanceType: m5.large
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-east-2c
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"
Next, deploy the cluster using this configuration.
eksctl create cluster -f ./cluster.yml
This will take 15-25 minutes to deploy. While the cluster is deploying, we can configure the values.yml file that we will use to configure JupyterHub. To get started, we need to generate the secret token used by JupyterHub. Run the following command to generate it.
# generate a secret token
openssl rand -hex 32
Next, we need to configure the rest of our values.yml file. Start by replacing the value of secretToken with the output of the previous command. To get started, we are disabling HTTPS; don't worry, we will configure it later. We are also specifying a dummy authentication provider; if you have an authentication provider such as OAuth, OIDC, or LDAP, feel free to replace this configuration to meet your needs. Finally, we are configuring JupyterHub to use a Kubernetes StorageClass to provision a disk for each user. In this case, we will be using the Amazon EBS CSI Driver to provide an EBS volume for each user.
# file: values.yml
proxy:
  secretToken: <replace with value from previous command>
  https:
    enabled: false

auth:
  type: dummy
  dummy:
    password: 'supersecretpassword!'
  whitelist:
    users:
      - admin

singleuser:
  storage:
    capacity: 4Gi
    dynamic:
      storageClass: gp2
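By the time values.yml is ready, the cluster creation should be wrapping up. As a quick sanity check (eksctl points kubectl at the new cluster by default), you can confirm that the three nodes registered and that the service accounts from cluster.yml were created:

# confirm the nodes joined the cluster
kubectl get nodes

# confirm the IRSA-backed service accounts from cluster.yml
kubectl get serviceaccount -n kube-system cluster-autoscaler ebs-csi-controller-sa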
Once the cluster creation has completed, we need to deploy the Amazon EBS CSI Driver, which provisions the EBS volumes for our users. Run the following command to deploy it. We are using the out-of-the-box configuration; feel free to create additional StorageClasses with different EBS settings to meet your needs (an example follows the install command).
# deploy the ebs csi driver for persistent storage
kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"
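The gp2 StorageClass referenced in values.yml ships with EKS by default, so nothing more is required. If you want different EBS settings, you can define an additional StorageClass backed by the CSI driver. The example below is an illustrative sketch; the class name and parameters are my own choices, not part of the original setup.

# file: storageclass.yml (example: encrypted gp2-backed volumes via the EBS CSI driver)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-encrypted
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp2
  encrypted: "true"

Apply it with kubectl apply -f storageclass.yml and reference the class name as the storageClass under singleuser.storage.dynamic in values.yml.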
We are now ready to deploy JupyterHub. First, we add the JupyterHub Helm chart repository to our local Helm installation. Then we deploy the chart using the values.yml file we configured earlier.
# setup helm
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
# deploy jupyterhub
helm install jupyterhub jupyterhub/jupyterhub \
  --values values.yml
This will take a few minutes to settle. Once it is settled, run the following command to get the URL of the load balancer.
kubectl get svc proxy-public
Your output should look similar to this:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
proxy-public LoadBalancer 10.100.61.130 a86c837dfe75545c8b3e311621278e82-357827081.us-east-2.elb.amazonaws.com 80:31950/TCP 15m
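If the EXTERNAL-IP column still shows pending, or the site does not respond yet, you can check that the hub and proxy pods have started; this assumes the chart was installed into the default namespace, as in the helm install command above.

# check that the hub and proxy pods are running
kubectl get pods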
Navigate to the HTTP version of the load balancer URL output by the previous command, and you should be able to log in using the dummy credentials we configured above.
However, we can make this more secure. Next, let's use AWS Certificate Manager (ACM) to configure an SSL/TLS certificate on the load balancer. To do this, you need a certificate in ACM, in the same region as the cluster, whose domain matches the DNS name you plan to point at JupyterHub.
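If you do not already have one, you can request a certificate from ACM. The hostname below is just an example; use the domain you plan to register for JupyterHub.

# request a certificate (DNS validation) for the JupyterHub hostname
aws acm request-certificate \
  --domain-name jupyterhub.example.com \
  --validation-method DNS \
  --region us-east-2

Once the certificate has been issued, update values.yml to enable HTTPS offloading and attach the certificate ARN to the load balancer: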
proxy:
  secretToken: <replace with token>
  https:
    enabled: true
    type: offload
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "<arn of certificate>"
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"

auth:
  type: dummy
  dummy:
    password: 'supersecretpassword!'
  whitelist:
    users:
      - admin

singleuser:
  storage:
    capacity: 4Gi
    dynamic:
      storageClass: gp2
Once you have updated your configuration, run the following command to apply the updates:
helm upgrade jupyterhub jupyterhub/jupyterhub --values values.yml
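Once the upgrade has settled, the proxy-public service should expose port 443 in addition to port 80; you can confirm with the same command as before.

kubectl get svc proxy-public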
Now we need to create a DNS record that points the hostname on our ACM certificate at the load balancer. In my case, this is jupyterhub.arhea.io. Optionally, you can configure External DNS with Kubernetes to manage this record for you automatically.
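If your zone is hosted in Route 53, a CNAME record pointing the hostname at the load balancer's DNS name does the trick. The sketch below uses placeholder values; substitute your own hosted zone ID, hostname, and the load balancer DNS name from kubectl get svc proxy-public.

# create or update the CNAME record for the JupyterHub hostname
aws route53 change-resource-record-sets \
  --hosted-zone-id <your hosted zone id> \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "jupyterhub.arhea.io",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{ "Value": "<load balancer DNS name>" }]
      }
    }]
  }'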
That’s it! You now have a highly scalable deployment of JupyterHub on Amazon EKS. To improve this deployment, I recommend looking at External DNS to automatically register the load balancer with your DNS provider, Cluster Autoscaler to automatically scale your cluster based on usage, and the Amazon EFS CSI Driver to attach shared storage to all user environments.