This repo contains a set of basic Terraform configs for deploying a Deepgram on-prem instance on Google Cloud. It is not a complete deployment; it includes two main things:
- A Packer config to build a VM image for running Deepgram on-prem.
- A Terraform config to stand up a GCP Managed Instance Group of VMs running the VM image.
There are other steps you will need to follow to fully deploy Deepgram on GCP. (All of these could be done with Terraform, but are not yet included in this repo.) You will also need to:
- Configure health checks for your Managed Instance Group
- Configure a GCP load balancer to point to your Managed Instance Group
- (Optionally) configure an external IP, domain name, and SSL certificate for your load balancer
See below for complete instructions.
The `packer/` directory contains a Packer configuration to generate a custom image for the Deepgram on-prem service.
- Edit `packer/setup.sh` and set the `DEEPGRAM_USERNAME` and `DEEPGRAM_PASSWORD` environment variables at the top of that file to the values provided by your Deepgram account contact. These are required to authenticate to Deepgram's private Docker registry.
- Edit `packer/setup.sh` and update the `curl` commands in that file to point to the URLs of the encrypted models provided by your Deepgram account manager.
- Edit `packer/build.pkr.hcl` and set the `project_id` and `zone` variables to your GCP project ID and desired zone (e.g. `us-west1-a`). Also edit the `accelerator_type` variable to set the GCP project ID and the type of GPU you wish to deploy on (see the sketch after this list).
- Run:

```
$ packer build build.pkr.hcl
```
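For reference, the edited variables in `packer/build.pkr.hcl` might look something like the following sketch. The values shown are placeholders, not this repo's actual defaults; note that GCE accelerator types are full resource paths that embed the project ID and zone, which is why `accelerator_type` also carries your project ID.

```hcl
# Sketch only -- variable names come from the steps above; the
# default values are placeholders you must replace.
variable "project_id" {
  type    = string
  default = "YOUR-GCP-PROJECT-ID"
}

variable "zone" {
  type    = string
  default = "us-west1-a"
}

variable "accelerator_type" {
  type = string
  # Accelerator types are full resource paths, so the project ID and
  # GPU model both appear here.
  default = "projects/YOUR-GCP-PROJECT-ID/zones/us-west1-a/acceleratorTypes/nvidia-tesla-t4"
}
```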
Running the build creates the VM image in your GCP project. It also emits a file called `manifest.json` that contains the image name, like the following:
```
{
  "builds": [
    {
      "name": "dgonprem-packer-image",
      "builder_type": "googlecompute",
      "build_time": 1707254755,
      "files": null,
      "artifact_id": "deepgram-onprem-1707253541",
      "packer_run_uuid": "b5a76386-82fb-228b-239c-7e0206c2b167",
      "custom_data": null
    }
  ],
  "last_run_uuid": "b5a76386-82fb-228b-239c-7e0206c2b167"
}
```
The `artifact_id` is the name of the image that was created, which can then be plugged into the Terraform configs (see below) to deploy the Deepgram on-prem service.
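On the Terraform side, the image typically ends up referenced from an instance template. A hypothetical sketch of that wiring (none of these names are taken from this repo, and GPU-specific settings are omitted):

```hcl
# Hypothetical: how the Packer-built image might be consumed in
# Terraform. Names and machine type are placeholders.
resource "google_compute_instance_template" "dgonprem" {
  name_prefix  = "dgonprem-"
  machine_type = "n1-standard-8" # placeholder

  disk {
    source_image = var.packer_image # the artifact_id from Packer
    boot         = true
  }

  network_interface {
    network = "default"
  }
}
```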
The `terraform/` directory contains a Terraform configuration to deploy a Managed Instance Group of VMs running the Deepgram on-prem image.
The Terraform configs here don't include a health check, so you need to create one manually in the GCP console at https://console.cloud.google.com/compute/healthChecks. You should configure the health check as follows:

- Health check name: `dgonprem-mig-health-check-8080`
- Path: `/v1/status`
- Protocol: HTTP
- Port: 8080
- Proxy protocol: NONE
- Logs: Disabled
- Interval: 60 seconds
- Timeout: 10 seconds
- Healthy threshold: 3 consecutive successes
- Unhealthy threshold: 10 consecutive failures
(Yes, this could in principle be included in the Terraform config itself.)
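For reference, a Terraform equivalent of the console settings above might look like this minimal sketch (the resource name is hypothetical, and this resource is not part of the repo's configs):

```hcl
# Sketch of a Terraform-managed equivalent of the console health
# check above. The resource name "dgonprem" is a hypothetical choice.
resource "google_compute_health_check" "dgonprem" {
  name                = "dgonprem-mig-health-check-8080"
  check_interval_sec  = 60
  timeout_sec         = 10
  healthy_threshold   = 3
  unhealthy_threshold = 10

  http_health_check {
    request_path = "/v1/status"
    port         = 8080
    proxy_header = "NONE"
  }

  log_config {
    enable = false
  }
}
```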
You'll need to edit a few things in the `deepgram-onprem/main.tf` file before deployment:

- Edit `deepgram-onprem/main.tf` and set the `project` and `zone` variables at the top to your GCP project and zone, respectively.
- Edit the `packer_image` variable to the `artifact_id` generated by Packer.
- Edit the various instances of `YOUR-GCP-PROJECT-ID` and change them to your GCP project.
- Edit `YOUR-GCP-SERVICE-ACCOUNT` and change this to the ID of the service account associated with your GCP project.
- Edit the `min_replicas` and `max_replicas` variables to set the desired number of VMs in the managed instance group. For fast startup times, I recommend setting `min_replicas` to at least 1 (see the sketch after this list).
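Taken together, the edited variables might look something like this sketch (all values are placeholders; the repo's `main.tf` is the source of truth):

```hcl
# Placeholder values only -- substitute your own project, zone, and
# the artifact_id emitted by your Packer build.
variable "project" {
  type    = string
  default = "YOUR-GCP-PROJECT-ID"
}

variable "zone" {
  type    = string
  default = "us-west1-a"
}

variable "packer_image" {
  type    = string
  default = "deepgram-onprem-1707253541" # artifact_id from manifest.json
}

variable "min_replicas" {
  type    = number
  default = 1 # keep at least one VM warm for fast startup
}

variable "max_replicas" {
  type    = number
  default = 3
}
```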
You should now be able to run `terraform init` and `terraform apply` to deploy the Managed Instance Group.
You will also need to configure a GCP load balancer to point to your Managed Instance Group. Use the Managed Instance Group as the "backend" service for the load balancer. The endpoint protocol needs to be configured to HTTP (not HTTPS), and should use the named port `http`.
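As with the health check, this can be done in the console, but if you prefer Terraform, a rough sketch of the backend-service half might look like the following (resource names and references are assumptions, not part of this repo):

```hcl
# Sketch: wire the MIG into an HTTP backend service for the load
# balancer. The "dgonprem" names and MIG reference are hypothetical.
resource "google_compute_backend_service" "dgonprem" {
  name          = "dgonprem-backend"
  protocol      = "HTTP"  # HTTP, not HTTPS
  port_name     = "http"  # the named port on the instance group
  health_checks = [google_compute_health_check.dgonprem.id]

  backend {
    group = google_compute_instance_group_manager.dgonprem.instance_group
  }
}
```

A complete load balancer also needs a URL map, target proxy, and forwarding rule in front of the backend service; those are omitted here.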
The Deepgram service is configured to send logs to the GCP Ops Agent running on each VM instance, which will forward logs to Google's Cloud Logging service.
If things aren't working, you can log in to one of the VM instances running your service and use `docker ps -a` to see the list of containers. There should be an "API" container and an "engine" container running. You can use `docker logs` to inspect the container logs (which should also appear in the Logs Explorer in the Google Cloud console).