Start GUAC with PostgreSQL using Docker Compose

If you prefer, you can set up GUAC on Kubernetes using the experimental Helm charts provided by Kusari. Note that these Helm charts are hosted in a third-party repo and may not be synchronized with the GUAC repo.

This tutorial walks you through deploying a full, persistent GUAC deployment with a PostgreSQL database backend using Docker Compose.

Prerequisites

Optional - Verify images and binaries

Follow Verification of the GUAC images and binaries

Step 1: Download GUAC

Download the GUAC CLI guaccollect binary for your machine’s OS and architecture from the latest GUAC release if you have not already done so. For example:
- Linux x86_64 : guaccollect-linux-amd64
- macOS x86_64 : guaccollect-darwin-amd64
- Windows x86_64 : guaccollect-windows-amd64.exe
Rename the binary to guaccollect, mark it executable if necessary, and add it to your shell’s path.
Download the guac-postgres-compose.yaml file from the latest GUAC release.
Optional: If you want test data to work with, download and unzip GUAC’s test data.
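
The rename, mark-executable, and path steps above can be sketched as a small shell helper. This is a minimal sketch, not an official installer: the function names guac_asset_name and install_guaccollect and the default install directory $HOME/.local/bin are assumptions made for illustration, and only the three asset names listed above are mapped.

```shell
#!/bin/sh
# Map an OS/architecture pair to the release asset names listed above.
# Hypothetical helper: any other pair is rejected rather than guessed.
# "Windows" is a plain label here; uname on Windows shells reports other names.
guac_asset_name() {
  case "$1/$2" in
    Linux/x86_64)   echo "guaccollect-linux-amd64" ;;
    Darwin/x86_64)  echo "guaccollect-darwin-amd64" ;;
    Windows/x86_64) echo "guaccollect-windows-amd64.exe" ;;
    *) echo "no asset name known for $1/$2" >&2; return 1 ;;
  esac
}

# Copy a downloaded binary to <dir>/guaccollect and mark it executable.
# $1: path to the downloaded file
# $2: install directory (the default is an assumption, not a GUAC convention)
install_guaccollect() {
  src="$1"
  dest_dir="${2:-$HOME/.local/bin}"
  mkdir -p "$dest_dir" &&
  cp "$src" "$dest_dir/guaccollect" &&
  chmod +x "$dest_dir/guaccollect"
}

# Example usage after downloading the asset into the current directory:
#   install_guaccollect "./$(guac_asset_name "$(uname -s)" "$(uname -m)")"
#   export PATH="$HOME/.local/bin:$PATH"
```

Rejecting unknown platforms keeps the helper from inventing asset names that may not exist in the release.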

Step 2: Start the GUAC server

From the directory where you downloaded guac-postgres-compose.yaml, run:
```
docker compose -f guac-postgres-compose.yaml up
```
Verify that GUAC is running:
```
docker compose ls
```
You should see:
```
NAME                STATUS              CONFIG FILES
dirname             running(9)          /files/dirname/guac-postgres-compose.yaml
```
If you don’t see the above, run docker compose down and try starting GUAC again. Docker Compose reuses containers from earlier runs, so leftover state from an unclean shutdown can cause issues.
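
The stop-and-retry cycle can be sketched as two shell functions. This is a minimal sketch under stated assumptions: the names guac_restart and guac_status are hypothetical, and the -f option is passed to every command because the compose file does not use one of the default file names that Compose looks for.

```shell
#!/bin/sh
# Compose file name from this tutorial; pass another path to override it.
GUAC_COMPOSE_FILE="guac-postgres-compose.yaml"

# Stop and remove the existing containers, then start GUAC again.
# -f is passed to both commands so Compose resolves the same project, and
# -d (detached) returns the terminal once the containers are started.
guac_restart() {
  compose_file="${1:-$GUAC_COMPOSE_FILE}"
  docker compose -f "$compose_file" down &&
  docker compose -f "$compose_file" up -d
}

# Show the per-container state of the GUAC project.
guac_status() {
  compose_file="${1:-$GUAC_COMPOSE_FILE}"
  docker compose -f "$compose_file" ps
}
```

With -d the container logs are not attached to the terminal. Running docker compose -f guac-postgres-compose.yaml logs -f follows them instead.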

GUAC Ports

| Port Number | GUAC Component | Note |
| --- | --- | --- |
| 8080 | GraphQL server | To see the GraphQL playground, visit http://localhost:8080. |
| 2782 | Collector Subscriber | This service is notified whenever you run a collector, such as `guaccollect files` below. Subscribers can then collect more data on any packages ingested. |
| 4222 | NATS | Ingestion pubsub endpoint. |
| 8081 | REST server | GUAC endpoint for simplified REST queries. |
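
A quick way to confirm that these ports are reachable is a TCP connection check. The following is a minimal sketch, assuming bash (it relies on bash's /dev/tcp redirection, which plain sh does not provide). The function names port_open and check_guac_ports are hypothetical, and an open port only shows that something is listening, not that the component is healthy.

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp redirection inside a subshell so the
# file descriptor is closed again as soon as the check finishes.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Report the state of each port from the table above.
# $1: host to check (defaults to localhost)
check_guac_ports() {
  host="${1:-localhost}"
  while read -r port component; do
    if port_open "$host" "$port"; then
      echo "$port $component: open"
    else
      echo "$port $component: closed"
    fi
  done <<'EOF'
8080 GraphQL server
2782 Collector Subscriber
4222 NATS
8081 REST server
EOF
}
```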

GUAC Volume Mounts

Two directories are created in the same directory as the compose file:

blobstore: Temporary storage for documents that are queued for the ingestor.
postgres-data: Contains the PostgreSQL database files.

Step 3: Start Ingesting Data

Before ingesting data, make sure the blobstore directory is writable by your local user. Because it was created by the Ingestor Docker container, it is owned by a different user ID.

sudo chmod a+w ./blobstore

You can run the guaccollect files ingestion command to load data into your GUAC deployment. For example, the following command ingests the sample guac-data documents, but you can ingest your own documents instead.

guaccollect files --service-poll=false --blob-addr='file://./blobstore?no_tmp_dir=true' ./guac-data-main/docs

This command takes all documents under the ./guac-data-main/docs directory and ingests them into GUAC by placing messages on the NATS pubsub queue and placing the documents in the ./blobstore directory for the ingestor to pick up.
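
The writability check and the ingestion command can be combined in one wrapper. This is a minimal sketch: the function name guac_ingest_dir is hypothetical, and the guaccollect flags are the ones used in this tutorial, not a complete list of options.

```shell
#!/bin/sh
# Ingest a directory of documents with the flags used in this tutorial.
# $1: directory of documents to ingest
# $2: blobstore directory (defaults to ./blobstore)
guac_ingest_dir() {
  docs_dir="$1"
  blobstore="${2:-./blobstore}"
  if [ ! -d "$docs_dir" ]; then
    echo "documents directory not found: $docs_dir" >&2
    return 1
  fi
  if [ ! -w "$blobstore" ]; then
    echo "blobstore is not writable: $blobstore" >&2
    return 1
  fi
  # The address is quoted so the shell does not treat '?' as a glob character.
  guaccollect files --service-poll=false \
    "--blob-addr=file://$blobstore?no_tmp_dir=true" "$docs_dir"
}

# Example usage with the sample data:
#   guac_ingest_dir ./guac-data-main/docs
```

Failing early on a missing or read-only blobstore gives a clearer message than letting the collector fail partway through a run.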

Switch back to the compose window, and you will soon see the Ingestor performing the parsing and GraphQL mutations that add the documents to GUAC. The deps.dev collector and OSV certifier will also recognize the new packages and look up dependency and vulnerability information for them.

Step 4: Check that everything is ingesting and running

Run:

curl 'http://localhost:8080/query' -s -X POST -H 'content-type: application/json' \
  --data '{
    "query": "{ packages(pkgSpec: {}) { type } }"
  }' | jq

You should see the types of all the packages ingested:

{
  "data": {
    "packages": [
      {
        "type": "oci"
      },
...
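
The same request can be wrapped in a small function so that other queries are easy to try. This is a minimal sketch: the function name guac_query and the GUAC_GQL_URL variable are hypothetical, and no JSON escaping is performed, so it only suits simple queries such as the one above.

```shell
#!/bin/sh
# POST a GraphQL query to the GUAC GraphQL server and print the raw response.
# $1: query text. No JSON escaping is done, so the text must not contain
#     double quotes, backslashes, or newlines.
# GUAC_GQL_URL is a hypothetical override; the default is this tutorial's URL.
guac_query() {
  curl -s -X POST -H 'content-type: application/json' \
    --data "{\"query\": \"$1\"}" \
    "${GUAC_GQL_URL:-http://localhost:8080/query}"
}

# Example: list each distinct package type once.
#   guac_query '{ packages(pkgSpec: {}) { type } }' \
#     | jq -r '.data.packages[].type' | sort -u
```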

What is running?

Congratulations, you are now running a full GUAC deployment! Taking a look at guac-postgres-compose.yaml, we can see what is actually running:

PostgreSQL: Serves as the persistent data store for all the GUAC data.
GraphQL Server: Serves GUAC GraphQL queries and stores the data in the PostgreSQL backend.
Collector-Subscriber: Helps communicate to the collectors when additional information is needed.
Deps.dev Collector: Gathers further information from Deps.dev for supported packages.
OSV Certifier: Gathers OSV vulnerability information from osv.dev about packages.
Ingestor: Retrieves ingestion messages from NATS, parses the documents, and mutates the GraphQL graph to add the data to GUAC.
NATS: Serves as the pubsub that accepts ingestion requests, which the Ingestor pulls from.
OCI Collector: Collects additional metadata from OCI registries about any container references found in documents.
GUAC REST Server: Serves simplified query endpoints.
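
To compare this list with what the compose file actually defines, the service names can be printed with Docker Compose itself. This is a minimal sketch: the function names guac_services and guac_service_logs are hypothetical, and the service names in the compose file may not match the component names above word for word.

```shell
#!/bin/sh
# Print the service names defined in the compose file, one per line.
# $1: compose file (defaults to this tutorial's file name)
guac_services() {
  docker compose -f "${1:-guac-postgres-compose.yaml}" config --services
}

# Print the last 50 log lines of one service.
# $1: a service name taken from the output of guac_services
# $2: compose file (defaults to this tutorial's file name)
guac_service_logs() {
  docker compose -f "${2:-guac-postgres-compose.yaml}" logs --tail 50 "$1"
}
```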

Next steps

This compose configuration is suitable to leave running in a location that is accessible from your environment for further GUAC ingestion, discovery, analysis, and evaluation. Explore the types of collectors available under the guaccollect command and see what will work for your build, ingestion, and SBOM workflow. These collectors can be run as another service that watches a location for new documents to ingest. If you’re curious about the various GUAC components and what they do, see How GUAC components work together.

You may wish to alter the volume configuration to change the blobstore and postgres-data locations. The blobstore needs to be accessible to any guaccollect commands and supports cloud buckets. The guacone command does not need access to the blobstore and interacts directly with the GraphQL server.