Just a Theory

pg_clickhouse 0.2.0

2026-04-13T22:22:53Z

In response to a generous corpus of real-world user feedback, we’ve been hard at work the past week adding a slew of updates to pg_clickhouse, the query interface for ClickHouse from Postgres. As usual, we focused on improving pushdown, especially for various date and time, array, and regular expression functions.

Regular expressions prove to be a particular challenge, because while Postgres supports POSIX Regular Expressions, ClickHouse relies on RE2. For simple regular expressions that no doubt make up a huge number of use cases, the differences matter little or not at all. But these two engines take quite different approaches to regular expression evaluation, so issues will come up.

To address this, the new regular expression pushdown code examines the flags passed to the Postgres regular expression functions and refuses to push down in the presence of incompatible flags. It will push down compatible flags, though it takes pains to also pass (?-s) to disable the s flag, because ClickHouse enables s by default, contrary to the expectations of the Postgres regular expression user.
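As a rough illustration of that decision, here is a minimal Python sketch. It is not pg_clickhouse's actual C code: the function name `pushdown_pattern` and the set of flags treated as compatible are assumptions made up for this example, and the real extension may well accept a different list.

```python
# Assumption for illustration only: which Postgres regex flags count as
# "compatible". The real list in pg_clickhouse may differ.
ASSUMED_COMPATIBLE_FLAGS = {"i"}


def pushdown_pattern(pattern, flags=""):
    """Return an RE2 pattern to push down, or None to evaluate in Postgres."""
    # Refuse to push down if any flag falls outside the assumed-compatible set.
    if any(flag not in ASSUMED_COMPATIBLE_FLAGS for flag in flags):
        return None
    enabled = "".join(sorted(set(flags)))
    # Always clear the s flag, since ClickHouse enables it by default.
    return f"(?{enabled}-s){pattern}"


print(pushdown_pattern("^a.c$"))
print(pushdown_pattern("abc", "i"))
print(pushdown_pattern("abc", "x"))
```

Under these assumptions a flagless pattern gains only the `(?-s)` prefix, a case-insensitive one gets `(?i-s)`, and anything with an unrecognized flag returns `None`, meaning the expression stays in Postgres.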

pg_clickhouse does not (yet?) examine the flags embedded in the regular expression, but v0.2.0 now provides the pg_clickhouse.pushdown_regex setting, which can disable regular expression pushdown:

SET pg_clickhouse.pushdown_regex = 'false';

My colleague Philip Dubé has also started work embedding ClickHouse-compatible regular expression functions that use re2 directly, to provide more options soon — not to mention a standalone extension with just those functions.

As with all pg_clickhouse releases to date, v0.2.0 does not break compatibility with previous versions at all: once the new library has been installed and reloaded, existing v0.1 releases get all the benefits. There is, however, a new function, pgch_version(), which requires an upgrade to use:

try=# ALTER EXTENSION pg_clickhouse UPDATE TO '0.2';
ALTER EXTENSION

try=# select pgch_version();
 pgch_version 
--------------
 0.2.0
(1 row)

We plan for a lot more to come, including improved subquery pushdown, more function pushdown, string and date formatting pushdown, and more. Watch this space for further announcements and the ClickHouse Blog for a forthcoming post covering the pg_clickhouse features and improvements in detail. Meanwhile, here’s where to get the new release:

Thanks again to my colleagues, Kaushik Iska and Philip Dubé, for the slew of pull requests and feature brainstorming.

pg_clickhouse 0.1.10

2026-04-06T21:38:34Z

Hi, it’s me, back again with another update to pg_clickhouse, the query interface for ClickHouse from Postgres. This release, v0.1.10, maintains binary compatibility with earlier versions but ships a number of significant improvements that increase compatibility of Postgres features with ClickHouse. Highlights include:

Mappings for the JSON and JSONB -> TEXT and ->> TEXT operators, as well as jsonb_extract_path_text() and jsonb_extract_path(), to be pushed down to ClickHouse using its sub-column syntax.
Mappings to push down the Postgres statement_timestamp(), transaction_timestamp(), and clock_timestamp() functions, as well as the Postgres “SQL Value Functions”, including CURRENT_TIMESTAMP, CURRENT_USER, and CURRENT_DATABASE.
And the big one: mappings to push down compatible window functions, including ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, FIRST_VALUE, LAST_VALUE, NTH_VALUE, NTILE, CUME_DIST, PERCENT_RANK, and MIN/MAX OVER.
Oh yeah, the other big one: added result set streaming to the HTTP driver. Rather than load all the results at once, it now streams them. A test loading a 1GB table reduced memory consumption from over 1GB to 73MB peak.

We’ll work up a longer post to show off some of these features in the next week. But in the meantime, git it while it’s hot!

Thanks to my colleagues, Kaushik Iska and Philip Dubé, for the slew of pull requests I waded through this past week!

pg_clickhouse 0.1.6

2026-04-06T20:44:26Z

We fixed a few bugs this week in pg_clickhouse, the query interface for ClickHouse from Postgres. It features improved query cancellation and function & operator pushdown, including to_timestamp(float8), ILIKE, LIKE, and regex operators. Get the new v0.1.6 release from the usual places:

Thanks to my colleague, Kaushik Iska, for most of these fixes!

pg_clickhouse 0.1.5

2026-03-20T19:15:47Z

I’ve been busy with an internal project at work, but have responded to a few pg_clickhouse reports for a couple crashes and vulnerabilities, thanks to pen testing and a community security report. These changes drive the release of v0.1.5 today.

Get it from the usual sources:

Appreciation to my employer, ClickHouse, for championing this extension.

pg_clickhouse v0.1.4

2026-02-17T22:24:08Z

Just a quick post to note the release of pg_clickhouse v0.1.4. This v0.1 maintenance release can be upgraded in-place and requires no ALTER EXTENSION UPDATE command; as soon as sessions reload the shared library they’ll be good to go.

Thanks in part to reports from attentive users, v0.1.4’s most significant changes improve the following:

The binary driver now properly inserts NULL into a Nullable(T) column. Previously it would raise an error.
The http driver now properly parses arrays. Previously it improperly included single quotes in string items and would choke on brackets ([]) in values.
Both drivers now support mapping ClickHouse String types to Postgres BYTEA columns. Previously they worked only with text types, which is generally preferred. But since ClickHouse explicitly supports binary data in String values (notably hash function return values), pg_clickhouse needs to support it, as well.

Get it in all the usual places:

My thanks to pg_clickhouse users like Rahul Mehta for reporting issues, and to my employer, ClickHouse, for championing this extension. Next up: more aggregate function mapping, hash function pushdown, and improved subquery (specifically, SubPlan) pushdown.

🛠️ PGXN Tools v1.7

2026-01-24T22:53:11Z

Today I released v1.7.0 of the pgxn-tools OCI image, which simplifies Postgres extension testing and PGXN distribution. The new version includes just a few updates and improvements:

Upgraded the Debian base image from Bookworm to Trixie
Set the PGUSER environment variable to postgres in the Dockerfile, removing the need for users to remember to do it.
Updated pg-build-test to set MAKEFLAGS="-j $(nprocs)" to shorten build runtimes.
Also updated pgrx-build-test to pass -j $(nprocs), for the same reason.
Upgraded the pgrx test extension to v0.16.1 and test it on Postgres versions 13-16.

Just a security and quality of coding life release. Ideally existing workflows will continue to work as they always have.

Welcome dmjwk

2025-12-30T03:21:05Z

Please welcome dmjwk into the world. This “demo JWK” (or “dumb JWK” if you like) service provides super simple Identity Provider APIs strictly for demo purposes.

Say you’ve written a service that depends on a public JSON Web Key (JWK) set to authenticate JSON Web Tokens (JWT) submitted as OAuth 2 Bearer Tokens. Your users will normally configure the service to use an internal or well-known provider, such as Auth0, Okta, or AWS. Such providers might be too heavyweight for demo purposes, however.

For my own use, I needed nothing more than a Docker Compose file with local-only services. I also wanted some control over the contents of the tokens, since my service records the sub field from the JWT in an audit trail, and something like 1a1077e6-3b87-1282-789c-f70e66dab825 (as in Vault JWTs) makes for less-than-friendly text to describe in a demo.

I created dmjwk to scratch this itch. It provides a basic Resource Owner Password Credentials Grant OAuth 2 flow to create custom JWTs, a well-known URL for the public JWK set, and a simple API that validates JWTs. None of it is real, it’s all for show, but the show’s the point.

Quick Start

The simplest way to start dmjwk is with its OCI image (there are binaries for 40 platforms, as well). It starts on port 443, but since hosts commonly reserve that port, let’s map it to 4433 instead:

docker run -d -p 4433:443 --name dmjwk --volume .:/etc/dmjwk ghcr.io/theory/dmjwk

This command fires up dmjwk with a self-signed TLS certificate for localhost and creates a root cert bundle, ca.pem, in the current directory. Use it with your favorite HTTP client to make validated requests.

JWK Set

For example, to fetch the JWK set:

curl -s --cacert ca.pem https://localhost:4433/.well-known/jwks.json

By default dmjwk creates a single JWK in the set that looks something like this (JSON reformatted):

{
  "keys": [
    {
      "kty": "EC",
      "crv": "P-256",
      "x": "Ld98DHMIIanlpdOhYf-8GljNHnxHW_i6Bq0iltw9J98",
      "y": "xxyRGhCFIjdQFD-TAs-y6uf18wsPvkq8wH_FsGY1GyU"
    }
  ]
}

Configure services to use this URL, https://localhost:4433/.well-known/jwks.json, to validate JWTs created by dmjwk.

Authorization

To fetch a JWT signed by the first key in the JWK set (just the one in this example), make an application/x-www-form-urlencoded POST with the required grant_type, username, and password fields:

form='grant_type=password&username=kamala&password=a2FtYWxh'
curl -s --cacert ca.pem -d "$form" https://localhost:4433/authorization

dmjwk stores no actual usernames and passwords; it’s all for show. Provide any username you like and Base64-encode the username, without trailing equal signs, as the password.
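The password rule is easy to script. Here is a minimal Python sketch, assuming dmjwk expects the standard Base64 alphabet rather than the URL-safe one; the helper names `demo_password` and `password_grant_form` are invented for this example.

```python
import base64
from urllib.parse import urlencode


def demo_password(username):
    """Base64-encode the username and drop any trailing '=' padding."""
    encoded = base64.b64encode(username.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")


def password_grant_form(username):
    """Build the application/x-www-form-urlencoded body for /authorization."""
    return urlencode({
        "grant_type": "password",
        "username": username,
        "password": demo_password(username),
    })


print(password_grant_form("kamala"))
```

For the username kamala this reproduces the form value used in the curl example above.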

Example successful response:

{
  "access_token": "eyJhbGciOiJFUzI1NiIsImtpZCI6IiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJrYW1hbGEiLCJleHAiOjE3NjY5NDQyNzcsImlhdCI6MTc2Njk0MDY3NywianRpIjoiZ3hhNnNib292aTg5dSJ9.04efdORHDA3GIPMnWErMPy4mXXsBfbnMJlzqZsxGVEc2cRvEWI0Mt_IqHDK4RYK_14BCEu2nTMiEPtgwC2IZ5A",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "read"
}

Parsing the access_token JWT from the response provides this header:

{
  "alg": "ES256",
  "kid": "",
  "typ": "JWT"
}

And this payload:

{
  "sub": "kamala",
  "exp": 1766944277,
  "iat": 1766940677,
  "jti": "gxa6sboovi89u"
}

We can further customize its contents by passing any of a few additional parameters. To specify an audience and issuer, for example:

form='grant_type=password&username=kamala&password=a2FtYWxh&iss=spacely+sprockets&aud=cogswell.cogs'
curl -s --cacert ca.pem -d "$form" https://localhost:4433/authorization

Which returns something like:

{
  "access_token": "eyJhbGciOiJFUzI1NiIsImtpZCI6IiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzcGFjZWx5IHNwcm9ja2V0cyIsInN1YiI6ImthbWFsYSIsImF1ZCI6WyJjb2dzd2VsbC5jb2dzIl0sImV4cCI6MTc2NzAzNDIyNCwiaWF0IjoxNzY3MDMwNjI0LCJqdGkiOiIxNXZmaDhzYm41YWFxIn0.IGRdD5HGiWLOXggZhb9zPlLK40WWy8R0-HmSuIhaObD6WEwA2WXIBWg_MqtFFQISKLXrjNDHphXtEJsx6FZBOQ",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "read"
}

Now the JWT payload is:

{
  "iss": "spacely sprockets",
  "sub": "kamala",
  "aud": [
    "cogswell.cogs"
  ],
  "exp": 1767034224,
  "iat": 1767030624,
  "jti": "15vfh8sbn5aaq"
}

This allows customization appropriate for your service, which might determine authorization based on the contents of the various JWT fields.
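To inspect tokens like the ones above without a JWT library, split on the dots and Base64url-decode the first two segments. The following is a small Python sketch; `peek_jwt` is a hypothetical helper, and it deliberately does not verify the signature, so treat it as a way to look at demo tokens and nothing more.

```python
import base64
import json


def decode_segment(segment):
    """Decode one Base64url JWT segment, restoring the stripped padding."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def peek_jwt(token):
    """Return (header, payload) WITHOUT verifying the signature."""
    header, payload, _signature = token.split(".")
    return decode_segment(header), decode_segment(payload)


# The first access_token shown in this post.
TOKEN = (
    "eyJhbGciOiJFUzI1NiIsImtpZCI6IiIsInR5cCI6IkpXVCJ9"
    "."
    "eyJzdWIiOiJrYW1hbGEiLCJleHAiOjE3NjY5NDQyNzcsImlhdCI6MTc2Njk0MDY3NywianRpIjoiZ3hhNnNib292aTg5dSJ9"
    "."
    "04efdORHDA3GIPMnWErMPy4mXXsBfbnMJlzqZsxGVEc2cRvEWI0Mt_IqHDK4RYK_14BCEu2nTMiEPtgwC2IZ5A"
)

header, payload = peek_jwt(TOKEN)
print(json.dumps(header, indent=2))
print(json.dumps(payload, indent=2))
```

A real service should instead validate the signature against the key published in the JWK set before trusting any of these fields.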

A request that fails to authenticate the username and password, e.g.:

form='grant_type=password&username=kamala&password=nope'
curl -s --cacert ca.pem -d "$form" https://localhost:4433/authorization

Will return an appropriate response:

{
  "error": "invalid_request",
  "error_description": "incorrect password"
}

Resource

For simple JWT validation, POST a JWT returned from the authorization API as a Bearer token to /resource:

tok=$(curl -s --cacert ca.pem -d "$form" https://localhost:4433/authorization | jq -r .access_token)
curl -s --cacert ca.pem -H "Authorization: Bearer $tok" https://localhost:4433/resource -d 'HELLO WORLD
'

The response simply returns the request body:

HELLO WORLD

A request that fails to authenticate, say with an invalid Bearer token:

curl -s --cacert ca.pem -H "Authorization: Bearer NOT" https://localhost:4433/resource -d 'HELLO WORLD'

Returns an appropriate error response:

{
  "error": "invalid_token",
  "error_description": "token is malformed: token contains an invalid number of segments"
}

That’s It

dmjwk includes a fair number of configuration options, including external certificates, custom host naming (useful with Docker Compose), and multiple key generation. If you find it useful for your demos (but not for production — DON’T DO THAT) — let me know. And if not, that’s fine, too. This is a bit of my pursuit of a thick desire, made mainly for me, but it pleases me if others find it helpful too.

🐏 Taming PostgreSQL GUC “extra” Data

2025-12-18T18:04:50Z

New post up on the ClickHouse blog:

I wanted to optimize away parsing the key/value pairs from the pg_clickhouse.session_settings GUC for every query by pre-parsing it on assignment and assigning it to a separate variable. It took me a few tries to land on a workable and correct solution, as the GUC API requires quite specific memory allocation for extra data to work properly.

Struggling to understand, making missteps, and ultimately coming to a reasonable design and solution satisfies me so immensely that I always want to share. This piece gets down in the C coding weeds; my fellow extension coders might enjoy it.

Introducing pg_clickhouse

2025-12-10T16:34:06Z

The ClickHouse blog has posted a piece by yours truly introducing pg_clickhouse, a PostgreSQL extension to run ClickHouse queries from PostgreSQL:

While clickhouse_fdw and its predecessor, postgres_fdw, provided the foundation for our FDW, we set out to modernize the code & build process, to fix bugs & address shortcomings, and to engineer it into a complete product featuring near universal pushdown for analytics queries and aggregations.

Such advances include:

Adopting standard PGXS build pipeline for PostgreSQL extensions
Adding prepared INSERT support to and adopting the latest supported release of the ClickHouse C++ library
Creating test cases and CI workflows to ensure it works on PostgreSQL versions 13-18 and ClickHouse versions 22-25
Support for TLS-based connections for both the binary protocol and the HTTP API, required for ClickHouse Cloud
Bool, Decimal, and JSON support
Transparent aggregate function pushdown, including for ordered-set aggregates like percentile_cont()
SEMI JOIN pushdown

I’ve spent most of the last couple months working on this project, learning a ton about ClickHouse, foreign data wrappers, C and C++, and query pushdown. Interested? Try out the Docker image:

docker run --name pg_clickhouse -e POSTGRES_PASSWORD=my_pass \
       -d ghcr.io/clickhouse/pg_clickhouse:18
docker exec -it pg_clickhouse psql -U postgres -c 'CREATE EXTENSION pg_clickhouse'

Or install it from PGXN (requires C and C++ build tools, cmake, and the openssl libs, libcurl, and libuuid):

pgxn install pg_clickhouse

Or download it and build it yourself from:

PGXN
GitHub

Let me know what you think!

Sqitch 1.6.0: Now with ClickHouse!

2025-10-06T22:01:19Z

Out today: Sqitch v1.6.0. This release adds a brand new engine: ClickHouse. I started a new job at ClickHouse on September 2, and my first task, as a way to get to know the database, was to add it to Sqitch. Fortuitously, ClickHouse added support for updates and deletes, which Sqitch requires, in the August release. Sqitch v1.6.0 therefore supports ClickHouse 25.8 or later.

As for the other engines Sqitch supports, this release includes a ClickHouse tutorial, the --with-clickhouse-support option in the Homebrew tap, and Sqitch ClickHouse Docker tags.

Find it in the usual places:

Thanks for using Sqitch, and do let me know if you use it to manage a ClickHouse database, or if you run into any issues or challenges.

Postgres Extensions: Use PG_MODULE_MAGIC_EXT

2025-05-29T22:09:22Z

A quick note for PostgreSQL extension maintainers: PostgreSQL 18 introduces a new macro: PG_MODULE_MAGIC_EXT. Use it to name and version your modules. Where your module .c file likely has:

PG_MODULE_MAGIC;

Or:

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

Change it to something like:

#ifdef PG_MODULE_MAGIC_EXT
PG_MODULE_MAGIC_EXT(.name = "module_name", .version = "1.2.3");
#else
PG_MODULE_MAGIC;
#endif

Replace the name of your module and the version as appropriate. Note that PG_MODULE_MAGIC was added in Postgres 8.2; if for some reason your module still supports earlier versions, use a nested #ifdef to conditionally execute it:

#ifdef PG_MODULE_MAGIC_EXT
PG_MODULE_MAGIC_EXT(.name = "module_name", .version = "1.2.3");
#else
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
#endif

If you manage the module version in your Makefile, as the PGXN Howto suggests, consider renaming the .c file to .c.in and changing the Makefile like so:

Replace .version = "1.2.3" with .version = "__VERSION__"
Add src/$(EXTENSION).c to EXTRA_CLEAN

Add this make target:

src/$(EXTENSION).c: src/$(EXTENSION).c.in
	sed -e 's,__VERSION__,$(EXTVERSION),g' $< > $@

If you use Git, add /src/*.c to .gitignore

For an example of this pattern, see semver@3526789.

That’s all!

Adventures in Extension Packaging

2025-06-14T15:32:03Z

I gave a presentation at PGConf.dev last week, Adventures in Extension Packaging. It summarizes stuff I learned in the past year in developing the PGXN Meta v2 RFC, re-packaging all of the extensions on pgt.dev, and experimenting with the CloudNativePG community’s proposal to mount extension OCI images in immutable PostgreSQL containers.

Turns out a ton of work and experimentation remains to be done.

Video
Slides

Previous work covers the first half of the talk, including:

A brief introduction to PGXN, borrowing from the State of the Extensions Ecosystem
The metadata designed to enable automated packaging of extensions added to the PGXN Meta v2 RFC
The Trunk Packaging Format, a.k.a., PGXN RFC 2
OCI distribution of Trunk packages

The rest of the talk encompasses newer work. Read on for details.

Automated Packaging Challenges

Back in December I took over maintenance of the Trunk registry, a.k.a., pgt.dev, refactoring and upgrading all 200+ extensions and adding Postgres 17 builds. This experience opened my eyes to the wide variety of extension build patterns and configurations, even when supporting a single OS (Ubuntu 22.04 “Jammy”). Some examples:

pglogical requires an extra make param to build on PostgreSQL 17: make -C LDFLAGS_EX="-L/usr/lib/postgresql/17/lib"
Some pgrx extensions require additional params, for example:
- pg_search needs the --features flag to enable icu
- vectorscale requires the environment variable RUSTFLAGS="-C target-feature=+avx2,+fma"
pljava needs a pointer to libjvm: mvn clean install -Dpljava.libjvmdefault=/usr/lib/x86_64-linux-gnu/libjvm.so
plrust needs files to be moved around, a shell script to be run, and to be built from a subdirectory
bson also needs files to be moved around and a pointer to libbson
timescale requires an environment variable and shell script to run before building
Many extensions require patching to build for various configurations and OSes, like this tweak to build pguri on Postgres 17 and this patch to get duckdb_fdw to build at all

Doubtless there’s much more. These sorts of challenges led the RPM and APT packaging systems to support explicit scripting and patches for every package. I don’t think it would be sensible to support build scripting in the meta spec.

However, the PGXN meta SDK I developed last year supports the merging of multiple META.json files, so that downstream packagers could maintain files with additional configurations, including explicit build steps or lists of packages, to support these use cases.

Furthermore, the plan to add reporting to PGXN v2 means that downstream packages could report build failures, which would appear on PGXN, where they’d encourage some maintainers, at least, to fix issues within their control.

Dependency Resolution

Dependencies present another challenge. The v2 spec supports third party dependencies — those not part of Postgres itself or the ecosystem of extensions. Ideally, an extension like pguri would define its dependence on the uriparser library like so:

{
  "dependencies": {
    "postgres": { "version": ">= 9.3" },
    "packages": {
      "build": {
        "requires": {
          "pkg:generic/uriparser": 0
        }
      }
    }
  }
}

An intelligent build client will parse the dependencies, provided as purls, to determine the appropriate OS packages to install to satisfy them. For example, building on a Debian-based system, it would know to install liburiparser-dev to build the extension and require liburiparser1 to run it.
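A minimal Python sketch of that translation step might look like the following. The lookup table, the function names, and the "debian" key are all hypothetical; the only mapping taken from this post is uriparser to liburiparser-dev and liburiparser1, and a real client would need a far larger, maintained table.

```python
import json

# Hypothetical table: purl -> OS family -> packages for building and running.
PACKAGE_MAP = {
    "pkg:generic/uriparser": {
        "debian": {"build": ["liburiparser-dev"], "run": ["liburiparser1"]},
    },
}

META = json.loads("""
{
  "dependencies": {
    "postgres": { "version": ">= 9.3" },
    "packages": {
      "build": { "requires": { "pkg:generic/uriparser": 0 } }
    }
  }
}
""")


def required_purls(meta):
    """List the purls named under the build requirements."""
    return list(meta["dependencies"]["packages"]["build"]["requires"])


def os_packages(purls, os_family, purpose):
    """Translate purls to OS package names for 'build' or 'run'."""
    packages = []
    for purl in purls:
        by_os = PACKAGE_MAP.get(purl, {})
        if os_family not in by_os:
            raise LookupError(f"no {os_family} mapping for {purl}")
        packages.extend(by_os[os_family][purpose])
    return packages


purls = required_purls(META)
print(os_packages(purls, "debian", "build"))
print(os_packages(purls, "debian", "run"))
```

Raising an error for an unmapped purl, rather than silently skipping it, keeps a missing dependency from surfacing later as a confusing build failure.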

With the aim to support multiple OSes and versions — not to mention Postgres versions — the proposed PGXN binary registry would experience quite the combinatorial explosion to support all possible dependencies on all possible OSes and versions. While I propose to start simple (Linux and macOS, Postgres 14-18) and gradually grow, it could quickly get quite cumbersome.

So much so that I can practically hear Christoph’s and Devrim’s reactions from here:

Photo of Christoph, Devrim, and other long-time packagers laughing at me.

Or perhaps:

Photo of Christoph and Devrim laughing at me.

I hardly blame them.

A CloudNativePG Side Quest

Gabriele Bartolini blogged the proposal to deploy extensions to CloudNativePG containers without violating the immutability of the container. The introduction of the extension_control_path GUC in Postgres 18 and the ImageVolume feature in Kubernetes 1.33 enable the pattern, likely to be introduced in CloudNativePG v1.27. Here’s a sample CloudNativePG cluster manifest with the proposed extension configuration:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-with-extensions
spec:
  instances: 1
  imageName: ghcr.io/cloudnative-pg/postgresql-trunk:18-devel
  postgresql:
    extensions:
      - name: vector
        image:
          reference: ghcr.io/cloudnative-pg/pgvector-18-testing
  storage:
    storageClass: standard
    size: 1Gi

The extensions object at lines 9-12 configures pgvector simply by referencing an OCI image that contains nothing but the files for the extension. To “install” the extension, the proposed patch triggers a rolling update, replicas first. For each instance, it takes the following steps:

Mounts each extension as a read-only ImageVolume under /extensions; in this example, /extensions/vector provides the complete contents of the image
Updates LD_LIBRARY_PATH to include the path to the lib directory of each extension, e.g., /extensions/vector/lib.
Updates the extension_control_path and dynamic_library_path GUCs to point to the share and lib directories of each extension, in this example:
```
extension_control_path = '$system:/extensions/vector/share'
dynamic_library_path   = '$libdir:/extensions/vector/lib'
```

This works! Alas, the pod restart is absolutely necessary, whether or not any extension requires it,¹ because:

Kubernetes resolves volume mounts, including ImageVolumes, at pod startup
The dynamic_library_path and extension_control_path GUCs require a Postgres restart
Each extension requires another path to be appended to both of these GUCs, as well as the LD_LIBRARY_PATH

Say we wanted to use five extensions. The extensions part of the manifest would look something like this:

extensions:
  - name: vector
    image:
      reference: ghcr.io/cloudnative-pg/pgvector-18-testing
  - name: semver
    image:
      reference: ghcr.io/example/semver:0.40.0
  - name: auto_explain
    image:
      reference: ghcr.io/example/auto_explain:18
  - name: bloom
    image:
      reference: ghcr.io/example/bloom:18
  - name: postgis
    image:
      reference: ghcr.io/example/postgis:18

To support this configuration, CNPG must configure the GUCs like so:

extension_control_path = '$system:/extensions/vector/share:/extensions/semver/share:/extensions/auto_explain/share:/extensions/bloom/share:/extensions/postgis/share'

dynamic_library_path   = '$libdir:/extensions/vector/lib:/extensions/semver/lib:/extensions/auto_explain/lib:/extensions/bloom/lib:/extensions/postgis/lib'

And also LD_LIBRARY_PATH:

LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/extensions/vector/lib:/extensions/semver/lib:/extensions/auto_explain/lib:/extensions/"

In other words, every additional extension requires another prefix to be appended to each of these configurations. Ideally we could use a single prefix for all extensions, avoiding the need to update these configs and therefore to restart Postgres. Setting aside the ImageVolume limitation² for the moment, this pattern would require no rolling restarts and no GUC updates unless a newly-added extension requires pre-loading via shared_preload_libraries.
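To make the growth concrete, here is a small Python sketch that derives the three settings from a list of extension names. It only models the pattern described above and is not CloudNativePG code; the function name and the `root` parameter are assumptions for illustration.

```python
def extension_path_settings(extensions, root="/extensions"):
    """Build the per-extension path settings for a list of extension names."""
    share_dirs = [f"{root}/{name}/share" for name in extensions]
    lib_dirs = [f"{root}/{name}/lib" for name in extensions]
    return {
        "extension_control_path": ":".join(["$system"] + share_dirs),
        "dynamic_library_path": ":".join(["$libdir"] + lib_dirs),
        # Entries to append to whatever LD_LIBRARY_PATH already holds.
        "LD_LIBRARY_PATH": ":".join(lib_dirs),
    }


names = ["vector", "semver", "auto_explain", "bloom", "postgis"]
for key, value in extension_path_settings(names).items():
    print(f"{key} = '{value}'")
```

Every name added to the list lengthens all three values, which is exactly why each new extension forces a configuration change and a restart.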

Getting there, however, requires a different extension file layout than PostgreSQL currently uses.

RFC: Extension Packaging and Lookup

Imagine this:

A single extension search path GUC
Each extension in its own eponymous directory
Pre-defined subdirectory names used inside each extension directory

The search path might look something like:

extension_search_path = '$system:/extensions:/usr/local/extensions'

Looking at one of these directories, /extensions, its contents would be extension directories:

❯ ls -1 extensions
auto_explain
bloom
postgis
semver
vector

And the contents of one of these extension directories would be something like:

❯ tree extensions/semver
extensions/semver
├── doc
│   └── semver.md
├── lib
│   └── semver.so
├── semver.control
└── sql
    ├── semver--0.31.0--0.31.1.sql
    ├── semver--0.31.1--0.31.2.sql
    ├── semver--0.31.2--0.32.0.sql
    └── semver--0.5.0--0.10.0.sql

For this pattern, Postgres would look for the appropriately-named directory with a control file in each of the paths. To find the semver extension, for example, it would find /extensions/semver/semver.control.

All the other files for the extension would live in specifically-named subdirectories: doc for documentation files, lib for shared libraries, sql for SQL deployment files, plus bin, man, html, include, locale, and any other likely resources.
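The lookup described above might work something like this Python sketch. It is only a model of the proposal, not Postgres code: `find_control_file` and the `system_dir` parameter (standing in for whatever $system expands to) are assumptions.

```python
import os
import tempfile


def find_control_file(name, search_path, system_dir):
    """Return the first <dir>/<name>/<name>.control found on the search path."""
    for entry in search_path.split(":"):
        base = system_dir if entry == "$system" else entry
        candidate = os.path.join(base, name, f"{name}.control")
        if os.path.isfile(candidate):
            return candidate
    return None


# Demo: build a throwaway extensions tree and look up "semver" in it.
with tempfile.TemporaryDirectory() as root:
    ext_dir = os.path.join(root, "extensions")
    os.makedirs(os.path.join(ext_dir, "semver"))
    open(os.path.join(ext_dir, "semver", "semver.control"), "w").close()
    search_path = f"$system:{ext_dir}"
    system_dir = os.path.join(root, "system")
    print(find_control_file("semver", search_path, system_dir))
    print(find_control_file("postgis", search_path, system_dir))
```

Because the search walks the path in order, an extension found in an earlier directory shadows one with the same name later in the path.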

With all of the files required for an extension bundled into well-defined subdirectories of a single directory, it lends itself to the layout of the proposed binary distribution format. Couple it with OCI distribution and it becomes a natural fit for ImageVolume deployment: simply map each extension OCI image to a subdirectory of the desired search path and you’re done. The extensions object in the CNPG Cluster manifest remains unchanged, and CNPG no longer needs to manipulate any GUCs.

Some might recognize this proposal from a previous RFC post. It not only simplifies the CloudNativePG use cases, but because it houses all of the files for an extension in a single bundle, it also vastly simplifies installation on any system:

Download the extension package
Validate its signature & contents
Unpack its contents into a directory named for the extension in the extension search path

Simple!

Fun With Dependencies

Many extensions depend on external libraries, and rely on the OS to find them. OS packagers follow the dependency patterns of their packaging systems: require the installation of other packages to satisfy the dependencies.

How could a pattern be generalized by the Trunk Packaging Format to work on all OSes? I see two potential approaches:

List the dependencies as purls that the installing client translates to the appropriate OS packages it installs.
Bundle dependencies in the Trunk package itself

Option 1 will work well for most use cases, but not immutable systems like CloudNativePG. Option 2 could work for such situations. But perhaps you noticed the omission of LD_LIBRARY_PATH manipulation in the packaging and lookup discussion above. Setting aside the multitude of reasons to avoid LD_LIBRARY_PATH³, how else could the OS find shared libraries needed by an extension?

Typically, one installs shared libraries in one of a few directories known to tools like ldconfig, which must run after each install to cache their locations. But one cannot rely on ldconfig in immutable environments, because the cache of course cannot be mutated.

We could, potentially, rely on rpath, a feature of modern dynamic linkers that reads a list of known paths from the header of a binary file. In fact, most modern OSes support $ORIGIN as an rpath value⁴ (or @loader_path on Darwin/macOS), which refers to the same directory in which the binary file appears. Imagine this pattern:

The Trunk package for an extension includes dependency libraries alongside the extension module
The module is compiled with rpath=$ORIGIN

To test this pattern, let’s install the Postgres 18 beta and try the pattern with the pguri extension. First, remove the $libdir/ prefix (as discussed previously) and patch the extension for Postgres 17+:

perl -i -pe 's{\$libdir/}{}' pguri/uri.control pguri/*.sql
perl -i -pe 's/^(PG_CPPFLAGS.+)/$1 -Wno-int-conversion/' pguri/Makefile

Then compile it with CFLAGS to set rpath and install it with a prefix parameter:

make CFLAGS='-Wl,-rpath,\$$ORIGIN'
make install prefix=/usr/local/postgresql

With the module installed, move the liburiparser shared library from OS packaging to the lib directory under the prefix, resulting in these contents:

❯ ls -1 /usr/local/postgresql/lib
liburiparser.so.1
liburiparser.so.1.0.30
uri.so

The chrpath utility shows that the extension module, uri.so, has its RUNPATH (the modern implementation of rpath) properly configured:

❯ chrpath /usr/local/postgresql/lib/uri.so 
uri.so: RUNPATH=$ORIGIN

Will the OS be able to find the dependency? Use ldd to find out:

❯ ldd /usr/local/postgresql/lib/uri.so 
	linux-vdso.so.1
	liburiparser.so.1 => /usr/local/postgresql/lib/liburiparser.so.1
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6
	/lib/ld-linux-aarch64.so.1

The second line of output shows that it does in fact find liburiparser.so.1 where we put it. So far so good. Just need to tell the GUCs where to find them and restart Postgres:

extension_control_path = '$system:/usr/local/postgresql/share'
dynamic_library_path   = '$libdir:/usr/local/postgresql/lib'

And then it works!

❯ psql -c "CREATE EXTENSION uri"
CREATE EXTENSION
❯ psql -c "SELECT 'https://example.com/'::uri"
         uri          
----------------------
 https://example.com/

Success! So we can adopt this pattern, yes?

A Wrinkle

Well, maybe. Try it with a second extension, http, once again building it with rpath=$ORIGIN and installing it in the custom lib directory:

perl -i -pe 's{\$libdir/}{}g' *.control
make CFLAGS='-Wl,-rpath,\$$ORIGIN'
make install prefix=/usr/local/postgresql

Make sure it took:

❯ chrpath /usr/local/postgresql/lib/http.so 
http.so: RUNPATH=$ORIGIN

Now use ldd to see what shared libraries it needs:

❯ ldd /usr/local/postgresql/lib/http.so
	linux-vdso.so.1 
	libcurl.so.4 => not found
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6

Naturally it needs libcurl; let’s copy it from another system and try again:

❯ scp dev:libcurl.so.4 /usr/local/postgresql/lib/
❯ ldd /usr/local/postgresql/lib/http.so
	linux-vdso.so.1
	libcurl.so.4 => /usr/local/postgresql/lib/libcurl.so.4
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6
	/lib/ld-linux-aarch64.so.1
	libnghttp2.so.14 => not found
	libidn2.so.0 => /lib/aarch64-linux-gnu/libidn2.so.0
	librtmp.so.1 => not found
	libssh.so.4 => not found
	libpsl.so.5 => not found
	libssl.so.3 => /lib/aarch64-linux-gnu/libssl.so.3
	libcrypto.so.3 => /lib/aarch64-linux-gnu/libcrypto.so.3
	libgssapi_krb5.so.2 => /lib/aarch64-linux-gnu/libgssapi_krb5.so.2
	libldap.so.2 => not found
	liblber.so.2 => not found
	libzstd.so.1 => /lib/aarch64-linux-gnu/libzstd.so.1
	libbrotlidec.so.1 => not found
	libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1

Line 4 shows it found libcurl.so.4 where we put it, but the rest of the output lists a bunch of new dependencies that need to be satisfied. These did not appear before because the http.so module doesn’t depend on them; the libcurl.so library does. Let’s add libnghttp2 and try again:

❯ scp dev:libnghttp2.so.14 /usr/local/postgresql/lib/
❯ ldd /usr/local/postgresql/lib/http.so
	linux-vdso.so.1
	libcurl.so.4 => /usr/local/postgresql/lib/libcurl.so.4
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6
	/lib/ld-linux-aarch64.so.1
	libnghttp2.so.14 => not found
	libidn2.so.0 => /lib/aarch64-linux-gnu/libidn2.so.0
	librtmp.so.1 => not found
	libssh.so.4 => not found
	libpsl.so.5 => not found
	libssl.so.3 => /lib/aarch64-linux-gnu/libssl.so.3
	libcrypto.so.3 => /lib/aarch64-linux-gnu/libcrypto.so.3
	libgssapi_krb5.so.2 => /lib/aarch64-linux-gnu/libgssapi_krb5.so.2
	libldap.so.2 => not found
	liblber.so.2 => not found
	libzstd.so.1 => /lib/aarch64-linux-gnu/libzstd.so.1
	libbrotlidec.so.1 => not found
	libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1

Sadly, as line 7 shows, it still can’t find libnghttp2.so.

It turns out that rpath works only for immediate dependencies. To solve this problem, libcurl and all other shared libraries must also be compiled with rpath=$ORIGIN, which means we can’t simply copy those libraries from OS packages⁵. In the meantime, only direct dependencies could be bundled with an extension.
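Checking ldd output by eye gets tedious as the dependency list grows. Here is a minimal Python sketch that lists the unresolved libraries. The function name and the sample text are illustrative assumptions, and the parsing assumes the "name => target" layout shown in the output above.

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Lines look like "libcurl.so.4 => not found" or
        # "libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6".
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip().startswith("not found"):
            missing.append(name)
    return missing


# Abbreviated sample modeled on the ldd output above (an assumption,
# not captured from a real run).
SAMPLE = """
    linux-vdso.so.1
    libcurl.so.4 => /usr/local/postgresql/lib/libcurl.so.4
    libnghttp2.so.14 => not found
    libidn2.so.0 => /lib/aarch64-linux-gnu/libidn2.so.0
    librtmp.so.1 => not found
"""

if __name__ == "__main__":
    for lib in missing_libraries(SAMPLE):
        print(lib)
```

Feeding it the real output of ldd for http.so would list every transitive dependency that still fails to resolve.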

Project Status

The vision of accessible, easy-install extensions everywhere remains intact. I’m close to completing a first release of the PGXN v2 build SDK with support for meta spec v1 and v2, PGXS, and pgrx extensions. I expect the first deliverable to be a command-line client to complement and eventually replace the original CLI. It will be put to work building all the extensions currently distributed on PGXN, which will surface new issues and patterns that inform the development and completion of the v2 meta spec.

In the future, I’d also like to:

Finish working out Trunk format and dependency patterns
Develop and submit the proposed extension_search_path patch
Submit ImageVolume feedback to Kubernetes to allow runtime mounting
Start building and distributing OCI Trunk packages
Make the pattern available for distributed registries, so anyone can build their own Trunk releases!
Hack fully-dynamic extension loading into CloudNativePG

Let’s Talk

I recognize the ambition here, but feel equal to it. Perhaps not every bit will work out, but I firmly believe in setting a clear vision and executing toward it while pragmatically revisiting and revising it as experience warrants.

If you’d like to contribute to the project or employ me to continue working on it, let’s talk! Hit me up via one of the services listed on the about page.

The feature does not yet support pre-loading shared libraries. Presumably a flag will be introduced to add the extension to shared_preload_libraries. ↩︎
Though we should certainly request the ability to add new ImageVolume mounts without a restart. We can’t be the only ones thinking about this kind of feature, right? ↩︎
In general, one should avoid LD_LIBRARY_PATH for a variety of reasons, not least of which is its bluntness. For various security reasons, macOS ignores it unless SIP is disabled, and SELinux prevents its propagation to new processes. ↩︎
Although not Windows, alas. ↩︎
Unless packagers could be persuaded to build all libraries with rpath=$ORIGIN, which seems like a tall order. ↩︎

Auto-Release PostgreSQL Extensions on PGXN

2025-05-20T15:49:30Z

I last wrote about auto-releasing PostgreSQL extensions on PGXN back in 2020, but I thought it worthwhile, following my Postgres Extensions Day talk last week, to return again to the basics. With the goal to get as many extensions distributed on PGXN as possible, this post provides step-by-step instructions to help the author of any extension or Postgres utility to quickly and easily publish every release.

TL;DR

Create a PGXN Manager account
Add a META.json file to your project
Add a pgxn-tools powered CI/CD pipeline to publish on tag push
Fully-document your extensions

Release your extensions on PGXN

PGXN aims to become the de facto source for all open-source PostgreSQL extensions and tools, in order to help users quickly find and learn how to use extensions to meet their needs. Currently, PGXN distributes source releases for around 400 extensions (stats on the about page), a fraction of the ca. 1200 known extensions. Anyone looking for an extension that might exist to solve some problem must rely on search engines to find potential solutions between PGXN, GitHub, GitLab, blogs, social media posts, and more. Without a single trusted source for extensions, and with the proliferation of AI Slop in search engine results, finding extensions aside from a few well-known solutions proves a challenge.

By publishing releases and full documentation — all fully indexed by its search index — PGXN aims to be that trusted source. Extension authors provide all the documentation, which PGXN formats for legibility and linking. See, for example, the pgvector docs.

If you want to make it easier for users to find your extensions, to read your documentation — not to mention provide sources for binary packaging systems — publish every release on PGXN.

Here’s how.

Create an Account

Step one: create a PGXN Manager account. The Email, Nickname, and Why fields are required. The form asks “why” as a simple filter for bad actors. Write a sentence describing what you’d like to release — ideally with a link to the source repository — and submit. We’ll get the account approved forthwith, which will send a confirmation email to your address. Follow the link in the email and you’ll be good to go.

Anatomy of a Distribution

A PostgreSQL extension source tree generally looks something like this (taken from the pair repository):

pair
├── Changes
├── doc
│   └── pair.md
├── Makefile
├── META.json
├── pair.control
├── README.md
├── sql
│   ├── pair--unpackaged--0.1.2.sql
│   └── pair.sql
└── test
    ├── expected
    │   └── base.out
    └── sql
        └── base.sql

Extension authors will recognize the standard PGXS (or pgrx) source distribution files; only the META.json file needs explaining. The META.json file is, frankly, the only file that PGXN requires in a release. It contains the metadata to describe the release, following the PGXN Meta Spec. This example contains only the required fields:

{
  "name": "pair",
  "version": "0.1.0",
  "abstract": "A key/value pair data type",
  "maintainer": "David E. Wheeler ",
  "license": "postgresql",
  "provides": {
    "pair": {
      "file": "sql/pair.sql",
      "version": "0.1.0"
    }
  },
  "meta-spec": {
    "version": "1.0.0"
  }
}

Presumably these fields contain no surprises, but a couple of details:

It starts with the name of the distribution, pair, and the release version, 0.1.0.
The abstract provides a brief description of the extension, while the maintainer contains contact information.
The license stipulates the distribution license, of course, usually one of a few known, but may be customized.
The provides object lists the extensions or tools provided, each named by an object key that points to details about the extension, including main file, version, and potentially an abstract and documentation file.
The meta-spec object identifies the meta spec version used for the META.json itself.
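As a rough pre-flight check, a few lines of Python can confirm that a META.json carries the fields shown in the example above. This is only a sketch: the function name is made up for illustration, the field list is taken from the example rather than from a full reading of the spec, and it checks for presence only, not validity.

```python
import json

# Top-level keys that appear in the minimal example above.
REQUIRED_KEYS = (
    "name", "version", "abstract", "maintainer",
    "license", "provides", "meta-spec",
)


def missing_fields(meta):
    """Return dotted names of expected fields absent from a META.json dict."""
    missing = [key for key in REQUIRED_KEYS if key not in meta]
    # Each provided extension in the example has a file and a version.
    for ext, info in meta.get("provides", {}).items():
        for key in ("file", "version"):
            if key not in info:
                missing.append("provides.%s.%s" % (ext, key))
    if "meta-spec" in meta and "version" not in meta["meta-spec"]:
        missing.append("meta-spec.version")
    return missing


EXAMPLE = json.loads("""
{
  "name": "pair",
  "version": "0.1.0",
  "abstract": "A key/value pair data type",
  "maintainer": "David E. Wheeler",
  "license": "postgresql",
  "provides": {"pair": {"file": "sql/pair.sql", "version": "0.1.0"}},
  "meta-spec": {"version": "1.0.0"}
}
""")

if __name__ == "__main__":
    print(missing_fields(EXAMPLE) or "all expected fields present")
```

PGXN Manager performs its own validation on upload, so a check like this only serves to catch omissions earlier.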

Release It!

This file with these fields is all you need to make a release. Assuming Git, package up the extension source files like so (replacing your extension name and version as appropriate).

git archive --format zip --prefix=pair-0.1.0/ -o pair-0.1.0.zip HEAD

Then navigate to the release page, authenticate, and upload the resulting .zip file.

And that’s it! Your release will appear on pgxn.org and on Mastodon within five minutes.
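To avoid retyping the name and version for each release, the packaging command could be derived from META.json. The sketch below only builds the argument list; the helper name is hypothetical, and the trailing slash on the prefix is there so that git archive places the files inside a pair-0.1.0/ directory rather than prepending the string to each file name.

```python
def archive_command(meta):
    """Build a git archive argument list from a META.json dict."""
    base = "%s-%s" % (meta["name"], meta["version"])
    return [
        "git", "archive",
        "--format", "zip",
        # Trailing slash: archive entries land under base/.
        "--prefix=%s/" % base,
        "-o", "%s.zip" % base,
        "HEAD",
    ]


if __name__ == "__main__":
    print(" ".join(archive_command({"name": "pair", "version": "0.1.0"})))
```

Passing the list to subprocess.run inside the repository would produce the same .zip file as the manual command.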

Let’s Automate it!

All those steps would be a pain in the ass to follow for every release. Let’s automate it using pgxn-tools! This OCI image contains the tools necessary to package and upload an extension release to PGXN. Ideally, use a CI/CD pipeline like a GitHub Workflow to publish a release on every version tag.

Set up Secrets

pgxn-tools uses your PGXN credentials to publish releases. To keep them safe, use the secrets feature of your preferred CI/CD tool. This figure shows the “Secrets and variables” configuration for a GitHub repository, with two repository secrets: PGXN_USERNAME and PGXN_PASSWORD:

Create a Pipeline

Use those secrets and pgxn-tools in a CI/CD pipeline. Here, for example, is a minimal GitHub workflow to publish a release for every SemVer tag:

on:
  push:
    tags: ['v[0-9]+.[0-9]+.[0-9]+']
jobs:
  release:
    name: Release on PGXN
    runs-on: ubuntu-latest
    container: pgxn/pgxn-tools
    env:
      PGXN_USERNAME: ${{ secrets.PGXN_USERNAME }}
      PGXN_PASSWORD: ${{ secrets.PGXN_PASSWORD }}
    steps:
    - name: Check out the repo
      uses: actions/checkout@v4
    - name: Bundle the Release
      run: pgxn-bundle
    - name: Release on PGXN
      run: pgxn-release

Details:

Line 3 configures the workflow to run on a SemVer tag push, typically used to denote a release.
Line 8 configures the workflow job to run inside a pgxn-tools container.
Lines 10-11 set environment variables with the credentials from the secrets.
Line 16 bundles the release using either git archive or zip.
Line 18 publishes the release on PGXN.
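To get a feel for which tags the filter accepts, here is a small Python sketch that approximates the workflow's tag pattern with a regular expression. GitHub uses its own glob-like filter syntax, so treating the dots as literal and requiring the whole tag to match is an assumption, and the function name is made up for illustration.

```python
import re

# Regex approximation of the workflow filter 'v[0-9]+.[0-9]+.[0-9]+'.
RELEASE_TAG = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+")


def triggers_release(tag):
    """Return True if the tag looks like vMAJOR.MINOR.PATCH."""
    return RELEASE_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ("v0.1.0", "v1.12.3", "0.1.0", "v1.2", "v1.2.3-rc1"):
        print(tag, triggers_release(tag))
```

Under this approximation, pre-release tags such as v1.2.3-rc1 do not trigger a release; widen the pattern in the workflow if they should.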

Now publishing a new release is as simple as pushing a SemVer tag, like so:

git tag v0.1.0 -sm 'Tag v0.1.0'
git push --follow-tags

That’s it! The workflow will automatically publish the extension for every release, ensuring the latest and greatest always make it to PGXN where users and packagers will find them.

The pgxn-tools image also provides tools to easily test a PGXS or pgrx extension on supported PostgreSQL versions (going back as far as 8.2), also super useful in a CI/CD pipeline. See Test Postgres Extensions With GitHub Actions for instructions. Depending on your CI/CD tool of choice, you might take additional steps, such as publishing a release on GitHub, as previously described.

Optimizing for PGXN

But let’s dig deeper into how to optimize extensions for maximum discoverability and user visibility on PGXN.

Add More Metadata

The META.json file supports many more fields that PGXN indexes and references. These improve the chances users will find what they’re looking for. This detailed example demonstrates how a PostGIS META.json file might start to provide additional metadata:

{
   "name": "postgis",
   "abstract": "Geographic Information Systems Extensions to PostgreSQL",
   "description": "This distribution contains a module which implements GIS simple features, ties the features to R-tree indexing, and provides many spatial functions for accessing and analyzing geographic data.",
   "version": "3.5.0",
   "maintainer": [
      "Paul Ramsey ",
      "Sandro Santilli "
   ],
   "license": [ "gpl_2", "gpl_3" ],
   "provides": {
      "postgis": {
         "abstract": "PostGIS geography spatial types and functions",
         "file": "extensions/postgis/postgis.control",
         "docfile": "extensions/postgis/doc/postgis.md",
         "version": "3.5.0"
      },
      "address_standardizer": {
         "abstract": "Used to parse an address into constituent elements. Generally used to support geocoding address normalization step.",
         "file": "extensions/address_standardizer/address_standardizer.control",
         "docfile": "extensions/address_standardizer/README.address_standardizer",
         "version": "3.5.0"
      }
   },
   "prereqs": {
      "runtime": {
         "requires": {
            "PostgreSQL": "12.0.0",
            "plpgsql": 0
         }
      },
      "test": {
         "recommends": {
            "pgTAP": 0
         }
      }
   },
   "resources": {
      "bugtracker": {
         "web": "https://trac.osgeo.org/postgis/"
      },
      "repository": {
         "url": "https://git.osgeo.org/gitea/postgis/postgis.git",
         "web": "https://git.osgeo.org/gitea/postgis/postgis",
         "type": "git"
      }
   },
   "generated_by": "David E. Wheeler",
   "meta-spec": {
      "version": "1.0.0",
      "url": "https://pgxn.org/meta/spec.txt"
   },
   "tags": [
      "gis",
      "spatial",
      "geometry",
      "raster",
      "geography",
      "location"
   ]
}

Line 4 contains a longer description of the distribution.
Lines 6-9 show how to list multiple maintainers as an array.
Line 10 demonstrates support for an array of licenses.
Lines 11-24 list multiple extensions included in the distribution, with abstracts and documentation files for each.
Lines 25-37 identify dependencies for various phases of the distribution lifecycle, including configure, build, test, runtime, and develop. Each contains an object identifying PostgreSQL or extension dependencies.
Lines 38-47 list resources for the distribution, including issue tracking and source code repository.
Lines 53-60 contain an array of tags, an arbitrary list of keywords for a distribution used both in the search index and the PGXN tag cloud.

Admittedly the PGXN Meta Spec provides a great deal of information. Perhaps the simplest way to manage it is to copy an existing META.json from another project (or above) and edit it. In general, only the version fields require updating for each release.
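Since the version fields are usually the only thing that changes between releases, updating them is easy to script. The following Python sketch assumes that every extension under provides shares the distribution version, as in both examples above; the function name is hypothetical, and distributions that version their extensions independently would need something more selective.

```python
import copy


def bump_version(meta, new_version):
    """Return a copy of a META.json dict with all version fields updated."""
    updated = copy.deepcopy(meta)
    updated["version"] = new_version
    # Assumption: each provided extension tracks the distribution version.
    for info in updated.get("provides", {}).values():
        info["version"] = new_version
    return updated


if __name__ == "__main__":
    meta = {
        "name": "pair",
        "version": "0.1.0",
        "provides": {"pair": {"file": "sql/pair.sql", "version": "0.1.0"}},
    }
    print(bump_version(meta, "0.1.1"))
```

The meta-spec version is deliberately left alone, because it describes the format of the file, not the release.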

Write Killer Docs

The most successful extensions provide ample descriptive and reference documentation, as well as examples. Most extensions feature a README, of course, which contains basic information, build and install instructions, and contact info. But as the pair tree illustrates, PGXN also supports extension-specific documentation in a variety of formats, including:

Some examples:

jsonschema (Markdown)
semver (MultiMarkdown)

PGXN will also index and format additional documentation files in any of the above formats. See, for example, all the files formatted for orafce.

Exclude Files from Release

Use gitattributes to exclude files from the release. For example, distributions don’t generally include .gitignore or the contents of the .github directory. Exclude them from the archive created by git archive by assigning export-ignore to each path to exclude in the .gitattributes file, like so:

.gitignore export-ignore
.gitattributes export-ignore
.github export-ignore
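To double-check what will be left out of an archive, a short Python sketch can list the patterns marked export-ignore. It handles only simple whitespace-separated lines like those above and is not a full .gitattributes parser; the function name is made up for illustration.

```python
def export_ignored(gitattributes_text):
    """Return the path patterns that carry the export-ignore attribute."""
    ignored = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        # Skip blank lines and comments.
        if not parts or parts[0].startswith("#"):
            continue
        if "export-ignore" in parts[1:]:
            ignored.append(parts[0])
    return ignored


SAMPLE = """\
.gitignore export-ignore
.gitattributes export-ignore
.github export-ignore
"""

if __name__ == "__main__":
    print(export_ignored(SAMPLE))
```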

What’s It All For?

PGXN aims to be the trusted system of record for open-source PostgreSQL extensions. Of course that requires that it contain all (or nearly all) of said extensions. Hence this post.

Please help make it so by adding your extensions, both to help users find the extensions they need, and to improve the discoverability of your extensions. Over time, we aim to feed downstream extension distribution systems, such as Yum, APT, CloudNativePG, OCI, and more.

Let’s make extensions available everywhere to everyone.

Mini Summit 5 Transcript: Improving the PostgreSQL Extensions Experience in Kubernetes with CloudNativePG

2025-05-28T22:34:21Z

The final PostgreSQL Extension Mini-Summit took place on May 7. Gabriele Bartolini gave an overview of PostgreSQL extension management in CloudNativePG (CNPG). This talk brings together the topics of several previous Mini-Summits, notably Peter Eisentraut on implementing an extension search path, to look at the limitations of extension support in CloudNativePG and the possibilities enabled by the extension search path feature and the Kubernetes 1.33 ImageVolume feature. Check it out:

Or read on for the full transcript with thanks to Floor Drees for putting it together.

Introduction

Floor Drees.

On May 7 we hosted the last of five (5) virtual Mini-Summits that lead up to the big one at the Postgres Development Conference (PGConf.Dev), taking place next week, in Montreal, Canada. Gabriele Bartolini, CloudNativePG maintainer, PostgreSQL Contributor, and VP Cloud Native at EDB, joined to talk about improving the Postgres extensions experience in Kubernetes with CloudNativePG.

The organizers:

David Wheeler, Principal Architect at Tembo, maintainer of PGXN
Yurii Rashkovskii, Omnigres
Keith Fiske, Crunchy Data
Floor Drees, Principal Program Manager at EDB, PostgreSQL CoCC member, PGDay Lowlands organizer

The stream and the closed captions available for the recording are supported by PGConf.dev and their gold level sponsors, Google, AWS, Huawei, Microsoft, and EDB.

Improving the Postgres extensions experience in Kubernetes with CloudNativePG

Gabriele Bartolini.

Hi everyone. Thanks for this opportunity, and thank you Floor and David for inviting me today.

I normally start every presentation with a question, and this is actually the question that has been hitting me and the other maintainers of CloudNativePG — and some are in this call — from the first day. We know that extensions are important in Kubernetes, in Postgres, and we’ve always been asking how can we deploy extensions, without breaking the immutability of the container.

So today I will be telling basically our story, and hopefully providing good insights in the future about how with CloudNativePG we are trying to improve the experience of Postgres extensions when running databases, including issues.

I’ve been using Postgres for 25 years. I’m one of the co-founders of 2ndQuadrant, which was bought by EDB in 2020. And because of my contributions, I’ve been recognized as a Postgres contributor and I’m really grateful for that. And I’m also a “Data on Kubernetes ambassador”; my role is to promote the usage of stateful workloads in Kubernetes. I’m also a DevOps evangelist. I always say this: DevOps is the reason why I encountered Kubernetes, and it will also be the reason why I move away one day from Kubernetes. It’s about culture and I’ll explain this later.

In the past I’ve been working with Barman; I’m one of the creators of Barman. And since 2022, I’m one of the maintainers of CloudNativePG. I want to thank my company, EDB, for being the major contributor in Postgres history in terms of source code. And right now we are also the creators of CloudNativePG. And as we’ll see, the company donated the IP to the CNCF. So it’s something that is quite rare, and I’m really grateful for that.

What I plan to cover tonight is first, set the context and talk about immutable application containers, which have been kind of a dogma for us from day one. Then, how we are handling right now extensions in Kubernetes with CNPG. This is quite similar to the way other operators deal with it. Then the future and key takeaways.

First, we’re talking about Kubernetes. If you’re not familiar, it’s an orchestration system for containers. It’s not just an executor of containers, but it’s a complex system that also manages infrastructure. When it manages infrastructure, it also manages cloud native applications that are also called workloads. When we’re thinking about Postgres in Kubernetes, the database is a workload like the others. That, I think, is the most important mind shift among Postgres users that I have faced myself, that I’ve always treated Postgres differently from the rest. Here in Kubernetes it’s just another workload.

Then of course, it’s not like any other workload, and that’s where operators come into play, and I think the work that we are doing even tonight is in the direction to improve how databases are run in Kubernetes in general, and for everyone.

Kubernetes was open sourced in 2014 and is owned by the CNCF. It’s actually the first project that graduated, and graduated is the most advanced stage in the graduation process of the CNCF, which starts with sandbox, then incubation and then graduation.

CloudNativePG is an operator for Postgres. It’s production-ready — what we say is level five. Level five is kind of an utopic, and unbounded level, the highest one as defined by the operator development framework. It’s used by all these players including Tembo, IBM Cloud Paks, Google Cloud, Azure, Akamai, and so on. CNPG is a CNCF project since January. It’s distributed under Apache License 2.0 and the IP — the Intellectual Property — is owned by the community and protected by the CNCF. It therefore is a vendor neutral and openly governed project. This is kind of a guarantee that it will always be free. This is also, in my opinion, a differentiation between CloudNativePG and the rest.

The project was originally created by EDB, but specifically at that time, by 2ndQuadrant. And, as I always like to recall, it was Simon Riggs that put me in charge of the initiative. I’ll always be grateful to Simon, not only for that, but for everything he has done for me and the team.

CNPG can be installed in several ways. As you can see, it’s very popular in terms of stars. There’s more than 4,000 commits. And what’s impressive is the number of downloads in three years, which is 78 million, which means that it’s used the way we wanted it to be used: with CI/CD pipelines.

This is the CNCF landscape; these are the CNCF projects. As you can see, there are only five projects in the CNCF in the database area, and CloudNativePG is the only one for Postgres. Our aim for 2025 and 2026 is to become incubating. If you’re using CNPG and you want to help with the process, get in touch with me and Floor.

I think, to understand why we’ve gone through this whole process that led to the patch you’ve seen in Postgres 18, it’s important to understand what cloud native has meant to us since we started in 2019. We’ve got our own definition, but I think it still applies. For us, cloud native is three things. First, it’s people that work following DevOps culture. For example, there are some capabilities that come from DevOps that apply to the cloud native world. I selected some of them, like in user infrastructure, infrastructure abstraction, version control. These three form the infrastructure-as-code principle, together with the declarative configuration.

A shift left on security. You’ll see with CloudNativePG, we rarely mention security because it’s pretty much everywhere. It’s part of the process. Then continuous delivery.

The second item is immutable application containers, which kind of led the immutable way of thinking about extensions. And then the third one is that these application containers must be orchestrated via an infrastructure-as-code by an orchestrator, and the standard right now is Kubernetes.

For us it’s these three things, and without any of them, you cannot achieve cloud native.

So what are these immutable application containers? To explain immutability I’d like to talk about mutable infrastructure, which is probably what the majority of people that have historically worked with Postgres are used to. I’m primarily referring to traditional environments like VMs and bare metal where the main way we deploy Postgres is through packages, maybe even managed by configuration managers, but still, packages are the main artifacts. The infrastructure is seen as a long-term kind of project. Changes happen over time and are incremental updates, updates on an existing infrastructure. So if you want to know the history of the infrastructure over time, you need to check all the changes that have been applied. In case of failure of a system, systems are healed. So that’s the pets concept that comes from DevOps.

On the other hand, immutable infrastructure relies on OCI container images. OCI is a standard, the Open Container Initiative and it’s part of the Linux Foundation as well. Immutable infrastructure is founded on continuous delivery, which is the foundation of GitOps practices. In an immutable infrastructure, releasing a new version of an application is not updating the system’s application, it is building a new image and publishing it on a public registry and then deploying it. Changes in the system happen in an atomic way: the new version of a container is pulled from the registry and the existing image is almost instantaneously replaced by the new one. This is true for stateless applications and we’ll see, in the case of stateful applications like Postgres, is not that instantaneous because we need to perform a switchover or restart — in any case, generate a downtime.

When it comes to Kubernetes, the choice was kind of obvious to go towards that immutable infrastructure. So no incremental updates, and in the case of stateful workloads where you cannot change the content of the container, you can use data volumes or persistent volumes. These containers are not changed. If you want to change even a single file or a binary in a container image, you need to create a new one. This is very important for security and change management policies in general.

But what I really like about this way of managing our infrastructure is that, at any time, Kubernetes knows exactly what software is running in your infrastructure. All of this is versioned in an SCM, like Git or whatever. This is something that in the mutable world is less easy to obtain. Again, for security, this is the foundational thing because this is how you can control CVEs, the vulnerabilities in your system. This is a very basic representation of how you build, contain — let’s say the lifecycle of a container image. You create a Dockerfile, you put it in Git, for example, then there’s an action or a pipeline that creates the container image, maybe even run some tests and then pushes it to the container registry.

I walked you through the concepts of mutable and immutable containers; so what are these immutable application containers? If you go back and read what we were writing before CloudNativePG was famous or was even used, we were always putting immutable application containers as one of the principles we could not lose.

For an immutable application container, it means that there’s only a single application running; that’s why it’s called “application”. If you have been using Docker, you are more familiar with system containers: you run a Debian system, you just connect and then you start treating it like a VM. Application containers are not like that. And then they are immutable — read-only — so you cannot even make any change or perform updates of packages. But in CloudNativePG, because we are managing databases, we need to put the database files in separate persistent volumes. Persistent volumes are standard resources provided by Kubernetes. This is where we put PGDATA and, if you want, a separate volume for WAL files with different storage specifications and even an optional number of table spaces.

CloudNativePG orchestrates what we call “operand images”. These are very important to understand. They contain the Postgres binaries and they’re orchestrated via what we call the “instance manager”. The instance manager is just the process that runs and controls Postgres; it’s the PID 1, or the entry point, of the container.

There’s nothing else running, like SSHD or other applications. There’s just the instance manager that then controls everything else. And this is the project of the operand images. This is one open source project, and every week we rebuild the Postgres containers. We recently made some changes to the flavors of these images and I’ll talk about it shortly.

We mentioned the database, we mentioned the binaries, but what about extensions? This is the problem. Postgres extensions in Kubernetes with CloudNativePG is the next section, and it’s kind of a drama. I’m not hiding this. The way we are managing extensions in Kubernetes right now, in my opinion, is not enough. It works, but it’s got several limitations — mostly limitations in terms of usage.

For example, we cannot place them in the data files or in persistent volumes because these volumes are not read-only in any way. In any case, they cannot be strictly immutable. So we discarded this option to have persistent volume where you could kind of deploy extensions and maybe you can even download on the fly or use the package manager to download them or these kind of operations. We discarded this from the start and we embraced the operand image solution. Essentially what we did was placing these extensions in the same operand image that contains the Postgres binaries. This is a typical approach of also the other operators. If you think about also Zalando we call it “the Spilo way”. Spilo contained all the software that would run with the Zalando operator.

Our approach was a bit different, in that we wanted lighter images, so we created a few flavors of images, and also selected some extensions that we placed in the images. But in general, we recommended to build custom images. We provided instructions and we’ve also provided the requirements to build container images. But as you can see, the complexity of the operational layer is quite high, it’s not reasonable to ask any user or any customer to build their own images.

This is how they look now, although this is changing as I was saying:

You’ve got a base image, for example, the Debian base image. You deploy the Postgres binaries. Then — even right now though it’s changing — CloudNativePG requires Barman Cloud to be installed. And then we install the extensions that we think are needed. For example, I think we distribute pgAudit, if I recall correctly, pgvector and pg_failover_slots. Every layer you add, of course, the image is heavier and we still rely on packages for most extensions.

The problem is, you’ve got a cluster that is already running and you want, for example, to test an extension that’s just come out, or you want to deploy it in production. If that extension is not part of the images that we build, you have to build your own image. Because of the possible combinations of extensions that exist, it’s impossible to build all of these combinations. You could build, for example, a system that allows you to select what extensions you want and then build the image, but in our way of thinking, this was not the right approach. And then you’ve got system dependencies and, if an extension brings a vulnerability that affects the whole image and requires more updates — not just of the cluster, but also of the builds of the image.

We wanted to do something else, but we immediately faced some limitations of the technologies. One was on Postgres, the other one was on Kubernetes. In Postgres, extensions need to be placed in a single folder. It’s not possible to define multiple locations, but thanks to the work that Peter and his team have done, now we’ve got extension_control_path in version 18.

Until 10 days ago, Kubernetes could not mount OCI artifacts as read-only volumes. There’s a new feature that is now part of Kubernetes 1.33 that allows us to do it.

This is the patch that I was talking about, by Peter Eisentraut. I’m really happy that CloudNativePG is mentioned as one of the use cases. There’s also a mention of the work that David, Marco, Niccolò, and I have done, primarily Marco and Niccolò from CloudNativePG.

This is the patch that introduced VolumeSource in Kubernetes 1.33.

The idea is that with Postgres 18 now we can set in the configuration where we can look up for extensions in the file system. And then, if there are libraries, we can also use the existing dynamic_library_path GUC.

So, you remember, this is where we come from [image above]; the good thing is we have the opportunity to build Postgres images that are minimal, that only contain Postgres.

Instead of recreating them every week — because it’s very likely that something has some dependency, has a CVE, and so recreate them for everyone, forcing everyone to update their Postgres systems — we can now release them maybe once a month, and pretty much follow the Postgres cadence patch releases, and maybe if there are CVEs it’s released more frequently.

The other good thing is that now we are working to remove the dependency on Barman Cloud for CloudNativePG. CloudNativePG has a new plugin interface and with 1.26, which is expected in the next weeks, we are suggesting people start moving new workloads to the Barman Cloud plugin solution. What happens is that Barman Cloud will be in that sidecar image. So it will be distributed separately, and so its lifecycle is independent from the rest. But the biggest advantage is that any extension in Postgres can be distributed as an image: right now we’ve got packages, and the idea is that they are distributed also as images.

If we start thinking about this approach, if I write an extension for Postgres, until now I’ve been building only packages for Debian or for RPM systems. If I start thinking about also building container images, they could be immediately used by the new way of CloudNativePG to manage extensions. That’s my ultimate goal, let’s put it that way.

This is how things will change at run time without breaking immutability.

There will be no more need to think about all the possible combinations of extensions. There will be the Postgres pod that runs, for example, a primary or standby, that will have the container for Postgres. If you’re using Barman Cloud, the sidecar container managed by the plugin with Barman Cloud. And then, for every extension you have, you will have a different image volume that is read-only, very light, only containing the files distributed in the container image of the extension, and that’s all.

Once you’ve got these, we can then coordinate the settings for external extension_control_path and dynamic_library_path. What we did was, starting a fail fast pilot project within EDB to test the work that Peter was doing on the extension_control_path. For that we used the Postgres Trunk Containers project, which is a very interesting project that we have at CloudNativePG. Every day it rebuilds the latest snapshot of the master branch of Postgres so that we are able to catch, at an early stage, problems with the new version of Postgres in CloudNativePG. But there’s also an action that builds container images for a specific, for example, Commitfest patch. So we use that.

Niccolò wrote a pilot patch, an exploratory patch, for the operator to define the extensions stanza inside the cluster resource. He also built some bare container images for a few extensions. We make sure to include a very simple one and the most complex one, which is PostGIS. This is the patch that — it’s still a draft — and the idea is to have it in the next version, 1.27 for CloudNativePG. This is how it works:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-with-extensions
spec:
  instances: 1
  imageName: ghcr.io/cloudnative-pg/postgresql-trunk:18-devel
  postgresql:
    extensions:
      - name: pgvector
        image:
          reference: ghcr.io/cloudnative-pg/pgvector-18-testing:latest
  storage:
    storageClass: standard
    size: 1Gi

We have the extensions section in the cluster definition. We name the extension. Theoretically we could also define the version and we point to the image. What’s missing in this pilot patch is support for image catalogs, but that’s something else that we can worry about later.

What happens under the hood is that when you update, or when you add a new extension in the cluster definition, a rolling update is initiated. So there’s this short downtime, but the container image is loaded in the replicas first, and then in the primary. An image volume is mounted for each extension in, let’s say, the /extensions/$name_of_extension folder, and CNPG updates these two parameters (extension_control_path and dynamic_library_path). It’s quite clean, quite neat. It works, but most of the work needs to happen here. So that’s been my call, I mean, to treat container images as first-class artifacts. If this changes, we have a new way to distribute images.
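As a rough illustration of that last step, here is a minimal Go sketch of how the two settings could be assembled from a list of mounted extensions. It is not the operator's code: the `extensionPaths` helper and the `share` and `lib` subdirectories under each `/extensions/<name>` mount are assumptions made for the example. Only the two GUC names and the colon-separated list format come from Postgres itself.

```go
package main

import (
	"fmt"
	"strings"
)

// extensionPaths builds values for extension_control_path and
// dynamic_library_path from a list of extension names. The
// /extensions/<name>/share and /extensions/<name>/lib layout is a
// hypothetical one chosen for illustration.
func extensionPaths(names []string) (controlPath, libraryPath string) {
	// Keep the built-in locations first so core extensions still resolve.
	control := []string{"$system"}
	library := []string{"$libdir"}
	for _, name := range names {
		control = append(control, "/extensions/"+name+"/share")
		library = append(library, "/extensions/"+name+"/lib")
	}
	// Both settings take a colon-separated list of directories on Unix.
	return strings.Join(control, ":"), strings.Join(library, ":")
}

func main() {
	control, library := extensionPaths([]string{"pgvector", "postgis"})
	fmt.Printf("extension_control_path = '%s'\n", control)
	fmt.Printf("dynamic_library_path = '%s'\n", library)
}
```

Because every extension gets its own mount, each added extension lengthens both lists, which is why adding one changes the Postgres configuration as well as the pod definition.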

Just to approach the conclusion, if you want to know more about the whole story, I wrote this blog article that recaps everything, and the key takeaway for me — and then we go more on the patch if you want to, and also address the questions. But what is important for me? Being in the Postgres community for a long time, I think this is a good way, a good moment for us to challenge the status quo of the extension distribution ecosystem.

I think we have an opportunity now to define a standard, which, I just want to be clear, I’m focusing myself primarily on CNPG, but this is in general, even for other operators. I’m sure that this will benefit everyone and overall it will reduce the waste that we collectively create when distributing these extensions in Kubernetes. If this becomes a standard way to distribute extensions, the benefits will be much better operational work for everyone, primarily also easier testing and validation of extensions. I mean, right now, if you see an extension, ideally that extension — and it’s very easy to build — if you’re in GitHub, to build the container images. GitHub, for example, already provides the whole infrastructure for you to easily build container images.

So if we find a standard way to define a GitHub action to build Postgres extensions, I think, if you’re a developer of an extension, you can just use it and then you find a registry in your project directly that continuously publishes or periodically publishes this extension. Any user can just reference that image URL and then without having to build images, they’re just one rolling update away from testing a patch, testing also the upgrade paths.

I think there are some unknown unknowns that kind of scare me, in general, about upgrades, upgrades of extensions. This is, in my opinion, one of the biggest issues. It’s not that they’re not solved, but they require more attention and more testing if you’re using them in an immutable world. All of these will, in my opinion, be much, much better with the approach we’ve proposed. Images will be lighter, and a lighter image is also safer and more secure, so less prone to have CVEs, less prone to require frequent updates, and they also reduce the usage of bandwidth for an organization in general. As I was saying before, any extension project can be fully independent, have their own way to build images and publish them.

One last point. I keep hearing many signs that all of the stuff that we are proposing right now seems like a kind of limitation of Kubernetes. The way I see it, it’s not actually a limitation; it’s that these problems have never been addressed before. The biggest mistake we can make is to focus on the specific problem of managing extensions without analyzing the benefits that the entire stack brings to an organization. Kubernetes brings a lot of benefits in terms of security, velocity, change management, and operations that any organization must consider right now. Any Postgres DBA, any Postgres user, my advice is, if you haven’t done it yet, start taking Kubernetes seriously.

Discussion

Floor: I do think that David, you wanted to talk maybe a little bit about the mutable volume pattern?

David: Well, if people are interested, in your early slide where you were looking at alternatives, one you were thinking of was putting extensions on a mutable volume and you decided not to do that. But at Tembo we did do that and I did a bunch of work trying to improve it and try to minimize image size and all that in the last couple months. Tembo Cloud is shutting down now, so I had to stop before I finished it, but I made quite a bit of progress. I’m happy to kind of talk through the ideas there. But I think that this approach is a better long term solution, fundamentally.

Gabriele: I would like if Marco and Niccolò, if you want to talk about the actual work you’ve done. Meanwhile, Peter asks, “why does an installation of an extension require a small downtime?” The reason is that at the moment, the image volume patch, if you add a new image volume, it requires the pod to restart. Nico or Marco, Jonathan, if you want to correct me on that.

Nico or Marco or Jonathan: It provides a rolling update of the cluster right now.

Gabriele: So that’s the reason. That’s the only drawback, but the benefits in my opinion, are…

David: My understanding is that, to add a new extension, it’s mounted in a different place. And because every single extension is its own mount, you have to add it to both those GUCs. And at least one of them requires a restart.

Gabriele: But then for example, we’ve had this conversation at EDB for example, we’re planning to have flavors of predefined extensions. For example, you can choose a flavor and we distribute those extensions. For example, I dunno, for AI we place some AI kind of extensions in the same image, so it would be different.

But otherwise I’m considering the most extreme case of one extension, one container image, which in my opinion, for the open source world is the way that hopefully will happen. Because this way, think about that – I haven’t mentioned this — if I write an extension, I can then build the image and then run automated tests using Kubernetes to assess my extension on GitHub. If those tests fail, my commit will never be merged on main. This is trunk development, continuous delivery. This is, in my opinion, a far better way of delivering and developing software. This is, again, the reason why we ended up in Kubernetes. It’s not because it’s a technology we like, it’s a toy or so, it’s because it solves bigger problems than database problems.

Even when we talk about databases, there’s still work that needs to be done, needs to be improved. I’m really happy that we have more people that know Postgres nowadays that are joining CloudNativePG, and are elevating the discussions more and more on the database level. Because before it was primarily on Kubernetes level, but now we see people that know Postgres better than me get in CloudNativePG and propose new ideas, which is great. Which is the way it needs to be, in my opinion.

But I remember, Tembo approached us because we actually talked a lot with them. Jonathan, Marco, I’m sure that you recall, when they were evaluating different operators and they chose CloudNativePG. I remember we had these discussions where they asked us to break immutability and we said, “no way”. That’s why I think Tembo had to do the solution you described, because we didn’t want to do it upstream.

I think, to be honest, and to be fair, if image volumes were not added, we would’ve probably gone down that path, because this way of managing extensions, as I was saying, is not scalable, the current one. Because we want to always improve, I think that we need to be critical of what we do. So, I don’t know, Niccolò, Marco, I would like you to, if you want, explain briefly.

[A bit of chatter, opened this Dockerfile.]

FROM ghcr.io/cloudnative-pg/postgresql-trunk:18-devel AS builder

USER 0

COPY . /tmp/pgvector

RUN set -eux; \
	mkdir -p /opt/extension && \
	apt-get update && \
	apt-get install -y --no-install-recommends build-essential clang-16 llvm-16-dev && \
	cd /tmp/pgvector && \
	make clean && \
	make OPTFLAGS="" && \
	make install datadir=/opt/extension/share/ pkglibdir=/opt/extension/lib/

FROM scratch

COPY --from=builder /opt/extension/lib/* /lib/
COPY --from=builder /opt/extension/share/extension/* /share/

Niccolò: I forked, for example, pgvector. That’s what we can do basically for every simple extension that we can just build. This is a bit more complicated because we have to build from a trunk version of Postgres 18. So we have to compile pgvector from source, and then in a scratch layer we just archive the libraries and every other content that was previously built. But ideally whenever PG 18 comes out as a stable version of Postgres, we just need to apt install pgvector and grab the files from the path. Where it gets a bit more tricky is in the case of PostGIS, or TimescaleDB, or any extension whose library requires third party libraries. For example, PostGIS has a strong requirement on the geometric libraries, so you need to import them as well inside the mount volume. I can link you an example of the PostGIS one.

Gabriele: I think it’s important, we’ve got, I think Peter here, David as well, I mean, for example, if we could get standard ways in Postgres to generate Dockerfiles for extensions, that could be great. And as I said, these extensions can be used by any operator, not only CNPG.

David: That’s what my POC does. It’s a patch against the PGXS that would build a trunk image.

Gabriele: This is the work that Niccolò had to do to make PostGIS work in the pilot project: he had to copy everything.

Niccolò: I think we can make it a little bit smoother and dynamically figure out everything from the policies library, so we don’t have to code everything like this, but this is just a proof of concept that it can work.

David: So you installed all those shared libraries that were from packages.

Niccolò: Yeah, they’re being copied in the same MountVolume where the actual extensions are copied as well. And then the pilot patch is able to set up the library path inside the pod so that it makes the libraries available to the system because of course, these libraries are only part of the MountVolume. They’re not injected inside the system libraries of the pod, so we have to set up the library path to make them available to Postgres. That’s how we’re able to use them.

David: So they end up in PKGLIBDIR but they still work.

Niccolò: Yeah.

Gabriele: I mean, there’s better ideas, better ways. As Niccolò also said, it was a concept.

David: Probably a lot of these shared libraries could be shared with other extensions. So you might actually want other OCI images that just have some of the libraries that are shared between them.

Gabriele: Yeah, absolutely. So we could work on a special kind of, extensions or even metadatas so that we can place, you know…

So, yeah, that’s it.

Jonathan: I think it’s important to invite everyone to try and test this, especially the Postgres trunk containers, when they want to try something new like this one, just because we always need people testing. When more people review and test, it’s amazing. Because every time we release something, we’ll probably miss something, like an extension such as PostGIS missing one of the libraries that wasn’t included in the path. Even if we can try to find a way to include it, it will not be there. So testing, please! Test all the time!

Gabriele: Well, we’ve got this action now, they’re failing. I mean, it’s a bit embarrassing. [Cross talk.] We already have a patch to fix it.

But I mean, this is a great project as I mentioned before, because it allows us to test the current version of Postgres, but also if you want to build from a Commitfest or if you’ve got your own Postgres repository with sources, you can compile, you can get the images from using this project.

Floor: Gabriele, did you want to talk about SBOMs?

Gabriele: I forgot to mention Software Bill of Materials. They’re very important. It’s kind of basic now for any container image. There’s also the possibility to add them to these container images too. This is very important. Again, for change management, for security and all of that — in general, the supply chain. And signatures too. But we’ve got signatures for packages as well. There’s also an attestation of provenance.

Floor: Very good, thanks everyone!

CBOR Tag for JSON Number Strings

2025-05-18T19:32:06Z

For a side project, I’m converting JSON inputs to CBOR, or Concise Binary Object Representation, defined by RFC 8949, in order to store a more compact representation in the database. This Go app uses the encoding/json package’s UseNumber decoding option to preserve numbers as strings, rather than float64s. Alas, CBOR has no support for such a feature, so such values cannot survive a round-trip to CBOR and back, as demonstrated by this example using the github.com/fxamacker/cbor package (playground):

// Decode JSON number using json.Number.
input := bytes.NewReader([]byte(`{"temp": 98.6}`))
dec := json.NewDecoder(input)
dec.UseNumber()
var val map[string]any
if err := dec.Decode(&val); err != nil {
	log.Fatalf("Err: %v", err)
}

// Encode as CBOR.
data, err := cbor.Marshal(val)
if err != nil {
	log.Fatalf("Err: %v", err)
}

// Decode back into Go.
var newVal map[string]any
if err := cbor.Unmarshal(data, &newVal); err != nil {
	log.Fatalf("Err: %v", err)
}

// Encode as JSON.
output, err := json.Marshal(newVal)
if err != nil {
	log.Fatalf("Err: %v", err)
}

fmt.Printf("%s\n", output)

The output:

{"temp":"98.6"}

Note that the input on line 2 contains the number 98.6, but once the value has been transformed to CBOR and back it becomes the string "98.6".
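The reason is visible in the Go types alone. The standard library declares json.Number as a string type, so an encoder with no special knowledge of it has only a string to work with. Here is a small stdlib-only sketch that shows what UseNumber changes; the `decodeTemp` and `typeName` helpers are names made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeTemp decodes a JSON object and returns the value stored under
// "temp", optionally enabling UseNumber on the decoder.
func decodeTemp(doc string, useNumber bool) (any, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	if useNumber {
		dec.UseNumber()
	}
	var val map[string]any
	if err := dec.Decode(&val); err != nil {
		return nil, err
	}
	return val["temp"], nil
}

// typeName reports the dynamic Go type of v.
func typeName(v any) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	for _, useNumber := range []bool{false, true} {
		v, err := decodeTemp(`{"temp": 98.6}`, useNumber)
		if err != nil {
			panic(err)
		}
		fmt.Printf("UseNumber=%t: %s\n", useNumber, typeName(v))
	}
}
```

Without UseNumber the value arrives as a float64; with it, the value is a json.Number that still holds the original text 98.6.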

I wanted to preserve JSON numbers treated as strings. Fortunately, CBOR uses numeric tags to identify data types, and includes a registry maintained by IANA. I proposed a new tag for JSON numbers as strings and, through a few iterations, the CBOR group graciously accepted the formal description of semantics and assigned tag 284 in the registry.

Now any system that handles JSON numbers as strings can use this tag to preserve the numeric representation in JSON output.
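To make the tag concrete, here is a hand-rolled sketch of the bytes involved, using only the standard library and the head encoding rules from RFC 8949. It assumes the tag wraps a CBOR text string holding the number's JSON text, which matches how a Go string type such as json.Number would be encoded. The `appendHead`, `taggedNumber`, and `taggedNumberHex` helpers are illustrative names, not part of any library.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// appendHead appends a CBOR head: the major type in the top three bits,
// followed by the shortest encoding of the argument (RFC 8949, section 3).
func appendHead(buf []byte, major byte, arg uint64) []byte {
	switch {
	case arg < 24:
		return append(buf, major<<5|byte(arg))
	case arg <= 0xff:
		return append(buf, major<<5|24, byte(arg))
	case arg <= 0xffff:
		buf = append(buf, major<<5|25)
		return binary.BigEndian.AppendUint16(buf, uint16(arg))
	case arg <= 0xffffffff:
		buf = append(buf, major<<5|26)
		return binary.BigEndian.AppendUint32(buf, uint32(arg))
	default:
		buf = append(buf, major<<5|27)
		return binary.BigEndian.AppendUint64(buf, arg)
	}
}

// taggedNumber encodes num as a text string (major type 3) wrapped in
// tag 284 (major type 6). The text-string content is an assumption.
func taggedNumber(num string) []byte {
	buf := appendHead(nil, 6, 284)
	buf = appendHead(buf, 3, uint64(len(num)))
	return append(buf, num...)
}

// taggedNumberHex returns the encoding as lowercase hex for inspection.
func taggedNumberHex(num string) string {
	return hex.EncodeToString(taggedNumber(num))
}

func main() {
	fmt.Println(taggedNumberHex("98.6")) // prints d9011c6439382e36
}
```

The first three bytes, d9 01 1c, are the tag head for 284; 64 introduces a four-byte text string; the rest is the ASCII text 98.6. A decoder that knows the tag can therefore hand the text back as a number rather than a string.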

Here’s how to use the tag customization features of github.com/fxamacker/cbor to transparently round-trip json.Number values (playground):

// Create tag 284 for JSON Number as string.
tags := cbor.NewTagSet()
tags.Add(
    cbor.TagOptions{
        EncTag: cbor.EncTagRequired,
        DecTag: cbor.DecTagRequired,
    },
    reflect.TypeOf(json.Number("")),
    284,
)

// Create a custom CBOR encoder and decoder:
em, _ := cbor.EncOptions{}.EncModeWithTags(tags)
dm, _ := cbor.DecOptions{
    DefaultMapType: reflect.TypeOf(map[string]any(nil)),
}.DecModeWithTags(tags)

// Decode JSON number using json.Number.
input := bytes.NewReader([]byte(`{"temp": 98.6}`))
dec := json.NewDecoder(input)
dec.UseNumber()
var val map[string]any
if err := dec.Decode(&val); err != nil {
    log.Fatalf("Err: %v", err)
}

// Encode as CBOR.
data, err := em.Marshal(val)
if err != nil {
    log.Fatalf("Err: %v", err)
}

// Decode back into Go.
var newVal map[string]any
if err := dm.Unmarshal(data, &newVal); err != nil {
    log.Fatalf("Err: %v", err)
}

// Encode as JSON.
output, err := json.Marshal(newVal)
if err != nil {
    log.Fatalf("Err: %v", err)
}

fmt.Printf("%s\n", output)

Lines 1-16 contain the main difference from the previous example. They create a CBOR encoder (em) and decoder (dm) with tag 284 assigned to json.Number values. The code then uses them rather than the cbor package to Marshal and Unmarshal the values on lines 28 and 35. The result:

{"temp":98.6}

Et voilà! json.Number values are once again preserved.

I believe these custom CBOR encoder and decoder configurations bring full round-trip compatibility to any regular JSON value decoded by encoding/json. The other important config for that compatibility is the DefaultMapType decoding option on line 15, which ensures maps use string values for map keys rather than the CBOR-default any values.

2025 GSOC: Mankirat Singh — ABI Compliance Reporting

2025-05-13T18:25:11Z

I’m pleased to welcome Mankirat Singh to the Postgres community as a 2025 Google Summer of Code contributor. Mankirat will be developing an ABI compliance checker and reporting system to help identify and prevent unintentional ABI changes in future minor Postgres releases. This follows on the heels of the addition of ABI and API guidance in Postgres 18, as well as the ABI-breaking Postgres 17.1 release. What timing!

Please follow Mankirat’s blog as he develops the project this summer, under the mentorship of myself and Pavlo Golub. It should also soon be on Planet PostgreSQL. We’ve also set up the #gsoc2025-abi-compliance-checker channel on the community Slack for ad-hoc discussion. Join us!

Mini Summit 5: Extension Management in CNPG

2025-05-05T21:51:22Z

The last Extension Ecosystem Mini-Summit is upon us. How did that happen?

Join us for a virtual conference session featuring Gabriele Bartolini, who will be discussing Extension Management in CNPG. I’m psyched for this one, as the PostgreSQL community has contributed quite a lot to improving extensions management in CloudNativePG in the past year, some of which we covered previously. If you miss it, the video, slides, and transcript will appear here soon.

Though it may take a week or two to get the transcripts done, considering that PGConf.dev is next week, featuring the Extension Ecosystem Summit on Tuesday, 13 May in Montréal, CA. Hope to see you there; be sure to say “hi!”

Mini Summit 4 Transcript: The User POV

2025-05-01T21:02:39Z

On April 23, we hosted the fourth of five (5) virtual Mini-Summits that lead up to the big one at the Postgres Development Conference (PGConf.dev), taking place May 13-16, in Montréal, Canada. Celeste Horgan, Developer Educator at Aiven, Sonia Valeja, PostgreSQL DBA at Percona, and Alexey Palazhchenko, CTO of FerretDB, joined for a panel discussion moderated by Floor Drees.

Video

And now, the transcripts of “The User POV” panel, by Floor Drees

Introduction

My name is Floor, I’m one of the organizers of these Extension Ecosystem Mini-Summits. Other organizers are also here:

David Wheeler, Principal Architect at Tembo, maintainer of PGXN
Yurii Rashkovskii, Omnigres
Keith Fiske, Crunchy Data
Floor Drees, Principal Program Manager at EDB, PostgreSQL CoCC member, PGDay Lowlands organizer

The stream and the closed captions available for the recording are supported by PGConf.Dev and their gold level sponsors, Google, AWS, Huawei, Microsoft, and EDB.

Next, and last in this series, on May 7 we’re gonna have Gabriele Bartolini talk to us about Extension Management in CloudNativePG. Definitely make sure you head over to the Meetup page, if you haven’t already, and RSVP for that one!

The User POV

Floor: For the penultimate edition of this series, we’re inviting a couple of Postgres extension and tooling users to talk about how they pick and choose projects that they want to use, how they do their due diligence, and their experience with running extensions.

But I just wanted to set the context for the meeting today. We thought that being in the depth of it all, if you’re an extension developer, you kind of lose the perspective of what it’s like to use extensions and other auxiliary tooling. You lose that user’s point of view. But users, maybe coming from other ecosystems, are used to a different, probably smoother experience. I’m coming from the Rails and Ruby community, so RubyGems are my one stop shop for extending functionality.

That’s definitely a completely different experience from when I started using Postgres extensions. That’s not to say that those ecosystems and NPM and PIP and WordPress don’t have their own issues, but we can certainly learn from some of the differences between the ecosystems. Ultimately, what we want to cover today is the experience of using extensions in 2025, and what are our users’ wishes for the future?

Celeste: Hello my name is Celeste, I am on the developer relations team at Aiven. I only really started using Postgres as a part of my job here at Aiven, but have been a much longer contributor to similar-sized ecosystems. I was really heavily involved in the Kubernetes ecosystem for quite a while. Kubernetes is an extensible-by-design piece of software, but it’s many, many generations of software development later than some of the concepts that Postgres pioneered. Thank you for having me, Floor!

Sonia: Hello everybody! I started working with PostgreSQL in the year 2012, and since then it has been quite a journey. Postgres has been my primary database, and along with learning PostgreSQL, I learned other databases alongside. I learned Oracle, I learned SQLServer, but only from the perspective — which is important — of migrating from X database to PostgreSQL, as in Oracle to PostgreSQL migration, SQLServer to PostgreSQL migration. I learned about the other databases and I’m fortunate to work as a PostgreSQL developer, PL/pgSQL Developer, PostgreSQL DBA, onsite coordinator, offsite coordinator, sometimes a trainer. So, in and out, it has been like I’m breathing PostgreSQL since then.

Alexey: Thanks for having me! I first worked with Postgres in 2005. Fast forward to today and I am doing FerretDB, which is the open source MongoDB replacement built on top of PostgreSQL and also on top of the DocumentDB extension recently open-sourced by Microsoft. We provide this extension to our users, but also we consume this extension as users of that extension. Somewhere in between, between 2005 and now, I also worked at Percona. At Percona I worked on monitoring software and worked with pg_stat_statements and pg_stat_monitor, which is made by Percona and I have pretty much a lot of experience with Postgres extensions.

Floor: And you’re cheating a little on this panel, seeing as you are not only a user but also a provider. I definitely have some questions for you!

And y’all talked a little about your sort of experience with extensibility of other software or technology, and comparing that to the Postgres experience. Can you all talk about what the main differences are that you have observed with other ecosystems?

Celeste: I think as somebody who’s a bit of a newer Postgres user and I guess comes from a different community, the biggest thing that weirded me out, when I started working with Postgres, is that there’s no way to install an extension except to install it against your live database.

If you compare that to something like Kubernetes, which again has a rather robust extensibility ecosystem, both on the networking side of things, but also other aspects of it, the inherent software architecture makes it so that you have to plan out what you’re going to do, and then you apply a plan. In theory you can’t apply a plan or add extensions to Kubernetes that won’t work or will somehow break the system. Again, in theory, in practice things are more interesting.

But with Postgres and with databases in general, you’re always working with the live dataset, or at some point you have to work with the live dataset. So there’s no real way to test.

Sonia: Most of the other databases — apart from PostgreSQL, which I have worked with — most of them are licensed. So Oracle and SQLServer. When it comes to PostgreSQL, it’s open source, so you do your own thing: you do the installation, do the checkout, everything, which is open source, you can see the code, and things like that. But when it comes to other databases, since it’s licensed, it is managed by the specific vendor, so you do not have rights to do anything else. The thing that is common is that you do the POC in both databases before you actually implement it in the production environment.

Alexey: Floor, you mentioned RubyGems, and I was thinking that actually there is something similar between PostgreSQL extensions and RubyGems in a sense that RubyGems quite often extend built-in Ruby classes, and Postgres extensions could do the same. There is no separation between public and private inside PostgreSQL, it’s all just C symbols, no special mark saying, “don’t touch this API, we are going to change it, it’s an internal detail.” Nothing like that. They try not to break compatibility needlessly, but on the other hand, you have to check all versions of your extensions with all separate versions of PostgreSQL. In that sense it’s quite similar, unlike some other languages where there’s better separation of internal, private parts, if not on the compiler level, at least on the documentation level or something like that.

Celeste: That’s not necessarily a criticism of Postgres. I think it’s just that those were the tools available to Postgres as a community when Postgres was being developed. There are some advantages to that too, because, for lack of a better word, the lack of checks and balances lets some Postgres extensions do very, very interesting things that would maybe not be possible under a more restricted framework.

Floor: The main difference I see between those two is that I know to go to RubyGems as my place to get my plugins — or my gems, in that case. Whereas with Postgres, they can live pretty much anywhere, right? There’s different directories and there’s different places where you can get your stuff and maybe there’s something that is in a private repo somewhere because that’s what another team at your company is working on. It’s a bit of a mess, you know? It’s really difficult to navigate, where maybe other things are a lot less difficult to navigate because there’s just the single place.

I wanna talk a little bit about when you’re looking for an extension to do a certain thing for you. What do you consider when you’re looking for an extension or when you’re comparing some of its tooling? I wrote down a couple of things that you might be looking at, or what I might be looking at: maybe it’s docs and tutorials, maybe it’s “has it seen a recent release?” Has it seen frequent releases? Is there only one company that is offering this extension? Or is it multiple companies supporting this extension? Is it a community-built tool? Is it already in use by other teams in your company? So it’s something that has been tested out with your system, with your stack, and you feel like it’s something that you can easily adopt.

So what are some of the things for you that you definitely look at when you’re looking to adopt new tooling?

Celeste: I think the main thing you wanna look for when you’re looking at really any open source project, whether it’s an extension or not, is both proof points within the project, but also social proof. Proof points within the project are things that you mentioned, like is there documentation? Does this seem to be actively maintained? Is the commit log in GitHub moving? How many open issues are there? Are those open issues being closed over time? Those are project health indicators. For example, if you look at the CHAOSS Project, Dawn Foster has done a ton of work around monitoring project health there.

But I think the other half of this — and this was actually something we worked on a lot at the Cloud Native Computing Foundation when I was there, and that work continues — is — and this makes a bit more sense in some cases than others — is social proof. So, are there other companies using it? Can you point to case studies? Can you point to case studies of something being in production? Can you point to people giving conference talks where they mention something being in use?

This becomes really important when you start thinking about things being enterprise-grade, and when you start thinking about the idea of enterprise-grade open source. Everybody who’s on this panel works for a company that does enterprise-grade open source database software, and you have to ask yourself what that means. A lot of what that means is that other enterprises are using it, because that means that something comes to a certain level of reliability.

Sonia: I would like to add some things. What I look at is how difficult or how easy it is to install, configure, and upgrade the extension, and, whether it needs restart of the database service or not. Why do I look at the restart aspect? Because when I install it or configure or upgrade or whatever activity I perform with it, if it requires the restart, that means it is not configured online, so I need to involve other folks to do the database restart, as in an application is connecting to it. When I restart, it goes for a maintenance window for a very small time — whatever duration it goes offline, the database service. So whether it requires restart or not, that is also very important for me to understand.

Apart from the documentation, which should be of course easy to understand. That is one of the aspects while you install and configure. It should not be that difficult that I need to refer every time, everything, and do it, and then maybe, I might need to create another script to use it. It should not be the case. I look to those aspects, as well.

Apart from that, I also see how do I monitor the activities of this extension, like whether it is available in the logs — what that extension is doing. So it should not break my existing things basically. So how stable and how durable it is, and I should be able to monitor the activities, whatever that extension is doing.

From the durability perspective, even if I’m not able to monitor via logs, it should be durable enough that it does not break anything else that is up and running.

One more thing. I will definitely perform the POC, before putting it into the production, into some lower environment or in my test environment somewhere else.

Floor: How do you figure out though, how easy something is to sort of set up and configure? Are you looking for that information from a README or some documentation? Because I’ve definitely seen some very poorly documented stuff out there…

Sonia: Yeah, documentation is one aspect. Apart from that, when you do the POC, you’ll actually be using it. So with that POC itself, you’ll be able to understand how easy it is to install, configure, and use it.

Alexey: For me as a user, I would say the most important thing is that the extension is packaged and easy to install, and packaged in the same way as PostgreSQL is packaged. For example, if I get PostgreSQL from my Ubuntu distribution and the extension is not in the same Ubuntu target, it might as well not exist for me, because there is no way I’m going to compile it myself. It’s like hundreds of flags and that being C, and okay, I can make it 1% faster, but then it’ll be insecure and will bring PostgreSQL down, or worse. So there are a lot of problems like that.

If it’s not packaged, then I would probably just do something which is not as good, not as stable, but I will do it myself and will be able to support it, rather than using some third party extension that is not packaged properly. And “properly”, for me, is a high bar. So if it’s some third party network of extensions, that might be okay, I will take a look. But then of course, if it’s in the Ubuntu repository or Debian repository, that would be much better.

Floor: I think that’s the build versus buy — or not necessarily buy if it’s open source. Not to say that open source is free. But that’s the discussion, right? When do you decide to spend the time to build something over adopting something? And so for you, that’s mainly down to packaging?

Alexey: For me that’s the most important one because for features we generally need to use in the current job and previous jobs, there are enough hooks on the PostgreSQL itself to make what we want to do ourselves. Like if sometimes we need to parse logs, sometimes we need to parse some low level counters, but that’s doable and we could do it in a different language and in the way we can maintain it ourselves. If you talk about PostgreSQL, I typically recommend C and if there’s some problem, we will have a bigger problem finding someone to maintain it, to fix it fast.

Floor: Alright. When you build it yourself, would you then also open-source it yourself and take on the burden of maintenance?

Alexey: I mean that really depends on the job. Like at Percona we open sourced pg_stat_monitor. But that was like, implicit goal of making this extension open source to make it like a superset of pg_stat_statement. In FerretDB of course, DocumentDB is open source — we contribute to it, but I couldn’t say that’s easier. Of course if it was written like in our perfect language, Go, it would be much, much easier. Unfortunately, it’s not. So we have to deal with it with packaging and what not.

Floor: I guess it’s also like build versus buy versus fork because there’s definitely different forks available for a similar tooling that is just optimized for a little bit of a different use case. But again, that’s then another project out there that needs to be maintained.

Alexey: But at the same time, if you fork something, and don’t want to contribute back, you just don’t have this problem of maintaining it for someone else. You just maintain it for yourself. Of course, if someone else in upstream wants to pull your changes, they will be able to. And then they may look at you like you’re a bad part of the community because you don’t contribute back, but that depends on the size of the company, whether you have the resources, and all that.

Celeste: But now you’re touching on something that I feel very strongly about when it comes to open source. Why open source anything to begin with? If we can all just maintain closed forks of everything that we need, why is Postgres open source to begin with and why does it continue to be open source and why are we having this discussion 30 or 40 years into the lifespan of Postgres at this point?

The fact of the matter is that Postgres being open source is the reason that we’re still here today. Postgres is a 30 plus year old database at this point. Yes, it’s extremely well architected because it continues to be applicable to modern use cases when it comes to data. But really the fundamental of the matter is that it is free, and being free means that two things can happen. One, it’s a very smart move for businesses to build a business on top of a particular piece of software. But two — and I would argue that this is actually the more important point when it comes to open source and the long term viability of open source — is that because it is free, that means it is A) proliferative, it has proliferated across the software industry and B) it is extremely valuable for professionals to learn Postgres or to learn Kubernetes or to learn Linux because they know that they’re gonna encounter that sometime in their career.

So when it comes to extensions, why open source an extension? You could simply close source an extension. It’s the same reason: if you use open source extensions, you can then hire for people who have potentially encountered those extensions before.

I work for a managed service provider that deploys quite a few Postgreses for quite a few clients. I obviously have a bit of a stake in the build versus buy versus fork debate that is entirely financial and entirely linked to my wellbeing. Regardless, it still makes sense for a company like Aiven to invest in open source technologies, but it makes a lot more sense for us to hire Postgres experts who can then manage those extensions and manage the installation of those extensions and manage whether your database works or not against certain extensions, than it is for literally every company out there on the planet to hire a Postgres professional. There’s still a use case for open-sourcing these things. That is a much larger discussion though, and I don’t wanna derail this panel. [Laughs.]

Floor: I mean, if Alexey is game, you got yourself a conversation.

Alexey: First of all, I completely agree with you and I of course built my whole career on open source. But there’s also the other side. So let’s say you build an open source extension which is very specific, very niche, solves your particular problem. And there are like 20 other people who, like you, have the same problem, and then all 20 come to your GitHub and ask questions about it. And they do it for free. You just waste your time supporting them essentially. And you are a small company, you are just three people and you open-source this extension just for fun. And they are three people and two of them work full time and support that.

Celeste: Oh yeah, no, I didn’t say the economics of this worked out for the people doing the open-sourcing, just to be perfectly clear. I think there’s a much larger question around the sustainability of open source communities in general. Postgres, the overall project, and say, for example, the main Kubernetes project, are outliers in terms of the amount of support and the amount of manpower and people and the energy they get. Whereas most things that get open-sourced are — I think Tidelift had a survey: the average maintainer size for any given open source project is one. That is a much larger debate though. Realistically it makes a lot of sense, particularly for larger companies, to use open source software, Postgres included, because it accelerates their time to innovation. They don’t need to worry about developing a database, for example. And if they’re using Postgres and they decide they want time series data, they don’t need to worry about migrating to a time series database when they can just use Timescale.

However, “are they contributing back to those projects?” becomes a really big question. I think the next questions that Floor would like to lead us to, and I’m just going to take the reins here, Floor —

Floor: Are you taking my job??

Celeste: Hardly, hardly, I could never! My understanding of why we’re having this series of conversations that’s around the sustainability of the Postgres extensions ecosystem, is that there’s a governance question there as well. For the end user, the ideal state for any Postgres extension is that they’re blessed and vetted by the central project. But as soon as you start doing that, you start realizing how limited the resources in even a massive project like Postgres are. And then you start asking: Where should those people be coming from? And then you start thinking: There are companies like Microsoft out there in the world that are hiring a lot of open source contributors, and that’s great, but… What about the governments? What about the universities? What about the smaller companies? The real issue is the manpower and there’s only so far you can go, as a result of that. There are always sustainability issues around all open source, including Postgres extensions, that come down to the sustainability of open source as a whole and whether or not this is a reasonable way of developing software. Sorry to get deep. [Laughs.]

Floor: Yeah, I think these are discussions that we’re definitely having a lot in the open source community, and in the hallway at a lot of conferences.

We’re gonna open it up to audience questions too in a minute. So if people want to continue talking about the drama that is open source and sustainable open source, we can definitely continue this discussion.

Maybe going back a little bit, Alexey, can we talk a little bit about — because you’re also a provider — what your definition of “done” is or what you wanna offer your users at minimum when you do decide to open-source some of your stuff or make available some of your stuff?

Alexey: As an open source company, what we do, we just publish our code on GitHub and that’s it. It’s open source, that’s done. Knock yourself out and if you want some support, you just pay us, and then we will. That’s how we make money. Well, of course not. It’s more complicated than that, and I wish it was like that to some degree, sometimes. Now there are still a lot of users who just come and ask questions for free, and you want to support them because you want to increase adoption and all that.

The same with extensions. So as I just described the situation, of course, that was a bit like, not to provoke a discussion, but, let’s say you built a PostgreSQL extension, you need to have some hooks in the core that ideally would be stable, don’t change between versions as we discussed. That’s a bit of a problem. PostgreSQL has no separation between private and public API. Then how do you install? You need to package it some way that is the same way as your current PostgreSQL version is packaged. There is no easy way, for example, to extend a version of PostgreSQL, which is a part of Docker, you just build your own container.

Celeste: I’ll segue into the point that I think I was supposed to make when we were talking about extensions ecosystem, as opposed to a rant about the sustainability of open source, which I am unfortunately always down to give. Here’s the thing with extensions ecosystems. For the end user, it is significantly more beneficial if those extensions are somehow centrally-controlled. If you think about something like RubyGems or the Python package installer or even Docker to a certain extent, those are all ways of centralizing. Though with some of the exploits that have gone on with NPM recently, there are obviously still problems there.

I mentioned, there’s always staffing problems when it comes to open source. Assigning somebody to approve every single extension under the sun isn’t really sustainable from a human perspective. The way that we handle this in the Kubernetes community — particularly the container network interfaces, of which there are many, many, many — is we effectively manage it with governance. We have a page on the documentation in the website that says: here are all the container network interfaces that have chosen to list themselves with us. The listings are alphabetical, so there is no order of precedence.

The community does not take responsibility for this code because we simply cannot. In being a container network interface, it means that they implement certain functionalities, like an interface in the programming sense. We just left it at that. That was the solution that the Kubernetes community came to. I don’t know if that’s the solution that the Postgres community will eventually come to, but community governance is a huge part of the solution to that problem, in my opinion.

Alexey: I think one big difference between NPM and NodeJS ecosystem in general, and, for example, Postgres extensions, is that NPM was so popular and there are so many packages mostly because NodeJS by itself is quite small. The core of NodeJS is really, really small. There is no standard library and a lot of functionality is external. So I would say as long as your core, like PostgreSQL or Ruby or Kubernetes is large enough, the amount of extensions will be limited just by that. Because many people will not use any extensions, they will just use the core. That could solve a problem of vetting and name-squatting, just by itself. I would say PostgreSQL more or less solves this problem to some degree.

Floor: Before we open up for some questions from participants, Sonia, in a previous call, shared a little bit of a horror story with us, with wanting to use a certain extension and not being able to. I think this is something that other people can resonate with, having been through a similar thing. Let’s hear that story. And then, of course, Celeste, Alexey, if you have similar stories, do share before we open up for questions from the rest of the peeps joining here.

Sonia: So there was this requirement to transfer data from one database to another database, specifically with respect to PostgreSQL. I wanted to transfer the data from the production environment to some other environment, or internally within the non-production environments. I created this extension called dblink. I’m talking about way back, 2012, 2013, somewhere, when I started working with PostgreSQL, I used that extension. When you configure that extension, we need to give the credentials in a human readable format. And then, at times it also gets stored in the logs or somewhere.

I mean, even if it is not storing the logs, what the security team or the audit team mentioned was that since it is using the credentials in a human readable format, this is not good. And if somebody has access to X database, they also get the access to the Y database or the Y cluster. And what if it goes to the production environment and then somebody can just steal the data, without us even knowing it. It’ll not get logged inside the logs, that somebody has accessed my production database via non-production database. So that’s not good, and was not acceptable by the auditors.

I love that extension today also, because without doing any scripting or anything, you just access one database from another database and then get whatever you want. As a developer, it might be very easy for me to use that thing. But then for another person who is trying to snoop into your production database or any other data, it’s easy for them too. So we were asked not to use that extension specifically, at least not to connect to the production environment.

I was working for a taxation project. It was a financial critical data, and they did not want it to have any risk of anybody reaching to that data because it was the numbers, the financial figures, and was critical. So that’s the reason we were refrained from using it for that particular project. But then other projects, which were not that critical, I somehow managed to convince them to use it. [Laughs.]

Floor: So it’s sometimes you will choose it for convenience and it’s acceptable risk, and then there might be restrictions from other teams as well. Thanks for sharing that. If anyone wants to un-mute and ask questions or share their own horror stories, you’re now very welcome to.

Yurii: There was a really good point about extensions being available as part of your operating system environment, for example Ubuntu packages or Red Hat packages. This is where we still have a lot of difficulty in general, in this ecosystem. Obviously PGDG is doing an amazing job capturing a fraction of those extensions. But because it is a complicated job, oftentimes unpaid, people are trying to make the best out of it. On the one hand, it does serve as a filter, as in only the best of the best extensions that people really use get through that filter and become part of PGDG distribution. But it also creates an impediment. For example, PGDG is not always able to update them as the releases come out. Oftentimes people do need the latest, the best releases available, and not when the packagers have time.

The other problem is how do extensions become popular if they’re not there in the first place? It creates that kind of problem where you’re stuck with what you have. And there’s a problem with discovery: how do I find them? And how do I trust this build? Or can I even get those builds for my operating system?

Obviously there are some efforts that try to mitigate that by building a docker container and you run them with just copies of those files. But obviously there’s a demand for a native deployment method. That is, if I deploy my Postgres this way — say using RPM in my Red Hat-based distro, or Debian based — I want everything else to fall into that. I don’t want a new system.

I think we still have a lot of work to do on that end. I’ve been putting some effort on our end into finding how we can save packagers’ time and decrease the amount of work that needs to be done. Can we go essentially from “here’s the URL for the extension, figure it out”? For like 80% of them, can we just figure them out and package them automatically, and repackage them when new versions come out, and only assign people to the remaining 20% that are not building according to a certain convention and so need some attention?

This way we can get more extensions out and extract more value out of these extensions. By using them, we’re helping the authors gain a wider audience and effectively create value for everybody in the community. Otherwise, they would feel like, “I can’t really promote this as well as I would’ve loved to, like in other ecosystems.” RubyGems was mentioned today, and NPM, etc. It’s easy to get your stuff out there. Whereas in the Postgres community, it is not easy to get your stuff out there. Because there are so many risks associated with that, we are oftentimes working with production data, right?

We need to make sure there is less friction on any other side. We need to get these extensions to get considered. That’s at least one of the points that I wanted to mention. I think there’s a lot to be done and I really hope that the conference next month in Montréal will actually be a great place to get the best minds together again and hash out some of the ideas that we’ve been discussing in the past number of months.

Floor: David, do you wanna ask your question of where people go to learn more about extensions and find their extensions?

David: This is something that I tried to solve a while ago with a modicum of success — a bit. My question is, where do you all go to learn more about extensions? To find out what extensions are available or, is there an extension that does X, Y, Z? How do you find out if there is and, then evaluate it? Where do you go?

Alexey: I generally just search, I guess. I don’t go to any particular place. Quite often I learn of them from some blog post, and sometimes on GitHub itself.

Celeste: If you think about that project-level activity proof, and then the social proof, I think that Postgres actually has a really unique advantage compared to a lot of other open source projects because it’s been going for so long and because there is a very entrenched community. It’s very easy to find social proof for basically anything Postgres-related that you might want.

If you do a search for, like, “I want a Postgres extension that does X”, you’re going to get comparatively better Google search results because there’s years and years and years of search results in some cases. However, that does come with the equal and opposite problem of when you have maintenance issues, because things have been going for years and years, and you don’t know whether things have been maintained or not.

I’m thinking about this from an open source management perspective, and as somebody who is not necessarily involved in the open source development of Postgres. I think there is a case that you could make for some amount of community vetting of some extensions and publicizing that community-vetting, and having a small subset of — this has some sort of seal of approval, it’s not gonna like nuke your database. To a certain extent, I think Postgres already does that, because it does ship with a set of extensions by default. In shipping with those extensions, it’s effectively saying the upstream Postgres community blesses these, such that we will ship Postgres with them because we are pretty confident that these are not going to nuke your database.

When I was at the CNCF, I supported a whole bunch of different open source projects. I was everybody’s documentation girl. So I’m trying to throw things at them and then hopefully you can talk about them in Montréal and maybe something useful will come of it. Another thing that you can use is almost like an alpha beta experimental sort of feature where you define some set of criteria for something being alpha or experimental, you define some set of criteria that if met, they can call themselves beta, you define some set of criteria of something being “production ready” for an extensions ecosystem. Then you can have people submit applications and then it’s less of a mad rush.

I guess if I had any advice — not that Postgres needs my charlatan advice — it would be to think about how you wanna manage this from a community governance perspective, or else you will find yourself in utter mayhem. There’s a reason that the Kubernetes container network interface page specifies that things have to be listed in alphabetical order. It’s because there was mayhem until we decided to list things in alphabetical order. It seems completely silly, but it is real. [Laughs.]

Alexey: So my next project is going to start with “aa”.

Sonia: Yeah, what Celeste said. I will research about it online, normally, and I will find something and, if I get lots of options for doing X thing, a lot of extensions, I will go and search the documentation on postgresql.org and then try to figure out which one is the one to start with my POC.

Celeste: Let me flip the question for you, Sonia. In an ideal world. If you were to try and find an extension to use for a particular task, how would you find that extension?

Sonia: Normally I will research it, Google it most of the times, and then try to find out —

Celeste: But pretend you don’t have to Google it. Pretend that maybe there’s a website or a resource. What would your ideal way of doing that be? If you had some way that would give you more of a guarantee that it was trustworthy, or would make it easier to find, or something. Would it be a tool like RubyGems? Would it be a page on the Postgres website’s documentation?

Sonia: Page! The PostgreSQL website documentation. The Postgres documentation is like a Bible for me, so I keep researching on that. In fact, previously when you used to Google out anything, you used to get the initial link as the postgresql.org, the website. Nowadays you don’t get the link as a first link, but then I will scroll down to the page. I will try to figure out where it is postgresql.org and then go there. That’s the first thing. Now since I’ve been into the field, since a very long time, then I know, okay, this website is authentic, I can go and check out the blogs, like who else has used it or what is their experience or things like that.

Jay Miller: I have to ask this only because I am new to thinking about Postgres outside of how I interact with it from a web developer’s perspective. Usually I use some ORM, I use some module. I’m a Python developer, so I use Python, and then from there, I don’t think about my database ever again.

Now I want to think about it more. I want to have a very strong relationship with it. And we live in a world where you have to say that one of the answers is going to be AI. One of the answers is I search for something, I get some AI response, and, and here’s like the…

David in comments: SLOP.

Jay: Exactly, this is the problem. If I don’t know what I should do and I get a response, when the response could have just been, “use this extension, it does everything you need to do and it makes your life so much easier.” Instead, I wind up spending days, if not weeks, going in and fighting against the system itself. Sonia, you mentioned having that experience. The idea or the ability to discern when to go with some very kludgey PostgreSQL function that makes your life miserable, to, “oh, there’s an extension for this already! I’m just going to use that.” How do you expose that to people who are not dumb, they’re not vibe coding, they just finally have a reason to actively think about what their database is doing behind the scenes.

Sonia: If I understood your question correctly, you wanted to explore what kind of activities a specific extension is doing.

Jay: I would just love the like, “hey, you’re trying to do a thing, this has already been solved in this extension over here, so you don’t have to think about it.” Or “you’re trying to do something brand new, no one’s thought about this before, or people have thought about it before and talked about how much of a pain it is. Maybe you should create an extension that does this. And here’s the steps to do that.” Where is the proper documentation around coming to that decision, or the community support for it?

Sonia: That’s a great question to discuss inside the community, to be honest. Like, how do we go about that?

David: Come to Montréal and help us figure it out.

Jay: I was afraid of that answer. I’ll see you in New York, or hopefully Chicago on Friday.

Floor: Fair enough, but definitely a wonderful question that we should note down for the discussion.

Sonia: One thing which I want to add, this just reminded me of. There was one podcast which I was listening with Robert Haas. The podcast is organized by one of the Microsoft folks. The podcast was revolving around how to commit inside the PostgreSQL, or how to read what is written inside the PostgreSQL and the ecosystem around that. The questions were related to that. That could also help. And of course, definitely when you go to a conference, which we are discussing at the moment, there you’ll find a good answer. But listening to that podcast will help you give the answers to an extent.

Floor: I think that’s Talking Postgres with Claire Giordano, or if it was the previous version, it was the “Path to Citus Con”, because that was what it was called before.

David: The summit that’s in Montréal on May 13th is an unconference session. We have a limited amount of time, so we want to collect topic ideas and ad hoc votes for ideas of things to discuss. Last year I used a website with Post-Its. This year I’m just trying a spreadsheet. I posted a link to the Google Sheet, which anybody in the world can access and pollute — I mean, put in great ideas — and star the ideas they’re really interested in talking about. And I’d really appreciate people contributing to that. Good topics came up today! Thank you.

Floor: Thanks everyone for joining us. Thank you to our panelists specifically, for sharing their experiences.

Update Your Control Files

2025-04-28T20:08:49Z

Reviews of the extension search path patch, now committed and slated for PostgreSQL 18, revealed a few issues with extension configuration. Based on the ensuing discussion, and even though PostgreSQL 18 will include workarounds, it’s best to make adjustments to the extensions you maintain, the better to serve existing PostgreSQL versions and to hew closer to best practices.

Thus, a couple of recommendations for extension maintainers.

Remove the $libdir/ prefix from the module_pathname directive in the control file. The $libdir/ requires extension modules to live in pkglibdir (see pg_config), and no other directories included in dynamic_library_path, which limits where users can install it. Although PostgreSQL 18 will ignore the prefix, the docs will also no longer recommend it.
Remove the directory parameter from the control file and the MODULEDIR directive from the Makefile. Honestly, few people used these directives, which installed extension files in subdirectories or even completely different absolute directories. In some cases they may have been useful for testing or extension organization, but the introduction of the extension search path covers those use cases.
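
As a rough illustration of the control file changes, assuming a hypothetical extension named my_ext whose control file still carries the old settings, the cleanup might be scripted like this. The file name, its contents, and the sed expressions are assumptions made for the example, not taken from any real extension:

```shell
# Hypothetical control file for an extension named "my_ext" (illustration only).
cat > my_ext.control <<'EOF'
comment = 'Example extension'
default_version = '1.0.0'
module_pathname = '$libdir/my_ext'
directory = 'my_ext'
relocatable = true
EOF

# Strip the $libdir/ prefix from module_pathname and drop the directory line.
# Write to a temporary file instead of using sed -i, whose syntax differs
# between GNU and BSD sed.
sed -e 's|^\(module_pathname[[:space:]]*=.*\)[$]libdir/|\1|' \
    -e '/^directory[[:space:]]*=/d' \
    my_ext.control > my_ext.control.new
mv my_ext.control.new my_ext.control

# Show the result for review before committing it.
cat my_ext.control
```

The MODULEDIR line in the Makefile can be removed the same way, or simply by hand.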

These changes will future-proof your extensions and make them better ecosystem citizens. Plus, they clean out some otherwise funky configurations that just aren’t necessary. Make the changes today — and while you’re at it, test your extensions with PostgreSQL 18 pre-releases!

Look, I’ll go first.

Mini Summit 4: The User POV

2025-04-21T17:26:52Z

And we’re back.

Join us this Wednesday, April 23 at noon America/New_York (16:00 UTC) for Extension Mini Summit #4, where our panel consisting of Celeste Horgan (Aiven), Sonia Valeja (Percona), and Alexey Palazhchenko (FerretDB) will discuss “The User POV”. This session will be a terrific opportunity for those of us who develop extensions to get an earful from the people who use them, in both anger and joy. Bang on the Meetup to register for this live video session.

Fix Postgres `strchrnul` Compile Error on macOS 15.4

2025-04-16T19:03:26Z

Just a quick note to users of pgenv and anyone else who compiles Postgres on macOS. In macOS 15.4, Apple introduced a new API, strchrnul, which is common on other platforms. As a result, attempting to compile Postgres on 15.4 and later will lead to this error:

snprintf.c:414:27: error: 'strchrnul' is only available on macOS 15.4 or newer [-Werror,-Wunguarded-availability-new]
  414 |                         const char *next_pct = strchrnul(format + 1, '%');
      |                                                ^~~~~~~~~
snprintf.c:366:14: note: 'strchrnul' has been marked as being introduced in macOS 15.4 here, but the deployment target is macOS 15.0.0
  366 | extern char *strchrnul(const char *s, int c);
      |              ^
snprintf.c:414:27: note: enclose 'strchrnul' in a __builtin_available check to silence this warning

Tom Lane chased down and committed the fix, which will be in the next releases of Postgres 13-17. It should also go away once macOS 16.0 comes out. But in the meantime, set MACOSX_DEPLOYMENT_TARGET to the current OS release to avoid the error:

export MACOSX_DEPLOYMENT_TARGET="$(sw_vers -productVersion)"

If you use pgenv, you can add it to your configuration. It will need to be added to all the version configs, too, unless they don’t exist and you also set:

PGENV_WRITE_CONFIGURATION_FILE_AUTOMATICALLY=no

Mini Summit 3 Transcript: Apt Extension Packaging

2025-04-14T22:48:22Z

Last week Christoph Berg, who maintains PostgreSQL’s APT packaging system, gave a very nice talk on that system at the third PostgreSQL Extension Mini-Summit. We’re hosting five of these virtual sessions in the lead-up to the main Extension Summit at PGConf.dev on May 13 in Montréal, Canada. Check out Christoph’s session on April 9:

Video
Slides

There are two more Mini-Summits coming up:

April 23: The User POV. Join our panel of extension users for a lively discussion on tool choice, due diligence, and their experience running extensions.
May 7: Extension Management in CloudNativePG. CNPG maintainer Gabriele Bartolini will talk about recent developments in extension management in this official CNCF project.

Join the Meetup to attend!

And now, without further ado, thanks to the efforts of Floor Drees, the thing you’ve all been waiting for: the transcript!

Introduction

David Wheeler introduced the organizers:

David Wheeler, Principal Architect at Tembo, maintainer of PGXN
Yurii Rashkovskii, Omnigres
Keith Fiske, Crunchy Data
Floor Drees, Principal Program Manager at EDB, PostgreSQL CoCC member, PGDay Lowlands organizer

Christoph Berg, PostgreSQL APT developer and maintainer par excellence, talked through the technical underpinnings of developing and maintaining PostgreSQL and extension packages.

The stream and the closed captions available for the recording are supported by PGConf.dev and its gold level sponsors: Google, AWS, Huawei, Microsoft, and EDB.

APT Extension Packaging

Speaker: Christoph Berg

Hello everyone. So what is this about? It’s about packaging things for PostgreSQL for Debian distributions. We have PostgreSQL server packages, extension packages, application packages and other things. The general workflow is that we are uploading packages to Debian unstable first. This is sort of the master copy, and from there things eventually get to Debian testing. Once they’re released, they end up in Debian stable.

Perhaps more important for today’s view is that the same package is then also rebuilt for apt.postgresql.org for greater coverage of Postgres major versions. And eventually the package will also end up in an Ubuntu release because Ubuntu is copying Debian unstable, or Debian testing, every six months and then doing their release from there. But I don’t have any stakes in that.

For an overview of what we are doing in this Postgres team, I can just briefly show you this overview page. That’s basically the view of packages we are maintaining. Currently it’s 138, mostly Postgres extensions, a few other applications, and whatever comes up in the Postgres ecosystem.

To get a bit more technical let’s look at how the Debian packages look from the inside.

We have two sorts of packages. We have source packages, which are the source of things that are built. The way it works is that we have a directory inside that source tree called debian, which has the configuration bits about what the created packages should look like. And from this the actual binary packages, the .deb files, are built.

Over the past years, I’ve got a few questions about, “how do I get my application, my extension, and so on packaged?” And I wrote that down as a document. Hopefully to answer most of the questions. And I kind of think that since I wrote this down last year, the questions somehow stopped. If you use that document and like it, please tell me because no one has ever given me any feedback about that. The talk today is kind of loosely based on this document.

I’m not going to assume that you know a whole lot of Debian packaging, but I can’t cover all the details here, so I’ll keep the generic bits a bit superficial and dive a bit more into the Postgres-specific parts.

Generally, the most important file in the Debian package is this Debian control file, which describes the source and the binary packages. This is where the dependencies are declared. This is where the package description goes, and so on. In the Postgres context, we have the first problem that we don’t want to encode any specific PG major versions inside that control file, so we don’t have to change it each year once a new Postgres version comes out.

This is why, instead of a Debian control file, we actually have a debian/control.in file, and then there’s a tool called pg_buildext, originally written by Dimitri Fontaine, one or two decades ago, and then maintained by me and the other Postgres maintainers since then. That tool is, among other things, responsible for rewriting that control.in file to the actual control file.

I just picked one random extension that I happen to have on the system here. This postgresql-semver extension, the upstream author is actually David here. In this control file we say the name of the package, the name of the Debian maintainer — in this case the group — there’s a few uploaders, there’s build dependencies and other things that are omitted here because the slide was already full. And then, next to this source section, we have a package section and here we have this placeholder: postgresql-PGVERSION-semver.

Once we feed this control.in file through this pg_buildext tool, it’ll generate the control file, which expands this PGVERSION placeholder to actually a list of packages. This is just a mechanical translation; we have postgresql-15-semver, 16, 17 and whatever other version is supported at that point.

Once a new PostgreSQL version is released, say when PostgreSQL 18 comes out, we don’t have to touch anything in this control.in file. We just rerun this pg_buildext updatecontrol command, and it’ll automatically add the new package.
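
The mechanical translation described here can be illustrated with a short Python sketch. It is not how pg_buildext is implemented; the function name expand_control, the trimmed sample stanzas, and the version list are assumptions made purely for illustration.

```python
def expand_control(template: str, versions: list[str]) -> str:
    """Expand PGVERSION placeholders in a control.in-style template.

    Stanzas (separated by blank lines) without the placeholder are copied
    once; stanzas containing it are repeated once per major version.
    """
    out = []
    for stanza in template.strip().split("\n\n"):
        if "PGVERSION" in stanza:
            out.extend(stanza.replace("PGVERSION", v) for v in versions)
        else:
            out.append(stanza)
    return "\n\n".join(out) + "\n"


# Hypothetical, heavily trimmed control.in content.
CONTROL_IN = """\
Source: postgresql-semver
Section: database

Package: postgresql-PGVERSION-semver
Architecture: any
"""

if __name__ == "__main__":
    # Adding "18" to the list is all it takes to get a new binary package.
    print(expand_control(CONTROL_IN, ["15", "16", "17", "18"]))
```

The point of the sketch is that the template never changes; only the list of supported versions does.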

There are about half a dozen layers talking to each other when building a package. On the lowest level, Debian packages are actually ar archives, the one of library fame, with yet another archive inside called control.tar.xz or something. But no one actually touches it at that level anymore.

We have dpkg on top of that, which provides some building blocks for creating actual Debian packages. So you would call dpkg-deb and other dpkg helpers to actually create a package from that. But because this is complicated, there’s yet another level on top of that, called debhelper. This is the actual standard for building Debian packages nowadays. So instead of invoking all the dpkg tools directly, everyone uses the debhelper tools, which provide some wrappers for the most common build steps that are executed. I will show an example in a second.

Next to these wrappers for calling “create me a package”, “copy all files”, and so on, there’s also this program called dh, it’s called a sequencer because it’ll invoke all the other tools in the correct order. So let me show you an example before it gets too confusing. The top level command to actually build a Debian package — to create the binary packages from the source package — is called dpkg-buildpackage. It will invoke this debian/rules file. The debian/rules file is where all the commands go that are used to build a package. For historical reasons it’s a Makefile. In the shortest incantation it just says, “for any target that is called, invoke this dh sequencer with some arguments.”

Let me skip ahead one more slide and if we’re actually running it like that, it kind of looks like this. I’m invoking dpkg-buildpackage, dpkg-buildpackage invokes debian/rules with a target name, debian/rules invokes dh, and dh then calls all the helper steps that are required for getting the package to run. The first one would be dh_update_autotools_config, so if any ancient autoconf things are used, they’ll be updated. The package will be reconfigured, and then it will be built and so on.

This was the generic Debian part. Postgres actually adds more automation on top of that. This is this “dh with pgxs step.” Let me go back two slides. We have this pgxs plugin for debhelper which adds more build steps that actually call out to this tool called pg_buildext, which interfaces with the pgxs build system in your extension package. Basically debhelper calls this pgxs plugin, and this pgxs plugin calls pg_buildext, and this one finally invokes the make command, including any PG_CONFIG or whatever settings that are required for compiling this extension.

If we go back to the output here, we can see that one of the steps here is actually invoking this pg_buildext tool and pg_buildext will then continue to actually compile this extension.

This means in the normal case for extensions that don’t do anything special, you will actually get away with a very short debian/rules file. Most of the time it’s just a few lines. In this case I added more configuration for two of the helpers. In this step, I told dh_installchangelogs that, in this package, the changelog has a file name that dh_installchangelogs doesn’t automatically recognize. Usually if you have a file called changelog, it will be automatically picked up. But in this case I told it to use this file. Then I’m telling it that some documentation file should be included in all packages. Everything else is standard and will be picked up by the default Debian tool chain.

Another thing specific for the Postgres bits is that we like to run the package tests at build time. One of the build steps that gets executed is this dh_pgxs test wrapper, which in turn invokes pg_buildext installcheck. That will create a new Postgres cluster and proceed to invoke pg_regress on that package. This is actually the place where this patch that Peter was talking about two weeks ago is coming into play.

The actual call chain of events is that dh_pgxs starts pg_buildext installcheck, pg_buildext starts pg_virtualenv, which is a small wrapper shipped with Debian — but not very specific to Debian — that just creates a new Postgres environment and then executes any command in that environment. This is actually very handy to create test instances. I’m using that all day. So if anyone is asking me, “can you try this on Postgres 15?” or something, I’m using pg_virtualenv -v 15 to fire up a temporary Postgres instance. I can then play with it, break it or something, and, as soon as I exit the shell that pg_virtualenv opens, the cluster will be deleted again.

In the context of pg_buildext, what pg_virtualenv is doing here is that it’s calling pg_createcluster to actually fire up that instance and it’s passing an option to set this extension_control_path to the temporary directory that the extension was installed to during the build process. While we are compiling the package, the actual install command is invoked, but it does not write to /usr/share/postgresql or something, but it writes to a subdirectory of the package build directory. So it’s writing to debian/$PACKAGE/$THE_ORIGINAL_PATH.
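
To make the staging layout concrete, here is a minimal Python sketch of the debian/$PACKAGE/$THE_ORIGINAL_PATH mapping. The helper name staged_path and the sample file path are hypothetical; the sketch only shows how an absolute install path ends up below the package build directory.

```python
from pathlib import PurePosixPath


def staged_path(package: str, original: str) -> PurePosixPath:
    """Map an absolute install path to its location in the build tree."""
    # Drop the leading slash so the path nests under debian/<package>/.
    return PurePosixPath("debian") / package / original.lstrip("/")


if __name__ == "__main__":
    # Illustrative file name; the real list of installed files depends
    # on the extension.
    print(staged_path(
        "postgresql-17-semver",
        "/usr/share/postgresql/17/extension/semver.control",
    ))
```

A test cluster that wants to load the freshly built extension therefore has to be pointed at the staged directory rather than at the system-wide one.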

And that’s why before we had this in Postgres 18, the Debian packages had a patch that does the same thing as this extension_control_path setting. It was called extension_destdir. It was basically doing the same thing except that it was always assuming that you had this structure of some prefix and then the original path. The new patch is more flexible than that: it can be an arbitrary directory. The old extension_destdir patch assumes that it’s always /$something/usr/share/postgres/$something. I’m glad that that patch finally went in and we can still run the tests at build time.

So far we’ve only seen how to build things for one Postgres version. The reason why this pg_buildext layer is there is that this tool is the one that does the building for each version in turn. So pg_buildext will execute any command passed to it for all the versions that are currently supported by that package. What’s happening here is that we have one source package for each extension covered. And that one source package then builds a separate binary for each of the major versions covered. But it does this from a single build run.

By contrast, Devrim, who builds the RPM packages, is actually invoking the builds several times, separately for each version. We could also have done this, it’s just a design choice that we’ve done it one way round and he’s doing it the other way round.

To tell pg_buildext which versions are supported by the package, there’s a file called debian/pgversions which usually just contains a single line where you can either say, “all versions are supported”, or you can say that “anything, starting 9.1” or “starting PostgreSQL 15 and later” is supported. In this example here, 9.1+ is actually copied from the semver package because the requirement there was that it needs to support extensions, and those were introduced in 9.1. We don’t care about these old versions anymore, but the file was never changed since it was written.
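
As a rough illustration of how such a one-line specification selects versions, consider the following Python sketch. It models only the forms mentioned in the talk ("all" and "N+") plus an exact match; the function name supported_versions is hypothetical, and the real file format may accept more than this.

```python
def supported_versions(spec: str, available: list[str]) -> list[str]:
    """Filter available major versions by a spec: 'all', 'N+', or 'N'."""

    def key(version: str) -> tuple[int, ...]:
        # "9.1" -> (9, 1), "15" -> (15,), so tuples compare numerically.
        return tuple(int(part) for part in version.split("."))

    spec = spec.strip()
    if spec == "all":
        return list(available)
    if spec.endswith("+"):
        minimum = key(spec[:-1])
        return [v for v in available if key(v) >= minimum]
    return [v for v in available if v == spec]


if __name__ == "__main__":
    installed = ["9.0", "9.6", "10", "15", "16", "17"]
    print(supported_versions("9.1+", installed))
    print(supported_versions("15+", installed))
```

The build would then simply loop over the returned list, running the same command once per version.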

We know how to build several Postgres major versions from a source package. Now the next axis is supporting multiple architectures. The build is invoked separately for each architecture. This single source package is compiled several times, once for each architecture. On apt.postgresql.org, we’re currently supporting amd64, arm64 and ppc64el. We used to have s390x support, but I killed that recently because IBM is not supporting any build machine anymore that actually works. Inside Debian there are a lot more architectures supported.

There’s also something called Debian ports, which are not official architectures, but either new architectures that are being introduced like this loong64 thing, or sometimes old architectures that are not official anymore, but are still being kept around like the Sparc one. There’s also some experimental things like hurd-amd64, hurd-i386. That isn’t even Linux. This is a Hurd kernel, but still running everything Debian on top of it, and some time ago it even started to support Postgres. The packages are even passing the tests there, which is kind of surprising for something that hasn’t ever seen any production.

For Postgres 17, it looks like this. The architectures in the upper half of that table are the official ones, and the gray area on the bottom are the unofficial ones that are, let’s say, less supported. If anything breaks in the upper half, maintainers are supposed to fix it. If anything breaks in the lower half, people might care or might not care.

I like to keep it working because if Postgres breaks, all the other software that needs it — like libpq, so it’s not even extensions, but any software that depends on libpq — wouldn’t work anymore if that’s not being built anymore. So I try to keep everything updated, but some architectures are very weird and just don’t work. But at the moment it looks quite good. We even got Postgres 18 running recently. There were some problems with that until last week, but I actually got that fixed on the pgsql-hackers list.

So, we have several Postgres major versions. We have several architectures. But we also have multiple distribution releases. For Debian this is currently sid (or unstable), trixie (currently testing), bookworm, and bullseye; for Ubuntu, plucky, oracular, noble, jammy, and focal — I get to know one funny adjective each year, once Ubuntu releases something new. We’re compiling things for each of those and because compiling things yields a different result on each of these distributions, we want things to have different version numbers so people can actually tell apart where the package is coming from.

Also, if you are upgrading — let’s say from Debian bullseye to Debian bookworm — you want new Postgres packages compiled for bookworm. So things in bookworm need to have higher version numbers than things in bullseye so you actually get an upgrade if you are upgrading the operating system. This means that packages have slightly different version numbers, and what I said before — that it’s just one source package — it’s kind of not true because, once we have new version numbers, we also get new source packages.

But these just differ in a new change log entry. It’s basically the same thing, they just get a new change log entry added, which is automatically created. That includes this plus version number part. What we’re doing is that the original version number gets uploaded to Debian, but packages that show up on apt.postgresql.org have a marker inside the version number that says “PGDG plus the distribution release number”. So for the Ubuntu version, it says PGDG 24.04 or something, and then for Debian it’s plus 120-something.

The original source package is tweaked a bit using this shell script. I’m not going to show it now because it’s quite long, but you can look it up there. This is mostly about creating these extra version numbers for these special distributions. It applies a few other tweaks to get packages working in older releases. Usually we can just take the original source package and recompile it on the older Debians and older Ubuntus. But sometimes build dependencies are not there, or have different names, or some feature doesn’t work. In that case, this generate-pgdg-source has some tweaks, which basically invoke sed commands on the source package to change some minor bits. We try to keep that to a minimum, but sometimes things don’t work out.

For example, when zstd compression support was new in Postgres, compiling the newer Postgres versions for the older releases required some tweaks to disable that on the older releases, because they didn’t have the required libraries yet.

If you’re putting it all together, you get this combinatorial explosion. From one project, postgresql-semver, we get this many builds and each of those builds — I can actually show you the actual page — each of those builds is actually several packages. If you look at the list of artifacts there, it’s creating one package for PostgreSQL 10, 11, 12, and so on. At the moment it’s still building for PostgreSQL 10 because I never disabled it. I’m not going to complain if the support for the older versions is broken at some point. It’s just being done at the moment because it doesn’t cost much.
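
The combinatorial explosion can be made tangible with a little arithmetic. The Python sketch below multiplies out the distribution releases and apt.postgresql.org architectures named in the talk; the Postgres version range (10 through 17) is an assumption, and the result is an upper bound, since not every combination is necessarily built.

```python
from itertools import product

DISTRIBUTIONS = ["sid", "trixie", "bookworm", "bullseye",
                 "plucky", "oracular", "noble", "jammy", "focal"]
ARCHITECTURES = ["amd64", "arm64", "ppc64el"]
PG_VERSIONS = [str(v) for v in range(10, 18)]  # assumption: 10 through 17

# One build run per (distribution, architecture) pair...
builds = list(product(DISTRIBUTIONS, ARCHITECTURES))

# ...and each build run yields one binary package per Postgres major version.
packages = [
    f"postgresql-{pg}-semver ({dist}/{arch})"
    for (dist, arch), pg in product(builds, PG_VERSIONS)
]

if __name__ == "__main__":
    print(len(builds))    # 9 * 3 = 27 build runs
    print(len(packages))  # 27 * 8 = 216 binary packages
```

Under these assumptions, a single extension source accounts for 27 build runs and up to 216 binary packages.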

And that means that, from one source package quite a lot of artifacts are being produced. The current statistics are this:

63355 .deb files
2452 distinct package names
2928 source packages
210 distinct source package names
47 GB repository size

We have 63,000 .deb files. That’s 2,400 distinct package names (so package-$PGVERSION, mostly), built from that many source packages. The actual number of distinct source packages is 210. Let’s say half of that is extensions. Then there’s of course separate source packages for Postgres 10, 11, 12, and so on, and there’s a few application packages. Yeah, in total the repository is 47 gigabytes at the moment.

This is current stuff. All the old distributions are moved to apt-archive.postgresql.org. We are only keeping the latest build inside the repository. So if you’re looking for the second-latest version of something, you can go to apt-archive.postgresql.org. I don’t have statistics for that, but that is much larger. If I had to guess, I would say probably something like 400 gigabytes. I could also be off with that guess.

That was how to get from the source to the actual packages. What we’re doing on top of that is doing more testing. Next to the tests that we are running at build time, we are also running tests at installation time, or once the package is installed we can run tests. For many packages, that’s actually the same tests, just rerun on the actual binaries as installed, as opposed to debian/something. Sometimes it’s also different tests. For some packages it’s just simple smoke tests: did everything get installed to the correct location, and does the service actually start? Sometimes it’s more complex things.

Many test suites are meant to be run at compilation time, but we want to run them at install time. This is kind of make check, make installcheck, but some projects are not really prepared to do that. They really want you to basically compile everything before you can run the test suite. I try to avoid that because things that work at compilation time might not work at install time, because we forgot to install some parts of the build.

I try to get the test suite running with as few compilation steps as possible, but sometimes it just doesn’t work. Sometimes the Makefile assumes that configure was run and that certain variables got substituted somewhere. Sometimes you can get it running by calling make with more parameters, but it tends to break easily if something changes upstream. If you’re an extension author, please think of someone not compiling your software but still wanting to run the tests.

What we’re doing there is to run these tests each month. On each day, each month, a random set of tests is scheduled — that’s three or four per day or something. It’s not running everything each day because if something breaks, I can’t fix 50 things in parallel. You can see the test suite tab there. At the moment, actually everything worked. For example, we could check something…
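
One way to picture this kind of schedule is the Python sketch below. It is not the actual CI configuration; the function monthly_schedule, the 30-day month, and the job names are assumptions. It shuffles the test jobs once and deals them out across the days, so every job runs once a month and each day only gets a handful.

```python
import random


def monthly_schedule(jobs, days=30, seed=None):
    """Spread jobs randomly over the days of a month, one run per job."""
    rng = random.Random(seed)
    shuffled = list(jobs)
    rng.shuffle(shuffled)
    schedule = {day: [] for day in range(1, days + 1)}
    # Deal the shuffled jobs out round-robin so the daily load stays even.
    for index, job in enumerate(shuffled):
        schedule[index % days + 1].append(job)
    return schedule


if __name__ == "__main__":
    jobs = [f"extension-{n}" for n in range(100)]  # hypothetical job names
    plan = monthly_schedule(jobs, seed=2025)
    print(plan[1])  # the three or four jobs that run on day 1
```

With roughly a hundred jobs and thirty days, each day ends up with three or four test runs, which keeps the number of simultaneous failures manageable.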

With this background worker replication status thing, that’s an extension that Magnus wrote some time ago. Everything is running fine, but something was broken in January. Ah, there, the s390x machine was acting up. That was probably a pretty boring failure. Probably something with the network was broken. Not too interesting. This is actually why I shut down this architecture, because the build machine was always having weird problems. This is how we keep the system actually healthy and running.

One thing that’s also catching problems is called debcheck. This is a static installability analysis tool by Debian. You feed it a set of packages and it will tell you if everything is installable. In this case, something was not installable on Debian testing. And — if we scroll down there — it would say that postgresql-10-icu-ext was not installable because this libicu72 package was missing. What happened there is that projects or libraries change their soname from time to time, and in this case, in Debian, ICU was moving from 72 to 76 and I just had to recompile this module to make it work.
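
The idea behind such a check can be sketched in a few lines of Python. This is a deliberately naive illustration, not debcheck’s algorithm: it only looks at direct dependencies by name and ignores versions, alternatives, and conflicts. The function name uninstallable and the sample package set are made up for the example.

```python
def uninstallable(packages: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each package to its direct dependencies missing from the set."""
    problems = {}
    for name, depends in packages.items():
        missing = [dep for dep in depends if dep not in packages]
        if missing:
            problems[name] = missing
    return problems


# Hypothetical snapshot of a suite after ICU moved from 72 to 76.
SUITE = {
    "libicu76": [],
    "postgresql-10": [],
    "postgresql-10-icu-ext": ["postgresql-10", "libicu72"],
}

if __name__ == "__main__":
    print(uninstallable(SUITE))
    # {'postgresql-10-icu-ext': ['libicu72']}
```

Rebuilding the extension against the new library changes its dependency to the package that actually exists in the suite, and the report comes back empty.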

If something breaks, it’s usually on the development suites — sid, trixie, unstable, and testing — the others usually don’t break. If the others break, then I messed something up.

That was a short tour of how the packaging there works. For open issues or pain points that there might be, there are packages that don’t have any tests. If we are looking at, what was the number, 63,000 packages, I’m not going to test them by hand, so we really rely on everything being tested automatically. Extensions are usually very well covered, so there’s usually not a problem.

Sometimes there’s extensions that don’t have tests, but they are kind of hard to test. For example, modules that don’t produce any SQL outputs like auto_explain are kind of hard to test because the output goes somewhere else. I mean, in the concrete case, auto_explain probably has tests, but sometimes it’s things that are not as easily testable as new data types.

Things that usually don’t have tests by nature are GUI applications; any program that opens a window is hard to test. But anything that produces text output is usually something I like to cover. Problems with software that we are shipping and that actually breaks in production are usually in the area where the tests were not existing before.

One problem is that some upstream extensions only start supporting Postgres 18 after the release. People should really start doing that before, so we can create the packages before the 18.0 release. Not sure when the actual best point to start would be; maybe today because yesterday was feature freeze. But sometime during the summer would be awesome. Otherwise Devrim and I will go chasing people and telling them, “please fix that.”

We have of course packages for Postgres 18, but we don’t have extension packages for Postgres 18 yet. I will start building that perhaps now, after feature freeze. Let’s see how much works and how much doesn’t. Usually more than half of the packages just work. Some have trivial problems and some have hard problems, and I don’t know yet if Postgres 18 will be a release with more hard problems or more trivial problems.

Another problem that we’re running into sometimes is that upstream only cares about 64-bit Intel and nothing else. We recently stopped caring about 32 bits for extensions completely. So apt.postgresql.org is not building any extension packages for any 32-bit architectures anymore. We killed i386, but we also killed arm, and so on, on the Debian side.

The reason is that there are too many weird bugs that I have to fix, or at least find, and then chase upstreams about fixing their 32-bit problems. They usually tell me “I don’t have any 32-bit environment to test,” and they don’t really care. In the end, there are no users of most extensions on 32-bit anyway. So we decided that it just doesn’t make sense to fix that. In order to prevent the problems from appearing in the first place, we just disabled everything 32-bit for the extensions.

The server is still being built. It behaves nicely. I did find a 32-bit problem in Postgres 18 last week, but that was easy to fix and not that much of a problem. But my life got a lot better once I started not caring about 32-bit anymore. Now the only problem left is big-endian s390x in Debian, but that doesn’t cause that many problems.

One thing where we are only covering a bit of stuff is if projects have multiple active branches. There are some projects that do separate releases per Postgres major version. For example, pgaudit has separate branches for each of the Postgres versions, so we are tracking those separately, just to make pgaudit available. pg_hint_plan is the same, and this Postgres graph extension thing (Apache AGE) is also the same. This is just to support all the Postgres major versions. We have separate source packages for each of the major versions, which is kind of a pain, but it doesn’t work otherwise.

Where we are not supporting several branches is if upstream is maintaining several branches in parallel. For example, PostGIS is maintaining 3.5, 3.4, 3.3 and so on, and we are always only packaging the latest one. Same for Pgpool, and there’s probably other projects that do that. We just don’t do that because it would be even more packages we have to take care of. So we are just packaging the latest one, and so far there were not that many complaints about it.

Possibly next on the roadmap is looking at what to do with Rust extensions. We don’t have anything Rust yet, but that will probably be coming. It’s probably not very hard; the question is just how much of the build dependencies of the average extension is already covered in Debian packages and how much would we have to build, or do we just go and vendor all the dependencies, or what’s the best way forward?

There’s actually a very small number of packages that are shipped on apt.postgresql.org that are not in Debian for this reason. For example, the PL/Java extension is not in Debian because too many of the build dependencies are not packaged in Debian. I have not enough free time to actually care about those Java things, and I can’t talk Java anyway, so it wouldn’t make much sense anyway.

I hope that was not too much, in the too short time.

Questions and comments

Pavlo Golub: When you show the pg_virtualenv usage, do you use pre-built binaries or do you rebuild every time? Like for every new version you are using?
Christoph: No, no, that’s using the prebuilt binaries. The way it works is, I have many Postgres versions installed on that machine, and then I can just go and say, pg_virtualenv, and I want, let’s say, an 8.2 server. It’s calling initdb on the newer version, it’s actually telling it to skip the fsync — that’s why 8.3 was taking a bit longer, because it doesn’t have that option yet. And there it’s setting PGPORT, PGHOST and so on, variables. So I can just connect and then play with this old server. The problem is that psql broke compatibility at some point, but it’s still working for sending normal commands with modern psql.
Pavlo: For modern psql, yeah. That’s cool! Can you add not only vanilla Postgres, but any other flavors like by EDB or Cybertec or, …?
Christoph: I’ve thought about supporting that; the problem there is that there’s conflicting requirements. What we’ve done on the Cybertec side is that if the other Postgres distribution wants to be compatible with this one, it really has to place things in the same directories. So it’s installing to exactly this location and if it’s actually behaving like the original, it’ll just work. If it’s installing to /opt/edb/something, it’s not supported at the moment, but that’s something we could easily add. What it’s really doing is just invoking the existing tools with enough parameters to put the data directory into some temporary location.
Pavlo: And one more question. You had Go extensions mentioned on your last slide, but you didn’t tell anything about those.
Christoph: Yeah, the story is the same as with Rust. We have not done anything with it yet and we need to explore it.
David Wheeler: Yurii was saying a bit about that in the chat. It seems like the problem is that, both of them expect to download most of their dependencies. And vendoring them swells up the size of the download and since they’re not runtime dependencies, but compile-time dependencies, it seems kind of silly to make packages.
Christoph: Yeah. For Debian, the answer is that Debian wants to be self-contained, so downloading things from the internet at build time is prohibited. The ideal solution is to package everything; if it’s things that are really used only by one package, then vendoring the modules might be an option. But people will look funny at you if you try to do that.
Yurii: I think part of the problem here is that in the Rust ecosystem in particular, it’s very common to have a lot of dependencies, as in hundreds. When you start having one dependency and that dependency brings another dependency. The other part of the problem is that you might depend on a particular range of versions of particular dependencies and others depend on others. Packaging all of that as individual dependencies is becoming something that is really difficult to accomplish. So vendorizing and putting that as part of the source is something that we could do to avoid the problem.
Christoph: Yeah, of course, it’s the easy solution. Some of the programming language ecosystems fit better into Debian than others. So I don’t know how well Rust fits or not.

What I know from the Java world is that they also like to version everything and put version restrictions on their dependencies. But what Debian Java packaging helpers are doing is just to nuke all those restrictions away and just use the latest version and usually that just works. So you’re reducing the problem by one axis by having everything at the latest version. No idea how reasonable the Rust version ranges there are. So if you can just ignore them and things still work, or…
Yurii: Realistically, this is impossible. They do require particular versions and they will not compile oftentimes. The whole toolchain expects particular versions. This is not only dependency systems themselves, it’s also Rust. A package or extension can have a particular demand for a minimum supported Rust version. If that version is not available in a particular distro, you just can’t compile.
Christoph: Then the answer is we don’t compile and you don’t get it. I mean, Rust is possibly still very new and people depend on the latest features and then are possibly just out of luck if they want something on Debian bullseye. But at some point that problem should resolve itself and Rust will get more stable so that problem is not as common anymore.
Yurii: It’s an interesting take actually because if you think about it, the languages that have been around for much longer should have solved this problem. But if you look at, I don’t know, C, C++, so GCC and Clang, right? They keep evolving and changing all the time too. So there’s a lot of code say in C++ that would not compile with a compiler that is older than say, three years. So yeah, but we see that in old languages.
Christoph: Yeah, but Postgres knows about that problem and just doesn’t use any features that are not available in all compilers. Postgres has solved the problem.
Yurii: Others not so much. Others can do whatever they want.
Christoph: If upstream doesn’t care about their users, that’s upstream’s problem.
David: I think if there’s a centralized place where the discussion of how to manage stuff, like Go and Rust do, on packaging systems is happening, I think it’s reaching a point where there’s so much stuff that we’ve gotta figure out how to work up a solution.
Christoph: We can do back ports of certain things in the repository and make certain toolchain bits available on the older distributions. But you have to stop at some point. I’m certainly not going to introduce GCC back ports, because I just can’t manage that. So far we haven’t done much of that. I think Devrim is actually backporting parts of the GIS tool chain, like GDAL and libproj or something. I’ve always been using what is available in the base distribution for that. There is some room for making it work, but it’s always the question of how much extra work we want to put in, how much do we want to deviate from the base distribution, and ultimately also, support the security bits of that.

[David makes a pitch for the next two sessions and thanks everyone for coming].

Mini Summit 3: APT Extension Packaging

2025-04-07T18:33:23Z

Join us this Wednesday, April 9 at noon America/New_York (16:00 UTC) for Extension Mini Summit #3, where Christoph Berg will take us on a tour of the PostgreSQL Global Development Group’s APT repository with a focus on packaging extensions. For those of us foolish enough to consider building our own binary packaging systems for extensions, this will be an essential session. For everyone else, come be amazed by the sheer volume of extensions readily available from the repository. Browse on over to the Meetup to register for this live video conference.

2025 Postgres Extensions Mini Summit Two

2025-04-01T19:32:52Z

Last Wednesday, March 26, we hosted the second of five virtual Extension Mini-Summits in the lead up to the big one at the Postgres Development Conference (PGConf.dev) on May 13 in Montréal, Canada. Peter Eisentraut gave a very nice presentation on the history, design decisions, and problems solved by “Implementing an Extension Search Path”. That talk, plus another 10-15m of discussion, is now available for your viewing pleasure:

Video
Slides

If you’d like to attend any of the next three Mini-Summits, join the Meetup!

Once again, with many thanks to Floor Drees for the effort, here’s the transcript from the session.

Introduction

Floor Drees introduced the organizers:

David Wheeler, Principal Architect at Tembo, maintainer of PGXN
Yurii Rashkovskii, Omnigres
Keith Fiske, Crunchy Data
Floor Drees, Principal Program Manager at EDB, PostgreSQL CoCC member, PGDay Lowlands organizer

Peter Eisentraut, contributor to PostgreSQL development since 1999, talked about implementing an extension search path.

The stream and the closed captions available for the recording are supported by PGConf.dev and their gold level sponsors, Google, AWS, Huawei, Microsoft, and EDB.

Implementing an extension search path

Peter: Thank you for having me!

I’m gonna talk about a current project by me and a couple of people I have worked with, and that will hopefully ship with Postgres 18 in a few months.

So, what do I know about extensions? I’m a Postgres core developer, but I’ve developed a few extensions in my time, here’s a list of extensions that I’ve built over the years.

Some of those are experiments, or sort of one-offs. Some of those are actually used in production.

I’ve also contributed to well-known extensions: orafce; and back in the day, pglogical, BDR, and pg_failover_slots, at EDB, and previously 2ndQuadrant. Those are obviously used widely and in important production environments.

I also wrote an extension installation manager called pex at one point. The point of pex was to do it in one shell script, so you don’t have any dependencies. It’s just a shell script, and you can say pex install orafce and it installs it. This was a proof of concept, in a sense, but was actually quite useful sometimes for development, when you just need an extension and you don’t know where to get it.

And then I wrote, even more experimental, a follow-on project called autopex, which is a plugin module that you load into Postgres that automatically installs an extension if you need it. If you call CREATE EXTENSION orafce, for example, and you don’t have it installed, autopex downloads and installs it. Obviously highly insecure and dubious in terms of modern software distribution practice, but it does work: you can just run CREATE EXTENSION, and it just installs it if you don’t have it. That kind of works.

So anyways, so I’ve worked on these various aspects of these over time. If you’re interested in any of these projects, they’re all under my GitHub account.

In the context of this presentation…this was essentially not my idea. People came to me and asked me to work on this, and as it worked out, multiple people came to me with their problems or questions, and then it turned out it was all the same question. These are the problems I was approached about.

The first one is extension management in the Kubernetes environment. We’ll hear about this in a future talk in this series. Gabriele Bartolini from the CloudNativePG project approached me and said that the issue in a Kubernetes environment is that if you launch a Postgres service, you don’t install packages, you have a pre-baked disk image that contains the software that you need. There’s a Postgres server and maybe some backup software in that image, and if you want to install an extension, and the extension is not in that image, you need to rebuild the image with the extension. That’s very inconvenient.

The ideal scenario would be that you have additional disk images for the extensions and you just somehow attach them. I’m hand waving through the Kubernetes terminology, and again, there will be a presentation about that in more detail. But I think the idea is clear: you want to have these immutable disk images that contain your pieces of software, and if you want to install more of them, you just wanna have these disk images augment ’em together, and that doesn’t work at the moment.

Problem number two is: I was approached by a maintainer of the Postgres.app project, a Mac binary distribution for Postgres. It’s a nice, user-friendly binary distribution for Postgres. This is sort of a similar problem: on macOS you have these .app files to distribute software. They’re this sort of weird hybrid between a zip file with files in it and a directory you can look into, so it’s kind of weird. But it’s basically an archive with software in it. And in this case it has Postgres in it and it integrates nicely into your system. But again, if you want to install an extension, that doesn’t work as easily, because you would need to open up that archive and stick the extension in there somehow, or overwrite files.

And there’s also a tie in with the way these packages are signed by Apple, and if you mess with the files in the package, then the signature becomes invalid. It’s the way it’s been explained to me. I hope this was approximately accurate, but you already get the idea, right? There’s the same problem where you have this base bundle of software that is immutable or that you want to keep immutable and you want to add things to it, which doesn’t work.

And then the third problem I was asked to solve came from the Debian package maintainer, who will also speak later in this presentation series. What he wanted to do was to run the tests of an extension while the package is being built. That makes sense. You wanna run the tests of the software that you’re building the package for in general. But in order to do that, you have to install the extension into the normal file system location, right? That seems bad. You don’t want to install the software into the main system while you’re building it. He actually wrote a custom patch to be able to do that, which then my work was inspired by.

Those are the problems I was approached about.

I had some problems I wanted to solve myself based on my experience working with extensions. While I was working on these various extensions over the years, one thing that never worked is that you could never run make check. It wasn’t supported by the PGXS build system. Again, it’s the same issue.

It’s essentially a subset of the Debian problem: you want to run a test of the software before you install it, but Postgres can only load an extension from a fixed location, and so this doesn’t work. It’s very annoying because it makes the software development cycle much more complicated. You always have to then run make all, make install, make sure you have a server running, make installcheck. And then you would want to test it against various different server versions. Usually you have to run this in some weird loop. I’ve written custom scripts and stuff all around this, but it was never satisfactory. It should just work.

That’s the problem I definitely wanted to solve. The next problem (and these are all subsets of each other) is that if you have Postgres installed from a package, like an RPM package for example, and then you build the extension locally, you have to install the extension into the directory locations that are controlled by your operating system. If you have Postgres under /usr, then the extensions also have to be installed under /usr, whereas you probably want to install them under /usr/local or somewhere else. You want to keep those locally built things separately, but that’s not possible.

And finally — this is a bit more complicated to explain — I’m mainly using macOS at the moment, and the Homebrew package manager is widely used there. But it doesn’t support extensions very well at all. It’s really weird because the way it works is that each package is essentially installed into a separate subdirectory, and then it’s all symlinked together. And that works just fine. You have a bunch of bin directories, and it’s just a bunch of symlinks to different subdirectories and that works, because then you can just swap these things out and upgrade packages quite easily. That’s just a design choice and it’s fine.

But again, if you wanna install an extension, the extension would be its own package, PostGIS for example, and it would go into its own directory. But that’s not the directory where Postgres would look for it. You would have to install it into the directory structure that belongs to the other package. And that just doesn’t work. It just does not fit with that system at all. There are weird hacks at the moment, but it’s not satisfactory. Doesn’t work at all.

It turned out, all of these things have sort of come up over the years and some of these, people have approached me about them, and I realized these are essentially all the same problem. The extension file location is hard-coded to be inside the Postgres installation tree. Here’s an example: it’s usually under something like /usr/share/postgresql/extension/, and you can’t install extensions anywhere else. If you want to keep this location managed by the operating system or managed by your package management or in some kind of immutable disk image, you can’t. And so these are essentially all versions of the same problem. So that’s why I got engaged and tried to find a solution that addresses all of ’em.

I had worked on this already before, a long time ago, and then someone broke it along the way. And now I’m fixing it again. If you go way, way back, before extensions as such existed in Postgres in 9.1, when you wanted to install a piece of software that consists of a shared library object and some SQL, you had to install the shared library object into a predetermined location just like you do now. In addition, you had to run that SQL file by hand, basically, like you run psql -f install_orafce.sql or something like that. Extensions made that a little nicer, but it’s the same idea underneath.

In 2001, I realized this problem already and implemented a configuration setting called dynamic_library_path, which allows you to set a different location for your shared library. Then you can say

dynamic_library_path = '/usr/local/my-stuff/something'

And then Postgres would look there. You just know where the SQL file is, because you run it manually. You would then run

psql -f /usr/local/my-stuff/something/something.sql

That fixed that problem at the time. And when extensions were implemented, I was essentially not paying attention or, you know, nobody was paying attention. Extension support was a really super nice feature, of course, but it broke this previously-available feature: then you couldn’t install your extensions anywhere you wanted to; you were tied to this specific file system location. dynamic_library_path still existed: you could still set it somewhere, but you couldn’t really make much use of it. I mean, you could make use of it for things that are not extensions. If you have some kind of plugin module or modules that install hooks, you could still do that. But not for an extension that consists of a set of SQL scripts and a control file and dynamic_library_path.

As I was being approached about these things, I realized that was just the problem and we should just now fix that. The recent history went as follows.

In April, 2024, just about a year ago now, David Wheeler started a hackers thread suggesting Christoph Berg’s Debian patch as a starting point for discussions. Like, “here’s this thing, shouldn’t we do something about this?”

There was a fair amount of discussion. I was not really involved at the time. This was just after feature freeze, and so I wasn’t paying much attention to it. But the discussion was quite lively and a lot of people pitched in and had their ideas and thoughts about it. And so a lot of important filtering work was done at that time.

Later, in September, Gabriele, my colleague from EDB who works on CloudNativePG, approached me about this issue and said like: “hey, this is important, we need this to make extensions useful in the Kubernetes environment.” And he said, “can you work on this?”

I said, “yeah, sure, in a couple months I might have time.” [Laughs]. But it sort of turns out that, at PGConf.EU we had a big brain trust meeting of various people who basically all came and said, “hey, I heard you’re working on extension_control_path, I also need that!”

Gabriele was there, and Tobias Bussmann from Postgres.app was there, and Christoph, and I was like, yeah, I really need this extension_control_path to make this work. So I made sure to talk to everybody there and make sure that, if we did this, would it work for you? And then we kind of had a good idea of how it should work.

In November the first patch was posted and last week it was committed. I think there’s still a little bit of discussion of some details and we certainly still have some time before the release to fine tune it, but the main work is hopefully done.

This is the commit I made last week. The fact that this presentation was scheduled gave me additional motivation to get it done. I wanna give some credits to people who reviewed it. Obviously David did a lot of reviews and feedback in general. My colleague Matheus, who I think I saw him earlier, he was also here on the call, did help me quite a bit with sort of finishing the patch. And then Gabriele, Marco and Nicolò, who work on CloudNativePG, did a large amount of testing.

They set up a whole sort of sandbox environment making test images for extensions and simulating the entire process of attaching these to the main image. Again, I’m butchering the terminology, but I’m just trying to explain it in general terms. They did the whole end-to-end testing of what that would then look like with CloudNativePG. And again, that will, I assume, be discussed when Gabriele presents in a few weeks.

These are the stats from the patch:

commit 4f7f7b03758

doc/src/sgml/config.sgml                                     |  68 +++++
doc/src/sgml/extend.sgml                                     |  19 +-
doc/src/sgml/ref/create_extension.sgml                       |   6 +-
src/Makefile.global.in                                       |  19 +-
src/backend/commands/extension.c                             | 403 +++++++++++++++++----------
src/backend/utils/fmgr/dfmgr.c                               |  77 +++--
src/backend/utils/misc/guc_tables.c                          |  13 +
src/backend/utils/misc/postgresql.conf.sample                |   1 +
src/include/commands/extension.h                             |   2 +
src/include/fmgr.h                                           |   3 +
src/test/modules/test_extensions/Makefile                    |   1 +
src/test/modules/test_extensions/meson.build                 |   5 +
.../modules/test_extensions/t/001_extension_control_path.pl  |  80 ++++++

The reason I show this is that it’s not big! What I did is use the same infrastructure and mechanisms that already existed for the dynamic_library_path. That’s the code that’s in dfmgr there in the middle. That’s where this little path search is implemented. And then of course, in extension.c there’s some code that’s basically just a bunch of utility functions, like to list all the extensions and list all the versions of all the extensions. Those utility functions exist and they needed to be updated to do the path search. Everything else is pretty straightforward. There’s just a few configuration settings added to the documentation and the sample files and so on. It’s not that much really.

One thing we also did was add tests for this, down there in test_extensions. We wrote some tests to make sure this works. Well, it’s one thing to make sure it works, but the other thing is if we wanna make changes or we find problems with it, or we wanna develop this further in the future, we have a record of how it works, which is why you write tests. I just wanted to point that out because we didn’t really have that before and it was quite helpful to build confidence that we know how this works.

So how does it work? Let’s say you have your Postgres installation in a standard Linux file system package controlled location. None of the actual packages look like this, I believe, but it’s a good example. You have your stuff under the /usr/bin/, you have the shared libraries in the /usr/lib/something, you have the extension control files and SQL files in the /usr/share/ or something. That’s your base installation. And then you wanna install your extension into some other place to keep these things separate. So you have /usr/local/mystuff/, for example.

Another thing that this patch implemented is that you can now also do this: when you build an extension, you can write make install prefix=something. Before you couldn’t do that, but there was also no point because if you installed it somewhere else, you couldn’t do anything with it there. Now you can load it from somewhere else, but you can also install it there — which obviously are the two important sides of that.

And then you set these two settings: dynamic_library_path is an existing configuration setting; you set that to where your lib directory is. And then the extension_control_path is a new setting, the titular setting of this talk, where you tell it where your extension control files are.

There’s these placeholders, $libdir and $system which mean the system location, and then the other locations are your other locations, and it’s separated by colon (and semi-colon on Windows). We had some arguments about what exactly the extension_control_path placeholder should be called and people continue to have different opinions. What it does is it looks in the listed directories for the control file, and then where it finds the control file from there, it loads all the other files.

And there’s a fairly complicated mechanism. There’s obviously the actual SQL files, but there’s also these auxiliary control files, which I didn’t even know existed. So you can have version specific control files. It’s a fairly complicated system, so we wanted to be clear that what is happening is the main control file is searched for in these directories, and then wherever it’s found, that’s where it looks for the other things. You can’t have the control file in one path and then the SQL files in another part of the path; that’s not how it works.
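
The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the behavior as described in the talk, not the C code in extension.c; the function names and the system_dir parameter are hypothetical, and the sketch assumes each path entry is itself the directory that holds the control files.

```python
import os


def find_control_dir(extname, control_path, system_dir, sep=os.pathsep):
    """Return the first directory on control_path holding <extname>.control.

    control_path mimics extension_control_path; system_dir stands in for the
    compiled-in location that the $system placeholder refers to (assumption).
    """
    for entry in control_path.split(sep):
        if not entry:
            continue
        if entry.startswith("$system"):
            entry = system_dir + entry[len("$system"):]
        if os.path.isfile(os.path.join(entry, extname + ".control")):
            return entry
    return None


def script_file(extname, version, control_path, system_dir, sep=os.pathsep):
    """Resolve the install script next to the control file that was found."""
    directory = find_control_dir(extname, control_path, system_dir, sep)
    if directory is None:
        return None
    # Scripts are only looked up where the control file lives, never in
    # other entries of the path.
    return os.path.join(directory, f"{extname}--{version}.sql")
```

With a path such as /usr/local/mystuff/extension:$system, a control file found in the first directory wins, and the matching SQL script is then expected in that same directory rather than anywhere else on the path.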

That solves problem number five. Let’s see what problem number five was. I forgot [Chuckles]. This is the basic problem, that you no longer have to install the extensions in the directories that are ostensibly controlled by the operating system or your package manager.

So then how would Debian packaging use this? I got this information from Christoph. He figured out how to do this. He just said, “Oh, I did this, and that’s how it works.” During packaging, the packaging scripts that build the packages just pass these:

PKGARGS="--pgoption extension_control_path=$PWD/debian/$PACKAGE/usr/share/postgresql/$v/extension:\$system
--pgoption dynamic_library_path=$PWD/debian/$PACKAGE/usr/lib/postgresql/$v/lib:/usr/lib/postgresql/$v/lib"

These options set the control path and the dynamic_library_path and these versions and then it works. This was confirmed that this addresses his problem. He no longer has to carry his custom patch. This solves problem number three.

The question people ask is, “why do we have two?” Or maybe you’ve asked yourself that. Why do we need two settings? We have the dynamic_library_path, we have the extension_control_path. Isn’t that kind of the same thing? Kind of, yes! But in general, it is not guaranteed that these two things are in a fixed relative location.

Let’s go back to our fake example. We have the libraries in /usr/lib/postgresql and the SQL and control files in /usr/share/postgresql, for example. Now you could say, why don’t we just set it to /usr? Or, for example, why don’t we just set the path to /usr/local/mystuff and it should figure out the sub directories. That would be nice, but it doesn’t quite work in general because it’s not guaranteed that those are the subdirectories. There could be, for example, lib64, right? Or some other architecture-specific subdirectory names. Or people can just name them whatever they want. So, this may be marginal, but it is possible. You need to keep in mind that the subdirectory structure is not necessarily fixed.

So we need two settings. The way I thought about this, if you compile C code, you also have two settings. And if you think about it, it’s exactly the same thing. When you compile C code, you always have to do -I and -L: I for the include files, L for the lib files. This is basically the same thing. The include file is also the text file that describes the interfaces and the libraries are the libraries. Again, you need two options, because you can’t just tell the compiler, oh, look for it in /usr/local because the subdirectories could be different. There could be architecture specific lib directories. That’s a common case. You need those two settings. Usually they go in parallel. If somebody has a plan on how to do it simpler, follow up patches are welcome.
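
A small Python sketch can make the point concrete. The Layout class, its field names, and the two example layouts are made up for illustration; only the two setting names and the $libdir and $system placeholders come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Where one installation keeps its files, relative to a prefix."""
    prefix: str
    libdir: str      # subdirectory holding shared libraries
    controldir: str  # subdirectory holding control and SQL files

    def settings(self):
        # Put the extra location first and keep the built-in one as a fallback.
        return {
            "dynamic_library_path": f"{self.prefix}/{self.libdir}:$libdir",
            "extension_control_path": f"{self.prefix}/{self.controldir}:$system",
        }


# Two hypothetical layouts under the same prefix.
plain = Layout("/usr/local/mystuff", "lib", "share/extension")
multiarch = Layout("/usr/local/mystuff", "lib64", "share/extension")
```

Both layouts share the prefix and the control file directory, yet they need different dynamic_library_path values, which is why a single prefix setting could not describe them both.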

But the main point of why this approach was taken is also to get it done in a few months. I started thinking about this, or I was contacted about this in September and I started thinking about it seriously in the October/November timeframe. That’s quite late in the development cycle to start a feature like this, which I thought would be more controversial! People haven’t really complained that this breaks the security of extensions or anything like that. I was a little bit afraid of that.

So I wanted to really base it on an existing facility that we already had, and that’s why I wanted to make sure it works exactly in parallel to the other path that we already have, and that has existed for a long time, and was designed for this exact purpose. That was also the reason why we chose this path of least resistance, perhaps.

This is the solution progress for the six problems that I described initially. The CloudNativePG folks obviously have accompanied this project actively and have already prototyped the integration solution. And presumably we will hear about some of that at the meeting on May 7th, where Gabriele will talk about this.

Postgres.app I haven’t been in touch with, but one of the maintainers is here, maybe you can give feedback later. Debian is done as I described, and they will also be at the next meeting, maybe there will be some comment on that.

One thing that’s not fully implemented is the make check issue. I did send a follow-up patch about that, which was a really quick prototype hack, and people really liked it. I’m slightly tempted to give it a push and try to get it into Postgres 18. This is a work in progress, but there’s sort of a way forward. The local install problem I said is done.

Homebrew, I haven’t looked into. It’s more complicated, and I’m also not very closely involved in the development of that. I’ll just be an outsider maybe sending patches or suggestions at some point, maybe when the release is closer and we’ve settled everything.

I have some random other thoughts here. I’m not actively working on these right now, but I have worked on it in the past and I plan to work on it again. Basically the conversion of all the building to Meson is on my mind, and other people’s mind.

Right now we have two build systems: the make build system and the Meson build system, and all the production packages, as far as I know, are built with make. Eventually we wanna move all of that over to Meson, but we want to test all the extensions and if it still works. As far as I know, it does work; there’s nothing that really needs to be implemented, but we need to go through all the extensions and test them.

Secondly — this is optional; I’m not saying this is a requirement — but you may wish to also build your own extensions with Meson. But that’s in my mind, not a requirement. You can also use cmake or do whatever you want. But there’s been some prototypes of that. Solutions exist if you’re interested.

And to facilitate the second point, there’s been the proposal — which I think was well received, but it just needs to be fully implemented — to provide a pkg-config file to build against the server, and cmake and Meson would work very well with that. Then you can just say here’s a pkg-config file to build against the server. It’s much easier than setting all the directories yourself or extracting them from pg_config. Maybe that’s something coming for the next release cycle.

That’s what I had. So extension_control_path is coming in Postgres 18. What you can do is test and validate that against your use cases and help integration into the downstream users. Again, if you’re sort of a packager or anything like that, you know, you can make use of that. That is all for me.

Thank you!

Questions, comments

Reading the comments, where several audience members suggested Peter follows Conference Driven Development, he confirmed that that’s definitely a thing.
Someone asked for the “requirements gathering document”. Peter said that that’s just a big word for “just some notes I have”. “It’s not like an actual document. I called it the requirements gathering. That sounds very formal, but it’s just chatting to various people and someone at the next table overheard us talking and it’s like, ‘Hey! I need that too!’”
Christoph: I tried to get this fixed or implemented or something at least once over the last 10 something-ish years, and was basically shot down on grounds of security issues if people mess up their system. And what happens if you set the extension path to something, install an extension, and then set the path to something else and then you can’t upgrade. And all sorts of weird things that people can do with their system in order to break them. Thanks for ignoring all that bullshit and just getting it done! It’s an administrator-level setting and people can do whatever they want with it.

So what I then did is just to implement that patch and, admittedly I never got around to even try to put it upstream. So thanks David for pushing that ahead. It was clear that the Debian version of the patch wasn’t acceptable because it was too limited. It made some assumptions about the directory structure of Debian packages. So it always included the prefix in the path. The feature that Peter implemented solves my problem. It does solve a lot more problems, so thanks for that.
Peter: Testing all extensions. What we’ve talked about is doing this through the Debian packaging system because the idea was to maybe make a separate branch or a separate sub-repository of some sort, switch it to build with Meson, and rebuild all the extension packages and see what happens. I guess that’s how far we’ve come. It doesn’t actually mean they all work, but I guess that most of them have tests, so we just wanted to test, see if it works.

There are some really subtle problems. Well, the ones I know of have been fixed, but there’s some things that certain compilation options are not substituted into the Makefiles correctly, so then all your extensions are built without any optimizations, for example, without any -O options. I’m not really sure how to detect those automatically, but at least, just rebuild everything once might be an option. Or just do it manually. There are not thousands of extensions. There are not even hundreds that are relevant. There are several dozens, and I think that’s good coverage.
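
One simple way to spot the missing-optimization case is to scan the compiler flags an extension was actually built with. The Python sketch below assumes you have already captured a flags string from somewhere (a build log line or a Makefile variable, for example); the function name is hypothetical and this is not an existing Postgres tool.

```python
import shlex


def has_optimization_flag(cflags):
    """True if the flag string contains any option starting with -O.

    Note that this also accepts -O0, so it only detects the case where
    no -O option was passed at all.
    """
    return any(token.startswith("-O") for token in shlex.split(cflags))
```

Run over the flags of every rebuilt extension, a check like this would flag builds where the -O option silently went missing.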
Christoph: I realize that doing it on the packaging side makes sense because we all have these tests running. So I was looking into it. The first time I tried, I stopped once I realized that Meson doesn’t support LLVM yet; and the second time I tried, I just diff-ed the generated Makefiles to see if there’s any difference that looks suspicious. At this point I should just continue and do a compilation run and see what the tests are doing and stuff.

So my hope would be that I could run diff on the results; the problem is compiling Postgres with Autoconf once and then with Meson the second time, then see if it has an impact on the extensions compiled. But my idea was that if I’m just running diff on the two compilations and there’s no difference, there’s no point in testing because they’re identical anyway.
Peter: Oooh, you want the actual compilation, for the Makefile output to be the same.
Christoph: Yeah, I don’t have to run that test. But the diff was a bit too big to be readable. There was lots of white space noise in there. But there were also some actual changes. Some were not really bad, like in some points variables were using a fully qualified path for the make directory or something, and then some points not; but, maybe we can just work on making that difference smaller and then arguing about correctness is easier.
Peter: Yeah, that sounds like a good approach.
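
A whitespace-insensitive comparison along those lines could look like the following Python sketch. It is an illustration of the idea only, not the procedure Christoph actually used; the function names are hypothetical, and it relies on the standard difflib module.

```python
import difflib


def normalize(text):
    """Collapse runs of whitespace and drop blank lines."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def meaningful_diff(old, new):
    """Return only the removed and added lines, ignoring whitespace noise."""
    delta = difflib.ndiff(normalize(old), normalize(new))
    return [line for line in delta if line.startswith(("- ", "+ "))]
```

Feeding it the Autoconf-generated and Meson-generated Makefile text would leave just the lines that really differ, such as a variable that uses a fully qualified path in one build and not in the other.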
Jakob: Maybe I can give some feedback from Postgres.app. So, thank you very much. I think this solves a lot of problems that we have had with extensions over the years, especially because it allows us to separate the extensions and the main Postgres distribution. For Postgres.app we basically have to decide which extensions to include and we can’t offer additional extensions when people ask for them without shipping them for everyone. So that’s a big win.

One question I am wondering about is the use case of people building their own extensions. As far as I understand, you have to provide the prefix. And one thing I’m wondering is whether there is some way to give a default value for the prefix, like in pg_config or something like that, so people who just type make install automatically get some path.
Peter: That might be an interesting follow on. I’m making a note of it. I’m not sure how you’d…
Jakob: I’m just thinking because a big problem is that a lot of people who try things don’t follow the instructions for the specific Postgres. So for example, if we write documentation how to build extensions and people on a completely different system — like people Google stuff and they get instruction — they’ll just try random paths. Right now, if you just type make install, it works on most systems because it just builds into the standard directories.
Peter: Yeah, David puts it like, “should there be a different default extension location?” I think that’s probably not an unreasonable direction. I think that’s something we should maybe think about, once this is stabilized. For your Postgres.app use case, I think you could probably even implement that yourself with a one- or two-line patch so that at least, if you install Postgres.app, then somebody tries to build an extension, they get a reasonable location.
David: If I could jump in there, Jakob, my assumption was that Postgres.app would do something like designate the Application Support directory and Preferences in ~/Library as where extensions should be installed. And yeah, there could be some patch to PGXS to put stuff there by default.
Jakob: Yeah, that would be nice!
Peter: Robert asked a big question here. What do we think are the security consequences of this patch? Well, one of the premises is that we already have dynamic_library_path, which works exactly the same way, and there haven’t been any concerns about that. Well, maybe there have been concerns, but nothing that was acted on. If you set the path to somewhere where anybody can write stuff, then yeah, that’s not so good. But that’s the same as anything. Certainly there were concerns as I read through the discussion.

I assumed somebody would have security questions, so I really wanted to base it on this existing mechanism and not invent something completely new. So far nobody has objected to it [Chuckles]. But yeah, of course you can make a mess of it if you go into that extension_control_path = /tmp! That’s probably not good. But don’t do that.
David: I think in part the xz exploit kind of made people more receptive to this patch, because we want to reduce the number of patches that packaging maintainers have to maintain.
Peter: Obviously this is something people do. Better we have one solution that people then can use and that at least we understand, as opposed to everybody going out and figuring out their own complicated solutions.
David: Peter, I think there are still some issues with the behavior of MODULEDIR from PGXS and directory in the control file that this doesn’t quite work with this extension. Do you have some thoughts on how to address those issues?
Peter: For those who are not following: there’s an existing, I guess, rarely used feature that, in the control file, you can specify the directory option, which then specifies where other files are located. And this doesn’t work the way you think it should; or maybe it’s not clear what that should do if you find it in a path somewhere. I guess it’s so rarely used that we might maybe just get rid of it; that was one of the options.

In my mental model of how the C compiler works, it sets an rpath on something. If you set an absolute rpath somewhere, you know it’s not gonna work if you move the thing to a different place in the path. I’m not sure if that’s a good analogy, but it sort of has similar consequences. If you hard-code an absolute path, then path search is not gonna work. But yeah, that’s on the list I need to look into.
David: For what it’s worth, I discovered last week that the part of this patch where you’re stripping out $libdir and the extension make file that was in modules, I think? That also needs to be done when you use rpath to install an extension and point to extensions today with Postgres 17. Happy to see that one go.
Christoph: Thanks for fixing that part. I was always wondering why this was broken. The way it was broken. It looked very weird and it turned out it was just broken and not me not understanding it.
David: I think it might have been a documentation oversight back when extensions were added at 9.1 to say this is how you list the modules.

Anyway, this is great! I’m super excited for this patch and where it’s going and the promise for stuff in the future. Just from your list of the six issues it addresses, it’s obviously something that covers a variety of pain points. I appreciate you doing that.
Peter: Thank you!

Many thanks and congratulations wrap up this call.

The next Mini-Summit is on April 9, when Christoph Berg (Debian, and also Cybertec) will join us to talk about Apt Extension Packaging.

Mini Summit 2: Extension Search Path Patch

2025-03-24T21:14:27Z

This Wednesday, March 26 at noon America/New_York (16:00 UTC), Peter Eisentraut has graciously agreed to give a talk at the Extension Mini Summit #2 on the extension search path patch he recently committed to PostgreSQL. I’m personally stoked for this topic, as freeing extensions from the legacy of a single directory opens up a number of new patterns for packaging, installation, and testing extensions. Hit the Meetup to register for this live video conference, and to brainstorm novel uses for this new feature, expected to debut in PostgreSQL 18.

2025 Postgres Extensions Mini Summit One

2025-03-24T20:46:58Z

Back on March 12, we hosted the first in a series of PostgreSQL Extensions Mini Summits leading up to the Extension Ecosystem Summit at PGConf.dev on May 13. I once again inaugurated the series with a short talk on the State of the Extension Ecosystem. The talk was followed by 15 minutes or so of discussion. Here are the relevant links:

And now, with many thanks to Floor Drees for the effort, the transcript from the session.

Introduction

Floor Drees introduced the organizers:

David Wheeler, Principal Architect at Tembo, maintainer of PGXN
Yurii Rashkovskii, Omnigres
Keith Fiske, Crunchy Data
Floor Drees, Principal Program Manager at EDB, PostgreSQL CoCC member, PGDay Lowlands organizer

David presented a State of the Extension Ecosystem at this first event, and shared some updates from PGXN land.

The stream and the closed captions available for the recording are supported by PGConf.dev and their gold level sponsors, Google, AWS, Huawei, Microsoft, and EDB.

State of the Extensions Ecosystem

So I wanted to give a brief update on the state of the Postgres extension ecosystem, the past, present, and future. Let’s give a brief history; it’s quite long, actually.

There were originally two approaches back in the day. You could use shared preload libraries to have it preload dynamic shareable libraries into the main process. And then you could do pure SQL stuff, including procedural languages like PL/Perl, PL/Tcl, and such.

And there were a few intrepid early adopters, including PostGIS, BioPostgres, PL/R, PL/Proxy, and pgTAP, who all made it work. Beginning with Postgres 9.1, Dimitri Fontaine added explicit support for extensions in the Postgres core itself. The key features included the ability to compile and install extensions. This is, again, pure SQL and shared libraries.

There are CREATE, ALTER, and DROP EXTENSION commands in SQL that you can use to add extensions to a database, upgrade them to new versions, and remove them. And then there’s pg_dump and pg_restore support, so that extensions could be considered a single bundle to be backed up and restored with all of their individual objects being included as part of the backup.

Back then, a number of us, myself included, saw this as an opportunity to have the extensibility of Postgres itself be a fundamental part of the community and distribution. I was a long time user of Perl and used CPAN, and I thought we should have something like CPAN for Postgres. So, I proposed PGXN, the PostgreSQL Extension Network, back in 2010. The idea was to do distribution of source code. You would register namespaces for your extensions.

There was discovery via a website for search, documentation published, tags to help you find different kinds of objects, and to support installation through a command line interface. The compile and install stuff that Postgres itself provides, using PGXS and Configure.

This is what PGXN looks like today. It was launched in 2011. There’s a command line client, this website, an API, and a registry you can upload your extensions to. The most recent one was pg_task a day or so ago.

In the interim, since that came out in 2011/2012, the cloud providers have come into their own with Postgres, but their support for extensions tends to be rather limited. For non-core extension counts, as of yesterday, Azure provides 38 extensions, GCP provides 44 extensions, and AWS 51. These are the third party extensions that don’t come with Postgres and its contrib itself. Meanwhile, PGXN has 420 extensions available to download, compile, build, and install.

A GitHub project that tracks random extensions on the internet, (joelonsql/PostgreSQL-EXTENSIONs.md), which is pretty comprehensive, has almost 1200 extensions listed. So the question is why is the support not more broad? Why aren’t there a thousand extensions available in every one of these systems?

This has been a fairly common question that’s come up in the last couple years. A number of new projects have tried to fill in the gaps. One is Trusted Language Extensions. They wanted to make it easier to distribute extensions without needing dynamic shared libraries by adding additional features in the database itself.

The idea was to empower app developers to make it easy to install extensions via SQL functions rather than having to access the file system of the database server system itself. It can be portable, so there’s no compilation required, it hooks into the create extension command transparently, supports custom data types, and there have been plans for foreign data wrappers and background workers. I’m not sure how that’s progressed in the past year. The pg_tle extension itself was created by AWS and Supabase.

Another recent entrant in tooling for extensions is pgrx, which is native Rust extensions in Postgres. You build dynamic shared libraries, but write them in pure Rust. The API for pgrx provides full access to Postgres features, and still provides the developer-friendly tooling that Rust developers are used to. There’s been a lot of community excitement the last couple of years around pgrx, and it remains under active development — version 0.13.0 just came out a week or so ago. It’s sponsored and run out of the PgCentral Foundation.

There have also been several new registries that have come up to try to fill the gap and make extensions available. They have emphasized different things than PGXN. One was ease of use. So, for example, here pgxman says it should be really easy to install a client in a single command, and then it installs something, and then it downloads and installs a binary version of an extension.

And then there was platform neutrality. They wanted to do binary distribution and support multiple different platforms, to know what binary to install for a given platform. They provide stats. PGXN doesn’t provide any stats, but some of them list stats like how many downloads we had, how many in the last 180 days.

And curation. Trunk is another binary extension registry, from my employer, Tembo. They do categorization of all the extensions on Trunk, which is at 237 now. Quite a few people have come forward to tell us that they don’t necessarily use Trunk to install extensions, but use it to find them, because the categories are really helpful for people to figure out what sorts of things are even available, and an option to use.

So here’s the State of the Ecosystem as I see it today.

There have been some lost opportunities from the initial excitement around 2010. Extensions remain difficult to find and discover. Some are on PGXN, some are on GitHub, some are on Trunk, some are on GitLab, etc. There’s no like one place to go to find them all.
They remain under-documented and difficult to understand. It takes effort for developers to write documentation for their extensions, and a lot of them aren’t able to. Some of them do write the documentation, but they might be in a format that something like PGXN doesn’t understand.
The maturity of extensions can be difficult to gauge. If you look at that list of 1200 extensions on GitHub, which ones are the good ones? Which ones do people care about? That page in particular shows the number of stars for each extension, but that’s the only metric.
They’re difficult to configure and install. This is something TLE really tried to solve, but the uptake on TLE has not been great so far, and it doesn’t support all the use cases. There are a lot of use cases that need to be able to access the internal APIs of Postgres itself, which means compiling stuff into shared libraries, and writing them in C or Rust or a couple of other compiled languages.

That makes them difficult to configure. You have to ask questions like: Which build system do I use? Do I install the tooling? How do I install it and configure it? What dependencies does it have? Et cetera.
There’s no comprehensive binary packaging. The Postgres community’s own packaging systems for Linux — Apt, and YUM — do a remarkably good job of packaging extensions. They probably have more extensions packaged for those platforms than any of the others. If they have the extension you need and you’re using the PGDG repositories, then this stuff is there. But even those are still like a fraction of all the potential available extensions that are out there.
Dependency management can be pretty painful. It’s difficult to know what you need to install. I was messing around yesterday with the PgSQL HTTP extension, which is a great extension that depends on libcurl. I thought maybe I could build a package that includes libcurl as part of it. But then I realized that libcurl depends on other packages, other dynamic libraries. So I’d have to figure out what all those are to get them all together.

A lot of that goes away if you use a system like apt or yum. But if you don’t, or you just want to install stuff on your Mac or Windows, it’s much more difficult.
Centralized source distribution, we’ve found, is insufficient. Even if all the extensions were available on PGXN, not everybody has the wherewithal or the expertise to find what they need, download it, compile it, and build it. Moreover, you don’t want to have a compiler on your production system, so you don’t want to be building stuff from source on your production system. So then you have to get to the business of building your own packages, which is a whole thing.

But in this state of the extension ecosystem we see new opportunities too. One I’ve been working on for the past year, which we call “PGXN v2”, is made possible by my employer, Tembo. The idea was to consider the emerging patterns (new registries and new ways of building and releasing and developing extensions), to figure out the deficiencies, to engage deeply with the community to work up potential solutions, and to design and implement a new architecture. The idea is to serve the community for the next decade, and to really make PGXN and its infrastructure the source of record for extensions for Postgres.

In the past year, I did a bunch of design work on it. Here’s a high level architectural view. We’d have a root registry, which is still the source code distribution stuff. There’s a web UX over it that would evolve from the current website. And there’s a command line client that knows how to build extensions from the registry.

But in addition to those three parts, which we have today, we would evolve a couple of additional parts.

One is “interactions”, so that when somebody releases a new extension on PGXN, some notifications could go out through webhooks or some sort of queue so that downstream systems like the packaging systems could know something new has come out and maybe automate building and updating their packages.
There could be “stats and reports”, so we can provide data like how many downloads there are, what binary registries make them available, what kinds of reviews and quality metrics rate them. We can develop these stats and display those on the website.
And, ideally, a “packaging registry” for PGXN to provide binary packages for all the major platforms of all the extensions we can, to simplify the installation of extensions for anybody who needs to use them. For extensions that aren’t available through PGDG or if you’re not using that system and you want to install extensions. Late last year, I was focused on figuring out how to build the packaging system.

Another change that went down in the past year was the Extension Ecosystem Summit itself. This took place at PGConf.Dev last May. The idea was for a community of people to come together to collaborate, examine ongoing work in the extension distribution, examine challenges, identify questions, propose solutions, and agree on directions for execution. Let’s take a look at the topics that we covered last year at the summit.

One was extension metadata, where the topics covered included packaging and discoverability, extension development, compatibility, and taxonomies as being important to represent in metadata about extensions, as well as versioning standards. One of the outcomes was an RFC for version two of the PGXN metadata that incorporates a lot of those needs into a new metadata format to describe extensions more broadly.
Another topic was the binary distribution format and what it should look like, if we were to have a major distribution format. We talked about being able to support multiple versions of an extension at one time. There was some talk about the Python Wheel format as a potential precedent for binary distribution of code.

There’s also an idea to distribute extensions through Docker containers, also known as the Open Container Initiative. Versioning came up here, as well. One of the outcomes from this session was another PGXN RFC for binary distribution, which was inspired by Python Wheel among other stuff.

I wanted to give a brief demo built on that format. I hacked some changes into the PGXS Makefile to add a new target, trunk, that builds a binary package called a “trunk” and uploads it to an OCI registry for distribution. Here’s what it looks like.
- On my Mac I was compiling my semver extension. Then I go into a Linux container and compile it again for Linux using the make trunk command. The result is two .trunk files, one for Postgres 16 on Darwin and one for Postgres 16 on Linux.
- There are also some JSON files that are annotations specifically for OCI. We have a command where we can push these images to an OCI registry.
- Then we can use an install command that knows to download and install the version of the build appropriate for this platform (macOS). And then I go into Linux and do the same thing. It also knows, because of the OCI standard, what the platform is, and so it installs the appropriate binary.
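As a rough illustration of how an installer could pick the right build, the sketch below selects a manifest from an OCI image index by matching the platform os and architecture fields that the image index format defines. Whether the demo’s install command works this way is an assumption; the select_manifest function, the digests, and the index contents are hypothetical.

```python
from typing import Optional


def select_manifest(index: dict, os_name: str, arch: str) -> Optional[dict]:
    """Return the first manifest whose platform matches, or None."""
    for manifest in index.get("manifests", []):
        platform = manifest.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return manifest
    return None


# Hypothetical image index for one extension built on two platforms.
index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "digest": "sha256:example-darwin",
            "platform": {"os": "darwin", "architecture": "arm64"},
        },
        {
            "digest": "sha256:example-linux",
            "platform": {"os": "linux", "architecture": "amd64"},
        },
    ],
}

chosen = select_manifest(index, "linux", "amd64")
print(chosen["digest"] if chosen else "no build for this platform")
```

Returning None when nothing matches leaves the fallback decision (build from source, or fail) to the caller.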
Another topic was ABI and API compatibility. There was some talk at the Summit about what is the definition of an ABI and an API and how do we define internal APIs and their use? Maybe there’s some way to categorize APIs in Postgres core for red, green, or in-between, something like that. There was desire to have more hooks available into different parts of the system.

One of the outcomes of this session was that I worked with Peter Eisentraut on some stability guidance for the API and ABI that is now committed in the docs. You can read them now in the developer docs; they’ll be part of the Postgres 18 release. The idea is that minor version releases should be safe to use with other minor versions. If you compiled your extension against one minor version, it should be perfectly compatible with other minor versions of the same major release.

Interestingly, there was a release earlier this year, like two weeks after Peter committed this, where there was an API break. It’s the first time in like 10 years. Robert Treat and I spent quite a bit of time trying to look for a previous time that happened. I think there was one about 10 years ago, but then this one happened and, notably it broke the Timescale database. The Core Team decided to release a fix just a week later to restore the ABI compatibility.

So it’s clear that even though there’s guidance, you should in general be able to rely on it, and it was a motivating factor for a new release to fix an ABI break, there are no guarantees.
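The guidance can be reduced to a simple check: an extension built against one minor release is expected to load on other minor releases of the same major version. The sketch below assumes version numbers in the server_version_num style used since PostgreSQL 10 (major × 10000 + minor). The function names and sample numbers are hypothetical, and a matching major version reflects the guidance, not a guarantee.

```python
def major_version(version_num: int) -> int:
    """Extract the major version from a server_version_num-style integer.

    Assumes the PostgreSQL 10+ scheme: major * 10000 + minor.
    """
    return version_num // 10000


def expected_compatible(built_against: int, running_on: int) -> bool:
    """True when both versions belong to the same major release."""
    return major_version(built_against) == major_version(running_on)


# Hypothetical examples: built on 17.2, running on 17.4 and on 16.8.
print(expected_compatible(170002, 170004))
print(expected_compatible(170002, 160008))
```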

Another thing that might happen is that I proposed a Google Summer of Code project to build an ABI checker service. Peter [embarrassing forgetfulness and misattributed national identity omitted] Geoghegan POC’d an ABI checker in 2023. The project is to take Peter’s POC and build something that could potentially run on every commit or push to the back branches of the project. Maybe it could be integrated into the build farm so that, if there’s a back-patch to an earlier branch and it turns red, they quickly know the ABI was broken. This change could potentially provide a higher level of guarantee (even if they don’t end up using the word “guarantee”) about the stability of the ABIs and APIs. I’m hoping this happens; a number of people have asked about it, and at least one person has written an application.
Another topic at the summit last year was including or excluding extensions in core. They’ve talked about when to add something to core, when to remove something from core, whether items in contrib should actually be moved into core itself, and whether to move metadata about extensions into the catalog. And once again, support for multiple versions came up; this is a perennial challenge! But I’m not aware of much work on these questions. I’m wondering if it’s time for a revisit.
A bonus item (this wasn’t a formal topic at the summit last year, but it came up many times in the mini-summits) is the challenge of packaging and lookup. There’s only one path to extensions in SHAREDIR. This creates a number of difficulties. Christoph Berg has a patch for PGDG and Debian that adds a second directory. This allowed the PGDG stuff to actually run tests against extensions without changing the core installation of the Postgres service itself. Another one is CloudNativePG immutability. If that directory is part of the image for your CloudNativePG cluster, you can’t install extensions into it.

It’s a similar issue for Postgres.app immutability. Postgres.app is a Mac app, and it’s signed by a certificate provided by Apple. But that means that if you install an extension in its SHAREDIR, it changes the signature of the application and it won’t start. They work around this issue through a number of symlink shenanigans, but these issues could be solved by allowing extensions to be installed in multiple locations.

Starting with Christoph’s search path patch and a number of discussions we had at PGConf last year, Peter Eisentraut has been working on a search path patch to the core that would work similar to shared preload libraries, but it’s for finding extension control files. This would allow you to have them in multiple directories and it will find them in path.
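The general idea of a path search can be sketched as follows: walk an ordered list of directories and return the first one that contains the extension’s control file. This is a simplified illustration, not how Postgres itself resolves the path; find_control_file and the temporary directories are hypothetical.

```python
import os
import tempfile
from typing import List, Optional


def find_control_file(name: str, search_path: List[str]) -> Optional[str]:
    """Return the first NAME.control found along search_path, or None."""
    for directory in search_path:
        candidate = os.path.join(directory, name + ".control")
        if os.path.isfile(candidate):
            return candidate
    return None


# Demonstrate with two throwaway directories; only the second holds the file.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    with open(os.path.join(second, "semver.control"), "w") as fh:
        fh.write("default_version = '1.0'\n")
    found = find_control_file("semver", [first, second])
    missing = find_control_file("pair", [first, second])
    found_in_second = found == os.path.join(second, "semver.control")

print(found_in_second, missing)
```

Because the first match wins, the order of directories matters: an earlier directory shadows a later one that holds a control file of the same name.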

Another interesting development in this line has been, the CloudNativePG project has been using that extension search path patch to prototype a new feature coming to Kubernetes that allows one to mount a volume that’s actually another Docker image. If you have your extension distributed as an OCI image, you can specify that it be mounted and installed via your CNPG cluster configuration. That means when CNPG spins up, it puts the extension in the right place. It updates the search path variables and stuff just works.

A lot of the thought about the stuff went into a less formal RFC I wrote up in my blog, rather than on PGXN. The idea is to take these improvements and try to more formally specify the organization of extensions separate from how Postgres organizes shared libraries and shared files.

As I said, we’re bringing the Extension Summit back! There will be another Extension Summit hosted by our team of organizers: myself, Floor, Keith Fiske from Crunchy Data, and Yurii from Omnigres. That will be on May 13th in the morning at PGConf.dev; we appreciate their support.

The idea of these Mini Summits is to bring up a number of topics of interest. Have somebody come and do a 20 or 40 minute talk about it, and then we can have discussion about implications.

Floor mentioned the schedule, but briefly:

March 12: David Wheeler, PGXN: “State of the Extension Ecosystem”
March 26: Peter Eisentraut, Core Team: “Implementing an Extension Search Path”
April 9: Christoph Berg, Debian: “Apt Extension Packaging”
April 23:
May 7: Gabriele Bartolini, CNPG “Extension Management in CloudNativePG”

So, what are your interests in extensions and how they can be improved? There are a lot of potential topics to talk about at the Summit or at these Mini Summits: development tools, canonical registry, how easy it is to publish, continuous delivery, yada, yada, yada, security scanning; all sorts of stuff that could go into conceiving, designing, developing, and distributing extensions for Postgres.

I hope you all will participate. I appreciate you taking the time to listen to me for half an hour. So I’d like to turn it over to discussion, if people would like to join in, talk about implications of stuff. Also, we can get to any questions here.

Questions, comments, shout-outs

Floor: David, at one point you talked about metadata taxonomy. If you can elaborate on that a little bit, that’s Peter’s question.

David: So one that people told me that they found useful was one provided by Trunk. It has this limited number of categories, so if you’re interested in machine learning stuff, you could go to the machine learning stuff and it shows you what extensions are potentially available. They have 237 extensions on Trunk now.

PGXN itself allows arbitrary tagging of stuff. It builds this little tag cloud. But if I look at this one here, you can see this one has a bunch of tags. These are arbitrary tags that are applied by the author. The current metadata looks like this. It’s just plain JSON, and it has a list of tags. The PGXN Meta v2 RFC has a bunch of examples. It’s an evolution of that META.json, so the idea is to have a classifications object that includes tags as before, but also adds categories, which are a limited list that would be controlled by the core [he means “root”] registry:

{
  "classifications": {
    "tags": [
      "testing",
      "pair",
      "parameter"
    ],
    "categories": [
      "Machine Learning"
    ]
  }
}
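One consequence of a controlled category list is that a registry could validate categories on upload while leaving tags free-form. The sketch below is hypothetical: the ALLOWED_CATEGORIES set and the unknown_categories function are invented for illustration and are not taken from the RFC.

```python
import json

# Hypothetical controlled list; a real registry would define its own.
ALLOWED_CATEGORIES = {"Machine Learning", "Analytics", "Search"}


def unknown_categories(meta: dict) -> list:
    """Return categories that are not in the controlled list.

    Tags are deliberately not checked, since authors choose them freely.
    """
    categories = meta.get("classifications", {}).get("categories", [])
    return [c for c in categories if c not in ALLOWED_CATEGORIES]


meta = json.loads("""
{
  "classifications": {
    "tags": ["testing", "pair", "parameter"],
    "categories": ["Machine Learning"]
  }
}
""")

print(unknown_categories(meta))
```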

Announcements

Yurii made a number of announcements, summarizing:

There is a new library that they’ve been developing at Omnigres that allows you to develop Postgres extensions in C++. For people who are interested in developing extensions in C++ and gaining the benefits of that and not having to do all the tedious things that we have to do with C extensions: look for Cppgres. Yurii thinks that within a couple of months it will reach parity with pgrx.

David: So it sounds like it would work more closely to the way PGXS and C works. Whereas pgrx has all these additional Rust crates you have to load and like slow compile times and all these dependencies.

Yurii: This is just like a layer over the C stuff, an evolution of that. It’s essentially a header only library, so it’s a very common thing in the C++ world. So you don’t have to build anything and you just include a file. And in fact the way I use it, I amalgamate all the header files that we have into one. Whenever I include it in the project, I just copy the amalgamation and it’s just one file. You don’t have any other build chain associated yet. It is C++ 20, which some people consider new, but by the time it’s mature it’s already five years old and most compilers support it. They have decent support of C++ 20 with a few exclusions, but those are relatively minor. So for that reason, it’s not C++ 23, for example, because it’s not very well supported across compilers, but C++ 20 is.
Yurii is giving a talk about PostgresPM at the Postgres Conference in Orlando. He’ll share the slides and recording with this group. The idea behind PostgresPM is that it takes a lot of heuristics, takes the URLs of packages and of extensions and creates packages for different outputs like for Red Hat, for Debian, perhaps for some other formats in the future. It focuses on the idea that a lot of things can be figured out.

For example: do we have a new version? Well, we can look at list of tags in the Git repo. Very commonly that works for say 80 percent of extensions. Do we need a C compiler? We can see whether we have C files. We can figure out a lot of stuff without packagers having to specify that manually every time they have a new extension. And they don’t have to repackage every time there is a new release, because we can detect new releases and try to build.
Yurii is also running an event that, while not affiliated with PGConf.dev, is strategically scheduled to happen one day before PGConf.dev: Postgres Extensions Day. The Call for Speakers is open until April 1st. There’s also an option for people who cannot or would not come to Montréal this year to submit a prerecorded talk. The point of the event is not just to bring people together, but also to surface content that can be interesting to other people. The event itself is free.
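The PostgresPM heuristics Yurii describes, detecting a new version from Git tags and checking for C files to decide whether a compiler is needed, could be sketched roughly like this. This is not PostgresPM’s code; the tag pattern, function names, and sample data are assumptions made for illustration.

```python
import re
from typing import Iterable, Optional

# Assumed tag convention: optional "v" prefix, then MAJOR.MINOR[.PATCH].
TAG_RE = re.compile(r"^v?(\d+)\.(\d+)(?:\.(\d+))?$")


def latest_version(tags: Iterable[str]) -> Optional[str]:
    """Return the tag with the highest numeric version, ignoring other tags."""
    best = None
    for tag in tags:
        match = TAG_RE.match(tag)
        if not match:
            continue  # e.g. "nightly"; the heuristic simply skips it
        key = tuple(int(part or 0) for part in match.groups())
        if best is None or key > best[0]:
            best = (key, tag)
    return best[1] if best else None


def needs_c_compiler(files: Iterable[str]) -> bool:
    """Guess that a C compiler is required if any C source file is present."""
    return any(name.endswith(".c") for name in files)


# Hypothetical repository data.
tags = ["v0.9.0", "v0.10.1", "nightly", "v0.2.3"]
files = ["src/semver.c", "semver.control", "sql/semver.sql"]

print(latest_version(tags), needs_c_compiler(files))
```

Comparing versions as integer tuples rather than strings is what lets 0.10.1 sort above 0.9.0, a case that plain string comparison gets wrong.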

Make sure to join our Meetup group and join us live, March 26, when Peter Eisentraut joins us to talk about implementing an extension search path.

Extension Ecosystem Summit 2025

2025-04-14T22:48:17Z

I’m happy to announce that some PostgreSQL colleagues and I have once again organized the Extension Ecosystem Summit at PGConf.dev in Montréal on May 13. Floor Drees, Yurii Rashkovskii, Keith Fiske will be on hand to kick off this unconference session:

Participants will collaborate to learn about and explore the ongoing work on PostgreSQL development and distribution, examine challenges, identify questions, propose solutions, and agree on directions for execution.

Going to PGConf.dev? Select it as an “Additional Option” when you register, or update your registration if you’ve already registered. Hope to see you there!

Extension Ecosystem Mini-Summit 2.0

We are also once again hosting a series of virtual gatherings in the lead-up to the Summit, the Postgres Extension Ecosystem Mini-Summit.

Join us for an hour or so every other Wednesday starting March 12 to hear contributors to a variety of community and commercial extension initiatives outline the problems they want to solve, their attempts to do so, challenges discovered along the way, and dreams for an ideal extension ecosystem in the future. Tentative speaker lineup (will post updates as the schedule fills in):

March 12: David Wheeler, PGXN: “State of the Extension Ecosystem”
March 26: Peter Eisentraut, Core Team: “Implementing an Extension Search Path”
April 9: Christoph Berg, Debian: “Apt Extension Packaging”
April 23: Celeste Horgan, Sonia Valeja, and Alexey Palazhchenko: “The User POV”
May 7: Gabriele Bartolini, CNPG “Extension Management in CloudNativePG”

Join the meetup for details. These sessions will be recorded and posted to the PGConf.dev YouTube, and we’ll again have detailed transcripts. Many thanks to my co-organizers Floor Drees and Yurii Rashkovskii, as well as the PGConf.dev organizers for making this all happen!

Update 2025-04-14: Added the April 23 session topic and panelists.

Sqitch 1.5.0

2025-01-09T02:30:18Z

Released yesterday: Sqitch v1.5.0. This version replaces the MySQL driver DBD::mysql with DBD::MariaDB, both for its better backward compatibility with MySQL as well as MariaDB driver libraries and for its improved Unicode handling. The Docker image likewise switched to the MariaDB mysql client. I expect no compatibility issues, but you never know! Please file an issue should you find any.

v1.5.0 also features fixes for Yugabyte deployment, Oracle error handling, existing Snowflake schemas, connecting to MySQL/MariaDB without a database name, and omitting the checkit MySQL/MariaDB function when the Sqitch user lacks sufficient permission to create it. Sqitch now will also complain when deploying with --log-only and a deployment file is missing.

Find it in the usual places:

Many thanks to everyone who has enjoyed using Sqitch and let me know in person, via email, Mastodon, bug reports, and patches. It gratifies me how useful people find it.

Should URI::mysql Switch to DBD::MariaDB?

2025-01-01T22:47:31Z

I seek the wisdom of the Perl Monks:

The Sqitch project got a request to switch from DBD::mysql to DBD::MariaDB. DBD::mysql 5’s requirement to build from the MySQL 8 client library provides the impetus for the request, but in poking around, I found a blogs.perl.org post highlighting some Unicode fixes in DBD::MariaDB, as well.

Now, Sqitch likely doesn’t have the Unicode issue (it always works with Perl Unicode strings), but it depends on URI::db to provide the DBI connection string. For MySQL URIs, the URI::mysql dbi_driver method returns mysql.

Should it be changed to return MariaDB, instead? Is there general community consensus that DBD::MariaDB provides better compatibility with both MySQL and MariaDB these days?

I’m also curious what the impact of this change would be for Sqitch. Presumably, if DBD::MariaDB can build against either the MariaDB or MySQL client library, it is the more flexible choice to continue supporting both databases going forward.

Feedback appreciated via PerlMonks or the Sqitch issue.

Update 2025-01-08

URI-db 0.23 uses DBD::MariaDB instead of DBD::mysql for both URI::mysql and URI::MariaDB.

Similarly, Sqitch v1.5.0 always uses DBD::MariaDB when connecting to MySQL or MariaDB, even when using older versions of URI::db. Thanks everyone for the feedback and suggestions!

New JSONPath Feature: SelectLocated

2025-01-01T20:43:50Z

Happy New Year! 🎉🥳🍾🥂

The JSONPath RFC includes a section on defining normalized paths, which use a subset of JSONPath syntax to define paths to the location of a node in a JSON value. I hadn’t thought much about it, but noticed that the serde JSONPath Sandbox provides a “Located” switch that adds them to query results. For the sake of complementarity, I added the same feature to the Go JSONPath Playground.

🛝 See it in action with this example, where instead of the default output:

[
  8.95,
  12.99,
  8.99,
  22.99,
  399
]

The located result is:

[
  {
    "node": 8.95,
    "path": "$['store']['book'][0]['price']"
  },
  {
    "node": 12.99,
    "path": "$['store']['book'][1]['price']"
  },
  {
    "node": 8.99,
    "path": "$['store']['book'][2]['price']"
  },
  {
    "node": 22.99,
    "path": "$['store']['book'][3]['price']"
  },
  {
    "node": 399,
    "path": "$['store']['bicycle']['price']"
  }
]

v0.3.0 of the github.com/theory/jsonpath Go package enables this feature via its new SelectLocated method, which returns a LocatedNodeList that shows off a few of the benefits of pairing JSONPath query results with paths that uniquely identify their locations in a JSON value, including sorting and deduplication. It also takes advantage of Go v1.23 iterators, providing methods to range over all the results, just the node values, and just the paths. As a result, v0.3.0 now requires Go 1.23.
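To make the pairing of values and normalized paths concrete, here is a minimal standard-library sketch. It is not the jsonpath package's implementation: the located type and the locate function are hypothetical names invented for this illustration, it only handles the descendant-by-name case shown above, and it skips the escaping of special characters in names that real normalized paths require.

```go
package main

import (
	"fmt"
	"sort"
)

// located pairs a selected value with a normalized path to its location.
// Hypothetical stand-in for illustration, not the jsonpath package's type.
type located struct {
	Node any
	Path string
}

// locate walks value recursively and collects every object member named
// key, recording the normalized path at which each was found.
func locate(value any, key, path string) []located {
	var out []located
	switch v := value.(type) {
	case map[string]any:
		// Sort member names so the result order is deterministic.
		names := make([]string, 0, len(v))
		for name := range v {
			names = append(names, name)
		}
		sort.Strings(names)
		for _, name := range names {
			child := fmt.Sprintf("%s['%s']", path, name)
			if name == key {
				out = append(out, located{Node: v[name], Path: child})
			}
			out = append(out, locate(v[name], key, child)...)
		}
	case []any:
		for i, item := range v {
			out = append(out, locate(item, key, fmt.Sprintf("%s[%d]", path, i))...)
		}
	}
	return out
}

func main() {
	value := map[string]any{
		"store": map[string]any{
			"book": []any{
				map[string]any{"price": 8.95},
				map[string]any{"price": 12.99},
			},
			"bicycle": map[string]any{"price": 399},
		},
	}
	for _, n := range locate(value, "price", "$") {
		fmt.Printf("%v\t%s\n", n.Node, n.Path)
	}
}
```

Because each path identifies exactly one location, a list of such pairs can be sorted or deduplicated by path alone, without comparing the values themselves.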

The serde_json_path Rust crate inspired the use of LocatedNodeList rather than a simple slice of LocatedNode structs, but I truly embraced it once I noticed the focus on “nodelists” in the RFC’s overview, which provides this definition:

A JSONPath expression is a string that, when applied to a JSON value (the query argument), selects zero or more nodes of the argument and outputs these nodes as a nodelist.

It regularly refers to nodelists thereafter, and it seemed useful to have an object to which more features can be added in the future. github.com/theory/jsonpath v0.3.0 therefore also changes the result value of Select from []any to the new NodeList type, an alias for []any. For now it adds a single method, All, which again relies on Go v1.23 iterators to iterate over selected nodes.

While the data type has changed, usage otherwise has not. One can iterate directly over values just as before:

for _, val := range path.Select(jsonInput) {
    fmt.Printf("%v\n", val)
}

But All removes the need to alias-away the index value with _:

for val := range path.Select(jsonInput).All() {
    fmt.Printf("%v\n", val)
}

I don’t expect any further incompatible changes to the main jsonpath module, but adding these return values now allows new features to be added to the selected node lists in the future.

May you find it useful!

SQL/JSON Path Playground Update

2024-12-31T20:40:32Z

Based on the recently-released Go JSONPath and JSONTree playgrounds, I’ve updated the design of the SQL/JSON Playground. It now comes populated with sample JSON borrowed from RFC 9535, as well as a selection of queries that randomly populate the query field on each reload. I believe this makes the playground nicer to start using, not to mention more pleasing to the eye.

The playground has also been updated to use the recently-released sqljson/path v0.2 package, which replicates a few changes included in the PostgreSQL 17 release. Notably, the .string() function no longer uses a time zone or variable format for dates and times.

Curious to see it in action? Check it out!

JSONTree Module and Playground

2024-12-22T21:33:39Z

As a follow-up to the JSONPath module and playground I released last month, I’m happy to announce a new project, JSONTree. I’ve implemented it in the github.com/theory/jsontree Go package, and built a Wasm-powered browser playground for it.

JSONTree?

While an RFC 9535 JSONPath query selects and returns an array of values from the end of a path expression, a JSONTree compiles multiple JSONPath queries into a single query that selects values from multiple path expressions. It returns results not as an array, but as a subset of the query input, preserving the paths for each selected value.

In other words, it compiles multiple paths into a single tree of selection paths, and preserves the tree structure of the input. Hence JSONTree.

Example

Consider this JSON:

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      {
        "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      {
        "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 399
    }
  }
}

This JSONPath query:

$..price

Selects these values (playground):

[8.95, 12.99, 8.99, 22.99, 399]

While this JSONPath query:

$..author

Selects (playground):

[
  "Nigel Rees",
  "Evelyn Waugh",
  "Herman Melville",
  "J. R. R. Tolkien"
]

JSONTree compiles these two JSONPaths into a single query that merges the author and price selectors into a single segment, which stringifies to a tree-style format (playground):

$
└── ..["author","price"]

This JSONTree returns the appropriate subset of the original JSON object (playground):

{
  "store": {
    "book": [
      {
        "author": "Nigel Rees",
        "price": 8.95
      },
      {
        "author": "Evelyn Waugh",
        "price": 12.99
      },
      {
        "author": "Herman Melville",
        "price": 8.99
      },
      {
        "author": "J. R. R. Tolkien",
        "price": 22.99
      }
    ],
    "bicycle": {
      "price": 399
    }
  }
}

Note that the original data structure remains, but only for the subset of the structure selected by the JSONPath queries.
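The structure-preserving behavior can be approximated with a short standard-library sketch. This is not how the jsontree package works internally: selectKeys is a hypothetical function that covers only the descendant-by-name case from the example above, keeping matching members plus the containers that lead to them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// selectKeys returns the subset of value containing only object members
// whose names appear in keys, at any depth, keeping the containers that
// lead to them. The boolean reports whether anything was selected.
func selectKeys(value any, keys map[string]bool) (any, bool) {
	switch v := value.(type) {
	case map[string]any:
		out := map[string]any{}
		for name, child := range v {
			if keys[name] {
				out[name] = child
			} else if sub, ok := selectKeys(child, keys); ok {
				out[name] = sub
			}
		}
		return out, len(out) > 0
	case []any:
		out := []any{}
		for _, child := range v {
			if sub, ok := selectKeys(child, keys); ok {
				out = append(out, sub)
			}
		}
		return out, len(out) > 0
	}
	// Scalars hold no members, so nothing can be selected from them.
	return nil, false
}

func main() {
	src := `{"store":{"book":[{"author":"Nigel Rees","title":"Sayings of the Century","price":8.95}],"bicycle":{"color":"red","price":399}}}`
	var value any
	if err := json.Unmarshal([]byte(src), &value); err != nil {
		panic(err)
	}
	subset, _ := selectKeys(value, map[string]bool{"author": true, "price": true})
	js, err := json.Marshal(subset)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(js))
}
```

The title and color members drop out, while the store, book, and bicycle containers survive because they lead to selected values.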

Use Cases

A couple of use cases drove the conception and design of JSONTree.

Permissions

Consider an application in which ACLs define permissions for groups of users to access specific branches or fields of JSON documents. When delivering a document, the app would:

Fetch the groups the user belongs to
Convert the permissions from each into JSONPath queries
Compile the JSONPath queries into a JSONTree query
Select and return the permitted subset of the document to the user
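The first two steps might be sketched as follows, using only the standard library. Everything here is an assumption made for illustration: the acl map, the group names, and the pathsFor function are hypothetical, and a real application would load permissions from its own store.

```go
package main

import (
	"fmt"
	"sort"
)

// pathsFor returns the sorted, deduplicated JSONPath query strings granted
// to any of the given groups. The acl map is a hypothetical permission
// store keyed by group name.
func pathsFor(acl map[string][]string, groups []string) []string {
	seen := map[string]bool{}
	paths := []string{}
	for _, group := range groups {
		for _, p := range acl[group] {
			if !seen[p] {
				seen[p] = true
				paths = append(paths, p)
			}
		}
	}
	sort.Strings(paths)
	return paths
}

func main() {
	acl := map[string][]string{
		"staff":   {"$.name", "$.emails[*]"},
		"billing": {"$.name", "$.invoices"},
	}
	for _, p := range pathsFor(acl, []string{"staff", "billing"}) {
		fmt.Println(p)
	}
}
```

Each resulting string would then be parsed into a JSONPath query and the set compiled into a single JSONTree for the final selection step.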

Selective Indexing

Consider a searchable document storage system. For large or complex documents, it may be infeasible or unnecessary to index the entire document for full-text search. To index a subset of the fields or branches, one would:

Define JSONPaths for the fields or branches to index
Compile the JSONPath queries into a JSONTree query
Select and submit only the specified subset of each document to the indexing system

Go Example

Use the github.com/theory/jsontree Go package together with github.com/theory/jsonpath to compile and execute JSONTree queries:

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/theory/jsonpath"
	"github.com/theory/jsontree"
)

func main() {
	// JSON as unmarshaled by encoding/json.
	value := map[string]any{
		"name":  "Barrack Obama",
		"years": "2009-2017",
		"emails": []any{
			"potus@example.com",
			"barrack@example.net",
		},
	}

	// Compile multiple JSONPaths into a JSONTree.
	tree := jsontree.New(
		jsonpath.MustParse("$.name"),
		jsonpath.MustParse("$.emails[1]"),
	)

	// Select from the input value and marshal the result to JSON.
	js, err := json.Marshal(tree.Select(value))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", js)
}

And the output:

{"emails":["barrack@example.net"],"name":"Barrack Obama"}

Note that the index position of the selected email was not preserved. Replace New with NewFixedModeTree to create a “fixed mode” JSONTree that preserves index positions by filling gaps with nulls. Its output for the above example would be:

{"emails":[null,"barrack@example.net"],"name":"Barrack Obama"}
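The difference between the two modes can be illustrated with a small standard-library sketch. The selectIndexes function is hypothetical and unrelated to the jsontree internals; it assumes the indexes arrive in ascending order and simply shows packing versus position-preserving selection from an array.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// selectIndexes returns the items of list at the given indexes. When fixed
// is false the selected items are packed together; when true each keeps its
// original position and the gaps before it are filled with nil.
func selectIndexes(list []any, indexes []int, fixed bool) []any {
	out := []any{}
	for _, i := range indexes {
		if i < 0 || i >= len(list) {
			continue // ignore out-of-range indexes
		}
		if !fixed {
			out = append(out, list[i])
			continue
		}
		// Grow the result with nils until position i exists.
		for len(out) <= i {
			out = append(out, nil)
		}
		out[i] = list[i]
	}
	return out
}

func main() {
	emails := []any{"potus@example.com", "barrack@example.net"}
	for _, fixed := range []bool{false, true} {
		js, err := json.Marshal(selectIndexes(emails, []int{1}, fixed))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(js))
	}
}
```

Fixed mode is the better fit when consumers address array items by position, at the cost of nulls appearing in the output.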

Status

The public interface of the jsontree module is quite minimal and stable. But I suspect there may remain some flaws in the merging of JSONPath selectors. Please report bugs via GitHub issues and I’ll get them fixed up ASAP.

Otherwise, please share and enjoy!
