Skip to content

Latest commit

 

History

History
425 lines (337 loc) · 16.3 KB

pgzx-update-0.2.0.mdx

File metadata and controls

425 lines (337 loc) · 16.3 KB
title description image author authorEmail date tags published slug ogImage
pgzx: Version 0.2.0 update
Learn about the latest changes in pgzx
Steffen Siering
steffen@xata.io
07-19-2024
open-source
postgres
fpPgzx
oss
fpPgstream
true
pgzx-update-0.2.0

We introduced pgzx during our last launch week 4 months ago. As part of this summers' Xata launch week we are bumping the version to 0.2.0. We are excited to share some details about the recent improvements.

pgzx is our open-source framework for developing PostgreSQL extensions in Zig. In case it's the first time hearing about pgzx and you are intrigued, you can find more details in our "Introducing pgzx" blog post.

We made quite a number of improvements since the first update. Unfortunately this also includes some breaking changes. Let's have a look at the highlights for this release:

Function Manager

The Function Manager interface allows you to register functions with PostgreSQL so you can call them from SQL.

In this iteration, we've added support for optional Zig error returns to PG_FUNCTION_V1. With this improvement you can now return an error from your function like so:

comptime {
    pgzx.PG_FUNCTION_V1("hello_world", hello_world);
}

pub fn hello_world(name: ?[:0]const u8) ![:0]const u8 {
    return if (name) |n|
        try std.fmt.allocPrintZ(pgzx.mem.PGCurrentContextAllocator, "Hello, {s}!", .{n})
    else
        "Hello World";
}

If the PostgreSQL MemoryContext runs out of memory, our allocator will capture the error and return PGErrorStack to indicate that a Postgres error was reported. The Zig error will be returned by the hello world function. The wrapper generated by PG_FUNCTION_V1 will then do a longjump to pass the error back to PostgreSQL.

Exporting a number of functions can become a bit tedious when you want to export multiple functions. In that case you might want to use PG_EXPORT. PG_EXPORT will export all public functions from the struct given:

comptime {
    pgzx.PG_EXPORT(sqlfns);
}

const sqlfns = struct {
    pub fn hello_world(name: ?[:0]const u8) ![:0]const u8 {
        return if (name) |n|
            try std.fmt.allocPrintZ(pgzx.mem.PGCurrentContextAllocator, "Hello, {s}!", .{n})
        else
            "Hello World";
    }
};

We have added a sample extension called sqlfns that demonstrates multiple ways to write a SQL function in Zig. The sample extension documents different ways to export a function, accept arguments and return values.

Server Programming Interface (SPI) support

The PostgreSQL Server Programming Interface (SPI) is a C API for extensions that simplifies access to the database.

With this release we improve our SPI support.

You can find a sample extension using the pgzx.spi module in the examples/spi_sql folder.

With this release we also introduced the pgzx.spi.query function that returns a Rows data type for you to iterate over the result set. When creating the Rows we do capture the SPI_tuptable result allowing you to run other queries while iterating over the rows.

Although rows has a scan method for converting the current record into Zig types, we recommend you use queryTyped which returns a typed RowsOf iterator. The RowsOf(T).next method unpacks the record into a new value of the given type.

This query returns a single integer or an error if the cname field is unknown:

    const QUERY = "SELECT id FROM catalog.cluster_nodes WHERE cname = $1";
    var rows = try pgzx.spi.queryTyped(i32, QUERY, .{
        .limit = 1,
        .args = .{
            .types = &[_]pg.Oid{pg.TEXTOID},
            .values = &[_]pg.NullableDatum{try toNullableDatum(name)},
        },
    });
    defer rows.deinit();

    if (try rows.next()) |id| {
        return @intCast(id);
    }

    if (!missing_ok) {
        try elog.ereportNoJump(@src(), .Error, .{
            elog.errcode(pg.ERRCODE_UNDEFINED_OBJECT),
            elog.errmsg("cluster \"{s}\" does not exist", .{name}),
        });
    }
    return pg.InvalidOid;

You can also use queryTyped with a struct:

    const NodeInfo = struct {
        server: pg.Oid,
        foreign_schema: [:0]const u8,
    };

    const QUERY = "SELECT cnserver, cnfschema FROM cluster_nodes WHERE ccluster = $1 ORDER BY random()";
    var rows = try pgzx.spi.queryTyped(NodeInfo, QUERY, .{
        .limit = 1,
        .args = .{
            .types = &[_]pg.Oid{pg.OIDOID},
            .values = &[_]pg.NullableDatum{
                try pgzx.datum.toNullableDatum(cluster.id),
            },
        },
    });
    defer rows.deinit();
    const random_node = try rows.next();

The SPI interface requires you to connect and finish to the SPI subsystem. Internally SPI will maintain a stack of "SPI frames". Often at times you will find yourself writing:

    pgzx.spi.connect();
    defer pgzx.spi.finish();

If you want to return unprocessed Rows from a function, you must not use pgzx.spi.finish yet. Either the caller needs to establish the SPI stack frame or you have to transfer ownership of the stack frame (meaning someone must call finish).

Alternatively you can use the pgzx.spi.OwnedSPIFrameRows type:

    pgzx.spi.connect();
    errdefer pgzx.spi.finish();

    var rows = pgzx.spi.queryTyped(u32, ..., .{});
    return rows.owned();

When the caller uses rows.deinit() we will automatically call pgzx.spi.finish() for you.

Error logging API overhaul (pgzx.elog)

We still support the helper functions elog.Error, elog.Warning, elog.Notice and friends. We also introduced WithCause variants like:

pub fn WarningWithCause(src: SourceLocation, cause: anyerror, comptime fmt: []const u8, args: anytype) void;

The WithCause variants will check the cause argument. If the error code is PGErrorStack we emit the captured error report and emit the new report. If it is any other Zig error code, we append it to the message. These utilities are mostly for convenience and nice to have when adding logs for debugging purposes.

The underlying error reporting API has been rewritten to follow the C API more closely. For example:

    elog.ereport(@src(), .Error, .{
        elog.errcode(pg.ERRCODE_UNDEFINED_OBJECT),
        elog.errmsg("Foreign schema \"{s}\" does not exist", .{foreign_schema_name}),
    });

Originally we've been considering using the builder pattern for the elog API, but we found it a little inflexible when combining the builder with the C-API. Plus we had been forced to allocate buffers for messages in advance, even if PostgreSQL would have discarded the message.

The new API works similar to the C-API. In the example given ereport will check if the message will be emitted or discarded by calling errstart. The errcode and errmsg functions will only be executed if errstart returns true.

How does this work? Well, we use a mix of comptime and a pattern known to the Go community as "Functional Options". In essence, in the functional options pattern, a function accepts a variable number of arguments where each argument implements a common interface. The accepted argument type can be a function or an actual Go interface. The receiver might establish an internal 'context' and calls each passed argument with the given context. Once all options have updated the 'context', it is used as configuration of the function body.

In the elog API the errcode and errmsg functions are the functional options. The errmsg type will capture the format string and the arguments, but it is up to ereport to actually call errmsg, which will finally use std.fmt to format the message and pass it to the C-API.

The FmtMessage(C-function, format, args) function is a generic function that implements the functional options interface:

    fn FmtMessage(comptime msgtype: anytype, comptime fmt: []const u8, comptime Args: type) type {
        return struct {
            args: Args,

            pub inline fn call(self: @This()) void {
                var memctx = mem.getErrorContextThrowOOM();
                const msg = std.fmt.allocPrintZ(memctx.allocator(), fmt, self.args) catch unreachable();
                _ = msgtype(msg.ptr);
            }
        };
    }

Like all formatting functions in Zig, we capture the format string as a comptime argument and return a new Zig type. The type only captures the arguments at runtime.

Now we can implement errmsg and others:

pub inline fn errmsg(comptime fmt: []const u8, args: anytype) FmtMessage(pg.errmsg, fmt, @TypeOf(args)) {
    return .{ .args = args };
}

pub inline fn errdetail(comptime fmt: []const u8, args: anytype) FmtMessage(pg.errdetail, fmt, @TypeOf(args)) {
    return .{ .args = args };
}

pub inline fn errdetail_log(comptime fmt: []const u8, args: anytype) FmtMessage(pg.errdetail_log, fmt, @TypeOf(args)) {
    return .{ .args = args };
}

pub inline fn errhint(comptime fmt: []const u8, args: anytype) FmtMessage(pg.errhint, fmt, @TypeOf(args)) {
    return .{ .args = args };
}

And ereport decides if and when to format the string:

pub inline fn ereport(src: SourceLocation, level: Level, opts: anytype) void {
    ereportDomain(src, level, null, opts);
}

pub inline fn ereportDomain(src: SourceLocation, level: Level, domain: ?[:0]const u8, opts: anytype) void {
    if (errstart(level, domain)) {
        inline for (opts) |opt| opt.call();
        errfinish(src, .{ .allow_longjmp = true }) catch unreachable;
    }
}

Once our ereport is compiled, it behaves exactly like the C macros (minus using the zig string formatting functionality). The errstart function checks if the message will be emitted and sets up the context in the error stack. The given options update the active error context and errfinish finally emits the error message. Depending on the log level, errfinish might also execute a longjump or even abort the current process.

Zig structs are awesome in so many ways. One rarely needs builders, let alone the functional options pattern. Yet it seems it is a viable pattern for logging and tracing APIs that are supposed to have low overhead when logging is disabled.

Zig update

Zig 0.13 is the latest stable release, but the Zig toolchain, including the compiler, build system, and standard library, is still in development and breaking changes do happen every now and then. For this reason we currently do not focus on a specific Zig version, but try to follow the latest Zig changes in the master branch.

The Nix based development shell uses zig-overlay in conjunction with the flake.lock file to pin the zig toolchain version to a recent commit ID.

We update the dependency every so often and we try to test and fix breaking changes when updating the toolchain version. We highly recommend using the projects development shell when testing the example extensions provided, otherwise you might have problems compiling the extensions if you use an older or newer Zig toolchain.

We understand not everyone is keen to install Nix locally. For getting to know the environment, you can build and run a development shell in a local docker container. Use ./dev/docker/build.sh to build the container and ./dev/docker/run.sh to start the dockerized development shell. Once you are inside the containers development shell you can use ./ci/setup.sh and ./ci/run.sh to build and run the projects tests.

Foreign Data Wrapper

We initiated support for writing Foreign Data Wrappers (FDW) in Zig. The support is still very rudimentary and you still have to write C-style functions in Zig. We plan to provide a more Zig idiomatic API in the future.

When writing an FDW you export 2 functions, the handler function that sets up a list of callbacks for the FDW framework and a validator function that verifies your FDWs options when users configure the server, user mapping, or foreign table. The pgzx.fdw modules provides some helpers to declaratively define the options your foreign data wrapper supports.

The options API also allows you to combine multiple configurations in case you read a list of supported options from a library that you use for your implementation.

In this example we combine a static set of options with options that we read from libpq the first time the validator is called:

const pgzx = @import("pgzx");
const fdw = pgzx.fdw;

// postgres_fdw options not related to connection or transaction handling.
const postgres_fdw_options = fdw.OptionList.init(&[_]fdw.Option{
    fdw.Option.String("schema_name", pg.ForeignTableRelationId),
    fdw.Option.String("table_name", pg.ForeignTableRelationId),
    fdw.Option.String("column_name", pg.AttributeRelationId),

    ...
});

// internal options for connection and transaction handling.
const xact_conn_options = fdw.OptionList.init(&[_]fdw.Option{
    fdw.Option.Bool("keep_connections", pg.ForeignServerRelationId),
    fdw.Option.Bool("password_required", pg.UserMappingRelationId),
});


// libpq connection options.
var extension_libpq_options: ?[]fdw.Option = undefined;


fn pgdc_fdw_validator(list_datum: pg.Datum, catalog: pg.Oid) void {
    initPqOptions(); // read libpq options into extension_libpq_options

    const extension_options = fdw.MultiOptionList.init(&[_]fdw.OptionList{
        postgres_fdw_options,
        xact_conn_options,
        fdw.OptionList.init(extension_libpq_options.?),
    });

    const options = pg.untransformRelOptions(list_datum);
    fdw.validateOptions(extension_options, options, catalog);
}

fn initPqOptions() void {
    if (extension_libpq_options != null)
        return;

    extension_libpq_options = ...;
}

The handler function can be implemented as follows:

fn pgdc_fdw_handler() pg.Datum {
    return pg.PointerGetDatum(pgzx.node.create(pg.FdwRoutines{
          .BeginForeignScan = ...,
          .IterateForeignScan = ...,
          .EndForeignScan = ...,
          ...
    }));
}

Other changes

  • Add some more utilities to pgzx.meta module.
  • We updated the pgzx.datum module to also accept the type OID for a value as a hint to the converter. This is especially important when working with strings, as different strings types might be encoded differently. We also export a number of helper functions for converting between pg.Datum and Zig strings in case the derived converter does not work well for you.
  • Collections: Added at to List and valuesIterator to HTab.
  • add pgzx.node module for working with the PostgreSQL Node type.
  • Build: We moved the C header includes into its own Zig module pgsys'.
  • Do not use PG_HOME environment variable in Zig build utilities. Instead the build utilities derive the PostgreSQL installation folder by looking up the parent folder of pg_config --bindir. The projects shell scripts still use PG_HOME to locate the local PostgreSQL installation folder in the development shell.
  • pgzx.shmem: Add registerHooks and registerSharedState convenience functions.
  • We currently do not include all headers available, but rather selectively add new headers when needed. If you are missing a header, please open a PR or an issue on the pgzx Github repository. The full set of headers from the foreign, nodes, optimizer, and catalog folder have been added.

We'd love to hear from you. If you have any feedback, suggestions or questions, please open an issue in our GitHub repo, reach out to us on Discord or follow us on X / Twitter.

You can follow along this week with a new announcement every day on our launch week page or just pop into Discord and say hi 👋