What is PL/Rust?

This documentation is under development.

PL/Rust is a loadable procedural language that enables writing PostgreSQL functions in the Rust programming language. These functions are compiled to native machine code. Unlike other procedural languages, PL/Rust functions are not interpreted.

The top advantages of PL/Rust include writing natively-compiled functions to achieve the absolute best performance, access to Rust's large development ecosystem, and Rust's compile-time safety guarantees.

PL/Rust is Open Source and actively developed on GitHub.

Features

PL/Rust provides access to Postgres' Server Programming Interface (SPI) including dynamic queries, prepared statements, and cursors. It also provides safe Rust types over most of Postgres built-in data types, including (but not limited to), TEXT, INT, BIGINT, NUMERIC, FLOAT, DOUBLE PRECISION, DATE, TIME, etc.

On x86_64 and aarch64 systems PL/Rust can be a "trusted" procedural language, assuming the proper compilation requirements are met. On other systems, it is perfectly usable as an "untrusted" language but cannot provide the same level of safety guarantees.

Example PL/Rust function

The following example shows an example PL/Rust function to count the length of an input string. See PL/Rust Functions and Arguments for more examples.

CREATE FUNCTION strlen(name TEXT)
    RETURNS int LANGUAGE plrust AS
$$
    Ok(Some(name.unwrap().len() as i32))
$$;

Using the function is just like any other PostgreSQL function.

SELECT strlen('Hello, PL/Rust');
┌────────┐
│ strlen │
╞════════╡
│     14 │
└────────┘

Built on pgrx

PL/Rust itself is a pgrx-based Postgres extension. Furthermore, each LANGUAGE plrust function are themselves mini-pgrx extensions. pgrxis a generalized framework for developing Postgres extensions with Rust. Like this project, pgrx is developed by TCDI.

The following sections discuss PL/Rusts safety guarantees, configuration settings, and installation instructions.

General Safety, by Rust

Quoted from the "Rustonomicon":

Safe Rust is the true Rust programming language. If all you do is write Safe Rust, you will never have to worry about type-safety or memory-safety. You will never endure a dangling pointer, a use-after-free, or any other kind of Undefined Behavior (a.k.a. UB).

This is the universe in which PL/Rust functions live. If a PL/Rust function compiles it has these guarantees, by the Rust compiler, that it won't "crash." This quality is important for natively-compiled code running in a production database.

What about unsafe?

PL/Rust uses the Rust compiler itself to wholesale disallow the use of unsafe in user functions. If a LANGUAGE plrust function uses unsafe it won't compile.

Generally, what this means is that PL/Rust functions cannot call unsafe fns, cannot call extern "C"s into Postgres itself, and cannot dereference pointers.

This is accomplished using Rust's built-in #![forbid(unsafe_code)] lint.

3rd-party crate dependencies are allowed to use unsafe. We'll discuss this below.

What about pgrx?

If pgrx is a "generalized framework for developing Postgres extensions with Rust", and if PL/Rust user functions are themselves "mini-pgrx extensions", what prevents a LANGUAGE plrust function from using any part of pgrx?

The plrust-trusted-pgrx crate does! The plrust-trusted-pgrx crate is a tightly-controlled "re-export crate" on top of pgrx that exposes the bare minimum necessary for PL/Rust user functions to compile along with the bare minimum, safe features of pgrx.

The crate is versioned independently to both pgrx and plrust and is published on crates.io. By default, the version a plrust user function will use is that of the one set in the project repository when plrust itself is compiled. However, the plrust.trusted_pgrx_version GUC can be set to specify a specific version.

The intent is that plrust-trusted-pgrx can evolve independently of both pgrx and plrust.

There are a few "unsafe" parts of pgrx exposed through plrust-trusted-pgrx, but PL/Rust's ability to block unsafe renders them useless by PL/Rust user functions. plrust-trusted-pgrx's docs are available on docs.rs.

What about Rust compiler bugs?

PL/Rust uses its own rustc driver which enables it to apply custom lints to the user's LANGUAGE plrust function. In general, these lints will fail compilation if the user's code uses certain code idioms or patterns which we know to have "I-Unsound" issues.

PL/Rust contains a small set of lints to block what the developers have deemed the most egregious "I-Unsound" Rust bugs.

Should new Rust bugs be found, and detection lints are developed for PL/Rust, the lints can be applied to new user function compilations along with ensuring that future function executions had those lints applied at compile time.

Note that this is done on a best-effort basis, and does not provide a strong level of security — it's not a sandbox, and as such, it's likely that a skilled hostile attacker who is sufficiently motivated could find ways around it (PostgreSQL itself is not a particularly hardened codebase, after all). You should ensure such actors cannot execute SQL on your database, but to be clear: this is true regardless of whether or not PL/Rust is installed. Having said that, any issues found with our implementation will be taken seriously, and should be reported appropriately.

Trusted with postgrestd on Linux x86_64/aarch64

The "trusted" version of PL/Rust uses a unique fork of Rust's std entitled postgrestd when compiling LANGUAGE plrust user functions. postgrestd is a specialized Rust compilation target which disallows access to the filesystem and the host operating system. The Install PL/Rust section outlines the steps required for trusted install of PL/Rust. Currently, postgrestd is only supported on Linux x86_64 and aarch64 platforms.

When plrust user functions are compiled and linked against postgrestd, they are prohibited from using the filesystem, executing processes, and otherwise interacting with the host operating system.

In order for PL/Rust to use postgrestd, its Rust compilation targets must be installed on the Postgres server. This happens via plrust's plrust/build script, which clones postgrestd, compiles it, by default, for both x86_64 and aarch64 architectures, and ultimately places a copy of the necessary libraries used by Rust for std into the appropriate "sysroot", which is the location that rustc will look for building those libraries.

The trusted Feature Flag

PL/Rust has a feature flag simply named trusted. When compiled with the trusted feature flag PL/Rust will always use the postgrestd targets when compiling user functions. Again, this is only supported on x86_64 and aarch64 Linux systems. postgrestd and the trusted feature flag are not supported on other platforms. As such, PL/Rust cannot be considered fully trusted on those platforms.

If the trusted feature flag is not used when compiling PL/Rust, which is the default, then postgrestd is not used when compiling user functions, and while they'll still benefit from Rust's general compile-time safety checked, forced usage of the plrust-trusted-pgrx crate, and PL/Rust's unsafe blocking, they will be able to access the filesystem and communicate with the host operating system, as the user running the connected Postgres backend (typically, this is a user named postgres).

PL/Rust is also a Cross Compiler

In this day and age of sophisticated and flexible Postgres replication, along with cloud providers offering Postgres on, and replication to, disparate CPU architectures, it's important that plrust, since it stores the user function binary bytes in a database table, support running that function on a replicated Postgres server of a different CPU architecture.

cross compilation has entered the chat

By default, PL/Rust will not perform cross compilation. It must be installed and enabled through configuration.

Configuring a host to properly cross compile is a thing that can take minimal effort to individual feats of heroic effort. Reading the (still in-progress) pgrx cross compile guide can help. Generally speaking, it's not too awful to setup on Debian-based Linux systems, such as Ubuntu. Basically, you install the "cross compilation toolchain" apt package for the other platform.

Install PL/Rust Prerequisites

These instructions explain how to install PL/Rust on a typical OS PostgreSQL installation installed using the OS' package manager. These instructions include steps for trusted and untrusted plrust and are tested using Ubuntu 22.04 and PostgreSQL 15. PostgreSQL 15 for this document is installed using apt using the pgdg repository. See the PostgreSQL apt wiki page for instructions.

Steps to install PL/Rust:

  • Prerequisites
  • Install Rust
  • Install pgrx
  • Install PL/Rust
  • Create amazing things!

Prerequisites

PL/Rust requires PostgreSQL and all prerequisites outlined for pgrx are installed.

PL/Rust also requires that any databases in which it's created is UTF8. Postgres' default encoding is determined by the locale of the environment when initdb is first run. Depending on your operating system configuration, this may not resolve to UTF8.

Building PL/Rust from source requires installing cargo-pgrx which requires a development toolchain capable of building Postgres itself.

Permissions

Installing PL/Rust with these instructions installs rustc, pgrx, and plrust as the Linux postgres user. The postgres user is created during the standard PostgreSQL installation via apt. For pgrx to successfully install plrust, the postgres user needs ownership of the extension and lib directories. The standard Ubuntu locations are indicated below.

sudo chown postgres -R /usr/share/postgresql/15/extension/
sudo chown postgres -R /usr/lib/postgresql/15/lib/

These permissions are later reset back to being owned by root in the Reset Permissions section.

Install rustc

Installing PL/Rust requires that the rustc compiler is available to the user installing it. Switch to the postgres Linux user and change into its home directory.

sudo su - postgres

The typically installation for rustc uses curl and rustup. If you want to install rustc without using rustup see the Other Rust installation methods page.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

The rustup installer prompts for an installation choice. The default installation (1) should work for most use cases.

1) Proceed with installation (default)
2) Customize installation
3) Cancel installation

After installing rust, use exit to log out and back in to the postgres account. This ensures your terminal is using the newly installed rustc installation.

# Log in  as postgres fresh with rustc installed
exit
sudo su - postgres

Clone plrust and check Rust version

PL/Rust is installed from source code using pgrx. This installation requires that pgrx is compiled using a specific version of rustc. The rustc version required for PL/Rust is defined in the project's rust-toolchain.toml. The steps below ensure the proper versions are used.

Clone the plrust repo from GitHub and change into the plrust/plrust directory. Running rustc -V in this location is used to verify the version reported is by rustc -V is the version defined by PL/Rust.

git clone https://github.com/tcdi/plrust.git
cd plrust/plrust
rustc -V

The output from rustc -V should look similar to the following example.

rustc 1.72.0 (90c541806 2023-05-31)

Use rustup default to check that the explicit version of rustc is selected. You need to see the version number reported in by rustc -V in your rustup default output.

rustup default

The expected output is below.

1.72.0-x86_64-unknown-linux-gnu (default)

If rustup default returns a different version number or stable, set the default version as shown below and check that the output updates accordingly.

rustup default 1.72.0
rustup default

Be careful with Rust versions

WARNING! The stable version of rustc cannot be used to install Trusted PL/Rust. This is the case even when the stable version is identical to the tagged version number, such as 1.72.0.

The above checks of rustc -V and rustup default are important to follow before installing pgrx and PL/Rust. You must install pgrx with the version of rustc that plrust expects in the rust-toolchain.toml. Failing to do so will result in a mismatched version error in a subsequent step.

A misconfigured rustup default results in errors when creating functions with trusted PL/Rust. The error can manifest as a problem in the postgrestd linking with the following error. This happens because Rust makes a distinction between the latest stable version of Rust, and the actual version of the stable release (e.g. 1.72.0), even when they refer to the same release.

Error loading target specification: Could not find specification for target "x86_64-postgres-linux-gnu".

Install pgrx

The PL/Rust extension is built and installed using pgrx. Install pgrx with the --locked option. This step takes a few minutes.

cargo install cargo-pgrx --locked

Pgrx needs to be initialized for use with the PostgreSQL installation. This is done using pgrx init. This step needs to know where your pg_config file is located at. If you have a standard Ubuntu apt installation of PostgreSQL with a single version of PostgreSQL installed you can use the generic /usr/bin/pg_config path.

cargo pgrx init --pg15 /usr/bin/pg_config

Output from cargo pgrx init looks like the following example. You may notice it mentions information about a new data directory under your user's ~/.pgrx/ directory. This does not replace your PostgreSQL instance's data directory. The ~/.pgrx/data-15/ directory is there in case you run cargo pgrx run pg15, which would use this custom data directory, not your installation's data directory.

   Validating /usr/bin/pg_config
 Initializing data directory at /var/lib/postgresql/.pgrx/data-15

The generic pg_config used above will not work for all installations, such as if you have both PostgreSQL 14 and 15 installed on one instance. In these cases you should specify the exact pg_config file for your installation.

cargo pgrx init --pg14 /usr/lib/postgresql/14/bin/pg_config

The instructions on this page have setup the prerequisite software required to install PL/Rust. The next section, Install PL/Rust, finishes the installation process.

Install PL/Rust

This section provides steps on how to proceed with installing PL/Rust. There are three ways to install PL/Rust. Most users will want to install trusted PL/Rust.

These instructions assume you have followed the Install Prerequisites section and are logged in as the postgres Linux user. Install PL/Rust by following the installation steps under your method of choice below. Then visit the configuration subsection and give PL/Rust a try!

Untrusted install

To install untrusted PL/Rust use cargo pgrx install without --features trusted. See the trusted install if you wish to install the trusted PL/Rust instead.

cargo pgrx install --release -c /usr/bin/pg_config

Continue on to configuring PostgreSQL for PL/Rust.

Trusted install

The trusted installation requires postgrestd and a few additional Rust dependencies. First install the additional dependencies. This example uses x86_64 and ensures the target is installed. If you are using aarch64, update the command accordingly.

Linux:

rustup component add llvm-tools-preview rustc-dev
rustup target install x86_64-unknown-linux-gnu

macOS:

rustup component add llvm-tools-preview rustc-dev
rustup target install x86_64-apple-darwin

Change into the plrust/plrustc directory to build plrustc. Move the generated binary into ~/.cargo/bin/.

cd ~/plrust/plrustc
./build.sh
mv ~/plrust/build/bin/plrustc ~/.cargo/bin/

Note: The path ~/.cargo/bin/ is the default path used by PL/Rust. This can be overridden using plrust.PATH_override, see PostgreSQL Config.

Change into the plrust/plrust/ directory and run the build process to install postgrestd. This example is for installing PL/Rust on x86_64 architecture, switch to aarch64 if using that architecture instead.

cd ~/plrust/plrust
PG_VER=15 \
    STD_TARGETS="x86_64-postgres-linux-gnu " \
    ./build

The above step can take quite a few minutes to install postgrestd and run the associated tests. It is not uncommon to see output like the following during the test process.

test tests::tests::pg_plrust_aggregate has been running for over 60 seconds

The final step for trusted PL/Rust installation is to use cargo pgrx install with --features trusted.

cargo pgrx install --release --features trusted -c /usr/bin/pg_config

Continue on to configuring PostgreSQL for PL/Rust.

Choosing a different plrust-trusted-pgrx dependency at compile time

When a user creates a LANGUAGE plrust function, PL/Rust first generates a small Cargo crate for the function. That crate has a dependency on plrust-trusted-pgrx. By default, plrust-trusted-pgrx comes from crates.io, using the same version as PL/Rust itself.

It is possible to override this dependency when compiling PL/Rust itself so that PL/Rust will use a different plrust-trusted-pgrx crate. To do this, set an environment variable named PLRUST_TRUSTED_PGRX_OVERRIDE to the full "Cargo.toml"-compatible dependency line, like so:

PLRUST_TRUSTED_PGRX_OVERRIDE="pgrx = { path = '~/code/plrust/plrust-trusted-pgrx', package='plrust-trusted-pgrx' }" \
cargo pgrx install --release --features trusted -c /usr/bin/pg_config

This will instead compile all user functions using this specific plrust-trusted-pgrx, not the default on crates.io. Generally, changing the plrust-trusted-pgrx dependency is only useful for PL/Rust development and CI, not for production deployments, but is worth mentioning as the environment variable will influence how user functions are compiled.

It may also be useful for providing a local patch to plrust-trusted-pgrx if such a need were to arise.

Trusted installation plus cross compilation

Adding cross compilation support to PL/Rust requires a few minor changes to the trusted installation steps above. This section only highlights the changes to make for cross compile support, not the full process.

Linux

As a Linux user with sudo access, install these additional prerequisites.

sudo apt install crossbuild-essential-arm64 crossbuild-essential-amd64

The normal trusted install uses rustup to install one architecture target. Cross compilation support requires both.

rustup component add llvm-tools-preview rustc-dev
rustup target install aarch64-unknown-linux-gnu
rustup target install x86_64-unknown-linux-gnu

macOS

The normal trusted install uses rustup to install one architecture target. Cross compilation support requires both.

rustup component add llvm-tools-preview rustc-dev
rustup target install aarch64-apple-darwin
rustup target install x86_64-apple-darwin

The above environment variables are the default... you can just run ./build. PG_VER=15 currently represents the latest released PostgreSQL version.

Configure and restart PostgreSQL

The PostgreSQL configuration in postgresql.conf must be updated for PL/Rust to function. This section illustrates the minimum required changes so PL/Rust will function. See the PostgreSQL configuration section for more configuration details.

PL/Rust requires shared_preload_libraries includes plrust and that you define plrust.work_dir.

NOTE: PL/Rust with cross compilation support also requires plrust.compilation_targets.

Edit the PostgreSQL configuration file still as the postgres Linux user.

nano /etc/postgresql/15/main/postgresql.conf

Update the configuration with these items. Note that shared_preload_libraries might already be set with a value before you add plrust. Use a comma separated list of extensions to include multiple libraries in this configuration option.

shared_preload_libraries = 'plrust'
plrust.work_dir = '/tmp'

The PostgreSQL service needs to be restarted for the configuration changes to take effect. Exit the postgres user and restart the PostgreSQL service.

exit
sudo systemctl restart postgresql

Reset permissions

In order to install the PL/Rust extension as the postgres users permissions were updated in the Permissions section of the Install PL/Rust Prerequisites section. Change the permissions for the extension and lib folders back to being owned by the root user.

sudo chown root -R /usr/share/postgresql/15/extension/
sudo chown root -R /usr/lib/postgresql/15/lib/

Try it out

Create a plrust database and connect to the plrust database using psql.

PL/Rust only supports databases encoded as UTF8. This is to ensure proper compatibility between Postgres/SQL TEXT (and internal strings) and Rust String and &str types.

sudo -u postgres psql -c "CREATE DATABASE plrust WITH ENCODING = 'utf8' TEMPLATE = 'template0';"
sudo -u postgres psql -d plrust

Create the plrust extension.

CREATE EXTENSION plrust;

If you installed the untrusted PL/Rust you will be warned of that detail in this step.

WARNING:  plrust is **NOT** compiled to be a trusted procedural language

The following example creates a plrust function named plrust.one() that simply returns the integer 1.

CREATE OR REPLACE FUNCTION plrust.one()
    RETURNS INT LANGUAGE plrust
AS
$$
    Ok(Some(1))
$$;

Using a function created with PL/Rust is the same as using any other PostgreSQL function. A scalar function like plrust.one() can be used simply like below.

SELECT plrust.one();
┌─────┐
│ one │
╞═════╡
│   1 │
└─────┘

Update PL/Rust

This section explains how to update PL/Rust installations. This assumes you installed PL/Rust following our installation guide and pgrx and PL/Rust are installed using the postgres Linux user.

Update pgrx

A PL/Rust update is often accompanied by an update of the underlying pgrx project. Install the latest version of pgrx. Changing into the plrust folder ensures the rustc version used for installation is the same required by PL/Rust.

Start as a user with sudo access.

sudo chown postgres -R /usr/share/postgresql/15/extension/
sudo chown postgres -R /usr/lib/postgresql/15/lib/
sudo su - postgres
cd ~/plrust
git pull
cargo install cargo-pgrx --locked

Update PL/Rust

Follow these steps to upgrade PL/Rust from GitLab to use the latest release.

Update plrustc, postgrestd and plrust installations.

cd ~/plrust/plrustc
./build.sh
mv ~/plrust/build/bin/plrustc ~/.cargo/bin/

cd ~/plrust/plrust
PG_VER=15 \
    STD_TARGETS="x86_64-postgres-linux-gnu " \
    ./build

cargo pgrx install --release \
    --features trusted \
    -c /usr/bin/pg_config

Exit out of postgres user back to user with sudo.

exit

Restart Postgres, required b/c plrust is in shared_preload_libraries. Set permissions back to default.

sudo systemctl restart postgresql

sudo chown root -R /usr/share/postgresql/15/extension/
sudo chown root -R /usr/lib/postgresql/15/lib/

Rust versions

See the section(s) about Rust versions the the Install PL/Rust section. Pay special attention to the versions defined by PL/Rust, and your system defaults for rustc and rustup.

Install PL/Rust on Debian/Ubuntu

Debian packages for PL/Rust are available for download on the Github Releases page here: https://github.com/tcdi/plrust/releases

Assumptions

The provided Debian packages for PL/Rust make certain assumptions about the environment in which they are installed. Notably, they require PostgreSQL either has been installed or will be installed with the official Postgres Debian packages. Because of this, the PL/Rust Debian packages assume:

  • The user which runs Postgres is postgres
  • The home directory for the postgres user is located at /var/lib/postgresql
  • The postgres user has the ability to create and manage databases and extensions
  • The postgres user has the ability to install Rust and all of the required dependencies

Filename convention

The PL/Rust artifacts that are uploaded to Github releases follow this pattern:

plrust-trusted-<PLRUSTVER>_<RUSTTOOLCHAINVER>-debian-pg<PGMAJORVER>-<DPKGARCH>.deb

Where:

  • PLRUSTVER is the PL/Rust release version
  • RUSTTOOLCHAINVER is the version of Rust+toolchains in which the Debian package was built
  • PGMAJORVER is the major version of PostgreSQL the package targets, such as 13, 14 or 15
  • DPKGARCH is the CPU architecture name according to dpkg, such as arm64 or amd64

Example:

plrust-trusted-1.2.3_1.72.0-debian-pg15-amd64.deb

Preparing the environment

Certain applications, libraries and dependencies must be set up before PL/Rust can be installed from a Debian package.

Note that the following instructions assume sudo capabilities.

System and development requirements

Because PL/Rust is a compiled language, certain libraries and development tools will be required to be installed:

sudo apt-get update && \
sudo apt-get install -y --no-install-recommends \
    build-essential \
    ca-certificates \
    clang \
    clang-11 \
    gcc \
    git \
    gnupg \
    libssl-dev \
    llvm-11 \
    lsb-release \
    make \
    pkg-config \
    wget

Installing Postgres

If Postgres has already been installed with the official Postgres Debian packages, then skip this section and proceed to Installing Rust and components

  1. Set up the official PostgreSQL APT repository:
    echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" | \
      sudo tee -a /etc/apt/sources.list.d/pgdg.list > /dev/null
    
    wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | \
      gpg --dearmor | sudo tee -a /etc/apt/trusted.gpg.d/apt.postgresql.org.gpg >/dev/null
    
  2. Update APT and install PostgreSQL. Replace XX with the PostgreSQL major version to be installed (e.g. 13, 14, 15):
    sudo apt-get update -y -qq --fix-missing && \
    sudo apt-get install -y --no-install-recommends \
        postgresql-XX \
        postgresql-server-dev-XX
    

Installing Rust and components

A key component of making PL/Rust work is the Rust compiler and components. However, Rust and its tooling must be installed as the postgres user. If the package-required version of Rust and toolchains are already installed and defaulted for the postgres user, then skip this section and proceed to Installing PL/Rust.

Instructions that follow assume the latest version of PL/Rust, which requires 1.72.0 of Rust and toolchain set. If the desired PL/Rust Debian package requires a different version of the toolchain set (as indicated by the filename), then substitute that version in the following instructions.

  1. Switch to the postgres user:
    sudo su -l - postgres
    
  2. If Rust has never been installed as the postgres user, then run the following:
    wget -qO- https://sh.rustup.rs | \
      sh -s -- \
      -y \
      --profile minimal \
      --default-toolchain=1.72.0
    
    source "$HOME/.cargo/env"
    
  3. If Rust has previously been installed as the postgres user, then ensure that the 1.72.0 toolchain is installed and set to default:
    rustup toolchain install 1.72.0
    rustup default 1.72.0
    
  4. Ensure that the rustc-dev component has been installed:
    rustup component add rustc-dev
    
  5. Drop back to your normal user:
    exit
    

Future versions of PL/Rust may require a different version of the Rust toolchain to be installed and set to default. In such an event, step 3 and onward must be repeated with the new required version of the specific toolchain. The filename of the PL/Rust Debian package contains the version of the Rust toolchain it was built with -- see Filename convention for more details.

Installing PL/Rust

With the prerequisites installed and set up, it is time to install the PL/Rust Debian package.

  1. Head to the PL/Rust releases page and download the appropriate version onto the target system. When choosing a destination in which to place the dowloaded Debian package, pick a directory that is readable and writable by the _apt user, such /tmp. Otherwise, the installation might produce an error.
  2. Install the package:
    sudo apt install /tmp/plrust-trusted-X.X.X_1.72.0-debian-pgXX-yourarch.deb
    

The package installation will fail if at least one of the above Rust dependencies are not met.

Service Restart Notice

Newer versions of Debian/Ubuntu may prompt for a restart of certain services during the Debian package installation process, notably the PostgreSQL server service. Take caution in determining the appropriate time for the PostgreSQL service to be restarted. For example, if the GUC settings have not been set up before as outlined in the following Configuring PostgreSQL section, then it may be advisable to delay the restart until after those are set up. It may also be advisable to delay a restart for a system that is already in production.

Regardless of timing, PostgreSQL will need to be restarted in any of the following conditions:

  1. PL/Rust is installed for the first time and GUC additions have been added
  2. PL/Rust is updated on an existing system at some point in the future, with or without GUC changes
  3. Any GUC change is required for an existing PL/Rust setup

Configuring PostgreSQL

In order for PL/Rust to be available to the PostgreSQL server, some new GUC settings must configured. See the PostgreSQL Configuration for PL/Rust for more details on the required setup and other options that may be necessary.

Any configuration changes will require a restart of the PostgreSQL service on the system.

Finishing up

To test if PL/Rust is set up correctly, load up psql as the postgres user

sudo su -l - postgres
psql

and run the following:

CREATE EXTENSION plrust;

Then, create a simple function and try it out:

CREATE OR REPLACE FUNCTION plrust.one()
    RETURNS INT LANGUAGE plrust
AS
$$
    Ok(Some(1))
$$;
SELECT plrust.one();
┌─────┐
│ one │
╞═════╡
│   1 │
└─────┘

Updating

Future versions of PL/Rust can be installed using the same methods described in Installing PL/Rust. One consideration when upgrading is that new versions of PL/Rust may require a different version of the Rust toolchain. See Installing Rust and components for details.

Try PL/Rust with Docker

Giving PL/Rust a try has never been easier with Docker! This document outlines what is required to get a functional Postgres + PL/Rust environment running with just a few commands.

The PL/Rust repository supplies a Dockerfile named Dockerfile.try that contains everything necessary to spin up and test PL/Rust in a target environment.

The following instructions assume a very basic understanding of what Docker is and that it is already installed in the target environment. If Docker is not yet installed, instructions can be found here: https://docs.docker.com/engine/install/

  1. Clone the PL/Rust code, and switch to that directory
  2. From a command line, run the following from the root of the checkout directory (sudo or equivalent may be required):
    docker build -f Dockerfile.try -t tcdi/try-plrust .
    
    Note that this may take a little while to finish.
  3. Once the above has finished, run the following (sudo may be required here):
    docker run -it tcdi/try-plrust
    
  4. There will be some output that the Postgres server has started, and psql prompt will start up in the foreground:
    Type "help" for help.
    
    postgres(plrust)=#
    

That's it! From here, the psql interactive prompt with PL/Rust installed can be used to create and run PL/Rust functions. Here is a very small example to get started:

CREATE FUNCTION plrust.one()
    RETURNS INT LANGUAGE plrust
AS
$$
    Ok(Some(1))
$$;

Creating PL/Rust functions compiles Rust code in the backend, so this may take some time depending on the host's hardware specifications and internet connection speeds. Once this completes, the PL/Rust function can be executed similar to other Postgres functions:

SELECT * FROM plrust.one();

which will provide the following results:

postgres(plrust)=# SELECT * FROM plrust.one();
 one
-----
   1
(1 row)

To exit out of the prompt and the Docker container, type the Postgres command quit:

postgres(plrust)=# quit

Alternate running modes

Running the Docker container using docker run -it tcdi/try-plrust as described above will spin up both the Postgres server in the background and the psql command line utility in the foreground in the same running container. However, the option exists to run the Postgres server only (with PL/Rust installed) so that an alternative Postgres client can be used. To do this, run the following command (which may require sudo or equivalent):

docker run -it -p 5432:5432 tcdi/try-plrust server

This will set up everything that is necessary and run the Postgres server only, binding to TCP port 5432. Output here will be all of the Postgres log entries, including any errors that result from a PL/Rust compilation error. The final server argument in the command indicates that it should launch the server script upon container bootup. In order to connect with an alternative client, the only pieces of information that are required are the Postgres username (postgres), the hostname or IP address (e.g. localhost or 192.168.0.2) and the port (5432). There is no password set for the postgres user in this setup. An example Postgres URI might look like this:

postgres://postgres@localhost:5432

To exit out of server mode, press Ctrl+c in the running Docker container.

Caveats

  • This Dockerfile and resulting image should not be used in production. It does not take many security precautions into consideration. As such, the way Dockerfile.try is constructed should not be considered a best practice as it relates to setting up and securing a Postgres instance with PL/Rust installed.

  • The Postgres data directories, logs and built PL/Rust functions are not persistent and are destroyed upon container termination. Externally mounting Postgres' data, log and function directories is outside the scope of this example.

PL/Rust Functions

PL/Rust functions are created with the standard CREATE FUNCTION syntax and LANGUAGE plrust. This section provides examples how to create a variety of function using PL/Rust.

The basic function structure is shown in the following example.

CREATE FUNCTION funcname (argument-list)
    RETURNS return-type
    -- function attributes can go here
AS $$
    # PL/Rust function body goes here
$$ LANGUAGE plrust;

The body of the function is ordinary Rust code. When the CREATE FUNCTION is ran the Rust code is complied using the pgrx framework. This compile process can take a bit of time. The compile time required is one reason anonymous blocks (DO blocks) are not supported at this time.

The syntax of the CREATE FUNCTION command requires the function body to be written as a string constant. It is usually most convenient to use dollar quoting ($$, see Section 4.1.2.4) for the string constant. If you choose to use escape string syntax E'', you must double any single quote marks (') and backslashes () used in the body of the function (see Section 4.1.2.1).

Basic PL/Rust Example

The following example creates a basic plrust function named plrust.one() to simply returns the integer 1.

CREATE OR REPLACE FUNCTION plrust.one()
    RETURNS INT
    LANGUAGE plrust
AS
$$
    Ok(Some(1))
$$
;

Function with parameters

The following example creates a function named plrust.strlen that accepts one parameter named val. The function returns a BIGINT representing the length of the text in val. The variable names defined in the function definition can be used directly in the Rust code within the function's body.

CREATE OR REPLACE FUNCTION plrust.strlen(val TEXT)
    RETURNS BIGINT
    LANGUAGE plrust
AS $$
    Ok(Some(val.unwrap().len() as i64))
$$;

Using the strlen() function works as expected.

SELECT plrust.strlen('Hello, PL/Rust');
┌────────┐
│ strlen │
╞════════╡
│     14 │
└────────┘

The plrust.strlen function above used unwrap() to parse the text variable. Changing the function definition to include STRICT avoids the need to use unwrap(). The following version of plrust.strlen works the same as above.

    CREATE OR REPLACE FUNCTION plrust.strlen(val TEXT)
    RETURNS BIGINT
    LANGUAGE plrust STRICT
AS $$
    Ok(Some(val.len() as i64))
$$;

Functions with anonymous parameters not allowed

PL/Rust functions with parameters require named parameters. This is different from functions written in other languages, such as SQL where strlen(TEXT, INT) allows the use of $1 and $2 within the function body.

https://www.postgresql.org/docs/current/sql-createfunction.html

The succinct reason anonymous parameters are not allowed is because "It does not align with Rust." Requiring named parameters keeps functionality simple, direct and obvious.

One of the reasons people use Rust is because of the quality of the compiler's feedback on incorrect code. Allowing anonymous parameters would ultimately require transforming the code in a way that would either result in potentially garbled error messages, or arbitrarily restricting what sets of identifiers can be used. Simply requiring identifiers skips all of that.

Calculations

PL/Rust functions can performance calculations, such as converting distance values from feet to miles.

CREATE OR REPLACE FUNCTION plrust.distance_feet_to_miles(feet FLOAT)
    RETURNS FLOAT
    LANGUAGE plrust STRICT
AS $$
    Ok(Some(feet / 5280.0))
$$;

Using the function.

SELECT plrust.distance_feet_to_miles(10000);
┌────────────────────────┐
│ distance_feet_to_miles │
╞════════════════════════╡
│      1.893939393939394 │
└────────────────────────┘

Use dependencies

One of the powerful features of plrust is its ability to define dependencies in the function. The following examples use the faker_rand crate in functions to generate fake text data.

The random_first_name() function returns a random first name using the en_us locale.

CREATE OR REPLACE FUNCTION plrust.random_slogan() RETURNS TEXT
LANGUAGE plrust AS $$
[dependencies]
    faker_rand = "0.1"
    rand = "0.8"
[code]
    use faker_rand::en_us::company::Slogan;
    Ok(Some(rand::random::<Slogan>().to_string()))
$$;
SELECT plrust.random_slogan();
┌─────────────────────────────┐
│        random_slogan        │
╞═════════════════════════════╡
│ Ergonomic composite schemas │
└─────────────────────────────┘
CREATE OR REPLACE FUNCTION plrust.random_company_name(locale TEXT)
    RETURNS TEXT
    LANGUAGE plrust STRICT
AS $$
[dependencies]
    faker_rand = "0.1"
    rand = "0.8"
[code]
    match locale {
        "en_us" => {
            use faker_rand::en_us::company::CompanyName;
            Ok(Some(rand::random::<CompanyName>().to_string()))
        }
        "fr_fr" => {
            use faker_rand::fr_fr::company::CompanyName;
            Ok(Some(rand::random::<CompanyName>().to_string()))
        }
        _ => panic!("Unsupported locale. Use en_us or fr_fr")
    }
$$;
SELECT plrust.random_company_name('en_us') AS en_us,
    plrust.random_company_name('fr_fr') AS fr_fr;
┌────────────┬───────────────┐
│   en_us    │     fr_fr     │
╞════════════╪═══════════════╡
│ Furman Inc │ Élisabeth SEM │
└────────────┴───────────────┘

Function Anatomy

A PL/Rust function is Rust code embedded in an SQL CREATE FUNCTION statement. Behind the scenes, PL/Rust injects the function body into a true Rust function, automatically creating the wrapper function signature along with applying a set of lints.

It's important to understand the surrounding code environment of an individual LANGUAGE plrust function, and this environment is different depending on certain properties of the function itself. The important differences arise around whether the function is declared as STRICT. This is discussed in the STRICT and NULL chapter.

Using a PL/Rust function that converts a TEXT datum to lowercase:

CREATE OR REPLACE FUNCTION lcase(s TEXT) RETURNS TEXT LANGUAGE plrust AS $$
    Ok(Some(s.unwrap().to_lowercase())) 
$$;

PL/Rust then generates the following Rust code:

#![allow(unused)]
fn main() {
mod forbidden {
       #![forbid(deprecated)]
       #![forbid(implied_bounds_entailment)]
       #![forbid(plrust_async)]
       #![forbid(plrust_autotrait_impls)]
       #![forbid(plrust_env_macros)]
       #![forbid(plrust_extern_blocks)]
       #![forbid(plrust_external_mod)]
       #![forbid(plrust_filesystem_macros)]
       #![forbid(plrust_fn_pointers)]
       #![forbid(plrust_leaky)]
       #![forbid(plrust_lifetime_parameterized_traits)]
       #![forbid(plrust_print_macros)]
       #![forbid(plrust_static_impls)]
       #![forbid(plrust_stdio)]
       #![forbid(plrust_suspicious_trait_object)]
       #![forbid(soft_unstable)]
       #![forbid(suspicious_auto_trait_impls)]
       #![forbid(unsafe_code)]
       #![forbid(where_clauses_object_safety)]
    
       #[allow(unused_imports)]
       use pgrx::prelude::*;
    
       #[allow(unused_lifetimes)]
       fn plrust_fn_oid_16384_16404<'a>(
           s: Option<&'a str>,
       ) -> ::std::result::Result<
           Option<String>,
           Box<dyn std::error::Error + Send + Sync + 'static>,
       > {
           Ok(Some(s.unwrap().to_lowercase()))
       }
   }
}

mod forbidden {}

Every PL/Rust function is wrapped with this module and cannot be influenced by users. It exists so that PL/Rust can apply lints to the user's code which will detect forbidden code patterns and idioms at compile time.

#[!forbid(...)]

These are the lints that, if triggered, will fail compilation. These lints are only applied here and are not applied to external dependencies.

use pgrx::prelude::*

A default set of types and traits available to every PL/Rust function. Despite the name, these originate from plrust-trusted-pgrx. plrust-trusted-pgrx is a very small subset of pgrx, the crate upon which PL/Rust and LANGUAGE plrust functions are based.

fn plrust_fn_oid_16384_16404(...) -> ... {}

The function in which the LANGUAGE plrust function body is injected. The naming convention is the literal string plrust_fn_oid_, then the database's OID, an underscore, and the function's OID.

A PL/Rust function author does not need to know this function name and would never have a reason to call it directly, but it's important to know how the name is generated.

Generation of the function's arguments and return type are discussed in more detail in their respective sections.

Ok(Some(s.unwrap().to_lowercase()))

The function body itself. This is injected, unchanged, directly from the body of the CREATE FUNCTION statement.

It's worth nothing that the function body is parsed for syntactic correctness by the Rust crate syn prior to generating the entire block of code outlined here. This means PL/Rust doesn't rely on the compiler for syntax checking -- it happens up-front. As such, syntax errors may report error messages that are sometimes unhelpful.

Function Arguments

PL/Rust function arguments are mapped in the same order declared by the CREATE FUNCTION statement. They'll have the same names and the types will be mapped following the supported data type mappings. Note that the STRICT function property impacts the actual type. This is described below.

Naming

The basic rules for naming are:

  1. Argument names are case-sensitive.
  2. Argument names must also be valid Rust identifiers. It's best to stick with lowercase ASCII in the set [a-z0-9_].
  3. Anonymous argument names are not supported. Procedural Languages such as sql and plpgsql support anonymous arguments where they can be referenced as $1, $2, etc. PL/Rust does not.

Argument Ownership

Except in the case of SQL the TEXT/VARCHAR and BYTEA types, all argument datums are passed to the PL/Rust function as owned, immutable instances.

Quick Code Example

Given a LANGUAGE plrust function like this:

CREATE OR REPLACE FUNCTION lots_of_args(a TEXT, b INT, c BOOL[], d JSON) RETURNS INT STRICT LANGUAGE plrust AS $$
   // ... code goes here ...
$$;

PL/Rust essentially generates a wrapper Rust function like this:

#![allow(unused)]
fn main() {
use pgrx::prelude::*;

fn lots_of_args(a: &str, b: i32, c: Vec<Option<bool>>, d: Json) -> Result<Option<i32>, Box<dyn std::error::Error + Send + Sync + 'static>> {
    // ... code goes here ...
}
}

It is the developer's responsibility to fully implement this function, including returning the proper value.
Note that the above is just an abridged example. The anatomy section describes in detail what really happens.

The section below describes how the STRICT keyword impacts the actual function signature, specifically each argument type.

STRICT and NULL

PL/Rust uses Rust's Option<T> type to represent arguments that might be NULL, plus all return types. A Postgres UDF that is not declared as STRICT means that any of its arguments might be NULL, and PL/Rust is required to account for this at compile time. This means that the actual PL/Rust function argument type is context dependent.

As a Postgres refresher, declaring a function as STRICT (which is not the default) means that if any argument value is NULL then the return value is also NULL. In this case, Postgres elides calling the function.

This distinction allows PL/Rust to optimize a bit. STRICT functions have Rust argument types of T whereas non-STRICT functions have argument types of Option<T>.

Here is the "same" function, the first declared as STRICT, the second not:

CREATE OR REPLACE FUNCTION lcase(s TEXT) RETURNS TEXT STRICT LANGUAGE plrust AS $$
    let lcase = s.to_lowercase();   // `s` is a `&str`
    Ok(Some(lcase)) 
$$;

# SELECT lcase('HELLO WORLD'), lcase(NULL) IS NULL AS is_null;
    lcase    | is_null 
-------------+---------
 hello world | t

CREATE OR REPLACE FUNCTION lcase(s TEXT) RETURNS TEXT LANGUAGE plrust AS $$
    let unwrapped_s = s.unwrap();     // `s` is an `Option<&str>` and will panic if `s` IS NULL
    let lcase = unwrapped_s.to_lowercase();
    Ok(Some(lcase)) 
$$;

# SELECT lcase('HELLO WORLD'), lcase(NULL) IS NULL AS is_null;
ERROR:  called `Option::unwrap()` on a `None` value

Rust programmers likely recognize this error message. When a function is not declared as STRICT, it is the programmer's responsibility to properly handle the possibility of an argument being Option::None.

STRICT is an Immutable Property

PL/Rust requires that a LANGUAGE plrust function's STRICT property be immutable. As such, PL/Rust prohibits ALTERing the STRICT property:

ALTER FUNCTION lcase STRICT;
ERROR:  plrust functions cannot have their STRICT property altered
DETAIL:  Use 'CREATE OR REPLACE FUNCTION' to alter the STRICT-ness of an existing plrust function

Instead, you must CREATE OR REPLACE the function. The reason for this is that the underlying Rust wrapper function's signature will be different and this will require that the code be changed to account for the new argument type (Option<T> or T).

Return Type

Every LANUAGE plrust function has the same general return type, which is:

#![allow(unused)]
fn main() {
fn foo() -> Result<
        Option<T>, 
        Box<dyn std::error::Error + Send + Sync + 'static>
> {
    
}
}

The T is determined by the mapping from the declared SQL type during CREATE FUNCTION, and the rest is essentially boilerplate to allow easy handling of Rust Results and the SQL concept of NULL.

Why Option<T>?

Any PostgreSQL procedural language function can return NULL. PL/Rust understands and represents SQL NULL as Option::None. It may seem cumbersome, but PL/Rust function must return an Option<T>, as either Some(T) (non-null value) or None (NULL value).

While PostgreSQL's STRICT function property can influence the return value such that "any NULL argument guarantees a NULL return", Postgres does not have a way to express that a function "will never return NULL". As such, PL/Rust functions have the opportunity to return NULL built into their underlying function signature.

If a PL/Rust function would never return NULL, always return the Some variant.

Why Result<..., Box<dyn std::error::Error + Send + Sync + 'static>>?

Generally speaking, Postgres procedural language functions, and even Postgres internals, can be considered "fail fast" in that they tend to raise an error/exception at the exact point when it happens. Rust tends towards propagating errors up the stack, relying on the caller to handle it.

PL/Rust bridges this gap by requiring all LANGUAGE plrust functions to return a Result, and PL/Rust itself will interpret the return value from the function and report a Result::Err(e) as a Postgres ERROR, aborting the current transaction.

Returning a Result helps to simplify error handling, especially when a LANGUAGE plrust function uses Spi as the Rust ? operator is usable to propagate errors during function execution.

Since the Rust "Error" type cannot be expressed as part of the CREATE FUNCTION statement, PL/Rust generalizes the error to Box<dyn std::error::Error + Send + Sync + 'static> to provide as much compatability as possible with the wide range of concrete Error types in the Rust ecosystem.

Set Returning Functions

PL/Rust supports both set returning function styles, RETURNS SETOF $type and RETURNS TABLE (...). In both cases, the function returns a specialized Iterator for the specific style.

It's useful to think of set returning functions as returning something that resembles a table, either with one unnamed column (RETURNS SETOF) or multiple, named columns (RETURNS TABLE).

In both cases, the Iterator Item type is an Option<T>, where T is the return type. The reason for this is that PL/Rust needs to allow a returned row/tuple to be NULL (Option::None).

RETURNS SETOF $type

RETURNS SETOF $type returns a "table" with one, unnamed column. Each returned row must be an Option of the return type, either Some(T) or None, indicating NULL.

A simple example of splitting a text string on whitespace, following Rust's rules:

CREATE OR REPLACE FUNCTION split_whitespace(s text) RETURNS SETOF text STRICT LANGUAGE plrust AS $$
    let by_whitespace = s.split_whitespace();   // borrows from `s` which is a `&str`
    let mapped = by_whitespace.map(|token| {
        if token == "this" { None }     // just to demonstrate returning a NULL 
        else { Some(token.to_string()) }
    });
    let iter = SetOfIterator::new(mapped);
    Ok(Some(iter)) 
$$;

PL/Rust generates the following method signature for the above function:

#![allow(unused)]
fn main() {
fn plrust_fn_oid_19691_336344<'a>(
    s: &'a str,
) -> ::std::result::Result< // the function itself can return a `Result::Err`
    Option< // `Option::None` will return zero rows
        ::pgrx::iter::SetOfIterator< // indicates returning a set of values
            'a, // allows borrowing from `s` 
            Option<String>  // and the type is an optional, owned string
        > 
    >,
    Box<dyn std::error::Error + Send + Sync + 'static>, // boilerplate error type
> {
   //  <your code here>
}
}

And finally, its result:

SELECT * FROM split_whitespace('hello world, this is a plrust set returning function');
split_whitespace 
------------------
 hello
 world,
             -- remember we returned `None` for the token "this" 
 is
 a
 plrust
 set
 returning
 function
 (9 rows)

RETURNS TABLE (...)

Returning a table with multiple named (and typed) columns is similar to returning a set. Instead of SetOfIterator, PL/Rust uses TableIterator. TableIterator is a Rust Iterator whose Item is a tuple where its field types match those of the UDF being created:

CREATE OR REPLACE FUNCTION count_words(s text) RETURNS TABLE (count int, word text) STRICT LANGUAGE plrust AS $$
    use std::collections::HashMap;
    let mut buckets: HashMap<&str, i32> = Default::default();
    
    for word in s.split_whitespace() {
        buckets.entry(word).and_modify(|cnt| *cnt += 1).or_insert(1);
    }
    
    let as_tuples = buckets.into_iter().map(|(word, cnt)| {
        ( Some(cnt), Some(word.to_string()) )
    }); 
    Ok(Some(TableIterator::new(as_tuples)))
$$;

PL/Rust generates this function signature:

#![allow(unused)]
fn main() {
fn plrust_fn_oid_19691_336349<'a>(
   s: &'a str,
) -> ::std::result::Result::< // the function itself can return a `Result::Err`
   Option< // `Option::None` will return zero rows
       ::pgrx::iter::TableIterator< // indicates returning a "table" of tuples
           'a,  // allows borrowing from `s`
           (    // a Rust tuple
               ::pgrx::name!(count, Option < i32 >),    // the "count" column, can be "NULL" with `Option::None`
               ::pgrx::name!(word, Option < String >),  // the "word" column, can be "NULL" with `Option::None`
           ),
       >,
   >,
   Box<dyn std::error::Error + Send + Sync + 'static>,
> {
    // <your code here>
}
}

And the results from this function are:

# SELECT * FROM count_words('this is a test that is testing plrust''s SRF support');
 count |   word   
-------+----------
     1 | a
     1 | test
     1 | that
     2 | is
     1 | this
     1 | testing
     1 | SRF
     1 | support
     1 | plrust's
(9 rows)

The important thing to keep in mind when writing PL/Rust functions that RETURNS TABLE is that the structure being returned is a Rust tuple of Option<T>s where each field's T is the return type as specified in the RETURNS TABLE (...) clause.

Data types

PL/Rust provides mappings for many of the built-in PostgreSQL data types. Rust's ownership rules means that these mappings may be different based on their usage. Where it can PL/Rust borrows (zero-copy) arguments and returns owned values.

SQLPL/Rust ArgumentPL/Rust Return Type
NULLOption::NoneOption::None
"char"i8i8
biginti64i64
boolboolbool
boxBOX1BOX
bytea&[u8]Vec<u8>
cstring&CStrCString
dateDateDate
daterangeRange<Date>Range<Date>
double precisionf64f64
int4rangeRange<i32>Range<i32>
int8rangeRange<i64>Range<i64>
integeri32i32
intervalIntervalInterval
jsonJson(serde_json::Value)Json(serde_json::Value)
jsonbJsonB(serde_json::Value)JsonB(serde_json::Value)
numericAnyNumericAnyNumeric
numrangeRange<AnyNumeric>Range<AnyNumeric>
oidOidOid
pointPointPoint
realf32f32
smallinti16i16
text&strString
tidItemPointerDataItemPointerData
time with time zoneTimeWithTimeZoneTimeWithTimeZone
timeTimeTime
timestamp with time zoneTimestampWithTimeZoneTimestampWithTimeZone
timestampTimestampTimestamp
tsrangeRange<Timestamp>Range<Timestamp>
tstzrangeRange<TimestampWithTimeZone>Range<TimestampWithTimeZone>
uuidUuidUuid
varchar&strString
voidn/a()

1: This is Postgres' geometric BOX type, not to be confused with Rust's Box type, which stores allocated data on the heap

No Unsigned Types

Rust programmers may be asking "where are the unsigned types like u32?". PostgreSQL does not have unsigned integer types. As such, neither does PL/Rust.

In order to represent a value larger than i32::MAX, BIGINT is the proper SQL type. To represent a value larger than i64::MAX, use the NUMERIC type. Postgres also has no concept of an isize or usize, so these have no corresponding SQL mapping.

PL/Rust's AnyNumeric type has From and TryFrom implementations for all of Rust's primitive types (plus strings). This makes it fairly straightforward to up-convert a Rust primitive into a SQL NUMERIC:

CREATE OR REPLACE FUNCTION upconvert_bigint(i BIGINT) RETURNS NUMERIC STRICT LANGUAGE plrust AS $$
    let n: AnyNumeric = i.into();   // `i` is an `i64`, lets convert to `AnyNumeric`
    Ok(Some(n + 1))    
$$;

# SELECT upconvert_bigint(9223372036854775807);
upconvert_bigint   
---------------------
 9223372036854775808
(1 row)

Arrays

Arrays of all of supported types are mapped as Array<T> where T is the Rust mapping for the SQL datatype. For example, a SQL BIGINT[] is mapped to Array<i64>, and a TEXT[] is mapped to Array<&str>.

Working with arrays can be slightly cumbersome as Postgres allows NULL as an individual array element. As Rust has no concept of "null", PL/Rust uses Option<T> to represent the SQL idea of "I don't have a value".

CREATE FUNCTION sum_array(a INT[]) RETURNS BIGINT STRICT LANGUAGE plrust AS $$
    let sum = a.into_iter().map(|i| i.unwrap_or_default() as i64).sum();
    Ok(Some(sum))
$$;

# SELECT sum_array(ARRAY[1,2,3]::int[]);
 sum_array 
-----------
         6

Iteration and Slices

Pl/Rust Arrays support slices over the backing Array data if it's an array of a primitive type (i8/16/32/64, f32/64). This can provide drastic performance improvements and even help lead to the Rust compiler autovectorizing code.

Let's examine this using arrays of random FLOAT4 values:

CREATE OR REPLACE FUNCTION random_floats(many int) RETURNS float4[] STRICT PARALLEL SAFE LANGUAGE sql AS $$
    SELECT array_agg(random()) FROM generate_series(1, many)
$$;

CREATE TABLE floats AS SELECT random_floats(1000) f FROM generate_series(1, 100000);

Next, we'll sum the array using a function similar to the above:

CREATE OR REPLACE FUNCTION sum_array(a float4[]) RETURNS float4 STRICT LANGUAGE plrust AS $$
    let sum = a.into_iter().map(|i| i.unwrap_or_default()).sum();
    Ok(Some(sum))
$$;

# explain analyze select sum_array(f) from floats;
QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Seq Scan on floats  (cost=0.00..23161.32 rows=86632 width=4) (actual time=0.064..981.105 rows=100000 loops=1)
 Planning Time: 0.037 ms
 Execution Time: 983.753 ms

Since in this case we know that the input array won't contain null values, we can optimize slightly. This does a fast "O(1)" check for NULLs when creating the iterator, rather than checking each individual element during iteration:

CREATE OR REPLACE FUNCTION sum_array_no_nulls(a float4[]) RETURNS float4 STRICT LANGUAGE plrust AS $$
    let sum = a.iter_deny_null().sum();
    Ok(Some(sum))
$$;

explain analyze select sum_array_no_nulls(f) from floats;
QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Seq Scan on floats  (cost=0.00..26637.00 rows=100000 width=4) (actual time=0.055..672.365 rows=100000 loops=1)
 Planning Time: 0.035 ms
 Execution Time: 676.243 ms

Next, lets take a look at converting the input array into a slice before summing the values. This is particularly fast as it's a true "zero copy" operation:

CREATE OR REPLACE FUNCTION sum_array_slice(a float4[]) RETURNS float4 STRICT LANGUAGE plrust AS $$
    let slice = a.as_slice()?;  // use the `?` operator as not all `Array<T>`s can be converted into a slice
    let sum = slice.iter().sum();
    Ok(Some(sum))
$$;

explain analyze select sum_array_slice(f) from floats;
QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Seq Scan on floats  (cost=0.00..26637.00 rows=100000 width=4) (actual time=0.055..478.635 rows=100000 loops=1)
 Planning Time: 0.036 ms
 Execution Time: 482.344 ms

Finally, lets do some magic to coax the Rust compiler into autovectorizing our "sum_array" function. The code for this comes from, interestingly enough, Stack Overflow: https://stackoverflow.com/questions/23100534/how-to-sum-the-values-in-an-array-slice-or-vec-in-rust/67191480#67191480

CREATE OR REPLACE FUNCTION sum_array_simd(a float4[]) RETURNS float4 STRICT LANGUAGE plrust AS $$
    use std::convert::TryInto;
    
    const LANES: usize = 16;
    
    pub fn simd_sum(values: &[f32]) -> f32 {
        let chunks = values.chunks_exact(LANES);
        let remainder = chunks.remainder();
    
        let sum = chunks.fold([0.0f32; LANES], |mut acc, chunk| {
            let chunk: [f32; LANES] = chunk.try_into().unwrap();
            for i in 0..LANES {
                acc[i] += chunk[i];
            }
            acc
        });
    
        let remainder: f32 = remainder.iter().copied().sum();
    
        let mut reduced = 0.0f32;
        for i in 0..LANES {
            reduced += sum[i];
        }
        reduced + remainder
    }

    let slice = a.as_slice()?;
    let sum = simd_sum(slice);
    Ok(Some(sum))
$$;


explain analyze select sum_array_simd(f) from floats;
QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Seq Scan on floats  (cost=0.00..26637.00 rows=100000 width=4) (actual time=0.054..413.702 rows=100000 loops=1)
 Planning Time: 0.038 ms
 Execution Time: 417.237 ms

User Defined Types

PL/Rust supports using User Defined Types (UDTs; sometimes referred to as "composite types") in LANGUAGE plrust functions. UDTs can be used as arguments and return types.

The general approach with UDTs is to first define one in SQL:

CREATE TYPE person AS (
    name text,
    age  float8
);

person can now be used in any PL/Rust function. To instantiate a new person:

create function make_person(name text, age float8) returns person
    strict parallel safe
    language plrust as
$$
    // create the Heap Tuple representation of the SQL type `person`
    let mut p = PgHeapTuple::new_composite_type("person")?;
    
    // set a few of its attributes
    //
    // Runtime errors can occur if the attribute name is invalid or if the Rust type of the value
    // is not compatible with the backing SQL type for that attribute.  Hence the use of the `?` operator
    p.set_by_name("name", name)?;
    p.set_by_name("age", age)?;
    
    // return the `person`
    Ok(Some(p))
$$;

Individual field accessors for the properties are straight-forward:

create function get_person_name(p person) returns text
    strict parallel safe
    language plrust as
$$
   // `p` is a `PgHeapTuple` over the underlying data for `person`
   Ok(p.get_by_name("name")?)
$$;

create function get_person_age(p person) returns float8
    strict parallel safe
    language plrust as
$$
   // `p` is a `PgHeapTuple` over the underlying data for `person`
   Ok(p.get_by_name("age")?)
$$;

A generic accessor, for example, requires encoding knowledge of the UDT structure, but provides quite a bit of flexibility.

Note that this function returns text. This is a common denominator type to represent the various attribute types used by person. Fortunately, Postgres and PL/Rust have fantastic support for converting values to text/Strings:

create function get_person_attribute(p person, attname text) returns text
    strict parallel safe
    language plrust as
$$
   match attname.to_lowercase().as_str() {
    "age" => {
        let age:Option<f64> = p.get_by_name("age")?;
        Ok(age.map(|v| v.to_string()))
    },
    "name" => {
        Ok(p.get_by_name("name")?)
    },
    _ => panic!("unknown attribute: `{attname}`")
   }
$$;

This lends itself nicely to creating a custom operator to extract a person's named attribute.

create operator ->> (function = get_person_attribute, leftarg = person, rightarg = text);

Tying these pieces together:


-- assume all of the above sql has been executed

create table people
(
    id serial8 not null primary key,
    p  person
);

insert into people (p) values (make_person('Johnny', 46.24));
insert into people (p) values (make_person('Joe', 99.09));
insert into people (p) values (make_person('Dr. Beverly Crusher of the Starship Enterprise', 32.0));

select p ->> 'name' as name, (p ->> 'age')::float8 as age from people;
                      name                      |  age  
------------------------------------------------+-------
 Johnny                                         | 46.24
 Joe                                            | 99.09
 Dr. Beverly Crusher of the Starship Enterprise |    32
(3 rows)

Discussion

In Rust, PgHeapTuple is the type that generically represents all UDTs.

PgHeapTuple provides the ability to construct a new UDT by its SQL name. It also provides attribute getter and setter methods for reading and mutating attributes.

Attributes can be addressed by name or one-based index. Typical errors such as specifying an attribute name that doesn't exist, an index that is out of bounds, or a Rust type for the value that is not compatible with that attribute's SQL type will return a TryFromDatumError. An early-return that error using the ? operator (as demonstrated in the examples above) or matching on the error are both fine ways of handling such errors.

Built-in functions

This page documents many of the high level functions, targeted functionality is covered on dedicated sub-sections.

Functions available

Functions available to PL/Rust are defined under the trusted-pgrx directory in lib.rs. User functions in plrust will not compile if they use the unsafe keyword. There are a handful of functions in trusted-pgrx that are declared unsafe; plrust functions cannot use them because they would need an unsafe {} block.

Datum functions

PL/Rust function support for various Datums are documented by pgrx on docs.rs, the source is on GitHub for those interested. There are Datums defined in pgrx that are not included in PL/Rust because they have not been imported by plrust.

AnyNumeric: A plain PostgreSQL NUMERIC with default precision and scale values. This is a sufficient type to represent any Rust primitive value from i128::MIN to u128::MAX and anything in between.

FromDatum and IntoDatum: Provide conversions between pg_sys::Datum and Rust types.

Json and JsonB match the types in PostgreSQL of the same name.

Date: A plain PostgreSQL DATE type without a time component.

Time / TimeWithTimeZone / Timestamp / TimestampWithTimeZone

Range Support In progress

Logging to PostgreSQL from PL/Rust

PL/Rust provides the ability to log details using PostgreSQL's logging system. This functionality is exposed from pgrx via plrust/plrust-trusted-pgrx/src/lib.rs.

The macros available for logging are defined:

#![allow(unused)]
fn main() {
pub use ::pgrx::{
    debug1, debug2, debug3, debug4, debug5, ereport, error, info, log, notice, warning,
};
}

Basic logging

Using the log!() macro will send the message defined in the function to the PostgreSQL logs defined by your postgresql.conf. Running the following example of plrust.one() creates a LOG record.

CREATE OR REPLACE FUNCTION plrust.one()
    RETURNS INT
    LANGUAGE plrust
AS
$$
    log!("Hello!  Friendly log message here.");
    Ok(Some(1))
$$
;

Running plrust.one() will run normally and the client running the query will be presented with the results. The log!() macro adds the defined log message to the PostgreSQL log file.

The exact contents on the log line created in PostgreSQL's log file depends on your postgresql.conf settings related to logging. The following example is what it may look like.

2023-03-04 16:06:40 UTC [8109]: [15-1] user=postgres,db=plrust,app=psql,client=[local],query_id=-2211430114177040240  LOG:  Hello!  Friendly log message here.

The remainder of logging examples will only show the details controlled by PL/Rust like the following example.

LOG:  Hello!  Friendly log message here.

Logging is not limited to static messages. Values from the function can be included using the {variable} notation. Beware of data types, the i32 value returned by the plrust.one() function needs to be converted .to_string() to include in the logged message string.

CREATE OR REPLACE FUNCTION plrust.one()
    RETURNS INT
    LANGUAGE plrust
AS
$$
    let one_val = 1_i32;
    log!("The plrust.one() function is returning: {one_val}");
    Ok(Some(one_val))
$$
;

When the above function runs, the resulting log line looks like the following.

LOG:  The plrust.one() function is returning: 1

Warnings

Use the warning!() macro to log a more severe message. Warnings are sent to the log file as well as being returned to the client as a WARNING.

CREATE OR REPLACE FUNCTION plrust.one()
    RETURNS INT
    LANGUAGE plrust
AS
$$
    let expected_val = 1_i32;
    let one_val = 2_i32;

    if expected_val != one_val {
        warning!("The value for plrust.one() is unexpected. Found {one_val}")
    } else {
        log!("The plrust.one() function is returning: {one_val}");
    }

    Ok(Some(one_val))
$$
;

The following WARNING message is sent to the PostgreSQL log and to the client.

WARNING:  The value for plrust.one() is unexpected. Found 2

Running the above in psql looks like the following example. You can see the user is presented with the warning message as well as the results showing the one function returning the value 2.

plrust=# select plrust.one();
WARNING:  The value for plrust.one() is unexpected. Found 2
DETAIL:  
 one 
-----
   2
(1 row)

Errors

There are cases when a function simply cannot proceed and these errors need to be logged. The following example changes the warning from the previous example to an error.

CREATE OR REPLACE FUNCTION plrust.one()
    RETURNS INT
    LANGUAGE plrust
AS
$$
    let expected_val = 1_i32;
    let one_val = 2_i32;
    let one_val_str = one_val.to_string();

    if expected_val != one_val {
        error!("Invalid for plrust.one(). Found {one_val_str}")
    } else {
        log!("The plrust.one() function is returning: {one_val_str}");
    }

    Ok(Some(one_val))
$$
;

When PL/Rust runs the error!() macro the message is logged to the log file, returned to the client, and the execution of the function is terminated with a panic. In psql the user sees:

plrust=# select plrust.one();
ERROR:  Invalid for plrust.one(). Found 2
DETAIL:  
plrust=# 

In the PostgreSQL logs the following output is recorded. Notice the panicked line prior to the ERROR reported by the PL/Rust function.

thread '<unnamed>' panicked at 'Box<dyn Any>', /var/lib/postgresql/.cargo/registry/src/github.com-1ecc6299db9ec823/pgrx-pg-sys-0.7.2/src/submodules/panic.rs:160:13
ERROR:  Invalid for plrust.one(). Found 2

Notifying the user

Using notice!() and info!() macros return the message to the client running the query, allowing functions to provide feedback to the user running the query. These options do not log the message to the PostgreSQL logs.

CREATE OR REPLACE FUNCTION plrust.one()
    RETURNS INT
    LANGUAGE plrust
AS
$$
    notice!("Hello, this is a notice");
    Ok(Some(1))
$$
;

Running SELECT plrust.one() returns the expected value of 1 along with the defined notice. Using psql returns and example like the following code block.

NOTICE:  Hello, this is a notice
DETAIL:  
┌─────┐
│ one │
╞═════╡
│   1 │
└─────┘

Using ereport

For the most control over logging you can use the ereport!() macro. This is not necessary for most use cases.

CREATE FUNCTION one()
    RETURNS INT
    LANGUAGE plrust
AS
$$
    ereport!(PgLogLevel::LOG,
        PgSqlErrorCode::ERRCODE_SUCCESSFUL_COMPLETION,
        "A user ran the one() function");
    Ok(Some(1))
$$
;

Triggers

PL/Rust functions can be used to define trigger functions on data changes. A trigger function is created using the CREATE FUNCTION command, declaring it as a function with no arguments and a return type of trigger. Trigger variables are available from in trigger to describe the condition that triggered the call and the new and old rows.

PL/Rust trigger support options are documented on docs.rs and defined in the .rs files in the trigger_support directory.

These examples are an expansion of the code from plrust/plrust/src/tests.rs. The elaborations here illustrate additional functionality.

Table for Triggers

Create the plrust.dog table to allow us to keep track of our dogs, and how much attention they have received via a count of scritches.

CREATE TABLE plrust.dog (
    id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT,
    scritches INT NOT NULL DEFAULT 0,
    last_scritch TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

The name column in plrust.dog is the only column without a default value set. The scritches and last_scritch column both have defaults set. The goal of this design is to only have to define the name during INSERT. Subsequent UPDATE queries should only have to update the last_scritch column.

Trigger example

The following example creates a trigger function named plrust.dog_trigger(). The trigger will be used on INSERT and UPDATE with slightly different behavior depending on which operation is being used. This logic is based on the value of trigger.op()?, for INSERT the trigger.new object is used, for UPDATE the trigger.old object is used. This code is explained further after the code block.

CREATE OR REPLACE FUNCTION plrust.dog_trigger()
RETURNS trigger AS
$$
    let tg_op = trigger.op()?;

    let my_row = match tg_op {
        INSERT => trigger.new().unwrap(),
        _ => trigger.old().unwrap()
    };
    let mut my_row = my_row.into_owned();

    let counter_field = "scritches";
    match my_row.get_by_name::<i32>(counter_field)? {
        Some(val) => my_row.set_by_name(counter_field, val + 1)?,
        None => (),
    }

    Ok(Some(my_row))
$$
LANGUAGE plrust;


CREATE TRIGGER dog_trigger
    BEFORE INSERT OR UPDATE ON plrust.dog
    FOR EACH ROW
    EXECUTE FUNCTION plrust.dog_trigger();

The tg_op variable is available from the trigger.op() method and has values of INSERT, UPDATE, DELETE and TRUNCATE. See the definition of PgTriggerOperation for more. The tg_op value is used in a match to define the my_row variable.

#![allow(unused)]
fn main() {
let tg_op = trigger.op()?;

let my_row = match tg_op {
    INSERT => trigger.new().unwrap(),
    _ => trigger.old().unwrap()
};
let mut my_row = my_row.into_owned();
}

With the appropriate my_row identified, the next step is to increment the scritches column by 1. This is defined in the counter_field variable for easy reuse. The get_by_name and set_by_name functions are used for this operation.

#![allow(unused)]
fn main() {
let counter_field = "scritches";
match my_row.get_by_name::<i32>(counter_field)? {
    Some(val) => my_row.set_by_name(counter_field, val + 1)?,
    None => (),
}
}

Finally, the my_row is returned for the operation to proceed.

#![allow(unused)]
fn main() {
Ok(Some(my_row))
}

Next we INSERT a row and then query the table to observe the effects of the trigger.

INSERT INTO plrust.dog (name) VALUES ('Nami');
SELECT * FROM plrust.dog;

The results show that while the DEFAULT value for the scritches column is defined as 0 in the table, the initial value is 1 because trigger updated the value.

 id | name | scritches |         last_scritch          
----+------+-----------+-------------------------------
  1 | Nami |         1 | 2023-03-04 17:30:43.601525+00

If we update the record for Nami by setting the last_scritch value to NOW() the trigger will increment the scritches column value for us.

UPDATE plrust.dog
    SET last_scritch = NOW()
    WHERE id = 1;

SELECT * FROM plrust.dog;
 id | name | scritches |         last_scritch          
----+------+-----------+-------------------------------
  1 | Nami |         2 | 2023-03-04 17:35:05.320482+00

Not yet supported

Event Triggers and DO blocks are not (yet) supported by PL/Rust.

Server Programming Interface (SPI)

PL/Rust provides support for PostgreSQL's SPI.

Error

Result

Spi

Example usage

The following function uses SPI to create a PostgreSQL Set Returning Function (SRF).

CREATE FUNCTION spi_srf()
    RETURNS SETOF BIGINT
    LANGUAGE plrust
AS
$$
    let query = "SELECT id::BIGINT FROM generate_series(1, 3) id;";

    Spi::connect(|client| {
        let mut results = Vec::new();
        let mut tup_table = client.select(query, None, None)?;

        while let Some(row) = tup_table.next() {
            let id = row["id"].value::<i64>()?;
            results.push(id);
        }
        Ok(Some(SetOfIterator::new(results)))
    })

$$;

Complex return types

PL/Rust currently does not support RETURNS TABLE or complex types with RETURNS SETOF.

Dynamic Function Calling

PL/Rust provides the ability to dynamically call any function (callable to the current user) directly from a Rust function. These functions can be in any language, including sql, plpgsql, plrust, plperl, etc.

The call interface is dynamic in that the callee is resolved at runtime and its argument and return types are also checked at runtime. While this does introduce a small bit of overhead, it's significantly less than doing what might be the equivalent operation via Spi.

The ability to dynamically call functions enables users to write functions in the language that makes the most sense for the operation being performed. In many cases, a LANGUAGE plpgsql function is exactly what's needed, and a LANGUAGE plrust function can now use its result to execute further, possibly CPU-intensive, work.

Important Rust Types

This dynamic calling interface introduces two new types that are used to facilitate dynamically calling functions: Arg and FnCallError.

Arg

Arg describes the style of a user-provided function argument.

#![allow(unused)]
fn main() {
/// The kinds of [`fn_call`] arguments.  
pub enum Arg<T> {
    /// The argument value is a SQL NULL
    Null,

    /// The argument's `DEFAULT` value should be used
    Default,

    /// Use this actual value
    Value(T),
}
}

Rust doesn't exactly have the concept of "NULL" nor does it have direct support for overloaded functions. This is where the Null and Default variants come in.

There's a sealed trait that corresponds to this enum named FnCallArg. It is not a trait that users needs to implement, but is used by PL/Rust to dynamically represent a set of heterogeneous argument types.

FnCallError

There's also a set of runtime error conditions if function resolution fails. These are recoverable errors in that user code could match on the return value and potentially make different decisions, or just raise a panic with the error to immediately abort the current transaction.

#![allow(unused)]
fn main() {
/// [`FnCallError`]s represent the set of conditions that could case [`fn_call()`] to fail in a
/// user-recoverable manner.
#[derive(thiserror::Error, Debug, Clone, Eq, PartialEq)]
pub enum FnCallError {
    #[error("Invalid identifier: `{0}`")]
    InvalidIdentifier(String),

    #[error("The specified function does not exist")]
    UndefinedFunction,

    #[error("The specified function exists, but has overloaded versions which are ambiguous given the argument types provided")]
    AmbiguousFunction,

    #[error("Can only dynamically call plain functions")]
    UnsupportedFunctionType,

    #[error("Functions with OUT/IN_OUT/TABLE arguments are not supported")]
    UnsupportedArgumentModes,

    #[error("Functions with argument or return types of `internal` are not supported")]
    InternalTypeNotSupported,

    #[error("The requested return type `{0}` is not compatible with the actual return type `{1}`")]
    IncompatibleReturnType(pg_sys::Oid, pg_sys::Oid),

    #[error("Function call has more arguments than are supported")]
    TooManyArguments,

    #[error("Did not provide enough non-default arguments")]
    NotEnoughArguments,

    #[error("Function has no default arguments")]
    NoDefaultArguments,

    #[error("Argument #{0} does not have a DEFAULT value")]
    NotDefaultArgument(usize),

    #[error("Argument's default value is not a constant expression")]
    DefaultNotConstantExpression,
}
}

Calling a Function

The top-level function fn_call() is what is used to dynamically call a function. Its signature is:

#![allow(unused)]
fn main() {
pub fn fn_call<R: FromDatum + IntoDatum>(
    fname: &str,
    args: &[&dyn FnCallArg],
) -> Result<Option<R>, FnCallError>
}

fn_call itself takes two arguments. The first, fname is the (possibly schema-qualified) function name, as a string.

The second argument, args, is a slice of FnCallArg dyn references (these are written using &Arg::XXX). And it returns a Result<Option<R>, FnCallError>.

An Ok response will either contain Some(R) if the called function returned a non-null value, or None if it did.

An Err response will contain one of the FnCallError variants detailed above, indicating the problem encountered while trying to call the function. It is guaranteed that if fn_call returns an Err, then the desired function was not called.

If the called function raises a Postgres ERROR then the current transaction is aborted and control is returned back to Postgres, not the caller. This is typical Postgres and PL/Rust behavior in the face of an ERROR or Rust panic.

Simple Example

First, define a SQL function that sums the elements of an int[]. We're using a LANGUAGE sql function here to demonstrate how PL/Rust can call functions of any other language:

CREATE OR REPLACE FUNCTION sum_array(a int[]) RETURNS int STRICT LANGUAGE sql AS $$ SELECT sum(e) FROM unnest(a) e $$;

Now, call this function from a PL/Rust function:

CREATE OR REPLACE FUNCTION transform_array(a int[]) RETURNS int STRICT LANGUAGE plrust AS $$
    let a = a.into_iter().map(|e| e.unwrap_or(0) + 1).collect::<Vec<_>>();  // add one to every element of the array
    Ok(fn_call("sum_array", &[&Arg::Value(a)])?)
$$;

SELECT transform_array(ARRAY[1,2,3]);
transform_array 
-----------------
               9
(1 row)

Complex Example

This is contrived, of course, but let's make a PL/Rust function with a few different argument types and have it simply convert their values to a debug-formatted String. Then we'll call that function from another PL/Rust function.

CREATE OR REPLACE FUNCTION debug_format_args(a text, b bigint, c float4 DEFAULT 0.99) RETURNS text LANGUAGE plrust AS $$
    Ok(Some(format!("{:?}, {:?}, {:?}", a, b, c)))  
$$;

SELECT debug_format_args('hi', NULL);
      debug_format_args       
------------------------------
 Some("hi"), None, Some(0.99)
(1 row)

Now, call it from another PL/Rust function using these same argument values. Which is 'hi' for the first argument, NULL for the second, and using the default value for the third:

CREATE OR REPLACE FUNCTION complex_example() RETURNS text LANGUAGE plrust AS $$
    let result = fn_call("debug_format_args", &[&Arg::Value("hi"), &Arg::<i64>::Null, &Arg::<f32>::Default])?;
    Ok(result)    
$$;

SELECT complex_example();
complex_example        
------------------------------
 Some("hi"), None, Some(0.99)
(1 row)

You'll notice here that the Arg::Null and Arg::Default argument values are typed with ::<i64> and ::<f32> respectively. It is necessary for PL/Rust to know the types of each argument at compile time, so that during runtime the proper function can be chosen. This helps to ensure there's no ambiguity related to Postgres' function overloading features. For example, let's overload debug_format_args with a different type for the second argument:

CREATE OR REPLACE FUNCTION debug_format_args(a text, b bool, c float4 DEFAULT 0.99) RETURNS text LANGUAGE plrust AS $$
    Ok(Some(format!("{:?}, {:?}, {:?}", a, b, c)))  
$$;

SELECT debug_format_args('hi', NULL);
ERROR:  42725: function debug_format_args(unknown, unknown) is not unique
LINE 1: SELECT debug_format_args('hi', NULL);
               ^
HINT:  Could not choose a best candidate function. You might need to add explicit type casts.

As you can see, even Postgres can't figure out which debug_format_args function to call as it doesn't know the intended type of the second NULL argument. We can tell it, of course:

SELECT debug_format_args('hi', NULL::bool);
      debug_format_args       
------------------------------
 Some("hi"), None, Some(0.99)
(1 row)

Note that if we call our complex_example function again, now that we've added another version of debug_format_args, it still calls the correct one -- the version with an int as the second argument.

Limitations

PL/Rust does not support dynamically calling functions with OUT or IN OUT arguments. Nor does it support calling functions that return SETOF $type or TABLE(...).

It is possible these limitations will be lifted in a future version.

Trusted and Untrusted PL/Rust

Normally, PL/Rust is installed as a "trusted" programming language named plrust. In this setup, certain Rust and pgrx operations are disabled to preserve security. In general, the operations that are restricted are those that interact with the environment. This includes file handle operations, require, and use (for external modules). There is no way to access internals of the database server process or to gain OS-level access with the permissions of the server process, as a C function can do. Thus, any unprivileged database user can be permitted to use this language.

Here is an example of a function that will not work because file system operations are not allowed for security reasons:

EXAMPLE COMING SOON

The creation of this function will fail as its use of a forbidden operation will be caught by the validator.

Sometimes it is desirable to write Rust functions that are not restricted. To handle these cases, PL/Rust can also be installed as an "untrusted" language. In this case the full Rust language is available including unsafe code. See the Untrusted install section for steps to install untrusted PL/Rust.

The writer of an untrusted PL/Rust function must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator. Note that the database system allows only database superusers to create functions in untrusted languages.

If the above function was created by a superuser using the untrusted plrust, execution would succeed.

PostgreSQL Configuration for PL/Rust

PL/Rust has two required configuration options and a variety of non-required options. These options are set in the standard postgresql.conf configuration file used by PostgreSQL.

PL/Rust has two required configuration options in untrusted and trusted installations. Using PL/Rust with cross compilation support has a 3rd required configuration option. Failure to set these variables will cause plrust extension to not function.

Required

PL/Rust has two PostgreSQL configuration options that are always required for use,shared_preload_libraries and plrust.work_dir.

shared_preload_libraries (string)

The shared_preload_libraries entry needs to include plrust. This is a comma separated list of libraries that need to be pre-loaded in order to operate properly.

shared_preload_libraries = 'plrust'

plrust.work_dir (string)

The plrust.work_dir must be set to location for PL/Rust to save necessary intermediate files. This path must be writable by the user running the PostgreSQL process, typically postgres on common Linux distributions.

plrust.work_dir = '/tmp'

Additional Configuration Options

plrust.allowed_dependencies (string)

Define the path to a toml file with an allow-list of Rust crates and versions when creating PL/Rust functions. When plrust.allowed_dependencies is not defined, all Rust crates are allowed when creating PL/Rust functions.

For more discussion, see dependencies.md

plrust.path_override (string)

Set this if cargo and cc are not in the postmaster's $PATH.

plrust.path_override = '/special/path/to/.cargo/bin:/usr/bin'

plrust.trusted_pgrx_version (string)

The version of the plrust-trusted-pgrx crate from crates.io to use when compiling user functions. By default, PL/Rust uses the same version of plrust-trusted-pgrx as PL/Rust itself, and does not need to be explicitly set.

plrust.trusted_pgrx_version = 'the.latest.version'

plrust.tracing_level (string)

A tracing directive.

plrust.tracing_level = 'info'

Required for Cross Compilation

plrust.compilation_targets (string)

Using PL/Rust with cross compilation requires the plrust.compilation_targets configuration option. This is required for PL/Rust to cross compile user functions. The plrust.compilation_targets option is a comma-separated list of values, of which only x86_64 and aarch64 are currently supported.

plrust.compilation_targets = 'x86_64, aarch64'

For PL/Rust to cross compile user functions it needs to know which CPU architectures via plrust.compilation_targets. This is a comma-separated list of values, of which only x86_64 and aarch64 are currently supported.

plrust.{arch}_linker (string)

This is the name of the linker rustc should use on for cross-compile. The architecture linker names have sensible defaults and shouldn't need to be be changed (unless the host is some esoteric Linux distribution we have not encountered yet).

Linux defaults:

plrust.x86_64_linker = 'x86_64_linux_gnu_gcc'
plrust.aarch64_linker = 'aarch64_linux_gnu_gcc'

macOS defaults:

plrust.x86_64_linker = 'cc'
plrust.aarch64_linker = 'cc'

plrust.{arch}_pgrx_bindings_path (string)

The plrust.{arch}_pgrx_bindings_path settings are actually required but PL/Rust will happily cross compile without them. If unspecified, PL/Rust will use the pgrx bindings of the host architecture for the cross compilation target architecture too. In other words, if the host is x86_64 and PL/Rust is configured to cross compile to aarch64 and the plrust.aarch64_pgrx_bindings_path is not configured, it'll blindly use the bindings it already has for x86_64. This may or may not actually work.

To get the bindings, install cargo-pgrx on the other system and run cargo pgrx cross pgrx-target. That'll generate a tarball. Copy that back to the primary host machine and untar it somewhere (PL/Rust doesn't care where), and use that path as the configuration setting.

Note that it is perfectly fine (and really, expected) to set all of these configuration settings on both architectures. PL/Rust will silently ignore the one for the current host. In other words, plrust only uses them when cross compiling for the other architecture.

Lints

There are two The PL/Rust configuration options related to lints. These options should not be changed. Altering these configuration options has two main negative side effects. Disabling any of the pre-configured lints removes any and all expectation of PL/Rust being trusted. Changing this option can also prevent upgrading PL/Rust.

See the Lints Configuration section for more details about the purpose of the Lints.

plrust.compile_lints (string)

A comma-separated list of Rust lints to apply to every user function.

plrust.compile_lints = 'plrust_extern_blocks, plrust_lifetime_parameterized_traits, implied_bounds_entailment, unsafe_code, plrust_filesystem_macros, plrust_env_macros, plrust_external_mod, plrust_fn_pointers, plrust_async, plrust_leaky, plrust_print_macros, plrust_stdio, unknown_lints, deprecated, suspicious_auto_trait_impls, soft_unstable, plrust_autotrait_impls'

plrust.required_lints (string)

A comma-separated list of Rust lints that are required to have been applied to a user function before PL/Rust will load the library and execute the function.

Architecture

PL/Rust, a Rust-based extension built using pgrx, provides a procedural language handler for Rust in PostgreSQL. When installed, PL/Rust allows this to work:

CREATE FUNCTION {fn_name} ({args})
RETURNS {ret}
-- function attributes can go here
AS $$
    // PL/Rust function body goes here
    // All PL/Rust functions return Result<Option<{ret}>>
$$ LANGUAGE plrust;

PL/Rust will compile the function as a dynamic library, load it, and execute it.

Trusted Language

In order to create a trusted language handler in PostgreSQL we must restrict the functions compiled and executed by the language handler to the set of operations other code in PostgreSQL has access to.

  • No operations on files except through the database itself
  • Limit access to the database to that of other procedural language functions
  • Limit access to system resources to those of a trusted language user function
  • It must be sound enough to allow any unprivileged database user to use the language (postgresql.org)

Rust

A target tuple describes a "platform" that can execute code. Rust uses rustc, which requires that code is ahead-of-time compiled in order to do code generation, so it requires a target tuple that defines the code object it must generate. A code object has a format (e.g. ELF or Windows PE) which an operating system supports, instructions (e.g. aarch64 or wasm) which a machine architecture supports, and calls to system interfaces to the operating system (such as via GNU libc.so or MacOS libSystem.dylib) which require holistic support. These code objects may be executables (which the system may initialize as a process) or libraries (which may be "linked", relocating code from them into the final executable at build time, or loading their code to call at runtime). Libraries used at build time are also called static libraries, objects, or archives. Libraries used at runtime are also called dynamic libraries or shared objects.

The Rust compiler builds the Rust standard library as a static library and then links it into Rust code objects. The contents of this static library include the code which dynamically links against system interfaces. These system interfaces are what postgrestd intercepts by itself being a modification of the Rust standard library.

The extension called "PL/Rust" which includes the language handler is responsible for covering the linking and loading steps. This extension may have privileges that user functions do not, using the Rust std of the same host target that PostgreSQL itself is compiled for, to interoperate in that privileged mode. This is as-usual for language handlers: they must typically be written in C.

Design Goals

Design a custom rust compilation target for PostgreSQL that provides nearly "safe" (as Rust defines it) and "trusted" (as PostgreSQL defines a procedural language) PL/Rust.

The goals for the approach include

  • Architecture support for x86_64 and aarch64
  • Operating system support for Linux
  • Disallow File Handle operations
  • Disallow access to the internals of the database
  • Disallow access to the OS as the user executing the PostgreSQL process
  • Disallow access into active postmaster process, i.e. no ability to reach into PostgreSQL memory space, despite executing inside it.
  • Gracefully handle rust panics and have them interoperate with PostgreSQL's transaction system
  • Memory allocation within PostgreSQL's palloc/pfree functions

Approach

Following an approach similar to the selection between libc and the musl libc standard library for compilation, a PostgreSQL compilation target is defined that instructs the compiler to use the postgrestd library. The postgrestd library provides the rust standard library interfaces except in the case where it is desirable to prevent access. In those cases the code is configured to be not present. The result is a small shim on top of the rust library limited access to the standard library.

Bird's Eye View

Supporting Crates

Because PL/Rust implements a fairly complicated language with the intention to make it sound to use as a trusted procedural language, there are multiple supporting crates necessary to make it work.

postgrestd

See postgrestd for more details.

Cross-Cutting Concerns

This sections talks about the things which are everywhere and nowhere in particular.

Code generation

PL/Rust uses Cargo for code generation. Each function is built as its own crate to allow it to be individually dynamically loaded (however this is not a strict requirement: multiple functions could be generated together, it's merely a current implementation detail that simplifies some handling).

PL/Rust builds reuse the same build directory to assist in exploiting the existing build caching implemented in Cargo. However, because of the resolver, as soon as dependencies are involved, and because building PL/Rust code involves a nonzero number of default crate dependencies, the exact build graph may vary from build to build even for what appears to be the "same crate" to a programmer, as subtle changes in feature or version resolution can all cause the crate to need to be recompiled.

Cancellation

Testing

Error Handling

Observability

Designing for Trust

A note on definitions

In order to allow building a Postgres extension in Rust, quite a lot of bindings to C code are required, and a language handler is necessarily a Postgres "C" function, which is usually packaged as a Postgres extension. Since the only Postgres extension of true concern for PL/Rust is the language handler and associated components, and most other functions that will be embedded in Postgres will be managed by this extension, I will redefine the difference arbitrarily for this document: the language handler is a Postgres extension, and "Postgres function" will be used to refer to any Postgres function except a language handler.

The goal

Nominally, to make PL/Rust exist: a dialect of Rust nested in SQL that can function as a "trusted procedural language".

The caveat

A major obstacle to making PL/Rust a trustworthy language is that Rust is not an intrinsically safe language.

Again, Rust is not an intrinsically safe language.

There are three major details to this:

  1. Rust has not been formally verified to have all of the safety properties it intends to have. Bugs exist that undoubtedly violate its own design for memory safety. These bugs will eventually be fixed, because there is no soundness bug that is considered a "breaking change", or rather, Rust considers all flaws in its type system that would prevent the type system from verifying memory safety to be acceptable to change and they are explicitly not governed by any stability promises. Nonetheless, Rust is only as safe as its implementation is safe.
  2. Rust is split into two sublanguages: Safe Rust and Unsafe Rust. Most Rust is Safe Rust. An unsafe { } block allows the usage of Unsafe Rust code, and most Unsafe Rust code item declarations are also annotated with unsafe1. It is required to have Unsafe Rust as an implementation primitive in order to be able to specify the behavior of Rust: otherwise it would have to be written in another, also memory-unsafe language. By using both as part of Rust, certain guarantees based in the type system can traverse between Safe and Unsafe Rust and remain intact. Otherwise, the work to prove the type soundness would have to begin entirely within Safe Rust, without the ability to incrementally validate claims. However, this means that Unsafe Rust is always waiting behind all Safe Rust, so the abstraction boundary must be evaluated carefully.
  3. Rust is not safe against all logic errors, nor does it consider all operations to be unsafe that the programmer might think of as unsafe. For instance, Rust considers panic! to be "safe": arguably, it is very not safe for someone if Rust code forms the core of an actively-running flight system for some airplane or helicopter and an uncaught panic terminates the flight system abruptly, rendering it inoperative for sufficiently long that the flight system cannot recover stability even after it reboots. It is also usually considered safe to perform IO on arbitrary files, but a database might take a dim view of writing to its storage files.

This three-part caveat, one might notice, is largely a problem of definition:

  1. Safe according to whom?
  2. Safe for what uses?
  3. Safe in which context?

However, each of these remain distinct issues because they cover different domains: validity, implementation, and context.

Is Trust Insufficient Paranoia?

The caveats that apply to Rust apply in very similar form to other existing procedural languages, whether or not they are "trusted":

  1. The question is not whether there is another vulnerability to discover in PL/Tcl, PL/Perl, PL/pgSQL, or with their shared interface with PostgreSQL: it's how long it will take to find it, whether anyone bothers to look, and whether it can actually be used to inflict damage.
  2. The trusted procedural languages have an underlying implementation in a memory-unsafe language. This poses the question of whether those languages are fully secure against the surface implementation being used to achieve unsafe effects. They undoubtedly are against trivial attacks.
  3. Some undesirable effects can still be achieved via the procedural languages. Notably, it's not clear they have much of a defense against e.g. using infinite loops to lock up that thread of execution rather than proceed further.

This is not to say these languages are equally safe or unsafe: there's some advantages in being able to deploy dynamic checks. It merely is to observe that in the presence of sufficient paranoia, all implementations for all languages that exist are hard to trust. Web browsers face similar dilemmas, and many users run browsers with JavaScript limited or disabled because they do not trust it, despite its sandboxing. Any trusted language still means allowing arbitrary users with access to the database to execute code within that database which has broad capabilities. If there is a weak point those capabilities can be applied to break through, and an attacker cares enough to keep searching, it will be found.

In effect, "trust" in practice only exists in two cases:

  • not being aware of the consequences
  • being willing to accept the possibility that worst-case consequences might happen

Safety and trust are implementation-defined

Rust defines "safety" around the concept of "memory safety", and uses a type system that includes ownership types to implement that.

For PostgreSQL's database code, a "trusted procedural language" has only one concrete definition: Did a database administrator install it with the TRUSTED designation? There's nothing technically stopping a DBA with the appropriate privileges from installing an "untrusted" language as TRUSTED in PostgreSQL.

A more amorphous but more practically useful definition is extensively implied throughout the documentation on procedural language: A trusted procedural language is a language where, if you install it as TRUSTED, this decision will not immediately bite you on the ass. The Postgres documentation defines this kind of "trusted" around the idea of limiting trusted language code to effects that either are of no consequence to the database or that the database was going to allow a user to hypothetically do anyway, and it uses dynamic checks and SQL roles to assist implementing that. Specifically, this means a trusted language's code should also respect SQL roles and not produce unintentional denials of service. It may still serve as an attack vector on the system, as can normal SQL-DDL commands, but if it does, it should make it slightly more frustrating for an attacker than running arbitrary assembly (AKA shellcode) would permit. Many attacks of this nature unfortunately will still end in being able to run shellcode if successful.

It may be worth drawing a parallel to cryptography, another way of assuring data security and integrity: many supposedly "one-way" hash functions can theoretically be reversed by an attacker with sufficient power. The security of hashed data usually instead lies in making it so that the attacker would require large amounts of computational power, considerable time, and probably at least one or two novel breakthroughs in the understanding of computation itself, or else they may be spending so much time that the Earth will grow cold before they can unlock the data. Or hopefully at least a few days, allowing time for, say, discovering the breach and generating new passwords. We call something that achieves this goal "secure", even though in actuality it is in fact "eventually breakable". Likewise, a "trusted procedural language" will in practice be "eventually breakable", and the goal is not necessarily to be inviolate but to offer some resistance.

A quality implementation of a trusted procedural language should offer enough resistance that you can worry much less. The rest of this discussion will revolve around what is ultimately a proposal to implement PL/Rust as a high-quality trusted procedural language and how to evaluate that as an ongoing event, rather than one that is necessarily expected to be "finished".

Solving the problems

A perfectly elegant solution would address all of these parts of the problem in one swoop. However, that would require there to be some unifying dilemma that, if answered, can easily handle all of these outward projections. Unfortunately, a formally-verified wasm virtual machine that can be used to safely execute arbitrary Rust code inside it, yet still bind easily against PostgreSQL's C API is... a tall order. In other words, the more elegant solution simply doesn't exist yet.

Because such a provably-secure-yet-porous-enough wasm sandbox currently doesn't exist, it's debatable if it would actually elegantly solve the issue, as we can't actually assess that claim. Notably, it's not clear that allowing arbitrary bindings in such a wasm sandbox would not simply create a sandbox that can do dangerous things. A protective box that encloses its contents yet still has many dangerous projections outside it is usually called a "tank", and is considered to be a weapon of war, which may not be something you wish to introduce into your database.

So in this, more clumsy world, such a three-part problem calls for a three-part solution... at least.

  1. To align Safe Rust more closely with what Postgres expects a trusted language to be able to do, replace std with postgrestd.
  2. To prevent Unsafe Rust from being used to violate expectations, bar the use of unsafe code.
  3. Deploy any and all additional hardening necessary.
  4. Keep doing that, actually: Defense in depth is a good thing.

Eventually, using more effective and total layers of sandboxing can be used when that becomes more convenient, but the problem would remain: Normally, Rust code has the ability to call bindings that can do things a trusted procedural language should not be allowed to do, so if you allow Rust to bind calls to arbitrary external functions into wasm, then you allow Rust to "break trust". A comprehensive approach that blocks off these exit routes is still required, and any additional sandboxing serves as reinforcement.

Safety, Unwinding, and impl Drop

Needs rewrite after rewrite of PGRX error handling

In Rust, the Drop trait promises that if execution reaches certain points in a program then a destructor has been run. There is an immediate and obvious problem with this: Rust does not guarantee forward progress and includes diverging control flow that "never returns". Thus it is possible for Rust code to never reach certain points in control flow, such as by invoking panic!() first. Normally, however, panic!() will cause "unwinding", which walks back through Rust code to the nearest catch_unwind, running Drop as it goes.

However, this is not always the case, and panic!() may be implemented by other forms of divergence such as immediate termination. This may seem surprising, but it is a simple extension of the natural observation that SIGKILL exists, or its sundry equivalents on non-Unix-like operating systems, and Rust code usually runs under an operating system. Rust does not consider terminating Rust code to be a violation of memory safety, because ceasing to progress is considered the appropriate way to respond to a situation where the program is not capable of soundly handling further events. A possible event that can cause this is the "panic-in-panic" scenario: if unwinding also causes a panic, Rust simply aborts.

In a more targeted fashion, it is possible also to mem::forget something with Drop, or to wrap it in ManuallyDrop. Together, these facts mean that a destructor can never be relied on to be run when following arbitrary control flow. Only Rust control flow that lacks these features can be expected to run all destructors. In other words: Drop can be intercepted by both events inside normal Rust code and also "outside" it.

Controlling unsafe

Code can by hypothetically verified to be "safe" by either scanning the tokens directly using a procedural macro or by compiling it with various lints of the Rust compiler to detect and constrain use of unsafe enabled.

Is automatically blocking all unsafe code enough?

No.

The problem with blocking all unsafe code is that pgrx, the Rust standard library, and essentially all implementation details of PL/Rust, will be implemented using unsafe code. There are also many crates which are soundly implemented and theoretically fine to use for PL/Rust, but rely on an unsafe implementation primitive.

Further, some way must exist to implement the function call interface from PostgreSQL to Rust code. In PL/Rust, that is done via the pgrx crate. This requires a lot of unsafe code. Thus, in order to compile any PL/Rust function, a lot of unsafe code must be used. This also means that something must be done to prevent the use of pgrx's unsafe fn in PL/Rust while still allowing pgrx to use unsafe code to implement its own interfaces.

plutonium

postgrestd: containing the problem

If Rust is not allowed to bind against arbitrary external interfaces, then it only has std and whatever crates are permitted. This makes controlling std a priority, and postgrestd is used to implement that.

The result of this is that as long as only Rust code compiled with the postgrestd fork is executed via PL/Rust, and as long as e.g. arbitrary unsafe asm! is not permitted, an escalation in privileges cannot simply jump outside the database and start doing arbitrary things. It is limited to subverting the database, which admittedly is still a bountiful target, but in this event containing the database itself can still be meaningfully done.

The other elephant in the room: pgrx

In addition to being used as the implementation detail of PL/Rust, pgrx offers a full-fledged interface for building Postgres extensions in general. This means that like the Rust standard library, pgrx is not perfectly adapted to being an interface for a trusted procedural language. There are two possible options in carving out what parts of pgrx are appropriate to use:

  • remove all inappropriate features behind #[cfg] blocks, OR
  • create a separate crate and expose it as the pgrx-Postgres user-callable interface

Neither of these are perfectly satisfying because neither option provides a neatly-defined, automatic answer to the question "of pgrx's safe code, what should be allowed?" to begin with.

There is also the unfortunate question of "is pgrx's safe code actually sound?" The crate's early implementation days included a few declared-safe wrappers that didn't fully check all invariants, and in some cases did not document the implied invariants, so an audit of code in pgrx is required. There is no getting around this, as it falls back on the fundamental problem of all procedural languages: They can only be as trustworthy as their implementations, which puts a burden on their implementation details to be correct. Fortunately, most of this audit has already been accomplished simply by the crate receiving scrutiny over the past 3 years.

Building arbitrary crates

Part of what makes Rust such a useful language is that it has crates.io: an ecosystem that allows easy sharing of code, like most "dynamic" languages do, with very little support needed from an operating system's package manager, yet is a systems programming language. Thus, it's inevitable that PL/Rust will want to be able to build arbitrary dependencies. In fact, it has to add at least a few specially approved crates, its own build dependencies like pgrx, in order to build Rust code, but we have to fully trust those crates anyways, so this is nothing new. But using crates we may not necessarily want to automatically trust introduces many, many complications as a direct result.

Some of the primary concerns:

  • build.rs
  • procedural macros
  • unsafe code in dependencies
  • #[cfg] for a very strange runtime

Future directions

When you allow a user to run code in your database's process, you are allowing them to attempt to subvert that process, so all users to some extent must also be trusted with the tools you are giving them, claims that trusted procedural languages allow untrusted users to run untrusted code besides. They just can be trusted less. However, if a user is expected to possibly "sublet" their tenancy to another user, creating a complex multitenancy situation, where the current superuser adopts the position of a "hyperuser", and the user adopts the position of "virtual superuser", the hyperuser who decides what languages are installed may still want to allow the virtual superuser's guests to run code, but has to be aware that they have even less trust. This means various traditional attack venues, e.g. heap attacks, become even more of a concern, as the hyperuser may have to mount a defense against the virtual superuser's guests, and the virtual superuser may install and run PL/Rust code on behalf of these guests.

These are possible future directions in adding layers of security, not currently implemented or experimented with yet.

Dynamic allocator hardening?

While PL/Rust merely interposes palloc, it... still interposes palloc. This means it can implement a "buddy allocator". Since it's possible to control the global allocator for Rust code, this can help interfere with attacks on the heap. This is likely necessary, at the cost of some runtime overhead (offset by PL/Rust precompiling code for execution speed), to buy security against any attacks that target flaws in the Rust type system when those issues are not solved. Having to do this to harden a "memory-safe" language is not unusual, and the system administrator should be aware of this when deploying PostgreSQL and consider deploying PostgreSQL with a similarly hardened allocator so that all allocations benefit from this protection, but it's not unreasonable to want a second layer for PL/Rust.

Background worker executor?

The process boundary offers a great deal of resilience against heap attacks. Background workers are separate processes, and PL/Java implementations use a similar approach of running code inside a daemon (which also takes care of compiling code). This may trade off a lot of performance gains from PL/Rust's overall approach, but it still may be worth it.

Control Flow Integrity

There are various hardware-, kernel-, or compiler-level approaches to protect the integrity of even C or C++ code against reasonably determined attackers trying to usurp its control flow. Thus these approaches are sometimes called "control flow integrity" collectively, but they have various specific brand names like "indirect branch targeting", "control flow guard", or "pointer authentication". The Rust compiler supports a number of these as nightly features, and while they require Postgres to also be built with support these features for them to work, it would be worth exploring their use for PL/Rust.

witx: the wasm strikes back

A method of generating bindings for wasm automatically is being prototyped, called witx, which builds on "WebAssembly Interface Types". It's not currently ready for primetime, but it is possible that within a few years it may be a feasible answer to many of these problems, especially in terms of hardening the Rust stack and heap against code just doing arbitrary nonsense to it, even if things get overly "interesting".

Notes

1

There are a few cases where Unsafe Rust code can be declared without it being visibly denoted as such, and these are intended to be phased out eventually, but in these cases they generally still require an unsafe { } block to be called or they must be wrapped in an unsafe fn. The absence of the unsafe token can only be bypassed in Rust by declaring an extern fn (which is implicitly also an unsafe fn, allowing one to fill it with other unsafe code) and then calling that function from another language, like C.

External Dependencies

PL/Rust supports the use of external dependencies. By default, this is unrestricted even when PL/Rust is used as a Trusted Language Handler, allowing user functions to specify any desired dependency.

For instance:

CREATE OR REPLACE FUNCTION randint() RETURNS bigint LANGUAGE plrust AS $$
[dependencies]
rand = "0.8"

[code]
use rand::Rng; 
Ok(Some(rand::thread_rng().gen())) 
$$;

It is recommended that administrators create a dependency allow-list file and specify its path in postgresql.conf using the plrust.allowed_dependencies setting.

To disable external dependencies completely, create a zero-byte file or point the configuration to /dev/null.

The Allow-List File

The dependency allow-list is a TOML file. Its format mirrors that of the [dependencies] section in a standard Cargo.toml, albeit with certain requirements on the version strings.

The Format

The file consists of dependency_name = version_requirement pairs, where version_requirement can adopt several forms. It can be a quoted string such as "=1.2.3", a TOML table like { version = "=1.2.3", features = ["a", "b", "c"] }, or an array of either, such as [ "=1.2.3", { version = "=1.2.3" }, ">=4, <5".

Here is a valid allow-list file for reference:

rand = ">=0.8, <0.9"
bitvec = [">=1, <2", "=0.2", { version = "=1.0.1", features = [ "alloc" ], default-features = false }]

This added flexibility empowers administrators to specify the exact crate version and its associated features and properties.

When a LANGUAGE plrust function designates a dependency and version, the largest (presumably most recent) matching version from the allow-list is used.

Version Requirement Format

PL/Rust employs Cargo's interpretation of semver to manage dependency versions, but it requires each version requirement to be an exact value like =1.2.3, a bounded range such as >=1, <2, or a bare wildcard (*).

For example, these are valid version requirement values:

rand = "=0.8.5"
serde = ">=1.0.151, <1.1"
bitvec = "*"

These, however, are not:

rand = "0.8.5"
serde = ">1.1"

The cargo tool may select a slightly different version based on the specification. However, with exact and bounded values, cargo's choices are limited to the versions that administrators allow.

The bare wildcard pattern (*) is acceptable and has a unique interpretation within a user LANGUAGE plrust function.

Using a Dependency

As shown above, a LANGUAGE plrust function can include a [dependencies] section. Authors should specify exact versions for each dependency. PL/Rust will match this exact version with an entry in the allow-list.

If a function requests a version in the 1.2.3 format and it matches an entry on the allow-list, PL/Rust will revise it to an exact version, i.e., =1.2.3.

If the allow-list merely contains a wildcard version:

rand = "*"

... and the user function asks for a specific version, such as 0.8.5, PL/Rust will utilize that exact version.

Conversely, if the allow-list specifies one or more particular version requirements...

rand = [ "0.8.5", "0.6" ]

... and the PL/Rust function requests a wildcard (i.e., rand = "*"), PL/Rust will select the largest version requirement from the allow-list. In this case, it would be 0.8.5.

Working with Crate Features

When a user function employs a crate from the allow-list, the allow-list controls the permitted set of dependency properties such as features and default-features for each version. Users cannot override these. They can specify them, but the specifications must match exactly with the allow-list.

This control enables administrators to dictate the usage of dependencies.

For instance, this would be acceptable for a user function:

CREATE OR REPLACE FUNCTION randint(seed bigint) RETURNS bigint STRICT LANGUAGE plrust AS $$
[dependencies]
rand = { version = "*", features = [ "small_rng" ], default-features = false }

[code]
use rand::rngs::SmallRng;
use rand::SeedableRng;
use rand::RngCore;

let mut rng = SmallRng::seed_from_u64(seed as _);
Ok(Some(rng.next_u64() as _))
$$;

Provided that the allow-list includes the following:

rand = { version = "=0.8.5", features = [ "small_rng" ], default-features = false }

Note that the user function could omit the dependency features since the allow-list declares them:

CREATE OR REPLACE FUNCTION randint(seed bigint) RETURNS bigint STRICT LANGUAGE plrust AS $$
[dependencies]
rand = "*"

[code]
use rand::rngs::SmallRng;
use rand::SeedableRng;
use rand::RngCore;

let mut rng = SmallRng::seed_from_u64(seed as _);
Ok(Some(rng.next_u64() as _))
$$;

PL/Rust provides a function plrust.allowed_dependencies which lists all the allowlisted crates with their respective enabled features. For example, with an allowlist as follows:

rand = ">=0.8, <0.9"
bitvec = [">=1, <2", "=0.2", { version = "=1.0.1", features = [ "alloc" ], default-features = false }]

The result of plrust.allowed_dependencies would be:

SELECT * FROM plrust.allowed_dependencies();
  name  |   version   | features | default_features
--------+-------------+----------+------------------
 bitvec | =0.2        | {}       | t
 bitvec | >=1, <2     | {}       | t
 bitvec | =1.0.1      | {alloc}  | f
 rand   | >=0.8, <0.9 | {}       | t
(4 rows)

Operational Notes

  • The dependency allow-list file path must be set in plrust.allowed_dependencies GUC value in postgresql.conf.
  • Changing the GUC value requires a configuration reload on the database to take effect.
  • The file must be readable by the user that runs Postgres backend connections. Typically, this user is named postgres.
  • Every time a CREATE FUNCTION ... LANGUAGE plrust statement is executed, the file is read, parsed, and validated. This arrangement allows administrators to edit it without needing to restart the Postgres cluster.

Lints

PL/Rust has its own "rustc driver" named plrustc. This must be installed using the plrustc/build.sh script and the resulting executable must be on the PATH, or it should reside somewhere that is included in the plrust.PATH_override GUC.

PL/Rust uses its own "rustc driver" so that it can employ custom lints to detect certain Rust code idioms and patterns that trigger "I-Unsound" bugs in Rust itself. Think "clippy" but built into the Rust compiler itself. In addition to these custom lints, PL/Rust uses some standard Rust lints to enforce safety.

The plrust.required_lints GUC defines which lints must have been applied to a function before PL/Rust will load the library and execute the function. The default value is the empty set -- PL/Rust will not require any specific lints to have been previously applied to a function.

Using the PLRUST_REQUIRED_LINTS environment variable, it is possible to enforce that certain lints are always required of compiled functions, regardless of the plrust.required_lints GUC value.PLRUST_REQUIRED_LINTS's format is a comma-separated list of lint named. It must be set in the environment in which Postgres is started. The intention here is that the system administrator can force certain lints for execution if for some reason postgresql.conf or the users able to modify it are not trusted.

In all cases, these lints are added to the generated code which wraps the user's LANGUAGE plrust function, as #![forbid(${lint_name})]. They are used with "forbid" to ensure a user function cannot change it back to "allow".

PL/Rust does not apply these lints to dependant, external crates. Dependencies are allowed to internally use whatever code they want, including unsafe. Note that any public-facing unsafe functions won't be callable by a plrust function.

Dependencies are granted more freedom as the usable set can be controlled via the plrust.allowed_dependencies GUC.


It is the administrator's responsibility to properly vet external dependencies for safety issues that may impact the running environment.


Any LANGUAGE plrust code that triggers any of the below lints will fail to compile, indicating the triggered lint.

Standard Rust Lints

unknown_lints

https://doc.rust-lang.org/rustc/lints/listing/warn-by-default.html#unknown-lints

PL/Rust won't allow any unknown (to our "rustc driver") lints to be applied. The justification for this is to mainly guard against type-os in the plrust.compile_lints GUC.

unsafe_code

https://doc.rust-lang.org/rustc/lints/listing/allowed-by-default.html#unsafe-code

PL/Rust does not allow usage of unsafe code in LANGUAGE plrust functions. This includes all the unsafe idioms such as dereferencing pointers and calling other unsafe functions.

implied_bounds_entailment

https://doc.rust-lang.org/rustc/lints/listing/warn-by-default.html#implied-bounds-entailment

This lint detects cases where the arguments of an impl method have stronger implied bounds than those from the trait method it's implementing.

If used incorrectly, this can be used to implement unsound APIs.

deprecated

https://doc.rust-lang.org/rustc/lints/listing/warn-by-default.html#deprecated

The deprecated lint detects use of deprecated items. This is forbidden because certain items in the Rust standard library are incorrectly-safe APIs but were only deprecated rather than removed when a version with the appropriate safety annotation was added.

suspicious_auto_trait_impls

https://doc.rust-lang.org/rustc/lints/listing/warn-by-default.html#suspicious-auto-trait-impls

This defends against some patterns that can lead to soundness issues. These cases currently can only trigger in patterns which are otherwise blocked by the unsafe_code lint, but for better defense-in-depth, it's explicitly forbidden in PL/Rust.

soft_unstable

https://doc.rust-lang.org/rustc/lints/listing/deny-by-default.html#soft-unstable

This prevents the use of language and library features which were accidentally stabilized. This is forbidden because there's no reason to need to use these, and forbidding them reduces the set of APIs and features we have to consider in PL/Rust.

where_clauses_object_safety

https://doc.rust-lang.org/rustc/lints/listing/warn-by-default.html#where-clauses-object-safety

This avoids some soundness holes that are in the language which can be used to trigger various crashes, see the lint documentation for details.

PL/Rust plrustc Lints

plrust_extern_blocks

This blocks the declaration of extern "API" {}" blocks. Primarily, this is to ensure a plrust function cannot declare internal Postgres symbols as external.

For example, this code pattern is blocked:

#![allow(unused)]
fn main() {
extern "C" {
    pub fn palloc(size: Size) -> *mut ::std::os::raw::c_void;
}
}

plrust_lifetime_parameterized_traits

Traits parameterized by lifetimes can be used to exploit Rust compiler bugs that lead to unsoundness issues. PL/Rust does not allow such traits to be declared.

For example, this code pattern is blocked:

#![allow(unused)]
fn main() {
    trait Foo<'a> {}
}

plrust_filesystem_macros

Filesystem macros such as include_bytes! and include_str! are disallowed, as they provide access to the underlying filesystem which should be unavailable to a trusted language handler.

For example, this code pattern is blocked:

#![allow(unused)]
fn main() {
const SOMETHING: &str = include_str!("/etc/passwd");
}

plrust_fn_pointers

Currently, several soundness holes have to do with the interaction between function pointers, implied bounds, and nested references. As a stopgap against these, use of function pointer types and function trait objects are currently blocked. This lint will likely be made more precise in the future.

Note that function types (such as the types resulting from closures as required by iterator functions) are still allowed, as these do not have the issues around variance.

For example, the following code pattern is blocked:

#![allow(unused)]
fn main() {
fn takes_fn_arg(x: fn()) {
    x();
}
}

plrust_async

Currently async/await are forbidden by PL/Rust due to unclear interactions around lifetime and soundness constraints. This may be out of an overabundance of caution. Specifically, code like the following will fail to compile:

#![allow(unused)]
fn main() {
async fn an_async_fn() {
    // ...
}

fn normal_function() {
    let async_block = async {
        // ...
    };
    // ...
}
}

plrust_leaky

This lint forbids use of "leaky" functions such as mem::forget and Box::leak. While leaking memory is considered safe, it has undesirable effects and thus is blocked by default. For example, the lint will trigger on (at least) the following code:

#![allow(unused)]
fn main() {
core::mem::forget(something);
let foo = Box::leak(Box::new(1u32));
let bar = vec![1, 2, 3].leak();
}

Note that this will not prevent all leaks, as PL/Rust code could still create a leak by constructing a reference cycle using Rc/Arc, for example.

plrust_env_macros

This lint forbids use of environment macros such as env! and option_env!, as it allows access to data that should not be available to a trusted language handler.

#![allow(unused)]
fn main() {
let path = env!("PATH");
let rustup_toolchain_dir = option_env!("RUSTUP_TOOLCHAIN");
// ...
}

plrust_external_mod

This lint forbids use of non-inline mod blah, as it can be used to access files a trusted language handler should not give access to.

#![allow(unused)]
fn main() {
// This is allowed
mod foo {
    // some functions or whatever here...
}

// This is disallowed.
mod bar;
// More importantly, this is disallowed as well.
#[path = "/sneaky/path/to/something"]
mod baz;
}

plrust_print_macros

This lint forbids use of the println!/eprintln! family of macros (including dbg! and the non-ln variants), as these allow bypassing the norm. Users should use pgrx::log! or pgrx::debug! instead.

#![allow(unused)]
fn main() {
println!("hello");
print!("plrust");

eprintln!("this is also blocked");
eprint!("even without the newline");

dbg!("same here");
}

plrust_stdio

This lint forbids use of the functions for accessing standard streams (stdin, stdout, stderr) from PL/Rust, for the same reason as above. For example, the following code is forbidden:

#![allow(unused)]
fn main() {
std::io::stdout().write_all(b"foobar").unwrap();
std::io::stderr().write_all(b"foobar").unwrap();
let _stdin_is_forbidden_too = std::io::stdin();
}

plrust_static_impls

This lint forbids certain impl blocks for types containing &'static references. The precise details are somewhat obscure, but can usually be avoided by making a custom struct to contain your static reference, which avoids the particular soundness hole we're concerned with. For example:

#![allow(unused)]
fn main() {
// This is forbidden:
impl SomeTrait for (&'static Foo, Bar) {
    // ...
}

// Instead, do this:
struct MyType(&'static Foo, Bar);
impl SomeTrait for MyType {
    // ...
}
}

plrust_autotrait_impls

This lint forbids explicit implementations of the safe auto traits, as a workaround for various soundness holes around these. It may be relaxed in the future if those are fixed.

#![allow(unused)]
fn main() {
struct Foo(std::cell::Cell<i32>, std::marker::PhantomPinned);
// Any of the following implementations would be forbidden.
impl std::panic::UnwindSafe for Foo {}
impl std::panic::RefUnwindSafe for Foo {}
impl std::marker::Unpin for Foo {}
}

As a workaround, in most cases, you should be able to use std::panic::AssertUnwindSafe instead of implementing one of the UnwindSafe traits, and Boxing your type can usually work around the need for Unpin (which should be rare in non-async code anyway).

plrust_suspicious_trait_object

This lint forbids trait object use in turbofish and generic defaults. This is an effort to fix certain soundness holes in the Rust language. More simply, the following patterns are disallowed:

// Trait object in turbofish
foo::<dyn SomeTrait>();
// Trait object in type default (enum, union, trait, and so on are all also forbidden)
struct SomeStruct<T = dyn SomeTrait>(...);

plrust_closure_trait_impl

This lint forbids trait impls over generic over Fn, FnOnce, FnMut or FnPtr types. This is to work around some soundness issues where closure types are incorrectly 'static. For example, the following is forbidden:

trait Trait {}
// This is generic over a function trait.
impl<F: Fn()> Trait for F {}

However, this is currently overly strict. In the future, it may be relaxed to forbid only the case where the return type is projected into an associated item on the trait, as in:

trait Trait {
    type Assoc;
}
impl<R, F: Fn() -> R> Trait for F {
    // This is the problem
    type Assoc = R;
}

Environment variables

As part of PL/Rust's function compilation machinery, and in conjunction with pgrx which does the hard work, a number of environment variables are set when PL/Rust executes cargo.

These are not environment variables that need to set manually. Generally, these are auto-detected and cannot be overridden through configuration.

NameValueHow it's Used
PATH~/cargo/bin:/usr/bin or /usr/bin if "postgres" user has no home directoryThe PATH environment variable is only set by PL/Rust if it detects that one isn't already set.
As mentioned above, this one can be overridden via the plrust.PATH_override GUC in postgresql.conf.
RUSTCplrustcThis is set to plrust's "rust driver" executable, named plrustc. It must be on the system PATH.
RUSTFLAGS"-Clink-args=-Wl,-undefined,dynamic_lookup"Used by rustc to indicate that Postgres internal symbols are only available at run-time, not compile-time.
CARGO_TARGET_DIRvalue of GUC plrust.work_dir/targetThis is the filesystem path cargo will store its intermediate compilation artifacts.
CARGO_TARGET_X86_64_LINKERx86_64-linux-gnu-gccUsed only when cross-compiling to x86_64, this tells rustc which linker to use. The plrust.x86_64_linker GUC can override the default.
CARGO_TARGET_AARCH64_LINKERaarch64-linux-gnu-gccUsed only when cross-compiling to aarch64, this tells rustc which linker to use. The plrust.aarch64_linker GUC can override the default.
PGRX_TARGET_INFO_PATH_PG${MAJOR_VERSION_NUM}unset unless plrust.{x86_64/aarch64}_pgrx_bindings_path GUC is setUsed only when cross-compiling to the specified target. This tells pgrx where to find the generated Postgres bindings for that platform.
PGRX_PG_CONFIG_AS_EN_VARtrueIndicates to the trusted-pgrx dependency, and ultimately pgrx itself that instead of getting the values it needs for compilation from the Postgres pg_config tool, it should get them from environment variables.
PGRX_PG_CONFIG_VERSIONProvided by the running Postgres instanceUsed by pgrx to build the PL/Rust user function.
PGRX_PG_CONFIG_CPPFLAGSProvided by the running Postgres instanceUsed by pgrx to build the PL/Rust user function (technically unused by PL/Rust's build process as PL/Rust does not include the pgrx "cshim" for which this is normally used).
PGRX_PG_CONFIG_INCLUDEDIR-SERVERProvided by the running Postgres instanceUsed by pgrx to build the PL/Rust user function.

Safety

Note that PL/Rust uses Rust's std::process::Command to exec cargo. As such, it will inherit all environment variables set under the active backend postgres process. We recommend Postgres' execution environment be properly sanitized to your organizations requirements.

As a pre-emptive measure, PL/Rust proactively un-sets a few environment variables that could negatively impact user function compilation. These are generally things used by the pgrx development team that are not necessary for PL/Rust.

  • DOCS_RS
  • PGRX_BUILD_VERBOSE
  • PGRX_PG_SYS_GENERATE_BINDINGS_FOR_RELEASE
  • CARGO_MANIFEST_DIR
  • OUT_DIR

Reserved environment variables

There are a number of other pg_config-related environment variables that plrust sets. These are not currently used, but are reserved for future use, should they become necessary to build a user function:

  • PGRX_PG_CONFIG_BINDIR
  • PGRX_PG_CONFIG_DOCDIR
  • PGRX_PG_CONFIG_HTMLDIR
  • PGRX_PG_CONFIG_INCLUDEDIR
  • PGRX_PG_CONFIG_PKGINCLUDEDIR
  • PGRX_PG_CONFIG_INCLUDEDIR-SERVER
  • PGRX_PG_CONFIG_LIBDIR
  • PGRX_PG_CONFIG_PKGLIBDIR
  • PGRX_PG_CONFIG_LOCALEDIR
  • PGRX_PG_CONFIG_MANDIR
  • PGRX_PG_CONFIG_SHAREDIR
  • PGRX_PG_CONFIG_SYSCONFDIR
  • PGRX_PG_CONFIG_PGRXS
  • PGRX_PG_CONFIG_CONFIGURE
  • PGRX_PG_CONFIG_CC
  • PGRX_PG_CONFIG_CPPFLAGS
  • PGRX_PG_CONFIG_CFLAGS
  • PGRX_PG_CONFIG_CFLAGS_SL
  • PGRX_PG_CONFIG_LDFLAGS
  • PGRX_PG_CONFIG_LDFLAGS_EX
  • PGRX_PG_CONFIG_LDFLAGS_SL
  • PGRX_PG_CONFIG_LIBS
  • PGRX_PG_CONFIG_VERSION

Influencing PL/Rust Compilation

If set, PL/Rust will use the PLRUST_TRUSTED_PGRX_OVERRIDE environment variable when PL/Rust itself is being compiled. See the Choosing a different plrust-trusted-pgrx dependency at compile time section for details.