OrioleDB: The next-generation storage engine for PostgreSQL
OrioleDB is a storage extension for PostgreSQL which uses PostgreSQL's pluggable storage system.
It is designed to be a drop-in replacement for PostgreSQL's existing storage engine. OrioleDB is built to take advantage of modern hardware and cloud infrastructure, providing better performance and scalability for PostgreSQL workloads.
Example
OrioleDB uses Postgres Table Access Method (TAM) to provide a pluggable storage engine for PostgreSQL. Here is an example of how you can create a table using OrioleDB:
-- Enable the OrioleDB extension
CREATE EXTENSION orioledb;
CREATE TABLE blog_post
(
id int8 NOT NULL,
title text NOT NULL,
body text NOT NULL,
PRIMARY KEY(id)
) USING orioledb; -- Use the OrioleDB storage engine
Pluggable Storage in PostgreSQL
Pluggable Storage gives developers the ability to use different storage engines for different tables within the same database. Developers will be able to choose a storage method that is optimized for their specific needs: some tables could be configured for high transactional loads, others for analytics workloads, and still others for archiving.
See examples
create table analytics_data
(
id int8,
created_at timestamptz,
event text
) using parquet; -- Store data in a analytical optimized storage engine
create table timeseries_data
(
id int8,
created_at timestamptz,
event text
) using timeseries; -- Store data in a time-series optimized storage engine
Something like this is already available in MySQL, which uses the InnoDB
as the default storage engine since MySQL 5.5 (replacing MyISAM
). Read more about the history of pluggable storage here.
Using OrioleDB with existing PostgreSQL installations
OrioleDB currently requires a set of patches to PostgreSQL to enhance the pluggable storage API and other PostgreSQL subsystems. All of these patches have been submitted to the PostgreSQL community and are under review.
The important property of this set of patches is keeping the binary compatibility. That is, you can switch to the patched PostgreSQL binary while keeping the same data directory. The existing tables will continue working with the default heap
engine until you switch them to use orioledb
. Moreover, it's possible to switch back to using unpatched PostgreSQL binaries. You would just need to convert your orioledb
tables back to heap before.
The goal is to upstream everything: once these patches are accepted, OrioleDB will be able to run on any PostgreSQL installation without any modifications. This will also enable the entire PostgreSQL community to create their own pluggable storage engines.
Until then, you can use our pre-built Docker image to try out OrioleDB. The Docker image includes a patched version of PostgreSQL with OrioleDB pre-installed. Follow the Getting started guide to get started.
Patch set
You can get the full set of patches here. The following patches have been submitted to the PostgreSQL community to enhance the TAM interface and other subsystems.
Name | Link | Version | |
---|---|---|---|
✅ | Add missing inequality searches to rbtree | Link | PostgreSQL 16 |
✅ | Document the ability to specify TableAM for pgbench | Link | PostgreSQL 16 |
✅ | Remove Tuplesortstate.copytup function | Link | PostgreSQL 16 |
✅ | Add new Tuplesortstate.removeabbrev function | Link | PostgreSQL 16 |
✅ | Put abbreviation logic into puttuple_common() | Link | PostgreSQL 16 |
✅ | Move memory management away from writetup() and tuplesort_put*() | Link | PostgreSQL 16 |
✅ | Split TuplesortPublic from Tuplesortstate | Link | PostgreSQL 16 |
✅ | Split tuplesortvariants.c from tuplesort.c | Link | PostgreSQL 16 |
✅ | Fix typo in comment for writetuple() function | Link | PostgreSQL 16 |
✅ | Support for custom slots in the custom executor nodes | Link | PostgreSQL 16 |
✉️ | Allow table AM to store complex data structures in rd_amcache | Link | PostgreSQL 18 |
✉️ | Allow table AM tuple_insert() method to return the different slot | Link | PostgreSQL 18 |
✉️ | Add TupleTableSlotOps.is_current_xact_tuple() method | Link | PostgreSQL 18 |
✉️ | Allow locking updated tuples in tuple_update() and tuple_delete() | Link | PostgreSQL 18 |
✉️ | Add EvalPlanQual delete returning isolation test | Link | PostgreSQL 18 |
✉️ | Generalize relation analyze in table AM interface | Link | PostgreSQL 18 |
✉️ | Custom reloptions for table AM | Link | PostgreSQL 18 |
✉️ | Let table AM insertion methods control index insertion | Link | PostgreSQL 18 |
Legend
✅ - Patch has been accepted.
✉️ - Patch is sumbitted and under review by the PostgreSQL community.
✏️ - Patch is being worked on.
Features
OrioleDB opens the door to a future of more powerful storage models that are optimized for cloud and modern hardware architectures.
Open source
OrioleDB is distributed under the standard PostgreSQL license. The goal is to upstream all the patches required to run OrioleDB on any PostgreSQL installation without any modifications.
Designed for modern hardware
OrioleDB design avoids legacy CPU bottlenecks on modern servers containing dozens and hundreds CPU cores, providing optimized usage of modern storage technologies such as SSD and NVRAM.
Reduced maintenance needs
OrioleDB implements the concepts of undo log and page-mergins, eliminating the need for dedicated garbage collection processes. Additionally, OrioleDB implements default 64-bit transaction identifiers, thus eliminating the well-known and painful wraparound problem.
Designed to be distributed
OrioleDB implements a row-level write-ahead log with support for parallel apply. This log architecture is optimized for raft consensus-based replication allowing the implementation of active-active multimaster.
Differentiators
The key technical differentiations of OrioleDB are as follows:
No buffer mapping and lock-less page reading
In-memory pages in OrioleDB are connected with direct links to the storage pages. This eliminates the need for in-buffer mapping along with its related bottlenecks. Additionally, in OrioleDB in-memory page reading doesn't involve atomic operations. Together, these design decisions bring vertical scalability for Postgres to the whole new level.
MVCC is based on the UNDO log concept
In OrioleDB, old versions of tuples do not cause bloat in the main storage system, but eviction into the undo log comprising undo chains. Page-level undo records allow the system to easily reclaim space occupied by deleted tuples as soon as possible. Together with page-mergins, these mechanisms eliminate bloat in the majority of cases. Dedicated VACUUMing of tables is not needed as well, removing a significant and common cause of system performance deterioration and database outages.
Copy-on-write checkpoints and row-level WAL
OrioleDB utilizes copy-on-write checkpoints, which provides a structurally consistent snapshot of data every moment of time. This is friendly for modern SSDs and allows row-level WAL logging. In turn, row-level WAL logging is easy to parallelize (done), compact and suitable for active-active multimaster (planned).
Resources
Hacker news
- OrioleDB - solving some PostgreSQL wicked problems
- PostgreSQL: No More Vacuum, No More Bloat
- OrioleDB Reached Beta
Solving PostgreSQL Wicked Problems
This talk covers how the new engine is integrated with PostgreSQL Core and solves the wicked PostgreSQL problems.