Rebuild PostgreSQL Database using pg_repack

Akshay Bhadange
4 min readNov 6, 2021

--

In this blog, we will look at how to use pg_repack to rebuild the PostgreSQL database.

PostgreSQL uses a multi-version model. Each table row can have multiple versions, but transactions can only see one of them which is updated recently.

All these versions need to be stored somewhere and PostgreSQL used a paging mechanism to manage the memory. Minimum one page of data can be read or written to disk.

By updating and deleting rows, We will get a page that is half-filled with garbage data. This data is invisible to other transactions. In simple terms, PostgreSQL maintains both the past image and the latest image of the row in its own table using versioning.

In order to understand how these versions are maintained within each table, you should understand the hidden columns of a table in PostgreSQL.

When you describe the table using the below query you would only see the columns you have added.

However, if you look at all the columns of the table in pg_attribute, you should see several hidden columns as you see in the following log.

Now, let’s understand these columns in detail.

tableoid: tableoid column contains the OID of the table that contains this row.

xmin: xmin contains the transaction ID of inserting transaction for this row version.

After execution of the first command, select txid_current(); we got the current transaction ID as 26505402. Thus, the immediate INSERT statement got transaction ID as 26505403. With the above example, you should now understand that every tuple has an xmin that is assigned the txid that inserted it.

xmax: The initial value of xmax record is 0. The xmax of the row version changes to the ID of the transaction that has issued the DELETE / Update. This value is updated in the existing row version before DELETE or Update is committed.

Before deleting record

After execution of delete without committing it.

as you see in the above log, the xmax value changed to the transaction ID that has issued the delete. If you ROLLBACK, or if the transaction got aborted, xmax remains the same.

Postgres creates a new version of the row on every update that leads to a large number of dead tuples. To avoid or overcome this issue, it is recommended to use the auto vacuum on the DB instance.

VACUUM FULL is the default option available with a PostgreSQL installation that allows us to rebuild a table. This is similar to ALTER TABLE in PostgreSQL. However, this command acquires an exclusive lock and locks reads, and writes on a table.

VACUUM FULL tablename;

pg_repack is an extension available for PostgreSQL that helps us rebuild a table online and it allows you to remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.

Step by step guide to install and run pg_repack

If the Postgres version is 12.6 and above then follow the below steps

  1. Connect to your target database and install pg_repack extension using CREATE EXTENSION pg_repack;
  2. Install pg_repack on server or localhost using sudo apt-get install postgresql-12-repack
  3. We can execute the below command from the terminal.
pg_repack -d <DATABASE_NAME> -U <USERNAME> -h <DATABASE_HOST_URL> -p <DATABASE_PORT> -k
  • -k :- flag is to remove the superuser check
  • -j <COUNT> :- Number of parallel jobs
  • — table <table-name> :- Specify table name (Note: for entire database remove do not add this flag in command)
  • — table <table1> — table <table2> :- Use these flags if you want to repack multiple tables.

--

--