5 more reasons why we love PostgreSQL 13
Sushant Pawar I Database Solution Consultant, Ashnik
As we know, PostgreSQL 13 has been released for production usage for over two months now. I can’t help but admire the efforts by community members who are contributing and adding these cool, amazing features in PostgreSQL, day after day.
I may have mentioned this a few times in my past PostgreSQL blogs also but again, the latest features with every Postgres release is making it more and more suitable for Enterprise-class applications now. These features are giving a strong backbone to the Postgres database making it able to handle the extensive workloads of Financial, Telecom or Retail domains.
I recently had two opportunities to design and deploy the production setup of a large-scale regional Telco and a Government customer, and in both cases we could meet our customer’s expectations with the help of Postgres. These new features of Postgres 13 are like extra icing on the cake (yay), making it the right choice for a production application.
Today, let me try to cover the latest features in PostgreSQL 13 that can be beneficial for enterprise customers. And not to say, why I love them:
1. Compact B-tree index, thanks to de-duplication of B-tree index entries
The B-tree index is the default index type in Postgres. By default, each index tuple points to a corresponding column in the table. If there are duplicate values, the index entry will be created for each value in the indexed column. Now, deduplication feature has been introduced in the latest version. Deduplication works by periodically merging groups of duplicate tuples, forming a single posting list tuple for each group. The column key value(s) only appear once in this representation. This significantly reduces the size of the index where column values reappear several times. When I tested this feature against Postgres 12 database, I noticed that the index in Postgres 13 instance is almost 1/3rd in size of that index in the Postgres 12 instance. Meaning, performance is better in the latest version with this compact index.
The deduplication is enabled by default. If you want to turn on the behaviour of Postgres 12 version, you can use (deduplicate_items=off) clause while creating index which is available for backward compatibility.
This is a very positive change with the following benefits:
- The index size gets reduced significantly. This increases the probability of index fitting into memory and giving better throughput for queries.
- The maintenance would be easier as vacuum and reindex would be quicker.
- This will be a boost for partitioning as along with partitioned tables the local indexes are usually created for each partition. With the index deduplication, one reaps the benefit of partitioning as well as compact indexes.
2. Extended statistics results in better query planning
The Postgres planner plays an essential role in selecting the appropriate execution plan for queries. It does rely on available statistics to choose a better execution plan. With more accurate statistics, the planner can choose a more efficient approach. The parameter (default_statisitcs_target) or column level setting determine the number of statistics gathered by Postgres processes. The statistics are always collected for each column separately. They have information about the number of rows, most common values and histogram bound.
The planner usually thinks that multiple conditions are independent of each other. It is not correct in case of particular queries which has an interrelated clause. Here is where extended statistics would be beneficial:
Let us take the following example of a customer table. Each country has a region code assigned to it. Below table indicates the country, region, and number of times the country name has appeared in the country column.
When we try to extract the record for country “UK” and with region=2 we get the below result. From the results, it seems the planner had estimated that query would return 501 rows whereas, in reality, it had reverted 1002 rows. It is equal to count in the above table when country= “UK”. It indicates that the planner did not have accurate statistics available to determine the proper rows estimate.
Here’s where extended statistics would help as the planner get more accurate estimates. Statistics objects are created using create statistics command.
Next, when you collect the statistics using the analyze command, the extended statistics are collected for correlated columns in the statistics object. The planner uses these to generate better estimates. In our case, the planner estimated that the query would return 1002 rows. It is equivalent to an actual number of rows returned. So, the extended statistics would help planner in better query planning.
3. Parallelized vacuuming of indexes
When we perform the vacuum operation on tables, the corresponding indexes are also vacuumed. If the table has multiple indexes on it then the indexes would be vacuumed one after another increasing the overall vacuum operation time. Come Postgres 13, the parallelized vacuum operation allows multiple indexes of the target table to be vacuumed in parallel which makes the vacuum operation faster. With this new feature, it is possible to make extensive use of server resources to speed up the maintenance task.
The (max_parallel_maintenance_workers) parameter control the degree of parallelism for vacuum of indexes. The indexes bigger than (min_parallel_index_scan_size) would be considered for parallel vacuum operations.
4. Improvement in Partitioning features
- There has been an improvement wherein partitioning pruning is now supported in more cases
- An improvement has been made to allow partition-wise join in more cases. For example, even if the partition boundary does not match, still it will allow to join partitioned tables as shown below, which was not the case with the earlier Postgres versions.
If we refer to below example, during the first run, when I tried to execute the query with join condition on bid column, each partition was scanned to create a hash table of each table and then return the output of hash join as we can see below. It is the default behaviour in Postgres V12 and V13.
If we enable (enable_partitionwise_join) parameter, the plan remains the same in Postgres V12 whereas in Postgres V13 Hash join will occur at each partition level.
5. System views for monitoring purpose
The introduction of (pg_stat_progress_basebackup) view has made it easier to monitor the pg_basebackup status
- The (backup_streamed) column would give an estimate of how many blocks have been processed out of (backup_total) blocks
- pg_stat_progress_create_index view was introduced in Postgres 12. To add to it, now we can monitor the progress of the analyzed task using pg_stat_progress_analyze view
- It has become easier to keep track of parallel operations running on the database server. Pg_stat_activity view has got leader_pid column that will report a leader process for parallel workers doing the task under leader process co-ordination.
So here are my top 5 reasons to love the new and improved PostgreSQL – hope you find these pointers worth from an administration and monitoring perspective. There are definitely more features introduced in Postgres 13, for which you can refer to the PostgreSQL 13 release notes.
And, if you’re looking for any Postgres related solutions or services for your enterprise, talk to us firstname.lastname@example.org we will be happy to help!
- Sushant is working as a Database Solution Consultant with Ashnik, having more than 9 years of experience in Database Administration. He has worked for companies like Capgemini, J.P. Morgan in the past on a variety of Database projects.
- 5 more reasons why we love PostgreSQL 13
- 10 Tip untuk Melindungi Data Anda dalam PostgreSQL
- 10 Tips for Securing Your Data in PostgreSQL