Configuration parameters are the most mythical creatures in the world of PostgreSQL.
Ordinary PostgreSQL users often do not know that PostgreSQL configuration parameters exist, let alone what they are and what they mean. There is a good reason for such ignorance: in real life, ordinary users don’t have any say in how these parameters are set. Configuration parameters are set not just for a database but for the whole instance, which may host multiple databases, so any individual user gets the same settings as everyone else. To be completely transparent, in some cases those ordinary users can specify some parameters just for their own sessions, but let’s hold our horses for now.
There are over three hundred PostgreSQL configuration parameters, so no wonder that even experienced DBAs often do not know what each of these parameters does. That is perfectly fine; however, there is a widespread belief that somewhere, in the secret vaults of many consulting companies, there is a treasure chest of perfect PostgreSQL parameter settings.
When users start to complain that “everything is slow” or nightly backups finish in broad daylight, everybody around screams that “the database server should be optimized!” Many administrators assume that as soon as we choose the correct values for parameters and restart the database instance, all the world’s problems will be solved.
Indeed, we all hope to be able to solve all problems at once; however, although some parameter tuning can indeed improve the database performance, we are usually talking about 10%, 20%, and in rare cases, up to 50% performance improvement if some parameters are set poorly. That might sound like impressive numbers, but individual query optimization routinely makes queries run several times faster, in some cases, ten or more times faster, and restructuring applications can improve overall system performance by hundreds of times!
Maybe not magic, but we can’t ignore config parameters, right?
It’s important to at least know what the most critical configuration parameters can do, and it doesn’t hurt to explore them all. However, believing that some magical “correct” values exist is dangerous. If that were the case, these parameters would already be set up correctly in the default configuration, and nobody would ever have to change the values.
In truth, there is no such thing as a “set it and forget it” configuration for PostgreSQL, because different users have different needs. If there were, it would be on the first page of the PostgreSQL manual, and companies would no longer have to pay top dollar to consultants. While some tried-and-true formulas can be used for the initial setup, parameters should be adjusted to work for a specific environment and customer needs, based on what kind of load they put on the server.
Furthermore, parameters ought to be re-evaluated any time we change the server configuration or hardware, or upgrade to a new version of PostgreSQL.
In short, nothing is set in stone.
In the remaining part of the article, let’s look at some important configuration parameters and the myths surrounding them.
Helping PostgreSQL play nicely with its neighbors
PostgreSQL is a resource-greedy system. It will use all available resources, and there is no way to tell PostgreSQL, “please, use just one CPU.” Some parameters can be used as loose guidelines for the query planner, but their influence is limited; for example, we can increase the cost of some operations or algorithms, which can have an indirect impact on resource usage. Such behavior may cause problems if you choose to place multiple PostgreSQL clusters on one machine and multiple databases on one cluster. Each of these instances would be unaware that it shares the hardware with others and would attempt to grab (almost) all resources available on the host.
As an example, correctly setting work_mem is a necessary task for any PostgreSQL workload. It’s common for DBAs to use online “parameter calculators”, not realizing that these tools assume one PostgreSQL instance per host with one database on the instance. This becomes a problem if you have more than one PostgreSQL instance per host. For example, if the host has 96GB of RAM with four instances of PostgreSQL running, we could assume that each instance can be limited to 24GB of RAM. That is fine for initial calculations, but it is important to remember that you can’t tell PostgreSQL, “Do not take more work memory, it’s not yours!”
Moreover, when you set up the first PostgreSQL instance on a host, it might indeed be the only instance, and all the formulas will work. But if later you decide to set up an additional instance on the same host, you most likely won’t remember that for the first instance, parameters were set as if it existed on that host solo. Because of that, it is unlikely that you would stop this first instance to reconfigure its parameters.
The same applies to development instances and perhaps even more to testing/staging instances. It’s important to remember that you can’t adequately test application performance on a database that resides in a multi-tenant environment. Placing multiple databases on one cluster (even for testing/staging) might also result in improper permission settings, because users and roles are created on the cluster level rather than the database level, as illustrated below.
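As a minimal illustration of that last point (the database and role names here are hypothetical): a role created while connected to one database is immediately visible from every other database on the same cluster, because roles are cluster-wide objects.

-- connected to database db1 (hypothetical names)
create role staging_app login;
-- connect to db2 on the same cluster: the role already exists there too
select rolname from pg_roles where rolname = 'staging_app';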
Memory, connections, etc.
Memory allocation parameters are the most mythical of all PostgreSQL configuration parameters. Many recommendations and formulas that can be found on the Internet were issued many years and many PostgreSQL versions ago, and many of them have never been revisited since.
Most conventional recommendations are either wrong or can be used only as a first approximation and tuned later. Don’t miss that last part. No matter what values you set for parameters, it will be important to watch performance and resource utilization and adjust if needed.
shared_buffers
The data stored in a PostgreSQL database has to be read from a hard disk drive into main memory before anybody can use it. The memory area allocated for this purpose is called shared memory buffers. This parameter sets the amount of memory the database server uses for shared memory buffers. The historical limit of 64 GB is mythical and goes back to older versions of Linux.
This is one of the few memory-related parameters which PostgreSQL uses at face value, meaning that it sets the hard limit for shared buffers. The only actual limitation to this value is that after subtracting this value from total RAM, there should be enough RAM left for work_mem (discussed later in this article).
Note: Changing this parameter requires an instance restart.
Recommendation: start from 25% of RAM and increase until the cache hit ratio is close to 90%.
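One way to watch that ratio is to query the cumulative statistics view pg_stat_database. A minimal sketch follows; the 90% target is the rule of thumb above, not a hard rule, and the 8GB figure in the ALTER SYSTEM call is purely illustrative, not a recommendation:

-- cache hit ratio per database, as a percentage
select datname,
       round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2) as cache_hit_ratio
from pg_stat_database
where blks_hit + blks_read > 0;

-- raise shared_buffers (takes effect only after an instance restart)
alter system set shared_buffers = '8GB';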
effective_cache_size
This parameter is advisory. It does not reserve any memory, and PostgreSQL uses it for query planning purposes only. It indicates how much memory is expected to be available for caching data, counting both PostgreSQL’s own shared buffers and the operating system’s file system cache. The way the query planner uses it is not straightforward, but the higher it is set, the higher the chances of the planner choosing an index scan rather than a sequential scan.
Recommendation: 75% of RAM is usually recommended but not required (and this number is not related to the shared_buffers value).
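Because the parameter is advisory, changing it does not require a restart. As a sketch, on the 96GB host from the earlier example (the 72GB value simply illustrates the 75% rule of thumb):

alter system set effective_cache_size = '72GB';
select pg_reload_conf();   -- reload the configuration; no restart is needed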
max_connections
This parameter sets the maximum number of connections the PostgreSQL instance can handle simultaneously. This is a hard limit: if the maximum number of connections is reached, new connection attempts will be rejected, and to applications trying to connect the system will appear non-responsive. No users except for a superuser will be allowed to connect. This situation will persist until the superuser kills some connections. Changing this parameter requires an instance restart. You might wonder why this limit is necessary, and if it can’t be changed on the fly, why not set it as high as possible so that we never run into these problems?
Too many connections are undesirable in any database because each connection takes up additional resources, even when idle. On the application level, this problem is addressed by setting up an application connection pool so that application users do not initiate new database connections but instead use one of the already opened connections from the available pool. If we set this value really high, the system may become slow and non-responsive because there won’t be enough resources for all connections. In addition, there are some internal limitations in the PostgreSQL kernel itself which make it undesirable to have more than several hundred connections at any given moment.
Now that we have described many of the problems that occur when there are too many connections, we should note that these horror stories are at least partially mythical as well. If there are enough hardware resources available on the host, even several thousand max_connections may work perfectly fine.
Recommendation: Start from several hundred, observe system behavior, and adjust accordingly.
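A quick way to observe system behavior here is to compare the limit against what pg_stat_activity currently shows. This sketch simply counts backends, including idle ones:

show max_connections;

select count(*) as current_connections,
       count(*) filter (where state = 'idle') as idle_connections
from pg_stat_activity;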
work_mem
The work_mem setting controls the amount of memory that can be used per query operation node and is generally used for sorting data and hash tables. This is also a hard limit, meaning that this is the maximum size of memory PostgreSQL will allocate to any query node.
Any sort or hash operation that requires more than this amount of memory will spill to temporary files on disk and will therefore take longer to complete.
A well-known formula suggests dividing 25% of the total system memory by max_connections. Therefore, a decrease in max_connections will allow for a higher value of work_mem. Note that in this case, like in many others, it is assumed that the host has only one database, so you have to be mindful when assigning the value for this parameter. Remember that PostgreSQL won’t perform any calculations to check whether your settings make sense. Everybody might be happy until the next session cannot obtain the necessary portion of RAM from the operating system and will receive an out-of-memory error.
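Purely as an illustration of the formula, using the 96GB host from earlier and assuming max_connections = 300 (both numbers are assumptions, not recommendations):

-- 25% of 96GB = 24GB set aside for work memory
-- 24GB / 300 connections is roughly 80MB per connection
alter system set work_mem = '80MB';
select pg_reload_conf();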
Fortunately, work_mem is one of the PostgreSQL parameters that can be set dynamically, for a specific session. Therefore, a possible solution could be to set work_mem dynamically from the application. Suppose it is known that a particular query will be sorting a large data set. In that case, it can be advantageous to increase the work_mem of that individual query (being careful not to over-allocate).
For example, if the session requires 200 MB for work_mem, you can run the following command:

set work_mem = '200MB';
This would set the session’s work_mem to a new value and allow subsequent queries within that session to utilize more memory for sorting or hashing. To return to the standard value, run the following:
reset work_mem;
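If the larger value is only needed for a single statement, another option is to scope it to a transaction with SET LOCAL, so it reverts automatically at commit or rollback:

begin;
set local work_mem = '200MB';
-- run the large sort or hash-heavy query here
commit;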
maintenance_work_mem
This parameter determines the maximum amount of memory used for maintenance operations like VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY, and data-loading operations (e.g., COPY). Such operations may increase the I/O on the database server while they run, so allocating more memory to them may lead to these operations finishing more quickly. There is a formula to calculate this value as well, but we will discuss this parameter in connection with the PostgreSQL VACUUM process.
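As with work_mem, maintenance_work_mem can be raised for a single session just before a heavy maintenance operation. In this sketch the table and index names are hypothetical, and the 2GB value is only an illustration:

set maintenance_work_mem = '2GB';
create index orders_created_at_idx on orders (created_at);   -- hypothetical table and column
reset maintenance_work_mem;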
Is that all of the memory parameters?
In this article, I have covered a few of the most important parameters surrounding PostgreSQL memory tuning. If you want to see all of the parameters that are available for PostgreSQL memory tuning, you can find them in the PostgreSQL documentation.
Was this article helpful? Would you like to keep exploring the myths and reality of PostgreSQL configuration parameters? Let me know in the comments!