Measuring the Memory Overhead of a Postgres Connection
One fairly common complaint about postgres is that each connection uses too much memory. It is often made when comparing postgres' connection model to one where each connection is assigned a dedicated thread, instead of the current model where each connection has a dedicated process.
To be clear: This is a worthwhile discussion to have. And there are several important improvements we could make to reduce memory usage.
That said, I think one common cause of these concerns is that the easy ways to measure the memory usage of a postgres backend, like ps, are quite misleading. It is surprisingly hard to accurately measure the increase in memory usage caused by an additional connection.
In this post I’m mostly going to talk about running Postgres on Linux, since that is what I have the most experience with.
My bold claim is that, when measured accurately, a connection has an overhead of less than 2MiB (see the conclusion).
A first look
Just using common operating system tools makes the overhead look much bigger than it actually is. Especially when not utilizing huge_pages (which is not recommended), the memory usage for each process will look high.
Let’s first look at a freshly established connection, in a freshly started postgres cluster:
andres@awork3:~$ psql
postgres=# SELECT pg_backend_pid();
┌────────────────┐
│ pg_backend_pid │
├────────────────┤
│        2003213 │
└────────────────┘
(1 row)

andres@awork3:~/src/postgresql$ ps -q 2003213 -eo pid,rss
    PID   RSS
2003213 16944
Massive leaks!?! Luckily not.
What’s worse, the memory usage will appear to grow over time. To show this problem, I’ll use the pg_prewarm extension to load all pages of a table into postgres' buffer pool:
postgres=# SHOW shared_buffers;
┌────────────────┐
│ shared_buffers │
├────────────────┤
│ 16GB           │
└────────────────┘
(1 row)

postgres=# SELECT SUM(pg_prewarm(oid, 'buffer')) FROM pg_class WHERE relfilenode <> 0;
┌────────┐
│  sum   │
├────────┤
│ 383341 │
└────────┘

andres@awork3:~$ ps -q 2003213 -eo pid,rss
    PID   RSS
2003213 3169144
Now the backend's memory usage appears to be around 3GiB, even though the individual connection did not actually allocate much additional memory. The added memory usage is proportional to the amount of shared buffers touched:
postgres=# SELECT pg_size_pretty(SUM(pg_relation_size(oid))) FROM pg_class WHERE relfilenode <> 0;
┌────────────────┐
│ pg_size_pretty │
├────────────────┤
│ 2995 MB        │
└────────────────┘
(1 row)
And even worse than that, if yet another connection also uses those pages, it too will show as having huge memory usage:
postgres=# SELECT sum(abalance) FROM pgbench_accounts;
┌─────┐
│ sum │
├─────┤
│   0 │
└─────┘
(1 row)

andres@awork3:~/src/postgresql$ ps -q 3244960 -eo pid,rss
    PID   RSS
3244960 2700372
Of course postgres does not actually use 3 + 2.7 GiB of memory in this case. Instead, what is happening is that, with huge_pages=off, ps attributes the amount of shared memory, including the buffer pool, that a connection has utilized to each individual connection, obviously leading to vastly over-estimated memory usage.
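One way to see how misleading that is: summing RSS over all postgres processes counts the shared buffer pool once per backend that touched it, so the total can far exceed the machine's physical memory. A small sketch, assuming postgres runs as the postgres user:

# sum RSS (reported in kB) over all processes owned by user postgres;
# shared pages are counted once per process, so this wildly over-counts
ps -u postgres -o rss= | awk '{ total += $1 } END { printf "%.1f GiB\n", total / (1024 * 1024) }'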
Huge pages accidentally kind of save the day
Many CPU micro-architectures normally use a page size of 4KiB, but can optionally use larger page sizes, most commonly 2MiB.
Depending on the operating system, its configuration, and the type of application, such larger pages can be utilized transparently by the operating system, or explicitly by applications. See e.g. the Debian wiki page about huge pages for some details.
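On Linux, explicitly using huge pages typically means reserving them via the vm.nr_hugepages sysctl and telling postgres to use them. A minimal sketch; the page count of 8500 is just an illustrative assumption and has to be sized to cover shared_buffers plus postgres' other shared memory:

# reserve 2MiB huge pages at the OS level (8500 * 2MiB ≈ 16.6GiB,
# enough for the 16GB shared_buffers used here plus some slack)
sudo sysctl -w vm.nr_hugepages=8500

# postgresql.conf: fail to start if huge pages cannot be used,
# instead of silently falling back to normal pages
huge_pages = on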
Repeating the previous experiments with huge_pages=on makes them look a lot better. First, looking at a “new connection”:
andres@awork3:~$ ps -q 3245907 -eo pid,rss
    PID   RSS
3245907  7612
So, a new connection now appears to use only ~7MiB. This reduction in memory usage is caused by the page table needing less memory, as it now only needs to contain 1/512th of the previous entries, due to the larger page size.
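To make the 1/512th figure concrete, here is a back-of-the-envelope calculation, assuming 8-byte page table entries (typical for x86-64); mapping all 16GiB of shared buffers costs roughly:

# PTE memory needed to map 16GiB, assuming 8 bytes per entry
echo $(( (16 * 1024**3 / 4096)    * 8 / 1024 )) KiB   # 4KiB pages: 32768 KiB (32 MiB)
echo $(( (16 * 1024**3 / 2097152) * 8 / 1024 )) KiB   # 2MiB pages: 64 KiB

In practice the page table is smaller than this worst case, since entries are only created for the memory a process has actually touched.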
And more importantly, the test where a lot of the memory is accessed:
postgres=# SELECT SUM(pg_prewarm(oid, 'buffer')) FROM pg_class WHERE relfilenode <> 0;
…
postgres=# SELECT sum(abalance) FROM pgbench_accounts;
…
andres@awork3:~$ ps -q 3245907,3245974 -eo pid,rss
    PID   RSS
3245907 12260
3245974  8936
In contrast to above, these connections now appear to only use 12MiB and 9MiB respectively, where previously they used 3GiB and 2.7GiB. Quite the apparent change ;)
This is due to the way the use of larger pages is implemented in Linux, not because we used orders of magnitude less memory: huge pages just aren’t shown as part of the RSS column of ps/top.
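The hugepage memory is still accounted for system-wide, just not in per-process RSS; a quick way to see the explicit hugepage pool:

# system-wide hugepage pool; RSS in ps/top simply does not include these pages
grep ^Huge /proc/meminfo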
Getting less unreal
Since Linux 4.5 the /proc/$pid/status file shows the memory usage split into finer sub-categories:
VmRSS     size of memory portions. It contains the three following parts
          (VmRSS = RssAnon + RssFile + RssShmem)
RssAnon   size of resident anonymous memory
RssFile   size of resident file mappings
RssShmem  size of resident shmem memory (includes SysV shm, mapping of tmpfs and
          shared anonymous mappings)
Looking at these stats, first with huge_pages=off:
andres@awork3:~$ grep -E '^(Rss|HugetlbPages)' /proc/3247901/status
RssAnon:            2476 kB
RssFile:            5072 kB
RssShmem:           8520 kB
HugetlbPages:          0 kB

postgres=# SELECT SUM(pg_prewarm(oid, 'buffer')) FROM pg_class WHERE relfilenode <> 0;

andres@awork3:~$ ps -q 3247901 -eo pid,rss
    PID   RSS
3247901 3167164

andres@awork3:~$ grep -E '^(Rss|HugetlbPages)' /proc/3247901/status
RssAnon:            3148 kB
RssFile:            9212 kB
RssShmem:        3154804 kB
HugetlbPages:          0 kB
RssAnon is the amount of “anonymous” memory, i.e. memory not backed by a file or shared memory segment. RssFile covers memory mapped files, including the postgres binary itself. And lastly, RssShmem shows the accessed non-huge_pages shared memory. This nicely shows that most of the high memory usage that ps etc. report is due to accesses to shared memory. Repeating the experiment with huge_pages=on, the shared memory instead shows up under HugetlbPages:
andres@awork3:~$ grep -E '^(Rss|HugetlbPages)' /proc/3248101/status
RssAnon:            2476 kB
RssFile:            4664 kB
RssShmem:              0 kB
HugetlbPages:     778240 kB

postgres=# SELECT SUM(pg_prewarm(oid, 'buffer')) FROM pg_class WHERE relfilenode <> 0;

andres@awork3:~$ grep -E '^(Rss|HugetlbPages)' /proc/3248101/status
RssAnon:            3136 kB
RssFile:            8756 kB
RssShmem:              0 kB
HugetlbPages:    3846144 kB
Just adding up the non-shmem values still over-estimates memory usage. There are two main reasons:
First, it doesn’t actually make sense to include RssFile when measuring a postgres backend’s memory usage: for postgres that is overwhelmingly just the postgres binary and the shared libraries it uses (postgres does not mmap() files). As nearly all of that is shared between all processes in the system, it’s not a per-connection overhead.
Secondly, even just looking at RssAnon over-estimates memory usage. The reason is that ps measures the entire memory of the process, even though the majority of a new connection’s memory is shared between the user connection and the supervisor process. This is because Linux does not copy all memory when fork()ing a new process; instead it uses copy-on-write, only copying pages when they are modified.
There is no good way to accurately measure the memory usage of an individual forked process, but since version 4.14 the Linux kernel at least provides an approximation (commit with description) in a process' /proc/$pid/smaps_rollup file. Pss shows the “process’ proportional share of this mapping” across all of a process' mappings (search Linux's documentation page for smaps_rollup, which unfortunately does not have direct links). For memory shared between processes it divides the memory usage by the number of processes using a mapping:
postgres=# SELECT SUM(pg_prewarm(oid, 'buffer')) FROM pg_class WHERE relfilenode <> 0;
┌────────┐
│  sum   │
├────────┤
│ 383341 │
└────────┘
(1 row)

postgres=# SHOW huge_pages;
┌────────────┐
│ huge_pages │
├────────────┤
│ off        │
└────────────┘
(1 row)

andres@awork3:~$ grep ^Pss /proc/2004042/smaps_rollup
Pss:             3113967 kB
Pss_Anon:           2153 kB
Pss_File:           3128 kB
Pss_Shmem:       3108684 kB
Pss_Anon contains memory private to the process, Pss_File includes shared libraries etc. linked into the process, and Pss_Shmem (if not using huge_pages) the shared memory usage, divided across all the processes that have touched the corresponding pages.
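Since Pss divides shared pages across the processes using them, summing it over all postgres processes gives a far more realistic total than summing RSS. A sketch, again assuming the postgres user (reading smaps_rollup requires suitable permissions):

# sum proportional memory usage over all postgres processes
for p in $(pgrep -u postgres); do
    grep -s '^Pss:' /proc/$p/smaps_rollup
done | awk '{ total += $2 } END { print total " kB" }'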
What makes the proportional values not perfect is that the divisor depends on the number of connections to the server. Here I use pgbench (scale 1000, -S, -M prepared, -c 1024) to start a large number of connections:
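For reference, roughly the pgbench invocation described; the thread count and duration are my assumptions, not part of the original measurement (max_connections has to be raised accordingly):

pgbench -i -s 1000 postgres                          # initialize a scale-1000 dataset
pgbench -S -M prepared -c 1024 -j 8 -T 300 postgres  # read-only, 1024 connections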
postgres=# SELECT count(*) FROM pg_stat_activity;
┌───────┐
│ count │
├───────┤
│  1030 │
└───────┘
(1 row)

postgres=# SELECT pid FROM pg_stat_activity WHERE application_name = 'pgbench' ORDER BY random() LIMIT 1;
┌─────────┐
│   pid   │
├─────────┤
│ 3249913 │
└─────────┘
(1 row)

andres@awork3:~$ grep ^Pss /proc/3249913/smaps_rollup
Pss:                4055 kB
Pss_Anon:           1185 kB
Pss_File:              6 kB
Pss_Shmem:          2863 kB
And the same measurement with huge_pages=on, where the shared memory does not show up in the Pss values at all:

andres@awork3:~$ grep ^Pss /proc/2007379/smaps_rollup
Pss:                1179 kB
Pss_Anon:           1173 kB
Pss_File:              6 kB
Pss_Shmem:             0 kB
Pss values unfortunately do not account for resources not visible to the application. E.g. the size of the page table is not included. The page table size is also visible in the aforementioned /proc/$pid/status file. To my knowledge (but I am not certain), VmPTE (the page table size) is completely private for each process, while most other Vm* values, including the stack VmStk, are shared in a copy-on-write manner.
Using the above, the overhead due to the page table is, with huge_pages=off:
andres@awork3:~$ grep ^VmPTE /proc/2004042/status
VmPTE:              6480 kB
and with huge_pages=on:

VmPTE:               132 kB
Of course there are some additional small overheads in the kernel for each process, but they are small enough not to be worth tracking in these statistics.

Based on the measurements above we can approximate that a connection running a very simplistic read-only OLTP workload has an overhead of about 7.6MiB with huge_pages=off and of about 1.3MiB with huge_pages=on, computed by adding Pss_Anon to VmPTE.
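That approximation is easy to script for any single backend; a sketch using one of the pids from the examples above:

# approximate one backend's overhead as Pss_Anon + VmPTE (both in kB)
pid=3249913   # backend pid to measure
anon=$(awk '/^Pss_Anon:/ { print $2 }' /proc/$pid/smaps_rollup)
pte=$(awk '/^VmPTE:/ { print $2 }' /proc/$pid/status)
echo "approximate overhead: $(( anon + pte )) kB"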
Even when leaving room for some “invisible” overhead, for a larger amount of data in Postgres' buffer pool, etc., I think this backs my earlier claim that the overhead of a connection is less than 2MiB.