PostgreSQL Architecture Document by Subham Dash 1710404181
PostgreSQL Architecture
Query Execution
PostgreSQL accepts queries from clients, processes them, and returns the results to the clients.
Background Processes
• WAL Writer
• BG Writer
• Checkpointer
• Stats Collector
• Logging Collector
• Autovacuum Launcher
Physical Files
• Archive Logs
• WAL Files
• Data Files
• Log Files
Logical Structure
• Tablespaces
• Databases
• Users
The background processes required for PostgreSQL operation are as follows.
Process               Role
logger                Writes error messages to the log file.
checkpointer          Writes all dirty buffers to the data files when a checkpoint occurs.
writer                Periodically writes dirty buffers to the data files.
wal writer            Writes the WAL buffer to the WAL file.
autovacuum launcher   Forks autovacuum workers when autovacuum is enabled. The autovacuum
                      daemon is responsible for vacuuming bloated tables on demand.
archiver              In archive mode, copies WAL files to the specified directory.
stats collector       Collects DBMS usage statistics, such as session execution information
                      (pg_stat_activity) and table usage statistics (pg_stat_all_tables).
As the figure shows, all of the utility processes, the user backends, and the postmaster daemon
are attached to the syslogger process, which logs information about their activities. The log
files are written under $PGDATA/pg_log with the .log extension. Logging detailed debug
information about the processes adds overhead on the server, so keep logging minimal and raise
the debug level only when required. The logging collector is a background process that captures
log messages sent to stderr and redirects them into log files.
Checkpointer: When a checkpoint occurs, all dirty pages must be written to disk. Increasing
checkpoint_segments (replaced by max_wal_size in PostgreSQL 9.5 and later) makes checkpoints less
frequent, which reduces I/O because pages that are modified repeatedly are flushed fewer times.
When a large amount of data is inserted, WAL segments fill faster and checkpoints are triggered
more often. Write-Ahead Logging (WAL) records a checkpoint in the transaction log at regular
intervals.
▪ The CHECKPOINT command forces an immediate checkpoint when the command is
issued, without waiting for a scheduled checkpoint. A checkpoint is a point in the
transaction log sequence at which all data files have been updated to reflect the information
in the log. All data files will be flushed to disk. If executed during recovery, the
CHECKPOINT command will force a restartpoint rather than writing a new checkpoint.
Only superusers can call CHECKPOINT. The command is not intended for use during
normal operation.
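The relationship between checkpoint_segments and checkpoint frequency can be illustrated with a toy model. This is only a sketch under stated assumptions: it counts checkpoints triggered by WAL volume alone, ignores timed checkpoints and manual CHECKPOINT commands, and the function name and the 120-segment workload are hypothetical values chosen for illustration.

```python
def count_checkpoints(wal_segments_written, checkpoint_segments):
    """Count checkpoints triggered by WAL volume alone in this toy model.

    Assumption: one checkpoint is requested each time `checkpoint_segments`
    WAL segments have been filled since the previous checkpoint. Timed
    checkpoints and manual CHECKPOINT commands are deliberately ignored.
    """
    if checkpoint_segments < 1:
        raise ValueError("checkpoint_segments must be at least 1")
    return wal_segments_written // checkpoint_segments


if __name__ == "__main__":
    # Hypothetical workload: a bulk load that fills 120 WAL segments.
    workload_segments = 120
    for setting in (3, 10, 30):
        checkpoints = count_checkpoints(workload_segments, setting)
        print(f"checkpoint_segments={setting}: {checkpoints} checkpoints")
```

In this model the same workload triggers 40 checkpoints with a setting of 3 and only 4 with a setting of 30, which is the trade-off described above: fewer checkpoints mean less flushing during normal operation.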
Stats Collector: PostgreSQL's statistics collector is a subsystem that supports collection and
reporting of information about server activity. Presently, the collector can count accesses to tables
and indexes in both disk-block and individual-row terms. It also tracks the total number of rows in
each table, and information about vacuum and analyze actions for each table. It can also count
calls to user-defined functions and the total time spent in each one.
▪ PostgreSQL also supports reporting of the exact command currently being executed by
other server processes. This facility is independent of the collector process.
▪ The statistics collector transmits the collected information to other PostgreSQL processes
through temporary files. These files are stored in the directory named by the
stats_temp_directory parameter, pg_stat_tmp by default. For better performance,
stats_temp_directory can be pointed at a RAM-based file system, decreasing physical I/O
requirements. When the server shuts down cleanly, a permanent copy of the statistics data
is stored in the pg_stat subdirectory, so that statistics can be retained across server restarts.
When recovery is performed at server start (e.g. after immediate shutdown, server crash,
and point-in-time recovery), all statistics counters are reset.
Archiver: The archiver process is optional and is off by default. Running the database in archive
mode means capturing the WAL data of each segment file once it is filled and saving that data
somewhere before the segment file is recycled for reuse.
▪ In archive mode, once a WAL segment is filled, the WAL writer creates a status file for
it under $PGDATA/pg_xlog/archive_status. The file is named after the segment with a
“.ready” suffix, that is, “segment-filename.ready”.
▪ The archiver process is triggered when it finds files in the “.ready” state. It takes the
segment file name from the .ready file and copies that segment from $PGDATA/pg_xlog
to the archive destination given by the ‘archive_command’ parameter (postgresql.conf).
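The archiving steps above can be sketched as a small simulation. This is not PostgreSQL's implementation: the function names, the temporary directory layout, and the sample segment name are hypothetical, and a plain file copy stands in for running archive_command. Renaming the flag to “.done” after a successful copy mirrors how PostgreSQL marks an archived segment, but treat the details as an illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def archive_ready_segments(wal_dir, archive_dir):
    """Copy every segment flagged '.ready' and mark it '.done'.

    Returns the names of the archived segments in sorted order.
    """
    wal_dir, archive_dir = Path(wal_dir), Path(archive_dir)
    status_dir = wal_dir / "archive_status"
    archived = []
    for flag in sorted(status_dir.glob("*.ready")):
        segment = flag.name[: -len(".ready")]
        # Stand-in for running archive_command on the segment file.
        shutil.copy2(wal_dir / segment, archive_dir / segment)
        # Renaming the flag records that the segment has been archived.
        flag.rename(status_dir / (segment + ".done"))
        archived.append(segment)
    return archived


def demo():
    """Build a throwaway directory layout and archive one segment."""
    with tempfile.TemporaryDirectory() as tmp:
        wal_dir = Path(tmp) / "pg_xlog"
        archive_dir = Path(tmp) / "archive"
        (wal_dir / "archive_status").mkdir(parents=True)
        archive_dir.mkdir()
        name = "000000010000000000000001"  # hypothetical segment name
        (wal_dir / name).write_bytes(b"wal data")
        (wal_dir / "archive_status" / (name + ".ready")).touch()
        archived = archive_ready_segments(wal_dir, archive_dir)
        copied = (archive_dir / name).exists()
        done = (wal_dir / "archive_status" / (name + ".done")).exists()
        return archived, copied, done


if __name__ == "__main__":
    print(demo())
```

Using a flag file per segment keeps the hand-off simple: the process that fills a segment only has to create a file, and the archiver only has to scan one directory to find pending work.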
Storage
▪ data_directory: Specifies the directory to use for data storage. This parameter can only be
set at server start.
▪ config_file: Specifies the main server configuration file (customarily called
postgresql.conf). This parameter can only be set on the postgres command line.
▪ hba_file: Specifies the configuration file for host-based authentication (customarily called
pg_hba.conf). This parameter can only be set at server start.
▪ ident_file: Specifies the configuration file for user name mapping
(customarily called pg_ident.conf). This parameter can only be set at server start.
▪ external_pid_file: Specifies the name of an additional process-ID (PID) file that the server
should create for use by server administration programs. This parameter can only be set at
server start.
▪ PG_LOG: Not a core PostgreSQL directory; it is the directory where RHEL installations
store the textual server log.
▪ PG_XLOG: Stores the write-ahead log (WAL) segment files, which record the changes
made by both committed and uncommitted transactions. Only a limited number of segment
files are kept, and the oldest ones are recycled (overwritten). If archiving is on, filled
segments are copied to the archive location before they are recycled.
▪ PG_CLOG: Contains the commit log files (transaction commit status), which are used
during crash recovery.
▪ PG_VERSION: A file containing the major version number of PostgreSQL.
▪ Base: Subdirectory containing per-database subdirectories
▪ Global: Subdirectory containing cluster-wide tables, such as pg_database.
▪ PG_MULTIXACT: Subdirectory containing multitransaction status data (used for shared
row locks)
▪ PG_SUBTRANS: Subdirectory containing subtransaction status data.
▪ PG_TBLSPC: Subdirectory containing symbolic links to tablespaces.
▪ PG_TWOPHASE: Subdirectory containing state files for prepared transactions.
▪ POSTMASTER.OPTS: A file recording the command-line options the postmaster was
last started with.
▪ POSTMASTER.PID: A lock file recording the current postmaster PID and shared
memory segment ID (not present after postmaster shutdown)
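As a rough illustration of the layout listed above, the following sketch inspects a data directory: it reads PG_VERSION, reads the postmaster PID from the first line of postmaster.pid, and reports which of the listed subdirectories are missing. The function names, the report keys, and the sample values in the demo are hypothetical, and the subdirectory names follow the older naming used in this document.

```python
import tempfile
from pathlib import Path

# Subdirectories named in the list above (older PostgreSQL naming).
EXPECTED_SUBDIRS = [
    "base", "global", "pg_xlog", "pg_clog",
    "pg_multixact", "pg_subtrans", "pg_tblspc", "pg_twophase",
]


def inspect_data_directory(data_dir):
    """Summarize a data directory without connecting to the server."""
    data_dir = Path(data_dir)
    version_file = data_dir / "PG_VERSION"
    pid_file = data_dir / "postmaster.pid"

    version = version_file.read_text().strip() if version_file.exists() else None

    pid = None
    if pid_file.exists():
        lines = pid_file.read_text().splitlines()
        if lines:
            # The first line of the lock file holds the postmaster PID.
            pid = int(lines[0].strip())

    missing = [name for name in EXPECTED_SUBDIRS if not (data_dir / name).is_dir()]
    return {"version": version, "postmaster_pid": pid, "missing_subdirs": missing}


def demo():
    """Create a minimal fake data directory and inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "PG_VERSION").write_text("9.4\n")  # hypothetical version
        (root / "postmaster.pid").write_text("12345\n/var/lib/pgsql/data\n")
        (root / "base").mkdir()
        (root / "global").mkdir()
        return inspect_data_directory(root)


if __name__ == "__main__":
    print(demo())
```

A missing postmaster.pid simply yields a PID of None, which matches the note that the file is not present after the postmaster shuts down.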