Improve PostgreSQL performance
By Udit Agarwal
Setting up the database correctly is essential for tuning PostgreSQL performance. As tables grow and query patterns change, your database requires ongoing maintenance and upgrades to keep PostgreSQL running optimally.
PostgreSQL optimization is straightforward, but the database admin must understand the essential factors that make it operate efficiently.
Here are the best practices for improving ingest performance in vanilla PostgreSQL:
1. Use indexes in moderation
The right indexes can speed up your queries, but every index must also be updated with each new row, which adds work at write time. Periodically review your list of indexes and evaluate whether each one's query benefits outweigh its storage and insert overhead. Every system is unique, so there are no hard-and-fast rules or “magic numbers” for how many indexes you should have.
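One way to audit this is to ask PostgreSQL which indexes have never been used. A minimal sketch using the built-in `pg_stat_user_indexes` statistics view:

```sql
-- List indexes that have never been used by an index scan.
-- Such indexes still slow down every INSERT and consume storage.
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       idx_scan     AS scans,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

An index with zero scans and a large size is a good candidate for dropping, though statistics reset on restart (or via `pg_stat_reset()`), so check over a representative period.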
2. Reconsider Foreign Key Constraints
Sometimes it’s necessary to create foreign keys from one table to another relational table. With a foreign key constraint, however, every INSERT requires a lookup in the referenced table to validate the constraint, which can degrade performance. Consider denormalizing your data where it makes sense, and avoid foreign keys that exist for “elegance” rather than sound engineering tradeoffs.
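A minimal sketch of the tradeoff, using hypothetical `sensors`/`readings` tables:

```sql
-- With a foreign key, every INSERT into readings must also
-- look up sensors to validate the constraint.
CREATE TABLE sensors (
    sensor_id bigint PRIMARY KEY,
    location  text
);

CREATE TABLE readings (
    time      timestamptz NOT NULL,
    sensor_id bigint REFERENCES sensors (sensor_id),  -- per-row lookup cost
    value     double precision
);

-- Denormalized alternative for ingest-heavy workloads: drop the
-- constraint and store the descriptive column inline, trading some
-- storage and consistency guarantees for insert speed.
CREATE TABLE readings_denorm (
    time     timestamptz NOT NULL,
    location text,
    value    double precision
);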
3. Avoid unnecessary UNIQUE keys
Developers are often trained to specify primary keys on every table. But in many use cases, such as monitoring or time-series applications, each event or sensor reading can simply be logged as a separate row, appended at write time to the end of the table (or, in TimescaleDB, the hypertable’s current chunk).
If a UNIQUE constraint is defined instead, every insertion must perform an extra index lookup to determine whether the row already exists. This adversely impacts INSERT speed.
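A minimal sketch of an append-only layout (table and column names are illustrative):

```sql
-- Time-series table with no PRIMARY KEY or UNIQUE constraint:
-- each reading is logged as-is, so inserts skip the uniqueness check.
CREATE TABLE sensor_events (
    time      timestamptz NOT NULL,
    sensor_id bigint      NOT NULL,
    value     double precision
);

-- A plain (non-unique) index still supports time-range queries
-- without forcing an existence lookup on every INSERT.
CREATE INDEX sensor_events_time_idx ON sensor_events (time DESC);
```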
4. Use separate disks for WAL and data
Writing the database’s write-ahead log (WAL) to one disk and its data files (tablespaces) to another can increase throughput when disk bandwidth becomes the bottleneck. This is an advanced optimization that is often unnecessary, but it is worth considering if a single disk can’t keep up.
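A sketch of how the split looks in practice (the mount paths are examples):

```sql
-- Place table data on its own disk via a tablespace.
CREATE TABLESPACE fast_data LOCATION '/mnt/disk1/pg_data';

CREATE TABLE metrics (
    time  timestamptz NOT NULL,
    value double precision
) TABLESPACE fast_data;

-- The WAL itself is not a tablespace: relocate it at cluster creation
-- with `initdb --waldir=/mnt/disk2/pg_wal`, or by moving/symlinking
-- the pg_wal directory while the server is stopped.
```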
5. Use performant disks
Developers sometimes deploy their database in environments with slower disks, whether because of poorly performing HDDs, remote SANs, or other configurations. When rows are inserted, the data must be durably written to the write-ahead log (WAL) before the transaction completes, so slow disks directly hurt insert performance. One major thing to keep under consideration is to check your disk IOPS.
6. Use parallel writes
Each INSERT or COPY command to TimescaleDB is executed as a single transaction and runs in a single-threaded fashion. To achieve higher ingest rates, execute multiple INSERT or COPY commands in parallel.
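A minimal sketch: split the input into parts and load each part on its own connection (file paths and the `metrics` table are hypothetical):

```sql
-- Each COPY runs in its own transaction on its own connection,
-- so separate sessions can load different slices concurrently.

-- session 1:
COPY metrics FROM '/data/metrics_part1.csv' WITH (FORMAT csv);

-- session 2 (separate connection, running at the same time):
COPY metrics FROM '/data/metrics_part2.csv' WITH (FORMAT csv);
```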
7. Insert rows in batches
To achieve higher ingest rates, insert many rows in each INSERT call, or use a bulk insert command like COPY or TimescaleDB’s parallel copy tool.
Avoid inserting data row by row; instead, aim for at least hundreds (or thousands) of rows per INSERT. This frees up the database to spend more time processing data by reducing time spent on connection management, transaction overhead, and SQL parsing.
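The difference can be sketched as follows (the `metrics` table and values are illustrative):

```sql
-- One multi-row INSERT amortizes parsing and transaction
-- overhead across many rows.
INSERT INTO metrics (time, value) VALUES
    ('2024-01-01 00:00:00+00', 1.0),
    ('2024-01-01 00:00:01+00', 1.5),
    ('2024-01-01 00:00:02+00', 2.0);

-- For bulk loads, COPY is faster still:
COPY metrics (time, value) FROM '/data/metrics.csv' WITH (FORMAT csv);
```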
8. Properly configure shared_buffers
The primary recommendation is to set shared_buffers to 25% of available RAM. If you install TimescaleDB via a method that runs `timescaledb-tune`, it should automatically configure shared_buffers to a value well suited to your hardware specs.
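For manual setups, the setting lives in postgresql.conf. A sketch for a hypothetical machine (adjust the figure to your own RAM):

```
# postgresql.conf — example for a machine with 32 GB of RAM
shared_buffers = 8GB        # ~25% of available RAM
```

A restart is required for changes to shared_buffers to take effect.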
9. Run Docker images on Linux hosts
If you run a TimescaleDB Docker container (which runs Linux) on top of another Linux operating system, you are in good shape: the container basically provides process isolation, and the overhead is minimal.
When the container is executed on a Mac or Windows machine instead, the OS virtualization, including for input and output, causes some performance degradation.
So if you plan to run on a Mac or Windows machine, it is recommended to install TimescaleDB directly rather than through a Docker image.
10. Write data in loose time order
When chunks are sized appropriately, the latest chunk(s) and their associated indexes are naturally maintained in memory, so new rows with recent timestamps are written to chunks and indexes that are already resident.
Inserting a row with a sufficiently older timestamp, by contrast, requires reading the pages for that older chunk (and its indexes) in from disk. This significantly increases latency and lowers insert throughput.