postgresql   30060

« earlier    

Scaling Postgres with Read Replicas and Using WAL to Counter Stale Reads | Hacker News
Instead of having an "observer" process that updates a table with the LSN you can just ask Postgres directly: The `pg_stat_replication` view has the last LSN replayed on each replica available:
Also, instead of updating the `users` table with the LSN of the commit - which creates extra write load - why not store it in the session cookie, then you can route based on that.
Another option is to enable synchronous replication for transactions that need to be visible on all replicas:
Since this can be enabled/disabled for each transaction it's really powerful.
hn  postgresql  postgres  scaling  high-availability 
yesterday by hellsten
PostgreSQL's Powerful New Join Type: LATERAL - Heap Blog
a LATERAL join is like a SQL foreach loop, in which PostgreSQL will iterate over each row in a result set and evaluate a subquery using that row as a parameter.
lateral  analytics  sql  postgres  postgresql 
yesterday by hellsten
Monitoring approach for Streaming Replication with Hot Standby in PostgreSQL 9.3. | EnterpriseDB
Calculating lags in Seconds. The following is SQL, which most people uses to find the lag in seconds:

SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;
Including the above into your repertoire can give you good monitoring for PostgreSQL.

I will in a future post include the script that can be used for monitoring the Hot Standby with PostgreSQL streaming replication.
posgres  postgresql  scaling  architecture  high-availability 
yesterday by hellsten
Scaling the GitLab database | GitLab
pgpool was the first solution we looked into, mostly because it seemed quite attractive based on all the features it offered. Some of the data from our tests can be found in this comment.

Ultimately we decided against using pgpool based on a number of factors. For example, pgpool does not support sticky connections. This is problematic when performing a write and (trying to) display the results right away. Imagine creating an issue and being redirected to the page, only to run into an HTTP 404 error because the server used for any read-only queries did not yet have the data. One way to work around this would be to use synchronous replication, but this brings many other problems to the table; problems we prefer to avoid.

Another problem is that pgpool's load balancing logic is decoupled from your application and operates by parsing SQL queries and sending them to the right server. Because this happens outside of your application you have very little control over which query runs where. This may actually be beneficial to some because you don't need additional application logic, but it also prevents you from adjusting the routing logic if necessary.

Configuring pgpool also proved quite difficult due to the sheer number of configuration options. Perhaps the final nail in the coffin was the feedback we got on pgpool from those having used it in the past. The feedback we received regarding pgpool was usually negative, though not very detailed in most cases. While most of the complaints appeared to be related to earlier versions of pgpool it still made us doubt if using it was the right choice.

The feedback combined with the issues described above ultimately led to us deciding against using pgpool and using pgbouncer instead. We performed a similar set of tests with pgbouncer and were very satisfied with it. It's fairly easy to configure (and doesn't have that much that needs configuring in the first place), relatively easy to ship, focuses only on connection pooling (and does it really well), and had very little (if any) noticeable overhead. Perhaps my only complaint would be that the pgbouncer website can be a little bit hard to navigate.

Using pgbouncer we were able to drop the number of active PostgreSQL connections from a few hundred to only 10-20 by using transaction pooling. We opted for using transaction pooling since Rails database connections are persistent. I
pgpool  database  performance  postgres  postgresql  architecture 
yesterday by hellsten
Scaling Postgres with Read Replicas & Using WAL to Counter Stale Reads — Brandur Leach
- Modern databases operating over low latency connections can keep replicas trailing their primary very closely, and probably spend most of their time less than a second out of date. Even systems using read replicas without any techniques for mitigating stale reads will produce correct results most of the time.

- Let’s take a look at a technique to make sure that stale reads never occur. We’ll use Postgres’s own understanding of its replication state and some in-application intelligence around connection management to accomplish it.

1. Postgres commits all changes to a WAL (write-ahead log) for durability reasons.
2. Changes are written to the WAL one entry at a time and each one is assigned a LSN (log sequence number).
3. Changes are batched in 16 MB WAL segments.
4. A Postgres database can dump a representation of its current state to a base backup which can be used to initialize replica.
5. From there, the replica stays in lockstep with its primary by consuming changes in its emitted WAL.
6. A base backup comes with a pointer to the current LSN so that when a replica starts to consume the WAL, it knows where to start.

- There are a few ways for a replica to consume WAL.

1. The first is “log shipping”: completed WAL segments (16 MB chunks of the WAL) are copied from primary to replicas and consumed as a single batch. => secondaries will be at least as behind as the current segment that’s still being written.

2. Another common configuration for consuming WAL is “streaming”, where WAL is emitted by the primary to replicas over an open connection. This has the advantage of secondaries being very current at the cost of some extra resource consumption.

- replicas consuming WAL with log shipping are also known as “warm standbys”
- while those using streaming are called “hot standbys”.

- By routing read operations only to replicas that are caught up enough to run them accurately, we can eliminate stale reads. This necessitates an easy way of measuring how far behind a replica is, and the WAL’s LSN is perfect for this use.

- For any action that will later affect reads, we touch the user’s min_lsn by setting it to the value of the primary’s pg_current_wal_lsn().
postgres  postgresql  scaling  high-availability  architecture  standby 
yesterday by hellsten
Database Patterns: Trees and Paths using WITH RECURSIVE in PostgreSQL
Trees and Paths using WITH RECURSIVE in PostgreSQL - Added March 29, 2017 at 01:26PM
db  postgresql  sql 
yesterday by xenocid
Mobile database client. Browse, search, and update all your PostgreSQL databases on the go with the DataGlass Mobile Database Client for PostgreSQL! DataGlass is your one stop for complete and flexible access to your PostgreSQL servers.
development  psql  postgresql  ios  appstore  database  sql 
yesterday by moalex
Knex.js - A SQL Query Builder for Javascript
Knex.js is a "batteries included" SQL query builder for Postgres, MSSQL, MySQL, MariaDB, SQLite3, and Oracle designed to be flexible, portable, and fun to use. It features both traditional node style callbacks as well as a promise interface for cleaner async flow control, a stream interface, full featured query and schema builders, transaction support (with savepoints), connection pooling and standardized responses between different query clients and dialects.
javascript  data  nodejs  sql  postgresql  mysql  sqlite 
2 days ago by davidgasperoni
Playing with Postgres and Kafka. |
Playing with Postgres and Kafka. - Added March 03, 2017 at 10:53AM
data-engineering  kafka  postgresql  read2of 
2 days ago by xenocid

« earlier    

related tags

analytics  announcement  api  app  appstore  architecture  art  aws  backup  best  blog  book  books  bugs  c++  cap  charts  cli  client  cloud  cloudsql  collaborate  collaboration  cool  crystalreports  csv  data-engineering  data  database  datascience  db  design  details  development  devops  distributed  django  editor  encryption  feathers  filter  fuzzy  gcp  gis  gotcha  gpu  ha  high-availability  hn  howto  idempotency  import  indexes  inspiration  intro  ios  javascript  json  kafka  kubernetes  lag  langc  lateral  learning  library  locking  low-level-programming  mac  management  maps  mariadb  mastering  mongodb  msftconnect  mysql  net  nodejs  nosql  opensource  optimization  oracle  osx  partitioning  performance  pg  pgpool  posgres  postgis  postgres  postgresweekly  programming  psql  python  query  rails  rdbms  rds  read2of  reference  rest  rethinkdb  scaling  search  security  sequelize  server  sql  sqlite  sqlserver  standby  strackoverflow  sysadmin  table  talks  testing  time-series  tips  todo  tool  toolkit  tools  trigram  tui  tutorial  tutorials  webdesign  webdev  webservice  window_functions 

Copy this bookmark: