Which is better, MongoDB or Postgres?

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com

I am very impressed with the new JSON features in Postgres 9.4. So can Postgres completely replace MongoDB? There are some grouping and aggregate functions that MongoDB has that Postgres does not have. More so, at the Mongo shell I have access to the entire Javascript language, whereas Postgres only gives me SQL plus some cool extensions. I recently found out about “WITH” statements in Postgres. From the point of view of SQL, this is the coolest thing ever. From the point of view of Javascript, this is laughably trivial — so you get to store a query in a variable? Is that a big deal?

If I wrote queries that needed both relational data and JSON in one query then I would say that PostGres is the clear winner, but my brain does not work that way. Perhaps it is just habit, but I think of relational queries as one thing and working with documents as something else. More so, I adopt very different work styles when working with relational data, compared to when I work with documents, so I have trouble imagining using them at the same time.

For now, I can still see a use for MongoDB. But if I was only allowed to use one technology for data persistence, I would pick PostGres.

Some interesting advice about slow queries in MongoDB:

1. Run your query with explain – to confirm that your queries are indeed indexed[1]. There have been a couple of times where we discovered that a specific branch of a complex query was not hitting index – so discovering that was useful.

2. One thought on slow reads with aggregation framework and 1M+ documents is that you are bottlenecked on memory. Even though all your queries may be indexed – you may be hitting disk too often (and doing a lot of random I/O) which is expensive. Check out MMS – and specifically the page faults graph – and you will get an idea about how many times a second you hit disk. For my specific application – response time was very important – so we tried to keep our page faults close to zero. Is increasing the RAM in your instance a viable option? If so this would definitely give you some breathing room.

3. You should try and verify if your response is slow because the query processing time itself is high, or if it’s because you have several concurrent queries competing for system resources (CPU / memory etc). If it’s the latter – the fix is easy – just add more machines to your replicaset (balance this with adding more RAM).

4. Check your lock %. My guess – from your brief description – this will not be an issue for you. If this is an issue – you __may__ need to shard your system – or do 5.

5. I think the most scalable solution would be to rethink your data pipeline. Why do you need to apply complex aggregation queries on 1M+ documents? Can you build a processing pipeline that takes in your data – and builds “prepared views” for the queries you expect out of your aggregation pipeline? In my application – we used stream processing pipeline that aggregated all the messages we got – in all the different ways we needed – and inserted them into different databases (to work around the DB level locking :( ) in mongo. Each collection had exactly ONE type of query that would be executed – this would be an extremely simple query and would be indexed. This pattern is common – even outside of mongo.