Some very rough notes from MongoDB London 2013:

Session 1 - Performance

Keep indexes in memory

Data in memory if you can

Slow queries can be configured to appear in logs

Use SSDs

Growing documents is bad

Do an 'explain' on queries

Padding factor

The database is locked while a write is in progress

Sharding to scale writes

Optionally read from slaves (secondaries), but they may not have the latest writes yet

Write concern level configurable

Write importance can be set according to which nodes have acknowledged the write
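
A rough sketch of what a write concern level means, in plain Python with no driver involved. The function name and parameters are made up for illustration, and it assumes "majority" is counted over a simple member total; tag-based rules are left out.

```python
def write_satisfied(w, acked, voting_members):
    """Return True once enough members have acknowledged a write.

    w              -- write concern: an integer count, or the string "majority"
    acked          -- number of members that have acknowledged so far
    voting_members -- total members counted for the majority (simplified)
    """
    if w == "majority":
        # More than half of the members must have the write.
        return acked >= voting_members // 2 + 1
    # Numeric w: wait for that many acknowledgements (0 means do not wait).
    return acked >= w
```

With three members, w=1 is satisfied by the primary alone, while "majority" needs a second node to acknowledge.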

You can define your own _id structure to help querying

Use short field names - use an abstraction layer
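
Field names are stored in every document, so short names save space; the abstraction layer keeps the code readable. A minimal sketch of such a layer in Python, where the field names and the mapping are hypothetical examples rather than anything shown in the session:

```python
# Hypothetical mapping from readable names to the short names actually stored.
FIELD_MAP = {"username": "u", "created_at": "c", "page_views": "v"}
REVERSE_MAP = {short: long_name for long_name, short in FIELD_MAP.items()}


def to_stored(doc):
    """Translate readable field names to short stored names (top level only)."""
    return {FIELD_MAP.get(key, key): value for key, value in doc.items()}


def from_stored(doc):
    """Translate short stored names back to readable names (top level only)."""
    return {REVERSE_MAP.get(key, key): value for key, value in doc.items()}
```

Unknown keys such as _id pass through unchanged. Nested documents would need a recursive version.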

Covered indexes
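
A query is covered when every field it filters on and every field it returns is in the index, so it can be answered from the index alone. The check below is a simplified illustration in Python, not how the server decides: the function name is made up, and it assumes an inclusion-style projection and ignores multikey and nested-field cases.

```python
def is_covered(index_fields, query_fields, projection):
    """Simplified test for whether an index can cover a query.

    index_fields -- field names in the index
    query_fields -- field names used in the query filter
    projection   -- inclusion-style projection, e.g. {"user": 1, "_id": 0}
    """
    indexed = set(index_fields)
    returned = {field for field, include in projection.items() if include}
    # _id comes back by default unless the projection switches it off.
    if projection.get("_id", 1):
        returned.add("_id")
    return set(query_fields) <= indexed and returned <= indexed
```

Note the common gotcha: forgetting to exclude _id in the projection stops the query being covered unless _id is part of the index.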

Dropping collections is faster than removing


Run your own benchmark - benchrun


One document per day, pre-allocated, then updated with the $inc operator
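
A sketch of the per-day document idea in Python, building only the documents and update specs without talking to a server. The "page:YYYYMMDD" _id layout and the hourly counters are assumptions for illustration. Because every counter exists up front, the $inc updates never grow the document.

```python
from datetime import datetime


def preallocated_day_doc(page, day):
    """Build the full document for one page and one day, with zeroed counters."""
    return {
        "_id": f"{page}:{day:%Y%m%d}",               # custom _id, easy to query by
        "hours": {str(h): 0 for h in range(24)},     # one slot per hour
        "total": 0,
    }


def hit_update(page, when):
    """Return (filter, update) describing one hit at the given time."""
    return (
        {"_id": f"{page}:{when:%Y%m%d}"},
        {"$inc": {f"hours.{when.hour}": 1, "total": 1}},
    )
```

The returned pair is the shape a driver's update call expects: a filter document and an update document.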

Session 2 - Backups

bsondump converts bson to json

Use journalling

Disk backups faster

TTL indexes and capped collections


An uneven number of nodes is advised
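
The reason for the odd number is the majority needed to elect a primary. A small Python sketch of the arithmetic (function names are made up for illustration):

```python
def majority(voting_nodes):
    """Votes needed to elect a primary: more than half of the voting nodes."""
    return voting_nodes // 2 + 1


def tolerated_failures(voting_nodes):
    """How many voting nodes can be lost while a majority is still reachable."""
    return voting_nodes - majority(voting_nodes)


for n in (3, 4, 5):
    print(n, "nodes: majority", majority(n), "tolerates", tolerated_failures(n))
```

Going from 3 to 4 nodes does not let you lose any more nodes; going to 5 does.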

There's a mesh of heartbeats between the nodes

An arbiter node only exists for voting - it stores no data

You can have hidden nodes, for "backup" purposes only

You can give a node a slaveDelay so the replication is delayed

Servers can be tagged, e.g. { datacenter: "new york" }
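
Roughly what a replica set configuration using these options could look like, written as a Python dict. The set name, hostnames and the second datacenter tag are hypothetical; the member fields (priority, hidden, slaveDelay, arbiterOnly, tags) are the option names as I understand them for MongoDB of this era. Hidden and delayed members are given priority 0 so they can never become primary.

```python
# Hypothetical five-member replica set: names and hosts are placeholders.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.net:27017",
         "tags": {"datacenter": "new york"}},
        {"_id": 1, "host": "db2.example.net:27017",
         "tags": {"datacenter": "new york"}},
        # Hidden node for backups: clients never read from it.
        {"_id": 2, "host": "db3.example.net:27017",
         "priority": 0, "hidden": True, "tags": {"datacenter": "backup"}},
        # Delayed node: applies operations one hour (3600 s) behind.
        {"_id": 3, "host": "db4.example.net:27017",
         "priority": 0, "hidden": True, "slaveDelay": 3600},
        # Arbiter: votes in elections but stores no data.
        {"_id": 4, "host": "arb.example.net:27017", "arbiterOnly": True},
    ],
}


def data_bearing(cfg):
    """Members that hold a copy of the data (everything except arbiters)."""
    return [m for m in cfg["members"] if not m.get("arbiterOnly")]


def visible_to_clients(cfg):
    """Data-bearing members that are not hidden."""
    return [m for m in data_bearing(cfg) if not m.get("hidden")]
```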

There are 5 read preference modes
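
The five modes are primary, primaryPreferred, secondary, secondaryPreferred and nearest. A simplified Python sketch of which members each mode allows a read to go to; the function is made up for illustration and ignores tags and latency, which real drivers also take into account.

```python
READ_PREFERENCE_MODES = [
    "primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest",
]


def candidates(mode, primary, secondaries):
    """Members eligible to serve a read (simplified: no tags, no latency).

    primary     -- name of the primary, or None if there is none right now
    secondaries -- list of available secondary names
    """
    primary_list = [primary] if primary else []
    if mode == "primary":
        return primary_list
    if mode == "primaryPreferred":
        return primary_list or list(secondaries)
    if mode == "secondary":
        return list(secondaries)
    if mode == "secondaryPreferred":
        return list(secondaries) or primary_list
    if mode == "nearest":
        return primary_list + list(secondaries)
    raise ValueError(f"unknown read preference mode: {mode}")
```

With no primary available, "primary" returns nothing while "primaryPreferred" falls back to the secondaries.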

You can test all this on a single machine

Failure points:

  • Power
  • Network
  • Data Center (5 nodes safest: 2 (primary) + 2 (primary) + 1 (backup DC))
  • Multi-node failure can occur e.g. 2 out of 3 fail

When only one node is left, the whole cluster becomes read-only

You can disable indexing if you want to, e.g. on a backup node that isn't ever queried

OpenStreetMap data contains lots of Points Of Interest, e.g. pubs

MongoDB can be used with Hadoop

There's a mongo-storm project