How to build 99.999% uptime payment systems
18 points by ohduran 10 hours ago | 11 comments
  • __mattya 9 hours ago |
    > Stripe gets 99.999% uptime on top of a MongoDB-like database

    Isn’t it actually MongoDB? They describe it as an “extension” [1] but it sounds like they added cluster management without changing the database itself.

    [1]: https://stripe.com/blog/how-stripes-document-databases-suppo...

    • paxys 8 hours ago |
      If you take open source MongoDB and add 10 lines of custom proprietary code, is it still MongoDB? What about a hundred, or a thousand, or a million? What if you change the interface? Ultimately you can give the modified service any name you want, and whether it is "actually MongoDB" or not is a subjective judgement.
    • jameskilton 8 hours ago |
      Stripe runs a ton of Mongo replication clusters and uses home-grown proxy services on top of Mongo that manage and control where data lives, so the services don't have to think about that side of things. I'm not sure what changes have been made to Mongo itself but for the most part it's standard Mongo 4.
  • louwrentius 9 hours ago |
    Maybe it’s me, but I have a hard time reading this article; it is so vague. It hints at concepts and ideas, yet nothing is really explained.
    • dm03514 8 hours ago |
      I saw one paragraph on transaction isolation levels hidden in the word salad
    • somehnguy 8 hours ago |
      Not just you, I don't have a clue what this article was trying to teach me. I was expecting an interesting read.
    • ninju 8 hours ago |
      It's a sales pitch (for his book) masquerading as a technical article
  • pistoleer 8 hours ago |
    Not technical or concrete enough (examples are missing!) for me to really understand what techniques the author is getting at. How would the author achieve all these exclusion mechanisms in a distributed system instead of letting the database engine handle that?
  • lopkeny12ko 8 hours ago |
    A lot of advice on how to achieve "high uptime" focuses on the technical bits. The technical bits are important, but the bigger risk is the people component. I've been in this game for decades and I've seen time and time again that all it takes is one bad hire who likes to roll out changes carelessly, bypassing any change management board, to break your uptime SLA for the entire year. You can fire or discipline that developer after the fact but at that point it is too late. See for example: this year's CrowdStrike outage.
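
    For a sense of scale, here is a rough sketch of the arithmetic behind "break your uptime SLA for the entire year". It assumes a 365-day year and an SLA that counts every second of downtime equally; the function name is made up for illustration. Under those assumptions, 99.999% leaves a budget of roughly 5.26 minutes per year, so a single careless rollout can plausibly consume all of it.

```python
def downtime_budget_seconds(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime allowed, in seconds, by an availability target.

    Assumption: the SLA is measured over `period_days` and every second of
    downtime counts equally (no maintenance windows or exclusions).
    """
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * 24 * 60 * 60


if __name__ == "__main__":
    # Compare three common availability targets over one year.
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_budget_seconds(target) / 60
        print(f"{target}% -> {minutes:.2f} minutes of downtime per year")
```

    Real SLAs often carve out planned maintenance or measure per month rather than per year, so treat the numbers as an upper bound on how forgiving the target is.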
  • tantalor 8 hours ago |
    The explanations given for uptime are topics about performance & correctness in databases.

    Is it just me, or does that have nothing to do with uptime?

  • codetiger 8 hours ago |
    I was very interested in finding anything close to helpful information, but all I hear is “Everything is in that book”.

    I am working on building a high-performance payment processing engine based on the ISO 20022 standard and am looking for such information. I would like to hear some feedback on my design documents and on the project overall.

    Development is in progress, and the plan for the first platform release lies somewhere in Q1 2025.

    1. Product Documentation - (https://openpayments.tech)

    2. Developer Portal - (https://portal.openpayments.tech)