A rewrite of a stateful application written in Python with Postgres would be more illustrative of how you're solving the same problems but better. Do BEAM applications not use an actual database? How is crash tolerance guaranteed? In a typical application I'd write, crash tolerance would be handled by the DB. So would transactionality. Without it, one would have to persist each message to disk and be forced to make every action idempotent. The former sounds like a lot of performance overhead, the latter like a lot of programming effort overhead. I assume these problems are solved, but the article doesn't demonstrate the solutions.
You would have a process handling the calls to Postgres.
That process holds the database connection as its local state and receives messages that it translates into SQL queries. Two failure scenarios are possible:
1) The query is invalid (you are trying to insert a row with a missing foreign key, or the wrong data type). In that case, you send the error back to the caller.
2) There is a network problem between your application and the database (might be temporary).
You just let the process crash (local state is lost), the supervisor restarts it, the restarted process tries to connect back to the database (new local state). If it still fails it will crash again and the supervisor might decide to notify other parts of the application of the problem. If the network issue was temporary, the restart succeeds.
Before crashing, the process notifies the caller that there was a problem and that it should retry.
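A minimal sketch of the two scenarios above, not a real driver integration. `FakeConn` and `DbWorker` are hypothetical names, and the "network failure" is simulated by a magic query string. The only real API relied on is `GenServer` and `Supervisor`: returning `{:stop, reason, reply, state}` from `handle_call/3` replies to the caller first and then terminates the process, at which point the supervisor restarts it with a fresh connection.

```elixir
defmodule FakeConn do
  # Hypothetical stand-in for a real database driver.
  def connect, do: {:ok, make_ref()}
  def query(_conn, "BAD SQL"), do: {:error, :invalid_query}
  def query(_conn, "NETWORK DOWN"), do: {:error, :network}
  def query(_conn, sql), do: {:ok, "result of " <> sql}
end

defmodule DbWorker do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  def query(sql), do: GenServer.call(__MODULE__, {:query, sql})

  @impl true
  def init(:ok) do
    # The local state is just the connection; a restart builds a new one.
    {:ok, conn} = FakeConn.connect()
    {:ok, conn}
  end

  @impl true
  def handle_call({:query, sql}, _from, conn) do
    case FakeConn.query(conn, sql) do
      {:ok, rows} ->
        {:reply, {:ok, rows}, conn}

      # Scenario 1: the query itself is wrong, so report it and keep running.
      {:error, :invalid_query} = error ->
        {:reply, error, conn}

      # Scenario 2: tell the caller to retry, then crash and let the
      # supervisor restart this process.
      {:error, :network} ->
        {:stop, :connection_lost, {:error, :retry}, conn}
    end
  end
end

{:ok, _sup} = Supervisor.start_link([DbWorker], strategy: :one_for_one)

IO.inspect(DbWorker.query("SELECT 1"))
IO.inspect(DbWorker.query("BAD SQL"))
IO.inspect(DbWorker.query("NETWORK DOWN"))

# The restart is asynchronous, so give the supervisor a moment.
Process.sleep(100)
IO.inspect(DbWorker.query("SELECT 1"))
```

The restarted worker has a different pid and a different connection, which is the "new local state" mentioned above. A caller that retries immediately can still hit the short window before the restart completes, so a real caller would need its own retry or backoff.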
Now, for the caller. You could start a transient process in a dynamic supervisor for every query. That would handle the retry mechanism. The "querier process" would quit only on success and send the result back as a message. When receiving an error, it would crash and then be restarted by the supervisor for the retry.
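One way the "querier process" idea could look, as a rough sketch under stated assumptions. `flaky_query` is a hypothetical stand-in for a query that fails twice with a temporary error and then succeeds; the attempt counter lives in an `Agent` only because the task loses its own state on every restart. The task is started with `restart: :transient`, so the dynamic supervisor restarts it after a crash but leaves it alone once it exits normally.

```elixir
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Counts attempts outside the task, since a restarted task starts from scratch.
{:ok, attempts} = Agent.start_link(fn -> 0 end)
caller = self()

# Hypothetical query: raises on the first two attempts, succeeds on the third.
flaky_query = fn ->
  n = Agent.get_and_update(attempts, fn n -> {n + 1, n + 1} end)
  if n < 3, do: raise("temporary network error"), else: {:ok, "row data"}
end

# The querier only reaches `send/2` on success; on failure it crashes
# and the supervisor runs it again.
querier = fn ->
  result = flaky_query.()
  send(caller, {:query_result, result})
end

spec = Supervisor.child_spec({Task, querier}, restart: :transient)
{:ok, _pid} = DynamicSupervisor.start_child(sup, spec)

result =
  receive do
    {:query_result, result} -> result
  after
    1_000 -> :gave_up
  end

IO.inspect(result)
```

Note that the supervisor's default restart intensity (3 restarts in 5 seconds) caps how many retries this gives you; once it is exceeded the supervisor itself shuts down, so anything beyond a few quick retries would need tuned `max_restarts` or an explicit backoff.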
There are plenty of other solutions, and in Elixir you have "ecto" that handles all of this for you. "ecto" is not an ORM, but rather a data-mapper: https://github.com/elixir-ecto/ecto
> Do BEAM applications not use an actual database? How is crash tolerance guaranteed? In a typical application I'd write, crash tolerance would be handled by the DB. So would transactionality.
OTP includes Mnesia, which is a distributed, optionally transactional database (mostly for key-value data); it's not the easiest thing to use, but it's there. You can also connect out to an external database; there's no requirement to stay within BEAM.
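For a feel of what the transactional part looks like, here is a small sketch using an in-memory Mnesia table. The `:balances` table and the `transfer` function are made up for illustration; the Mnesia calls themselves (`create_table/2`, `transaction/1`, `read/1`, `write/1`, `abort/1`) are the standard ones. It assumes it is run as a plain script; inside a Mix project you would also need `:mnesia` in `extra_applications`.

```elixir
:ok = :mnesia.start()

# With no other options this is a RAM-only table on the local node.
{:atomic, :ok} = :mnesia.create_table(:balances, attributes: [:account, :amount])

{:atomic, :ok} =
  :mnesia.transaction(fn ->
    :mnesia.write({:balances, :alice, 100})
    :mnesia.write({:balances, :bob, 0})
  end)

# Either both writes happen or neither does.
transfer = fn from, to, amount ->
  :mnesia.transaction(fn ->
    [{:balances, ^from, from_balance}] = :mnesia.read({:balances, from})
    [{:balances, ^to, to_balance}] = :mnesia.read({:balances, to})

    if from_balance < amount, do: :mnesia.abort(:insufficient_funds)

    :mnesia.write({:balances, from, from_balance - amount})
    :mnesia.write({:balances, to, to_balance + amount})
  end)
end

IO.inspect(transfer.(:alice, :bob, 30))
IO.inspect(transfer.(:alice, :bob, 500))
```

The second transfer aborts and leaves both balances untouched. Since this table lives only in RAM, it does not survive a node restart; durability needs disc copies or replication to other nodes.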
If you want database changes to be persisted to disk, you have to persist them. If you want to wait to show success until the changes have persisted, you have to wait. I don't see how the runtime you use changes that, so I'm not really sure I understand your question? You don't generally persist the process mailboxes; if a process or node crashes, its mailbox is lost.
In a distributed system you rapidly run into Two Generals questions, which are always challenging to address. If I send you a message, and I receive a reply, I know you received it. But if I send you a message and don't receive a reply, I don't know what happened; maybe you never got it, maybe you received it and crashed, maybe you replied but I never got it, maybe you replied but I crashed or timed out and moved on. Again, that's the case regardless of runtime. It's hard to find systems with 100% uptime on all individual parts, so you have to set a reasonable timeout on communication, and you have to deal with picking up the pieces when that happens.
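To make the timeout case concrete, here is a small sketch. `SlowServer` and `Client` are hypothetical; the real behaviour being shown is that `GenServer.call/3` exits the caller with `{:timeout, _}` when no reply arrives in time, and that at that point the caller cannot tell whether the work was done.

```elixir
defmodule SlowServer do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:work, ms}, _from, state) do
    # Simulate work that takes `ms` milliseconds.
    Process.sleep(ms)
    {:reply, :done, state}
  end
end

defmodule Client do
  # Returns {:ok, reply}, or {:error, :unknown_outcome} if no reply arrived
  # within `timeout` ms. The server may still finish the work afterwards.
  def request(ms, timeout) do
    try do
      {:ok, GenServer.call(SlowServer, {:work, ms}, timeout)}
    catch
      :exit, {:timeout, _} -> {:error, :unknown_outcome}
    end
  end
end

{:ok, _pid} = SlowServer.start_link([])

IO.inspect(Client.request(10, 500))
IO.inspect(Client.request(300, 50))
```

In the second call the server still completes the work; only the caller has given up. Deciding what to do next (retry, ask the user, reconcile later) is the application's job, not the runtime's.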
> I assume these problems are solved, but the article doesn't demonstrate the solutions.
There isn't really a general solution to the "systems are hard" problem. You have to pick what's appropriate for your system, and many systems will need different solutions for different parts. As an example from my time at WhatsApp: the table indicating which process held the TCP chat connection for a user was never persisted to disk; otoh (towards the end of my time) text messages would not be acknowledged to the client until they were either acknowledged by the destination client or in memory or on disk on multiple servers; the receiving client was responsible for deduplicating messages in cases where the sender did not receive an ack and resent, or when one of the redundant servers was offline when the message was delivered and it delivered it again later. Many things less critical than messages were acknowledged when accepted, without waiting for confirmed persistence. Many user actions would not be automatically retried on a timeout or other failure, letting the user decide what to do.
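The receiver-side deduplication mentioned above can be sketched in a few lines. This is only an illustration of the general idea (remember the ids you have already accepted), not how WhatsApp implemented it; the `Inbox` module and the `id` field on messages are assumptions.

```elixir
defmodule Inbox do
  defstruct [:seen, :messages]

  def new, do: %Inbox{seen: MapSet.new(), messages: []}

  # Accepts a message once; a redelivery with the same id is reported as a
  # duplicate and leaves the inbox unchanged.
  def deliver(%Inbox{} = inbox, %{id: id} = msg) do
    if MapSet.member?(inbox.seen, id) do
      {:duplicate, inbox}
    else
      updated = %{inbox | seen: MapSet.put(inbox.seen, id), messages: [msg | inbox.messages]}
      {:new, updated}
    end
  end
end

inbox = Inbox.new()
{:new, inbox} = Inbox.deliver(inbox, %{id: "m1", text: "hello"})

# The sender never saw the ack and resends the same message.
{status, inbox} = Inbox.deliver(inbox, %{id: "m1", text: "hello"})
IO.inspect({status, length(inbox.messages)})
```

With deduplication on the receiving side, the sender is free to resend whenever it is unsure, which turns "at least once" delivery into something the user experiences as "exactly once".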
I guess maybe the question is why use BEAM if it also doesn't solve the general "systems are hard" problem? IMHO, the reason to use BEAM is because it helps you structure your system around easy-to-reason-about parts. You've got to do some work to get messages into the right mailboxes, but the process working on a mailbox usually reads a message, does the work for the message, sends a reply and then gets to the next message in its mailbox. Each individual process can be simple and self-contained. Explicit locking can (hopefully) be avoided by ensuring only a single process is responsible for some piece of state, and that accessing that state is done by sending the responsible process a message. BEAM takes care of locking around the mailbox, so you don't need to worry about it.
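A small sketch of the "one process owns the state" idea. `Counter` is a made-up example; the point is that 100 concurrent callers update the same value without any explicit lock, because the owning process handles one message at a time.

```elixir
defmodule Counter do
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.call(pid, :increment)
  def value(pid), do: GenServer.call(pid, :value)

  @impl true
  def init(initial), do: {:ok, initial}

  # Each call is handled to completion before the next message is read,
  # so read-modify-write on the state cannot interleave.
  @impl true
  def handle_call(:increment, _from, n), do: {:reply, n + 1, n + 1}
  def handle_call(:value, _from, n), do: {:reply, n, n}
end

{:ok, counter} = Counter.start_link(0)

# 100 concurrent callers, no locks in user code.
tasks = for _ <- 1..100, do: Task.async(fn -> Counter.increment(counter) end)
Task.await_many(tasks)

IO.inspect(Counter.value(counter))
```

The trade-off is that the owning process is a serialization point: if it becomes a bottleneck, the usual answer is to split the state across more processes rather than to add locks.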
When I say crash tolerance, I mean the entire system going down. Given the emphasis on async BEAM processes, which all work in memory, I find it hard to understand why they're more reliable than the "standard" approaches of SQL DBs or crash-tolerant queues like Kafka.
Take this example from the article:
    def handle_call({:process, order}, _from, state) do
      customer = Customers.fetch!(order.customer_id)
      charge = PaymentGateway.charge!(customer, order.total)
      Notifications.send_confirmation!(customer, charge)
      {:reply, :ok, state}
    end

I'd assume we want PaymentGateway to commit to a DB. But there's no transactionality with notifications, hence notifications can be lost if the entire runtime goes down. For an article trying to "sell" BEAM to me, I just don't see the value.

> I guess maybe the question is why use BEAM if it also doesn't solve the general "systems are hard" problem?
I interpreted the tone of the article to mean it does solve all these problems, which resulted in my general confusion as to the actual advantages. I think this whole actor business somewhat reminds me of the Smalltalk people saying it's all about message passing, but I just don't understand the difference between passing a message to an object and doing obj.function(message). At least for BEAM the whole supervisor tree seems neat, but other than that, it sounds like goroutines with channels, or just a queue in Python.