Antares functions as a middle-tier application, which we label A. It receives announcements from upstream agents (labeled 𝛂) and uses the services of downstream agents (labeled 𝜷). But it would be a mistake to assume that operationally it functions exactly like a typical middle-tier web server of 2017.

Let's first look at what scenarios Antares supports, and how this differs from a conventional application.

Timeouts

In the first scenario, illustrated above, an incoming request from 𝛂 causes A to make a downstream request to 𝜷, but 𝜷 does not respond in a timely fashion. You might wonder who decides what counts as timely: it is Antares itself. While remote agents like 𝜷 and the software they use may have default timeouts, it is ultimately up to Antares how long it waits and what it does if it does not hear back. So Antares can set timeouts per request, and that's the first difference to call out.
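A minimal sketch of a per-request timeout decided by the caller, written with plain promises and timers rather than the Antares or RxJS API. The names `withTimeout` and `callBeta` are hypothetical and exist only for this illustration.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
// The caller (A), not the downstream agent, picks the value of `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`no response within ${ms} ms`)),
      ms
    );
    work.then(
      (value) => {
        clearTimeout(timer); // answered in time: cancel the timeout
        resolve(value);
      },
      (err) => {
        clearTimeout(timer); // failed in time: pass the original error on
        reject(err);
      }
    );
  });
}

// Stand-in for a downstream agent 𝜷 that answers after `delayMs` milliseconds.
function callBeta(delayMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve("ok"), delayMs));
}
```

With these definitions, `withTimeout(callBeta(5), 200)` resolves with the reply from 𝜷, while `withTimeout(callBeta(300), 20)` rejects, leaving A free to decide what to do next.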

Retries

Perhaps you know that your downstream agent 𝜷 occasionally fails to respond within the allotted time. But, being a good citizen, it also handles duplicate requests gracefully, in other words idempotently, which means it does not process them twice. It is often necessary to re-send, and so with a small addition to your code, Antares downstream calls (which we call renders) can be set to retry automatically. This may make more sense than simply returning an error back upstream, if upstream's only option is to retry anyway. So, in figure A), we see an operation to 𝜷 succeeding on the third and final retry, and then 𝛂 gets its response, none the wiser about the problem. For more: RxJS retries.
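The retry behavior can be sketched without any library, assuming the downstream call is idempotent as described. `withRetries` and `flakyBeta` are hypothetical names for this illustration, not part of Antares or RxJS.

```typescript
// Hypothetical helper: calls `attempt` once, then up to `maxRetries` more times on failure.
async function withRetries<T>(
  attempt: () => Promise<T>,
  maxRetries: number
): Promise<T> {
  let lastError: unknown;
  for (let tries = 0; tries <= maxRetries; tries++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // remember the failure and loop to try again
    }
  }
  throw lastError; // every attempt failed: surface the last error upstream
}

// Stand-in for a flaky 𝜷 that fails a fixed number of times before succeeding.
function flakyBeta(failuresBeforeSuccess: number): () => Promise<string> {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls <= failuresBeforeSuccess) {
      throw new Error(`attempt ${calls} timed out`);
    }
    return `saved on attempt ${calls}`;
  };
}
```

Here `withRetries(flakyBeta(3), 3)` resolves on the third retry (the fourth call overall), so the caller never sees the three failures. With `flakyBeta(5)` and the same retry budget, the last error is passed upstream instead.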

Optimistic Responses

Here is the most controversial point. If A were going to retry anyway (see Retries), could A not give 𝛂 its reply, an acknowledgment, immediately, before it even reaches out to 𝜷? The implication is: "Hey, I got it, 𝛂. I'll take care of it, and retry as many times as allowed. I'll let you know if it doesn't eventually succeed, though." See the topic of Compensating Transactions under Sagas to understand the best way to communicate a failure after you've given an optimistic response. It usually involves delivering the corrected, up-to-date information.

Antares is about facts: a fact is a record that something has occurred. If it is a fact that you received a communication from 𝛂, the correct operational behavior for you may be to send an acknowledgment right away, then try to persist to 𝜷. That is to say, while 𝜷 is important, so is a prompt response to 𝛂! If 𝛂 is your customer and 𝜷 is your database, and you have a pretty good success rate at getting things from A to 𝜷, being able to give an Optimistic Response may be important to you. This can extend all the way up the stack to the front end, with the concept we call Optimistic UI.
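As a rough illustration of the acknowledge-first idea, the sketch below returns an acknowledgment to 𝛂 before the write to 𝜷 has finished and sends a follow-up message if that write fails. The function `handleOptimistically` and its parameters are assumptions made for this example; they are not the Antares API.

```typescript
type Ack = { status: "accepted"; id: number };

// Hypothetical handler for A: acknowledge 𝛂 first, persist to 𝜷 afterwards.
function handleOptimistically(
  request: { id: number; body: string },
  persistToBeta: (body: string) => Promise<void>,
  notifyAlpha: (message: string) => void
): Ack {
  // Start the downstream write but do not wait for it.
  persistToBeta(request.body).catch((err: unknown) => {
    // Compensating message: tell 𝛂 the earlier acknowledgment did not hold.
    const reason = err instanceof Error ? err.message : String(err);
    notifyAlpha(`request ${request.id} failed after acknowledgment: ${reason}`);
  });
  // The acknowledgment is returned before 𝜷 has answered.
  return { status: "accepted", id: request.id };
}
```

In a real system the compensating message would usually carry the corrected, up-to-date state rather than a bare error string.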

In any event, the choice of prioritizing customer experience vs. persistence is one that probably should be considered with specific cases in mind, and with all the stakeholders' input. Antares is of the opinion that an Optimistic response is perfectly appropriate, and that async execution can be a good default.

And this leads us to...

Sync Vs Async

An Optimistic Response means you may as well do the render to 𝜷 asynchronously, since you have already returned a response to 𝛂. As explained above, a downstream call from A to 𝜷 is called a render. Example renderers are for the DOM, for a database, or for remote API calls: anything that cannot be retracted once sent out and acknowledged as successfully received (though not necessarily fully processed at acknowledgment time; see Optimistic Responses).

In Antares, renderers can be attached synchronously or asynchronously. Asynchronous is Antares' preferred mode, but let's first describe synchronous mode.

When a renderer for 𝜷 is attached (i.e. subscribed) synchronously, an error in the renderer propagates up the call stack and 𝛂 sees the error. On the other hand, if the renderer is subscribed asynchronously, 𝛂 gets an OK response right away, and if an error occurs, another communication has to originate from A back to 𝛂 to inform it of the error; that is, a compensating transaction must be issued. Thankfully, Antares assumes a full-duplex, bidirectional communication environment, so A does indeed have the ability to do this. But in asynchronous mode, an error from 𝜷 cannot affect the response 𝛂 has already received, because Antares has already returned an acknowledgment.
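The difference between the two attachment modes can be sketched with a toy dispatcher. This `Dispatcher` class is hypothetical and deliberately simplified; it is not how Antares is implemented, and it only shows where a renderer error ends up in each mode.

```typescript
type Action = { type: string; payload: unknown };
type Renderer = (action: Action) => void;
type Mode = "sync" | "async";

// Hypothetical dispatcher, not the Antares API.
class Dispatcher {
  private renderers: { render: Renderer; mode: Mode }[] = [];
  private onAsyncError: (err: unknown, action: Action) => void;

  constructor(onAsyncError: (err: unknown, action: Action) => void) {
    this.onAsyncError = onAsyncError;
  }

  attach(render: Renderer, mode: Mode): void {
    this.renderers.push({ render, mode });
  }

  // Returns "ok" to the caller (𝛂) unless a sync renderer throws.
  process(action: Action): "ok" {
    for (const { render, mode } of this.renderers) {
      if (mode === "sync") {
        render(action); // an exception here propagates to the caller
      } else {
        // Deferred: runs after process() has returned, so errors cannot
        // reach the caller and are routed to the error callback instead.
        Promise.resolve()
          .then(() => render(action))
          .catch((err) => this.onAsyncError(err, action));
      }
    }
    return "ok";
  }
}
```

In this sketch the `onAsyncError` callback is the place where A would issue the compensating communication back to 𝛂.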

Batching

Antares, using all the operators of RxJS, is able to do arbitrarily complex reordering and batching of writes, like a kitchen that juggles orders from multiple wait staff. Here's a scenario that batching may solve.

Suppose that network communication is a relatively large part of the round-trip time to 𝜷. If you could combine several writes (command objects or SQL strings) into one, you would save on network transport time. To do so, you can simply transform the write stream with bufferWithTimeOrCount, gaining more efficient write performance with a single line of code that does not affect any other behavior in your program.
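To make the buffering idea concrete without depending on RxJS, here is a small hand-rolled batcher in the spirit of a time-or-count buffer. `WriteBatcher` is a hypothetical class for illustration; its timing rule (flush a fixed delay after the first buffered item) is a simplification and may differ from how the RxJS operator schedules its windows.

```typescript
// Hypothetical batcher: collects writes and flushes when `maxCount` items are
// buffered or `maxWaitMs` has elapsed since the first buffered item.
class WriteBatcher<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private maxCount: number;
  private maxWaitMs: number;
  private sendBatch: (batch: T[]) => void;

  constructor(maxCount: number, maxWaitMs: number, sendBatch: (batch: T[]) => void) {
    this.maxCount = maxCount;
    this.maxWaitMs = maxWaitMs;
    this.sendBatch = sendBatch;
  }

  write(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxCount) {
      this.flush(); // count limit reached: send now
    } else if (this.timer === undefined) {
      // First item of a new batch: start the time limit.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sendBatch(batch); // one network round trip for the whole batch
  }
}
```

With `new WriteBatcher<string>(3, 20, send)`, three quick writes reach `send` as a single array, and a lone fourth write is sent on its own once 20 ms have passed.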

Now, you have to render in async mode in order to batch, but batching may be the most compelling reason to go async. If you aren't having performance challenges in sync mode, then leave it sync for that extra 0.1% assurance, if that helps you sleep better. Whichever you choose, we can agree that it's better to have control over the buffer size, even if you leave it unused or set it to 0, than to have no control over that parameter.

Appendix A — Why Async?

A: To achieve better efficiency. Have you ever noticed how in a restaurant, the wait staff leave orders at the kitchen and then walk away—instead of hovering, waiting for the dish to be cooked? The reason for this is two-fold. First, it lets the wait staff get back to taking other orders, payments, etc. Second, it lets the kitchen scan the list of open orders and batch up identical or similar orders to work more efficiently. The kitchen would never be able to reorder its workload if its contract with the wait staff were a synchronous one.

It is a core proposition of Antares that async is the default way communication across a distance needs to work! Every distributed human system from the dawn of time has taken this property on; it is simply what works. However, each business case is different. So sync vs async is an operational choice in Antares, not dictated by convention or inherited from a choice of tools.

The fact is that almost any system, when switched from sync to async with no other change made, will scarcely appear to function any differently! If errors are a small part of your responses, and if servers stay up long relative to the frequency of errors, the share of responses that behave any differently may be 1 in 1,000 or fewer. But the real power of async comes when you can play with other parameters, like batching.

Appendix B — Comparison to the Status Quo

These days it is almost universally true that when a web server receives a request, the following things happen in this inalterable order:

  1. A command object in the remote service's language is constructed (SQL string, Mongo object, etc.)
  2. The command object is sent to 𝜷
  3. An error from 𝜷 results in 𝛂's response being an error (serialized, sanitized)
  4. Only after 𝜷 has returned a response or acknowledgment to A does A reply to 𝛂

For all the reasons listed above, this method of operation does a disservice to the business by deciding a priori that errors originating from 𝜷 must become errors for 𝛂. It bakes this rigidity into the entire system, pre-empting performance tuning parameters such as retry counts and batching that may be needed to handle bursty loads.

A counter-argument to this may be:

"But the user 𝛂 wants to know if we've really done the write to 𝜷".

But no, the user doesn't care; they just want to know whether you have everything you need and whether they can count on you to get it done.

There's a kind of Law of Demeter in here for distributed systems: "I will try to encapsulate you from my collaborators' failures" is a stronger promise than "I will immediately pass any downstream failure back up to you". For this stronger promise, the allowance of asynchrony is, for many, a good trade-off. This is why a sync mode of rendering is contraindicated, even though it is clearly supported as an API option. With great power comes great responsibility, etc.
