Both are similar in that they hold the HTTP connection open, and both have the benefit of being plain HTTP (the big plus here). SSE (at least to me) feels more suitable for use cases where updates/results can be streamed in.
A fitting use case might be where you're monitoring all job IDs on behalf of a given client. Then you could move the job monitoring loop to the server side and continuously yield results to the client.
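A rough Node.js sketch of what I mean (the /jobs endpoint, getJobStatuses() and the 2-second interval are all made up for illustration):

    import http from 'node:http';

    http.createServer(async (req, res) => {
      // Job IDs the client wants monitored, e.g. /jobs?job=42&job=43
      const jobIds = new URL(req.url, 'http://localhost').searchParams.getAll('job');

      res.writeHead(200, {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
      });

      let open = true;
      req.on('close', () => { open = false; });

      // The monitoring loop lives server-side; each pass streams a snapshot out.
      while (open) {
        const statuses = await getJobStatuses(jobIds); // hypothetical lookup
        if (!open) break;                              // client went away mid-query
        res.write(`data: ${JSON.stringify(statuses)}\n\n`);
        await new Promise(r => setTimeout(r, 2000));
      }
    }).listen(8080);

On the client it's then just new EventSource('/jobs?job=42&job=43') with an onmessage handler.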
I've not personally witnessed this, but people on the internets have said that _some_ proxies/LBs have problems with SSE due to the way it does buffering.
I am curious about what you mean here. 'text/event-stream' allows for arbitrary event formats; it just provides structure so that EventSource is able to parse it.
You should only need one 'text/event-stream' and should be able to send the same JSON via a normal or SSE response.
One of the drawbacks, as I learned: SSE has a limit of up to ~6 open connections per browser + domain name. This can quickly become a limiting factor when you open the same web page in multiple tabs.
The big downside of SSE in mobile Safari - at least a few years ago - is that you got a constant loading spinner on the page. That's bad UX.
Connections are dropped all the time, and then your code, on both client and server, needs to account for retries (will the reconnection use a cached DNS entry? how will load balancing affect long-lived connections?), potentially missed events (now you need a delta between pings), DDoS protections (is this the same client connecting from 7 IPs in a row or is this a botnet), and so on.
Regular polling greatly reduces complexity on some of these points.
Yes, WS is complex. Long polling is not much better.
I can’t help but think that if front end connections are destroying your database, then your code is not structured correctly. You can accept both WS and long polls without touching your DB, having a single dispatcher then send the jobs to the waiting connections.
Clients using mobile phones tend to have their IPs rapidly changed in sequence.
I didn't mention databases, so I can't comment on that point.
But websockets also guarantee in-order delivery, which is never guaranteed by long polling. And websockets play way better with intermediate proxies - since nothing in the middle will buffer the whole response before delivering it. So you get better latency and better wire efficiency. (No http header per message).
At this point, long polling seems to carry more benefits, IMHO. WebSockets seem to be excellent for stable conditions, but not quite what we need for mobile.
I don't see how this is meaningfully different for long polling. The client could have received some updates but never ack'd it successfully over a long poll, so either way you need to keep a log and resync on reconnection.
- Observability: WebSockets are more stateful, so you need to implement additional logging and monitoring for persistent connections: solved with graphql if the existing monitoring is already sufficient.
- Authentication: You need to implement a new authentication mechanism for incoming WebSocket connections: solved with graphql.
- Infrastructure: You need to configure your infrastructure to support WebSockets, including load balancers and firewalls: True, firewalls need to be updated.
- Operations: You need to manage WebSocket connections and reconnections, including handling connection timeouts and errors: normally already solved by the graphql library. For errors, it's basically the same though.
- Client Implementation: You need to implement a client-side WebSocket library, including handling reconnections and state management: Just have to use a graphql library that comes with websocket support (I think most of them do) and configure it accordingly.
I vomit in my mouth a bit whenever people reach for socket.io or junk like that. You don’t want or need the complexity and bugs these libraries bring. They’re obsolete.
As a workaround in one project I wrote JavaScript code which manually sent cookies in the first websocket message from the client as soon as a connection opened. But I think this problem is now solved in all browsers.
One thing that seems clumsy in the code example is the loop that queries the data again and again. Would be nicer if the data update could also resolve the promise of the response directly.
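One way to do that in Node.js is to park the request on an event and let whoever updates the data resolve it. A sketch (the jobEvents emitter, the Express route shape and the 30-second cap are my own assumptions, not from the article):

    import { EventEmitter, once } from 'node:events';

    const jobEvents = new EventEmitter();
    // Wherever the job actually finishes, emit the fresh result:
    //   jobEvents.emit(`done:${jobId}`, result);

    // Assumes an Express `app` is already set up.
    app.get('/jobs/:id/poll', async (req, res) => {
      try {
        // Resolves the moment the update is emitted; gives up after 30s.
        const [result] = await once(jobEvents, `done:${req.params.id}`, {
          signal: AbortSignal.timeout(30_000),
        });
        res.json({ status: 'done', result });
      } catch {
        res.status(204).end(); // no update within the window; client polls again
      }
    });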
Websockets are simply a better technology. With long polling, the devil is in the details and it’s insanely hard to get those details right in every case.
Nowadays I prefer SSE to long polling and websockets.
The idea is: the client doesn't know that the server has new data before it makes a request. With a very simple SSE setup, the client is told that new data is there, and then it can request the new data separately if it wants. That said, SSE has a few quirks. One of them is that on HTTP/1 the connection counts toward the maximum limit of 6 concurrent connections per browser and domain, so if you have several tabs, you need a SharedWorker to share the connection between the tabs (a rough sketch of that is below). This quirk probably also applies to long polling and websockets. Another quirk: SSE can't transmit binary data and has some limitations in the textual data it can represent. But for this use case that doesn't matter.
I would use websockets only if you have a real bidirectional data flow or need to transmit complex data.
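For what it's worth, the SharedWorker part is small. A rough sketch (the worker file name and the /events endpoint are just placeholders):

    // sse-worker.js -- one EventSource shared by every tab that connects
    const ports = [];
    const source = new EventSource('/events');

    source.onmessage = e => {
      // Fan the event out to every connected tab
      for (const port of ports) port.postMessage(e.data);
    };

    onconnect = e => {
      ports.push(e.ports[0]);
    };

And in each tab:

    const worker = new SharedWorker('sse-worker.js');
    worker.port.onmessage = e => console.log('update:', e.data);

(Removing ports for closed tabs is omitted here.)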
Server-sent events.
If you are using SSE and SW and you need to transfer some binary data from client to server or from server to client, the easiest solution is to use the Fetch API. `fetch()` handles binary data perfectly well without transformations or additional protocols.
If the data in the SW is large enough to require displaying the progress of the transfer to the server, you will probably be better served by `XMLHttpRequest`.
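To illustrate the `fetch()` point, a binary upload is really just this (the endpoint name is made up):

    // Works the same from a page or a worker context
    const bytes = new Uint8Array([0x01, 0x02, 0x03, 0x04]);
    await fetch('/upload', {
      method: 'POST',
      headers: { 'Content-Type': 'application/octet-stream' },
      body: bytes,
    });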
You don't have to use a SharedWorker, you can also do domain sharding. Since the concurrent connection limit is per domain, you can add a bunch of DNS records like SSE1.example.org -> 2001:db8::f00; SSE2.example.org -> 2001:db8::f00; SSE3.example.org -> 2001:db8::f00; and so on. Then it's just a matter of picking a domain at random on each page load. A couple hundred tabs ought to be enough for anyone ;)
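Picking the shard then looks something like this (sse1..sse8 stand in for however many records you add; since these are now cross-origin, the server also has to send CORS headers):

    const shard = Math.floor(Math.random() * 8) + 1; // sse1..sse8
    const source = new EventSource(`https://sse${shard}.example.org/events`, {
      withCredentials: true, // only if cookies are needed cross-origin
    });
    source.onmessage = e => console.log(e.data);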
Use one http response per message queue snapshot. Send no more than N messages at once. Send empty status if the queue is empty for more than 30-60 seconds. Send cancel status to an awaiting connection if a new connection opens successfully (per channel singleton). If needed, send and accept "last" id/timestamp. These are my usual rules for long-polling.
Prevents: connection overhead, congestion latency, connection stalling, unwanted multiplexing, sync loss, respectively.
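A compact Express-flavored sketch of most of these rules, just to make them concrete (the in-memory queues/waiters maps, N=100 and the 30-second window are illustrative choices, not part of the rules themselves):

    import express from 'express';

    const app = express();
    const queues = new Map();   // channel -> [{ id, body }]
    const waiters = new Map();  // channel -> resolver of the currently parked request
    const N = 100;              // never send more than N messages at once

    app.get('/poll/:channel', async (req, res) => {
      const channel = req.params.channel;
      const lastId = Number(req.query.last ?? 0); // accept "last" id from the client

      // Per-channel singleton: cancel whoever is already waiting on this channel.
      waiters.get(channel)?.({ status: 'cancel' });

      const snapshot = () =>
        (queues.get(channel) ?? []).filter(m => m.id > lastId).slice(0, N);

      let messages = snapshot();
      if (messages.length === 0) {
        let myResolve;
        const outcome = await new Promise(resolve => {
          myResolve = resolve;
          waiters.set(channel, resolve);
          setTimeout(() => resolve({ status: 'empty' }), 30_000); // empty status after ~30s
        });
        if (waiters.get(channel) === myResolve) waiters.delete(channel);
        if (outcome.status !== 'data') return res.json(outcome); // 'empty' or 'cancel'
        messages = snapshot();
      }
      // One response per queue snapshot, tagged with the last id we sent.
      res.json({ status: 'data', last: messages.at(-1)?.id ?? lastId, messages });
    });

    // Producers call this; it wakes the parked request for the channel, if any.
    function publish(channel, msg) {
      if (!queues.has(channel)) queues.set(channel, []);
      queues.get(channel).push(msg);
      waiters.get(channel)?.({ status: 'data' });
    }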
You have completely different mechanisms for message passing from client-to-server and server-to-client
Is this a problem? Why should this even be symmetric?
Implementing a stable, in-order, exactly-once message delivery system on top of long polling starts to look a lot like implementing TCP on top of UDP. It's a solvable problem. I've done it - 14 years ago I wrote the first open-source implementation of (the server side of) Google's BrowserChannel protocol, from back before websockets existed:
https://github.com/josephg/node-browserchannel
This supports long polling on browsers, all the way back to IE5.5. It works even when XHR isn't available! I wrote it in literate coffeescript, from back when that was a thing.
But getting all of those little details right is really very difficult. It's a lot of code, and there are a lot of very subtle bugs lurking in this kind of code if you aren't careful. So you also need good, complex testing. You can see in that repo - I ended up with over 1000 lines of server code+comments (lib/server.coffee), and 1500 lines of testing code (test/server.coffee).
And once you've got all that working, my implementation really wanted server affinity, which made load balancing & failover across multiple application servers a huge headache.
It sounds like your application allows you to simplify some details of this network protocol code. You do you. I just use websockets & server-sent events and let TCP/IP handle all the details of in-order message delivery. It's really quite good.
OTOH, end-user projects usually know things and can make simplifying decisions. These two are incomparable. I respect the effort, but I also think that this level of complexity is the wrong answer to the call in general. You have to price-break requirements, because they tend to oversell themselves and rarely feature-intersect as much as this library implies. In other words, when a client asks for guarantees, statuses or something, we just tell them to fetch from a suitable number of seconds ago and see for themselves. Everyone works like this; if you need something extra, track it yourself based on your own metrics and our rate limits.
I have only tried it briefly when we use gRPC: https://grpc.io/docs/what-is-grpc/core-concepts/#server-stre...
Here it's easy to specify that an endpoint is a "stream", and then the code-generation tool gives you all the tooling to just keep serving the client with multiple responses. It looks deceptively simple. We already have auth, logging and metrics set up for gRPC, so I hope it just works off of that, maybe with minor adjustments. But I'm guessing you don't need the gRPC layer to use HTTP/2 multiplexing?
HTTP/2 does specify a server push mechanism (PUSH_PROMISE), but afaik, browsers don't accept them and even if they did, (again afaik) there's no mechanism for a page to listen for them.
But if you control the client and the server, you could use it.
Outside of gRPC, plain HTTP POST cannot at this time replace websockets because the in-browser `fetch` API doesn't support a streaming request body. For now, websockets are the only thing that can natively provide an ordered stream of messages from browser to server.
I guess at least this trick is still meaningful where HTTP/2 or QUIC aren't in use
It is also fire-and-forget with fallback to HTTP if websockets aren't available. I believe if websockets don't work it can fall back to HTTP long polling instead, but don't quote me on that.
All the downsides of web sockets mentioned in the article are handled for you. Plus you can re-use your existing auth solution. Easily plug logging stuff in, etc. etc. Literally all the problems mentioned by the author are dealt with.
Given the author mentions C# is part of their stack, I don't know why they didn't mention SignalR or use that instead of rolling their own solution.
Edit: Whoops, I lost context here. Phoenix LiveView as a whole is probably pretty analogous to Blazor.
Not saying this is why but SignalR is notoriously buggy. I've never seen a real production instance that didn't have issues. I say that as someone who probably did one of the first real world, large scale roll outs of SignalR about a decade ago.
I am using Blazor Server, and for some reason a server is not allowing WebSocket connections (troubleshooting this), so the app switches to long polling as a fallback.
In terms of real time updates, Phoenix typically relies on LiveView, which uses web sockets and falls back to long-polling if necessary. I think SignalR is the closest equivalent in the .Net world.
I was on the platform/devops/man-with-many-hats team for an Elixir shop running Phoenix in k8s. WebSockets get complicated even in Elixir when you have 2+ app instances behind a round-robin load balancer: you now need to share broadcasts between app servers. Here's a situation you have to solve for with any app at scale, regardless of language:
app server #1 needs to send a publish/broadcast message out to a user, but the user who needs that message isn't connected to app server #1 that generated the message, that user is currently connected to app server #2.
How do you get a message from one app server to the other one which has the user's ws connection?
A bad option is sticky connections. User #1 always connects to server #1. Server #1 only does work for users connected to it directly. Why is this bad? Hot spots. Overloaded servers. Underutilized servers. Scaling complications. Forecasting problems. Goes against the whole concept of horizontal scaling and load balancing. It doesn't handle side-effect messages, ie user #1000 takes some action which needs to broadcast a message to user #1 which is connected to who knows where.
The better option: you need to broadcast to a shared broker, something all app servers share a connection to, so they can themselves subscribe to messages they should handle and then pass them to the user's WS connection. This is a message broker. Postgres can be that broker; just look at Oban for real-world proof. Throw in pg's LISTEN/NOTIFY and you're off to the races. But that's heavy from a resources-per-DB-connection perspective, so let's avoid the ACID DB for this then. OK. Redis is a good option, or, since this is Elixir land, use the built-in distributed Erlang stuff. But we're not running raw Elixir releases on Linux, we're running inside of containers, on top of k8s. The whole distributed Erlang concept goes to shit once the Erlang procs are isolated from each other and not in their perfect Goldilocks getting-started-readme world. So OK, in containers in k8s, each app server needs to know about all the other app servers running, so how do you do that? Hmm, service discovery! OK, well, k8s has service discovery already, so how do I tell the Erlang VM about the other nodes that I got from k8s etcd? Ah, a hex package, cool. libcluster to the rescue: https://github.com/bitwalker/libcluster
So we'll now tie the boot process of our entire app to fetching the other app server pod IPs from k8s service discovery, then get a ring of distributed Erlang nodes talking to each other, sharing message passing between them. This way, no matter which server the LB routes the user to, a broadcast from any one of them will be seen by all of them, and the one that holds the WS connection will forward it down the WS to the user.
So now there's a non trivial amount of complexity and risk that was added here. More to reason about when debugging. More to consider when working on features. More to understand when scaling, deploying, etc. More things to potentially take the service down or cause it not to boot. More things to have race conditions, etc.
Nothing is ever so easy you don't have to think about it.
Elixir gives more options and lets you do it natively.
Also, there are simpler options for clustering out there like https://github.com/phoenixframework/dns_cluster (Disclaimer: I am a contributor)
Anyway, I agree that once you go with more than one server it's a whole new world, but I'm not sure it's easier in any other language.
(id, cluster_id) sounds like it could / should be the PK
If the jobs are cleared once they've succeeded, and presumably retried if they've failed or stalled, then the table should be quite small; so small that (a) the query planner is unlikely to use the partial index on (status), and (b) the bloat from the rapidity of DELETEs likely overshadows the live tuple size.
I suspect other firewalls, CDNs, or reverse proxy products will all do something similar. For me, this is one of the biggest benefits of websockets over long-polling: it's a standard way to communicate to proxies and firewalls "this connection is supposed to stay open, don't close it on me".
What's the most resource efficient way to push data to clients over HTTP?
I can send data to a server via HTTP request, I just need a way to notify a client about a change and would like to avoid polling for it.
I heard talk about SSE, WebSockets, and now long-polling.
Is there something else?
What requires the least resources on the server?
If you want to reduce server load then you'd have to sacrifice responsiveness, e.g. you perform short polls at certain intervals, say 10s.
What's the least complex to implement then?
For other clients, such as mobile apps, I think long poll would be the simplest.
Don't websockets look like ordinary https connections?
    Connection: Upgrade
    Upgrade: websocket
A proxy may have a different TLS handshake than a real browser would, depending on how good the MITM is, but the better they are, the more likely it is that websockets work.
For the other direction, to support long-polling clients if your existing architecture is websockets which get data pushed to them by other parts of the system, just have two layers of servers: one which maintains the "state" of the connection, and then the HTTP server which receives the long polling request can connect to the server that has the connection state and wait for data that way.
Personally I would have enjoyed solving that problem instead of hacking around it, but that's me.
Having done this, I don't think I'd reduce it to "just a little bit of work" to make it hum in production.
Everything in between your UI components and the database layer needs to be reworked to fit the connection-oriented (WebSockets) model of the world vs. the request-oriented world.
Only that layer knows the URLs for endpoints, the protocols and the connections - and it proxies between them and your app/components.
How so? As a minimal change, the thing on the server end of the websocket could just do the polling of your database on its own while the connection is open (using authorization credentials supplied as the websocket is being opened). If the connection dies, stop polling. This has the nice property that you're in full control of the refresh rate, can implement coordinated backoffs if the database is overloaded, etc.
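A rough sketch with the ws package (the query function, the interval and the port are placeholders):

    import WebSocket, { WebSocketServer } from 'ws';

    const wss = new WebSocketServer({ port: 8080 });

    wss.on('connection', (socket, req) => {
      // Validate credentials from req.headers here before starting to poll.

      const timer = setInterval(async () => {
        const jobs = await queryJobStatuses(); // hypothetical DB lookup
        if (socket.readyState === WebSocket.OPEN) {
          socket.send(JSON.stringify(jobs)); // server fully controls the refresh rate
        }
      }, 2000);

      socket.on('close', () => clearInterval(timer)); // connection died: stop polling
    });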
- Change how you hydrate the initial state in the web component.
- Rework any request-oriented configuration you do at the edge based on the payloads. (For example, if you use Cloudflare and use their HTTP rules, you have to rework that.)
The section at the end talking about "A Case for Websockets" really only rehashes the arguments made in "Hidden Benefits of Long-Polling" stating that you need to reimplement these various mechanisms (or just use a library for it).
My experience in this space is from 2011, when websockets were just coming onto the scene. Tooling / libraries were much more nascent, websockets had much lower penetration (we still had to support IE6 in those days!), and the API was far less stable prior to IETF standardization. But we still wanted to use them when possible, since they provided much better user experience (lower latency, etc) and lower server load.
> Observability Remains Unchanged
Actually it doesn't, many standard interesting metrics will break because long-polling is not a standard request either.
> Authentication Simplicity
Sure, auth is different than with http, but not more difficult. You can easily pass a token.
> Infrastructure Compatibility
I'm sure you can find firewalls out there where websockets are blocked, however for my use case I have never seen this reported. I think this is outdated, for sure you don't need "special proxy configurations or complex infrastructure setups".
> Operational Simplicity
Restarts will drop any persistent connection, state can be both or neither in WS or in LP, it doesn't matter what you use.
> Client implementation
It mentions "no special WebSocket libraries needed" and also "It works with any HTTP client". Guess what, websockets will work with any websocket client! Who knew!
Finally, in the conclusion:
> For us, staying close to the metal with a simple HTTP long polling implementation was the right choice
Calling simple HTTP long polling "close to the metal" in comparison to websockets is weird. I wouldn't be surprised if websockets scale much better and give much more control depending on the type of data, but that's beside the point. If you want to use long polling because you prefer it, go ahead. It's a great way to stick to request/response style semantics that web devs are familiar with. It's not necessary to regurgitate a bunch of random hearsay arguments that may influence people in the wrong way.
Try to actually leave the reader with some notion of when to use long polling vs when to use websockets, not a post-hoc justification of your decision based on generalized arguments that do not apply.
> Actually it doesn't, many standard interesting metrics will break because long-polling is not a standard request either.
As a person who works in a large company handling millions of websockets, I fundamentally disagree with discounting the observability challenges. WebSockets completely transform your observability stack - they require different logging patterns, new debugging approaches, different connection tracking, and change how you monitor system health at scale. Observability is far more than metrics, and handwaving away these architectural differences doesn't make the implementation easier.
I am doing neither of these things. I am only saying you will have observability problems whether you do LP or WS, because you are stepping away from the request/response model that most tools work with. As such, it's weird to argue that "observability remains unchanged".
No polling needed, regardless of the frontend channel.
In terms of not getting fired - Postgres is a lot more innovative than most databases, and the insinuation of IBM.
By innovative I mean uniquely putting in performance related items for the last 10-20 years.
    await new Promise(resolve => setTimeout(resolve, 500));
In a Node.js context, it's easier to:

    import { setTimeout } from "node:timers/promises";
    await setTimeout(500);
When I said 'in the context of Node.js' I meant if you are in a JS module where you already import other node: modules, ie. when it's clear that code runs in a Node.js runtime and not in a browser. Of course when you are writing code that's supposed to be portable, don't use it. Or don't use setTimeout at all because it's not guaranteed to be available in all runtimes - it's not part of the ECMA-262 language specification after all.
I just don't see the point. It doesn't work in the browser and it shadows global.setTimeout which is confusing. Meanwhile the idiom works everywhere.
    import { setTimeout as loiter } from "node:timers/promises";
    await loiter(500);
To me it's kinda like adding a shallowClone(old) helper instead of writing const obj = { ...old }.
But no point in arguing about it forever.
Detecting changes in the backend and propagating them to the right client is still an unsolved problem. Until then, long polling is surprisingly simple and a robust solution that works.
I'm considering using SSE for an app. I'm curious, what problems have you run into? At least the docs say you get 100 connections between the server and a client, but it seems it can be negotiated higher if needed?
https://developer.mozilla.org/en-US/docs/Web/API/EventSource
Second Life has an HTTPS long polling channel between client and server. It's used for some data that's too bulky for the UDP connection, not too time sensitive, or needs encryption. This has caused much grief.
On the client side, the poller uses libcurl. Libcurl has timeouts. If the server has nothing to send for a while, libcurl times out. The client then makes the request again. This results in a race condition if the server wants to send something between timeout and next request. Messages get lost.
On top of that, the real server is front-ended by an Apache server. This just passes through relevant requests, blocking the endless flood of junk HTTP requests from scrapers, attacks, and search engines. Apache has a timeout, and may close a connection that's in a long poll and not doing anything.
Additional trouble can come from middle boxes and proxy servers that don't like long polling.
There are a lot of things out there that just don't like holding an HTTP connection open. Years ago, a connection idle for a minute was fine. Today, hold a connection open for ten seconds without sending any data and something is likely to disconnect it.
The end result is an unreliable message channel. It has to have sequence numbers to detect duplicates, and can lose messages. For a long time, nobody had discovered that, and there were intermittent failures that were not understood.
In the original article, the chart section labelled "loop" doesn't mention timeout handling. That's not good. If you do long polling, you probably need to send something every few seconds to keep the connection alive. Not clear what a safe number is.
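The usual shape of such a keepalive is to dribble out whitespace before the real payload. A rough Express-style sketch (waitForUpdate() is hypothetical and the 10-second interval is a guess; tune it to the flakiest hop in your chain):

    // Assumes an Express `app` is already set up.
    app.get('/poll', async (req, res) => {
      res.setHeader('Content-Type', 'application/json');
      // Whitespace is ignored by JSON.parse on the client, but it's enough
      // traffic to keep middleboxes from declaring the connection dead.
      const heartbeat = setInterval(() => res.write(' '), 10_000);
      try {
        const data = await waitForUpdate(req); // hypothetical; resolves when there's news
        clearInterval(heartbeat);
        res.end(JSON.stringify(data));
      } catch {
        clearInterval(heartbeat);
        res.end(); // give the client something to reconnect from
      }
    });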
100 Continue could be usable as a workaround. Would probably require at a bare minimum some extra integration code on the client side.
Every timeout in every hop of the chain is within your control to configure. Set up a subdomain and send long-polling requests through that, so the timeouts can be set higher without impacting regular HTTP requests or opening yourself up to slow-client DDoS.
Why would you try to do long polling and not configure your request chain to be able to handle them without killing idle connections? The problems you have only exist because you're allowing them to exist. Set your idle timeouts higher. Send keepalives more often. Tell your web servers to not do request buffering, etc.
All of that functionality is extremely easy to test and verify. Does the request live longer than your polling interval? Yes? Great, you're done! No? Tune some more timeouts and log the request chain everywhere you can until you know where the problems lie. Knock them out one by one, going back to the origin, until you get what you want.
Long polling is easy to get right from an operations perspective.
Thank you for pointing that out. This thread alone is bound to become a meme.
The underlying problems are those of legacy software. People long gone from Linden Lab wrote this part of Second Life. The networking system predates the widespread use of middle boxes. (It also predates the invention of conflict-free replicated data types, which would have helped some higher-level consistency problems.) The internal diagnostic tools are not very helpful. The problem manifests itself as errors in the presentation of a virtual world, far from the network layer. What looked like trouble at the higher levels turned out to be, at least partially, trouble at the network layer.
The developer who has to fix this wrote "This made me look into this part of the protocol design and I wish I hadn't."
More than you ever wanted to know about this: [1] That discussion involves developers of four different clients, some of which talk to two different servers.
(All this, by the way, is part of why there are very few big, seamless, high-detail metaverses. Despite all the money spent during the brief metaverse boom, nobody actually shipped a good one. There are some hard technical problems seen nowhere else. Somehow I seem to end up in areas like that.)
[1] https://community.secondlife.com/forums/topic/503010-obscure...
lol
I don't try to run red lights because I don't have control over the lights on the road.
I mean, in both cases, it's a TCP connection over (e.g.) port 443 that's being kept open, right? Intermediaries can't snoop the data if it's SSL, so all they know is "has some data been sent recently?" Why would they kill long-polling sessions after 10 sec and not websocket ones?
Why such a short timeout? Attackers can open connections and silently drop them to tie up resources. This is why we can't have nice things.
[1] https://httpd.apache.org/docs/2.4/mod/core.html#keepalivetim...
Anyways, the real takeaway is even if your current solution works now, one day someone will put something stupid between your server and the clients that will invalidate all current assumptions.
For example I have created some service which consumes a very large NDJSON file over an HTTPS connection, which I expect to be open for half an hour at least, so I can process the content as a stream.
I dread the day when I have to fight with someone’s IT to keep this possible.
How does that help? You can't pop from a queue over HTTP because when the client disconnects you don't know whether it saw your response or not.
However, queued messages don't have to be kept for a very long time, usually. Because every connection method suffers from this problem, you wouldn't usually architect a system with no resync or reset strategy in the client when reconnection takes so long that it isn't useful to stream every individual message since the last connection.
The client and/or server have a resync timeout, and the server's queue is limited to that timeout, plus margin for various delays.
Once there's a resync strategy implemented, it is often reasonable for rhe server to be able to force a resync early, so it can flush messages queues according to other criteria than a strict timeout. For example memory pressure or server restarts.
To be clear, there's no real difference - in both cases you have to keep messages in some queue and potentially resend them until they've been acknowledged.
I think it's a premise of reliable long-polling that the server can hold on to messages across the changeover from one client request to the next.
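A sketch of that bookkeeping (the names and the Express route shape are mine, and the parking/waiting part is left out to keep it short): the server only drops messages the client has explicitly acknowledged, so a lost response just means the same batch goes out again.

    const outboxes = new Map(); // clientId -> [{ id, body }]
    // Producers push into outboxes.get(clientId) elsewhere.

    // Assumes an Express `app` is already set up.
    app.get('/poll', (req, res) => {
      const { clientId, ackedUpTo } = req.query;
      const outbox = outboxes.get(clientId) ?? [];

      // The previous response counts as delivered only once the client says so.
      const remaining = outbox.filter(m => m.id > Number(ackedUpTo ?? 0));
      outboxes.set(clientId, remaining);

      // Resend whatever is still unacknowledged (at-least-once delivery;
      // the client dedupes by id).
      res.json(remaining);
    });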
Personal Case Study: I built mobile apps which used Flowise assistants for RAG and found websockets completely out of line with the rest of my system and interactions. Suddenly I was fitting a round peg into a square hole. I switched to OpenAI assistants and their polling system felt completely "natural" to integrate.
Does everybody poll their PostgreSQL to get new rows in real time? This is really weird; there are trigger functions and notifications.
There's also `max_notify_queue_pages`
>Specifies the maximum amount of allocated pages for NOTIFY / LISTEN queue. The default value is 1048576.
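For anyone who hasn't used it: from Node.js with the `pg` client it's roughly this (channel name and trigger wiring are up to you; a trigger on the jobs table can call pg_notify('job_updates', ...) on insert/update so new rows get pushed instead of polled for):

    import pg from 'pg';

    const client = new pg.Client({ connectionString: process.env.DATABASE_URL });
    await client.connect();
    await client.query('LISTEN job_updates');

    client.on('notification', msg => {
      // msg.channel === 'job_updates'; msg.payload is whatever pg_notify() sent
      console.log('row changed:', msg.payload);
    });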