Edit: oh it’s only in one AZ
https://cloud.google.com/storage/docs/composite-objects#appe...
[0] https://chrlschn.dev/blog/2024/07/merging-objects-in-google-...
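For reference, the pattern in that post is compose-based "append": upload each increment as its own object, then merge it into the target. A minimal sketch with the google-cloud-storage Python client (bucket and object names are made up):

    # pip install google-cloud-storage
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-log-bucket")  # hypothetical bucket

    def append_via_compose(target_name: str, data: bytes) -> None:
        # "Append" by uploading a temp object and composing it onto the target.
        part = bucket.blob(target_name + ".part")
        part.upload_from_string(data)
        target = bucket.blob(target_name)
        # compose() accepts up to 32 sources; the target itself can be one.
        target.compose([target, part] if target.exists() else [part])
        part.delete()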
> S3 Express One Zone delivers data access speed up to 10x faster and request costs up to 50% lower than S3 Standard [0]
The critical difference seems to be in availability (1 AZ)
[0] https://aws.amazon.com/s3/storage-classes/express-one-zone/
The house always wins https://www.vantage.sh/blog/amazon-s3-express-one-zone
Egress and storage, however, are more expensive on Express One Zone than on any other tier. For comparison, Glacier (Instant Retrieval), Standard, and Express are $0.004, $0.023, and $0.16 per GB-month of storage. The Standard tier also receives small additional discounts above 50 TB.
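Back-of-envelope with those storage prices, for 1 TB-month (ignoring request and retrieval fees, which matter a lot for Glacier):

    # Monthly storage cost of 1 TB at the quoted per-GB prices; ignores
    # request/retrieval fees and the >50 TB Standard-tier discount.
    prices = {"glacier_instant": 0.004, "standard": 0.023, "express_one_zone": 0.16}
    for tier, per_gb in prices.items():
        print(f"{tier}: ${per_gb * 1024:,.2f}")
    # glacier_instant: $4.10, standard: $23.55, express_one_zone: $163.84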
Key points:
- It's just for the "S3 Express One Zone" bucket class, which is more expensive (16c/GB/month compared to 2.3c for S3 standard tier) and less highly available, since it lives in just one availability zone
- "With each successful append operation, you create a part of the object and each object can have up to 10,000 parts. This means you can append data to an object up to 10,000 times."
That 10,000 parts limit means this isn't quite the solution for writing log files directly to S3.
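For the curious, the append itself is just a PutObject against a directory bucket with a write offset. A sketch with boto3, assuming the WriteOffsetBytes parameter from the announcement (bucket and key names are hypothetical):

    # pip install boto3
    import boto3

    s3 = boto3.client("s3")
    bucket = "my-logs--use1-az4--x-s3"  # hypothetical directory bucket
    key = "app/server1.log"

    s3.put_object(Bucket=bucket, Key=key, Body=b"first line\n")  # create

    def append(data: bytes, offset: int) -> int:
        # Each append consumes one of the object's 10,000 parts.
        s3.put_object(Bucket=bucket, Key=key, Body=data, WriteOffsetBytes=offset)
        return offset + len(data)

    offset = append(b"second line\n", len(b"first line\n"))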
Azure supports 50,000 parts (blocks) per append blob with zone redundancy, and append blobs are supported in the normal "Hot" tier, which is their low-budget mechanical-drive storage.
Note that both the 10K and 50K part limits mean you can use a single blob to store a day's worth of logs and flush every minute (1,440 parts). Conversely, hourly blobs can support flushing every second (3,600 parts). Neither supports daily blobs with per-second flushing for a whole day (86,400 parts).
Typical designs involve a per-server log, per hour. So the blob path looks like:
"{account}/{path}/{year}/{month}/{day}/{hour}_{servername}.txt"
This seems insane, but it's not a file system! You don't need to create directories, and you're not supposed to read these using Vim, Notepad, or whatever. The typical workflow is to run a daily consolidation into an indexed columnstore format like Parquet, or send it off to Splunk, Log Analytics, or whatever...
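A minimal sketch of that scheme with the azure-storage-blob Python SDK (container name and connection string are placeholders):

    # pip install azure-storage-blob
    from datetime import datetime, timezone
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection string>")
    container = service.get_container_client("logs")  # hypothetical container

    def hourly_blob_name(path: str, server: str) -> str:
        now = datetime.now(timezone.utc)
        return f"{path}/{now:%Y/%m/%d}/{now:%H}_{server}.txt"

    blob = container.get_blob_client(hourly_blob_name("app", "web01"))
    if not blob.exists():
        blob.create_append_blob()
    # Each append_block call consumes one of the blob's 50,000 blocks,
    # so even per-second flushing (3,600/hour) fits comfortably.
    blob.append_block(b"2024-11-22T10:15:00Z request handled\n")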
(Conversely, Azure's low-level performance is woeful in comparison to AWS and they're still slow-walking the rollout of their vaguely equivalent networking and storage called Azure Boost.)
Resource Groups that actually act like folders, not just special tags.
Resources with human-readable names instead of gibberish identifiers.
Cross-region and cross-subscription (equiv. to AWS account) views of all resources as the default, not as a special feature.
Single pane of glass across all products instead of separate URLs and consoles for each thing. E.g., a VM and the S3 bucket dedicated to it are "far apart" from each other in the AWS consoles, but the equivalent resources sit directly adjacent in an Azure Resource Group when viewed in its Portal.
Azure Application Insights is a genuinely good APM, and the Log Analytics workspace it uses under the hood is the consistent log collection platform across everything in Azure and even Entra ID and parts of Microsoft 365. It's not as featureful as Splunk, but the query language is up there in capability.
Azure App Service has its flaws, but it's by far the most featureful serverless web app hosting platform.
Etc...
Microsoft had the benefit of starting later and learning from Amazon's failures and successes. S3 dates from 2006.
That being said, Microsoft and Google each learned a lot, but each also failed to learn different lessons.
GCP has a lovely global network, which makes multi-region easy. But they spent way too much time on GCE and lost the early advantage they had with Google App Engine.
Azure is severely lacking in security (check out how many critical cross-tenant security vulnerabilities they've had in the past few years) and reliability (how many times have there been outages caused by a single DC in Texas failing; availability zones still aren't the default there).
Does anybody know if appending still has that 5 TB file limit?
I have been using Azure Storage append blobs to store logs of long-running tasks with periodic flushes (see https://learn.microsoft.com/en-us/rest/api/storageservices/u...)
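Roughly the same pattern, for anyone curious: buffer lines in memory and flush on a timer, so each flush costs one Append Block call (blob client setup as in the sketch upthread; the interval is arbitrary):

    import threading

    class BufferedAppender:
        # Buffers log lines and flushes them as a single Append Block every
        # `interval` seconds, staying well under the 50,000-block cap.
        def __init__(self, blob_client, interval: float = 60.0):
            self.blob, self.interval = blob_client, interval
            self.buf, self.lock = [], threading.Lock()
            self._tick()

        def write(self, line: str) -> None:
            with self.lock:
                self.buf.append(line)

        def _tick(self) -> None:
            with self.lock:
                pending, self.buf = self.buf, []
            if pending:
                self.blob.append_block("".join(pending).encode())
            threading.Timer(self.interval, self._tick).start()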
To compare the other way, Azure write blocks target replication blob containers. I consider that a primitive and yet they just outright say you can’t do it. When I engaged our TPM on this we were just told our expectations were wrong and we were thinking about the problem wrong.
> Azure write blocks target replication blob containers
I am sorry but what does it mean?
The goal of my question was to understand the differences between the two solutions: I know HN is a place where I can read technical arguments based on actual experience.
Most of them are cheaper, some MUCH cheaper.
S3 is often used as a lowest common denominator, and a lot of the features of Azure and GCS aren't leveraged by libraries and formats that try to be cross-platform, since they only want to expose features that are available everywhere.
If all object stores support append these days, then perhaps the data storage formats and libs can start leveraging it?
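In practice that would probably look like feature detection with a lowest-common-denominator fallback, something like this (hypothetical helper; read-modify-write is the fallback everyone supports):

    import botocore.exceptions

    def append_anywhere(s3, bucket: str, key: str, data: bytes, offset: int):
        # Try native append (Express One Zone); else read-modify-write.
        try:
            s3.put_object(Bucket=bucket, Key=key, Body=data,
                          WriteOffsetBytes=offset)
        except (botocore.exceptions.ClientError,
                botocore.exceptions.ParamValidationError):
            old = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            s3.put_object(Bucket=bucket, Key=key, Body=old + data)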
S3 has stagnated for a long time, allowing it to become a standard.
Third parties have cloned the storage service, and a vast array of software is compatible. There are drivers, file transfer programs, and utilities.
What does it mean that Amazon is now changing it?
Does Amazon even really own the standard anymore? Does it have the right to break long-standing standards?
I'm reminded of IBM breaking compatibility with the PS/2 computers just so they could maintain dominance.