I can't use yours in production yet as it only supports SQLite, but I'll keep my eyes peeled for updates.
We're using Sqlx and Diesel in production currently.
Sqlx is great because it's not an ORM and lets you write "type checked" raw SQL. It frustrates us, though, that you can't write dynamic queries and have them be type checked: no `IN` clauses or conditional predicates, only static queries with simple bindings get checked.
Love to see more ORMs like yours! Diesel isn't our favorite.
Also recommend reading https://www.sea-ql.org/SeaORM/docs/internal-design/diesel/
And even if you're writing queries directly, any time you need to write something dynamic, you're not going to be able to verify those at compile-time either.
I don't see why this is by its "very nature" when libraries like diesel or Prisma already do this.
> any time you need to write something dynamic, you're not going to be able to verify those at compile-time either.
Diesel can validate dynamic queries at compile time too; sqlx can as well, to some extent.
I honestly really enjoyed Diesel, but its extremely slow migration to async really made it lose traction.
I've used SQLx extensively for projects after that and it was pretty decent.
Other than that, the async Rust ecosystem is still in a place that makes it literally impossible to provide strong guarantees around handling transactions and making sure that they ended, which is a main reason why diesel-async is not considered stable from my side yet. This problem exists in all other async Rust database libraries as well, as it's a language-level problem. They just do not document this correctly.
When I started my rust journey Diesel and Rocket were the first few crates I worked with and it really opened up my mind and I haven't turned back since.
As for async, my primary concern was to make sure you minimise blocking calls in your futures. Even if there weren't many performance gains to be made from Diesel itself being async, having database calls marked as futures could theoretically help the runtime schedule and manage other threads.
I'd like to thank you for your amazing work once more!
That doesn't require an async database library at all. It merely requires some abstraction like [`deadpool-diesel`](https://docs.rs/deadpool-diesel/latest/deadpool_diesel/index...) that ensures that the caller always uses `tokio::spawn_blocking` or similar to run database queries. And yes, this scales rather well, as the number of database connections in your application is rather low (usually a few tens) and therefore much smaller than the number of threads in the tokio blocking thread pool.
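A minimal sketch of that pattern, assuming plain Diesel plus tokio (the `users` table, `app.db` path, and anyhow error handling are made up for illustration; deadpool-diesel packages the same idea up behind a connection pool):

```rust
use diesel::prelude::*;
use diesel::sqlite::SqliteConnection;
use tokio::task;

diesel::table! {
    users (id) {
        id -> Integer,
        name -> Text,
    }
}

// The Diesel query itself stays synchronous; it just runs on tokio's
// blocking thread pool so it never stalls the async executor.
async fn user_names() -> anyhow::Result<Vec<String>> {
    task::spawn_blocking(|| -> anyhow::Result<Vec<String>> {
        let mut conn = SqliteConnection::establish("app.db")?;
        Ok(users::table.select(users::name).load::<String>(&mut conn)?)
    })
    .await? // JoinError if the blocking task panicked
}
```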
WHERE ...
  AND CASE WHEN ? IS NULL THEN TRUE ELSE col ILIKE ? || '%' END
  AND CASE WHEN ? IS NULL THEN TRUE ELSE col2 IN (?) END
Each "dynamic" parameter is then bound twice. (My personal preference is to use a CTE to bind and name my parameter values in the query instead of binding multiple times whenever possible).
```
schema! {
    User {
        name: String,
    },
    Story {
        author: User,
        title: String,
        content: String,
    },
}
```
> The existing libraries don't provide the compile time guarantees that I want and are verbose or awkward like SQL.
Worth noting: diesel definitely fulfills the "providing compile time guarantees" criteria.
Here's where I stand on the inevitable ORM-vs-no-ORM debate that's about to erupt: I like typesafe query builders that don't abstract over SQL (I'd put Diesel in this category; I would not put ActiveRecord, Django's ORM, or SeaORM in it).
It looks like rust-query will lean towards the full-ORM side of that spectrum. Not my cup of tea, but the tea industry can accommodate many flavors :)
Also sqlx.
I liked the article and I bookmarked the library but this statement from the author confused me a bit. At least two libraries give you pretty good compile-time guarantees. sqlx even gives you the option to specify a DB where it can check whether your queries will succeed at runtime.
The guarantees provided by sqlx are weaker than what's provided by diesel, because sqlx needs to know the complete query statically at compile time. This excludes dynamic constructs like `IN` expressions or dynamic where clauses from the set of checked queries. Diesel verifies that these queries, too, are correct at compile time.
- Everyone knows at least some basic SQL, even non-technical users have often encountered it in some form.
- The documentation for eg. PostgreSQL is for SQL, so if you write queries in anything else you have to mentally translate back and forth, and so you need to know SQL anyway.
- Any external tools you use to interact with the database will use SQL.
- Changing the queries doesn't require an expensive compilation step - `sqlx` gets the best of both worlds in this respect, in that it is able to type-check the parameters and rely on the database itself to validate the query, so you don't end up with a ton of type-system shenanigans that increase compilation times.
Maybe for a brand new database, a better query language could win out, but having used sqlx I can't imagine going back to a query-builder style interface for existing SQL databases.
For example:
SELECT * FROM example
WHERE ($1 IS NULL OR field1 = $1)
AND ($2 IS NULL OR field2 = $2)
...
When you supply the parameters to this prepared statement, the constant conditions should be optimized away by the query planner if you're using a decent database.

What are you doing that requires dynamically building a query?
Well, it has been mine. That trick you suggested works for the simple binary "either include this WHERE clause filter or don't" -- which, agreed, is something that comes up often enough -- but it's only in the simplest of CRUD apps where that's all I've needed.
A quick example: I have a REST endpoint that updates a resource. The JSON fields that map to the DB columns can be missing (leave the column value unchanged), null (set the column value to NULL), or have a value (set the new provided value). The list of assignments in the UPDATE needs to be dynamic; sometimes there might be only one column being updated, sometimes two, three, four five...
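A minimal sketch of that tri-state problem (hypothetical `PatchUser` type and `users` table; `None` = field absent, `Some(None)` = explicit null, `Some(Some(v))` = new value). The point is that the number of `SET` clauses is only known at runtime, which is exactly what the static checkers can't see:

```rust
struct PatchUser {
    nickname: Option<Option<String>>,
    email: Option<Option<String>>,
}

// Returns the UPDATE text plus the values to bind (None binds as NULL),
// or nothing if the request didn't touch any column. The caller appends
// the id parameter for the final placeholder.
fn build_update(patch: &PatchUser) -> Option<(String, Vec<Option<String>>)> {
    let mut sets = Vec::new();
    let mut binds = Vec::new();
    if let Some(v) = &patch.nickname {
        sets.push("nickname = ?");
        binds.push(v.clone());
    }
    if let Some(v) = &patch.email {
        sets.push("email = ?");
        binds.push(v.clone());
    }
    if sets.is_empty() {
        return None;
    }
    Some((
        format!("UPDATE users SET {} WHERE id = ?", sets.join(", ")),
        binds,
    ))
}
```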
Oh, but what about <feature>? Well, is that SQL, or a frankensteined version of SQL, aka a "dialect"?
SQL is the JavaScript of databases, and we'll be better for it once we admit this.
* A batch insert query for mysql
* A conditional where clause for postgresql that allows the user to dynamically specify the column + filter operation + values over e.g. a REST endpoint. Each column can appear zero, one or multiple times, with different or even the same operations.
* An `IN` expression with a list of values provided by a Rust `Vec<_>` (so dynamically sized) for sqlite.
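For that last case, a hedged sketch of what one typically ends up writing by hand (hypothetical `users` table): the placeholder list depends on the length of the `Vec`, so the final SQL text simply doesn't exist at compile time.

```rust
// Builds "SELECT * FROM users WHERE id IN (?, ?, ?)" for however many ids
// there are. (A real implementation must also handle the empty case, since
// "IN ()" is not valid SQL.)
fn select_users_in(ids: &[i64]) -> String {
    let placeholders = std::iter::repeat("?")
        .take(ids.len())
        .collect::<Vec<_>>()
        .join(", ");
    format!("SELECT * FROM users WHERE id IN ({placeholders})")
}
```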
I built my own back in the days when I worked for Prisma. None of these crates existed. It's not that complex if you limit the builder to queries you need in your application.
But, to be honest, jOOQ or LINQ would be awesome to get for Rust.
One thing I want beyond SQL, though, is a more object-oriented language. I.e. I want something SQL-like (or more specifically PRQL, which I find very nice), but without it being so tailored towards a flat data structure. I want to use it on JSON/etc.
[1] https://microsoft.github.io/rust-for-dotnet-devs/latest/linq... [2] https://microsoft.github.io/rust-for-dotnet-devs/latest/
It is already way more complete given its age than I would have expected.
I agree that the clr is a more natural target than, let's say, the JVM (at least until Project Valhalla comes out).
But I have to admit, this one bit almost made me stop reading:
> For those who don't know, SQL is the standard when it comes to interacting with databases.
I can scarcely imagine anyone who both 1. would want to use this and 2. doesn’t know what SQL is. So if I’m reading this, and you’re telling me what a database is, I assume you assume I’m an idiot. This makes me wonder if it has design choices to protect hypothetical Rust-using, DB-using idiots from themselves at the expense of making it hard to handle edge cases.
I’m sure you didn’t mean it that way, but that’s how it rubbed me. Maybe consider leaving out the extremely basic introductory bits.
Fwiw: i'm totally fine with that.
> This makes me wonder if it has design choices to ...
I personally felt that the relevant design choices were explicitly enumerated, and so one need not deduce them from the rest of the text.
I’m certainly not gonna lose sleep over it. Someone came on HN to show us their neat new project. I wanted to give them my first impression to help them craft their future messaging. If they disagree, fine with me! They don’t owe me anything.
While I like the idea of strongly typed queries, I've lately found that they sometimes unnecessarily slow me down during development, so I'm considering going back to good old prepared statements and binding values manually.
[0]: https://github.com/SeaQL/sea-query/blob/master/examples/sqlx...
This is nuts.
SQL is a high level language. It's higher level than python or rust. It's basically a declarative statement that's almost english-like and it's specifically DESIGNED to be more readable and easier to use by humans. It compiles down into many procedures that don't easily map to the SQL statement itself. You can't get any higher level than SQL.
The issue here is that the database sits at the bottleneck of web dev. It's where state mutation happens and it's usually the slowest part of the pipeline in a computer. Yet instead of having fine-grained low-level control over this part of the pipeline, we have a high level language on top of it. So if we want to optimize this part of the pipeline we have to HACK the query. We have to make the abstraction layer leaky in the API itself with EXPLAIN. We can't do direct optimizations because SQL is so high level. It is a bit of a problem, but the industry is so entrenched in SQL that it's actually 10x more efficient to just use it than to develop an API that's more appropriate for this level. SQL is tech debt we are sort of stuck with. The ideal API would be one that is both high level and allows fine-grained control... but we don't have one yet.
To use machine learning analogies: SQL is a local optimum. There's a much more optimal language somewhere in this space, but we are stuck in the local optimum and it's likely we will never find the actual optimal API.
In short SQL is the furthest thing from LLVM IR. It's crazy. You cannot treat it as the same thing. If you do there are huge problems.
The problem with rust-query, and with ORMs in general, is that the APIs for these libraries are themselves high level. They are HIGH level abstractions ON TOP of high level abstractions. You want to optimize a query now? Well, you need to hack the first high level abstraction in such a way that it hacks the second high level abstraction such that it produces optimized compiled procedures. That's the problem here.
All this ORM stuff is just programmer OCD. We don't want to do metaprogramming where we have another language living as a string in our web app. We want everything fully integrated, so we create an abstraction in an attempt to get rid of an abstraction that was itself intended to be an abstraction. It's aesthetics, and the aesthetics actually make life harder.
Can’t you? Wouldn’t higher level be a goal-oriented natural language like “what’s the most popular music file in the database”, with an AI agent just figuring out how to give you the answer?
AI is also unreliable. Place chatGPT over a database and it's going to start lying about what is in it eventually.
That being said, my statement likely isn't strictly true even when you account for ML. I'm sure there are even higher level languages that are not probability based and thus deterministic with expected results. But in terms of a software developer's average experience, SQL is the highest level you can get without getting into the minefield that is LLMs.
You can use NLPs to safely write SQL in more business language.
And of course there are GUIs.
That's trivially false.
I'm currently playing with sea-orm[0], and I can write things like:
let db = Database::connect(...).await?;
let yesterday: chrono::DateTime<chrono::Utc> = ...;
let foos: Vec<foos::Model> = Foos::find()
.filter(foos::Column::Status.eq(FooStatus::Frobbed))
.filter(foos::Column::UpdatedAt.gt(yesterday))
.left_join(Bars)
.also_select(Bars)
.all(&db)
.await?;
How is that not higher-level than SQL?

I mean, not really true. Datalog is relational, like SQL, but gives you a higher level rule-oriented mechanism for querying that can avoid a lot of the mechanics of explicitly joining things, for example.
It would be very nice to have an LLVM IR-style language for database queries, though. As a language, SQL is... well, a little behind the times. ORMs explore lots of interesting design choices, but as you mention, they do so by piling high-level decisions on top of something that is, in many respects, an even higher-level abstraction.
I wonder what an LLVM IR-like would look like in this space.
But SQL is definitely not something designed for computers to only write or compile another language into (which is what the OP is promoting).
I wonder if something like GlueSQL or the IndexedDB layer of most browsers could be used as such an API.
I’ve also been using Prisma for a project recently and I’ve been finding it quite good. It layers several useful features on top of sql, like schema management (checked in to git), code gen (incl typescript types), db connection management, serialization / deserialization from JavaScript objects to sql rows and back, and it lets you follow foreign keys. If you want, you can also easily just make raw sql queries. That works fine too.
SQL is a very old programming language, and it’s showing its age. The design is wacky - does GROUP BY go before or after ORDER BY in the query? Every command has custom, seemingly random words which are actually parameters. They are placed at totally arbitrary, hard to remember spots in the command. And what, we make our program output a string that gets immediately parsed by the SQL server? What a horrible, stupid way to do RPC. It also gives terrible error messages if you get it wrong. And it’s awful to learn and debug.
All of that is a huge pity, because modern databases can do so, so much. Doing logic inside the database can often be an order of magnitude faster than doing it in client code. SQL is a fantastic way to represent your data.
One path for fixing this would be to move the sql query parser inside the client library. Then internally, have the client library send optimized RPC instructions to the database. This would take cpu load off the database (usually a good move - databases are harder to scale). And it would open the door for the client library to provide other better, lighter and faster ways to programmatically construct database queries. Ideally without the totally unnecessary loop of constructing then parsing sql.
The problem here is that these low-level optimizations the database is doing are freaking complicated. There's a lot of intelligence going on in the background and it's different depending on the database. We can solve it with a common low-level IR, but that IR will be extremely complex. And that IR will likely give different performance profiles for the same code on different databases.
I can see this being solved for one database. But not across all databases with a common API. We sort of do have a common API, and we paid for it with the trade-off of it being an easier, high-level language on top of the part of the web that most needs optimizing.
Wouldn't that basically just be a query plan? The language is the easy part. Determining the optimal query plan based on rough statistics about the data and the DBs ability to go about that is the hard part.
There are more general purpose query optimization libs out there, like Calcite
Database client libraries are already database-specific. Why boil the ocean? We could start with one database with this very easily until we figured out a good IR for queries.
It could even be database-version specific if you wanted, through the use of wasm. Make the database client library embed a wasm VM. Then on startup, the database sends any connected client application the latest query optimizer / whatever logic in a compiled wasm bundle. This would allow the database to make improvements to the IR with successive versions without needing to update all the client libraries out there.
Erm, that’s already true. Proper clients will be parsing SQL locally and sending representations of the query, and its parameters separately. They may often be using a wire-format that’s very close to textual SQL, but parsing that SQL is hardly the slow part of executing a query.
DB CPU load doesn’t come from parsing SQL, it comes from the query planning process, and the actual work of reading and filtering data. The planning process alone is extremely complex as the DB will be using all manner of statistical data collected about the type of data stored, exactly how it’s stored, and how its various columns do or don’t correlate with each other, in order to estimate the best possible way of constructing the query plan. A process that factors in additional elements like the relative speeds of reading from disk, vs memory, and CPU cache, and how that varies with working set size.
In addition, the real CPU load comes from the actual execution of that query. Streaming data off disks into memory, building various temporary in-memory data structures to accelerate joins and filters: there’s a lot of bytes that need to be shuffled around, and modern DB codebases are now optimising for the number of CPU cycles needed to operate tight inner loops. None of this work can be moved to the client, not unless you have a mechanism for streaming GB/s of data to your client on every query.
To think of SQL as just an RPC protocol completely misses the point of SQL, and the incredible engineering involved in modern query planners. There are many reasons to replace SQL with something better, but the idea that it’s an inefficient RPC protocol, and that clients should have more fine-grained control of the planning and execution of queries, really isn’t one of them.
But SQL is used as an RPC protocol & wire format for getting data in and out of the database. Basically all web apps built on top of a SQL database craft SQL strings for SELECT and UPDATE commands. The queries are very repetitive - since they're crafted once and run with each web request. I'm sure there's a lot of unnecessary work related to doing that over and over again constantly - and if that work isn't parsing strings, it's generating the query plan. The format is also really awkward to use. SQL doesn't play very nice with most modern languages - since you generally either write the SQL string by hand, and then the types are opaque to the language, or you get an ORM to do it - and then you can't easily customize the SQL string itself.
It's complicated, but one approach to solving this would be to bundle up all that query planning code into a wasm blob and send it to database clients when they connect. The blob would contain any complex statistical code that the database uses. This would allow applications to create and save a query plan from a query string, and reuse it with subsequent queries. The wasm bundle would also contain all the binary serialization & deserialization code. Databases could then continue to evolve their query planners as they do today - but performance would be much better. And it would open the door to richer, lower level APIs for applications to customize & understand the query plans. And, I'm sure, figure out what types will be expected and returned from any given query.
You can already do this. They’re called prepared queries. The application sends the query once, and asks the DB to create a prepared query, and every time the application wants to reuse it, it just needs to pass the query parameters. The DB reuses the already existing prepared query, and often the query plan is reused as well. But that depends on the exact DB implementation.
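A hedged sketch of that with tokio-postgres (table and column names made up): the SQL text is parsed and planned once via `prepare`, and only the parameters travel on later executions; in a real app you'd hold on to the statement (or a pool wrapper's statement cache) and reuse it.

```rust
use tokio_postgres::{Client, Error, Row};

async fn find_by_status(client: &Client, status: &str) -> Result<Vec<Row>, Error> {
    // Sent to the server once; returns a handle the server can reuse.
    let stmt = client
        .prepare("SELECT id, title FROM foos WHERE status = $1")
        .await?;
    // Only the parameter values are sent from here on.
    client.query(&stmt, &[&status]).await
}
```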
You also can’t just reuse the same query plan over and over again. The whole point of the query planning process is to optimise the query plan to the current state of the data on disk. As that data is appended to and updated, the optimal query plan may change.
It’s also possible for applications to know exactly what types a query is going to return. That information is already returned by the database as part of the query response; it has to be, for the client to correctly deserialise the returned data. Most DBs use a binary protocol for most of their clients (but also support a pure text-based protocol), and deserialising the binary response requires information about the response types.
I think you’re drastically underestimating just how complex and magical the inside of a real SQL database is. They represent literally decades of engineering and research into the best ways of storing and querying databases. If anything close to what you’re suggesting was a good idea, someone would have tried it.
I was with you up until the last sentence. I almost certainly am underestimating how magic the internals of databases are, and how much work has gone into them. But there is a mountain of good ideas in CS that simply haven't been tried because of convenience and momentum.
For example, seL4's microkernel architecture is brilliant - but it'll probably never make it into Linux because of simple inertia. (It's too hard to rearchitect Linux like that, and nobody really wants to port all their working software to L4.)
One reason that this approach to database design probably hasn't been tried before is that there are dozens of database client driver implementations. Moving any logic into the DB client would require all of that logic to be reimplemented a lot of times, and that's a lot of inconvenient work. But WebAssembly changes that calculus - since the database itself can hand a wasm blob to the client to run. That's a very new capability that was really hard to pull off a few years ago. (FoundationDB does something similar - the client library is a native library that needs bindings on every language / platform to work - but it's incredibly inconvenient to use as a result. And a native Apple Silicon version of the library took years to come out, even though it's maintained by Apple.)
Or it could be that the added complexity simply doesn’t provide any benefits. What benefits would splitting up the query planner between the DB and its clients provide? Computing a query plan isn’t the expensive part of executing a query.
You’re suggesting that it’s a good idea for databases to expose their core internal APIs, which are tightly coupled to their underlying storage subsystems, to some kind of magic query planner that runs in the client. Basically ensuring that you can’t ever modify those APIs without breaking every existing client out there. Effectively you would force every DB to require their clients to be updated in perfect lock-step with the DB itself. All to achieve what? A saving of a few milliseconds on un-prepared queries?
Simply put, there are much more sensible ways of reducing the cost of query parsing and planning. As it happens, every DB worth talking about already implements those optimisations in the form of prepared queries. What benefit would moving where the query planner is executed provide?
I would also highlight that if you go an look at DBs built from the ground up for massive scale and parallelisation, they still run their query planners on a single node to figure out how best to break up a query for distributed execution. Nobody is attempting to distribute the actual query planning part of a DB.
Strings are fine in theory, but the trouble with SQL is that, while too high-level in all the wrong places, it is also too low-level in all the wrong places. Developers want to be able to do things like define the shape of the data beyond tables and compose queries, which SQL fails hard at. In order to accomplish what developers need, you either end up extending SQL into a new language, or you can use the programming constructs you already have, treating SQL as a compiler target, where necessary. The latter is considerably easier to implement.
Having the wisdom to craft a high quality schema can be the biggest unlock in the whole enterprise. If the tables, columns & relations are ~1:1 with the actual business (i.e., you went and talked to real, live stakeholders at design time), then it should be deeply intuitive for a non-technical domain expert to interact with.
Often, you don't get a chance to redo the whole thing, but that doesn't mean you can't still design an ideal schema and then shim it with views, CTEs, replicas, ETL, etc.
SQL is a domain specific language. It would be like complaining about how horrible it is to write Lua scripts. The criticism is baseless without understanding the underlying schema/bindings.
But we aren't just talking about SQL the high level language that you might execute when you are poking the data live. In a web app dynamically constructing sql, we have multiple issues:
1. Deserialization: we need to know what the query returns and plug it into data structures we want to use in our program
2. SQL is dynamically typed, and if you construct queries dynamically you're even more dynamically up a creek. Maybe even "sql injection vulnerability" up a creek.
1 and 2 are solved by strongly typed query builders like this: the type system has a proof that the ways you will construct queries in your program actually result in valid sql queries and that they return the right data shape.
It's actually solving a real problem
If you're building on SQLite3 then you're going to have to do run-time type checks anyways (even if you somehow disallow access to the DB that bypasses your application). So maybe don't build on SQLite3.
This is a shame. We really do need a SQLite3-like DB that is statically typed.
Now, if we could replace SQL with a purpose-built language, that I would be interested in. I have to concede that SQL doesn't have great ergonomics (e.g. if FROM was first then auto-complete would be better, trailing commas would eliminate much formatting bike shedding, why are INSERT and UPDATE so different).
It has categorically failed at this task. That is why we have an entire ecosystem of tools to allow people to query and interact with databases without using SQL.
Developers need to understand that SQL is an abstraction for developers, analysts, data engineers etc not end users.
And only basic SQL statements are English like.
The main trouble with SQL is that the query parameters are mixed in with the query, which results in string escaping problems. That dates from when people were expected to query a database by hand in SQL, which is sometimes useful.
For machine generated use, SQL queries should consist of a well defined fill-in-the-blanks system, with a text SQL query and a machine data structure. Something like this:
Query in SQL text as a constant string:
"SELECT foo FROM BAR WHERE name=NAME1"
Parameters: key/value pairs in a safe native format for the language in use. {"NAME1" : "Smith"}
That gets you out of the curse of character escaping and SQL injection, the only real problem. It avoids trying to rewrite SQL in yet another ORM.

Microsoft has something which works this way, but it's unnecessarily clunky.[1]
[1] https://learn.microsoft.com/en-us/sql/relational-databases/n...
fill_in_sql!(query_string, query_params)
Where query_params is a structure containing the parameters:

    struct SomeQuery<'a> {
        name: &'a str,
        id: u32,
    }
The macro would parse the query string, find the variables to be filled in, match them to fields in the "params" structure, and generate the code to safely escape and fill in the params.
Any matching failures are found at compile time.

This is similar to what the usual "derive" macros do - examine a data structure at compile time and generate appropriate code.
val foos = sql"SELECT * from foos WHERE status = 'frobbed'".as[List[Foo]]
... and have the macro system parse the query at compile time and verify that the query will indeed return a `List[Foo]`.

I think this is a nice middle ground, but constructing dynamic queries often can't be checked at compile-time.
(I'm probably getting the syntax somewhat wrong; I haven't done Scala in quite a few years at this point.)
Personally I would much rather deal with a database table as a collection of objects with various fields in them. In a language with a reasonable collections library, I want to be operating on that data using functional combinators like filter, map, group_by, sort/order_by, etc. That feels much more natural to me than writing out stringly-typed SQL by hand. This is sorta an ORM, but not quite: it's somewhere in between writing raw queries, and using a full-fledged (usually inefficient) ORM.
The downside, of course, is that the performance of my queries is at the mercy of whatever the query generator underneath it is doing. But the nice thing is that if I need to optimize a query, I can always bypass the generator and write it by hand. But I wouldn't do that until I've measured things; no reason to prematurely optimize.
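A rough in-memory illustration of that style (a made-up `Story` type, plain iterators rather than any real query generator), just to show the filter/map/sort vocabulary a builder would eventually translate into SQL:

```rust
#[derive(Clone)]
struct Story {
    author: String,
    title: String,
    stars: u32,
}

// Roughly "SELECT title FROM stories WHERE author = ? ORDER BY stars DESC",
// expressed as collection combinators.
fn top_titles_by(stories: &[Story], author: &str) -> Vec<String> {
    let mut matching: Vec<&Story> = stories
        .iter()
        .filter(|s| s.author == author)
        .collect();
    matching.sort_by(|a, b| b.stars.cmp(&a.stars));
    matching.into_iter().map(|s| s.title.clone()).collect()
}
```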
Erm, that’s exactly how modern DB and DB clients work. The client passes the SQL query with placeholders for parameters, and then sends the parameters separately. It’s also a core part of how prepared SQL queries work.
Every SQL client in any worthwhile language implements this pattern of passing a SQL query and its parameters as separate data structures, and they have done for decades. SQL injection only happens when people don’t use their SQL client properly, and attempt to compose SQL queries using naïve string templating. Effectively doing an end run around all the protections DB and clients provide by default.
What kind of fine-grained control are you talking about here? I never felt like I was missing anything when doing even complex SQL (unless I am forced to use a crappy MySQL, whose problems are not inherent to SQL itself).
Quite the contrary, actually: despite having this amazing high level of abstraction, most developers are ignorant or don't care to properly index the data, add integrity constraints, manage the locks or think about transactions.
In 99% of the non-SQL code that I see, developers don't even go further than nested loops containing IOs, so I don't think that letting them decide which tree/trie is appropriate for a use case (for example) would help in any way.
Real world SQL from the analytics, DE, BIE side of things is _extremely far_ from readable. It's a form of actual hell.
> This is nuts.
> SQL is a high level language. [...]
It's... not completely nuts. It's the reason that LINQ exists, and that's not bad. The main reason this approach is good is that too many programmers accidentally create SQL injection vulnerabilities. Another way to prevent SQL injection vulnerabilities would be to disallow all literal values in SQL queries (making them all parameterized, though there could be compile-time and run-time parameters).
But beyond the SQL injection issue I think this would be... not quite nuts, but IMO unwise, because now I'd have N relational query languages instead of 1 (SQL). Sure, SQL means N variations due to differences among N RDBMSes, but those are mostly much smaller than the variations one would expect across different SQL-free query libraries.
One thing is clear: this sort of scheme does not make it any easier to reason about what the query engine does under the covers since expressing a query as an AST does not preclude the query engine performing algebra on your queries to optimize (or pessimize) them.
This might be improved to insert in batches in the future without changing the API.
But you are right, it’s not something you necessarily have to worry about now. It just means it doesn’t work for our use case.
Just figured I should point it out.
The closest we've come so far to bridging this gap in a strictly typed language like Rust is SQLx, which creates a struct based on the database types returned by a query. This is validated at compile time against a database, which is good, but of course there is no guarantee that the production database will have the same types. The easiest mistake to make is to design a query against your local Postgres v15 and hit a runtime error in production running Postgres v12, e.g. because a function like gen_random_uuid() doesn't exist. Another is to assume a migration in production was actually executed.
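For concreteness, a hedged sketch of that SQLx pattern (hypothetical `articles` table and `Article` struct; assumes the Postgres and macros features plus a reachable `DATABASE_URL` or offline metadata at build time). The macro checks the SQL and column types against whatever database the build machine sees, which is exactly where the dev-vs-production drift described above can bite:

```rust
use sqlx::PgPool;

#[derive(Debug)]
struct Article {
    id: i64,
    title: String,
}

async fn latest_articles(pool: &PgPool) -> Result<Vec<Article>, sqlx::Error> {
    // Verified at compile time against the database the build sees,
    // not against whatever production is actually running.
    sqlx::query_as!(
        Article,
        "SELECT id, title FROM articles ORDER BY id DESC LIMIT 10"
    )
    .fetch_all(pool)
    .await
}
```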
In duck-typed languages like Ruby, the application objects are directly created from the database at runtime. They are as accurate as possible, since the schema is directly read at application startup. Then of course you see developers do something like:
if respond_to?(:column_x)
# do something with column_x
end
To summarize, I think application-defined schemas provide a false sense of security and add another layer of work for the engineer.

I guess it would be good if it verified things at runtime somehow, though. E.g. when you first connect to the database, it checks that PostgreSQL is at least the minimum required version and that the tables match what was used at compile time.
The workaround in practice seems to be to keep the DB behind a server that always™ uses a compatible schema and exposes an API that's either properly versioned or at least safe for slightly older clients. To be fair it's hard to get rid of the middleman and serve straight from the DB, it's always deemed too scary for many reasons, so it's not that bad.
Could be as simple as securely hashing the old hash with the text of the DDL command appended.
That would mean two databases can be identical, structure-wise, but have different hashes (for example if tables are created in a different order), but would that matter in practice?
Alternatively, they can keep a hash for every table, index, constraint, etc. and XOR them to get the database hash.
And yes, that would shut off other clients that want to be sure to talk to the correct schema, but if you opt in to such a scheme, that’s what you want.
Furthermore, at the start of every transaction, rust-query will check that the `schema_version` (sqlite pragma) did not change. (source: I am the author)
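For readers unfamiliar with that pragma, a hedged sketch of what such a check can look like in plain rusqlite (not rust-query's actual code): SQLite bumps `schema_version` on every DDL change, so comparing it against a previously recorded value is a cheap way to notice out-of-band schema edits.

```rust
use rusqlite::Connection;

fn schema_unchanged(conn: &Connection, expected: i64) -> rusqlite::Result<bool> {
    // PRAGMA schema_version returns a single integer row.
    let current: i64 = conn.query_row("PRAGMA schema_version", [], |row| row.get(0))?;
    Ok(current == expected)
}
```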
IMO - this sounds like "tell me you've never operated a real production system before without telling me you've never operated a real production system before."
Shit happens in real life. Even if you have a great deployment pipeline, at some point, you'll need to add a missing index in production fast because a wave of users came in and revealed a shit query. Or your on-call DBA will need to modify a table over the weekend from i32 -> i64 because you ran out of primary key values, and you can't spend the time updating all your code. (In Rust this is dicier, of course, but with something like Python it shouldn't cause issues in general.) Or you'll just need to run some operation out of band -- that is, not relying on a migration -- because that's what makes sense. A great example is using something like pt-osc[0] to create a temporary table copy and add temporary triggers to an existing table in order to do a zero-downtime copy.
Or maybe you just need to drop and recreate an index because it got corrupted. Shit happens!
Anyway, I really wouldn't recommend a design that relies on your database always agreeing with your codebase 100% of the time. What you should strive for is your codebase being compatible with the database 100% of the time -- that means new columns get added with a default value (or NULL) so inserts work, you don't drop or rename columns or tables without a strict deprecation process (i.e. a rename is really add in db -> add writes to code -> backfill values in db -> remove from code -> remove from db), etc...
But fundamentally panicking because a table has an extra column is crazy. How else would you add a column to a running production system?
[0] https://docs.percona.com/percona-toolkit/pt-online-schema-ch...
This sounds more like a CI/CD and process issue.
There is no reason why adding a new index in code and deploying it into Production should be more complex or error prone than modifying it on the database itself.
Companies should be focused on solving that problem first before doing insanely short-sighted workarounds like skipping pushing to Git and code reviews.
When I was at AWS (RDS) our end-to-end production deployment process was 7 days. We were also pulling $25million/day or so in profit. I'm sure that number is much higher now.
There's a large difference between what the theoretical "right" thing is from a textbook perspective, and what successful engineering teams do in reality.
edit: besides, it doesn't even make sense in this context. I have 100 servers talking to the database. I need to create an index, ok, add it to the code. Deploy to server 1. Server 1 adds the index as part of the migration process, and let's say it's instant-ish (not realistic but whatever). Do the other 99 servers now panic because there's an unexpected index on the table?
There are no technical reasons why it can't be done. Only process and will.
And of course I don't know which FAANGs you worked at, but I know folks at FAANGs who have complained to me about CI and deployment times. Hell, these are huge companies; while they try to harmonize tooling, deployment times (especially when test suites of varying quality are involved) can vary a lot across a company. I wouldn't be surprised if there were people at the companies you worked at that were upset with deployment times, even if the teams you worked on were in good shape.
Honestly, when someone suggests something like you've suggested (that everyone should be able to get their deployment times to under a minute), I really do wonder if they're intentionally arguing in bad faith or are trolling. I know for a fact that things are not that rosy, and are rarely that rosy, even at the companies you claim to have worked at, and it's hard to believe that anyone could genuinely think that this is a broadly-attainable target. That doesn't mean that no one can do it, but that does mean that designing tooling that assumes everyone can do it is... well, just kinda naive and not very useful.
Now you may say I'm just trolling but option (1) seems better to me for the long-term health of the project/company. And I don't believe it's correct to say it is an unrealistic goal.
You have a very different experience to the rest of us, in that case.
The big AWS services all had deployments measured in days - or even weeks (depending on how many regions they are deployed across). Facebook's monorepo took upwards of an hour just to get a PR through the merge queue. Both were notorious for "hand-jamming" critical fixes directly to production.
And in this case if it's an emergency hot fix then it's still better to do this through a managed, tracked, tested pipeline.
I will never claim that we were great at managing databases at Twilio, but often a schema change would take hours, days, or even a week or two to complete. We're talking about tables with hundreds of millions of rows, or more.
We'd start the change on a DB replica. When it would finish, we would have to wait for the replica to catch up with the primary. Then we would bring up new replicas, replicating from the replica with the new schema. Finally that replica would get promoted to primary, with all the old replicas (and the old primary, of course) removed from service, and the new replicas brought in.
Only then could we deploy code that was aware of and used the updated schema. The previous code of course had to ignore unknown columns, and if we ever wanted to drop a column, we had to first deploy code that would stop using that column. Any column type changes would need to be backwards-compatible. If that wasn't possible, we'd have to add a new column and backfill it. Adding indexes would usually be fine without preparatory code changes, but if we wanted to drop an index we'd first have to make sure there were no queries still depending on it.
Even for a "small" schema change that "only" took minutes or a few tens of seconds to complete, we'd still have to use this process. What, do you think we'd shut part or all of a real-time communications platform down while we do a schema change? Of course not.
The idea that the application could or should be in control of this process, or could always be in sync with the database when it came to its understanding of the schema, is impossibly unrealistic.
> The idea that the application could or should be in control of this process, or could always be in sync with the database when it came to its understanding of the schema, is impossibly unrealistic.
These days my attitude is to treat databases as a completely separate service from the application code, which they effectively are. They're on a different set of servers, and the interface they provide is the columns/tables/views/etc, accessed through SQL. So yeah, no breaking changes, and the only thing application code should care about is if the queries it tries to execute return the expected sets of data, not if the schema itself matches. And certainly not about things like views, triggers or indexes.
This does end up being more overhead than migrations alongside the application code, which I know a lot of developers prefer because they're easier to use, but the approach just doesn't work after a certain scale.
(to be clear, I still use Liquibase etc to manage migrations, the process for applying those changes is just completely separate from deploying application code.)
I've never thought of it this way, but I think this is really smart. If I have a service that exposes a REST API, I can, say, add a new field to a JSON object that's returned from an API endpoint without telling clients about it. Those clients can update later in order to take advantage of the information returned in the new field.
Same thing with a database: I can add a new column, and clients can learn about the new column later in the future, no problem. The database schema is just a part of the database's API, and it can be evolved in a backwards-compatible manner just like any other API.
> to be clear, I still use Liquibase etc to manage migrations, the process for applying those changes is just completely separate from deploying application code.
Right, the schema needs to be managed and there needs to be a source of truth for it, with tooling to do migrations, but coupling that so closely with the application so the schema and application always must be in sync (like some others seem to think is the One True Way) is a mistake, and would be a complete non-starter for my past professional needs.
https://capnproto.org/faq.html#how-do-i-make-a-field-require...
... and then don't know about a lot of very fundamental, important parts, and are blissfully unaware of that, too.
And to be clear I'm not saying they don't remember the exact details of something.
What I mean is that they don't even know there are things they have to look up, nor do they have any experience or willingness to understand what they did wrong by consulting the official documentation, instead of just randomly googling and trying out "solutions" until one happens to seem to work.
The most common example would be having so little understanding of transactions that they believe transactions just magically fix all race conditions, and then being very surprised that they don't. Or believing that transactions in SQL are fundamentally broken after realizing that somehow their databases got corrupted.
And again, I don't mean junior devs, but people with 10+ years of backend or "fullstack" experience, i.e. people who at the very least should know when to consult the documentation and look up which protections transactions actually provide.
I have seen, more than once, the end state of a situation where people started out believing SQL transactions magically fix everything, then got "corrupted" data, then blamed SQL for being broken and moved to NoSQL.
The joke here is that all the concurrency problems are very fundamental and independent of SQL vs. NoSQL.
And SQL often gives you more powerful, easier-to-use (at small scale) tools to enforce synchronization, but at a cost. NoSQL often gives you harder-to-use primitives where you have to do much more outside of the database to guarantee correctness, but then at least you will more likely blame your code instead of the db for things not working.
The most ironic thing here is that I'm not a db expert. I just know where my knowledge stops and where I can look up the missing parts, and I can't even give you many tips about huge dbs in production; luckily, surprisingly many companies have comparatively "small" db needs.
And honestly, while I see race-condition-related issues in SQL quite often, I'm rarely not seeing them in NoSQL code. Where these issues in SQL make me sad because they are often very avoidable, in NoSQL I often feel like giving up in resignation.
Though that experience is with "smallish" databases, not Twilio scale. But a surprisingly large number of companies have surprisingly "smallish" databases. Like, no joke, I have seen companies be very vocal about their "huge database" and then you realize it's like 5GiB ;=)
Honestly, I (metaphorically speaking) don't even want to know how db experts feel about this. I'm not a db expert; I just have a solid enough foundation to know where my knowledge stops and when I have to look things up (which is all the time, because I'm not writing that much SQL).
I can't see how this would even work for trivial, quick, on-line schema changes. Let's say I have 10 servers running the same service that talks to the database (that is, the service fronting the database is scaled out horizontally). How would I do a migration? Obviously I can't deploy new code to all 10 servers simultaneously that will do the schema migration; only one server can run the migration. So one server runs the migration, and... what, the other 9 servers immediately panic because their idea of the schema is out of date?
Or I deploy code to all 10 servers but somehow designate that only one of them will actually do the schema migration. Well, now the other 9 servers are expecting the new schema, and will panic before that 1 server can finish doing the migration.
It seems to me that rust-query is only suitable for applications where you have to schedule downtime in order to do schema changes. That's just unacceptable for any business I've worked at.
Basically no free lunch!
That said, for zero downtime migrations there are a number of techniques, but it typically boils down to splitting the migration into two steps where each step is rolled out to each server before starting the next: https://teamplify.com/blog/zero-downtime-DB-migrations/ https://johnnymetz.com/posts/multistep-database-changes/ etc
I'm not sure if there's anything that automates this, but it'd probably need to involve the infrastructure layer (like terraform) too.
Edit: There's one other approach I've heard of for zero downtime deployments:
Start running the new version in new instances/services parallel to the old version, but pause it before doing any database stuff. Drain client connections to the old version and queue them. Once drained, stop the old version, perform database migrations, and start the new version, then start consuming the queue.
This is (I think) more general but you could get client timeouts or need to kill long requests to the old version, and requires coordination between infrastructure (load balancer?) and software versions.
I haven't tried it yet, so I might have to eat my words later, but- great job! It's going to save tons of effort.
You’ll have some Pods on the old application version while you do your gradual upgrade.
How do you envision rolling upgrades working here?
The quintessential example is adding a column. If you want to deploy with zero downtime, you have to square with the reality that a database schema change and deployment of application code is not an atomic operation. One must happen before the other. Particularly when you deal with fleets of servers with blue/green deploys where server 1 gets deployed at t=0 minutes but server N doesn't get deployed until t=60 minutes. Your application code will straight up fail if it tries to insert into a column that doesn't exist, so it's necessary to change the database first. This normally means adding a column that's either nullable or has a default value, to allow the application to function as normal, without knowing the column exists.
So in a way, yes, the application is still the authority, but it's an authority on the interface it expects from the database. It can define which columns should exist, but not which columns should not exist.
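A tiny sketch of the two-phase shape of that column-adding example (hypothetical table and column, Postgres-flavoured DDL):

```rust
// Phase 1 ships before any application code that writes the column; old and
// new code can both run against it because the column is nullable.
const PHASE_1: &str = "ALTER TABLE users ADD COLUMN nickname TEXT NULL";

// Phase 2 only runs after every server is on code that always writes the
// column (backfill first, then tighten the constraint).
const PHASE_2: &str = "ALTER TABLE users ALTER COLUMN nickname SET NOT NULL";
```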
Your argument that the application should be the authority on the interface it expects from the database makes a lot of sense. I will consider changing the schema check to be more flexible as part of support for zero-downtime migrations.
Consider even a very simple case: let's say I have a database with two (identical) application servers talking to it (that is, I've horizontally scaled my application due to load and availability requirements). If I need to do a schema change, and the application needs to be in charge of schema, how would that even work? If I deploy the change & migration to one of the two servers, once the migration is complete, the second server will freak out because the schema doesn't match its "authoritative" view anymore. If I deploy the change to both servers at the same time, and somehow designate one of them to actually run the migration, then the other one will immediately panic on startup because the first server hasn't completed the migration yet.
Not to mention this setup breaks red/black deployments: even in the case where I only have one application server in front of the database, how do I bring up a new deployment, allow both servers to run while the new one is passing health checks, and then bring down the original server? They would both have different "authoritative" views of what the schema should be.
This also completely breaks the ability to roll back, at least without also rolling the schema back too. That's risky; I don't want to have my service rollback depend on schema rollback also working properly.
This kind of "application is authoritative about schema" only works when you can schedule downtime to do schema changes. That would be a non-starter for any company I've worked at.
Almost all migrations are idempotent and backwards-compatible, and almost all of the rest of them can be made to be by splitting into multiple changes and waiting for old versions of the application to be shut down.
After all, the DB is another system, and its state can be different to what you expected.
At compile time we have a best guess. Unless there was a way to tell the DB what version of the schema we think it has, it could always be wrong.
One could have the backend fetch DB schema/version info at startup, compare it to its own view of what the schema should look like, and fail if the two disagree. That way, a new deployment would fail before being activated, instead of being deployed successfully and queries failing down the line.
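A hedged sketch of such a startup check (the `schema_migrations` table and the expected-version constant are made up; any migration tool's bookkeeping table would do): compare what the binary was built for against what the database reports, and refuse to start on a mismatch rather than failing on some later query.

```rust
use tokio_postgres::Client;

const EXPECTED_SCHEMA_VERSION: i64 = 42; // what this build of the backend expects

async fn assert_schema(client: &Client) -> anyhow::Result<()> {
    let row = client
        .query_one("SELECT max(version) FROM schema_migrations", &[])
        .await?;
    // NULL if no migrations were ever applied.
    let actual: Option<i64> = row.get(0);
    anyhow::ensure!(
        actual == Some(EXPECTED_SCHEMA_VERSION),
        "schema version {:?} does not match expected {}",
        actual,
        EXPECTED_SCHEMA_VERSION
    );
    Ok(())
}
```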
What you describe is db-first. This rust library is code-first. In code-first, code is responsible for generating ddl statements using what is called a "migration" where the library detects changes to code and applies them to the schema.
Either way is not great. I think the best bet is to have:
- an external RDBMS
- a compiler from the RDBMS schema to host languages/libraries
- run-time validation that the schema has not changed backwards-incompatibly
When does run-time validation take place? When you compile a query: a) the RDBMS will fail if the schema has changed in certain backwards-incompatible ways (e.g., tables or columns dropped or renamed, etc.), b) the library has to check that the types of the resulting rows' columns match expectations.

Tables, joins, etc. should all be representable within the language's type system and standard libraries. Connectivity to a database should be handled by plugins/implementations of storage engines. If no storage engine is used, the relational data just lives in RAM like any other variable, collection, etc.
It's kind of the opposite of an ORM. An ORM tries to bolt a relational database into an OOP or other language model. This wraps the language around relational database concepts, fixing the impedance mismatch in the language rather than with a translation layer.
It also means one does not have to reinvent relational concepts (badly) in every application with a complex data model even if that application is not backed by a database.
Although I partially agree with the high-level idea of this statement (let's accept reality; no mature database is going to support your new shiny query language anytime soon), SQL is not the most convenient language to be written by a code generator. Actually, it is a sort of nightmare: a seemingly simple plan optimization can completely change the layout of the query, so there's a decent chance you'll end up writing a full query-rewrite stage just as a workaround for a specific query engine. Google's SQL pipe proposal seems to make this a bit better, but it has the same problem as new query languages unless it's broadly adopted.
I mean, why reinvent the wheel? We live in a time where the DB itself is a REST API, with PostgREST.
For those who don't know, with PostgREST you write all your application logic in SQL, and you export schemas (which are a subset of your actual schemas) with a RESTful API automatically generated by PostgREST.
Meanwhile PostgREST is built on top of PostgreSQL, which is statically typed, and is written in Haskell, which is also statically typed.
I saw some replies by the diesel maintainer about how the creator of this Rust-Query might not have really understood in-depth how diesel worked and wanted to write their own solution, and there's nothing wrong with that, of course, but this thread might be some good context for others asking themselves similar questions.
a) no id types, e.g. for User, just a string; b) uses i64 for “stars”; c) any other of a million things…
I quickly conclude it’s likely a cute toy by someone who wanted to write a thing, not ever likely to be a battle-hardened effort brought forth by a high performance / scale database tortured soul. And I move on.
* Running your migrations in a build.rs file against a test database
* Running `diesel print-schema --locked-schema` afterwards to make sure that the provided `schema.rs` file matches the state of the database
* Use the `embed_migrations!` macro to embed and run the migrations on application startup
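A minimal sketch of that last step, assuming diesel_migrations 2.x and a SQLite backend (the path and error handling are illustrative):

```rust
use diesel::sqlite::SqliteConnection;
use diesel_migrations::{embed_migrations, EmbeddedMigrations, MigrationHarness};

// Compiles the contents of the ./migrations directory into the binary.
pub const MIGRATIONS: EmbeddedMigrations = embed_migrations!("migrations");

fn run_app_migrations(conn: &mut SqliteConnection) {
    // Applies any migrations the database hasn't seen yet, on startup.
    conn.run_pending_migrations(MIGRATIONS)
        .expect("running embedded migrations failed");
}
```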
What rust-query does is just the "same" as the steps outlined above. Arguably it does it all in one step, but that has the disadvantage that it forces rust-query to always have full control over the database. Other than that, there is no verification happening at any point that the schema actually matches what's declared; they just apply migrations as well and reasonably assume that the database will match the declared state. This does not guard against cases where you, for example, manually modify the schema later.
For diesel we want to make this more robust at some point by providing an actual check function as part of the `schema.rs` files that allows verifying that the declared schema matches the actual database state. That could then be called at different points, depending on the user's requirements. If you are interested in such a feature, I suggest reaching out to us in the diesel support channels.
1. Write initial db migrations
2. Start a test db, apply the migrations, generate schema.rs via the CLI tool
3. Write code based on schema.rs
4. Write new migrations
5. Test the changes on test db instance, re-generate schema.rs via the CLI tool
6. Update code based on the new schema.rs
When I first looked at Diesel I thought the safety was built into the code itself, but if it comes from out-of-code procedures using the CLI and a real database that makes a lot more sense. I don't think I got a good picture of how all these tools and libraries came together from the Diesel docs themselves. So, in this sense it seems very similar to SQLx.
My take is that this is a "database-first" approach vs rust-query's "code-first" approach to safety. I think the benefits of a code-first design are that you don't need any additional CLI tooling, manual procedures outside of code, to test migrations, or a running database at any point in development.
I guess from some perspective, writing raw SQL with tests that run against a real database is perfectly safe too. I think it's a spectrum of how database-first or code-first the database layer is.
> that has the disadvantage that it forces rust-query to always have the full control over the database
I think this goes for any database layer, doesn't it? If you write migrations for Diesel and then someone goes and makes manual changes to the production database you're similarly out of luck.
The rust-query author also mentions checksumming, but I think at the end of the day people always have the ability to go in and hollow out assumptions - but IMO the code-first approach which discourages any manual database interaction draws a clear line.
It's impossible to write down the expected workflow, because that heavily depends on your requirements. Overall, diesel provides certain functionality that can be combined in different ways to build different kinds of workflows.
For example: if you have a database that is controlled by someone else, you won't want to use any migration functionality at all; you would only want to use `diesel print-schema` to generate the `schema.rs` file for you. Similarly, other workflows combining any of the existing parts are possible.
> So, in this sense it seems very similar to SQLx.
There is an important difference here: diesel provides a separate CLI to generate Rust code for you, instead of connecting to a database from the "compiler" (or reading files). That's really important, because you don't end up with non-deterministic proc-macros, which really don't play well with the Rust compiler (and are officially at least in a grey zone in terms of support).
> My take is that this is a "database-first" approach vs rust-query's "code-first" approach to safety. I think the benefits of a code-first design are that you don't need any additional CLI tooling, manual procedures outside of code, to test migrations, or a running database at any point in development.
This brings me back to the non-standard-workflow point raised before. You can also have a code-first approach with diesel. The CLI tool supports generating the migrations for you from a given `schema.rs` file and an up-and-running database, so in that case you would basically write the `schema.rs` file and the tool generates the SQL to move the database to that `schema.rs` state.
> > that has the disadvantage that it forces rust-query to always have the full control over the database
>
> I think this goes for any database layer, doesn't it? If you write migrations for Diesel and then someone goes and makes manual changes to the production database you're similarly out of luck.
My point here is more: rust-query really forces you to have control over the database. Diesel is totally fine with not being able to control migrations or whatever. To run code you only need to provide a schema.rs file, which might be generated by running migrations, might be hand-written, or might be generated from an existing stable database.
> The rust-query author also mentions checksumming, but I think at the end of the day people always have the ability to go in and hollow out assumptions - but IMO the code-first approach which discourages any manual database interaction draws a clear line.
That's likely only about the migrations, not about the actual database state. It can help to make sure that the migrations are really the same ones used to set up the database, but it won't help with cases where you change the database manually. I don't think it's even meaningful to check on each database interaction that the schema hasn't changed, as that would be quite expensive.
So while this is a nice way to verify which of your migrations have been applied, it doesn't give any guarantees about the schema at all. Essentially this is the same as what diesel does by reading the `__diesel_schema_migrations` table.
If you are willing to risk corrupting the database, then yes, you can trick rust-query.
To actually check whether the schema matches what you expect, you need to query information about all relevant tables and compare it with the expected state.
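For illustration, a rough sketch of such a check against SQLite's `sqlite_master` table using `rusqlite`; this is my own sketch, not rust-query's or diesel's actual implementation:

```rust
use rusqlite::{Connection, Result};

/// Compares the (table name, CREATE TABLE statement) pairs stored by SQLite
/// against an expected list. `expected` must be sorted by table name.
fn schema_matches(conn: &Connection, expected: &[(&str, &str)]) -> Result<bool> {
    let mut stmt = conn.prepare(
        "SELECT name, sql FROM sqlite_master \
         WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name",
    )?;
    let actual = stmt
        .query_map([], |row| {
            Ok((row.get::<_, String>(0)?, row.get::<_, String>(1)?))
        })?
        .collect::<Result<Vec<_>>>()?;

    let expected: Vec<(String, String)> = expected
        .iter()
        .map(|(name, create_sql)| (name.to_string(), create_sql.to_string()))
        .collect();

    Ok(actual == expected)
}
```

A real check would also normalize whitespace and look at indexes, views, and triggers, but the shape of the problem is the same.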
EDIT: I would also like to point to the [linked documentation above](https://www.sqlite.org/howtocorrupt.html#cfgerr) that explicitly states the following:
> Changing the PRAGMA schema_version while other database connections are open.
So you don't risk any corruption (from SQLite's point of view) if there is no other database connection open, which essentially means you just need to shut down your application first.
user_version is what you would use to keep track of which migrations have run, and it can be safely updated (or not updated) without corrupting the database.
schema_version is the counter that SQLite updates automatically on every single change to the schema. SQLite uses schema_version to make sure that prepared statements don't accidentally run on a different schema than the one they were prepared for. schema_version is also what rust-query uses to make sure that the schema hasn't changed since the application started.
You can see now that you cannot trick rust-query without also tricking SQLite (and risking corruption if there was a prepared statement).
If you shut down the application before you change schema_version, then rust-query will simply see that the schema is different when the application is next started, because rust-query always reads the full schema and compares it to the expected schema on application start!
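A small illustration of the two counters discussed above, read through `rusqlite` (not rust-query's code):

```rust
use rusqlite::{Connection, Result};

fn read_versions(conn: &Connection) -> Result<(i64, i64)> {
    // user_version: free for the application, e.g. to record which migrations ran.
    let user: i64 = conn.query_row("PRAGMA user_version", [], |row| row.get(0))?;
    // schema_version: bumped by SQLite itself on every schema change; it is what
    // invalidates prepared statements when the schema moves underneath them.
    let schema: i64 = conn.query_row("PRAGMA schema_version", [], |row| row.get(0))?;
    Ok((user, schema))
}
```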
This point confuses me. If we're thinking about a web server, you'll pass your data to the frontend with a row ID, so that they can refer to and modify that data in another request? How would it work otherwise?
Am I missing something? I can think of some use cases, but this limitation cuts down on the uses I'd have for this massively.
I.e., it should still be possible to create a column with a public identifier by which you can refer to the row, I assume.
- Row IDs can be reused when the row is deleted. So when the frontend sends a message to the backend to modify that data, it might accidentally modify a different row that was created after the original was deleted.
- You may be leaking the total number of rows (this allows detecting when new rows are created etc. which can be problematic).
If you have nothing else to uniquely identify a row, then you can always create a random unique identifier and put a unique constraint on it.
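For example, a sketch of that last suggestion (illustrative only, not rust-query's API; it assumes the `rusqlite` crate and `uuid` with the `v4` feature):

```rust
use rusqlite::{params, Connection, Result};
use uuid::Uuid;

fn create_story(conn: &Connection, title: &str) -> Result<String> {
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS story (
             id        INTEGER PRIMARY KEY,  -- internal row id, never exposed
             public_id TEXT NOT NULL UNIQUE, -- what the frontend gets to see
             title     TEXT NOT NULL
         )",
    )?;
    // A random identifier avoids both problems above: it is never reused and
    // it leaks nothing about how many rows exist.
    let public_id = Uuid::new_v4().to_string();
    conn.execute(
        "INSERT INTO story (public_id, title) VALUES (?1, ?2)",
        params![public_id, title],
    )?;
    Ok(public_id)
}
```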
They're very similar: both define schema versions as "snapshots" rather than "diffs" (with diffing happening automatically), both check queries against the latest defined version, neither requires a running database, etc.
The main difference seems to be about queries. My approach was to try to replicate SQL query structure directly with the query builders rather than abstract away from SQL, whereas this appears to aim for something less similar to the original SQL. I don't consider myself an SQL expert, and I've seen SQL abstractions that aren't able to replicate all functionality, but judging from the Reddit debate with the diesel author this one seems fairly thorough.
FWIW, in `good-ormning` I was thinking of getting rid of the query builders/my own query AST and instead having a macro that parses queries with `datafusion-sqlparser-rs` and checks that AST, but it's a significant change.
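For anyone curious what that direction looks like, a tiny sketch using the `sqlparser` crate (which is, as far as I know, what `datafusion-sqlparser-rs` is published as); the query and the checking step are placeholders:

```rust
use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

fn main() {
    let sql = "SELECT title, content FROM story WHERE author = ?";
    // Parse into a Vec<Statement>; a checking macro would walk this AST and
    // compare referenced tables/columns against the declared schema.
    let ast = Parser::parse_sql(&GenericDialect {}, sql).expect("valid SQL");
    println!("{ast:#?}");
}
```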
FWIW I think SQL is awful too, so I'm really happy to see the support here and I hope rust-query takes off.