• dang 7 days ago |
    Related:

    Google is pushing the new language Logica to solve the major flaws in SQL - https://news.ycombinator.com/item?id=29715957 - Dec 2021 (1 comment)

    Logica, a novel open-source logic programming language - https://news.ycombinator.com/item?id=26805121 - April 2021 (98 comments)

    • usgroup 7 days ago |
      I may be misremembering but I think that at the time, Logica was the work of one developer who happened to be at Google. I'm not sure that there was an institutional push to use this language, nor that it has significant adoption at Google itself.
      • thenaturalist 7 days ago |
        This seems supported by the fact that the repo is not under a Google org and it has a single maintainer.
        • diggan 7 days ago |
          > that the repo is not under a Google org

          I don't think that matters? github.com/google has a bunch of projects with large warnings that "This is not a Google project"; I'm not sure why or how that is. From the outside it looks like if you work at Google, they take ownership of anything you write.

          • azornathogron 7 days ago |
            > From the outside it looks like if you work at Google, they take ownership of anything you write.

            That is precisely how it works.

            Disclaimer: I am not a lawyer, and I'm sure the validity and enforceability of the relevant contract clauses varies by jurisdiction.

  • Y_Y 7 days ago |
    If, like me, your first reaction is that this looks suspiciously like Datalog, then you may be interested to learn that they indeed consider Logica to be "in the Datalog family".
    • jp57 7 days ago |
      I think Datalog should be thought of as "in the logic programming family", so other data languages based on logic programming are likely to be similar.

      And, of course the relational model of data is based on first-order logic, so one could say that SQL is a declarative logic programming language for data.

  • riku_iki 7 days ago |
    Only one active committer on github..
  • thenaturalist 7 days ago |
    I don't want to come off as too overconfident, but I would be very hard pressed to see the value of this.

    At face value, I shudder at the syntax.

    Example from their tutorial:

      EmployeeName(name:) :- Employee(name:);
      Engineer(name:) :- Employee(name:, role: "Engineer");
      EngineersAndProductManagers(name:) :-
        Employee(name:, role:), role == "Engineer" || role == "Product Manager";

    vs. the equivalent SQL:

      SELECT Employee.name AS name
      FROM t_0_Employee AS Employee
      WHERE (Employee.role = "Engineer" OR Employee.role = "Product Manager");

    SQL is much more concise and extremely easy to follow.

    No weird OOP-style class instantiation for something as simple as just getting the name.

    As already noted in the 2021 discussion, though, what's actually the killer is adoption and, three years later, ecosystem.

    SQL for analytics has come an extremely long way with the ecosystem that was ignited by dbt.

    There is so much better tooling today when it comes to testing, modelling, and running in memory, with tools like DuckDB, Ibis, or Apache Iceberg.

    There is value to abstracting on top of SQL, but it does very much seem to me like this is not it.

    • Tomte 7 days ago |
      The syntax is Prolog-like, so people in the field are familiar with it.
      • thenaturalist 7 days ago |
        Which field would that be?

        I.e. I understand now that it's seemingly about more than simple querying, so, coming very much from an analytics/data-crunching background, I am wondering what a use case would look like where this is arguably superior to SQL.

        • tannhaeuser 7 days ago |
          > Which field would that be?

          Database theory papers and books have used Prolog/Datalog-like syntax throughout the years, such as those by Serge Abiteboul, just to give a single example of a researcher and prolific author over the decades.

        • evgskv 5 days ago |
          In my opinion any analytical query is easier to read in a logic language than in SQL, but it's most obvious for recursive queries. E.g. the distance in a graph defined by a predicate (aka table) G looks like this in Logica:

            D(a, b) Min= 1 :- G(a, b);       # Connected by an edge => distance 1.
            D(a, c) Min= D(a, b) + D(b, c);  # Triangle inequality.
           
          It would be much harder to read as a SQL CTE. It can also be computed over weighted graphs, which is impossible or extremely hard with SQL.
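
          For comparison, a minimal sketch of the unweighted case as a recursive CTE (assuming G(a, b) is an edge table; the explicit depth bound is my addition, since on a cyclic graph the recursion would otherwise keep deriving ever-larger distances):

            WITH RECURSIVE D(a, b, dist) AS (
              SELECT a, b, 1 FROM G
              UNION
              SELECT D.a, G.b, D.dist + 1
              FROM D JOIN G ON D.b = G.a
              WHERE D.dist < 100  -- arbitrary bound so the recursion terminates
            )
            SELECT a, b, MIN(dist) AS dist FROM D GROUP BY a, b;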

          In practice you rarely need recursive queries, so the gap between Logica and SQL isn't as large as it is here, but Logica is easier to read (in my opinion) for similar reasons.

    • aseipp 7 days ago |
      Logica is in the Datalog/Prolog/Logic family of programming languages. It's very familiar to anyone who knows how to read it. None of this has anything to do with OOP at all and you will heavily mislead yourself if you try to map any of that thinking onto it. (Beyond that, and not specific to Logica or SQL in any way -- comparing two 3-line programs to draw conclusions is effectively meaningless. You have to actually write programs bigger than that to see the whole picture.)

      Datalog is not really a query language, actually. But it is relational, like SQL, so it lets you express relations between "facts" (the rows) inside tables. But it is more general, because it also lets you express relations between tables themselves (e.g. this "table" is built from the relationship between two smaller tables), and it does so without requiring extra special case semantics like VIEWs.

      Because of this, it's easy to write small fragments of Datalog programs and then stick them together with other fragments, without a lot of planning ahead of time, meaning that as a language it is very compositional. This is one of the primary reasons why many people are interested in it as a SQL alternative, aside from your typical weird SQL quirks that are avoided with better language design (which are annoying, but not really the big picture).

      • thenaturalist 7 days ago |
        > but it is more general, because it also lets you express relations between tables themselves (e.g. this "table" is built from the relationship between two smaller tables), and it does so without requiring extra special case semantics like VIEWs.

        If I understand you correctly, you can easily get the same with ephemeral models in dbt or CTEs generally?

        > Because of this, it's easy to write small fragments of Datalog programs, and then stick it together with other fragments, without a lot of planning ahead of time, meaning as a language it is very compositional.

        This can be a benefit in some cases, I guess, but how can you guarantee correctness with that much flexibility involved?

        With SQL, I get either table or column level lineage with all modern tools, can audit each upstream output before going into a downstream input. In dbt I have macros which I can reuse everywhere.

        It's very compositional while at the same time perfectly documented and testable at runtime.

        Could you share a more specific example or scenario where you have seen Datalog/ Logica outperform a modern SQL setup?

        Genuinely curious.

        I am not at all familiar with the Logica/Datalog/Prolog world.

        • from-nibly 7 days ago |
          Prolog et al. is a real brain-buster, in the sense that it will break your spirit and build you back up better. I remember in college I was able to build a binary tree with 3 lines of code. And once you write the insert, the delete, search, and others just magically appear.

          It also frames your thinking about defining what you want rather than how to get it.

          If you really want to see the power of these kinds of languages look up Einstein's puzzle solved with prolog. The solution just magically comes out by entering the constraints of the puzzle.

          • rytis 7 days ago |
            I suppose something like this: https://stackoverflow.com/a/8270393 ?
          • surgical_fire 7 days ago |
            I had to use Prolog in college, and while I never saw it in the wild - I at least never stumbled upon a scenario where prolog was the answer - I really enjoyed how I had to change how I looked at a problem in order to solve it in prolog.
        • burakemir 7 days ago |
          Here is a proof that you can translate non-recursive datalog into relational algebra and vice versa: https://github.com/google/mangle/blob/main/docs/spec_explain...

          Since Logica is translated to SQL it should benefit from all the query optimization goodness that went into the SQL engine that runs the resulting queries.

          I personally see the disadvantage of SQL as being that it is not really modular: you cannot have libraries, tests and such.

          Disclosure: I wrote Mangle (the link goes to the Mangle repo), another datalog, different way of extending, no SQL translation but an engine library.

          • aseipp 7 days ago |
            Mangle looks very interesting, thanks for the share. In particular I love your gRPC demo, because it shows a prototype of something I've been thinking about for a long time: what if we did GraphQL, but with Datalog! Maybe we could call it LogiQL :)

            In particular many people talk a lot about concerns like optimizations across GraphQL plans and how they are expected to behave on underlying tables, but this is something that I think has seen a lot of research in the Datalog realm. And to top it off, even ignoring that, Datalog just feels much more natural to write and read after a bit of practice, I think. (Obviously you need to be in the pure fragment of datalog without recursion, but even then it might be feasible to add those features with termination criteria even if it's just "decrement an internal counter and if it hits zero throw a big error")

            What do you think the plans for the Rust implementation will be? That's probably the most likely place I'd use it, as I don't really use Go that much.

            • burakemir 7 days ago |
              The Mangle repo has the beginnings of a Rust implementation, but it will take some time before it is usable. The Go implementation is also still being improved, but I think real DB work with persistent data will happen only in Rust. Bindings to other host languages would also use the Rust implementation. There are no big challenges here; it is just work and takes time.

              The combination of top-down and bottom-up logic programming is interesting, especially when one can move work between pre-computation and query time.

              I like that optimizing queries in Datalog can be discussed like the optimization of a programming language, but of course the biggest gains in DBs come from join order and making use of indices. There is a tension here between staying declarative and having some control or hints over execution. I haven't yet figured out how one should go about it, or how to help programmers combine top-down and bottom-up computation. Work in progress! :-)

        • aseipp 7 days ago |
          > If I understand you correctly, you can easily get the same with ephemeral models in dbt or CTEs generally?

          You can bolt on any number of 3rd party features or extensions to get some extra thing, that goes for any tool in the world. The point of something like Datalog is that it can express a similar class of relational programs that SQL can, but with a smaller set of core ideas. "Do more with less."

          > I guess, but how can you guarantee correctness with flexibility involved?

          How do you guarantee the correctness of anything? How do you know any SQL query you write is correct? Well, as the author, you typically have a good idea. The point of being compositional is that it's easier to stick together arbitrary things defined in Datalog, and have the resulting thing work smoothly.

          Going back to the previous example, you can define any two "tables" and then just derive a third "table" from them, using language features that you already use to define relationships between rows. Datalog can define relations between rules (tables) and between facts (rows), all with a single syntactic/semantic concept, while SQL by default can only express relations between rows. Therefore, raw SQL is kind of "the bottom half" of Datalog, and to get the upper half you need features like CTEs, VIEWs, etc., and have to apply them appropriately. You need more concepts to cover both the bottom and the top half; Datalog covers them with one concept. Datalog also makes it easy to express things like queries on graph structures, and again, you don't need extra features like CTEs for this to happen.
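
          To make that concrete with the Employee example from upthread (a sketch; the Manager table and its report column are made up for illustration): in SQL, relating rows is native, but reusing a derived table requires an extra named mechanism such as a VIEW:

            -- relating rows: plain SQL
            SELECT name FROM Employee WHERE role = 'Engineer';

            -- relating tables: needs a separate concept, e.g. a VIEW
            CREATE VIEW Engineer AS
              SELECT name FROM Employee WHERE role = 'Engineer';

            SELECT e.name
            FROM Engineer AS e JOIN Manager AS m ON e.name = m.report;

          In Datalog, both levels are written as the same kind of rule.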

          There are of course lots of tricky bits (e.g. optimization) but the general idea works very well.

          > Could you share a more specific example or scenario where you have seen Datalog/ Logica outperform a modern SQL setup?

          Again, Datalog is not about SQL. It's a logic programming language. You need to actually spend time doing logic programming with something like Prolog or Datalog to appreciate the class of things it can do well. It just so happens Datalog is also good for expressing relational programs, which is what you do in SQL.

          Most of the time when I'm doing logic programming I'm actually writing programs, not database queries. Trying to do things like analyze programs to learn facts about them (Souffle Datalog, "can this function ever call this other function in any circumstance?") or something like a declarative program as a decision procedure. For example, I have a prototype Prolog program sitting around that scans a big code repository, figures out all 3rd party dependencies and their licenses, then tries to work out whether they are compatible.

          It's a bit like Lisp, in the sense that it's a core formulation of a set of ideas that you aren't going to magically adopt without doing it yourself a bunch. I could show you a bunch of logic programs, but without experience all the core ideas are going to be lost and the comparison would be meaningless.

          For the record, I don't use Logica with SQL, but not because I wouldn't want to. It seems like a good approach. I would use Datalog over SQL happily for my own projects if I could. The reasons I don't use Logica for instance are more technical than anything -- it is a Python library, and I don't use Python.

          • kthejoker2 7 days ago |
            CTEs aren't really an "extra" feature; they are just a composable, reusable subquery. This just adds the benefit of storing CTEs as function calls, aka table-valued functions (TVFs) ... also not really an "extra" feature.

            The main advantage of any non-SQL language is its ability to more efficiently express recursion (graph / hierarchical queries) and dynamic expressions like transposition and pivots.

            You can do those in SQL; it's just clunky.
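
            For instance, a portable pivot falls back on conditional aggregation; a sketch over the Employee table from upthread (the dept column is made up):

              SELECT dept,
                     COUNT(CASE WHEN role = 'Engineer' THEN 1 END) AS engineers,
                     COUNT(CASE WHEN role = 'Product Manager' THEN 1 END) AS pms
              FROM Employee
              GROUP BY dept;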

        • jyounker 7 days ago |
          The covid analysis seems like a pretty good example: https://colab.research.google.com/github/EvgSkv/logica/blob/...

          A good exercise might be converting it to the corresponding SQL and comparing the two for clarity.

      • cess11 7 days ago |
        Right, so that's what they claim, that you'll get small reusable pieces.

        But: "Logica compiles to SQL".

        With the caveat that it only kind of does, since it seems constrained to three database engines: probably the one they optimise the output to perform well on, one where it usually doesn't matter, and one that's kind of mid performance-wise anyway.

        In light of that quote it's also weird that they mention that they are able to run the SQL they compiled to "in interactive time" on a rather large dataset, which they supposedly already could with SQL.

        Admittedly I'm not very good with Datalog and have mostly used Prolog, but to me it doesn't look much like a Datalog. Predicates seem to be variadic with named parameters, making variables implicit at the call site, so to understand a complex predicate you need to hop away and look at how the constituent predicates are defined to understand what they return. Maybe I misunderstand how it works, but at first glance that doesn't look particularly attractive to me.

        Can you put arithmetic in the head of clauses in Datalog proper? As far as I can remember, that's not part of the language. To me it isn't obvious what this is supposed to do in this query language.

        • aseipp 7 days ago |
          For the record, I don't use Logica myself so I'm not familiar with every design decision or feature -- I'm not a Python programmer. I'm speaking about Datalog in general.

          > making variables implicit at the call site

          What example are you looking at? The NewsData example for instance seems pretty understandable to me. It seems like for any given predicate you can either take the implicit name of the column or you can map it onto a different name e.g. `date: date_num` for the underlying column on gdelt-bq.gdeltv2.gkg.

          Really it just seems like a way to make the grammar less complicated; the `name: foo` syntax is their way of expressing 'AS' clauses, and `name:` is just shorthand for `name: name`.

          > In light of that quote it's also weird that they mention that they are able to run the SQL they compiled to "in interactive time" on a rather large dataset, which they supposedly already could with SQL.

          The query in question is run on BigQuery (which IIRC was the original and only target database for Logica), and in that setup you might do a query over 4TB of data but get a response in milliseconds due to partitioning, column compression, parallel aggregation, etc. This is actually really common for many queries. So, in that kind of setup the translation layer needs to be fast so it doesn't spoil the benefit for the end user. I think the statement makes complete sense, tbh. (This also probably explains why they wrote it in Python, so you could use it in Jupyter notebooks hooked up to BigQuery.)

          • cess11 7 days ago |
            They define a NewsData/5, but use a NewsData/2.

            Are you aware of any SQL transpilers that spend so much time transpiling that you get irritated? I'm not.

            • aseipp 6 days ago |
              Ah, I see what you mean. I'm not sure predicates like NewsData can actually be overloaded by arity, I'd have to check the docs. It mostly just seems like a shorter way to write the predicate with unbound variables.

              > Are you aware of any SQL transpilers that spend so much time transpiling that you get irritated? I'm not.

              Again, when you are running a tool on something that returns results in ~millisecond time, it is important the tool does not spoil that. Even 100-200ms is noticeable when you're typing things out. They could have worded it differently, it's probably just typical "A programmer wrote these docs" stuff, so it's just bad copy. A dedicated technical writer would probably do something different.

      • joe_the_user 7 days ago |
        > It's very familiar to anyone who knows how to read it.

        "Anyone who knows the system can easily learn it," he said with a sniff.

        Yes, the similarity to Prolog lets you draw on a vast pool of Prolog programmers out there.

        I mean, I studied a variety of esoteric languages in college and they were interesting (I can't remember if we got to Prolog, tbh, but I know first-order logic pretty well and that's related). When I was thrown into a job with SQL, its English-like syntax made things really easy. I feel confident that knowing SQL wouldn't conversely make learning Prolog easy (I remember Scala later and not being able to deal with its opaque verbosity easily).

        Basically, SQL syntax makes easy things easy. This gets underestimated a lot, indeed people seem to have contempt for it. I think that's a serious mistake.

        • jyounker 7 days ago |
          > Basically, SQL syntax makes easy things easy. This gets underestimated a lot, indeed people seem to have contempt for it. I think that's a serious mistake.

          The flip side of that is SQL makes hard things nearly impossible.

          SQL doesn't have facilities for abstraction, and it doesn't compose, and this has consequences that I deal with daily.

          The lack of abstraction facilities makes it hard to construct complicated queries, hard to debug them, and hard to refactor them.

          Instead of writing more complicated SQL queries, developers lean on the host languages to coordinate SQL calls, using the host language's abstraction facilities to cover for SQL's inadequacies.

          • joe_the_user 7 days ago |
            > The flip side of that is SQL makes hard things nearly impossible.

            What is it about SQL syntax that makes the hard things impossible? I get that the actual language SQL is broken in all sorts of ways, but I don't see any reason to replace it with something opaque from the get-go.

            I mean, what stops you from defining, say, adjectives and using those for rough modularity?

            Say

                EXPENSIVE(T) means T.price > 0;
                Select name FROM books WHERE EXPENSIVE(books); 
            
            Seems understandable.
            • geocar 7 days ago |
              Isn't that just WITH?

                  WITH expensive AS (SELECT * FROM books WHERE price > 100)
                  SELECT name FROM expensive
            • jyounker 3 days ago |
              Yes, you can extend the language with more syntax. This kind of proves my point.

              If there weren't deficiencies then you wouldn't need to define more syntax, and so many work-arounds wouldn't have already been created. The deficiencies in SQL are why each big SQL database ends up creating some procedural-SQL language for use in stored procedures and triggers.

              CTEs are close to what you outline above, but even then (as far as I know) you can't name that CTE and use it across multiple statements.
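
              A minimal illustration, reusing the example above (the CTE's name vanishes at the end of its statement):

                WITH expensive AS (SELECT * FROM books WHERE price > 100)
                SELECT name FROM expensive;      -- fine: the CTE is in scope

                SELECT count(*) FROM expensive;  -- error: "expensive" no longer exists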

        • aseipp 7 days ago |
          I mean, yes, that's sort of how linguistics works in general? You can't just look at a language with completely different orthography or semantic concepts and expect to be able to reliably map it onto your pre-existing language with no effort. That's sort of the whole reason translation is a generally difficult problem.

          I don't really get this kind of complaint in general I'm afraid. Many people can read and write, say, Hangul just fine -- and at the same time we don't expect random English speakers with no familiarity will be able to understand Korean conversations, or any syllabic writing systems in general. Programming language families/classes like logic programming are really no different.

          > it's English language syntax made things really easy

          That's just called "being familiar with English" more than any inherent property of SQL or English.

    • jyounker 7 days ago |
      > No weird OOP-style class instantiation for something as simple as just getting the name.

      I understand the desire to not waste your time, but I think you're missing the big idea. Those statements define logical relations. There's nothing related to classes or OOP.

      Using those building blocks you can do everything that you can with SQL. No need for HAVING clauses. No need for GROUP BY clauses. No need for subquery clauses. No need for special join syntax. Just what you see above.

      And you can keep going with it. SQL quickly runs into the limitations of the language. Using the syntax above (which is basically Prolog) you can construct arbitrarily large software systems which are still understandable.

      If you're really interested in improving as a developer, then I suggest that you spend a day or two playing with a logic programming system of some sort. It's a completely different way of thinking about programming, and it will give you mental tools that you will never pick up any other way.

      • thenaturalist 7 days ago |
        Really appreciate your response and perspective!

        Goes on the holidays list.

    • snthpy 7 days ago |
      Have a look at PRQL [1] for analytical queries. That's exactly what it's designed for. Disclaimer: I'm a contributor.

      That said, I like Logica and Datalog. For me the main use case is "recursive" queries, as they are simpler to express that way. PRQL has made some progress there with the loop operator, but it could still be better. If you have any ideas for improvement, please reach out!

      1: https://prql-lang.org/

  • cess11 7 days ago |
    If this is how you want to compile to SQL, why not invent your own DCG with Prolog proper?

    It should be easy enough if you're somewhat fluent in both languages, and it has the perk of not being some Python thing at a megacorp famous for killing its projects.

  • taeric 7 days ago |
    I find the appeals to composition tough to agree with. For one, most queries begin as ad hoc questions and can usually be tossed afterwards. If they are needed for speed, it is the index structure that is more vital than the query structure. That, and knowing what materialized views have been made, with their implications for propagation delays.

    Curious to hear battle stories from other teams using this.

    • FridgeSeal 7 days ago |
      Depends who your users are and what the context is.

      Having been in quite a few data teams, and supported businesses using dashboards, a very large chunk of the time, the requests do align with the composable feature: people want “the data from that dashboard but with x/y/z constraints too” or “<some well defined customer segment> who did a|b in the last time, and then send that to me each week, and then break it down by something-else”. Scenarios that all benefit massively from being able to compose queries more easily, especially as things like “well defined customer segment” get evolved. Even ad-hoc queries would benefit because you’d be able to throw them together faster.

      There are a number of tools that claim to solve this, but solving it at the language level strikes me as a far better solution.

      • taeric 7 days ago |
        I supported a team at a large company looking at engagement metrics for emails. Materialized views (edit: manually done) and daily aggregate jobs over indexed ranges were really the only viable solution. You could tell the new members because they would invariably go to the base data and build up the aggregates they wanted, rather than looking directly for the existing aggregates.

        That is to say, you have to define the jobs that do the aggregations as well, knowing that you can't just add historical records and have them immediately show up on current reports.

        I welcome the idea that a support team could use better tools. I suspect a polyglot approach will win. For ad hoc queries it is hard to do better than SQL. DDL is different, but still largely difficult to beat SQL at. And describing jobs is a frontier of mistakes.

  • Agraillo 7 days ago |
    I think it is a good direction, imho. Once I was familiar with SQL I learned a little Prolog and the similarities struck me. I wasn't the first one, of course, and there are others who have summarized it better than me [1] (2010-2012):

    > Each can do the other, to a limited extent, but it becomes increasingly difficult with even small increases in complexity. For instance, you can do inferencing in SQL, but it is almost entirely manual in nature and not at all like the automatic forward-inferencing of Prolog. And yes, you can store data (facts) in Prolog, but it is not at all designed for the "storage, retrieval, projection and reduction of Trillions of rows with thousands of simultaneous users" that SQL is.

    I even wanted to implement something like Logica at one point, primarily trying to build a bridge through a virtual table in SQLite that would allow storing rules as mostly-Prolog statements, with adapters to SQL storage for when inference needs facts.

    [1]: https://stackoverflow.com/a/2119003

    • cess11 7 days ago |
      Perhaps you already know this, but as a data store Prolog code is actually surprisingly convenient sometimes, similar to how you might create a throwaway SQLite3 or DuckDB for a one-off analysis or recurring batched jobs.

      It's trivial to convert stuff like web server access logs into Prolog facts by either hacking the logging module or running the log files through a bit of sed, and then you can formalise some patterns as rules and do rather nifty querying. A hundred megabytes of RAM can hold a lot of log data as Prolog facts.

      E.g. '2024-11-16 12:45:27 127.0.0.1 "GET /something" "Whatever User-Agent" "user_id_123"' can be trivially transformed into 'logrow("2024-11-16", "12:45:27", "127.0.0.1", "GET", "/something", "Whatever User-Agent", "user_id_123").', especially if you're acquainted with DCGs. Then you could, for example, write a rule that defines a relation between rows where a user-agent and IP does GET /log_out and shortly after has activity under another user ID, and query out people who could be suspected of using several accounts.

  • foobarqux 7 days ago |
    There don't seem to be any examples of how to connect to an existing (say sqlite) database even though it says you should try logica if "you already have data in BigQuery, PostgreSQL or SQLite,". How do you connect to an existing sqlite database?
    • kukkeliskuu 7 days ago |
      I was turned off by this at first, but then tried it out. These are mistakes in the documentation. The tools just work with PostgreSQL and SQLite without any extra work.
      • foobarqux 7 days ago |
        How do you connect to an existing database so that you can query it? There are examples of how you can specify an "engine" which will create a new database and use it as a backend for executing queries but I want to query existing data in an sqlite database.
        • evgskv 5 days ago |
          To connect to a database file use:

            @AttachDatabase("db_prefix", "your_file.db");
            # Then you can query from it:
            Q(..r) :- db_prefix.YourTable(..r);
          • foobarqux 5 days ago |
            Thank you. You can't do Q(..r) in sqlite, right? That's what I read in the tutorial.
            • evgskv 5 days ago |
              Ah, yes, you're right! Please do:

                # ...
                Q(your_column) :- example_db.YourTable(your_column:);
              
              You can query multiple columns of course. Feel free to start threads in Discussions of the repo with whatever questions you have!
        • kukkeliskuu 5 days ago |
          They are very responsive in the repo discussions, just ask questions there.
    • evgskv 6 days ago |
      Yeah, we need better tutorials.

      To use SQLite, use the @Engine("sqlite") imperative. You can then connect to your database file with the @AttachDatabase imperative.

      For example, if you have an example.db file with a Fruit table that has a col0 column, then you can count fruits with this program:

        @Engine("sqlite");
        @AttachDatabase("example", "example.db");

        CountFruit(fruit) += 1 :- Fruit(fruit);

      Then run the CountFruit predicate.
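
      For readers coming from SQL, the += aggregation corresponds roughly to a GROUP BY count; a sketch (the column name fruit is an assumption):

        SELECT fruit, COUNT(*) AS count_fruit
        FROM Fruit
        GROUP BY fruit;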

  • avodonosov 7 days ago |
    > Composite(a * b) distinct :- ...

    Wait, does Logica factorize the number passed to this predicate when unifying the number with a * b?

    So when we call Composite(100), does it automatically try all a's and b's that give 100 when multiplied?

    I'd be curious to see the SQL it transpiles to.

    • puzzledobserver 7 days ago |
      As someone who is intimately familiar with Datalog, but has not read much about Logica:

      The way I read these rules is not from left-to-right but from right-to-left. In this case, the rule says: pick two numbers a > 1 and b > 1; their product a*b is a composite number. The solver starts with the facts that are immediately evident, and repeatedly applies these rules until no more conclusions are left to be drawn.

      "But there are infinitely many composite numbers," you'll object. To which I will point out the limit of numbers <= 30 in the line above. So the fixpoint is achieved in bounded time.

      Datalog is usually defined using what is called set semantics. In other words, tuples are either derivable or not. A cursory inspection of the page seems to indicate that Logica works over bags / multisets. The distinct keyword in the rule seems to have something to do with this, but I am not entirely sure.

      This reading of Datalog rules is commonly called bottom-up evaluation. Assuming a finite universe, bottom-up and top-down evaluation are equivalent, although one approach might be computationally more expensive, as you point out.

      In contrast to this, Prolog enforces a top-down evaluation approach, though the actual mechanics of evaluation are somewhat more complicated.

      • avodonosov 6 days ago |
        Bottom-up? OK, I see. From the tutorial it seems Logica reifies every predicate into an actual table (except for what they call "infinite predicates").

        I found a way to look at the SQL it generates without installing anything:

        Execute the first two cells in the online tutorial colab (the Install and Import). Then replace the 3rd cell's content with the following and execute it:

            %%logica Composite
        
            @Engine("sqlite"); # don't try to authorise and use BigQuery
        
            # Define numbers 1 to 30.
            Number(x + 1) :- x in Range(30);
          
            # Defining composite numbers.
            Composite(a * b) distinct :- Number(a), Number(b), a > 1, b > 1;
          
            # Defining primes as "not composite".
            Prime(n) distinct :- Number(n), n > 1, ~Composite(n);
        
        
        Look at the SQL tab in the results.
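
        For reference, hand-written SQL with the same meaning as the Composite rule (not necessarily what Logica actually emits; I'm assuming the Number predicate becomes a one-column table number(x)) would be roughly:

            SELECT DISTINCT n1.x * n2.x AS composite
            FROM number AS n1, number AS n2
            WHERE n1.x > 1 AND n2.x > 1;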
      • avodonosov 6 days ago |
        Where does your intimate knowledge of Datalog come from? Do you use Datalog regularly (maybe with Datomic)? Or did you just study how it is implemented out of curiosity?
  • anorak27 7 days ago |
    There's also Malloy[0] from Google that compiles into SQL

    > Malloy is an experimental language for describing data relationships and transformations.

    [0]: https://github.com/malloydata/malloy

    • pstoll 7 days ago |
      Came here to mention Malloy, which is from the team that built Looker, which Google acquired. The Looker founder/CTO then started (joined?) Malloy. And, about 6 months ago, he moved from Google to Meta.

      https://www.linkedin.com/posts/medriscoll_big-news-in-the-da...

      Also for those playing along at home - a few other related tools for “doing more with queries”.

      - AtScale - a semantic layer not dissimilar to LookML but with a good engine to optimize pre building the aggregates and routing queries among sql engines for perf.

      - SDF - a team that left Meta to make a commercial offering for a SQL parser and related tools, e.g. to help make dbt better.

      (No affiliation other than having used / been involved with / know some of these people at work)

  • transfire 7 days ago |
    Nice idea, but the syntax seems hacky.
  • cynicalsecurity 7 days ago |
    This is going to be a hell in production. Someone is going to write queries in this new language and then wonder why the produced MySQL queries in production take 45 minutes to execute.
    • evgskv 6 days ago |
      There is a standard method of optimization: breaking a predicate into smaller ones and saving the intermediates into the database.

      Typically the program runs efficiently, but when optimization is needed you can do it by breaking up the predicate.
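
      In SQL terms this is just materializing an intermediate result; a sketch (reusing the Employee example from upthread, lower-cased):

        -- materialize the expensive intermediate once
        CREATE TABLE engineer AS
          SELECT name FROM employee WHERE role = 'Engineer';

        -- downstream queries read the small table instead of recomputing it
        SELECT COUNT(*) FROM engineer;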

  • usgroup 7 days ago |
    It's nice to see Logica has come on a bit. A year or two ago I tried to use it in production and it was very buggy.

    The basic selling point is a compositional query language, so that over time one may build a library of re-usable components. If anyone really has built such a library, I'd love to know more about how it worked out in practice. It isn't obvious to me on first look how those decorators are supposed to compose and abstract.

    It's also not immediately obvious to me how complicated your library of SQL has to be for this approach to make sense. Say I had a collection of 100 moderately complex and correlated SQL queries and I were to refactor them into Logica: in what circumstances would that yield a substantial benefit versus (1) doing nothing, (2) creating views or stored procedures, or (3) using dbt/M4 or some other preprocessor for generic abstraction?

    • thenaturalist 7 days ago |
      Never heard of M4 before and, lo and behold, of course HN has a discussion of it: https://news.ycombinator.com/item?id=34159699

      The author discusses Logica vs. plain SQL vs POSIX.

      I'd always start with dbt/SQLMesh.

      The library you’re talking about exists: dbt packages.

      Check out hub.getdbt.com and you’ll find dozens of public packages for standardizing sources, data formatting or all kinds of data ops.

      You can use almost any query engine/ DB out there.

      Then go for dbt power user in VS Code or use Paradime and you have first class IDE support.

      I have no affiliation with any of the products, but from a practitioner perspective the gap between these technologies (and their ecosystems) is so large that the ranking of value for programming is as clear as they come.

      • thom 7 days ago |
        M4 is absolutely ancient, one of those things you've probably only seen flashing by on your screen if you've found yourself running `make; make install`. I suppose it is a perfectly cromulent tool for SQL templating but you're right that you must be able to get more mileage out of something targeted like dbt/SQLMesh.
        • pstoll 6 days ago |
          Having debugged my share of autoconf setups, I assumed it had to be a new M4 and not the ancient quirky GNU M4 thing, because no one in their right mind would wish M4 (and the related GNU autoconf tooling) on any other sentient being.

          It’d be like saying - “hey, I’m starting a new project and trying to pick between ed, sed, or awk. Whatcha think”? Def not.

        • pstoll 6 days ago |
          Also +1 for cromulent use of “cromulent”.
  • usgroup 7 days ago |
    Has anyone used Datalog with Datomic in anger? If so, what are your thoughts about Logica, and how does the proposition differ in your experience?