I'll have to stuff that into the back of my mind until I run into something where it might fit.
I might see if I can get this set up. Apparently VICE supports "printing" to a file on the host OS, so I could run my test on the emulator.
Funny, I just wrote the exact same thing only to scroll down and see you had the same opinion:
The frustrating thing is, because it's in your shell history, you have to look it up again after you change jobs...
I've known about git bisect for maybe a decade, and used it maybe... twice so far? And I think at least one of those times it took me longer to understand the command line options than it took to just do it manually.
YMMV; maybe it's more useful in a larger collaborative codebase, but to me it sounds life-changing in theory, and I haven't found it to be that in practice.
git bisect start master v1.1 # Bisect from v1.1 to current master
git bisect run cmake --build .build # Find the first commit where `cmake --build` returns non-zero.
No need to bother trying to understand the error message, I can go make coffee instead and then just look at the commit that broke it. Also, this is really useful for, and in conjunction with, Nixpkgs. It's not as useful for personal projects, because chances are it won't tell you anything you don't already know.
> Build's failing? Easy enough:
> It's not as useful for personal projects
How often do you run into this situation, though? For your scenario to arise, you need:
(1) A repo you're unfamiliar with (which is already unusual for most people... most people spend most of their time working on repositories they've become familiar with)
(2) The unfamiliar repo to use standard build commands you already know, which is basically never the case in my experience (even cmake-based projects almost always need variables defined with custom names I don't know beforehand, and the configuration process is so onerous I find the GUI easier)
(3) An obscure enough breakage where the output is complete garbage (otherwise the compile error would just tell you the file and line, and git blame would tell you the commit directly)
(4) Knowledge that this unfamiliar repo would actually have built on your machine at some point (which is often not the case because cmake isn't hermetic and breakages are often due to having the wrong toolchain installed)
(5) For the problem to be a mere build error, as opposed to some running of the program that requires investing time to automate
(6) For you to not have any sort of prior CI or other form of validation on your dependencies, because you apparently didn't even catch a build error when it was introduced (never mind automated testing)
I could go on, but this set of circumstances is more or less indistinguishable from ~never to me.
So, for now, I'm stuck with the version one commit before that dependency :). (I haven't tried undoing the change later; I suspect it might be depended upon later on...)
I don't understand a ton of things. I've bisected Linux, Chromium, Wine, Nixpkgs, and many more. These are projects whose codebases I was never going to understand -- they're too big for one person to ever fully grasp. Then there's smaller codebases where I could if I wanted to, but if I don't have to, why bother?
> (2) The unfamiliar repo to use standard build commands you already know, which is basically never the case in my experience (even cmake-based projects almost always need variables defined with custom names I don't know beforehand, and the configuration process is so onerous I find the GUI easier)
In real life I make great use of Nixpkgs' understanding of how to build things. Nix development shells take care of the dependencies and the build commands. However, I do have confidence that I can figure out the actual desired build commands of most random repositories in under five minutes. This applies to private repos at work and random open source projects. Even for autotools projects. Usually not that much hoopla is needed; most CMake projects don't need custom cache variables to make a basic functional build.
> (3) An obscure enough breakage where the output is complete garbage (otherwise the compile error would just tell you the file and line, and git blame would tell you the commit directly)
Nah, not necessarily, really just any breakage where you're not sure what happened. Compiler errors are not a great use case for git bisect, but test failures are a really good use case. Likewise, git bisect is great for crashes. Even if you can't automate the full process easily, for e.g. a GUI program, it's still a lot less work to periodically repeat some steps to crash or cleanly exit a GUI program to help the bisect along. Still, some compiler errors are great for Git bisect. Basically any weird linker error is a good candidate.
> (4) Knowledge that this unfamiliar repo would actually have built on your machine at some point (which is often not the case because cmake isn't hermetic and breakages are often due to having the wrong toolchain installed)
Or just test it.
> cmake isn't hermetic
Nix is, so no problem!
> (5) For the problem to be a mere build error, as opposed to some running of the program that requires investing time to automate
Not true. It's harder to automate GUI failures, but that does not mean bisecting isn't incredibly valuable for GUI apps. Remember, in a bisect where you never have to skip a commit (i.e. intermediate builds rarely fail for unrelated reasons, which is surprisingly common), bisect only ever takes log2(n) steps, so even if you are in a repo with a very high commit volume, the bisect will take at most a handful of steps. Repeating the same GUI action ~10 times is hardly a big chore, especially when it lets the machine magically crank out the exact origin of where the thing broke. For CLI apps it's even easier, since you can still use `git bisect run`.
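That log2(n) bound is easy to sanity-check with a toy simulation (plain Python, no git involved; the commit list, the breaking commit, and the is_bad predicate are all made up for illustration):

```python
def find_first_bad(commits, is_bad):
    """Binary-search for the first bad commit, the way git bisect does.
    Assumes commits[0] is good and commits[-1] is bad; returns the index
    of the first bad commit and how many commits we had to test."""
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return hi, tests

commits = list(range(10_000))  # stand-ins for 10,000 commit ids
first_bad = 6_789              # the (unknown to us) breaking commit
found, tests = find_first_bad(commits, lambda c: c >= first_bad)
print(found, tests)  # → 6789 14  (ceil(log2(10000)) == 14)
```

Note that doubling the history length only ever adds one more test, which is why even Chromium-scale repos stay at "a handful of steps".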
> (6) For you to not have any sort of prior CI or other form of validation on your dependencies, because you apparently didn't even catch a build error when it was introduced (never mind automated testing)
Not true. Everyone uses CI, but CI will never catch 100% of regressions. There's a combinatorial explosion of potential build environments and there's no way a CI environment can test every possibility. It can't test with every library version, every release of every compiler, every combination of vendored vs. system libraries, and so on.
> I could go on, but these set of circumstances are more or less indistinguishable from ~never to me.
For me it happens more often touching open source things than at work, but really the probability it will be useful increases as the size and inscrutability of a project increases. For trying to debug Chromium or Electron issues, bisect reigns supreme. (My current workplace doesn't ship software built on top of Chromium, but replace Chromium with any other big and inscrutable dependency and you get the idea.)
I suspect it may be the case that I bother to root cause and debug things that other people would simply completely write off as impossible or not worth the debugging, thanks to the power of Git bisect.
>> Build's failing? git bisect run cmake --build .build
> Compiler errors are not a great use case for git bisect, but test failures are a really good use case
These aren't build failures.
> For trying to debug Chromium or Electron issues, bisect reigns supreme.
Neither of these is a simple CMake build! Even their build process changes over time.
(Side note, I'm surprised that for a massive third party dependency that you don't understand, the specific commit is what you really want. I would've thought you'd just go to the previous version, which is presumably more stable and supported too.)
>> No need to bother trying to understand the error message, I can go make coffee instead and then just look at the commit that broke it.
> Not true. It's harder to automate GUI failures, but that does not mean bisecting isn't incredibly valuable for GUI apps.
I'm not sure how we got into discussing GUIs, but if you're debugging a dependency that's a GUI, then having to bisect it (as useful as that is) is very much the opposite of "no need to understand it, just go grab a coffee"!
All of which is to say: I'm not claiming git bisect is useless by any means. It's nice that it's there and you can use it. I'm just saying that it's often (often != always) a small difference in cost vs. bisecting manually.
While they are not compilation failures, I do sort of consider unit test failures to be build failures if the unit tests are typically run as part of the build. But either way, I do still use bisect to find the origin of weird compilation errors, FWIW. As an example, I used it on Envoy proxy to figure out a really strange linker error on Darwin. It's just not great for e.g. a syntax error, because compiler diagnostics will give a more obvious answer most of the time. I do agree with that.
> These are neither a simple CMake! Even their build process changes over time.
Chromium and Electron are both fairly easy builds. Not because they are simple to build, but rather because the build process is extremely automated and documented. (The real hard part is balancing concurrency with OOM: Electron defaults to -j200 and on my 7950X3D it will happily eat all of the 128 GiB of RAM. Keeping modern CPU cores fed is hard work...)
> (Side note, I'm surprised that for a massive third party dependency that you don't understand, the specific commit is what you really want. I would've thought you'd just go to the previous version, which is presumably more stable and supported too.)
Well, I want the bug to be fixed. Even if I am not going to submit the actual patch that fixes it, providing bisect details in my bug report will vastly increase the odds that it gets fixed. However, with the magic of bisect I can often write working patches for complex bugs myself. If it's software I use on my system, I can then add a temporary patch in my Nix configuration and continue to run the latest version. I don't always do this, but it is very useful. I don't like to rely on hoping someone else will figure out my bugs if I can help it.
I am also a package maintainer in some cases. If I run into a bug on something I maintain a package for, I will often send a PR fixing it and then pull that patch into my package temporarily.
> I'm not sure how we got into discussing GUIs, but if you're debugging a dependency that's a GUI, then having to bisect it (as useful as that is) is very much the opposite of "no need to understand it, just go grab a coffee"!
I'm not sure what you mean, I'm just saying that git bisect works even for complicated runtime issues. There's no need to debug the GUI: all you have to do is tell git bisect if the bug is still there in whatever commit it dumps you in. You do it a few times and you have the commit that broke things. It's magic every time.
As an example of a GUI failure I've "debugged" with this, I found a regression in Wine using this technique.
> All of which is to say: I'm not claiming git bisect is useless by any means. It's nice that it's there and you can use it. I'm just saying that it's often (often != always) a small difference in cost vs. bisecting manually.
Bisecting manually? I don't understand. Do you mean still using git bisect without using the run command? I do that when debugging more complex failures so I can skip commits broken for unrelated reasons. Or, do you mean literally manually picking commits to test and bisecting like that? If so, I'm confused. Starting a git bisect is a single command where you pass it the bad and good commit. The rest of the commands don't even have to be memorized because it tells you what to run (not that it's hard, since it's just git bisect good, git bisect bad, git bisect skip...). I cannot fathom it being less work to do this manually, especially since git bisect handles complex situations where there are diverging merges to explore.
Even at Google, where the CL numbers were pretty much linear, we still had a bunch of tooling around bisecting google3 to find regressions. I'm a little surprised it's any question that it's worth it, you can bisect 100,000 commits in ~17 steps without sitting there trying to figure out which commits you need to pick. That's incredible value and it has helped me root cause bugs in all kinds of stuff, like pure magic.
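The ~17 figure is just the base-2 logarithm of the commit count, rounded up -- a quick arithmetic check:

```python
import math

# Worst-case number of bisect steps to pinpoint one commit out of 100,000.
print(math.ceil(math.log2(100_000)))  # → 17
```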
Anyway, I'm not trying to be condescending, but as you may have gathered I actually bisect fairly frequently and I also am very sure it is saving me time every time I do it. Especially since I reckon more than half the time I don't really have to do anything, so even when the build can take a long time it's still not a problem.
Please also note I have no doubt that your workflow gets immense value from git-bisect! Almost every tool is incredibly valuable to a certain subset of users and workflows. What I've been debating is just how useful I see it being to the most typical users. I haven't actually come across anyone using git-bisect in several years now, and I haven't used it myself either. Bisection as a technique is incredibly useful for everybody -- it's just not something I see done via git-bisect with any frequency, is all.
> Bisecting manually? do you mean literally manually picking commits to test and bisecting like that? If so, I'm confused
That's what I mean, yeah.
> The rest of the commands don't even have to be memorized because it tells you what to run (not that it's hard since its just git bisect good, git bisect bad, git bisect skip...) I cannot fathom it being less work to do this manually
While I didn't actually intend to say it's faster to avoid git-bisect -- just that it's a similar amount of effort -- now that you mention it, I actually do think it is often faster for me. There are multiple reasons why:
- Despite it taking a few more iterations in the worst case in theory, manually picking my own commits lets me split based on e.g. version numbers, rather than random commits. I find it much, much more useful to know that a problem came up between v4.3.1 and v4.4, rather than between commit 1133728 and commit 1133262, say. And I know -- and this is key -- that those commits will be well-tested, stable, and released to the public. So I don't have to worry about encountering random bugs in the middle, even unrelated ones. By contrast, "git bisect skip" might seem easy to type in 3 seconds, but if you encounter a single failed build, then you lose all the time you "saved", and then an enormous amount on top of that, merely by virtue of having to build & test those failed iterations. I really, really wouldn't want to do a full build of Chromium -- even an "incremental" one -- just to encounter a bug in it and have to skip to another commit!
- I actually almost always have some idea of where the bug is, merely based on the tags/versions, or commit messages, or just the timestamps. Sometimes, literally searching for a keyword pops up a relevant commit for me to test, and then doing an incremental build with the commit immediately after is much faster (almost free) compared to doing what ends up being a ~full rebuild for a distant commit. So it actually does end up being far fewer iterations and less time in some cases.
- Git bisect is stateful. It means your repo is no longer in a normal state, it's in the middle of a bisect! That's annoying to say the least. I've gotten burned way too many times by running the wrong command or forgetting that I'm in the middle of an interactive merge or rebase. I don't want to have to think about an additional state I could be in when I leave my work and come back to it. It's a huge cognitive burden, and I would really rather avoid unless it's really clearly going to pay for itself.
Again, I'm not saying these trade-offs apply the same way to you. You encounter scenarios that I clearly don't :-) I'm just explaining where I'm coming from, and (what I believe are) the reasons I don't really see other people using git-bisect frequently.
> especially since git bisect handles complex situations where there are diverging merges to explore.
Diverging merges is a super interesting case, I've never needed to deal with that with bisection before (whether manual or automatic). If your history is nonlinear then I definitely see a much larger value in git-bisect. Though at the same time the result isn't even necessarily a single commit, so I'm not even sure what exactly I would expect it to do in that case!
Maybe more importantly, when the thing I am bisecting is Nixpkgs itself, there are no version numbers to go off of. Same with large monorepos like google3: There simply are no version numbers.
When there are no version numbers, there is nothing better to do than what git bisect does.
When there are version numbers, if I want to know which version the problem was introduced in, that's really not a problem either: with `git tag --contains`, it's relatively easy to determine the first version to contain the problem commit as an ancestor. Though, I'm not really sure why that information is particularly useful.
> Git bisect is stateful. It means your repo is no longer in a normal state, it's in the middle of a bisect! That's annoying to say the least. I've gotten burned way too many times by running the wrong command or forgetting that I'm in the middle of an interactive merge or rebase. I don't want to have to think about an additional state I could be in when I leave my work and come back to it. It's a huge cognitive burden, and I would really rather avoid unless it's really clearly going to pay for itself.
If you find this to be a particular problem for you, I would recommend adopting Git worktrees. That way, you can create a worktree for your bisect, making it a lot harder to accidentally mix peanut butter and toothpaste.
All of Git is stateful and obviously there's nothing special about bisect, so it's really just the same burden. That said, `git status` will tell you if you're in the middle of a bisect. Generally it's a good idea to check your status before doing some work, just to double-check that you're not working on the wrong branch. I personally also have git status in my shell prompt, which will also tell me if I am in the middle of a bisect. And if you do happen to fuck up and do some work during a bisect, it is not a huge deal: you can do a `git bisect reset HEAD` to exit the bisect without messing with your working tree. And if you want to jump back into it, that's not too big of a deal either: you can take a look at the bisect status with `git bisect visualize`, and then all you need to do to jump back in later is set the good/bad terms to match the bounds you had last time.
I'm not defending Git, it's a pretty poorly designed program. It's confusing that `git checkout` does 3 different things. The working directory vs staging area vs HEAD vs current branch is annoying. The workflows are often unnecessarily confusing and there are obvious features that I wish Git had natively that it doesn't, like an interactive TUI for rebasing (The chistedit extension in Mercurial is awesome, for example). But, it seems like you are running into issues where it comes down to having trouble dealing with Git rather than anything specific to `git bisect`. If you are afraid to use features because you will have to spend a lot of time digging yourself out of messes, it might be worth some time to look at ways to improve the workflow (or learn new tools for getting out of messes. An all-time favorite tool of mine here is definitely the reflog, though I find myself needing it very seldom these days.)
Git bisect itself is very nearly one of the easiest commands in Git that I can think of: there's very few commands to remember and Git does most of the work.
> Diverging merges is a super interesting case, I've never needed to deal with that with bisection before (whether manual or automatic). If your history is nonlinear then I definitely see a much larger value in git-bisect. Though at the same time the result isn't even necessarily a single commit, so I'm not even sure what exactly I would expect it to do in that case!
git bisect does a good job dealing with this case: it tries to figure out what parent of the merge to explore by trying each of them, then it continues the bisect running down that lineage. Of course, git bisect can't tell you if there are potentially multiple commits that independently introduce the problem, but it can very reliably tell you one of the commits that introduced the problem, which is all I care about, since what I want to do is discover more about the bug and how to fix it.
The versions are important because they're publicly-released, stable builds (generally), rather than random and potentially poorly-tested unstable commits. They're much friendlier too, if you're trying to tell a user to revert to a previous version. If you're trying to create a patch, that's obviously useless. But if you're trying to figure out which version of the software you can use, that's obviously useful.
> I think we're trying to solve different problems: I don't care what version the problem was introduced in, I want to find what broke it. It's not about the commit itself, it's about the diff.
Yup, exactly.
> If you find this to be a particular problem for you, I would recommend adopting Git worktrees.
Oh, I'm familiar with them. I don't find them particularly attractive for the typical "manual" bisects I do. Using a different path means I have to set up a totally different build just to do the bisect -- that's at least one extra iteration I have to go through (+ space I have to waste). Again, not saying it's never useful, but I really am not keen on extra iterations if I feel I can avoid them.
> All of Git is stateful and obviously there's nothing special about bisect, so it's really just the same burden.
Actually funny enough, not quite: there are two fundamental differences for git-bisect:
1. For most common operations, I usually use a GUI for git. This includes rebases, cherry-picks, and merges. The dialogs are modal and stack on top of each other by nature, so it's pretty darn hard to forget you were in the middle of one of these operations and accidentally start another one -- they stay on the screen till you're done! It's the command line where you're back in a normal shell and can easily forget you were in a weird state, thus forcing you to check your status periodically.
2. Git bisect is by far the longest-running stateful operation. Rebasing and merging just involve modifying source files at every iteration. Bisect requires building + running tests too. That takes orders of magnitude longer. By the time you come back to it for the next iteration, a long time will have passed, and it's much easier to lose context then.
> I'm not defending Git, it's a pretty poorly designed program. It's confusing that `git checkout` does 3 different things.
I've heard this a lot but funny enough git checkout is the least of my problems with it. The syntax is second-nature to me now. Having to remember what state my repo was in, or constantly check it? That's another beast.
> But, it seems like you are running into issues where it comes down to having trouble dealing with Git rather than anything specific to `git bisect`.
Nope, it's really git bisect (see above). The only thing it's doing is saving me the hassle of opening the commit log and picking my own commit. That's very little gain -- on the order of seconds/minutes -- at an often larger cost (more iterations etc. as explained previously).
P.S., when you're near the end of a bisect (i.e. all the candidate commits are a few away from each other), it's very helpful to be able to see the messages & files each commit touched -- which I can do easily when I'm looking at a log. That way you can make educated guesses as to the root cause and save a couple iterations. Using git-bisect is just blindfolding yourself to useful information. Again: guaranteed small win, at the cost of a not-so-infrequent large loss.
Also, nothing in particular stops me from viewing the log. In fact, the `git bisect visualize` command is for the exact thing you are describing. You can easily narrow your bisect window at any time if you have a hunch. I have done this and I'm pretty sure the manual even describes doing this.
I also haven't even mentioned some of the more useful features of git bisect. For example, oftentimes, even if you don't know exactly what the problem is, you might be able to narrow down with good certainty what set of files would've had to change to cause it. `git bisect start` accepts pathspecs after everything else. If you tell it that, it will only select commits where files in that pathspec have changed. You could of course manually find commits of interest and then divide them in half and so forth, but that really doesn't feel like it scales very well.
And if all you care about is what release version something broke in, there's really no particular reason why an automated tool couldn't do that, too. I suspect the reason why `git bisect` doesn't have a "tags only" mode is because it's not what bisect is really for, though it would probably be a fairly trivial addition.
I also personally never loved Git GUIs. Being able to visualize the state of the working directory/repository feels very nice, but in practice it doesn't really solve a problem for me. I always come away feeling like Git GUI tools create as many problems as they solve. The only exception is that I vastly prefer JetBrains' 3-way merge tool over almost any other merge tool. (Runner-up would probably be kdiff3, I guess. Not a huge fan of vimdiff.)
> git bisect visualize
This is the first time I'm hearing about this, it sounds useful! I've never seen anyone use it before, so it must've slipped past me if I've ever even encountered it in the manual. Thanks for sharing it!
This cycle time combined with occasional unexpected interactions between components meant that in every release cycle, there were dozens of complicated failing tests where it was not obvious which code change was responsible.
`bisect` here was extremely helpful: instead of having to pore over commit history and think very hard, I could bisect with a small wrapper script that would build and run the failing test in question. Builds still took hours, but I could usually automatically pinpoint the responsible code change for one of the tests overnight.
(This was not using Git, but Perforce, for which I had to write `p4-bisect`. Alas, it's not open-source...)
Sorry if this is a dumb question, perhaps I'm not understanding what you mean by every release cycle, but does this mean nobody in the team/company ran tests until it was time for release? That sounds like a really big everlasting wound to put a tiny git-bisect bandage on!
The OP wrote “many platforms supported, a build could take hours, and the test suite took so long to run that it would take several days or weeks of real time to get through it all”
⇒ I think they did run tests, but typically not all of them on all platforms.
This was native desktop+server software, not a hosted SaaS thing.
Major releases were put out every few months.
Developers did run tests regularly with every change they would make, but it was infeasible to run all the tests in every configuration for each change. So they would try to choose tests to run that seemed most relevant to the code they had changed.
The entire test suite would run more or less constantly on shared machines, and every few days some new tricky failure would be detected. The tricky failures were almost always the result of some unanticipated interaction of features, frequently on the more obscure configurations (like IBM z/OS).
The problem was not that developers were not testing, but that the tests were infeasible to run every time. So instead, testing became an optimization problem.
That's perfectly fine, and I'd say lucky you.
Bisect is a type of tool that you hope you won't need, but when you need it it's a life saver. You're right about the larger collaborative codebase. Imagine trying to find a strange bug that got introduced sometime in the last 12 months in a huge and active repo.
https://github.com/csmith-project/creduce/blob/31e855e290970...
You can see the pass implementation in files on the left.
The user provides a script which determines if a program is “interesting.” A program with build-time errors should be considered “not interesting” in most cases. (And if you’re hunting an incorrectly reported error, you’ll want to make sure you only consider that error to be interesting.)
Then it yoinks out parts of your program and checks if the result is still interesting. If it’s not, that attempt is discarded and it tries something else. If the reduced program is still interesting, it will try yoinking out more stuff. Repeat until you like the result.
There doesn’t need to be any understanding of the program in order for this to work. You just need something where removing some bit of the program has a chance of producing a new program that’s still interesting. That works in most programming languages.
This process can take a long time, and it’s slower when there’s a lower probability of a removal producing an interesting program. Thus heuristics are added: don’t remove random character ranges, but work at a token level. When removing a brace, find the matching brace and remove the insides. That sort of thing. Most modern languages are similar enough to C that many of these rules are helpful for them too. And even if they don’t help, this just slows the process down, but it still works.
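A minimal sketch of that loop in Python -- greedy chunk deletion over a token list with a user-supplied "interesting" predicate. This captures the delta-debugging spirit of C-Reduce, not its actual passes; the predicate and the input program below are invented for illustration:

```python
def reduce_tokens(tokens, interesting, max_chunk=13):
    """Greedy reduction in the spirit of C-Reduce's chunk-removal passes:
    repeatedly try deleting chunks of tokens, keeping any deletion that
    leaves the test case 'interesting'. Chunk sizes shrink over time."""
    current = list(tokens)
    size = max_chunk
    while size >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + size:]
            if interesting(candidate):
                current = candidate  # keep the smaller interesting program
            else:
                i += 1               # deletion broke it; try the next spot
        size //= 2
    return current

# Hypothetical 'interesting' predicate: the bug reproduces whenever the
# program still mentions both 'foo' and 'bar' (a stand-in for "the
# compiler still crashes on this input").
def interesting(toks):
    return "foo" in toks and "bar" in toks

program = "int foo ; float unused ; char baz ; int bar ;".split()
result = reduce_tokens(program, interesting)
print(result)  # → ['foo', 'bar']
```

Big deletions are attempted first and smaller ones only once those stop paying off, which keeps the number of predicate evaluations (i.e. compile-and-run cycles) manageable.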
rm -rf /foo/bar
and in theory it could reduce it to rm -rf /
The passes are very heuristic, so that doesn't seem like something you can rule out. (And I actually want to run it on shell scripts!)
TL;DR: C-Reduce just gives you text to run/compile; if you're worried about things like that, sandbox as much as possible.
Given a sufficiently bad compiler bug anything is possible, but given that it's trying to minimize the size of an input that gives a particular output, I don't think it's likely to explore distant branches.
I can't help but wonder about the consteval/constexpr machinery in the various C++ compilers... It'd be interesting to see how much adversarial input has been tested for those.
(I guess Zig's comptime might also be relevant, but I'm not sure what that's allowed to do. Maybe execute arbitrary code?)
... anyway just a stray thought.
Only if you're going for a compiler bug. If you're working on a minimal reproducible example, you need to make sure your code isn't reduced to:
int main() { return 0; }
in that case.
"Enough time" does a lot of work here and warrants further analysis. With enough time it might produce the works of Shakespeare (if you ignore that it's designed to minimize), but we should probably view anything that would take more than a year as too much time.
I’m pretty sure (based on the last time I saw this) that this is just good old fashioned computer science.[1]
I really hope that HN hasn’t forgotten about good old fashioned computer science stuff already.
[1] All the algorithm and whatnot stuff, not the spooky machine learning stuff. Even things like Prolog-for-AI, although that has the slight downside of not working (for the purposes of making AI).
I do get that it will reject a lot of stuff as not working (and has to, even in the target language), and I also get that Algol-like languages are all very similar, but I am still surprised that it works well enough to function on ~arbitrary languages.
These are very LLM-era properties for a program to have. The question is not "does it work for language x" but "how well does it work for language x", and the answer is not "is it one of the languages it was designed for" but instead "idunno try it out and see".
I am sure that in 2006 or so I was doing "extract method" in ReSharper. And Lispers have probably been doing it for centuries ;)
And that is enough when:
1. Most code in a code file is irrelevant to whatever specific bug is being looked at.
2. It is removal of code that's at play, not writing of code.
Some of those passes are:
* Tokenize the input according to C, and randomly chuck away chunks of 1 to (I think) 13 tokens. Most languages vaguely similar to C have very similar tokenization rules to C, so these kinds of passes are going to have similar effects, and this will tend to be effective in doing things like omitting needless qualifiers or attributes.
* Balanced parentheses, and chuck away units of balanced parentheses. This includes (), {}, and []--and for almost any language, this can be a useful level of stuff to throw away or reduce.
* Stripping comments and whitespace. Again, many languages use the /* and // styles of comments that C does, so this is pretty effective.
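The balanced-delimiter idea is easy to illustrate in a language-agnostic way. This is just my own sketch of the concept, not C-Reduce's actual pass: find matched (), {}, or [] spans and propose the text with each span deleted.

```python
def balanced_spans(text, open_ch, close_ch):
    """Yield (start, end) index pairs of matched delimiter spans."""
    stack = []
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append(i)
        elif ch == close_ch and stack:
            yield (stack.pop(), i)

def delete_balanced(text):
    """Yield candidate reductions, each with one balanced span removed."""
    for open_ch, close_ch in [("(", ")"), ("{", "}"), ("[", "]")]:
        for start, end in balanced_spans(text, open_ch, close_ch):
            yield text[:start] + text[end + 1:]
```

Any candidate that fails to compile is simply rejected by the interestingness test, so the pass doesn't need to know which deletions are legal.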
There are actually relatively few passes that are incredibly C/C++-specific, and in my experience, one of the biggest issues with C-Reduce is that it is absolutely terrible at trying to detemplatize the code as a reduction step, which should be a relatively easily automatable step.
If you imagine you've got some neighbourhood function N(x) that takes a program and returns a large sample of valid programs that are "similar" to x (its neighbourhood), then your basic test-case reduction algorithm is just:
1. Start with some initial test case.
2. Look at the neighbourhood of your current test case. If any elements of it are both interesting and smaller than your current test case, move to one of those. If not, stop.
3. Return the smallest interesting test case you've seen at the end.
And most of what varies from test-case reducer to test-case reducer is that neighbourhood function (how you explore the neighbourhood also varies, but you can think of that as an implementation detail that mostly only affects performance).
This approach will generally work pretty well because the sort of bugs you tend to find are "dense in program space". This will e.g. do nothing if you start from the property that the file has a particular sha1 hash, but if you start from a property like "this uses this particular pair of language features", most nearby programs will share it.
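The loop above can be sketched in a few lines of Python. The `interesting` predicate and `neighbourhood` function are assumed to be supplied by the user; this is an illustration of the algorithm as described, not any particular reducer's implementation.

```python
def reduce_test_case(initial, interesting, neighbourhood):
    """Greedily walk to smaller interesting neighbours until stuck."""
    current = initial
    improved = True
    while improved:
        improved = False
        for candidate in neighbourhood(current):
            # Move to the first neighbour that is both smaller and interesting.
            if len(candidate) < len(current) and interesting(candidate):
                current = candidate
                improved = True
                break
    return current
```

With a neighbourhood of "delete one character" and an interestingness test of "contains the substring bug", this walks `"xxbugxx"` all the way down to `"bug"`.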
The nice thing about doing test-case reduction is that it doesn't actually matter if your neighbourhood contains only valid programs, because you can rely on the interestingness test to filter out any invalid program. So all you care about really is:
1. It contains a fairly large set of valid programs among the test cases you consider.
2. It's not too large so as to be prohibitive to explore.
And this frees you up to just try a bunch of heuristics that are likely to work pretty well, and it turns out that there are a lot of easily accessible ones. In particular, deleting contiguous chunks of a program pretty often produces a valid program, which will always be shorter.
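A sketch of the chunk-deletion heuristic, assuming the test case is a list of lines (my own illustration): propose every variant with a contiguous run of lines removed, trying the biggest deletions first so that successful deletes make the most progress.

```python
def chunk_deletions(lines):
    """Yield line lists with one contiguous chunk removed, biggest first."""
    n = len(lines)
    for size in range(n, 0, -1):
        for start in range(0, n - size + 1):
            yield lines[:start] + lines[start + size:]
```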
For a trivial example, imagine you've got something like:
if True:
blah()
in a Python program. It might look like you need to know about Python syntax to reduce this to just `blah()`, but in fact you can reduce it by deleting the 13-byte string `"if True:\n    "`. So if your reducer is happy to brute-force try all short strings, it will be able to make this transformation. This isn't a real example exactly: C-Reduce, I think, won't find this particular one (but I might be misremembering its exact heuristics here). Shrinkray will, but it will do so by understanding Python; its brute-force pass stops at 10-byte sequences, which won't work here. But it demonstrates the sort of thing that can work surprisingly well without understanding syntax.
Another thing you can do is you can understand just enough syntax in a general purpose way. For example, shrinkray (and I think C-Reduce) will look for integer literals in the program and try replacing them with integer literals representing a smaller number. In order to do this, it doesn't have to know anything about the program's lexical syntax, it just has to know what a number looks like.
Heuristics like this will often fail to work, but that's OK, you just need them to succeed enough, and often some pretty crude stuff will get you down to something that is small enough that a human can get it the rest of the way.
But I'm still having trouble understanding how it knows WHAT to remove when trying each iteration. There must be some tokenization going on, but I don't see how that would work across programming languages.
I used a test script that spent hours using CSmith to generate random test programs when developing an esoteric LLVM target. When they crashed, it automatically C-Reduced them and left them for inspection. Invaluable!
It seems like a small number of the passes are not specific to C:
https://github.com/csmith-project/creduce/blob/master/creduc...
`"C" => 1,` means it is only for C.
I would guess `pass_lines` is the most important for non-C code; I'm guessing (it's written in unreadable Perl) that it removes lines.
So while it can work for languages other than C, most of its features are C-specific, so it's not going to work nearly as well. Still, I'd never heard of C-Reduce; pretty cool tool!
Someone make one based on Tree Sitter, quick!
My real worry is: what if this ends up running dangerous code? Like, what if you have a disabled line that writes instead of reading, and it randomly gets reactivated?
CPython includes a flag to only run parsing/compiling to bytecode. While you can use it like they did here and run the code, it really depends on how much you trust every possible subset of your code.
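For Python specifically, one way to write an interestingness test that only checks whether a candidate still parses, without executing any of it, is the built-in `compile()` (a sketch of the idea, not what the article used):

```python
def parses_ok(path):
    """Return True if the file is syntactically valid Python; never runs it."""
    try:
        with open(path) as f:
            # compile() in "exec" mode parses and compiles to bytecode only;
            # nothing in the file is executed.
            compile(f.read(), path, "exec")
        return True
    except SyntaxError:
        return False
```

Of course, this only helps if your bug is a parse-time bug; if reproducing it requires running the code, you're back to needing a sandbox.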
Yep, here's an explanation from a related tool, which was spawned by the parser-parser-combinator[0] approach to syntactical analysis and transformation: https://comby.dev/blog/2021/03/26/comby-reducer - which is based on what you've said.
Most of the non-domain-specific reductions in C-Reduce are simply brute force, IIRC.
I accidentally got obsessed with test-case reduction as a result of writing Hypothesis, and wrote shrinkray because I thought it was ridiculous that I hadn't put all of the things I'd learned about test-case reduction into some general purpose tool that other people could benefit from.
Shrinkray is still a bit under-advertised and rough around the edges in places, but I think that basically there's no good reason to use C-reduce over shrinkray for things that aren't C or C++. Shrinkray has very good generic heuristics that should work in most languages (it even works pretty well on binary files apparently, although I've not tested this myself), a couple of language specific passes for other languages, and is much more fun to watch.
It might even be the case that you should use shrinkray for C and C++ too (because I use C-Reduce's transformations for those languages), but I'm less confident that that's true. The best I'll confidently claim is "probably more competitive than you'd expect".
Okay fine, let's see for ourselves:
# Setup
git clone git@github.com:RustPython/RustPython.git && cd RustPython && cargo build --release
git clone git@github.com:tekknolagi/scrapscript.git && cp scrapscript/scrapscript.py .
# Download interesting.sh, replace the path to RustPython
chmod +x interesting.sh
# Command in article:
nix run nixpkgs#creduce -- --not-c interesting.sh scrapscript.py
Niiice: (93.2 %, 13717 bytes)
(93.2 %, 13683 bytes)
(93.3 %, 13614 bytes)
(93.3 %, 13592 bytes)
(93.3 %, 13571 bytes)
(93.3 %, 13517 bytes)
(93.3 %, 13449 bytes)
(93.4 %, 13412 bytes)
(93.4 %, 13365 bytes)
(93.4 %, 13333 bytes)
(93.4 %, 13313 bytes)
It seems to have stopped at "(96.4 %, 7347 bytes)" with the following output: https://gist.github.com/judofyr/47cba8a20cb2cd5798943ef975d0...

My understanding is that this would just cause the dangerous things to be repeatable.
It took some time to get RustPython to run, but it seems to work fine: https://gist.github.com/judofyr/82d2255c92ebb0c6b3a6882ac9fd...
Also, after running it for 5 minutes it was able to reduce it to a much smaller case:
class a:
def b() : c
def h(d) :
while d:
break
else:
return
d.b()
def j(e) :
f = a()
if f.h () :
g
i = ""
j(i)
There’s a Venn diagram of people who don’t care enough to do that (works for me!) and people who would never think to use c-reduce. It’s not a perfect circle, but it’ll be fairly close.
This is true, but I was enjoying the irony that there's an old sys-admin adage that you should only use absolute paths in your program (usually a shell script, in that environment); this is to make sure it is running exactly what you expect it to run.
So always put "/usr/bin/awk" instead of just "awk"
I had a co-worker once who took this as gospel. His scripts were always... interesting... to port to a new environment.
I think it might do a little, but not a lot, better if I left it to run for longer; it looked to have mostly stalled by then, though, so I got bored and killed it.
(4 nested BaseExceptionGroup: Exceptions from Trio nursery (1 sub-exception), and then:
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/nix/store/5cdw4sa0n0lwym47rd99a1q6b4dj7nr9-python3.12-shrinkray-0.0.0/lib/python3.12/site-packages/shrinkray/work.py", line 221, in parallel_map
| yield receive_out_values
| File "/nix/store/5cdw4sa0n0lwym47rd99a1q6b4dj7nr9-python3.12-shrinkray-0.0.0/lib/python3.12/site-packages/shrinkray/work.py", line 71, in map
| yield v
| GeneratorExit
+------------------------------------
Not sure if this is a known issue? Note that I ended up not respecting all of the version requirements in the pyproject.toml, since I tried to run it all in Nix and the Poetry support there is currently quite lacking.

My implementation of delta debugging, "delta", is 19+ years old: https://github.com/dsw/delta
I released it as Open Source because Microsoft Research sent someone to my office to ask me to, back when Microsoft was calling Open Source a "cancer".
The LLVM introduction by Lattner refers to the "standard delta debugging tool", so it is rather well-known: https://aosabook.org/en/v1/llvm.html 'unlike the standard "delta" command line tool.'
> [...] [C-Reduce] produces outputs that are, on average, more than 25 times smaller than those produced by our other reducers or by the existing reducer that is most commonly used by compiler developers. We conclude that effective program reduction requires more than straightforward delta debugging.
(Of course, this means that C-Reduce is 12 years old now.)
At the same time, C-Reduce seems to be more general than the LLVM tool you linked ("BugPoint", dating to 2002), since that one works with LLVM IR specifically.
I think most developers are generally unfamiliar with automatic test case minimization tools and techniques, so the post may be helpful even if the ideas have been known in their respective circles for quite some time.
I've used c-reduce/cvise to reduce assembly programs to the minimal set that evokes an assembler defect quite a few times. It's a real handy program.
You start with your large repro case:
1. Use language specific mutations that both add/delete
2. Crossover
3. Test fitness to see if they retain the compiler bug and sort by code size
4. Keep top N
5. Repeat
This allows the code to temporarily get larger to escape local minima.
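A hedged sketch of those five steps, assuming the test case is a list of lines and that `has_bug` and `mutate` are user-supplied callables (all names here are hypothetical, not from any existing tool):

```python
import random

def genetic_reduce(lines, has_bug, mutate, generations=100, keep=10):
    """Evolve a population of variants, keeping the smallest that still fail."""
    population = [lines]
    for _ in range(generations):
        # 1. Language-specific mutations (may add or delete, so candidates
        #    can temporarily grow larger than their parents).
        children = [mutate(random.choice(population)) for _ in range(keep * 4)]
        # 2. Crossover: splice a prefix of one parent onto a suffix of another.
        a, b = random.choice(population), random.choice(population)
        children.append(a[: len(a) // 2] + b[len(b) // 2 :])
        # 3. Fitness: keep only variants that retain the bug, sorted by size.
        survivors = [c for c in population + children if has_bug(c)]
        # 4. Keep top N; 5. repeat.
        population = sorted(survivors, key=len)[:keep]
    return population[0]
```

Because the previous population is included among the survivors, the best candidate never gets worse between generations.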
$ echo re{pro,ducer}
repro reducer