OCRing Music from YouTube with Common Lisp
62 points by superdisk 5 days ago | 18 comments
  • kanwisher 4 days ago |
    honestly this would be better with an AI model
    • teruakohatu 4 days ago |
      > honestly this would be better with an AI model

      In the article the author tried Tesseract which uses ML and has some neural network models, and also tried ChatGPT.

      I have come to the same conclusion as the author when doing OCR that needed 100% accuracy.

      When you know the font, spacing and the layout is fixed, old school statistical analysis of the pixels works a treat.
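
      A minimal sketch of that idea in Python, assuming each glyph has already been cropped to a fixed-size binary grid. The TEMPLATES table, its 3x3 glyphs, and the function names are hypothetical and are not taken from the article:

```python
from typing import Dict, List

Glyph = List[List[int]]  # rows of 0/1 pixels, all glyphs the same size

# Hypothetical reference glyphs for a known font (tiny 3x3 grids for illustration).
TEMPLATES: Dict[str, Glyph] = {
    "1": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "7": [[1, 1, 1],
          [0, 0, 1],
          [0, 0, 1]],
}


def pixel_diff(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def classify(cell: Glyph, templates: Dict[str, Glyph]) -> str:
    """Return the name of the template with the fewest differing pixels."""
    return min(templates, key=lambda name: pixel_diff(cell, templates[name]))
```

      With a fixed font and layout, a cell with a stray pixel or two still lands on the nearest template, which is why this kind of matching can be both cheap and exact.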

      • register 4 days ago |
        Completely second that. This is my experience as well.
      • Vampiero 4 days ago |
        You can generalize that to anything: when you know the problem domain so well, why the hell are you using ChatGPT to solve any problem within it? Use the most specialized tool for the job or you're just wasting CPU and memory (and electricity, and money, and time). Same goes for a neural net trained on every possible character set. If you know the font and character size in advance it's way overkill.

        It's a bit more effort to set up since you actually have to set it up. But at least it's done right.

    • secondplacetho 4 days ago |
      ML is the second best answer to everything, and very rarely the first best answer.

      Of course it'd be better than something that is intentionally limiting itself. But that says nothing.

    • curt15 4 days ago |
      By "AI model" do you mean neural nets? "AI" and "ML" are just buzzwords that convey no real meaning about the underlying mathematics. The underlying models could be something as basic as linear or logistic regression, which, depending on the application, could actually be more appropriate than full-blown neural nets.
  • rcarmo 4 days ago |
    Holy cow.
  • varjag 4 days ago |
    • superdisk 4 days ago |
      I just restarted the webserver. It's running on OpenBSD HTTPd + MediaWiki + SQLite, and keeping it up has been a perpetual thorn in my side. Oh well. I need to figure out some alternative setup probably.
      • j45 4 days ago |
        Modify your DNS to put cloudflare or bunny in front of it and you'll be good. Don't stop self-hosting :)
        • zoezoezoezoe 4 days ago |
          self-hosting means freedom, never stop self-hosting
      • MonkeyClub 4 days ago |
        Is your VPS on OpenBSD.Amsterdam by any chance? (The 46.23.. address seems familiar.)
        • superdisk 4 days ago |
          Yep, that's it. The host is (for the most part) fine, but there's either some problem with httpd or the PHP worker pool where it just dies after some number of requests.
          • MonkeyClub 4 days ago |
            Hi, neighbor! (I'm on server 7.)

            The service is indeed great, Mischa does an excellent job.

            Yeah PHP on httpd can be flaky, I'd wish for a lighter solution for wikis.

  • notpublic 4 days ago |
    Instead of doing a diff, I'm curious whether Normalized compression distance (NCD)[1] would yield a better result. It is a very simple algorithm:

    to compare two images, i1 and i2

      l1  = length(gzip(i1))
      l2  = length(gzip(i2))
      l12 = length(gzip(concatenate(i1, i2)))
    
      ncd = (l12 - min(l1, l2))/max(l1, l2)
    
    Here is a nice article where I found out about this long ago.

    https://yieldthought.com/post/95722882055/machine-learning-t...

    From the article:

    "Basically it states that the degree of similarity between two objects can be approximated by the degree to which you can better compress them by concatenating them into one object rather than compressing them individually."

    [1] https://en.wikipedia.org/wiki/Normalized_compression_distanc...
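
    The NCD pseudocode above can be sketched in Python with the standard gzip module. This is only an illustration, assuming the two images are available as raw bytes; the function names are made up for the example:

```python
import gzip


def compressed_len(data: bytes) -> int:
    """Length in bytes of the gzip-compressed input."""
    return len(gzip.compress(data))


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share a lot of structure;
    values near 1 suggest they have little in common.
    """
    la = compressed_len(a)
    lb = compressed_len(b)
    lab = compressed_len(a + b)  # compress the concatenation
    return (lab - min(la, lb)) / max(la, lb)
```

    For very small inputs the fixed gzip header and trailer take up much of the compressed size, so the distances are likely to be less discriminating there.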

    • johnisgood 4 days ago |
      Oh interesting, I remember comparing images before, I think I was doing a diff as well, so I suppose this would have worked? Nice to know! They were very small images though.

      It probably would have added the overhead from compression which in my case would have been detrimental.

      • notpublic 4 days ago |
        Do try it. We use it for text search in one of our apps and it works remarkably well. Basically, we use it to find which chunks contain the given text. Since the text can span multiple chunks, a simple string search will not work.
  • xenonite 4 days ago |