Would love to hear feedback and how useful this is relative to the existing search.
Compared to hn.algolia, this gives zero filter controls (time window, stories vs. comments, `author:metadat`, sort order, and so on) and no way to search for exact matches. It failed to turn up anything interesting, or even relevant, for the 4 or 5 queries I ran.
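For reference, most of the filters listed above map onto query parameters of the public HN Search API that hn.algolia.com is built on. This is a minimal sketch that assumes the `/api/v1/search` (relevance) and `/api/v1/search_by_date` (newest first) endpoints and their `tags` / `numericFilters` parameters. The helper name `buildHnSearchUrl` and its option names are hypothetical.

```javascript
// Hypothetical helper: builds a request URL for the public HN Search API
// (https://hn.algolia.com/api/v1). It only builds the URL; nothing is fetched.
function buildHnSearchUrl({ query = "", type, author, since, until, byDate = false } = {}) {
  // search_by_date sorts newest first; search sorts by relevance.
  const endpoint = byDate ? "search_by_date" : "search";
  const params = new URLSearchParams();
  if (query) params.set("query", query);

  // Comma-separated tags are ANDed, e.g. "comment,author_pg".
  const tags = [];
  if (type) tags.push(type); // "story" or "comment"
  if (author) tags.push(`author_${author}`);
  if (tags.length) params.set("tags", tags.join(","));

  // created_at_i is a Unix timestamp in seconds.
  const numeric = [];
  if (since) numeric.push(`created_at_i>=${Math.floor(since.getTime() / 1000)}`);
  if (until) numeric.push(`created_at_i<${Math.floor(until.getTime() / 1000)}`);
  if (numeric.length) params.set("numericFilters", numeric.join(","));

  return `https://hn.algolia.com/api/v1/${endpoint}?${params.toString()}`;
}
```

In a browser or Node 18+, `fetch(buildHnSearchUrl({ query: "NHS", type: "comment", author: "n4r9" })).then((r) => r.json())` should resolve to an object whose `hits` array holds the matching items.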
You've still made it further than I have in my HN search engine adventures, which is commendable.
It would be remarkable and interesting to have a super deep search capability that indexes all first-order links on this site.
Show HN: DeepHN - https://news.ycombinator.com/item?id=26791582
Of note, I was able to find that item (which I recalled existed but not by what name) with hn.algolia, but I was not able to find it with the OP search engine, or with the DeepHN search itself. So in my book, Algolia is winning.
But projects like this are super fun and educational to build, so props to OP.
Otherwise, I snapshot the page to Evernote when I think it might be interesting later. Hopefully they don't completely end the free tier.
Of course, local/offline only would be great.
Otherwise, no thanks.
Is that a valid link? I get an error when opening it.
I found a bug. Under the "When will GPT-5 be released?" search results, there are duplicate results. On one of the duplicates, the "username (date)" line says "undefined (undefined)".
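One way a results page might guard against this, purely as a sketch: the result shape (`id`, `author`, `created`) is an assumption for illustration, not Vectara's actual response format, and this says nothing about the bug's real cause.

```javascript
// Sketch: collapse duplicate hits and drop ones missing display metadata.
// The fields `id`, `author` and `created` are assumed for illustration.
function cleanResults(results) {
  const seen = new Set();
  const cleaned = [];
  for (const r of results) {
    // Skip entries that would render as "undefined (undefined)".
    if (r.author == null || r.created == null) continue;
    // Keep only the first complete occurrence of each item id.
    if (seen.has(r.id)) continue;
    seen.add(r.id);
    cleaned.push(r);
  }
  return cleaned;
}
```

The metadata check runs before the duplicate check, so if the broken copy happens to come first, the complete copy later in the list is still kept.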
Algolia has already done the search thing, can the Vectara search be 10x better?
What I do find missing from HN is a way to see things that may interest me but that I may have missed. I like that the main feed is pure popularity, but I don't have time to go through every post, and I likely miss things I would have been interested in.
Though this can be done with collaborative filtering, or other non-AI methods, might this be a decent use case for your AI?
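To make the non-AI option concrete, here is a toy user-based collaborative-filtering sketch. It assumes per-user upvote sets are available (HN does not publish these), measures user similarity with Jaccard overlap, and scores each item a user hasn't upvoted by summing the similarity of the users who did. The function names are hypothetical.

```javascript
// Jaccard similarity between two Sets: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  let inter = 0;
  for (const x of a) if (b.has(x)) inter++;
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

// upvotes: Map<user, Set<itemId>>. Returns up to k item ids for `user`.
// Assumes upvote data per user is available; HN does not expose it.
function recommend(upvotes, user, k = 5) {
  const mine = upvotes.get(user) ?? new Set();
  const scores = new Map();
  for (const [other, theirs] of upvotes) {
    if (other === user) continue;
    const sim = jaccard(mine, theirs);
    if (sim === 0) continue;
    // Items a similar user upvoted that this user hasn't accumulate the similarity.
    for (const item of theirs) {
      if (mine.has(item)) continue;
      scores.set(item, (scores.get(item) ?? 0) + sim);
    }
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([item]) => item);
}
```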
[1] my hunch is that some human expert curation is involved.
Human curation also exists, but I think that is aimed at removing spam and uplifting YC company posts.
(I've been thinking about this not just in terms of HN, but treating all my RSS feeds as one undifferentiated stream and just having a chatbot sort incoming items into whatever bucket it deems most appropriate).
What's stopping me is that it might work, and I doubt making the internet even stickier is good for me long term.
But my gut feeling is that there's not enough interest in RSS right now to drive widespread adoption of a new version of the spec. My approach would be to focus on improved UX over existing feeds, rather than speculatively expanding the spec to make feeds richer.
The main advantage of my approach, I think, is that it adapts to the individual end user's needs. If all my subscribed feeds are tech-focused and I use a generic published taxonomy, I'm going to end up with 60% of my items in "Technology" and 30% in "Computing". If I use a chatbot to dynamically bucket stuff, I'll get "Micro PCs", "Graph theory", "Golang", etc etc.
I posted an RSS reader that can do this recently [2] and I'm actively hacking on another [3]. But there are many RSS tools that can do this.
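A rough sketch of the pipeline shape, with the chatbot hidden behind a `classify` callback. The keyword matcher below is a hypothetical stand-in that only exists so the example runs; in the approach described above, an LLM would choose (or invent) the bucket for each item.

```javascript
// Group feed items into buckets chosen by a pluggable classifier.
// In the approach described, `classify` would call a chatbot; here it is a stub.
function bucketItems(items, classify) {
  const buckets = new Map();
  for (const item of items) {
    const label = classify(item);
    if (!buckets.has(label)) buckets.set(label, []);
    buckets.get(label).push(item);
  }
  return buckets;
}

// Hypothetical keyword-based stand-in for the chatbot, for illustration only.
const keywordClassifier = (rules, fallback = "Other") => (item) => {
  const text = item.title.toLowerCase();
  for (const [keyword, label] of rules) {
    if (text.includes(keyword)) return label;
  }
  return fallback;
};
```

Keeping the classifier as a plain function means a generic taxonomy, a keyword list, or an LLM call can be swapped in without touching the grouping logic.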
> Arm says it wants all Snapdragon X Elite laptops destroyed
Not so useful.
So it’s not like it’s irrelevant, even though it is certainly not actually the most relevant one either.
It seems to give better results if you are more specific. For example, try the following search:
how to use iptables effectively
And have a look at the first five or so results.
Also, note that OP said it’s searching about six months’ worth of data. So if anything specific about iptables that you were looking for is older than that, their search tool doesn’t know about it.
It’s better to use the API.
One of the most frequent searches I do is to look for a specific comment that I know a user made recently. For example, I might want to look for my own comment here: https://news.ycombinator.com/item?id=40801389 (sorry, this is a slightly political one but I just picked it randomly for test purposes).
Searching Vectara for "n4r9 NHS" produces no results: https://hackernews.demo.vectara.com/?query=n4r9+NHS&filter=
HN's own search however produces the goods in the top result: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
[ EDIT except for this very post :p ]
Maybe 6 days ago is outside the dataset that this is based on?
Some other thoughts/suggestions:
- Ability to click through to the comment itself? At the moment it looks like the link goes just to the main comments page and then I have to find the relevant comment on the page.
- Filter comments vs posts?
- Order by datetime?
- Filter within a date range?
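All four suggestions can be handled client-side once the hits carry a type, a timestamp and an item id. This is a sketch that assumes HN Algolia-style hit fields (`objectID`, `created_at_i`, `_tags`); another backend would need different field names. The function name `refineHits` and the added `hnUrl` field are hypothetical.

```javascript
// Post-process search hits: filter by type and date range, sort newest first,
// and attach a direct link to each item (comment or story) on HN.
// Field names follow the HN Algolia hit format; adjust for other backends.
function refineHits(hits, { type, since, until } = {}) {
  const from = since ? since.getTime() / 1000 : -Infinity;
  const to = until ? until.getTime() / 1000 : Infinity;
  return hits
    .filter((h) => !type || (h._tags ?? []).includes(type))
    .filter((h) => h.created_at_i >= from && h.created_at_i < to)
    .sort((a, b) => b.created_at_i - a.created_at_i)
    // Every HN item, comments included, has its own page at item?id=<id>.
    .map((h) => ({ ...h, hnUrl: `https://news.ycombinator.com/item?id=${h.objectID}` }));
}
```

The link is stored as `hnUrl` rather than `url` because story hits already use `url` for the external article.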
My personal opinion is that I'll keep using the HN search for the foreseeable future.
Want more posts about Lisp, Smalltalk and reverse engineering, for example, rather than the usual front page drivel? Search for them.
On one hand, I wish Algolia didn't give very old posts so much weight (it often prefers posts that are 8+ years old); on the other hand, old content tends to predate the Eternal September of tech-adjacent people coming to this forum to discuss tech-adjacent light content, so it's actually a feature. The real value of HN is its archives, IMO.
javascript:(function() {function randomDate(start, end) {var date = new Date(+start + Math.random() * (end - start));var day = ("0" + date.getDate()).slice(-2);var month = ("0" + (date.getMonth() + 1)).slice(-2);var year = date.getFullYear();return year + '-' + month + '-' + day;}var startDate = new Date(2007, 9, 9);var endDate = new Date();var randomDateStr = randomDate(startDate, endDate);var newUrl = 'https://news.ycombinator.com/front?day=' + randomDateStr;window.location.href = newUrl;})();
For example, here's HN from a year ago: https://news.ycombinator.com/front?day=2023-07-02.
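The bookmarklet above, expanded for readability and with the random source and current time injectable so it can be tested. The logic is the same (a uniformly random moment between 9 October 2007 and now, formatted as YYYY-MM-DD in local time); the function names are just ones chosen here.

```javascript
// Readable version of the "random HN front page" bookmarklet.
// Pad a number to two digits, e.g. 7 -> "07".
const pad2 = (n) => String(n).padStart(2, "0");

// Pick a uniformly random moment between start and end (Date objects).
function randomDate(start, end, rand = Math.random) {
  return new Date(start.getTime() + rand() * (end.getTime() - start.getTime()));
}

// Format in local time as YYYY-MM-DD, the format /front?day= uses.
function formatDay(date) {
  return `${date.getFullYear()}-${pad2(date.getMonth() + 1)}-${pad2(date.getDate())}`;
}

function randomFrontUrl(rand = Math.random, now = new Date()) {
  const start = new Date(2007, 9, 9); // months are 0-based: 9 = October
  return "https://news.ycombinator.com/front?day=" + formatDay(randomDate(start, now, rand));
}

// In the bookmarklet: window.location.href = randomFrontUrl();
```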
https://news.ycombinator.com/highlights is another good resource (and if anyone notices a great HN comment, past or present, they're welcome to nominate it for the highlights list! just email hn@ycombinator.com).
A 'random' link might be a good idea. For /highlights too.
Although, something I value a lot about Algolia is the very fast live search as you type [0].
Vectara seems to be smarter, but much slower.
As a technical user, my needs are satisfied by Algolia 99% of the time.
[0]: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
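Search-as-you-type UIs commonly pair a fast backend with a short debounce, so a query is sent only after the user pauses rather than on every keystroke. This is a generic debounce sketch, not a claim about how hn.algolia.com implements its live search; the 150 ms delay and the `runSearch` / `input` names in the usage line are hypothetical.

```javascript
// Debounce: run `fn` only after `wait` ms pass with no new calls.
// The scheduler functions are injectable so the behaviour can be tested
// without real timers; by default they wrap setTimeout/clearTimeout.
function debounce(
  fn,
  wait,
  schedule = (cb, ms) => setTimeout(cb, ms),
  cancel = (handle) => clearTimeout(handle)
) {
  let handle = null;
  return (...args) => {
    // A new keystroke cancels the pending search and restarts the wait.
    if (handle !== null) cancel(handle);
    handle = schedule(() => {
      handle = null;
      fn(...args);
    }, wait);
  };
}

// Usage in a page (hypothetical element and search function):
// input.addEventListener("input", debounce((e) => runSearch(e.target.value), 150));
```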
I am currently playing with the Algolia Hacker News search API myself and experimenting with spaCy Named Entity Recognition and llama3 to come up with some interesting data.
Work in progress version here: https://news.facts.dev/topic
It doesn't seem to have any of the filtering or sorting the Algolia one has: comments/stories by a specific user, date ranges, sorting by upvotes or recency, or searching only titles, content, or comments.
Say I wanted to search for comments by the OP, ofermend. It doesn't seem like I can...
Entering just their name returns results that were neither made by them nor mention their username. I tried other queries too, without any luck.
PS: no, lootitooti is not my project. I decided to finally watch Game of Thrones with my wife and I remembered that site when I was watching the opening. I remembered seeing it here on HN, searched and found it.