For a general audience: https://www.ai-supremacy.com/
From inside the AI labs: https://aligned.substack.com/
https://milesbrundage.substack.com/
For SWEs: https://artificialintelligencemadesimple.substack.com/
It's better to do it in batches, say once every 6-12 months or so.
Waiting 3-6 months to take a deep dive is a good pattern to prevent investing your time in dead-end routes.
1. Buy O'Reilly (and other tech) books as they come out. This will have a lag, but essentially somebody did this research & summarization work and wrote it up for you in chapters. Note that you don't have to read everything in a book. Also, $50 is a great investment if it saves you tens of hours of time.
2. Conference talks on YouTube by industry leaders, like Yann LeCun, or by maintainers of popular libraries, etc. Also, widely upvoted/linked YT videos on the topic.
3. If you're interested in hardcore research, look for review articles on arxiv.
4. Look at tutorials/examples in the documentation/repo of popular ML/AI libraries, like Pytorch.
5. Try to cover your blind spots. One way or another, you'll know how new AI is applied to SWE and related fields. But how is AI applied to orthogonal fields, like designing buildings, composing music, or balancing a budget? Covering these areas will be tougher and noisier, because most commenters will be non-experts compared to you. To get a feel for this, do something that feels unnatural: watch TED talks that seem bullshitty, read HBR articles intended for MBAs, and check out what Palantir is doing.
and is curated by me/my team. Hope that helps people keep up in the video/talk-length form factor (as in, instead of books, though we also have 2-3 hour workshops).
I started from scratch, spent 2-4 hours per day for 6 months, and won a silver medal in a Kaggle NLP competition. I use some of it now, but not all of it. More than that, I'm quite comfortable with models and understand the costs/benefits/implications, etc. I started with Andrew Ng's intro courses, did a bit of fast.ai, did Karpathy's Zero to Hero fully, all of Kaggle's courses, and a few other such things. Kagglers share excellent notebooks and I found them very helpful. Overall I highly recommend this route of learning.
I'm not even convinced Kaggling helps you interview at an OpenAI/Anthropic (it's not a negative, sure, but I don't know if it'd be what they'd look for in a research scientist role).
Now when I read a paper on something unrelated to AI (say, progesterone supplements) and they mention a random forest, I know what they're talking about. I understand regression, PCA, clustering, etc. When I trained a few transformer models (not pretrained) on texts in my native language, I was shocked by how rapidly they learn connotations. I find transformer-based LLMs very useful, yes, but not unsettlingly AGI-like, as I did before learning about them. I understand the usual way of building recommender systems, embeddings and things. Image models like U-Nets, GANs, etc. were very cool too, and when your own code produces that magical result, you see the power of pretraining + specialization. So yeah, I don't know what they do in interviews nowadays, but I found my education very fruitful. It felt like when I first picked up programming.
Re the age of LLMs, it is precisely because LLMs will be ubiquitous I wanted to know how they work. I felt uncomfortable treating them as black boxes that you don't understand technically. Think about the people who don't know simple things about a web browser, like opening dev tools and printing the auth token or something. It's not great to be in that place.
fast.ai is also amazing, but it's made of 1.5-hour videos and is more freeflowing. By the time I even figured out where we stopped last time, my time would sometimes be up, which was very discouraging. But later, once I had a little more time & some basic understanding from Andrew Ng, I was able to attempt fast.ai.
GitHub blog: https://github.blog/ai-and-ml/
Cursor blog: https://www.cursor.com/blog
Swyx also has a lot of stuff keeping up to date at https://www.latent.space/, including the Latent Space podcast, although tbh I haven't listened to more than one or two episodes.
Then spin up a RAG-enhanced chatbot using pgvector on your favourite subject, and keep improving it as you learn about cool techniques.
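The retrieval half of that chatbot fits in a few lines. A minimal sketch, using a toy character-frequency "embedding" so it's self-contained (a real build would call an embedding model, keep vectors in Postgres, and let pgvector's `<=>` distance operator do the ranking, as noted in the comments):

```python
import math

def embed(text):
    # Toy embedding: normalized letter-frequency vector over a-z.
    # Stand-in for a real embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def top_k(query, docs, k=2):
    # With pgvector this whole function becomes one query, roughly:
    #   SELECT body FROM docs ORDER BY embedding <=> %s LIMIT k;
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(d))), d) for d in docs]
    scored.sort(reverse=True)
    return [d for _, d in scored[:k]]

docs = [
    "Postgres stores the documents",
    "Transformers power modern LLMs",
    "pgvector adds vector similarity search to Postgres",
]
context = top_k("how do I search vectors in postgres", docs)
# The retrieved chunks then get prepended to the LLM prompt.
prompt = "Answer using this context:\n" + "\n".join(context)
```

Swapping the toy pieces for a real embedding model and a Postgres table is where the "keep improving it" part comes in: chunking strategy, hybrid search, reranking, etc.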
- https://www.youtube.com/@aiexplained-official
- https://www.youtube.com/@DaveShap
- https://www.youtube.com/@TwoMinutePapers/videos
Then there's the AI Supremacy newsletter.
That said... I will say that in one of my other replies I did mention that some YT channels in this space can be a bit tabloid-ish, and I may have had Shapiro partly in mind when saying that. But I still subscribe to his channel and some similar ones, just to get a variety of takes and perspectives.
Then find a small dataset and see if you can start getting close to some of the reported benchmark numbers with similar architectures.
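The shape of that exercise is always the same: split the data, train, score, compare to the published number. A toy sketch with a synthetic dataset and a nearest-centroid "architecture" (the `reported` figure is a placeholder; substitute a real dataset and the leaderboard number you're chasing):

```python
import random

random.seed(0)

# Synthetic 2-class dataset standing in for your small real dataset:
# class 0 clusters around -1, class 1 around +1, in 4 dimensions.
def sample(label):
    center = 1.0 if label else -1.0
    return ([random.gauss(center, 0.5) for _ in range(4)], label)

data = [sample(i % 2) for i in range(200)]
train, test = data[:150], data[150:]

# "Train": compute one centroid per class.
def centroid(rows):
    dim = len(rows[0])
    return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

c0 = centroid([x for x, y in train if y == 0])
c1 = centroid([x for x, y in train if y == 1])

def predict(x):
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 0 if d0 < d1 else 1

accuracy = sum(predict(x) == y for x, y in test) / len(test)
reported = 0.95  # placeholder for the paper/leaderboard number
print(f"ours {accuracy:.2f} vs reported {reported:.2f}")
```

The gap between your number and the reported one is where the learning happens: it forces you into preprocessing, hyperparameters, and the details papers gloss over.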
We are not exactly talking about big secrets. We are talking about "LLM learning resources" keywords - which apparently need handholding in 2024. And "acknowledging the value of the community".
I use tags a lot - these ones might be more useful for you:
https://simonwillison.net/tags/prompt-engineering/ - collects notes on prompting techniques
https://simonwillison.net/tags/llms/ - everything relating to LLMs
https://simonwillison.net/tags/openai/ and https://simonwillison.net/tags/anthropic/ and https://simonwillison.net/tags/gemini/ and https://simonwillison.net/tags/llama/ and https://simonwillison.net/tags/mistral/ - I have tags for each of the major model families and vendors
Every six months or so I write something (often derived from a conference talk) that's more of a "catch up with the latest developments" post - a few of those:
- Stuff we figured out about AI in 2023 - https://simonwillison.net/2023/Dec/31/ai-in-2023/ - I will probably do one of those for 2024 next month
- Imitation Intelligence, my keynote for PyCon US 2024 - https://simonwillison.net/2024/Jul/14/pycon/ from July this year
For me personally, I prefer to work backwards and then forwards. What I mean is that I want to understand the basics and fundamentals first. So I'm slowly trying to bone up on my statistics, probability, and information theory, and I have targeted machine learning books that also take a fundamentals-first approach. There's no end to books in this realm for neural networks, machine learning, etc., so it's hard to recommend beyond what I've just picked, and I'm just getting started anyway.
If you can get your employer to pay for it, MIT xPRO has courses on machine learning (https://xpro.mit.edu/programs/program-v1:xPRO+MLx/ and https://xpro.mit.edu/courses/course-v1:xPRO+GenAI/). These will likely give a pretty up to date overview of the technologies.
Here's mine on computational probability. The code and math here underlie "AI". It's the same fundamentals, and even the same code libraries (JAX, PyTorch, etc.): https://bayesiancomputationbook.com/welcome.html
I also posted my more specific guidebook to the fundamentals of GenAI above. Hope both help
We wrote a zine on system evals without jargon: https://forestfriends.tech
Eugene Yan has written extensively on it https://eugeneyan.com/writing/evals/
Hamel has as well. https://hamel.dev/blog/posts/evals/
Ollama Course – Build AI Apps Locally https://youtu.be/GWB9ApTPTv4?feature=shared
As an aside, does anyone have any ideas about this: there should be an app like an 'auto-RAG' that scrapes RSS feeds and URLs, in addition to ingesting docs, text and content in the normal RAG way. Then you could build AI chat-enabled knowledge resources around specific subjects. Autogenerated summaries and dashboards would provide useful overviews.
Perhaps this already exists?
I am not aware of it existing yet, but the challenge I see with it is rather simple: you get overwhelmed with information really quickly. In other words, you would still need a human somewhere in that process to review those scrapes, and their quality varies widely. For example, even on HN it is not a given that a link will be pure gold (you still want to check if it fits your use case).
That said, as ideas goes, it sounds like a fun weekend project.
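The ingestion half of that weekend project is mostly plumbing: fetch, chunk, embed, store. A sketch of the chunking step, which is the same whether the source is an RSS item, a URL, or a pasted doc (fetching and embedding are left as comments, since those depend on your stack):

```python
def chunk(text, size=200, overlap=50):
    """Split text into overlapping word-window chunks.

    Overlap keeps a sentence that straddles a boundary retrievable
    from at least one chunk.
    """
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + size]
        if piece:
            chunks.append(" ".join(piece))
        if start + size >= len(words):
            break
    return chunks

# Pipeline sketch (names are placeholders, not a real API):
# for url in feed_urls:              # RSS feeds / URLs to watch
#     text = fetch_and_strip(url)    # e.g. urllib + an HTML stripper
#     for c in chunk(text):
#         store(embed(c), c)         # any vector store, e.g. pgvector

article = " ".join(f"word{i}" for i in range(450))
pieces = chunk(article)
```

As the parent comment says, the hard part isn't this plumbing; it's keeping a human in the loop so the index doesn't fill up with junk.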
You have summarized the marketing strategy of the majority of recent startups.
It depends on what you are looking for, honestly; "the latest things happening" is pretty vague. I'd say the place to look is probably just the blogs of OpenAI/Anthropic/Gemini, since they are the only teams with inside information and novel findings to report. Everyone else is just using the tools we are given.
Beyond that: there are some decent subreddits for keeping up with AI happenings, a lot of good YouTube channels (although a lot of the ones that cover the "current, trendy" AI stuff tend to be a bit tabloid-ish), and even a couple of Facebook groups. You can also find good signal by choosing the right people to follow on Twitter/LinkedIn/Mastodon/Bluesky/etc.
https://www.reddit.com/r/artificial/
https://www.reddit.com/r/MachineLearning/
https://www.reddit.com/r/ollama/
https://www.youtube.com/@matthew_berman
https://www.youtube.com/@TheAiGrid
https://www.youtube.com/@WesRoth
https://www.youtube.com/@DaveShap
https://www.youtube.com/c/MachineLearningStreetTalk
https://www.youtube.com/@twimlai
https://www.youtube.com/@YannicKilcher
And you can always go straight to "the source" and follow pre-prints showing up in arXiv.
For tools to make it easier to track new releases, arXiv supports subscriptions to daily digest emails, and also has RSS feeds.
https://info.arxiv.org/help/subscribe.html
https://info.arxiv.org/help/rss.html
There are also some bots in the Fediverse that push out links to new arXiv papers.
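If you want to script the RSS route, arXiv category feeds are plain RSS 2.0 and parse with the Python stdlib. A sketch against a canned feed (the item titles and links below are made-up placeholders; in practice you'd fetch the real XML with `urllib.request` from the feed URL given on arXiv's RSS help page):

```python
import xml.etree.ElementTree as ET

# A trimmed RSS 2.0 document in the shape arXiv category feeds use.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>cs.CL updates</title>
    <item>
      <title>Some New LLM Paper</title>
      <link>https://arxiv.org/abs/0000.00000</link>
    </item>
    <item>
      <title>Another Preprint</title>
      <link>https://arxiv.org/abs/0000.00001</link>
    </item>
  </channel>
</rss>"""

def latest_titles(xml_text):
    """Return (title, link) pairs for every item in an RSS feed."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

papers = latest_titles(FEED)
```

From there it's a cron job away from a personal daily digest: diff against yesterday's titles and email yourself the new ones.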
* Matt Berman on X / YT
* AI-summarized AI news digest: https://buttondown.com/ainews by swyx
* https://codingwithintelligence.com/about by Rick Lamers
Then I manually follow up to learn more about specific topic/news I'm interested in.
I admire the YouTubers a lot and often wonder if I should be venturing into that domain. YouTube takes a lot of work but also has the greatest reach by far.
If you want to be an AI engineer study this:
https://github.com/karpathy/llm.c
And build around llama.cpp
Ollama is like cPanel for models. It's not going to familiarize you with the lower-level implementation, which is just as important as knowing the math.
That was my approach. Being aware of the internals, not just the equivalent of "git pull model", got me a job, without a CS degree and a long career in software. YMMV.
https://arxiv.org/pdf/2404.17625 (pdf)
https://news.ycombinator.com/item?id=40408880 (llama3 implementation)
https://news.ycombinator.com/item?id=40417568 (my comment on llama3 with breadcrumbs)
Admittedly, I'm way behind on how this translates to software on the newest video cards. Part of that is that I don't like the emphasis on GPUs. We're only seeing the SIMD side of deep learning, with large matrices and tensors. But there are at least a dozen machine learning approaches being neglected, notably genetic algorithms. Which means we're perhaps focused too much on implementations and not on core algorithms. It would be like trying to study physics without change of coordinates, Lorentz transformations, or calculus. Lots of trees but no forest.
To get back to rapid application development in machine learning, I'd like to see a 1000+ core, 1+ GHz CPU with 16+ GB of core-local RAM for under $1000, so that we don't have to manually transpile our algorithms to GPU code. That should have arrived around 2010, but the mobile bubble derailed desktop computing. Today it should be more like 10,000+ cores for that price at current transistor counts, increasing by a factor of about 100 each decade by what's left of Moore's law.
We also need better languages. Something like a hybrid of Erlang and Go with always-on auto-parallelization to run our human-readable but embarrassingly parallel code.
Short of that, there might be an opportunity to write a transpiler that converts C-style imperative or functional code to existing GPU code like CUDA (MIMD -> SIMD). Julia is the only language I know of even trying to do this.
Those are the areas where real work is needed to democratize AI, that SWEs like us may never be able to work on while we're too busy making rent. And the big players like OpenAI and Nvidia have no incentive to pursue them and disrupt themselves.
Maybe someone can find a challenging profit where I only see disillusionment, and finally deliver UBI or at least stuff like 3D printed robots that can deliver the resources we need outside of a rigged economy.
Is there a way to SAVE THIS THREAD on HN? 'Cos I'd love that. Thx
https://news.ycombinator.com/item?id=36195527
Hacker's Guide to LLMs by Jeremy from Fast.ai - https://www.youtube.com/watch?v=jkrNMKz9pWU
State of GPT by Karpathy - https://www.youtube.com/watch?v=bZQun8Y4L2A
LLMs by 3b1b - https://www.youtube.com/watch?v=LPZh9BOjkQs
Visualizing transformers by 3b1b - https://www.youtube.com/watch?v=KJtZARuO3JY
How ChatGPT was trained - https://www.youtube.com/watch?v=VPRSBzXzavo
AI in a nutshell - https://www.youtube.com/watch?v=2IK3DFHRFfw
How Carlini uses LLMs - https://nicholas.carlini.com/writing/2024/how-i-use-ai.html
For staying updated:
X/Twitter & Bluesky: follow people who work at OpenAI, Anthropic, Google DeepMind, and xAI.
Podcasts: No Priors, Generally Intelligent, Dwarkesh Patel, Sequoia's "Training Data"
Started off here: https://www.youtube.com/watch?v=hZWgEPOVnuM&list=PL6e-Bu0cqf...
Ended up here: https://www.youtube.com/watch?v=_5XYLA2HLmo&list=PL6e-Bu0cqf...
And after that, I've had some recent projects I love messing around with, such as a better license plate detection API for U.K. plates than what currently exists. Once I completed those two courses, I had a good enough baseline that when I'd encounter a repository, I could just google around if I needed to learn something new.
Short, simple, not painful, etc. I don't have the advanced mathematical background (nor familiarity with American mathematical notation) that I'd need to digest the MIT course set, so this learning path has been the best for me. I'm no expert whatsoever, though.
They also have a weekly podcast.
It has a mix of concepts and hands-on code, and lots of links to the best places to learn more. I'm keeping it up to date as well, and I'm about to merge a guide on building applications, which is what it sounds like you want.
Here's my Google scholar if you want credentials https://scholar.google.com/citations?user=Oq99ddEAAAAJ&hl=en...