I've had the chance to talk with many companies working on AI products. The biggest mistake I see is the lack of a standard process that lets them rapidly iterate toward their performance goals. Based on what I've learned, I'm working on an open-source framework that structures your application development for rapid iteration, so you can easily test different combinations of your LLM application components and quickly move toward your accuracy goals.
You can check out the project at https://github.com/palico-ai/palico-ai
You can set up a complete LLM chat app locally with a single command. Stars are always appreciated!
I'd love any feedback or thoughts on LLM development.
If you're curious about the theory and best practices behind iterating on LLM applications to improve their performance, this is a good blog post from Data Science at Microsoft: https://medium.com/data-science-at-microsoft/evaluating-llm-...
I'm also working on taking the theory behind the blog post above and converting it into a more practical guide using our framework. It should be out within the next two weeks. You can get notified when we release a blog post by signing up for our newsletter: https://palico.us22.list-manage.com/subscribe?u=84ba2d0a4c03...
The process we see when companies try to adopt an evaluation framework is this: every time they want to try a new configuration, they change their codebase wholesale, write one-off code to run an evaluation, review that result in isolation, and then try to compare it against changes they made sometimes weeks in the past. This makes new changes very slow and quickly becomes disorganized.
We help you build your LLM application so that it's easy to swap components. From there, when you want to see how your application performs with a certain configuration, we have a UI where you can pass in the configuration settings and run an evaluation. We also save all your previous evaluations so you can easily compare them with each other. As a result, testing and evaluating different configurations of your application is fast and easy.
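The swap-and-compare workflow described above can be sketched in plain Python (illustrative only, with hypothetical names; this is not the actual Palico API): components live in a config object, and every evaluation run is stored so configurations remain comparable side by side.

```python
# Illustrative sketch of swappable components plus a saved-eval history.
# All names here (AppConfig, EvalStore, run_app) are hypothetical.
from dataclasses import dataclass, field

@dataclass
class AppConfig:
    model: str = "gpt-4o-mini"                     # hypothetical default
    prompt_template: str = "Answer briefly: {question}"
    temperature: float = 0.0

def run_app(config: AppConfig, question: str) -> str:
    prompt = config.prompt_template.format(question=question)
    # A real app would call an LLM here; we stub it for illustration.
    return f"[{config.model}] {prompt}"

@dataclass
class EvalStore:
    runs: list = field(default_factory=list)

    def evaluate(self, name: str, config: AppConfig, cases: list) -> dict:
        outputs = [run_app(config, c) for c in cases]
        result = {"name": name, "config": config, "outputs": outputs}
        self.runs.append(result)                   # every run is kept
        return result

store = EvalStore()
store.evaluate("baseline", AppConfig(), ["What is RAG?"])
store.evaluate(
    "cot-prompt",
    AppConfig(prompt_template="Think step by step: {question}"),
    ["What is RAG?"],
)
```

Because both runs live in the same store, comparing a new configuration against last week's baseline is a lookup rather than an archaeology project.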
Don't you have a phenomenon akin to overfitting? How do you ensure that enhancing accuracy on foreseen inputs doesn't weaken results on unforeseen future ones?
Overfitting in most ML is a problem because you task an automated process, one with no understanding, with mercilessly optimizing for a goal, and then you have to figure out how to spot when it has gone wrong.
Here you're actively picking architectures and you should be actively creating new tests to see how your system performs.
You're also dealing with much more general-purpose systems, so the chance that you're overfitting is lower.
Beyond that you're into the production ML environment where you need to be monitoring how things are going for your actual users.
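One concrete guard against the overfitting concern raised above, common in ML practice though not specific to any framework, is to hold out a slice of eval cases you never look at while tuning prompts or configs. A minimal sketch (placeholder scorer, hypothetical names):

```python
# Keep a held-out slice of test cases untouched during tuning, and report
# the held-out score as the honest estimate of unforeseen-input performance.
import random

cases = [f"case-{i}" for i in range(100)]
random.seed(0)
random.shuffle(cases)

dev_cases, holdout_cases = cases[:80], cases[80:]

def accuracy(config_name: str, eval_cases: list) -> float:
    # Placeholder scorer; a real one would run the app on each case
    # and grade its output against an expected answer.
    return sum(c.endswith(("0", "2", "4", "6", "8")) for c in eval_cases) / len(eval_cases)

# Tune on dev_cases only...
dev_score = accuracy("candidate-config", dev_cases)
# ...then check the held-out score before trusting the improvement.
holdout_score = accuracy("candidate-config", holdout_cases)
```

If the dev score keeps climbing while the held-out score stalls, you're tuning to your test set rather than improving the application.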
The article states that companies are pivoting toward more specialized verticals (e.g., LlamaIndex focusing on managed document parsing / OCR), implying they'll get smaller and smaller and eventually die. I don't think narrowing scope means a company can't have a viable business. If LlamaIndex charged a $100K base price per enterprise and had 1,000 customers, that's at least $100M in revenue, which is a very viable business.
If you are curious about this topic, maybe this is a good podcast for you :)
You know exactly what goes into the prompt, how it’s parsed, what params are used or when they are changed. You can abstract away as much or as little of it as you like. Your API is going to change only when you make it so. And everything you learn about patterns in the process will be applicable to Python in general - not just one framework that may be replaced two months from now.
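The kind of transparency the parent describes can be made concrete with a small sketch (plain Python, hypothetical function and parameter names): the full request, prompt and params included, is assembled in code you own and can be inspected before anything is sent.

```python
# Plain-Python prompt assembly: nothing is hidden behind a framework.
def build_request(question: str, temperature: float = 0.2) -> dict:
    system = "You are a concise assistant."
    return {
        "model": "gpt-4o-mini",        # you pick and pin this yourself
        "temperature": temperature,     # changes only when you change it
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    }

req = build_request("Summarize this thread.")
# req can be logged, diffed, or asserted on before any API call is made.
```

The request dict maps directly onto the shape most chat-completion APIs accept, but the point is that every field is visible and under your control.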
With our framework:
> You know exactly what goes into the prompt, how it’s parsed, what params are used or when they are changed
We provide this for you. We just give you a process that lets you try and evaluate different configurations of your LLM application layer at scale.
Frameworks are more process-driven for achieving a complex task. It's like ReactJS with its component model: it sets a process for building web applications so that you can build more complex ones. At the same time, you have lots of flexibility in the implementation details of your application. A framework should provide as much flexibility as possible.
Similarly, we are building our framework to streamline the process of LLM development so that you can iterate on your LLM application faster. To set up this process, we enforce only very high-level interfaces for how you build (input and output schemas), evaluate, and deploy your application. We leave all the flexibility over low-level implementation details to the developer, and the framework is extensible so you can use any external tools you want within it.
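The "high-level interface, low-level freedom" split described above might look roughly like this (a sketch with hypothetical type and class names, not the framework's actual API): the framework fixes only the input/output schema, and everything inside the method is up to the developer.

```python
# Hypothetical sketch: the framework pins the request/response schema;
# the developer owns everything inside chat().
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ChatRequest:
    user_message: str
    config: dict                      # per-run settings, e.g. {"model": "..."}

@dataclass
class ChatResponse:
    message: str

class Agent(ABC):
    @abstractmethod
    def chat(self, request: ChatRequest) -> ChatResponse: ...

class MyAgent(Agent):
    def chat(self, request: ChatRequest) -> ChatResponse:
        # Any retrieval, prompting, or external tool use can go here.
        model = request.config.get("model", "default-model")
        return ChatResponse(message=f"({model}) echo: {request.user_message}")

reply = MyAgent().chat(ChatRequest("hello", {"model": "gpt-4o-mini"}))
```

Because every agent honors the same schema, the surrounding tooling (evals, deployment, the comparison UI) can treat all implementations interchangeably.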
Also, in general, we are currently in a time of comparatively low iteration. Most companies don't have the tolerance for it anymore and instead choose cheap one-shot execution at stupid risk, because of FOMO.
Iteration cycles are a function of your inputs: creative potential, vision, energy, runway.