The one that really stands out is GroundUI-1K, where it beats the competition by 46%.
Nova Pro looks like it could be a SOTA-comparable model at a lower price point.
No match for Google's NotebookLM podcasts.
It’s irrelevant to the article, which is about Nova.
1. A company the size of Amazon has enough resources, and enough unique internal data no one else has access to, that it makes sense for them to build their own models, even if only for internal use.
2. Amazon cannot beat Anthropic at this game; Anthropic is far ahead of them in terms of performance and adoption. But building these models in-house doesn't mean it's a bad idea to also invest in Anthropic.
AWS having customers using its own model probably improves AWS's margins, but having multiple models available (e.g. Anthropic's) improves their ability to capture market share. To date, AWS's efforts (e.g. Q, CodeWhisperer) have not met with universal praise. So, at least for the present, it makes sense to bring customers to AWS to "do AI" whether they're using AWS's models or someone else's.
I would add different errors as well. Here are two examples where GPT-4o and Claude 3.5 Sonnet cannot tell that "GitHub" is spelled like "GitHub".
GPT-4o: https://app.gitsense.com/?doc=6c9bada92&model=GPT-4o&samples...
Claude 3.5 Sonnet: https://app.gitsense.com/?doc=905f4a9af74c25f&model=Claude+3...
I don't think there will be one model that will rule them all, unless there is a breakthrough. If things continue on the same path, I think Amazon, Microsoft and Google will be the last ones standing, since they can provide models from all the major LLM players.
Creating one model that is great at everything is probably a pipe dream, much like creating a multi-tool that can do everything. But can it? I wouldn't trust a multi-tool to take a wheel nut off a wheel, but I would find it useful if I suddenly needed a cross-head screw taken out of something.
But then I also have a specific cross-head screwdriver that is good at just taking out cross-head screws.
Use the right tool for the right reason. In this case, there may be a legal reason why someone might need to use it. It might be that this version of a model can create something better than another model can. It might be that, for cost reasons, if you're already within AWS, it makes sense to use the cheaper model rather than something else.
So yeah, I am sure it will be great for some people, and terrible for others... just the way things go!
Nobody needs Reddit hallucinations about programming.
So I guess that’s who it’s for.
I’ve only spent an hour with it though obviously.
Per 1K tokens, Input | Output:
Amazon Nova Micro: $0.000035 | $0.00014
Amazon Nova Lite: $0.00006 | $0.00024
Amazon Nova Pro: $0.0008 | $0.0032
Claude 3.5 Sonnet: $0.003 | $0.015
Claude 3.5 Haiku: $0.0008 | $0.004
Claude 3 Opus: $0.015 | $0.075
Source: AWS Bedrock pricing, https://aws.amazon.com/bedrock/pricing/
Price is pretty good. I'm assuming 3.72 chars/token on average, though; I couldn't find that number anywhere.
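Back-of-the-envelope in Python, to make the gap concrete (the 3.72 chars/token figure is my own assumption, as noted above):

```python
# Rough per-1M-characters cost, using the per-1K-token prices above and
# an assumed 3.72 chars/token (my guess; I couldn't find an official number).
CHARS_PER_TOKEN = 3.72

prices_per_1k_tokens = {  # model: (input, output), from the Bedrock pricing page
    "Nova Micro": (0.000035, 0.00014),
    "Nova Lite": (0.00006, 0.00024),
    "Nova Pro": (0.0008, 0.0032),
    "Claude 3.5 Sonnet": (0.003, 0.015),
}

tokens_per_1m_chars = 1_000_000 / CHARS_PER_TOKEN  # ~268,817 tokens

for model, (inp, out) in prices_per_1k_tokens.items():
    cost_in = tokens_per_1m_chars / 1000 * inp
    cost_out = tokens_per_1m_chars / 1000 * out
    print(f"{model}: ${cost_in:.4f} in / ${cost_out:.4f} out per 1M chars")
```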
AWS is the golden goose. If Amazon doesn't tie up Anthropic, AWS customers who need a SOTA LLM will spend on Azure or GCP.
Think of Anthropic as the "premium" brand -- say, the Duracell of LLMs.
Nova is Amazon's march toward a house brand, Amazon Basics if you will, that minimizes the need for Duracell and slashes cost for customers.
Not to mention the potential benefits of improving Alexa, which has inexcusably languished despite popularizing AI services.
Edited for readability.
For example, look at how many different types of databases they offer (many achieve the same objective, just with different instantiations):
https://aws.amazon.com/products/?aws-products-all.sort-by=it...
I guess it depends on how sensitive your data is
At best, you can conclude that outdated product design doesn't always ruin a business (clearly). But you can't conclude the inverse (that investing in modern product design doesn't ever help a business).
Case in point: tell me, from the point of view of the user, how many steps it takes to deploy a Next.js/React website with Vercel versus with AWS, start to finish.
However, I don't think it's fair to say that this trade-off always wins out. Rather, they've carved out their own ecological niche and, for now, they're exploiting it well.
The AWS APIs are so expansive, a product like this could offer a complete replacement for the default web console and maybe even charge for it. Does anyone know if such a solution exists? Perhaps some more generic "shell-to-ui" application? If not, I'm interested in building one if anybody would like to contribute.
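As a starting point, botocore already ships machine-readable models for every AWS API, so the catalog such a tool would render is enumerable without scraping docs. A rough sketch of that discovery layer (assuming boto3 is installed; no credentials are needed just to introspect):

```python
import boto3

session = boto3.session.Session()
print(len(session.get_available_services()), "services available")

# Each client carries a full model of its API: operations, inputs, docs.
s3 = session.client("s3", region_name="us-east-1")
for op_name in list(s3.meta.service_model.operation_names)[:5]:
    op = s3.meta.service_model.operation_model(op_name)
    params = list(op.input_shape.members) if op.input_shape else []
    print(op_name, "->", params)  # enough metadata to render a form per call
```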
I've heard multiple accounts of him seeing a WIP and asking for changes that compromise the product's MVP.
https://gist.github.com/kislayverma/d48b84db1ac5d737715e8319...
I read that post every couple of years or so.
For most AWS offerings, it literally doesn't matter, and logging in to the AWS Console is a break-glass thing.
Case in point: this very article. It uses boto3 to interface with AWS.
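For the curious, a minimal version of that kind of boto3 call (a sketch, untested; the Nova model ID is taken from the launch docs and may need an inference-profile prefix like "us." in some accounts/regions):

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")
response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # assumed ID; may need a "us." prefix
    messages=[{"role": "user", "content": [{"text": "Say hello in one line."}]}],
    inferenceConfig={"maxTokens": 128, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```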
That's a big thing for compliance. All LLM providers reserve the right to save prompts (for up to 30 days) and inspect them for their own compliance purposes.
However, this means that company data is potentially stored out-of-cloud. This is already problematic, and even more so when the storage location is outside the EU.
Legally, we're only allowed to use text-embedding-3-large at work because Azure doesn't host text-embedding-3-small within a European region.
I wonder how fast it "glances" at an entire 30-minute video, and how long it takes until the first returned token. Anyone wager a guess?
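No numbers from me, but it's easy to measure yourself with the streaming API. A sketch (the video content-block shape is my reading of the Converse API docs, the model ID and file name are placeholders, and a real 30-minute file would likely exceed the inline-bytes limit and need an S3 source instead):

```python
import time
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")
with open("meeting.mp4", "rb") as f:  # hypothetical video file
    video_bytes = f.read()

start = time.monotonic()
stream = client.converse_stream(
    modelId="amazon.nova-lite-v1:0",  # assumed Nova model ID
    messages=[{
        "role": "user",
        "content": [
            {"video": {"format": "mp4", "source": {"bytes": video_bytes}}},
            {"text": "Summarize this video."},
        ],
    }],
)
for event in stream["stream"]:
    if "contentBlockDelta" in event:  # first generated text arrives here
        print(f"time to first token: {time.monotonic() - start:.2f}s")
        break
```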
Does this mean they trained multiple copies of the models?
Part of it could also be that they'd prefer to move all operations to the in-house Trainium chips, but don't have full confidence in the hardware yet.
Definitely ambiguous, though. In general, reporting of infra characteristics for LLM training is left pretty vague in most reports I've seen.
This is blowing my mind. gemini-1.5-flash accidentally knows how to transcribe amazingly well, but it is -very- hard to figure out how to use it well. Now Amazon comes out with a Gemini Flash-like model, and it explicitly ignores audio. It is so clear that multi-modal audio would be easy for these models, but it is like they are purposefully holding back releasing/supporting it. This has to be a strategic decision not to attach audio, probably because the margins on ASR are too high to strip with a cheap LLM. I can only hope Meta will drop a multi-modal audio model to force this soon.
Amazon is rapidly developing its own jargon such that you need to understand how Amazon talks about things (and its existing product lineup) before you can understand half of what they're saying about a new thing. The way they describe their products seems almost designed to obfuscate what they really do.
Every time they introduce something new, you have to click through several pages of announcements and docs just to ascertain what something actually is (an API, a new type of compute platform, a managed SaaS product?)
It doesn't even use the words "LLM", "multimodal" or "transformer" which are clearly the most relevant terms here... "foundation model" isn't wrong but it's also the most abstract way to describe it.
a) How does it perform on my set of evals?
b) What is the cost/latency of serving it to my consumers?
It shouldn't matter to me how many parameters it has, what corpus it was trained on, or whether it's an LLM, a Transformer, or something else. (A sketch of what I mean by evals is below.)
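Something like this, with a hypothetical call_model() wrapper around whichever provider is being tested:

```python
import time

EVALS = [  # my own domain-specific cases, not a public benchmark
    ("What is 2 + 2?", "4"),
    ("Spell the name of the code-hosting site owned by Microsoft.", "GitHub"),
]

def run_evals(call_model):
    """call_model(prompt) -> str; e.g. a thin wrapper over a converse() call."""
    correct, latencies = 0, []
    for prompt, expected in EVALS:
        start = time.monotonic()
        answer = call_model(prompt)
        latencies.append(time.monotonic() - start)
        correct += expected in answer
    print(f"accuracy: {correct}/{len(EVALS)}, "
          f"mean latency: {sum(latencies) / len(latencies):.2f}s")
```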
What kinds of evals? Personally, I have no idea what kind of data you can throw at a "foundation model" or what kind of response you will get.
The only thing it says is that there's machine learning involved... Once you get enough context to understand it's not a spin-off of a TV series.
It's rare for the leading model providers to answer these questions.
As someone who applies these models daily, I agree with the dead comment from meta_x_ai. Your questions are interesting/relevant to a person developing these models, but less important to the average person utilizing these models through Bedrock.
Amazontalk: "You can build on <product name> to analyze complex documents..."
Human language: There is no product, just some DIY tools.
Amazontalk: "Provides the intelligence and flexibility..."
Human language: We will charge your credit card in multiple obscure ways, and we'll be smart about it.
If you think of clouds as cross-continent mainframes, a lot more things make sense.
What's the subnet of the security group of my user group for the AWS Lambda application in a specific environment that calls KMS to get a secret for....
Right now when I see obviously AI generated images for book covers I take that as a signal of low quality. If AI generated videos continue to look this bad I think that'll also be a clear signal of low quality products.
Personally I am more familiar with directly using API keys or auth tokens than AWS's IAM users (which are more similar to what I'd call "service accounts").
I'm using none to very little of the functionality they have added recently: not interested in RAG, not interested in Guardrails. Just Claude access, basically.
This is a guide for the casual observer who wants to try things out, given that getting started with other AI platforms is so much more straightforward. It's all open source, with transparent hosting, catering to any remaining concerns someone interested in exactly that may have.
This blog post encourages you to do this known dangerous thing, instructs you to bypass these warnings, and then paste these credentials into an untrusted app that is made up of 1000+ lines of code. Yes, the 1000+ lines of code are available for a security audit, but let’s be real: the “casual observer who wants to try things out” is not going to actually review all (if any) of the code, and likely not even realize they should review it.
I give you kudos for wanting to be helpful, but the instructions in this blog ("do this dangerous thing, but trust me, it's okay, and then do this other dangerous thing, but trust me, it's okay") are exactly what nefarious actors would ask of unsuspecting victims, too, and following such blog posts is a practice that should not be generally encouraged.
When marketing talks about the price delta and not the quality of the output, it is DOA. For LLMs, quality is the more important metric, and Nova would be playing catch-up with the leaderboard forever.