Stagehand is a TypeScript project that extends Playwright with three simple AI methods — act, extract, and observe. We’d love for you to try it out using the command below:
npx create-browser-app --example quickstart
Here’s a sample workflow:

  import { Stagehand } from "@browserbasehq/stagehand";
  import { z } from "zod";

  const stagehand = new Stagehand();
  await stagehand.init();

  // Stagehand overrides the Playwright Page and Context classes
  const { page, context } = stagehand;

  // Regular Playwright; goto needs a full URL including the scheme
  await page.goto("https://instadash.com");

  // Take action on the page
  await page.act({ action: "click on taqueria cazadores" });

  // Extract relevant data from the page, validated against a zod schema
  const { price } = await page.extract({
    instruction: "extract the price of the super burrito",
    schema: z.object({
      price: z.number(),
    }),
  });
We built Stagehand because we loved building browser automations using Playwright and Selenium, but we grew frustrated at how cumbersome it is to just get started and write simple browser automations. These frameworks, while incredibly powerful, are built for QA testing and are thus notoriously prone to fail if there are minor changes in the UI or underlying DOM structure.
The goal of Stagehand is twofold:
1. Make browser automations easier to write
2. Make browser automations more resilient to DOM changes
We were super energized by what we’ve been seeing with vision-based computer use agents. We think with a browser, you can provide even richer data by leveraging the information in the DOM + a11y tree in addition to what’s rendered on the page. However, we didn’t want to go so far as to build an agent, since we wanted fine-grained control over each step that an agent can take.
Therefore, the happy medium we built was to extend the existing powerful functionalities of Playwright with simple and extensible AI APIs that return the decision-making power back to the developer at each step.
Check out our docs: https://docs.stagehand.dev
We’d love for you to join and give us feedback on Slack as well: https://stagehand.dev/slack
Do you guys ever think you'll do a similar abstraction for MCP and computer use more broadly?
We're working on a better computer use integration using Stagehand, def a lot of interesting potential there
https://playwright.dev/docs/codegen-intro
Is a chatbot an easier way to iterate on a test?
Some minimal model that could be run locally and specifically tuned for this purpose might be pretty fruitful here compared to delegating out to expensive APIs.
I often thought E2E testing should be done with AI. What I want is that the functionality works (e.g.: login, then start an assignment) without the need to change the test each time the UI changes.
Personally I'd love to use this as an intermediate workflow for producing deterministic playwright code, but it looks like this is intended for running directly.
I don't think I could plausibly argue for using LLMs at runtime in our test suite at work...
Rather, we want Stagehand to assist people who want to build web agents. For example, I was using headless browsers earlier in 2024 to do real-time RAG on e-commerce websites that could aggregate results for vibes-based search queries. These sites might have random DOM changes over time that make it hard to write sustainable DOM selectors, or annoying pop-ups that are hard to deterministically code against.
This is the perfect use for Stagehand! If you're doing QA on your own site, then base Playwright (as you mention) is likely the better solution
People in the browser automation space consistently ignore this, for whatever reason. Though, it's right on their site in black and white.
Most of my test failures come down to timing issues—CPU load subtly affects execution, leading to random timeouts. This makes it difficult to run tests both quickly and consistently. While proactive load-testing of the test environment and introducing artificial random delays during test authoring can help, these steps often end up taking more time than writing the tests themselves.
It would be amazing if tools were smart enough to detect these false positives automatically. After all, if a human can spot them, shouldn’t AI be able to as well?
We built basically this: Let an LLM agent take a look at your web page and generate the playwright code to test it. Running the test is just running the deterministic playwright code.
Of course, the actual hard work is _maintaining_ end-to-end tests so our agent can do that for you as well.
Feel free to check us out, we have a no-hassle free tier.
Also is there some level of deterministic behavior here or might every test run result in a different underlying command if your wording isn’t precise enough?
What I would love to see either as something leveraging this, or built in to this, is if you prompt stagehand to extract data from a page, it also returns the xpath elements you'd use to re-scrape the page without having to use an LLM to do that second scraping.
So basically, you can scrape new pages never before seen with the non-deterministic LLM tool, and then when you need to rescrape the page again to update content for example, you can use the cheaper old-school scraping method.
Not sure how brittle this would be, both in going reliably from the LLM version to the XPath version and in falling back to the LLM version if the XPath script fails, but conceptually, being able to scrape with the smart tools while building up a library of dumb scraping scripts over time would be killer.
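To make the idea concrete, here is a minimal sketch of the "LLM first, cheap selector afterwards" pattern, with a fallback to the LLM when the saved selector stops matching. Everything here is hypothetical: `extractWithFallback`, `runSelector`, and `runLlm` are illustrative names, not Stagehand APIs, and the sketch assumes the LLM step can return a selector alongside the extracted value, which is exactly the feature being asked for.

```typescript
// Hypothetical sketch; none of these names come from Stagehand.
// runSelector: cheap, deterministic lookup (e.g. evaluating a saved XPath).
// runLlm: expensive extraction, ASSUMED to also report the selector it used.
async function extractWithFallback<T>(
  key: string,
  cache: Map<string, string>,
  runSelector: (selector: string) => Promise<T | null>,
  runLlm: () => Promise<{ value: T; selector: string }>,
): Promise<{ value: T; source: "cache" | "llm" }> {
  const saved = cache.get(key);
  if (saved !== undefined) {
    try {
      const value = await runSelector(saved);
      if (value !== null) {
        return { value, source: "cache" };
      }
    } catch {
      // A throwing selector is treated the same as a selector that matched nothing.
    }
    // The saved selector no longer works (the page probably changed), so drop it.
    cache.delete(key);
  }

  // First visit, or the cached selector went stale: pay for the LLM once
  // and remember the selector it reports for the next run.
  const fresh = await runLlm();
  cache.set(key, fresh.selector);
  return { value: fresh.value, source: "llm" };
}
```

In use, `runSelector` would wrap an ordinary Playwright locator lookup and `runLlm` would wrap the LLM-backed extraction. The cache here is an in-memory `Map`; persisting it between runs is what would build up the "library of dumb scraping scripts" over time.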
Repeatability of extract() is definitely super interesting and something we're looking into
But of course, the way it works now could also help reduce the brittleness. An XPath or selector quickly breaks when the design changes or things are moved around. The LLM-based approach might overcome that.
So tradeoffs, I guess.
disclaimer: i am the author
Skyvern: https://github.com/Skyvern-AI/skyvern
Shortest: https://github.com/anti-work/shortest
I’d love to hear what makes Stagehand different, and the pros and cons.
Of course, I have no complaints to see more competition and open source work in this space. Keep up the great work!
You might want to check out Lightpanda (https://github.com/lightpanda-io/browser). It's an open-source, lightweight headless browser built from scratch for AI and web automation. It's focused on skipping graphical rendering to make it faster and lighter than Chrome headless.
That said, the architecture's coming together and the performance gains we're seeing make us excited about what's possible as we keep building. Feedback is very welcome, especially on what APIs you'd like to see us prioritize for specific workflows and use cases.
I recently tried to implement a workflow automation using similar Playwright- or Puppeteer-based frameworks. My goal was to log into a bunch of vendor backends and extract values for reporting (no APIs available). What stopped me entirely were websites that implemented an invisible captcha. They can detect a Playwright instance by how it interacts with the DOM. Pretty frustrating, but I can totally see this becoming a standard as crawling and scraping are getting out of control.