In this case the Ray-Ban Meta is getting accessibility features. The functionality is promising according to reviews, but requires the user to say "Hey Meta, what am I looking at?" every time a scene is to be described. The battery life seems underwhelming as well.
It would be nice to have a cheap and open-source alternative to the currently available products, where the user is fed information rather than having to continuously request it. This is where I got interested in seeing whether I could build a solution using an ESP32 WiFi camera, and learn some Arduino development in the process.
I managed to create a solution where the camera connects to the phone's personal hotspot and publishes an image every 7 seconds to an online server, which then uses the gpt-4o-mini model to describe the image and update a web page that is read back to the user with voice synthesis. The latency is under 2 seconds, and usually even lower.
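For anyone curious about the camera side, here is a minimal sketch of that upload loop, assuming an AI-Thinker ESP32-CAM board and a hypothetical upload endpoint. The hotspot credentials, URL, and pin mapping are placeholders, not the exact code I ran:

```cpp
// Minimal sketch: capture one JPEG frame and POST it to a server every 7 seconds.
// Board-specific details (pins) and the endpoint URL are assumptions; adjust to your setup.
#include "esp_camera.h"
#include <WiFi.h>
#include <HTTPClient.h>

const char* ssid     = "MyPhoneHotspot";            // phone "personal hotspot" (placeholder)
const char* password = "hotspot-password";          // placeholder
const char* endpoint = "http://example.com/upload"; // server that calls gpt-4o-mini (placeholder)

void setup() {
  Serial.begin(115200);

  // Camera configuration for an AI-Thinker ESP32-CAM; other boards use different pins.
  camera_config_t config = {};
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer   = LEDC_TIMER_0;
  config.pin_d0 = 5;  config.pin_d1 = 18; config.pin_d2 = 19; config.pin_d3 = 21;
  config.pin_d4 = 36; config.pin_d5 = 39; config.pin_d6 = 34; config.pin_d7 = 35;
  config.pin_xclk = 0;  config.pin_pclk = 22; config.pin_vsync = 25; config.pin_href = 23;
  config.pin_sccb_sda = 26; config.pin_sccb_scl = 27;
  config.pin_pwdn = 32; config.pin_reset = -1;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;  // JPEG keeps the upload small
  config.frame_size   = FRAMESIZE_VGA;   // 640x480 is enough for a scene description
  config.jpeg_quality = 12;
  config.fb_count     = 1;
  esp_camera_init(&config);

  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) delay(500);
}

void loop() {
  camera_fb_t* fb = esp_camera_fb_get();     // grab one JPEG frame
  if (fb) {
    HTTPClient http;
    http.begin(endpoint);
    http.addHeader("Content-Type", "image/jpeg");
    int code = http.POST(fb->buf, fb->len);  // raw JPEG body; the server handles the AI call
    Serial.printf("Upload: HTTP %d\n", code);
    http.end();
    esp_camera_fb_return(fb);                // release the frame buffer
  }
  delay(7000);  // one image roughly every 7 seconds
}
```

The server side then only needs to accept the JPEG body, pass it to gpt-4o-mini for a description, and update the page that the voice synthesis reads from.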
I am happy with the result and learnt a lot, but I think I will pause this project for now, at least until some shiny new tech emerges (cheaper open-source camera glasses).
I don't see much point in this over just using a phone app for the same thing, and such apps are slowly starting to appear.
I have not done any app development, and for this project I wanted to keep some things simple, to focus on what can be expected from a low-quality camera combined with AI-generated descriptions.
My hope is that there will be "cheap" camera glasses that you can use with different image-description services. There is a company called "Be My Eyes" that is developing an AI tool for image descriptions, which is probably miles better than anything I can come up with. https://www.bemyeyes.com/blog/introducing-be-my-ai
Be My Eyes seems to support the Ray-Ban Meta glasses, so hopefully "Be My AI" will too.
I understand the "not consulting the target audience" problem all too well, for instance braille signs that are placed at eye level and hard to find. Some workplaces are very keen to make accessibility adjustments, but mostly when the adjustments are visible, so that they can show others that something has been done, regardless of whether it actually helps.
https://altayakkus.substack.com/p/you-wouldnt-download-an-ai
But if you subscribe, you may see me doing the same with a surveillance camera soon(ish) :)
I think you need to triple-check whether users actually find that nice.
Assuming that keeping the text limited to what interests the user will stay an unsolved problem for the foreseeable future, I guesspect that they'd prefer a middle ground where they aren't continuously bombarded with text, but it's easy to get that flow going. For example, having the text feed on only while a button is held down.
I guesspect that because I think users would soon be fed up with an assistant that says there's a painting on the wall or a church tower in the distance every time they turn their head.
Both can be useful information, but not when you hear them for the thousandth time while in your own house/garden.
I wanted to create the opposite of needing to say "Hey Google, describe what is in front of me" or similar. Another point was to see how cheap/simple you could go and still get valuable information.