Remember when I talked about how I wished there was some image-to-text AI instead of just the text-to-image AI? Turns out there is!
This is a screenshot of an image-to-text AI called "clip_prefix_caption," specifically using the model "Coco." And while it's not 100% accurate, it still did a reasonably impressive job with this image. Of course, it kind of makes sense since this photograph is a free-to-use image I pulled off the web, which is almost certainly the kind of stuff this AI was trained on. If we get a type of image very different from what this AI was probably trained on, the results are not nearly as accurate.
But that's okay; Coco isn't designed for Optical Character Recognition (OCR). If you put this same image into an OCR-focused AI program like Image to Text Converter, you'd get:
Night Vale podcast (zioNightValeRadio A mafia guy who has really misunderstood "make it look like an accident" shouting WHOOPSIE every time he fires the gun. 1:06 PM • 2023-02-10 • 72.4K Views 3,487 Likes 776 Retweets 27 Quotes
Still not perfect, but definitely better than "a man is playing a game on the Nintendo Wii."
But what about art? How do these types of things fare with describing art? Since I'm not 100% clear on what (if any) information about the input images is put into a data set for AI to learn from, I did not want to put just any art in here. And if you play with any of these programs, I strongly encourage you not to put anything in there you don't have explicit permission to use for this.
I got specific permission from @animunerdery to use their drawing of Vinsmoke Sanji for some AI tests:
I decided to try a few different models too.
clip_prefix_caption (using coco model): A man wearing a tie and a shirt.
Blip: Caption: a black and white drawing of a man wearing a tie
CLIP Interrogator (using ViT-L model): a man in a shirt and tie smoking a cigarette, sanji, fanart ”, short silver hair, boring, lanky, zero - hour, coal, alp
I should note that the last one, while much more detailed, took a lot longer to generate than the other two.
This is by no means exhaustive. If you take a look at the post this image came from, you will find some even more detailed image-to-text AI outputs.
And this isn't even counting image-to-text AI in less open-source projects. Microsoft Word, for example, generates alt text for almost every image you put in a Word document, assuming you're using the current version. The Accessibility Checker will prompt you to check these though, because their accuracy is iffy at best, especially with images that are very far from what was probably in the data sets Microsoft trained its AI on. You can also contribute to that training data set if you want, because Microsoft gives you the option to "donate" any manually-created alt text you add to an image in your document to their database to improve accuracy. It's a case-by-case opt in though, don't worry.
Some screen readers have built-in image-to-text AI as well. For example, sometimes after reading the alt text on an image (be it properly written alt text or the default word "photo" on every image on a Tumblr post without user-added alt text), the VoiceOver (iOS) screen reader will read an additional description it makes using its image-to-text AI. I can't always get it to do this consistently, but after playing around a bit with a version of the Vinsmoke Sanji image that did not have anything but the default "photo" alt text, I got it to give me this:
Adult. Clothing. Illustrations. One.
Not the most helpful. But this technology is still pretty young, and I think it has a lot of potential if used correctly.
I had a conversation with this accessibility blogger and we dug into various AI tools that can provide fairly accurate and detailed functional descriptions for those who need them.
While it’s true that text-to-image AI exploded into the public limelight, the same technology came from image-to-text AI designed as accessibility tools.
As the blogger points out, one should definitely check with the original artist before running their work through AI. I personally don’t mind if any of you want to use what’s on this blog to either test out or simply be read by AI. The only caveat is to not make any money off my stuff.
Also, the @accessibleaesthetics blog is a valuable resource for anyone interested in learning more. Do check them out if you’re interested in providing more effective accessibility.



















