Bing Chat AI can now interpret the content of images

Microsoft has announced a very interesting new feature for Bing Chat, called Visual Search.

Bing Chat, ChatGPT, and similar general-purpose AI tools currently focus mainly on understanding text written in natural human language and providing answers. Understanding visual content is also extremely useful, however, and it is one of the areas Microsoft is now building into its Bing Chat AI model.

In a recent blog post, Microsoft introduced the feature, called Visual Search. With it, people can upload an image or select one available on the web, and Bing will attempt to analyze and interpret the content of that image and use what it finds in its response. Microsoft's demo video shows a person uploading a hand-drawn mockup of a web form and asking Bing to generate the HTML and CSS code to make it work.

About Visual Search, Microsoft said:

"Whether you're traveling to a new city and asking about the architecture of a particular building, or are at home trying to come up with lunch ideas based on what's in your fridge, simply upload an image to Bing Chat, and use it to tap into the rich knowledge of the AI algorithm to get the right answer."

Image recognition itself is not new: tools like Google Lens have been able to identify people, animals, plants, landmarks, and other objects in images since 2017, and Google Goggles did so before that, in 2010. To gain an edge over the competition, Microsoft is using GPT-4's image-understanding capabilities. This is the same language model used by the premium version of ChatGPT, which is known for its high accuracy.


Early real-world testing shows that asking Bing to describe an image often produces a much more detailed response than users get from Google Lens. For example, when a user uploads a photo of a dog, Bing Chat responds: "This is a photo of a black and tan dog sitting on a brown fur rug. The dog has a red collar with a silver tag. The dog is looking up at the camera with his ears raised. The background is a white couch with blue-and-white pillows. The photo was taken from a high angle." In addition, the tool correctly interprets the message in the uploaded image, shown in italics. The level of detail the AI can extract from an image is impressive.