A revolutionary leap: computer vision within everyone's reach

The integration of computer vision into language models, a pioneering initiative by OpenAI and Google, is marking a new chapter in technology.

On the threshold of a new era in artificial intelligence, the integration of computer vision into language models, led by innovations from OpenAI and Google, marks a milestone in the technological field. This fusion of textual and visual capabilities opens up a new horizon of practical applications, and because these tools are easy to use and easy to access, they will change the way we interact with technology.

In this article, we will dive into how these advanced technologies are transforming both our everyday and professional environments, shaping a future where digital interaction and efficiency are intertwined in unprecedented ways.

What is Computer Vision and how do these models work?

Computer vision is a field of artificial intelligence that builds models and systems able to extract information from a digital image and analyze its meaning.

Until now, language models could receive only one input modality: text. For many use cases, this limited where models such as GPT-4 could be applied, since text-only input left a lot of processing potential and functionality unused.

Following the announcements by OpenAI and Google, computer vision will finally be available and integrated into existing language models. These new vision capabilities are added to the existing text generation models, which keep all the capabilities they already had.

These could be considered some of the first steps towards a "multimodal" interaction, understanding multimodality as the model's ability to "ingest knowledge from multiple sources and modalities and use it to solve tasks involving any modality".
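
To make the difference between a text-only input and a multimodal one more concrete, here is a minimal Python sketch. It does not reproduce the OpenAI or Google interfaces: the Part and Prompt types and their methods are hypothetical names invented for this illustration, and the image bytes are a placeholder rather than a real photo. The only point is that a multimodal prompt is an ordered list of parts, and a text-only prompt is the special case with a single text part.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class Part:
    """One piece of a prompt (hypothetical type, for illustration only)."""
    kind: str      # "text" or "image"
    content: str   # plain text, or base64-encoded image bytes


@dataclass
class Prompt:
    """An ordered list of parts that a model would receive together."""
    parts: list = field(default_factory=list)

    def add_text(self, text: str) -> "Prompt":
        self.parts.append(Part("text", text))
        return self

    def add_image(self, image_bytes: bytes) -> "Prompt":
        # Images are binary, so they are encoded as text before being sent.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        self.parts.append(Part("image", encoded))
        return self

    def modalities(self) -> set:
        """Return the set of input modalities used by this prompt."""
        return {part.kind for part in self.parts}


# Text-only input: what language models accepted until now.
text_only = Prompt().add_text("Describe a typical restaurant menu.")

# Multimodal input: a question plus a picture (placeholder bytes here).
multimodal = (
    Prompt()
    .add_text("What does this menu say?")
    .add_image(b"\x89PNG placeholder bytes")
)
```

In this sketch, text_only.modalities() contains only "text", while multimodal.modalities() contains both "text" and "image".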

How accessible are these new capabilities to the public?

Beyond the technological advancement of these tools, something that is also surprising is how these models have been made available to the public. While computer vision models have been in use in various segments of industry and in various scientific research settings, this is the first time they have been available to each of us.

In the case of the OpenAI model, these features are available to developers already using ChatGPT, but also to anyone who subscribes to ChatGPT Plus.

At the moment, Google's Gemini Pro model is integrated with Bard. Going further, Google has also worked so that Gemini Nano (a simpler version of the model) can be integrated with the Android system, initially available only on Pixel 8 Pro phones.

Applications for IoT and Smart Cities

To get a general idea of the potential of these tools and their various uses, we mention some examples that are being implemented in industrial environments and in the areas of the Internet of Things and smart cities.

Support for the blind

By endowing the artificial intelligence model with vision, blind people can use it for support or assistance. A clear use case comes from BeMyEyes, an application that has been on the market for 12 years, which seeks to encourage voluntary help for blind people.

Recently, this company launched a beta spinoff of its original application in collaboration with OpenAI, to apply these new technologies. The result is BeMyAI, based on the idea that people point their cell phone camera wherever they want and the app describes aloud what it is capturing. For example, it can be used to help them cross the street or know what the menu says in a restaurant.

Public safety

Within this area there are several implementations applied to tracking, counting and monitoring of people in transportation and public places. In particular, some implementations are oriented to detect dangerous situations (such as robberies or assaults) and vandalism, as well as to discover people in unauthorized places.

Their objective is to complement the security measures already in place and to generate alerts, acting as a kind of guard that is available 24 hours a day, every day.
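
As an illustration of the "people in unauthorized places" case, the following Python sketch shows one simple way such an alert could be decided once a vision model has already located people in an image. It is a sketch under stated assumptions, not a description of any real product: the Box type, the people_in_restricted_zone function and the rule of testing the center of each detection are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle in image coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self) -> tuple:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

    def contains(self, point: tuple) -> bool:
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def people_in_restricted_zone(detections: list, zone: Box) -> list:
    """Return the detected people whose center falls inside the zone.

    `detections` is assumed to be a list of Box objects produced by a
    vision model; that detection step is outside this sketch.
    """
    return [box for box in detections if zone.contains(box.center())]
```

An alert would be raised whenever the returned list is not empty. Real systems would also need to handle camera perspective, partial occlusion and false detections, which this sketch ignores.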

Industrial safety

By analyzing work situations and checking employees' compliance with safety regulations, these applications aim to improve safety in the workplace. They are of particular interest, for example, in construction sites, excavations or laboratories where volatile substances are handled.

Traffic monitoring and road safety

This is one of the areas where most applications of computer vision technology have been seen. The most common use cases are related to the analysis of driving and traffic behavior of vehicles, although there are also applications aimed at monitoring the condition of roads and other road infrastructure, in order to keep it in good condition. The information from these analyses is relevant for implementing different measures to make traffic more efficient and safer.

For example, Honda's subsidiary in Argentina recently conducted an experiment applying these technologies in an intelligent traffic light equipped with a camera. The idea was to verify that motorcyclists were wearing helmets, and the traffic light would only change to green if this rule was complied with. The result was an awareness video in which several motorcyclists were perplexed by what the traffic light demanded through a screen and ended up complying with the rule.
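
The rule behind that traffic light can be sketched in a few lines of Python. This is not Honda's implementation, which the article does not describe: the Rider type and the light_may_turn_green function are hypothetical names, and the helmet detection itself (the computer vision part) is assumed to have already happened.

```python
from dataclasses import dataclass


@dataclass
class Rider:
    """One motorcyclist as reported by a hypothetical vision model."""
    wearing_helmet: bool


def light_may_turn_green(riders: list) -> bool:
    """Allow green only when every detected rider is wearing a helmet.

    With no riders detected, all() returns True, so the light is free
    to change as usual.
    """
    return all(rider.wearing_helmet for rider in riders)
```

For instance, light_may_turn_green([Rider(True), Rider(False)]) returns False, so the light would stay red until the second rider puts a helmet on.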

Examples of domestic use

Now that computer vision is available to everyone through these models, we find many more applications for everyday use. Several of them make tasks easier for us or give us assistance and recommendations. Some examples are:

Nutritional value analysis

Sometimes it is difficult to understand, or even read, the small print on a product label that lists its nutritional values and ingredients. With these vision tools, we can take a picture of the label and ask the virtual assistant to analyze it, and then ask about whatever we want to know: for example, whether the food is suitable for a celiac diet or whether the sodium values are advisable for someone with hypertension.

Smart appliances and home automation

In this area we can include several functionalities, from asking the model what to cook with a photo of what is in the fridge to asking it to make an automatic purchase of the things we are missing. We can also delegate sorting tasks to the model, such as distinguishing between garbage and recyclables, or even assistance in choosing the best washing program based on a photo of the laundry.

Also, within home automation, cameras can be installed that recognize the residents of the home, give them access to the house and adjust the lights, music and heating to each person's liking.

Labor and productivity

On occasions when we need to plan a presentation or put together design schemes, these tools can also help us convert our hand sketches into digital projects.

An example of these capabilities was recently demonstrated at OpenAI's developer livestream. There, the model was fed a hand-drawn sketch of a website and asked to program it from scratch, a task it accomplished in seconds, whereas it would take a person far longer.

Looking to the future

Today, making computer vision available to everyone is not only a giant technological leap forward, but a true transformation of the way we will interact with our environment in the near future.

From supporting the blind to revolutionizing public and road safety and home automation, these technologies are proving to be efficient tools with great potential.

It should be kept in mind that these developments are only now coming within our reach and, therefore, mixed results are to be expected. However, integrating computer vision into our daily and professional lives now will provide us with smoother, safer and more efficient digital interaction experiences by the time these technologies have matured sufficiently.


By Martin Piriz, Research & Development Assistant Quantik Labs

Martin is an advanced student of Communication Systems Engineering, with a profile focused on signal processing and machine learning.
Since 2022 he has been part of QuantikLabs, assisting in the research and development of projects and products.


About Quantik Lab

Quantik Lab is the area of the Quantik Group dedicated to research and development (R&D). Its objective is to foster and mature the creation of new products and technologies, which can then be scaled. Ideas for exploring new topics come from both customers and partners.

Today, it conducts research on the metaverse, the Internet of Things, electric mobility, customer experience and smart cities.
