In previous blog posts we explored the origins of intelligent systems and their evolution throughout history. In this entry we look at what Artificial Intelligence offers the world today by reviewing the main areas in which the discipline works.
As in any scientific discipline, the ways a field can be subdivided for classification purposes are countless and, for the most part, subjective. In AI, however, there is broad consensus on a few major categories from which all the others derive.
Today, the vast majority of AI solutions fall into one of three main categories: computer vision, natural language processing, and speech and text generation.
Computer vision
Computer vision aims to enable computers to understand the visual information they receive. Human vision does not merely capture images, as a digital camera does; it processes them, interprets them, and makes decisions based on what it observes.
To make a machine understand the information contained in an image or video, we use algorithms based on neural networks. The most widely used model today is the Convolutional Neural Network (CNN). A CNN is a computer programme that applies a long chain of mathematical operations to the pixels of the image it receives in order to fulfil a specific objective: each layer slides small filters across the image, combining every pixel with its neighbours. The great advantage, as discussed in the introduction to AI, is that the network decides these operations itself: the filter values are learned from training examples rather than programmed by hand.
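To make the idea of a filter sliding over pixels concrete, here is a minimal sketch in plain Python of the core operation of a convolutional layer. It assumes a tiny grayscale image stored as nested lists and uses a hand-picked vertical-edge kernel; in a real CNN those kernel values would be learned during training, and libraries run this over many channels and filters at once on optimised hardware.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep-learning libraries, this computes cross-correlation
    (the kernel is not flipped), which is what CNN layers call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the pixel neighbourhood currently under the kernel
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    # Non-linearity applied after the convolution, as in typical CNN layers
    return [[max(0.0, v) for v in row] for row in feature_map]


# A tiny 4x4 grayscale "image": dark left half, bright right half
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked (hypothetical) vertical-edge detector; a trained CNN learns these values
kernel = [
    [-1, 1],
    [-1, 1],
]

features = relu(convolve2d(image, kernel))
print(features)  # → [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The feature map lights up only in the column where dark meets bright, which is exactly the kind of low-level pattern the first layers of a CNN tend to pick up; deeper layers combine such maps into shapes and, eventually, whole objects.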
A CNN can pursue many different objectives, but among the most important are:
1. Object classification. The intelligent system receives an image and tries to identify the object it contains. For example, it can identify the species of a plant from a photograph, or distinguish between types of vehicles in an automatic toll system. In this type of task the spatial location of the object is irrelevant; the goal is simply to assign it a class. One of the major limitations of classification techniques is that they can only handle a single concept per image.
2. Object detection. Here the AI system both locates objects in space and classifies them, which means drawing a bounding box around each object of interest in the images or videos provided to the neural network. Autonomous cars use this technology to detect traffic signs and then classify them to determine which type of sign is being observed.
3. Object tracking. Finally, once we know where each object is and what type it is, it can be useful to follow how it moves. This technology lets us see, for example, how customers move around a supermarket, and therefore identify the busiest routes and hot spots.
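Detection and tracking fit together naturally: a detector outputs bounding boxes for each frame, and a tracker links those boxes over time. The sketch below shows one very simple way to do that linking, greedy matching by Intersection over Union (IoU), the standard overlap measure for boxes. It is an illustrative assumption, not how any particular product works: boxes are `(x1, y1, x2, y2)` tuples, `IoUTracker` and its `min_iou=0.3` threshold are hypothetical, and production trackers usually add motion prediction and an optimal assignment step.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class IoUTracker:
    """Hypothetical greedy tracker: links each detection to the best-overlapping track."""

    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou
        self.tracks = {}  # track id -> box seen in the previous frame
        self.next_id = 0

    def update(self, detections):
        assigned = {}
        free_tracks = dict(self.tracks)  # tracks not yet claimed in this frame
        for box in detections:
            # Require an overlap of at least min_iou to continue an existing track
            best_id, best_score = None, self.min_iou
            for track_id, last_box in free_tracks.items():
                score = iou(box, last_box)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                # No sufficiently overlapping track: start a new one
                best_id = self.next_id
                self.next_id += 1
            else:
                del free_tracks[best_id]
            assigned[best_id] = box
        # Tracks with no matching detection in this frame are dropped
        self.tracks = assigned
        return assigned


tracker = IoUTracker()
frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
frame2 = tracker.update([(52, 51, 62, 61), (1, 1, 11, 11)])
print(frame1)  # ids 0 and 1 are created
print(frame2)  # the moved boxes keep ids 1 and 0 respectively
```

Counting how often each track passes through a given region of the image is then enough to estimate busy routes, as in the supermarket example.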
Natural Language Processing (NLP)
The aim of natural language processing is to enable computers to understand human language. To this end, various artificial intelligence techniques have been developed, such as Recurrent Neural Networks (RNNs), including Long Short-Term Memory (LSTM) networks.
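The "long and short term memory" in an LSTM refers to two pieces of state the network carries from word to word: a cell state for longer-term information and a hidden state for the current output. The following minimal sketch shows one scalar LSTM step in plain Python. Real networks operate on vectors (word embeddings) with learned weight matrices; here each input is a single number and the `weights` values are made up purely for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; w[gate] = (input weight, recurrent weight, bias)."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate: how much old memory to keep
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate: how much new information to write
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate: how much memory to expose
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate memory content
    c = f * c_prev + i * g   # long-term cell state
    h = o * math.tanh(c)     # short-term hidden state (the cell's output)
    return h, c


def run_lstm(sequence, w):
    """Feed a sequence through the cell, carrying (h, c) from one element to the next."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:  # one number per word, standing in for a word embedding
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs


# Hypothetical weights; in practice they are learned during training
weights = {
    "f": (0.5, 0.1, 1.0),
    "i": (1.0, 0.2, 0.0),
    "o": (0.8, 0.3, 0.0),
    "g": (1.2, 0.4, 0.0),
}

print(run_lstm([1.0, 0.0, -1.0, 0.5], weights))
```

Because the forget gate can stay close to 1, information written into the cell state early in a sentence can survive many steps, which is what lets these networks relate words that are far apart.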
Text analysis allows a computer to process text written by a human being and extract its most relevant information. It lets AI recognise, both syntactically and semantically, entities, the relationships between them, and key concepts. It can be used to analyse a brand's reputation on social networks, gauge a politician's popularity, or summarise press articles.
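Modern text analysis relies on trained models, but the basic idea of pulling "key concepts" out of free text can be illustrated with something much simpler. This sketch is a deliberately naive stand-in, not the deep-learning approach described above: it ranks words by frequency after discarding common function words. The `STOP_WORDS` list and `key_terms` function are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use much larger ones
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "for", "its", "with", "by", "this", "that",
}


def key_terms(text, top_n=3):
    """Return the most frequent non-stop-word terms in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # most_common breaks ties by first appearance in the text
    return [word for word, _ in counts.most_common(top_n)]


review = ("The new phone has a great camera. The camera is fast, "
          "and the battery lasts all day, but the battery charges slowly.")
print(key_terms(review, top_n=2))  # → ['camera', 'battery']
```

Applied to thousands of posts mentioning a brand, even this crude count hints at what people talk about most; proper entity and relationship recognition is what turns that into structured information.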
Sentiment analysis takes natural language processing a step further. It not only recognises concepts or topics, but also detects the attitude and intention of the sender. In this way, we can detect customers annoyed by how they were treated, supporters elated by a political speech, or negative online product reviews.
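A simple way to see what "detecting the attitude of the sender" means in practice is a lexicon-based scorer. Note that this is a classic rule-based technique swapped in for illustration, not the neural approach most current systems use: each word carries a hand-assigned polarity, and a preceding negation flips it. The `LEXICON`, `NEGATIONS` and `sentiment` names and scores are hypothetical.

```python
import re

# Hypothetical toy lexicon: word -> polarity score
LEXICON = {
    "great": 2, "love": 2, "good": 1, "fast": 1,
    "slow": -1, "bad": -1, "annoyed": -2, "terrible": -2, "disappointing": -2,
}
NEGATIONS = {"not", "never", "no"}


def sentiment(text):
    """Sum lexicon scores, flipping the sign of a word preceded by a negation."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for idx, word in enumerate(words):
        value = LEXICON.get(word, 0)
        if idx > 0 and words[idx - 1] in NEGATIONS:
            value = -value  # "not good" counts as negative
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this brand, delivery was fast"))    # → positive
print(sentiment("Disappointing product, not good at all"))  # → negative
```

Hand-written rules like these break down quickly on sarcasm or unusual phrasing, which is precisely where models that learn from large amounts of labelled text do much better.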
Machine translation has advanced enormously since the advent of deep learning, which allows computers to capture the context and intent of a sentence in one language before translating it into another. The result is not a literal, word-for-word translation but an attempt to preserve the original message and its connotations. Besides the well-known Google Translate, another example of translation with deep neural networks is the online translator DeepL, which describes itself as the best translator in the world.
Finally, text classification assigns tags to entire documents to improve the organisation of books, texts and articles. Here the AI processes a whole text and assigns it to a category, just as object classification does, but working with words instead of images.
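To show how a whole text can be assigned to a category, here is a minimal sketch of a multinomial Naive Bayes classifier, a long-established text classification technique used here as a simple stand-in for the neural models discussed in this post. It learns how often each word appears in each category and picks the category under which a new text is most probable. The class name, the training sentences and the labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            words = tokenize(doc)
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, document):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # log P(label) + sum of log P(word | label); smoothing keeps unseen words from zeroing the score
            score = math.log(count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(document):
                score += math.log((self.word_counts[label][word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data
docs = [
    "the team won the match in the final minute",
    "the striker scored two goals",
    "the central bank raised interest rates",
    "markets fell after the inflation report",
]
labels = ["sports", "sports", "economy", "economy"]

model = NaiveBayesClassifier().fit(docs, labels)
print(model.predict("goals in the final match"))      # → sports
print(model.predict("interest rates and inflation"))  # → economy
```

Working in log space avoids multiplying many tiny probabilities together, which would otherwise underflow to zero on long documents.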
Speech and text generation
Although it may seem similar to natural language processing, here the task is not to understand human-generated information but to generate content that appears to have been produced by humans, or to convert one type of information into another (written to spoken, or vice versa).
Speech synthesis aims to read aloud any written text provided as input, with natural intonation. This enables applications such as conversational assistants, or support for visually impaired or elderly people who need auditory help to understand written texts.
Speech recognition does exactly the opposite. Starting from an audio file in which a person speaks in a given language, neural networks try to transcribe that speech into text. This technology helps people who have difficulty typing on computer keyboards or mobile phones, and it can also generate subtitles for videos automatically to assist people with hearing impairments.
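One detail of how neural speech recognisers turn audio into text can be shown in a few lines. A common approach, Connectionist Temporal Classification (CTC), has the network output, for every short audio frame, a probability for each character plus a special "blank" symbol; the transcript is then obtained by collapsing repeated characters and removing blanks. This is a minimal sketch of greedy CTC decoding under those assumptions; the alphabet, the `_` blank symbol and the per-frame probabilities are invented for illustration, not the output of a real model.

```python
BLANK = "_"  # hypothetical symbol standing for the CTC "blank" token


def ctc_greedy_decode(frame_probs, alphabet):
    """Pick the most likely symbol per audio frame, merge repeats, then drop blanks."""
    best_path = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    decoded = []
    previous = None
    for symbol in best_path:
        # A repeated symbol is merged unless a blank separates the two occurrences
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


alphabet = [BLANK, "h", "e", "l", "o"]


def one_hot_ish(symbol, confidence=0.8):
    # Invented network output for illustration: most of the probability on `symbol`
    rest = (1 - confidence) / (len(alphabet) - 1)
    return [confidence if s == symbol else rest for s in alphabet]


# Nine audio frames; the blank between the two l's keeps them as separate letters
frames = [one_hot_ish(s) for s in "hh_ell_lo"]
print(ctc_greedy_decode(frames, alphabet))  # → hello
```

The blank symbol is what lets the network handle speech where one sound stretches over many frames while still producing genuine double letters such as the "ll" in "hello".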