
What is Mamba, the architecture that aims to surpass GPT entirely? A new era for AI?

Today I want to go a little more technical. We talk about artificial intelligence every day, but it is worth knowing what it is based on and how it works. With that in mind, I want to introduce you to Mamba, a new architecture that promises to change language models as we know them today. Compared to GPT, Mamba's features, and what they make possible, are a significant step forward.

Mamba is a new horizon for artificial intelligence

The Transformer architecture, introduced in 2017 with the Google paper “Attention Is All You Need”, represented a breakthrough for language models, allowing them to maintain context across interactions. In short, the Transformer architecture is the foundation on which models like GPT (Generative Pre-trained Transformer) are built.

HOW THE TRANSFORMER ARCHITECTURE WORKS

The heart of the Transformer architecture is the “attention” mechanism, which allows the model to focus on specific parts of the input while generating or processing text. This mechanism makes Transformers particularly effective at understanding context and the complex relationships within a text. In practice, models based on the Transformer architecture, such as GPT, learn to generate and understand language in two main phases: training and inference (text generation).
During training, the model is fed large text datasets to learn linguistic structures, relationships between words, context, and so on. During inference, the model uses what it has learned to generate new text, answer questions, translate languages, and carry out other language-processing tasks.
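To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the operation at the core of the Transformer, written in plain NumPy. It is illustrative only: the matrices Q, K and V and the toy dimensions (4 tokens, 8-dimensional embeddings) are assumptions chosen for readability, not taken from GPT or any real model.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Similarity of every query with every key, scaled by sqrt(d_k)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys turns the scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of the value vectors
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                    # 4 tokens, 8-dimensional embeddings
out = scaled_dot_product_attention(x, x, x)    # self-attention: Q = K = V
print(out.shape)                               # (4, 8)

Note that the score matrix compares every token with every other token: this all-pairs comparison is exactly what makes attention powerful at capturing context, and also what makes it expensive on long inputs.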

However, the emergence of Mamba could mark the beginning of a new era. This architecture promises to be more efficient, capable of overcoming some key challenges faced by current models such as GPT. Specifically, three key aspects make Mamba a promising architecture:

  • reduced inference costs: A key aspect of Mamba is the significant reduction in inference costs. As I said before, inference is the process by which an AI model, after being trained, applies what it has learned to new data, generating text or images. In complex models such as GPT-3 or GPT-4, this process can be expensive in terms of computational resources. Mamba promises to cut these costs by up to five times compared with Transformer-based models, which could have a significant impact, especially for applications that require rapid response generation or that work with huge datasets;
  • linear attention computation cost: The second advantage of Mamba concerns the efficiency of the attention computation. In Transformer models, this cost grows quadratically as the length of the text increases (it really is a power of the input length, not a figure of speech). This means that the longer the text, the more resources are required to process it, limiting the practicality of these models in some applications. Mamba proposes a solution in which the cost grows linearly with the size of the context window, making the processing of long texts more manageable and less onerous in computational terms (see the sketch after this list);
  • much larger input window: Mamba could handle a maximum input window of up to 1 million tokens, far more than is possible with the Transformer architecture. This means that Mamba could, in theory, analyze and understand extremely long texts, such as entire books, maintaining coherence and contextual detail. For example, it could analyze an entire novel while keeping a clear understanding of the characters, plot, and themes from beginning to end.
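The difference in computational cost between the two approaches can be seen with a back-of-the-envelope sketch. This is not Mamba's actual algorithm: the operation counts, the embedding size of 64 and the fixed state size of 16 are illustrative assumptions, meant only to show why quadratic growth becomes prohibitive at a window of 1 million tokens while linear growth stays manageable.

def attention_ops(n, d):
    # Full self-attention: every token is compared with every other token,
    # so the work grows with n * n (quadratic in the sequence length n).
    return n * n * d

def linear_scan_ops(n, d, state_size=16):
    # A recurrent, state-space-style scan: each token updates a fixed-size
    # hidden state once, so the work grows with n (linear in the sequence length).
    return n * d * state_size

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"tokens={n:>9,}  attention~{attention_ops(n, 64):.1e}  scan~{linear_scan_ops(n, 64):.1e}")

Under these toy assumptions, going from 10,000 to 1,000,000 tokens multiplies the attention cost by 10,000 but the linear-scan cost by only 100, which is the intuition behind Mamba's claimed advantage on very long inputs.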

Despite Mamba's promises, the paper raises doubts about its scalability, particularly when compared to massive models such as GPT-3, which has 175 billion parameters, and GPT-4. Scalability, in very simple terms, refers to a system's ability to handle an increase in work or grow in size without losing effectiveness. Imagine a small restaurant that does well with few customers. If the restaurant becomes popular and starts to receive many more customers, it should be able to handle this increase without compromising the quality of its service or food. If it succeeds, then it is “scalable”.

Mamba, in its current state, has been tested only with 3 billion parameters. Thus, it remains uncertain whether its performance and efficiency can be maintained or improved when scaled to larger sizes.

Gianluca Cobucci

Passionate about code, programming languages and natural languages, and human-machine interfaces. Anything that is technological evolution interests me. I try to share my passion with the utmost clarity, relying on reliable sources and not on the first thing that comes along.
