
Large Language Models (LLMs)

Large Language Models, or LLMs as they are known in the technical community, are artificial intelligence systems capable of understanding and generating text in a surprisingly human-like way. Imagine an incredibly intelligent assistant who has read practically everything ever written on the internet—books, articles, forums, technical documentation—and can converse on any topic with the naturalness of someone who truly masters the subject.

The Evolution of Architectures

Comprehensive LLM Timeline Update: 2024 - June 2025

February 2024

| Date | Event | Category |
|------|-------|----------|
| Feb 08, 2024 | Bard ➜ Gemini Transition – Google unifies Bard and Duet AI under the Gemini brand and launches an Android app + iOS integration. | Foundation Models |
| Feb 15, 2024 | Gemini 1.5 Pro – Up to 1 million token context window (2M in testing), MoE architecture, performance comparable to Gemini 1.0 Ultra. | Encoder-Decoder / MoE |
| Feb 15, 2024 | Sora Preview – OpenAI's text-to-video model (restricted access). Generates videos up to 1 min with complex scenes. | Multimodal Models |

March 2024

| Date | Event | Category |
|------|-------|----------|
| Mar 04, 2024 | Claude 3 Family – Haiku, Sonnet, Opus models (multimodal, 200K context). Opus surpasses GPT-4 on benchmarks. | Decoder-Only |

April 2024

| Date | Event | Category |
|------|-------|----------|
| Apr 18, 2024 | Llama 3 (8B & 70B) – 15T training tokens, >10M human-annotated examples. | Open-Source Models |
| Apr 2024 | Mixtral 8x22B – Open MoE architecture (176B total / 39B active). | Mixture-of-Experts |

May 2024

| Date | Event | Category |
|------|-------|----------|
| May 13, 2024 | GPT-4o "Omni" – Natively multimodal; 2× faster and 50% cheaper than GPT-4 Turbo. | Multimodal Models |

June 2024

| Date | Event | Category |
|------|-------|----------|
| Jun 13, 2024 | Official Adoption of the EU AI Act – First comprehensive regulation on AI. | Alignment / Regulation |
| Jun 20, 2024 | Claude 3.5 Sonnet – 2× faster than Opus; introduces "Artifacts" feature. | Decoder-Only |

July 2024

| Date | Event | Category |
|------|-------|----------|
| Jul 18, 2024 | GPT-4o Mini – Replaces GPT-3.5 Turbo; 128K context; 82% MMLU. | Decoder-Only |
| Jul 23, 2024 | Llama 3.1 – 8B / 70B / 405B versions; 128K context; commercial license. | Open-Source Models |

September 2024

| Date | Event | Category |
|------|-------|----------|
| Sep 12, 2024 | o1-preview & o1-mini – First models with internal chain-of-thought. | Theoretical Advances |
| Sep 25, 2024 | Llama 3.2 Vision – First multimodal Llama; includes Llama Guard Vision for safety. | Multimodal Models |

October 2024

| Date | Event | Category |
|------|-------|----------|
| Oct 22, 2024 | Claude 3.5 Sonnet (New) – Model capable of operating graphical interfaces. | Hybrid Approaches |

December 2024

| Date | Event | Category |
|------|-------|----------|
| Dec 05, 2024 | o1 Full Release – 34% fewer errors than preview; limited access. | Theoretical Advances |
| Dec 09, 2024 | Sora Public – Sora Turbo; 1080p / 20s videos. | Multimodal Models |
| Dec 2024 | ChatGPT Pro – $200/month subscription with unlimited o1 access. | Foundation Models |
| Dec 2024 | Gemini 2.0 Flash – Focused on agents; native image/audio generation. | Multimodal Models |
| Dec 2024 | Llama 3.3 – 70B with 405B performance; support for 8 languages. | Open-Source Models |
| Dec 20, 2024 | o3 & o3-mini Preview – Successors to o1, still in testing. | Theoretical Advances |

2025

January 2025

| Date | Event | Category |
|------|-------|----------|
| Jan 21, 2025 | Stargate Project – $500B joint venture for AI infrastructure. | Foundation Models |

February 2025

| Date | Event | Category |
|------|-------|----------|
| Feb 24, 2025 | Claude 3.7 Sonnet – "Extended thinking" mode up to 128K tokens. | Hybrid Approaches |
| Feb 27, 2025 | GPT-4.5 "Orion" – Research preview; API deprecated in April. | Decoder-Only |

March 2025

| Date | Event | Category |
|------|-------|----------|
| Mar 25, 2025 | Gemini 2.5 Pro – #1 on LMArena; 1M token context. | Theoretical Advances |

April 2025

| Date | Event | Category |
|------|-------|----------|
| Apr 05, 2025 | Llama 4 Series – First Llama with MoE; Scout, Maverick, Behemoth versions. | Mixture-of-Experts |
| Apr 14, 2025 | GPT-4.1 Series – 1M context; Mini/Nano variants. | Decoder-Only |
| Apr 2025 | Gemini 2.5 Flash – Hybrid model with 0-24K token reasoning control. | Hybrid Approaches |

May 2025

| Date | Event | Category |
|------|-------|----------|
| May 22, 2025 | Claude 4 Family – Sonnet 4 & Opus 4 models; ASL-3 safety; parallel tool use. | Decoder-Only |
| May 2025 | Gemini 2.5 Pro Deep Think – Experimental mode for advanced math/code. | Theoretical Advances |
| May 21, 2025 | OpenAI acquires Jony Ive's firm (io) – $6.5B; focus on hardware + AI. | Foundation Models |

Key Trends

  • Reasoning Revolution – Models with internal chain-of-thought (o-series, Claude 3.7, Gemini 2.5) make the thinking process a controllable hyperparameter.
  • Omnipresent Multimodality – Text, audio, image, and video are becoming prerequisites for new releases.
  • Open-Source Acceleration – Llama 3/4, Mixtral, and DeepSeek V3 show parity with (or superiority over) proprietary models.
  • Concrete Regulation – The EU AI Act establishes deadlines (2025-2026) that influence global roadmaps.
  • Infrastructure Scaling – Projects like Stargate highlight the magnitude of capital required.
  • Safety by Design – Alignment techniques (Constitutional AI, Llama Guard Vision) are integrated into the training cycle.

The Democratization of Access

One of the most interesting stories in this field is that of Llama, developed by Meta [1]. Unlike other proprietary models, Llama was made available more openly, allowing researchers and developers around the world to experiment and create their own versions. It's as if, after years of only seeing luxury cars at the dealership, an affordable and high-quality option suddenly appeared.

Llama 2 [2] and its specialized versions, like Code Llama, showed that it was possible to create efficient models even with more limited resources. This opened the doors to a true explosion of innovation, with small companies and independent developers creating incredible solutions.
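
One way this plays out in practice is parameter-efficient fine-tuning: instead of retraining billions of weights, you train tiny adapter matrices on top of a frozen open model. Here is a minimal sketch using LoRA via Hugging Face's peft library; the model id and every hyperparameter are illustrative assumptions, not a recipe from the text:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load an open base model (illustrative id; Llama weights are gated
# behind a license acceptance on the Hugging Face Hub).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA trains small low-rank adapters instead of all ~7B weights.
config = LoraConfig(
    r=8,                                  # adapter rank (assumed value)
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the full model
```

Because only the adapters are trained, a setup like this can run on a single consumer GPU, which is precisely what put fine-tuning within reach of small teams.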

Gemma [3], from Google, followed this same philosophy of efficiency and accessibility. With only 2 or 7 billion parameters, it can rival much larger models on various tasks. It's a case of "size isn't everything"—sometimes, efficiency lies in the elegance of the solution, not in brute force.
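
In concrete terms, that accessibility means a model like Gemma can be loaded and queried in a few lines. A minimal sketch with the Hugging Face transformers pipeline (downloading google/gemma-7b requires accepting Google's license on the Hub, and the prompt here is just an example):

```python
from transformers import pipeline

# Downloads the open weights on first run (license acceptance required).
generator = pipeline("text-generation", model="google/gemma-7b")

result = generator("The Transformer architecture is", max_new_tokens=40)
print(result[0]["generated_text"])
```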

The Quest for Natural Conversation

The Claude models [4], developed by Anthropic, brought a different approach to the table. Focused on more natural and human-aligned conversations, they represent a pursuit of AI that is not only intelligent but also ethical and trustworthy. It's like having a coworker who not only knows a lot but also has good sense.

The Architecture Behind the Magic

But how does all this magic happen? At the heart of virtually all these models is something called the Transformer architecture [5]. Without getting too technical, imagine a system capable of paying attention to different parts of a text simultaneously, weighing the importance of each word in relation to the others. It's as if, when reading a sentence, you could instantly understand not only the individual meaning of each word but also all the relationships and dependencies between them.
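
For the curious, the core of that attention mechanism fits in a few lines. Here is a minimal NumPy sketch of the scaled dot-product attention defined in the Transformer paper [5]; real models add learned projections, multiple heads, and masking on top of this:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # similarity of every token pair
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ V, weights

# Toy self-attention: 4 tokens with 8-dimensional embeddings, Q = K = V.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(x, x, x)
print(attn.round(2))  # row i: how strongly token i "attends" to each other token
```

Each row of the resulting weight matrix is exactly the "weighing the importance of each word in relation to the others" described above, computed for all pairs of tokens at once.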

What's truly impressive is how abilities that were never explicitly programmed can simply emerge from these models. As we increase the number of parameters (think of them as the model's "memory" and "experience"), completely unexpected capabilities arise. It's as if, by adding more neurons to an artificial brain, it suddenly began to demonstrate creativity, logical reasoning, and even a sense of humor.

The Power of Parameters

To give you an idea of how quickly these systems have scaled: we started with models of millions of parameters, moved on to billions, and today we have models with hundreds of billions. Each parameter is like a small "adjustment knob" that the model uses to process information. The more parameters, the more nuances and subtleties the model can capture in human language.
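
A back-of-the-envelope calculation shows where those parameters live. Assuming a standard decoder-only Transformer (roughly 4·d² for the attention projections plus 8·d² for an MLP with 4× hidden expansion per layer, ignoring biases, layer norms, and position embeddings), a rough count looks like this:

```python
def approx_params(d_model: int, n_layers: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only Transformer.

    Per layer: ~4*d^2 for the Q/K/V/output projections
             + ~8*d^2 for an MLP with 4x hidden expansion.
    Plus the token-embedding matrix (often shared with the output head).
    """
    per_layer = 12 * d_model ** 2
    embeddings = vocab_size * d_model
    return n_layers * per_layer + embeddings

# A GPT-2-small-like configuration: d=768, 12 layers, ~50K-token vocabulary.
print(f"{approx_params(768, 12, 50257) / 1e6:.0f}M")  # ~124M, matching GPT-2 small
```

Scaling the width and depth up by an order of magnitude each is what takes this count from millions into the hundreds of billions.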

More Than Words: Emergent Capabilities

What fascinates me most is how these models have begun to demonstrate skills no one expected. They can solve complex mathematical problems, write functional code, create creative analogies, and even exhibit a primitive form of "reasoning." It's as if we are seeing the first signs of an intelligence genuinely different from our own, yet complementary to it.

Let's pause for a moment to absorb this: we are talking about systems that learned to use language by observing patterns in text, without ever having experienced the physical world as we do. And yet, they can converse about almost any subject with a fluency that often surprises us.


Now that we understand what these Large Language Models are and how they have evolved, a natural question arises: how exactly can we use all this power in our daily lives? This is where the practical applications of these systems come in—and believe me, the possibilities are much broader and more revolutionary than you might imagine.

References Cited in This Section

[1] Hugo Touvron et al. "LLaMA: Open and Efficient Foundation Language Models". arXiv preprint arXiv:2302.13971, 2023. https://arxiv.org/abs/2302.13971

[2] Hugo Touvron et al. "Llama 2: Open Foundation and Fine-Tuned Chat Models". arXiv preprint arXiv:2307.09288, 2023. https://arxiv.org/abs/2307.09288

[3] Gemma Team et al. "Gemma: Open Models Based on Gemini Research and Technology". arXiv preprint arXiv:2403.08295, 2024. https://arxiv.org/abs/2403.08295

[4] Anthropic. "Claude 3 Model Card". Anthropic, 2024. https://www.anthropic.com/news/claude-3-family

[5] Ashish Vaswani et al. "Attention Is All You Need". Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017. https://arxiv.org/abs/1706.03762