Large Language Models (LLMs)

Large Language Models, or LLMs as they are known in the technical community, are artificial intelligence systems that can interpret and generate text in a surprisingly human-like way. Imagine an incredibly well-read assistant who has gone through a large share of what is publicly available on the internet (books, articles, forums, technical documentation) and can converse on almost any topic with the ease of someone who has truly mastered the subject.

The Evolution of Architectures

LLM Timeline: February 2024 to May 2025

February 2024

Date	Event	Category
Feb 08, 2024	Bard ➜ Gemini Transition – Google unifies Bard and Duet AI under the Gemini brand and launches an Android app + iOS integration.	Foundation Models
Feb 15, 2024	Gemini 1.5 Pro – Context window of up to 1 million tokens in limited preview (128K standard), MoE architecture, performance comparable to Gemini 1.0 Ultra.	Mixture-of-Experts
Feb 15, 2024	Sora Preview – OpenAI's text-to-video model (restricted access). Generates videos up to 1 min with complex scenes.	Multimodal Models

March 2024

Date	Event	Category
Mar 04, 2024	Claude 3 Family – Haiku, Sonnet, Opus models (multimodal, 200K context). Opus surpasses GPT-4 on several benchmarks reported by Anthropic.	Decoder-Only

April 2024

Date	Event	Category
Apr 18, 2024	Llama 3 (8B & 70B) – 15T training tokens, >10M human-annotated examples.	Open-Source Models
Apr 2024	Mixtral 8x22B – Open MoE architecture (141B total / 39B active parameters).	Mixture-of-Experts

May 2024

Date	Event	Category
May 13, 2024	GPT-4o "Omni" – Natively multimodal; 2× faster and 50% cheaper than GPT-4 Turbo.	Multimodal Models

June 2024

Date	Event	Category
Jun 13, 2024	EU AI Act signed – Regulation (EU) 2024/1689, the first comprehensive regulation on AI; in force since Aug 1, 2024.	Alignment / Regulation
Jun 20, 2024	Claude 3.5 Sonnet – 2× faster than Opus; introduces "Artifacts" feature.	Decoder-Only

July 2024

Date	Event	Category
Jul 18, 2024	GPT-4o Mini – Replaces GPT-3.5 Turbo; 128K context; 82% MMLU.	Decoder-Only
Jul 23, 2024	Llama 3.1 – 8B / 70B / 405B versions; 128K context; Commercial license.	Open-Source Models

September 2024

Date	Event	Category
Sep 12, 2024	o1-preview & o1-mini – OpenAI's first models trained to reason through an internal chain-of-thought before answering.	Theoretical Advances
Sep 25, 2024	Llama 3.2 Vision – First multimodal Llama; includes Llama Guard Vision for safety.	Multimodal Models

October 2024

Date	Event	Category
Oct 22, 2024	Claude 3.5 Sonnet (New) – Model capable of operating graphical interfaces.	Hybrid Approaches

December 2024

Date	Event	Category
Dec 05, 2024	o1 Full Release – 34% fewer major errors than o1-preview, according to OpenAI; initially limited to paid ChatGPT plans.	Theoretical Advances
Dec 05, 2024	ChatGPT Pro – $200/month subscription with unlimited o1 access.	Foundation Models
Dec 06, 2024	Llama 3.3 – 70B model with performance close to Llama 3.1 405B; support for 8 languages.	Open-Source Models
Dec 09, 2024	Sora Public – Sora Turbo; videos up to 1080p and 20 s.	Multimodal Models
Dec 11, 2024	Gemini 2.0 Flash – Focused on agents; native image/audio generation.	Multimodal Models
Dec 20, 2024	o3 & o3-mini Preview – Successors to o1, still in testing.	Theoretical Advances

2025

January 2025

Date	Event	Category
Jan 21, 2025	Stargate Project – $500B joint venture for AI infrastructure.	Foundation Models

February 2025

Date	Event	Category
Feb 24, 2025	Claude 3.7 Sonnet – "Extended thinking" mode up to 128K tokens.	Hybrid Approaches
Feb 27, 2025	GPT-4.5 "Orion" – Research preview; API deprecation announced in April 2025.	Decoder-Only

March 2025

Date	Event	Category
Mar 25, 2025	Gemini 2.5 Pro – #1 on LMArena; 1M token context.	Theoretical Advances

April 2025

Date	Event	Category
Apr 05, 2025	Llama 4 Series – First Llama with MoE; Scout and Maverick released, Behemoth previewed.	Mixture-of-Experts
Apr 14, 2025	GPT-4.1 Series – 1M context; Mini/Nano variants.	Decoder-Only
Apr 17, 2025	Gemini 2.5 Flash – Hybrid model with an adjustable thinking budget of 0 to 24,576 tokens.	Hybrid Approaches

May 2025

Date	Event	Category
May 20, 2025	Gemini 2.5 Pro Deep Think – Experimental mode for advanced math/code.	Theoretical Advances
May 21, 2025	OpenAI acquires Jony Ive's firm (io) – $6.5B; focus on hardware + AI.	Foundation Models
May 22, 2025	Claude 4 Family – Sonnet 4 & Opus 4 models; Opus 4 deployed under ASL-3 safeguards; parallel tool use.	Decoder-Only

Trends & Observations

Reasoning Revolution ⚙️ – Models with internal chain-of-thought (o-series, Claude 3.7, Gemini 2.5) turn the amount of thinking into a setting that can be adjusted per request, trading latency and cost for answer quality.
Omnipresent Multimodality – Text, audio, image, and video are becoming prerequisites for new releases.
Open-Source Acceleration – Open-weight models such as Llama 3/4, Mixtral, and DeepSeek V3 approach, and on some benchmarks match, proprietary models.
Concrete Regulation – The EU AI Act establishes deadlines (2025-2026) that influence global roadmaps.
Infrastructure Scaling – Projects like Stargate highlight the magnitude of capital required.
Safety by Design – Alignment techniques such as Constitutional AI are applied during training, while safeguard models such as Llama Guard Vision screen inputs and outputs at deployment.

The Democratization of Access

One of the most interesting stories in this field is that of Llama, developed by Meta [1]. Unlike proprietary models, Llama was released more openly, allowing researchers and developers around the world to experiment and create their own versions. It's as if, after years of only seeing luxury cars at the dealership, an affordable and high-quality option suddenly appeared.

Llama 2 [2] and its specialized versions, like Code Llama, showed that it was possible to build capable models that run on more limited resources. This opened the door to an explosion of innovation, with small companies and independent developers fine-tuning the models for their own products.

Gemma [3], from Google, followed this same philosophy of efficiency and accessibility. With only 2 or 7 billion parameters, it can rival much larger models on various tasks. It's a case of "size isn't everything"—sometimes, efficiency lies in the elegance of the solution, not in brute force.

The Quest for Natural Conversation

The Claude models [4], developed by Anthropic, brought a different approach to the table. Focused on more natural and human-aligned conversations, they represent a pursuit of AI that is not only intelligent but also ethical and trustworthy. It's like having a coworker who not only knows a lot but also has good sense.

The Architecture Behind the Magic

But how does all this magic happen? At the heart of virtually all these models is something called the Transformer architecture [5]. Without getting too technical, imagine a system capable of paying attention to different parts of a text simultaneously, weighing the importance of each word in relation to the others. It's as if, when reading a sentence, you could instantly understand not only the individual meaning of each word but also all the relationships and dependencies between them.
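To make the idea of "weighing the importance of each word in relation to the others" concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It rests on simplifying assumptions that are not part of the text above: the three 2-dimensional embeddings are made up for illustration, and the token vectors are used directly as queries, keys and values, whereas a real Transformer [5] first passes them through learned projection matrices and uses many attention heads.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention over a list of token vectors.

    Simplification (assumption): queries, keys and values are the token
    vectors themselves, with no learned projections.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted average of all token vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings for a three-token sentence.
tokens = ["the", "cat", "sat"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sentence, and each row sums to 1. Because every row is computed independently of the others, all tokens can be processed in parallel, which is the "simultaneously" part of the description.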

What's truly impressive is how these models can develop abilities that were never explicitly programmed, often described as "emergent" capabilities. As we increase the number of parameters (the adjustable values that store what the model has learned during training), capabilities that were absent in smaller models can appear. It's as if, by adding more neurons to an artificial brain, it began to show signs of creativity, logical reasoning, and even a sense of humor.

The Power of Parameters

To give you an idea of the exponential evolution of these systems: we started with models of millions of parameters, moved on to billions, and today we have models with hundreds of billions. Each parameter is like a small "adjustment knob" that the model uses to process information. The more parameters, the more nuances and subtleties the model can capture in human language.
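The jump from millions to hundreds of billions can be checked with a back-of-the-envelope calculation. The sketch below uses a common rule of thumb for decoder-only Transformers, roughly 12 × layers × width² parameters outside the embedding tables; that formula and the three model shapes (taken from the published GPT-2 and GPT-3 configurations) are assumptions brought in for illustration and do not come from this article.

```python
def approx_transformer_params(n_layers, d_model):
    """Rough non-embedding parameter count: about 12 * n_layers * d_model**2.

    Per layer: ~4 * d_model**2 for attention (query, key, value and output
    projections) plus ~8 * d_model**2 for a feed-forward block whose hidden
    size is 4 * d_model. Biases, layer norms and embeddings are ignored.
    """
    attention = 4 * d_model ** 2
    feed_forward = 8 * d_model ** 2
    return n_layers * (attention + feed_forward)


def humanize(count):
    """Format a parameter count as millions (M) or billions (B)."""
    for unit, size in (("B", 10 ** 9), ("M", 10 ** 6)):
        if count >= size:
            return f"{count / size:.1f}{unit}"
    return str(count)


# Illustrative shapes (assumption): (name, layers, model width).
shapes = [
    ("GPT-2 small shape", 12, 768),
    ("GPT-2 XL shape", 48, 1600),
    ("GPT-3 175B shape", 96, 12288),
]
for name, n_layers, d_model in shapes:
    print(name, humanize(approx_transformer_params(n_layers, d_model)))
```

Because the count grows with the square of the model width, widening and deepening a network at the same time moves it from tens of millions to hundreds of billions of "adjustment knobs" within a few steps.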

More Than Words: Emergent Capabilities

What fascinates me most is how these models have begun to demonstrate skills no one expected. They can solve complex mathematical problems, write functional code, create creative analogies, and even demonstrate a primitive form of "reasoning." It's as if we are seeing the first signs of an intelligence genuinely different from our own, but complementary to it.

Let's pause for a moment to absorb this: we are talking about systems that learned to use language by observing patterns in text, without ever having experienced the physical world as we do. And yet, they can converse about almost any subject with a fluency that often surprises us.

Now that we understand what these Large Language Models are and how they have evolved, a natural question arises: how exactly can we use all this power in our daily lives? This is where the practical applications of these systems come in—and believe me, the possibilities are much broader and more revolutionary than you might imagine.

References Cited in This Section

[1] Hugo Touvron et al. "LLaMA: Open and Efficient Foundation Language Models". arXiv preprint arXiv:2302.13971, 2023. https://arxiv.org/abs/2302.13971

[2] Hugo Touvron et al. "LLaMA 2: Open Foundation and Fine-Tuned Chat Models". arXiv preprint arXiv:2307.09288, 2023. https://arxiv.org/abs/2307.09288

[3] Gemma Team et al. "Gemma: Open Models Based on Gemini Research and Technology". arXiv preprint arXiv:2403.08295, 2024. https://arxiv.org/abs/2403.08295

[4] Anthropic. "Claude 3 Model Card". Anthropic, 2024. https://www.anthropic.com/news/claude-3-family

[5] Ashish Vaswani et al. "Attention Is All You Need". Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017. https://arxiv.org/abs/1706.03762
