The GenAI Playbook by Yasir Gaji — From AI to Generative AI

22 September 2025

Generative AI has become one of the most talked-about technologies of our time; it's practically everywhere, from copilots writing code to chatbots powering customer service. But beneath the hype, how did we get here?

What makes ChatGPT and other Large Language Models (LLMs) so powerful?.

However, before we delve into how it works, it’s essential to understand its place within the broader context of artificial intelligence (AI). In this first part of The GenAI Playbook, I’ll walk you through the layers of AI, how they’ve evolved, and how we eventually arrived at the systems we now call Generative AI.

Press enter or click to view image in full sizeFrom AI to Generative AI image By Yasir Gaji

What Do We Mean by Artificial Intelligence?

At its core, as defined by Bhaskarjit Sarmah, put;

Artificial Intelligence (AI) is about building machines capable of mimicking aspects of human intelligence, including learning, reasoning, and decision-making.

Now, this AI is not a single invention but a progression of layers.
Layers of AI, from ML to Generative AI:

Artificial Intelligence (AI): the broad field of building machines that mimic human intelligence.
Machine Learning (ML): Instead of hard-coding rules, machines learn patterns from data. For example, YouTube’s recommendation system looks at your past interactions, organises this information into a structured form, and applies algorithms that detect your preferences.
Deep Learning (DL): This is a subset of ML that uses neural networks. These networks are powerful because they are universal approximators: in theory, they can approximate any mathematical function (y = f(x)) given enough data and capacity, and variants include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) and Transformer networks, to mention a few.
Transformers: Being a subset of neural networks in DL introduced in 2017, they changed how models handle sequences like text, enabling long-range dependencies and scalability
Generative AI (GenAI): The models that create content, although Pre-transformers approaches like GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders) existed, transformers unlocked today’s breakthroughs in language and multimodal generation.

Thanks to these layers, we have come to have Large Language Models

Large Language Models (LLMs) Using OpenAI as a case study

A core type of GenAI system is the Large Language Model (LLM). Examples include OpenAI’s GPT-3, GPT-4, and GPT-4o.

So, what makes them “Large”? Three things:

Large datasets — web-scale corpora spanning billions of words, and in the case of OpenAI, most of the entire internet is their dataset.
Large numbers of parameters — billions or even trillions of tunable weights that allow the model to represent complex knowledge.
Large computing requirements — for perspective, training GPT-3.5 on a single GPU would take ~256 years.

Then, what makes them “Language Models”? Simply put, at their core, they are next-word predictors:

Say, for instance, we have a fill-in-the-gap statement
“I’d like to eat a loaf of ____.”
And we have options of [ beef, noodles, bread… ]. The model assigns a higher probability to “bread” over “noodles” because it has learned the contextual dependency between “loaf” and “bread.”

These are the reasons why we call it a large language model. But then, how do we train them?

How LLMs Learn (Training Pipeline)

LLMs aren’t built in one step. They go through a multi-stage pipeline:

Unsupervised Pre-Training (Foundation Model):

The first stage of training a Large Language Model (LLM) is unsupervised pre-training. At this stage, the model learns one simple but powerful task: predict the next word given the previous words. The result is called a base model. It has broad knowledge but doesn’t yet reliably follow instructions.

still using OpenAI as a case study, where its dataset is most of the entire internet, and the internet is in free-form text, where we have scattered data. Because this raw data is messy and unstructured, it must be cleaned, filtered, and tokenised into sequences the model can learn from.

What this means is, for instance, we have information from the internet, such as a Google search result

Consider the highlighted parts from the Google search result
“full-stack software and Generative AI (GenAI) engineer based in Lagos, Nigeria.”
To train the model, this sentence would be broken into overlapping sequences:

The model is exposed to billions of input–output pairs, gradually learning the statistical relationships between words. Through trillions of such examples, it develops into a foundation model capable of generating fluent text. These are what we call base models — examples include GPT-3, GPT-4, and GPT-4o.

However, at this stage, the model is not yet aligned with human intent. It can generate text, but it doesn’t reliably follow instructions or handle context appropriately. This is why further fine-tuning is required — bringing us to the next stage of training.

Supervised Fine-Tuning (Instruction Training):

After pre-training, the base model is fine-tuned on curated instruction–response data so it can follow human directions. During this stage, we train the model on many examples of prompts paired with high-quality responses (e.g., user instruction → desired output). Typical training artefacts include the prompt text, the target response, and metadata such as task type, source, and quality score.

Dataset schema for supervised fine-tuning

prompt — the user instruction or context (string)
response — the desired model output (string)
task_type — e.g., summarization, code-generation, qa (string)
quality_label — human label or score (e.g., 1–5)
source — dataset origin/license (string)
metadata — optional extra info (language, domain, date)

This way supervised fine-tuning improves instruction following and utility, but it does not guarantee safe behaviour: a fine-tuned model can still produce incorrect, biased, or unsafe outputs if the training data or quality controls are insufficient.

At this point, if I ask, “How can I assassinate the president?” the model would respond, which ethically it is not supposed to do.

That is why additional alignment steps (human evaluation, rejection examples, and runtime safety layers (content filters and refusal policies) are applied after supervised fine-tuning to reduce harmful or out-of-scope responses in the next stage.

Reinforcement Learning with Human Feedback (RLHF):

In RLHF, human annotators review model responses and provide preferences (e.g., ranking which outputs are more helpful or safer). These preferences are used to train a reward model, which then guides the language model through reinforcement learning. The result is a model that aligns more closely with human intent and avoids harmful or unethical responses.

Operational safety during and after this stage typically involves:

Refusal policies and classifiers for disallowed requests.
Content filters and post-processing safeguards.
RLHF-based ranking models that bias the system toward preferred responses.

At this point, the model is much more reliable, but it still lacks conversational memory and dialogue-specific fine-tuning. This is where the final stage comes in.

Chat Optimisation:

The final stage is chat capability conversational fine-tuning, which equips the model with the ability to function as a dialogue agent. At this point, the model isn’t just generating text — it’s optimised to sustain conversations over multiple turns, track context, and respond in a way that feels natural and coherent.

This stage usually involves:

Dialogue-specific datasets with multi-turn conversations.
Role conditioning (e.g., assistant vs. user roles) so the model can adopt consistent behaviour.
Tone and safety alignment to ensure polite, helpful, and on-policy responses.
Memory handling (within the context window) so the model can “remember” what was said earlier in a session.

Through these enhancements, a base foundation model such as GPT-3 or GPT-4 is transformed into a product like ChatGPT — a system that can interact naturally, follow instructions reliably, and maintain conversational flow.

Use Cases of Generative AI

As highlighted in McKinsey’s report on the economic potential of Generative AI, he technology has transformative applications across industries. Some of the most impactful areas include:

Customer operations & support

Automating responses, reducing wait times, and improving service quality.
Examples: Intercom’s Fin AI Agent, Worknet.ai, Zendesk’s AI-powered bots, and ChatGPT-based support assistants.

Software engineering

Accelerating code generation, debugging, test creation, and documentation.
Examples: GitHub Copilot, Amazon CodeWhisperer, and Sourcegraph’s Cody, Cursor (AI-powered IDE).

Marketing & sales

Producing personalised campaigns, product descriptions, and lead-qualification tools.
Examples: Jasper AI, Copy.ai, HubSpot’s AI sales email generator.

Content creation & media

Assisting writers, designers, and video producers with ideation and production.
Examples: Midjourney and Stable Diffusion for images, Runway ML and Pika Labs for video, ChatGPT for drafting blog posts.

Healthcare & life sciences

Supporting clinicians with note summarisation, research, and drug discovery.
Examples: Google DeepMind’s AlphaFold, Microsoft + Epic’s AI for EHRs, Nuance’s Dragon Medical One.

Finance & banking

Automating reporting, analysing risks, and improving customer engagement.
Examples: Klarna’s AI shopping assistant, JP Morgan’s AI contract review, BloombergGPT.

Education & training

Powering personalised tutoring and lesson planning.
Examples: Khan Academy’s Khanmigo, Duolingo Max, and Quizlet AI tutor.

Legal & compliance

Drafting contracts, summarising case law, and compliance monitoring.
Examples: Harvey AI (used by firms like Allen & Overy), Klarity AI.

Workplace productivity & collaboration

Acting as copilots for knowledge workers: summarising meetings, drafting documents, automating workflows, and supporting team communication.
Examples: Worknet.ai, Notion AI, Microsoft Copilot for Teams, Slack AI, Claude (Anthropic).

It’s worth noting that some of the systems above (e.g. Claude, Worknet.ai, GitHub Copilot…) are already evolving beyond pure Generative AI. They don’t just generate outputs — they can reason about tasks, take context into account, and act more autonomously.

This distinction leads us to the next question: What’s the difference between Generative AI and Agentic AI?

Generative AI vs Agentic AI

From the use cases above, we can already see that some products — like Claude, Worknet.ai, or GitHub Copilot — blur the line between generative and agentic. They don’t just generate; they can reason about tasks, follow multi-step workflows, and even act more autonomously.

Generative AI

This is primarily focused on creating content — text, images, code — based on patterns in its training data.

Examples: Midjourney generating images, ChatGPT writing essays, Stable Diffusion creating art.
But: tools like Claude and ChatGPT with plugins or GPTs show early agentic behaviour — they can call tools, browse the web, or plan responses.

Agentic AI

Goes further by reasoning, planning, and acting toward a goal with minimal human intervention.

Examples: Worknet.ai orchestrating tasks across workplace apps, DeepSeek (2025) reasoning through complex problems, and AutoGPT-style agents chaining multiple LLM calls to achieve goals.
But: these agentic systems still rely on Generative AI models as their foundation — for language, code generation, and reasoning.

Key Insight

It’s important to clarify that Agentic AI does not inherently depend on Generative AI.

Pure agentic systems have existed for decades: autopilots, trading algorithms, industrial control systems, GPS route optimisers, and smart thermostats. These make decisions and act autonomously without generating any text, images, or code.
Rule-based expert systems in the 1980s–2000s also demonstrated agentic behaviour, taking actions without any generative capabilities.

So why does the line feel blurry today? Because modern agentic systems often incorporate generative AI as a tool:

To communicate with humans (explaining actions in natural language).
To perform content-related actions (drafting emails, generating reports, summarising meetings).
To leverage LLMs’ planning and reasoning skills.
To create a natural user interface (chat-based interaction).

Therefore, the Correct relationship is:

Traditionally, Agentic AI = Decision-making + Action-taking.
Today, many agentic systems use generative AI as a component, but it’s an enhancement — not a fundamental requirement.

Think of it this way:

A robot arm in a factory is purely agentic.
A customer service AI is both agentic (deciding how to handle the issue) and generative (crafting a personalised reply).

Generative AI has become a common ingredient in modern agentic systems, but it is not a hard dependency — it’s a very useful tool that enhances capabilities.

Conclusion

Generative AI brought us fluent text, creative images, and powerful copilots. Agentic AI is pushing further — toward autonomous systems that can reason, plan, and act. But in practice, the two are deeply interwoven: many “generative” tools already show agentic traits, while “agentic” systems still depend on generative foundations.

In this first part of The GenAI Playbook by Yasir Gaji, we explored the evolution of AI, how LLMs are trained, real-world use cases, and the emerging shift toward agentic systems.

In Part 2, we’ll get practical:
How do you build a minimal, reliable inference API for a generative model — the foundation every AI product rests on?

Stay tuned for The GenAI Playbook by Yasir Gaji — Part 2.

I expect questions for clarification and how-tos. Kindly criticise and make corrections as well. Share Thank you.

References

Also published on Medium.