
Master Generative AI — Part 1: Foundation of AI & Machine Learning

This is Part 1 of the Master Generative AI: A Step-by-Step Challenge series — a practical, no-fluff guide to going from complete beginner to confident AI practitioner in 2026.


The AI revolution isn't just for researchers anymore. In 2026, the tools, libraries, and models that used to require a PhD and a supercomputer are now accessible to any developer willing to invest a few weeks of focused learning. This series is your step-by-step map.

We start at the very beginning — not because you're not smart, but because the best practitioners always have the strongest foundations.


Chapter 1: Introduction to AI & Generative AI

What Is Artificial Intelligence?

Artificial Intelligence is the field of building systems that can perform tasks that normally require human intelligence — recognizing images, understanding language, making decisions.

AI (the broad field)
├── Machine Learning (learns from data)
│   │
│   ├── Deep Learning (neural networks)
│   │   │
│   │   ├── NLP (language tasks)
│   │   ├── Computer Vision (image tasks)
│   │   └── Generative AI ← where we're going
│   │
│   └── Classical ML (trees, SVMs, linear models)
└── Symbolic AI (rules, logic, expert systems)

What Is Generative AI?

Most AI systems are discriminative: they classify or predict. Given an image, a discriminative model answers "cat" or "dog."

Generative AI creates new content — text, images, audio, video, code — that didn't exist before:

Type         Examples                   What It Creates
Text         GPT-4, Claude, Gemini      Articles, code, answers
Image        Stable Diffusion, DALL-E   Photos, artwork, designs
Audio        ElevenLabs, Suno           Voice, music, sound
Video        Sora, Runway               Clips, animations
Code         Copilot, Code Llama        Programs, scripts
Multimodal   GPT-4o, Gemini             Any combination above

The 2026 Generative AI Landscape

Foundation Models (trained on massive data, cost millions)
    GPT-4o    Claude 3.5/4    Gemini 2.0    LLaMA 3.1
         ↓ fine-tuning / prompt engineering ↓
Application Layer (your products and solutions)
    Chatbots  Code assistants  Content tools  AI agents

Key insight: You don't train foundation models. You use them. Your job as a practitioner is to know which model to pick, how to prompt it, when to fine-tune, and how to deploy reliably.


Chapter 2: Basics of Machine Learning

What Is Machine Learning?

Traditional programming: you write rules → computer follows them. Machine learning: you give examples → computer finds the rules.

# Traditional programming
def classify_email(email):
    if "FREE" in email and "CLICK HERE" in email:
        return "spam"
    return "not spam"

# Machine learning
# (the model learns these rules from 10,000 labeled examples)
model = train(emails, labels)
prediction = model.predict(new_email)
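
The `train(...)` call above is pseudocode. As a minimal runnable sketch of the same idea, here is a tiny spam filter built with scikit-learn. The six emails are made-up illustration data, and the choice of `CountVectorizer` plus `MultinomialNB` is an assumption for the example, not the only way to do it:

```python
# Minimal sketch: a spam filter learned from examples instead of hand-written rules.
# The six emails below are made-up illustration data, not a real dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "FREE money, CLICK HERE now",
    "Meeting moved to 3pm",
    "Win a FREE prize, CLICK HERE",
    "Lunch tomorrow?",
    "CLICK HERE for FREE gift",
    "Project report attached",
]
labels = ["spam", "not spam", "spam", "not spam", "spam", "not spam"]

# CountVectorizer turns text into word counts;
# MultinomialNB learns which words are typical of each label.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

new_emails = ["Claim your FREE gift, CLICK HERE", "Report for tomorrow's meeting"]
predictions = model.predict(new_emails)
print(list(predictions))
```

Nobody wrote an `if "FREE" in email` rule here. The model inferred which words matter from the labeled examples.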

Supervised Learning

You give the model labeled data — inputs paired with correct outputs. The model learns to map inputs to outputs.

Data:
  (email_1, "spam")
  (email_2, "not spam")
  (email_3, "spam")
  ...

Model learns: features → label

Use cases: classification, regression, object detection
Examples: spam filter, house price prediction, medical diagnosis

# Example: Supervised learning (regression) with scikit-learn
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Features: [size_sqm, bedrooms, distance_to_center]
# Toy data: four rows is far too few for a real model
X = [[85, 2, 3.2], [120, 3, 1.5], [45, 1, 8.0], [200, 4, 0.5]]
y = [4500000, 8000000, 2000000, 15000000]  # prices in THB

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Price is a continuous target, so this is regression, not classification
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(mean_absolute_error(y_test, predictions))  # average error in THB

Unsupervised Learning

No labels. The model finds hidden patterns and structure in data on its own.

Data:
  (customer_1_purchase_history)
  (customer_2_purchase_history)
  (customer_3_purchase_history)
  ...

Model discovers: natural groupings (clusters)

Use cases: clustering, dimensionality reduction, anomaly detection
Examples: customer segmentation, recommendation systems, fraud detection

# Example: K-means clustering
from sklearn.cluster import KMeans
import numpy as np

# Customer data: [monthly_spend, purchase_frequency]
customers = np.array([
    [500, 2], [600, 3], [5000, 15], [4800, 12],
    [200, 1], [100, 1], [5200, 18], [300, 2]
])

# n_init=10 reruns K-means from 10 random starts and keeps the best result
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(customers)
print(kmeans.labels_)  # one cluster ID per customer; the ID numbers themselves are arbitrary

Reinforcement Learning

An agent learns by taking actions in an environment and receiving rewards or penalties.

Agent → Action → Environment → Reward → Agent (update policy)

Examples: game playing (AlphaGo), robotics, RLHF for LLMs (ChatGPT)
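
To make the Agent → Action → Environment → Reward loop concrete, here is a minimal sketch of one of the simplest reinforcement learning setups: an epsilon-greedy agent on a three-armed bandit. The reward probabilities, `epsilon`, and step count are made-up values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Environment: three slot-machine arms with hidden win probabilities (made-up values)
true_reward_prob = np.array([0.2, 0.5, 0.8])
n_arms = len(true_reward_prob)

q_values = np.zeros(n_arms)            # agent's running estimate of each arm's value
counts = np.zeros(n_arms, dtype=int)   # how often each arm was pulled
epsilon = 0.1                          # fraction of steps spent exploring
n_steps = 5000

for _ in range(n_steps):
    # Agent -> Action: explore at random with probability epsilon, otherwise exploit
    if rng.random() < epsilon:
        action = int(rng.integers(n_arms))
    else:
        action = int(np.argmax(q_values))

    # Environment -> Reward: 1.0 on a win, 0.0 otherwise
    reward = float(rng.random() < true_reward_prob[action])

    # Update policy: move the estimate toward the observed reward (running average)
    counts[action] += 1
    q_values[action] += (reward - q_values[action]) / counts[action]

best_action = int(np.argmax(q_values))
print("Estimated values:", q_values.round(2), "| best arm:", best_action)
```

The agent never sees `true_reward_prob`. It only sees rewards, yet its estimates end up ranking the arms correctly. RLHF applies the same reward-driven idea at a vastly larger scale.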

The ML Workflow (Universal)

1. Define the problem     → What are we predicting?
2. Collect data           → More high-quality, representative data usually = better model
3. Explore & clean data   → Fix missing values, outliers
4. Feature engineering    → Transform raw data into useful signals
5. Choose & train model   → Pick algorithm, optimize
6. Evaluate               → Does it generalize to new data?
7. Deploy & monitor       → Real-world performance often differs
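
The steps above can be sketched end to end in a few lines. This is a minimal illustration that assumes scikit-learn's bundled iris dataset and a logistic regression model. Step 7 (deploy and monitor) is out of scope for a snippet:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1-2. Define the problem and collect data:
#      predict the iris species from four flower measurements
X, y = load_iris(return_X_y=True)

# 3. Explore & clean: this dataset has no missing values,
#    so we only hold out a test set the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 4-5. Feature engineering (standardize features) + model, chained in one pipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 6. Evaluate: does it generalize to data it has not seen?
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.2f}")
```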

Chapter 3: Neural Networks 101

The Neuron: Building Block

A single neuron takes multiple inputs, multiplies each by a weight, sums them up, adds a bias, then passes the result through an activation function:

inputs:   x₁ = 0.5,  x₂ = 0.8,  x₃ = 0.3
weights:  w₁ = 0.4,  w₂ = -0.2, w₃ = 0.9
bias:     b  = 0.1

weighted sum = (0.5×0.4) + (0.8×-0.2) + (0.3×0.9) + 0.1
             = 0.2 - 0.16 + 0.27 + 0.1 = 0.41

output = activation(0.41) = sigmoid(0.41) ≈ 0.60

import numpy as np

def neuron(inputs, weights, bias, activation="relu"):
    z = np.dot(inputs, weights) + bias  # linear combination
    if activation == "relu":
        return max(0, z)                # ReLU: max(0, z)
    elif activation == "sigmoid":
        return 1 / (1 + np.exp(-z))    # sigmoid: 0 to 1
    elif activation == "tanh":
        return np.tanh(z)              # tanh: -1 to 1
    return z
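
As a quick check of the hand calculation above, the same numbers can be run through NumPy directly. This is a standalone sketch that repeats the sigmoid formula rather than importing the `neuron` function:

```python
import numpy as np

inputs = np.array([0.5, 0.8, 0.3])
weights = np.array([0.4, -0.2, 0.9])
bias = 0.1

z = np.dot(inputs, weights) + bias   # weighted sum plus bias
output = 1 / (1 + np.exp(-z))        # sigmoid activation

print(f"z = {z:.2f}, sigmoid(z) = {output:.2f}")  # z = 0.41, sigmoid(z) = 0.60
```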

From One Neuron to a Network

Layer neurons together → neural network:

Input Layer    Hidden Layer    Output Layer
[x₁]─────────→[n₁]─────────→[out]
[x₂]─────────→[n₂]
[x₃]─────────→[n₃]
               [n₄]

Each connection has a weight (learned during training)
Hidden layer finds intermediate features
Output layer gives the final prediction
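
The diagram above (3 inputs, 4 hidden neurons, 1 output) can be written as two matrix multiplications. This is a minimal sketch with random, untrained weights. The names `W1`, `b1`, `W2`, and `b2` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, 0.8, 0.3])    # input layer: 3 features

W1 = rng.normal(size=(4, 3))     # hidden layer: 4 neurons, each with 3 weights
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))     # output layer: 1 neuron with 4 weights
b2 = np.zeros(1)

hidden = np.maximum(0, W1 @ x + b1)   # every hidden neuron at once, with ReLU
out = W2 @ hidden + b2                # final prediction (no activation here)

print(hidden.shape, out.shape)  # (4,) (1,)
```

Each row of `W1` is one neuron's weight vector, so a whole layer is just the single-neuron computation stacked into a matrix.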

Common Activation Functions

Function   Formula                Use Case
ReLU       max(0, x)              Hidden layers (default)
Sigmoid    1/(1+e⁻ˣ)              Binary classification output
Softmax    exp(xᵢ) / Σⱼ exp(xⱼ)   Multi-class output
GELU       x·Φ(x)                 Transformers (LLMs)
Tanh       (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ)      RNNs, some hidden layers
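
The formulas in the table translate almost line for line into NumPy. This is a minimal sketch: the max-subtraction in `softmax` is a standard numerical-stability trick and not part of the formula itself, and `gelu` uses the exact form with the normal CDF Φ written via `math.erf`:

```python
import math
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))   # subtracting the max avoids overflow; result is unchanged
    return e / e.sum()

def gelu(x):
    # x * Phi(x), where Phi is the standard normal CDF
    phi = 0.5 * (1 + np.vectorize(math.erf)(x / math.sqrt(2)))
    return x * phi

x = np.array([-2.0, 0.0, 2.0])
print("relu:   ", relu(x))
print("sigmoid:", sigmoid(x).round(3))
print("softmax:", softmax(x).round(3))
print("gelu:   ", gelu(x).round(3))
print("tanh:   ", np.tanh(x).round(3))
```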

Your First Neural Network

import torch
import torch.nn as nn

# A simple 3-layer neural network
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, hidden_size)
        self.layer3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.relu(self.layer1(x))  # input → hidden
        x = self.relu(self.layer2(x))  # hidden → hidden
        x = self.layer3(x)             # hidden → output
        return x

# Instantiate
model = SimpleNet(input_size=10, hidden_size=64, output_size=3)
print(model)
# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")  # 5,059 = (10×64+64) + (64×64+64) + (64×3+3)

Chapter 4: Deep Learning & Backpropagation

What Makes It "Deep"?

Deep learning means neural networks with many hidden layers. Each extra layer builds on the features found by the one before it, so deeper networks can learn more complex patterns.

Shallow (1-2 layers): can classify simple shapes, basic text
Deep (10-100+ layers): can understand faces, language nuance, generate art

How Neural Networks Learn: Backpropagation

Training has a simple loop:

1. FORWARD PASS:   Feed data through the network → get a prediction
2. COMPUTE LOSS:   How wrong was the prediction?
3. BACKWARD PASS:  Calculate how each weight contributed to the error
4. UPDATE WEIGHTS: Nudge each weight in the direction that reduces error
5. REPEAT:         Do this for millions of examples
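
Step 3 is the part that usually feels like magic, so here is a minimal sketch that checks it by hand for a single linear neuron with a squared-error loss. The input values and the target `y_true` are made up for illustration. The chain rule gives dL/dw = 2·(y_pred − y_true)·x, and PyTorch's autograd should agree:

```python
import torch

x = torch.tensor([0.5, 0.8, 0.3])
y_true = torch.tensor(1.0)                       # made-up target
w = torch.tensor([0.4, -0.2, 0.9], requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)

# 1-2. Forward pass and loss
y_pred = torch.dot(w, x) + b
loss = (y_pred - y_true) ** 2

# 3. Backward pass: autograd fills in w.grad and b.grad
loss.backward()

# The same gradients from the chain rule, written out by hand
with torch.no_grad():
    manual_grad_w = 2 * (y_pred - y_true) * x
    manual_grad_b = 2 * (y_pred - y_true)

print("autograd:", w.grad, b.grad)
print("by hand: ", manual_grad_w, manual_grad_b)
```

In a deep network the same chain rule is applied layer by layer, from the output back to the input, which is where the name backpropagation comes from.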

The Loss Function measures how wrong you are:

import torch.nn as nn

# Mean Squared Error (regression); predictions and targets are tensors
def mse_loss(predictions, targets):
    return ((predictions - targets) ** 2).mean()

# Cross-Entropy (classification)
def cross_entropy_loss(logits, targets):
    return nn.CrossEntropyLoss()(logits, targets)

Gradient Descent is how weights update:

new_weight = old_weight - learning_rate × gradient

learning_rate: how big a step to take (e.g., 0.001)
gradient: the slope — which direction reduces the error
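
Here is the update rule applied to a toy problem, as a minimal sketch: minimize loss(w) = (w − 3)², whose gradient is 2·(w − 3). The function, starting point, and learning rate are chosen only for illustration:

```python
def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    return 2.0 * (w - 3.0)   # derivative of (w - 3)^2

w = 0.0                # initial guess
learning_rate = 0.1

for step in range(100):
    w = w - learning_rate * gradient(w)   # new_weight = old_weight - lr * gradient

print(round(w, 4), round(loss(w), 8))
```

Each step shrinks the distance to the minimum by a constant factor (0.8 with this learning rate), so `w` converges to 3. Try `learning_rate = 1.1` to watch it diverge instead.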

Full Training Loop in PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

# Dummy data: 100 samples, 10 features, 3 classes
X = torch.randn(100, 10)
y = torch.randint(0, 3, (100,))

model = SimpleNet(10, 64, 3)  # the SimpleNet class defined in Chapter 3
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Training loop
for epoch in range(50):
    # 1. Forward pass
    predictions = model(X)

    # 2. Compute loss
    loss = criterion(predictions, y)

    # 3. Backward pass (compute gradients)
    optimizer.zero_grad()   # clear old gradients
    loss.backward()         # compute new gradients

    # 4. Update weights
    optimizer.step()

    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")

Key Deep Learning Concepts

Concept         What It Means                                 Why It Matters
Epoch           One full pass through all training data       More epochs = more learning (up to a point)
Batch size      How many samples per gradient update          Smaller = noisier but more frequent updates
Learning rate   Step size for weight updates                  Too big → diverge; too small → slow convergence
Overfitting     Model memorizes training data, fails on new   Detect with validation loss; fix with dropout, more data
Dropout         Randomly zero out neurons during training     Forces robust features, reduces overfitting
Batch Norm      Normalize layer inputs                        Stabilizes training, allows higher learning rates
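
Dropout is the entry in this table that most often trips people up, because it behaves differently in training and evaluation. This is a minimal sketch using `nn.Dropout` on a tensor of ones. The size and `p=0.5` are arbitrary choices for the example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()            # training mode: neurons are randomly zeroed
train_out = dropout(x)     # survivors are scaled by 1 / (1 - p) = 2

dropout.eval()             # evaluation mode: dropout is switched off
eval_out = dropout(x)      # output equals input

print("zeroed in train mode:", int((train_out == 0).sum()), "of", x.numel())
print("values in train mode:", train_out.unique().tolist())
print("unchanged in eval mode:", torch.equal(eval_out, x))
```

This is why you call `model.train()` before training and `model.eval()` before validation or inference. Forgetting the latter makes predictions randomly noisy.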

Chapter 5: Introduction to Large Language Models (LLMs)

What Is an LLM?

A Large Language Model is a neural network (specifically a Transformer) trained on massive amounts of text to predict the next token. "Large" refers to billions of parameters.

Training: Learn from 15 trillion tokens of text (books, web, code...)
Task:     Given "The Eiffel Tower is in", predict "Paris"

After training: the model has compressed language patterns into
                billions of parameters → can generate coherent text

How Text Becomes Tokens

LLMs don't read characters or words — they read tokens (chunks of text):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Generative AI is transforming the world!"
tokens = tokenizer.encode(text)
print(tokens)
# e.g. [8645, 876, 3068, 318, 28287, 262, 995, 0] (illustrative; run the code to see the actual IDs)
# 8 tokens for 40 characters (about 5 chars/token for this sentence)

The Generation Process

At inference time, LLMs generate one token at a time:

Input: "The cat sat on"
Step 1: Predict → "the" (probability: 0.72) → append
Step 2: Predict → "mat" (probability: 0.65) → append
Step 3: Predict → "." (probability: 0.81) → append
Step 4: Predict → [END] → stop

Output: "The cat sat on the mat."
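
The loop above can be sketched in plain Python. This is a toy: the probability table is hypothetical and conditions only on the previous token, whereas a real LLM computes the distribution from the entire context with a neural network. The decoding strategy shown is greedy (always pick the most likely token):

```python
# Hypothetical next-token probabilities, keyed by the previous token only.
next_token_probs = {
    "on":  {"the": 0.72, "a": 0.20, "my": 0.08},
    "the": {"mat": 0.65, "sofa": 0.25, "floor": 0.10},
    "mat": {".": 0.81, "and": 0.19},
    ".":   {"[END]": 1.0},
}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        distribution = next_token_probs[tokens[-1]]
        next_token = max(distribution, key=distribution.get)  # greedy: most likely token
        if next_token == "[END]":
            break                     # stop token reached
        tokens.append(next_token)     # append, then predict again
    return tokens

tokens = generate(["The", "cat", "sat", "on"])
text = " ".join(tokens).replace(" .", ".")
print(text)  # The cat sat on the mat.
```

Real systems often sample from the distribution instead of always taking the maximum, which is what settings such as temperature control.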

Major LLM Families in 2026

Family           Creator      Best For
GPT-4o           OpenAI       General purpose, multimodal
Claude 3.5/4     Anthropic    Reasoning, safety, long context
Gemini 2.0       Google       Multimodal, long context
LLaMA 3.1        Meta         Open source, self-hosting
Qwen 2.5         Alibaba      Multilingual, efficient
DeepSeek V3/R1   DeepSeek     Coding, math, reasoning
Mistral          Mistral AI   Efficient, European

Chapter 6: Key Metrics for Model Evaluation

Loss

Loss measures how wrong the model's predictions are. Lower = better.

import torch
import torch.nn.functional as F

# Cross-entropy loss for a 3-class problem
logits = torch.tensor([[2.0, 1.0, 0.5]])  # model outputs (unnormalized)
target = torch.tensor([0])                 # correct class is 0

loss = F.cross_entropy(logits, target)
print(f"Loss: {loss.item():.4f}")  # lower is better
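
To see what `F.cross_entropy` computes, here is the same calculation written out with the standard library only, as a minimal sketch: softmax turns the logits into probabilities, and the loss is the negative log of the probability assigned to the correct class.

```python
import math

logits = [2.0, 1.0, 0.5]   # same values as the example above
target = 0                 # correct class

exps = [math.exp(v) for v in logits]
probs = [e / sum(exps) for e in exps]   # softmax
loss = -math.log(probs[target])         # negative log-likelihood of the correct class

print(f"p(correct) = {probs[target]:.4f}, loss = {loss:.4f}")
# p(correct) = 0.6285, loss = 0.4644
```

A perfect prediction (probability 1.0 on the correct class) gives a loss of 0. The less probability the model puts on the right answer, the larger the loss.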

Accuracy

The percentage of correct predictions. Intuitive but misleading for imbalanced data.

def accuracy(predictions, targets):
    correct = (predictions.argmax(dim=1) == targets).sum().item()
    return correct / len(targets)

# Watch out: 99% accuracy sounds great...
# But if 99% of your data is "not fraud", a model that predicts
# "not fraud" for everything gets 99% accuracy with zero utility
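
The fraud scenario in the comment above is easy to reproduce. This is a minimal sketch with synthetic labels (10 fraud cases out of 1,000) and a "model" that always predicts the majority class:

```python
import numpy as np

y_true = np.array([1] * 10 + [0] * 990)   # 1 = fraud (1% of cases), synthetic data
y_pred = np.zeros_like(y_true)            # a "model" that always says "not fraud"

accuracy = (y_pred == y_true).mean()
fraud_caught = int(((y_pred == 1) & (y_true == 1)).sum())

print(f"Accuracy: {accuracy:.1%}, fraud cases caught: {fraud_caught} of 10")
```

The accuracy is 99%, yet not a single fraud case is caught, which is why imbalanced problems need metrics such as precision and recall.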

Perplexity (for Language Models)

Perplexity measures how "surprised" the model is by real text. Lower = model predicts text better.

Perplexity = e^(average cross-entropy loss per token)

A perplexity of 10 means the model is as uncertain as if
it had to choose uniformly among 10 equally likely tokens.
A perplexity of 5 means it's doing better (fewer effective choices).

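Rough, illustrative figures (perplexity depends on the dataset and the tokenizer,
so values from different models are not directly comparable):

GPT-2:         perplexity ~35 on WikiText
GPT-3:         perplexity ~20
Modern LLMs:   perplexity ~8-15 (domain-dependent)

import torch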
import torch.nn.functional as F
import math

def calculate_perplexity(model, tokenizer, text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt")
    input_ids = inputs.input_ids

    with torch.no_grad():
        outputs = model(**inputs, labels=input_ids)
        loss = outputs.loss  # cross-entropy per token

    return math.exp(loss.item())
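
The "choosing among 10 equally likely tokens" intuition can be verified without any model. This is a minimal sketch that computes perplexity from a list of probabilities the model assigned to the tokens that actually occurred. The probability values are made up:

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability per token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that spreads probability evenly over 10 tokens at every step
uniform_10 = perplexity([0.1] * 20)

# A model that is usually confident about the correct next token
confident = perplexity([0.9, 0.8, 0.95, 0.7])

print(f"uniform over 10 tokens: {uniform_10:.2f}")
print(f"confident model:        {confident:.2f}")
```

The first value comes out at 10, matching the definition. The confident model lands just above 1, the theoretical minimum.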

Other Important Metrics

Metric      Used For                          Formula
Precision   When false positives are costly   TP / (TP + FP)
Recall      When false negatives are costly   TP / (TP + FN)
F1 Score    Balance precision and recall      2 × (P × R) / (P + R)
BLEU        Text translation quality          N-gram overlap with reference
ROUGE       Summarization quality             Overlap with reference summary
BERTScore   Semantic text similarity          Cosine similarity of BERT embeddings
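
The first three formulas are worth computing by hand once. This is a minimal sketch with made-up counts: 8 true positives, 2 false positives, and 4 false negatives:

```python
tp, fp, fn = 8, 2, 4   # made-up counts for illustration

precision = tp / (tp + fp)   # of everything flagged, how much was right?
recall = tp / (tp + fn)      # of everything that should be flagged, how much was found?
f1 = 2 * (precision * recall) / (precision + recall)

print(f"Precision: {precision:.4f}")  # 0.8000
print(f"Recall:    {recall:.4f}")     # 0.6667
print(f"F1 Score:  {f1:.4f}")         # 0.7273
```

F1 is a harmonic mean, so it sits closer to the lower of the two values and punishes a model that is strong on one metric but weak on the other.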

Chapter 7: Setting Up Your Environment

The AI Developer's Toolkit

Python 3.11+        → Language (use pyenv or conda to manage versions)
PyTorch 2.x         → Deep learning framework (best for research + LLMs)
TensorFlow 2.x      → Alternative framework (strong in production/mobile)
Hugging Face        → Hub for models, datasets, tokenizers
Jupyter             → Interactive notebooks for exploration
CUDA / ROCm         → GPU acceleration (NVIDIA / AMD)

Option 1: Local Setup

# Install Python 3.11 (using pyenv)
brew install pyenv   # macOS
pyenv install 3.11.9
pyenv global 3.11.9

# Create project environment
python -m venv genai-env
source genai-env/bin/activate  # Mac/Linux
# genai-env\Scripts\activate   # Windows

# Install core packages
pip install torch torchvision torchaudio   # PyTorch (default build for your platform)
# For CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install transformers          # Hugging Face Transformers
pip install datasets              # HF Datasets
pip install accelerate            # Multi-GPU / mixed precision
pip install diffusers             # Diffusion models
pip install sentence-transformers # Embeddings
pip install langchain             # LLM application framework
pip install openai anthropic      # API clients
pip install jupyter               # Notebooks
pip install numpy pandas matplotlib seaborn  # Data science essentials

# Verify GPU
python -c "import torch; ok = torch.cuda.is_available(); print(ok, torch.cuda.get_device_name(0) if ok else 'no CUDA GPU found')"

Option 2: Google Colab (Free GPU — Best for Beginners)

Google Colab's free tier typically provides a T4 GPU (16 GB VRAM), subject to availability. That is enough for every exercise in this series.

# In a Colab notebook, most packages are pre-installed
# Check what GPU you have:
!nvidia-smi

# Install what's missing
!pip install -q transformers datasets accelerate

# Mount Google Drive to save model checkpoints
from google.colab import drive
drive.mount('/content/drive')

Option 3: Cloud GPU Services

Service            Free Tier       GPU         Best For
Google Colab       Yes (T4)        T4 16GB     Learning, small experiments
Kaggle Notebooks   Yes (P100)      P100 16GB   Competitions, datasets
Lightning AI       Yes             T4          Quick prototyping
RunPod             No ($0.20/hr)   Any         Custom setups
Lambda Labs        No ($0.50/hr)   A10, A100   Production training

Hugging Face Quickstart

The Hugging Face pipeline is the fastest way to start using AI models:

from transformers import pipeline

# Text generation
generator = pipeline("text-generation", model="gpt2")
result = generator("Generative AI is", max_new_tokens=50)
print(result[0]["generated_text"])

# Sentiment analysis
classifier = pipeline("sentiment-analysis")
result = classifier("This course is absolutely fantastic!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.9998}]

# Text summarization
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = """Generative AI models have revolutionized how we interact with computers.
          These models can produce text, images, audio, and video that are
          increasingly indistinguishable from human-created content..."""
summary = summarizer(text, max_length=50, min_length=20)
print(summary[0]["summary_text"])

# Translation
translator = pipeline("translation_en_to_fr")
result = translator("Hello, I am learning Generative AI!")
print(result[0]["translation_text"])  # e.g. "Bonjour, j'apprends l'IA générative !" (exact wording depends on the model)

Your Environment Health Check

Run this script to confirm everything is working:

# health_check.py
import sys
print(f"Python: {sys.version}")

import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

import transformers
print(f"Transformers: {transformers.__version__}")

from transformers import pipeline
pipe = pipeline("sentiment-analysis")
result = pipe("I love learning AI!")
print(f"Test inference: {result}")  # should print POSITIVE

print("\n✓ Environment ready for Generative AI!")

Summary

You've just built the foundation everything else in this series stands on. Here's what to carry forward:

Concept                        One-Line Takeaway
AI vs. ML vs. DL               AI is the goal; ML is the approach; DL is the engine for LLMs
Generative AI                  Creates new content (text, image, audio) by sampling from learned probability distributions
Supervised vs. Unsupervised    Labeled data vs. pattern discovery without labels
Neural network                 Layers of weighted connections; weights learned from data
Backpropagation                Gradient of loss flows backward, adjusting weights at each step
LLMs                           Transformers trained to predict next token on massive text corpora
Loss / Accuracy / Perplexity   Loss is the training signal; accuracy for classification; perplexity for language models
Your environment               PyTorch + Hugging Face + Colab = the minimal viable toolkit

Next in this series → Part 2: Working with LLMs — where we go hands-on with tokenization, embeddings, the Transformer architecture, fine-tuning, RAG, and building your first chatbot.

Practice Challenge

Before moving to Part 2, complete this challenge:

  1. Open a Google Colab notebook
  2. Run the environment health check script above
  3. Use pipeline("text-generation") with three different models from HuggingFace Hub
  4. Compare the output quality — notice how model size affects results
  5. Measure inference speed with time.time()
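
For step 5, a small timing helper keeps the comparison fair across models. This is a minimal sketch: `time_call` is a hypothetical helper name, it uses `time.perf_counter()` (a higher-resolution alternative to `time.time()`), and the workload below is a stand-in so the snippet runs without downloading a model:

```python
import time

def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn several times; return (last result, fastest run in seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best

# Stand-in workload so the sketch runs anywhere.
# In the challenge, pass a pipeline instead, for example:
#   generator = pipeline("text-generation", model="gpt2")
#   output, seconds = time_call(generator, "Generative AI is", max_new_tokens=50)
result, seconds = time_call(sum, range(1_000_000))
print(f"result = {result}, best of 3 runs: {seconds:.4f} s")
```

Taking the best of several runs reduces noise from one-off costs, such as the first call warming up the GPU.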

This hands-on exercise cements everything in this part before adding more layers.


Questions or discussion? Connect on LinkedIn, X or reach out via email.
