Master Generative AI, Part 1: Foundation of AI & Machine Learning

This is Part 1 of the Master Generative AI: A Step-by-Step Challenge series — a practical, no-fluff guide to going from complete beginner to confident AI practitioner in 2026.

Series Map:

Part 1 → Foundation of AI & ML ← you are here
Part 2 → Working with LLMs
Part 3 → Advanced Generative AI
Part 4 → Practical Applications
Part 5 → Career & Capstone Projects

The AI revolution isn't just for researchers anymore. In 2026, the tools, libraries, and models that used to require a PhD and a supercomputer are now accessible to any developer willing to invest a few weeks of focused learning. This series is your step-by-step map.

We start at the very beginning — not because you're not smart, but because the best practitioners always have the strongest foundations.

Chapter 1: Introduction to AI & Generative AI

What Is Artificial Intelligence?

Artificial Intelligence is the field of building systems that can perform tasks that normally require human intelligence — recognizing images, understanding language, making decisions.

AI (the broad field)
│
├── Machine Learning (learns from data)
│   │
│   ├── Deep Learning (neural networks)
│   │   │
│   │   ├── NLP (language tasks)
│   │   ├── Computer Vision (image tasks)
│   │   └── Generative AI ← where we're going
│   │
│   └── Classical ML (trees, SVMs, linear models)
│
└── Symbolic AI (rules, logic, expert systems)

What Is Generative AI?

Most traditional AI systems are discriminative: they classify or predict. Given an image, such a model answers "cat" or "dog."

Generative AI creates new content — text, images, audio, video, code — that didn't exist before:

Type	Examples	What It Creates
Text	GPT-4, Claude, Gemini	Articles, code, answers
Image	Stable Diffusion, DALL-E	Photos, artwork, designs
Audio	ElevenLabs, Suno	Voice, music, sound
Video	Sora, Runway	Clips, animations
Code	Copilot, Code Llama	Programs, scripts
Multimodal	GPT-4o, Gemini	Any combination above

The 2026 Generative AI Landscape

Foundation Models (trained on massive data, cost millions)
    GPT-4o    Claude 3.5/4    Gemini 2.0    LLaMA 3.1
         ↓ fine-tuning / prompt engineering ↓
Application Layer (your products and solutions)
    Chatbots  Code assistants  Content tools  AI agents

Key insight: You don't train foundation models. You use them. Your job as a practitioner is to know which model to pick, how to prompt it, when to fine-tune, and how to deploy reliably.

Chapter 2: Basics of Machine Learning

What Is Machine Learning?

Traditional programming: you write rules → computer follows them. Machine learning: you give examples → computer finds the rules.

# Traditional programming
def classify_email(email):
    if "FREE" in email and "CLICK HERE" in email:
        return "spam"
    return "not spam"

# Machine learning (pseudocode: train() stands in for a real training routine)
# The model learns these rules from 10,000 labeled examples
model = train(emails, labels)
prediction = model.predict(new_email)
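
The `train(emails, labels)` call above is a placeholder rather than a real function. A minimal runnable sketch of the same idea follows, assuming scikit-learn is installed. The four training messages are made up for illustration, and the bag-of-words naive Bayes model is one reasonable choice, not the only one:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy dataset; a real spam filter needs thousands of examples
emails = [
    "FREE money CLICK HERE now",
    "win a FREE prize CLICK HERE",
    "meeting at noon tomorrow",
    "please review the attached report",
]
labels = ["spam", "spam", "not spam", "not spam"]

# CountVectorizer turns text into word counts; MultinomialNB learns
# which words are more common in each class
spam_model = make_pipeline(CountVectorizer(), MultinomialNB())
spam_model.fit(emails, labels)

new_emails = ["CLICK HERE for a FREE gift", "see you at the meeting tomorrow"]
spam_predictions = list(spam_model.predict(new_emails))
print(spam_predictions)
```

Nobody wrote an `if "FREE" in email` rule here: the association between words and labels came entirely from the examples.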

Supervised Learning

You give the model labeled data — inputs paired with correct outputs. The model learns to map inputs to outputs.

Data:
  (email_1, "spam")
  (email_2, "not spam")
  (email_3, "spam")
  ...

Model learns: features → label

Use cases: classification, regression, object detection
Examples: spam filter, house price prediction, medical diagnosis

# Example: Supervised learning (regression) with scikit-learn
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Features: [size_sqm, bedrooms, distance_to_center]
X = [[85, 2, 3.2], [120, 3, 1.5], [45, 1, 8.0], [200, 4, 0.5]]
y = [4500000, 8000000, 2000000, 15000000]  # prices in THB

# Prices are continuous, so this is regression, not classification.
# Four rows only illustrate the API; real models need far more data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(mean_absolute_error(y_test, predictions))  # average error in THB

Unsupervised Learning

No labels. The model finds hidden patterns and structure in data on its own.

Data:
  (customer_1_purchase_history)
  (customer_2_purchase_history)
  (customer_3_purchase_history)
  ...

Model discovers: natural groupings (clusters)

Use cases: clustering, dimensionality reduction, anomaly detection
Examples: customer segmentation, recommendation systems, fraud detection

# Example: K-means clustering
from sklearn.cluster import KMeans
import numpy as np

# Customer data: [monthly_spend, purchase_frequency]
customers = np.array([
    [500, 2], [600, 3], [5000, 15], [4800, 12],
    [200, 1], [100, 1], [5200, 18], [300, 2]
])

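# Note: monthly_spend dominates the distance here; scale features in real use
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(customers)
print(kmeans.labels_)  # one cluster ID (0, 1, or 2) per customer; the ID numbering itself is arbitrary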

Reinforcement Learning

An agent learns by taking actions in an environment and receiving rewards or penalties.

Agent → Action → Environment → Reward → Agent (update policy)

Examples: game playing (AlphaGo), robotics, RLHF for LLMs (ChatGPT)
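
The agent → action → reward → update loop can be sketched with a multi-armed bandit, one of the simplest reinforcement learning settings. This is an illustrative toy, not how AlphaGo or RLHF work: the three "slot machines", their win probabilities, and the epsilon-greedy strategy are all assumptions chosen to keep the example short.

```python
import random

random.seed(0)

# Hypothetical environment: three slot machines with hidden win probabilities
true_win_prob = [0.2, 0.5, 0.8]
value_estimate = [0.0, 0.0, 0.0]  # the agent's current guess per action
pull_count = [0, 0, 0]
epsilon = 0.1                     # fraction of steps spent exploring

for step in range(5000):
    # Agent → Action: explore at random or exploit the best estimate
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = value_estimate.index(max(value_estimate))

    # Environment → Reward
    reward = 1.0 if random.random() < true_win_prob[action] else 0.0

    # Update: move the estimate toward the observed reward (running mean)
    pull_count[action] += 1
    value_estimate[action] += (reward - value_estimate[action]) / pull_count[action]

best_action = value_estimate.index(max(value_estimate))
print(best_action, [round(v, 2) for v in value_estimate])
```

The agent is never told which machine is best. It discovers that from rewards alone, which is the defining feature of reinforcement learning.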

The ML Workflow (Universal)

1. Define the problem     → What are we predicting?
2. Collect data           → More high-quality, representative data usually means a better model
3. Explore & clean data   → Fix missing values, outliers
4. Feature engineering    → Transform raw data into useful signals
5. Choose & train model   → Pick algorithm, optimize
6. Evaluate               → Does it generalize to new data?
7. Deploy & monitor       → Real-world performance often differs
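
Steps 1 through 6 can be compressed into a few lines with scikit-learn. The sketch below assumes scikit-learn is installed and uses its built-in iris dataset as a stand-in for real project data. The model choice (scaled logistic regression) is only an example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Steps 1-2: problem = predict the iris species; data = built-in dataset
X, y = load_iris(return_X_y=True)

# Step 3: explore (here, just confirm the shapes)
print(X.shape, y.shape)

# Step 6 depends on held-out data, so split before fitting anything
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Steps 4-5: scale the features, then train a simple classifier
workflow_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
workflow_model.fit(X_train, y_train)

# Step 6: evaluate on data the model has never seen
test_accuracy = accuracy_score(y_test, workflow_model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.2f}")
```

Step 7 (deploy and monitor) has no one-line equivalent; it depends on where the model runs.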

Chapter 3: Neural Networks 101

The Neuron: Building Block

A single neuron takes multiple inputs, multiplies each by a weight, sums them up, adds a bias, then passes the result through an activation function:

inputs:   x₁ = 0.5,  x₂ = 0.8,  x₃ = 0.3
weights:  w₁ = 0.4,  w₂ = -0.2, w₃ = 0.9
bias:     b  = 0.1

weighted sum = (0.5×0.4) + (0.8×-0.2) + (0.3×0.9) + 0.1
             = 0.2 - 0.16 + 0.27 + 0.1 = 0.41

output = activation(0.41) = sigmoid(0.41) ≈ 0.60

import numpy as np

def neuron(inputs, weights, bias, activation="relu"):
    z = np.dot(inputs, weights) + bias  # linear combination
    if activation == "relu":
        return max(0, z)                # ReLU: max(0, z)
    elif activation == "sigmoid":
        return 1 / (1 + np.exp(-z))    # sigmoid: 0 to 1
    elif activation == "tanh":
        return np.tanh(z)              # tanh: -1 to 1
    return z
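
As a quick check, the worked arithmetic above (weighted sum 0.41, sigmoid output of about 0.60) can be reproduced directly with NumPy. The snippet is standalone and uses the same inputs, weights, and bias:

```python
import numpy as np

inputs = np.array([0.5, 0.8, 0.3])
weights = np.array([0.4, -0.2, 0.9])
bias = 0.1

z = float(np.dot(inputs, weights) + bias)   # weighted sum plus bias
sigmoid_out = float(1 / (1 + np.exp(-z)))   # squashes z into (0, 1)
relu_out = max(0.0, z)                      # passes positive z through unchanged

print(round(z, 2), round(sigmoid_out, 2), round(relu_out, 2))
```

Because z is positive here, ReLU returns it unchanged while sigmoid compresses it toward the middle of its 0 to 1 range.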

From One Neuron to a Network

Layer neurons together → neural network:

Input Layer    Hidden Layer    Output Layer
[x₁]─────────→[n₁]─────────→[out]
[x₂]─────────→[n₂]
[x₃]─────────→[n₃]
               [n₄]

Each connection has a weight (learned during training)
Hidden layer finds intermediate features
Output layer gives the final prediction
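
A forward pass through a network of this shape (3 inputs, 4 hidden neurons, 1 output) is just two matrix multiplications with an activation after each. The sketch below uses random, untrained weights, so the output value itself is meaningless. The names `W1`, `b1`, `W2`, `b2` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes follow the diagram: 3 inputs → 4 hidden neurons → 1 output
W1 = rng.normal(size=(3, 4))   # input-to-hidden weights (random, untrained)
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))   # hidden-to-output weights
b2 = np.zeros(1)

x = np.array([0.5, 0.8, 0.3])

hidden = np.maximum(0, x @ W1 + b1)             # ReLU on each hidden neuron
output = 1 / (1 + np.exp(-(hidden @ W2 + b2)))  # sigmoid for a 0-1 prediction

print(hidden.shape, output.shape)
```

Training consists of adjusting `W1`, `b1`, `W2`, and `b2` until the output matches the labels.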

Common Activation Functions

Function	Formula	Use Case
ReLU	max(0, x)	Hidden layers (default)
Sigmoid	1/(1+e⁻ˣ)	Binary classification output
Softmax	eˣᵢ / Σeˣ	Multi-class output
GELU	x·Φ(x)	Transformers (LLMs)
Tanh	(eˣ-e⁻ˣ)/(eˣ+e⁻ˣ)	RNNs, some hidden layers
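
The formulas in the table translate almost directly into NumPy. This is a reference sketch for building intuition; frameworks such as PyTorch ship optimized versions, and some use a faster approximation of GELU rather than the exact form shown here.

```python
import math
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    # Subtracting the max does not change the result but avoids overflow
    shifted = np.exp(x - np.max(x))
    return shifted / shifted.sum()

def gelu(x):
    # Exact form x·Φ(x), where Φ is the standard normal CDF
    phi = 0.5 * (1 + np.vectorize(math.erf)(x / math.sqrt(2)))
    return x * phi

scores = np.array([2.0, 1.0, 0.5])
probs = softmax(scores)
print(probs.round(3), probs.sum())
```

Tanh is omitted because NumPy already provides it as `np.tanh`.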

Your First Neural Network

import torch
import torch.nn as nn

# A simple 3-layer neural network
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, hidden_size)
        self.layer3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.relu(self.layer1(x))  # input → hidden
        x = self.relu(self.layer2(x))  # hidden → hidden
        x = self.layer3(x)             # hidden → output
        return x

# Instantiate
model = SimpleNet(input_size=10, hidden_size=64, output_size=3)
print(model)
# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")  # 5,059 = 704 + 4,160 + 195
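
The parameter count can be verified by hand without PyTorch. Each `nn.Linear(n_in, n_out)` layer holds a weight matrix with `n_in × n_out` entries plus one bias per output. The helper name `linear_params` below is made up for this sketch:

```python
def linear_params(n_in: int, n_out: int) -> int:
    # Weight matrix entries plus one bias per output neuron
    return n_in * n_out + n_out

layer_sizes = [(10, 64), (64, 64), (64, 3)]   # layer1, layer2, layer3
per_layer = [linear_params(n_in, n_out) for n_in, n_out in layer_sizes]

print(per_layer, sum(per_layer))  # [704, 4160, 195] 5059
```

Most of the parameters sit in the hidden-to-hidden layer, because its cost grows with the square of the hidden size.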

Chapter 4: Deep Learning & Backpropagation

What Makes It "Deep"?

Deep learning means neural networks with many hidden layers. More layers let a network learn more complex patterns, at the cost of more data and compute.

Shallow (1-2 layers): can classify simple shapes, basic text
Deep (10-100+ layers): can understand faces, language nuance, generate art

How Neural Networks Learn: Backpropagation

Training has a simple loop:

1. FORWARD PASS:   Feed data through the network → get a prediction
2. COMPUTE LOSS:   How wrong was the prediction?
3. BACKWARD PASS:  Calculate how each weight contributed to the error
4. UPDATE WEIGHTS: Nudge each weight in the direction that reduces error
5. REPEAT:         Do this for millions of examples

The Loss Function measures how wrong you are:

import torch.nn as nn

# Mean Squared Error (regression); predictions and targets are tensors
def mse_loss(predictions, targets):
    return ((predictions - targets) ** 2).mean()

# Cross-Entropy (classification); logits are raw scores, targets are class indices
def cross_entropy_loss(logits, targets):
    return nn.CrossEntropyLoss()(logits, targets)

Gradient Descent is how weights update:

new_weight = old_weight - learning_rate × gradient

learning_rate: how big a step to take (e.g., 0.001)
gradient: the slope — which direction reduces the error
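
To see the update rule in action without any framework, here is a one-weight model trained by hand in plain Python. The data, starting weight, and learning rate are arbitrary choices for illustration, and the gradient is derived manually instead of by backpropagation through layers:

```python
# Toy data generated by y = 2x, so the ideal weight is 2.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

weight = 0.0
learning_rate = 0.01

for step in range(200):
    # Forward pass and loss: mean squared error of the predictions
    predictions = [weight * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(xs)

    # Backward pass: d(loss)/d(weight) = mean(2 · (prediction - target) · x)
    gradient = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / len(xs)

    # Update rule: new_weight = old_weight - learning_rate × gradient
    weight = weight - learning_rate * gradient

print(f"weight = {weight:.4f}, loss = {loss:.6f}")
```

With this data each step shrinks the distance to the ideal weight by a constant factor. Raising `learning_rate` above roughly 0.13 makes the updates overshoot and diverge, which is the "too big" failure mode.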

Full Training Loop in PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

# Dummy data: 100 samples, 10 features, 3 classes
X = torch.randn(100, 10)
y = torch.randint(0, 3, (100,))

model = SimpleNet(10, 64, 3)  # SimpleNet is the class defined in Chapter 3
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Training loop
for epoch in range(50):
    # 1. Forward pass
    predictions = model(X)

    # 2. Compute loss
    loss = criterion(predictions, y)

    # 3. Backward pass (compute gradients)
    optimizer.zero_grad()   # clear old gradients
    loss.backward()         # compute new gradients

    # 4. Update weights
    optimizer.step()

    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")

Key Deep Learning Concepts

Concept	What It Means	Why It Matters
Epoch	One full pass through all training data	More epochs = more learning (up to a point)
Batch size	How many samples per gradient update	Smaller = noisier but more frequent updates
Learning rate	Step size for weight updates	Too big → diverge; too small → slow convergence
Overfitting	Model memorizes training data, fails on new	Detect with validation loss; fix with dropout, more data
Dropout	Randomly zero out neurons during training	Forces robust features, reduces overfitting
Batch Norm	Normalize layer inputs	Stabilizes training, allows higher learning rates
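
Dropout is the easiest of these concepts to implement from scratch. The sketch below shows "inverted" dropout in NumPy, the variant that rescales surviving activations during training so nothing needs to change at inference time. The function signature is an assumption for this example, not a library API:

```python
import numpy as np

def dropout(activations, drop_prob, training, rng):
    # At inference time dropout is a no-op
    if not training or drop_prob == 0.0:
        return activations
    # Keep each unit with probability (1 - drop_prob) ...
    keep_mask = rng.random(activations.shape) >= drop_prob
    # ... and rescale the survivors so the expected value is unchanged
    return activations * keep_mask / (1.0 - drop_prob)

rng = np.random.default_rng(0)
hidden = np.ones(10_000)

train_out = dropout(hidden, drop_prob=0.5, training=True, rng=rng)
eval_out = dropout(hidden, drop_prob=0.5, training=False, rng=rng)

print((train_out == 0).mean(), train_out.mean(), eval_out.mean())
```

About half the units are zeroed during training, yet the mean activation stays close to 1.0. That is why the same weights work in both modes.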

Chapter 5: Introduction to Large Language Models (LLMs)

What Is an LLM?

A Large Language Model is a neural network (specifically a Transformer) trained on massive amounts of text to predict the next token. "Large" refers to billions of parameters.

Training: Learn from trillions of tokens of text (books, web, code...), e.g. ~15 trillion for LLaMA 3
Task:     Given "The Eiffel Tower is in", predict "Paris"

After training: the model has compressed language patterns into
                billions of parameters → can generate coherent text

How Text Becomes Tokens

LLMs don't read characters or words — they read tokens (chunks of text):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Generative AI is transforming the world!"
tokens = tokenizer.encode(text)
print(tokens)
# [8645, 876, 3068, 318, 28287, 262, 995, 0]  (example output; IDs are specific to the GPT-2 vocabulary)
# 8 tokens for 40 characters, i.e. 5 characters per token for this sentence
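
To build intuition for what a tokenizer does, here is a toy version that needs no download. It is a deliberate simplification: GPT-2 uses byte-pair encoding with a learned vocabulary of about 50,000 pieces, whereas this sketch uses a six-entry hypothetical vocabulary and greedy longest-match lookup.

```python
# Hypothetical vocabulary; real tokenizers learn tens of thousands of pieces
toy_vocab = {"Gener": 0, "ative": 1, " AI": 2, " is": 3, " fun": 4, "!": 5}

def toy_tokenize(text):
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest piece first, then progressively shorter ones
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in toy_vocab:
                ids.append(toy_vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary piece matches {text[pos:]!r}")
    return ids

toy_ids = toy_tokenize("Generative AI is fun!")
print(toy_ids)  # [0, 1, 2, 3, 4, 5]
```

A long word is split into sub-word pieces, and the leading space belongs to the token that follows it. Real tokenizers share both traits.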

The Generation Process

At inference time, LLMs generate one token at a time:

Input: "The cat sat on"
Step 1: Predict → "the" (probability: 0.72) → append
Step 2: Predict → "mat" (probability: 0.65) → append
Step 3: Predict → "." (probability: 0.81) → append
Step 4: Predict → [END] → stop

Output: "The cat sat on the mat."
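
The trace above can be written as a loop. In this sketch a hard-coded lookup table stands in for the model, and the probabilities are the illustrative ones from the trace plus a few invented alternatives. The loop always picks the most likely token (greedy decoding); real systems often sample instead, which later parts of the series cover.

```python
# Hypothetical next-token probabilities, keyed by the last token only.
# A real LLM conditions on the whole sequence and scores its entire vocabulary.
next_token_probs = {
    "on":  {"the": 0.72, "a": 0.20, "my": 0.08},
    "the": {"mat": 0.65, "sofa": 0.25, "roof": 0.10},
    "mat": {".": 0.81, "and": 0.19},
    ".":   {"[END]": 1.0},
}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = next_token_probs.get(tokens[-1], {"[END]": 1.0})
        # Greedy decoding: always take the most probable next token
        next_token = max(candidates, key=candidates.get)
        if next_token == "[END]":
            break
        tokens.append(next_token)
    return tokens

generated = generate(["The", "cat", "sat", "on"])
print(generated)  # ['The', 'cat', 'sat', 'on', 'the', 'mat', '.']
```

Each new token is appended to the input before the next prediction, which is why generation time grows with output length.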

Major LLM Families in 2026

Family	Creator	Best For
GPT-4o	OpenAI	General purpose, multimodal
Claude 3.5/4	Anthropic	Reasoning, safety, long context
Gemini 2.0	Google	Multimodal, long context
LLaMA 3.1	Meta	Open weights, self-hosting
Qwen 2.5	Alibaba	Multilingual, efficient
DeepSeek V3/R1	DeepSeek	Coding, math, reasoning
Mistral	Mistral AI	Efficient, European

Chapter 6: Key Metrics for Model Evaluation

Loss

Loss measures how wrong the model's predictions are. Lower = better.

import torch
import torch.nn.functional as F

# Cross-entropy loss for a 3-class problem
logits = torch.tensor([[2.0, 1.0, 0.5]])  # model outputs (unnormalized)
target = torch.tensor([0])                 # correct class is 0

loss = F.cross_entropy(logits, target)
print(f"Loss: {loss.item():.4f}")  # lower is better
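
It helps to see where that number comes from. Cross-entropy for a single example is the negative log of the probability the model assigns to the correct class, where the probabilities come from a softmax over the logits. The same calculation in plain Python, using the same logits and target:

```python
import math

logits = [2.0, 1.0, 0.5]   # same values as the PyTorch example
target = 0

# Softmax: exponentiate, then normalize so the probabilities sum to 1
exps = [math.exp(v) for v in logits]
probs = [e / sum(exps) for e in exps]

# Cross-entropy for one example: negative log-probability of the correct class
manual_loss = -math.log(probs[target])
print(f"{manual_loss:.4f}")  # 0.4644
```

The model gave the correct class a probability of about 0.63. A perfect prediction (probability 1.0) would give a loss of 0.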

Accuracy

The percentage of correct predictions. Intuitive but misleading for imbalanced data.

def accuracy(predictions, targets):
    correct = (predictions.argmax(dim=1) == targets).sum().item()
    return correct / len(targets)

# Watch out: 99% accuracy sounds great...
# But if 99% of your data is "not fraud", a model that predicts
# "not fraud" for everything gets 99% accuracy with zero utility

Perplexity (for Language Models)

Perplexity measures how "surprised" the model is by real text. Lower = model predicts text better.

Perplexity = e^(average cross-entropy loss per token)

A perplexity of 10 means the model is as uncertain as if
it had to choose uniformly among 10 equally likely tokens.
A perplexity of 5 means it's doing better (fewer effective choices).

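Rough reference points (they depend on the dataset and tokenizer, so they are not directly comparable):
GPT-2:         perplexity ~35 on WikiText
GPT-3:         perplexity ~20
Modern LLMs:   perplexity ~8-15 (domain-dependent)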

import torch
import torch.nn.functional as F
import math

def calculate_perplexity(model, tokenizer, text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt")
    input_ids = inputs.input_ids

    with torch.no_grad():
        outputs = model(**inputs, labels=input_ids)
        loss = outputs.loss  # mean cross-entropy over the predicted tokens

    return math.exp(loss.item())
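
The "choosing among 10 equally likely tokens" interpretation can be checked without a model. The sketch below computes perplexity from a list of probabilities that a hypothetical model assigned to each true next token. Both lists are invented for illustration:

```python
import math

def perplexity_from_probs(token_probs):
    # Average negative log-probability the model assigned to each actual token
    avg_cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_cross_entropy)

# Hypothetical per-token probabilities assigned to the true next token
uniform_over_10 = [0.1] * 6
more_confident = [0.5, 0.2, 0.25, 0.4, 0.1, 0.2]

ppl_uniform = perplexity_from_probs(uniform_over_10)
ppl_confident = perplexity_from_probs(more_confident)
print(round(ppl_uniform, 2), round(ppl_confident, 2))
```

A model that always assigns probability 0.1 to the right token scores a perplexity of 10. Assigning higher probabilities on average pulls the number down.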

Other Important Metrics

Metric	Used For	Formula
Precision	When false positives are costly	TP / (TP + FP)
Recall	When false negatives are costly	TP / (TP + FN)
F1 Score	Balance precision and recall	2 × (P × R) / (P + R)
BLEU	Text translation quality	N-gram overlap with reference
ROUGE	Summarization quality	Overlap with reference summary
BERTScore	Semantic text similarity	Cosine similarity of BERT embeddings
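
The first three rows of the table are easy to compute by hand, and doing so shows why accuracy alone misleads on imbalanced data. The sketch below uses a made-up fraud dataset with 1% positives and compares an "always predict not fraud" baseline against a hypothetical model. The function name `classification_metrics` is chosen for this example:

```python
# Hypothetical test set: 1,000 transactions, 10 of them fraudulent (label 1)
y_true = [1] * 10 + [0] * 990

# A useless model that predicts "not fraud" for everything
y_always_negative = [0] * 1000
# A model that catches 8 of the 10 frauds and raises 5 false alarms
y_model = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 985

def classification_metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

baseline_metrics = classification_metrics(y_true, y_always_negative)
model_metrics = classification_metrics(y_true, y_model)
print(baseline_metrics)
print(model_metrics)
```

Both models score above 99% accuracy, but the baseline has zero recall and zero F1. For real projects, scikit-learn's `precision_score`, `recall_score`, and `f1_score` compute the same quantities.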

Chapter 7: Setting Up Your Environment

The AI Developer's Toolkit

Python 3.11+        → Language (use pyenv or conda to manage versions)
PyTorch 2.x         → Deep learning framework (best for research + LLMs)
TensorFlow 2.x      → Alternative framework (strong in production/mobile)
Hugging Face        → Hub for models, datasets, tokenizers
Jupyter             → Interactive notebooks for exploration
CUDA / ROCm         → GPU acceleration (NVIDIA / AMD)

Option 1: Local Setup

# Install Python 3.11 (using pyenv)
brew install pyenv   # macOS
pyenv install 3.11.9
pyenv global 3.11.9

# Create project environment
python -m venv genai-env
source genai-env/bin/activate  # Mac/Linux
# genai-env\Scripts\activate   # Windows

# Install core packages
pip install torch torchvision torchaudio   # PyTorch (default build for your platform)
# For CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install transformers          # Hugging Face Transformers
pip install datasets              # HF Datasets
pip install accelerate            # Multi-GPU / mixed precision
pip install diffusers             # Diffusion models
pip install sentence-transformers # Embeddings
pip install langchain             # LLM application framework
pip install openai anthropic      # API clients
pip install jupyter               # Notebooks
pip install numpy pandas matplotlib seaborn  # Data science essentials

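# Verify GPU (prints False and a fallback message on machines without CUDA)
python -c "import torch; ok = torch.cuda.is_available(); print(ok, torch.cuda.get_device_name(0) if ok else 'no CUDA GPU found')"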

Option 2: Google Colab (Free GPU, Best for Beginners)

Google Colab's free tier typically provides a T4 GPU (16 GB VRAM), subject to availability and usage limits. That is enough for all exercises in this series.

# In a Colab notebook, most packages are pre-installed
# Check what GPU you have:
!nvidia-smi

# Install what's missing
!pip install -q transformers datasets accelerate

# Mount Google Drive to save model checkpoints
from google.colab import drive
drive.mount('/content/drive')

Option 3: Cloud GPU Services

Service	Free Tier	GPU	Best For
Google Colab	Yes (T4)	T4 16GB	Learning, small experiments
Kaggle Notebooks	Yes (P100)	P100 16GB	Competitions, datasets
Lightning AI	Yes	T4	Quick prototyping
RunPod	No ($0.20/hr)	Any	Custom setups
Lambda Labs	No ($0.50/hr)	A10, A100	Production training

Hugging Face Quickstart

The Hugging Face pipeline is the fastest way to start using AI models:

from transformers import pipeline

# Text generation
generator = pipeline("text-generation", model="gpt2")
result = generator("Generative AI is", max_new_tokens=50)
print(result[0]["generated_text"])

# Sentiment analysis
classifier = pipeline("sentiment-analysis")
result = classifier("This course is absolutely fantastic!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.9998}]

# Text summarization
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = """Generative AI models have revolutionized how we interact with computers.
          These models can produce text, images, audio, and video that are
          increasingly indistinguishable from human-created content..."""
summary = summarizer(text, max_length=50, min_length=20)
print(summary[0]["summary_text"])

# Translation
translator = pipeline("translation_en_to_fr")
result = translator("Hello, I am learning Generative AI!")
print(result[0]["translation_text"])  # e.g. "Bonjour, j'apprends l'IA générative !" (exact wording depends on the model)

Your Environment Health Check

Run this script to confirm everything is working:

# health_check.py
import sys
print(f"Python: {sys.version}")

import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

import transformers
print(f"Transformers: {transformers.__version__}")

from transformers import pipeline
pipe = pipeline("sentiment-analysis")
result = pipe("I love learning AI!")
print(f"Test inference: {result}")  # should print POSITIVE

print("\n✓ Environment ready for Generative AI!")

Summary

You've just built the foundation everything else in this series stands on. Here's what to carry forward:

Concept	One-Line Takeaway
AI vs. ML vs. DL	AI is the goal; ML is the approach; DL is the engine for LLMs
Generative AI	Creates new content (text, image, audio) by sampling from learned probability distributions
Supervised vs. Unsupervised	Labeled data vs. pattern discovery without labels
Neural network	Layers of weighted connections; weights learned from data
Backpropagation	Gradient of loss flows backward, adjusting weights at each step
LLMs	Transformers trained to predict next token on massive text corpora
Loss / Accuracy / Perplexity	Loss is the training signal; accuracy for classification; perplexity for language models
Your environment	PyTorch + Hugging Face + Colab = the minimal viable toolkit

Next in this series → Part 2: Working with LLMs — where we go hands-on with tokenization, embeddings, the Transformer architecture, fine-tuning, RAG, and building your first chatbot.

Practice Challenge

Before moving to Part 2, complete this challenge:

1. Open a Google Colab notebook
2. Run the environment health check script above
3. Use pipeline("text-generation") with three different models from the Hugging Face Hub
4. Compare the output quality and notice how model size affects results
5. Measure inference speed with time.time()
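
For the timing step, a small helper keeps the measurements consistent across models. The sketch below is one possible shape. `time_call` and `fake_generate` are names invented for this example, and the fake workload exists only so the snippet runs without downloading a model.

```python
import time

def time_call(fn, *args, repeats=3, **kwargs):
    """Return (last_result, best_seconds) over `repeats` calls of fn."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.time()
        result = fn(*args, **kwargs)
        best = min(best, time.time() - start)
    return result, best

# Stand-in workload so the sketch runs offline. In the challenge, pass a
# text-generation pipeline object in place of fake_generate.
def fake_generate(prompt, max_new_tokens=50):
    time.sleep(0.01)
    return prompt + " ..."

output, seconds = time_call(fake_generate, "Generative AI is", max_new_tokens=50)
print(f"{seconds:.3f}s")
```

Taking the best of several runs filters out one-off delays such as model loading or GPU warm-up. `time.perf_counter()` is a higher-resolution alternative to `time.time()` for short measurements.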

This hands-on exercise cements everything in this part before adding more layers.
