Master Generative AI, Part 1: Foundation of AI & Machine Learning
This is Part 1 of the Master Generative AI: A Step-by-Step Challenge series — a practical, no-fluff guide to going from complete beginner to confident AI practitioner in 2026.
Series Map:
- Part 1 → Foundation of AI & ML ← you are here
- Part 2 → Working with LLMs
- Part 3 → Advanced Generative AI
- Part 4 → Practical Applications
- Part 5 → Career & Capstone Projects
The AI revolution isn't just for researchers anymore. In 2026, the tools, libraries, and models that used to require a PhD and a supercomputer are now accessible to any developer willing to invest a few weeks of focused learning. This series is your step-by-step map.
We start at the very beginning — not because you're not smart, but because the best practitioners always have the strongest foundations.
Chapter 1: Introduction to AI & Generative AI
What Is Artificial Intelligence?
Artificial Intelligence is the field of building systems that can perform tasks that normally require human intelligence — recognizing images, understanding language, making decisions.
AI (the broad field)
│
├── Machine Learning (learns from data)
│ │
│ ├── Deep Learning (neural networks)
│ │ │
│ │ ├── NLP (language tasks)
│ │ ├── Computer Vision (image tasks)
│ │ └── Generative AI ← where we're going
│ │
│ └── Classical ML (trees, SVMs, linear models)
│
└── Symbolic AI (rules, logic, expert systems)
What Is Generative AI?
Many traditional AI systems are discriminative: they classify or predict. Given an image, they say "cat" or "dog."
Generative AI creates new content (text, images, audio, video, code) that didn't exist before:
| Type | Examples | What It Creates |
|---|---|---|
| Text | GPT-4, Claude, Gemini | Articles, code, answers |
| Image | Stable Diffusion, DALL-E | Photos, artwork, designs |
| Audio | ElevenLabs, Suno | Voice, music, sound |
| Video | Sora, Runway | Clips, animations |
| Code | Copilot, Code Llama | Programs, scripts |
| Multimodal | GPT-4o, Gemini | Any combination above |
The 2026 Generative AI Landscape
Foundation Models (trained on massive data, cost millions)
GPT-4o Claude 3.5/4 Gemini 2.0 LLaMA 3.1
↓ fine-tuning / prompt engineering ↓
Application Layer (your products and solutions)
Chatbots Code assistants Content tools AI agents
Key insight: You don't train foundation models. You use them. Your job as a practitioner is to know which model to pick, how to prompt it, when to fine-tune, and how to deploy reliably.
Chapter 2: Basics of Machine Learning
What Is Machine Learning?
Traditional programming: you write rules → computer follows them. Machine learning: you give examples → computer finds the rules.
# Traditional programming: a human writes the rule
def classify_email(email):
    if "FREE" in email and "CLICK HERE" in email:
        return "spam"
    return "not spam"

# Machine learning (pseudocode: train() stands in for any learning algorithm)
# The model learns its own rules from, say, 10,000 labeled examples
# model = train(emails, labels)
# prediction = model.predict(new_email)
Supervised Learning
You give the model labeled data — inputs paired with correct outputs. The model learns to map inputs to outputs.
Data:
(email_1, "spam")
(email_2, "not spam")
(email_3, "spam")
...
Model learns: features → label
Use cases: classification, regression, object detection
Examples: spam filter, house price prediction, medical diagnosis
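# Example: Supervised learning (regression) with scikit-learn
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
# Features: [size_sqm, bedrooms, distance_to_center]
X = [[85, 2, 3.2], [120, 3, 1.5], [45, 1, 8.0], [200, 4, 0.5]]
y = [4500000, 8000000, 2000000, 15000000]  # prices in THB (a continuous target, so this is regression)
# Toy data: with only 4 rows, the test split holds out a single example
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(random_state=42)  # a regressor, not a classifier, for price prediction
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(mean_absolute_error(y_test, predictions))  # average error in THB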
Unsupervised Learning
No labels. The model finds hidden patterns and structure in data on its own.
Data:
(customer_1_purchase_history)
(customer_2_purchase_history)
(customer_3_purchase_history)
...
Model discovers: natural groupings (clusters)
Use cases: clustering, dimensionality reduction, anomaly detection
Examples: customer segmentation, recommendation systems, fraud detection
# Example: K-means clustering
from sklearn.cluster import KMeans
import numpy as np
# Customer data: [monthly_spend, purchase_frequency]
customers = np.array([
    [500, 2], [600, 3], [5000, 15], [4800, 12],
    [200, 1], [100, 1], [5200, 18], [300, 2]
])
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(customers)
print(kmeans.labels_)  # one cluster ID (0, 1, or 2) per customer; the numbering itself is arbitrary
Reinforcement Learning
An agent learns by taking actions in an environment and receiving rewards or penalties.
Agent → Action → Environment → Reward → Agent (update policy)
Examples: game playing (AlphaGo), robotics, RLHF for LLMs (ChatGPT)
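The agent, action, reward, update cycle above can be sketched with a multi-armed bandit, one of the simplest reinforcement learning settings. This is a minimal illustration, not how AlphaGo or RLHF work: the reward probabilities, the `run_bandit` name, and the epsilon-greedy policy are all assumptions chosen for the example.

```python
import random

def run_bandit(true_reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (hypothetical environment)."""
    rng = random.Random(seed)
    n_arms = len(true_reward_probs)
    counts = [0] * n_arms        # how often each action was taken
    values = [0.0] * n_arms      # running estimate of each action's reward

    for _ in range(steps):
        # Policy: explore with probability epsilon, otherwise exploit the best estimate
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)
        else:
            action = max(range(n_arms), key=lambda a: values[a])

        # Environment: reward of 1 with the arm's hidden probability, else 0
        reward = 1.0 if rng.random() < true_reward_probs[action] else 0.0

        # Update: move the estimate toward the observed reward (incremental mean)
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
print("Estimated rewards:", [round(v, 2) for v in values])
print("Times each action was chosen:", counts)
```

The agent never sees the true probabilities. It only learns from the rewards it receives, which is the defining feature of this learning style.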
The ML Workflow (Universal)
1. Define the problem → What are we predicting?
2. Collect data → More high-quality, representative data usually means a better model
3. Explore & clean data → Fix missing values, outliers
4. Feature engineering → Transform raw data into useful signals
5. Choose & train model → Pick algorithm, optimize
6. Evaluate → Does it generalize to new data?
7. Deploy & monitor → Real-world performance often differs
Chapter 3: Neural Networks 101
The Neuron: Building Block
A single neuron takes multiple inputs, multiplies each by a weight, sums them up, adds a bias, then passes the result through an activation function:
inputs: x₁ = 0.5, x₂ = 0.8, x₃ = 0.3
weights: w₁ = 0.4, w₂ = -0.2, w₃ = 0.9
bias: b = 0.1
weighted sum = (0.5×0.4) + (0.8×-0.2) + (0.3×0.9) + 0.1
= 0.2 - 0.16 + 0.27 + 0.1 = 0.41
output = activation(0.41) = sigmoid(0.41) ≈ 0.60
import numpy as np

def neuron(inputs, weights, bias, activation="relu"):
    z = np.dot(inputs, weights) + bias  # linear combination
    if activation == "relu":
        return max(0, z)  # ReLU: max(0, z)
    elif activation == "sigmoid":
        return 1 / (1 + np.exp(-z))  # sigmoid: 0 to 1
    elif activation == "tanh":
        return np.tanh(z)  # tanh: -1 to 1
    return z  # no activation: return the raw weighted sum
From One Neuron to a Network
Layer neurons together → neural network:
Input Layer Hidden Layer Output Layer
[x₁]─────────→[n₁]─────────→[out]
[x₂]─────────→[n₂]
[x₃]─────────→[n₃]
[n₄]
Each connection has a weight (learned during training)
Hidden layer finds intermediate features
Output layer gives the final prediction
Common Activation Functions
| Function | Formula | Use Case |
|---|---|---|
| ReLU | max(0, x) | Hidden layers (default) |
| Sigmoid | 1/(1+e⁻ˣ) | Binary classification output |
| Softmax | eˣᵢ / Σeˣ | Multi-class output |
| GELU | x·Φ(x) | Transformers (LLMs) |
| Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | RNNs, some hidden layers |
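The formulas in the table can be written out directly with Python's standard library. This is a sketch for building intuition rather than production code (frameworks such as PyTorch ship vectorized versions). The GELU here uses the exact form x·Φ(x) via `math.erf`, and the sample inputs are arbitrary.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  sigmoid={sigmoid(x):.3f}  "
          f"tanh={tanh(x):.3f}  gelu={gelu(x):.3f}")

probs = softmax([2.0, 1.0, 0.5])
print("softmax:", [round(p, 3) for p in probs], "sum =", round(sum(probs), 3))
```

Note that softmax is the only function here that takes a whole vector: it turns a list of scores into probabilities that sum to 1.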
Your First Neural Network
import torch
import torch.nn as nn

# A simple 3-layer neural network
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, hidden_size)
        self.layer3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.relu(self.layer1(x))  # input → hidden
        x = self.relu(self.layer2(x))  # hidden → hidden
        x = self.layer3(x)             # hidden → output (raw logits, no activation)
        return x

# Instantiate
model = SimpleNet(input_size=10, hidden_size=64, output_size=3)
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")  # 5,059
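The parameter count can be checked by hand. Each `nn.Linear(n_in, n_out)` layer holds one weight per input-output pair plus one bias per output. The helper name `linear_params` below is made up for this sketch.

```python
def linear_params(n_in, n_out):
    # One weight per (input, output) pair, plus one bias per output
    return n_in * n_out + n_out

# layer1, layer2, layer3 of SimpleNet(10, 64, 3)
layer_sizes = [(10, 64), (64, 64), (64, 3)]
per_layer = [linear_params(n_in, n_out) for n_in, n_out in layer_sizes]

print(per_layer)        # [704, 4160, 195]
print(sum(per_layer))   # 5059
```

Most of the parameters sit in the hidden-to-hidden layer, a pattern that holds in much larger networks too.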
Chapter 4: Deep Learning & Backpropagation
What Makes It "Deep"?
Deep learning = neural networks with many layers (deep = many hidden layers). More layers → can learn more complex patterns.
Shallow (1-2 layers): can classify simple shapes, basic text
Deep (10-100+ layers): can understand faces, language nuance, generate art
How Neural Networks Learn: Backpropagation
Training has a simple loop:
1. FORWARD PASS: Feed data through the network → get a prediction
2. COMPUTE LOSS: How wrong was the prediction?
3. BACKWARD PASS: Calculate how each weight contributed to the error
4. UPDATE WEIGHTS: Nudge each weight in the direction that reduces error
5. REPEAT: Do this for millions of examples
The Loss Function measures how wrong you are:
import torch.nn as nn

# Mean Squared Error (regression); expects tensors of the same shape
def mse_loss(predictions, targets):
    return ((predictions - targets) ** 2).mean()

# Cross-Entropy (classification); expects raw logits and integer class labels
def cross_entropy_loss(logits, targets):
    return nn.CrossEntropyLoss()(logits, targets)
Gradient Descent is how weights update:
new_weight = old_weight - learning_rate × gradient
learning_rate: how big a step to take (e.g., 0.001)
gradient: the slope — which direction reduces the error
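The update rule above can be seen in action on a one-parameter model. This is a hand-rolled sketch under simple assumptions: the data follows y = 3x, the model is y = w·x, and the gradient is derived by hand instead of by automatic differentiation.

```python
# Toy data generated from y = 3x, so the ideal weight is 3.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def mse(w):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w):
    # d/dw of mean((w*x - y)^2) = mean(2 * x * (w*x - y))
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

w = 0.0
learning_rate = 0.01
for step in range(200):
    grad = mse_gradient(w)            # slope of the loss at the current weight
    w = w - learning_rate * grad      # new_weight = old_weight - learning_rate × gradient
    if step % 50 == 0:
        print(f"step {step:3d}  w = {w:.4f}  loss = {mse(w):.6f}")

print(f"final w = {w:.4f}")
```

Raising `learning_rate` much above 0.13 makes this particular example overshoot and diverge, which is the "too big" failure mode in miniature.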
Full Training Loop in PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

# Dummy data: 100 samples, 10 features, 3 classes
X = torch.randn(100, 10)
y = torch.randint(0, 3, (100,))

model = SimpleNet(10, 64, 3)  # the SimpleNet class defined in Chapter 3
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Training loop
for epoch in range(50):
    # 1. Forward pass
    predictions = model(X)
    # 2. Compute loss
    loss = criterion(predictions, y)
    # 3. Backward pass (compute gradients)
    optimizer.zero_grad()  # clear old gradients
    loss.backward()        # compute new gradients
    # 4. Update weights
    optimizer.step()
    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")
Key Deep Learning Concepts
| Concept | What It Means | Why It Matters |
|---|---|---|
| Epoch | One full pass through all training data | More epochs = more learning (up to a point) |
| Batch size | How many samples per gradient update | Smaller = noisier but more frequent updates |
| Learning rate | Step size for weight updates | Too big → diverge; too small → slow convergence |
| Overfitting | Model memorizes training data, fails on new | Detect with validation loss; fix with dropout, more data |
| Dropout | Randomly zero out neurons during training | Forces robust features, reduces overfitting |
| Batch Norm | Normalize layer inputs | Stabilizes training, allows higher learning rates |
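To make the Dropout row concrete, here is a minimal sketch of "inverted" dropout, the variant most frameworks use. It is written in plain Python for readability, and the function signature is an assumption for this example, not PyTorch's API.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, rescale the survivors."""
    if not training or p == 0.0:
        return list(activations)          # inference: pass values through unchanged
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10_000
dropped = dropout(hidden, p=0.2, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"zeroed: {zero_fraction:.3f}, mean after dropout: {mean_after:.3f}")
print(dropout([1.0, 2.0, 3.0], p=0.2, training=False))
```

Dividing the surviving values by `keep` holds the expected activation constant, so the network sees inputs on the same scale at training and inference time.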
Chapter 5: Introduction to Large Language Models (LLMs)
What Is an LLM?
A Large Language Model is a neural network (specifically a Transformer) trained on massive amounts of text to predict the next token. "Large" refers to billions of parameters.
Training: Learn from trillions of tokens of text (books, web, code...), e.g. ~15 trillion for Llama 3
Task: Given "The Eiffel Tower is in", predict "Paris"
After training: the model has compressed language patterns into
billions of parameters → can generate coherent text
How Text Becomes Tokens
LLMs don't read characters or words — they read tokens (chunks of text):
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Generative AI is transforming the world!"
tokens = tokenizer.encode(text)
print(tokens)
# [8645, 876, 3068, 318, 28287, 262, 995, 0]
# 8 tokens for 40 characters (5 chars/token here; roughly 4 is typical for English text)
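To see what "chunks of text" means without downloading a model, here is a toy tokenizer. It is a deliberate simplification: the six-entry vocabulary is invented, and it uses greedy longest-match lookup, whereas GPT-2's real tokenizer applies learned byte-pair-encoding merges over a vocabulary of roughly 50,000 entries.

```python
# Hypothetical toy vocabulary mapping text pieces to integer IDs
VOCAB = {"Gener": 0, "ative": 1, " AI": 2, " is": 3, " fun": 4, "!": 5}

def tokenize(text, vocab=VOCAB):
    ids, pos = [], 0
    while pos < len(text):
        # Greedy longest match: try the longest vocabulary entry that fits here
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return ids

print(tokenize("Generative AI is fun!"))   # [0, 1, 2, 3, 4, 5]
```

Notice that "Generative" splits into two pieces and that the leading space belongs to the token that follows it, both of which are also typical of real subword tokenizers.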
The Generation Process
At inference time, LLMs generate one token at a time:
Input: "The cat sat on"
Step 1: Predict → "the" (probability: 0.72) → append
Step 2: Predict → "mat" (probability: 0.65) → append
Step 3: Predict → "." (probability: 0.81) → append
Step 4: Predict → [END] → stop
Output: "The cat sat on the mat."
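The loop above can be sketched with a lookup table standing in for the model. The probabilities are the illustrative ones from the walkthrough plus a few invented alternatives, and the table is keyed by the previous token only. A real LLM conditions on the entire context and often samples instead of always taking the top choice.

```python
# Hypothetical next-token probabilities keyed by the previous token
NEXT_TOKEN_PROBS = {
    "on":  {"the": 0.72, "a": 0.20, "my": 0.08},
    "the": {"mat": 0.65, "sofa": 0.25, "roof": 0.10},
    "mat": {".": 0.81, "and": 0.19},
    ".":   {"[END]": 1.0},
}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if candidates is None:
            break                                          # no known continuation
        next_token = max(candidates, key=candidates.get)   # greedy: pick the most likely
        if next_token == "[END]":
            break                                          # stop token ends generation
        tokens.append(next_token)                          # append and go again
    return tokens

output = generate(["The", "cat", "sat", "on"])
print(output)   # ['The', 'cat', 'sat', 'on', 'the', 'mat', '.']
```

Each new token is fed back in as input for the next step, which is why generation cost grows with output length.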
Major LLM Families in 2026
| Family | Creator | Best For |
|---|---|---|
| GPT-4o | OpenAI | General purpose, multimodal |
| Claude 3.5/4 | Anthropic | Reasoning, safety, long context |
| Gemini 2.0 | Google | Multimodal, long context |
| LLaMA 3.1 | Meta | Open weights, self-hosting |
| Qwen 2.5 | Alibaba | Multilingual, efficient |
| DeepSeek V3/R1 | DeepSeek | Coding, math, reasoning |
| Mistral | Mistral AI | Efficient, European |
Chapter 6: Key Metrics for Model Evaluation
Loss
Loss measures how wrong the model's predictions are. Lower = better.
import torch
import torch.nn.functional as F
# Cross-entropy loss for a 3-class problem
logits = torch.tensor([[2.0, 1.0, 0.5]]) # model outputs (unnormalized)
target = torch.tensor([0]) # correct class is 0
loss = F.cross_entropy(logits, target)
print(f"Loss: {loss.item():.4f}") # lower is better
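The same number can be reproduced by hand, which shows what `F.cross_entropy` does internally: a softmax over the logits followed by the negative log of the probability given to the correct class. This sketch uses only the standard library and the same logits and target as above.

```python
import math

logits = [2.0, 1.0, 0.5]
target = 0

# Softmax turns logits into probabilities
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]

# Cross-entropy is the negative log-probability of the correct class
loss = -math.log(probs[target])

print([round(p, 3) for p in probs])   # [0.629, 0.231, 0.14]
print(f"Loss: {loss:.4f}")            # Loss: 0.4644
```

If the model had put probability 1.0 on class 0, the loss would be exactly 0. The more probability it places on wrong classes, the higher the loss climbs.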
Accuracy
The percentage of correct predictions. Intuitive but misleading for imbalanced data.
def accuracy(predictions, targets):
    # predictions: tensor of per-class scores; targets: tensor of class indices
    correct = (predictions.argmax(dim=1) == targets).sum().item()
    return correct / len(targets)

# Watch out: 99% accuracy sounds great...
# But if 99% of your data is "not fraud", a model that predicts
# "not fraud" for everything gets 99% accuracy with zero utility
Perplexity (for Language Models)
Perplexity measures how "surprised" the model is by real text. Lower = model predicts text better.
Perplexity = e^(average cross-entropy loss per token)
A perplexity of 10 means the model is as uncertain as if
it had to choose uniformly among 10 equally likely tokens.
A perplexity of 5 means it's doing better (fewer effective choices).
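GPT-2 (small): perplexity ~30-37 on WikiText (zero-shot)
GPT-3: perplexity ~20 on Penn Treebank
Modern LLMs: perplexity ~8-15 (domain-dependent)
These are rough figures. Perplexity is only comparable between models that share a tokenizer and a test set.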
import torch
import math

def calculate_perplexity(model, tokenizer, text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt")
    input_ids = inputs.input_ids
    with torch.no_grad():
        outputs = model(**inputs, labels=input_ids)
    loss = outputs.loss  # mean cross-entropy per token
    return math.exp(loss.item())
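The formula can also be checked without a model by feeding in the probabilities a model might assign to each true token. The probability values below are made up for illustration.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability assigned to each true token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that always gives the true token 1-in-10 odds
uniform = perplexity([0.1] * 5)

# A more confident model
confident = perplexity([0.72, 0.65, 0.81])

print(f"{uniform:.2f}")     # 10.00
print(f"{confident:.2f}")
```

The first case confirms the "choosing among 10 equally likely tokens" reading of a perplexity of 10. The second lands close to 1, the floor reached only by a model that predicts every token with certainty.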
Other Important Metrics
| Metric | Used For | Formula |
|---|---|---|
| Precision | When false positives are costly | TP / (TP + FP) |
| Recall | When false negatives are costly | TP / (TP + FN) |
| F1 Score | Balance precision and recall | 2 × (P × R) / (P + R) |
| BLEU | Text translation quality | N-gram overlap with reference |
| ROUGE | Summarization quality | Overlap with reference summary |
| BERTScore | Semantic text similarity | Cosine similarity of BERT embeddings |
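The first three rows of the table are easy to compute from scratch, and doing so on the fraud scenario from the accuracy section shows why they matter. The two "models" below are invented prediction lists, not real classifiers.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 1,000 transactions, 10 of them fraud (label 1)
y_true = [1] * 10 + [0] * 990

# Model A: predicts "not fraud" for everything
always_negative = [0] * 1000
# Model B (hypothetical): catches 8 of 10 frauds, raises 12 false alarms
model_b = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

for name, y_pred in [("always negative", always_negative), ("model B", model_b)]:
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    p, r, f1 = precision_recall_f1(y_true, y_pred)
    print(f"{name:16s} accuracy={acc:.3f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Model A wins on accuracy (0.990 vs. 0.986) yet has zero recall, while Model B actually finds most of the fraud. On imbalanced data, precision and recall tell the story accuracy hides.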
Chapter 7: Setting Up Your Environment
The AI Developer's Toolkit
Python 3.11+ → Language (use pyenv or conda to manage versions)
PyTorch 2.x → Deep learning framework (best for research + LLMs)
TensorFlow 2.x → Alternative framework (strong in production/mobile)
Hugging Face → Hub for models, datasets, tokenizers
Jupyter → Interactive notebooks for exploration
CUDA / ROCm → GPU acceleration (NVIDIA / AMD)
Option 1: Local Setup
# Install Python 3.11 (using pyenv)
brew install pyenv # macOS
pyenv install 3.11.9
pyenv global 3.11.9
# Create project environment
python -m venv genai-env
source genai-env/bin/activate # Mac/Linux
# genai-env\Scripts\activate # Windows
# Install core packages
pip install torch torchvision torchaudio # PyTorch (CPU)
# For CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers # Hugging Face Transformers
pip install datasets # HF Datasets
pip install accelerate # Multi-GPU / mixed precision
pip install diffusers # Diffusion models
pip install sentence-transformers # Embeddings
pip install langchain # LLM application framework
pip install openai anthropic # API clients
pip install jupyter # Notebooks
pip install numpy pandas matplotlib seaborn # Data science essentials
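# Verify GPU (falls back to a message instead of raising an error on CPU-only machines)
python -c "import torch; ok = torch.cuda.is_available(); print(ok, torch.cuda.get_device_name(0) if ok else 'no CUDA GPU found')"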
Option 2: Google Colab (Free GPU, Best for Beginners)
Google Colab typically offers a free T4 GPU (16 GB VRAM), subject to availability and usage limits. That is enough for the exercises in this series.
# In a Colab notebook, most packages are pre-installed
# Check what GPU you have:
!nvidia-smi
# Install what's missing
!pip install -q transformers datasets accelerate
# Mount Google Drive to save model checkpoints
from google.colab import drive
drive.mount('/content/drive')
Option 3: Cloud GPU Services
| Service | Free Tier | GPU | Best For |
|---|---|---|---|
| Google Colab | Yes (T4) | T4 16GB | Learning, small experiments |
| Kaggle Notebooks | Yes (P100) | P100 16GB | Competitions, datasets |
| Lightning AI | Yes | T4 | Quick prototyping |
| RunPod | No ($0.20/hr) | Any | Custom setups |
| Lambda Labs | No ($0.50/hr) | A10, A100 | Production training |
Hugging Face Quickstart
The Hugging Face pipeline is the fastest way to start using AI models:
from transformers import pipeline
# Text generation
generator = pipeline("text-generation", model="gpt2")
result = generator("Generative AI is", max_new_tokens=50)
print(result[0]["generated_text"])
# Sentiment analysis
classifier = pipeline("sentiment-analysis")
result = classifier("This course is absolutely fantastic!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]; the exact score depends on the default model
# Text summarization
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = """Generative AI models have revolutionized how we interact with computers.
These models can produce text, images, audio, and video that are
increasingly indistinguishable from human-created content..."""
summary = summarizer(text, max_length=50, min_length=20)
print(summary[0]["summary_text"])
# Translation
translator = pipeline("translation_en_to_fr")
result = translator("Hello, I am learning Generative AI!")
print(result[0]["translation_text"])  # e.g. "Bonjour, j'apprends l'IA générative !" (exact wording depends on the model)
Your Environment Health Check
Run this script to confirm everything is working:
# health_check.py
import sys
print(f"Python: {sys.version}")
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
import transformers
print(f"Transformers: {transformers.__version__}")
from transformers import pipeline
pipe = pipeline("sentiment-analysis")
result = pipe("I love learning AI!")
print(f"Test inference: {result}") # should print POSITIVE
print("\n✓ Environment ready for Generative AI!")
Summary
You've just built the foundation everything else in this series stands on. Here's what to carry forward:
| Concept | One-Line Takeaway |
|---|---|
| AI vs. ML vs. DL | AI is the goal; ML is the approach; DL is the engine for LLMs |
| Generative AI | Creates new content (text, image, audio) by sampling from learned probability distributions |
| Supervised vs. Unsupervised | Labeled data vs. pattern discovery without labels |
| Neural network | Layers of weighted connections; weights learned from data |
| Backpropagation | Gradient of loss flows backward, adjusting weights at each step |
| LLMs | Transformers trained to predict next token on massive text corpora |
| Loss / Accuracy / Perplexity | Loss is the training signal; accuracy for classification; perplexity for language models |
| Your environment | PyTorch + Hugging Face + Colab = the minimal viable toolkit |
Next in this series → Part 2: Working with LLMs — where we go hands-on with tokenization, embeddings, the Transformer architecture, fine-tuning, RAG, and building your first chatbot.
Practice Challenge
Before moving to Part 2, complete this challenge:
- Open a Google Colab notebook
- Run the environment health check script above
- Use pipeline("text-generation") with three different models from the Hugging Face Hub
- Compare the output quality and notice how model size affects results
- Measure inference speed with time.time()
This hands-on exercise cements everything in this part before adding more layers.
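For the timing step, a small helper keeps the measurements consistent across models. This is one possible sketch: `time_call` and `fake_generate` are hypothetical names, and the stand-in workload should be replaced with a real call such as `generator("Generative AI is", max_new_tokens=50)`.

```python
import time

def time_call(fn, *args, repeats=3, **kwargs):
    """Return (last result, best wall-clock seconds over `repeats` runs)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.time()
        result = fn(*args, **kwargs)
        best = min(best, time.time() - start)
    return result, best

# Stand-in workload so the sketch runs without downloading a model
def fake_generate(prompt, max_new_tokens=50):
    return prompt + " ..." * max_new_tokens

text, seconds = time_call(fake_generate, "Generative AI is", max_new_tokens=5)
print(f"{seconds:.6f} s for {len(text)} characters")
```

Taking the best of several runs reduces noise from one-time costs such as model loading or GPU warm-up, which otherwise dominate the first call.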