Text data surrounds us โ from emails and reviews to tweets and articles. The Text ML Explorer App was built to help learners understand how machine learning models can classify text step by step. Itโs an interactive Streamlit app that takes you from raw text to predictions, making the workflow transparent and approachable.
Motivation
Computer vision often feels like a black box to learners. Images are transformed into arrays, features are extracted, and models make predictions โ but the steps in between are rarely visible. The Image ML Explorer App was designed to make these steps transparent, interactive, and educational, so students can see how raw pixels become predictions.
Features
The app is organized into clear pages, each representing a stage in the image ML pipeline:
-
Dataset Explorer ๐ผ๏ธ
Preview sample image datasets (e.g., CIFARโ10, MNIST) or upload your own.
Inspect image dimensions, channels, and labels. - Preprocessing ๐ง
- Convert images to grayscale or resize them.
- Normalize pixel values.
- Visualize transformations sideโbyโside.
- Model Builder ๐ค
- Train simple CNNs or transfer learning models (e.g., ResNet, VGG).
- Compare accuracy across architectures.
- Visualize training curves (loss/accuracy).
- Results ๐
- Test custom images against trained models.
- Display predictions with confidence scores.
- Show GradโCAM or saliency maps to highlight what the model โsees.โ
Code Walkthrough
1. ๐ Image Dataset Explorer
This page introduces how image datasets are structured, loads the default Digits dataset from scikitโlearn, and lets students preview sample images:
import streamlit as st
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
import numpy as np
st.title("๐ Image Dataset Explorer")
# --- Explanations ---
st.write("### How Image Datasets Work")
st.markdown("""
Images are stored as arrays of pixel values.
For example, a grayscale image is a 2D matrix where each number represents brightness (0 = black, 255 = white).
๐ Machine learning models learn patterns in these numbers to classify images.
""")
# --- Load default dataset (Digits) ---
digits = load_digits()
X = digits.images # shape (n_samples, 8, 8)
y = digits.target # labels (0โ9)
st.write("Dataset shape:", X.shape)
st.write("Number of classes:", len(np.unique(y)))
# --- Show sample images ---
st.write("### Sample Images")
num_samples = st.slider("Select number of samples to view:", 4, 20, 8)
fig, axes = plt.subplots(1, num_samples, figsize=(num_samples * 2, 2))
for i, ax in enumerate(axes):
ax.imshow(X[i], cmap="gray")
ax.set_title(f"Label: {y[i]}")
ax.axis("off")
st.pyplot(fig)
# --- Save for later pages ---
st.session_state["X_images"] = X
st.session_state["y_labels"] = y
๐ Teaching note:
The default dataset here is Digits, a classic computer vision dataset of handwritten numbers (0โ9). Itโs widely used in tutorials because:
- Each image is small (8ร8 pixels), making it computationally lightweight.
- Labels are intuitive (digits 0โ9), so learners can quickly connect predictions to real outcomes.
- It provides a simple entry point into image classification before moving to larger datasets like MNIST or CIFARโ10.
2. ๐ง Preprocessing
This page demonstrates how raw images are transformed into usable features for machine learning. Students can experiment with resizing, grayscale conversion, and normalization.
import streamlit as st
import numpy as np
from PIL import Image
st.title("๐ง Image Preprocessing")
# --- Explanations ---
st.write("### Why Preprocessing Matters")
st.markdown("""
Raw images are large arrays of pixel values.
Preprocessing reduces complexity and highlights important features.
๐ Without preprocessing, models may struggle to learn efficiently.
""")
# --- Load image from session state ---
if "X_images" in st.session_state:
img = st.session_state["X_images"][0] # take first sample for demo
st.image(img, caption="Original Image")
# Resize
resized = Image.fromarray(img).resize((16, 16))
st.image(resized, caption="Resized (16x16)")
# Grayscale
gray = Image.fromarray(img).convert("L")
st.image(gray, caption="Grayscale")
# Normalize
arr = np.array(img) / 255.0
st.write("Normalized pixel values (first 5x5 block):")
st.write(arr[:5, :5])
๐ Teaching note:
Preprocessing is the bridge between raw data and machine learning models. By visualizing each step:
- Resizing reduces dimensionality, making training faster.
- Grayscale simplifies color images into intensity values.
- Normalization scales pixel values to [0,1], improving model stability.
This page helps learners see how preprocessing transforms images from raw pixels into structured inputs for models.
3. ๐ค Model Builder
This page introduces how to train a simple classifier on image data. It uses scikitโlearnโs LogisticRegression to classify the digits dataset.
import streamlit as st
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
st.title("๐ค Model Builder")
# --- Explanations ---
st.write("### Training a Simple Model")
st.markdown("""
Weโll use **Logistic Regression** to classify digits (0โ9).
๐ Logistic Regression is a simple yet effective baseline for image classification.
""")
# --- Load data from session state ---
if "X_images" in st.session_state and "y_labels" in st.session_state:
X = st.session_state["X_images"]
y = st.session_state["y_labels"]
# Flatten images (8x8 โ 64 features)
X_flat = X.reshape((X.shape[0], -1))
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X_flat, y, test_size=0.2, random_state=42)
# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
st.success(f"Model trained! Accuracy: {acc:.2f}")
# Save model for later pages
st.session_state["model"] = model
๐ Teaching note:
This page shows learners how to move from preprocessed data to a trained model. Key insights:
- Images must be flattened into feature vectors before training.
- Logistic Regression, though simple, provides a strong baseline for classification tasks.
- Accuracy gives a quick measure of performance, but later pages can explore deeper evaluation (confusion matrices, misclassified samples).
By starting with Logistic Regression, students learn the fundamentals of model training before experimenting with more complex algorithms like CNNs.
4. ๐ Results
This page lets learners test the trained model on new samples and visualize predictions.
import streamlit as st
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import numpy as np
st.title("๐ Results")
st.write("### Testing the Model")
st.markdown("""
Now we can evaluate the trained model on unseen data.
๐ This step shows how well the model generalizes beyond the training set.
Load model and data:
""")
if "model" in st.session_state and "X_images" in st.session_state and "y_labels" in st.session_state:
model = st.session_state["model"]
X = st.session_state["X_images"]
y = st.session_state["y_labels"]
# Flatten images
X_flat = X.reshape((X.shape[0], -1))
# Predict
y_pred = model.predict(X_flat)
# Show confusion matrix
cm = confusion_matrix(y, y_pred)
fig, ax = plt.subplots(figsize=(6, 6))
ConfusionMatrixDisplay(cm).plot(ax=ax)
st.pyplot(fig)
# Try a single sample
idx = st.slider("Select a sample index:", 0, len(X) - 1, 0)
st.image(X[idx], caption=f"True Label: {y[idx]}")
st.write(f"Predicted Label: {y_pred[idx]}")
๐ Teaching note:
The Results page is where theory meets practice. Students can:
- See how well the model performs across all classes using a confusion matrix.
- Select individual samples to compare true vs predicted labels.
- Understand that even simple models like Logistic Regression can make mistakes โ and those mistakes are valuable learning opportunities.
This final step closes the loop: from dataset โ preprocessing โ model building โ evaluation.