Ontario Tech Science vialab · Visualization for Information Analysis

Athena CTF: A Modular Framework for Instructional Capture-the-Flag Challenges

Zach Frank · Supervisors: Randy Fortier, Mariana Shimabukuro


Athena CTF in Brief

An open-source modular framework for the rapid creation, containerization, and deployment of web-based CTF challenges — with assessment-ready per-user parametrization and constrained LLM-assisted hints.

10 lines
A fully functional level in as few as ten lines of Python
115 bytes
Average per-user record size in a 15-challenge demo deployment
0 PII
No personally identifying information stored by default
01

Lightweight, Modular Authoring

Each level is a self-contained Page object. Authors supply an instructions endpoint and a verify function — the framework wires routing, templates, cookies and verification.

02

Assessment-Oriented

Deterministic per-user or per-team parametrization reduces trivial answer sharing; LMS-exported rosters (CSV/JSON) provision users by a single athenaId field — no other PII is stored.

03

Constrained LLM Hints

One-shot, non-conversational hints grounded in the user's interaction history and a creator-authored solution path. Supports commercial APIs or locally hosted models.

04

Containerized Delivery

Docker images or source-repo spawning scripts. Runs on Unix or Windows, on a local classroom network or behind a reverse proxy — no VM required for learners.

The Problem With Current CTFs

Web-Based, Modular Architecture

Athena is an entirely web-native stack built on FastAPI and Jinja2 templates, with an optional MongoDB layer for tracking and a separately deployed admin container.

End-to-End, Optional Where It Counts

Challenges can be distributed as Docker images through registries like Docker Hub, or as source repositories with provided spawning scripts. Both Unix and Windows hosts are supported. Learners interact through their native browser — no VM, no extensions, no preconfigured environment.

The optional database layer enables learner tracking, LMS-based provisioning, and LLM-assisted hints. Disabling it yields a fully standalone deployment suitable for demos or offline workshops.

Athena Framework diagram

Architecture Overview

Levels are Page objects registered at import time. A shared application core handles routing, sessions, templates and the verification pipeline; the database connector and LLM connector are both optional and can be swapped in or out at deploy time.
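Import-time registration can be pictured with a small sketch. Everything here is hypothetical: SketchPage, _REGISTRY, route_table and the /level/N path scheme are illustrative names, not Athena's real Page class or routing.

```python
from typing import Awaitable, Callable, Dict, List, Optional

Handler = Callable[[dict], Awaitable[dict]]  # hypothetical handler type

# Hypothetical module-level registry, filled as level modules are imported.
_REGISTRY: List["SketchPage"] = []


class SketchPage:
    """Stand-in for a level object; not Athena's real Page class."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.verify: Optional[Handler] = None
        self.instructions: Optional[Handler] = None
        self.order = len(_REGISTRY)  # position follows import order
        _REGISTRY.append(self)       # registration is a side effect of construction

    def set_functions(self, verify: Handler, instructions: Handler) -> None:
        self.verify = verify
        self.instructions = instructions


def route_table() -> Dict[str, SketchPage]:
    """Map a URL path to each fully configured level, in registration order."""
    return {
        f"/level/{page.order}": page
        for page in _REGISTRY
        if page.verify is not None and page.instructions is not None
    }
```

Because construction is the registration step, importing a level module is enough to make it known to the application core. Levels missing a handler are left out of the route table.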

In a 15-challenge demo, the average user record was ~115 bytes with an index of roughly 1 MB across ~11,000 users — small enough for free-tier cloud Mongo or a single classroom VM.
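As a rough check on those figures, the arithmetic can be written out. This sketch assumes decimal units and treats the quoted numbers as exact; real MongoDB storage adds per-document and per-collection overhead that is not modelled here.

```python
# Figures quoted in the text, treated as exact for a back-of-envelope estimate.
USERS = 11_000
AVG_RECORD_BYTES = 115
INDEX_BYTES = 1_000_000  # "roughly 1 MB"

data_bytes = USERS * AVG_RECORD_BYTES   # 1,265,000 bytes of user records
total_bytes = data_bytes + INDEX_BYTES  # 2,265,000 bytes including the index

print(f"records: {data_bytes // 1000} kB, with index: {total_bytes // 1000} kB")
```

Under these assumptions the whole deployment stays in the low single-digit megabytes.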

High-level UML of Athena CTF

Constrained, Context-Driven Hints

One-shot plain-text hints generated on explicit user request — not an open-ended chat agent. The output space is bounded to reduce prompt injection, over-scaffolding, and solution leakage.

Grounded in Two Controlled Sources

  • User interaction history — prior requests and failed submissions, pulled from the centralized DB when enabled.
  • Creator-authored solution path — a write-up or executable script shipped with the level, loaded into memory at runtime.

The model is guided toward the intended solving strategy and prevented from inventing alternative paths.
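One way to combine the two sources into a single one-shot prompt is sketched below. The Attempt record, the build_hint_prompt function, the prompt wording and the five-attempt cap are all assumptions made for illustration; the source does not describe Athena's actual prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Attempt:
    """One failed submission (hypothetical record shape)."""
    submitted: str
    error_code: str


def build_hint_prompt(solution_path: str, history: List[Attempt],
                      max_attempts: int = 5) -> str:
    """Build a single self-contained prompt; no conversation state is kept."""
    # Cap the history so the context volume stays bounded.
    recent = history[-max_attempts:] if max_attempts > 0 else []
    lines = [f"- submitted {a.submitted!r} (error: {a.error_code})" for a in recent]
    attempts = "\n".join(lines) if lines else "- no attempts yet"
    return (
        "Give ONE short plain-text hint for a CTF level.\n"
        "Follow only the intended solution path below. Do not reveal the flag "
        "and do not propose other approaches.\n\n"
        f"Intended solution path:\n{solution_path.strip()}\n\n"
        f"Recent failed attempts:\n{attempts}\n"
    )
```

Learner-submitted strings still end up inside the prompt, so quoting them (as repr does here) and capping their number are mitigations, not a guarantee against injection.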

Non-Conversational & Stateless

Each request is independent — no dialogue memory, no chained turns. This keeps inference costs predictable for classroom-scale deployments and sharply limits the attack surface for prompt injection.

  • Commercial APIs (Claude · ChatGPT)
  • Local Models (Ollama, any)
  • Static Hints (no LLM)

Admins choose the model, the context volume, or disable LLM hints entirely for assessments.
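The choice between these modes can be modelled as picking one stateless function at deploy time. This is a sketch under stated assumptions: make_hint_source, the mode names "llm", "static" and "off", and the Completion callable (a stand-in for whatever API or local-model client is configured) are hypothetical.

```python
from typing import Callable, Dict, Optional

# prompt -> text; stands in for any commercial API or local model client.
Completion = Callable[[str], str]
HintSource = Callable[[str, str], Optional[str]]  # (level_id, prompt) -> hint


def make_hint_source(mode: str, static_hints: Dict[str, str],
                     complete: Optional[Completion] = None) -> HintSource:
    """Return a stateless hint function for the configured mode."""
    if mode == "off":
        return lambda level_id, prompt: None
    if mode == "static":
        return lambda level_id, prompt: static_hints.get(level_id, "No hint available.")
    if mode == "llm":
        if complete is None:
            raise ValueError("mode 'llm' needs a completion callable")
        # Each call sends one prompt and keeps no memory of earlier calls.
        return lambda level_id, prompt: complete(prompt)
    raise ValueError(f"unknown hint mode: {mode!r}")
```

Swapping providers then means passing a different callable, and an assessment deployment can select "static" or "off" without touching level code.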

Modular Level Creation

Each level is a self-contained Page object. Authors provide an instructions endpoint (what the learner sees) and a verify endpoint (boolean correctness, optional error code) — the framework does the rest.

  • Jinja2 shared templates — a small set of layouts every level draws from, so styling stays consistent and white-labelling is trivial.
  • Server-side helper components: generate_button, FormGroup, FormData, tables, accordions; authors compose the UI in Python.
  • Safe-by-default rendering — injected content is sanitized automatically; authors can opt out only where a vulnerability genuinely requires raw HTML.
  • Auto-registration — Page objects register with the app at import time, giving every level uniform ordering, nav, and verification semantics.
  • Optional DB hookup — levels run standalone; enabling the centralized Mongo store unlocks tracking, hints, and LMS export with no code changes.
minimal_level.py · a full level in ~10 lines
from app.utils.page import Page
from app.utils.extensions import templates
from config import get_config

config = get_config()
tiny = Page("Tiny Level")

async def verify(request):
    data = await request.json()
    return {"success": data.get("flag") == "HELLO_WORLD"}

async def instructions(request):
    return templates.TemplateResponse(request=request,
        name=config.TEMPLATE,
        context={"text": "Submit the flag HELLO_WORLD"})

tiny.set_functions(verify=verify, instructions=instructions)
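
The safe-by-default rendering described in the list above can be illustrated with the standard library alone. RawHTML and render_fragment are hypothetical names; Athena's own helpers may instead rely on Jinja2 autoescaping, which follows the same escape-unless-marked-safe idea.

```python
import html
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class RawHTML:
    """Explicit opt-out marker for levels that need unescaped markup (hypothetical)."""
    value: str


def render_fragment(content: Union[str, RawHTML]) -> str:
    """Escape by default; pass markup through only when the author opted out."""
    if isinstance(content, RawHTML):
        return content.value
    return html.escape(str(content))


safe = render_fragment("<script>alert(1)</script>")
raw = render_fragment(RawHTML("<b>intentionally raw</b>"))
print(safe)  # &lt;script&gt;alert(1)&lt;/script&gt;
print(raw)   # <b>intentionally raw</b>
```

Making the opt-out a distinct type keeps the unsafe path visible in the level's source, which helps when a challenge deliberately demonstrates an injection flaw.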

Deterministic Parametrization

Each level carries a level-specific secret generated at creation time. At verify time, that secret is combined with a user identifier — either an anonymous cookie or an LMS-provisioned athenaId — to compute a unique expected solution per participant.
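The source does not say how the secret and the identifier are combined. A minimal sketch, assuming HMAC-SHA256 keyed with the level secret, is shown below; derive_flag, the ATHENA{...} flag format and the 16-character truncation are illustrative choices.

```python
import hashlib
import hmac


def derive_flag(level_secret: bytes, user_id: str, prefix: str = "ATHENA") -> str:
    """Derive a per-user flag from the level secret and a user identifier.

    Assumption: HMAC-SHA256 keyed with the level secret; the digest is
    truncated so the flag stays short enough to copy by hand.
    """
    digest = hmac.new(level_secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}{{{digest[:16]}}}"


secret = b"generated-once-when-the-level-is-created"
flag_a = derive_flag(secret, "cookie-or-athenaId-of-user-a")
flag_b = derive_flag(secret, "cookie-or-athenaId-of-user-b")
```

Nothing per-user has to be stored: the expected flag is recomputed from the two inputs on every verification, and a team sharing one identifier shares one flag.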

Reproducible for the Learner

The same user gets the same flag on every attempt — revisiting the level, restarting the browser, or retrying after a break all produce identical solutions. Nothing needs to be memorized across sessions.

Invalid When Shared

A flag pasted into a Discord server won't verify for anyone else: it was derived from the original solver's identifier plus the level secret, so every other participant has a different expected value.

Uniform Code Path

The parametrization function runs on every verification, even when disabled. In static mode it simply returns the level-wide solution. No divergent branches, no duplicated logic, no drift between assessment and demo deployments.
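A sketch of that single code path follows. The function names, the flag format and the use of HMAC-SHA256 are assumptions; only the behaviour (static mode returns the level-wide solution) comes from the text.

```python
import hashlib
import hmac


def expected_flag(static_flag: str, level_secret: bytes, user_id: str,
                  parametrized: bool) -> str:
    """Called on every verification; static mode returns the shared flag."""
    if not parametrized:
        return static_flag
    digest = hmac.new(level_secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{static_flag}-{digest[:12]}"


def verify_submission(submitted: str, static_flag: str, level_secret: bytes,
                      user_id: str, parametrized: bool) -> bool:
    """Same call sequence for demo and assessment deployments."""
    expected = expected_flag(static_flag, level_secret, user_id, parametrized)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(submitted.encode("utf-8"), expected.encode("utf-8"))
```

The deployment flag is the only thing that changes between modes, so a level written for a demo needs no edits to become a graded exercise.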

Admin-Configurable

Disable parametrization per deployment (for in-class demos and collaborative exercises) or selectively enable it on specific levels for graded assignments — all without modifying challenge code.

Verification Flow

User identifier (cookie or athenaId) + level secret → deterministic expected flag → comparison with submitted flag.

Parametrization verification flow diagram

Ease of Assessment

The administration interface is a separately deployed container. Assessment workflows are configured independently of challenge logic — the same challenge image serves demos, collaborative labs, and graded assignments.

Containerized & Reproducible

  • Challenges ship as Docker images via Docker Hub, or as source repos with spawning scripts
  • Consistent behaviour across Unix and Windows hosts — no VM required for learners
  • Local network, classroom LAN, or public via reverse proxy — same image
  • Kubernetes-based per-user isolation is a planned extension

LMS-Based Provisioning

  • Default: anonymous cookie-based identifier, no PII
  • Formal mode: upload CSV or JSON from your LMS — only the athenaId field is stored
  • Same identifier can be shared across a team for collective progress
  • Export interaction data and rejoin with the LMS export at grading time
https://athena-ctf.local/admin
Athena CTF admin interface — user provisioning and result export
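Roster import that keeps only the athenaId field could look like the sketch below. The function names, the assumption that a JSON export is a list of objects, and the sample column names are illustrative; real LMS exports vary.

```python
import csv
import io
import json
from typing import Iterable, List, Optional


def _clean(values: Iterable[Optional[object]]) -> List[str]:
    """Drop blanks and duplicates while keeping roster order."""
    seen = set()
    result = []
    for value in values:
        text = "" if value is None else str(value).strip()
        if text and text not in seen:
            seen.add(text)
            result.append(text)
    return result


def athena_ids_from_csv(text: str) -> List[str]:
    """Read an LMS CSV export and keep only the athenaId column."""
    reader = csv.DictReader(io.StringIO(text))
    return _clean(row.get("athenaId") for row in reader)


def athena_ids_from_json(text: str) -> List[str]:
    """Read a JSON export (assumed to be a list of objects) the same way."""
    return _clean(record.get("athenaId") for record in json.loads(text))


roster = "name,email,athenaId\nAda,ada@example.edu,a1\nBob,bob@example.edu,b2\n"
print(athena_ids_from_csv(roster))  # ['a1', 'b2']
```

Names and email addresses are discarded at parse time, so they never reach the database; the instructor rejoins results with the original LMS export at grading time.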

Expert Evaluation

45 min
Pair-programming session
0 docs
Participants read no documentation beforehand
2 experts
One TA and one professor

In-Lab Study

Builder

TA built 3 levels of varying difficulty. Documentation was provided — no pair programming this time.

Positive Comments

  • Level building was simple and fast
  • Documentation, including docstrings, was good

Students

Students completed pre- and post-surveys in the final week of labs.

Positive Comments

  • Overall, students preferred Athena to other platforms
  • More engaging and easier to follow
  • Hints provided a good way to get help

Suggestions

  • Hints could be more specific
  • Hint load times were too long

Designed for Responsible Use

Teaching exploitation means exposing learners to real techniques. Athena's defaults are deliberately conservative.

Responsible Disclosure

All publicly released sample challenges are scoped to already-documented vulnerability classes. No zero-days, no pending disclosures — the platform never contributes to the spread of unreported flaws.

Academic Integrity

Per-user and per-team parametrized flags prevent trivial answer sharing. Instructors may selectively enable or disable this per level to fit collaborative labs versus graded work.

Constrained LLM Assistance

Hints are non-conversational, context-constrained, and grounded in creator-authored solution paths — reducing the risk of hallucinated guidance, full-answer leakage, and prompt-injection abuse.

Privacy by Default

Stored records contain only an anonymized identifier, challenge interaction metadata, and limited request history needed for hints or assessment. No demographic or personally identifying data is required.

Future Work

More LLM options

Extend the hint class to support additional providers — starting with Gemini.

More helper functions

Grow the authoring library so common patterns like auth and paging are a one-liner.

Creating more levels

Expand the shipped catalog so instructors can run a full semester out-of-the-box.

Better hint prompts

Iterate on prompt engineering to make hints sharper and more useful for stuck students.

Kubernetes deployments

Add Kubernetes manifests for safer, more scalable classroom and event deployments.