Cognitive Profiling Through Naval Combat

A Data-First Approach to Player Assessment in Njordfel Futures

Njordfel Futures · 2026

Battleship is widely regarded as a solved game with a small, tractable decision space, which makes it an effective vehicle for measuring cognitive function. This paper describes a system that actively selects ship placement strategies to isolate five cognitive dimensions, records player actions with millisecond precision, and generates AI-powered assessments from compact summaries of that data. The player owns all data collected. The system stores only actions and timestamps, which amount to kilobytes per game; all inference is computed on read and can be replaced as models improve. The data accrues value over time without structural change.

1. Why Battleship

Battleship has properties that make it unusually well-suited for cognitive measurement. Each attack is a discrete, observable decision. The decision space is bounded (100 cells, binary outcomes). The game has no hidden mechanics — every outcome is deterministic given the board state. And critically, the game requires multiple cognitive systems simultaneously: spatial reasoning, probabilistic thinking, memory, pattern recognition, and impulse control.

Most games conflate these systems. A chess mistake could be a calculation error, a positional misjudgment, or a time pressure failure. In Battleship, we can isolate variables by controlling what the player faces. The opponent's ship placement is the independent variable. The player's attack sequence is the dependent variable. Everything else is held constant.

2. The Five Probes

For each bot game, the system selects a placement strategy designed to isolate one cognitive dimension. The bot's targeting AI stays at the difficulty level the player chose; only the ship placement is overridden. The player is not told which probe is active during play, so the game feels identical.

Probe	Placement	Dimension	What It Measures
Spread	Ships in separate quadrants	Working Memory	Can the player track multiple regions and return to unresolved areas after a sink?
Gapped	No ships adjacent (including diagonals)	Inhibitory Control	After sinking a ship, does the player waste shots on adjacent cells?
Aligned	All ships same orientation	Pattern Recognition	Does the player detect the orientation pattern and adapt their search?
Stealth	Anti-probabilistic positioning	Probabilistic Reasoning	How efficiently does the player eliminate the search space?
Cluster	Ships packed in center zone	Strategic Planning	Does the player prioritize high-density regions early?

The first game a player plays uses random placement to establish a baseline efficiency score. Subsequent games rotate through probes, prioritizing the dimension with the fewest data points. This ensures balanced coverage across all five dimensions with minimal games played.

2.1 Probe Selection Algorithm

The selector counts how many times each probe type has been used for the current player. It picks the probe with the fewest samples. Ties are broken randomly. This greedy approach ensures convergence to uniform coverage without requiring a fixed sequence, which would be detectable by pattern-sensitive players.
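
The selection rule above can be sketched in a few lines. This is a minimal illustration, not the system's actual code: the function name selectProbe, the history argument, and the injectable rng parameter are assumptions introduced here; only the five probe names come from the table in Section 2.

```typescript
// Probe names come from the table in Section 2; everything else is illustrative.
const PROBES = ["spread", "gapped", "aligned", "stealth", "cluster"] as const;
type ProbeType = (typeof PROBES)[number];

// Hypothetical helper: pick the least-sampled probe, breaking ties at random.
// `history` lists the probes this player has already faced.
// `rng` is injectable so the tie-break can be made deterministic in tests.
function selectProbe(
  history: readonly ProbeType[],
  rng: () => number = Math.random,
): ProbeType {
  const counts: Record<ProbeType, number> = {
    spread: 0, gapped: 0, aligned: 0, stealth: 0, cluster: 0,
  };
  for (const p of history) counts[p] += 1;

  const fewest = Math.min(...PROBES.map((p) => counts[p]));
  const candidates = PROBES.filter((p) => counts[p] === fewest);

  // Clamp the index in case rng() ever returns exactly 1.
  const index = Math.min(
    Math.floor(rng() * candidates.length),
    candidates.length - 1,
  );
  return candidates[index]!;
}
```

Because the choice depends only on per-probe counts, the order in which probes appear differs from player to player whenever a tie occurs, which is what keeps the rotation from forming a fixed sequence.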

3. Measurement

3.1 Core Metrics

Each probe has a specific metric computed from the player's attack sequence:

Working Memory (spread): Post-sink linger — average number of consecutive shots near a recently sunk ship before moving to a new region. Fewer lingering shots indicate stronger board-wide awareness. Scored 0–100 where 0 linger = 100.
Inhibitory Control (gapped): Post-sink adjacency waste — how many of the next 6 shots after a sink land adjacent to the sunk ship's cells. With gapped placement, these are guaranteed empty. Scored 0–100 where 0 waste = 100.
Pattern Recognition (aligned): After sinking 2 ships (both same orientation), what fraction of subsequent shots follow the detected axis? Measured over a 15-shot window. Scored 0–100 where full alignment = 100.
Probabilistic Reasoning (stealth): Raw efficiency — total ship cells (17) divided by total shots fired. Against anti-probabilistic placement, efficiency directly reflects quality of elimination logic. Scored 0–100.
Strategic Planning (cluster): What percentage of the player's first 20 shots land in the center zone (rows 2–7, cols 2–7)? Higher concentration indicates deliberate high-probability targeting. Scored 0–100.
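
Three of these metrics are specific enough to sketch directly. The code below is a hedged illustration: the paper fixes only the endpoints of each 0 to 100 scale, so the linear scaling and rounding are assumptions, as are the Cell type, the function names, and the reading of rows and columns 2 to 7 as 0-indexed on a 10x10 board.

```typescript
interface Cell { row: number; col: number }

// Assumption: scores scale linearly and are rounded to whole points.
function clampScore(x: number): number {
  return Math.max(0, Math.min(100, Math.round(x)));
}

// Inhibitory control: count wasted shots among the first `window` shots
// after a sink. "Adjacent" includes diagonals, matching the gapped rule.
function adjacencyWasteScore(
  sunkCells: readonly Cell[],
  shotsAfterSink: readonly Cell[],
  window = 6,
): number {
  const next = shotsAfterSink.slice(0, window);
  const wasted = next.filter((s) =>
    sunkCells.some(
      (c) => Math.abs(c.row - s.row) <= 1 && Math.abs(c.col - s.col) <= 1,
    ),
  ).length;
  return clampScore(100 * (1 - wasted / window));
}

// Probabilistic reasoning: 17 ship cells divided by total shots fired.
const TOTAL_SHIP_CELLS = 17;
function efficiencyScore(totalShots: number): number {
  if (totalShots <= 0) return 0;
  return clampScore((100 * TOTAL_SHIP_CELLS) / totalShots);
}

// Strategic planning: share of the first `firstN` shots in the center zone.
// Assumption: rows/cols 2-7 are 0-indexed.
function centerConcentrationScore(
  shots: readonly Cell[],
  firstN = 20,
): number {
  const opening = shots.slice(0, firstN);
  if (opening.length === 0) return 0;
  const inCenter = opening.filter(
    (s) => s.row >= 2 && s.row <= 7 && s.col >= 2 && s.col <= 7,
  ).length;
  return clampScore((100 * inCenter) / opening.length);
}
```

Under these assumptions a player who needs 34 shots to sink all 17 cells scores 50 on efficiency, and one wasted shot in the six-shot window yields an inhibitory control score of 83.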

3.2 Timing Precision

Every game event is recorded with Date.now() millisecond timestamps. However, raw attack-to-attack timing in bot games is contaminated by the bot's artificial thinking delay (500–3000ms depending on difficulty). To isolate the player's actual deliberation time, we compute:

deliberation = player_attack.ts_ms - last_bot_or_system_event.ts_ms

This strips out the bot's turn entirely. What remains is the time between the player gaining control and the player acting — their actual thinking time, plus negligible network latency.
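
A minimal sketch of that subtraction over an ordered event stream follows. The GameEvent shape and its field names (actor, kind, tsMs) are assumptions made for illustration; the paper specifies only that each event carries a millisecond timestamp.

```typescript
// Illustrative event shape; field names are assumptions, not the real schema.
interface GameEvent {
  actor: "player" | "bot" | "system";
  kind: string;  // e.g. "attack", "turn_start"
  tsMs: number;  // Date.now() at record time
}

// For each player attack, subtract the timestamp of the most recent
// bot or system event. Events must already be in sequential order.
function deliberationTimes(events: readonly GameEvent[]): number[] {
  const result: number[] = [];
  let lastNonPlayerTs: number | null = null;
  for (const e of events) {
    if (e.actor !== "player") {
      lastNonPlayerTs = e.tsMs;
    } else if (e.kind === "attack" && lastNonPlayerTs !== null) {
      result.push(e.tsMs - lastNonPlayerTs);
    }
  }
  return result;
}
```

A player attack that occurs before any bot or system event has no reference point, so this sketch skips it rather than guessing a start time.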

Deliberation time is further segmented by context: post-hit versus post-miss. The ratio between these reveals cognitive processing style. A player who slows down after misses is recalculating (analytical). A player who speeds up after misses may be frustration-firing (reactive). A player with consistent timing regardless of outcome is following a predetermined strategy (systematic).
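
One way to turn that ratio into a label is sketched below. The paper does not state where the boundaries lie, so the tolerance of 15% around a ratio of 1.0 is a hypothetical threshold, and the function name classifyStyle is likewise an assumption.

```typescript
type ProcessingStyle = "analytical" | "reactive" | "systematic";

function meanOf(xs: readonly number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Compare mean deliberation after misses with mean deliberation after hits.
// `tolerance` is a placeholder: ratios within 1 +/- tolerance count as consistent.
function classifyStyle(
  postHitMs: readonly number[],
  postMissMs: readonly number[],
  tolerance = 0.15,
): ProcessingStyle | null {
  if (postHitMs.length === 0 || postMissMs.length === 0) return null;
  const hitMean = meanOf(postHitMs);
  if (hitMean <= 0) return null;

  const ratio = meanOf(postMissMs) / hitMean;
  if (ratio > 1 + tolerance) return "analytical"; // slows down after misses
  if (ratio < 1 - tolerance) return "reactive";   // speeds up after misses
  return "systematic";                            // timing roughly unchanged
}
```

Returning null when either context has no samples avoids labeling a player from a game in which they never hit, or never missed.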

4. Architecture

4.1 Separation of Data and Inference

The system has two layers that are deliberately decoupled:

Data layer (permanent): Raw actions with timestamps, stored in SQLite. Game events table records every attack, placement, ability use, and game state transition with millisecond precision and sequential ordering. Probe results store one row per game with the computed metric. Cognitive profiles store one row per player per dimension with running averages. Total storage per player after 50 games: approximately 450 KB (see Section 4.2).
Inference layer (replaceable): Assessment narratives are computed on read, not stored alongside raw data. The template-based system produces deterministic assessments from scores and timing ratios. The AI-powered system sends compact game summaries to a language model and caches the result. Either can be swapped without touching the data layer.

This separation means the raw data accrues value over time. As inference models improve, historical games can be reassessed without re-collection. The data outlives any specific model.

4.2 Storage Efficiency

Table	Rows Per Game	Bytes Per Row	Purpose
game_events	~80	~100	Raw event stream (actions + timestamps)
game_sessions	1	~300	Game metadata
probe_results	1	~80	Probe type + computed metric
cognitive_profiles	0–1	~60	Running average per dimension
game_assessments	0–1	~300	Cached AI narrative

A complete game with AI assessment: approximately 9 KB. A player's full profile after 50 games: approximately 450 KB, dominated by the raw event stream. The event stream is the investment — everything else is derived.
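
The totals can be checked against the table with a short calculation. The figures below are the approximate values from the table, taking the upper bound of one row for the tables listed as 0–1 and treating 1 KB as 1000 bytes; both choices are assumptions of this sketch.

```typescript
interface TableEstimate {
  table: string;
  rowsPerGame: number;
  bytesPerRow: number;
}

// Approximate figures copied from the table in Section 4.2.
const ESTIMATES: TableEstimate[] = [
  { table: "game_events", rowsPerGame: 80, bytesPerRow: 100 },
  { table: "game_sessions", rowsPerGame: 1, bytesPerRow: 300 },
  { table: "probe_results", rowsPerGame: 1, bytesPerRow: 80 },
  { table: "cognitive_profiles", rowsPerGame: 1, bytesPerRow: 60 }, // upper bound of 0-1
  { table: "game_assessments", rowsPerGame: 1, bytesPerRow: 300 },  // upper bound of 0-1
];

const bytesPerGame = ESTIMATES.reduce(
  (sum, t) => sum + t.rowsPerGame * t.bytesPerRow,
  0,
);
const bytesAfter50Games = bytesPerGame * 50;

console.log(`${(bytesPerGame / 1000).toFixed(1)} KB per game`);         // 8.7 KB per game
console.log(`${(bytesAfter50Games / 1000).toFixed(0)} KB after 50 games`); // 437 KB after 50 games
```

The event stream contributes 8000 of the 8740 bytes per game, which is why the rounded totals of 9 KB and 450 KB are dominated by game_events. Multiplying the cognitive_profiles row by 50 slightly overstates its share, since those rows are stored per player per dimension rather than per game.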

5. AI Assessment

After each completed bot game, the system sends a compact summary to a language model (Claude) containing: outcome, shot count, accuracy, probe type and score, deliberation timing (average, post-hit, post-miss, peak), and longest miss streak. The model returns a 3–4 sentence assessment addressing the player directly.

The assessment is cached with the model identifier. When a better model becomes available, cached assessments can be regenerated from the same raw data. The model column in the assessment table records provenance, so old and new assessments are distinguishable.

Without an API key configured, the system falls back to template-based narratives — deterministic, structured assessments derived from score thresholds and timing ratios. These are less nuanced but cost nothing and respond instantly.
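
The caching and fallback behavior can be sketched as follows. This is an in-memory stand-in under stated assumptions: the real system stores assessments in SQLite and calls the model over the network, whereas this sketch uses a plain object and a synchronous generate callback. The names AssessmentCache, AiBackend, and getAssessment, the "template" model label, and the score thresholds of 70 and 40 are all hypothetical.

```typescript
interface Assessment { gameId: string; model: string; text: string }

// Stand-in for the assessment table. The key includes the model identifier,
// so assessments from different models coexist and stay distinguishable.
class AssessmentCache {
  private rows: Record<string, Assessment> = {};
  get(gameId: string, model: string): Assessment | undefined {
    return this.rows[gameId + "::" + model];
  }
  put(a: Assessment): void {
    this.rows[a.gameId + "::" + a.model] = a;
  }
}

// Hypothetical template; thresholds and wording are placeholders.
function templateNarrative(probeScore: number): string {
  if (probeScore >= 70) return "Strong result on this probe.";
  if (probeScore >= 40) return "Developing result on this probe.";
  return "This probe exposed room to improve.";
}

interface AiBackend {
  model: string;
  generate: (probeScore: number) => string; // synchronous for brevity
}

function getAssessment(
  cache: AssessmentCache,
  gameId: string,
  probeScore: number,
  ai?: AiBackend,
): Assessment {
  // No backend configured: deterministic template, computed on read.
  if (!ai) {
    return { gameId, model: "template", text: templateNarrative(probeScore) };
  }
  const cached = cache.get(gameId, ai.model);
  if (cached) return cached;

  const fresh: Assessment = {
    gameId,
    model: ai.model,
    text: ai.generate(probeScore),
  };
  cache.put(fresh);
  return fresh;
}
```

Keying the cache on both game and model means that switching to a newer model triggers regeneration without overwriting the earlier assessment.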

5.1 What the AI Sees

The prompt contains only aggregate statistics, never raw event streams. This keeps token costs negligible (under 1000 tokens per assessment) while providing enough signal for specific, actionable feedback. The model is instructed to reference timing patterns, address adversity (miss streaks), and avoid flattery. The player's data belongs to them; the model is a tool providing insight.

6. Data Ownership

The cognitive profile is the player's property. The system collects actions and timestamps — the minimum necessary to reconstruct gameplay and derive cognitive metrics. No data is aggregated across players. No data is sold. No data is used for purposes beyond player insight.

The player can access their full cognitive profile, per-game analysis, and AI assessments through their profile page. Every metric is transparent: what was measured, how it was scored, and what the conditions were. The probe type is visible in the game analysis so the player understands why a particular cognitive dimension was tested.

7. Thematic Framing

Each cognitive dimension is presented with Norse mythological naming to integrate with the game's Viking aesthetic:

Dimension	Norse Name	Origin
Working Memory	Huginn's Recall	Huginn, Odin's raven of thought and memory
Inhibitory Control	Warrior's Discipline	The restraint of a seasoned warrior
Pattern Recognition	Seer's Sight	The Völva's ability to perceive hidden patterns
Probabilistic Reasoning	Norns' Wisdom	The Norns who weave fate and probability
Strategic Planning	Jarl's Command	The Jarl's capacity for strategic direction

This framing serves a purpose beyond aesthetics. It gives players a vocabulary for discussing cognitive function that feels empowering rather than clinical. "Your Warrior's Discipline is developing" lands differently than "Your inhibitory control scored 45."

8. Limitations

Sample size. A stable cognitive profile requires approximately 3 samples per dimension (15–18 games total). With only 1–2 games, individual scores are noisy and should be interpreted as directional rather than precise.
Probe independence. Probes are not fully orthogonal. A spread placement tests working memory primarily, but also requires strategic planning to search efficiently. Cross-contamination between dimensions is inherent and acknowledged.
Timing contamination. Deliberation timing includes network round-trip latency (typically 10–50ms), which is negligible relative to human decision time (typically 1–10 seconds). Fire spread events add a fixed 800ms delay on affected turns, which inflates deliberation for those specific shots.
Bot awareness. Experienced players may detect that placement patterns change between games. This does not invalidate the measurements — the player's response to the placement is the signal, whether or not they consciously identify the probe.
Single-game variance. Luck affects any individual game. A player who happens to hit on their first shot looks efficient regardless of strategy. Running averages across multiple games converge toward the player's true cognitive profile.
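
The running average referred to under single-game variance can be maintained incrementally, one row per dimension, without re-reading earlier games. The sketch below assumes a plain arithmetic mean; the paper does not say whether the stored average is weighted, and the DimensionProfile shape and addSample name are illustrative.

```typescript
// Illustrative shape for one cognitive_profiles row.
interface DimensionProfile {
  samples: number;
  average: number;
}

// Incremental mean: fold one new probe score into the stored average.
function addSample(
  profile: DimensionProfile,
  score: number,
): DimensionProfile {
  const samples = profile.samples + 1;
  const average = profile.average + (score - profile.average) / samples;
  return { samples, average };
}

// Example: three games on one dimension scoring 40, 60, and 80.
let workingMemory: DimensionProfile = { samples: 0, average: 0 };
for (const score of [40, 60, 80]) {
  workingMemory = addSample(workingMemory, score);
}
console.log(workingMemory); // { samples: 3, average: 60 }
```

With only one or two samples the average moves by a large fraction of each new score, which is the noise described under sample size.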

9. Future Directions

Trend detection. Scoring slopes across games to identify improvement or decline per dimension. "You improved from 35 to 72 in inhibitory control over your last 5 games" is more actionable than a static score.
Adaptive probing. Instead of uniform probe rotation, weight toward dimensions with high variance or recent decline, focusing measurement where it's most informative.
Cross-game cognitive fingerprinting. If the same cognitive profiling system is deployed across multiple games in a portfolio, cross-game correlation could reveal which cognitive functions transfer and which are domain-specific.
Human-vs-human cognitive contrast. In multiplayer games, both players' attack sequences are recorded. Comparing cognitive profiles between opponents adds a competitive analytics dimension.
Model-over-model reassessment. As language models improve, historical games can be reassessed with newer models. The model column in assessment storage enables A/B comparison of assessment quality across model generations.
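
As an illustration of the trend detection idea, a per-dimension slope could be estimated with an ordinary least-squares fit of score against game index. This is a sketch of one possible approach, not a description of anything the system currently does; the function name scoreSlope is an assumption.

```typescript
// Least-squares slope of score versus game index (0, 1, 2, ...).
// The result is in score points per game; positive means improvement.
function scoreSlope(scores: readonly number[]): number {
  const n = scores.length;
  if (n < 2) return 0; // a slope needs at least two points

  const meanX = (n - 1) / 2;
  const meanY = scores.reduce((a, b) => a + b, 0) / n;

  let numerator = 0;
  let denominator = 0;
  scores.forEach((y, x) => {
    numerator += (x - meanX) * (y - meanY);
    denominator += (x - meanX) * (x - meanX);
  });
  return numerator / denominator;
}

console.log(scoreSlope([35, 44, 53, 62, 71])); // 9
```

A fitted slope uses every game in the window, so it is less sensitive to a single lucky or unlucky game than comparing only the first and last scores.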

10. Conclusion

A game that most people consider solved turns out to be a remarkably clean cognitive measurement instrument when the placement strategy is treated as an experimental variable. The key insight is that the player doesn't need to know they're being assessed — the assessment emerges naturally from how they play. By recording actions and timestamps in kilobytes and computing all inference on read, the system creates a permanent, portable cognitive record that improves as assessment models improve. The data belongs to the player. The system is the tool.

NJORDFEL FUTURES · njordfellfutures.com