Match capture tool to content type — text for words, voice for movement, photos for spatial
Match capture modality to information structure: use text for sequential verbal content, voice when hands are occupied, and photographs for spatial or visual information.
Why This Is a Rule
Information has structure, and capture modalities have strengths. When the modality matches the structure, capture is fast and faithful. When it doesn't, you either lose information during translation or spend unnecessary effort converting between formats.
Text excels at sequential verbal content: arguments, lists, instructions, explanations. It's searchable, editable, and composable. Voice excels when hands are occupied (walking, driving, cooking) and when the insight is flowing faster than you can type. Speaking at roughly 150 words per minute captures associative context that typing at roughly 40 words per minute truncates. Photographs excel at spatial and visual information (whiteboard diagrams, physical arrangements, handwritten sketches, error messages on screens), where textual description would be lossy and slow.
Most people default to a single modality (usually text) regardless of context. As a result, they either skip captures that don't fit it (insights that arrive during a walk are never captured) or translate inefficiently (spending minutes describing a diagram they could photograph in seconds).
When This Fires
- Deciding how to capture an insight, observation, or piece of information
- Noticing that certain types of information consistently get lost
- Setting up capture workflows for different contexts (desk, commute, meetings)
- Any situation where the default capture method feels slow or lossy
Common Failure Mode
Defaulting to text for everything, including spatial and visual information. Describing a system architecture diagram in words takes minutes and still loses structural relationships. Photographing it takes seconds and preserves the layout as drawn. Similarly, typing notes while walking is slow and dangerous; voice capture is faster and preserves the associative flow.
The Protocol
Before capturing, ask: "What is the structure of this information?"
- Sequential/verbal (an argument, a list, instructions) → text.
- Associative/flowing (a stream of connected ideas, hands occupied) → voice.
- Spatial/visual (a diagram, a physical layout, a screen state) → photograph.
- Mixed → use the dominant modality and supplement. A voice memo with a photo of the whiteboard captures both the verbal reasoning and the spatial structure.
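The protocol's question can be sketched as a small decision function. This is only an illustration: the `Capture` fields, the `choose_modalities` name, and the ordering used to pick a "dominant" modality in mixed cases are assumptions made for the example, not part of the rule itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Modality(Enum):
    TEXT = "text"
    VOICE = "voice"
    PHOTO = "photograph"


@dataclass
class Capture:
    """Hypothetical description of the information about to be captured."""
    sequential: bool = False  # an argument, a list, instructions
    flowing: bool = False     # a stream of connected ideas
    hands_busy: bool = False  # walking, driving, cooking
    spatial: bool = False     # a diagram, a physical layout, a screen state


def choose_modalities(c: Capture) -> List[Modality]:
    """Return the suggested modalities, dominant one first (assumed ordering)."""
    picks: List[Modality] = []
    # Voice leads when hands are occupied or ideas are flowing.
    if c.hands_busy or c.flowing:
        picks.append(Modality.VOICE)
    # Spatial or visual structure is photographed rather than described.
    if c.spatial:
        picks.append(Modality.PHOTO)
    # Text covers sequential content, but only when typing is possible.
    if c.sequential and not c.hands_busy:
        picks.append(Modality.TEXT)
    # With no signal at all, fall back to text as the usual default.
    return picks or [Modality.TEXT]


# Mixed case from the protocol: verbal reasoning plus a whiteboard.
whiteboard_session = choose_modalities(Capture(flowing=True, spatial=True))
```

Here `whiteboard_session` comes out as voice supplemented by a photograph, which matches the mixed example in the protocol. In practice the judgment is made in your head in a second or two; the sketch only makes the branching explicit.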
Source Lessons
Photograph as capture
A photo of a whiteboard, sketch, or physical artifact is a legitimate capture method — and for spatial, visual, or environmental information, it is the superior one.
Multiple capture channels prevent loss
Having more than one way to capture thoughts reduces the chance of losing important ones. A single capture tool creates a single point of failure in your thinking infrastructure.