About This Archive
This is a searchable record of the Speak Your Truth! Crew’s work — performances, speakers, and conversations spanning mental health advocacy events in Annapolis, Pittsburgh, and online.
Rather than presenting a flat list of YouTube videos, the archive breaks every minute of recorded content into named narrative segments. Each segment is tagged with the people involved, the themes discussed, and, for clinically relevant moments, the CHIME-D framework categories that peer-support researchers use to measure recovery and impact. Every tag is traceable back to the exact spoken moment that justifies it.
What’s in it
- 250 narrative segments drawn from SYT’s catalog of events and video submissions
- 112 speakers, performers, and organizations — each with a dedicated page showing every segment they appear in
- 9 themes gathering every segment that touches them: anxiety, grief, addiction recovery, stigma reduction, community healing, and more
- 8 SYT events spanning the archive’s time range
- CHIME-D categorization — Connectedness, Hope, Identity, Meaning, Empowerment, Difficulties — applied to relevant segments with supporting evidence quotes, making the archive primary-source material for funders, researchers, and clinical peers
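To show what "traceable" means in practice, here is a minimal sketch of a single segment record. The class and field names (`Segment`, `Evidence`, `start_s`, `chime_d`, and so on) are hypothetical and are not the archive's actual schema. The sketch only illustrates one rule: every CHIME-D tag must carry a verbatim quote spoken within the segment's time range.

```python
from dataclasses import dataclass, field

# The six CHIME-D categories listed above.
CHIME_D = {"Connectedness", "Hope", "Identity", "Meaning", "Empowerment", "Difficulties"}


@dataclass
class Evidence:
    quote: str   # verbatim transcript text
    at_s: float  # seconds into the source video where the quote is spoken


@dataclass
class Segment:
    # Hypothetical schema, for illustration only.
    video_id: str
    title: str
    start_s: float
    end_s: float
    people: list[str] = field(default_factory=list)
    themes: list[str] = field(default_factory=list)
    # Maps each CHIME-D category to the quotes that justify tagging it.
    chime_d: dict[str, list[Evidence]] = field(default_factory=dict)

    def validate(self) -> None:
        """Reject unknown categories, tags without quotes, or quotes outside the segment."""
        for category, evidence in self.chime_d.items():
            if category not in CHIME_D:
                raise ValueError(f"unknown CHIME-D category: {category}")
            if not evidence:
                raise ValueError(f"{category} has no supporting quote")
            for ev in evidence:
                if not (self.start_s <= ev.at_s <= self.end_s):
                    raise ValueError(f"evidence for {category} falls outside the segment")
```

Running `validate()` at write time means a tag without a justifying quote never reaches the published site.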
What SYT can do with it
- Ground grant narratives in evidence. The CHIME-D tagging means any claim about impact can be supported with specific, attributed quotes — not anecdote, but verbatim testimony from the community.
- Find the right clip, fast. Every segment’s themes and timestamps are queryable. Pulling material for social posts, newsletters, or partner features goes from hours of footage-scrubbing to minutes of filtering.
- Offer a citable resource to researchers. Peer-support academics studying community-based mental health advocacy now have a structured, clinically aligned corpus to reference.
- See connections across time. The graph view surfaces patterns that aren’t visible one segment at a time — which themes co-occur, which speakers span the widest range of experience, how the collective’s voice has grown.
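The function below sketches what that filtering could look like. It assumes segments have been loaded as plain dictionaries (the keys are hypothetical) and returns every quote tagged with a given theme and CHIME-D category. Each quote comes back as an attributed line with a timestamped YouTube link, which is the kind of citation a grant narrative needs. YouTube's `t` URL parameter is real; the function name and data layout are illustrative.

```python
def quotes_for(segments: list[dict], theme: str, category: str) -> list[str]:
    """Return attributed, timestamped quotes matching a theme and CHIME-D category.

    Hypothetical data layout: each segment has "video_id", "themes", and "chime_d",
    where "chime_d" maps a category to a list of {"quote", "at_s", "speaker"} dicts.
    """
    results = []
    for seg in segments:
        if theme not in seg["themes"]:
            continue
        for ev in seg["chime_d"].get(category, []):
            t = int(ev["at_s"])  # YouTube's t= parameter takes whole seconds
            url = f"https://www.youtube.com/watch?v={seg['video_id']}&t={t}s"
            speaker = ev.get("speaker", "Unattributed")
            results.append(f'"{ev["quote"]}" ({speaker}, {url})')
    return results
```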
A note on ownership
SYT owns the source content and the system that processes it. The transcripts, metadata, and pipeline code are all version-controlled and preserved independently of any platform — if YouTube changes policy or the hosting provider goes away, the archive’s text and structure keep working because they’re just files on disk.
The one remaining dependency is the videos themselves, which still live on YouTube. Closing that gap is the first item in the future-work section below.
Potential future work
This archive is a starting point, not a finished product. A few directions under active consideration:
Media archival and video redundancy
The transcripts and metadata are preserved independently of any platform, but the videos themselves still live on YouTube. If a video is removed from the platform, re-encoded, age-restricted, or made private, the archive’s text remains but the source footage is gone. Closing this gap takes a three-point redundancy strategy:
- Carbon Works mirror. Download every video in full resolution to Carbon Works-operated storage — a working redundant copy the pipeline can fall back to if YouTube becomes unreachable. Audio-only backups exist today (MP3 on encrypted cold storage); extending to full video is the next step.
- SYT-owned master. Store a second copy in storage SYT itself controls — whether a physical drive in the Crew’s hands, an SYT-owned cloud bucket, or both. This ensures the collective doesn’t depend on Carbon Works either; the archive outlives any single organization.
- Restoration path. With the videos mirrored, the embedded-video feature (below) can be upgraded to serve from SYT infrastructure when YouTube is down — or to swap back to YouTube once it recovers. Viewers see continuity even when the platform has an outage.
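In code, the restoration path reduces to a small source-selection rule. This is a minimal sketch with several assumptions. `pick_source` is a hypothetical helper. The mirror base URL is a placeholder, not a real SYT endpoint. The availability check is passed in by the caller, because how YouTube's reachability would be probed is not specified here.

```python
from typing import Callable


def pick_source(
    video_id: str,
    youtube_up: Callable[[str], bool],
    mirror_base: str = "https://media.example.org/syt",  # placeholder, not a real endpoint
) -> str:
    """Prefer the YouTube embed; fall back to the SYT-controlled mirror when unreachable."""
    if youtube_up(video_id):
        return f"https://www.youtube.com/embed/{video_id}"
    return f"{mirror_base}/{video_id}.mp4"
```

Because the check runs on every call, swapping back to YouTube once it recovers requires no extra logic.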
Imagery
The site is text-only today. Two complementary paths:
- Curated imagery — performers, hosts, and event photos added manually or via an intake workflow, serving as hero images on segment and event pages.
- Automated imagery — extract relevant frames from the source videos themselves: YouTube thumbnail pulls for each video, plus programmatic keyframe extraction (or vision-model selection) at each narrative segment’s start time for representative stills.
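The automated path could start from two well-known building blocks: YouTube's public thumbnail URL, and an ffmpeg seek-and-grab at each segment's start time. The sketch assumes ffmpeg is installed and that a local copy of the video exists (for example, the full-resolution mirror proposed earlier). The helper names and paths are hypothetical, and vision-model selection is not shown.

```python
import subprocess


def thumbnail_url(video_id: str) -> str:
    # YouTube's public thumbnail endpoint; hqdefault is generally present,
    # while maxresdefault is not guaranteed for every video.
    return f"https://img.youtube.com/vi/{video_id}/hqdefault.jpg"


def keyframe_cmd(video_path: str, start_s: float, out_path: str) -> list[str]:
    """ffmpeg argv that grabs one representative still at a segment's start time."""
    return [
        "ffmpeg", "-y",          # overwrite an existing still
        "-ss", f"{start_s:.3f}", # seeking before -i is fast (input seeking)
        "-i", video_path,
        "-frames:v", "1",        # write exactly one video frame
        "-q:v", "2",             # high JPEG quality
        out_path,
    ]


def extract_keyframe(video_path: str, start_s: float, out_path: str) -> None:
    # Raises CalledProcessError if ffmpeg fails (e.g. missing file).
    subprocess.run(keyframe_cmd(video_path, start_s, out_path), check=True)
```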
Embedded video
Each segment page could embed the YouTube player directly, cued to the segment’s exact start time, so visitors watch the clip inline without leaving the archive. Event pages could show the full lineup as a single chaptered player.
Language and voice
The copy on this site is a starting draft, assembled with the help of an AI assistant. It should be reviewed with the Crew and updated to reflect SYT’s own voice: crisis-resource recommendations, theme names, entry-point suggestions, and general register. The current About framing is pitched toward partners and researchers; visitors who come because of their own experiences may need a different register.
Editorial curation and per-type enrichment
The archive today is fully automated end-to-end. Hand-tuned curation on top would meaningfully lift quality:
- Editorial rules over the automated output. Short housekeeping announcements, intermission-music segments, and MC-only handoffs could be suppressed (or demoted). Representative anchor performances could be promoted to hero treatment. A small editorial ruleset layered on top of the LLM output lets SYT’s voice override the automation where it matters — reducing noise and sharpening the knowledge graph’s discoverability at the same time.
- Richer hub pages per content type. Event pages now carry video lists, speakers, themes, and dates. The same treatment extends naturally:
  - People pages: short bios, representative quote, event history, roles within the Crew, associated themes
  - Theme pages: a definition or framing paragraph, top voices on the theme, CHIME-D framework alignment notes, suggested entry segments
  - Segment pages: embedded video (covered above), tag rationale, related-segment suggestions, crisis-resource contextual callouts where the content warrants them
Each of these is individually small; taken together, they shift the site from “searchable record” to “curated archive.”
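One way to keep the editorial ruleset small is to express each rule as a predicate plus an action. In this sketch, the segment keys (`kind`, `duration_s`, `id`) and the rules themselves are hypothetical, modeled on the cases named above; they are not an existing SYT configuration. Hand-picked promotions deliberately override the automated rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]
    action: str  # "suppress", "demote", or "promote"


# Hypothetical ruleset illustrating the cases described above.
RULES = [
    Rule("short housekeeping", lambda s: s["kind"] == "announcement" and s["duration_s"] < 60, "suppress"),
    Rule("intermission music", lambda s: s["kind"] == "intermission", "suppress"),
    Rule("MC-only handoff", lambda s: s["kind"] == "mc_handoff", "demote"),
]


def apply_rules(segments: list[dict], rules: list[Rule] = RULES,
                promoted_ids: frozenset = frozenset()) -> list[dict]:
    """Drop suppressed segments; order the rest as promoted, normal, then demoted."""
    rank = {"promote": 0, None: 1, "demote": 2}
    kept = []
    for seg in segments:
        # First matching rule wins; None means no rule applies.
        action = next((r.action for r in rules if r.matches(seg)), None)
        if seg["id"] in promoted_ids:
            action = "promote"  # an explicit editorial pick overrides automated rules
        if action == "suppress":
            continue
        kept.append((rank[action], seg))
    kept.sort(key=lambda pair: pair[0])  # stable sort keeps original order within a rank
    return [seg for _, seg in kept]
```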
More archive work
- Accessibility: the transcripts can be exported as .srt captions and uploaded to the original YouTube videos, adding accurate closed captions where YouTube’s auto-captions are imperfect. One-time conversion, ongoing win for hard-of-hearing viewers.
- Speaker diarization: some segments have multiple speakers that are currently conflated. Per-speaker attribution at the transcript level would improve both clip-grabbing and clinical use.
- Cross-event identity: the reconciliation pass already consolidates many cases of the same person appearing in multiple videos, but edge cases remain (performers who only appear once, or whose names are spelled inconsistently across events).
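Converting transcript timestamps to SubRip (.srt), a caption format YouTube accepts for upload, is mostly timestamp arithmetic. Here is a minimal sketch, assuming the transcript is available as `(start_s, end_s, text)` tuples. It emits one cue per tuple and does not split long lines, which real captions may need.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues) -> str:
    """cues: iterable of (start_s, end_s, text) tuples, e.g. transcript segments."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        # Each cue: 1-based index, time range, caption text, then a blank line.
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(blocks)
```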
For the technically curious
The pipeline is a five-stage batch process:
- Download audio-only from YouTube using yt-dlp, rate-limited to respect platform norms.
- Transcribe using OpenAI’s Whisper large-v3 model (via faster-whisper 1.1.0, int8 quantized) for word-level timestamps.
- Segment and enrich with Anthropic’s Claude Sonnet 4.6 in two passes: first identifying narrative boundaries, then producing per-segment metadata including themes, entities, CHIME-D categorization, and evidence quotes.
- Reconcile within each video (cross-segment consistency) and then across the whole corpus (identity resolution: the same person appearing in multiple events is a single node in the graph, not many).
- Publish to a static site built with Quartz 4 and deployed on Cloudflare Pages.
The site isn’t vanilla Quartz — three small, well-scoped patches and two custom components layer on top of the upstream project to shape the graph view and tag-page listings to this archive’s needs. The patches are documented at docs/quartz-patches.md in the repo. It’s a conscious trade-off: every patch is re-ported by hand on a Quartz upgrade, but the edits are small and stable, and Quartz itself is pinned to a specific upstream commit so nothing shifts underneath us between builds.
Stack summary
| Layer | Component | Version |
|---|---|---|
| Pipeline runtime | Python | 3.11.15 |
| Service orchestration | Docker Compose | — |
| State coordination | SQLite | — |
| Audio extraction | yt-dlp | 2026.03.17 |
| Speech recognition | faster-whisper (large-v3, int8) | 1.1.0 |
| Whisper runtime | CTranslate2 | 4.4.0 |
| Segment & enrichment LLM | Claude Sonnet 4.6 | — |
| Static site | Quartz 4 | — |
| Hosting | Cloudflare Pages | — |
| Cold archive | LUKS-encrypted SSD + external ext4 drive | — |
Every run emits a run-config.json recording exact tool versions and SHA-256 hashes of the taxonomy, entity registry, and prompts, so any output file can be traced back to the precise configuration that produced it. The codebase itself is tracked in git and designed to be handed off.
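Here is a sketch of how such a file could be produced with only the standard library. The function name, the input file names, and the JSON layout are assumptions. The source says only that the file records tool versions and SHA-256 hashes of the taxonomy, entity registry, and prompts.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large inputs don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def write_run_config(out: Path, versions: dict[str, str], inputs: dict[str, Path]) -> dict:
    """Record tool versions and input-file hashes (hypothetical layout) and write them as JSON."""
    config = {
        "versions": versions,
        "sha256": {name: sha256_file(p) for name, p in inputs.items()},
    }
    out.write_text(json.dumps(config, indent=2, sort_keys=True) + "\n")
    return config
```

Because of `sort_keys=True`, two runs with identical inputs produce byte-identical files, which keeps diffs in git meaningful.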
Pipeline and archive design by Carbon Works LLC
Making sense of technology — community-rooted in the Route 3 corridor.