Session Replay
Discover a user's Mixpanel Session Replay recordings, fetch the raw rrweb event stream from the signed CDN, and project the sessions into analysis-ready pandas DataFrames plus an LLM-friendly action timeline — all without leaving Python or the shell.
The high-leverage type is ReplayBundle
A ReplayBundle is a collection of replays with cross-session projections. A single Replay is conceptually a bundle of size one, and the API treats them the same way — every DataFrame projection available on a bundle is available on a replay.
When to Use It
Session replay answers "what did this user actually do?" — the click-by-click story behind an analytics number. Reach for it when you need to:
- Pull a specific user's recent sessions and read the timeline (replays_for_user).
- Correlate a tracked Mixpanel event with the on-screen actions around it (include_mixpanel_events).
- Rank the most-clicked elements, find rage-click bursts, or surface sessions with console errors across many replays.
- Export the raw rrweb stream to feed Mixpanel's JS player or your own tooling (to_rrweb_player_json).
The surface is built on the same signed-CDN endpoints Mixpanel's own MCP server uses. It does not persist anything to disk — signed URLs are time-bounded bearer credentials handled in process.
Getting Started
The one-call path — discover a user's replays, fetch them, and join the Mixpanel events that fired during each session:
```python
import mixpanel_headless as mp

ws = mp.Workspace()
bundle = ws.replays_for_user(
    "user-42",
    from_date="2025-01-01",
    to_date="2025-01-31",
)

# One row per session: duration, action/click/error counts, entry/exit URL
print(bundle.sessions_df)

# The LLM-friendly action timeline for the first replay
print(bundle.replays[0].summary_markdown)
```
replays_for_user defaults to limit=20 (each replay materializes its full byte stream, so fetching is byte-heavy) and include_mixpanel_events=True. Raise limit deliberately, or drop down to list_replays + stream_replay for large sweeps.
Discovery
list_replays issues a single Insights query against $mp_session_record and returns lightweight ReplaySummary handles (no bytes fetched). Discover by user and date window, or hydrate an explicit list of IDs:
```python
# By user (from_date / to_date required)
summaries = ws.list_replays(
    distinct_id="user-42",
    from_date="2025-01-01",
    to_date="2025-01-31",
    limit=100,
)
for s in summaries:
    print(s.replay_id, s.start_time, s.retention_days)

# Or hydrate explicit replay IDs (no distinct_id needed)
summaries = ws.list_replays(replay_ids=["0190ebde-d50d-71b1-804c-ec1b4a533ef9"])
```
An empty window returns an empty list — it never raises. Each summary carries the per-replay retention_days (read from $mp_replay_retention_period, defaulting to 30 with a warning when the property is absent).
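The retention fallback described above comes down to a lookup with a default. Here is a minimal sketch, assuming the property arrives as an integer or a numeric string. retention_days_from is a hypothetical helper and the warning text is illustrative; this is not the library's code.

```python
import warnings

DEFAULT_RETENTION_DAYS = 30  # default stated in the docs


def retention_days_from(properties: dict) -> int:
    """Hypothetical helper: read $mp_replay_retention_period, else warn and use 30."""
    value = properties.get("$mp_replay_retention_period")
    if value is None:
        # Property absent: fall back to the documented default, but say so
        warnings.warn(
            "$mp_replay_retention_period missing; assuming 30 days",
            stacklevel=2,
        )
        return DEFAULT_RETENTION_DAYS
    return int(value)


print(retention_days_from({"$mp_replay_retention_period": 90}))  # 90
```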
Fetching a Single Replay
fetch_replay signs the replay, walks the CDN files in parallel, runs the vendored rrweb analyzer, and returns a fully materialized Replay:
```python
replay = ws.fetch_replay(
    "0190ebde-d50d-71b1-804c-ec1b4a533ef9",
    include_mixpanel_events=True,  # optional Mixpanel-event join
)
print(replay.duration_seconds)   # 2769.0
print(len(replay.rrweb_events))  # raw rrweb events
print(replay.summary_markdown)   # action timeline
print(replay.page_path())        # navigation URL sequence

# Raw rrweb JSON, timestamp-sorted, ready for the rrweb JS player
player_json = replay.to_rrweb_player_json()
```
Pass retention_days= to skip the retention-discovery round trip when you already know it, and distinct_id= to stamp the owning user onto the returned Replay (replays_for_user does this for you).
Streaming large recordings
For long sessions where you don't want the whole byte stream in memory at once, stream_replay yields rrweb events one batch at a time and transparently re-signs the URL if it expires mid-walk.
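The benefit of batch-wise streaming is that aggregates can be built incrementally, so memory stays bounded by a single batch. The sketch below assumes each batch is a list of rrweb event dicts carrying the standard numeric "type" field. The exact stream_replay signature and batch shape are assumptions, so check the API reference. fake_batches is a hypothetical stand-in for the real stream, and the type names come from rrweb's EventType enum.

```python
from collections import Counter
from collections.abc import Iterable, Iterator

# rrweb EventType codes (rrweb's public enum)
EVENT_TYPE_NAMES = {
    0: "DomContentLoaded", 1: "Load", 2: "FullSnapshot",
    3: "IncrementalSnapshot", 4: "Meta", 5: "Custom", 6: "Plugin",
}


def count_event_types(batches: Iterable[list[dict]]) -> Counter:
    """Tally rrweb event types while holding only one batch in memory."""
    counts: Counter = Counter()
    for batch in batches:
        for event in batch:
            counts[EVENT_TYPE_NAMES.get(event.get("type"), "Unknown")] += 1
    return counts


def fake_batches() -> Iterator[list[dict]]:
    """Hypothetical stand-in for the generator returned by stream_replay."""
    yield [{"type": 4, "timestamp": 0}, {"type": 2, "timestamp": 1}]
    yield [{"type": 3, "timestamp": 2}, {"type": 3, "timestamp": 3}]


print(count_event_types(fake_batches()))
```

With the real API, you would pass the stream_replay generator in place of fake_batches(). The counting loop does not care where the batches come from.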
DataFrame Projections
A ReplayBundle (and a single Replay) exposes long-format projections keyed by replay_id. bundle.df defaults to sessions_df.
| Projection | Grain | Key columns |
|---|---|---|
| sessions_df | one row per replay | replay_id, distinct_id, start_time, end_time, duration_s, retention_days, n_events, n_actions, n_clicks, n_inputs, n_pages, n_errors, n_mp_events, entry_url, exit_url |
| actions_df | one row per normalized action | replay_id, t, action, target_node_id, target_desc, description, url, metadata |
| events_df | one row per raw rrweb event | replay_id, t, type, source, mouse_type, target_node_id, url, raw |
| mixpanel_df | one row per Mixpanel event in the replay window | replay_id, t, event_name, properties |
| elements_df | one row per (target_desc, normalized_url) | target_desc, url, n_clicks, n_unique_replays |
```python
# Feed directly into DuckDB, or any pandas workflow
import duckdb

duckdb.from_df(bundle.actions_df).aggregate("action, count(*)", "action").show()
```
The description column on actions_df is the analyzer's full human-readable phrase ('Clicked button "Sign in"', 'Scrolled', 'Console error: …'); target_desc is the bare element label.
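Every projection shares the replay_id key, so the frames combine with ordinary pandas joins. The minimal sketch below uses toy frames. Only the column names come from the table above; the values and action names are hypothetical. It pivots a long actions table into per-replay counts and lines those counts up with the session rows.

```python
import pandas as pd

# Hypothetical toy frames using a subset of the documented columns
sessions = pd.DataFrame({
    "replay_id": ["r1", "r2"],
    "distinct_id": ["user-42", "user-42"],
    "duration_s": [120.0, 45.0],
})
actions = pd.DataFrame({
    "replay_id": ["r1", "r1", "r1", "r2"],
    "t": [10, 12, 40, 5],
    "action": ["click", "scroll", "click", "navigate"],
})

# Long -> wide: one row per replay, one column per action type
per_replay = pd.crosstab(actions["replay_id"], actions["action"]).reset_index()

# Shared replay_id key: a left merge keeps every session row;
# fillna(0) covers replays that recorded no actions at all
wide = sessions.merge(per_replay, on="replay_id", how="left").fillna(0)
print(wide)
```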
The Action Timeline
summary_markdown renders a compact, LLM-friendly timeline — one line per action as {timestamp_seconds}: {description}, with consecutive duplicate actions collapsed into a (×N) suffix so a re-rendering data grid doesn't flood the output:
```text
1779693081: Navigated to https://app.example.com/boards
1779693457: Focused mp-button "hor-ellipsis"
1779693459: Clicked div in li "Refresh Data"
1779693483: Scrolled (×3)
```
bundle.summary_markdown concatenates the per-replay timelines with a totals header. The mp replays analyze CLI command renders the same output.
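The duplicate collapsing is a run-length encoding over consecutive descriptions. A minimal sketch with itertools.groupby follows. collapse_runs is a hypothetical function, and keeping the first timestamp of each run is an assumption that matches the sample output. The two intermediate "Scrolled" timestamps are invented for illustration.

```python
from itertools import groupby


def collapse_runs(lines: list[tuple[int, str]]) -> list[str]:
    """Collapse consecutive identical descriptions into one line with a (×N) suffix.

    Each run keeps the timestamp of its first occurrence.
    """
    out = []
    # groupby only merges *adjacent* equal keys, which is exactly "consecutive"
    for description, run in groupby(lines, key=lambda pair: pair[1]):
        run = list(run)
        suffix = f" (×{len(run)})" if len(run) > 1 else ""
        out.append(f"{run[0][0]}: {description}{suffix}")
    return out


timeline = [
    (1779693459, 'Clicked div in li "Refresh Data"'),
    (1779693483, "Scrolled"),
    (1779693484, "Scrolled"),
    (1779693486, "Scrolled"),
]
print("\n".join(collapse_runs(timeline)))
```

This prints the last two lines of the sample timeline. A non-adjacent repeat, such as a second "Scrolled" after a click, would start a new run rather than being merged.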
Aggregations
Bundle-level aggregations return DataFrames (following the FlowQueryResult idiom):
```python
print(bundle.top_clicks(10))  # target_desc, count (genuine clicks only)
print(bundle.rage_clicks(threshold=3, window_ms=1000))  # replay_id, t_start, target_desc, count
print(bundle.long_pauses(threshold_s=10))  # replay_id, t_start, duration_s
err = bundle.error_sessions()  # a NEW bundle of only the replays with console errors
```
top_clicks (and elements_df) count genuine clicks only. A real user click produces both a focused and a clicked interaction, so counting both would double every click; the focus-only interactions are therefore excluded.
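The rage_clicks parameters suggest a sliding-window definition: a burst is at least threshold clicks on the same element within window_ms. The sketch below is one plausible reading of that definition, not the library's implementation. Three details are hypothetical: the tuple input format, the rule that anchors each window at the burst's first click, and the choice to report each burst once.

```python
from collections import defaultdict

Click = tuple[str, int, str]  # (replay_id, t_ms, target_desc)


def rage_clicks(clicks: list[Click], threshold: int = 3, window_ms: int = 1000):
    """Hypothetical detector: bursts of >= threshold clicks on one target.

    A burst is anchored at its first click; later clicks join it while they
    fall within window_ms of that anchor. Each click belongs to at most one burst.
    """
    times_by_key: dict[tuple[str, str], list[int]] = defaultdict(list)
    for replay_id, t_ms, target in clicks:
        times_by_key[(replay_id, target)].append(t_ms)

    bursts = []
    for (replay_id, target), times in times_by_key.items():
        times.sort()
        i = 0
        while i < len(times):
            # Grow the window [i, j] while clicks stay within window_ms of times[i]
            j = i
            while j + 1 < len(times) and times[j + 1] - times[i] <= window_ms:
                j += 1
            count = j - i + 1
            if count >= threshold:
                bursts.append((replay_id, times[i], target, count))
                i = j + 1  # skip past the burst so it is reported once
            else:
                i += 1
    return bursts


clicks = [
    ("r1", 0, "Save"), ("r1", 200, "Save"), ("r1", 450, "Save"),
    ("r1", 5000, "Save"), ("r1", 100, "Cancel"),
]
print(rage_clicks(clicks))  # [('r1', 0, 'Save', 3)]
```

The output tuples mirror the documented replay_id, t_start, target_desc, count columns. Passing the list to pd.DataFrame(..., columns=[...]) would give the DataFrame shape.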
Filters and Comparison
Filters return a new bundle (immutable semantics) — the original is untouched, so chains stay cheap:
```python
# Sessions that visited /checkout AND lasted longer than 60s
checkout = (
    bundle
    .where(contains_url="/checkout")
    .filter(lambda r: r.duration_seconds > 60)
)

# Deterministic sample for manual review
for r in checkout.sample(n=3, seed=42).replays:
    print(r.summary_markdown[:200], "…")

# Sessions whose action labels contain a contiguous sub-sequence
matched = bundle.find_pattern(["click:button@/", "navigate:…@/checkout"])

# Diff action frequencies between two cohorts of sessions
converters = ws.replays_for_user("user-99", from_date="2025-01-01", to_date="2025-01-31")
print(bundle.compare(converters))  # action | self_count | other_count | delta
```
where(...) accepts distinct_id, contains_url, has_event, min_duration_s, and max_duration_s. find_pattern accepts a label_fn= override (see default_label_fn / selector_label_fn).
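"Contiguous" means the pattern's labels must appear back to back, with no other actions in between. The minimal sketch below applies that matching rule to plain label lists. It assumes an action:target@path label shape inferred from the find_pattern example; the real labels come from default_label_fn and may differ. find_pattern here is a hypothetical standalone function, not the bundle method.

```python
def contains_contiguous(labels: list[str], pattern: list[str]) -> bool:
    """True if pattern appears as an unbroken run inside labels."""
    m = len(pattern)
    if m == 0:
        return True
    # Slide a window of len(pattern) across labels and compare slices
    return any(labels[i:i + m] == pattern for i in range(len(labels) - m + 1))


def find_pattern(sessions: dict[str, list[str]], pattern: list[str]) -> list[str]:
    """Return the IDs of sessions whose label sequence contains pattern."""
    return [rid for rid, labels in sessions.items() if contains_contiguous(labels, pattern)]


sessions = {
    "r1": ["navigate:page@/", "click:button@/", "navigate:page@/checkout"],
    "r2": ["click:button@/", "scroll:page@/", "navigate:page@/checkout"],
}
print(find_pattern(sessions, ["click:button@/", "navigate:page@/checkout"]))  # ['r1']
```

r2 contains both labels but with a scroll in between, so it does not match. That gap is the difference between a contiguous sub-sequence and an ordinary sub-sequence.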
Correlating Mixpanel Events
mixpanel_df is populated when you fetch with include_mixpanel_events=True (the default for replays_for_user), or lazily via join_mixpanel_events(). It holds the tracked Mixpanel events that fired during each replay's time window — the analytics layer alongside the action layer:
```python
bundle = ws.replays_for_user(
    "user-42", from_date="2025-01-01", to_date="2025-01-31",
    event_properties=["$browser", "plan"],  # up to 5 extra properties
)
print(bundle.mixpanel_df)  # replay_id | t | event_name | properties
```
For a single replay or an explicit ID list, use events_for_replay(replay_id) / events_for_replays(replay_ids).
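Conceptually, mixpanel_df keeps each tracked event whose timestamp falls inside a replay's [start_time, end_time] window. The pandas sketch below illustrates that interval rule on toy data with a cross join. It is not the library's implementation, and the inclusive bounds and epoch-second timestamps are assumptions.

```python
import pandas as pd

# Hypothetical replay windows (epoch seconds) and tracked events
replays = pd.DataFrame({
    "replay_id": ["r1", "r2"],
    "start_time": [100, 500],
    "end_time": [200, 650],
})
events = pd.DataFrame({
    "t": [150, 300, 600],
    "event_name": ["Sign Up", "Page View", "Purchase"],
})

# Pair every event with every replay, then keep pairs where t is inside the window
joined = events.merge(replays, how="cross")
in_window = joined[(joined["t"] >= joined["start_time"]) & (joined["t"] <= joined["end_time"])]
mixpanel_like = in_window[["replay_id", "t", "event_name"]].reset_index(drop=True)
print(mixpanel_like)
```

The "Page View" at t=300 falls between the two windows and is dropped. A cross join is quadratic, which is fine for a handful of replays. For large tables, an interval-aware approach such as sorting plus pd.merge_asof scales better.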
Signed URLs and Security
Replay files live behind a time-bounded signed CDN URL (≈5-minute TTL). The query string is a bearer credential:
- SignedReplay masks the credential in repr and str, and the library never logs it at any level.
- sign_replay / sign_replays return the handles; fetch_replay signs and fetches in one step.
- A 403 indicating the project's SESSION_RECORDING_SENSITIVE_DATA flag raises SessionReplayAccessError with the missing permission in details. An expired URL raises SignedURLExpiredError; a replay absent from the CDN raises ReplayNotFoundError.
```python
signed = ws.sign_replay("0190ebde-d50d-71b1-804c-ec1b4a533ef9")
print(signed)  # SignedReplay(replay_id='…', url='…', query_string='<redacted N chars>', …)
```
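A common way to keep a bearer credential out of logs and tracebacks is to override __repr__ so that only the secret's length is shown. The sketch below shows that pattern with a hypothetical SignedURL class, which is not the library's SignedReplay. The field names and the full_url accessor are assumptions modeled on the redacted output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True, repr=False)  # repr=False: we supply our own __repr__
class SignedURL:
    """Hypothetical holder for a signed CDN URL; query_string is the bearer credential."""

    replay_id: str
    url: str
    query_string: str

    def __repr__(self) -> str:
        # Show only the credential's length, never its value.
        # str() falls back to __repr__, so print() and f-strings are masked too.
        return (
            f"SignedURL(replay_id={self.replay_id!r}, url={self.url!r}, "
            f"query_string='<redacted {len(self.query_string)} chars>')"
        )

    def full_url(self) -> str:
        """Explicit opt-in accessor: the only method that assembles the full URL."""
        return f"{self.url}?{self.query_string}"


signed = SignedURL("r1", "https://cdn.example.com/replay", "sig=abc123")
print(signed)
# SignedURL(replay_id='r1', url='https://cdn.example.com/replay', query_string='<redacted 10 chars>')
```

This pattern only guards repr and str. dataclasses.asdict(signed) or direct attribute access still return the raw value, so the opt-in path has to be enforced by convention as well.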
DOM text can carry PII
The analyzer's target_desc and description fields surface text that
rrweb captured from the page — aria-label, title, alt, and visible
element text. If a recorded page rendered personal data (e.g. a
"Welcome, Jane Doe" heading), that text lands in actions_df, the
markdown timelines, and anything you build from them (logs, files, LLM
context). The library faithfully reflects what rrweb recorded — it does
not scrub content — so treat analyzer output with the same care as the
underlying recording, and rely on Mixpanel's recording-side masking to
keep sensitive fields out of the capture in the first place.
CLI
The mp replays command group mirrors the Python surface:
```shell
# Discover a user's replays (or hydrate explicit --replay-id values)
mp replays list --user user-42 --from 2025-01-01 --to 2025-01-31

# Mixpanel events during a replay's window
mp replays events 0190ebde-d50d-71b1-804c-ec1b4a533ef9

# Sign for CDN access (redacted by default; --reveal-signed-urls opts in)
mp replays sign 0190ebde-d50d-71b1-804c-ec1b4a533ef9

# Write the raw rrweb JSON (rrweb-player compatible)
mp replays fetch 0190ebde-d50d-71b1-804c-ec1b4a533ef9 -o replay.json

# Render the markdown action timeline
mp replays analyze 0190ebde-d50d-71b1-804c-ec1b4a533ef9

# Discover + fetch + analyze in one command; write per-replay timelines to a dir
mp replays for-user user-42 --from 2025-01-01 --to 2025-01-31 \
  --include analyze --out-dir ./timelines
```
mp replays sign --reveal-signed-urls is the single opt-in path to emitting the full credential; it prints a stderr warning on every invocation.