Data Discovery

Explore your Mixpanel project's schema before writing queries. Most discovery results are cached for the lifetime of the Workspace; top_events() is the exception.

Listing Events

Get all event names in your project:

import mixpanel_headless as mp

ws = mp.Workspace()

events = ws.events()
print(events)  # ['Login', 'Purchase', 'Signup', ...]

# CLI
mp inspect events

# Filter with jq - get first 5 events
mp inspect events --format json --jq '.[:5]'

# Find events containing "User"
mp inspect events --format json --jq '.[] | select(contains("User"))'

Events are returned sorted alphabetically.
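The jq filters above have direct Python equivalents on the list that ws.events() returns. This is a minimal sketch. The hard-coded list is a hypothetical stand-in for a live ws.events() call, so the snippet runs without credentials.

```python
# Hypothetical stand-in for ws.events(); a real call returns names sorted alphabetically.
events = sorted(["Signup", "Login", "User Invited", "Purchase", "User Deleted", "Logout"])

# Equivalent of --jq '.[:5]': the first five event names
first_five = events[:5]

# Equivalent of --jq '.[] | select(contains("User"))': substring match on each name
user_events = [name for name in events if "User" in name]

print(first_five)
print(user_events)
```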

Listing Properties

Get properties for a specific event:

properties = ws.properties("Purchase")
print(properties)  # ['amount', 'country', 'product_id', ...]

# CLI
mp inspect properties --event Purchase

Properties include both event-specific and common properties.
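Because ws.properties() returns a plain list of names, ordinary set operations can answer cross-event questions, such as which properties two events share. The sketch below assumes you have already fetched each list. The dictionary is a hypothetical stand-in for those calls.

```python
# Hypothetical stand-ins for ws.properties("Purchase") and ws.properties("Signup")
props_by_event = {
    "Purchase": ["amount", "country", "product_id", "$browser"],
    "Signup": ["country", "referrer", "$browser"],
}

# Properties present on both events
shared = set(props_by_event["Purchase"]) & set(props_by_event["Signup"])
# Properties seen on Purchase but not on Signup
purchase_only = set(props_by_event["Purchase"]) - set(props_by_event["Signup"])

print(sorted(shared))
print(sorted(purchase_only))
```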

Property Values

Sample values for a property:

# Sample values for a property
values = ws.property_values("country", event="Purchase")
print(values)  # ['US', 'UK', 'DE', 'FR', ...]

# Limit results
values = ws.property_values("country", event="Purchase", limit=5)

# CLI
mp inspect values --property country --event Purchase --limit 10

Subproperties

Some Mixpanel event properties are lists of objects, for example a cart property whose value is [{"Brand": "nike", "Category": "hats", "Price": 51}, ...]. The property_values() endpoint returns these as JSON-encoded strings, which makes them awkward to inspect by eye. subproperties() parses a sample of those blobs and infers a scalar type per inner key.

for sp in ws.subproperties("cart", event="Cart Viewed"):
    print(sp.name, sp.type, sp.sample_values)
# Brand string ('nike', 'puma', 'h&m')
# Category string ('hats', 'jeans')
# Item ID number (35317, 35318)
# Price number (51, 87, 102)

# CLI
mp inspect subproperties --property cart --event "Cart Viewed"

# Sample more rows (default: 50)
mp inspect subproperties -p cart -e "Cart Viewed" --sample-size 200

# Tabular output
mp inspect subproperties -p cart -e "Cart Viewed" --format table

Results are alphabetically sorted by name. Subproperties whose values are themselves dicts/lists are silently skipped (only scalar sub-values are reportable). When a sub-key is observed with mixed scalar shapes, with both scalar and dict shapes, or with only null values, the call emits a UserWarning.
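The inference described above can be approximated in a few lines. The sketch illustrates the idea and is not the library's implementation. It parses each JSON-encoded blob and collects values per inner key. Keys with only dict/list values are skipped silently. Nulls are ignored unless a key has nothing else, and a UserWarning is raised for the mixed and null-only cases listed above. The function names are hypothetical, and datetime detection is omitted.

```python
import json
import warnings
from collections import Counter


def _scalar_type(value):
    """Map a parsed JSON scalar to a type name; None for null, dict, or list."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return None


def infer_subproperties(blobs, max_samples=5):
    """Return sorted (name, type, sample_values) tuples for scalar inner keys."""
    seen = {}  # inner key -> every value observed for it
    for blob in blobs:
        parsed = json.loads(blob)
        if not isinstance(parsed, list):
            continue
        for item in parsed:
            if isinstance(item, dict):
                for key, value in item.items():
                    seen.setdefault(key, []).append(value)

    results = []
    for key in sorted(seen):
        values = seen[key]
        nested = [v for v in values if isinstance(v, (dict, list))]
        scalars = [v for v in values if _scalar_type(v) is not None]
        if not scalars:
            if not nested:
                warnings.warn(f"{key!r}: only null values observed", UserWarning)
            continue  # nothing scalar to report; purely nested keys are skipped silently
        if nested:
            warnings.warn(f"{key!r}: both scalar and nested values observed", UserWarning)
        type_counts = Counter(_scalar_type(v) for v in scalars)
        if len(type_counts) > 1:
            warnings.warn(f"{key!r}: mixed scalar types {sorted(type_counts)}", UserWarning)
        samples = []
        for v in scalars:
            # Compare the type too, so True and 1 stay distinct samples
            if not any(type(v) is type(s) and v == s for s in samples):
                samples.append(v)
        # Report the most frequent scalar type for the key
        results.append((key, type_counts.most_common(1)[0][0], tuple(samples[:max_samples])))
    return results


cart_blobs = [
    '[{"Brand": "nike", "Category": "hats", "Price": 51}]',
    '[{"Brand": "puma", "Category": "jeans", "Price": 87}]',
]
for name, type_name, sample in infer_subproperties(cart_blobs):
    print(name, type_name, sample)
```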

The discovered names and types feed directly into Filter.list_contains and GroupBy.list_item for filtering and breaking down by subproperty values.

SubPropertyInfo

sp.name           # "Brand"
sp.type           # "string" | "number" | "boolean" | "datetime"
sp.sample_values  # ('nike', 'puma', 'h&m'); up to 5 distinct values
sp.to_dict()      # {'name': 'Brand', 'type': 'string', 'sample_values': ['nike', ...]}

Saved Funnels

List funnels defined in Mixpanel:

funnels = ws.funnels()
for f in funnels:
    print(f"{f.funnel_id}: {f.name}")

# CLI
mp inspect funnels

FunnelInfo

f.funnel_id  # 12345
f.name       # "Checkout Funnel"
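To resolve a funnel_id from a funnel's display name, you can build a lookup table once from the list ws.funnels() returns. This sketch assumes only the two FunnelInfo attributes shown above. The namedtuple is a hypothetical stand-in for the real objects.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the same attributes as FunnelInfo
FunnelInfo = namedtuple("FunnelInfo", ["funnel_id", "name"])
funnels = [FunnelInfo(12345, "Checkout Funnel"), FunnelInfo(67890, "Onboarding")]

# Build a name -> id index once; if two funnels share a name, the later one wins
funnel_ids = {f.name: f.funnel_id for f in funnels}

checkout_id = funnel_ids.get("Checkout Funnel")  # None if no funnel has that name
print(checkout_id)
```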

Saved Cohorts

List cohorts defined in Mixpanel:

cohorts = ws.cohorts()
for c in cohorts:
    print(f"{c.id}: {c.name} ({c.count} users)")

# CLI
mp inspect cohorts

SavedCohort

c.id           # 12345
c.name         # "Power Users"
c.count        # 5000
c.description  # "Users with 10+ logins"
c.created      # datetime
c.is_visible   # True
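The SavedCohort attributes are enough to triage cohorts locally, for example by keeping only visible cohorts and ranking them by size. In this sketch, a hypothetical dataclass stands in for SavedCohort. It assumes only the attributes listed above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SavedCohort:  # hypothetical stand-in with the documented attributes
    id: int
    name: str
    count: int
    description: str
    created: datetime
    is_visible: bool


cohorts = [
    SavedCohort(1, "Power Users", 5000, "Users with 10+ logins", datetime(2024, 3, 1), True),
    SavedCohort(2, "Churn Risk", 1200, "No login in 30 days", datetime(2024, 5, 9), True),
    SavedCohort(3, "Legacy Test", 40, "", datetime(2022, 1, 4), False),
]

# Keep visible cohorts only, largest first
visible = sorted((c for c in cohorts if c.is_visible), key=lambda c: c.count, reverse=True)
for c in visible:
    print(f"{c.id}: {c.name} ({c.count} users)")
```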

Lexicon Schemas

Retrieve data dictionary schemas for events and profile properties. Schemas include descriptions, property types, and metadata defined in Mixpanel's Lexicon.

Schema Coverage

The Lexicon API returns only events/properties with explicit schemas (defined via API, CSV import, or UI). It does not return all events visible in Lexicon's UI.

Schema Registry CRUD

For write operations on the schema registry (create, update, delete schemas and enforcement configuration), see the Data Governance guide, Schema Registry section.

# List all schemas
schemas = ws.lexicon_schemas()
for s in schemas:
    print(f"{s.entity_type}: {s.name}")

# Filter by entity type
event_schemas = ws.lexicon_schemas(entity_type="event")
profile_schemas = ws.lexicon_schemas(entity_type="profile")

# Get a specific schema
schema = ws.lexicon_schema("event", "Purchase")
print(schema.schema_json.description)
for prop, info in schema.schema_json.properties.items():
    print(f"  {prop}: {info.type}")

# CLI
mp inspect lexicon-schemas
mp inspect lexicon-schemas --type event
mp inspect lexicon-schemas --type profile
mp inspect lexicon-schema --type event --name Purchase

LexiconSchema

s.entity_type           # "event", "profile", or other API-returned types
s.name                  # "Purchase"
s.schema_json           # LexiconDefinition object

LexiconDefinition

s.schema_json.description                # "User completes a purchase"
s.schema_json.properties                 # dict[str, LexiconProperty]
s.schema_json.metadata                   # LexiconMetadata or None

LexiconProperty

prop = s.schema_json.properties["amount"]
prop.type                                # "number"
prop.description                         # "Purchase amount in USD"
prop.metadata                            # LexiconMetadata or None

LexiconMetadata

meta = s.schema_json.metadata
meta.display_name       # "Purchase Event"
meta.tags               # ["core", "revenue"]
meta.hidden             # False
meta.dropped            # False
meta.contacts           # ["owner@company.com"]
meta.team_contacts      # ["Analytics Team"]
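One common Lexicon audit walks a LexiconDefinition's properties and flags entries that have no description. This sketch uses hypothetical dataclasses that mirror only the attributes listed above. With a real Workspace you would expect to pass ws.lexicon_schema("event", "Purchase").schema_json to the same function.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LexiconProperty:  # hypothetical stand-in
    type: str
    description: Optional[str] = None


@dataclass
class LexiconDefinition:  # hypothetical stand-in
    description: Optional[str] = None
    properties: dict = field(default_factory=dict)


def undocumented_properties(definition):
    """Return sorted names of properties whose description is missing or blank."""
    return sorted(
        name
        for name, prop in definition.properties.items()
        if not (prop.description or "").strip()
    )


purchase = LexiconDefinition(
    description="User completes a purchase",
    properties={
        "amount": LexiconProperty("number", "Purchase amount in USD"),
        "currency": LexiconProperty("string"),
        "coupon": LexiconProperty("string", "  "),
    },
)
print(undocumented_properties(purchase))
```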

Rate Limit

The Lexicon API has a strict rate limit of 5 requests per minute. Schema results are cached for the session to minimize API calls.
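Caching only helps with repeat lookups. Fetching many different schemas in a loop likely means one request per new entity, which can run into the 5-requests-per-minute limit. If you want to pace those calls yourself, a minimal sliding-window throttle is shown below. SlidingWindowThrottle is a hypothetical helper, not part of mixpanel_headless. This page does not say whether the library already backs off on rate-limit errors. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most max_calls in any rolling window of `period` seconds (hypothetical)."""

    def __init__(self, max_calls=5, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls inside the window, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: sleep until the oldest call expires, then re-check
            self.sleep(self.period - (now - self.calls[0]))
```

In use, you would call throttle.wait() immediately before each uncached request, for example before ws.lexicon_schema("event", name) inside a loop over event names.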

Write Operations

The Lexicon schemas shown here are read-only discovery methods. For full CRUD operations on Lexicon definitions (update, delete events/properties, manage tags, bulk updates), see the Data Governance guide.

Schema Graph

schema_graph() gathers the whole project's Lexicon in one pass (event definitions, event properties, and user properties) plus the event↔property relationship graph, which records which properties appear on which events. It returns a typed SchemaGraphResult with DataFrame views and a networkx export, so you can map the schema without a per-entity lookup.

schema = ws.schema_graph()

schema.events_df          # event definitions (name, display_name, count, ...)
schema.properties_df      # event + user properties, with a resource_type column
schema.relationships_df   # one row per (event, property) edge; the headline view

schema.properties_for_event("Purchase")   # ['amount', 'currency', ...]
schema.events_for_property("utm_source")   # events carrying the property
schema.orphan_properties()                 # event properties that appear on no events

graph = schema.to_graph()  # networkx.DiGraph, events -> properties

# CLI
mp inspect schema-graph                    # full structure (JSON)
mp inspect schema-graph --format table     # the relationship edge list
mp inspect schema-graph --no-user-properties --jq '.event_to_properties'

# Per-property density (densityLocal), repeated onto each edge
mp inspect schema-graph --density --format table

The relationships come from one bulk data-definitions/properties?includeEvents=true request: a single call for the whole project rather than a schema lookup per entity. Group properties are not gathered (Headless has no data-groups listing to enumerate them).

SchemaGraphResult

schema.events                # list[dict]: raw event definitions
schema.properties            # list[dict]: event properties (each carries an `events` list)
schema.user_properties       # list[dict]: user properties
schema.event_to_properties   # {event_name: [property_name, ...]}
schema.property_to_events    # {property_name: [event_name, ...]}
schema.relationships_df      # DataFrame: event | property | density_local
schema.to_graph()            # networkx.DiGraph (events -> properties; bipartite when names are disjoint)
schema.to_dict()             # JSON-serializable
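event_to_properties and property_to_events are inverses of each other. Because each raw event property carries an events list, both maps can be derived from schema.properties. The sketch below does this in plain Python without networkx, and shows one way an orphan check like orphan_properties() could be expressed. It illustrates the idea and is not the library's code. The "name" key on each property dict is an assumption; only the events list is documented above.

```python
# Hypothetical raw property dicts; the "events" list is documented, the "name" key is assumed.
properties = [
    {"name": "amount", "events": ["Purchase"]},
    {"name": "utm_source", "events": ["Signup", "Purchase"]},
    {"name": "legacy_flag", "events": []},
]

# property -> events, read straight from the raw records
property_to_events = {p["name"]: sorted(p["events"]) for p in properties}

# Invert into event -> properties
event_to_properties = {}
for prop, events in property_to_events.items():
    for event in events:
        event_to_properties.setdefault(event, []).append(prop)
for props in event_to_properties.values():
    props.sort()

# Event properties attached to no event
orphans = sorted(name for name, events in property_to_events.items() if not events)

# Edge list with one (event, property) pair per row, like relationships_df without density
edges = sorted((e, p) for p, events in property_to_events.items() for e in events)
print(edges)
```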

Caching

Like other discovery methods, schema_graph() is cached for the lifetime of the Workspace. Pass force_refresh=True to re-fetch, or call clear_discovery_cache().

Top Events

Get today's most active events:

# General top events
top = ws.top_events(type="general")
for event in top:
    print(f"{event.event}: {event.count} ({event.percent_change:+.1f}%)")

# Average top events
top = ws.top_events(type="average", limit=5)

# CLI
mp inspect top-events --type general --limit 10

TopEvent

event.event           # "Login"
event.count           # 15000
event.percent_change  # 12.5 (compared to yesterday)
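Because percent_change compares against yesterday, sorting on it surfaces the fastest-moving events. This sketch uses a hypothetical namedtuple as a stand-in for TopEvent. It also reuses the :+.1f format spec from the example above, which always prints a sign and one decimal place.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the documented TopEvent attributes
TopEvent = namedtuple("TopEvent", ["event", "count", "percent_change"])
top = [
    TopEvent("Login", 15000, 12.5),
    TopEvent("Purchase", 900, -4.0),
    TopEvent("Signup", 2100, 31.4),
]

# Fastest-growing first
by_growth = sorted(top, key=lambda e: e.percent_change, reverse=True)
for e in by_growth:
    print(f"{e.event}: {e.count} ({e.percent_change:+.1f}%)")
```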

Not Cached

Unlike other discovery methods, top_events() always makes an API call since it returns real-time data.

Caching

Discovery results are cached for the lifetime of the Workspace:

ws = mp.Workspace()

# First call hits the API
events1 = ws.events()

# Second call returns cached result (instant)
events2 = ws.events()

# Clear cache to force refresh
ws.clear_discovery_cache()

# Now hits API again
events3 = ws.events()
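The behavior above is plain memoization: the first call fetches, later calls reuse the stored result, and a forced refresh or a cache clear fetches again. The force_refresh=True option is documented above for schema_graph(). Below is a minimal sketch of the pattern. DiscoveryCache and fake_fetch_events are hypothetical and say nothing about how Workspace is actually implemented.

```python
class DiscoveryCache:
    """Memoize results per key until cleared (hypothetical helper)."""

    def __init__(self):
        self._store = {}

    def get(self, key, fetch, force_refresh=False):
        # Call fetch() on a miss or when a refresh is forced; otherwise reuse the stored value
        if force_refresh or key not in self._store:
            self._store[key] = fetch()
        return self._store[key]

    def clear(self):
        self._store.clear()


api_calls = []


def fake_fetch_events():
    # Stand-in for the HTTP request behind ws.events(); records each call
    api_calls.append("events")
    return ["Login", "Purchase", "Signup"]


cache = DiscoveryCache()
cache.get(("events",), fake_fetch_events)                      # miss: fetches
cache.get(("events",), fake_fetch_events)                      # hit: reuses stored result
cache.get(("events",), fake_fetch_events, force_refresh=True)  # forced: fetches again
cache.clear()
cache.get(("events",), fake_fetch_events)                      # miss after clear: fetches
print(len(api_calls))
```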

Discovery Workflow

A typical discovery workflow before analysis:

import mixpanel_headless as mp

ws = mp.Workspace()

# 1. What events exist?
print("Events:")
for event in ws.events()[:10]:
    print(f"  - {event}")

# 2. What properties does Purchase have?
print("\nPurchase properties:")
for prop in ws.properties("Purchase"):
    print(f"  - {prop}")

# 3. What values does 'country' have?
print("\nCountry values:")
for value in ws.property_values("country", event="Purchase", limit=10):
    print(f"  - {value}")

# 4. What funnels are defined?
print("\nFunnels:")
for f in ws.funnels():
    print(f"  - {f.name} (ID: {f.funnel_id})")

# 5. Run a live query with discovered data
result = ws.segmentation(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="country"
)
print(result.df)

Next Steps