Runbook: JSON Schema
The JSON column type is production-ready from ClickHouse 25.3+. Earlier versions are not recommended for production use.
Your data arrives as JSON. ClickHouse gives you several ways to store it, from fully typed columns to a raw String. The right choice depends on how predictable your schema is and whether you need field-level queries.
Scope: This page covers schema design decisions for storing JSON data. It does not cover JSON input/output formats, JSON functions, or query syntax. For background on the JSON column type itself, see Use JSON where appropriate.
Assumes: Familiarity with ClickHouse table creation, MergeTree basics, and column type syntax.
Quick decision
- If every field has a known, stable type and the schema rarely changes → Typed columns
- If most fields are stable but some section is dynamic or unpredictable → Hybrid (typed + JSON)
- If the entire structure is dynamic, with keys that appear and disappear across records → Native JSON column
- If you only store and retrieve the JSON blob without field-level queries → Opaque String storage
Don't confuse the JSON format with the JSON column type. You can insert JSON-formatted data (via JSONEachRow, etc.) into typed columns without using the JSON column type at all. The decision here is about column types, not input formats.
Approach details
Typed columns
When to use: The JSON structure is fully known at design time. Fields and types don't change between records. Even complex nested structures (arrays of objects, nested maps) can be expressed with Array, Tuple, and Nested types.
Trade-offs: Schema changes require ALTER TABLE. Unexpected fields are silently dropped on insert unless the schema is updated.
Setup, verification, and gotchas
Setup
Verification
Watch out for
- If you insert JSON data with
JSONEachRowand the JSON contains fields not in the schema, ClickHouse drops them silently by default. Setinput_format_skip_unknown_fieldsto0if you want errors instead.
Hybrid (typed columns + JSON)
When to use: A core set of fields is stable (timestamps, IDs, status codes), but part of the payload is dynamic. Think user-defined attributes, tags, metadata, or extension fields that vary across records.
Trade-offs: Full performance on typed columns, flexibility on the JSON column. The JSON column still carries insert overhead and storage cost for its dynamic portion.
Setup, verification, and gotchas
Setup
Verification
Watch out for
- Use type hints on JSON paths you know ahead of time. Hints bypass the discriminator column and store the path like a regular typed column, with the same performance and no overhead.
- Use
SKIPorSKIP REGEXPfor paths you never query (debug metadata, internal tracing IDs) to save storage and reduce subcolumn count. - Set
max_dynamic_pathsproportional to the number of distinct paths you actually query. The default (1024) works for most cases. Lower it if your dynamic section is narrow. - Don't set
max_dynamic_pathsabove 10,000. High values increase resource consumption and reduce efficiency.
Keys with dots (e.g., http.status_code) are treated as nested paths by default, so {"http.status_code": 200} is stored the same as {"http": {"status_code": 200}}. This is common with OTel attributes. Use type hints to control how dotted paths are stored, or enable json_type_escape_dots_in_keys (25.8+).
Native JSON column
When to use: The structure is genuinely unpredictable, with keys that appear and disappear across records. User-generated schemas, plugin systems, or data-lake ingestion where you don't control the upstream schema.
Trade-offs: Slower inserts than typed columns. Slower full-object reads than String. Storage overhead from subcolumn management. Works well for field-level queries on specific paths.
Setup, verification, and gotchas
Setup
Use JSONAsObject format when inserting whole JSON documents into a JSON column. It treats each input line as a complete JSON object mapped to the column.
Verification
Watch out for
- Without type hints, ClickHouse infers types per-path from the first values it sees. If
scorearrives as"10"(string) in one record and10(integer) in another, the path gets a discriminator column and queries become slower. Add hints for paths with known types. - When path count exceeds
max_dynamic_paths, overflow values move to a shared data structure with reduced query performance. Monitor withJSONDynamicPaths()and keep the limit below 10,000. - Each dynamic path supports up to
max_dynamic_types(default 32) distinct data types. If a single path exceeds this, extra types fall back to shared variant storage. This rarely matters unless your data has highly inconsistent types for the same field.
Opaque String storage
When to use: JSON documents are stored and retrieved whole, then passed through to an application, archived, or forwarded downstream. No field-level filtering or aggregation inside ClickHouse.
Trade-offs: Fastest inserts and simplest schema. No field-level queries without runtime parsing (JSONExtract family), which is slow at scale.
Setup, verification, and gotchas
Setup
Verification
Watch out for
- If requirements change and you later need field-level queries, you'll need to create a new table with typed or JSON columns and backfill the data. If there's any chance you'll query individual fields, start with the hybrid approach instead.
JSONExtractfunctions parse the string on every query. Acceptable for ad-hoc exploration, not for production dashboards or high-QPS workloads.- Consider compression codecs (
ZSTD) on the String column if the JSON payloads are large — it compresses well.
Comparison
| Dimension | Typed columns | Hybrid | Native JSON | String |
|---|---|---|---|---|
| Insert throughput | Fastest | Fast | Moderate | Fastest |
| Field-level queries | Fastest | Fast (typed); good (hinted JSON) | Good (hinted); slower (dynamic) | Slow (runtime parsing) |
| Full-object reads | Fast | Moderate | Slow | Fastest |
| Storage efficiency | Best | Good | Moderate | Good (compresses well) |
| Schema flexibility | None (ALTER TABLE) | Partial (rigid core, flexible tail) | Full | Full |
| Complexity | Low | Medium | Medium–High | Low |
Related resources
- Use JSON where appropriate — when to use the JSON column type vs alternatives
- JSON data type reference — full syntax for type hints, SKIP, max_dynamic_paths, and introspection functions
- Selecting data types — general type selection guidance
- A New Powerful JSON Data Type for ClickHouse — deep dive on the JSON type's storage architecture
- JSON formats reference — input/output formats for JSON data (JSONEachRow, JSONAsObject, etc.)