Attribute Filtering Rules for Automated Vector Tile Generation

Attribute filtering rules keep vector tile pipelines efficient. Raw geospatial datasets routinely contain hundreds of columns (administrative codes, legacy identifiers, internal tracking fields, and redundant metadata) that serve no purpose in a browser renderer. When these attributes pass unfiltered into a tile generation step, they inflate payload sizes, degrade client-side parsing performance, and complicate style maintenance. Implementing deterministic, version-controlled attribute filtering ensures that only rendering-relevant properties survive the transformation from source data to cached Mapbox Vector Tiles (MVT).

This guide outlines a production-ready workflow for designing, implementing, and validating attribute filtering rules within automated generation pipelines. For broader context on how these rules integrate into end-to-end tile production, review the foundational architecture in Automated Generation Pipelines with Tippecanoe.

Prerequisites and Environment Baseline

Before implementing filtering logic, ensure your pipeline meets the following baseline requirements:

  • Source Data Format: Columnar or structured geospatial inputs (GeoParquet, GeoJSON, or PostGIS exports). For teams standardizing on modern parquet workflows, consult GeoParquet Input Processing for schema alignment and partitioning strategies.
  • Tippecanoe Installed: Version 2.0+ recommended. The attribute allowlist/blocklist flags are -y / --include (keep named attribute) and -x / --exclude (drop named attribute).
  • Python 3.9+ Environment: Required for preprocessing scripts using pyarrow, geopandas, or duckdb.
  • Tile Specification Awareness: Familiarity with the Mapbox Vector Tile Specification is essential. MVT property values are restricted to scalar types (strings, numbers, and booleans), and each tile layer stores its distinct keys and values, so high-cardinality attributes enlarge every tile that contains them.
  • CI/CD Runner Access: Pipeline steps will execute filtering, tiling, and validation in isolated containers with reproducible dependency locks.

Step-by-Step Implementation Workflow

1. Audit Source Attributes Against Style Usage

Run a schema inventory to identify column usage across your frontend map configuration. Export a frequency map of attributes against your style layers. Columns that never appear in paint, layout, or filter expressions are candidates for removal.

Automate this audit by parsing your MapLibre GL or Mapbox GL style JSON, collecting the property names that each layer's filter, paint, and layout expressions reference (grouped by source-layer), and cross-referencing them with your dataset schema. Any attribute not explicitly consumed by a style expression, tooltip configuration, or client-side query should be flagged. For a deeper breakdown of how to systematically identify and eliminate bloat, see Dropping Unused Attributes to Reduce Tile Size.
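
As a rough illustration of that audit, the sketch below walks a style's layers and gathers the attribute names referenced by `get`/`has` expressions, legacy filters, legacy `property` functions, and `{token}` text fields. The function names (`collect_properties`, `properties_by_source_layer`, `unused_columns`) are hypothetical, and the matching is deliberately permissive: it may over-report usage, which errs on the side of keeping an attribute. Tooltip and client-side query usage is not covered and would need to be added from your frontend code.

```python
import re

TOKEN_RE = re.compile(r"\{([^{}]+)\}")  # legacy "{name}" tokens in text-field
# Operators whose second element can be an attribute name
PROPERTY_OPS = {"get", "has", "!has", "==", "!=", ">", ">=", "<", "<=", "in", "!in"}


def collect_properties(node, found):
    """Recursively collect attribute names referenced inside a style value."""
    if isinstance(node, str):
        found.update(TOKEN_RE.findall(node))
    elif isinstance(node, dict):
        # Legacy style functions look like {"property": "height", "stops": [...]}
        if isinstance(node.get("property"), str):
            found.add(node["property"])
        for value in node.values():
            collect_properties(value, found)
    elif isinstance(node, list) and node:
        op = node[0]
        second = node[1] if len(node) > 1 else None
        if isinstance(op, str) and op in PROPERTY_OPS and isinstance(second, str):
            # Names starting with "$" ($type, $id) are not feature attributes
            if not second.startswith("$"):
                found.add(second)
        for child in node:
            collect_properties(child, found)


def properties_by_source_layer(style):
    """Map each source-layer to the set of attributes its style layers use."""
    usage = {}
    for layer in style.get("layers", []):
        source_layer = layer.get("source-layer")
        if source_layer is None:
            continue  # e.g. background layers carry no attributes
        found = usage.setdefault(source_layer, set())
        for section in ("filter", "paint", "layout"):
            collect_properties(layer.get(section), found)
    return usage


def unused_columns(schema_columns, used):
    """Columns present in the dataset but never referenced by the style."""
    return sorted(set(schema_columns) - set(used) - {"geometry"})
```

Feeding `unused_columns` the column names of a dataset and the usage set for its source-layer yields the removal candidates for review; the result is a starting point for discussion rather than an automatic drop list.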

2. Define a Deterministic Filtering Policy

Document a JSON-based policy that maps layers to explicit attribute rules. Policies should be version-controlled alongside your tile configuration:

json
{
  "policy_version": "1.0",
  "layers": {
    "buildings": {
      "keep": ["height", "floors", "building_type", "name"],
      "drop": ["legacy_id", "internal_audit_flag", "created_at"],
      "coerce": {
        "height": "number",
        "floors": "number"
      },
      "rename": {
        "bldg_type": "building_type"
      }
    },
    "roads": {
      "keep": ["speed_limit", "surface", "oneway", "name"],
      "drop": ["maintenance_schedule", "contractor_id"],
      "coerce": { "speed_limit": "number", "oneway": "boolean" },
      "rename": {}
    }
  }
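}

  • keep: Allowlist attributes required for styling, labeling, or interactivity.
  • drop: Blocklist columns that inflate tiles without adding visual value.
  • coerce: Convert unsupported or inconsistent types (e.g., timestamps or numeric strings) to MVT-compatible strings, numbers, or booleans. Only attributes that also appear in keep survive filtering, so coerce entries should target kept attributes.
  • rename: Standardize property keys across datasets to simplify style targeting. Renames apply before keep and drop, so those lists use the standardized names.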
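
Because the policy is plain JSON, it can be linted before any data is touched. The sketch below is one possible set of consistency checks, not a fixed schema: the function names and the specific rules (keep and drop must not overlap, coerce and rename targets must be kept, coerce types limited to string, number, and boolean) are assumptions layered on the policy format above.

```python
ALLOWED_TYPES = {"string", "number", "boolean"}


def validate_layer_policy(name, rules):
    """Return a list of human-readable problems for one layer's rules."""
    problems = []
    keep = set(rules.get("keep", []))
    drop = set(rules.get("drop", []))

    for column in sorted(keep & drop):
        problems.append(f"{name}: '{column}' appears in both keep and drop")

    for column, target in rules.get("coerce", {}).items():
        if column not in keep:
            problems.append(f"{name}: coerce target '{column}' is not in keep")
        if target not in ALLOWED_TYPES:
            problems.append(f"{name}: unknown coerce type '{target}' for '{column}'")

    for old, new in rules.get("rename", {}).items():
        if new not in keep:
            problems.append(f"{name}: rename '{old}' -> '{new}' is not in keep")

    return problems


def validate_policy(policy):
    """Validate every layer in a parsed policy document."""
    problems = []
    for name, rules in policy.get("layers", {}).items():
        problems.extend(validate_layer_policy(name, rules))
    return problems
```

Running `validate_policy(json.load(f))` in a pre-commit hook or CI step and failing on a non-empty result catches silent no-ops, such as a coerce rule for a column that is never kept, before they reach the tiler.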

3. Execute Pre-Filtering with Python

Apply attribute rules before invoking the tiler. Pre-filtering reduces memory pressure during geometry simplification and ensures consistent type coercion. The following pyarrow-based snippet applies one layer's policy to a Parquet file:

python
import json

import pyarrow as pa
import pyarrow.parquet as pq

# Policy type names mapped to Arrow types
ARROW_TYPES = {"number": pa.float64(), "boolean": pa.bool_(), "string": pa.string()}

def apply_filter_policy(input_path: str, output_path: str, policy_path: str, layer: str) -> None:
    with open(policy_path, "r") as f:
        policy = json.load(f)

    table = pq.read_table(input_path)
    layer_policy = policy["layers"][layer]

    # Rename first so keep, drop, and coerce can use the standardized keys
    rename = layer_policy.get("rename", {})
    table = table.rename_columns([rename.get(c, c) for c in table.schema.names])

    # Drop blocklisted columns that exist in the table
    drop_cols = [c for c in layer_policy.get("drop", []) if c in table.schema.names]
    table = table.drop(drop_cols)

    # Keep only allowlisted columns (plus a geometry column named "geometry")
    keep_cols = [c for c in layer_policy["keep"] if c in table.schema.names]
    available_cols = keep_cols + [c for c in ["geometry"] if c in table.schema.names]
    table = table.select(available_cols)

    # Coerce types; the default safe cast raises instead of silently losing data
    for col_name, target_type in layer_policy.get("coerce", {}).items():
        if col_name in table.schema.names and target_type in ARROW_TYPES:
            col_idx = table.schema.get_field_index(col_name)
            table = table.set_column(col_idx, col_name,
                                     table[col_name].cast(ARROW_TYPES[target_type]))

    pq.write_table(table, output_path)

4. Apply Tippecanoe CLI Flags

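Use Tippecanoe’s -y / --include flag as a final safety net. Pass one attribute name per -y flag to define the complete allowlist. Tippecanoe reads GeoJSON (and a few other formats such as FlatGeobuf) rather than Parquet, so the command below assumes the pre-filtered table has already been exported to buildings_filtered.geojson: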

bash
tippecanoe \
  --output=buildings.mbtiles \
  --layer=buildings \
  -y height -y floors -y building_type -y name \
  --drop-densest-as-needed \
  --maximum-zoom=16 \
  --coalesce-densest-as-needed \
  buildings_filtered.geojson

When combining pre-filtering with CLI flags, keep the -y list synchronized with the keep list in your policy file. When -y flags are active, Tippecanoe silently drops every attribute that is not explicitly included, which prevents accidental metadata leakage into production tiles. For a complete reference on flag behavior, layer naming conventions, and compression trade-offs, consult Tippecanoe CLI Fundamentals.
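
One way to avoid drift between the policy and the command line is to generate the -y flags from the policy itself. The helper below is a minimal sketch under that assumption; `tippecanoe_args` is a hypothetical name, and it only emits flags that already appear in the command above.

```python
import shlex


def tippecanoe_args(policy, layer, input_path, output_path, max_zoom=16):
    """Build a Tippecanoe argument list whose allowlist mirrors the policy."""
    keep = policy["layers"][layer]["keep"]
    args = ["tippecanoe", f"--output={output_path}", f"--layer={layer}"]
    for attribute in keep:
        args += ["-y", attribute]  # one -y flag per kept attribute
    args += [f"--maximum-zoom={max_zoom}", input_path]
    return args


if __name__ == "__main__":
    example_policy = {"layers": {"buildings": {"keep": ["height", "floors", "name"]}}}
    command = tippecanoe_args(
        example_policy, "buildings", "buildings_filtered.geojson", "buildings.mbtiles"
    )
    # Print a shell-quoted command for logging; pass the list to subprocess.run to execute
    print(shlex.join(command))
```

Passing the list form to `subprocess.run(command, check=True)` avoids shell quoting issues with attribute names that contain spaces.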

Validation and CI/CD Integration

Attribute filtering rules must be validated automatically before tiles reach staging or production:

  1. Schema Diff Validation: Compare the output tile schema against an approved baseline. Use tippecanoe-decode to extract a sample tile, parse its properties, and assert that no blocklisted keys exist.
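  2. Type Consistency Checks: Verify that all numeric attributes are serialized as numbers, not strings. Comparison expressions such as > or < do not behave numerically on stringified values: depending on the style, they either fail the expression's type check or compare lexicographically (so "100" sorts before "20").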
  3. Tile Size Regression Tests: Measure average and 95th-percentile tile sizes before and after policy changes. Enforce a maximum threshold (e.g., 512 KB per tile at z14) to prevent mobile network degradation.
  4. Style Compatibility Smoke Tests: Render the filtered tiles against your production style in a headless browser (e.g., Puppeteer or Playwright). Capture console warnings for missing properties or type mismatches.
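
The first two checks can be sketched as a small function over a decoded tile. The code below assumes the tile has already been decoded to JSON (for example with tippecanoe-decode) and loaded into a Python dict shaped as nested GeoJSON FeatureCollections; that shape, and the names `iter_features` and `find_violations`, are assumptions to adapt to your decoder's actual output.

```python
def iter_features(node):
    """Yield every GeoJSON Feature found in nested FeatureCollections."""
    if isinstance(node, dict):
        if node.get("type") == "Feature":
            yield node
        for child in node.get("features", []):
            yield from iter_features(child)


def find_violations(decoded, blocklist, numeric_keys):
    """Report blocklisted keys and numeric attributes that are not numbers."""
    violations = []
    for feature in iter_features(decoded):
        props = feature.get("properties") or {}
        present = set(props)
        for key in sorted(blocklist & present):
            violations.append(f"blocklisted key present: {key}")
        for key in sorted(numeric_keys & present):
            value = props[key]
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                violations.append(f"{key} is {type(value).__name__}, expected number")
    return violations
```

A CI step could load the policy, pass its drop list as `blocklist` and its number-typed coerce keys as `numeric_keys`, and fail the job when the returned list is non-empty.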

Integrate these steps into GitHub Actions, GitLab CI, or Jenkins. Gate merges to the main branch on successful validation. Store policy files as code, and require peer review for any keep/drop modifications to maintain auditability.
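
For the tile size regression test, the stored size of each tile can be read directly from an MBTiles file, which is a SQLite database exposing a `tiles` table or view with `zoom_level` and `tile_data` columns. The sketch below computes the mean and a nearest-rank 95th percentile for one zoom level; the function names and the budget check are illustrative, and the sizes are those of the stored (typically gzip-compressed) tiles.

```python
import sqlite3


def tile_size_stats(conn: sqlite3.Connection, zoom: int) -> dict:
    """Summarize stored tile sizes (in bytes) for one zoom level."""
    rows = conn.execute(
        "SELECT length(tile_data) FROM tiles WHERE zoom_level = ?", (zoom,)
    ).fetchall()
    sizes = sorted(row[0] for row in rows)
    if not sizes:
        raise ValueError(f"no tiles found at zoom {zoom}")
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer arithmetic
    rank = (95 * len(sizes) + 99) // 100
    return {
        "count": len(sizes),
        "mean": sum(sizes) / len(sizes),
        "p95": sizes[rank - 1],
        "max": sizes[-1],
    }


def within_budget(stats: dict, max_bytes: int = 512 * 1024) -> bool:
    """True when the largest tile stays under the byte budget."""
    return stats["max"] <= max_bytes
```

In a pipeline, open the file with `sqlite3.connect("buildings.mbtiles")`, compare the returned statistics with those recorded for the previous policy version, and fail the job when `within_budget` is false or the 95th percentile grows beyond an agreed tolerance.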

Common Pitfalls and Edge Cases

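  • Null Value Handling: The MVT value types do not include null, and encoders such as Tippecanoe typically omit null-valued properties, so the key is simply absent from those features. Account for missing keys in style expressions (e.g., with coalesce). Substitute a sentinel value (e.g., -1 for numbers, "unknown" for strings) during the coerce phase only when the style needs an explicit value, since sentinels add bytes to every affected feature, or drop the attribute entirely if it lacks meaningful coverage.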
  • Array Serialization: The MVT spec does not support arrays. Flatten or join array fields into delimited strings during preprocessing, or extract the most relevant element for rendering.
  • Over-Filtering for Interactivity: Removing attributes that power hover tooltips or click popups degrades UX. Coordinate with frontend developers to maintain a strict keep list that includes interactive properties, even if they are not used for styling.
  • Dynamic Attribute Injection: Some pipelines inject runtime attributes (e.g., tile_id, generation_timestamp). Exclude these from the filtering policy and apply them post-tiling or via server-side middleware.
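
The array flattening described above can be done on feature properties before tiling. The following is a minimal sketch for GeoJSON-style property dicts; `flatten_arrays`, the ";" separator, and the `first_only` switch (using the first element as a stand-in for "most relevant") are illustrative choices, not a prescribed convention.

```python
def flatten_arrays(props, separator=";", first_only=False):
    """Return a copy of props with list values converted to scalars."""
    flattened = {}
    for key, value in props.items():
        if isinstance(value, (list, tuple)):
            items = [item for item in value if item is not None]
            if first_only:
                # Keep a single representative element; skip empty arrays
                if not items:
                    continue
                value = items[0]
            else:
                # Join all elements into one delimited string
                value = separator.join(str(item) for item in items)
        flattened[key] = value
    return flattened
```

Joined strings remain searchable on the client with substring checks, while `first_only=True` keeps the original scalar type at the cost of discarding the remaining elements.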

Conclusion

Attribute filtering rules transform raw geospatial datasets into lean, performant vector tiles. By auditing style dependencies, codifying policies, executing deterministic pre-filtering, and validating outputs in CI/CD, engineering teams eliminate payload bloat while preserving rendering fidelity. Treat filtering logic as infrastructure-as-code: version it, test it, and review it alongside your map styles.
