Text layer#155

Open
teunbrand wants to merge 31 commits into posit-dev:main from teunbrand:text_layer
Conversation

@teunbrand (Collaborator) commented Feb 25, 2026

This PR implements text layers.

Unfortunately, it has become quite complex; I would have made a shorter PR if that were easy.
The main driver of complexity is that the writer only accepts static angle/hjust/vjust/family/fontface values per layer. In the worst case, we split the layer into one layer per row of data, but generally we try to be economical and use run-length encoding so that consecutive rows sharing the same properties are collected into a single layer. In the best case, where angle/hjust/vjust/family/fontface are all static, we emit just a single layer.

Also, while this PR touches the 'label' layer, we haven't yet figured out how to draw fitted rectangles behind text, so the label layer shouldn't be considered finished.

teunbrand and others added 27 commits February 20, 2026 14:53
Introduces a separate 'fontsize' aesthetic as an alternative to 'size' for
text/label geoms. Unlike 'size' (which uses area-based scaling with radius²
conversion for point marks), 'fontsize' uses linear scaling for font sizes.

Changes:
- Grammar: Add 'fontsize' to aesthetic names
- Geoms: Add 'fontsize' to Text and Label supported aesthetics
- Aesthetics: Register 'fontsize' in NON_POSITIONAL list
- Writer: Map 'fontsize' → 'size' channel in Vega-Lite output
- Scale: Add default range [8.0, 20.0] for fontsize aesthetic
- Tests: Add test_fontsize_linear_scaling integration test

Usage:
  DRAW text MAPPING x AS x, y AS y, value AS fontsize
  SCALE fontsize TO [10, 20]  -- Linear: 10pt to 20pt (not area-converted)
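
The difference between the two scaling modes can be illustrated with a minimal sketch. The function names and the exact form of the area mapping (interpolating squared sizes, then taking the root) are assumptions made for illustration; they are not taken from this PR's code.

```rust
/// Linear mapping of `value` from `domain` onto `range`.
/// Assumes `domain.0 != domain.1`; a degenerate domain would yield NaN.
fn scale_linear(value: f64, domain: (f64, f64), range: (f64, f64)) -> f64 {
    let t = (value - domain.0) / (domain.1 - domain.0);
    range.0 + t * (range.1 - range.0)
}

/// Hypothetical area-style mapping: interpolate the squared sizes so that
/// the drawn area, not the radius, grows linearly with the data value.
fn scale_area(value: f64, domain: (f64, f64), range: (f64, f64)) -> f64 {
    let t = (value - domain.0) / (domain.1 - domain.0);
    let lo = range.0 * range.0;
    let hi = range.1 * range.1;
    (lo + t * (hi - lo)).sqrt()
}
```

With a domain of (0, 10) and a range of (10, 20), the midpoint value 5 maps to exactly 15 under the linear sketch, but to roughly 15.8 under the area sketch, which is why a font size scale would want the linear behaviour.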

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add TextRenderer implementation that handles font aesthetics (family,
fontface, hjust, vjust) by splitting data into multiple Vega-Lite layers
when font properties vary across rows.

Key features:
- Single-layer optimization: When all fonts are constant, generates one
  layer with mark properties set directly
- Multi-layer splitting: When fonts vary, creates one layer per unique
  font combination while preserving ORDER BY
- Proper SOURCE_COLUMN filtering: Uses empty string for single-layer
  and suffix keys for multi-layer to match BoxplotRenderer pattern
- Font property mapping:
  - family → mark.font
  - fontface → mark.fontWeight/fontStyle
  - hjust → mark.align
  - vjust → mark.baseline

Tests included for both constant and varying font cases.
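
As a rough illustration of the property mapping listed above, the conversions might look like the sketch below. The accepted fontface spellings and the numeric thresholds for hjust/vjust are assumptions, and the function names are hypothetical; only the target Vega-Lite values (`bold`/`italic`/`normal`, `left`/`center`/`right`, `top`/`middle`/`bottom`) are standard mark property values.

```rust
/// Split a fontface string into a (fontWeight, fontStyle) pair.
/// Assumption: faces are spelled like "plain", "bold", "italic", "bold.italic".
fn fontface_to_weight_style(fontface: &str) -> (&'static str, &'static str) {
    let lower = fontface.to_lowercase();
    let weight = if lower.contains("bold") { "bold" } else { "normal" };
    let style = if lower.contains("italic") { "italic" } else { "normal" };
    (weight, style)
}

/// Map a numeric hjust (0 = left, 1 = right) to a Vega-Lite `align` value.
/// The 0.25 / 0.75 cut-offs are illustrative assumptions.
fn hjust_to_align(hjust: f64) -> &'static str {
    if hjust < 0.25 {
        "left"
    } else if hjust > 0.75 {
        "right"
    } else {
        "center"
    }
}

/// Map a numeric vjust (0 = bottom, 1 = top) to a Vega-Lite `baseline` value.
fn vjust_to_baseline(vjust: f64) -> &'static str {
    if vjust < 0.25 {
        "bottom"
    } else if vjust > 0.75 {
        "top"
    } else {
        "middle"
    }
}
```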

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove the FontStrategy enum variants and use a single struct with a
groups vector. The single-layer case now has 1 group containing all rows,
while the multi-layer case has N groups.

Benefits:
- Eliminates redundant code paths (no more match statements)
- Simpler prepare_data() - just iterate over groups
- Simpler finalize() - unified layer generation logic
- Fewer lines of code overall

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
TextMetadata was simply wrapping FontStrategy with no additional value.
Store FontStrategy directly in PreparedData metadata instead.

This eliminates 4 lines and one level of indirection.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The signature field was only used during group construction as a
HashMap key to track row assignments. After groups are built, the
field was never accessed (marked with #[allow(dead_code)]).

Removed the field and its assignments, keeping the local signature
variable for grouping logic.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Eliminated FontGroup struct and common_properties field by:
- Using HashMap<String, (properties, indices)> for grouping during
  construction, then converting to sorted Vec
- Storing all properties (constant + varying) in each group's HashMap
- Using plain tuples (HashMap<String, Value>, Vec<usize>) instead of
  a dedicated struct

This reduces code by 24 net lines while maintaining the same
functionality. Properties are now the HashMap keys (via signature)
and row indices are values, making the data structure more direct.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
FontStrategy was just wrapping a single Vec. Eliminated it by:
- Returning Vec<(HashMap<String, Value>, Vec<usize>)> directly from
  analyze_font_columns()
- Storing the Vec directly as metadata in PreparedData::Composite
- Downcasting to Vec type directly in finalize()

This removes 7 net lines while maintaining identical functionality.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Refactored TextRenderer to use FontKey tuple containing converted
Vega-Lite Values instead of intermediate structures:

- FontKey = (family, fontWeight, fontStyle, align, baseline) as Values
- convert_fontface returns (fontWeight, fontStyle) tuple
- Properties converted once during grouping (in analyze_font_columns)
- finalize_layers directly inserts Values into mark object
- Eliminated font_key_to_properties, apply_mark_property, and
  map_aesthetic_to_mark_property helpers

Benefits:
- No string signatures or intermediate HashMaps
- Properties converted once per unique combination (not per row)
- Simpler finalize_layers with direct value insertion
- No special-case spreading logic for fontface

This removes 70 net lines while maintaining identical functionality.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed analyze_font_columns to return Vec<(FontKey, Vec<usize>)>
instead of HashMap, with sorting done once at the end of grouping.

Before: HashMap was sorted twice - once in prepare_data() and again
in finalize_layers() to maintain consistent ordering.

After: Groups are sorted once after HashMap construction in
analyze_font_columns(), then both prepare_data() and finalize_layers()
iterate the pre-sorted Vec directly.

This preserves HashMap's O(1) insertion benefit during construction
while eliminating redundant sort operations.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changes:
- convert_family() returns Option<Value> instead of Value
- Returns None for empty family strings
- Simplifies finalize_layers to use if let Some(family_val)
- Apply clippy suggestion: use or_default() instead of or_insert_with(Vec::new)

This eliminates the is_none_or check and makes the intent clearer:
family is optional and should be omitted from the mark object when
not specified.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When font groups have non-contiguous row indices (e.g., [0, 2, 5, 6]),
split them into separate contiguous ranges ([0], [2], [5, 6]) to
preserve rendering order.

Example:
- Row 0: Arial "A"
- Row 1: Courier "B"
- Row 2: Arial "C"

Before: Arial layer renders A and C together, then B on top
After: Three layers render in order: A, then B, then C

This ensures that the DRAW clause ORDER BY is respected for z-order
stacking, even when rows with the same font properties are
interleaved with rows having different properties.
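
The splitting step described above can be sketched as follows. This is a standalone illustration, not the PR's code; `split_contiguous` is a hypothetical name and the input is assumed to be sorted in ascending order.

```rust
/// Split ascending row indices into runs of consecutive values,
/// e.g. [0, 2, 5, 6] becomes [[0], [2], [5, 6]].
fn split_contiguous(indices: &[usize]) -> Vec<Vec<usize>> {
    let mut ranges: Vec<Vec<usize>> = Vec::new();
    for &idx in indices {
        // Does `idx` directly follow the last index of the current range?
        let extends_last = ranges
            .last()
            .and_then(|run| run.last())
            .map_or(false, |&prev| prev + 1 == idx);
        if extends_last {
            // Safe: `extends_last` is only true when a last range exists.
            ranges.last_mut().unwrap().push(idx);
        } else {
            ranges.push(vec![idx]);
        }
    }
    ranges
}
```

For the Arial/Courier example, the Arial group `[0, 2]` would split into `[0]` and `[2]`, so the Courier row at index 1 can be drawn between them.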

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The label aesthetic (mapped to Vega-Lite 'text' encoding) should not
generate a legend or scale, as text values are literal display strings
rather than data values that need scaling or legend representation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changes:
- Use nested layer structure for multi-group text rendering
  - Single group: returns one layer with full encoding
  - Multiple groups: returns parent layer with shared encoding,
    child layers only have mark + transform
- Extract helper functions for code reuse:
  - apply_font_properties: applies font properties to mark object
  - build_transform_with_filter: creates transform with source filter
- Both finalize_single_layer and finalize_nested_layers now use
  helpers to avoid duplication

This approach eliminates duplicate encoding specifications in
multi-layer output while preserving z-order through contiguous
range splitting.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Verifies nested layer structure is correct for multiple font groups
- Tests that parent spec has shared encoding
- Tests that child layers only have mark + transform
- Tests that font properties are applied to mark objects

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changes:
- Remove finalize_single_layer function
- Always use nested layer structure (works for 1 or N groups)
- Simplify prepare_data to always use _font_N suffix
- Update test expectations

This eliminates code duplication and special-case handling for
single-group scenarios, reducing implementation by ~24 lines.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changes:
- Add 'angle' to supported aesthetics in Text geom
- Update FontKey tuple to include angle (6th element)
- Extract angle column in analyze_font_columns
- Add convert_angle function (parses numeric angle in degrees)
- Apply angle property in apply_font_properties
- Remove angle from encoding in modify_encoding

The angle aesthetic is now handled the same way as other font
properties (family, fontface, hjust, vjust) via data-splitting,
since Vega-Lite requires it as a mark property.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit completes the angle aesthetic implementation:

Grammar changes:
- Add 'angle' to aesthetic keywords in tree-sitter grammar

Label geom consistency:
- Add 'angle' to supported aesthetics in Label geom
- Brings label geom in line with text geom support

TextRenderer improvements:
- Fix convert_angle to handle both numeric and string columns
- Add angle normalization to [0, 360) range
- Handle integer, float, and string angle values

Integration test:
- Add test_text_angle_integration for full SQL → Vega-Lite pipeline
- Verifies nested layer structure with angle mark properties
- Tests angle normalization and data splitting
- Validates non-contiguous index handling

The angle aesthetic now works end-to-end: SQL query with angle
column → TextRenderer splits data by unique angles → Vega-Lite
generates nested layers with angle mark properties.
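
A minimal sketch of the normalization and string handling mentioned above, assuming angles are plain degrees. The names `normalize_angle` and `parse_angle` are hypothetical, and rejecting non-finite values is a choice made for this sketch rather than documented behaviour of `convert_angle`.

```rust
/// Wrap an angle in degrees into the half-open range [0, 360).
fn normalize_angle(degrees: f64) -> f64 {
    // `rem_euclid` returns a non-negative remainder, unlike the `%` operator.
    degrees.rem_euclid(360.0)
}

/// Parse an angle from a string cell, returning None for text that is not
/// a finite number (Rust's f64 parser accepts "inf" and "NaN").
fn parse_angle(raw: &str) -> Option<f64> {
    let value: f64 = raw.trim().parse().ok()?;
    if value.is_finite() {
        Some(normalize_angle(value))
    } else {
        None
    }
}
```

Because equivalent angles such as -90 and 270 normalize to the same value, rows using either spelling would share one layer instead of forcing a split.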

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace the group-sort-split approach with elegant run-length encoding
for handling font property variations in text layers.

Changes:

Algorithm improvement:
- Replace HashMap grouping + sorting + contiguous splitting with
  single-pass RLE scan
- Complexity: O(n log n) → O(n)
- Memory: 8n bytes per run → 16 bytes per run

Type simplification:
- Before: Vec<(FontKey, Vec<usize>)> - explicit row indices
- After:  Vec<(FontKey, usize)> - run lengths with implicit positions
- Start positions derived from cumulative run lengths

DataFrame operations:
- Replace boolean masking (filter_by_indices) with direct slicing
- Use df.slice(position, length) - O(1) pointer arithmetic
- Remove filter_by_indices helper function entirely

Function rename:
- analyze_font_columns() → build_font_rle()
- Clearer name indicating RLE technique and output type

Benefits:
- 28 net lines removed (52 insertions, 80 deletions)
- Simpler single-pass algorithm
- More efficient memory usage
- Faster DataFrame operations
- All tests pass unchanged

The refactoring maintains identical behavior while using the canonical
run-length encoding pattern for grouping consecutive rows.
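
The single-pass scan and the derivation of start positions can be sketched generically. This is an illustration under assumptions: `build_runs` and `run_ranges` are hypothetical names, and the key type stands in for whatever tuple of font properties the renderer compares.

```rust
/// Run-length encode a sequence of keys in one pass.
/// Consecutive equal keys collapse into a single (key, length) pair.
fn build_runs<K: PartialEq + Clone>(keys: &[K]) -> Vec<(K, usize)> {
    let mut runs: Vec<(K, usize)> = Vec::new();
    for key in keys {
        if let Some((last_key, len)) = runs.last_mut() {
            if *last_key == *key {
                *len += 1;
                continue;
            }
        }
        runs.push((key.clone(), 1));
    }
    runs
}

/// Derive (start, length) pairs from cumulative run lengths; these are the
/// arguments a slice-style call would need for each run.
fn run_ranges<K>(runs: &[(K, usize)]) -> Vec<(usize, usize)> {
    let mut position = 0;
    runs.iter()
        .map(|(_, len)| {
            let start = position;
            position += *len;
            (start, *len)
        })
        .collect()
}
```

For keys `["Arial", "Arial", "Courier", "Arial"]` this yields runs of length 2, 1 and 1, starting at rows 0, 2 and 3. Interleaved rows keep their order, so no separate contiguous-splitting step is needed.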

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add nudge parameters that map to Vega-Lite's xOffset/yOffset mark
properties, allowing fine-grained positioning adjustments for text labels.

Changes:

Text and Label geoms:
- Add nudge_x and nudge_y to default_params
- Default to Null (not applied unless explicitly set)

TextRenderer:
- Build base mark prototype with nudge offsets (if specified)
- Clone and extend with font properties for each run
- Pass layer to finalize_nested_layers for parameter access

Integration test:
- Verify nudge_x → xOffset and nudge_y → yOffset mapping
- Confirm parameters apply to all nested text layers

Usage:
  DRAW text SETTING nudge_x => 5, nudge_y => -10

This enables fine-tuning text label positions without modifying
the underlying x/y data, useful for avoiding overlaps or improving
label placement in dense visualizations.
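
The "base mark prototype, cloned and extended per run" idea can be sketched without a JSON library. Everything here is a simplified stand-in: `MarkValue`, `Mark`, `base_text_mark` and `mark_for_run` are hypothetical names, and the real renderer presumably works with JSON values rather than this small enum.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a JSON value inside a mark object.
#[derive(Clone, Debug, PartialEq)]
enum MarkValue {
    Number(f64),
    Text(String),
}

type Mark = BTreeMap<String, MarkValue>;

/// Build the shared prototype: nudges become xOffset/yOffset only when set.
fn base_text_mark(nudge_x: Option<f64>, nudge_y: Option<f64>) -> Mark {
    let mut mark = Mark::new();
    mark.insert("type".to_string(), MarkValue::Text("text".to_string()));
    if let Some(dx) = nudge_x {
        mark.insert("xOffset".to_string(), MarkValue::Number(dx));
    }
    if let Some(dy) = nudge_y {
        mark.insert("yOffset".to_string(), MarkValue::Number(dy));
    }
    mark
}

/// Clone the prototype and add the properties specific to one run.
fn mark_for_run(base: &Mark, font: Option<&str>, angle: Option<f64>) -> Mark {
    let mut mark = base.clone();
    if let Some(family) = font {
        mark.insert("font".to_string(), MarkValue::Text(family.to_string()));
    }
    if let Some(degrees) = angle {
        mark.insert("angle".to_string(), MarkValue::Number(degrees));
    }
    mark
}
```

Cloning the prototype keeps the offsets identical across all nested layers while leaving the prototype itself free of per-run properties.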

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add template-based label formatting to text/label geoms, reusing the
existing format.rs infrastructure from SCALE RENAMING.

Changes:

format.rs improvements:
- Add format_dataframe_column() - clean API for DataFrame column formatting
- Refactor to convert columns to strings first, then apply formatting
- Add format_value() helper shared by both APIs
- Improved error message showing actual datatype for unsupported types
- Two-step process: column→string, then template application

Text/Label geoms:
- Add 'format' parameter (defaults to Null)
- Works with both geoms for consistency

TextRenderer:
- Add apply_label_formatting() helper
- Apply formatting in prepare_data() before font analysis
- Pass layer parameter through prepare_data() trait method
- Update all GeomRenderer implementations

Integration tests:
- test_text_label_formatting - Title case transformation
- test_text_label_formatting_numeric - Printf-style number formatting

Supported placeholder syntax:
- {} - Plain insertion
- {:UPPER} - Uppercase
- {:lower} - Lowercase
- {:Title} - Title Case
- {:time %fmt} - DateTime strftime format
- {:num %fmt} - Number printf format

Usage:
  DRAW text SETTING format => 'Region: {:Title}'
  DRAW text SETTING format => '${:num %.2f}'
  DRAW text SETTING format => '{:time %b %Y}'

The format parameter transforms label values before rendering, enabling
clean label presentation without modifying source data.
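
To make the placeholder syntax concrete, here is a small standalone sketch covering only the `{}`, `{:UPPER}`, `{:lower}` and `{:Title}` placeholders. It is not the `format.rs` implementation; `apply_template` and `title_case` are hypothetical names, and unknown specs (including `{:num ...}` and `{:time ...}`) are simply copied through unchanged here.

```rust
/// Capitalize the first letter of each space-separated word.
fn title_case(s: &str) -> String {
    s.split(' ')
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                Some(first) => first
                    .to_uppercase()
                    .chain(chars.flat_map(|c| c.to_lowercase()))
                    .collect::<String>(),
                None => String::new(),
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Substitute `value` into every recognised placeholder in `template`.
fn apply_template(template: &str, value: &str) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        // An unmatched '{' ends the scan; the tail is copied verbatim below.
        let Some(close_rel) = rest[open..].find('}') else { break };
        let close = open + close_rel;
        out.push_str(&rest[..open]);
        match &rest[open + 1..close] {
            "" => out.push_str(value),
            ":UPPER" => out.push_str(&value.to_uppercase()),
            ":lower" => out.push_str(&value.to_lowercase()),
            ":Title" => out.push_str(&title_case(value)),
            // Specs this sketch does not handle are left as written.
            _ => out.push_str(&rest[open..=close]),
        }
        rest = &rest[close + 1..];
    }
    out.push_str(rest);
    out
}
```

With the template `'Region: {:Title}'`, a label value of `north west` would be rendered as `Region: North West`.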

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Separate value selection from conversion in all convert functions
- Use early returns with ? operator for cleaner control flow
- Inline convert function calls to eliminate intermediate variables
- Change property insertion to use if let Some with .insert()
- Fix column lookup to use naming::aesthetic_column()
- Optimize angle extraction to handle numeric columns without cast->parse
- Remove unused FontKey type alias
- Fix test_fontsize_linear_scaling to include required label aesthetic

All text rendering tests passing (11/11).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@teunbrand teunbrand mentioned this pull request Feb 25, 2026
24 tasks
@teunbrand teunbrand marked this pull request as ready for review February 25, 2026 14:43
@teunbrand teunbrand requested a review from thomasp85 February 25, 2026 14:43
