Feature request: tile-level QC partitioning via regex on read name #691

Description

@jdidion

Background

I'm the maintainer of atropos, which I'm winding down. There is one feature without a fastp equivalent that I'd like to propose here.

Proposal

Add an option that groups QC statistics by flowcell tile (or any regex-extracted field from the read name) — e.g. --per-tile-stats or a more general --group-by-regex "<pattern>" where the first capture group names the bucket. The JSON/HTML report would then surface per-tile distributions of quality, base composition, duplication, and adapter content.
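To make the bucketing idea concrete, here is a minimal Python sketch of the grouping logic, not fastp code. Everything in it is an assumption for illustration: the function name `group_mean_quality`, the `"unmatched"` fallback bucket, and `TILE_PATTERN`, which assumes Illumina-style read names of the form `@instrument:run:flowcell:lane:tile:x:y`. It only tracks mean quality per bucket; the other statistics would accumulate the same way.

```python
import re
from collections import defaultdict

# Assumed pattern for Illumina-style names:
# @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> [comment]
# The first capture group (the tile number) names the bucket.
TILE_PATTERN = r"^@?[^:\s]+:[^:\s]+:[^:\s]+:\d+:(\d+):\d+:\d+"


def group_mean_quality(records, pattern, phred_offset=33):
    """Return mean Phred quality per bucket (hypothetical helper).

    records: iterable of (read_name, quality_string) pairs.
    pattern: regex whose first capture group names the bucket.
    """
    regex = re.compile(pattern)
    # bucket -> [sum of quality scores, number of bases]
    totals = defaultdict(lambda: [0, 0])
    for name, qual in records:
        match = regex.search(name)
        # Reads whose name does not match go to a catch-all bucket.
        bucket = match.group(1) if match else "unmatched"
        totals[bucket][0] += sum(ord(c) - phred_offset for c in qual)
        totals[bucket][1] += len(qual)
    return {b: s / n for b, (s, n) in totals.items() if n > 0}


records = [
    ("@M01:7:FC1:1:1101:1000:2000 1:N:0:1", "IIII"),
    ("@M01:7:FC1:1:1101:1001:2001 1:N:0:1", "5555"),
    ("@M01:7:FC1:1:2104:1500:2500 1:N:0:1", "####"),
]
stats = group_mean_quality(records, TILE_PATTERN)
print(stats)  # {'1101': 30.0, '2104': 2.0}
```

With this shape, `--per-tile-stats` could simply be shorthand for `--group-by-regex` with a built-in tile pattern.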

Why this is useful

Tile-level variation is a leading indicator of localized flowcell problems (bubbles, laser issues, edge-tile effects). FastQC has per-tile quality heatmaps for exactly this reason, but it does not match fastp's performance and is not integrated with fastp's report. An integrated per-tile view in fastp's existing HTML report would let users spot localized flowcell issues without a second tool.

Generalizing to an arbitrary read-name regex also makes the feature useful for tagged libraries (e.g. 10x Genomics, sci-ATAC, UMI-bucketed pools): users could partition on any read-name field they care about.

Prior art

Thanks for considering.
