Background
I'm the maintainer of atropos, which I'm winding down. It has one feature without a fastp equivalent that I'd like to propose here.
Proposal
Add an option that groups QC statistics by flowcell tile (or any regex-extracted field from the read name) — e.g. --per-tile-stats or a more general --group-by-regex "<pattern>" where the first capture group names the bucket. The JSON/HTML report would then surface per-tile distributions of quality, base composition, duplication, and adapter content.
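To make the bucketing concrete, here is a minimal Python sketch of the idea. It is not fastp or atropos code. The function name `group_mean_quality`, the `"unmatched"` bucket, and the example tile pattern are all hypothetical, and the pattern assumes Illumina-style read names of the form `@instrument:run:flowcell:lane:tile:x:y`. It only tracks mean base quality per bucket; the other statistics would be accumulated the same way.

```python
import re
from collections import defaultdict

# Assumed Illumina-style read name: @instrument:run:flowcell:lane:tile:x:y
# The first capture group (the tile number) names the bucket.
TILE_PATTERN = r"^@[^:]+:[^:]+:[^:]+:[^:]+:(\d+):"


def group_mean_quality(records, pattern, phred_offset=33):
    """Return mean base quality per bucket.

    records: iterable of (read_name, sequence, quality_string) tuples.
    pattern: regex whose first capture group names the bucket.
    """
    regex = re.compile(pattern)
    totals = defaultdict(lambda: [0, 0])  # bucket -> [quality sum, base count]
    for name, _seq, qual in records:
        match = regex.search(name)
        # Reads whose name does not match go to a catch-all bucket (an assumption).
        bucket = match.group(1) if match else "unmatched"
        totals[bucket][0] += sum(ord(c) - phred_offset for c in qual)
        totals[bucket][1] += len(qual)
    return {b: s / n for b, (s, n) in totals.items() if n > 0}


if __name__ == "__main__":
    reads = [
        ("@M01:12:FC1:1:1101:1000:2000 1:N:0:1", "ACGT", "IIII"),
        ("@M01:12:FC1:1:1101:1500:2100 1:N:0:1", "ACGT", "5555"),
        ("@M01:12:FC1:1:2104:1200:1800 1:N:0:1", "ACGT", "####"),
    ]
    for tile, mean_q in sorted(group_mean_quality(reads, TILE_PATTERN).items()):
        print(tile, mean_q)
```

Under this sketch, a --per-tile-stats flag would just be shorthand for a built-in tile pattern, while --group-by-regex would pass a user-supplied one.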
Why this is useful
Tile-level variation is a leading indicator of lane-specific chemistry problems (bubbles, laser issues, edge-tile effects). FastQC has per-tile quality heatmaps for exactly this reason, but it does not share fastp's performance or integration. An integrated per-tile view in fastp's existing HTML report would let users spot localized flowcell issues without a second tool.
Generalizing to an arbitrary read-name regex makes the feature useful for tagged libraries (e.g. 10x Genomics, sci-ATAC, UMI-bucketed pools) too — any read-name field the user cares to partition on.
Prior art
pre:tiles=<regex> option: https://github.com/jdidion/atropos/blob/master/atropos/commands/trim/cli.py#L408-L412
Thanks for considering.