Segmentation foundation models (SFMs) struggle with tree-like and low-contrast objects. We introduce interpretable metrics that quantify these object properties and show that SFM performance (IoU) noticeably correlates with them — providing the first quantitative framework for modeling these failure modes.
Image segmentation foundation models (SFMs) like the Segment Anything Model (SAM) have achieved impressive zero-shot and interactive segmentation across diverse domains. However, they struggle to segment objects with certain structures, particularly those with dense, tree-like morphology and low textural contrast from their surroundings. Understanding these failure modes is crucial for assessing the limitations of SFMs in real-world applications.
To systematically study this issue, we introduce interpretable metrics quantifying object tree-likeness and textural separability. On carefully controlled synthetic experiments and real-world datasets, we show that SFM performance (e.g., SAM, SAM 2, HQ-SAM) noticeably correlates with these factors. We link these failures to "textural confusion", where models misinterpret local structure as global texture, causing over-segmentation or difficulty distinguishing objects from similar backgrounds. Notably, targeted fine-tuning fails to resolve this issue, indicating a fundamental limitation. Our study provides the first quantitative framework for modeling the behavior of SFMs on challenging structures, offering interpretable insights into their segmentation capabilities.
1. Tree-Likeness Metrics (CPR & DoGD). We propose two interpretable metrics — Centripetal Persistence Ratio (CPR) and Difference of Gaussians Descriptor (DoGD) — that quantify the degree of tree-like morphology of a segmentation object.
2. Textural Separability Metric. We introduce a separability score measuring the textural contrast between an object and its background, capturing how visually distinct an object is from its surroundings.
3. Textural Confusion. We identify and characterize a failure mode we call textural confusion, where SFMs misinterpret fine local structure as global texture, leading to over-segmentation and boundary errors on challenging objects.
4. Fundamental Limitation. Targeted fine-tuning on challenging structures does not resolve textural confusion, indicating a fundamental architectural limitation of current SFMs rather than a training data issue.
5. Validated Across SAM, SAM 2, and HQ-SAM. Results are validated on synthetic and real-world datasets (DIS, MOSE) across multiple segmentation foundation models, demonstrating the generality of our findings.
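To build intuition for why a difference-of-Gaussians signal can separate tree-like from blob-like shapes, here is a toy sketch. This is not the paper's DoGD metric (use `get_DoGD` from the codebase for that); `toy_dog_score`, the sigma values, and the example shapes are all illustrative assumptions. The idea it demonstrates: thin, branching structures lose mass quickly under coarse blurring, so their fine-minus-coarse response over the object is large, while compact blobs score low.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def toy_dog_score(mask, sigma_fine=1.0, sigma_coarse=4.0):
    """Mean |difference-of-Gaussians| response over the object pixels.

    Thin, branching shapes lose mass quickly under coarse blurring,
    so their fine-minus-coarse response is large; compact blobs score low.
    (Illustrative proxy only -- not the paper's DoGD.)
    """
    m = mask.astype(float)
    dog = gaussian_filter(m, sigma_fine) - gaussian_filter(m, sigma_coarse)
    return float(np.abs(dog)[mask > 0].mean())

# a thin vertical "branch" vs. a solid square blob
branch = np.zeros((64, 64)); branch[8:56, 32] = 1    # 1-px-wide line
blob = np.zeros((64, 64)); blob[22:42, 22:42] = 1    # 20x20 square

print(toy_dog_score(branch) > toy_dog_score(blob))   # the line scores higher
```

The tree-like shape yields the larger score because almost none of its mass survives the coarse blur, which is the qualitative behavior a tree-likeness descriptor needs.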
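Similarly, a minimal sketch of what a textural-separability score captures. This is not the `TexturalMetric` implementation from the codebase; `toy_separability` and its histogram-overlap formulation are assumptions for illustration. It returns ~1 when object and background intensity distributions are disjoint (high contrast) and ~0 when they coincide (the hard, low-contrast case).

```python
import numpy as np

def toy_separability(gray, mask, bins=32):
    """1 minus the overlap of intensity histograms inside vs. outside the mask.

    ~1: object and background intensities are disjoint (visually distinct).
    ~0: their distributions coincide (object blends into background).
    (Illustrative proxy only -- not the codebase's TexturalMetric.)
    """
    edges = np.linspace(0.0, 1.0, bins + 1)
    h_obj, _ = np.histogram(gray[mask > 0], bins=edges)
    h_bg, _ = np.histogram(gray[mask == 0], bins=edges)
    h_obj = h_obj / h_obj.sum()
    h_bg = h_bg / h_bg.sum()
    return float(1.0 - np.minimum(h_obj, h_bg).sum())

mask = np.zeros((32, 32)); mask[8:24, 8:24] = 1
high = np.where(mask > 0, 0.9, 0.1)   # bright object on dark background
low = np.full((32, 32), 0.5)          # object indistinguishable from background

print(toy_separability(high, mask))   # 1.0
print(toy_separability(low, mask))    # 0.0
```

A richer metric would compare local texture statistics rather than raw intensities, but the endpoint behavior is the same: low scores flag exactly the low-contrast objects on which SFM performance degrades.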
Figure: Correlation between segmentation performance (IoU) and our proposed metrics for object tree-likeness (CPR and DoGD) and textural separability, for SAM, SAM 2, and HQ-SAM evaluated on the DIS and MOSE datasets.
Our codebase lets you compute all three metrics for any segmentation mask (and image) in just a few lines:
import torch
import torchvision.transforms as transforms
from PIL import Image

from treelikeness_metrics import get_CPR, get_DoGD
from textural_contrast_metrics import TexturalMetric

device = "cuda" if torch.cuda.is_available() else "cpu"

# load a binary object mask (shape: 1, H, W) and its image
object_mask = torch.load('path/to/mask.pt')
img = transforms.functional.to_tensor(
    Image.open('path/to/image.png').convert('RGB')
).to(device)

# tree-likeness
cpr = get_CPR(object_mask, device=device)
dogd = get_DoGD(object_mask, device=device)

# textural separability
separability = TexturalMetric(device).get_separability_score(img, object_mask)
print(f"CPR (tree-likeness) = {cpr:.3f}")
print(f"DoGD (tree-likeness) = {dogd:.3f}")
print(f"Textural separability = {separability:.3f}")

If you find our work useful, please cite:

@inproceedings{zhang2024texturalconfusion,
  title     = {Quantifying the Limits of Segmentation Foundation Models: Modeling Challenges in Segmenting Tree-Like and Low-Contrast Objects},
  author    = {Yixin Zhang and Nicholas Konz and Kevin Kramer and Maciej A. Mazurowski},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2026},
  url       = {https://arxiv.org/abs/2412.04243}
}