Segmentation foundation models (SFMs) struggle with tree-like and low-contrast objects. We introduce interpretable metrics that quantify these object properties and show that SFM performance (IoU) noticeably correlates with them — providing the first quantitative framework for modeling these failure modes.
Image segmentation foundation models (SFMs) like the Segment Anything Model (SAM) have achieved impressive zero-shot and interactive segmentation across diverse domains. However, they struggle to segment objects with certain structures, particularly those with dense, tree-like morphology and low textural contrast from their surroundings. Understanding these failure modes is crucial for assessing the limitations of SFMs in real-world applications.
To systematically study this issue, we introduce interpretable metrics quantifying object tree-likeness and textural separability. On carefully controlled synthetic experiments and real-world datasets, we show that SFM performance (e.g., SAM, SAM 2, HQ-SAM) noticeably correlates with these factors. We link these failures to "textural confusion", where models misinterpret local structure as global texture, causing over-segmentation or difficulty distinguishing objects from similar backgrounds. Notably, targeted fine-tuning fails to resolve this issue, indicating a fundamental limitation. Our study provides the first quantitative framework for modeling the behavior of SFMs on challenging structures, offering interpretable insights into their segmentation capabilities.
1. Tree-Likeness Metrics (CPR & DoGD). We propose two interpretable metrics — Centripetal Persistence Ratio (CPR) and Difference of Gaussians Descriptor (DoGD) — that quantify the degree of tree-like morphology of a segmentation object.
2. Textural Separability Metric. We introduce a separability score measuring the textural contrast between an object and its background, capturing how visually distinct an object is from its surroundings.
3. Textural Confusion. We identify and characterize a failure mode we call textural confusion, where SFMs misinterpret fine local structure as global texture, leading to over-segmentation and boundary errors on challenging objects.
4. Fundamental Limitation. Targeted fine-tuning on challenging structures does not resolve textural confusion, indicating a fundamental architectural limitation of current SFMs rather than a training data issue.
5. Validated Across SAM, SAM 2, and HQ-SAM. Results are validated on synthetic and real-world datasets (DIS, MOSE) across multiple segmentation foundation models, demonstrating the generality of our findings.
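To build intuition for why a difference-of-Gaussians signal can separate tree-like from blob-like shapes, here is a toy sketch. This is not the paper's DoGD metric (use `get_DoGD` from the codebase for that); `toy_dog_score`, the sigma values, and the example shapes are all illustrative assumptions. The idea it demonstrates: thin, branching structures lose mass quickly under coarse blurring, so their fine-minus-coarse response over the object is large, while compact blobs score low.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def toy_dog_score(mask, sigma_fine=1.0, sigma_coarse=4.0):
    """Mean |difference-of-Gaussians| response over the object pixels.

    Thin, branching shapes lose mass quickly under coarse blurring,
    so their fine-minus-coarse response is large; compact blobs score low.
    (Illustrative proxy only -- not the paper's DoGD.)
    """
    m = mask.astype(float)
    dog = gaussian_filter(m, sigma_fine) - gaussian_filter(m, sigma_coarse)
    return float(np.abs(dog)[mask > 0].mean())

# a thin vertical "branch" vs. a solid square blob
branch = np.zeros((64, 64)); branch[8:56, 32] = 1    # 1-px-wide line
blob = np.zeros((64, 64)); blob[22:42, 22:42] = 1    # 20x20 square

print(toy_dog_score(branch) > toy_dog_score(blob))   # the line scores higher
```

The tree-like shape yields the larger score because almost none of its mass survives the coarse blur, which is the qualitative behavior a tree-likeness descriptor needs.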
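Similarly, a minimal sketch of what a textural-separability score captures. This is not the `TexturalMetric` implementation from the codebase; `toy_separability` and its histogram-overlap formulation are assumptions for illustration. It returns ~1 when object and background intensity distributions are disjoint (high contrast) and ~0 when they coincide (the hard, low-contrast case).

```python
import numpy as np

def toy_separability(gray, mask, bins=32):
    """1 minus the overlap of intensity histograms inside vs. outside the mask.

    ~1: object and background intensities are disjoint (visually distinct).
    ~0: their distributions coincide (object blends into background).
    (Illustrative proxy only -- not the codebase's TexturalMetric.)
    """
    edges = np.linspace(0.0, 1.0, bins + 1)
    h_obj, _ = np.histogram(gray[mask > 0], bins=edges)
    h_bg, _ = np.histogram(gray[mask == 0], bins=edges)
    h_obj = h_obj / h_obj.sum()
    h_bg = h_bg / h_bg.sum()
    return float(1.0 - np.minimum(h_obj, h_bg).sum())

mask = np.zeros((32, 32)); mask[8:24, 8:24] = 1
high = np.where(mask > 0, 0.9, 0.1)   # bright object on dark background
low = np.full((32, 32), 0.5)          # object indistinguishable from background

print(toy_separability(high, mask))   # 1.0
print(toy_separability(low, mask))    # 0.0
```

A richer metric would compare local texture statistics rather than raw intensities, but the endpoint behavior is the same: low scores flag exactly the low-contrast objects on which SFM performance degrades.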
Figure: Correlation between segmentation performance (IoU) and our proposed metrics for object tree-likeness (CPR and DoGD) and textural separability, for SAM, SAM 2, and HQ-SAM evaluated on the DIS and MOSE datasets.
Our codebase lets you compute all three metrics for any segmentation mask (and image) in just a few lines:
import torch
import torchvision.transforms as transforms
from PIL import Image

from treelikeness_metrics import get_CPR, get_DoGD
from textural_contrast_metrics import TexturalMetric

device = "cuda" if torch.cuda.is_available() else "cpu"

# load a binary object mask (shape: 1, H, W) and its image
object_mask = torch.load('path/to/mask.pt')
img = transforms.functional.to_tensor(
    Image.open('path/to/image.png').convert('RGB')
).to(device)

# tree-likeness
cpr = get_CPR(object_mask, device=device)
dogd = get_DoGD(object_mask, device=device)

# textural separability
separability = TexturalMetric(device).get_separability_score(img, object_mask)
print(f"CPR (tree-likeness) = {cpr:.3f}")
print(f"DoGD (tree-likeness) = {dogd:.3f}")
print(f"Textural separability = {separability:.3f}")

If you find our work useful, please cite:

@inproceedings{zhang2024texturalconfusion,
  title     = {Quantifying the Limits of Segmentation Foundation Models: Modeling Challenges in Segmenting Tree-Like and Low-Contrast Objects},
  author    = {Yixin Zhang and Nicholas Konz and Kevin Kramer and Maciej A. Mazurowski},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2026},
  url       = {https://arxiv.org/abs/2412.04243}
}