Nicholas (Nick) Konz

Email: nicholas (dot) konz (at) duke (dot) edu

Bluesky 🦋: @nickkonz.bsky.social

I’m a Ph.D. candidate studying machine learning at Duke University, working under Maciej Mazurowski. My research focuses on deep learning/computer vision for medical image analysis on a spectrum which ranges from application-oriented to foundational work, with an emphasis on topics like generative models, domain adaptation and generalization analysis, and image-to-image translation.

I am particularly interested in how foundational deep learning concepts–such as generalization, and image distribution distance metrics–behave differently in medical image analysis and other secondary computer vision domains. This includes exploring how these concepts need to be adapted for unique challenges in these fields. Additionally, I like to study how the intrinsic manifold properties of a model’s training data govern how it learns and generalizes.

Beyond medical imaging, I’m drawn to the intersection of machine learning and science: understanding deep learning through a scientific lens, and leveraging it for scientific modeling, discovery, and applications in science-adjacent domains.

Previously, I worked as a research intern in the Math, Stats, and Data Science Group at PNNL. I earned my undergraduate degree at UNC, double-majoring in physics and mathematics, where I conducted research on statistical techniques for astronomy under Dan Reichart.

To learn more about my research, check out my full list of research topics and papers.

news

Oct 6, 2025	I’m honored to have been selected to be an Area Chair for MIDL 2026! Looking forward to contributing to this fantastic conference.
Sep 24, 2025	Our paper, “Are Vision Foundation Models Ready for Out-of-the-Box Medical Image Registration?” (link here), has received the best paper award at the Deep-Brea³th Workshop at MICCAI 2025!
Aug 12, 2025	Our paper, “Are Vision Foundation Models Ready for Out-of-the-Box Medical Image Registration?” (link here), has been accepted and selected for an oral presentation at the Deep-Brea³th Workshop at MICCAI 2025!
Jun 18, 2025	Our paper, “SegmentAnyMuscle: A universal muscle segmentation model across different locations in MRI”, has been released on the arXiv!
Jun 15, 2025	Our paper, “Are Vision Foundation Models Ready for Out-of-the-Box Medical Image Registration?”, has been released on the arXiv!

selected recent papers (full list on google scholar)

Are Vision Foundation Models Ready for Out-of-the-Box Medical Image Registration?

Hanxue Gu*, Yaqian Chen*, Nicholas Konz, and 2 more authors

Deep-Breath @ MICCAI (Oral, Best Paper Award), 2025

Abs arXiv Bib Code

Foundation models, pre-trained on large image datasets and capable of capturing rich feature representations, have recently shown potential for zero-shot image registration. However, their performance has mostly been tested in the context of rigid or less complex structures, such as the brain or abdominal organs, and it remains unclear whether these models can handle more challenging, deformable anatomy. Breast MRI registration is particularly difficult due to significant anatomical variation between patients, deformation caused by patient positioning, and the presence of thin and complex internal structure of fibroglandular tissue, where accurate alignment is crucial. Whether foundation model-based registration algorithms can address this level of complexity remains an open question. In this study, we provide a comprehensive evaluation of foundation model-based registration algorithms for breast MRI. We assess five pre-trained encoders, including DINO-v2, SAM, MedSAM, SSLSAM, and MedCLIP, across four key breast registration tasks that capture variations in different years and dates, sequences, modalities, and patient disease status (lesion versus no lesion). Our results show that foundation model-based algorithms such as SAM outperform traditional registration baselines for overall breast alignment, especially under large domain shifts, but struggle with capturing fine details of fibroglandular tissue. Interestingly, additional pre-training or fine-tuning on medical or breast-specific images in MedSAM and SSLSAM, does not improve registration performance and may even decrease it in some cases. Further work is needed to understand how domain-specific training influences registration and to explore targeted strategies that improve both global alignment and fine structure accuracy. We also publicly release our code at https://github.com/mazurowski-lab/Foundation-based-reg.
@article{gu2025vision, title = {Are Vision Foundation Models Ready for Out-of-the-Box Medical Image Registration?}, author = {Gu*, Hanxue and Chen*, Yaqian and Konz, Nicholas and Li, Qihang and Mazurowski, Maciej A}, journal = {Deep-Breath @ MICCAI (Oral, Best Paper Award)}, year = {2025}, bibtex_show = true, foundationmodels = {true} }
Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets

Nicholas Konz*, Richard Osuala*, Preeti Verma, and 16 more authors

arXiv preprint, 2025

Abs arXiv Bib Code

Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perceptual metrics (\eg, Fréchet Inception Distance/FID) from natural imaging, which we show insufficiently capture anatomical features. To this end, we introduce a new perceptual metric tailored for medical images, FRD (Fréchet Radiomic Distance), which utilizes standardized, clinically meaningful, and interpretable image features. We show that FRD is superior to other image distribution metrics for a range of medical imaging applications, including out-of-domain (OOD) detection, the evaluation of image-to-image translation (by correlating more with downstream task performance as well as anatomical consistency and realism), and the evaluation of unconditional image generation. Moreover, FRD offers additional benefits such as stability and computational efficiency at low sample sizes, sensitivity to image corruptions and adversarial attacks, feature interpretability, and correlation with radiologist-perceived image quality. Additionally, we address key gaps in the literature by presenting an extensive framework for the multifaceted evaluation of image similarity metrics in medical imaging—including the first large-scale comparative study of generative models for medical image translation—and release an accessible codebase to facilitate future research. Our results are supported by thorough experiments spanning a variety of datasets, modalities, and downstream tasks, highlighting the broad potential of FRD for medical image analysis.
@article{konz2025frechetradiomicdistancefrd, title = {Fr\'echet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets}, author = {Konz*, Nicholas and Osuala*, Richard and Verma, Preeti and Chen, Yuwen and Gu, Hanxue and Dong, Haoyu and Chen, Yaqian and Marshall, Andrew and Garrucho, Lidia and Kushibar, Kaisar and Lang, Daniel M. and Kim, Gene S. and Grimm, Lars J. and Lewin, John M. and Duncan, James S. and Schnabel, Julia A. and Diaz, Oliver and Lekadir, Karim and Mazurowski, Maciej A.}, year = {2025}, journal = {arXiv preprint}, eprint = {2412.01496}, archiveprefix = {arXiv}, primaryclass = {cs.CV}, url = {https://arxiv.org/abs/2412.01496}, bibtex_show = true, generativemodels = {true} }
The Effect of Intrinsic Dataset Properties on Generalization: Unraveling Learning Differences Between Natural and Medical Images

Nicholas Konz, and Maciej A. Mazurowski

ICLR, 2024

Abs arXiv Bib Code Poster

This paper investigates discrepancies in how neural networks learn from different imaging domains, which are commonly overlooked when adopting computer vision techniques from the domain of natural images to other specialized domains such as medical images. Recent works have found that the generalization error of a trained network typically increases with the intrinsic dimension (d_data) of its training set. Yet, the steepness of this relationship varies significantly between medical (radiological) and natural imaging domains, with no existing theoretical explanation. We address this gap in knowledge by establishing and empirically validating a generalization scaling law with respect to d_data, and propose that the substantial scaling discrepancy between the two considered domains may be at least partially attributed to the higher intrinsic “label sharpness” (K_F) of medical imaging datasets, a metric which we propose. Next, we demonstrate an additional benefit of measuring the label sharpness of a training set: it is negatively correlated with the trained model’s adversarial robustness, which notably leads to models for medical images having a substantially higher vulnerability to adversarial attack. Finally, we extend our d_data formalism to the related metric of learned representation intrinsic dimension (d_repr), derive a generalization scaling law with respect to d_repr, and show that d_data serves as an upper bound for d_repr. Our theoretical results are supported by thorough experiments with six models and eleven natural and medical imaging datasets over a range of training set sizes. Our findings offer insights into the influence of intrinsic dataset properties on generalization, representation learning, and robustness in deep neural networks. Code link: https://github.com/mazurowski-lab/intrinsic-properties.
@article{konz2024effect, title = {The Effect of Intrinsic Dataset Properties on Generalization: Unraveling Learning Differences Between Natural and Medical Images}, author = {Konz, Nicholas and Mazurowski, Maciej A.}, journal = {ICLR}, year = {2024}, bibtex_show = true, intrinsicproperties = {true} }
Anatomically-Controllable Medical Image Generation with Segmentation-Guided Diffusion Models

Nicholas Konz, Yuwen Chen, Haoyu Dong, and 1 more author

MICCAI, 2024

Abs arXiv Bib Code Poster

Diffusion models have enabled remarkably high-quality medical image generation, yet it is challenging to enforce anatomical constraints in generated images. To this end, we propose a diffusion model-based method that supports anatomically-controllable medical image generation, by following a multi-class anatomical segmentation mask at each sampling step. We additionally introduce a random mask ablation training algorithm to enable conditioning on a selected combination of anatomical constraints while allowing flexibility in other anatomical areas. We compare our method ("SegGuidedDiff") to existing methods on breast MRI and abdominal/neck-to-pelvis CT datasets with a wide range of anatomical objects. Results show that our method reaches a new state-of-the-art in the faithfulness of generated images to input anatomical masks on both datasets, and is on par for general anatomical realism. Finally, our model also enjoys the extra benefit of being able to adjust the anatomical similarity of generated images to real images of choice through interpolation in its latent space. SegGuidedDiff has many applications, including cross-modality translation, and the generation of paired or counterfactual data. Our code is available at https://github.com/mazurowski-lab/segmentation-guided-diffusion.
@article{konz2024anatomically, title = {Anatomically-Controllable Medical Image Generation with Segmentation-Guided Diffusion Models}, author = {Konz, Nicholas and Chen, Yuwen and Dong, Haoyu and Mazurowski, Maciej A.}, journal = {MICCAI}, year = {2024}, organization = {Springer}, bibtex_show = true, generativemodels = {true} }
Unsupervised anomaly localization in high-resolution breast scans using deep pluralistic image completion

Nicholas Konz, Haoyu Dong, and Maciej A. Mazurowski

Medical Image Analysis, 2023

Abs arXiv Bib Code

Automated tumor detection in Digital Breast Tomosynthesis (DBT) is a difficult task due to natural tumor rarity, breast tissue variability, and high resolution. Given the scarcity of abnormal images and the abundance of normal images for this problem, an anomaly detection/localization approach could be well-suited. However, most anomaly localization research in machine learning focuses on non-medical datasets, and we find that these methods fall short when adapted to medical imaging datasets. The problem is alleviated when we solve the task from the image completion perspective, in which the presence of anomalies can be indicated by a discrepancy between the original appearance and its auto-completion conditioned on the surroundings. However, there are often many valid normal completions given the same surroundings, especially in the DBT dataset, making this evaluation criterion less precise. To address such an issue, we consider pluralistic image completion by exploring the distribution of possible completions instead of generating fixed predictions. This is achieved through our novel application of spatial dropout on the completion network during inference time only, which requires no additional training cost and is effective at generating diverse completions. We further propose minimum completion distance (MCD), a new metric for detecting anomalies, thanks to these stochastic completions. We provide theoretical as well as empirical support for the superiority over existing methods of using the proposed method for anomaly localization. On the DBT dataset, our model outperforms other state-of-the-art methods by at least 10% AUROC for pixel-level detection.
@article{konz2023picard, title = {Unsupervised anomaly localization in high-resolution breast scans using deep pluralistic image completion}, author = {Konz, Nicholas and Dong, Haoyu and Mazurowski, Maciej A.}, journal = {Medical Image Analysis}, volume = {87}, pages = {102836}, year = {2023}, publisher = {Elsevier}, bibtex_show = true, anomalydetection = {true} }
Attributing Learned Concepts in Neural Networks to Training Data

Nicholas Konz, Charles Godfrey, Madelyn Shapiro, and 3 more authors

ATTRIB @ NeurIPS (Oral), 2023

Abs arXiv Bib Poster

By now there is substantial evidence that deep learning models learn certain human-interpretable features as part of their internal representations of data. As having the right (or wrong) concepts is critical to trustworthy machine learning systems, it is natural to ask which inputs from the model’s original training set were most important for learning a concept at a given layer. To answer this, we combine data attribution methods with methods for probing the concepts learned by a model. Training network and probe ensembles for two concept datasets on a range of network layers, we use the recently developed TRAK method for large-scale data attribution. We find some evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network nor the probing sparsity of the concept. This suggests that rather than being highly dependent on a few specific examples, the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.
@article{konz2023attributing, title = {Attributing Learned Concepts in Neural Networks to Training Data}, author = {Konz, Nicholas and Godfrey, Charles and Shapiro, Madelyn and Tu, Jonathan and Kvinge, Henry and Brown, Davis}, journal = {ATTRIB @ NeurIPS (Oral)}, year = {2023}, bibtex_show = true, interpretability = {true} }