NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation

Zhenye Lou Qing Xu Zekun Jiang \IEEEmembershipStudent Member, IEEE Xiangjian He \IEEEmembershipSenior Member, IEEE Zhen Chen Yi Wang Chenxin Li Maggie M. He Wenting Duan This work is partially supported by the Yongjiang Technology Innovation Project (2022A-097-G), and the Ningbo 2025 Key R&D Project (2023Z223). (Equal contribution: Z. Lou and Q. Xu, Corresponding author: Xiangjian He) Z. Lou is with Sichuan University Pittsburgh Institute, Sichuan University, Chengdu, China. Q. Xu and X. He are with School of Computer Science, University of Nottingham Ningbo China, Ningbo, Zhejiang, China (e-mail: [email protected]). Z. Jiang is with West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China. Z. Chen is with Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences, Hong Kong SAR. Y. Wang is with School of Software, Dalian University of Technology, Dalian 116600, China. C. Li is with Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong 999077, SAR, China. M. M. He is with Department of Cardiology, Gold Coast University Hospital, QLD, Australia. W. Duan is with School of Computer Science, University of Lincoln, Lincoln LN6 7TS, UK.

Abstract

Domain-generalized nuclei segmentation refers to the generalizability of models to unseen domains based on knowledge learned from source domains and is challenged by various image conditions, cell types, and stain strategies. Recently, the Segment Anything Model (SAM) has made great success in universal image segmentation by interactive prompt modes (e.g., point and box). Despite its strengths, the original SAM presents limited adaptation to medical images. Moreover, SAM requires providing manual bounding box prompts for each object to produce satisfactory segmentation masks, so it is laborious in nuclei segmentation scenarios. To address these limitations, we propose a domain-generalizable framework for nuclei image segmentation, abbreviated to NuSegDG. Specifically, we first devise a Heterogeneous Space Adapter (HS-Adapter) to learn multi-dimensional feature representations of different nuclei domains by injecting a small number of trainable parameters into the image encoder of SAM. To alleviate the labor-intensive requirement of manual prompts, we introduce a Gaussian-Kernel Prompt Encoder (GKP-Encoder) to generate density maps driven by a single point, which guides segmentation predictions by mixing position prompts and semantic prompts. Furthermore, we present a Two-Stage Mask Decoder (TSM-Decoder) to effectively convert semantic masks to instance maps without the manual demand for morphological shape refinement. Based on our experimental evaluations, the proposed NuSegDG demonstrates state-of-the-art performance in nuclei instance segmentation, exhibiting superior domain generalization capabilities. The source code is available at https://github.com/xq141839/NuSegDG.

{IEEEkeywords}

Nuclei image segmentation, foundation model fine-tuning, domain generalization

1 Introduction

Nuclei images are commonly obtained by various imaging modalities, including histopathology slides, fluorescence microscopy, and cryo-electron microscopy. The segmentation task based on such images is crucial for disease diagnosis and treatment planning [1, 2]. In particular, semantic segmentation can be used to calculate the disease area. Instance segmentation aims to identify each nuclear as a separate entity within an image, allowing detailed morphological studies and advanced cellular analysis, such as cell counting. However, the inherent heterogeneity of different modalities, intricate tissue structures and tight cell clustering pose challenges in building a universal nuclei segmentation framework [3, 4, 5, 6].

Traditional U-shape architectures adopt Convolutional Neural Network (CNN) for feature extraction and combine the predicted nuclear proxy maps with morphological post-processing methods to generate instance maps from the semantic segmentation masks [7, 8, 9]. Despite these task-specific models displaying acceptable performance on the seen data, they are difficult to handle unseen domains, especially for the nuclei with different shapes and stain environments. This is because morphological operations are sensitive to the intensity distribution, unexpected artifacts, and noise. This highlights that the generalized nuclei segmentation methods should reduce the dependence on classical image processing algorithms.

The recent emergence of the Segment Anything Model (SAM) [10] has revolutionized segmentation tasks, offering versatile capabilities that surpass traditional methods. SAM has demonstrated exceptional generalization performance in natural image segmentation, showcasing robustness and adaptability across various scenarios [11, 12]. Based on this success, SAM has been applied to a range of medical imaging tasks, revealing its potential to handle diverse and complex segmentation challenges in the medical field, including organ and tissue segmentation and detecting various pathological conditions. These advancements underscore that SAM is promising to provide a robust and generalized solution for diverse medical image segmentation tasks [13, 14, 15, 16]. Despite these advantages, globally fine-tuning SAM requires a large number of pixel-level annotated labels, so it is expensive and impractical for medical scenarios, especially for the specific disease or segmentation task.

Furthermore, SAM mainly adopts interactive prompt modes (e.g., point and box) to guide the segmentation decoding. Although the box mode enables SAM to provide accurate segmentation masks, it is sensitive to the precision of manual annotations and is labor-intensive in nuclei segmentation tasks as each nuclei image usually contains hundreds of cells. On the other hand, the point model is labor-saving, which asks users to click the desired segmentation area. However, current studies have proven that using only one positive point of every cell as the prompt is difficult to drive SAM predicting satisfactory segmentation masks [17, 18]. Therefore, the point prompt mode should be further optimized in nuclei segmentation tasks.

To address these limitations in nuclei image segmentation, we propose a domain-generalizable framework for semantic segmentation and automatic instance map conversion, abbreviated to NuSegDG. It is comprised of three modules: a Heterogeneous Space Adapter (HS-Adapter), a Gaussian-Kernel Prompt Encoder (GKP-Encoder) and a Two-Stage Mask Decoder (TSM-Decoder). Specifically, HS-Adapter is used to adapt the feature representation of SAM from natural to different nuclei images. GKP-Encoder utilizes a single-point prompt to generate the density map with sufficient semantic information for guiding segmentation predictions. TSM-Decoder is responsible for predicting precise semantic segmentation masks and converting them to instance maps without manual morphological image processing. The contributions of this work are summarized as follows:

•

We devise the HS-Adapter that utilizes heterogeneous space projection to adaptively adjust the feature representation of the image encoder based on different nuclei domains, seamlessly harmonizing knowledge transfer between natural and nuclei images.
•

To improve the efficiency of the single-point prompt mode, we introduce the GKP-Encoder that leverages the Gaussian kernel to produce a density map with sufficient semantic prompt information for guiding segmentation predictions.
•

We devise the TSM-Decoder for segmentation decoding. To avoid manual morphological shape refinement, the TSM-Decoder focuses on accurate semantic segmentation mask predictions and directly transforms them into instance maps.
•

We take HS-Adapter, GKP-Encoder, and TSM-Decoder to establish our NuSegDG framework. We conduct extensive experiments on diverse nuclei image datasets, demonstrating that NuSegDG performs better than classical nuclei segmentation methods and state-of-the-art medical SAMs with superior domain generalization capabilities.

Refer to caption — Figure 1: The overview of our NuSegDG for domain-generalized nuclei image segmentation. (a) Heterogeneous Space Adapter. (b) Gaussian-Kernel Prompt Encoder. (c) Two-Stage Mask Decoder.

2 Related Work

In this section, we review the state-of-the-art nuclei segmentation architectures. Moreover, the traditional DG frameworks and recent medical foundation models are mentioned.

2.1 Nuclei Image Segmentation

The segmentation of nuclei in histopathology images plays an essential role in pathological analysis, enabling pathologists to make precise diagnoses [19]. It can be mainly divided into nuclei semantic segmentation and nuclei instance segmentation. The semantic segmentation focuses on the accuracy of pixel-level classification in each nuclei image, where U-Net [20] has made great success in this task. Over the last decade, researchers mainly focused on improving its ability of feature extraction. Early CNN series leverages the advantages of inductive bias to provide sufficient prior knowledge for accelerating model convergence [21, 22, 23]. Vision Transformer (ViT) [24] further increases the model capacity by utilizing a self-attention mechanism to capture long-range dependencies [25, 26, 27, 28]. The recent Mamba-based frameworks adopted State Space Model (SSM) to optimize the computation complexity of global context [29].

In addition, the instance segmentation task aims to identify each nucleus as a distinct entity. Existing methods usually predict different types of nuclear proxy maps to synthesize instance maps. HoVer-Net [7], Cellpose [30, 31] and CellViT [9], for instance, employed horizontal and vertical distance maps to accurately delineate the boundaries of individual nuclear in histopathology images. CDNet [32] and PROnet [33] respectively utilized directional feature maps and offset maps to enhance the delineation of nuclei boundaries. TSFD-Net [34] and CPP-Net [8] additionally used boundary maps as auxiliary supervisions. Despite their advancements, these methods often require complex post-processing, such as manual morphology operations and thresholding algorithms, to manually synthesize instance maps, so they hinder the generalization capability of models to unseen domains [13]. Our proposed NuSegDG framework addresses these limitations by converting fundamental semantic segmentation masks to instance maps automatically, thereby demonstrating outstanding domain generalization performance across diverse nuclei image domains.

2.2 SAM for Generalized Medical Image Segmentation

The generalizability of neural networks is crucial for medical image segmentation [35, 36]. Existing methods mainly utilize multi-source domain adaptation [37] and federal learning [38] to address the Domain Generalization (DG) problem. The recent Segment Anything Model (SAM) [10] is a novel interactive architecture that leverages both sparse prompts (e.g., point, box and text) and dense prompts (e.g., mask) to guide the prediction of segmentation masks. Due to its large image encoder, SAM demonstrates robust feature extraction capabilities, enabling outstanding zero-shot generalization across diverse natural image segmentation tasks. On this basis, current studies have explored the potential of SAM in medical image segmentation tasks [39]. For instance, MedSAM [15] and SAMMI [16] globally fine-tuned SAM on more than 10 medical visual modality datasets, achieving notable generalization capabilities with bounding box prompts. However, globally fine-tuning SAM is computationally intensive and needs sufficient training samples due to its large ViT encoder, which is not efficient for nuclei image segmentation tasks. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) techniques have received the most attention from researchers. Methods such as Low-Rank Adaptation (LoRA) [40] and Conv-LoRA [41] injected a set of trainable low-rank matrices into the attention layer of ViT to update the feature representation. Adapter [42] is another common approach used to fine-tune the foundation model [43, 44]. Although these methods reveal their power in homogeneous domain generalization tasks, nuclei images in different domains have disjoint label spaces. Our approach utilizes heterogeneous space mapping to harmonize the feature representation of SAM between natural and nuclei images.

Moreover, various SAM models [15, 17] have demonstrated the necessity of using bounding boxes as prompts to achieve optimal segmentation results in medical imaging. Conversely, relying on single-point prompts often fails to provide sufficient contextual information for accurate segmentation, especially in complex and dense nuclei images [16, 18]. To overcome the limitations of single-point prompts, existing studies introduced extra units, such as YOLO-NAS [45] and GroundingDino [46], to generate prompts. They perform object detection to identify points or bounding boxes, which are then used as prompts for SAM. However, due to the heterogeneity of nuclei images across different datasets, these single-task models often struggle to provide correct prompts, leading to sub-optimal segmentation results. On the contrary, our NuSegDG uses labor-saving single-point annotation to generate sufficient position and semantic prompts, enhancing the generalization capability.

3 Methodology

3.1 Overview of NuSegDG

In DG, $\mathcal{S}=\{\mathcal{S}_{k}=\{(\mathcal{X}_{k},\mathcal{Y}_{k})\},k=1,2,% \cdots,K\}$ is denoted as the set of $K$ distinct source domains, where $\mathcal{X}_{k}$ is the image in the $k$ -th source domain and $\mathcal{Y}_{k}$ is the segmentation mask of $\mathcal{X}_{k}$ . Let $\mathcal{X}=\{\mathcal{X}_{k}\}_{k=1}^{K}$ and $\mathcal{Y}=\{\mathcal{Y}_{k}\}_{k=1}^{K}$ . The goal of DG is to train a model $f_{\theta}:\mathcal{X}\rightarrow\mathcal{Y}$ , where $\theta$ represents learned parameters. The trained model can be generalized to an unseen target domain $\mathcal{T}$ with high performance.

As illustrated in Fig. 1, we present the overview of NuSegDG for domain-generalized nuclei image segmentation. Given a nuclei image from the $k$ -th domain, we first utilize the Heterogeneous Space Adapter (HS-Adapter) to update the attention computation and feature representation of SAM. The generated image embeddings are then delivered to the Gaussian-Kernel Prompt Encoder (GKP-Encoder) that adopts the single-point annotation to generate sufficient position and semantic information prompts for guiding segmentation decoding. Following this, the Two-Stage Mask Decoder (TSM-Decoder) leverages these prompts and image embeddings to produce a precise semantic segmentation mask and then automatically converts them to an instance map without the demand for laborious manual post-processing operations.

3.2 Heterogeneous Space Adapter

Recent studies [12, 16] have highlighted the impressive generalized segmentation capabilities of SAM [10], facilitated by its large-scale image encoder. Especially, the conventional Adapter [42] and LoRA [40] have been widely used to adapt SAM to medical image segmentation [43, 47]. However, such homogeneous space mapping methods are difficult to learn heterogeneous relationships [48] between different nuclei domains. To tackle the issue, we propose the HS-Adapter that leverages heterogeneous space integration to enhance the domain-specific feature representation of nuclei images. Specifically, the input image is first converted into a set of 2D patch embeddings $\mu\subset\mathbb{R}^{(\frac{H\times W}{n})\times d}$ , where $H$ and $W$ are height and width of the image, $n=16\times 16$ and $d=768$ stand for the patch size and channels of each patch embedding, respectively. To improve the information interaction within Multi-Head Attention (MHA) layers, the HS-Adapter respectively concatenates learnable parameters $W_{\rm que}=\{(E^{i}_{\rm que},\theta^{i}_{\rm que})\}_{i=1}^{N}$ and $W_{\rm val}=\{(E^{i}_{\rm val},\theta^{i}_{\rm val})\}_{i=1}^{N}$ with the query $\mathcal{Q}$ and value $\mathcal{V}$ branches of SAM, where $E^{i}_{\rm que}$ and $E^{i}_{\rm val}$ are projection layers that map embeddings $\mu$ into feature spaces with $i$ -th target mapping channel, $\theta^{i}_{\rm que}$ and $\theta^{i}_{\rm val}$ are up-projections. Additionally, we place the softmax operation $\delta$ on $\mu$ to calculate the weight of each feature space. Finally, $N$ weighted different feature spaces are merged into a heterogeneous space that is used to update the original query and value projection layers of SAM, guiding the computation of attention maps as:

\mathcal{A}=\delta(\frac{(\mathcal{Q}(\mu)\smallfrown h_{\rm que})\cdot{% \mathcal{K}(\mu)}^{T}}{\sqrt{d}})\cdot(\mathcal{V}(\mu)\smallfrown h_{\rm val}),

(1)

where

h_{\rm que}=\mathcal{Q}(\mu)+\sum_{i=1}^{N}\delta(\mu)_{i}\theta_{\rm que}^{i}% (E_{\rm que}^{i}(\mu)),

(2)

h_{\rm val}=\mathcal{V}(\mu)+\sum_{i=1}^{N}\delta(\mu)_{i}\theta_{\rm val}^{i}% (E_{\rm val}^{i}(\mu)),

(3)

$\mathcal{K}$ is the key branch of SAM, $\delta(\cdot)_{i}$ is the $i$ -th component of $\delta(\cdot)$ and $\smallfrown$ is an concatenation operation. In addition to updating the attention computation, we apply heterogeneous space integration to the feed-forward network $\mathcal{F}_{\rm ffn}$ for learning domain-specific embeddings. The final image embeddings $\mathcal{H}$ are defined by:

\mathcal{H}=\mathcal{A}+\mathcal{F}_{\rm ffn}(\mathcal{A})+\sum_{i=1}^{N}% \delta(\mathcal{A})_{i}\theta_{\rm ffn}^{i}(\phi(E_{\rm ffn}^{i}(\mathcal{A}))),

(4)

where $\{E_{\rm ffn}^{i}\}_{i=1}^{N}$ is a set of learnable linear layers that projects $\mathcal{A}$ into the different dimensions for the construction of heterogeneous space, $\{\theta_{\rm ffn}^{i}\}_{i=1}^{N}$ is a set of the up-projections used to align the dimension with $\mathcal{A}$ , and $\phi$ is the nonlinear activation function. Compared to the conventional parameter-efficient fine-tuning techniques, the HS-Adapter performs better in learning heterogeneous relationships between different nuclei domains by using multi-dimensional projection, enhancing the representation of domain-specific knowledge. Overall, our proposed HS-Adapter significantly reduces the number of parameters during the fine-tuning stage.

3.3 Gaussian-Kernel Prompt Encoder

The original SAM [10] and medical SAMs [43, 15, 16] mainly rely on manual box prompts to guide the model in predicting accurate segmentation masks. Despite its advantages, this prompt mode is sensitive to the localization of boxes. Minor labeling errors can significantly reduce the quality of generating segmentation masks. Therefore, the precise manual box annotation is impractical in nuclei segmentation tasks as a histopathological image usually contains thousands of nuclei and tight cell clusters in clinical scenarios. In this paper, we introduce the GKP-Encoder that leverages single-point prompts to produce a high-quality density map, providing additionally sufficient semantic information prompts to assist segmentation decoding.

Given $L$ cell positions: $\{(x_{l},y_{l})\}_{l=1}^{L}$ in a nuclei image, where $x_{l},y_{l}\in\mathbb{N}$ , the corresponding density map $\mathcal{D}=\{\mathcal{D}_{z,j}\}\in\mathbb{R}^{H\times W}$ [49] is defined by:

\mathcal{D}_{z,j}=\sum_{l=1}^{L}G_{\sigma}(z-x_{l},j-y_{l}),

(5)

where

G_{\sigma}(z-x_{l},j-y_{l})=C_{\rm norm}\cdot e^{-\frac{(z-x_{l})^{2}+(j-y_{l}% )^{2}}{2\sigma^{2}}},

(6)

$z\in\{0,1,\cdots,W\}$ , $j\in\{0,1,\cdots,H\}$ , $\sigma^{2}$ is the isotropic covariance, and $C_{\rm norm}$ is a normalization constant. In Eq. 6, $G_{\sigma}(\cdot)$ stands for a normalized 2D Gaussian kernel, and

\sum_{z-x_{l}=-r}^{r}\sum_{j-y_{l}=-r}^{r}G_{\sigma}(z-x_{l},j-y_{l})=1,

(7)

where $r\in\mathbb{Z}$ determines the kernel size of ${(2r+1)\times(2r+1)}$ . To fit different sizes of nuclei and provide sufficient semantic information, the parameter $r$ is set to 10 in our study. In the next step, we utilize a small convolutional network to transform $\mathcal{D}$ to a set of high-quality semantic information prompt embeddings $\mathcal{P}_{\rm sem}\in\mathbb{R}^{(\frac{H\times W}{n})\times 256}$ , where 256 is the channel number. The computation is formulated as:

\mathcal{P}_{\rm sem}=\phi(F_{\rm conv}(\phi(F_{\rm norm}(F_{\rm conv}(% \mathcal{D}))))),

(8)

where $F_{\rm conv}$ is a $2\times 2$ convolutional layer with the stride 2, $F_{\rm norm}$ is LayerNorm and $\phi$ represents GELU activation function. Moreover, the provided cell positions are used to generate additional position prompt embedding $\mathcal{P}_{\rm pos}$ using the sparse prompt encoder of SAM, where $\mathcal{P}_{\rm pos}\in\mathbb{R}^{L\times 256}$ stands for the sum of a positional encoding of the location and learnable embeddings. In this way, the proposed GKP-Encoder, driven by the single-point annotation, not only is labor-saving compared to the box annotation but also provides efficient semantic prompts $\mathcal{P}_{\rm sem}$ and position prompts $\mathcal{P}_{\rm pos}$ for guiding segmentation decoding.

3.4 Two-Stage Mask Decoder

In the last decade, U-shape hierarchical decoders [7, 34, 9] have been widely used for the prediction of nuclei semantic and instance segmentation masks. For the latter, previous methods usually utilized morphological post-processing methods to detect each cell based on the generated nuclear proxy maps. However, such operations require laboriously manual parameter adjustment when facing different nuclei domains, degrading the generalization capabilities of models. On the other hand, current medical SAMs [15, 16, 13] adopted a sequential inference algorithm to recognize each target object in images, so they are time-consuming for nuclei instance segmentation tasks involving a large number of cells. To address this issue, we propose the TSM-Decoder that improves the efficiency of producing instance maps by focusing on the prediction of precise semantic segmentation masks. Specifically, we first create trainable query embeddings $q\in\mathbb{R}^{C\times 256}$ to save the decoding information. Different from SAM [10], $C$ represents the number of prediction categories instead of multi-layer masks as histopathology images may include different types of nuclei. Then, we concatenate position prompts $\mathcal{P}_{pos}$ with $q$ and perform a self-attention operation as:

q^{\prime}=\delta(\frac{\mathcal{Q}(\mathcal{P}_{pos}\smallfrown q)\cdot{% \mathcal{K}(\mathcal{P}_{pos}\smallfrown q)}^{T}}{\sqrt{d}})\cdot\mathcal{V}(% \mathcal{P}_{pos}\smallfrown q),

(9)

where $q^{\prime}\in\mathbb{R}^{(C+L)\times 256}$ is updated query embedding. Following this, we combine the image embedding $\mathcal{H}$ with semantic information prompts $\mathcal{P}_{sem}$ : $\mathcal{H}^{\prime}\leftarrow\mathcal{H}\oplus\mathcal{P}_{sem}$ , where $\oplus$ stands for the element-wise addition operation. Further, we conduct cross-attention with $q^{\prime}$ to generate decoding embeddings $\mathcal{G}$ , by:

\mathcal{G}=\delta(\frac{(\mathcal{H}^{\prime}+\Psi)\cdot{(q^{\prime})}^{T}}{% \sqrt{d}})\cdot q^{\prime}+\mathcal{H}^{\prime},

(10)

where $\Psi$ is positional encodings. Similar to SAM [10], we iterate this operation twice for sufficient updation. Finally, we predict the semantic segmentation mask $\hat{\mathcal{Y}}_{k}\in\mathbb{R}^{H\times W}$ by:

\hat{\mathcal{Y}}_{k}=\rho(\mathcal{F}_{\rm inter}(\mathcal{F}_{\rm trans}(% \mathcal{G})\cdot\mathcal{F}_{\rm MLP}(q^{\prime}))),

(11)

where $\mathcal{F}_{\rm trans}$ is a $4\times 4$ transpose convolution for up-sampling the decoding embeddings, $\mathcal{F}_{\rm MLP}$ represents a multilayer perceptron to perform dimensional alignment, $\mathcal{F}_{\rm inter}$ is a bilinear interpolation function to recover the shape of masks and $\rho$ is the sigmoid function. During the fine-tuning stage, we apply the weighted combination of focal loss $L_{\rm focal}$ and dice loss $L_{\rm dice}$ to supervise the predicted semantic mask $\hat{\mathcal{Y}}_{k}$ of different domains by:

L_{\rm sem}=\alpha L_{\rm dice}+\beta L_{\rm focal},

(12)

where $\alpha$ and $\beta$ respectively stand for the coefficients of focal loss and dice loss. On this basis, the prediction semantic mask can provide accurate target segmentation areas, enabling simply separating each cell by using an auxiliary neural network (e.g., StarDIST). In summary, our NuSegDG framework achieves domain generalization on both nuclei semantic and instance segmentation tasks.

Table 1: Domain-generalized comparison with state-of-the-arts on nuclei semantic segmentation.

Datasets	$\mathcal{T}=\mathcal{S}_{1}$				$\mathcal{T}=\mathcal{S}_{2}$				$\mathcal{T}=\mathcal{S}_{3}$				$\mathcal{T}=\mathcal{S}_{4}$
Methods	Dice	mIoU	F1	HD	Dice	mIoU	F1	HD	Dice	mIoU	F1	HD	Dice	mIoU	F1	HD
U-Net [20]	21.88	15.31	26.52	355.19	56.01	41.66	65.81	77.08	25.73	16.87	30.73	399.99	65.49	49.89	68.52	99.06
U-Net++ [21]	25.99	17.53	31.55	372.39	59.64	46.03	67.26	86.35	20.14	12.50	22.50	418.79	66.54	51.04	68.80	99.63
AttUNet [22]	26.66	18.30	32.42	372.44	58.79	45.24	66.79	92.43	27.70	17.98	31.74	375.44	66.13	50.55	68.46	101.81
DCSAU-Net [23]	43.42	30.73	47.49	246.01	67.17	52.72	71.83	78.62	33.66	22.34	37.27	307.45	71.58	56.47	73.25	94.36
TransUNet [28]	64.41	52.85	70.93	151.67	77.51	63.54	78.11	64.64	73.41	60.26	76.53	229.66	74.17	59.43	75.27	95.42
ACC-UNet [50]	29.11	20.27	35.12	381.36	65.75	51.39	70.54	77.79	32.51	21.29	35.35	312.20	69.63	54.24	70.94	98.92
nnU-Net [51]	23.71	16.34	28.48	360.33	62.43	48.37	69.79	81.52	31.45	20.75	34.70	342.90	68.88	53.50	70.67	99.56
U-mamba [29]	12.01	9.05	14.89	392.97	51.68	39.28	62.53	101.20	19.39	11.89	21.50	408.82	58.79	45.24	66.79	92.43
FedDG [38]	62.79	49.43	69.99	266.36	77.03	62.90	78.01	70.12	71.19	57.65	74.47	242.18	71.85	56.64	72.83	99.19
DCAC [37]	50.29	38.01	55.50	162.13	68.50	53.71	72.48	73.95	29.96	19.48	33.36	322.49	69.71	54.36	71.15	100.21
SAM [10]	66.59	55.29	72.37	252.80	76.84	62.63	77.38	66.82	76.48	63.16	78.34	144.39	75.56	61.25	76.14	75.61
Med-SA [43]	76.94	66.21	80.73	125.24	79.55	66.25	80.00	65.19	78.43	66.26	81.02	127.02	80.54	67.66	80.75	80.75
SAMed [47]	76.79	65.70	80.27	137.15	78.86	65.35	79.51	66.20	78.48	65.68	80.06	137.82	78.77	65.29	79.08	69.09
SAMUS [44]	78.51	68.09	82.14	107.32	80.25	67.16	80.68	63.27	80.91	68.69	82.40	82.33	80.72	67.98	80.97	91.39
SAM-CL [41]	73.48	63.05	78.65	157.08	77.51	63.54	78.11	64.64	76.99	63.94	79.24	119.73	78.65	65.12	78.95	74.44
SAC [52]	69.11	58.14	74.60	245.04	77.80	63.89	78.52	66.76	76.09	63.17	78.49	212.19	75.32	60.86	75.74	90.28
LeSAM [53]	71.80	61.45	77.55	207.65	77.83	63.95	78.58	67.65	76.48	63.78	78.80	205.46	74.72	60.02	75.06	91.03
H-SAM [54]	76.79	65.70	80.73	165.25	78.00	64.17	78.65	66.07	76.33	64.03	78.89	215.47	77.04	63.00	77.40	89.83
NuSegDG	80.55	70.71	84.19	54.81	82.43	70.23	82.72	61.36	82.88	71.24	83.34	64.56	83.90	72.49	84.11	64.38

Table 2: Adaptability Comparison with state-of-the-arts on nuclei semantic segmentation.

Datasets	$\mathcal{S}_{1}$				$\mathcal{S}_{2}$				$\mathcal{S}_{3}$				$\mathcal{S}_{4}$
Methods	Dice	mIoU	F1	HD	Dice	mIoU	F1	HD	Dice	mIoU	F1	HD	Dice	mIoU	F1	HD
U-Net [20]	90.42	83.22	91.07	137.11	71.41	56.17	72.06	78.13	72.27	57.38	72.97	123.13	81.02	68.52	81.19	90.01
U-Net++ [21]	90.85	83.83	91.33	117.58	75.72	61.05	76.15	73.38	64.66	49.61	67.56	148.37	81.80	69.62	91.97	80.32
AttUNet [22]	91.01	84.13	91.41	112.98	75.81	61.23	76.13	74.32	75.98	61.81	77.28	131.51	81.34	68.98	81.69	80.75
DCSAU-Net [23]	91.74	85.15	92.04	127.16	75.19	60.38	75.58	77.22	78.33	64.61	78.99	103.61	80.90	68.42	81.24	83.89
TransUNet [28]	91.41	84.65	91.75	132.94	76.30	61.83	76.70	77.33	76.59	62.38	77.51	109.36	82.49	70.47	82.60	78.14
ACC-UNet [50]	90.95	83.96	91.38	119.39	77.90	63.93	78.32	74.79	66.84	52.32	70.29	147.97	82.13	69.91	82.23	77.89
nnU-Net [51]	90.11	82.45	90.57	138.83	80.84	67.91	81.03	73.04	84.32	72.91	84.43	117.51	81.37	68.73	81.63	83.01
U-mamba [29]	89.96	82.58	90.66	127.03	77.38	63.23	77.74	73.10	63.62	47.81	65.44	167.67	82.43	70.37	82.57	82.42
FedDG [38]	90.48	83.25	91.11	134.97	74.57	59.77	75.16	76.74	74.88	60.45	76.57	110.62	81.51	69.17	81.93	86.56
DCAC [37]	91.22	84.35	91.60	112.82	74.89	60.11	75.32	74.31	63.79	49.75	68.14	152.78	81.53	69.19	81.65	71.69
SAM [10]	89.78	82.14	90.33	138.58	76.87	62.63	77.23	78.83	79.28	65.91	79.90	153.47	77.75	64.43	78.39	101.65
Med-SA [43]	91.72	85.12	92.08	69.42	81.32	68.61	81.57	74.59	83.47	71.89	83.63	86.62	83.32	71.59	83.37	77.74
SAMed [47]	91.32	84.47	91.63	71.67	80.06	66.97	80.46	77.14	81.75	69.20	81.78	101.22	82.23	70.05	82.32	76.68
SAMUS [44]	92.07	85.67	92.35	115.18	83.26	71.39	83.33	71.14	84.67	73.45	84.72	114.98	83.40	71.75	83.70	79.15
SAM-CL [41]	91.81	85.29	92.12	89.48	81.17	68.41	81.34	73.57	82.95	70.90	83.02	135.31	82.51	70.46	82.86	90.85
SAC [52]	91.79	85.16	92.10	123.01	81.06	68.25	81.21	72.68	83.51	71.74	83.72	119.45	81.80	69.36	81.96	80.83
LeSAM [53]	91.71	85.08	92.00	128.00	79.67	66.34	79.85	74.36	82.43	70.15	82.57	137.49	80.38	67.56	80.90	90.91
H-SAM [54]	92.01	85.53	92.27	129.34	81.45	68.81	81.55	71.99	83.40	71.60	83.51	118.60	81.65	69.20	82.09	82.24
NuSegDG	93.17	87.46	93.35	33.18	86.37	76.06	86.40	44.44	88.20	78.93	87.03	49.69	84.59	73.44	84.75	64.61

4 Experiments

4.1 Datasets and Implementations

4.1.1 Datasets

To validate the effectiveness of the proposed NuSegDG, we collect DSB-2018 [3], MoNuSeg-2018 [4], TNBC [5] and CryoNuSeg [6] datasets to perform comprehensive comparisons for domain generalization. We denote these four nuclei datasets with source domains $\mathcal{S}_{1}$ , $\mathcal{S}_{2}$ , $\mathcal{S}_{3}$ and $\mathcal{S}_{4}$ , respectively. The details are as follows.

DSB-2018 [3] dataset includes 670 nuclei images captured using fluorescence microscopy, offering a range of staining methods including DAPI, Hoechst, Hematoxylin and Eosin (H&E). These images are annotated with nuclear masks to facilitate segmentation tasks and vary in size.

MoNuSeg-2018 [4] dataset consists of 51 H&E stained histopathology images from various organs, including breast, liver, kidney, prostate, bladder, colon, and stomach. Each image measures $1000\times 1000$ pixels, captured at $40\times$ magnification.

TNBC [5] dataset comprises nuclei images stained with Hematoxylin and Eosin (H&E), sourced from breast cancer patients. This dataset includes 50 images with a resolution of $512\times 512$ pixels, captured at $40\times$ magnification.

CryoNuSeg [6] dataset contains H&E stained tissues from 10 different organs, providing 30 images of $512\times 512$ pixels, captured at $40\times$ magnification. The diversity of tissue types offers a comprehensive resource for evaluating the robustness of segmentation methods.

4.1.2 Implementation Details

We conduct our experiments on two parallel NVIDIA Tesla P40 GPUs (48GB), utilizing PyTorch 1.13.0, Python 3.10, and CUDA 11.7. We maintain consistent training settings and configurations across all experiments to ensure fairness and reproducibility. For the optimizer, we employ Adam with a batch size of 2 and train models for 100 epochs. The initial learning rate is set to 0.0001 and is adjusted using an exponential decay strategy with a decay factor of 0.98. The loss coefficient $\alpha$ and $\beta$ are set to 0.8 and 0.2 during the training. In our proposed NuSegDG framework, the number of heterogeneous space $N$ is set to 2. All images are resized to $1024\times 1024$ . To save computational costs, the ViT-B [10] is considered as the image encoder for all SAM-based frameworks. For the TSM-Decoder, we select the pre-trained StarDIST [55] as our auxiliary neural network to facilitate accurate instance segmentation without manual morphological shape refinement. We utilize the single-point prompt to fine-tune all SAM-based architectures. The point is generated using the connectedComponents in OpenCV, which is the centroid of each nucleus instance.

4.2 Evaluation Metrics

In our experiments, we first evaluate the performance of models on the semantic segmentation task using four common metrics: Dice coefficient, mean Intersection over Union (mIoU), F1-score, and Hausdorff Distance (HD). Then, we adopt four extra metrics: Aggregated Jaccard Index (AJI), Detection Quality (DQ), Segmentation Quality (SQ), and Panoptic Quality (PQ), to make comparisons on the instance segmentation task. Especially, the best and second-best performance values are highlighted in bold and underlined. For each task, we use two different evaluation protocols: domain generalization and adaptability evaluation.

4.2.1 Domain Generalization Evaluation

We employ a standard leave-one-domain-out strategy [56] to conduct the domain generalization evaluation. Specifically, the model is trained on a training set $\mathcal{S}$ of $K-1$ source domains, where each source domain represents a different data distribution, and then evaluated on the remaining unseen target domains $\mathcal{T}$ , e.g, $\mathcal{S}=\{\mathcal{S}_{1},\mathcal{S}_{2},\mathcal{S}_{3}\}$ , $\mathcal{T}=\mathcal{S}_{4}$ .

4.2.2 Adaptability Evaluation

In this protocol, we perform a fully supervised learning where all four datasets (i.e., $\mathcal{S}_{1},\mathcal{S}_{2},\mathcal{S}_{3},\mathcal{S}_{4}$ ) are considered as seen domains. We randomly divide all datasets into three sets: training, validation, and testing, in the conventional ratio of 8:1:1. The model is evaluated on the testing set of each dataset individually. The reason for conducting this protocol is to assess the adaptability of each model across different domains. Moreover, we display the performance gap between our domain generalization approach and traditional fully supervised methods. This comparison further demonstrates the effectiveness of NuSegDG on domain generalization.

Table 3: Domain-generalized Comparison with state-of-the-arts on nuclei instance segmentation.

Datasets	$\mathcal{T}=\mathcal{S}_{1}$				$\mathcal{T}=\mathcal{S}_{2}$				$\mathcal{T}=\mathcal{S}_{3}$				$\mathcal{T}=\mathcal{S}_{4}$
Methods	AJI	DQ	SQ	PQ	AJI	DQ	SQ	PQ	AJI	DQ	SQ	PQ	AJI	DQ	SQ	PQ
U-Net [20]	6.35	4.94	10.74	3.41	31.13	28.70	66.70	19.86	14.24	16.19	48.22	10.88	42.35	47.64	72.10	34.51
Mask-RCNN [57]	7.73	7.26	12.82	5.08	30.06	30.71	66.49	21.74	11.30	14.05	48.53	9.37	43.57	48.98	71.73	35.30
StarDIST [55]	8.17	6.55	11.90	4.52	35.58	39.40	70.08	28.49	18.66	24.19	56.50	16.53	43.81	49.61	71.79	35.71
Hover-Net [7]	5.42	2.52	16.37	1.75	42.07	53.42	72.96	39.43	23.11	25.50	65.28	17.73	41.10	40.63	64.17	30.48
TSFD-Net [34]	6.03	5.23	11.53	3.68	41.37	49.92	71.41	35.87	12.48	17.02	51.67	11.36	37.95	45.16	71.56	32.50
CellPose [31]	15.75	19.00	36.72	13.98	20.56	27.47	73.04	20.18	38.70	54.55	71.16	42.53	41.84	54.39	72.58	41.27
CPP-Net [8]	8.63	6.58	12.04	4.53	41.07	47.81	72.39	34.93	17.42	21.94	60.86	14.66	41.09	47.78	72.33	34.72
CellViT [9]	4.39	4.95	12.00	3.66	48.44	63.31	73.85	49.17	47.30	64.00	64.76	45.53	43.94	57.23	67.57	42.36
SAM [10]	41.95	39.24	63.75	30.00	47.42	55.73	71.85	40.41	53.92	57.56	72.24	43.79	41.00	48.43	73.03	35.62
Med-SA [43]	57.64	55.06	74.90	44.42	50.59	62.06	74.10	46.23	56.60	63.26	72.98	47.94	51.40	61.72	73.50	45.11
SAMed [47]	51.15	46.98	69.71	36.92	49.47	61.00	73.58	45.10	54.59	59.69	73.20	44.14	49.34	59.31	73.33	43.65
SAMUS [44]	60.98	61.41	77.80	50.17	50.62	62.17	74.11	46.32	57.27	67.65	72.60	50.32	51.34	62.07	73.40	45.68
SAM-CL [41]	52.76	51.63	71.08	41.17	47.84	57.26	72.99	42.00	55.29	60.79	73.49	46.30	49.36	59.26	73.59	43.78
PromptNucSeg [13]	50.89	48.66	70.44	38.87	48.82	59.06	72.24	43.03	54.14	57.31	72.79	43.78	45.49	44.44	71.73	32.10
NuSegDG	63.31	72.02	77.99	58.07	58.18	73.19	74.46	54.63	58.30	69.54	73.76	51.61	55.56	63.73	75.78	48.51

Table 4: Adaptability Comparison with state-of-the-arts on nuclei instance segmentation.

Datasets	$\mathcal{S}_{1}$				$\mathcal{S}_{2}$				$\mathcal{S}_{3}$				$\mathcal{S}_{4}$
Methods	AJI	DQ	SQ	PQ	AJI	DQ	SQ	PQ	AJI	DQ	SQ	PQ	AJI	DQ	SQ	PQ
U-Net [20]	63.49	74.81	81.72	61.45	50.27	61.75	74.74	46.23	51.34	60.73	75.32	45.88	46.43	55.90	75.80	42.38
Mask-RCNN [57]	63.32	75.03	81.05	61.44	45.32	55.98	74.16	41.61	43.84	54.79	74.71	41.35	46.72	56.16	76.25	42.87
StarDIST [3]	63.38	74.78	80.51	60.90	54.95	68.24	74.36	50.87	45.26	56.15	76.11	42.76	46.57	53.82	74.84	40.29
Hover-Net [7]	61.59	61.04	79.48	50.03	54.97	71.29	75.68	54.02	25.10	25.14	69.16	17.75	36.35	34.98	71.59	25.21
TSFD-Net [34]	62.25	72.67	80.14	59.06	54.16	67.98	74.55	50.78	42.93	53.77	75.97	40.85	47.64	58.15	75.20	43.79
CellPose [31]	66.77	80.21	82.54	66.93	20.62	27.82	74.08	20.72	45.21	62.82	76.88	48.40	36.46	45.90	74.92	34.61
CPP-Net [8]	63.60	74.72	81.51	61.48	52.29	66.21	73.94	49.03	56.00	68.20	77.75	53.05	47.36	56.03	76.00	43.21
CellViT [9]	60.51	73.92	84.10	63.25	57.91	77.35	77.19	60.54	53.16	68.72	77.65	54.83	40.87	58.13	76.16	44.38
SAM [10]	73.32	80.21	83.08	67.16	55.51	69.70	75.24	52.51	55.16	54.44	77.74	42.45	41.66	50.71	75.71	38.77
Med-SA [43]	74.17	84.89	81.95	70.03	64.53	78.64	75.45	59.42	64.59	67.92	77.11	52.89	50.35	59.52	73.80	44.26
SAMed [47]	71.95	82.30	80.04	66.42	62.62	75.52	75.70	57.29	63.30	65.15	75.09	49.09	49.14	56.94	73.94	42.42
SAMUS [44]	73.70	86.97	80.88	70.72	67.98	83.09	76.87	63.93	63.34	76.56	77.23	60.84	51.25	63.53	75.79	48.35
SAM-CL [41]	73.20	86.40	80.89	70.37	63.60	78.49	74.84	58.84	60.11	67.01	77.19	51.90	50.42	61.28	74.58	45.83
PromptNucSeg [13]	74.26	85.13	82.25	70.31	64.53	79.03	76.32	60.41	61.94	72.44	77.46	56.98	52.31	62.11	73.23	45.71
NuSegDG	77.91	88.88	85.47	76.31	69.81	88.66	77.68	68.88	73.08	85.33	78.15	66.84	53.33	63.64	76.87	49.11

Table 5: Ablation study of NuSegDG in domain-generalized Nuclei Instance Segmentation:

\mathcal{S}\rightarrow\mathcal{T}

M_{1}

: HS-Adapter.

M_{2}

: GKP-Encoder.

M_{3}

: TSM-Decoder.

Row	$M_{1}$	$M_{2}$	$M_{3}$	AJI (Avg.)	DQ (Avg.)	SQ (Avg.)	PQ (Avg.)
1				48.69	53.78	70.11	39.90
2	✓			53.86	60.14	72.77	46.51
3		✓		51.48	56.66	71.36	43.07
4			✓	50.70	55.81	71.05	41.48
5	✓	✓		56.29	66.23	74.21	50.38
6	✓		✓	54.85	63.93	73.17	48.26
7		✓	✓	53.15	58.76	71.96	46.14
8	✓	✓	✓	58.84	69.62	75.50	53.21

4.3 Comparison on Nuclei Semantic Segmentation

To comprehensively assess our NuSegDG, we first compare it with state-of-the-art frameworks on nuclei semantic segmentation. As illustrated in Table 1, in the domain generalization evaluation, TransUNet [28] achieves leading results among previous U-shape segmentation algorithms due to its large model capacity. Benefiting from pre-training on the large-scale dataset, PEFT SAMs [43, 47, 44, 41, 52, 53, 54] display better performance than these task-specific models. In contrast, our NuSegDG surpasses the second-best SAMUS by a significant mIoU increase of 2.62%, 3.07%, 2.55%, and 4.51% on these four target domains, respectively. Compared to the prompt-free SAMs, NuSegDG presents a mIoU rise of 5.01% to 12.57%.

Moreover, Table 2 provides results of the adaptability evaluation. We observe that previous U-shape architectures show remarkable performance gains in the seen domain but are inferior to PEFT SAMs. Our NuSegDG achieves superior performance on these four datasets, with the best mIoU of 87.46%, 76.06%, 78.93% and 73.44%, respectively. On the other hand, the domain-generalized NuSegDG in Table 1 demonstrates competitive performance on $\mathcal{S}_{2}$ , $\mathcal{S}_{3}$ and $\mathcal{S}_{4}$ domains compared to fully-supervised U-shape and SAM-based architectures in Table 2. We provide the visualization results in Fig. 2. Consequently, these comparisons validate the superiority of our NuSegDG on domain-generalized nuclei semantic segmentation tasks and better adaptability in diverse nuclei domains.

4.4 Comparison on Nuclei Instance Segmentation

To further evaluate our NuSegDG in nuclei instance segmentation tasks, we perform the comparison with advanced nuclei instance segmentation frameworks on four different nuclei domains. Firstly, Table 3 presents experimental results under the domain generalization evaluation. It is revealed that previous morphological post-processing algorithms [7, 34, 31, 8, 9] show poor generalization capabilities on the $\mathcal{S}_{1}$ domain. On the contrary, SAMUS [44] performs better than these methods by achieving a remarkable PQ of 50.17%, 46.32%, 50.32% and 45.68% on four domains, respectively. Our NuSegDG outperforms it with a significant PQ increase of 7.90%, 8.31%, 1.29%, and 2.83%, respectively.

Furthermore, we provide the adaptability evaluation result in Table 4. It is demonstrated that PEFT SAMs outperform morphological post-processing algorithms in four nuclei datasets. For example, PromptNucSeg [13] has a 6.62% AJI increase over CellViT [9] on the $\mathcal{S}_{2}$ domain. In contrast, our NuSegDG framework achieves the best AJI of 77.91%, 69.81%, 73.08%, and 53.33%, respectively, on the four datasets, and performs better than the state-of-the-art methods in the other three evaluation metrics. The quantitative comparison is presented in Fig. 3. As a result, these results reveal a significant performance advantage of our NuSegDG over current medical foundation models and task-specific architectures on domain-generalized nuclei semantic and instance segmentation tasks.

4.5 Ablation Study

To investigate the effectiveness of the individual components within the NuSegDG framework, we conduct an ablation study on domain-generalized nuclei instance segmentation, as summarized in Table 5. This study sequentially enables or disables the HS-Adapter $M_{1}$ , GKP-Encoder $M_{2}$ , and TSM-Decoder $M_{3}$ to evaluate their impact on the performance of the average AJI, DQ, SQ, and PQ metrics. Firstly, we consider the standard fine-tuned SAM ( $1^{st}$ row) as the ablation baseline. By respectively embedding the HS-Adapter ( $2^{nd}$ row), GKP-Encoder ( $3^{rd}$ row) and TSM-Decoder ( $4^{th}$ row), the performance is raised with the average AJI of 5.17%, 2.79%, 2.07%, and the average PQ of 6.61%, 3.17%, 1.58%.

Moreover, when we combine HS-Adapter with GKP-Encoder ( $5^{th}$ row), the performance of the model is further improved, with the average AJI of 56.29 and PQ of 50.38% on the four domains. This result proves that these two modules can promote the domain generalization capability in nuclei instance segmentation. By comparing $6^{th}$ and $7^{th}$ rows with $2^{nd}$ and $3^{rd}$ rows, the TSM-Decoder demonstrates significant performance gains while eliminating the demand for manual morphological refinement. Finally, our NuSegDG framework ( $8^{th}$ row) integrates all three modules and achieves the best performance on all metrics, with an average AJI of 58.84%, an average DQ of 69.62%, an average SQ of 75.50%, and an average PQ of 53.21%. This full configuration significantly outperforms the others, emphasizing the synergistic benefits of incorporating all modules. This result highlights the importance of each component in enhancing the generalization capability of NuSegDG across different nuclei image domains.

4.6 Analysis of Hyper-Parameters

In this section, we perform a comprehensive hyper-parameters analysis of our NuSegDG model. As reported in Section 3.2 and 3.3, NuSegDG contains two hyper-parameters, including the Gaussian kernel size $r$ in GKP-Encoder and the number of heterogeneous space $N$ in HS-Adapter. For the kernel size, we perform a grid search under the fully-supervised learning to select an optimal configuration. Fig. 4a shows the average Dice and mIoU of NuSegDG on the four nuclei domains with different kernel sizes. It is indicated that the NuSegDG with $r=10$ demonstrates the best performance due to the sufficient semantic information prompts. However, excessive kernel size may generate false positive errors, which cannot offer additional benefits. For the number of heterogeneous space, we provide the result of grid search in Fig. 4b. We observe that the NuSegDG with $N=2$ obtains the best performance. Setting more heterogeneous space significantly increases the computational complexity of NuSegDG, which is not suitable for limited training samples in nuclei domains. These experimental results prove the importance of tuning these hyper-parameters to improve the efficiency of our NuSegDG framework in learning domain-specific knowledge.

5 Conclusion

In this paper, we have proposed NuSegDG for domain-generalized nuclei image segmentation. Specifically, the HS-Adapter has been introduced to adapt the feature representation of SAM from natural to different nuclei images by heterogeneous space integration. Then, the GKP-Encoder has been devised to produce high-quality density maps, driven by the single-point prompt, with sufficient semantic information for guiding segmentation predictions. Finally, the TSM-Decoder has achieved the automatic conversion between the semantic masks and instance maps without demand for labor-intensive morphological post-processing methods. Extensive experimental results have demonstrated that NuSegDG has outperformed the existing nuclei-specific and SAM-based segmentation methods in domain-generalized nuclei image segmentation and displayed superior adaptability across different nuclei domains.

References

[1] C. Li, W. Ma, L. Sun, X. Ding, Y. Huang, G. Wang, and Y. Yu, “Hierarchical deep network with uncertainty-aware semi-supervised learning for vessel segmentation,” Neural Computing and Applications, pp. 1–14, 2022.
[2] L. Sun, C. Li, X. Ding, Y. Huang, Z. Chen, G. Wang, Y. Yu, and J. Paisley, “Few-shot medical image segmentation using a global correlation network with discriminative embedding,” Computers in biology and medicine, vol. 140, p. 105067, 2022.
[3] J. C. Caicedo, A. Goodman, K. W. Karhohs, B. A. Cimini, J. Ackerman, M. Haghighi, C. Heng, T. Becker, M. Doan, C. McQuin et al., “Nucleus segmentation across imaging experiments: the 2018 data science bowl,” Nature methods, vol. 16, no. 12, pp. 1247–1253, 2019.
[4] N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane, and A. Sethi, “A dataset and a technique for generalized nuclear segmentation for computational pathology,” IEEE transactions on medical imaging, vol. 36, no. 7, pp. 1550–1560, 2017.
[5] P. Naylor, M. Laé, F. Reyal, and T. Walter, “Segmentation of nuclei in histopathology images by deep regression of the distance map,” IEEE transactions on medical imaging, vol. 38, no. 2, pp. 448–459, 2018.
[6] A. Mahbod, G. Schaefer, B. Bancher, C. Löw, G. Dorffner, R. Ecker, and I. Ellinger, “Cryonuseg: A dataset for nuclei instance segmentation of cryosectioned h&e-stained histological images,” Computers in biology and medicine, vol. 132, p. 104349, 2021.
[7] S. Graham, Q. D. Vu, S. E. A. Raza, A. Azam, Y. W. Tsang, J. T. Kwak, and N. Rajpoot, “Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images,” Medical image analysis, vol. 58, p. 101563, 2019.
[8] S. Chen, C. Ding, M. Liu, J. Cheng, and D. Tao, “Cpp-net: Context-aware polygon proposal network for nucleus segmentation,” IEEE Transactions on Image Processing, vol. 32, pp. 980–994, 2023.
[9] F. Hörst, M. Rempe, L. Heine, C. Seibold, J. Keyl, G. Baldini, S. Ugurel, J. Siveke, B. Grünwald, J. Egger et al., “Cellvit: Vision transformers for precise cell segmentation and classification,” Medical Image Analysis, vol. 94, p. 103143, 2024.
[10] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.
[11] X. Lan, J. Lyu, H. Jiang, K. Dong, Z. Niu, Y. Zhang, and J. Xue, “Foodsam: Any food segmentation,” IEEE Transactions on Multimedia, 2023.
[12] X. Zhang, Y. Liu, Y. Lin, Q. Liao, and Y. Li, “Uv-sam: Adapting segment anything model for urban village identification,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 20, 2024, pp. 22 520–22 528.
[13] Z. Shui, Y. Zhang, K. Yao, C. Zhu, Y. Sun, and L. Yang, “Unleashing the power of prompt-driven nucleus instance segmentation,” European conference on computer vision, 2024.
[14] J. Cheng, J. Ye, Z. Deng, J. Chen, T. Li, H. Wang, Y. Su, Z. Huang et al., “Sam-med2d,” arXiv preprint arXiv:2308.16184, 2023.
[15] J. Ma, Y. He, F. Li, L. Han, C. You, and B. Wang, “Segment anything in medical images,” Nature Communications, vol. 15, no. 1, p. 654, 2024.
[16] Y. Huang, X. Yang, L. Liu, H. Zhou, A. Chang, X. Zhou, R. Chen, J. Yu, J. Chen, C. Chen et al., “Segment anything model for medical images?” Medical Image Analysis, vol. 92, p. 103061, 2024.
[17] Y. Zhang, Z. Shen, and R. Jiao, “Segment anything model for medical image segmentation: Current applications and future directions,” Computers in Biology and Medicine, p. 108238, 2024.
[18] M. A. Mazurowski, H. Dong, H. Gu, J. Yang, N. Konz, and Y. Zhang, “Segment anything model for medical image analysis: an experimental study,” Medical Image Analysis, vol. 89, p. 102918, 2023.
[19] Z. Chen, Q. Xu, X. Liu, and Y. Yuan, “Un-sam: Universal prompt-free segmentation for generalized nuclei images,” arXiv preprint arXiv:2402.16663, 2024.
[20] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer, 2015, pp. 234–241.
[21] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: Redesigning skip connections to exploit multiscale features in image segmentation,” IEEE transactions on medical imaging, vol. 39, no. 6, pp. 1856–1867, 2019.
[22] J. Schlemper, O. Oktay, M. Schaap, M. Heinrich, B. Kainz, B. Glocker, and D. Rueckert, “Attention gated networks: Learning to leverage salient regions in medical images,” Medical image analysis, vol. 53, pp. 197–207, 2019.
[23] Q. Xu, Z. Ma, H. Na, and W. Duan, “Dcsau-net: A deeper and more compact split-attention u-net for medical image segmentation,” Computers in Biology and Medicine, vol. 154, p. 106626, 2023.
[24] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in International Conference on Learning Representations, 2021.
[25] A. Lin, B. Chen, J. Xu, Z. Zhang, G. Lu, and D. Zhang, “Ds-transunet: Dual swin transformer u-net for medical image segmentation,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–15, 2022.
[26] H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang, “Swin-unet: Unet-like pure transformer for medical image segmentation,” in European conference on computer vision. Springer, 2022, pp. 205–218.
[27] Z. Li, Y. Li, Q. Li, P. Wang, D. Guo, L. Lu, D. Jin, Y. Zhang, and Q. Hong, “Lvit: language meets vision transformer in medical image segmentation,” IEEE transactions on medical imaging, 2023.
[28] J. Chen, J. Mei, X. Li, Y. Lu, Q. Yu, Q. Wei, X. Luo, Y. Xie, E. Adeli, Y. Wang et al., “Transunet: Rethinking the u-net architecture design for medical image segmentation through the lens of transformers,” Medical Image Analysis, p. 103280, 2024.
[29] J. Ma, F. Li, and B. Wang, “U-mamba: Enhancing long-range dependency for biomedical image segmentation,” arXiv preprint arXiv:2401.04722, 2024.
[30] M. Pachitariu and C. Stringer, “Cellpose 2.0: how to train your own model,” Nature methods, vol. 19, no. 12, pp. 1634–1641, 2022.
[31] C. Stringer, T. Wang, M. Michaelos, and M. Pachitariu, “Cellpose: a generalist algorithm for cellular segmentation,” Nature methods, vol. 18, no. 1, pp. 100–106, 2021.
[32] H. He, Z. Huang, Y. Ding, G. Song, L. Wang, Q. Ren, P. Wei, Z. Gao, and J. Chen, “Cdnet: Centripetal direction network for nuclear instance segmentation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4026–4035.
[33] S. Nam, J. Jeong, M. Luna, P. Chikontwe, and S. H. Park, “Pronet: Point refinement using shape-guided offset map for nuclei instance segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2023, pp. 528–538.
[34] T. Ilyas, Z. I. Mannan, A. Khan, S. Azam, H. Kim, and F. De Boer, “Tsfd-net: Tissue specific feature distillation network for nuclei segmentation and classification,” Neural Networks, vol. 151, pp. 1–15, 2022.
[35] C. Li, Y. Zhang, Z. Liang, W. Ma, Y. Huang, and X. Ding, “Consistent posterior distributions under vessel-mixing: a regularization for cross-domain retinal artery/vein classification,” in 2021 IEEE International Conference on Image Processing (ICIP). IEEE, 2021, pp. 61–65.
[36] C. Li, X. Lin, Y. Mao, W. Lin, Q. Qi, X. Ding, Y. Huang, D. Liang, and Y. Yu, “Domain generalization on medical imaging classification using episodic training with task augmentation,” Computers in biology and medicine, vol. 141, p. 105144, 2022.
[37] S. Hu, Z. Liao, J. Zhang, and Y. Xia, “Domain and content adaptive convolution based multi-source domain generalization for medical image segmentation,” IEEE Transactions on Medical Imaging, vol. 42, no. 1, pp. 233–244, 2022.
[38] Q. Liu, C. Chen, J. Qin, Q. Dou, and P.-A. Heng, “Feddg: Federated domain generalization on medical image segmentation via episodic learning in continuous frequency space,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 1013–1023.
[39] Q. Xu, J. Li, X. He, Z. Liu, Z. Chen, W. Duan, C. Li, M. M. He, F. B. Tesema, W. P. Cheah et al., “Esp-medsam: Efficient self-prompting sam for universal domain-generalized medical image segmentation,” arXiv preprint arXiv:2407.14153, 2024.
[40] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “Lora: Low-rank adaptation of large language models,” arXiv preprint arXiv:2106.09685, 2021.
[41] Z. Zhong, Z. Tang, T. He, H. Fang, and C. Yuan, “Convolution meets loRA: Parameter efficient finetuning for segment anything model,” in The Twelfth International Conference on Learning Representations, 2024.
[42] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, “Parameter-efficient transfer learning for nlp,” in International conference on machine learning. PMLR, 2019, pp. 2790–2799.
[43] J. Wu, W. Ji, Y. Liu, H. Fu, M. Xu, Y. Xu, and Y. Jin, “Medical sam adapter: Adapting segment anything model for medical image segmentation,” arXiv preprint arXiv:2304.12620, 2023.
[44] X. Lin, Y. Xiang, L. Zhang, X. Yang, Z. Yan, and L. Yu, “Samus: Adapting segment anything model for clinically-friendly and generalizable ultrasound image segmentation,” International Conference on Medical Image Computing and Computer-Assisted Intervention, 2024.
[45] J. Terven, D.-M. Córdova-Esparza, and J.-A. Romero-González, “A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas,” Machine Learning and Knowledge Extraction, vol. 5, no. 4, pp. 1680–1716, 2023.
[46] S. Liu, Z. Zeng, T. Ren, F. Li, H. Zhang, J. Yang, C. Li, J. Yang, H. Su, J. Zhu et al., “Grounding dino: Marrying dino with grounded pre-training for open-set object detection,” arXiv preprint arXiv:2303.05499, 2023.
[47] K. Zhang and D. Liu, “Customized segment anything model for medical image segmentation,” arXiv preprint arXiv:2304.13785, 2023.
[48] C. Li, X. Liu, C. Wang, Y. Liu, W. Yu, J. Shao, and Y. Yuan, “Gtp-4o: Modality-prompted heterogeneous graph learning for omni-modal biomedical representation,” arXiv preprint arXiv:2407.05540, 2024.
[49] Q. Xu, W. Kuang, Z. Zhang, X. Bao, H. Chen, and W. Duan, “Sppnet: A single-point prompt network for nuclei image segmentation,” in International Workshop on Machine Learning in Medical Imaging. Springer, 2023, pp. 227–236.
[50] N. Ibtehaz and D. Kihara, “Acc-unet: A completely convolutional unet model for the 2020s,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2023, pp. 692–702.
[51] F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature methods, vol. 18, no. 2, pp. 203–211, 2021.
[52] S. Na, Y. Guo, F. Jiang, H. Ma, and J. Huang, “Segment any cell: A sam-based auto-prompting fine-tuning framework for nuclei segmentation,” arXiv preprint arXiv:2401.13220, 2024.
[53] Y. Gu, Q. Wu, H. Tang, X. Mai, H. Shu, B. Li, and Y. Chen, “Lesam: Adapt segment anything model for medical lesion segmentation,” IEEE Journal of Biomedical and Health Informatics, 2024.
[54] Z. Cheng, Q. Wei, H. Zhu, Y. Wang, L. Qu, W. Shao, and Y. Zhou, “Unleashing the potential of sam for medical adaptation via hierarchical decoding,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 3511–3522.
[55] U. Schmidt, M. Weigert, C. Broaddus, and G. Myers, “Cell detection with star-convex polygons,” in Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II 11. Springer, 2018, pp. 265–273.
[56] K. Zhou, Z. Liu, Y. Qiao, T. Xiang, and C. C. Loy, “Domain generalization: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, pp. 4396–4415, 2022.
[57] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961–2969.