DFA-Net: Dual multi-scale feature aggregation network for vessel segmentation in X-ray digital subtraction angiography

Although deep learning has attracted considerable attention in coronary vessel segmentation for X-ray angiography and has achieved notable progress, most existing models are prone to false and missed detections because of the indistinct contrast between coronary vessels and background, especially for tiny sub-branches. Image enhancement can improve this contrast, but it also amplifies extraneous information, e.g., noise and other tissues with similar intensities. Incorporating features derived from both original and enhanced images improves segmentation performance, because the two images carry complementary information from different contrasts. Accordingly, inspired by the advantages of contrast improvement and the encoding-decoding architecture, a dual multi-scale feature aggregation network (named DFA-Net) is introduced for coronary vessel segmentation in digital subtraction angiography (DSA). DFA-Net integrates contrast improvement using exponent transformation into a semantic segmentation network that separately accepts original and enhanced images as inputs. Through parameter sharing, multi-scale complementary features are aggregated from the different contrasts, which strengthens the learning capability of the network and thus yields an efficient segmentation. Meanwhile, a risk cross-entropy loss is imposed on the segmentation to reduce false negatives, and it is combined with Dice loss for joint optimization during training. Experimental results demonstrate that DFA-Net not only works more robustly and effectively for DSA images under diverse conditions, but also achieves better performance than state-of-the-art methods. Consequently, DFA-Net shows high fidelity and structural similarity to the reference, providing a way toward early diagnosis of cardiovascular diseases.


Introduction
Cardiovascular diseases (CVDs), the leading global cause of death, accounted for 32% of all deaths worldwide (about 17.9 million) in 2019, of which 85% were due to heart attack and stroke [1]. Of these diseases, coronary artery diseases (CADs) are the most common [2] and can lead to sudden death [3]. The narrowing (or stenosis) of the lumen caused by plaque build-up in coronary arteries is a predisposing cause of CADs; it restricts blood flow to the cardiac muscle, depriving the heart of oxygen and nutrients and ultimately inducing myocardial ischemia and infarction [4]. At present, X-ray digital subtraction angiography (DSA) is considered the gold standard in the clinical diagnosis and interventional treatment of CADs [2], because it visualizes the severity of vessel narrowing, protrusions, bifurcation, stenosis, etc. In this context, accurate and efficient segmentation of coronary vessels is crucial, as it provides the basis for the quantification and assessment of vascular stenosis [5,6]. Nevertheless, manual delineation is expensive and suffers from high inter- and intra-reader variation [7]. Fully automatic methods, on the other hand, eliminate potential subjectivity and supply quantitative and systematic measures of diameter reduction [5,6], but they are challenging to design owing to low resolution, high levels of noise, and complex structures with different vessel shapes and tiny sub-branches [8].
Automatic segmentation methods for coronary vessels are roughly divided into traditional and learning-based techniques [9]. Conventional algorithms are usually based on pattern recognition, models, and tracking. Although those strategies produce promising results, they are not robust under complicated conditions, e.g., low contrast or resolution, heavy noise, and nonuniform background intensity distributions [8]. Recently, with advances in deep learning, learning-based techniques have achieved superior segmentation performance on coronary angiographic images in terms of accuracy and computation time [10][11][12]; such methods generally generate a prediction map of pixel-wise probability. Among them, U-shaped architectures with skip connections (e.g., U-net) exhibit great advantages in medical image segmentation owing to their strong feature extraction capability [13]. Accordingly, many segmentation networks have been built on the basic U-net structure, e.g., U-net++ [14], attention U-net [15], and U-net 3+ [16]. Although U-net and its variants efficiently balance context and local accuracy, they have difficulty explicitly modelling long-range contextual interactions owing to the intrinsic locality of convolution (conv), so the extracted features may miss global contextual information [12]. To address these shortcomings, Dong et al. proposed a multi-attention, multi-scale 3D convolutional neural network (CNN) for coronary artery segmentation [17]. Shi et al. introduced an affinity feature strengthening network that jointly models the geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity strategy [18]. Although many efforts have been dedicated to improving vessel segmentation, the performance is still unsatisfactory, and further improvements are required.
It is known that DSA is a fluoroscopic technique that generates images of blood vessels without interfering shadows from overlapping tissues. With a contrast medium administered in X-ray coronary angiography, the structures through which the contrast medium passes appear dark gray, while the background appears light gray (see Fig. 1). The higher the contrast between foreground objects (e.g., coronary vessels) and background (e.g., other tissues), the better the segmentation result. Nevertheless, owing to low resolution and heavy noise, this contrast is usually inconspicuous, especially on tiny sub-branches, which tends to degrade segmentation performance, e.g., through high false and/or missed detections. This implies that contrast improvement has the potential to improve the segmentation of coronary vessels. However, it also enhances extraneous information, e.g., noise and other tissues with similar intensities (see Fig. 1). In this case, combining original and enhanced images can improve the performance of vessel segmentation networks, because those images embed complementary information from different contrasts. Inspired by the merits of contrast enhancement (e.g., Retinex decomposition [19]) and the encoding-decoding structure (U-net and its variants), a dual multi-scale feature aggregation network (termed DFA-Net) is proposed for coronary vessel segmentation in DSA images. DFA-Net incorporates contrast improvement using exponent transformation (CIET) into a semantic segmentation network (e.g., ResUnet++ [20]) that separately accepts original and contrast-improved images as inputs. An effective segmentation is achieved via parameter sharing and dual multi-scale feature aggregation of the complementary information derived from different contrasts. Moreover, DFA-Net is optimized by jointly minimizing the risk cross-entropy loss and Dice loss. The contributions of this work are summarized as follows.
1. Inspired by the merits of contrast enhancement and the encoding-decoding architecture, a dual multi-scale feature aggregation network (DFA-Net) is proposed, which takes original and enhanced images as separate inputs for decoding complementary information from different contrasts, and outputs effective predictions of coronary vessels by minimizing the joint risk cross-entropy loss and Dice loss.

Framework of DFA-Net
The framework of DFA-Net, shown in Fig. 1, is a dual-branch structure that accepts the original and enhanced images as separate inputs, inspired by the advantages of contrast improvement (CIET) and the encoding-decoding architecture (ResUnet++). To start with, the contrast between foreground objects and background in an original DSA image is enhanced through CIET. Since CIET also amplifies irrelevant content, e.g., noise and other tissues with similar intensities, the original and enhanced images are input to ResUnet++ separately, for decoding multi-scale complementary information from different contrasts. The segmentation is then obtained by aggregating (concatenating and adding) the dual multi-scale features under adaptive weighting parameters, with parameter sharing used during training. Meanwhile, DFA-Net is optimized by jointly minimizing the risk cross-entropy loss and Dice loss.

Contrast improvement
Low contrast between coronary vessels and background makes DSA segmentation difficult. CIET, derived from the multi-scale Retinex decomposition [19], is proposed to enhance this contrast. For a given DSA image $I$, the Retinex decomposition is described as

$$R = \sum_{i=1}^{N} \omega_i \left[ \log I - \log\left( G_i * I \right) \right], \tag{1}$$

where $R$ is the enhancement result, $G_i$ is the $i$th Gaussian filtering function with standard deviation $\sigma_i$ (named the scale parameter), $N$ is the number of scale parameters, $*$ is the conv operator, and $\omega_i$ is the weighting of the $i$th scale parameter. In Eq. (1), the logarithm potentially yields negative values, which causes ambiguity in the transformation. Thus, an exponent function $S(x)$, defined in Eq. (2), is applied to replace the logarithm. The weighting parameter $\alpha$ in $S(x)$ guarantees that the range of $S(x)$ lies in the interval $[0, 1]$, with $S(0) = 0$ and $S(1) = 1$. The function $S(x)$ is monotonically increasing on $[0, 1]$, which avoids the ambiguity and negative values in the transformation. The graph of $y = x - S(x)$, $x \in [0, 1]$, is shown in Fig. 2, where $0$, $x_0$ and $1$ are its three zeros. It can be seen that $S(x) < x$ if $x \in (0, x_0)$, and $S(x) > x$ if $x \in (x_0, 1)$. Therefore, when the intensity of an image is normalized to $[0, 1]$, the intensity of pixels with low magnitude ($< x_0$) is expanded, whereas high intensity ($> x_0$) is suppressed. This property is used to stretch the intensity distribution of an image, which helps to improve the image contrast.
Substituting Eq. (2) into Eq. (1) gives Eq. (3), where $S$ denotes the exponent function and norm denotes a normalization operation.
To balance the enhancement performance against the computational complexity of Gaussian filtering, the parameters $\sigma_i$ in Eq. (3) are chosen as large-, middle- and small-scale parameters determined by $\max(I)$ and $\beta_i$, where $\max(I)$ is the maximum of the image and $\beta_i$ is a positive constant (e.g., 0.8, 0.5 and 0.2).
After that, the estimate obtained from Eq. (3) needs to be adjusted by an intensity correction to further improve the visual quality. The intensity correction is computed from the histogram of the image, where $R_{up}$ and $R_{low}$ denote the high and low shearing points of the image, and $\gamma$ is a constant. The upper and lower limits of the confidence interval are chosen as the high and low shearing points, with the confidence level set to 0.99 on the cumulative histogram.
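The classical multi-scale Retinex decomposition of Eq. (1), on which CIET builds, can be sketched as follows. This is a minimal illustration of the logarithmic form only, not the authors' CIET: the exponent function $S(x)$ and the intensity correction are not reproduced here, and the helper names, the scale values `(1, 2, 4)`, the equal weights, and the small offset `eps` are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian filtering (G_i * I) with reflective padding."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    # Filter along rows, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def multi_scale_retinex(img, sigmas, weights, eps=1e-6):
    """Eq. (1): R = sum_i w_i * [log I - log(G_i * I)].

    eps is an assumed offset that keeps the logarithm finite at zero intensity.
    """
    img = img.astype(np.float64) + eps
    out = np.zeros_like(img)
    for sigma, w in zip(sigmas, weights):
        out += w * (np.log(img) - np.log(gaussian_blur(img, sigma)))
    return out

# Hypothetical usage: a bright background with one dark vertical "vessel".
image = np.full((32, 32), 0.8)
image[:, 16] = 0.2
R = multi_scale_retinex(image, sigmas=(1.0, 2.0, 4.0), weights=(1 / 3, 1 / 3, 1 / 3))
```

On this toy image the response is negative along the dark stripe, which illustrates the sign ambiguity of the logarithm that motivates replacing it with an exponent function.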

Backbone network
U-net and its variants (e.g., U-net++ and U-net 3+), which are deeply supervised encoder-decoder architectures, are widely applied to medical image segmentation with high efficiency [14], because they effectively capture fine-grained details of foreground objects. Moreover, ResNet is a powerful CNN that makes the encoder in U-net more effective [21]. Compared with U-net, U-net++ holds more skip connections, which help the decoder acquire more information from multiple levels of the encoder [14]. Accordingly, ResUnet++ replaces U-net with U-net++ and applies ResNet as the encoder [20], which achieves a significant performance boost when only a small number of images is available. For these reasons, DFA-Net adopts ResUnet++ as its backbone semantic segmentation network. Although CIET improves the contrast between the foreground objects (e.g., coronary vessels) and the intricate background, interference factors (e.g., noise and other tissues with similar intensities) are also amplified to some extent, which affects the segmentation. If the original and improved images are input into the semantic network separately, dual multi-scale features are acquired that embed complementary information from different contrasts; parameter sharing is applied during training.

ResNet comprises five layers, termed layer[$j$], $j = 0, \ldots, 4$, and each layer[$k$], $k = 1, \ldots, 4$, consists of a different number of double conv blocks, in which every block holds a $3 \times 3$ conv, a batch normalization and a ReLU. As displayed in Fig. 1, the encoding process is as follows. An input image $x^i$, $i = 1, 2$, of size $512 \times 512$ passes through layer[0], which includes a $7 \times 7$ conv with 64 filters and a stride of 2, and produces an output $x^i_{0,0}$. Next, $x^i_{0,0}$ goes through layer[1] (three double conv blocks) of ResNet and outputs $x^i_{1,0}$. Then, $x^i_{1,0}$ passes through layer[2] (four double conv blocks) and outputs $x^i_{2,0}$. Next, $x^i_{2,0}$ goes through layer[3] (six double conv blocks) and produces $x^i_{3,0}$. Last, $x^i_{3,0}$ passes through layer[4] (three double conv blocks) and yields $x^i_{4,0}$.

The decoding process is as follows. First, upsampling $x^i_{4,0}$, concatenating it with $x^i_{3,0}$, and passing the result through a triple conv block gives $x^i_{3,1}$. Second, upsampling $x^i_{3,0}$, concatenating it with $x^i_{2,0}$, and passing the result through a triple conv block gives $x^i_{2,1}$. Third, concatenating $x^i_{2,0}$ and $x^i_{2,1}$ with the upsampled $x^i_{3,1}$, and passing the result through a triple conv block, gives $x^i_{2,2}$. Fourth, upsampling $x^i_{2,0}$, concatenating it with $x^i_{1,0}$, and passing the result through a triple conv block gives $x^i_{1,1}$. Fifth, concatenating $x^i_{1,0}$ and $x^i_{1,1}$ with the upsampled $x^i_{2,1}$, and passing the result through a triple conv block, gives $x^i_{1,2}$. Sixth, concatenating $x^i_{1,0}$, $x^i_{1,1}$ and $x^i_{1,2}$ with the upsampled $x^i_{2,2}$, and passing the result through a triple conv block, gives $x^i_{1,3}$. Seventh, upsampling $x^i_{1,0}$, concatenating it with $x^i_{0,0}$, and passing the result through a triple conv block gives $x^i_{0,1}$. Eighth, concatenating $x^i_{0,0}$ and $x^i_{0,1}$ with the upsampled $x^i_{1,1}$, and passing the result through a triple conv block, gives $x^i_{0,2}$. Ninth, concatenating $x^i_{0,0}$, $x^i_{0,1}$ and $x^i_{0,2}$ with the upsampled $x^i_{1,2}$, and passing the result through a triple conv block, gives $x^i_{0,3}$. Last, concatenating $x^i_{0,0}$, $x^i_{0,1}$, $x^i_{0,2}$ and $x^i_{0,3}$ with the upsampled $x^i_{1,3}$, and passing the result through a triple conv block, gives $x^i_{0,4}$.
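The ten decoding steps follow one rule: node $x_{j,k}$ concatenates every earlier node on level $j$ with the upsampled node $x_{j+1,k-1}$ from the level below. The sketch below only enumerates this wiring as index pairs; it contains no convolutions, and the function names `decoder_inputs` and `decoder_schedule` are hypothetical helpers introduced for illustration.

```python
def decoder_inputs(j, k):
    """Inputs of decoder node x_{j,k} for k >= 1.

    Returns the same-level nodes that are concatenated, and the node from
    the level below that is upsampled before concatenation.
    """
    same_level = [(j, m) for m in range(k)]
    upsampled = (j + 1, k - 1)
    return same_level, upsampled

def decoder_schedule(depth=4):
    """Decoder nodes ordered from the deepest level upward, as in the text."""
    return [(j, k)
            for j in range(depth - 1, -1, -1)
            for k in range(1, depth - j + 1)]

# Print the wiring of every decoder node in evaluation order.
for j, k in decoder_schedule():
    skips, up = decoder_inputs(j, k)
    print(f"x[{j},{k}] <- cat{skips} + up(x[{up[0]},{up[1]}])")
```

Listing the nodes this way makes it easy to check that each input has already been computed before it is needed.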

Feature aggregation
Since ResUnet++ separately accepts images with different contrasts as inputs, the corresponding multi-scale features are obtained, that is, $x^1_{0,i}$ and $x^2_{0,i}$, $i = 1, \ldots, 4$, which carry different contrast information. After the feature aggregation, which comprises concatenation and addition under adaptive weighting parameters (see Fig. 1), dual multi-scale features are acquired, which produce a prediction map $P$ of pixel-wise probability for segmentation. In the feature aggregation, cat denotes concatenation, and $\alpha_i$, $i = 1, \ldots, 4$, denote the adaptive weighting parameters.
During back propagation, the adaptive weighting parameters are updated iteratively, where $g_k$ is the partial derivative of the loss function $\hat{J}$ with respect to $\alpha_k$, $s_t$ and $r_t$ are the first and second moment estimates, $\hat{s}_t$ and $\hat{r}_t$ are the bias-corrected first and second moment estimates, $\mu$ is the learning rate, $t$ is the iteration step, $\rho_1$ and $\rho_2$ are the decay rates of the moment estimates, and $\varepsilon$ is a constant that prevents division by zero.
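The quantities listed above (moment estimates, their corrections, learning rate, and $\varepsilon$) match the standard Adam update, so one iteration for the adaptive weights can be sketched as below. This assumes the standard Adam form; the defaults `rho1=0.9`, `rho2=0.999` and `eps=1e-8` are the commonly used Adam values rather than values reported in this work, and the gradient values in the usage lines are made up.

```python
import numpy as np

def adam_step(alpha, grad, s, r, t, lr=1e-4, rho1=0.9, rho2=0.999, eps=1e-8):
    """One standard Adam update of the adaptive weights alpha.

    alpha, grad, s, r are arrays of equal shape; t is the 1-based step index.
    """
    s = rho1 * s + (1.0 - rho1) * grad        # first moment estimate s_t
    r = rho2 * r + (1.0 - rho2) * grad ** 2   # second moment estimate r_t
    s_hat = s / (1.0 - rho1 ** t)             # bias-corrected first moment
    r_hat = r / (1.0 - rho2 ** t)             # bias-corrected second moment
    alpha = alpha - lr * s_hat / (np.sqrt(r_hat) + eps)
    return alpha, s, r

# Hypothetical usage: four weights alpha_1..alpha_4 and an illustrative gradient.
alpha = np.ones(4)
grad = np.array([0.5, -0.5, 2.0, -2.0])
alpha_new, s, r = adam_step(alpha, grad, np.zeros(4), np.zeros(4), t=1)
```

On the first step the bias correction makes the update magnitude approximately equal to the learning rate regardless of the gradient scale, which is one reason the method behaves well with noisy gradients.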

Loss function
Since vessel segmentation in X-ray angiography can be considered a classification task, the cross-entropy loss is adopted, because it is insensitive to the identity of the assigned class in case of misclassification [22]. However, misclassifying vessels (foreground pixels) as background (false negatives) carries a potentially higher risk than misclassifying background pixels as foreground (false positives). The reason may be that the identification of vascular occlusion or stenosis is important in clinical diagnosis, while false negatives can result in missed opportunities for effective treatment at an early stage. Considering the different risks of the two kinds of misclassification, a risk cross-entropy loss is introduced that differentiates between false negatives and false positives by penalizing each error differently. The risk cross-entropy loss is defined for the sample $x_i$, label $y_i$ and DFA-Net output $p_i$, $i = 1, \ldots, n$, where $n$ is the number of samples and $\gamma$ is a risk factor; the cases are distinguished by whether $y_i > 0.5$. Besides, Dice loss is commonly applied in semantic segmentation [15]; it measures how well the model detects boundaries relative to ground truth. Dice loss is therefore also adopted, where $\hat{z}_i$ represents the prediction and $z_i$ denotes ground truth.

From Fig. 1, it can be seen that DSA images have high levels of noise and an imbalance between positive and negative samples. Adaptive moment estimation (Adam) is computationally efficient, has low memory requirements, and is appropriate for non-stationary objectives and problems with noisy and/or sparse gradients [23]; it is one of the most popular optimizers for accelerating the training of deep networks. DFA-Net is therefore optimized with Adam, and training is implemented by minimizing the joint risk cross-entropy loss and Dice loss, in which $\alpha$ and $\beta$ are balance weightings. For each epoch in the training, the validation set is used to evaluate accuracy. The result with the highest accuracy is selected as the final model.
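To make the asymmetric penalty concrete, the sketch below shows one plausible form of the joint loss. It assumes that the risk factor $\gamma$ multiplies the foreground term of a binary cross-entropy, which is only one way to penalize false negatives more heavily and is not confirmed as the exact definition used here; the values `gamma=2.0`, `alpha=0.5` and `beta=0.5`, as well as the smoothing constant `eps`, are hypothetical.

```python
import numpy as np

def risk_cross_entropy(p, y, gamma=2.0, eps=1e-7):
    """Binary cross-entropy whose foreground term is scaled by gamma (assumed form)."""
    p = np.clip(p, eps, 1.0 - eps)
    pos = -gamma * y * np.log(p)          # cost of missing a vessel pixel
    neg = -(1.0 - y) * np.log(1.0 - p)    # cost of a false alarm
    return float(np.mean(pos + neg))

def dice_loss(z_hat, z, eps=1e-7):
    """Soft Dice loss: 1 - 2|z_hat . z| / (|z_hat| + |z|)."""
    inter = np.sum(z_hat * z)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(z_hat) + np.sum(z) + eps))

def joint_loss(p, y, gamma=2.0, alpha=0.5, beta=0.5):
    """Weighted sum of the two losses; alpha and beta are balance weightings."""
    return alpha * risk_cross_entropy(p, y, gamma) + beta * dice_loss(p, y)

# Illustrative labels with one confident false negative versus one confident false positive.
y = np.array([1.0, 1.0, 0.0, 0.0])
with_false_negative = np.array([0.1, 0.9, 0.1, 0.1])
with_false_positive = np.array([0.9, 0.9, 0.9, 0.1])
print(risk_cross_entropy(with_false_negative, y), risk_cross_entropy(with_false_positive, y))
```

With `gamma > 1` the prediction containing the false negative receives the larger loss, while `gamma = 1` reduces the first term to the ordinary cross-entropy and treats both errors equally.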

Metrics, baseline methods and datasets
Several widely applied quantitative measures are adopted to evaluate the performance of the different segmentation strategies: two overlap-based metrics, i.e., intersection over union (IoU) and Dice coefficient (Dice), together with accuracy, specificity, and precision. In their definitions, $X$ and $Y$ are binary vectors (0 and 1 denote background and foreground areas) representing the prediction and ground truth, and $|X \cap Y|$, $|Y/X|$, $|X/Y|$, and $|\bar{X} \cap \bar{Y}|$ are the numbers of true positives, false positives, false negatives, and true negatives, respectively.
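The five measures can be computed from the confusion counts of two binary masks, as in the following sketch. It uses the standard textbook definitions of the metrics; the function name `segmentation_metrics` and the toy masks are illustrative, and the code assumes that every denominator is non-zero.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """IoU, Dice, accuracy, specificity and precision for binary masks."""
    pred = np.asarray(pred).astype(bool).ravel()
    truth = np.asarray(truth).astype(bool).ravel()
    tp = int(np.sum(pred & truth))      # vessel predicted, vessel in truth
    fp = int(np.sum(pred & ~truth))     # vessel predicted, background in truth
    fn = int(np.sum(~pred & truth))     # background predicted, vessel in truth
    tn = int(np.sum(~pred & ~truth))    # background in both
    return {
        "IoU": tp / (tp + fp + fn),
        "Dice": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# Toy example: 2 true positives, 1 false positive, 1 false negative, 2 true negatives.
metrics = segmentation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
print(metrics)
```

Because vessels occupy a small fraction of a DSA image, accuracy and specificity are dominated by the background, which is why the overlap-based IoU and Dice values are the more sensitive indicators.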
After a contrast dye is injected through the catheter, DSA provides temporal and/or spatial information that helps to visualize blood flow over time. However, only DSA images under maximum filling of contrast dye are used to demonstrate the accuracy and superiority of DFA-Net. This work enrolled 78 participants, comprising 48 male subjects (aged 22-77 years) and 30 female subjects (aged 51-77 years) from Tongji Hospital, and the total number of images adopted was 292. Informed consent was obtained from each participant before DSA examination according to the procedure approved by the local ethics committee. Table 1 lists the demographic information of the participants, including age, gender, and medical history. The vascular contour in each DSA image of size $512 \times 512$ was manually delineated by experts and serves as ground truth (or label) during training and testing. The dataset is randomly divided into 80% for training, 10% for validation, and 10% for testing. During training, the number of epochs is set to 400, the batch size to 4, and the learning rate to $10^{-4}$; the optimizer is Adam, the weight decay is initialized to $10^{-4}$, and the loss function is the joint risk cross-entropy loss and Dice loss. Experiments are conducted using PyTorch 1.10.1 on a GPU (GeForce RTX 3090) under Linux.

Qualitative comparisons
For a representative DSA image under maximum filling of contrast dye, Fig. 3 displays the original image, the segmented results obtained from the baseline and proposed strategies, and ground truth (i.e., manual labels); the quantitative measures for the five metrics are presented below each image. Yellow ellipses denote false detections, white rectangles indicate regions of interest (ROIs), e.g., tiny sub-branches that are important in the study of coronary blood circulation, and yellow rectangles mark capillary vessels that are not manually labelled in ground truth (or are considered missed detections in ground truth). It can be seen that many false detections are present in the results of U-net, U-net++, SA-Unet, U-net 3+, and CMU-Net, especially in Fig. 3e-g. Besides, the vessel integrity of the ROIs (white rectangles) is poorly preserved in Fig. 3b, c and f. The reason may be that the contrast in those ROIs is inconspicuous, which makes coronary vessel segmentation difficult. If the contrast in the ROIs is improved, the segmentation performance improves as well; this is a cornerstone of DFA-Net. By comparison, DFA-Net yields fewer false positives and better preserves the completeness of true positives, which implies that DFA-Net matches ground truth well. Furthermore, the yellow rectangles in Fig. 3a indicate capillary vessels that are not labelled in Fig. 3j. However, the baseline and proposed strategies identify those tiny sub-branches because of the strong learning capability of deep models. This indicates that a good model has the potential to supplement manual labels with true detections. If segmentation results from an appropriate learning method are provided to experts in advance for manual interaction (or correction), the annotation efficiency and accuracy can be significantly improved. DFA-Net is competent for such a task.
As another example, Fig. 4(a) is an original DSA image, Fig. 4(b)-(i) are the segmented results obtained from U-net, Deeplabv3+, U-net++, SA-Unet, U-net 3+, CMU-Net, CA-Net, and DFA-Net, respectively, and Fig. 4(j) is ground truth, where yellow ellipses denote false detections and yellow rectangles denote capillary vessels that are not manually labelled in ground truth. Similar to Fig. 3, some false detections are found in the results of the baseline methods, with different degrees of distortion. In contrast, DFA-Net produces fewer false detections, which is helpful for the early detection of cardiovascular diseases. Besides, DFA-Net achieves the highest IoU, Dice, accuracy, and precision scores, and the second-best specificity. Compared with U-net 3+, the lift ratios (%) of DFA-Net are 14.89, 7.93, 1.42, 1.64, and 16.62 on the five metrics, and they are 13.59, 7.23, 1.32, 1.64, and 16.26 compared with SA-Unet. The comparisons in Figs. 3 and 4 support the superiority of DFA-Net over the baseline methods.

Quantitative comparisons
IoU, Dice, accuracy, specificity, and precision are widely applied to measure segmentation performance, and higher values suggest better performance. Table 2 lists quantitative comparisons on the testing dataset for U-net, Deeplabv3+, U-net++, SA-Unet, U-net 3+, CMU-Net, CA-Net, and DFA-Net on the five metrics. DFA-Net achieves the best result on every metric, CA-Net obtains the second-best IoU, Dice and accuracy scores, and U-net++ obtains the second-best specificity and precision values. Compared with the second-best results, the lift ratios of DFA-Net are 1.65, 0.92, 0.15, 0.25, and 3.35 (%) for IoU, Dice, accuracy, specificity, and precision, respectively. Compared with Deeplabv3+, the respective lift ratios of DFA-Net are 19.02, 10.56, 1.47, 1.62, and 21.96 (%), and they are 3.42, 1.90, 0.31, 0.52, and 6.71 (%) compared with CMU-Net. This indicates that DFA-Net is superior to the seven baseline approaches on each measure, which is consistent with the conclusions derived from Figs. 3 and 4. DFA-Net therefore shows high reliability and robustness for the segmentation of DSA images, which helps to highlight ROIs, understand context, and represent structural details of images.
A receiver operating characteristic (ROC) curve represents the sensitivity and specificity pairs corresponding to each decision threshold, and the area under the ROC curve (AUC) measures how well a parameter discriminates between two diagnostic groups. An alternative is the precision-recall (P-R) curve, which is considered a supplement to the ROC curve when evaluating and comparing tests because it may be a better choice for imbalanced data [28]. The P-R curve represents the recall and precision pairs, and the area under it (also named average precision, AP) is used as a performance metric. The ROC and P-R curves of the baseline and proposed methods are displayed in Fig. 5, and the corresponding AUC and AP values are listed in Table 3. CA-Net obtains the second-best results, while DFA-Net achieves the highest AUC of 0.9953 and the highest AP of 0.9547, which is consistent with the conclusions derived from Figs. 3 and 4 and Table 2. Accordingly, the ROC and P-R curves support that DFA-Net is able to distinguish coronary vessels from complex background and to construct an effective biomarker that decodes the characteristics of DSA images.
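Both summary values can be obtained by sweeping a threshold over the predicted probabilities, as the following sketch shows. It assumes that both classes are present and that no two scores are tied; the function name `roc_auc_and_ap` and the toy scores are illustrative and are not taken from the experiments.

```python
import numpy as np

def roc_auc_and_ap(scores, labels):
    """Area under the ROC curve (trapezoid rule) and average precision."""
    scores = np.asarray(scores, dtype=np.float64).ravel()
    labels = np.asarray(labels).astype(bool).ravel()
    order = np.argsort(-scores, kind="stable")   # descending score = lowering threshold
    labels = labels[order]
    tp = np.cumsum(labels)       # true positives at each threshold
    fp = np.cumsum(~labels)      # false positives at each threshold
    tpr = np.concatenate([[0.0], tp / tp[-1]])   # sensitivity (recall)
    fpr = np.concatenate([[0.0], fp / fp[-1]])   # 1 - specificity
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    precision = tp / (tp + fp)
    # AP: precision weighted by the increase in recall at each threshold.
    ap = float(np.sum(np.diff(tpr) * precision))
    return auc, ap

# Toy example with four pixels, two of them vessel pixels.
auc, ap = roc_auc_and_ap([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(auc, ap)
```

In this toy case three of the four vessel/background pairs are ranked correctly, so the AUC is 0.75, while the AP additionally reflects how early in the ranking the vessel pixels appear.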

Ablation study
The proposed strategy integrates CIET (a contrast improvement module) into ResUnet++ (a semantic segmentation module), in which original and improved images are fed into the model separately, and the loss is the joint risk cross-entropy loss and Dice loss. To demonstrate the contribution of each component, ablation experiments are conducted in terms of the IoU, Dice, accuracy, specificity, and precision metrics. Eight configurations that may affect the segmentation performance are considered, e.g., with or without the original image, the contrast-improved image, the risk cross-entropy loss, and the Dice loss, as listed in Table 4. Besides, the compatibility with other loss functions, e.g., cross-entropy loss and focal loss, is analyzed. Table 4 provides the ablation results on the testing data obtained by disabling the improved image, original image, risk cross-entropy loss, and Dice loss components of the baseline architecture (i.e., DFA-Net). The baseline obtains the best quantitative results among the eight configurations, which indicates that combining original and improved images with the risk cross-entropy and Dice losses improves the segmentation of DSA images. If one or more components are removed, the performance degrades. For example, if only the original image is input into DFA-Net, the performance is slightly superior to that of the configuration that accepts only the contrast-improved image. Nonetheless, concatenating the features extracted from the original and contrast-improved images clearly boosts the metrics. The reason may be that images with different contrasts embed complementary information, which strengthens the learning capability of the network and thus allows coronary vessels to be segmented effectively.
Table 4 also shows that, in comparison with the Dice loss, the risk cross-entropy loss contributes more to DSA segmentation on the IoU and Dice measures but is slightly inferior on the other metrics. Nevertheless, the combination of risk cross-entropy and Dice loss gives superior performance on all five metrics. Besides that, other loss functions, e.g., cross-entropy loss and focal loss, have their own pros and cons. These results suggest that an appropriate choice of loss functions is important for good segmentation performance on DSA images.
In general, the different configurations have their own advantages and disadvantages, which demonstrates the importance and necessity of each component incorporated in DFA-Net. Accordingly, the ablation results indicate that DFA-Net is reliable and effective for segmenting coronary vessels in DSA images.

Discussion
In recent years, deep learning has attracted considerable attention in angiographic segmentation and has achieved notable progress, yet most methods are prone to false and missed detections because of the indistinct contrast between vessels and background, especially on tiny sub-branches. Image enhancement strategies improve this contrast, but they also amplify extraneous information, e.g., the catheter or other tissues with similar intensities. Incorporating complementary information derived from diverse contrasts can strengthen the learning ability of networks. Accordingly, inspired by the advantages of contrast improvement and the encoding-decoding architecture, DFA-Net is proposed, which incorporates contrast improvement using exponent transformation (i.e., CIET) into a semantic segmentation network (i.e., ResUnet++). Meanwhile, the joint risk cross-entropy loss and Dice loss are imposed on DSA segmentation. Compared with state-of-the-art methods, DFA-Net achieves superior performance with regard to the IoU, Dice, accuracy, specificity, and precision measures. This indicates that DFA-Net shows high fidelity to ground truth and high robustness, providing an effective aid in the interpretation of DSA images. Although DFA-Net yields promising segmentation results under different cases, this work has some limitations. Firstly, the number of DSA images and labels is insufficient, whereas large amounts of training data are required to train reliable and effective deep learning models. Therefore, further work will collect more data with manual labels. Secondly, although the encoder-decoder structure is used as the backbone, other advanced networks (e.g., unsupervised and semi-supervised models) may be adopted in the future. Thirdly, the compatibility with other loss functions (e.g., focal loss and cross-entropy loss) is discussed (see Table 4), but other advanced loss functions, e.g., direction connection loss and Tversky loss, will be explored in the future. Lastly, the temporal and spatial domain information is insufficiently explored, which is left for future work.

Conclusion
To overcome the difficulties of angiographic segmentation caused by indistinct contrast between vessels and background, especially on tiny sub-branches, a dual multi-scale feature aggregation network (DFA-Net) is proposed in this work. It accepts original and contrast-improved images as separate inputs, because different contrasts embed complementary information that strengthens the learning capability of the network and thus yields an effective segmentation of coronary vessels. Meanwhile, a risk cross-entropy loss is imposed on the segmentation to reduce false negatives, and it is combined with Dice loss to jointly optimize the proposed model during training. Experimental results show that DFA-Net not only works more effectively for diverse DSA images, but also outperforms state-of-the-art methods, e.g., with the highest IoU, Dice, accuracy, specificity, and precision measures. This indicates that DFA-Net achieves high effectiveness and robustness with respect to ground truth. Consequently, DFA-Net is promising for coronary vessel segmentation in X-ray angiography, and it facilitates the interpretation and understanding of ROIs in DSA images, which forms a basis for our further work.

Fig. 1 Framework of the proposed DFA-Net for coronary vessel segmentation in X-ray angiography, in which an original DSA image and its improved result by CIET are fed into ResUnet++ separately, followed by the aggregation of dual multi-scale features, from which the segmentation is obtained. The model is optimized by jointly minimizing the risk cross-entropy loss and Dice loss

Fig. 4 For an original DSA image (a), (b)-(i) denote segmented results obtained from U-net, Deeplabv3+, U-net++, SA-Unet, U-net 3+, CMU-Net, CA-Net, and DFA-Net, respectively, and (j) is ground truth, where yellow ellipses denote false detections, and yellow rectangles indicate capillary vessels that are not manually labelled in ground truth

Fig. 5 ROC curves (a) and P-R curves (b) obtained from different segmentation strategies

Table 1
Demographic characteristics of participants

Table 2
Quantitative comparisons acquired from different segmentation strategies. The best two results are marked in italic and bold

Table 3
AUC and AP measures acquired from different segmentation algorithms

Table 4
Quantitative analyses according to different configurations. The best two results are marked in italic and bold. 3√ denotes that cross-entropy loss substitutes for risk cross-entropy loss, and 4√ denotes that focal loss substitutes for Dice loss in the proposed model