Wang 2016
PII: S0925-2312(15)01410-1
DOI: http://dx.doi.org/10.1016/j.neucom.2015.09.077
Reference: NEUCOM16150
To appear in: Neurocomputing
Received date: 23 July 2015
Revised date: 18 September 2015
Accepted date: 19 September 2015
Cite this article as: Zhe Wang, Hongsheng Li, Qinwei Zhang, Jing Yuan and
Xiaogang Wang, Magnetic Resonance Fingerprinting with Compressed Sensing
and Distance Metric Learning, Neurocomputing,
http://dx.doi.org/10.1016/j.neucom.2015.09.077
This is a PDF file of an unedited manuscript that has been accepted for
publication. As a service to our customers we are providing this early version of
the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting galley proof before it is published in its final citable form.
Please note that during the production process errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.
Magnetic Resonance Fingerprinting with Compressed Sensing and Distance
Metric Learning
Zhe Wang^a, Hongsheng Li^a,∗, Qinwei Zhang^b, Jing Yuan^b, Xiaogang Wang^a
a Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
b Department of Imaging and Interventional Radiology, the Chinese University of Hong Kong, Shatin, Hong Kong
Abstract
Magnetic Resonance Fingerprinting (MRF) is a novel technique that simultaneously estimates multiple tissue-related
parameters, such as the longitudinal relaxation time T1, the transverse relaxation time T2, the off-resonance
frequency B0 and the proton density, from a scanned object in just tens of seconds. However, the MRF method
suffers from aliasing artifacts because it significantly undersamples the k-space data. In this work, we propose a
compressed sensing (CS) framework for simultaneously estimating multiple tissue-related parameters based on the
MRF method. It is more robust to low sampling ratio and is therefore more efficient in estimating MR parameters
for all voxels of an object. Furthermore, the MRF method requires identifying the nearest atoms of the query
fingerprints from the MR-signal-evolution dictionary with the L2 distance. However, we observed that the L2
distance is not always a proper metric to measure the similarities between MR fingerprints. Adaptively learning
a distance metric from the undersampled training data can significantly improve the matching accuracy of the
query fingerprints. Numerical results on extensive simulated cases show that our method substantially outperforms
state-of-the-art methods in terms of accuracy of parameter estimation.
Keywords: Magnetic Resonance Fingerprinting, compressed sensing, metric learning, Cartesian sampling
∗ Corresponding author
Email address: [email protected] (Hongsheng Li)
2.1. Magnetic Resonance Fingerprinting (MRF)
The key underlying assumption in MRF is that different materials or tissues have their own unique signal evolutions or fingerprints. The magnetization at a given voxel location at time t depends on its magnetic resonance parameters and the system-related parameters, including the flip angle FA, repetition time TR and others, at time t − 1. For illustration purposes, we explain the estimation of MR parameter maps of only a single slice in Section 2.
Let $X \in \mathbb{C}^{N \times T}$ denote multiple scans of one slice of the object of interest, where N is the total number of voxels in the slice and T is the sequence length. Let $X_t^i \in \mathbb{C}$ denote the ith voxel of the scanned slice at time t, $X^i \in \mathbb{C}^{1 \times T}$ denote the signal evolution or fingerprint at voxel i at all times, and $X_t \in \mathbb{C}^{N \times 1}$ denote the scanned image of the slice at time t.
Given the initial magnetization, the signal evolution or fingerprint at voxel i can be written as
$$X^i = \rho_i B(\theta_i; FA, TR), \quad (1)$$
where $\rho_i$ is the proton density, one of the magnetic resonance parameters to be estimated, $\theta_i$ is the collection of other magnetic resonance parameters at voxel i, and B is the Bloch equation dynamics.
Since the possible range of $\theta_i$ of the object is known in advance, we densely sample each MR parameter and use the Bloch equation to create the dictionary $D \in \mathbb{C}^{K \times T}$, where K is the number of dictionary atoms. Each dictionary atom is normalized so that $\|D_k\|_2 = 1$, for $k = 1, 2, \cdots, K$. The same set of system-related parameters FA and TR is used for both creating the dictionary and obtaining the scanning data X. Given a query fingerprint, it is matched to its nearest atom in the predefined dictionary with the L2 distance. The index of the nearest dictionary atom for the fingerprint $X^i$ is denoted as $\tilde{k}_i$, and is obtained as
$$\tilde{k}_i = \arg\min_k \left\| X^i / \|X^i\|_2^2 - D_k \right\|_2 \quad (2)$$
$$\phantom{\tilde{k}_i} = \arg\max_k \ \mathrm{real}\langle X^i / \|X^i\|_2^2, D_k \rangle, \quad (3)$$
where real is the operation to extract the real part of a complex number and $\langle \cdot, \cdot \rangle$ is the inner product operation. The corresponding parameters of the fingerprint $X^i$ are obtained as
$$\tilde{\theta}_i = \Gamma(\tilde{k}_i), \quad (4)$$
where $\Gamma$ retrieves the MR parameters based on the dictionary index. The proton density at voxel i is then estimated as
$$\tilde{\rho}_i = \max\left\{ \mathrm{real}\langle X^i / \|X^i\|_2^2, D_{\tilde{k}_i} \rangle, 0 \right\}, \quad (5)$$
where the max operation is applied to remove unacceptable negative values.
Only a limited portion of the k-space data is collected at each time frame. Due to the undersampling, the image directly obtained by the inverse Fourier transform will be contaminated with strong aliasing artifacts.
2.2. Compressed Sensing for Magnetic Resonance Fingerprinting (CSMRF)
In this section, we propose a compressed sensing framework based on MRF for simultaneously estimating multiple tissue-related parameters with tolerance to very low sampling ratios.
For a fingerprint with minor aliasing noise, once it is replaced by its nearest atom in the dictionary by Eq. (3), the undersampling error at that voxel location is already eliminated. This has been shown in [11]. However, the exact matching may fail because the query fingerprint is impaired by significant undersampling errors. Instead of treating the aliasing artifacts as random noise as in [11], we treat them as the leakage of energy caused by undersampling, which can be estimated by the theory of compressed sensing. Under this assumption and some additional conditions, the missing signals can be perfectly recovered. The problem of reconstructing undersampled k-space data can be formulated as a compressed sensing problem:
$$\min_{X_t} \|\Phi X_t\|_1 \quad \text{s.t.} \quad \|F_u(X_t) - Y_t\|_2^2 < \epsilon, \quad (6)$$
where $F_u$ is the Fourier transform operator with our proposed sampling mask (the details will be described in Section 2.4), and $Y_t$ is the k-space measurement at time t. Minimizing the $\|\cdot\|_1$ term forces the image $X_t$ to be sparse in some transform domain $\Phi$. In this work, we assume that the image is sparse in 2 domains, i.e., 1) the wavelet domain with Daubechies filters of 4 scales and 2) the finite difference domain. The wavelet transform term often results in the removal of high frequency noise-like patterns, while the finite difference term favours solutions that are piecewise smooth [15]. The $\|\cdot\|_2^2$ term requires the image $X_t$, when transformed back to the k-space, to be consistent with the measurement $Y_t$. The parameter $\epsilon$ controls the fidelity of the reconstruction to the measured data. The optimization problem (6) can be rewritten in a Lagrangian form:
$$\hat{X}_t = \arg\min_{X_t} \|F_u(X_t) - Y_t\|_2^2 + \alpha \|\Phi X_t\|_1, \quad (7)$$
where $\alpha$ is a weight parameter. This problem can be solved by the Conjugate Gradient algorithm [15].
Optimizing the compressed sensing problem can be considered as utilizing the spatial information to remove aliasing artifacts in the reconstructed images. After that, the denoised fingerprint $\hat{X}^i$ is matched to the nearest dictionary atom with a learned Mahalanobis distance metric A, which can be written as:
$$\hat{k}_i = \arg\min_k \left\| \hat{X}^i / \|\hat{X}^i\|_2^2 - D_k \right\|_A, \quad (8)$$
The learning of the metric A is detailed in the next section. The learned distance metric captures important dimensions of the fingerprints in the temporal domain, and is more accurate than the L2 distance used in previous work [11] and [26]. In this way, the temporal information is also used in our framework.
Given the dictionary index $\hat{k}_i$, the tissue-related parameters $\hat{\theta}$ can be retrieved as
$$\hat{\theta} = \Gamma(\hat{k}_i), \quad (9)$$
and the proton density is thus
$$\hat{\rho}_i = \max\left\{ \mathrm{real}\langle \hat{X}^i, D_{\hat{k}_i} \rangle_2^2, 0 \right\}. \quad (10)$$
To sequentially solve the two objectives Eqs. (7) and
distance, with which the query fingerprint is closest to its corresponding atom in the pre-defined dictionary, to replace the L2 distance. This is because the L2 distance might not be a proper metric for measuring the dissimilarity between MR fingerprints. In Fig. 1 (a), we show such an example where the L2 distance fails to retrieve correct MR parameters for a query fingerprint corrupted by strong aliasing noise. For the query fingerprint (red), its nearest dictionary atom (green) with the L2 distance is quite different from its ground truth atom (blue).
[Figure 1 plots. Curves: ground truth atom, query fingerprint, best matching atom with L2 distance. Horizontal axis: TR index (0 to 500).]
In this work, we adopt the Relevant Component Analysis (RCA) [28] algorithm and found it to be superior to other distance metric learning algorithms in our framework. RCA requires training samples (fingerprints) and their labels to learn an optimal distance metric between them. The training fingerprints consist of fingerprints from the image sequence and their corresponding dictionary atoms. Labels are assigned to the training fingerprints in the following way. Fingerprints corresponding to the same dictionary atom are given the same label. The same label is also assigned to the corresponding dictionary atom. Notice that although neighbouring dictionary atoms may also be good candidates for the query fingerprint, we do not merge their labels. Dictionary atoms that have no corresponding fingerprints are not included in the training samples. Those atoms may represent materials or tissues that rarely exist in the scanned object.
Let M denote the number of training fingerprints with L different labels, and let $\{P_{ji}\}_{i=1}^{n_j}$ denote the fingerprints in the chunklet j, where $n_j$ is the number of fingerprints in the jth chunklet. A chunklet means a subset of fingerprints that are known to share the same label.
The objective function of the RCA is formulated as:
$$\max_A \log |A| \quad \text{s.t.} \quad \frac{1}{M} \sum_{j=1}^{L} \sum_{i=1}^{n_j} \left\| P_{ji} - \bar{P}_j \right\|_A^2 \le 1, \quad (14)$$
where $\bar{P}_j$ is the mean of the jth chunklet.
Multiplying a solution A by a constant larger than 1 increases the objective value as well as the constrained sum. Therefore, the solution is obtained at the boundary of the feasible region, where the inequality constraint becomes an equality, i.e.,
$$\max_A \log |A| \quad \text{s.t.} \quad \frac{1}{M} \sum_{j=1}^{L} \sum_{i=1}^{n_j} \left\| P_{ji} - \bar{P}_j \right\|_A^2 = 1. \quad (15)$$
Solving the equality constraint leads to the solution, which is linear in A. Let the within chunklet covariance matrix C be
$$C = \frac{1}{M} \sum_{j=1}^{L} \sum_{i=1}^{n_j} (P_{ji} - \bar{P}_j)(P_{ji} - \bar{P}_j)^T. \quad (16)$$
The optimal transformation matrix is thus calculated as $W = C^{-1/2}$, which has large weights on relevant dimensions and small weights on irrelevant dimensions.
As shown in Fig. 1 (b), after a distance metric is learned for the example in Fig. 1 (a), the fingerprint and dictionary atoms are transformed such that the query fingerprint is now most similar to the ground truth atom.
Once the experimental parameters FA, TR and the undersampling pattern are determined, the learned distance metric can be used again for future scans together with them. Another observation is that the matrix has large entries in the main diagonal, which agrees with the previous works [11], [26] that the L2 distance is also a benign distance metric choice.
In practice, the distance metric can be learned in advance. First, the MR parameter maps of either phantoms or volunteers can be obtained by standard MR imaging methods. Then with fixed imaging-related settings, such as FA, TR, sampling strategy, etc., the MR fingerprints can be collected and used for training the distance metric. The learned metric is applicable for later use under the same experimental settings.
2.4. The Time-dependent Sampling Strategy
We propose an undersampling strategy based on Cartesian sampling, in which only the phase encodes are randomly sampled. Each phase encode line is represented by a row in the k-space matrix. Due to the sampling mechanism in MRI, unlike frequency encoding, each phase encode line has to be entirely sampled, thus we cannot achieve the random sampling in two directions which is assumed to be ideal for compressed sensing. The method [15] on MR image reconstruction has shown that the k-space should be sampled more at the lower frequencies because most energy is concentrated around the k-space origin. At the same time, the reconstruction of MR fingerprints requires the aliasing noise at each time to be as incoherent as possible.
Here we propose a sampling strategy that takes both requirements into consideration. A sampling probability $sp_1$ at time 1 is first initialized following [15] such that it samples more near the k-space origin. More specifically, the probability of sampling a row scales according to a power of distance from the k-space origin (see Fig. 2 (a) for illustration). The sampling of each row of the mask at time t − 1 then follows a binomial distribution parametrized by the probability value at that location. The sampling probability $sp_t$ at time t is conditional on the mask at time t − 1. If some row on the (t − 1)th mask has been sampled, then the entry of the tth probability corresponding to the same location is set to zero, except for the c rows nearest to the k-space center. Thus we force the consecutive sampling masks to be as different as possible, while they still sample more data from the lower frequencies. The sampling probability for the ith row in the k-space at time t is defined as
$$sp_t(i) = \begin{cases} sp_1(i) & i \in C \cup \bar{M}_{t-1} \\ 0 & \text{otherwise} \end{cases} \quad (17)$$
where $sp_t$ is the probability at time t, C is the index set of the c rows to be kept near the k-space origin, $M_{t-1}$ is the index set of the rows that are sampled in the (t − 1)th mask, and $\bar{M}_{t-1}$ is the complementary set of $M_{t-1}$. If c equals the total number of rows, then all the sampling masks are independent. By controlling the c value we can balance between the sampling of low frequency parts and high frequency parts of the k-space data.
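The time-dependent rule of Eq. (17) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the polynomial form of the initial density, its rescaling to an expected row count, and the names `initial_probability` and `time_dependent_masks` are hypothetical choices for this sketch, not the exact recipe of [15] or of the authors' implementation.

```python
import numpy as np


def initial_probability(n_rows, n_expected, power=3.0):
    """Row-sampling probabilities sp_1 that peak at the k-space centre.

    Assumption of this sketch: a polynomial decay in the normalised
    distance from the centre row, rescaled so that the probabilities
    sum to roughly `n_expected` rows per frame.
    """
    centre = (n_rows - 1) / 2.0
    dist = np.abs(np.arange(n_rows) - centre) / centre  # 0 at centre, 1 at edges
    sp1 = (1.0 - dist) ** power
    sp1 *= n_expected / sp1.sum()
    return np.clip(sp1, 0.0, 1.0)


def time_dependent_masks(sp1, c, n_frames, rng):
    """Draw row masks following Eq. (17).

    A row sampled at time t-1 gets probability zero at time t, unless it
    belongs to the c rows nearest to the k-space centre (index set C).
    """
    n_rows = sp1.size
    centre = (n_rows - 1) / 2.0
    order = np.argsort(np.abs(np.arange(n_rows) - centre), kind="stable")
    keep = np.zeros(n_rows, dtype=bool)
    keep[order[:c]] = True  # index set C

    masks = np.zeros((n_frames, n_rows), dtype=bool)
    masks[0] = rng.random(n_rows) < sp1  # independent draw per row
    for t in range(1, n_frames):
        allowed = keep | ~masks[t - 1]  # C united with the complement of M_{t-1}
        sp_t = np.where(allowed, sp1, 0.0)
        masks[t] = rng.random(n_rows) < sp_t
    return masks, keep
```

For example, `time_dependent_masks(initial_probability(256, 16), c=6, n_frames=500, rng=np.random.default_rng(0))` draws 500 row masks for a 256-row k-space. Because previously sampled rows are excluded, the realised number of rows per frame after the first frame can fall below the nominal target, so in practice the initial density would need tuning to hit a given sampling ratio.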
1 The code for generating the dictionary and fingerprint matching with L2 distance is obtained from the supplementary information of [11]: http://www.nature.com/nature/journal/v495/n7440/extref/nature11971-s1.pdf
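The L2 matching step of Eqs. (2)-(5) is compact enough to sketch directly. The snippet below is not the code from [11]; it is a minimal NumPy illustration in which the function name `match_fingerprints` and the array layout are assumptions. Two simplifications are made and should be read as such: the positive per-voxel scaling of the query is dropped in the matching step because it does not change the arg max, and the proton density is taken as the unnormalised inner product, which returns ρ exactly when a fingerprint equals ρ times a unit-norm atom.

```python
import numpy as np


def match_fingerprints(X, D):
    """Match each fingerprint to its nearest unit-norm dictionary atom.

    X : (N, T) complex array, one fingerprint per voxel.
    D : (K, T) complex array, one atom per row, each with unit L2 norm.
    Returns the atom indices (N,) and non-negative density estimates (N,).
    """
    # Real part of the inner products <X^i, D_k> for all voxels and atoms.
    corr = np.real(X @ D.conj().T)  # shape (N, K)
    # With unit-norm atoms, the largest correlation is the smallest L2 distance.
    k = np.argmax(corr, axis=1)
    # Density estimate, clipped at zero to remove negative values.
    rho = np.maximum(corr[np.arange(X.shape[0]), k], 0.0)
    return k, rho
```

The MR parameters then follow from a lookup table indexed by `k`, which plays the role of Γ in Eq. (4).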
increment of 10 ms, from 110 to 320 ms, and from 380 to 630 ms with an increment of 50 ms. The B0 values are sampled from −200 to 200 Hz with an increment of 10 Hz. To make the simulation closer to real scenarios, both the parameter maps and the dictionary were designed so that no exact fingerprint match can be found.
3.1.3. Sequence Setting
We simulated the signal evolutions with the IR-bSSFP sequence using a randomized series of flip angles and repetition times of 10 ms. While we also experimented with randomized repetition times, no significant performance change was observed. Let η be the noise term sampled from a Gaussian distribution with a standard deviation of 5 ms. The flip angles, FA, are calculated as a series of repeating sinusoidal curves added with Gaussian random noise, i.e.,
$$FA(t) = \begin{cases} 10 + \sin(2\pi t/500) \times 50 + \eta & 0 < t \le 250 \\ 10 & 250 < t \le 300 \\ 5 + \sin(2\pi/200 \times 25) + \eta & 300 < t \le 500 \end{cases}$$
3.1.4. Evaluation Metrics
The quality of the MR parameter estimation is quantified using the Peak-Signal-to-Noise-Ratio (PSNR) in decibels and the Structural SIMilarity (SSIM) index [31]. When computing PSNR, the MR parameter maps are first normalized to the range of [0,255]. PSNR is then computed as the ratio of the peak intensity value of the ground truth to the Mean Square Error (MSE) reconstruction error relative to the ground truth. The SSIM index is developed as a complementary approach to the traditional metrics based on error-sensitivity. It has been shown to be more consistent with human eye perception [31]. Unlike PSNR which estimates perceived errors, SSIM considers image degradation as perceived change in structural information. The SSIM index is a decimal value between -1 and 1, and value 1 can only be reached when two images are identical.
3.1.5. Compared Methods
Our proposed CSMRF+ML was able to recover the MR parameter maps accurately. We compared it with the MRF [11] and BLIP [26] methods. For simplicity and better comparison, the proposed sampling strategy is used for MRF. For BLIP, the uniform sampling strategy based on EPI is used as in [26], i.e., the rows in k-space are undersampled by a factor p with random shifts across time. In some experiments, we also reported the performance of the oracle estimator, which was obtained by matching the fully sampled image sequence data to the nearest dictionary atoms using the L2 distance metric. Because the oracle estimator samples all the data, it should always achieve the best estimation results if the dictionary is correctly created. In this way, we could differentiate the errors caused by the Bloch response discretization and those by the other factors.
3.2. Overall Performance and Comparisons with Existing Methods
The overall performance of our method and the compared MRF [11] and BLIP [26] methods is reported in this section. We set the sequence length T to 500, with a sampling ratio of 6.25% (16 rows out of 256). The c value in Eq. (19) is chosen to be 6. All the experiments were performed on 10 different slices, and the averaged results were reported in Table 2 (also see the visual comparisons in Figs. 4 - 7). As expected, the oracle estimator achieved the best result among all methods because it does not undersample the k-space data. Our proposed method CSMRF+ML performs best compared to MRF and BLIP on the estimation of T1, T2 and off resonance frequency maps, while its accuracy improvement on proton density maps is slightly lower than that of BLIP. This is because the proton density depends on the product of the query fingerprint and the matched dictionary atom. Although our method can find better matches than MRF, the final result will still be affected by the shrinkage effect of undersampling, even after the density compensation is applied to the k-space. Since BLIP iteratively updates the fingerprints by alternately projecting them to the Bloch response manifold and minimizing the reconstruction error, it can estimate the proton density map better. BLIP shows better accuracy than MRF. However, its performance is not stable and depends on the randomness of its sampling masks because the sampling strategy used by BLIP does not have constraints on sampling masks at consecutive times.
Figs. 4 - 7 show example estimated maps by different algorithms. MRF and CSMRF+ML share the same set of sampling masks by Eq. (19), while BLIP uses a uniform undersampling strategy based on EPI. We set BLIP to run 16 iterations, and no significant improvement is observed if more iterations are performed. The result of MRF shows substantial aliasing artifacts. BLIP successfully removes most of the aliasing noise, while the estimated T1, T2 and off resonance frequency maps of CSMRF+ML exhibit almost no aliasing artifacts. The proton density map of CSMRF+ML is slightly overestimated due to the shrinkage effect.
3.3. Performance with Noise
To evaluate the noise robustness of the proposed method, we added zero-mean complex Gaussian noise of standard deviation σ = 0.5 to the k-space of all the frames. An example of the fully sampled noisy frame at time 1 is shown in Fig. 8 (a), and can be observed to be considerably noisy. The PSNR of the noisy image with respect to the reference is 19.1 dB. The distance metric is trained with another set of images contaminated by the same type of noise but with different random seeds. Here we show the estimated T2 maps by
Method             T1 PSNR       T1 SSIM       T2 PSNR       T2 SSIM       B0 PSNR       B0 SSIM       Density PSNR  Density SSIM
Oracle Estimator   42.8          0.99          40.6          0.99          52.0          1.0           92.6          1.0
MRF                27.0 ± 0.54   0.95 ± 0.02   22.9 ± 0.32   0.86 ± 0.06   24.8 ± 0.51   0.89 ± 0.03   23.6 ± 0.23   0.87 ± 0.02
BLIP               30.2 ± 1.66   0.96 ± 0.12   26.8 ± 0.81   0.86 ± 0.07   28.2 ± 1.12   0.90 ± 0.05   28.6 ± 0.96   0.88 ± 0.04
CSMRF+ML           31.1 ± 0.87   0.99 ± 0.01   37.3 ± 0.76   0.99 ± 0.01   39.9 ± 0.64   0.99 ± 0.01   25.8 ± 0.46   0.92 ± 0.04
Table 2: Quantitative results of the proposed algorithm and the state-of-the-art algorithms. The winning entries are marked in bold.
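The PSNR values in Table 2 follow the recipe of Section 3.1.4 (normalize the maps to [0, 255], then compare against the ground truth). A minimal sketch of that computation is given below. Two details are assumptions of the sketch rather than statements from the paper: both maps are rescaled with the range of the ground truth, and the decibel value uses the common form 10 log10(peak² / MSE). The function name `psnr_db` is hypothetical.

```python
import numpy as np


def psnr_db(estimate, truth):
    """PSNR in dB of an estimated parameter map against its ground truth.

    Both maps are mapped to [0, 255] using the ground-truth range
    (an assumption of this sketch), then PSNR = 10 * log10(peak^2 / MSE).
    """
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    lo, hi = truth.min(), truth.max()
    if hi <= lo:
        raise ValueError("ground truth map must not be constant")
    scale = 255.0 / (hi - lo)
    est = (estimate - lo) * scale
    ref = (truth - lo) * scale
    mse = np.mean((est - ref) ** 2)
    if mse == 0.0:
        return float("inf")  # identical maps
    peak = ref.max()  # equals 255 after normalisation
    return 10.0 * np.log10(peak ** 2 / mse)
```

SSIM is more involved; an existing implementation such as `skimage.metrics.structural_similarity` would be a reasonable choice rather than a hand-written one.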
Figure 4: Example estimated T1 maps with a sampling ratio of 6.25%. (a) Ground truth. Results by (b) MRF [11], (c) BLIP [26] and
(d) CSMRF+ML. All the images are displayed in the same color range.
MRF, BLIP and CSMRF+ML in Figs. 8 (b)-(d). The results show that MRF is unable to effectively remove the noise. BLIP performs better than MRF and shows a clearer reconstruction result. However, a certain amount of noise still remains. Our method CSMRF+ML, on the other hand, generates a satisfactory result. Almost all the noise is eliminated except for some on the boundaries. The PSNR and SSIM of CSMRF+ML are 4.2 dB and 0.083 higher than those of BLIP, respectively, which shows that CSMRF+ML can perform better in the presence of a reasonable amount of noise.
3.4. Evaluation on Different Components
In this section, the effect of each individual parameter is investigated. We compare CSMRF+ML and the state-of-the-art methods against different sampling ratios in Section 3.4.1, and against different sequence lengths T in Section 3.4.2. In Section 3.4.3, we compare the proposed sampling strategy with a baseline strategy. The effect of different c values of our strategy is also investigated. In Section 3.4.4, different distance metric learning algorithms are tested for CSMRF+ML.
3.4.1. Evaluation on Sampling Ratio
In this section, we evaluated the estimation accuracy of our algorithm with different sampling ratios. Recall that the images are of size 256 × 256. We experimented with sampling ratios of 3.13%, 3.91%, 4.69%, 5.47%, 6.25%, 7.03% and 7.81% (which are equivalent to 8, 10, 12, 14, 16, 18 and 20 rows per image), respectively. We report the PSNR of the estimated MR parameter maps v.s. the sampling ratios here.
In Fig. 9, we show that the overall estimation accuracy increases as the sampling ratio increases for all the MR parameter maps. Interestingly, BLIP has a performance boost at sampling ratios of 3.13% and 6.25% (8 and 16 rows per image). This may be because the sampling strategy used by BLIP requires undersampling the k-space uniformly with random shifts across time. Ideally, each row should have the same chance to be sampled during the whole process, which will maximize the randomness of different sampling masks. However, if the total number of rows is not divisible by the number of rows to be sampled (e.g. sampling 12 out of 256 rows), then some rows might never be sampled, which leads to degraded performance. Note that in [26], only sampling ratios of 6.25%, 12.5% and 25% are shown.
With metric learning and the proposed sampling strategy, CSMRF+ML always performs best. Since CSMRF+ML uses a non-uniform sampling strategy, it does not suffer from the problem described above. It can achieve satisfactory and stable reconstruction quality at arbitrary sampling ratios in our experiments.
3.4.2. Evaluation on Sequence Length
This experiment evaluated the performance of the proposed algorithm with different sequence lengths, which varied from 100 to 500. While we also tested longer sequences, no significant performance improvement was observed.
In Fig. 10, both the mean value and the standard deviation of 10 trials by each algorithm are plotted. The PSNR of the estimated parameter maps by all the algorithms increases as the sequence length increases. Notice here we do not include PSNR of the oracle estimator in Fig.
(a) Ground truth (b) MRF (c) BLIP (d) CSMRF+ML
Figure 5: Example estimated T2 maps with a sampling ratio of 6.25%. (a) Ground truth. Results by (b) MRF [11], (c) BLIP [26] and
(d) CSMRF+ML. All the images are displayed in the same color range.
Figure 6: Example estimated B0 maps with a sampling ratio of 6.25%. (a) Ground truth. Results by (b) MRF [11], (c) BLIP [26] and
(d) CSMRF+ML. All the images are displayed in the same color range.
10 (d) for the visualization purpose (it ranges from 90-91 dB, far greater than the other algorithms). It can be seen that CSMRF+ML is stable and outperforms the other algorithms except for the density maps. MRF performs worse, but stably. It cannot effectively make use of the spatial information of the image and the L2 distance often fails to match query fingerprints to correct dictionary atoms. BLIP is better but unstable (i.e., having large variance) when the sequence length is short. This can be explained by the fact that its sampling strategy is independent each time and does not force different spatial encodings. This problem can be alleviated when a longer sequence is used. Furthermore, the PSNR of T1, T2 and B0 maps by CSMRF+ML are close to the oracle estimator when the sequence length is no shorter than 500, which means that they are visually very close to the ground truth without obvious aliasing artifacts or noise.
3.4.3. Evaluation on Sampling Strategies
In order to test whether our proposed sampling scheme influences the performance of estimating multiple MR parameter maps, we compared the proposed sampling strategy with a baseline sampling strategy with an equivalent undersampling ratio of 4.69%. The baseline sampling strategy is to sample the k-space independently at each time, which follows a variable density random sampling pattern as the first sampling mask in our approach. It does not force choosing different rows of the k-space data at two consecutive time frames. All the experiments were repeated on 10 different slices. Each entry in Table 3 was obtained by averaging the results of the 10 slices.
In Table 3, we show the quantitative results of the proposed sampling strategy and those of the baseline strategy. The performance of the proposed sampling strategy was tested on both the MRF method and our proposed CSMRF+ML. We show that the baseline sampling strategy leads to results that are not the best, yet stable, which indicates that fully independent variable-density random sampling can guarantee a satisfactory performance.
The proposed sampling strategy results in better accuracy of parameter map estimation because our strategy increases the incoherence between the noise and the fingerprints. When c in Eq. (19) is 4, i.e., the probabilities of sampling the center 4 rows on the mask are fixed while other rows are dependent on the previous mask, the proposed sampling strategy achieved the best performance. Since our sampling ratio is 4.69%, only 12 out of 256 rows of the k-space data would be sampled. That means when c is 8, there are only about 4 possible rows to choose from the high frequency parts of the k-space. On the contrary, when c is 2, the low frequency parts are sampled too sparsely. In both cases, the performance drops because of the inappropriate ratio of low frequency against high frequency.
(a) Ground truth (b) MRF (c) BLIP (d) CSMRF+ML
Figure 7: Example estimated density maps with a sampling ratio of 6.25%. (a) Ground truth. Results by (b) MRF [11], (c) BLIP [26]
and (d) CSMRF+ML. All the images are displayed in the same color range.
(a) Fully sampled frame 1 (b) MRF (c) BLIP (d) CSMRF+ML
Figure 8: (a) Fully sampled noisy frame at time 1. T2 maps reconstructed by (b) MRF [11], (c) BLIP [26] and (d) CSMRF+ML. All
the images are displayed in the same color range.
MRF
Strategy          T1 PSNR  T1 SSIM  T2 PSNR  T2 SSIM  B0 PSNR  B0 SSIM  Density PSNR  Density SSIM
Baseline          24       0.921    20.6     0.831    22.9     0.864    20.6          0.822
Proposed, c = 2   26.4     0.952    21.8     0.852    24.3     0.892    21.8          0.864
Proposed, c = 4   26.5     0.943    21.3     0.852    22.5     0.881    21.9          0.862
Proposed, c = 6   27.9     0.947    20.1     0.825    20.3     0.845    21.2          0.864
Proposed, c = 8   26.6     0.945    20.6     0.839    23.7     0.899    20.4          0.855
CSMRF+ML
Strategy          T1 PSNR  T1 SSIM  T2 PSNR  T2 SSIM  B0 PSNR  B0 SSIM  Density PSNR  Density SSIM
Baseline          33.9     0.981    31.2     0.983    39.0     0.979    22.1          0.808
Proposed, c = 2   29.7     0.967    35.9     0.98     32.4     0.955    27.7          0.873
Proposed, c = 4   30.0     0.975    35.6     0.977    39.9     0.985    24.7          0.872
Proposed, c = 6   30.1     0.969    35.2     0.982    41.8     0.991    25.2          0.866
Proposed, c = 8   30.7     0.975    34.6     0.982    38.1     0.983    23.4          0.851
Table 3: Comparisons of the proposed sampling strategy with the baseline strategy. The baseline strategy undersamples the k-space
independently at each time. Different c values of our proposed sampling strategy were chosen to study how the sampling ratio between
low frequency components and high frequency components would affect the results. The winning entries are marked in bold.
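As background for the metric learning comparison in Section 3.4.4, the RCA transform of Eq. (16) and the Mahalanobis matching of Eq. (8) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the helper names `to_real`, `rca_transform` and `mahalanobis_match` are hypothetical, the small ridge term `eps` is an added assumption to keep the covariance invertible when training data are scarce, and the query normalisation of Eq. (8) is omitted for brevity. Complex fingerprints are turned into real vectors by concatenating real and imaginary parts, as described for the 1000 × 1000 full-sized matrix.

```python
import numpy as np


def to_real(F):
    """Stack real and imaginary parts: (n, T) complex -> (n, 2T) real."""
    F = np.asarray(F)
    return np.concatenate([F.real, F.imag], axis=1)


def rca_transform(P, labels, eps=1e-6):
    """Return W = C^(-1/2), with C the within-chunklet covariance of Eq. (16).

    P      : (M, d) real training vectors (fingerprints and their atoms).
    labels : (M,) chunklet label of each training vector.
    eps    : ridge term added to C (an assumption of this sketch).
    """
    P = np.asarray(P, dtype=float)
    labels = np.asarray(labels)
    centred = np.empty_like(P)
    for lab in np.unique(labels):
        sel = labels == lab
        centred[sel] = P[sel] - P[sel].mean(axis=0)  # subtract chunklet mean
    C = centred.T @ centred / P.shape[0]
    C += eps * np.eye(C.shape[0])
    evals, evecs = np.linalg.eigh(C)  # C is symmetric positive definite
    return (evecs / np.sqrt(evals)) @ evecs.T  # V diag(1/sqrt(lambda)) V^T


def mahalanobis_match(x, D, W):
    """Index of the atom in D (K, d) closest to x (d,) under A = W^T W."""
    diff = (D - x) @ W.T
    return int(np.argmin(np.sum(diff ** 2, axis=1)))
```

Applying W to both the query and the atoms and then using the plain L2 distance is equivalent to the Mahalanobis distance with A = WᵀW, which is why the matching step reduces to a single matrix product followed by an arg min.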
3.4.4. Evaluation on Distance Metric Learning Methods
Our proposed algorithm can be combined with various distance metric learning algorithms. Here we focus on their performance on estimation of the T1 map (note that similar performance was observed for the T2 and proton density maps). We compared the L2 distance with the distance metric learned by RCA [28], the Discriminative Component Analysis (DCA) [32] and the Local Fisher Discriminant Analysis (LFDA) [33]. The full-sized matrix is of size 1000 × 1000 because the real part and the imaginary part of the training samples are concatenated. However, all these metric learning algorithms are able to learn a dimension-reduction transform $W \in \mathbb{R}^{T^{-} \times 1000}$, where $T^{-}$ is a number smaller than 1000. While we also tried different $T^{-}$ values for different algorithms, no significant difference was observed until $T^{-}$ is below 500, where the performance began to degrade. For a fair comparison on these metric learning algorithms, we learned a full-sized matrix for each of them.
Distance Metric      PSNR   SSIM
L2 distance in [11]  23.3   0.80
RCA [28]             32.3   0.98
DCA [32]             24.6   0.89
LFDA [33]            31.5   0.98
Table 4: PSNR and SSIM of the estimated T1 map with different distance metric learning algorithms.
In Table 4, we show the PSNR and SSIM of the parameter maps estimated by our algorithm with different metric learning algorithms. The performance of CSMRF with the L2 distance is listed as the baseline. DCA only improved the baseline a little while RCA performed best in our framework. The reason RCA works better than other distance metric learning algorithms might be that it does not force the samples with different labels to be
[Figure 9 plots. Curves: MRF, BLIP, CSMRF+ML. Horizontal axis: sampling ratio (8 to 20 rows per image). Panels: (a) PSNR of T1 map, (b) PSNR of T2 map, (c) PSNR of B0 map, (d) PSNR of density map.]
Figure 9: Estimation accuracy v.s. the number of rows sampled per slice. Both mean value and standard deviation are plotted. (a)
PSNR for estimated T1 maps with varying sampling ratios. (b) PSNR for estimated T2 maps with varying sampling ratios. (c) PSNR
for estimated off resonance frequency maps with varying sampling ratios. (d) PSNR for estimated proton density maps with varying
sampling ratios.
[Figure 10 plots. Curves: Oracle, MRF, BLIP, CSMRF+ML (Oracle not shown in panel (d)). Horizontal axis: sequence length (100 to 500). Panels: (a) PSNR of T1 map, (b) PSNR of T2 map, (c) PSNR of B0 map, (d) PSNR of density map.]
Figure 10: PSNR v.s. sequence length of all the tested algorithms. Both mean value and standard deviation are plotted. (a) PSNR
of T1 map v.s. sequence length. (b) PSNR of T2 map v.s. sequence length. (c) PSNR of B0 map v.s. sequence length. (d) PSNR of
proton density map v.s. sequence length.
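Figures 9 and 10 report accuracy as PSNR. For reference, one common way to compute PSNR between an estimated parameter map and its ground truth is sketched below. The choice of peak value (here the largest absolute ground-truth value) and the optional foreground mask are assumptions, since this section does not state which convention the experiments used; the function name `psnr` is illustrative.

```python
import numpy as np

def psnr(estimate, truth, mask=None):
    """PSNR in dB between an estimated parameter map and the ground truth.

    Assumptions: the peak is the largest absolute ground-truth value, and the
    squared error is averaged only over pixels selected by `mask`
    (all pixels if mask is None).
    """
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if estimate.shape != truth.shape:
        raise ValueError("maps must have the same shape")
    if mask is None:
        mask = np.ones(truth.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    mse = np.mean((estimate[mask] - truth[mask]) ** 2)
    peak = np.max(np.abs(truth[mask]))
    if mse == 0:
        return float("inf")  # identical maps
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy T1 map (arbitrary units) with a constant error of 4 in every pixel.
t1_true = np.array([[0.0, 100.0], [200.0, 400.0]])
t1_est = t1_true + 4.0
value_db = psnr(t1_est, t1_true)
```

In the toy example the mean squared error is 16 and the peak is 400, so the result is 10 log10(160000 / 16), i.e. about 40 dB. A different peak convention shifts every curve by a constant and leaves the ranking of the methods unchanged.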
far away from each other. This is consistent with our observations: a different sample may come from a neighboring dictionary atom, which is also a good approximation of the ground truth.

4. Discussions and Conclusions

1) The learned distance metric. The success of applying metric learning to MRF indicates that some dimensions may be more useful than others for matching MR fingerprints to the dictionary atoms, and that correlations exist between dimensions. Although learning a distance metric offline can discover important information in MR fingerprints and may well tackle this problem, collecting the ground truth data from phantoms or volunteers takes additional effort. Moreover, calculating the Mahalanobis distances between each query fingerprint and all the dictionary atoms is more time consuming than calculating their inner products. Therefore, a better solution may be to specifically design the pulse sequence so that the MR fingerprints of interest can be best distinguished with the inner product.

2) Compressed sensing algorithm. Currently, conjugate gradient descent with backtracking line search is used to optimize the proposed objective function. We will investigate more recent optimization methods for compressed sensing algorithms (e.g., [21], [24]). Moreover, the compressed sensing step requires many empirically set parameters, such as the number of line search iterations and the step size. A possible research direction may be to design a systematic way of determining all such parameters.

3) The proposed sampling strategy. In our proposed sampling strategy, we empirically choose the c value, which is equivalent to the number of rows near the k-space origin. We observe that, when the sampling ratio is small (e.g., 8 out of 256 rows), the c value should be closer to the total number of rows to be sampled. This is because in this case, if more low-frequency data (a larger c value) are sampled, the image is smoothed out and thus contains less noise. On the contrary, if more rows are allowed to be sampled (e.g., more than 32 out of 256 rows), then the c value does not have to be too large. Generally, users can set it according to the sampling ratio or by cross validation.

The proposed Cartesian-based sampling scheme in this study is quite different from the non-Cartesian sampling proposed for the original MRF study, and its implementation in various MR pulse sequences should be carefully considered in practice. For the normal spin-echo and gradient-echo pulse sequences widely used for clinical morphological imaging, the implementation of the proposed sampling scheme is highly practical because these sequences usually use a phase-encoding gradient lobe and a phase-encoding rewinding lobe before and after each echo acquisition. The frequency-encoding (or readout) gradient would not be lengthened. This implementation is also applicable to fast spin-echo sequences with multiple k-space row acquisitions in each shot. The index sets for all time frames could be calculated once prior to acquisition and then applied to each time frame. Alternatively, the index set for each time frame t-1 could be recorded for the calculation of the sampling mask for the next time frame t. It is worth noting that this Cartesian-based sampling
scheme is technically challenging, and so may not be suitable for echo planar imaging (EPI) and gradient and spin echo (GRASE) sequences (either single-shot or multi-shot), in which gradient echo trains are used for frequency encoding, and a phase pre-winder lobe and small blip gradients are used for phase encoding. In these sequences, to achieve the proposed sampling mask, blips with different areas have to be used to skip some k-space rows due to the non-continuous k-space sampling. In this case, the large blip gradient lobes could inevitably prolong the required readout slope and hence the total gradient echo train duration, leading to more severe image distortion, SNR reduction and other artifacts such as ghosting. Nevertheless, the implementation of our proposed method involves considerable effort in pulse sequence development, and its performance on prospectively undersampled real MRI data has to be thoroughly validated in future work.

The main difference between CSMRF-ML and BLIP is that we learned the distance metric from the data instead of using a pre-defined one. This new metric allows us to better match the fingerprints to the dictionary atoms. Besides, we explicitly require the reconstructed images to be sparse in some transform domain, while BLIP tried to apply the sparse prior on the proton density maps and found no significant improvement. We also proposed a variable density randomized sampling strategy, while BLIP adopted a uniform sampling strategy.

The MRF method is a new approach to magnetic resonance and has not been fully exploited yet. In this work, we propose a compressed sensing framework for MRF with distance metric learning. A novel algorithm is proposed to reconstruct the undersampled data and estimate the MR parameters. It first solves the compressed sensing optimization problem and then projects the signal evolution onto the Bloch response manifold with a learned distance metric. Thus the solution benefits from both temporal and spatial regularization. A novel sampling strategy is also proposed for maximizing the incoherence between the fingerprint and the aliasing error on it. We conducted numerical simulations to demonstrate the effectiveness of our framework. Compared with MRF [11] and BLIP [26], our algorithm achieves more accurate parameter map estimation.

Acknowledgement

This work was supported in part by Lui Che Woo Institute of Innovative Medicine (No. LCWIM 8303122), in part by NSFC (No. 61301269), in part by the PhD Programs Foundation of MOE of China (No. 20130185120039), in part by China Postdoctoral Science Foundation (No. 2014M552339) and in part by Sichuan High Tech R&D Program (No. 2014GZX0009).

References

[1] P. Ehses, N. Seiberlich, D. Ma, F. A. Breuer, P. M. Jakob, M. A. Griswold, V. Gulani, IR TrueFISP with a golden-ratio-based radial readout: Fast quantification of T1, T2, and proton density, Magnetic Resonance in Medicine 69 (1) (2013) 71–81.
[2] D. C. Look, D. R. Locker, Time saving in measurement of NMR and EPR relaxation times, Review of Scientific Instruments 41 (2) (1970) 250–251.
[3] I. Kay, R. Henkelman, Practical implementation and optimization of one-shot T1 imaging, Magnetic Resonance in Medicine 22 (2) (1991) 414–424.
[4] E. K. Fram, R. J. Herfkens, G. A. Johnson, G. H. Glover, J. P. Karis, A. Shimakawa, T. G. Perkins, N. J. Pelc, Rapid calculation of T1 using variable flip angle gradient refocused imaging, Magnetic Resonance Imaging 5 (3) (1987) 201–208.
[5] S. Meiboom, D. Gill, Modified spin-echo method for measuring nuclear relaxation times, Review of Scientific Instruments 29 (8) (1958) 688–691.
[6] K. Scheffler, J. Hennig, T1 quantification with inversion recovery TrueFISP, Magnetic Resonance in Medicine 45 (4) (2001) 720–723.
[7] S. C. Deoni, T. M. Peters, B. K. Rutt, High-resolution T1 and T2 mapping of the brain in a clinically acceptable time with DESPOT1 and DESPOT2, Magnetic Resonance in Medicine 53 (1) (2005) 237–241.
[8] J. Warntjes, O. Dahlqvist, P. Lundberg, Novel method for rapid, simultaneous T1, T2*, and proton density quantification, Magnetic Resonance in Medicine 57 (3) (2007) 528–537.
[9] J. Warntjes, O. D. Leinhard, J. West, P. Lundberg, Rapid magnetic resonance quantification on the brain: Optimization for clinical usage, Magnetic Resonance in Medicine 60 (2) (2008) 320–329.
[10] P. Schmitt, M. A. Griswold, P. M. Jakob, M. Kotas, V. Gulani, M. Flentje, A. Haase, Inversion recovery TrueFISP: quantification of T1, T2, and spin density, Magnetic Resonance in Medicine 51 (4) (2004) 661–667.
[11] D. Ma, V. Gulani, N. Seiberlich, K. Liu, J. L. Sunshine, J. L. Duerk, M. A. Griswold, Magnetic resonance fingerprinting, Nature 495 (7440) (2013) 187–192.
[12] P. Schmitt, M. Griswold, V. Gulani, A. Haase, M. Flentje, P. Jakob, A simple geometrical description of the TrueFISP ideal transient and steady-state signal, Magnetic Resonance in Medicine 55 (1) (2006) 177–186.
[13] E. J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory 52 (2) (2006) 489–509.
[14] D. L. Donoho, Compressed sensing, IEEE Transactions on Information Theory 52 (4) (2006) 1289–1306.
[15] M. Lustig, D. Donoho, J. M. Pauly, Sparse MRI: The application of compressed sensing for rapid MR imaging, Magnetic Resonance in Medicine 58 (6) (2007) 1182–1195.
[16] C. Chen, J. Huang, Compressive sensing MRI with wavelet tree sparsity, Advances in Neural Information Processing Systems (2012) 1115–1123.
[17] Y. Yu, S. Zhang, K. Li, D. Metaxas, L. Axel, Deformable models with sparsity constraints for cardiac motion analysis, Medical Image Analysis 18 (6) (2014) 927–937.
[24] S. Ravishankar, Y. Bresler, MR image reconstruction from highly undersampled k-space data by dictionary learning, IEEE Transactions on Medical Imaging 30 (5) (2011) 1028–1041.
[25] Z. Wang, Q. Zhang, J. Yuan, X. Wang, MRF denoising with compressed sensing and adaptive filtering, IEEE International Symposium on Biomedical Imaging: From Nano to Macro (2014) 870–873.
[26] P. V. M. E. Davies, G. Puy, Y. Wiaux, A compressed sensing framework for magnetic resonance fingerprinting, Preprint.
[27] T. Blumensath, Sampling and reconstructing signals from a union of linear subspaces, IEEE Transactions on Information Theory 57 (7) (2011) 4660–4671.
[28] A. Bar-Hillel, T. Hertz, N. Shental, D. Weinshall, G. Ridgeway, Learning a Mahalanobis metric from equivalence constraints, Journal of Machine Learning Research 6 (2005) 937–965.