Processing of 2D Electrophoresis Gels: Abstract
1 Introduction
Fig. 1. (a,b) Low complexity pair of images. (c,d) High complexity pair of images. (e)
Maximal meaningful boundaries detected in image (d). (f) Shapes obtained by
applying the isoperimetric relation over the lower level sets of the image, and centers of
the spots in the obtained shapes.
2 Spot Detection using Level Lines
The process of establishing correspondences between two images must be based
on invariant features present in them. Due to the way gel images are
generated, as explained before, features such as the shape or intensity of the
spots may vary between experiments, which renders them useless. One feature that
remains invariant is the relative distribution of the spots within the image. This
is why most existing methods are based on point-matching
approaches, where the points to be matched are, in general, the centers of the
spots. Before applying such an algorithm, we must obtain, for each image, the
set of points to be matched.
Proteins subjected to an electrophoresis process undergo deformations that turn
an initially circular or point-like concentration into an ellipse. Given
a set of spots, the points can be obtained as the darkest point of each spot. The
problem is then how to obtain the set of valid spots in the image. In this work
we propose to detect them using meaningful boundaries [5] together with
several criteria related to the shapes of the spots. We start in the next section
with a review of the meaningful boundaries approach.
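As an illustration only, the two ingredients mentioned so far (a shape criterion
based on the isoperimetric relation, and the darkest point of a spot) can be
sketched as follows. The function names, the polygon representation of a level
line and the list-of-rows image are assumptions made for this example; they are
not the implementation used in this work.

```python
import math

def isoperimetric_quotient(polygon):
    """Return 4*pi*Area / Perimeter**2 for a closed polygon of (x, y) vertices.

    The value is 1 for a circle and smaller for any other shape, so it can
    serve as a roundness score for a level line (hypothetical usage).
    """
    area2 = 0.0      # twice the signed area (shoelace formula)
    perimeter = 0.0
    n = len(polygon)
    for k in range(n):
        x0, y0 = polygon[k]
        x1, y1 = polygon[(k + 1) % n]
        area2 += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return 4.0 * math.pi * (abs(area2) / 2.0) / perimeter ** 2

def darkest_point(image, pixels):
    """Return the (row, col) among `pixels` with the lowest intensity."""
    return min(pixels, key=lambda rc: image[rc[0]][rc[1]])

# A square is far from circular; a 64-sided regular polygon is almost a circle.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
ngon = [(math.cos(2 * math.pi * k / 64), math.sin(2 * math.pi * k / 64))
        for k in range(64)]
print(isoperimetric_quotient(square), isoperimetric_quotient(ngon))
```

Any threshold on this quotient would have to be tuned on real gels; none is
suggested here.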
where Nll is the number of level lines in the image, l is the length of C
and P(x) is the probability of the contrast x under the a contrario model.
A final observation is that, in real images, boundaries are wider than
one pixel, which leads to the detection of several meaningful boundaries for each
real one. This problem is solved by applying a maximality criterion over the set
of all meaningful boundaries of a monotone section of the tree of shapes.
3 Spot Matching
The best methods for spot matching are based on point-matching techniques.
In our case we use the Shape Context (SC) descriptor [3], following [9], where it
was applied to gel images. The idea behind SC is to describe each point
(spot) by the distribution of the points in its neighborhood. Using a set of bins
in polar coordinates, the number of points in each bin is counted to obtain a
two-dimensional histogram. We denote the normalized
histogram at point i as hi(k), where the index k identifies the bin. Given this
descriptor we can compute the distance between the SC of two points i and j using
the χ2 distance:
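The χ2 expression itself is not reproduced in this text. A minimal sketch,
assuming the form commonly used with shape contexts [3],
d(i, j) = 1/2 Σk [hi(k) − hj(k)]² / [hi(k) + hj(k)], is given below together
with a simple SC histogram. The log-spaced radial bins, the parameter names and
the externally supplied `mean_dist` scale are assumptions of this example.

```python
import math

def shape_context(points, i, mean_dist, n_r=5, n_theta=12,
                  r_inner=0.125, r_outer=2.0):
    """Normalized polar histogram of the points around points[i].

    Radii are divided by `mean_dist` (assumed to be the average spot
    distance); points farther than `r_outer` are ignored.
    """
    # Log-spaced radial bin edges between r_inner and r_outer (assumption).
    edges = [r_inner * (r_outer / r_inner) ** (k / n_r) for k in range(n_r + 1)]
    hist = [0.0] * (n_r * n_theta)
    xi, yi = points[i]
    for j, (x, y) in enumerate(points):
        if j == i:
            continue
        r = math.hypot(x - xi, y - yi) / mean_dist
        if r > r_outer:
            continue
        r_bin = sum(1 for e in edges[1:-1] if r >= e)
        theta = math.atan2(y - yi, x - xi) % (2 * math.pi)
        t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        hist[r_bin * n_theta + t_bin] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def chi2_distance(h_i, h_j):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h_i, h_j) if a + b > 0)

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
h0 = shape_context(pts, 0, mean_dist=1.0)
h1 = shape_context(pts, 1, mean_dist=1.0)
print(chi2_distance(h0, h0), chi2_distance(h0, h1))
```

With normalized histograms the distance lies between 0 (identical) and 1
(disjoint support).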
             Outer radius: 2x avg. spot distance   Outer radius: 4x avg. spot distance
             nθ = 12    nθ = 16    nθ = 24          nθ = 12    nθ = 16    nθ = 24
Kernel EMD   6 (6)      7 (5)      5 (0)            4 (6)      5 (7)      -
EMD          9 (6)      6 (14)     5 (11)           4 (9)      5 (5)      -
Kernel χ2    8 (6)      8 (5)      10 (0)           10 (6)     11 (5)     11 (6)
χ2           24 (6)     21 (14)    25 (11)          23 (9)     19 (7)     29 (15)
Table 1. First three columns: results for an outer radius of two times the average spot
distance. Last three columns: results for an outer radius of four times the average spot
distance. Each cell shows the number of mismatched spots before and after (in
parentheses) gel registration. The last column for EMD is not computed due to its
computational cost.
4 Gel Registration
where Pij is a permutation matrix that encodes the matching. Since we may
have outlier spots, we also include a set of virtual spots with cost ε for rejection
purposes. The obvious problem with the above procedure is that no global
coherence is imposed.
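The role of the virtual spots can be illustrated with a toy example. The sketch
below is an assumption-laden illustration, not the solver used in this work: it
augments a small cost matrix with virtual rows and columns of cost ε and finds
the minimum-cost permutation by exhaustive search, which is only feasible for a
handful of spots. The function name and the layout of the augmented matrix are
hypothetical.

```python
from itertools import permutations

def match_with_rejection(cost, eps):
    """Minimum-cost one-to-one matching with virtual spots for rejection.

    cost[i][j] is the descriptor distance between spot i of the first image
    and spot j of the second. Returns {i: j} for the accepted real pairs.
    """
    n, m = len(cost), len(cost[0])
    size = n + m
    # Augmented square matrix: the real-real block holds the distances,
    # real-virtual entries cost eps, virtual-virtual entries cost nothing.
    aug = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                aug[i][j] = cost[i][j]
            elif i < n or j < m:
                aug[i][j] = eps
    # Exhaustive search over permutations (toy sizes only).
    best = min(permutations(range(size)),
               key=lambda p: sum(aug[i][p[i]] for i in range(size)))
    return {i: best[i] for i in range(n) if best[i] < m}

toy_cost = [[0.10, 0.90, 0.80],
            [0.90, 0.20, 0.70],
            [0.80, 0.90, 0.95]]
print(match_with_rejection(toy_cost, eps=0.3))
```

In this layout, leaving a pair unmatched costs 2ε (one virtual partner on each
side), so a real pair is accepted only if its distance is below that value. For
realistic sizes a polynomial-time assignment solver would replace the
exhaustive search.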
As discussed before, when pursuing gel registration we have to deal with
two types of errors. First, there are differences in the spot sets (either genuine
or produced during spot detection). Second, these spot differences produce
discrepancies in the SC used for spot matching and gel registration. To overcome
this problem we developed an iterative random sampling procedure. At
each iteration we randomly sample a subset of spots from both images and compute
the SC and the corresponding distances between pairs of spots. Along this
process, for each pair of corresponding spots we record the smallest distance in
a matrix Cij, and the number of times each pair is matched in a matrix Nij.
We use these matrices to compute the gel matching using (1). For each spot in
one image we obtain a set of possible corresponding spots. Using Cij and Nij
we obtain a measure of confidence for each pairing. This can be used to assist the
user in rejecting false matches and refining the registration solution.
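The bookkeeping of the random sampling procedure can be sketched as follows.
This is a minimal illustration under stated assumptions: the sampling fraction,
the number of iterations and the two callables `cost_fn` and `match_fn` are
hypothetical parameters, and the toy stand-ins at the bottom (Euclidean distance
and nearest-neighbor matching) replace the SC distance and the assignment step
only to keep the example self-contained.

```python
import math
import random

def random_sampling_matching(spots_a, spots_b, cost_fn, match_fn,
                             n_iter=100, frac=0.8, seed=0):
    """Accumulate the smallest distance (C) and match count (N) per pair."""
    rng = random.Random(seed)
    na, nb = len(spots_a), len(spots_b)
    C = [[math.inf] * nb for _ in range(na)]
    N = [[0] * nb for _ in range(na)]
    for _ in range(n_iter):
        # Random subsets of both spot sets (indices into the full sets).
        ia = sorted(rng.sample(range(na), max(1, int(frac * na))))
        ib = sorted(rng.sample(range(nb), max(1, int(frac * nb))))
        cost = cost_fn([spots_a[i] for i in ia], [spots_b[j] for j in ib])
        for r, c in match_fn(cost):      # (row, col) pairs within the subsets
            i, j = ia[r], ib[c]
            C[i][j] = min(C[i][j], cost[r][c])
            N[i][j] += 1
    return C, N

# Toy stand-ins (assumptions): Euclidean cost and nearest-neighbor matching.
def euclidean_cost(sub_a, sub_b):
    return [[math.hypot(ax - bx, ay - by) for bx, by in sub_b]
            for ax, ay in sub_a]

def nearest_match(cost):
    return [(r, min(range(len(row)), key=row.__getitem__))
            for r, row in enumerate(cost)]

A = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
B = [(0.5, 0.0), (10.0, 0.5), (0.0, 10.5)]
C, N = random_sampling_matching(A, B, euclidean_cost, nearest_match)
print(N)
```

Pairs that are matched in most of the iterations in which they are sampled, and
that reach a small Cij, are the natural candidates for high confidence; the
exact confidence measure is not specified here.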
No matter how robust the automatic methods are, there will always be
potential mistakes that need high-level information to be resolved. That
is why we equipped every step of the method with a measure of confidence to
rapidly assist the user.
To test the methodology presented here we divided the image pairs into three
groups: low, medium and high complexity. Fig. 3 shows the results. For each
pair we have the ground truth of corresponding points. In the first example,
of low complexity, in Fig. 3(a), we detected 36 spots in each image and the
ground truth contains 29 pairs. The proposed registration process finds 26 correct
pairs, 1 erroneous pair and two points with no correspondent. In the second
example, of medium complexity, in Fig. 3(b), we have 61 spots in one image, 56
in the other and 45 corresponding pairs in the ground truth. The proposed
registration process finds 44 correct pairs and 1 erroneous pair. As we can see,
the global results are accurate for the low and medium complexity pairs (26 of 29
and 44 of 45 ground-truth pairs recovered).
Unfortunately, these results are not achieved in high complexity examples. In the
example shown in Fig. 3(c) only 25 out of 56 correct pairs are found; of the 39
spots without a correspondent, the method correctly identifies 22. The proposed
method does not cope with the lack of global constraints and the complexity of
these pairs. In this case the strong differences between the gels work against the
results. Although some of the erroneous correspondents are close to the valid
ones, the actual matching is incorrect. This is a clear difference from other works,
which report the error in pixels. In the end, one may have a small pixel error
but a large number of incorrect matches to be resolved by the user. Our method
intends to overcome this problem, and the results for the first two examples seem
promising. Below we discuss possible improvements to cope with high complexity
pairs.
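For reference, the fraction of ground-truth pairs recovered in each example can
be computed directly from the counts reported above. The dictionary layout and
the function name are only illustrative.

```python
# (correct pairs found, ground-truth pairs) as reported in the text
reported = {
    "low": (26, 29),
    "medium": (44, 45),
    "high": (25, 56),
}

def recovered_fraction(correct, ground_truth):
    """Fraction of ground-truth pairs that the method recovered."""
    return correct / ground_truth

for name, (correct, total) in reported.items():
    print(f"{name}: {recovered_fraction(correct, total):.1%}")
```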
Spot detection using meaningful boundaries, with the addition of the two
criteria explained in this article, gives good results. Most of the spots present in
the images are correctly detected and very few are missed.
Furthermore, we obtain the meaningfulness of each spot, which allows the
user to supervise, if needed, the less confident ones. In the future we will address
the problem of grouped spots. We expect that including the spot
features jointly, instead of sequentially, will allow us to consider a level line to
be meaningful if it is contrasted enough and also has an isoperimetric relation
similar to that of the circle.
The use of a kernel-based estimation of the SC histogram and of the EMD
distance was shown to improve the results.
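One intuition for this improvement is that a bin-to-bin distance such as χ2
cannot tell a small shift between bins from a large one, whereas the EMD can.
The sketch below illustrates this on one-dimensional histograms only, which is a
simplification: the SC histograms are two-dimensional and this is not the
algorithm of [7]. It relies on the fact that, for 1D histograms of equal mass
with unit distance between adjacent bins, the EMD equals the sum of absolute
differences of the cumulative histograms.

```python
from itertools import accumulate

def emd_1d(h, g):
    """EMD between two equal-mass 1D histograms (unit ground distance)."""
    return sum(abs(a - b) for a, b in zip(accumulate(h), accumulate(g)))

def chi2(h, g):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h, g) if a + b > 0)

ref = [0.0, 1.0, 0.0, 0.0]
near = [0.0, 0.0, 1.0, 0.0]   # mass shifted by one bin
far = [0.0, 0.0, 0.0, 1.0]    # mass shifted by two bins

print(chi2(ref, near), chi2(ref, far))      # identical: chi2 saturates
print(emd_1d(ref, near), emd_1d(ref, far))  # grows with the shift
```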
The iterative random sampling process gives good results even though it does not
impose global coherence. Furthermore, the distance matrix, Cij, and the matrix
with the number of matches, Nij, can be used to decide the confidence in the
correspondence of pairs of spots. We are currently investigating how to include a
more detailed validation of the matching and registration transformation using
the theory of Computational Gestalt. We are also exploring the inclusion of
constraints on the permutation matrix Pij to reject invalid matchings.
References
1. http://www.lecb.ncifcrf.gov/2dgeldatasets/.
2. C. Ballester, V. Caselles, and P. Monasse. The tree of shapes of an image. Technical
report, ENS, 2001.
3. S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition
using shape contexts. IEEE Trans. on Pattern Analysis and Machine Intelligence,
24(4):509–522, April 2002.
4. A. Desolneux, L. Moisan, and J.-M. Morel. From Gestalt Theory to Image Analysis: A
Probabilistic Approach. Springer “Interdisciplinary Applied Mathematics”, 2007.
5. A. Desolneux, L. Moisan, and J.-M. Morel. Edge detection by Helmholtz principle.
Journal of Mathematical Imaging and Vision, 14(3):271–284, 2001.
6. G. Kanizsa. La Grammaire du Voir. arts et sciences. Editions Diderot, 1997.
7. H. Ling and K. Okada. An efficient earth mover’s distance algorithm
for robust histogram comparison. IEEE Trans. on Pattern Analysis and Machine
Intelligence, 29(5):840–853, May 2007.
8. M. Rogers, J. Graham, and R.P. Tong. 2 dimensional electrophoresis gel
registration using point matching and local image-based refinement. In BMVC04,
volume 2, pages 567–576, 2004.
9. M. Rogers and J. Graham. Robust and accurate registration of 2-d electrophoresis
gels using point matching. IEEE Trans. on Image Processing, 16(3):624–635, March
2007.
10. J. Serra. Image analysis and mathematical morphology. Academic Press, 1982.
[Figure 3 image: panels (a), (b) and (c), titled "Corresponding Spots", showing numbered spot labels; axis ticks from 50 to 500.]
Fig. 3. Results for low (a), medium (b) and high (c) complexity pairs of images. (a, b)
Left: corresponding pairs. Right: registration of both images with thin plate splines. (c)
Corresponding pairs.