the nearest neighbor distance

edit

Suppose there are N points in a 1 x 1 square, a point named i, a point named j as i's nearest neighbor.

Denote the distance between i and j as r. Ask average(r) = ?

Since other N-2 points locate outside the r circle. The probability of such locating is proportional to

(1-Pi * r * r )**(N-2)

So the probability of i located between r and (r+dr), and N-2 points locates outside r is

(1-Pi * r * r)**(N-2) * 2Pi * r * dr

So the averaged distance r is

integral( r * N * (1-Pi * r * r)**(N-2) * 2Pi * r * dr , r from 0 to 0.5 ) , which is a integral so called as Gamma function.

The result is (1/2)*sqrt(1/N). Simular integral worked on the second nearest case yields average distance as (3/4)*sqrt(1/N).

the nearest neighbor distance for a point near boundary

edit

Suppose there are N points in a 1 x 1 square, a point k located near the square boundary.

Suppose k's nearest neighbor is averagely c*sqrt(1/N) far. Ask c = ?

We may translate this question into the following one.

Suppose there are N points in a square with left lower corner at coordiate (-1/2, -1/2) and right upper corner at (1/2, 1/2), a point k located at (0,y1). y1 near 0.

Then we move those point located at (x,y) with x>0 to a new position (-x,y). Now k feels itself standing at some boundary that no point at its right hand side.

Now k feels itself standing at the boundary of a square with double denser points inside. So its avg nearest = c*sqrt(1/(2N)).

This avg nearest is the same as k locates at (0,y1) in the original square, which gives avg nearest = (1/2)*sqrt(1/N).

So c*sqrt(1/(2N)) = (1/2)*sqrt(1/N) , so c = sqrt(1/2) or we say k has avg nearest as sqrt(1/2)*sqrt(1/N).

Simular method tells us a point locates at square corner has avg nearest as 1*sqrt(1/N).

sqrt(N/2) is a lower bound of TSP tour length

edit

Suppose there are N points in a 1 x 1 square and one point named j.

Suppose the shortest tour is known.

A runner stands on j marchs forward N/2 points according to tour and paints them into red.

A runner stands on j marchs backward N/2 points according to tour and paints them into blue.

So the neighbors of j have half possibility in color red.

So the neighbors of j have half possibility in color red and distribute in random ?

Not sure. However, we have a way to explain the distribution is random.

Suppose j locate at (xj,yj).

move each point h locates at (xj-e,y) to (xj+e,y) thus distance(j,h) is not changed.

now j feels it standing on a boundary of some square of double denser points inside.

now j feels it standing on a boundary of some square of once denser red points inside and distributed in random.

Let j's next point on the shortest tour is named k. So k is red.

So distance(j,k) = distance(j, some red neighbor) >= distance(j, nearest red neighbor)

So distance(j,k) >= distance(j, nearest red neighbor) =distance(j on boundary, j's nearest red)

Since distance(j on boundary, j's nearest red)=0.707/sqrt(N) according to the formula of boundary case.

So distance(j,k) >=0.707/sqrt(N)

So whole tour length >= 0.707/sqrt(N)*N=sqrt(N/2)

an upper bound of TSP tour length

edit

Suppose N points distribute in a 1 x 1 square.

a runner stands on point j is going to visit 0.5N points and finally going back to j.

among such visitations, one particular choosed 0.5N points and one particular tour is shortest.

we see such (shortest tour of 0.5N segments)/(0.5N) > (TSP tour length)/N since the exceeded boundary on the two 0.5N groups.

we also see (shortest tour of 0.1N segments)/(0.1N) > (shortest tour of 0.5N segments)/(0.5N)

By induction we see (shortest tour of 20 segments)/20 > (TSP tour length)/N

That is, a runner visits 20 points as a closed loop will experience a longer length than the shortest TSP. His run is an upper bound.

an opposite person say

(shortest tour of 2 segments)/2 > (TSP tour length)/N is not valid.

(shortest tour of 3 segments)/3 > (TSP tour length)/N might be not valid.

the invalidation might comes from that the runner has too many chances to find a set of 20 points that yield a too short tour.

To find tightly and closely upper bound, we do so: consider N+1 points in a square and the N points form a TSP tour, one point k outside tour. k break segment(j1,j2) to join itself into tour so the segment becomes j1-k-j2. We define a notation 'up edge diff for N to N+1'=dist(j1,k)+dist(j2,k)-dist(j1,j2)

consider N+1 points in a square and the N+1 points form a TSP tour, one segment j1-k-j2 inside tour. k break j1 and j2 to escape itself from tour so the segment becomes j1-j2. We define a notation 'down edge diff for N+1 to N'=dist(j1,k)+dist(j2,k)-dist(j1,j2)

obviously we see tsp.L(N) < tsp.L(N+1) - down.eg.dif, and tsp.L(N)+up.eg.dif> tsp.L(N+1), so

tsp.L(N)+ down.eg.dif < tsp.L(N+1) < tsp.L(N)+up.eg.dif

sandwiching

edit

Suppose some points are randomly distributed in a 1 x 1 square.

While m is a small integer, I wish the following statement valid:

F1=avg(shortest tour length of visiting whole 2m+1 points inside)/(2m+1) is smaller than

F2=avg(nearest neighbor distance when whole m points inside), and this is smaller than

F3=avg(shortest tour length of visiting whole 2m-1 points inside)/(2m-1).

Computer experimetal data is:

when m=03, t(07)=0.xxxx+-0.0001, d(03)=0.389200+-0.000100 , t(05)=0.xxxx

when m=04, t(09)=0.xxxx+-0.0001, d(04)=0.321700+-0.000100 , t(07)=0.xxxx

when m=05, t(11)=0.xxxx+-0.0001, d(05)=0.279200+-0.000100 , t(09)=0.xxxx

when m=06, t(13)=0.xxxx+-0.0001, d(06)=0.249400+-0.000100 , t(11)=0.xxxx

when m=07, t(15)=0.xxxx+-0.0001, d(07)=0.227100+-0.000100 , t(13)=0.xxxx

when m=08, t(17)=0.2103+-0.0001, d(08)=0.209700+-0.000100 , t(15)=0.xxxx, by 10^5 samples, badly

when m=08x t(18)=0.2035+-0.0005 by 100*1000 samples in greedy+fullSearch method cost 1 hour to do 1000 samples

when m=09, t(19)=0.1972+-0.0005, d(09)=0.195500+-0.000100 , t(17)=0.xxxx

when m=10, t(21)=0.1864+-0.0003, d(10)=0.183800+-0.000100 , t(19)=0.xxxx

when m=11, t(23)=0.1768+-0.0004, d(11)=0.173930+-0.000030 , t(21)=0.xxx

when m=12, t(25)=0.1685+-0.0003, d(12)=0.165410+-0.000030 , t(23)=0.xxx

when m=13, t(27)=0.1614+-0.0004, d(13)=0.xxxxx+-0.000030 , t(25)=0.xxx

when m=14, t(29)=0.1549+-0.0004, d(14)=0.xxxxx+-0.000030 , t(25)=0.xxx

when m=25, t(51)=0.1127+-0.0002, d(25)=xxx ,t51=L/10**8/51

when m=51, t(101)=0.xxxxx+-0.xxxxx, d(49)=0.076106+-0.000003

when m=51, t(101)=0.xxxxx+-0.xxxxx, d(50)=0.075291+-0.000003

when m=51, t(103)=0.07647+-0.0001, d(51)=xxx

when m=100, t(201)=0.xxxxx+-0.xxxxx, d(100)=0.052220+-0.000010

when m=123, t(247)=0.xxxxx+-0.xxxxx, d(123)=0.046877+-0.000006

07-0.35xx 03-0.38920 09-0.30xx 04-0.32170 07-0.35xx 11-0.25xx 05-0.27920 09-0.30xx 13-0.23xx 06-0.24940 11-0.25xx 15-0.21xx 07-0.22710 13-0.23xx 17-0.2103 15-0.21xx 19-0.1972 08-0.20970 17-0.2103 21-0.1864 09-0.19550 19-0.1972 23-0.1768 10-0.18380 21-0.1864 25-0.167x 11-0.17393 23-0.1768 12-0.16541 25-0.167x


L[i]<D[ (i-1)/2] i=7,…15 L[i]<D[ (i-3)/2] i=19,21,… L[i] < D[ (i-intSqrt[i/2])/2] < L[i-2]

L[9]< D[ (9-2)/2] <L[7] ? L[11]<D[ (11-2.xx)/2] < L[9] ? L[13]<D[ (13-2.xx)/2] < L[11] ? L[32]<D[ (32-4)/2] <L[30] ?


L[33]<D[ (33-4.xx)/2] <L[31] ?


Another conjecture is

avg(nearest neighbor distance when m+1 points inside) is smaller than

avg(shortest tour length of 2m points inside)/(2m), and this is smaller than

avg(nearest neighbor distance when m-1 points inside).


This form a theorem: TSP(N points) segment length = nearest_distance(N/2 points) in any manifold.


avg.segLen.of.1st.tour(2m+1) ?<= avg.NNdis(m)

avg.segLen.of.1st.tour(2m+1) <= avg.segLen.of.1st.tour(2m) ?<= avg.NNdis(m)

we see avg.segLen.of.1st.tour(2*2-1) = avg.NNdis(2) is true for any 4 points in square.

tourLen(2m+1) = loopLen(2m)-longBar+exBar1+exBar2=loopLen(2m)+DifBar > tourLen(2m)+DifBar

(2m+1)*tour(2m+1)=(2m+1)*avg.loop(2m)- (2m+1)*avg.longBar +2*tour(2m+1)> (2m+1)*(tour(2m)+DifBar)

TSP length is sqrt(1/2)*sqrt(N) for 2D unit square

edit

The previous text explains when 2m points randomly located in a 1 x 1 square then TSP(2m)=0.5/sqrt(m)*(2m)=sqrt(1/2)*sqrt(N).

sqrt(1/2) is lower than the experimental mean value in Jonson as 0.70791 in table 7. However sqrt(1/2) is larger than his experimental mean-standard_deviation=0.70791-0.00108=0.70683

TSP length is 0.69795*power(N,-1/3)*N for 3D unit cube

edit

By integral, nearest_distance(N/2) in 3D unit cube is power(N/2, 1/3)*power(3/4/Pi, 1/3)*integral(exp(-u)*power(u,1/3)*du, u=0..inf)

that is 0.69795*power(N,1/3).

According to the theorem TSP(N points)= nearest_distance(N/2 points)*N, TSP(N points)=0.69795*power(N,2/3) which is the same as the computer experimental result of Percus

global TSP length can be known by calculating local neighbor length

edit

This is the basic spirit of differential geometry that knowing global by only knowing local.

simulated annealing

edit

We can do simulated annealing(SA) for N=40 points randomly distributed in unit square by 2-opt method. The definition of 2-opt is dividing a given tour into 2 segments and might inverse rejoin the two segments. That is

if a given tour= 1,2,...i, i+1,....j, j+1,...N,1

then new tour= 1,2,...i, j,....i+1, j+1,...N,1

SA means the new tour is accepted in a possibility <= 1/[exp(beta*(newLength-oldLength))+1] where beta varys with time as beta=bet0*sqrt(N)*tti/ttN with tti=1..ttN

Try SA many times by (for bet0=10..30)(for ttN=10^5 to 10^6 step 10^5) loop. Pick up the shortest tour among these trys as the final result of one sample. a 2-opt SA is enough to reach the shortest length rather than 3-opt takes more CPU time. The key factor to touch ground is the speed of beta increasing not higher k-opt.

Slower beta increasing is not necessary for arriving shorter length. Once a tour is trapped by a local minimum, further beta increasing no matter how slow will not help jumping out this trap. This contradict with my previous impression that slow is good, rather, adequate beta speed is best. an experienced adequate speed is:

Try SA many times by (for bet0=10 to 30)(for ttN=10U to 40U step 1U=10^4) and constraint bet0 in (0.75 ttN/U to 1.00 ttN/U)

full search quickly

edit

We are going to do full search to find out the shortest tour of the N=20 case. To do full search quickly, knowing a very short tour length before hand will be much helpful. Suppose we have a known tour length L0. Then we let a runner stand on point 1 and march to point k.

If distance(1,k) + lower_bound( shortest tour length among the N-2 unvisited points) < L0 then he may further march(k, some other point).

Else march(1,k) is bad he must go back and try march(1, some other point).

Thus L0 helps runner to known march(1,k) is bad before he walks through N points. The NN greedy algorithm gives a good L0. The simulated annealing gives a shorter L0. We may do simulated annealing 100 times on one sample and pick up the shortest length among the 100 times. We denote this method as SA100 and denote this length as SA100L0. SA100L0 is much helpful to acclerate full search.

in the other hand, lower_bound( shortest tour length among unvisited points) is:

suppose point {2,3,4,5,6,7,8,9} are unvisited yet, i and j are some two points in the list. This list should be connected as a chain ( not a closed loop) and i must be also connected to point 1, j must be also connected to k. Thus a lower bound is

sum( dist(p, its 1st nearest neighbor)+ dist(p, its 2nd nearest neighbor) ; for each p in list )/2, or more precsiely

sum( dist(p, its 1st nearest neighbor)+ dist(p, its 2nd nearest neighbor) ; for each p in list )/2 + (dist(1,its 1st nearest neighbor) +dist(k,its 1st nearest neighbor))/2


We do full search on 1000 samples of N=20 case and fortunately see the final shortest tour length is the same as SA100L0 on all samples without exception. The reason that SA100 touch the ground is: suppose one SA might touch the groud in possibility 1/10, then SA100 may touch the ground in possibilty 1-power(9/10, 100) which is very close to 1. After 1000 full saerches, we do SA100 on 100000 samples of N=20 case without checking full search. We believe SA100L0 is the shortest length and it is much much quickly yielded rather than full search. Similarly, we do SA100 on 1000 samples of N=21 to 30 cases and see the final shortest tour length is the same as SA100L0 on all samples without exception. So we do SA100 on 100000 samples of N=21 to 30 cases without checking full search. The results are in section Sandwiching

apply SA100 on other combination problem

edit

We see SA100 carry the same precision as full search. We may apply SA100 on other combination problem to get exact solution in short time. Especially applied on GOE game which play on 19 x 19 lattices. We regard this game as handing sequence. After one handput we evaluate current board by random play board to the end for 10000 times. The average score is avg(B), its standard deviation is var(B). we say

when last put is white, white wish avg(B) up and var(B) dn, so it wish +avg(B)- var(B) maximized

when last put is black, white wish avg(B) dn and var(B) dn, so it wish -avg(B)- var(B) maximized

inner property of TSP segment

edit

With abbriviation NNB=nearest_neighbor, dis=distance, we see

avg(dis.NNB) in_half_density=avg(0.5 dis.1thNNB + 0.25 dis.2thNNB + 0.125 dis.3thNNB +...)in_original_density

a short line connect one point to its next on the shortest tour is denoted as a TSP segment. I guess a TSP tour iscomposed by

0.7* 1thNNB + 0.7*(1-0.7)* 2thNNB + 0.7*(1-0.7)^2* 3thNNB + 0.7*(1-0.7)^3* 4thNNB + ...

design supply chain of double play

edit

In supply chain you should let one line be one basic event. Basically one event is a mass born from left hand side and gradually move to right hand side. This event can be described in 1 lines. A seldom event is 'exchanging' , Exchanging makes 2 lines and need a column to memo the 2 lines are paired simuteneously.

Random section of TSP article

edit

Hi Lingwanjae, I'm afraid I don't understand this section as written, or some of your explanations on this page. Perhaps you could tell me what quantity exactly it is trying to describe a lower bound for? Do you have any references for where any of this material has been written up -- a journal maybe? xnn (talk) 16:28, 16 June 2010 (UTC)Reply

.

If 1000000 points are randomly distributed in a 1x1 square, then its shortest path length is between 0.707sqrt(1000000) and 0.710sqrt(1000000). So 0.707 is a lower bound. In the graph you illustrated the points are distributed on lattice not random so its length is 1sqrt(1000000). References are linked on wiki TSP page and the following is one of it.


http://users.cs.cf.ac.uk/Antonia.J.Jones/Papers/EJORHeldKarp/HeldKarp.pdf Lingwanjae (talk) 12:00, 18 June 2010 (UTC)Reply