?? decluster.w,v
字號:
Revision 1.46 1997/08/15 20:18:25 netoAdded Index major section.Revision 1.45 1997/06/26 19:26:14 netoBetter documentation about declustering motivation.Revision 1.44 1997/06/17 13:33:19 netoMake DECLUSTER DEBUG a zero/non-zero test instead of a defined/undefinedtest.Revision 1.43 1997/06/16 20:08:24 netoFixed a TeX bug.Revision 1.42 1997/06/13 17:59:32 netoAdded support for DECLUSTER CONSERVE MEMORY. If defined, thisfrees T prime after preprocessing.Revision 1.41 1997/06/13 15:44:49 netoComment on the bit trick I just made more safe.Revision 1.40 1997/06/13 15:16:27 neto$(t+1)>>1$ is unsafe when $t=2^{31}-1$ because of possible sign extensionon the right shift.Replace it with $t\XOR(t>>1)$. (Here t is a low bits mask.) Also, thismight be faster because it replaces an $AC^0$ operation (t+1) with an $NC^0$operation (exclusive-or).Revision 1.39 1997/06/12 19:55:19 netoMade declustertest a little quieter, and nicer for usual verbose==100output.Revision 1.38 1997/06/12 19:45:04 netoFixed bug in lca: last step was comparing inlabel numbers of xpp, yppinstead of level numbers. A transcripting error, pretty much. :)The LCA passes the tests on both the simple matrix and Euclideaninstances.I'll encapsulate the debugging output now.Revision 1.37 1997/06/12 18:50:21 netoTesting on one example shows thatinlabel, ascendant, level, parent of head, head, parent entries arecomputed correctly.Revision 1.36 1997/06/11 20:04:49 netoFixed a depth-first search bookkeeping bug.Revision 1.35 1997/06/11 19:37:37 netoFixed visibility and ordering of prototypes for print tree.Revision 1.34 1997/06/11 19:31:44 netoRefined when and how print tree is called.Revision 1.33 1997/06/11 19:19:06 netoMust preprocess the MST!In preprocessing: component must be valid for n+n-1 entries.: must make component[w]=NO PARENT.Added function decluster print tree.Revision 1.32 1997/06/10 22:06:56 netoMST stuff now works, including test.Added code to test LCA.LCA does not work. :(Revision 1.31 1997/06/10 20:51:47 netoFixed 2 bugs in plain prim.Fixed 2 bugs in testing code.plain prim: mark cities as members of the component.plain prim: in updating arrays, I set infinity wrong.testing code: setup needs number of cities, not number of edges in MST.Revision 1.30 1997/06/10 20:30:24 netoFinished coding self-test. Plain prim fails. It adds self-edges.Revision 1.29 1997/06/04 22:27:08 netoFixed a macro placement.Revision 1.28 1997/06/04 22:26:04 netoMore code for selftest.Revision 1.27 1997/06/04 21:37:33 netoBegin a small test program.Revision 1.26 1997/06/04 21:06:55 netoNow it compiles.Revision 1.25 1997/06/03 22:40:38 netoRemoved a bad trailing comma in pow 2.Revision 1.24 1997/06/03 22:35:46 netoAdded a comment about whether we should actually store thecost in the topology tree.Revision 1.23 1997/06/03 22:32:39 netoMoved the bit twiddling forward so that it doesn't split thedigestion code.Completed the cluster distance code.Revision 1.22 1997/06/03 22:13:02 netoWe don't need index rightmost 1Revision 1.21 1997/06/03 22:01:17 netoWe don't need halfmasks.Revision 1.20 1997/06/03 21:58:40 netoCompleted coding xpp and ypp code.Created a macro for copying high 1 bit down.Reduced sectional clutter at the end of the web file.Revision 1.19 1997/06/03 19:00:17 netoFinish coding parent of head of inlabel chain.Revision 1.18 1997/06/03 18:54:04 netoFixed accesses to edges of T prime.Use `parent of head of inlabel chain' instead of `head of inlabel chain'.Revision 1.17 1997/06/03 18:50:14 netoAdded code for finding inlabel of z, the LCA of x and y.Revision 1.16 1997/06/02 15:50:47 netoFixed two bugs.First, head levels weren't being checked in the initialization phase.Second, I was missing the lowest high bit in LCA code.Revision 1.15 1997/05/28 17:31:24 netoMake make happy so I can make a distribution.Revision 1.14 1997/05/28 17:28:25 netoStarted on doing LCA queries.Revision 1.13 1997/05/27 23:57:02 netoFinished code to create T prime.Created code to compute level, inlabel, and ascendant numbers.Revision 1.12 1997/05/23 22:46:52 netoMissed an at sign...Revision 1.11 1997/05/23 22:46:14 netoFill out null sections so I can pack this up and back it up to school.Revision 1.10 1997/05/23 22:37:46 netoFixed two typos.Revision 1.9 1997/05/23 22:34:55 netoMiddle of coding up the transformation from MST T to topology tree T'.I need to think more before I code. I may have to go to a full-blownunion-find algorithm, but then again, maybe not.Revision 1.8 1997/05/23 20:35:20 netoBridges drawn from the priority queue might no longer be bridges. Fixed.Still no compile.Revision 1.7 1997/05/23 20:12:00 netoAdded accounting for computing the MST length.Revision 1.6 1997/05/23 19:13:54 netoNew major section: Interface and code preamble.Include dict.h and kdtree.hRevision 1.5 1997/05/23 19:08:07 netoImplemented Prim's algorithm in the geometric case.Still not compiled.Revision 1.4 1997/05/23 18:05:06 netoEdit for readability.Skip over cities that are in the component (in Prim's algorithm).Revision 1.3 1997/05/22 19:22:05 netoFixed memory accounting.Fixed fencepost error.Revision 1.2 1997/05/22 19:13:52 netoCompleted description of declustering idea.Created the interfaceNow writing the MST routine.Finished coding the general Prim algorithm.(Perhaps reference Knuth's MILES\_SPAN).Revision 1.1 1997/05/21 16:57:50 netoInitial revision}%%\def\t#1ip#2{t_{#1i+#2}}%%at s t2ip1 TeX%%at s t2ip2 TeX%%% The following would be cruel...%%%\def\twoxi{2i} %%%at at s two_i TeX@@*Declustering.The Lin-Kernighan algorithm can perform quite badly on clusteredinstances. This becomes evident when one compares its performance onuniform Euclidean instances with its performance on clustered instancessuch as instance \instance{dsj1000}. I described the problem and suggesteda remedy for it in my research proposal, ``Topology and the travelingsalesman problem'', March 17, 1997.This module implements the function |decluster_d| that computes the clusterdistance between any pair of cities. @@^cluster distance@@>It is the key ingredient in the ``efficient declustering'' fixthat I propose for Lin-Kernighan.@@ To understand the problem and its solution, we need to review theLin-Kernighan heuristic, whose salient pointsare as follows. The heuristic searchesfor a cycle in the underlying graph consisting of $2k$ edges $e_1, \ldots, e_{2k}$ withthe followingproperties:\itemskip\li{i}Edge $e_j$ is the pair of cities $(t_j,t_{j+1})$,except that edge $e_{2k}$ is the pair of cities $(t_{2k},t_1)$.\itemskip\li{ii}Odd-numbered edges are in the current tour;even-numbered edges are not. Removing the odd edges from the current tour and adding the even edges results in another tour.The $2k$ edges therefore form anaugmenting cycle in the graph. (CHECK NOMENCLATURE)%Thanks, Derek.\itemskip\li{iii}The total weight of the odd edges exceeds the total weight of the even edges. That is,$$ \sum_{j {\rm\ odd}}\cost(e_j) -\sum_{j {\rm\ even}}\cost(e_j) > 0.$$\itemskipAlthough many ideas go into making the search both extensive and efficient, one of the principal ideas is the \term{cumulative gain criterion}.% a strengthening(?) of rule iii.@@^cumulative gain criterion@@>The reasoning goes as follows.Taking any cycle satisfying the above three criteria, we may rotate itby twos (taking $e'_{j} = e_{j+2}$, and so on) so that we eventuallyfind a cycle in which all the partial sums of weights are positive:$$\cost(e_1)>0$$ $$\cost(e_1)-\cost(e_2)>0$$ $$\cost(e_1)-\cost(e_2)+\cost(e_3)>0$$ $$\cost(e_1)-\cost(e_2)+\cost(e_3)-\cost(e_4)>0$$ and so on. In general, for any $1\le s\le 2k$ we have$$ \sum_{j {\rm\ odd},\, j\le s}\cost(e_j) -\sum_{j {\rm\ even},\, {j\le s}}\cost(e_j) > 0.$$%$$ \sum_{j \le s}(-1)^{j+1}\cost(e_j) >0.$$ %This is yucky.@@ The proof is easy and is obvious once one draws a picture.Let $b$ be the least partial sum in the original list, and say it occursafter subtracting cost of edge $e_{2j}$. Let $a$ be the gain made overthe last $2k-2j$ edges, \ie, the total sum minus $b$. %Since the total%sum is positive, we know $a+b>0$.Rotate the cycle so that $e_{2j}$ appears last in the list of edges, \ie, $e'_{2k}=e_{2j}$ and so on. All the partial sums will nowbe positive, because the least partial sum appears at the end of thelist (to where $e_{2j}$ has migrated) and is equal to the original totalsum, $a+b$, which we know is positive.@@Improving cycles $t$ are built incrementally from $t_1$, by first removing edge$e_1=(t_1,t_2)$ to form a Hamiltonian path.From then on, city $t_1$ anchors the head of the path while werepeatedly flip thetail of the path.Each flip of the tail adds two cities to the sequence of cities $t$ (an odd city and an even city), and hence adds two edges to the sequence of edges $e$ (an even edge andan odd edge). The cumulativegain criterion is applied only after an odd edge is added. Actually,instead of checking that the cumulative sum is positive, one checksthat the cumulative sum is greater than the gain given bythe best tour improvement seenso far in this sequence, denoted by |best_gain|. In symbols, theactual cumulative gain criterion is:$$ \sum_{j {\rm\ odd},\, j < 2i}\cost(e_j) -\sum_{j {\rm\ even},\, {j\le 2i}}\cost(e_j) > \hbox{|best_gain|}.$$@@The cumulative gain criterion in the usual Lin-Kernighan algorithm deliberately ignores the costof immediately closing up the Hamiltonian path to form a tour. That is, $cost(t_{2i},t_1)$ isnot removed from the cumulative gain because there might be a way toextend the sequence a bit farther and achieve a closing-up cost much lower than $cost(t_{2i},t_1)$.@@^cumulative gain criterion@@>But there's a problem with this. If the tail $t_{2i}$ somehow crosses intoa cluster of the graph distant from the cluster containing $t_1$, thenwe end up with an overly-optimistic cumulative gain. This makes thesearch go on for much longer than it should. We may never gain enough potential to cross the great divide back intothe territory surrounding the anchor city $t_1$.@@^anchor city@@>My declustering idea is to discount the cumulative gainby the \term{cluster distance} between$t_{2i}$ and $t_1$. Given a path $P$ from $u$ to $v$, the \term{bottleneck cost} of $P$is the cost of the most expensive edge in $P$. Then the cluster distancebetween $u$ and $v$ is the minimum bottleneck cost over all paths from$u$ to $v$. We denote this value by $d(u,v)$.@@^cluster distance@@>@@^bottleneck cost@@>% See my research proposal for a precise statement of the relationship% with the possible costs of closing up the tour. It is not quite% an upper bound and not quite a lower bound. But it ought to be% close enough.The cumulative gain criterion applied when considering adding$t_{2i-1}$ and $t_{2i}$ becomes:$$ \sum_{j {\rm\ odd},\, j < 2i}\cost(e_j) -\sum_{j {\rm\ even},\, {j\le 2i}}\cost(e_j) - d(t_{2i},t_1) > \hbox{|best_gain|}.$$@@ The rotation trick shows that using the original cumulative gain criteriondoesn't eliminate improvements that satisfy the three Lin-Kernighancriteria given above. That is, applying the cumulative gain criterion can't hurt.However, the declustering technique ---discounting the cumulative gainfunction by the cluster distance between $t_{2i}$ and $t_1$--- definitely{\it can\/} hurt.That is, it might eliminate valid improvement sequences. (INSERT MOREDETAILS. Also, see research proposal.)@@ If the underlying cost function is postitive-definite and symmetric, thenthe cluster distance is positive-definite, symmetric, andalso satisfies the triangle inequality.In short, undersuch conditions$d$ is a \term{metric}. (A function $f$ is \term{positive-definite} if $f(x,y)\ge0$ with equality ifand only if $x=y$; function $f$ satisfies the \term{triangle inequality} if for any $x$, $y$, and $z$, we have $f(x,y) \le f(x,z)+f(z,y)$.)@@^metric@@>@@^positive-definite@@>@@^triangle inequality@@>We're quite lucky because $d$ may be computed very efficiently.In fact with about $O(n \log n)$ preprocessing of the graph, onlinequeries of $d$ may be processed in constant time! This remarkable feat is the job of this module. @@Several ingredients are required to make the declustering code run fast.First, the cluster distance betweenvertices $u$ and $v$ in graph $G=(V,E)$ is also equal to the bottleneckcost along the unique simple path from $u$ to $v$ in a minimum spanning tree$T=(V,E_T)$ of$G$. (The proof is left as an exercise.) Second, we construct a rooted binary tree $T'=(V_{T'},E_{T'})$ from $T$ by making the vertices of $T$ be the leavesof $T'$, and the edges of $T$ be the internal nodes of $T'$: $V_{T'}=V\cup E_T$.Run Kruskal's minimum spanning tree algorithm on $T$. Each subtree of $T'$ correspondsto a connected component of $T$ formed during the execution of Kruskal'salgorithm on $T$: adding edge $(i,j)$ during reconstruction of $T$corresponds to adding internal node $(i,j)$ to $T'$ with its two childrenbeing the current components containing $i$ and $j$.%Tree $T'$ captures the topology of the graph under $d$.Because Kruskal's algorithm adds edges in order of non-decreasing cost,$T'$ is built bottom-up, with more expensive edges toward the root:the cost of the edge labeling an internal node $(i,j)$ of $T'$is as great as the cost of any edge labeling a node in the subtree rootedat $(i,j)$. Finally, the bottleneck cost of the unique path from $u$ to $v$ in $T$ is justthe weight of the edge labeling the least common ancestor of $u$ and$v$ in $T'$.@@^MST@@>@@^LCA@@>@@^minimum spanning tree@@>@@^Kruskal's algorithm@@>@@^cluster distance@@>@@^least common ancestor@@>For any rooted tree $R$, with $O(n\log n)$ time preprocessing and $O(n)$ extra space,online least common ancestor (LCA)queries in $R$ may be answered in constant time (INSERT REFERENCE: Schieber1993). This technique is applied to tree $T'$ from the previous paragraph.The LCA algorithm I give here runs in constant time assuming a fixed wordlength. If machine word size is unrestricted and given as parameter $w$, the algorithm given here runs in $O(\log w)$ time.(The main barrier to asymptotically faster code is the problem of writing standard \CEE/ code to find the most significant1 bit in a word. See \file{http://www.cs.utoronto.ca/\~{}neto/code/\penalty-500first1bit.html}.)@@* Interface and code preamble. The outline of this module is as follows:@@c#include <config.h>#include "lkconfig.h"@@<System headers@@>@@;@@<Early module headers@@>@@;@@<Module headers@@>@@;@@<Global variables@@>@@;@@<Module definitions@@>@@;@@<Module types@@>@@;@@<Module variables@@>@@;@@<Module subroutines@@>@@;@@<Subroutines@@>@@;const char *decluster_rcs_id = "$Id: decluster.w,v 1.61 1998/10/10 21:27:40 neto Exp neto $";@@ This module provides the following interface.Procedures |decluster_setup| and |decluster_cleanup| are the usual intialization and shutdown routines. They allocate the persistent space for the rest of the
?? 快捷鍵說明
復(fù)制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -