?? newfct.c
字號(hào):
/*************************************************************************** ************************************************************************** Spherical Harmonic Transform Kit 2.7 Contact: Peter Kostelec geelong@cs.dartmouth.edu Copyright 1997-2003 Sean Moore, Dennis Healy, Dan Rockmore, Peter Kostelec Copyright 2004 Peter Kostelec, Dan Rockmore SpharmonicKit is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. SpharmonicKit is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. Commercial use is absolutely prohibited. See the accompanying LICENSE file for details. ************************************************************************ ************************************************************************//************************************************************************ If FFTPACK *is* defined, use the FFTPACK-based dcts. Small C-interfaces have been written, to act as front ends to the functions which live in the fftpack library that I'll be linking to. NOTE THAT even if FFTPACK is defined, I still need the inverse DCT routine ExpIFCT. That is because, in the inverse discrete Legendre transform, I need to take N coefficients and inverse DCT them to produce 2*N samples. I don't know how to do that with FFTPACK's inverse dct. If I have to do an inverse DCT whose input and output are the same lengths, *then* I can use FFTPACK's routine. The "standard" (i.e. non-FFTPACK) forward and inverse dct routines defined in this file are forward dct: kFCT, kFCTX inverse dct: ExpIFCT When using FFTPACK, the routines are forward dct: DCTf ("f" for forward) inverse dct: DCTb ("b" for backward) If you have your own optimized dct routines, just modify the definitions of DCTf and DCTb accordingly (and still compile with the -DFFTPACK flag and change the library you're linking to in the Makefile). NOTE: DCTf and DCTb use slightly modified versions of the fftpack functions cosqb ( in DCTf ) and cosqf (in DCTb ). Before compiling the fftpack library, these functions were modified in order to scale the coefficients the same way as they are in kFCT, kFCTX, and ExpIFCT. Originally, the transforms cosqb and cosqf were not orthogonal. THEY ARE NOW. Exactly how was the scaling modified in the original, unmodified fftpack routines cosqb, cosqf ? Let X be an input array of length N. Let Y be the output of kFCT (or kFCTX - they're identical), Z be the output of the original, unmodified fftpack routine cosqb. Let SCALE = 1 / (2 * N) .Then (using C and not Fortran indexing) Y[0] = SCALE * Z[0] / 2 Y[i] = SCALE * Z[i] for i = 1, 2, ..., N-1 The original fftpack library routine cosqb was further modified to return only the first p <= N many coefficients. This required modifying the original, unmodified fftpack-source file cosqb.f . Let X be an input array of length N. Let Y be the output of ExpIFCT. Now let me multiply the first coefficient of X by 2 ( so X[0] *= 2 ) and plug this modified X into the original, unmodified fftpack-routine cosqf. Let Z be the resulting output of cosqf. Let SCALE = 1/2. Then (using C and not Fortran indexing) Y[i] = SCALE * Z[i] for i = 0, 1, 2, ..., N-1 All this scaling is now done within the source files cosqf1.f and cosqb1.f . It's a little hokey, but it works. IMPORTANT NOTE: DCTf and DCTb both write the output over the input array!!!************************************************************************//************************************************ Some general background on the routines kFCT, kFCTX and ExpIFCT ... This is new C cource code for doing fast cosine transforms, and it contains variations for speeding up FCTs and convolutions for certain structures appearing in Legendre transforms. Many of these commands have both a real-valued version and a complex-valued version to save time when performing spherical harmonic transforms. It also is written with memory management in mind. This cosine transform is based on the Steidl and Tasche algorithm with some modifications. See Sean's Ph.D. thesis for refs. NOTE: I will try to explicitly unwrap the ChebyDivRem routine as called in ExpIFCT: the while-loop will be for divsize > 4. Then I'll explicitly type in for the cases of divsize = 4 and divsize = 2. This, I believe, significantly reduces the overhead involved in calling ChebyDivRem so often. Thank you Doug and Sumit ! Note about structures: some functions use the lowhigh structure, defined to be struct lowhigh{ double low; double high; } The reason for this is to "interweave" elements. That is, if I have a vector A of length 2*n, then I'll create an array with n-many lowhigh elements which will look like {{A[0],A[n]},{A[1],A[n+1]},{A[2],A[n+2]},...,{A[n-1],A[2*n]}} Having the elements arranged this way increases in a small (but real) way the efficiency of the forward fast cosine transform. So, for example, kFCTX is the "structure" version of kFCT. The structure is well suited for the transform, given how this particular algorithm is implemented. Basically, I believe, using this structure cuts down on cache-thrashing. General note about flags - throughout this code flags are often used as input arguments to tell the routine something about the input information. The convention adopted throughout is that when a flag is set to 0, that means that no action needs to be performed on the pertinent data, and 1 means that some action must be taken. ************************************************/#include <math.h>#include <string.h>#include "OURperms.h"#include "OURmods.h"#ifdef FFTPACK#include "fftpack.h"#endif#include "newFCT.h"/************************************************************************ Utility functions for kFCT, kFCTX and ExpIFCT *********************************************************************//************************************************************************ reverse the elements of a (double) vector of length n i.e. if a = [0 1 2 3] then what's returned is b = [3 2 1 0] input -> vector to reverse n -> length of input (must be even !) output -> where to store result************************************************************************/static void reverse_it(double *input, double *output, int n){ int i, len; len = n / 2; if (len >= 4) { for(i = 0; i < (len % 4); ++i) { output[i] = input[n - 1 - i]; output[n - 1 - i] = input[i]; } for( ; i < len ; i += 4) { output[i] = input[n - i - 1]; output[i + 1] = input[n - i - 2]; output[i + 2] = input[n - i - 3]; output[i + 3] = input[n - i - 4]; output[n - i - 1] = input[i]; output[n - i - 2] = input[i + 1]; output[n - i - 3] = input[i + 2]; output[n - i - 4] = input[i + 3]; } } else if (len == 2) { output[0] = input[n - 1]; output[1] = input[n - 2]; output[n - 1] = input[0]; output[n - 2] = input[1]; } else if (len == 3) { output[0] = input[n - 1]; output[1] = input[n - 2]; output[2] = input[n - 3]; output[n - 1] = input[0]; output[n - 2] = input[1]; output[n - 3] = input[2]; } else { output[0] = input[n - 1]; output[n - 1] = input[0]; }}/************************************************************************//* permutation management code *//* OUR permutations - a self-inverting permutation used for performing fast cosine transforms. See Sean's Ph.D. thesis *//************************************************************************//* permutes data and puts the result in result *//* data and result should be arrays of size n */static void OURpermute(double *data, double *result, int n){ int i; const int *pptr; pptr = get_perm(n); for (i=0; i<n; i++) result[i] = data[pptr[i]]; }/************************************************************************//* permutes data and puts the result in result *//* data is size n, result array of structures of size n/2 *//* just like above except designed to work with structures */static void OURpermuteX(double *data, struct lowhigh *result, int n){ register int i, dummy; const int *pptr; dummy = n / 2; pptr = get_perm(n); for ( i = 0 ; i < dummy; i ++) { result->low = data[*pptr]; result->high = data[*(pptr+dummy)]; result++; pptr++; }}/************************************************************************//* used by kFCT to add up polynomial coefficients once k level is reached *//* data is double array of size n, result is double array of size k */static void PartitionAdd(double *data, double *result, int n, int k){ int i, j; double tmp; for(i = 0; i < k ; i++) { tmp = 0.0; for(j = i; j < n; j += k) tmp += data[j]; result[i] = tmp; }}/************************************************************************//* used by kFCT to add up polynomial coefficients once k level is reached *//* data is double array of size n, result is double array of size k *//* just like above except used for lowhigh structures ... ASSUMING k = (n/2) */static void PartitionAddX(struct lowhigh *data, double *result, int n, int k){ int i, j; double tmp0, tmp1; int dummy; dummy = n / 2; for (i = 0 ; i < k ; i++) { tmp0 = 0.0 ; tmp1 = 0.0; for(j = i ; j < dummy ; j += k) { tmp0 += data[j].low; tmp1 += data[j].high; } result[i] = ( tmp0 + tmp1 ); }}/************************************************************************//************************************************************************ Ok, if FFTPACK *is* defined, use DCTf, DCTb. Otherwise use kFCT, kFCTX *********************************************************************/#ifdef FFTPACK/**** If will use the fftpack routines cosqf and cosqb, first have to precompute an array of values that both routines use. This interface function will call the fftpack routine cosqi This function will allocate memory to store the precomputed values and then will return a double pointer to that array. NOTE: The memory allocated in this function is freed in the routine that calls precomp_dct!!! size = length of sequence to be transformed wSave = the array created by precomp_dct, of length 3 * size + 15, which contains the precomputed values. precomp_dct will return a double pointer to this array. ****/double *precomp_dct( int size ){ double *wSave; /* allocate space */ wSave = (double *) malloc( sizeof(double) * (3 * size + 15) ); /* now precompute data */ cosqi_( &size, wSave ); return wSave;}/*********************************** This is the forward DCT routine DCTf. It uses the fftpack library function cosqb (NOT cosqf). array = double array of samples, of length n p = how many coefficients desired. CoswSave = array of precomputed data that cosqb requires (produced by a call to the function cosqi ). OUTPUT IS WRITTEN OVER INPUT !!! *******************************/void DCTf ( double *array, int n, int p, double *CoswSave ){ cosqb_( &n, &p, array, CoswSave );}/*********************************** This is the inverse DCT routine DCTb. It uses the fftpack library function cosqf (NOT cosqb). array = double array of coefficients, of length n; n many samples will be produced CoswSave = array of precomputed data that cosqf requires (produced by a call to the function cosqi ). OUTPUT IS WRITTEN OVER INPUT !!! *******************************/void DCTb( double *array , int n, double *CoswSave ){ cosqf_( &n , array , CoswSave );}/************************************************************************//************************************************************************/#else /* FFTPACK is *not* defined, use the "local" functions kFCT, kFCTX *//************************************************************************//************************************************************************//************************************************************************//* OK - here is the forward transform code. The main modification here is that given n samples of data, kFCT will compute only the first k <= n coefficients, where n and k are assumed to be powers of 2. The basic idea is that the algorithm keeps interpolating a polynomial of degree 2d-1 from two polynomials of degree d-1. Initially, lowpoly contains polynomials of degree 0 - the data samples. From these values, polynomials of degree 1 are interpolated and stored in highpoly. This transform is (essentially) the transpose of the matrix formulation of ExpIFCT. Series is returned with coefficients ordered from low degree to high degree data - double array of size n result - double array of size k workspace - double array of size 2*n n - number of samples k = number of coefficients desired permflag - 0 if data in OUR permuted order, 1 if data needs to be permuted */void kFCT(double *data, double *result, double *workspace, int n, int k, int permflag){ double *lowpoly, *highpoly, *lowptr, *highptr; const double *modptr; double *temp, dn2; int p2, p3; unsigned int i, j, levelsize, highsize, lowsize, loopcntr; double modptr0; double e0, e1, f0, f1; lowpoly = workspace; lowptr = lowpoly; highpoly = workspace + n; highptr = highpoly; levelsize = n; modptr = get_mods(levelsize); if (permflag == 1) OURpermute(data,lowpoly,n); else { memcpy(lowpoly, data, sizeof(double) * n); } /***** in what follows, am using the fact that modptr[i+1] = -modptr[i] for i even ****/ /* do first level of interpolation */ for (i=0; i<n; i+=2) { *highptr++ = *lowptr + *(lowptr+1); *highptr++ = *modptr * (*(lowptr+1) - *lowptr); modptr+=2;lowptr+=2; } lowptr = highpoly; highptr = lowpoly; loopcntr = 0; /* now do higher-order interpolations */ if (k > 2) { /* reset pointers to logical polynomial workspace */ levelsize /= 2; modptr = get_mods(levelsize); highsize = 4; lowsize = 2; /* now set counter pointers to appropriate matrix locations */ p2 = lowsize - 1; p3 = 1; /* main loop */ while (highsize <= k) { for (j=0; j<n; j+=highsize) { /* do loworder coeffs first */ for (i=0; i<lowsize; i++) *highptr++ = lowptr[i] + lowptr[i+lowsize]; /* now do higher order coeffs */ modptr0 = modptr[0]; *highptr++ = modptr0 * (lowptr[lowsize] - lowptr[0]); modptr0 *= 2.0; e0 = modptr0 * ( lowptr[p3+lowsize] - lowptr[p3] ); e1 = lowptr[p2+lowsize] + lowptr[p2]; *highptr++ = e0 - e1; p2--; p3++; for ( i = 0 ; i<(lowsize-2); i += 2) { e0 = -lowptr[p3] + lowptr[p3+lowsize]; f0 = -lowptr[p3+1] + lowptr[p3+lowsize+1]; f1 = lowptr[p2-1] + lowptr[p2+lowsize-1]; e1 = lowptr[p2] + lowptr[p2+lowsize]; *highptr++ = modptr0 * e0 - e1; *highptr++ = modptr0 * f0 - f1; p2 -= 2; p3 += 2; } lowptr += highsize; modptr += 2; p2 = lowsize - 1; p3 = 1; }
?? 快捷鍵說明
復(fù)制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號(hào)
Ctrl + =
減小字號(hào)
Ctrl + -