3d: Amit IronHawk irt poligonsorrendezesre, az valoban gyorsan
   megvalosithato, csak perspektivikus lekepezes eseten nem biztos, hogy
   mindig jo eredmenyt fog adni. Konvex testekkel nincs is gond, csak
   amikor pl. tobb testet akarok sorbarendezni lathatosag alapjan, akkoor
    nem ad jo eredmenyt a Z alapjan torteno rendezes, mivel az igazan
   tokeletes eredmeny a nezoponttol valo tavolsag alapjan valo rendezesbol
   jonne ki, csak ahhoz kellene 3 szorzas is... 
        En az alabbival probalkoztam, igen nagy %-bn hellyes eredmenyt ad:
        s = abs(x) + abs(y) + z*c
        Ahol x,y,z a pont koordinatai (az origo a kepernyo kozeppontja)
    es c pedig egy szorzo tenyezo (alt. c=2, c=4 jo). Ezutan mar lehey
    s alapjan rendezni... Tudna valaki mondani eg jobbat ?

        Thanx. TeeM


Valaki egy gyokvono algoritmus utan erdeklodott, ezt nem hagyhattam szo nelkul: Az a[n+1]=(N/a[n]+a[n])/2 sorozat a negyzetgyok N-hez konvergal, ha megfelelo a[0] tagot valasztunk akkor a a[3],a[4] esetleg a[5] mar eleg kozel lesz hozza, ezt kihasznalva ime a rutin: EAX <- N bsr ecx,eax mov ebx,eax shr ecx,1 mov edi,eax shr ebx,cl ;ebx <- a[0]=N fele (nem N/2) xor edx,edx div ebx add eax,ebx shr eax,1 ;eax <- a[1] cmp eax,ebx je ready l1: mov ebx,eax ;ebx <- a[n] xor edx,edx mov eax,edi div ebx add eax,ebx shr eax,1 ;eax <- a[n+1] cmp eax,ebx jb l1 mov eax,ebx ready: ... sqrt(N) -> EAX A[0]-nak N binaris alakjanak felso reszet valasztom, ha N x bites akkor A[0] x/2 bites lesz, igy a[0] nagysagrendje megegyezik a sqrt(N) nagysagrendjevel... A rutin a gyokhoz legkozelebbi kisebb vagy egyenlo egeszet adja vissza... Ha esetleg ez nem eleg pontos akkor minimalis valtoztatassal kiegeszitheto, hogy pontosabb legyen... Nem tudom, hogy van-e gyorsabb algoritmus, szerintem az a[0] tagot kell esetleg valahogy trukkosebben megvalasztani, igy atlag 3.72 osztas kell a gyok megkeresesehez... Remelem nem irtam valamit hibasan... Deity
> Ha egy byte-on elfer a gyok akkor pedig csak 16 bites szamrol lehet szo, > arra tenyleg egyszerubb egy tablazatott csinalni, a negyzetszamokkal es > abban egy binaris keresesel kivalasztani a megfelelo szamot,ez elege > egyszeru ezt nem is reszletezem... Ime egy tablazat nelkuli megoldas: (BX - a szam, AX - a szam gyoke) mov ax,-1 @@1: add ax,2 ;vagy inc ax ; inc ax ha rovidebb kodot szeretnel sub bx,ax jnc @@1 shr ax,1 Igy a gyok egesz reszet kapod, de a kerekites is csak 2 plusz utasitas: add bx,ax adc ax,0 1024-ig tuti gyorsabb barmely osztasos rutinnal, de lehet, hogy meg nagyobb szamokra is jobb sebessegu... Udv, TomCat/Abaddon
A gyokvono rutin-rol jutott eszmbe: A gyokvonast eleg surun arra szoktak hasznalni, hogy egy-egy vektor hosszat szamoljak ki vele a pitagorasz tetellel... ami ugye 2D-ben sqrt(x^2+Y^2); 3D-ben sqrt(x^2+y^2+z^2)... Ez helyet ajanlok egy masik modszert: -2d-ben a=0.328426 x=abs(x);y=abs(y); ha x>y akkor hossz=x+a*y kulonben hossz=y+a*x a keplet eleg jol kozelitti a gyokos megoldast: max hiba 6%, atlagos hiba: 3.3% -3d-ben a=0.301272 b=0.286257 x=abs(x);y=abs(y);z=abs(z); ugy cserelgetjuk a koordinatakat, hogy x>=y>=z legyen hossz=x+a*y+b*z max hiba 8.3%, atlagos hiba: 3.7% Ha a koordinatak, nem valtoznak csak bizonyos korlatok kozott akkor a szorzasok elvegezhetok egy-egy elore elkeszitett tablazat segitsegevel... A konstansok meghatarozasa az 'nagyobb matematikat aparatust' igenyel azt most nem reszleteznem. Sajnos a 3d-s esetben a derive nem volt hajlando kiintegralni nehany osszefuggest, igy kenytelen voltam kozelito algoritmussal meghatarozni a ket konstans, so lehet, hogy alkalmasabb konstansok is talalhatok... bye Deity
Persze masodfokban jobb eredmenyt lehet elerni, nem? en ezt a 3-8 %-ot eleg soknak tartom, persze nyilvan ez alkalmazastol fugg. Te mire hasznalod ezt nagy sikerrel (ugy ertem, hol eleg ez a pontossag?) Perla
RTNEWS7B I've come up with a fast approximation to 3D Euclidean distance ( sqrt(dx*dx+dy*dy+dz*dz) ). (It's probably not original, but .....) 1) find these 3 values: abs(dx), abs(dy), abs(dz). 2) Sort them (3 compares, 0-3 swaps) 3) Approx E.D. = max + (1/4)med + (1/4)min. (error: +/- 13%) max + (5/16)med + (1/4)min has 9% error. max + (11/32)med + (1/4)min has 8% error. As you can see, only shifts & adds are used, and it can be done with integer arithmetic. It could be used in ray tracing as a preliminary test before using the exact form.
RTNEWS7C Euclidean distance calculation: Iterative approaches work great on this problem. A short summary (and code for one) is in Tom Duff's SIGGRAPH '84 Tutorial on Numerical Analysis (a terrific article, by the way, as is his spline piece in the same place) most of which he got from Moler and Morrison's "Replacing Square Roots by Pythagorean Sums" IBM Journal of Research and Development, 27/6, Nov. 1983. The method he describes has cubic convergence, so it works well to unroll the loop and do a set number of iterations. 4 iterations (2/, 4*, 2+) yield 62 digits of precision.
USENET The referenced article asks about computing Euclidean distance in fixed-point -- the problem being that you don't want to compute a*a+b*b because overflow is very likely. Here is a description of an algorithm that avoids harmful overflow and underflow, and that can probably be adapted for a fixed-point application without too much trouble. Note that I wrote the following text 10 years ago (except that I rewrote the code in ANSI C), and its discussion of machine speeds is probably out of date. [Moler-Morrison 83] describes an algorithm for computing sqrt(a*a+b*b) which is fast, robust and portable. The naive approach to this problem (i.e. just writing sqrt(a*a+b*b)) has the problem that for many cases when the result is a representable floating point number, the intermediate results may overflow or underflow. This seriously restricts the usable range of a and b. Moler and Morrison's method never causes harmful overflow or underflow when the result is in range. The algorithm has cubic convergence, and depending on implementation details may even be faster than sqrt(a*a+b*b). (Warning: before using this or any other numerical algorithm, be sure to measure its performance. On my VAX 750, this algorithm runs 25% faster than the standard sqrt routine, but 18% slower than a highly optimized sqrt designed by W. Kahan of UC Berkeley. However, the Moler and Morrison algorithm's domain is unrestricted, which may still make it superior for some applications.) Here is a C function that implements their algorithm: double hypot(double p, q){ double r, s; if(p<0) p = -p; if(q<0) q = -q; if(p USENET I use 64 bit intermediate results. I have a 32->64 bit square & accumulate routine and a 64 bit square root. Of course I try to avoid these accurate versions whenever I can. Graphics Gems I has a nice approximation of the hypotenuse that can be used to speed up tests of the "distance < radius" type, where you would normally use a square root to calculate the distance. /* A Fast Approximation to the Hypotenuse by Alan Paeth from "Graphics Gems", Academic Press, 1990 */ int idist(x1, y1, x2, y2) int x1, y1, x2, y2; { /* * gives approximate distance from (x1,y1) to (x2,y2) * with only overestimations, and then never by more * than (9/8) + one bit uncertainty. */ if ((x2 -= x1) < 0) x2 = -x2; if ((y2 -= y1) < 0) y2 = -y2; return (x2 + y2 - (((x2>y2) ? y2 : x2) >> 1) ); }
Van meg vmi amit erdemes gyorsitani? Gyokvonas tablaval meg hasonlokra gondolok. Ja a gyoktablara van vkinek vmi jol mukodo otlete? En arra gondoltam hogy nagysagrendenkent erdemes csinalni egy-egy tablat... Szoval nem linearis az egesz csak szakaszonkent (0..1..10..100..1000..) Hozzaszolas? Greg
nekem reg volt egy (nem is olyan jo) otletem: (csak egeszekre jo es csak 1FF negyzeteig) szoval,legeneralunk egy tablat 1FF negyzeteig,de csak az elso bytejat taroljuk na most 100h negyzete 10000h ezaltal ami nagyobb 10000h nal annak a gyokenek az MSB-je 1... +:igy eleg egyetlen byteba letarolni egy gyokerteket -:csak x-ig jo... x=1ff*1ff na? teccik? he?:) Lorenzo/DarkSiDe
Biztos mindenki latta a Pan sk-iflic.zip RayTrace introjat. O adta az otletet, hogy ugyanugy, mint ahogy a exponencialis fuggvenyt mar lecsereltem szorzasokra, talan a gyokvonast is le lehetne cserelni egy gyorsabb rutinra... Azaz a kerdesem, vagyis a feleadat a kovetkezo: ??? Lehet-e irni gyorsabb gyokvono rutint, mint az FSQRT a szimpla pontossag megtartasa mellett ??? (azaz 6-7 ertekes jegy maradna a gyokben). Gondolok itt a Newton iteraciora es a kulonfele gyorsitasaira (Aitiken stb.) Tehat a vegcel 70 ciklusnal gyorsabb FSQRT rutint irni egyszeru pontossag mellett! Udv, TomCat
Here is the code of approximated sqrt (btw, fsqrt(0) = 5.2e-20, not 0.5 as i said before... hat error was in the previous version, not in this). Note: Written for Watcom C++ 11.0 inline float fsqrt(float x) { float y; _asm { mov ecx, x and ecx, 0x7f800000 fld x mov eax, ecx add ecx, 0x3f800000 or ecx, 0x00800000 shr ecx, 1 mov y, eax and ecx, 0x3f800000 fld y fadd fstp y and [y], 0x007fffff or [y], ecx } return y; } Maximum error is reached in x=2^n, with n integer and odd. Correct results are in x=2^n, with n even. Send me your comments ;) *PaN!*
Ez sajna hibas. Talan az or ecx, 0x00800000 helyett or eax, 0x00800000 kellene! De megprobaltam en is megirni a sajat FSQRT rutinjaim. Itt sorakoznak: - ASM FSQRT rutinok (5 db) - C teszt program a gyokvonasok pontossagara - fordito batch fajl WATCOM C-hez .586 LOCALS _TEXT SEGMENT BYTE PUBLIC USE32 'CODE' ASSUME CS:_TEXT, DS:DGROUP PUBLIC I87_FSQRT I87_FSQRT: FSQRT RETN PUBLIC TC1_FSQRT TC1_FSQRT: FSTP SZAM MOV EAX,SZAM ADD EAX,3F800000H SHR EAX,1 MOV SZAM,EAX FLD SZAM RETN PUBLIC TC2_FSQRT TC2_FSQRT: FST SZAM MOV EAX,SZAM ADD EAX,3F800000H SHR EAX,1 MOV SZAM,EAX FDIV SZAM FSUBR SZAM FMUL HALF FSUBR SZAM RETN PUBLIC PAN_FSQRT PAN_FSQRT: FST SZAM MOV ECX,SZAM AND ECX,7F800000H MOV EAX,ECX ADD ECX,3F800000H OR EAX,00800000H SHR ECX,1 MOV SZAM,EAX AND ECX,3F800000H FADD SZAM FSTP SZAM AND SZAM,007FFFFFH OR SZAM,ECX FLD SZAM RETN PUBLIC TC3_INIT TC3_INIT: XOR EDI,EDI MOV SZAM,3F800000H MOV ECX,16384 @@1: FLD SZAM FSQRT ADD SZAM,00000200H FSTP GYOK MOV EAX,GYOK SHR EAX,7 MOV TC3_TAB[EDI],AX ADD EDI,2 DEC ECX JNZ @@1 MOV SZAM,3F800000H MOV ECX,16384 @@2: FLD SZAM FADD ST,ST FSQRT ADD SZAM,00000200H FSTP GYOK MOV EAX,GYOK SHR EAX,7 MOV TC3_TAB[EDI],AX ADD EDI,2 DEC ECX JNZ @@2 RETN PUBLIC TC3_FSQRT TC3_FSQRT: FSTP SZAM MOV EAX,SZAM ADD EAX,3F800000H MOV ECX,0FFFEH SHR EAX,8 AND ECX,EAX MOV AX,TC3_TAB[ECX] SHL EAX,7 MOV SZAM,EAX FLD SZAM RETN _TEXT ENDS DGROUP GROUP _DATA,_BSS _DATA SEGMENT DWORD PUBLIC USE32 'DATA' HALF DD 0.5 _DATA ENDS _BSS SEGMENT DWORD PUBLIC UNINIT USE32 'BSS' SZAM DD ? GYOK DD ? TC3_TAB DW 65536 DUP(?) _BSS ENDS END #include #include #define rnd() (((float) rand())/RAND_MAX) extern float I87_FSQRT(float); #pragma aux I87_FSQRT "*" parm [8087] value [8087] modify exact extern float TC1_FSQRT(float); #pragma aux TC1_FSQRT "*" parm [8087] value [8087] modify exact [EAX] extern float TC2_FSQRT(float); #pragma aux TC2_FSQRT "*" parm [8087] value [8087] modify exact [EAX] extern float PAN_FSQRT(float); #pragma aux PAN_FSQRT "*" parm [8087] value [8087] modify exact [EAX ECX] extern void TC3_INIT(void); #pragma aux TC3_INIT "*" modify exact [EAX ECX EDI] extern float TC3_FSQRT(float); #pragma aux TC3_FSQRT "*" parm [8087] value [8087] modify exact [EAX ECX] void main (void) { void CLRSCR(void); #pragma aux CLRSCR modify exact [EAX EDI ECX] =\ " MOV EAX,0700070007000700H"\ " MOV ECX,1000"\ " MOV EDI,0B8000H"\ " REP STOSD" void CURPOS(char,char); #pragma aux CURPOS parm [DH] [DL] modify exact [AX BX DX]=\ " MOV AH,2"\ " MOV BH,0"\ " INT 10H" float num,ratio,avg; int count; float root1,rmax1,amin1,amax1,sum1,wavg1,vavg1,oavg1,pavg1,cmin1,cmax1; float root2,rmax2,amin2,amax2,sum2,wavg2,vavg2,oavg2,pavg2,cmin2,cmax2; float root3,rmax3,amin3,amax3,sum3,wavg3,vavg3,oavg3,pavg3,cmin3,cmax3; float root4,rmax4,amin4,amax4,sum4,wavg4,vavg4,oavg4,pavg4,cmin4,cmax4; TC3_INIT(); CLRSCR(); CURPOS(3,0); printf("Root Ticks Ratio MaxRatio AvgRatio LocalMinAvg LocalMaxAvg\n"); count=0; rmax1=0.0;amin1=1.0;amax1=0.0;cmin1=1.0;cmax1=0.0;sum1=0.0; rmax2=0.0;amin2=1.0;amax2=0.0;cmin2=1.0;cmax2=0.0;sum2=0.0; rmax3=0.0;amin3=1.0;amax3=0.0;cmin3=1.0;cmax3=0.0;sum3=0.0; rmax4=0.0;amin4=1.0;amax4=0.0;cmin4=1.0;cmax4=0.0;sum4=0.0; while ( ++count!=0 && !kbhit() ) { do num=rnd() + count; while (num<=0.0); CURPOS(1,0); printf("The Number: %.9f\n",num); root1=I87_FSQRT(num); ratio = root1/root1 - 1; if (ratio < 0.0) ratio = ratio * -1.0; sum1=sum1+ratio; if (ratio>rmax1) rmax1=ratio; avg = sum1/count; if ((count & 511) == 0) { if (wavg1vavg1 && vavg1>oavg1 && oavg1>pavg1 && pavg1>avg) { amax1=cmax1; cmin1=1.0; } wavg1=vavg1;vavg1=oavg1;oavg1=pavg1;pavg1=avg; } if (avgcmax1) cmax1=avg; if (avgamax1) amax1=avg; CURPOS(4,0); printf("%.9f\n",root1); CURPOS(4,17); printf("70\n"); CURPOS(4,20); printf("%.9f\n",ratio); CURPOS(4,32); printf("%.9f\n",rmax1); CURPOS(4,44); printf("%.9f\n",avg); CURPOS(4,56); printf("%.9f\n",amin1); CURPOS(4,68); printf("%.9f\n",amax1); root2=TC1_FSQRT(num); ratio = root2/root1 - 1; if (ratio < 0.0) ratio = ratio * -1.0; sum2=sum2+ratio; if (ratio>rmax2) rmax2=ratio; avg = sum2/count; if ((count & 511) == 0) { if (wavg2vavg2 && vavg2>oavg2 && oavg2>pavg2 && pavg2>avg) { amax2=cmax2; cmin2=1.0; } wavg2=vavg2;vavg2=oavg2;oavg2=pavg2;pavg2=avg; } if (avgcmax2) cmax2=avg; if (avgamax2) amax2=avg; CURPOS(5,0); printf("%.9f\n",root2); CURPOS(5,17); printf(" 7\n"); CURPOS(5,20); printf("%.9f\n",ratio); CURPOS(5,32); printf("%.9f\n",rmax2); CURPOS(5,44); printf("%.9f\n",avg); CURPOS(5,56); printf("%.9f\n",amin2); CURPOS(5,68); printf("%.9f\n",amax2); root3=TC2_FSQRT(num); ratio = root3/root1 - 1; if (ratio < 0.0) ratio = ratio * -1.0; sum3=sum3+ratio; if (ratio>rmax3) rmax3=ratio; avg = sum3/count; if ((count & 511) == 0) { if (wavg3vavg3 && vavg3>oavg3 && oavg3>pavg3 && pavg3>avg) { amax3=cmax3; cmin3=1.0; } wavg3=vavg3;vavg3=oavg3;oavg3=pavg3;pavg3=avg; } if (avgcmax3) cmax3=avg; if (avgamax3) amax3=avg; CURPOS(6,0); printf("%.9f\n",root3); CURPOS(6,17); printf("54\n"); CURPOS(6,20); printf("%.9f\n",ratio); CURPOS(6,32); printf("%.9f\n",rmax3); CURPOS(6,44); printf("%.9f\n",avg); CURPOS(6,56); printf("%.9f\n",amin3); CURPOS(6,68); printf("%.9f\n",amax3); root4=TC3_FSQRT(num); ratio = root4/root1 - 1; if (ratio < 0.0) ratio = ratio * -1.0; sum4=sum4+ratio; if (ratio>rmax4) rmax4=ratio; avg = sum4/count; if ((count & 511) == 0) { if (wavg4vavg4 && vavg4>oavg4 && oavg4>pavg4 && pavg4>avg) { amax4=cmax4; cmin4=1.0; } wavg4=vavg4;vavg4=oavg4;oavg4=pavg4;pavg4=avg; } if (avgcmax4) cmax4=avg; if (avgamax4) amax4=avg; CURPOS(7,0); printf("%.9f\n",root4); CURPOS(7,17); printf("30\n"); CURPOS(7,20); printf("%.9f\n",ratio); CURPOS(7,32); printf("%.9f\n",rmax4); CURPOS(7,44); printf("%.9f\n",avg); CURPOS(7,56); printf("%.9f\n",amin4); CURPOS(7,68); printf("%.9f\n",amax4); } getch(); } @echo off cls wcc386 C tasm B /m9 /mx /z %1 %2 %3; if errorlevel 1 goto h wlink system pmodew file C, B DEL ?.OBJ :PMWLITE /C4 C.EXE C goto :z :h pause >nul: :z