3d: Amit IronHawk irt poligonsorrendezesre, az valoban gyorsan
megvalosithato, csak perspektivikus lekepezes eseten nem biztos, hogy
mindig jo eredmenyt fog adni. Konvex testekkel nincs is gond, csak
amikor pl. tobb testet akarok sorbarendezni lathatosag alapjan, akkoor
nem ad jo eredmenyt a Z alapjan torteno rendezes, mivel az igazan
tokeletes eredmeny a nezoponttol valo tavolsag alapjan valo rendezesbol
jonne ki, csak ahhoz kellene 3 szorzas is...
En az alabbival probalkoztam, igen nagy %-bn hellyes eredmenyt ad:
s = abs(x) + abs(y) + z*c
Ahol x,y,z a pont koordinatai (az origo a kepernyo kozeppontja)
es c pedig egy szorzo tenyezo (alt. c=2, c=4 jo). Ezutan mar lehey
s alapjan rendezni... Tudna valaki mondani eg jobbat ?
Thanx. TeeM
Valaki egy gyokvono algoritmus utan erdeklodott, ezt nem hagyhattam szo
nelkul:
Az a[n+1]=(N/a[n]+a[n])/2 sorozat a negyzetgyok N-hez konvergal, ha
megfelelo a[0] tagot valasztunk akkor a a[3],a[4] esetleg a[5] mar eleg
kozel lesz hozza, ezt kihasznalva ime a rutin:
EAX <- N
bsr ecx,eax
mov ebx,eax
shr ecx,1
mov edi,eax
shr ebx,cl ;ebx <- a[0]=N fele (nem N/2)
xor edx,edx
div ebx
add eax,ebx
shr eax,1 ;eax <- a[1]
cmp eax,ebx
je ready
l1: mov ebx,eax ;ebx <- a[n]
xor edx,edx
mov eax,edi
div ebx
add eax,ebx
shr eax,1 ;eax <- a[n+1]
cmp eax,ebx
jb l1
mov eax,ebx
ready: ...
sqrt(N) -> EAX
A[0]-nak N binaris alakjanak felso reszet valasztom, ha N x bites akkor
A[0] x/2 bites lesz, igy a[0] nagysagrendje megegyezik a sqrt(N)
nagysagrendjevel...
A rutin a gyokhoz legkozelebbi kisebb vagy egyenlo egeszet adja vissza...
Ha esetleg ez nem eleg pontos akkor minimalis valtoztatassal
kiegeszitheto, hogy pontosabb legyen...
Nem tudom, hogy van-e gyorsabb algoritmus, szerintem az a[0] tagot kell
esetleg valahogy trukkosebben megvalasztani, igy atlag 3.72 osztas kell a
gyok megkeresesehez...
Remelem nem irtam valamit hibasan...
Deity
> Ha egy byte-on elfer a gyok akkor pedig csak 16 bites szamrol lehet szo,
> arra tenyleg egyszerubb egy tablazatott csinalni, a negyzetszamokkal es
> abban egy binaris keresesel kivalasztani a megfelelo szamot,ez elege
> egyszeru ezt nem is reszletezem...
Ime egy tablazat nelkuli megoldas:
(BX - a szam, AX - a szam gyoke)
mov ax,-1
@@1:
add ax,2 ;vagy inc ax
; inc ax ha rovidebb kodot szeretnel
sub bx,ax
jnc @@1
shr ax,1
Igy a gyok egesz reszet kapod, de a kerekites is csak 2 plusz utasitas:
add bx,ax
adc ax,0
1024-ig tuti gyorsabb barmely osztasos rutinnal, de lehet, hogy meg
nagyobb szamokra is jobb sebessegu...
Udv, TomCat/Abaddon
A gyokvono rutin-rol jutott eszmbe:
A gyokvonast eleg surun arra szoktak hasznalni, hogy egy-egy vektor
hosszat szamoljak ki vele a pitagorasz tetellel... ami ugye 2D-ben
sqrt(x^2+Y^2); 3D-ben sqrt(x^2+y^2+z^2)...
Ez helyet ajanlok egy masik modszert:
-2d-ben
a=0.328426
x=abs(x);y=abs(y);
ha x>y akkor hossz=x+a*y
kulonben hossz=y+a*x
a keplet eleg jol kozelitti a gyokos megoldast:
max hiba 6%, atlagos hiba: 3.3%
-3d-ben
a=0.301272
b=0.286257
x=abs(x);y=abs(y);z=abs(z);
ugy cserelgetjuk a koordinatakat, hogy x>=y>=z legyen
hossz=x+a*y+b*z
max hiba 8.3%, atlagos hiba: 3.7%
Ha a koordinatak, nem valtoznak csak bizonyos korlatok kozott akkor a
szorzasok elvegezhetok egy-egy elore elkeszitett tablazat segitsegevel...
A konstansok meghatarozasa az 'nagyobb matematikat aparatust' igenyel
azt most nem reszleteznem. Sajnos a 3d-s esetben a derive nem volt
hajlando kiintegralni nehany osszefuggest, igy kenytelen voltam kozelito
algoritmussal meghatarozni a ket konstans, so lehet, hogy alkalmasabb
konstansok is talalhatok...
bye
Deity
Persze masodfokban jobb eredmenyt lehet elerni, nem? en ezt a 3-8 %-ot
eleg soknak tartom, persze nyilvan ez alkalmazastol fugg. Te mire
hasznalod ezt nagy sikerrel (ugy ertem, hol eleg ez a pontossag?)
Perla
RTNEWS7B
I've come up with a fast approximation to
3D Euclidean distance ( sqrt(dx*dx+dy*dy+dz*dz) ).
(It's probably not original, but .....)
1) find these 3 values: abs(dx), abs(dy), abs(dz).
2) Sort them (3 compares, 0-3 swaps)
3) Approx E.D. = max + (1/4)med + (1/4)min.
(error: +/- 13%)
max + (5/16)med + (1/4)min has 9% error.
max + (11/32)med + (1/4)min has 8% error.
As you can see, only shifts & adds are used, and it can be done with integer
arithmetic. It could be used in ray tracing as a preliminary test before using
the exact form.
RTNEWS7C
Euclidean distance calculation: Iterative approaches work great on this
problem. A short summary (and code for one) is in Tom Duff's SIGGRAPH '84
Tutorial on Numerical Analysis (a terrific article, by the way, as is his
spline piece in the same place) most of which he got from Moler and Morrison's
"Replacing Square Roots by Pythagorean Sums" IBM Journal of Research and
Development, 27/6, Nov. 1983. The method he describes has cubic convergence,
so it works well to unroll the loop and do a set number of iterations. 4
iterations (2/, 4*, 2+) yield 62 digits of precision.
USENET
The referenced article asks about computing Euclidean distance in
fixed-point -- the problem being that you don't want to compute
a*a+b*b because overflow is very likely. Here is a description
of an algorithm that avoids harmful overflow and underflow, and
that can probably be adapted for a fixed-point application without
too much trouble. Note that I wrote the following text 10 years
ago (except that I rewrote the code in ANSI C), and its discussion
of machine speeds is probably out of date.
[Moler-Morrison 83] describes an algorithm for computing sqrt(a*a+b*b)
which is fast, robust and portable. The naive approach to this problem (i.e.
just writing sqrt(a*a+b*b)) has the problem that for many cases when the result
is a representable floating point number, the intermediate results may overflow
or underflow. This seriously restricts the usable range of a and b.
Moler and Morrison's method never causes harmful overflow or underflow when the
result is in range. The algorithm has cubic convergence, and depending on
implementation details may even be faster than sqrt(a*a+b*b).
(Warning: before using this or any other numerical algorithm, be sure
to measure its performance. On my VAX 750,
this algorithm runs 25% faster than the standard sqrt routine, but 18% slower
than a highly optimized sqrt designed by W. Kahan of UC Berkeley. However, the
Moler and Morrison algorithm's domain is unrestricted, which may still make it
superior for some applications.)
Here is a C function that implements their algorithm:
double hypot(double p, q){
double r, s;
if(p<0) p = -p;
if(q<0) q = -q;
if(p
USENET
I use 64 bit intermediate results. I have a 32->64 bit square & accumulate
routine and a 64 bit square root. Of course I try to avoid these accurate
versions whenever I can. Graphics Gems I has a nice approximation of the
hypotenuse that can be used to speed up tests of the "distance < radius"
type, where you would normally use a square root to calculate the distance.
/*
A Fast Approximation to the Hypotenuse
by Alan Paeth
from "Graphics Gems", Academic Press, 1990
*/
int idist(x1, y1, x2, y2)
int x1, y1, x2, y2;
{
/*
* gives approximate distance from (x1,y1) to (x2,y2)
* with only overestimations, and then never by more
* than (9/8) + one bit uncertainty.
*/
if ((x2 -= x1) < 0) x2 = -x2;
if ((y2 -= y1) < 0) y2 = -y2;
return (x2 + y2 - (((x2>y2) ? y2 : x2) >> 1) );
}
Van meg vmi amit erdemes gyorsitani? Gyokvonas tablaval meg hasonlokra
gondolok. Ja a gyoktablara van vkinek vmi jol mukodo otlete? En arra
gondoltam hogy nagysagrendenkent erdemes csinalni egy-egy tablat...
Szoval nem linearis az egesz csak szakaszonkent (0..1..10..100..1000..)
Hozzaszolas?
Greg
nekem reg volt egy (nem is olyan jo) otletem:
(csak egeszekre jo es csak 1FF negyzeteig)
szoval,legeneralunk egy tablat 1FF negyzeteig,de csak az elso bytejat taroljuk
na most 100h negyzete 10000h
ezaltal ami nagyobb 10000h nal annak a gyokenek az MSB-je 1...
+:igy eleg egyetlen byteba letarolni egy gyokerteket
-:csak x-ig jo... x=1ff*1ff
na? teccik? he?:)
Lorenzo/DarkSiDe
Biztos mindenki latta a Pan sk-iflic.zip RayTrace introjat.
O adta az otletet, hogy ugyanugy, mint ahogy a exponencialis
fuggvenyt mar lecsereltem szorzasokra, talan a gyokvonast
is le lehetne cserelni egy gyorsabb rutinra...
Azaz a kerdesem, vagyis a feleadat a kovetkezo:
??? Lehet-e irni gyorsabb gyokvono rutint, mint az FSQRT
a szimpla pontossag megtartasa mellett ???
(azaz 6-7 ertekes jegy maradna a gyokben).
Gondolok itt a Newton iteraciora es
a kulonfele gyorsitasaira (Aitiken stb.)
Tehat a vegcel 70 ciklusnal gyorsabb FSQRT rutint irni
egyszeru pontossag mellett!
Udv, TomCat
Here is the code of approximated sqrt (btw, fsqrt(0) = 5.2e-20, not 0.5 as i
said before... hat error was in the previous version, not in this).
Note: Written for Watcom C++ 11.0
inline float fsqrt(float x) {
float y;
_asm {
mov ecx, x
and ecx, 0x7f800000
fld x
mov eax, ecx
add ecx, 0x3f800000
or ecx, 0x00800000
shr ecx, 1
mov y, eax
and ecx, 0x3f800000
fld y
fadd
fstp y
and [y], 0x007fffff
or [y], ecx
}
return y;
}
Maximum error is reached in x=2^n, with n integer and odd.
Correct results are in x=2^n, with n even.
Send me your comments ;)
*PaN!*
Ez sajna hibas.
Talan az or ecx, 0x00800000 helyett or eax, 0x00800000 kellene!
De megprobaltam en is megirni a sajat FSQRT rutinjaim.
Itt sorakoznak:
- ASM FSQRT rutinok (5 db)
- C teszt program a gyokvonasok pontossagara
- fordito batch fajl WATCOM C-hez
.586
LOCALS
_TEXT SEGMENT BYTE PUBLIC USE32 'CODE'
ASSUME CS:_TEXT, DS:DGROUP
PUBLIC I87_FSQRT
I87_FSQRT:
FSQRT
RETN
PUBLIC TC1_FSQRT
TC1_FSQRT:
FSTP SZAM
MOV EAX,SZAM
ADD EAX,3F800000H
SHR EAX,1
MOV SZAM,EAX
FLD SZAM
RETN
PUBLIC TC2_FSQRT
TC2_FSQRT:
FST SZAM
MOV EAX,SZAM
ADD EAX,3F800000H
SHR EAX,1
MOV SZAM,EAX
FDIV SZAM
FSUBR SZAM
FMUL HALF
FSUBR SZAM
RETN
PUBLIC PAN_FSQRT
PAN_FSQRT:
FST SZAM
MOV ECX,SZAM
AND ECX,7F800000H
MOV EAX,ECX
ADD ECX,3F800000H
OR EAX,00800000H
SHR ECX,1
MOV SZAM,EAX
AND ECX,3F800000H
FADD SZAM
FSTP SZAM
AND SZAM,007FFFFFH
OR SZAM,ECX
FLD SZAM
RETN
PUBLIC TC3_INIT
TC3_INIT:
XOR EDI,EDI
MOV SZAM,3F800000H
MOV ECX,16384
@@1:
FLD SZAM
FSQRT
ADD SZAM,00000200H
FSTP GYOK
MOV EAX,GYOK
SHR EAX,7
MOV TC3_TAB[EDI],AX
ADD EDI,2
DEC ECX
JNZ @@1
MOV SZAM,3F800000H
MOV ECX,16384
@@2:
FLD SZAM
FADD ST,ST
FSQRT
ADD SZAM,00000200H
FSTP GYOK
MOV EAX,GYOK
SHR EAX,7
MOV TC3_TAB[EDI],AX
ADD EDI,2
DEC ECX
JNZ @@2
RETN
PUBLIC TC3_FSQRT
TC3_FSQRT:
FSTP SZAM
MOV EAX,SZAM
ADD EAX,3F800000H
MOV ECX,0FFFEH
SHR EAX,8
AND ECX,EAX
MOV AX,TC3_TAB[ECX]
SHL EAX,7
MOV SZAM,EAX
FLD SZAM
RETN
_TEXT ENDS
DGROUP GROUP _DATA,_BSS
_DATA SEGMENT DWORD PUBLIC USE32 'DATA'
HALF DD 0.5
_DATA ENDS
_BSS SEGMENT DWORD PUBLIC UNINIT USE32 'BSS'
SZAM DD ?
GYOK DD ?
TC3_TAB DW 65536 DUP(?)
_BSS ENDS
END
#include
#include
#define rnd() (((float) rand())/RAND_MAX)
extern float I87_FSQRT(float);
#pragma aux I87_FSQRT "*" parm [8087] value [8087] modify exact
extern float TC1_FSQRT(float);
#pragma aux TC1_FSQRT "*" parm [8087] value [8087] modify exact [EAX]
extern float TC2_FSQRT(float);
#pragma aux TC2_FSQRT "*" parm [8087] value [8087] modify exact [EAX]
extern float PAN_FSQRT(float);
#pragma aux PAN_FSQRT "*" parm [8087] value [8087] modify exact [EAX ECX]
extern void TC3_INIT(void);
#pragma aux TC3_INIT "*" modify exact [EAX ECX EDI]
extern float TC3_FSQRT(float);
#pragma aux TC3_FSQRT "*" parm [8087] value [8087] modify exact [EAX ECX]
void main (void) {
void CLRSCR(void);
#pragma aux CLRSCR modify exact [EAX EDI ECX] =\
" MOV EAX,0700070007000700H"\
" MOV ECX,1000"\
" MOV EDI,0B8000H"\
" REP STOSD"
void CURPOS(char,char);
#pragma aux CURPOS parm [DH] [DL] modify exact [AX BX DX]=\
" MOV AH,2"\
" MOV BH,0"\
" INT 10H"
float num,ratio,avg;
int count;
float root1,rmax1,amin1,amax1,sum1,wavg1,vavg1,oavg1,pavg1,cmin1,cmax1;
float root2,rmax2,amin2,amax2,sum2,wavg2,vavg2,oavg2,pavg2,cmin2,cmax2;
float root3,rmax3,amin3,amax3,sum3,wavg3,vavg3,oavg3,pavg3,cmin3,cmax3;
float root4,rmax4,amin4,amax4,sum4,wavg4,vavg4,oavg4,pavg4,cmin4,cmax4;
TC3_INIT();
CLRSCR();
CURPOS(3,0);
printf("Root Ticks Ratio MaxRatio AvgRatio LocalMinAvg LocalMaxAvg\n");
count=0;
rmax1=0.0;amin1=1.0;amax1=0.0;cmin1=1.0;cmax1=0.0;sum1=0.0;
rmax2=0.0;amin2=1.0;amax2=0.0;cmin2=1.0;cmax2=0.0;sum2=0.0;
rmax3=0.0;amin3=1.0;amax3=0.0;cmin3=1.0;cmax3=0.0;sum3=0.0;
rmax4=0.0;amin4=1.0;amax4=0.0;cmin4=1.0;cmax4=0.0;sum4=0.0;
while ( ++count!=0 && !kbhit() )
{
do num=rnd() + count;
while (num<=0.0);
CURPOS(1,0);
printf("The Number: %.9f\n",num);
root1=I87_FSQRT(num);
ratio = root1/root1 - 1;
if (ratio < 0.0) ratio = ratio * -1.0;
sum1=sum1+ratio;
if (ratio>rmax1) rmax1=ratio;
avg = sum1/count;
if ((count & 511) == 0)
{
if (wavg1vavg1 && vavg1>oavg1 && oavg1>pavg1 && pavg1>avg)
{
amax1=cmax1;
cmin1=1.0;
}
wavg1=vavg1;vavg1=oavg1;oavg1=pavg1;pavg1=avg;
}
if (avgcmax1) cmax1=avg;
if (avgamax1) amax1=avg;
CURPOS(4,0);
printf("%.9f\n",root1);
CURPOS(4,17);
printf("70\n");
CURPOS(4,20);
printf("%.9f\n",ratio);
CURPOS(4,32);
printf("%.9f\n",rmax1);
CURPOS(4,44);
printf("%.9f\n",avg);
CURPOS(4,56);
printf("%.9f\n",amin1);
CURPOS(4,68);
printf("%.9f\n",amax1);
root2=TC1_FSQRT(num);
ratio = root2/root1 - 1;
if (ratio < 0.0) ratio = ratio * -1.0;
sum2=sum2+ratio;
if (ratio>rmax2) rmax2=ratio;
avg = sum2/count;
if ((count & 511) == 0)
{
if (wavg2vavg2 && vavg2>oavg2 && oavg2>pavg2 && pavg2>avg)
{
amax2=cmax2;
cmin2=1.0;
}
wavg2=vavg2;vavg2=oavg2;oavg2=pavg2;pavg2=avg;
}
if (avgcmax2) cmax2=avg;
if (avgamax2) amax2=avg;
CURPOS(5,0);
printf("%.9f\n",root2);
CURPOS(5,17);
printf(" 7\n");
CURPOS(5,20);
printf("%.9f\n",ratio);
CURPOS(5,32);
printf("%.9f\n",rmax2);
CURPOS(5,44);
printf("%.9f\n",avg);
CURPOS(5,56);
printf("%.9f\n",amin2);
CURPOS(5,68);
printf("%.9f\n",amax2);
root3=TC2_FSQRT(num);
ratio = root3/root1 - 1;
if (ratio < 0.0) ratio = ratio * -1.0;
sum3=sum3+ratio;
if (ratio>rmax3) rmax3=ratio;
avg = sum3/count;
if ((count & 511) == 0)
{
if (wavg3vavg3 && vavg3>oavg3 && oavg3>pavg3 && pavg3>avg)
{
amax3=cmax3;
cmin3=1.0;
}
wavg3=vavg3;vavg3=oavg3;oavg3=pavg3;pavg3=avg;
}
if (avgcmax3) cmax3=avg;
if (avgamax3) amax3=avg;
CURPOS(6,0);
printf("%.9f\n",root3);
CURPOS(6,17);
printf("54\n");
CURPOS(6,20);
printf("%.9f\n",ratio);
CURPOS(6,32);
printf("%.9f\n",rmax3);
CURPOS(6,44);
printf("%.9f\n",avg);
CURPOS(6,56);
printf("%.9f\n",amin3);
CURPOS(6,68);
printf("%.9f\n",amax3);
root4=TC3_FSQRT(num);
ratio = root4/root1 - 1;
if (ratio < 0.0) ratio = ratio * -1.0;
sum4=sum4+ratio;
if (ratio>rmax4) rmax4=ratio;
avg = sum4/count;
if ((count & 511) == 0)
{
if (wavg4vavg4 && vavg4>oavg4 && oavg4>pavg4 && pavg4>avg)
{
amax4=cmax4;
cmin4=1.0;
}
wavg4=vavg4;vavg4=oavg4;oavg4=pavg4;pavg4=avg;
}
if (avgcmax4) cmax4=avg;
if (avgamax4) amax4=avg;
CURPOS(7,0);
printf("%.9f\n",root4);
CURPOS(7,17);
printf("30\n");
CURPOS(7,20);
printf("%.9f\n",ratio);
CURPOS(7,32);
printf("%.9f\n",rmax4);
CURPOS(7,44);
printf("%.9f\n",avg);
CURPOS(7,56);
printf("%.9f\n",amin4);
CURPOS(7,68);
printf("%.9f\n",amax4);
}
getch();
}
@echo off
cls
wcc386 C
tasm B /m9 /mx /z %1 %2 %3;
if errorlevel 1 goto h
wlink system pmodew file C, B
DEL ?.OBJ
:PMWLITE /C4 C.EXE
C
goto :z
:h
pause >nul:
:z