SC5 - CO:Sqrt

3d: Amit IronHawk irt poligonsorrendezesre, az valoban gyorsan
   megvalosithato, csak perspektivikus lekepezes eseten nem biztos, hogy
   mindig jo eredmenyt fog adni. Konvex testekkel nincs is gond, csak
   amikor pl. tobb testet akarok sorbarendezni lathatosag alapjan, akkoor
    nem ad jo eredmenyt a Z alapjan torteno rendezes, mivel az igazan
   tokeletes eredmeny a nezoponttol valo tavolsag alapjan valo rendezesbol
   jonne ki, csak ahhoz kellene 3 szorzas is... 
        En az alabbival probalkoztam, igen nagy %-bn hellyes eredmenyt ad:
        s = abs(x) + abs(y) + z*c
        Ahol x,y,z a pont koordinatai (az origo a kepernyo kozeppontja)
    es c pedig egy szorzo tenyezo (alt. c=2, c=4 jo). Ezutan mar lehey
    s alapjan rendezni... Tudna valaki mondani eg jobbat ?

        Thanx. TeeM


 Valaki egy gyokvono algoritmus utan erdeklodott, ezt nem hagyhattam szo 
nelkul:

 Az a[n+1]=(N/a[n]+a[n])/2 sorozat a negyzetgyok N-hez konvergal, ha 
megfelelo a[0] tagot valasztunk akkor a a[3],a[4] esetleg a[5] mar eleg 
kozel lesz hozza, ezt kihasznalva ime a rutin:
 EAX <- N

                bsr     ecx,eax
                mov     ebx,eax
                shr     ecx,1
                mov     edi,eax
                shr     ebx,cl    ;ebx <- a[0]=N fele (nem N/2)
                xor     edx,edx
                div     ebx
                add     eax,ebx
                shr     eax,1     ;eax <- a[1]
                cmp     eax,ebx
                je      ready
l1:             mov     ebx,eax   ;ebx <- a[n]
                xor     edx,edx
                mov     eax,edi
                div     ebx
                add     eax,ebx
                shr     eax,1     ;eax <- a[n+1]
                cmp     eax,ebx
                jb      l1
                mov     eax,ebx
ready:          ...

 sqrt(N) -> EAX

 A[0]-nak N binaris alakjanak felso reszet valasztom, ha N x bites akkor 
A[0] x/2 bites lesz, igy a[0] nagysagrendje megegyezik a sqrt(N) 
nagysagrendjevel...
 A rutin a gyokhoz legkozelebbi kisebb vagy egyenlo egeszet adja vissza...
 Ha esetleg ez nem eleg pontos akkor minimalis valtoztatassal 
kiegeszitheto, hogy pontosabb legyen...
 Nem tudom, hogy van-e gyorsabb algoritmus, szerintem az a[0] tagot kell 
esetleg valahogy trukkosebben megvalasztani, igy atlag 3.72 osztas kell a 
gyok megkeresesehez...
 
 Remelem nem irtam valamit hibasan...
                                                Deity


>  Ha egy byte-on elfer a gyok akkor pedig csak 16 bites szamrol lehet szo, 
> arra tenyleg egyszerubb egy tablazatott csinalni, a negyzetszamokkal es 
> abban egy binaris keresesel kivalasztani a megfelelo szamot,ez elege 
> egyszeru ezt nem is reszletezem...

Ime egy tablazat nelkuli megoldas:
(BX - a szam, AX - a szam gyoke)

        mov ax,-1
@@1:
        add ax,2        ;vagy   inc ax
                        ;       inc ax  ha rovidebb kodot szeretnel
        sub bx,ax
        jnc @@1
        shr ax,1

Igy a gyok egesz reszet kapod, de a kerekites is csak 2 plusz utasitas:

        add bx,ax
        adc ax,0

1024-ig tuti gyorsabb barmely osztasos rutinnal, de lehet, hogy meg
nagyobb szamokra is jobb sebessegu...

Udv, TomCat/Abaddon


  A gyokvono rutin-rol jutott eszmbe:
  A gyokvonast eleg surun arra szoktak hasznalni, hogy egy-egy vektor 
hosszat szamoljak ki vele a pitagorasz tetellel... ami ugye 2D-ben 
sqrt(x^2+Y^2); 3D-ben sqrt(x^2+y^2+z^2)...
  Ez helyet ajanlok egy masik modszert: 
 -2d-ben
     a=0.328426
     x=abs(x);y=abs(y);
     ha x>y akkor hossz=x+a*y
     kulonben     hossz=y+a*x
     a keplet eleg jol kozelitti a gyokos megoldast:
     max hiba 6%, atlagos hiba: 3.3%
 -3d-ben
     a=0.301272
     b=0.286257
     x=abs(x);y=abs(y);z=abs(z);
     ugy cserelgetjuk a koordinatakat, hogy x>=y>=z legyen
     hossz=x+a*y+b*z
     
     max hiba 8.3%, atlagos hiba: 3.7%

 Ha a koordinatak, nem valtoznak csak bizonyos korlatok kozott akkor a 
szorzasok elvegezhetok egy-egy elore elkeszitett tablazat segitsegevel...
 A konstansok meghatarozasa az 'nagyobb matematikat aparatust' igenyel 
azt most nem reszleteznem. Sajnos a 3d-s esetben a derive nem volt 
hajlando kiintegralni nehany osszefuggest, igy kenytelen voltam kozelito 
algoritmussal meghatarozni a ket konstans, so lehet, hogy alkalmasabb 
konstansok is talalhatok...

 bye
                                        Deity


Persze masodfokban jobb eredmenyt lehet elerni, nem? en ezt a 3-8 %-ot 
eleg soknak tartom, persze nyilvan ez alkalmazastol fugg. Te mire 
hasznalod ezt nagy sikerrel (ugy ertem, hol eleg ez a pontossag?)

Perla


RTNEWS7B

I've come up with a fast approximation to
3D Euclidean distance ( sqrt(dx*dx+dy*dy+dz*dz) ).
  (It's probably not original, but .....)

1) find these 3 values: abs(dx), abs(dy), abs(dz).

2) Sort them (3 compares, 0-3 swaps)

3) Approx E.D. = max + (1/4)med + (1/4)min.
                        (error: +/- 13%)

        max +  (5/16)med + (1/4)min has  9% error.
        max + (11/32)med + (1/4)min has  8% error.

As you can see, only shifts & adds are used, and it can be done with integer
arithmetic.  It could be used in ray tracing as a preliminary test before using
the exact form.


RTNEWS7C

    Euclidean distance calculation:  Iterative approaches work great on this
problem.  A short summary (and code for one) is in Tom Duff's SIGGRAPH '84
Tutorial on Numerical Analysis (a terrific article, by the way, as is his
spline piece in the same place) most of which he got from Moler and Morrison's
"Replacing Square Roots by Pythagorean Sums" IBM Journal of Research and
Development, 27/6, Nov. 1983.  The method he describes has cubic convergence,
so it works well to unroll the loop and do a set number of iterations.  4
iterations (2/, 4*, 2+) yield 62 digits of precision.


USENET

The referenced article asks about computing Euclidean distance in
fixed-point -- the problem being that you don't want to compute
a*a+b*b because overflow is very likely.  Here is a description
of an algorithm that avoids harmful overflow and underflow, and
that can probably be adapted for a fixed-point application without
too much trouble.  Note that I wrote the following text 10 years
ago (except that I rewrote the code in ANSI C), and its discussion
of machine speeds is probably out of date.


[Moler-Morrison 83] describes an algorithm for computing sqrt(a*a+b*b)
which is fast, robust and portable.  The naive approach to this problem (i.e.
just writing sqrt(a*a+b*b)) has the problem that for many cases when the result
is a representable floating point number, the intermediate results may overflow
or underflow.  This seriously restricts the usable range of a and b.
Moler and Morrison's method never causes harmful overflow or underflow when the
result is in range.  The algorithm has cubic convergence, and depending on
implementation details may even be faster than sqrt(a*a+b*b).
(Warning: before using this or any other numerical algorithm, be sure
to measure its performance.  On my VAX 750,
this algorithm runs 25% faster than the standard sqrt routine, but 18% slower
than a highly optimized sqrt designed by W. Kahan of UC Berkeley.  However, the
Moler and Morrison algorithm's domain is unrestricted, which may still make it
superior for some applications.)

Here is a C function that implements their algorithm:

double hypot(double p, q){
        double r, s;
        if(p<0) p = -p;
        if(q<0) q = -q;
        if(p
USENET

I use 64 bit intermediate results. I have a 32->64 bit square & accumulate
routine and a 64 bit square root. Of course I try to avoid these accurate
versions whenever I can. Graphics Gems I has a nice approximation of the
hypotenuse that can be used to speed up tests of the "distance < radius"
type, where you would normally use a square root to calculate the distance.

/*
A Fast Approximation to the Hypotenuse
by Alan Paeth
from "Graphics Gems", Academic Press, 1990
*/

int idist(x1, y1, x2, y2)
        int x1, y1, x2, y2;
        {
/*
 * gives approximate distance from (x1,y1) to (x2,y2)
 * with only overestimations, and then never by more
 * than (9/8) + one bit uncertainty.
 */
        if ((x2 -= x1) < 0) x2 = -x2;
        if ((y2 -= y1) < 0) y2 = -y2;
        return (x2 + y2 - (((x2>y2) ? y2 : x2) >> 1) );
        }


Van meg vmi amit erdemes gyorsitani? Gyokvonas tablaval meg hasonlokra
gondolok. Ja a gyoktablara van vkinek vmi jol mukodo otlete? En arra
gondoltam hogy nagysagrendenkent erdemes csinalni egy-egy tablat...
Szoval nem linearis az egesz csak szakaszonkent (0..1..10..100..1000..)
Hozzaszolas?
                                                        Greg


nekem reg volt egy (nem is olyan jo) otletem:
(csak egeszekre jo es csak 1FF negyzeteig)
 szoval,legeneralunk egy tablat 1FF negyzeteig,de csak az elso bytejat taroljuk
 na most 100h negyzete 10000h
  ezaltal ami nagyobb 10000h nal annak a gyokenek az MSB-je 1...
  +:igy eleg egyetlen byteba letarolni egy gyokerteket
  -:csak x-ig jo...   x=1ff*1ff

na? teccik? he?:)
                                        Lorenzo/DarkSiDe


Biztos mindenki latta a Pan sk-iflic.zip RayTrace introjat.

O adta az otletet, hogy ugyanugy, mint ahogy a exponencialis
fuggvenyt mar lecsereltem szorzasokra, talan a gyokvonast
is le lehetne cserelni egy gyorsabb rutinra...

Azaz a kerdesem, vagyis a feleadat a kovetkezo:

??? Lehet-e irni gyorsabb gyokvono rutint, mint az FSQRT
a szimpla pontossag megtartasa mellett ???

(azaz 6-7 ertekes jegy maradna a gyokben).

Gondolok itt a Newton iteraciora es
a kulonfele gyorsitasaira (Aitiken stb.)

Tehat a vegcel 70 ciklusnal gyorsabb FSQRT rutint irni
egyszeru pontossag mellett!

Udv, TomCat


Here is the code of approximated sqrt (btw, fsqrt(0) = 5.2e-20, not 0.5 as i
said before...  hat error was in the previous version, not in this).

Note: Written for Watcom C++ 11.0

inline float fsqrt(float x) {
  float y;
  _asm {
    mov ecx, x
    and ecx, 0x7f800000
    fld x
    mov eax, ecx
    add ecx, 0x3f800000
    or  ecx, 0x00800000
    shr ecx, 1
    mov y, eax
    and ecx, 0x3f800000
    fld y
    fadd
    fstp y
    and [y], 0x007fffff
    or  [y], ecx
  }
  return y;
}

Maximum error is reached in x=2^n, with n integer and odd.
Correct results are in x=2^n, with n even.

Send me your comments ;)
                                                        *PaN!*


Ez sajna hibas.
Talan az  or  ecx, 0x00800000 helyett  or  eax, 0x00800000  kellene!

De megprobaltam en is megirni a sajat FSQRT rutinjaim.
Itt sorakoznak:
- ASM FSQRT rutinok (5 db)
- C teszt program a gyokvonasok pontossagara
- fordito batch fajl WATCOM C-hez


.586
LOCALS

_TEXT   SEGMENT BYTE PUBLIC USE32 'CODE'
    ASSUME  CS:_TEXT, DS:DGROUP

PUBLIC I87_FSQRT
I87_FSQRT:
                FSQRT
                RETN

PUBLIC TC1_FSQRT
TC1_FSQRT:
                FSTP    SZAM
                MOV     EAX,SZAM
                ADD     EAX,3F800000H
                SHR     EAX,1
                MOV     SZAM,EAX
                FLD     SZAM
                RETN

PUBLIC TC2_FSQRT
TC2_FSQRT:
                FST     SZAM
                MOV     EAX,SZAM
                ADD     EAX,3F800000H
                SHR     EAX,1
                MOV     SZAM,EAX
                FDIV    SZAM
                FSUBR   SZAM
                FMUL    HALF
                FSUBR   SZAM
                RETN

PUBLIC PAN_FSQRT
PAN_FSQRT:
                FST     SZAM
                MOV     ECX,SZAM
                AND     ECX,7F800000H
                MOV     EAX,ECX
                ADD     ECX,3F800000H
                OR      EAX,00800000H
                SHR     ECX,1
                MOV     SZAM,EAX
                AND     ECX,3F800000H
                FADD    SZAM
                FSTP    SZAM
                AND     SZAM,007FFFFFH
                OR      SZAM,ECX
                FLD     SZAM
                RETN

PUBLIC TC3_INIT
TC3_INIT:
                XOR     EDI,EDI
                MOV     SZAM,3F800000H
                MOV     ECX,16384
@@1:
                FLD     SZAM
                FSQRT
                ADD     SZAM,00000200H
                FSTP    GYOK
                MOV     EAX,GYOK
                SHR     EAX,7
                MOV     TC3_TAB[EDI],AX
                ADD     EDI,2
                DEC     ECX
                JNZ     @@1
                MOV     SZAM,3F800000H
                MOV     ECX,16384
@@2:
                FLD     SZAM
                FADD    ST,ST
                FSQRT
                ADD     SZAM,00000200H
                FSTP    GYOK
                MOV     EAX,GYOK
                SHR     EAX,7
                MOV     TC3_TAB[EDI],AX
                ADD     EDI,2
                DEC     ECX
                JNZ     @@2
                RETN

PUBLIC TC3_FSQRT
TC3_FSQRT:
                FSTP    SZAM
                MOV     EAX,SZAM
                ADD     EAX,3F800000H
                MOV     ECX,0FFFEH
                SHR     EAX,8
                AND     ECX,EAX
                MOV     AX,TC3_TAB[ECX]
                SHL     EAX,7
                MOV     SZAM,EAX
                FLD     SZAM
                RETN

_TEXT   ENDS

DGROUP GROUP _DATA,_BSS

_DATA   SEGMENT DWORD PUBLIC USE32 'DATA'
HALF    DD      0.5
_DATA   ENDS

_BSS    SEGMENT DWORD PUBLIC UNINIT USE32 'BSS'
SZAM    DD      ?
GYOK    DD      ?
TC3_TAB DW      65536 DUP(?)
_BSS    ENDS

END


#include 
#include 

#define rnd()   (((float) rand())/RAND_MAX)

extern float I87_FSQRT(float);
#pragma aux I87_FSQRT "*" parm [8087] value [8087] modify exact
extern float TC1_FSQRT(float);
#pragma aux TC1_FSQRT "*" parm [8087] value [8087] modify exact [EAX]
extern float TC2_FSQRT(float);
#pragma aux TC2_FSQRT "*" parm [8087] value [8087] modify exact [EAX]
extern float PAN_FSQRT(float);
#pragma aux PAN_FSQRT "*" parm [8087] value [8087] modify exact [EAX ECX]
extern void TC3_INIT(void);
#pragma aux TC3_INIT "*" modify exact [EAX ECX EDI]
extern float TC3_FSQRT(float);
#pragma aux TC3_FSQRT "*" parm [8087] value [8087] modify exact [EAX ECX]

void main (void) {

void CLRSCR(void);
#pragma aux CLRSCR modify exact [EAX EDI ECX] =\
"        MOV     EAX,0700070007000700H"\
"        MOV     ECX,1000"\
"        MOV     EDI,0B8000H"\
"        REP     STOSD"

void CURPOS(char,char);
#pragma aux CURPOS parm [DH] [DL] modify exact [AX BX DX]=\
"        MOV     AH,2"\
"        MOV     BH,0"\
"        INT     10H"

float num,ratio,avg;
int count;
float root1,rmax1,amin1,amax1,sum1,wavg1,vavg1,oavg1,pavg1,cmin1,cmax1;
float root2,rmax2,amin2,amax2,sum2,wavg2,vavg2,oavg2,pavg2,cmin2,cmax2;
float root3,rmax3,amin3,amax3,sum3,wavg3,vavg3,oavg3,pavg3,cmin3,cmax3;
float root4,rmax4,amin4,amax4,sum4,wavg4,vavg4,oavg4,pavg4,cmin4,cmax4;

TC3_INIT();

CLRSCR();
CURPOS(3,0);
printf("Root          Ticks Ratio       MaxRatio    AvgRatio    LocalMinAvg LocalMaxAvg\n");

count=0;
rmax1=0.0;amin1=1.0;amax1=0.0;cmin1=1.0;cmax1=0.0;sum1=0.0;
rmax2=0.0;amin2=1.0;amax2=0.0;cmin2=1.0;cmax2=0.0;sum2=0.0;
rmax3=0.0;amin3=1.0;amax3=0.0;cmin3=1.0;cmax3=0.0;sum3=0.0;
rmax4=0.0;amin4=1.0;amax4=0.0;cmin4=1.0;cmax4=0.0;sum4=0.0;

while ( ++count!=0 && !kbhit() )
  {

  do num=rnd() + count;
  while (num<=0.0);

  CURPOS(1,0);
  printf("The Number: %.9f\n",num);

  root1=I87_FSQRT(num);

  ratio = root1/root1 - 1;
  if (ratio < 0.0) ratio = ratio * -1.0;
  sum1=sum1+ratio;
  if (ratio>rmax1) rmax1=ratio;
  avg = sum1/count;
  if ((count & 511) == 0)
  {
    if (wavg1vavg1 && vavg1>oavg1 && oavg1>pavg1 && pavg1>avg)
    {
      amax1=cmax1;
      cmin1=1.0;
    }
    wavg1=vavg1;vavg1=oavg1;oavg1=pavg1;pavg1=avg;
  }
  if (avgcmax1) cmax1=avg;
  if (avgamax1) amax1=avg;

  CURPOS(4,0);
  printf("%.9f\n",root1);
  CURPOS(4,17);
  printf("70\n");
  CURPOS(4,20);
  printf("%.9f\n",ratio);
  CURPOS(4,32);
  printf("%.9f\n",rmax1);
  CURPOS(4,44);
  printf("%.9f\n",avg);
  CURPOS(4,56);
  printf("%.9f\n",amin1);
  CURPOS(4,68);
  printf("%.9f\n",amax1);

  root2=TC1_FSQRT(num);

  ratio = root2/root1 - 1;
  if (ratio < 0.0) ratio = ratio * -1.0;
  sum2=sum2+ratio;
  if (ratio>rmax2) rmax2=ratio;
  avg = sum2/count;
  if ((count & 511) == 0)
  {
    if (wavg2vavg2 && vavg2>oavg2 && oavg2>pavg2 && pavg2>avg)
    {
      amax2=cmax2;
      cmin2=1.0;
    }
    wavg2=vavg2;vavg2=oavg2;oavg2=pavg2;pavg2=avg;
  }
  if (avgcmax2) cmax2=avg;
  if (avgamax2) amax2=avg;

  CURPOS(5,0);
  printf("%.9f\n",root2);
  CURPOS(5,17);
  printf(" 7\n");
  CURPOS(5,20);
  printf("%.9f\n",ratio);
  CURPOS(5,32);
  printf("%.9f\n",rmax2);
  CURPOS(5,44);
  printf("%.9f\n",avg);
  CURPOS(5,56);
  printf("%.9f\n",amin2);
  CURPOS(5,68);
  printf("%.9f\n",amax2);

  root3=TC2_FSQRT(num);

  ratio = root3/root1 - 1;
  if (ratio < 0.0) ratio = ratio * -1.0;
  sum3=sum3+ratio;
  if (ratio>rmax3) rmax3=ratio;
  avg = sum3/count;
  if ((count & 511) == 0)
  {
    if (wavg3vavg3 && vavg3>oavg3 && oavg3>pavg3 && pavg3>avg)
    {
      amax3=cmax3;
      cmin3=1.0;
    }
    wavg3=vavg3;vavg3=oavg3;oavg3=pavg3;pavg3=avg;
  }
  if (avgcmax3) cmax3=avg;
  if (avgamax3) amax3=avg;

  CURPOS(6,0);
  printf("%.9f\n",root3);
  CURPOS(6,17);
  printf("54\n");
  CURPOS(6,20);
  printf("%.9f\n",ratio);
  CURPOS(6,32);
  printf("%.9f\n",rmax3);
  CURPOS(6,44);
  printf("%.9f\n",avg);
  CURPOS(6,56);
  printf("%.9f\n",amin3);
  CURPOS(6,68);
  printf("%.9f\n",amax3);

  root4=TC3_FSQRT(num);

  ratio = root4/root1 - 1;
  if (ratio < 0.0) ratio = ratio * -1.0;
  sum4=sum4+ratio;
  if (ratio>rmax4) rmax4=ratio;
  avg = sum4/count;
  if ((count & 511) == 0)
  {
    if (wavg4vavg4 && vavg4>oavg4 && oavg4>pavg4 && pavg4>avg)
    {
      amax4=cmax4;
      cmin4=1.0;
    }
    wavg4=vavg4;vavg4=oavg4;oavg4=pavg4;pavg4=avg;
  }
  if (avgcmax4) cmax4=avg;
  if (avgamax4) amax4=avg;

  CURPOS(7,0);
  printf("%.9f\n",root4);
  CURPOS(7,17);
  printf("30\n");
  CURPOS(7,20);
  printf("%.9f\n",ratio);
  CURPOS(7,32);
  printf("%.9f\n",rmax4);
  CURPOS(7,44);
  printf("%.9f\n",avg);
  CURPOS(7,56);
  printf("%.9f\n",amin4);
  CURPOS(7,68);
  printf("%.9f\n",amax4);

  }

getch();

}


@echo off
cls
wcc386 C
tasm B /m9 /mx /z %1 %2 %3;
if errorlevel 1 goto h
wlink system pmodew file C, B
DEL ?.OBJ
:PMWLITE /C4 C.EXE
C
goto :z
:h
pause >nul:
:z