fastscrl.txt

From: fobits@io.org (Frank Obits)
Newsgroups: rec.games.programmer
Subject: Fast Scrolling in Mode X
Date: 28 Dec 1994 06:52:59 -0500

This  article  describes  a  fast  and efficient method of scrolling a
tiled world, using 320x200 Mode X. It will scroll an infinite distance
vertically,  horizontally  and diagonally. On my system, a 386/40 with
an ancient 8-bit video card, it has enough time to bounce  nine  small
(16x10)  sprites  around  on the screen while scrolling at a steady 70
fps.

Probably I'll be flamed for posting such kindergarten stuff, but  when
I  took  an  interest  in game programming (not that long ago) I found
such information surprisingly difficult to find.

The method described by Diana Gruber in the  Action  Arcade  Adventure
Set scrolls in all directions and is easy to implement. The problem is
that whenever it reaches a tile boundary it  stops  to  copy  a  whole
screenful  of pixels back to the center of the page. On my system this
causes the normal 70 fps rate to miss a beat, producing a slight  jerk
in  the  scrolling. Slowing down the framerate would cure this, but it
still struck me as inefficient to move all those pixels from hither to
yon.

I had high hopes for Dave Robert's PC Game Programming Explorer, which
concentrates on such low-level coding. Although  it  is  an  excellent
book in most respects, the scrolling method used is rather limited. It
moves beautifully in the vertical direction and can be easily modified
to scroll horizontally, but it can't go in both directions in the same
game - not unless you are willing to give up page-flipping.

This method is derived  from  some  hints  in  the  PCGPE  which  were
confirmed  by  a post here from Henric Steen. In an exchange of e-mail
he gave me a few more pointers... Thanks, Henric.

It isn't perfect, though. The big problem  is  that  in  it's  present
version  it is  incompatible with a split screen. If the user wants to
see a status bar s/he will have to request one in a pop-up window,  as
in   the  Commander  Keen  series.  Henric  says  that  it's  easy  to
incorporate a split screen, but despite his help I must admit  that  I
haven't caught on yet... but enough blather, on with the show!


Not  everyone  uses  the same terminology, so I'll start by explaining
some names I use and a few functions in my Mode X library.

** Definitions **

Window - the "window" is the part of vram which is actually  displayed
on the screen. In this method it is always 320x200 pixels.

Page  - this is NOT the same as the window. The width of a page is set
by writing to the hardware, but the  starting  point  and  length  are
simply determined by variables in the program.

Visible Page - the page in which the window is currently located.

Active Page - the page on which we are currently drawing or erasing
sprites.

Background Page - holds a copy of the tiled background.

Page Flipping - making the active page visible and vice-versa.

** Functions **

void window_at( unsigned pg_off, int x, int y )

This function writes to the Line Start and HPP registers  to  set  the
visible window at x,y within the page which starts at pg_off.

void ltile_to_vram( char *tilearray, int tilenum,
                    unsigned pg_off, int x, int y )

Gets a tile from a linear array of tiles and writes it at position x,y
within the specified page.

void rect_vram_to_vram( unsigned src_off, unsigned dest_off
                        int x, int y, int hgt, int width )

Uses  write mode #1  to copy a rectangular area of pixels from  x,y on
the source page to the same position on the destination page.

** Initialization **

By default the length of a Mode X line is 80 addresses, or 320 pixels.
We want a page wide enough so that when the window is  centered  there
is a buffer on each side to hold one column of tiles, which is a total
width of 352 pixels or 88 addresses.

The height of a page will be 240 lines, which is 16 + 200 +  24.  When
the window is in the initial position this leave a buffer for one tile
row at the top. An even number of 16x16 tiles won't fit on a  200-line
screen.  After  filling  the screen with 12 rows we have 8 extra lines
hanging off the bottom. The 24-line buffer at the  bottom  allows  for
these 8 lines plus another complete row of tiles.

Thus each page takes 240 * 88 = 21120 addresses, and three of them use
63360 addresses. You can put them adjacent to each other, but I spaced
them approximately evenly in vram.

So now we need to create and initialize some global variables:

unsigned
   visible_off =   360,    /* initial positions of pages */
   active_off  = 22208,
   back_off    = 44052;

int window_x = 16,         /* position of window within page */
    window_y = 16;

int world_x = 0,           /* position of upper left tile in world */
    world_y = 0;

After  setting  Mode X and the page width, fill all three windows with
the initial background. The tiles will extend all the way  across  the
page  (22  tiles) and down 14 rows - one at the top and 12 1/2 for the
window.

  set_mode_x();
  set_page_width( 88 );
  set_pallette( pal );
  window_at( visible_off, window_x, window_y );

  for( i=0; i<14; i++ )
     put_tile_row( back_off, i, tiles );

  /* copy to other pages */
  rect_vram_to_vram( back_off, active_off,  0, 0, 352, 224 );
  rect_vram_to_vram( back_off, visible_off, 0, 0, 352, 224 );

When scrolling North, East or West the method is simple. First  go  as
far as possible by moving the window within the page. When you run out
of valid data move *all* the pages upward or downward  far  enough  to
accommodate a new row or column of tiles.  Grab the tiles, copy to the
other pages and reposition the window within the page.

Moving South is just slightly  different,  because  of  the  odd-sized
buffer. To make it easy to grab rows or columns of tiles we would like
to keep the top of the page aligned at a tile boundary. So after using
eight  lines  of the buffer ( ++window_y == 20 ) we grab a new row for
the bottom of the page, but don't reposition  the  pages  until  we've
gone down an even 16 lines ( window_y == 33 ).

Meanwhile,  we're  flipping  pages  and handling sprites. With a clean
copy of the tiled background on the background page,  sprites  can  be
erased with a fast block copy, without regard to tile boundaries. This
is more efficient than the "dirty tile" method, which requires copying
a whole tile if a sprite overlaps even a few pixels in the corner.

By  now  it  has  probably  occurred to you that we can't do this very
often before one of the pages goes right off the beginning or  end  of
the video segment.

If it's the background page, ignore it. The positions of the pages are
held in 16-bit  unsigned ints,  which  will  automaticly  wrap  around
between  the  ends  of  the segment. All of the routines that write or
copy pixels are presumably written in assembly. If you're  using  real
mode  the index registers will also wrap, so that address a000:ffff is
adjacent to a000:0000. In protected mode I believe there's an assembly
directive  to  allow  manipulation  of  the lower 16 bits of the index
registers, so you can have the same effect.

If the visible window becomes split, it's a more  serious  matter.  In
most  video  cards  the hardware which paints the screen also wraps at
the ends of the segment, but some SVGA cards don't do that.

The solution is simple. We are doing all of this in a loop that  looks
something like this:

LOOP:
  wait for retrace
  swap( visible_off, active_off )
  window_at( visible_off, window_x, window_y )
  erase sprites from active page
  check user input
  do scrolling
  if( active_off > 44416 )
      swap( active_off, back_off )
  draw sprites on active page
  goto LOOP

Note  that  the  scrolling  is done after the sprites have been erased
>from the active page. At this point the active  and  background  pages
are  pixel-for-pixel  identical,  so  if  the  scrolling has split the
active page, just swap it with the background page. If  the  scrolling
has  split the visible page it won't take effect until the *next* time
it becomes visible. Before that it will take it's turn as  the  active
page, and we'll catch it then.

** Some Improvements **

This  is both simple and efficient, but as I've described it so far it
still has one big problem. When it needs more tiles it first  consults
the  world map and copies them one-by-one to the background page. Then
it uses two block copies to transfer them to the other pages,  all  in
one  frame.  That's a lot to do in one frame - in fact it's almost all
the work the engine does. Out of curiosity I temporarily  removed  all
sprite  handling  and the wait-for-retrace function, so it did nothing
but scroll the screen as fast as it could. Under those conditions  the
Watcom profiler said that execution time was distributed as follows:

45.5%  ltile_to_vram()
49.9%  rect_vram_to_vram()
 4.6%  everything else

Each new row or column is copied twice after it is grabbed, so getting
them into vram takes about twice as long as a copy, and  between  them
they account for almost all of the work of the scrolling.

So  the  first  15  pixels  require almost no time, then all this gets
dumped into one frame... and that's not the worst. The worst  case  is
when  it's  scrolling  diagonally and the tiles are aligned so that it
needs *both* a new row and column at the same time. Let's tackle these
problems one by one.

Do  we  really need to update all the pages at once? No, we don't. The
copy to the visible page can be put off until the next frame, when  it
will be the active page. So instead of doing two copies we can just do
one and set a global variable to indicate that another is needed.  For
instance, when scrolling East:

   /* grab new row of tiles */
   put_tile_col( active_off, 21, tiles );

   /* copy to other pages */
  rect_vram_to_vram( active_off, back_off, 336, 0, 16, 240 );
  deferred_copy = COPY_RGT;

The  deferred  copy can be done right after the page flip, so the main
loop starts like this:

  wait for retrace
  swap( visible_off, active_off )
  window_at( visible_off, window_x, window_y )
  if( deferred_copy )  do copy
  erase sprites from active page

That's a big improvement, but if we take the percentages of  execution
time  as arbitrary units of time, the worst-case diagonal scroll still
looks like this:

            frame   frame+1
Horizontal:   75   |  25   |
Vertical:     75   |  25   |

In the vast majority of games the  user  would  never  notice  if  the
screen  moved  a bit horizontally before starting the diagonal scroll,
so by deferring the vertical scroll we could spread it out to  one  of
these:

Horizontal:   75  |  25  |  --  |           (better)
Vertical:     --  |  75  |  25  |

Horizontal:   75  |  25  |  --  |  --  |    (best)
Vertical:     --  |  --  |  75  |  25  |

To  implement  this  requires  some additions to the code. First, when
scrolling diagonally the horizontal scroll must always be done  first.
Then  in  the  vertical  scrolling  functions  we  need  to check if a
deferred copy is pending. If it is, do not import a new row.

void scroll_north( void )
  {
   if( --window_y < 0 )
     {
      /* check if we have deferred copy pending */
      if( deferred_copy )
        {
         ++window_y;               /* cancel move */
         return;                   /* wait until next time */
        }

   /* get new row, etc */
  }

That will do for a one-frame delay, and it can be put off for one more
by defining the flags for the copies with two flags in one int.

#define  COPY_TOP  0x11
#define  COPY_BOT  0x21
#define  COPY_LFT  0x31
#define  COPY_RGT  0x41
#define  NOT_YET   0x01

  /* in main loop */
  if( deferred_copy )
    {
     switch( deferred_copy )
       {
        case COPY_TOP :
          rect_vram_to_vram( back_off, active_off,  0, 0, 352, 16 );
          break;

        /* other cases */

        case NOT_YET  :
          deferred_copy = FALSE;
          break;
       }
     deferred_copy &= 0x01;
    }

So  the  first  time  through the loop it does the copy and clears all
except the low bit. The second time that is cleared too,  opening  the
door for the vertical scroll.

** The End (finally) **

So  there  it  is. I make no claim to be the first to use this method,
not by a long way. It's pretty simple and has probably  been  used  by
umpteen  programmers  for  years.  The  only  problem  is  that nobody
bothered to explain it in any kind of detail, or at least not where  I
could find it.

If  anybody  can  think of improvements - especially an elegant way to
add a split screen - I'll be more than happy to hear them. It would be
pushing my luck too far to post the full code for my little demo,  but
if anyone would like it in e-mail, drop me a line.

  -]Frank[-