From: fobits@io.org (Frank Obits)
Newsgroups: rec.games.programmer
Subject: Fast Scrolling in Mode X
Date: 28 Dec 1994 06:52:59 -0500
This article describes a fast and efficient method of scrolling a
tiled world, using 320x200 Mode X. It will scroll an infinite distance
vertically, horizontally and diagonally. On my system, a 386/40 with
an ancient 8-bit video card, it has enough time to bounce nine small
(16x10) sprites around on the screen while scrolling at a steady 70
fps.
Probably I'll be flamed for posting such kindergarten stuff, but when
I took an interest in game programming (not that long ago) I found
such information surprisingly difficult to find.
The method described by Diana Gruber in the Action Arcade Adventure
Set scrolls in all directions and is easy to implement. The problem is
that whenever it reaches a tile boundary it stops to copy a whole
screenful of pixels back to the center of the page. On my system this
causes the normal 70 fps rate to miss a beat, producing a slight jerk
in the scrolling. Slowing down the framerate would cure this, but it
still struck me as inefficient to move all those pixels from hither to
yon.
I had high hopes for Dave Robert's PC Game Programming Explorer, which
concentrates on such low-level coding. Although it is an excellent
book in most respects, the scrolling method used is rather limited. It
moves beautifully in the vertical direction and can be easily modified
to scroll horizontally, but it can't go in both directions in the same
game - not unless you are willing to give up page-flipping.
This method is derived from some hints in the PCGPE which were
confirmed by a post here from Henric Steen. In an exchange of e-mail
he gave me a few more pointers... Thanks, Henric.
It isn't perfect, though. The big problem is that in it's present
version it is incompatible with a split screen. If the user wants to
see a status bar s/he will have to request one in a pop-up window, as
in the Commander Keen series. Henric says that it's easy to
incorporate a split screen, but despite his help I must admit that I
haven't caught on yet... but enough blather, on with the show!
Not everyone uses the same terminology, so I'll start by explaining
some names I use and a few functions in my Mode X library.
** Definitions **
Window - the "window" is the part of vram which is actually displayed
on the screen. In this method it is always 320x200 pixels.
Page - this is NOT the same as the window. The width of a page is set
by writing to the hardware, but the starting point and length are
simply determined by variables in the program.
Visible Page - the page in which the window is currently located.
Active Page - the page on which we are currently drawing or erasing
sprites.
Background Page - holds a copy of the tiled background.
Page Flipping - making the active page visible and vice-versa.
** Functions **
void window_at( unsigned pg_off, int x, int y )
This function writes to the Line Start and HPP registers to set the
visible window at x,y within the page which starts at pg_off.
void ltile_to_vram( char *tilearray, int tilenum,
unsigned pg_off, int x, int y )
Gets a tile from a linear array of tiles and writes it at position x,y
within the specified page.
void rect_vram_to_vram( unsigned src_off, unsigned dest_off
int x, int y, int hgt, int width )
Uses write mode #1 to copy a rectangular area of pixels from x,y on
the source page to the same position on the destination page.
** Initialization **
By default the length of a Mode X line is 80 addresses, or 320 pixels.
We want a page wide enough so that when the window is centered there
is a buffer on each side to hold one column of tiles, which is a total
width of 352 pixels or 88 addresses.
The height of a page will be 240 lines, which is 16 + 200 + 24. When
the window is in the initial position this leave a buffer for one tile
row at the top. An even number of 16x16 tiles won't fit on a 200-line
screen. After filling the screen with 12 rows we have 8 extra lines
hanging off the bottom. The 24-line buffer at the bottom allows for
these 8 lines plus another complete row of tiles.
Thus each page takes 240 * 88 = 21120 addresses, and three of them use
63360 addresses. You can put them adjacent to each other, but I spaced
them approximately evenly in vram.
So now we need to create and initialize some global variables:
unsigned
visible_off = 360, /* initial positions of pages */
active_off = 22208,
back_off = 44052;
int window_x = 16, /* position of window within page */
window_y = 16;
int world_x = 0, /* position of upper left tile in world */
world_y = 0;
After setting Mode X and the page width, fill all three windows with
the initial background. The tiles will extend all the way across the
page (22 tiles) and down 14 rows - one at the top and 12 1/2 for the
window.
set_mode_x();
set_page_width( 88 );
set_pallette( pal );
window_at( visible_off, window_x, window_y );
for( i=0; i<14; i++ )
put_tile_row( back_off, i, tiles );
/* copy to other pages */
rect_vram_to_vram( back_off, active_off, 0, 0, 352, 224 );
rect_vram_to_vram( back_off, visible_off, 0, 0, 352, 224 );
When scrolling North, East or West the method is simple. First go as
far as possible by moving the window within the page. When you run out
of valid data move *all* the pages upward or downward far enough to
accommodate a new row or column of tiles. Grab the tiles, copy to the
other pages and reposition the window within the page.
Moving South is just slightly different, because of the odd-sized
buffer. To make it easy to grab rows or columns of tiles we would like
to keep the top of the page aligned at a tile boundary. So after using
eight lines of the buffer ( ++window_y == 20 ) we grab a new row for
the bottom of the page, but don't reposition the pages until we've
gone down an even 16 lines ( window_y == 33 ).
Meanwhile, we're flipping pages and handling sprites. With a clean
copy of the tiled background on the background page, sprites can be
erased with a fast block copy, without regard to tile boundaries. This
is more efficient than the "dirty tile" method, which requires copying
a whole tile if a sprite overlaps even a few pixels in the corner.
By now it has probably occurred to you that we can't do this very
often before one of the pages goes right off the beginning or end of
the video segment.
If it's the background page, ignore it. The positions of the pages are
held in 16-bit unsigned ints, which will automaticly wrap around
between the ends of the segment. All of the routines that write or
copy pixels are presumably written in assembly. If you're using real
mode the index registers will also wrap, so that address a000:ffff is
adjacent to a000:0000. In protected mode I believe there's an assembly
directive to allow manipulation of the lower 16 bits of the index
registers, so you can have the same effect.
If the visible window becomes split, it's a more serious matter. In
most video cards the hardware which paints the screen also wraps at
the ends of the segment, but some SVGA cards don't do that.
The solution is simple. We are doing all of this in a loop that looks
something like this:
LOOP:
wait for retrace
swap( visible_off, active_off )
window_at( visible_off, window_x, window_y )
erase sprites from active page
check user input
do scrolling
if( active_off > 44416 )
swap( active_off, back_off )
draw sprites on active page
goto LOOP
Note that the scrolling is done after the sprites have been erased
>from the active page. At this point the active and background pages
are pixel-for-pixel identical, so if the scrolling has split the
active page, just swap it with the background page. If the scrolling
has split the visible page it won't take effect until the *next* time
it becomes visible. Before that it will take it's turn as the active
page, and we'll catch it then.
** Some Improvements **
This is both simple and efficient, but as I've described it so far it
still has one big problem. When it needs more tiles it first consults
the world map and copies them one-by-one to the background page. Then
it uses two block copies to transfer them to the other pages, all in
one frame. That's a lot to do in one frame - in fact it's almost all
the work the engine does. Out of curiosity I temporarily removed all
sprite handling and the wait-for-retrace function, so it did nothing
but scroll the screen as fast as it could. Under those conditions the
Watcom profiler said that execution time was distributed as follows:
45.5% ltile_to_vram()
49.9% rect_vram_to_vram()
4.6% everything else
Each new row or column is copied twice after it is grabbed, so getting
them into vram takes about twice as long as a copy, and between them
they account for almost all of the work of the scrolling.
So the first 15 pixels require almost no time, then all this gets
dumped into one frame... and that's not the worst. The worst case is
when it's scrolling diagonally and the tiles are aligned so that it
needs *both* a new row and column at the same time. Let's tackle these
problems one by one.
Do we really need to update all the pages at once? No, we don't. The
copy to the visible page can be put off until the next frame, when it
will be the active page. So instead of doing two copies we can just do
one and set a global variable to indicate that another is needed. For
instance, when scrolling East:
/* grab new row of tiles */
put_tile_col( active_off, 21, tiles );
/* copy to other pages */
rect_vram_to_vram( active_off, back_off, 336, 0, 16, 240 );
deferred_copy = COPY_RGT;
The deferred copy can be done right after the page flip, so the main
loop starts like this:
wait for retrace
swap( visible_off, active_off )
window_at( visible_off, window_x, window_y )
if( deferred_copy ) do copy
erase sprites from active page
That's a big improvement, but if we take the percentages of execution
time as arbitrary units of time, the worst-case diagonal scroll still
looks like this:
frame frame+1
Horizontal: 75 | 25 |
Vertical: 75 | 25 |
In the vast majority of games the user would never notice if the
screen moved a bit horizontally before starting the diagonal scroll,
so by deferring the vertical scroll we could spread it out to one of
these:
Horizontal: 75 | 25 | -- | (better)
Vertical: -- | 75 | 25 |
Horizontal: 75 | 25 | -- | -- | (best)
Vertical: -- | -- | 75 | 25 |
To implement this requires some additions to the code. First, when
scrolling diagonally the horizontal scroll must always be done first.
Then in the vertical scrolling functions we need to check if a
deferred copy is pending. If it is, do not import a new row.
void scroll_north( void )
{
if( --window_y < 0 )
{
/* check if we have deferred copy pending */
if( deferred_copy )
{
++window_y; /* cancel move */
return; /* wait until next time */
}
/* get new row, etc */
}
That will do for a one-frame delay, and it can be put off for one more
by defining the flags for the copies with two flags in one int.
#define COPY_TOP 0x11
#define COPY_BOT 0x21
#define COPY_LFT 0x31
#define COPY_RGT 0x41
#define NOT_YET 0x01
/* in main loop */
if( deferred_copy )
{
switch( deferred_copy )
{
case COPY_TOP :
rect_vram_to_vram( back_off, active_off, 0, 0, 352, 16 );
break;
/* other cases */
case NOT_YET :
deferred_copy = FALSE;
break;
}
deferred_copy &= 0x01;
}
So the first time through the loop it does the copy and clears all
except the low bit. The second time that is cleared too, opening the
door for the vertical scroll.
** The End (finally) **
So there it is. I make no claim to be the first to use this method,
not by a long way. It's pretty simple and has probably been used by
umpteen programmers for years. The only problem is that nobody
bothered to explain it in any kind of detail, or at least not where I
could find it.
If anybody can think of improvements - especially an elegant way to
add a split screen - I'll be more than happy to hear them. It would be
pushing my luck too far to post the full code for my little demo, but
if anyone would like it in e-mail, drop me a line.
-]Frank[-