[ale] linux byte alignment

Sun Aug 4 16:46:36 EDT 2002

What you have seen is that gcc is maintaining the stack alignment 
at a 16-byte boundary.  This is to prevent potential problems when 
using the Streaming SIMD (single instruction multiple data) 
Extensions (SSE) first introduced on Pentium III CPUs.  These 
extensions added some new instructions and a set of eight 128-bit 
registers (XMM0 to XMM7) to the CPU.  Some of the new instructions 
(MOVAPS, for example) will cause a general protection fault when 
moving 128-bit data to or from memory which is not on a 16-byte 
boundary.  While there are alternative instructions (such as MOVUPS) 
which can access 128-bit data on arbitrary boundaries, they are less 
efficient.
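
To make the "16-byte boundary" condition concrete, here is a minimal 
sketch in C.  The helper name is_aligned is made up for illustration; 
it only tests the low bits of an address and assumes the alignment is 
a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p lies on an `align`-byte boundary.
   `align` must be a power of two (16 for the SSE case discussed here). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* An address is aligned when its low log2(align) bits are all zero. */
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

For a buffer declared with C11 _Alignas(16), is_aligned(buf, 16) holds, 
while is_aligned(buf + 4, 16) does not.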

In the example you gave, the stack had the following items on it:

 4 bytes:  return address
 4 bytes:  old frame pointer
12 bytes:  x[10] (10 bytes rounded up to a multiple of 4)
 8 bytes:  y[5]  (5 bytes rounded up to a multiple of 4)

That's a total of 28 bytes, requiring an additional 4 bytes of 
padding to reach the next 16-byte boundary (32 bytes).  The return 
address and old frame pointer are already on the stack, so to make 
room for x[10], y[5], plus padding, the stack pointer must be 
decremented by 12 + 8 + 4 = 24 bytes.
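
The arithmetic above can be sketched in C.  The function names 
round_up and frame_adjust are invented for this illustration and are 
not anything gcc itself exposes; the sketch simply assumes the frame 
must end on a power-of-two boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical helper: how far the stack pointer must be decremented
   so that `already_pushed` bytes (return address, old frame pointer)
   plus `locals` bytes end on a `boundary`-byte boundary. */
static size_t frame_adjust(size_t locals, size_t already_pushed,
                           size_t boundary)
{
    return round_up(already_pushed + locals, boundary) - already_pushed;
}
```

With x[10] and y[5] each rounded up to a word, 
frame_adjust(round_up(10, 4) + round_up(5, 4), 8, 16) gives 24.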

In your example however, the stack pointer is decremented by 40 
rather than 24. The result is still properly aligned, but an 
additional 16 bytes of space is wasted.  This is just a peculiarity 
of the compiler version you are using.  I think you will find this 
waste is eliminated in the latest versions of gcc.

Incidentally, you can also observe stack alignment being performed 
when setting up function calls.  Some example code:

extern int foo (char bar);

int foobar (void)
{
   return foo (0);
}

In the resulting assembly code, you see %esp reduced by 8, which 
maintains stack alignment on function entry (4 bytes of return 
address + 4 bytes of saved %ebp + 8 = 16).  Then you see %esp 
reduced by an additional 12 before pushing 4 more bytes on the stack 
and calling foo(), again maintaining alignment (12 + 4 = 16).  On 
return from the function call, those 16 bytes are removed from the 
stack:

foobar:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        addl $-12,%esp
        pushl $0
        call foo
        addl $16,%esp
        movl %eax,%edx
        movl %edx,%eax
        jmp .L2
        .p2align 4,,7
.L2:
        leave
        ret
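
As a rough check of the listing above, the stack adjustments can be 
replayed in C.  This is only a sketch: the function name 
esp_at_call_foo is made up, and it assumes %esp was 16-byte aligned 
just before the caller executed "call foobar".

```c
#include <assert.h>
#include <stdint.h>

/* Replays the %esp changes made on the way to "call foo".
   esp_before_call is a hypothetical %esp value, assumed 16-byte
   aligned, taken just before the caller's "call foobar". */
static uint32_t esp_at_call_foo(uint32_t esp_before_call)
{
    uint32_t esp = esp_before_call;

    assert(esp % 16 == 0);
    esp -= 4;    /* "call foobar" pushes the return address */
    esp -= 4;    /* pushl %ebp                              */
    esp -= 8;    /* subl $8,%esp                            */
    esp -= 12;   /* addl $-12,%esp                          */
    esp -= 4;    /* pushl $0                                */
    return esp;  /* %esp just before "call foo"             */
}
```

Starting from any aligned value, the result is 32 bytes lower and 
therefore still a multiple of 16.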

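All of this alignment behavior is controlled by the 
"-mpreferred-stack-boundary=" compiler option.  The option takes the 
power-of-two exponent of the boundary: the default is 4 (2^4 = 16 
bytes), but it can be overridden.  For example, 
"-mpreferred-stack-boundary=2" requests 4-byte alignment.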

--Joe

-----Original Message-----
From:	Benjamin Dixon [SMTP:beatle at arches.uga.edu]
Sent:	Monday, July 29, 2002 3:26 PM
To:	ale at ale.org
Subject:	[ale] linux byte alignment

Hi all,

I'm trying to pry into linux byte alignment issues and assembly and I ran
across something I haven't figured out. My understanding is that alignment
is at one word (4 bytes) so I have the following function:

int main()
{
   char x[10];
   char y[5];
}

By my calculation, if the memory has to be aligned, x[10] will take up
12 bytes (ceiling of 2.5 words = 3, 3x4-bytes = 12). And likewise, the
y[5] will take up 8 bytes. So there's 20 bytes of excess memory lying
around? But when I run the program through gcc with the -S option, I get
the following:

..
main:
        pushl %ebp
        movl %esp,%ebp
        subl $40,%esp
.L2:
        movl %ebp,%esp
        popl %ebp
        ret
.Lfe1:
..

The question is, what's that 40? If I use different numbers for the array
sizes, I get a different number there, always divisible by 4 but always
greater than the number I expect. Anyone know why?

Ben
