This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.
Re: powerpc new PLT and GOT
On Wed, May 11, 2005 at 11:37:29PM -0700, Richard Henderson wrote:
> On Thu, May 12, 2005 at 03:38:14PM +0930, Alan Modra wrote:
> > Not if the plt call stubs are modified to copy their got pointer to,
> > say, r12. Or better, if PLTresolve loads its own got pointer, like
> > this:
>
> Or better, ensure that r0 (or r12 if r0 can't be used in the appropriate
> addressing modes) still contains a copy of ctr, which means that it
> contains a copy of PLTresolve. Then use pc-relative references to your
> got entries instead of got-relative.
Yes, that would be even nicer. Thanks for checking over the ABI
proposal. Hmm, your idea about ctr triggered some more ideas.
The one thing that I'm a little unhappy about with the new plt call
scheme is that
# ith PLT code stub.
addis 11,30,(plt+(i-1)*4-got)@ha
lwzu 0,(plt+(i-1)*4-got)@l(11)
mtctr 0
bctr
is slower than the old plt call scheme, which allowed ld.so to optimise
.plt to simple branches. Steve Munroe improved it a little by
suggesting that when plt and got are close enough (the offset fits in lwz's signed 16-bit displacement) we could reduce it to
lwz 0,(plt+(i-1)*4-got)(30)
mtctr 0
bctr
but that loses r11 as an index into the plt. So each plt call stub
needs a different entry into PLTresolve in order to differentiate plt
entries. Steve suggested that each entry load r11 itself, using
"li 11,(i-1)*4; b PLTresolve", as is done with the PowerPC64 .glink.
Combining Steve's idea with yours about ctr gets me to
# ith PLT code stub.
addis 11,30,(plt+(i-1)*4-got)@ha
lwz 11,(plt+(i-1)*4-got)@l(11)
mtctr 11
bctr
# or, if plt+(i-1)*4-got is less than 32k
lwz 11,(plt+(i-1)*4-got)(30)
mtctr 11
bctr
# A table of branches, one for each plt entry.
# The idea is that the plt call stub loads ctr (and r11) with these
# addresses, so (r11 - res_0) gives the plt index * 4.
res_0: b PLTresolve
res_1: b PLTresolve
...
# Some number of entries towards the end can be nops
res_n_m3: nop
res_n_m2: nop
res_n_m1:
PLTresolve:
mflr 0
bcl 20,31,1f
1: mflr 12
addis 11,11,(1b-res_0)@ha
addi 11,11,(1b-res_0)@l
sub 11,11,12 # r11 = index * 4
addis 12,12,(got-1b)@ha
addi 12,12,(got-1b)@l # r12 = _GLOBAL_OFFSET_TABLE_
mtlr 0
add 0,11,11
add 11,0,11 # r11 = index * 12 = reloc offset.
lwz 0,4(12) # got[1] address of dl_runtime_resolve
mtctr 0
lwz 12,8(12) # got[2] contains the map address
bctr
Of course, if we want the normal plt call path to go fast, the thing to
do is have gcc emit the plt call sequence inline so that it can be
scheduled with the surrounding code. So gcc would generate
addis 11,30,foo@gotplt@ha
lwz 11,foo@gotplt@l(11)
mtctr 11,foo@gotplt_marker
bctrl foo@gotplt_marker
hopefully with other instructions scheduled in between. The
funny-looking gotplt_marker relocs are there because ld might resolve
"foo" to a local function, and would then turn the sequence into
nop
nop
nop
bl foo
--
Alan Modra
IBM OzLabs - Linux Technology Centre