From 49d3118621a68f6583228f123efd16f0f803d908 Mon Sep 17 00:00:00 2001 From: Balazs Kezes Date: Thu, 25 Aug 2016 10:06:51 +0100 Subject: [PATCH] Incorrect function call code on ARMv6 On 2016-08-11 09:24 +0100, Balazs Kezes wrote: > I think it's just that that copy_params() never restores the spilled > registers. Maybe it needs some extra code at the end to see if any > parameters have been spilled to stack and then restore them? I've spent some time on this and I've found an alternative solution. Although I'm not entirely sure about it but I've attached a patch nevertheless. And while poking at that I've found another problem affecting the unsigned long long division on arm and I've attached a patch for that too. More details in the patches themselves. Please review and consider them for merging! Thank you! -- Balazs [PATCH 1/2] Fix slow unsigned long long division on ARM The macro AEABI_UXDIVMOD expands to this bit: #define AEABI_UXDIVMOD(name,type, rettype, typemacro) \ ... while (num >= den) { \ ... while ((q << 1) * den <= num && q * den <= typemacro ## _MAX / 2) \ q <<= 1; \ ... With the current ULONG_MAX version the inner loop goes only until 4 billion so the outer loop will progress very slowly if num is large. With ULLONG_MAX the inner loop works as expected. The current version is probably a result of a typo. The following bash snippet demonstrates the bug: $ uname -a Linux eper 4.4.16-2-ARCH #1 Wed Aug 10 20:03:13 MDT 2016 armv6l GNU/Linux $ cat div.c int printf(const char *, ...); int main(void) { unsigned long long num, denom; num = 12345678901234567ULL; denom = 7; printf("%lld\n", num / denom); return 0; } $ time tcc -run div.c 1763668414462081 real 0m16.291s user 0m15.860s sys 0m0.020s [PATCH 2/2] Fix long long dereference during argument passing on ARMv6 For some reason the code spills the register to the stack. copy_params in arm-gen.c doesn't expect this so bad code is generated. It's not entirely clear why the saving part is necessary. It was added in commit 59c35638 with the comment "fixed long long code gen bug" with no further clarification. Given that tcctest.c passes without this, maybe it's no longer needed? Let's remove it. Also add a new testcase just for this. After I've managed to make the tests compile on a raspberry pi, I get the following diff without this patch: --- test.ref 2016-08-22 22:12:43.380000000 +0100 +++ test.out3 2016-08-22 22:12:49.990000000 +0100 @@ -499,7 +499,7 @@ 2 1 0 1 0 4886718345 -shift: 9 9 9312 +shift: 291 291 291 shiftc: 36 36 2328 shiftc: 0 0 9998683865088 manyarg_test: More discussion on this thread: https://lists.nongnu.org/archive/html/tinycc-devel/2016-08/msg00004.html --- lib/armeabi.c | 2 +- tccgen.c | 5 ----- tests/tcctest.c | 5 +++++ 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/lib/armeabi.c b/lib/armeabi.c index 0d1217b..b12d164 100644 --- a/lib/armeabi.c +++ b/lib/armeabi.c @@ -399,7 +399,7 @@ REGS_RETURN(ulldiv_t, ulldiv_t) REGS_RETURN(idiv_t, idiv_t) REGS_RETURN(uidiv_t, uidiv_t) -AEABI_UXDIVMOD(uldivmod, unsigned long long, ulldiv_t, ULONG) +AEABI_UXDIVMOD(uldivmod, unsigned long long, ulldiv_t, ULLONG) __AEABI_XDIVMOD(ldivmod, long long, uldivmod, lldiv_t, ulldiv_t, LLONG) diff --git a/tccgen.c b/tccgen.c index 598ba88..2376848 100644 --- a/tccgen.c +++ b/tccgen.c @@ -861,11 +861,6 @@ ST_FUNC int gv(int rc) #endif if (r >= VT_CONST || /* XXX: test to VT_CONST incorrect ? */ (vtop->r & VT_LVAL)) { - /* We do not want to modifier the long long - pointer here, so the safest (and less - efficient) is to save all the other registers - in the stack. XXX: totally inefficient. */ - save_regs(1); /* load from memory */ vtop->type.t = load_type; load(r, vtop); diff --git a/tests/tcctest.c b/tests/tcctest.c index 6ec23a1..9a4d0ef 100644 --- a/tests/tcctest.c +++ b/tests/tcctest.c @@ -2010,6 +2010,11 @@ void longlong_test(void) printf("%d %d %d %d\n", a > b, a < b, a >= b, a <= b); printf(LONG_LONG_FORMAT "\n", 0x123456789LLU); + + /* long long pointer deref in argument passing test */ + a = 0x123; + long long *p = &a; + llshift(*p, 5); } void manyarg_test(void)