Help needed for debugging OCaml failure on m68k
Stéphane Glondu
2024-06-19 06:30:01 UTC
Hi all,

OCaml 5.2.0 FTBFS on m68k:


https://buildd.debian.org/status/fetch.php?pkg=ocaml&arch=m68k&ver=5.2.0-1%7Eexp1&stamp=1718285451&raw=0

The failure happens very early, at the very first run of the bytecode
interpreter (ocamlrun). It seems to be related to a thread local
variable that moves unexpectedly. I've posted reports of my
investigations in an issue on the upstream github:

https://github.com/ocaml/ocaml/issues/13249

To reproduce the problem quickly:
- unpack ocaml 5.2.0 source package
- ./configure --enable-imprecise-c99-float-ops
- make coldstart

Is there some subtlety with thread local variables on m68k?


Cheers,
--
Stéphane
John Paul Adrian Glaubitz
2024-06-19 07:10:01 UTC
Hi Stéphane,
Post by Stéphane Glondu
OCaml 5.2.0 FTBFS on m68k:
https://buildd.debian.org/status/fetch.php?pkg=ocaml&arch=m68k&ver=5.2.0-1%7Eexp1&stamp=1718285451&raw=0
The failure happens very early, at the very first run of the bytecode
interpreter (ocamlrun). It seems to be related to a thread local
variable that moves unexpectedly. I've posted reports of my
investigations in an issue on the upstream github:
https://github.com/ocaml/ocaml/issues/13249
To reproduce the problem quickly:
- unpack ocaml 5.2.0 source package
- ./configure --enable-imprecise-c99-float-ops
- make coldstart
Is there some subtlety with thread local variables on m68k?
Can you please try to reproduce the issue on the porterbox mitchy.debian.net first, to
make sure it's not related to the QEMU build environment on the buildds?

If it turns out to be a QEMU bug, we need to report it there instead.

Thanks,
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Stéphane Glondu
2024-06-19 10:40:01 UTC
Hi,
Post by John Paul Adrian Glaubitz
Post by Stéphane Glondu
- unpack ocaml 5.2.0 source package
- ./configure --enable-imprecise-c99-float-ops
- make coldstart
Is there some subtlety with thread local variables on m68k?
Can you please try reproduce the issue on the porterbox mitchy.debian.net first to
make sure it's not related to the QEMU build environment on the buildds?
I can reproduce the issue on mitchy.debian.net.


Cheers,
--
Stéphane
John Paul Adrian Glaubitz
2024-06-19 10:50:01 UTC
Hi Stéphane,
Post by Stéphane Glondu
Post by John Paul Adrian Glaubitz
Post by Stéphane Glondu
- unpack ocaml 5.2.0 source package
- ./configure --enable-imprecise-c99-float-ops
- make coldstart
Is there some subtlety with thread local variables on m68k?
Can you please try reproduce the issue on the porterbox mitchy.debian.net first to
make sure it's not related to the QEMU build environment on the buildds?
I can reproduce the issue on mitchy.debian.net.
OK, then it's actually a bug.

One important thing to know is that the natural alignment on m68k is actually 16 bits
and not 32 bits, which causes quite a few issues with various upstream projects.

We're currently planning on switching the alignment on m68k to 32 bits, and chances are
that this could fix this issue as well.

Can you maybe try passing "-malign-int" to CFLAGS/CXXFLAGS when building OCaml on m68k
to verify this hypothesis? Please note that this also breaks the SysV ABI, so it's not
possible to easily do this on a per-package basis.

Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Stéphane Glondu
2024-06-19 11:30:01 UTC
Post by John Paul Adrian Glaubitz
Post by Stéphane Glondu
Post by John Paul Adrian Glaubitz
Post by Stéphane Glondu
- unpack ocaml 5.2.0 source package
- ./configure --enable-imprecise-c99-float-ops
- make coldstart
Is there some subtlety with thread local variables on m68k?
Can you please try reproduce the issue on the porterbox mitchy.debian.net first to
make sure it's not related to the QEMU build environment on the buildds?
I can reproduce the issue on mitchy.debian.net.
OK, then it's actually a bug.
One important thing to know is that the natural alignment on m68k is actually 16 bits
and not 32 bits, which causes quite a few issues with various upstream projects.
We're currently planning on switching the alignment on m68k to 32 bits, and chances are
that this could fix this issue as well.
Can you maybe try passing "-malign-int" to CFLAGS/CXXFLAGS when building OCaml on m68k
to verify this hypothesis? Please note that this also breaks the SysV ABI, so it's not
possible to easily do this on a per-package basis.
I observe the same behaviour with "-malign-int": the address of
caml_state (a thread local variable) changes unexpectedly (goes from
0x402e5fac to 0x402e7454) after the following goto:


https://salsa.debian.org/ocaml-team/ocaml/-/blob/debian/experimental/runtime/interp.c?ref_type=heads#L295

which leads to:


https://salsa.debian.org/ocaml-team/ocaml/-/blob/debian/experimental/runtime/interp.c?ref_type=heads#L819

...confirmed by adding:

fprintf(stderr, "&caml_state = %p\n", &caml_state);

before the goto and after the "Instruct(BRANCH):".
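
For anyone who wants to try the same kind of check outside the OCaml tree, here is a minimal sketch in C. It is only an illustration: demo_state is a hypothetical stand-in for caml_state, and probe_tls and tls_address_is_stable are made-up helper names, not functions from the OCaml runtime. The address is printed before a goto and again at the label, and the two values are compared.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the runtime's thread-local caml_state. */
static _Thread_local int demo_state;

/* Print the current address of the thread-local variable and return it
   as an integer so that two probes can be compared. */
static uintptr_t probe_tls(const char *where)
{
    assert(where != NULL);
    fprintf(stderr, "%s: &demo_state = %p\n", where, (void *)&demo_state);
    return (uintptr_t)&demo_state;
}

/* Returns 1 if the address is the same before and after the jump,
   0 if it moved (the symptom described above). */
int tls_address_is_stable(void)
{
    uintptr_t before = probe_tls("before goto");
    goto branch;
branch:
    {
        uintptr_t after = probe_tls("after goto");
        return before == after;
    }
}
```

Within a single thread the address of a thread-local variable should not change, so tls_address_is_stable() returning 0 would point at the address computation itself being disturbed rather than at the program's own logic.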


Cheers,
--
Stéphane
John Paul Adrian Glaubitz
2024-06-19 12:10:02 UTC
Hi Stéphane,
Post by Stéphane Glondu
Post by John Paul Adrian Glaubitz
Can you maybe try passing "-malign-int" to CFLAGS/CXXFLAGS when building OCaml on m68k
to verify this hypothesis? Please note that this also breaks the SysV ABI, so it's not
possible to easily do this on a per-package basis.
I observe the same behaviour with "-malign-int": the address of
caml_state (a thread local variable) changes unexpectedly (goes from
0x402e5fac to 0x402e7454) after the following goto:
https://salsa.debian.org/ocaml-team/ocaml/-/blob/debian/experimental/runtime/interp.c?ref_type=heads#L295
which leads to:
https://salsa.debian.org/ocaml-team/ocaml/-/blob/debian/experimental/runtime/interp.c?ref_type=heads#L819
...confirmed by adding:
fprintf(stderr, "&caml_state = %p\n", &caml_state);
before the goto and after the "Instruct(BRANCH):".
Hmm, I guess then maybe Andreas Schwab or Geert Uytterhoeven might have an idea what
the problem with the TLS variable is. I'll CC both.

Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
John Paul Adrian Glaubitz
2024-06-19 13:00:01 UTC
Post by John Paul Adrian Glaubitz
Hmm, I guess then maybe Andreas Schwab or Geert Uytterhoeven might have an idea what
the problem with the TLS variable is. I'll CC both.
Post by Stéphane Glondu
I noticed that &caml_state changes when pc changes. Looking further, pc
is a register variable pinned to a5. I guess this conflicts with the
implementation of TLS...?
I've removed the register pin and launched a build, and it goes past the
problematic point. We'll see how it goes...
Oh, nice catch. Yeah, I think a5 is the register that is used for TLS, but
Andreas or Geert should correct me if I'm wrong.
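
A rough sketch of the idea in C, not the actual OCaml change: the interpreter's pc is only pinned to a5 when a hypothetical opt-in macro (DEMO_PIN_PC_TO_A5) is defined on m68k; otherwise the compiler chooses the register. All names here (DEMO_PC_REG, demo_state, demo_interpret, the OP_* opcodes) are assumptions made for illustration, and the real macros and conditions in interp.c may differ.

```c
#include <assert.h>

/* Hypothetical guard: pin pc to a5 only when explicitly requested on m68k. */
#if defined(__GNUC__) && defined(__mc68000__) && defined(DEMO_PIN_PC_TO_A5)
#  define DEMO_PC_REG __asm__("a5")  /* the pin reported to disturb TLS */
#else
#  define DEMO_PC_REG                /* no pin: the compiler picks a register */
#endif

/* Hypothetical stand-in for the runtime's thread-local state. */
static _Thread_local long demo_state;

enum { OP_INC, OP_BRANCH, OP_STOP };

/* Tiny bytecode loop: OP_INC bumps the thread-local counter, OP_BRANCH adds
   the following operand to pc, OP_STOP returns the counter. */
long demo_interpret(const signed char *code)
{
    register const signed char *pc DEMO_PC_REG = code;
    long *state = &demo_state;  /* address taken once, before any branch */

    *state = 0;
    for (;;) {
        switch (*pc++) {
        case OP_INC:
            demo_state++;
            break;
        case OP_BRANCH:
            pc += *pc;          /* relative jump, changes pc */
            break;
        case OP_STOP:
            /* The thread-local must still live at the same address. */
            assert(&demo_state == state);
            return demo_state;
        default:
            return -1;          /* unknown opcode */
        }
    }
}
```

With the pin left out (the default here), the program { OP_INC, OP_BRANCH, 2, OP_INC, OP_STOP } increments once, jumps over the second OP_INC and stops; the assertion documents the expectation that changing pc must never move the thread-local variable.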

Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
John Paul Adrian Glaubitz
2024-06-22 08:20:01 UTC
Hi Stéphane,
Post by Stéphane Glondu
I noticed that &caml_state changes when pc changes. Looking further, pc
is a register variable pinned to a5. I guess this conflicts with the
implementation of TLS...?
I've removed the register pin and launched a build, and it goes past the
problematic point. We'll see how it goes...
Were you able to complete the build successfully on m68k?

Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913