Dear developpers,
we are academic researchers working in automated program analysis. We are currently interested in checking compliance of inline asm chunks as found in C programs.
While benchmarking our tool and technique, we found 2 significant compliance issues in UDPCAST. We report them to you, as well as adequate patches with explanations.
* All these bugs are related to compliance between the block of asm and its surrounding "contract" (in gcc-style notation). They are akin to undefined or implementation-defined behaviours in C: they currently do not manifest themselves in your program, but at some point in time with compiler optimizations becoming more and more aggressive or changes in undocumented compiler choices regarding asm chunks, they can suddenly trigger a (hard-to-find) bug.
* The typical problems come from the compiler missing dataflow information and performing undue optimizations on this wrong basis, or the compiler allocating an already used register. Actually, we demonstrate "in lab" problems with all these categories of bugs in case of inlining (especially with LTO enabler) or code refactoring.
* Some of those issues may seems benign or irrealistic but it cost nothing to patch so, why not do it?
We would be very interested to hear your opinion on these matters. Are you interested in such errors and patches? Also, besides the patches, we are currently working on a code analyzer prototype designed to check asm compliance and to propose patches when the chunk is not compliant. This is still work in progress and we are finalizing it. The errors and patches I reported to you came from my prototype. In case such a prototype would be made available, would you consider using it?
Best regards
Frédéric Recoules
Hi,
On 03/04/2020 22:49, FRÉDÉRIC RECOULES wrote:
Dear developpers,
we are academic researchers working in automated program analysis. We are currently interested in checking compliance of inline asm chunks as found in C programs.
While benchmarking our tool and technique, we found 2 significant compliance issues in UDPCAST. We report them to you, as well as adequate patches with explanations.
[...]
We would be very interested to hear your opinion on these matters. Are you interested in such errors and patches?
Sure, but see below :-)
Also, besides the patches, we are currently working on a code analyzer prototype designed to check asm compliance and to propose patches when the chunk is not compliant. This is still work in progress and we are finalizing it. The errors and patches I reported to you came from my prototype. In case such a prototype would be made available, would you consider using it?
I might indeed have a look at it, but I won't probably use it regularly, due to the fairly limited usage of assembly in my projects (FEC in udpcast is about the only place where I use assembly).
Just wondering, why did you put the actual body of your issues into a text attachment, rather than in the mail body itself, it makes it harder to quote...
The issues are the sames for the addmul1 and mul1 functions defined in fec.c.
- the read-only input dst1 and src1 are clobbered
Indeed, I guess they should go into the "clobbered" field (along with memory, eax, edx, ...)
The compiler may assume the old values of dst1 and src1 are still present respectively in edi and esi may misuse the new values.
OK
Patched by moving dst1 and src1 from read-only input to read-write output.
Hmmm, I'll have to watch out for this, if ever I add some processing in the future that continues to use the "dst" variable after the asm statement :-)
Too bad the "clobber" list cannot be used for this :-(
Then here are some minor improvements.
- add "cc" to the clobbers (even if it is, for now, redundant)
Ok
- remove the useless push/pop instructions in mul1
eax and ebx are already declared in the clobbers, so there is no need to manually save them.
Noted, probably dates back from an epoch when compilers were not yet smart enough to do this on their own :-)
Another thing: your patch puts all "operands" into one line. Any particular reason for this? This tends to obscure what the patch actually does...
Regards, and thanks for your interest in udpcast,
Alain
Hi Alain,
I might indeed have a look at it, but I won't probably use it regularly, due to the fairly limited usage of assembly in my projects (FEC in udpcast is about the only place where I use assembly).
We are glad to hear that. Of course, you would not be supposed to use it all the time, only when you are refactoring the assembly code or adding a new one -- what we are expecting to not happen very often ;-)
Just wondering, why did you put the actual body of your issues into a text attachment, rather than in the mail body itself, it makes it harder to quote...
Sorry for that, I had not really thought about that.
Indeed, I guess they should go into the "clobbered" field (along with memory, eax, edx, ...) [...] Hmmm, I'll have to watch out for this, if ever I add some processing in the future that continues to use the "dst" variable after the asm statement :-)
Too bad the "clobber" list cannot be used for this :-(
As it is said in the warning of https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#InputOperands clobbers can not be used here. (1) But, we can declare, as it is suggested by the documentation, two new, never used C, variables that will be tied to the inputs.
The problem here may or may not be relevant for your project but we assume that "copy-paste" interesting code is a common practice in open-source software development, especially on Github and we prefer the assembly to be "self-contained" so it would be safe in any context.
Another thing: your patch puts all "operands" into one line. Any particular reason for this? This tends to obscure what the patch actually does...
(2) No reason for that. It is only coding habits.
Here is the same patch with (1) and (2) fixed. Hope it will be useful.
diff --git a/fec-test.c b/fec-test.c index 428105c..5502a00 100644 --- a/fec-test.c +++ b/fec-test.c @@ -303,10 +303,10 @@ int main(int argc, char **argv) { exit(1); }
- begin = rdtsc(0); - init_fec(); + begin = rdtsc(); + fec_init(); initFastTable(); - end = rdtsc(1); + end = rdtsc(); fprintf(stderr, "%d cycles to create FEC buffer\n", (unsigned int) (end-begin)); for(i=0, j=0; i<size; i+=blocksize, j++) { diff --git a/fec.c b/fec.c index 043a71e..9f3752c 100644 --- a/fec.c +++ b/fec.c @@ -342,6 +342,7 @@ slow_addmul1(gf *dst1, gf *src1, gf c, int sz) static void addmul1(gf *dst1, gf *src1, gf c, int sz) { + gf *dummy0, *dummy1; USE_GF_MULC ;
GF_MULC0(c) ; @@ -353,7 +354,7 @@ addmul1(gf *dst1, gf *src1, gf c, int sz) return; }
- asm volatile("xorl %%eax,%%eax;\n" + asm volatile(" xorl %%eax,%%eax;\n" " xorl %%edx,%%edx;\n" ".align 32;\n" "1:" @@ -390,13 +391,16 @@ addmul1(gf *dst1, gf *src1, gf c, int sz) " cmpl %%ecx, %%esi;\n" " jb 1b;" - : : - + : + + "=D" (dummy0), + "=S" (dummy1) : + "b" (__gf_mulc_), "D" (dst1-8), "S" (src1), "c" (sz+src1) : - "memory", "eax", "edx" + "memory", "eax", "edx", "cc" ); } #else @@ -465,6 +469,7 @@ slow_mul1(gf *dst1, gf *src1, gf c, int sz) static void mul1(gf *dst1, gf *src1, gf c, int sz) { + gf *dummy0, *dummy1; USE_GF_MULC ;
GF_MULC0(c) ; @@ -476,9 +481,7 @@ mul1(gf *dst1, gf *src1, gf c, int sz) return; }
- asm volatile("pushl %%eax;\n" - "pushl %%edx;\n" - "xorl %%eax,%%eax;\n" + asm volatile(" xorl %%eax,%%eax;\n" " xorl %%edx,%%edx;\n" "1:" " addl $8, %%edi;\n" @@ -514,15 +517,16 @@ mul1(gf *dst1, gf *src1, gf c, int sz) " cmpl %%ecx, %%esi;\n" " jb 1b;\n" - " popl %%edx;\n" - " popl %%eax;" - : : - + : + + "=D" (dummy0), + "=S" (dummy1) : + "b" (__gf_mulc_), "D" (dst1-8), "S" (src1), "c" (sz+src1) : - "memory", "eax", "edx" + "memory", "eax", "edx", "cc" ); } #else
Regards, Frédéric
----- Mail original ----- De: "Alain Knaff" alain@knaff.lu À: "FRÉDÉRIC RECOULES" frederic.recoules@univ-grenoble-alpes.fr Cc: "Richard Bonichon" richard.bonichon@gmail.com, "Sébastien Bardin" sebastien.bardin@cea.fr, "udpcast" udpcast@udpcast.linux.lu Envoyé: Samedi 4 Avril 2020 17:48:53 Objet: Re: [Udpcast] [inline assembly compliance] Issues and patches
Hi,
On 03/04/2020 22:49, FRÉDÉRIC RECOULES wrote:
Dear developpers,
we are academic researchers working in automated program analysis. We are currently interested in checking compliance of inline asm chunks as found in C programs.
While benchmarking our tool and technique, we found 2 significant compliance issues in UDPCAST. We report them to you, as well as adequate patches with explanations.
[...]
We would be very interested to hear your opinion on these matters. Are you interested in such errors and patches?
Sure, but see below :-)
Also, besides the patches, we are currently working on a code analyzer prototype designed to check asm compliance and to propose patches when the chunk is not compliant. This is still work in progress and we are finalizing it. The errors and patches I reported to you came from my prototype. In case such a prototype would be made available, would you consider using it?
I might indeed have a look at it, but I won't probably use it regularly, due to the fairly limited usage of assembly in my projects (FEC in udpcast is about the only place where I use assembly).
Just wondering, why did you put the actual body of your issues into a text attachment, rather than in the mail body itself, it makes it harder to quote...
The issues are the sames for the addmul1 and mul1 functions defined in fec.c.
- the read-only input dst1 and src1 are clobbered
Indeed, I guess they should go into the "clobbered" field (along with memory, eax, edx, ...)
The compiler may assume the old values of dst1 and src1 are still present respectively in edi and esi may misuse the new values.
OK
Patched by moving dst1 and src1 from read-only input to read-write output.
Hmmm, I'll have to watch out for this, if ever I add some processing in the future that continues to use the "dst" variable after the asm statement :-)
Too bad the "clobber" list cannot be used for this :-(
Then here are some minor improvements.
- add "cc" to the clobbers (even if it is, for now, redundant)
Ok
- remove the useless push/pop instructions in mul1
eax and ebx are already declared in the clobbers, so there is no need to manually save them.
Noted, probably dates back from an epoch when compilers were not yet smart enough to do this on their own :-)
Another thing: your patch puts all "operands" into one line. Any particular reason for this? This tends to obscure what the patch actually does...
Regards, and thanks for your interest in udpcast,
Alain
Hi Frédéric,
On 05/04/2020 11:55, FRÉDÉRIC RECOULES wrote:
Hi Alain,
I might indeed have a look at it, but I won't probably use it regularly, due to the fairly limited usage of assembly in my projects (FEC in udpcast is about the only place where I use assembly).
We are glad to hear that. Of course, you would not be supposed to use it all the time, only when you are refactoring the assembly code or adding a new one -- what we are expecting to not happen very often ;-)
:-)
Just wondering, why did you put the actual body of your issues into a text attachment, rather than in the mail body itself, it makes it harder to quote...
Sorry for that, I had not really thought about that.
No problem :-)
Indeed, I guess they should go into the "clobbered" field (along with memory, eax, edx, ...) [...] Hmmm, I'll have to watch out for this, if ever I add some processing in the future that continues to use the "dst" variable after the asm statement :-)
Too bad the "clobber" list cannot be used for this :-(
As it is said in the warning of https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#InputOperands clobbers can not be used here. (1) But, we can declare, as it is suggested by the documentation, two new, never used C, variables that will be tied to the inputs.
The problem here may or may not be relevant for your project but we assume that "copy-paste" interesting code is a common practice in open-source software development, especially on Github and we prefer the assembly to be "self-contained" so it would be safe in any context.
Another thing: your patch puts all "operands" into one line. Any particular reason for this? This tends to obscure what the patch actually does...
(2) No reason for that. It is only coding habits.
Here is the same patch with (1) and (2) fixed. Hope it will be useful.
Thanks.
This looks indeed so much better, and has far less impact on the generated .s file as the patch before, which gives indeed confidence that it doesn't subtly break things :-)
This is now in the new udpcast release.
Regards,
Alain