How much faster is assembly language?

http://www.fourtheye.org/armstrong.shtml



After reading about the philosophy behind the Raspberry Pi, with its emphasis on teaching programming, I dug out a book I own called Problems For Computer Solution, which I have used on occasion for learning. I was also asked, while talking about the ARM processor in my SheevaPlug at my local Linux User Group, how much faster code is when written in assembly language.

As an experiment, I chose to code the seventh problem, to locate all of the Armstrong numbers of 2, 3 or 4 digits.
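The text does not spell out what an Armstrong number is. Judging from the code in the appendix, a number qualifies when the sum of its digits, each raised to the power of the digit count, equals the number itself. The following is a minimal sketch of that test in C using integer arithmetic only. The names is_armstrong, int_pow and digit_count are illustrative and do not come from the original code, and the sketch is only meant for small values such as the range searched here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Raise base to a small non-negative exponent by repeated multiplication. */
static uint32_t int_pow(uint32_t base, unsigned exponent)
{
    uint32_t result = 1;
    while (exponent-- > 0)
        result *= base;
    return result;
}

/* Count the decimal digits of n (n = 0 counts as one digit). */
static unsigned digit_count(uint32_t n)
{
    unsigned count = 1;
    while (n >= 10) {
        n /= 10;
        count++;
    }
    return count;
}

/* True when the sum of each digit raised to the digit count equals n.
 * Intended for small n (up to six digits or so); larger values could
 * overflow the 32-bit sum. */
bool is_armstrong(uint32_t n)
{
    unsigned width = digit_count(n);
    uint32_t sum = 0;
    for (uint32_t rest = n; rest > 0; rest /= 10)
        sum += int_pow(rest % 10, width);
    return sum == n;
}
```

For example, is_armstrong(153) holds because 1^3 + 5^3 + 3^3 = 153, and is_armstrong(9474) holds because 9^4 + 4^4 + 7^4 + 4^4 = 9474.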

I coded the different versions in the following order:

  1. Perl - short and clear (armstrong4.pl).
  2. C - using snprintf into a string to separate the digits. A little more involved (armstrong4string.c).
  3. Assembly language - I sketched a flow chart and then coded it (armstrong.s).
  4. Assembly language with a macro - I realised that I was repeating code in the previous version so abstracted it to a macro (armstrong4macro.s).
  5. A version in C which uses division to separate the digits and follows a similar algorithm to the assembly language version (armstrong4divide.c).

The code is listed in the appendix.

Timing

Here is the cpuinfo for the machine.

    bob@poland:~/src/problems_for_computer_solution/07_armstrong_numbers$ cat /proc/cpuinfo
    Processor: Feroceon 88FR131 rev 1 (v5l)
    BogoMIPS: 1192.75
    Features: swp half thumb fastmult edsp
    CPU implementer: 0x56
    CPU architecture: 5TE
    CPU variant: 0x2
    CPU part: 0x131
    CPU revision: 1
    Hardware: Marvell SheevaPlug Reference Board
    Revision: 0000
    Serial: 0000000000000000

I extended the search space to 5 and 6 digits to allow for longer runtimes.

Each column shows the output of time for one program: time perl armstrongN.pl, time ./armstrongNstring, time ./armstrongNdivide and time ./armstrongNmacro, where N is the maximum number of digits.

    Maximum number
    of digits       Measure  Perl        C - string  C - divide  Assembly code
    4               real     0m0.583s    0m0.267s    0m0.256s    0m0.007s
                    user     0m0.580s    0m0.270s    0m0.260s    0m0.020s
                    sys      0m0.000s    0m0.000s    0m0.000s    0m0.000s
    5               real     0m6.202s    0m3.302s    0m3.198s    0m0.044s
                    user     0m6.180s    0m3.300s    0m3.200s    0m0.060s
                    sys      0m0.020s    0m0.000s    0m0.000s    0m0.000s
    6               real     1m10.881s   0m39.312s   0m40.903s   0m0.512s
                    user     1m10.650s   0m39.200s   0m38.230s   0m0.510s
                    sys      0m0.010s    0m0.000s    0m0.000s    0m0.000s
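To turn the timings above into a "how many times faster" figure, divide the slower wall-clock time by the faster one. The small C sketch below does that arithmetic. The helper names to_seconds and speedup are made up for illustration, and since each figure comes from a single run the ratios are only indicative.

```c
#include <assert.h>

/* Convert a `time` reading such as 1m10.881s to seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times longer the slow run took than the fast run. */
double speedup(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);   /* a zero reading cannot be compared */
    return slow_seconds / fast_seconds;
}
```

With the six-digit "real" readings, speedup(to_seconds(0, 39.312), to_seconds(0, 0.512)) gives roughly 77 for the C string version against the assembly version, and speedup(to_seconds(1, 10.881), to_seconds(0, 0.512)) gives roughly 138 for Perl against assembly.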

The assembly language is the first draft, apart from the abstraction of the macro. It could probably be optimised further to shave a few cycles if performance were important. The ARM is a RISC processor, and the version I have in the SheevaPlug (ARMv5TE) has no divide instruction (though I believe some ARMv7 processors do). Division can be achieved by repeated subtraction and counting, which is the approach followed here.
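For readers more comfortable with C than with ARM assembly, division by repeated subtraction and counting can be sketched as follows. This is an illustration of the idea, not a transcription of the assembly in the appendix (which counts up multiples of 1000, 100 and 10 and compares against them), and the names divmod and divide_by_subtraction are assumptions introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient and remainder produced without a divide instruction. */
struct divmod {
    uint32_t quotient;
    uint32_t remainder;
};

/* Subtract the divisor until what is left is smaller than it,
 * counting the subtractions; the count is the quotient. */
struct divmod divide_by_subtraction(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0);   /* the loop would never end otherwise */
    struct divmod result = { 0, dividend };
    while (result.remainder >= divisor) {
        result.remainder -= divisor;
        result.quotient++;
    }
    return result;
}
```

Dividing 9474 by 1000 this way takes nine subtractions and leaves 474, which is how the leading digit is peeled off. The loop count is bounded by the digit value (at most nine here), so the approach is cheap for digit extraction even though it would be slow for general division.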

Engineering is often a tradeoff between different constraints - here coding time and run time. If the code is to be run once - or once a day, then it makes sense to write it in Perl (or some other high-level language); if, however, it is to be run a million times per day then it makes sense to invest the time to make it run efficiently.

I documented some preliminary investigations into assembly language programming on the ARM here.

I have just been reading about Thumb mode, which allows the 32-bit processor to run 16-bit instructions. There are, however, restrictions on what is permissible in this mode, and I am not convinced of the benefits of smaller instructions (quicker to load and execute?). I was nonetheless curious to see whether the switch (.thumb) would work, and whether the code ran faster. It may, but that requires investigation, which I may yet do.

I have no experience of teaching, so if anyone has any ideas as to how I could improve this page, or the code, please email me.

Arnaud tested the code on his Nokia 900 phone, which is an ARMv7 with approx 250 BogoMIPS (cf. the SheevaPlug with approx 1000 BogoMIPS). The relative performances of Perl, C and assembly language were similar to those seen on the SheevaPlug.

Appendix - The code

  1. Perl version

      #!/usr/bin/perl
      use strict;
      use warnings;

      foreach my $number (10 .. 9999) {
        my $size = length $number;
        my @digits = split(//, $number);
        my $total = 0;
        for (my $index = 0; $index < $size; $index++) {
          $total += $digits[$index] ** $size;
        }
        print "ARMSTRONG NUMBER is $number\n" if ($total == $number);
      }
  2. C versions

     N.B. These are functionally equivalent.

    • First version using a string

      #include <stdio.h>
      #include <math.h>
      #include <stdlib.h>
      #include <string.h>   /* strnlen */

      /* we allocate sufficient space to store the widest integer */
      #define MAXWIDTH 4
      /* numeric string characters are offset from their value */
      #define NUMOFFSET 48

      int main()
      {
        int number;
        for (number=10; number < 10000; number++)
        {
          char string[MAXWIDTH+1] = {0};
          snprintf(string, MAXWIDTH+1, "%d", number);
          int numlen = strnlen(string, MAXWIDTH);
          int total = 0;
          int j;
          for (j=0; j < numlen; j++)
          {
            int digit = string[j] - NUMOFFSET;
            total += pow(digit, numlen);
          }
          if (total == number)
            printf("ARMSTRONG NUMBER is %d\n", total);
        }
        exit(0);
      }

    • Second version using division

      #include <stdio.h>
      #include <stdint.h>
      #include <stdlib.h>
      #include <math.h>

      /* work on base 10 */
      #define BASE 10

      int main()
      {
        uint8_t numlen = 2;
        uint16_t number;
        for (number=10; number < 10000; number++)
        {
          /* the digit count only ever grows as number increases */
          if (number >= 1000)
            numlen = 4;
          else if (number >= 100)
            numlen = 3;
          uint32_t counter = number;
          uint8_t digit = counter % BASE;
          uint32_t armstrong = pow(digit, numlen);
          /* strip the last digit; the loop ends when nothing is left */
          while ((counter = (uint32_t) floor(counter / BASE)))
          {
            digit = counter % BASE;
            armstrong += pow(digit, numlen);
          }
          if (armstrong == number)
            printf("ARMSTRONG NUMBER is %u\n", (unsigned) armstrong);
        }
        exit(0);
      }
  3. Assembly language

    • Power function

      # this subroutine returns the passed digit to the passed power
      #
      # inputs
      #   r0 - digit
      #   r1 - power
      #
      # outputs
      #   r0 - digit ** power
      #
      # locals
      #   r4

      .globl _power
      .align 2
              .text
      _power:
              nop
              stmfd   sp!, {r4, lr}           @ save variables to stack
              subs    r1, r1, #1              @ leave unless power > 1
              ble     _power_end
              mov     r4, r0                  @ copy digit
      _power_loop_start:
              mul     r0, r4, r0              @ raise to next power
              subs    r1, r1, #1
              beq     _power_end              @ leave when done
              b       _power_loop_start       @ next iteration
      _power_end:
              ldmfd   sp!, {r4, pc}           @ restore state from stack and leave subroutine
    • Armstrong function

      # inputs
      #   r0 - number
      #
      # outputs
      #   r0 - armstrong number
      #
      # local r4, r5, r6, r7, r8

      .equ ten,10
      .equ hundred,100
      .equ thousand,1000
      .equ ten_thousand,10000

      number .req r4
      width .req r5
      digit .req r6
      current .req r7
      armstrong .req r8

      .globl _armstrong
      .align 2
              .text
      _armstrong:
              nop
              stmfd   sp!, {r4, r5, r6, r7, r8, lr}   @ save variables to stack
              mov     number, r0              @ copy passed parameter to working number
              cmp     number, #ten            @ exit unless number >= 10
              blt     _end
              ldr     current, =ten_thousand  @ exit unless number < 10000
              cmp     number, current
              bge     _end
              mov     width, #0               @ initialise
              mov     digit, #0
              mov     armstrong, #0
              ldr     current, =thousand      @ handle 1000 digit
      _thousand_start:
              cmp     number, current
              blt     _thousand_end           @ exit thousand code if none left
              mov     width, #4               @ width must be 4
              add     current, current, #thousand     @ bump thousand counter
              add     digit, digit, #1        @ and corresponding digit count
              b       _thousand_start         @ and loop
      _thousand_end:
              add     number, number, #thousand       @ need number modulo thousand
              sub     number, number, current
              mov     r0, digit               @ push digit
              mov     r1, width               @ and width
              bl      _power                  @ to compute digit ** width
              add     armstrong, r0, armstrong        @ and update armstrong number with this value
              ldr     current, =hundred       @ then we do the hundreds as we did the thousands
              mov     digit, #0
      _hundred_start:
              cmp     number, current
              blt     _hundred_end
              teq     width, #0               @ and only set width if it is currently unset
              moveq   width, #3
      _hundred_width:
              add     current, current, #hundred      @ yada yada as thousands above
              add     digit, digit, #1
              b       _hundred_start
      _hundred_end:
              add     number, number, #hundred
              sub     number, number, current
              mov     r0, digit
              mov     r1, width
              bl      _power
              add     armstrong, r0, armstrong
              ldr     current, =ten           @ then the tens as the hundred and thousands above
              mov     digit, #0
      _ten_start:
              cmp     number, current
              blt     _ten_end
              teq     width, #0
              moveq   width, #2
      _ten_width:
              add     current, current, #ten
              add     digit, digit, #1
              b       _ten_start
      _ten_end:
              add     number, number, #ten
              sub     number, number, current
              mov     r0, digit
              mov     r1, width
              bl      _power
              add     armstrong, r0, armstrong
              mov     r0, number              @ then add in the trailing digits
              mov     r1, width
              bl      _power
              add     armstrong, r0, armstrong
              mov     r0, armstrong           @ and copy the armstrong number back to r0 for return
      _end:
              ldmfd   sp!, {r4, r5, r6, r7, r8, pc}   @ restore state from stack and leave subroutine
    • Armstrong function with a macro to abstract repeated code

      N.B. This is functionally equivalent to, but much shorter than, the previous function. The \@ here is a magic variable maintained by the assembler and incremented each time a macro is instantiated. This gives each expansion distinct labels, which we need here.

      # inputs
      #   r0 - number
      #
      # outputs
      #   r0 - armstrong number
      #
      # local r4, r5, r6, r7, r8

      .equ ten,10
      .equ hundred,100
      .equ thousand,1000
      .equ ten_thousand,10000

      number .req r4
      width .req r5
      digit .req r6
      current .req r7
      armstrong .req r8

      .macro armstrong_digit a, b
              ldr     current, =\a
              mov     digit, #0
      _start\@:
              cmp     number, current
              blt     _end\@
              teq     width, #0               @ and only set width if it is currently unset
              moveq   width, #\b
              add     current, current, #\a
              add     digit, digit, #1
              b       _start\@
      _end\@:
              add     number, number, #\a
              sub     number, number, current
              mov     r0, digit
              mov     r1, width
              bl      _power
              add     armstrong, r0, armstrong
      .endm

      .globl _armstrong
      .align 2
              .text
      _armstrong:
              nop
              stmfd   sp!, {r4, r5, r6, r7, r8, lr}   @ save variables to stack
              mov     number, r0              @ copy passed parameter to working number
              cmp     number, #ten            @ exit unless number >= 10
              blt     _end
              ldr     current, =ten_thousand  @ exit unless number < 10000
              cmp     number, current
              bge     _end
              mov     width, #0               @ initialise
              mov     armstrong, #0
              armstrong_digit thousand 4
              armstrong_digit hundred 3
              armstrong_digit ten 2
              mov     r0, number              @ then add in the trailing digits
              mov     r1, width
              bl      _power
              add     armstrong, r0, armstrong
              mov     r0, armstrong           @ and copy the armstrong number back to r0 for return
      _end:
              ldmfd   sp!, {r4, r5, r6, r7, r8, pc}   @ restore state from stack and leave subroutine
    • Armstrong_main function

      .equ ten,10
      .equ ten_thousand,10000

      .section .rodata
      .align 2
      string:
      .asciz "armstrong number of %d is %d\n"

      .text
      .align 2
      .global main
      .type main, %function
      main:
              ldr     r5, =ten
              ldr     r6, =ten_thousand
              mov     r4, r5                  @ start with n = 10
      _main_loop:
              cmp     r4, r6                  @ leave if n = 10_000
              beq     _main_end
              mov     r0, r4                  @ call the _armstrong function
              bl      _armstrong
              teq     r0, r4                  @ if the armstrong value = n print it
              bne     _main_next              @ else skip
              mov     r2, r0
              mov     r1, r4
              ldr     r0, =string             @ store address of start of string to r0
              bl      printf                  @ call the c function to display information
      _main_next:
              add     r4, r4, #1
              b       _main_loop
      _main_end:
              mov     r7, #1                  @ set r7 to 1 - the syscall for exit
              swi     0                       @ then invoke the syscall from linux
  4. A Makefile for the armstrong code

      AS      := /usr/bin/as
      CC      := /usr/bin/gcc
      LD      := /usr/bin/ld

      ASOPTS  := -gstabs
      CCOPTS  := -g
      CLIBS   := -lm

      all: armstrong4 armstrong5 armstrong6

      #harness: harness.s armstrong4macro.s power.s
      #armstrong: armstrong4main.s armstrong.s power.s

      armstrong4: armstrong4macro armstrong4string armstrong4divide
      armstrong4macro: armstrong4main.s armstrong4macro.s power.s
      armstrong4string: armstrong4string.c
      armstrong4divide: armstrong4divide.c

      armstrong5: armstrong5macro armstrong5string armstrong5divide
      armstrong5macro: armstrong5main.s armstrong5macro.s power.s
      armstrong5string: armstrong5string.c
      armstrong5divide: armstrong5divide.c

      armstrong6: armstrong6macro armstrong6string armstrong6divide
      armstrong6macro: armstrong6main.s armstrong6macro.s power.s
      armstrong6string: armstrong6string.c
      armstrong6divide: armstrong6divide.c

      %: %.c
      	$(CC) $(CCOPTS) -o $@ $^ $(CLIBS)

      clean:
      	rm -f armstrong harness armstrong4macro armstrong4string armstrong4divide armstrong5macro armstrong5string armstrong5divide armstrong6macro armstrong6string armstrong6divide