6-Profile-feedback optimization

1. Introduction

Profile-feedback optimization, also known as feedback-driven optimization (FDO), can significantly improve the compiler's decisions about which optimizations are beneficial in which parts of the program. On many applications, we have measured a 15-20% improvement due to this optimization.

To provide the profile information to the compiler, a so-called training run is first performed with an instrumented version of the program. In the second pass, the compiler consults the profile data produced in the training run during the various optimizations it performs.

The Simple Executive does not provide a file system, which is where profile data is normally saved. We provide different solutions for gathering profile data on the simulator and on the hardware.

Profile data is emitted when the program exits. For programs that never exit, the bootloader provides a way to force an exit.

Note:
The program used to produce the profile data has to be essentially identical to the program compiled in the second pass. If there is a structural change in the program, the second pass will refuse to accept the profile data file with messages like:
  /tmp/a.c:2: error: coverage mismatch for function 'main' while reading counter 'arcs'
  /tmp/a.c:2: error: checksum is 1f18b8db instead of 112c2d5e

Note:
When compiling the second pass, GCC only issues a note if the profile data file cannot be found. For example:
  executive/cvmx-zone.c:164: note: file obj/cvmx-zone.gcda not found, execution counts estimated

It is expected that if an object file is included in an archive but never actually becomes part of the link, no profile data will be generated for it. In other cases, this note probably means either that GCC was not looking in the place where you put the profile data (see 2. Core-specific profile data for how this can happen) or that the training run failed to produce profiling output, perhaps because the program did not exit normally.

The rest of the page describes these issues in more detail.

2. Core-specific profile data

Given the multi-core nature of Octeon, profile data is always core specific. The compiler run-time has been modified to output profile data for each core. The core number is appended to the name of the profile data file. For example, the profile data for the module foo.c produced on core 3 is named foo.gcda3.

Before compiling in the second pass, you need to provide a .gcda file for each module. This is the file that the compiler will use. You can either rename the per-core data files to end with .gcda or merge the per-core files. Read on for examples of both.
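The naming rule above (module data file plus a trailing core number) can be sketched in a few lines of Python. This is only an illustration for sorting out a directory of per-core files before renaming or merging them. The helper names split_per_core_name and group_by_module are hypothetical and are not part of the SDK.

```python
import re
from collections import defaultdict

# "<anything>.gcda" followed by one or more digits, e.g. "foo.gcda3".
_PER_CORE = re.compile(r"^(?P<base>.+\.gcda)(?P<core>\d+)$")


def split_per_core_name(filename):
    """Return (name_without_core_suffix, core_number), or None if the
    name does not end in .gcda<digits>."""
    match = _PER_CORE.match(filename)
    if match is None:
        return None
    return match.group("base"), int(match.group("core"))


def group_by_module(filenames):
    """Map each '<module>.gcda' name to {core_number: per_core_filename}."""
    groups = defaultdict(dict)
    for name in filenames:
        parsed = split_per_core_name(name)
        if parsed is not None:
            base, core = parsed
            groups[base][core] = name
    return dict(groups)
```

For example, group_by_module(["obj/foo.gcda0", "obj/foo.gcda13"]) yields a single entry keyed by obj/foo.gcda that lists cores 0 and 13, which makes it easy to see which modules are missing data for a given core.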

3. Building for training (1st) phase

The SDK build system provides a makefile variable, FDO_PASS, to activate profile-feedback compilation. The simplest way to set it is on the make command line. If the variable is set to 1, -fprofile-generate is passed to the compiler, which turns on instrumentation. If the variable is set to 2, -fprofile-use is passed, in which case GCC looks for the profile data files (*.gcda).

For example, to build the SDK example passthrough for a training run, use these steps:

  examples/passthrough$ make -s clean
  examples/passthrough$ make FDO_PASS=1
  cvmx-config config/executive-config.h
  mipsisa64-octeon-elf-gcc -fprofile-generate -g -O2 -c -o obj/passthrough.o passthrough.c ......

4. Collecting profile data using the Octeon simulator

The Octeon simulator maps I/O operations to the appropriate I/O operation of the host system (see Simulator "magic"). Thus the profile data generated by the GCC run-time will appear as regular files on the host system.

For example, to run the SDK example passthrough and generate profile data, perform these steps:

  examples/passthrough$ make FDO_PASS=1 run
  oct-packet-io -p 2000 -c 1 -o output-%d.data -i input-0.data -i input-1.data -i input-2.data -i input-3.data &
  oct-sim passthrough -quiet -serve=2000 -maxcycles=120000000 -numcores=4
  Starting simulator.......

At this point, the per-core profile data files are under obj/. You can now rebuild the application using the profile data; see 7. Rebuilding the application using profile data.

5. Forcing programs to exit

Profile data is written to the console only at exit time, through an atexit() handler. Some programs run forever and never call exit() or return from main() programmatically. To help collect profile data for such applications, we provide an option, break, to the boot command bootoct. It forces the program to exit upon receiving a ctrl-c character on the console.

Note:
On the simulator, in order to pass break to bootoct you need to use the -envfile simulator option (see oct-sim (simulator wrapper script)). You also need to use the -uart0/-uart1 option (see Simulator Arguments) to issue ctrl-c on the console. With most telnet clients, ctrl-c is associated with the protocol command interrupt. You need to disable that so that ctrl-c is passed to the serial port unchanged:
  $ telnet localhost 1234
  Trying 127.0.0.1...
  Connected to localhost.localdomain.
  Escape character is '^]'.
  U-Boot 1.1.1 (U-boot build #: 189) (SDK version: 1.7.3-257) (Build time: May 22  2008 - 21:00:00)
  ...
  ^]
  telnet> unset interrupt
  interrupt character is 'off'.
  ^C

^] stands for the escape character that enters command mode. After hitting ctrl-c, you might also need to press Enter to flush out the ctrl-c in line mode.

6. Collecting profile data on the hardware through the serial console

The Simple Executive environment does not provide a file system. To collect profile data, we extended newlib (libc) to provide a very simple write-only, memory-based file system. A few low-level file I/O operations (open, write, close) maintain the contents of the files in memory. Upon exit, the contents of the files are dumped to the console in uuencoded form.

The memory used for the files is allocated from the heap. If the heap is exhausted while profile data is being written, you should see the following message:

  memfile: out of memory

If necessary you can increase the heap size when booting (see Octeon Bootloader).

You should capture the console output of the profile run and then run oct-uudecode on the resulting file. Unlike the standard uudecode tool, oct-uudecode can decode multiple files from a single input.
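To make the idea of decoding several files from one console capture concrete, here is a minimal Python sketch. It is not the oct-uudecode implementation; it only assumes the standard uuencode framing ("begin <mode> <path>" ... "end") and uses the standard-library call binascii.a2b_uu to decode each data line. The function name decode_uu_log is hypothetical. The sketch also assumes that no other console output is interleaved inside a begin/end section.

```python
import binascii


def decode_uu_log(log_text):
    """Return {path: bytes} for every 'begin ... end' section in a console log.

    Lines outside a section (boot messages, prompts) are ignored.
    """
    files = {}
    current_path = None
    chunks = []
    for line in log_text.splitlines():
        if current_path is None:
            parts = line.split(None, 2)
            # A section starts with: begin <octal mode> <path>
            if len(parts) == 3 and parts[0] == "begin" and parts[1].isdigit():
                current_path = parts[2]
                chunks = []
        elif line.strip() == "end":
            files[current_path] = b"".join(chunks)
            current_path = None
        elif line:
            # One uuencoded line holds at most 45 bytes of data.
            chunks.append(binascii.a2b_uu(line.encode("ascii")))
    return files
```

A caller would read the captured session log as text, pass it to decode_uu_log, and write each returned entry to its path. Whitespace inside the data lines is significant, so the log must be captured without reflowing or trimming.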

Here is a sample session using the break option with the SDK example passthrough:

  $ kermit
  C-Kermit 8.0.211, 10 Apr 2004, for Linux
   Copyright (C) 1985, 2004,
    Trustees of Columbia University in the City of New York.
  Type ? or HELP for help.
  C-Kermit>send log session session.log
  C-Kermit>c
  Connecting to /dev/ttyUSB0, speed 115200
   Escape character: Ctrl-\ (ASCII 28, FS): enabled
  Type the escape character followed by C to get back,
  or followed by ? to see other options.
  Session Log: session.log, text
  ----------------------------------------------------
  Octeon ebt3000# bootoct 0x100000 coremask=ffff break
  Bootloader: Booting Octeon Executive application at 0x00100000, core mask: 0xffff, stack size: 0x100000, heap size: 0x300000
  Bootloader: Done loading app on coremask: 0xffff
  PP0:~CONSOLE-> Version: Cavium Networks Octeon SDK version 1.6.0, build 193
  PP0:~CONSOLE-> Port 16: Up     1Gbs Full duplex
  PP0:~CONSOLE-> Port 17: Down   1Gbs Half duplex
  PP0:~CONSOLE-> Port 18: Down   1Gbs Half duplex
  PP0:~CONSOLE-> Port 19: Down   1Gbs Half duplex
  PP0:~CONSOLE-> Disabling backpressure

In this example we train with the traffic generator. After generating enough traffic we force the program to exit by hitting control-C on the console.

Note:
Since the break handler needs to write to the console, breaks are ignored unless the handler is able to secure access to the console. If the application is using the console extensively, it may take hitting ctrl-c a few times before the dump starts:
  PP0:~CONSOLE-> Disabling backpressure
  ^C
  Dumping files on core 0
  begin 644 examples/passthrough/obj/passthrough.gcda0
  M9V-D830P,2KFX&JS 0        (   "H3^8_ @&A      !8     0
  M   !          $                    !          $          0
  ...
  begin 644 examples/passthrough/obj/octeon-model.gcda13
  M9V-D830P,2KFX*(: 0        (    ND,M)Y &A       H
  M
  M
  M
  M                      $        "    &Q%MCS(!H0       @
  M    H0        D         %0    $
  M  "C        "22];ZT   97     0-*/+$      (//Z@      @\_J  %
  `
  end
  Done dumping files on core 13

Note:
Cores can dump their profile data in any order. Make sure all have finished before you proceed to the next step.
Now you can extract the files using oct-uudecode.

  C-Kermit>q
  Closing /dev/ttyUSB0...OK
  $ oct-uudecode session.log
  examples/passthrough/obj/passthrough.gcda0
  examples/passthrough/obj/cvmx-fpa.gcda0
  examples/passthrough/obj/cvmx-pko.gcda0
  examples/passthrough/obj/cvmx-sysinfo.gcda0
  examples/passthrough/obj/cvmx-coremask.gcda0
  ...
  examples/passthrough/obj/cvmx-ebt3000.gcda13
  examples/passthrough/obj/octeon-model.gcda13
  examples/passthrough/obj/cvmx-interrupt.gcda13
  $

At this point per-core profile data files are under examples/passthrough/obj.
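The earlier note about waiting for every core to finish can be checked against the captured log before relying on the extracted files. The following Python sketch looks for the "Done dumping files on core N" lines seen in the sample session and compares them with the coremask passed to bootoct. It assumes that every core in the mask prints that line exactly as in the sample; the function names are hypothetical.

```python
import re

# Completion line as it appears in the sample session above.
_DONE = re.compile(r"^\s*Done dumping files on core (\d+)\s*$", re.MULTILINE)


def cores_in_mask(coremask):
    """Return the core numbers whose bits are set in an integer coremask."""
    return [core for core in range(coremask.bit_length())
            if (coremask >> core) & 1]


def unfinished_cores(log_text, coremask):
    """Return cores in the mask that have no completion line in the log."""
    done = {int(number) for number in _DONE.findall(log_text)}
    return [core for core in cores_in_mask(coremask) if core not in done]
```

With the boot command from the sample session, the call would be unfinished_cores(log_text, 0xffff). An empty result suggests that all sixteen cores completed their dump.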

7. Rebuilding the application using profile data

We pick core 0 for the profile data and copy its files to names that end with .gcda.

Here we assume that core 0 is representative of the whole application. You can also merge all or selected per-core profile data files, or merge profile data from different runs; see 8. Merging profile data.

  examples/passthrough$ for i in obj/*.gcda0; do cp "$i" "${i%0}"; done
  examples/passthrough$ make -s clean  # preserves .gcda files
  examples/passthrough$ make FDO_PASS=2 run
  cvmx-config config/executive-config.h
  mipsisa64-octeon-elf-gcc -fprofile-use -g -O2 -c -o obj/passthrough.o passthrough.c ......

8. Merging profile data

You can merge profile data generated for different cores or across different executions of the program. This enables incrementally training the program based on the different run-time conditions.

The merging tool is called merge-gcda:

  $ mipsisa64-octeon-elf-merge-gcda --help
  Usage: mipsisa64-octeon-elf-merge-gcda <option(s)> <files>
     or mipsisa64-octeon-elf-merge-gcda <option(s)> <directories>
     or mipsisa64-octeon-elf-merge-gcda <option(s)>

  Merge GCC profile output files.

  In the first variant <file(s)> are merged into a single output file.  By
  default the output filename is derived from the first file name by
  removing the trailing numeric characters.

  In the second variant, files with the identical names from <directories>
  are merged into files with the same name under the current directory.

  If neither <files> nor <directories> are specified each
  <module>.gcda{0..15} file in the current directory is merged into a
  <module>.gcda file.

  Possible <option(s)> are:
    -o, --output=<output>  When outputting a single file (first variant) this
                           store the merged data in this file.  When
                           generating multiple files (first and second
                           variant), store the files under this directory.
    -f, --factors=f1:f2:.. When merging, weigh the first input file
                           with integer factor f1, second with f2, etc
    -h, --help             Display help message

If the current directory contains the per-core profile data files for each module, you can merge them per module by executing merge-gcda with no arguments. Continuing the passthrough example from above:

  examples/passthrough$ cd obj
  examples/passthrough/obj$ mipsisa64-octeon-elf-merge-gcda
  Merging these files into cvmx-app-init.gcda:
    cvmx-app-init.gcda0
    cvmx-app-init.gcda1
    cvmx-app-init.gcda10
    cvmx-app-init.gcda11
    cvmx-app-init.gcda12
    cvmx-app-init.gcda13
    cvmx-app-init.gcda14
    cvmx-app-init.gcda15
    cvmx-app-init.gcda2
    cvmx-app-init.gcda3
    cvmx-app-init.gcda4
    cvmx-app-init.gcda5
    cvmx-app-init.gcda6
    cvmx-app-init.gcda7
    cvmx-app-init.gcda8
    cvmx-app-init.gcda9
  Merging these files into cvmx-bootmem-shared.gcda:
    cvmx-bootmem-shared.gcda0
    cvmx-bootmem-shared.gcda1
    cvmx-bootmem-shared.gcda10
    cvmx-bootmem-shared.gcda11
  ...

You can also merge profile data across different runs. Assuming that you have profile data for two runs in the directories run1 and run2, you can merge them by passing the directories to merge-gcda:

  examples/passthrough/obj$ mipsisa64-octeon-elf-merge-gcda ../run1 ../run2
  Merging these files into cvmx-app-init.gcda:
    ../run1/cvmx-app-init.gcda
    ../run2/cvmx-app-init.gcda
  Merging these files into cvmx-bootmem-shared.gcda:
    ../run1/cvmx-bootmem-shared.gcda
    ../run2/cvmx-bootmem-shared.gcda
  ...
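The effect of the --factors option can be pictured with a small Python sketch. It is conceptual only: it treats each input as a plain list of execution counters and assumes that merging means a weighted element-wise sum, which the help text suggests but does not spell out. The real .gcda format and the tool's handling of individual counter types are not modeled, and the function name merge_counters is hypothetical.

```python
def merge_counters(counter_sets, factors=None):
    """Merge equally long counter lists as a weighted element-wise sum.

    counter_sets: list of lists of integers, one list per input file.
    factors: optional integer weights, one per input (default: all 1).
    """
    if not counter_sets:
        raise ValueError("nothing to merge")
    if factors is None:
        factors = [1] * len(counter_sets)
    if len(factors) != len(counter_sets):
        raise ValueError("need exactly one factor per input")
    length = len(counter_sets[0])
    if any(len(counters) != length for counters in counter_sets):
        # Inputs with different shapes come from structurally different builds.
        raise ValueError("counter layouts differ")
    return [
        sum(factor * counters[i] for factor, counters in zip(factors, counter_sets))
        for i in range(length)
    ]
```

Under this model, weighting a run with factors 2:1 makes the first run count twice as much as the second, which is one way to bias the training toward the traffic pattern you consider more typical.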

9. Displaying profile data

The tool gcov can display some aspects of the collected profile data. Because gcov analyzes the profile output in terms of coverage, its output covers only that part of the information: it shows how many times each source line was executed and points out the lines that were never executed.

For example, invoke gcov to display the profile data for passthrough's main module from the previous section:

  examples/passthrough$ mipsisa64-octeon-elf-gcov -o obj passthrough.c
  File 'passthrough.c'
  Lines executed:77.17% of 127
  passthrough.c:creating 'passthrough.c.gcov'
  ...
  examples/passthrough$ cat passthrough.c.gcov
          -:    0:Source:passthrough.c
          -:    0:Graph:obj/passthrough.gcno
          -:    0:Data:obj/passthrough.gcda
          -:    0:Runs:1
          -:    0:Programs:1
          -:    1:/****************************************************************
          -:    2: * Copyright (c) 2003-2005, Cavium Networks. All rights reserved.
  ...
          -:  159:static void application_main_loop(void)
          1:  160:{
          -:  161:    cvmx_wqe_t *    work;
          -:  162:    uint64_t        port;
          -:  163:    cvmx_buf_ptr_t  packet_ptr;
          -:  164:    cvmx_pko_command_word0_t pko_command;
          -:  165:
          -:  166:
          -:  167:    /* Build a PKO pointer to this packet */
          1:  168:    pko_command.u64 = 0;
          1:  169:    if (cvmx_sysinfo_get()->board_type == CVMX_BOARD_TYPE_SIM)
          -:  170:    {
          1:  171:       pko_command.s.size0 = CVMX_FAU_OP_SIZE_64;
          1:  172:       pko_command.s.subone0 = 1;
          1:  173:       pko_command.s.reg0 = FAU_OUTSTANDING;
          -:  174:    }
          -:  175:
          -:  176:    while (1)
          -:  177:    {
          -:  178:
          -:  179:        /* get the next packet/work to process from the POW unit. */
       1518:  180:        if (cvmx_sysinfo_get()->board_type == CVMX_BOARD_SIM_TYPE)
          -:  181:        {
       1518:  182:           work = cvmx_pow_work_request_sync(CVMX_POW_NO_WAIT);
       1518:  183:           if (work == NULL) {
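A .gcov listing like the one above is easy to post-process. The following Python sketch extracts the per-line execution counts and lists the lines that never ran. It relies on the standard gcov text layout (count, line number, source text separated by colons), in which "-" marks lines without executable code and "#####" marks executable lines that were never executed. The "#####" marker does not appear in the excerpt above. The function names are hypothetical.

```python
def parse_gcov(text):
    """Return {line_number: execution_count} for executable source lines."""
    counts = {}
    for line in text.splitlines():
        # Split into count, line number, source; the source may contain colons.
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue
        count = parts[0].strip()
        number = parts[1].strip()
        if not number.isdigit() or int(number) == 0:
            continue  # header records such as "0:Source:..." or malformed lines
        if count == "#####":
            counts[int(number)] = 0
        elif count.isdigit():
            counts[int(number)] = int(count)
        # "-" (no executable code) and any other marker are skipped.
    return counts


def unexecuted_lines(text):
    """Return the sorted line numbers that gcov reports as never executed."""
    return sorted(n for n, count in parse_gcov(text).items() if count == 0)
```

Running unexecuted_lines on the contents of passthrough.c.gcov gives a quick list of code the training run did not reach, which can indicate that the training traffic was not representative.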
