Starting with LLVM and Encore

8 minute read

Published:

For my PhD, I need to work on typed optimisations for the Encore language. For fun and a bit for learning how to do this, I am going to describe the steps I take to perform alias analysis using different tools from the LLVM suite.

For instance, let’s say I would like to know if there could be any potential benefit in using the restrict qualifier in the C code generated by Encore. To do this, I am going to use the PingPong benchmark, and explain how to generate C code from Encore, transform C to LLVM IR, link different files together, perform alias analysis, optimise the code (possibly) and generate an assembly file that can be compiled by the system’s compiler to produce an executable.

Steps:

  • Step 1: encorec Main.enc transpiles Encore to C
  • Step 2: For each *.c file, compile the file to the LLVM IR (generating files with extension *.ll):
clang-3.8 -S -emit-llvm -std=gnu11 -Wall -fms-extensions -Wno-format\
 -Wno-microsoft -Wno-parentheses-equality -Wno-unused-variable\
 -Wno-unused-value -Wno-attributes\
 -I /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift-migrator/sdk/MacOSX.sdk/usr/include\
 -I /Users/kikofernandezreyes/Code/encore/release/inc/ \
 -I . -I /Users/kikofernandezreyes/Code/encore/release/inc/ -O3 shared.c
  • Step 3: After this step, you need to link all the files LLVM IR together, for which you can use llvm-link-3.8:
llvm-link-3.8 -S -v -o runner.ll Main.encore.ll shared.ll PingActor.encore.ll \
  PongActor.encore.ll String.encore.ll

With the -S option, I specify that I would like to produce the output in LLVM IR.

opt-3.8 -O3 -S -basicaa -aa -adce -scev-aa -tbaa -aa-eval runner.ll -stats -o runnerOpt.ll

Some of the flags perform different kinds of alias analysis. The output is written to stdout:

  ===== Alias Analysis Evaluator Report =====
  21425 Total Alias Queries Performed
  2033 no alias responses (9.4%)
  18753 may alias responses (87.5%)
  279 partial alias responses (1.3%)
  360 must alias responses (1.6%)
  Alias Analysis Evaluator Pointer Alias Summary: 9%/87%/1%/1%
  126109 Total ModRef Queries Performed
  99429 no mod/ref responses (78.8%)
  0 mod responses (0.0%)
  2345 ref responses (1.8%)
  24335 mod & ref responses (19.2%)
  Alias Analysis Evaluator Mod/Ref Summary: 78%/0%/1%/19%
  ===-------------------------------------------------------------------------===
  ... Statistics Collected ...
  ===-------------------------------------------------------------------------===
  1854 basicaa          - Number of times a GEP is decomposed
  60 bdce             - Number of instructions trivialized (dead bits)
  1 cgscc-passmgr    - Maximum CGSCCPassMgr iterations on one SCC
  6 constmerge       - Number of global constants merged
  43 deadargelim      - Number of unread args replaced with undef
  12 early-cse        - Number of instructions CSE'd
  13 functionattrs    - Number of arguments marked nocapture
  13 functionattrs    - Number of arguments marked readonly
  10 globalsmodref-aa - Number of functions that do not access memory
  16 globalsmodref-aa - Number of functions that only read memory
  11 gvn              - Number of instructions deleted
  8 gvn              - Number of instructions simplified
  2 gvn              - Number of loads deleted
  1 indvars          - Number of indvars widened
  1 indvars          - Number of loop exit tests replaced
  1 inferattrs       - Number of arguments inferred as readonly
  39 inline           - Number of functions inlined
  93 inline-cost      - Number of call sites analyzed
  63 instcombine      - Number of dead inst eliminated
  1 instcombine      - Number of instructions sunk
  88 instcombine      - Number of insts combined
  6 instcombine      - Number of library calls simplified
  36 lcssa            - Number of live out of a loop variables
  140 loop-simplify    - Number of pre-header or exit blocks inserted
  45 loop-unswitch    - Total number of instructions analyzed
  17 loop-vectorize   - Number of loops analyzed for vectorization
  4 memdep           - Number of block queries that were completely cached
  114 memdep           - Number of fully cached non-local ptr responses
  289 memdep           - Number of uncached non-local ptr responses
  6 reassociate      - Number of insts reassociated
  36 scalar-evolution - Number of loops with predictable loop counts
  11 scalar-evolution - Number of loops without predictable loop counts
  14 sccp             - Number of instructions removed by IPSCCP
  112 simplifycfg      - Number of blocks simplified
  6 sroa             - Number of allocas analyzed for replacement
  • Step 5: After this step, we know the aliasing percentage of the original version. We are going to modify the C code and add the restrict qualifier in the calls that satisfy the restrict requirements.

  • Step 6: After these changes, repeat steps 2 - 4 and check the output report:

  ===== Alias Analysis Evaluator Report =====
  19228 Total Alias Queries Performed
  5748 no alias responses (29.8%)
  12841 may alias responses (66.7%)
  279 partial alias responses (1.4%)
  360 must alias responses (1.8%)
  Alias Analysis Evaluator Pointer Alias Summary: 29%/66%/1%/1%
  123674 Total ModRef Queries Performed
  100143 no mod/ref responses (80.9%)
  0 mod responses (0.0%)
  2315 ref responses (1.8%)
  21216 mod & ref responses (17.1%)
  Alias Analysis Evaluator Mod/Ref Summary: 80%/0%/1%/17%
  ===-------------------------------------------------------------------------===
  ... Statistics Collected ...
  ===-------------------------------------------------------------------------===

   1854 basicaa          - Number of times a GEP is decomposed
   60 bdce             - Number of instructions trivialized (dead bits)
   1 cgscc-passmgr    - Maximum CGSCCPassMgr iterations on one SCC
   6 constmerge       - Number of global constants merged
   43 deadargelim      - Number of unread args replaced with undef
   12 early-cse        - Number of instructions CSE'd
   13 functionattrs    - Number of arguments marked nocapture
   13 functionattrs    - Number of arguments marked readonly
   10 globalsmodref-aa - Number of functions that do not access memory
   16 globalsmodref-aa - Number of functions that only read memory
   11 gvn              - Number of instructions deleted
   8 gvn              - Number of instructions simplified
   2 gvn              - Number of loads deleted
   1 indvars          - Number of indvars widened
   1 indvars          - Number of loop exit tests replaced
   1 inferattrs       - Number of arguments inferred as readonly
   39 inline           - Number of functions inlined
   93 inline-cost      - Number of call sites analyzed
   63 instcombine      - Number of dead inst eliminated
   1 instcombine      - Number of instructions sunk
   88 instcombine      - Number of insts combined
   6 instcombine      - Number of library calls simplified
   36 lcssa            - Number of live out of a loop variables
   140 loop-simplify    - Number of pre-header or exit blocks inserted
   45 loop-unswitch    - Total number of instructions analyzed
   17 loop-vectorize   - Number of loops analyzed for vectorization
   4 memdep           - Number of block queries that were completely cached
   97 memdep           - Number of fully cached non-local ptr responses
   308 memdep           - Number of uncached non-local ptr responses
   4184 memory-builtins  - Number of arguments with unsolved size and offset
   6 reassociate      - Number of insts reassociated
   36 scalar-evolution - Number of loops with predictable loop counts
   11 scalar-evolution - Number of loops without predictable loop counts
   14 sccp             - Number of instructions removed by IPSCCP
   112 simplifycfg      - Number of blocks simplified
   6 sroa             - Number of allocas analyzed for replacement

  • Step 7: If there was any significant improvement, you can use the generated LLVM IR. Let’s assume there were some optimisations performed.

    Step 7.1. After this, we need to convert the LLVM IR generated from the optimiser’s tool (in this case there was no optimisation, just a report) into something that can be executable. For this step, we use llc, which is the LLVM static compiler:

llc-3.8 runner.ll

which produces the runner.s file with the necessary assembly instructions.

  • Step 8: Finally, the assembly can be translated into binary code (runnable):
clang -std=gnu11 -Wall -fms-extensions -Wno-format -Wno-microsoft \
 -Wno-parentheses-equality -Wno-unused-variable -Wno-unused-value \
 -lpthread -ldl -lm -Wno-attributes \
  /Users/kikofernandezreyes/Code/encore/release/lib/*.a \
 -I /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift-migrator/sdk/MacOSX.sdk/usr/include \
 -I /Users/kikofernandezreyes/Code/encore/release/inc/ \
 -I . -I /Users/kikofernandezreyes/Code/encore/release/inc/ \
 -I .. /Users/kikofernandezreyes/Code/encore/release/lib/*.a \
 /Users/kikofernandezreyes/Code/encore/release/lib/*.a -O3 runner.s

Don’t forget to import dependencies (inclusion and linking to static libraries and header files).

NOTE: It’s important to note that I am not using clang-3.8 in step 8, as I started to run into errors. I believe that clang-3.8 does the work if you specify the required platform and pass more information. I work on a Mac and using the default clang for compilation and linking was easier.

  • Step 9: You have now your optimised executable produced with LLVM tools!