Starting with LLVM and Encore

Published: October 17, 2016

For my PhD, I need to work on typed optimisations for the Encore language. Partly for fun and partly to learn how it is done, I am going to describe the steps I take to perform alias analysis with different tools from the LLVM suite.

For instance, let’s say I would like to know whether there is any potential benefit in using the restrict qualifier in the C code generated by Encore. To find out, I am going to use the PingPong benchmark and explain how to generate C code from Encore, translate the C to LLVM IR, link the resulting files together, perform alias analysis, (possibly) optimise the code, and generate an assembly file that the system’s compiler can turn into an executable.

Steps:

Step 1: encorec Main.enc transpiles Encore to C
Step 2: Compile each *.c file to LLVM IR (this generates files with the extension *.ll). For example, for shared.c:

clang-3.8 -S -emit-llvm -std=gnu11 -Wall -fms-extensions -Wno-format \
 -Wno-microsoft -Wno-parentheses-equality -Wno-unused-variable \
 -Wno-unused-value -Wno-attributes \
 -I /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift-migrator/sdk/MacOSX.sdk/usr/include \
 -I /Users/kikofernandezreyes/Code/encore/release/inc/ \
 -I . -O3 shared.c

Step 3: Next, link all the LLVM IR files together, for which you can use llvm-link-3.8:

llvm-link-3.8 -S -v -o runner.ll Main.encore.ll shared.ll PingActor.encore.ll \
  PongActor.encore.ll String.encore.ll

With the -S option, I ask llvm-link to write its output as textual LLVM IR rather than bitcode.

Step 4: Now that everything has been linked together, we can run various analyses. In this particular case, we are going to perform alias analysis:

opt-3.8 -O3 -S -basicaa -aa -adce -scev-aa -tbaa -aa-eval runner.ll -stats -o runnerOpt.ll

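The -basicaa, -scev-aa and -tbaa flags select different kinds of alias analysis, -aa-eval evaluates their precision, and -stats prints the pass statistics. The report is printed to the terminal, while the IR is written to runnerOpt.ll: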

  ===== Alias Analysis Evaluator Report =====
  Total Alias Queries Performed
  no alias responses (9.4%)
  may alias responses (87.5%)
  partial alias responses (1.3%)
  must alias responses (1.6%)
  Alias Analysis Evaluator Pointer Alias Summary: 9%/87%/1%/1%
  126109 Total ModRef Queries Performed
  no mod/ref responses (78.8%)
  mod responses (0.0%)
  ref responses (1.8%)
  mod & ref responses (19.2%)
  Alias Analysis Evaluator Mod/Ref Summary: 78%/0%/1%/19%
  ===-------------------------------------------------------------------------===
  ... Statistics Collected ...
  ===-------------------------------------------------------------------------===
basicaa          - Number of times a GEP is decomposed
bdce             - Number of instructions trivialized (dead bits)
cgscc-passmgr    - Maximum CGSCCPassMgr iterations on one SCC
constmerge       - Number of global constants merged
deadargelim      - Number of unread args replaced with undef
early-cse        - Number of instructions CSE'd
functionattrs    - Number of arguments marked nocapture
functionattrs    - Number of arguments marked readonly
globalsmodref-aa - Number of functions that do not access memory
globalsmodref-aa - Number of functions that only read memory
gvn              - Number of instructions deleted
gvn              - Number of instructions simplified
gvn              - Number of loads deleted
indvars          - Number of indvars widened
indvars          - Number of loop exit tests replaced
inferattrs       - Number of arguments inferred as readonly
inline           - Number of functions inlined
inline-cost      - Number of call sites analyzed
instcombine      - Number of dead inst eliminated
instcombine      - Number of instructions sunk
instcombine      - Number of insts combined
instcombine      - Number of library calls simplified
lcssa            - Number of live out of a loop variables
loop-simplify    - Number of pre-header or exit blocks inserted
loop-unswitch    - Total number of instructions analyzed
loop-vectorize   - Number of loops analyzed for vectorization
memdep           - Number of block queries that were completely cached
memdep           - Number of fully cached non-local ptr responses
memdep           - Number of uncached non-local ptr responses
reassociate      - Number of insts reassociated
scalar-evolution - Number of loops with predictable loop counts
scalar-evolution - Number of loops without predictable loop counts
sccp             - Number of instructions removed by IPSCCP
simplifycfg      - Number of blocks simplified
sroa             - Number of allocas analyzed for replacement
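
To make the four response categories in the report more concrete, here is a small C sketch with one pointer pair per category. The names (add_into, alias_categories_demo, struct pair) are hypothetical and not taken from the Encore-generated code, and the category in each comment is my reading of LLVM's definitions rather than output from the run above. What opt actually answers depends on which analyses are enabled and how much context they can see.

```c
#include <assert.h>

struct pair { int first; int second; };

/* dst and src are plain pointer parameters: the callee cannot tell whether
   callers pass overlapping addresses, so this pair is typically "may alias". */
static void add_into(int *dst, const int *src) {
    *dst += *src;
}

/* Walks through one pointer pair per category and returns a checksum. */
int alias_categories_demo(void) {
    int x = 1, y = 2;            /* distinct locals: &x and &y are "no alias" */
    int *p = &x;
    int *q = &x;                 /* same address and size as p: "must alias" */
    struct pair pr = {10, 20};
    int *field = &pr.second;     /* lies inside pr, so &pr and field overlap:
                                    the "partial alias" case */
    assert(p == q);

    add_into(&x, &y);            /* x becomes 3 */
    *q += 1;                     /* x becomes 4, visible through p as well */
    *field += 1;                 /* pr.second becomes 21 */
    return *p + y + pr.second;   /* 4 + 2 + 21 */
}
```

The large share of "may alias" responses in the report corresponds to the add_into situation: pointers arrive as arguments and nothing rules out an overlap.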

Step 5: We now know the aliasing percentages of the original version. Next, we modify the C code and add the restrict qualifier to pointer parameters where the calls satisfy the restrict requirements.
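
As a rough illustration of this kind of change, the sketch below shows the same function without and with restrict. The function names and signatures are hypothetical and not taken from the C that Encore generates. The requirement is that, for the duration of the call, memory written through one restrict-qualified pointer is not accessed through the other. Clang lowers restrict-qualified parameters to the noalias attribute in the LLVM IR, which is what the alias analysis can then exploit.

```c
#include <assert.h>
#include <stddef.h>

/* Without restrict: dst and src might overlap, so the compiler has to
   assume that a store to dst[i] can change a later load from src[j]. */
void scale_plain(int *dst, const int *src, size_t n, int k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* With restrict: the caller promises that dst and src do not overlap
   during the call. Breaking that promise is undefined behaviour. */
void scale_restrict(int *restrict dst, const int *restrict src,
                    size_t n, int k) {
    /* Cheap sanity check for debug builds. It only catches the case where
       both pointers are identical, not a partial overlap. */
    assert(n == 0 || dst != src);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

Both versions compute the same result when the arrays are disjoint; the difference is only in what the optimiser is allowed to assume.
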
Step 6: After these changes, repeat steps 2 - 4 and check the output report:

  ===== Alias Analysis Evaluator Report =====
  Total Alias Queries Performed
  no alias responses (29.8%)
  may alias responses (66.7%)
  partial alias responses (1.4%)
  must alias responses (1.8%)
  Alias Analysis Evaluator Pointer Alias Summary: 29%/66%/1%/1%
  123674 Total ModRef Queries Performed
  100143 no mod/ref responses (80.9%)
  mod responses (0.0%)
  ref responses (1.8%)
  mod & ref responses (17.1%)
  Alias Analysis Evaluator Mod/Ref Summary: 80%/0%/1%/17%
  ===-------------------------------------------------------------------------===
  ... Statistics Collected ...
  ===-------------------------------------------------------------------------===

basicaa          - Number of times a GEP is decomposed
bdce             - Number of instructions trivialized (dead bits)
cgscc-passmgr    - Maximum CGSCCPassMgr iterations on one SCC
constmerge       - Number of global constants merged
deadargelim      - Number of unread args replaced with undef
early-cse        - Number of instructions CSE'd
functionattrs    - Number of arguments marked nocapture
functionattrs    - Number of arguments marked readonly
globalsmodref-aa - Number of functions that do not access memory
globalsmodref-aa - Number of functions that only read memory
gvn              - Number of instructions deleted
gvn              - Number of instructions simplified
gvn              - Number of loads deleted
indvars          - Number of indvars widened
indvars          - Number of loop exit tests replaced
inferattrs       - Number of arguments inferred as readonly
inline           - Number of functions inlined
inline-cost      - Number of call sites analyzed
instcombine      - Number of dead inst eliminated
instcombine      - Number of instructions sunk
instcombine      - Number of insts combined
instcombine      - Number of library calls simplified
lcssa            - Number of live out of a loop variables
loop-simplify    - Number of pre-header or exit blocks inserted
loop-unswitch    - Total number of instructions analyzed
loop-vectorize   - Number of loops analyzed for vectorization
memdep           - Number of block queries that were completely cached
memdep           - Number of fully cached non-local ptr responses
memdep           - Number of uncached non-local ptr responses
memory-builtins  - Number of arguments with unsolved size and offset
reassociate      - Number of insts reassociated
scalar-evolution - Number of loops with predictable loop counts
scalar-evolution - Number of loops without predictable loop counts
sccp             - Number of instructions removed by IPSCCP
simplifycfg      - Number of blocks simplified
sroa             - Number of allocas analyzed for replacement

Step 7: If there was a significant improvement, you can use the generated LLVM IR. Let’s assume some optimisations were performed.
Step 7.1: We now need to turn the LLVM IR into something that can be executed. For this step, we use llc, the LLVM static compiler. Here I compile runner.ll, since I only wanted the report; to compile the output of opt instead, pass runnerOpt.ll:

llc-3.8 runner.ll

which produces the runner.s file with the necessary assembly instructions.

Step 8: Finally, the assembly file can be assembled and linked into a runnable binary:

clang -std=gnu11 -Wall -fms-extensions -Wno-format -Wno-microsoft \
 -Wno-parentheses-equality -Wno-unused-variable -Wno-unused-value \
 -lpthread -ldl -lm -Wno-attributes \
  /Users/kikofernandezreyes/Code/encore/release/lib/*.a \
 -I /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift-migrator/sdk/MacOSX.sdk/usr/include \
 -I /Users/kikofernandezreyes/Code/encore/release/inc/ \
 -I . -I /Users/kikofernandezreyes/Code/encore/release/inc/ \
 -I .. /Users/kikofernandezreyes/Code/encore/release/lib/*.a \
 /Users/kikofernandezreyes/Code/encore/release/lib/*.a -O3 runner.s

Don’t forget the dependencies: the include paths for the header files and the static libraries to link against.

NOTE: I am not using clang-3.8 in step 8 because I started to run into errors. I believe clang-3.8 would do the job if you specify the target platform and pass more information, but I work on a Mac, and using the default clang for compilation and linking was easier.

Step 9: You now have your optimised executable, produced with LLVM tools!

Dr Kiko Fernandez-Reyes
