KylmBlog

OffensiveLLVM Part 1

Introduction to LLVM

Disclaimer: I'm a novice with LLVM—my only experience is about two days of writing passes and trying to learn how everything works. If you spot any misinterpretations or errors, let me know! ;)

Have you already heard of OLLVM, a compiler that outputs obfuscated binaries? Probably. But what is it exactly? Can we achieve something similar?

OLLVM is based on LLVM (originally short for Low Level Virtual Machine), which serves as a backend for various compilers. A frontend such as clang lowers source code into an Intermediate Representation called LLVM IR (abbreviated LLIR in this post). The IR is then compiled into machine code by the backend. The key feature here is that you can apply transformations to the code between the IR stage and the final machine code generation. These transformations are called passes.

Passes allow you to modify the code at the granularity of individual instructions. For example, you can add functions, split basic blocks, modify constants, etc.

I’ve been interested in obfuscation since I came across es3n1n’s Bin2Bin obfuscator. A full binary-to-binary obfuscator seemed daunting, so I opted for a simpler source-to-bin approach. After a year of procrastination, I spent a full weekend writing LLVM passes.

The goal of these passes is to:

  • Encrypt constants, strings, and variables, and only decrypt them at runtime.
  • Apply simple MBA (Mixed Boolean-Arithmetic) transformations.
  • Apply CFF (Control Flow Flattening).
  • And finally some “offensive things”, for example replacing GetProcAddress calls with a custom manual resolution function that uses API hashing.
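
To make the MBA and CFF goals more concrete, here is a small hand-written sketch in plain C++ of what transformed code could look like. It is not pass code: mba_add and sum_flattened are hypothetical names chosen for illustration, and a real pass would emit the equivalent IR instead of source.

```cpp
#include <cassert>
#include <cstdint>

// MBA: a + b rewritten with the identity a + b == (a ^ b) + 2 * (a & b).
// Unsigned arithmetic is used so that wrap-around is well defined.
uint32_t mba_add(uint32_t a, uint32_t b) {
    return (a ^ b) + 2u * (a & b);
}

// CFF: the loop "for (i = 0; i < n; ++i) sum += i;" flattened into a
// dispatcher. Every original basic block becomes a case, and a state
// variable decides which block runs next.
uint32_t sum_flattened(uint32_t n) {
    uint32_t i = 0, sum = 0;
    int state = 0; // 0: loop condition, 1: loop body, 2: exit
    for (;;) {
        switch (state) {
        case 0: state = (i < n) ? 1 : 2; break;
        case 1: sum += i; ++i; state = 0; break;
        case 2: return sum;
        default: assert(false && "unknown dispatcher state"); return sum;
        }
    }
}
```

Both functions compute the same results as their straightforward versions (mba_add(10, 12) is 22, sum_flattened(5) is 10); only the shape of the code changes, which is the whole point of these transformations.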

A nice thing is that these transformations happen between IR and MC (machine code), and many languages use LLVM (C, C++, Rust, Nim, and even Go with some OSS compilers), which makes the approach quite versatile.


Compiling LLVM

I’m doing this on Windows, so some steps might differ on Linux.

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
cmake -S llvm -B build -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang
ninja -C build

We mentioned earlier that passes let you change IR with instruction-level granularity. But how do you write them? LLVM supports several kinds of passes (FunctionPass, ModulePass, LoopPass, MachineFunctionPass, etc.) and two ways of building them: in-tree and out-of-tree. In this example, I’ll focus on out-of-tree because it’s easier and more portable, and only on FunctionPass for now.

A simple pass can be decomposed into two parts:

  • Registration: How we link our pass into LLVM.
  • Execution: The actual transformation logic.

Registration example:

extern "C" LLVM_ATTRIBUTE_WEAK PassPluginLibraryInfo llvmGetPassPluginInfo() {
  return {
    LLVM_PLUGIN_API_VERSION, "ObfsPass", LLVM_VERSION_STRING,
    [](PassBuilder &PB) {
      PB.registerPipelineParsingCallback(
        [](StringRef Name, FunctionPassManager &FPM, ArrayRef<PassBuilder::PipelineElement>) {
          if (Name == "myobf") {
            FPM.addPass(ObfsPass());
            return true;
          }
          return false;
        });
    }
  };
}

In this example, we use the plugin interface of the new pass manager.

  • ObfsPass is the name of our class containing the transformation logic.
  • "myobf" is the pass name used in the command line.
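
The true/false return value of the callback can be confusing at first, so here is a toy model of the idea in standalone C++. It deliberately does not use LLVM: ToyPassManager, ToyPassBuilder and parsePassName are made-up stand-ins, and the only thing the sketch is meant to show is that each name given to -passes is offered to the registered callbacks until one of them claims it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a pass manager: it only records the names of added passes.
struct ToyPassManager {
    std::vector<std::string> passes;
    void addPass(const std::string &name) { passes.push_back(name); }
};

using ParsingCallback =
    std::function<bool(const std::string &, ToyPassManager &)>;

// Toy stand-in for the pass builder: plugins register callbacks, and each
// pipeline element name is offered to them until one returns true.
struct ToyPassBuilder {
    std::vector<ParsingCallback> callbacks;

    void registerPipelineParsingCallback(ParsingCallback cb) {
        assert(cb && "callback must be callable");
        callbacks.push_back(std::move(cb));
    }

    // Returns false when no callback recognizes the name.
    bool parsePassName(const std::string &name, ToyPassManager &pm) const {
        for (const auto &cb : callbacks)
            if (cb(name, pm))
                return true;
        return false;
    }
};

// What our plugin does: claim "myobf" and ignore every other name.
void registerMyObf(ToyPassBuilder &pb) {
    pb.registerPipelineParsingCallback(
        [](const std::string &name, ToyPassManager &pm) {
            if (name == "myobf") {
                pm.addPass("ObfsPass");
                return true;
            }
            return false;
        });
}
```

In this model, parsing "myobf" adds ObfsPass to the manager and returns true, while any other name falls through and is rejected.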

To run this pass:

opt -load-pass-plugin=./bin/LLVMMyObfsPass.dll -passes=myobf   -o test_obf.ll test.ll

If we modify it a little to print something:

          if (Name == "myobf") {
            // raw_ostream does not accept std::endl, so use "\n"
            llvm::outs() << "Hi\n";
            FPM.addPass(ObfsPass());
            return true;
          }

bin\opt -load-pass-plugin=./bin/LLVMMyObfsPass.dll -passes=myobf   -o .\test_obf.ll test.ll
Hi
Hi

Compiling the Pass

In the llvm/lib/Transforms/ directory, create a folder for your pass. Then, in the CMakeLists.txt of Transforms, add the name of the folder you created.

add_subdirectory(MyObfsPass)

In this new directory, create your own CMakeLists.txt.

#SHARED is used to create a dll
add_llvm_library(LLVMMyObfsPass SHARED
  <file>.cpp
  PLUGIN_TOOL
  opt
  LINK_COMPONENTS
  Core IRReader Support AsmParser BitReader Linker
)
#Define the name of our output
set_target_properties(LLVMMyObfsPass PROPERTIES
  PREFIX ""
  OUTPUT_NAME "LLVMMyObfsPass"
  SUFFIX ".dll"
  WINDOWS_EXPORT_ALL_SYMBOLS ON
)

To find the name of the build target, you can use the following command:

ninja -t targets | findstr.exe "Obf"
lib/LLVMMyObfsPass.lib: 
[...]
LLVMMyObfsPass: phony
LLVMMyObfsPass.dll: phony
[...]

Finally, we can build the pass by running the following command in LLVM_source/build:

ninja LLVMMyObfsPass

Now everything should be set up! Let’s begin with a very basic example.

We will go through the main function: for each basic block, and for each instruction in it, we look at every operand, and if it’s a ConstantInt, we replace it with 42.


#include "llvm/Transforms/MyObfsPass/Obf.h"

using namespace llvm;


PreservedAnalyses ObfsPass::run(Function &F, FunctionAnalysisManager &AM) {
  // Print the function name
  outs() << "Processing function: " << F.getName() << "\n";
  if (F.getName() != "main") {
    outs() << "Skipping function: " << F.getName() << "\n";
    return PreservedAnalyses::all();
  }
  bool Changed = false;
  // Walk every BasicBlock of the current function
  for (BasicBlock &BB : F) {
    // Walk every Instruction of the BasicBlock
    for (Instruction &I : BB) {
      // Inspect every operand of the instruction
      for (unsigned i = 0; i < I.getNumOperands(); i++) {
        Value *Op = I.getOperand(i);
        // If it's a ConstantInt, replace it with 42 (same integer type)
        if (ConstantInt *CI = dyn_cast<ConstantInt>(Op)) {
          errs() << "Found constant: " << CI->getValue() << " in instruction: " << I << "\n";
          I.setOperand(i, ConstantInt::get(CI->getType(), 42));
          Changed = true;
        }
      }
    }
  }
  // Tell the pass manager whether the IR was modified
  return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
}
[PassPluginLibraryInfo.... ]

Let’s test it!

We will take this simple C code as an example:

// test.c
#include <stdio.h>
int main(){
        int a = 10;
        int b = 12;
        printf("a + b = %d",a+b);
        return 1;
}

clang test.c

Now we can apply the pass we wrote and see if it works:

clang -emit-llvm -S -O0 -Xclang -disable-O0-optnone -g test.c -o test.ll
opt -load-pass-plugin=./bin/LLVMMyObfsPass.dll -passes=myobf   -o test_obf.ll test.ll


Processing function: main
Found constant: 1 in instruction:   %1 = alloca i32, align 4
Found constant: 1 in instruction:   %2 = alloca i32, align 4
Found constant: 1 in instruction:   %3 = alloca i32, align 4
Found constant: 0 in instruction:   store i32 0, ptr %1, align 4
Found constant: 10 in instruction:   store i32 10, ptr %2, align 4, !dbg !33
Found constant: 12 in instruction:   store i32 12, ptr %3, align 4, !dbg !35
Found constant: 1 in instruction:   ret i32 1, !dbg !37
Processing function: _vsprintf_l

clang test_obf.ll -o a.exe

As you can see, the constants have been modified.

The Encryption/Decryption Process in Depth

Now let’s look at some real passes. The first one finds every string in our code, encrypts it at compile time, and adds a simple stub that decrypts the string at runtime when it is “used”, then re-encrypts it afterwards.

We will only do this for C, because C strings are very simple: bytes (char) with a null-byte terminator (’\0’).

We can find every string using the following code:

std::vector<StringUsage> FindAllStringUsages(Function &F) {
    std::vector<StringUsage> Usages;
    for (BasicBlock &BB : F) {
        for (Instruction &I : BB) {
            for (unsigned i = 0; i < I.getNumOperands(); ++i) {
                Value *Op = I.getOperand(i);
                // Check if operand is a GlobalVariable
                auto *GV = dyn_cast<GlobalVariable>(Op);
                if (!GV)
                    continue;  // Skip if not a GlobalVariable
                // Check if GV is constant and has an initializer
                if (!GV->isConstant() || !GV->hasInitializer())
                    continue;
                // Check if initializer is a ConstantDataArray
                auto *CA = dyn_cast<ConstantDataArray>(GV->getInitializer());
                if (!CA)
                    continue;
                // Check if it's a string
                if (!CA->isString())
                    continue;
                // All conditions met - record the string usage
                Usages.push_back({&I, GV, i});
[...]

The code is awful but it works. We take every Instruction of every BasicBlock, look at each operand, and if it is a constant GlobalVariable whose initializer is a ConstantDataArray holding a string (as described before), we record the instruction, the global, and the operand index in our structure.

Then, we can write the code for the stub.

// void deobfuscate(i32 key, i8* str)
    std::vector<Type*> ArgTypes = {
        Type::getInt32Ty(Ctx),
        Type::getInt8Ty(Ctx)->getPointerTo()
    };
    FunctionType *FT = FunctionType::get(Type::getVoidTy(Ctx), ArgTypes, false);
    Function *DeobfFunc = Function::Create(FT, Function::ExternalLinkage, "deobfuscate", &M);

This is how we can define our function. After that, we can parse arguments and create our BasicBlock using the following code:

BasicBlock *LoopCond = BasicBlock::Create(Ctx, "loop.cond", DeobfFunc);
BasicBlock *LoopBody = BasicBlock::Create(Ctx, "loop.body", DeobfFunc);
BasicBlock *LoopEnd  = BasicBlock::Create(Ctx, "loop.end", DeobfFunc);

This code creates three LLVM basic blocks inside DeobfFunc that represent a loop’s control flow: loop.cond checks the condition, loop.body contains the loop’s instructions, and loop.end runs after the loop finishes.

For example, this is the “body”, the part of the code that uses quantum-proof encryption, also known as XOR.

//Add at the end of the basic block
B.SetInsertPoint(LoopBody);
Value *Key8 = B.CreateTrunc(KeyArg, Type::getInt8Ty(Ctx));
Value *Xord = B.CreateXor(Cur, Key8);
B.CreateStore(Xord, PtrPhi); //store into *ptr
Value *Next = B.CreateGEP(Type::getInt8Ty(Ctx), PtrPhi, B.getInt32(1));
//Phi instruction merges values from different control flow paths.
PtrPhi->addIncoming(Next, LoopBody);
B.CreateBr(LoopCond);
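
Read as plain C++, the loop built from these blocks corresponds roughly to the function below. This is a sketch under an assumption: the loop.cond block is not shown here, so I assume it loads the current byte (Cur) and exits on the null terminator. The pass itself emits IR, not source.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ equivalent of the stub (a sketch, not the emitted IR).
void deobfuscate(int32_t key, char *str) {
    assert(str != nullptr);
    const char key8 = static_cast<char>(key);  // trunc i32 -> i8
    char *ptr = str;                           // plays the role of PtrPhi
    while (*ptr != '\0') {                     // loop.cond (assumed)
        *ptr = static_cast<char>(*ptr ^ key8); // loop.body: XOR, then store
        ++ptr;                                 // GEP: advance by one byte
    }
}                                              // loop.end: return
```

Since XOR is its own inverse, calling the function twice with the same key restores the original bytes, which is why the same stub can serve for both decryption and re-encryption.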

We can finally create our encrypted string:

ConstantDataArray *CA = cast<ConstantDataArray>(GV->getInitializer());
StringRef Str = CA->getAsString();
std::string XoredStr;
for (char C : Str) {
    // Keep null bytes as-is so the terminator survives
    if (C == '\0') {
        XoredStr += '\0';
        continue;
    }
    XoredStr += C ^ (char)(Key & 0xFF);
}

We initialize a new string, take each character of the value we found earlier, and XOR it with the key.

//Init a new ConstantDataArray, this is our new encrypted string
Constant *NewInit = ConstantDataArray::getString(Ctx, XoredStr, false);
//The stub writes to the string at runtime, so it can no longer be constant
GV->setConstant(false);
GV->setInitializer(NewInit);
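
One caveat with a single-byte XOR key: any character equal to the key byte encrypts to '\0'. If the runtime stub stops at the first null byte (an assumption, since its loop condition is not shown here), the rest of the string would never be decrypted. The standalone sketch below mirrors the compile-time step and adds a check for that case; xorEncrypt and keyIsSafeFor are hypothetical helpers, not part of the pass.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Mirrors the compile-time step: XOR every non-null byte with the key byte.
std::string xorEncrypt(const std::string &plain, uint8_t key) {
    assert(key != 0 && "a zero key would leave the string in clear");
    std::string out;
    for (char c : plain)
        out += (c == '\0') ? '\0'
                           : static_cast<char>(c ^ static_cast<char>(key));
    return out;
}

// True when the key is non-zero and no character of the string equals it,
// so no ciphertext byte turns into '\0' ahead of the real terminator.
bool keyIsSafeFor(const std::string &plain, uint8_t key) {
    if (key == 0)
        return false;
    for (char c : plain)
        if (c != '\0' && static_cast<uint8_t>(c) == key)
            return false;
    return true;
}
```

For instance, encrypting "abc" with the key 'b' yields a null byte in the middle, and applying the same routine again does not give "abc" back, whereas a key that appears nowhere in the string round-trips cleanly. Picking the key per string with a check like this is one possible way around the problem.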

Finally, we add a call to our stub:

IRBuilder<> Builder(UserInst);
Value *KeyVal = ConstantInt::get(Type::getInt32Ty(Ctx), Key);
Value *StrPtr = Builder.CreatePointerCast(GV, Type::getInt8Ty(Ctx)->getPointerTo(), "str_ptr_cast");
//We call our function twice: once before the use to decrypt, once after to re-encrypt
Builder.CreateCall(DeobfFunc, {KeyVal, StrPtr});
if (UserInst->getNextNode()) {
    //Insert right after the instruction that uses the string
    Builder.SetInsertPoint(UserInst->getNextNode());
} else {
    //No next instruction: append at the end of the block
    Builder.SetInsertPoint(UserInst->getParent());
}
Builder.CreateCall(DeobfFunc, {KeyVal, StrPtr});

Just before the string is used, we add our decryption routine, and we revert the string to its encrypted form when done.

Modifying a function

Now, we want to replace a function with another one. But why? Because it’s fun. We could also replace a call to a function like GetProcAddress with a custom one that uses API hashing. You don’t need to think about evasion if your compiler is kind enough to do it for you.
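
For readers who have not met API hashing before, here is a minimal standalone sketch of the idea. Everything in it is illustrative: the Export table stands in for a module's real export directory, and hashName (a djb2 hash) and resolveByHash are hypothetical names, not the resolver used in this series.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// djb2 string hash. It is constexpr, so hashes of literal names can be
// computed at compile time and the names need not appear in the binary.
constexpr uint32_t hashName(const char *s) {
    uint32_t h = 5381;
    while (*s != '\0') {
        h = h * 33u + static_cast<unsigned char>(*s);
        ++s;
    }
    return h;
}

// Stand-in for one entry of a module's export table.
struct Export {
    const char *name;
    void *address;
};

// Return the address of the first export whose name hashes to `wanted`,
// or nullptr when nothing matches.
void *resolveByHash(const Export *exports, std::size_t count, uint32_t wanted) {
    assert(exports != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i)
        if (hashName(exports[i].name) == wanted)
            return exports[i].address;
    return nullptr;
}
```

The caller only carries a 32-bit constant such as hashName("VirtualAlloc"), so the function name never shows up as a string. A 32-bit hash can collide, so a real resolver would want to pick its hash function with that in mind.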

Let’s take our previous example:

// add.c
#include <stdio.h>
int add(int a, int b){
        return (a+b);
}
int main(){
        int a = 10;
        int b = 12;
        printf("a + b = %d",add(a,b));
        return 1;
}

We transform it into LLVM IR:

clang -emit-llvm -S -O0 -Xclang -disable-O0-optnone -g add.c -o add.ll

Let’s create the replacement function in another file:

// sub.c
int sub(int a, int b){
        return (a - b);
}

llvm-project\build>clang -emit-llvm -c sub.c -o sub.bc

llvm-project\build>xxd -i sub.bc
unsigned char sub_bc[] = {
  0x42, 0x43, 0xc0, 0xde, 0x35, 0x14, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00,
  0x62, 0x0c, 0x30, 0x24, 0x4a, 0x59, 0xbe, 0x66, 0xdd, 0xfb, 0xb5, 0x9f,
  [...]

We can now import this as a header file.
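
Alongside the array, xxd -i also emits an unsigned int sub_bc_len holding its size. Before handing the bytes to LLVM, a quick sanity check is possible: raw LLVM bitcode starts with the magic bytes 'B' 'C' 0xC0 0xDE, which are the first four bytes of the dump above. The helper below is a hypothetical standalone sketch of that check, not something the pass requires.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when the buffer starts with the raw bitcode magic number.
bool looksLikeBitcode(const unsigned char *data, std::size_t len) {
    assert(data != nullptr || len == 0);
    static const unsigned char Magic[4] = {0x42, 0x43, 0xC0, 0xDE};
    if (len < sizeof(Magic))
        return false;
    for (std::size_t i = 0; i < sizeof(Magic); ++i)
        if (data[i] != Magic[i])
            return false;
    return true;
}
```

With the generated header included, the call would look like looksLikeBitcode(sub_bc, sub_bc_len).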

In our pass, we can write a function like the one below: it takes the current module, the bitcode array and its size, parses the array with parseBitcodeFile, and links the result into the current module.

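bool ObfsPass::InMemoryLLVM(llvm::Module &M, const unsigned char bc[], unsigned int bc_len) {
    LLVMContext &Context = M.getContext();
    // Wrap the embedded bytes in a MemoryBuffer without copying them
    auto MemBuffer = llvm::MemoryBuffer::getMemBuffer(
        llvm::StringRef(reinterpret_cast<const char*>(bc), bc_len),
        "bytecode",
        false
    );
    // parseBitcodeFile returns an Expected<std::unique_ptr<Module>>
    Expected<std::unique_ptr<Module>> ModOrErr = parseBitcodeFile(MemBuffer->getMemBufferRef(), Context);
    if (!ModOrErr) {
        // An unchecked Error must be consumed before it is dropped
        consumeError(ModOrErr.takeError());
        return false;
    }
    std::unique_ptr<Module> ExternalMod = std::move(*ModOrErr);
    ExternalMod->setModuleIdentifier("external_module");
    Linker L(M);
    // linkInModule returns true on error
    if (L.linkInModule(std::move(ExternalMod))) {
        return false;
    }
    return true;
}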

Finally we can replace every call to add with a call to sub.

    if (!InMemoryLLVM(M, sub_bc, sub_bc_len)) {
        return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
    }
    // Now let's find it (because we linked it into current module)
    Function *SubFunction = M.getFunction("sub");
    Function *AddFunction = M.getFunction("add");

    // Bail out if either function is missing
    if (!SubFunction || !AddFunction) {
        return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
    }

    // Replace all calls to "add" with a call to "sub".
    // setCalledFunction edits the use list we are walking, so iterate with
    // make_early_inc_range to keep the iterator valid.
    for (Use &U : make_early_inc_range(AddFunction->uses())) {
        if (CallInst *CI = dyn_cast<CallInst>(U.getUser())) {
            // Only rewrite when "add" is the callee, not an argument
            if (CI->getCalledFunction() == AddFunction) {
                CI->setCalledFunction(SubFunction);
                Changed = true;
            }
        }
    }

If we compile it and run the pass, the call should change: every call to add() should now be a call to sub().

Don’t mind the debugbreak, it’s for later.

As you can see, we replaced the called function with another one.

Next time we will see how to create machine-level passes, which let us change “things” during machine code generation.

Thanks Atsika for the review