MSIL Injection

MSIL Injection, or MSIL Insertion, is the process of modifying the MSIL instructions of an existing method. We say that new instructions are injected, or inserted, into an existing flow.

Injecting IL instructions is just one part of IL-level weaving. Modifying existing code requires the following tasks:

  1. Reading the metadata of the .NET module.
  2. Decoding the stream of IL instructions in a meaningful way.
  3. Detecting the points where new instructions should be injected.
  4. Restructuring the method body if exception handlers have to be added.
  5. Injecting the IL instructions themselves.
  6. Assembling the in-memory representation back into a binary file.

These tasks are performed by an IL reader/writer, except task 3 (detecting the injection points), which is typically the job of the code weaver.
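To make tasks 2, 3 and 5 more concrete, here is a minimal sketch in Python. It is only a toy model, not the API of any real IL reader/writer: the Instruction class and the inject_before_returns function are hypothetical names invented for this illustration, and the method body is assumed to be already decoded into a list of instruction objects. The sketch inserts a small block of instructions before every ret.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instruction:
    """Hypothetical in-memory form of one decoded IL instruction."""
    opcode: str
    operand: Optional[str] = None


def inject_before_returns(body: List[Instruction],
                          advice: List[Instruction]) -> List[Instruction]:
    """Return a new body with a copy of `advice` inserted before every `ret`."""
    woven = []
    for instr in body:
        if instr.opcode == "ret":  # task 3: an injection point is detected
            # task 5: inject fresh copies so no instruction object is shared
            woven.extend(Instruction(a.opcode, a.operand) for a in advice)
        woven.append(instr)
    return woven


# A decoded body that computes arg0 + arg1 and returns the result.
body = [
    Instruction("ldarg.0"),
    Instruction("ldarg.1"),
    Instruction("add"),
    Instruction("ret"),
]

# Instructions to inject: they push a string and consume it, so the
# evaluation stack is left exactly as it was before the injected block.
advice = [
    Instruction("ldstr", "leaving method"),
    Instruction("call", "void [mscorlib]System.Console::WriteLine(string)"),
]

woven = inject_before_returns(body, advice)
print([i.opcode for i in woven])
# ['ldarg.0', 'ldarg.1', 'add', 'ldstr', 'call', 'ret']
```

The sketch deliberately ignores everything that makes the real job hard: branch targets that point at a ret, exception handler ranges (task 4), and the re-encoding of offsets when the body is written back (task 6).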

How to inject MSIL instructions?

If you have decided to inject MSIL instructions directly, stock up on aspirin and perhaps warn your girlfriend that you may not be home much. Then you may choose between the following approaches.

Standard System.Reflection/System.Reflection.Emit API

Since it ships with the .NET Framework, why not use it? Well, these APIs seem adequate for creating brand-new classes and methods, but they are less suited to injecting instructions proper, that is, to modifying an existing method. The main problem is that the API is strictly stream-oriented: you have to write the whole method in a single pass. This is not always feasible and is never easy.

Another problem is that the System.Reflection API does not give an exact image of a .NET module. It may cover 99% of your needs, but you will have no way to support the last percent. For instance, System.Reflection does not distinguish between void* and System.IntPtr, or between int32 and System.Int32. Other advanced features, such as security attributes or marshaling, may be only partially supported.

Mono Cecil

Mono Cecil is a sound choice if you are looking for an IL reader/writer. It has a solid community and is used in some commercial applications, which is a good assurance of quality. Even though the project is related to Mono, it also supports the Microsoft implementation. Read more at http://www.mono-project.com/Cecil.

PostSharp

PostSharp contains an IL reader/writer that covers the complete .NET specification for managed code. The APIs are optimized for high usability, so some users say it is easier to use than Cecil. The main difference from Cecil is that PostSharp is designed as a platform in which the IL reader/writer is only one component. The platform takes charge of the complete post-compilation process, including the integration with MSBuild, and offers many additional services typically used by code weavers (type hierarchy analysis, use/used-by analysis, ...).

Microsoft Phoenix

Phoenix is the new framework for the next generation of compilers at Microsoft. It abstracts the target machine, so it can work with MSIL assembly as well as Intel x86 assembly. Some users say that the price to pay is greater complexity. The greatest advantage is of course the certainty of support and continuity. But pay attention to the license: it is currently reserved for academic research.

Rail

The initial aim of Rail is to implement an API that allows CLR assemblies to be manipulated and instrumented before they are loaded and executed. It uses static weaving of IL instructions. The APIs of Rail are often used as an IL reader/writer with some more advanced weaving capabilities. More at http://rail.dei.uc.pt/.

Extending the Language

The most popular approach so far has been to define extensions to the base language, i.e. a new language based on the language of the code to which the aspects are applied. The de facto standard in the Java culture is AspectJ. The following example is taken from the AspectJ Programming Guide:

 1 aspect FaultHandler {
 2
 3   private boolean Server.disabled = false;
 4
 5   private void reportFault() {
 6     System.out.println("Failure! Please fix it.");
 7   }
 8
 9   public static void fixServer(Server s) {
10     s.disabled = false;
11   }
12
13   pointcut services(Server s): target(s) && call(public * *(..));
14
15   before(Server s): services(s) {
16     if (s.disabled) throw new DisabledException();
17   }
18
19   after(Server s) throwing (FaultException e): services(s) {
20     s.disabled = true;
21     reportFault();
22   }
23 }

As you can see, lines 5-11 are pure Java, whereas the aspect declaration on line 1, the inter-type field declaration on line 3, and lines 13-22 use the AspectJ extension. Although the purpose of this section is not to explain the AspectJ syntax, note the pointcut, before and after keywords.

The advantage of this approach is that the semantics of the language are adapted on purpose, so the resulting code is clean and consistent, with no 'tricks'. A drawback is that this approach is language-dependent: if your aim is to develop a weaver, you have to develop a code enhancer for every targeted language. See the Compile-Time Weaving techniques for details. A weaver developer should ideally also provide integration with the IDE (Intellisense).

It is not possible to write an aspect weaver of this type using PostSharp, because PostSharp assumes that the code has already been compiled. Likewise, Runtime Weaving techniques are all unsuitable for this scenario.

Glossary

You will often meet the following terms in AOP literature. I will try to explain them in plain English.

Join point:
A location or a 'point' in the program. For instance: the entry or the exit of a method, a field access, ...
Pointcut:
A 'query' that 'selects' join points.
Advice:
A piece of code that alters the behavior of the program. Advice is inserted at join points.
Aspect:
The encapsulation of a cross-cutting concern. Typically a set of pieces of advice.
Weaving:
The process of injecting advice at join points, i.e. the process of modifying the behavior of the program.
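The terms above can be illustrated with a small Python sketch. It is only a toy: it wraps methods at run time instead of injecting IL, and every name in it (public_methods, make_logging_advice, weave, Server) is hypothetical and chosen for this illustration. The join points are method entries, the pointcut is a predicate on method names, the advice is a function called at each selected join point, and weave applies the advice.

```python
import functools


def public_methods(name):
    """Pointcut: a 'query' selecting the join points (here, public methods)."""
    return not name.startswith("_")


def make_logging_advice(log):
    """Advice: code that runs at each selected join point (method entry)."""
    def before(method_name, args):
        log.append((method_name, args))
    return before


def _wrap(name, method, before):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        before(name, args)  # the advice runs at the join point
        return method(self, *args, **kwargs)
    return wrapper


def weave(cls, pointcut, before):
    """Weaving: insert the advice at every join point the pointcut selects."""
    for name, member in list(vars(cls).items()):
        if callable(member) and pointcut(name):
            setattr(cls, name, _wrap(name, member, before))
    return cls


class Server:
    def ping(self, n):
        return n + 1

    def _internal(self):
        return "hidden"


# The aspect is the pairing of a pointcut with its advice (a logging concern).
log = []
weave(Server, public_methods, make_logging_advice(log))

server = Server()
result = server.ping(1)
server._internal()   # not selected by the pointcut, so nothing is logged
print(result, log)   # 2 [('ping', (1,))]
```

An IL-level weaver does the same job before the program runs: it rewrites the method bodies selected by the pointcut rather than wrapping them in memory.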