I want to discuss software watermarking techniques, because I am going to use them in my Stoned project. They embed a digital watermark at the lowest level and help protect intellectual property. This is useful if you plan to make your software open source or if it contains confidential information. I have already seen parts of malware that I could attribute to a software watermark, but more about that later. Here I describe how to insert a watermark into software at the lowest level, assembly language. Watermarking should provide a secure way to identify software and tie it to a unique identifier (whether a software vendor or a customer id).
Peter Kleissner, Software Architect (April 2009)
Software Watermarking via Assembly Code Transformations
Software watermarking means being able to digitally add a watermark to software code and to verify it later. The best place to do this is the lowest level, the assembly code. C++, for example, is not recommended because the generated code differs between platforms and compilers. Assembly, in comparison, is the lowest stage. Of course this assumes that you either add or modify assembly code in a project, or that the project is already written in assembly language. For adding unique watermarks we have the following techniques:
- Adding junk code
- Modifying code to fit a pattern while remaining equivalent in execution
The first technique is presented in the paper which I have linked at the end. It adds junk assembly code which identifies the source. Let's take a look at the following example:
    push eax
    mov eax,12
    pop eax
The code counts as junk because it performs no effective operation: after the three lines eax is unchanged. However, in this example we can extract the customer id (12). This id can then be used to look up the licensee or other information. While such a watermark (more like a signature) is easy to detect and attribute, it is just as easy to detect and remove. Apparently, there are already many debugger plugins that detect and remove junk code, so this digital signature technique fails where strong security is needed.
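To make the junk-code idea concrete, here is a minimal Python sketch that builds the three-instruction sequence as 32-bit x86 machine code, scans a code buffer for it, and wipes it again. The function names are hypothetical and the byte scan is deliberately naive (it does not check instruction boundaries, so it could report false positives); it is only meant to show how little effort both extraction and removal take.

```python
import struct

# 32-bit x86 encodings: push eax = 50, mov eax,imm32 = B8 <imm32>, pop eax = 58
PATTERN_LEN = 7


def make_junk_watermark(customer_id: int) -> bytes:
    """Assemble push eax / mov eax,<id> / pop eax with the id as immediate."""
    return b"\x50\xb8" + struct.pack("<I", customer_id) + b"\x58"


def find_junk_watermarks(code: bytes) -> list:
    """Return (offset, customer_id) for every occurrence of the pattern."""
    hits = []
    for off in range(len(code) - (PATTERN_LEN - 1)):
        if code[off] == 0x50 and code[off + 1] == 0xB8 and code[off + 6] == 0x58:
            (customer_id,) = struct.unpack_from("<I", code, off + 2)
            hits.append((off, customer_id))
    return hits


def strip_junk_watermarks(code: bytes) -> bytes:
    """Overwrite each hit with nop (90), as a junk-code remover might do."""
    out = bytearray(code)
    for off, _ in find_junk_watermarks(code):
        out[off:off + PATTERN_LEN] = b"\x90" * PATTERN_LEN
    return bytes(out)


# xor eax,eax ; <watermark for customer 12> ; ret
sample = b"\x31\xc0" + make_junk_watermark(12) + b"\xc3"
print(find_junk_watermarks(sample))                         # [(2, 12)]
print(find_junk_watermarks(strip_junk_watermarks(sample)))  # []
```

The same scan that recovers the customer id is all an attacker needs to erase it, which is the weakness described above.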
The second technique is to modify the code to fit a pattern while keeping its runtime behavior the same. Take a look at the following translation:
    xor eax,eax  ->  mov eax,0
    inc eax      ->  add eax,1
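Both versions leave the same value in eax; the transformed one is only slightly larger or slower. One caveat: they differ in their effect on the flags (xor sets the flags while mov leaves them untouched, and inc preserves the carry flag while add updates it), so a transformation is only safe where the following code does not depend on those flags. Of course it is more difficult to write such an algorithm (one that transforms the code and later finds the transformed code again), but the result is much harder to detect from a remover's point of view. If you know which specific instruction transformation to look for, it is easy to find, but the attacker does not know what to look for.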
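One way to picture this is to let every transformable instruction carry one bit: the original form stands for 0 and the transformed form for 1. The Python sketch below works on assembly source lines. The pair table, the function names and the one-bit-per-site scheme are my own assumptions for illustration, and the sketch assumes every listed pair is safe to swap at the sites where it occurs.

```python
# Hypothetical pattern table: (form that encodes bit 0, form that encodes bit 1)
PAIRS = [
    ("xor eax,eax", "mov eax,0"),
    ("inc eax", "add eax,1"),
]


def lookup(line):
    """Return (pair, bit) if the line is one of the known forms, else None."""
    text = line.strip()
    for pair in PAIRS:
        if text in pair:
            return pair, pair.index(text)
    return None


def embed(lines, bits):
    """Rewrite each eligible line so that it carries the next watermark bit."""
    out, remaining = [], list(bits)
    for line in lines:
        hit = lookup(line)
        if hit and remaining:
            pair, _ = hit
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + pair[remaining.pop(0)])
        else:
            out.append(line)
    if remaining:
        raise ValueError("not enough transformable instructions")
    return out


def extract(lines, count):
    """Read the first `count` watermark bits back from the eligible lines."""
    bits = [hit[1] for hit in map(lookup, lines) if hit]
    return bits[:count]


source = ["xor eax,eax", "mov ebx,5", "inc eax", "inc eax", "ret"]
marked = embed(source, [1, 0, 1])
print(marked)
print(extract(marked, 3))  # [1, 0, 1]
```

Without the pair table, the marked listing looks like ordinary code, which is what makes this harder to remove than junk instructions.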
It is recommended to create such a transformation pattern over 3 paragraphs (48 bytes) so that it can be detected reliably. If the software is written in assembly language, the code transformation can be applied at the source code level. This is very useful for open source software, to protect intellectual property and rights.
As described in the paper linked below, a transformation file is used together with a customer id to create the watermark. The transformation file contains the information on how to modify (transform) the code so that it fits a pattern and can be recognized as a watermark. The transformation file should contain different types of transformations to provide strong security. The watermark (transformation) includes the customer id, so that the origin can later be extracted from it.
In the first example above, the transformation file would include the junk code and the information that the customer id is stored using the "mov eax,%id" instruction. In the second example, it would define how to transform one instruction into another in order to fit the watermark pattern. It is important to understand that the transformation file must be kept confidential; within companies, access must be very limited (even for developers).
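As a rough illustration of what such a transformation file might hold, the Python sketch below stores a junk template with the %id placeholder next to a list of substitution rules. The JSON layout, the key names and the helper functions are assumptions of mine, not the format used in the paper.

```python
import json
import re

# Hypothetical transformation file; the layout is an assumption for illustration.
TRANSFORMATION_FILE = json.loads("""
{
  "junk": ["push eax", "mov eax,%id", "pop eax"],
  "substitutions": [
    {"from": "xor eax,eax", "to": "mov eax,0"},
    {"from": "inc eax", "to": "add eax,1"}
  ]
}
""")


def render_junk(tfile, customer_id):
    """Fill the %id placeholder of the junk template with the customer id."""
    return [line.replace("%id", str(customer_id)) for line in tfile["junk"]]


def apply_substitutions(tfile, lines):
    """Replace every line that matches a 'from' rule with its 'to' form."""
    table = {rule["from"]: rule["to"] for rule in tfile["substitutions"]}
    return [table.get(line.strip(), line) for line in lines]


def extract_customer_id(tfile, source_text):
    """Search the source for the rendered junk template and return the id."""
    parts = [
        re.escape(line).replace(re.escape("%id"), r"(\d+)")
        for line in tfile["junk"]
    ]
    match = re.search(r"\s*\n\s*".join(parts), source_text)
    return int(match.group(1)) if match else None


listing = ["xor eax,eax"] + render_junk(TRANSFORMATION_FILE, 12) + ["ret"]
print(extract_customer_id(TRANSFORMATION_FILE, "\n".join(listing)))  # 12
```

Both embedding and extraction are driven entirely by the file, so whoever holds it can find or forge the watermark, which is why access to it has to be restricted.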
I want to give some examples and ideas for the first method:
Adding junk code - this can also be junk instructions like:
- adding mov ecx,1984 somewhere in the code where ecx is unused
- xor ebx,0
- nop nop
- add eax,12 followed by sub eax,12
And some ideas for the second method:
Assembler instruction transformations which are equal in execution:
    xor eax,eax -> mov eax,0 -> sub eax,eax -> and eax,0
    inc eax -> add eax,1
    shl edx,1 -> imul edx,2
    rol eax,16 -> ror eax,16
    xchg eax,eax = nop
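Before a candidate pair goes into a pattern table, it is worth checking that both forms really produce the same register value. The following Python sketch models the listed instructions as functions on a 32-bit value and compares them over a few sample inputs. It is a sanity check under assumptions, not a proof: it compares the destination register only and ignores the flags, and the helper names are made up for this example.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK


def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK


# Each group lists (mnemonic, model of the destination register afterwards).
GROUPS = {
    "zero": [
        ("xor eax,eax", lambda v: v ^ v),
        ("mov eax,0", lambda v: 0),
        ("sub eax,eax", lambda v: (v - v) & MASK),
        ("and eax,0", lambda v: v & 0),
    ],
    "increment": [
        ("inc eax", lambda v: (v + 1) & MASK),
        ("add eax,1", lambda v: (v + 1) & MASK),
    ],
    "double": [
        ("shl edx,1", lambda v: (v << 1) & MASK),
        ("imul edx,2", lambda v: (v * 2) & MASK),  # low 32 bits of the product
    ],
    "swap halves": [
        ("rol eax,16", lambda v: rol32(v, 16)),
        ("ror eax,16", lambda v: ror32(v, 16)),
    ],
}

SAMPLES = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678]


def equivalent(group, samples=SAMPLES):
    """True if all forms in the group agree on every sample input."""
    return all(len({model(v) for _, model in group}) == 1 for v in samples)


for name, group in GROUPS.items():
    print(name, equivalent(group))
```

A pair such as rol eax,8 and ror eax,8 fails this check, which shows that the rotate equivalence holds only because 16 is exactly half the register width.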
Of course one could go one step further and metamorph assembler instructions. This would be the second method, but harder to detect, and far more calculations would be necessary (relocation etc.). Metamorphic assembler code means expressing one single assembler instruction with multiple others. This is always possible (for example for push, loop, stosd), but with the disadvantage that the resulting code is bigger and much slower.
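A small Python sketch of such an expansion for the three instructions mentioned, working on source lines. The expansions chosen here are my own illustration and come with assumptions: the replacements change the flags where the originals do not, the stosd expansion assumes the direction flag is clear and a flat memory model (es equal to ds), and push esp is left alone because it needs special handling.

```python
# Registers that the simple push expansion handles; esp is excluded on purpose.
REGS32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp"}


def expand(line):
    """Return the list of instructions that replaces one source line."""
    text = line.strip()
    mnemonic, _, operand = text.partition(" ")
    operand = operand.strip()
    if mnemonic == "push" and operand in REGS32:
        # push reg: make room on the stack, then store the register
        return ["sub esp,4", "mov [esp]," + operand]
    if mnemonic == "loop" and operand:
        # loop label: decrement the counter and jump while it is not zero
        return ["dec ecx", "jnz " + operand]
    if mnemonic == "stosd" and not operand:
        # stosd: store eax at [edi] and advance edi (assumes DF = 0)
        return ["mov [edi],eax", "add edi,4"]
    return [text]


def metamorph(lines):
    """Expand every line that has a known multi-instruction equivalent."""
    out = []
    for line in lines:
        out.extend(expand(line))
    return out


source = ["push eax", "again:", "stosd", "loop again", "ret"]
for line in metamorph(source):
    print(line)
```

Five source lines become eight here, and because every expanded instruction shifts the code that follows it, jump targets and relocations have to be recalculated when this is done on a binary rather than on source.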
Software watermarking through assembly code transformations is a secure way to protect intellectual property rights. It allows unique identification of software code and is very useful for identifying the source of a software leak. The amount of work needed to watermark your own software is quite low.