From bcdc65dc065d378b29ea47e4e9cbaa7d2ece89f3 Mon Sep 17 00:00:00 2001
From: Eric Lafortune
Date: Thu, 2 Jan 2020 00:55:28 +0100
Subject: [PATCH] Added initial markdown documentation for the library.

---
 docs/api/analyzing.md                         | 41 ++++++++++
 docs/api/creating.md                          | 34 ++++++++
 docs/api/index.md                             | 64 +++++++++++++++
 docs/api/patternmatching.md                   | 67 +++++++++++++++
 docs/api/reading.md                           | 81 +++++++++++++++++++
 .../api/src/proguard/examples/Preverify.java  |  3 +-
 6 files changed, 289 insertions(+), 1 deletion(-)
 create mode 100644 docs/api/analyzing.md
 create mode 100644 docs/api/creating.md
 create mode 100644 docs/api/index.md
 create mode 100644 docs/api/patternmatching.md
 create mode 100644 docs/api/reading.md

diff --git a/docs/api/analyzing.md b/docs/api/analyzing.md
new file mode 100644
index 00000000..e5a9634c
--- /dev/null
+++ b/docs/api/analyzing.md
@@ -0,0 +1,41 @@
+## Basic control flow analysis
+
+You can extract a basic control flow graph of the instructions in a method
+with the class BranchTargetFinder. The resulting graph is defined at the
+instruction level: each instruction is labeled with potential branch targets
+and branch origins.
+
+    BranchTargetFinder branchTargetFinder =
+        new BranchTargetFinder();
+
+    branchTargetFinder.visitCodeAttribute(clazz, method, codeAttribute);
+
+Complete example: ApplyPeepholeOptimizations.java
+
+## Partial evaluation
+
+You can extract more information from a method with partial evaluation (often
+called abstract evaluation). You can tune the precision and the convergence
+speed by injecting different value factories and different invocation units:
+
+- The value factories define the level of detail in representing values like
+  integers or reference types. The values can be very generic (any primitive
+  integer, a reference to any object) or more precise (the integer 42, or an
+  integer between 0 and 5, or a non-null reference to an instance of
+  java/lang/String).
+
+- The invocation units define the values returned from retrieved fields and
+  invoked methods. The values can again be very generic (any integer) or they
+  can also be values that were cached in previous evaluations.
+
+    RangeValueFactory valueFactory =
+        new RangeValueFactory(
+        new ArrayReferenceValueFactory());
+
+    PartialEvaluator partialEvaluator =
+        new PartialEvaluator(valueFactory, new BasicInvocationUnit(valueFactory),
+                             false);
+
+    partialEvaluator.visitCodeAttribute(clazz, method, codeAttribute);
+
+Complete example: EvaluateCode.java
diff --git a/docs/api/creating.md b/docs/api/creating.md
new file mode 100644
index 00000000..b6df6e96
--- /dev/null
+++ b/docs/api/creating.md
@@ -0,0 +1,34 @@
+The easiest way to create a new class from scratch is with ClassBuilder. It
+provides a fluent API to add fields and methods. For example, to create a
+class that prints out "Hello, world!":
+
+    ProgramClass programClass =
+        new ClassBuilder(
+            VersionConstants.CLASS_VERSION_1_8,
+            AccessConstants.PUBLIC,
+            "HelloWorld",
+            ClassConstants.NAME_JAVA_LANG_OBJECT)
+
+            .addMethod(
+                AccessConstants.PUBLIC |
+                AccessConstants.STATIC,
+                "main",
+                "([Ljava/lang/String;)V",
+                50, // Maximum code fragment length.
+
+                // Compose the method body: print "Hello, world!" to System.out.
+                code -> code
+                    .getstatic("java/lang/System", "out", "Ljava/io/PrintStream;")
+                    .ldc("Hello, world!")
+                    .invokevirtual("java/io/PrintStream", "println", "(Ljava/lang/String;)V")
+                    .return_())
+
+            .getProgramClass();
+
+You can also use it to add fields and methods to an existing class:
+
+    ProgramClass programClass =
+        new ClassBuilder(existingClass)
+        .....
+
+Complete example: CreateHelloWorldClass.java
diff --git a/docs/api/index.md b/docs/api/index.md
new file mode 100644
index 00000000..a36ae9e7
--- /dev/null
+++ b/docs/api/index.md
@@ -0,0 +1,64 @@
+**ProGuard Core** is a free library to read, write, modify, and analyze Java
+class files. It is the core of the well-known shrinker, optimizer, and
+obfuscator [ProGuard](https://www.guardsquare.com/products/proguard).
+
+Typical applications:
+
+- Perform peephole optimizations in Java bytecode.
+- Search for instruction patterns.
+- Analyze code with abstract evaluation (partial evaluation).
+- Perform advanced processing, such as ProGuard's optimization and obfuscation.
+
+## Design
+
+The library defines many small classes as the building blocks for applications
+that contain the real logic. This is sometimes taken to the extreme: even
+loops and conditional statements can often be implemented as separate classes.
+Even though these classes are verbose and repetitive, the resulting main code
+becomes much more compact, flexible, and robust.
+
+### Data classes
+
+Basic data classes define the Java bytecode data structures. They reflect the
+Java bytecode specifications literally, to ensure that no data are lost when
+reading, analyzing, and writing them. The data classes contain only a minimum
+number of methods. They do have one or more accept methods to let the visitor
+classes below operate on them.
+
+### Visitors
+
+The library applies the visitor pattern extensively. Visitor classes define
+the operations on the data: reading, writing, editing, transforming,
+analyzing, etc. The visitor classes have one or more visit methods to operate
+on data classes of the same basic type.
+
+For example, a Java bytecode class contains a constant pool with constants of
+different types: integer constants, float constants, string constants, etc.
+The data classes IntegerConstant, FloatConstant, StringConstant, etc. all
+implement the basic type Constant. The visitor interface ConstantVisitor
+contains methods 'visitIntegerConstant', 'visitFloatConstant',
+'visitStringConstant', etc. Implementations of this visitor interface can
+perform all kinds of operations on the constants.
+
+The reasoning behind this pattern is that the data classes are very stable,
+because they are directly based on the bytecode specifications. The operations
+are more dynamic, since they depend on the final application. It is
+practically impossible to add all possible operations in the data classes, but
+it is easy to add another implementation of a visitor interface. In practice,
+implementing an interface also helps you think of all possible cases.
+
+The visitor pattern uses visitor interfaces to operate on similar elements
+of a data structure. Each interface often has many implementations. A notable
+disadvantage is that visitor methods can invoke one another
+(directly or indirectly), but they can't communicate easily. Since the
+implementations can't add their own parameters or return values, they often
+have to rely on fields to pass values back and forth, which is
+error-prone. Still, the advantages of the visitor pattern outweigh the
+disadvantages.
+
+### Dependency injection
+
+The library classes heavily use _constructor-based dependency injection_ to
+create immutable instances. Notably, the visitor classes are often like
+commands that are combined in an immutable structure, via constructors. You
+can execute such commands by applying the visitors to the data classes.
diff --git a/docs/api/patternmatching.md b/docs/api/patternmatching.md
new file mode 100644
index 00000000..e80f1ced
--- /dev/null
+++ b/docs/api/patternmatching.md
@@ -0,0 +1,67 @@
+## Basic pattern matching
+
+The library has powerful support for matching patterns in bytecode
+instruction sequences. You first define the pattern as a sequence of
+instructions, with wildcards. For example:
+
+    final int X = InstructionSequenceMatcher.X;
+    final int C = InstructionSequenceMatcher.C;
+
+    InstructionSequenceBuilder ____ =
+        new InstructionSequenceBuilder();
+
+    Instruction[] pattern =
+        ____.iload(X)
+            .bipush(C)
+            .istore(X).__();
+
+    Constant[] constants = ____.constants();
+
+You can then search for that pattern in the code of a given class:
+
+    clazz.accept(
+        new AllMethodVisitor(
+        new AllAttributeVisitor(
+        new AllInstructionVisitor(
+        new MyMatchPrinter(
+        new InstructionSequenceMatcher(constants, pattern))))));
+
+Complete example: ApplyPeepholeOptimizations.java
+
+## Pattern replacement
+
+Instead of just matching instruction sequences, you can also replace matched
+sequences with other instruction sequences, for example to optimize code or
+instrument code. Say that you want to replace an instruction sequence
+"putstatic/getstatic" with an equivalent "dup/putstatic":
+
+    InstructionSequenceBuilder ____ =
+        new InstructionSequenceBuilder();
+
+    Instruction[][] replacements =
+    {
+        ____.putstatic(X)
+            .getstatic(X).__(),
+
+        ____.dup()
+            .putstatic(X).__()
+    };
+
+    Constant[] constants = ____.constants();
+
+    BranchTargetFinder branchTargetFinder = new BranchTargetFinder();
+    CodeAttributeEditor codeAttributeEditor = new CodeAttributeEditor();
+
+    clazz.accept(
+        new AllMethodVisitor(
+        new AllAttributeVisitor(
+        new PeepholeEditor(branchTargetFinder, codeAttributeEditor,
+        new InstructionSequenceReplacer(constants,
+                                        replacements,
+                                        branchTargetFinder,
+                                        codeAttributeEditor)))));
+
+You can define multiple patterns and their respective replacements in one go,
+with the wrapper InstructionSequencesReplacer.
+
+Complete example: ApplyPeepholeOptimizations.java
diff --git a/docs/api/reading.md b/docs/api/reading.md
new file mode 100644
index 00000000..af8294f0
--- /dev/null
+++ b/docs/api/reading.md
@@ -0,0 +1,81 @@
+## Streaming classes from a jar file
+
+You can read classes from class files and various types of (nested) jar files
+or jmod files, with some convenient utility classes and visitors. For example,
+you can read the classes from a jar file and print them out in a streaming
+fashion, without collecting their representations:
+
+    DirectoryPump directoryPump =
+        new DirectoryPump(
+        new File(inputJarFileName));
+
+    directoryPump.pumpDataEntries(
+        new JarReader(
+        new ClassFilter(
+        new ClassReader(false, false, false, false, null,
+        new ClassPrinter()))));
+
+Note the constructor-based dependency injection of visitor classes (and the
+slightly unconventional indentation to make it easy to read).
+
+Complete example: PrintClasses.java
+
+## Collecting classes
+
+Alternatively, you may want to collect the classes in a so-called class pool
+first, so you can perform more extensive analyses on them:
+
+    ClassPool classPool = new ClassPool();
+
+    DirectoryPump directoryPump =
+        new DirectoryPump(
+        new File(jarFileName));
+
+    directoryPump.pumpDataEntries(
+        new JarReader(isLibrary,
+        new ClassFilter(
+        new ClassReader(isLibrary, false, false, false, null,
+        new ClassPoolFiller(classPool)))));
+
+Complete example: Preverify.java
+
+## Writing out streamed classes
+
+You can read classes, optionally perform some small modifications, and write
+them out right away, again in a streaming fashion:
+
+    JarWriter jarWriter =
+        new JarWriter(
+        new ZipWriter(
+        new FixedFileWriter(
+        new File(outputJarFileName))));
+
+    DirectoryPump directoryPump =
+        new DirectoryPump(
+        new File(inputJarFileName));
+
+    directoryPump.pumpDataEntries(
+        new JarReader(
+        new ClassFilter(
+        new ClassReader(false, false, false, false, null,
+        new DataEntryClassWriter(jarWriter)))));
+
+    jarWriter.close();
+
+Complete example: ApplyPeepholeOptimizations.java
+
+## Writing out a set of classes
+
+If you've collected a set of classes in a class pool, you can write them out
+with the same visitors:
+
+    JarWriter jarWriter =
+        new JarWriter(
+        new ZipWriter(
+        new FixedFileWriter(
+        new File(outputJarFileName))));
+
+    programClassPool.classesAccept(
+        new DataEntryClassWriter(jarWriter));
+
+    jarWriter.close();
diff --git a/examples/api/src/proguard/examples/Preverify.java b/examples/api/src/proguard/examples/Preverify.java
index e5678946..2ed2c0e0 100644
--- a/examples/api/src/proguard/examples/Preverify.java
+++ b/examples/api/src/proguard/examples/Preverify.java
@@ -70,7 +70,8 @@ public class Preverify
             // Parse all classes from the input jar and
             // collect them in the class pool.
             DirectoryPump directoryPump =
-                new DirectoryPump(new File(jarFileName));
+                new DirectoryPump(
+                new File(jarFileName));
 
             directoryPump.pumpDataEntries(
                 new JarReader(isLibrary,
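The Design section that this patch adds to docs/api/index.md describes data classes that only hold data and expose accept methods, visitor interfaces with one visit method per data class, and visitors that are combined through their constructors. The sketch below illustrates that arrangement in plain Java. It is a simplified model, not ProGuard Core's actual API: Constant, IntegerConstant, StringConstant, and ConstantVisitor merely borrow the names from the text, and ConstantDescriber and StringConstantFilter are hypothetical classes invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorSketch {
    // Hypothetical, heavily simplified stand-ins for constant pool data classes
    // (NOT the ProGuard Core classes). They hold data and offer an accept method.
    interface Constant {
        void accept(ConstantVisitor visitor);
    }

    static final class IntegerConstant implements Constant {
        final int value;
        IntegerConstant(int value) { this.value = value; }
        public void accept(ConstantVisitor visitor) { visitor.visitIntegerConstant(this); }
    }

    static final class StringConstant implements Constant {
        final String value;
        StringConstant(String value) { this.value = value; }
        public void accept(ConstantVisitor visitor) { visitor.visitStringConstant(this); }
    }

    // The visitor interface: one visit method per data class of the same basic type.
    interface ConstantVisitor {
        void visitIntegerConstant(IntegerConstant constant);
        void visitStringConstant(StringConstant constant);
    }

    // Hypothetical operation: describes each visited constant. Visit methods return
    // void, so the results are passed back through a field.
    static final class ConstantDescriber implements ConstantVisitor {
        final List<String> descriptions = new ArrayList<>();
        public void visitIntegerConstant(IntegerConstant c) { descriptions.add("int " + c.value); }
        public void visitStringConstant(StringConstant c) { descriptions.add("string \"" + c.value + "\""); }
    }

    // Hypothetical filter: receives its delegate through the constructor (constructor
    // injection) and only passes string constants on to it.
    static final class StringConstantFilter implements ConstantVisitor {
        private final ConstantVisitor delegate;
        StringConstantFilter(ConstantVisitor delegate) { this.delegate = delegate; }
        public void visitIntegerConstant(IntegerConstant c) { /* Skip integers. */ }
        public void visitStringConstant(StringConstant c) { c.accept(delegate); }
    }

    public static void main(String[] args) {
        List<Constant> pool = List.of(
            new IntegerConstant(42),
            new StringConstant("Hello, world!"),
            new IntegerConstant(7));

        // Apply a plain operation to every constant.
        ConstantDescriber all = new ConstantDescriber();
        for (Constant c : pool) c.accept(all);

        // Apply the same operation, wrapped in an immutable filter.
        ConstantDescriber strings = new ConstantDescriber();
        ConstantVisitor filtered = new StringConstantFilter(strings);
        for (Constant c : pool) c.accept(filtered);

        System.out.println(all.descriptions);     // [int 42, string "Hello, world!", int 7]
        System.out.println(strings.descriptions); // [string "Hello, world!"]
    }
}
```

Adding a new operation here means writing one more ConstantVisitor implementation, without touching the data classes. The describer also shows the drawback the text mentions: because visit methods cannot return values, the results travel back through a field.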