This article is a guide to reverse engineering Simatic S7 PLC program blocks.[1]
Last revision: May 10 2022.
PLCs (Programmable Logic Controllers) are specialized computers designed to control industrial systems with real-time processing requirements. They take inputs provided by sensors and generate outputs for actuators. As programmable devices, they execute user-provided software and are therefore susceptible to some classes of software attacks. The most publicized demonstration of that was made by the Stuxnet malware, whose end-goal was to take control of, damage, and destroy arrays of centrifuges in a uranium enrichment plant. The analysis of the malicious PLC payload proved to be a long and tedious road,[2] and to this day, tooling and knowledge related to those systems remain limited compared to broadly-known architectures such as x86 or ARM.
We attempt to bridge some of this gap by providing S7 analysis modules for JEB Pro. This article shows how they can be used to acquire, analyze, disassemble and decompile PLC program blocks intended to run on Siemens Simatic S7-300 and S7-400 devices, a very popular line of PLC used to operate industrial processes.
Throughout the rest of this document, the terms PLC, S7 or S7 PLC are used interchangeably to refer to S7-300 or S7-400 PLC devices. Newer devices in the S7 product line, namely the S7-1200 and S7-1500, are not supported by this JEB extension and won’t be considered here.
The official IDE used to program S7 PLCs is called Step 7. Step 7 may be used as-is or as a part of the larger software suite Totally Integrated Automation (TIA).
A PLC program is made of blocks, such as data blocks, function blocks, and organization blocks. In this document, the term program may be understood as (collection of) blocks.
A program is downloaded to a PLC from a Programming Station, that is, a Windows-based computer running the Step 7 editor. When a program is retrieved from a PLC, it is uploaded to the programming station.
The assembly language STL (Statements List) and its bytecode counterpart, MC7, are sometimes used interchangeably.
Finally, the names Simatic, Step 7, and Totally Integrated Automation are trademarks of Siemens AG (“Siemens”).
This section briefly presents what S7 programs are, their structure, as well as lower level details important to know from a reverse engineering perspective.
S7 PLCs are programmed using Step 7 or TIA’s Step 7 (TIA is a platform required to program the most recent S7 devices), the IDE running on a Windows computer referred to as the Programming Device. Once the program is written, it can be downloaded onto a physical PLC or onto a simulator program (such as PLCSIM, part of Step 7).
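A PLC program is a collection of blocks. Blocks have a type (data, code, etc.) and a number. Code blocks include organization blocks (OB), function blocks (FB), functions (FC), system function blocks (SFB) and system functions (SFC); data is held in data blocks (DB) and system data blocks (SDB).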
The distinction between FB and FC is subtle. Any FB could be written to perform equivalently as an FC, and vice versa. They exist as an easy way to distinguish between a function working as-is, like a C routine would (FC), and a function working on a collection of pseudo-encapsulated attributes, like a C++ class method would (FB).
There are various ways to write PLC code. Programmers may choose to write ladder diagrams (LAD) or function block diagrams (FBD); complex processes may be better expressed in statements list (STL) or in a high-level Pascal-like language (SCL). Regardless of the source language, the program is compiled to MC7 bytecode, whose specifications are not public.
A piece of MC7 bytecode is packaged in a block, along with some metadata (authoring information, flags, etc.) and the interface of the block. The interface of a data block is the block definition itself, a structure type. The interface of a logic block is its set of inputs, outputs, local variables, as well as static variables in the case of an FB, or return value in the case of an FC.
PLC may be programmed using a variety of methods, such as:
Step 7 compiles all source codes to MC7 bytecode, a representation that will be translated and executed by a virtual machine running on the PLC.
STL was relatively well documented up until the S7-400.[3] However, the binary specifications of MC7 are not public at the time of writing.[4]
MC7 instructions map to STL statements, with several notable exceptions (e.g. STL’s CALL is translated to UC/CC with additional code to prepare the Address Register pointer and the opened Data Block, set up parameters in the Locals memory area in the case of an FC/SFC call, etc.).
The execution environment for MC7 bytecode is the following:
JEB’s MC7 plugin mirrors the execution environment, and adds several synthetic (artificial) registers to help with MC7 code representation and code translation to IR for the decompiler. The processor details can be examined in the GUI client (menu Native, handler Processor Registers).
PLC reverse engineers will need to become familiar with STL. However, a complete and detailed guide to general STL programming is outside the scope of this document. Specific STL instructions will be discussed as needed.
The instructions are grouped into the following categories:
A summarized HTML version[5] of the reference STL documentation can be found on our website:
Instructions carry 0 or 1 operand. The operand type can be one of the following:
L MB 300 : load the global byte at address 300 (decimal) into ACCU1
L L#1000 : load the double-integer value 1000 into ACCU1
= I [MD 100] : assign RLO to the input bit at X, where X is the pointer located at offset 100 of the global memory (M)
X I [AR1, P#30.4] : binary-xor RLO with the input bit located at *(AR1+30.4)
AN [AR1, P#10.0] : binary-and-not RLO with the bit located at *(AR1+10.0); the target area is specified in the MSB of AR1
T QW [AR2, P#2.0] : transfer ACCU1L to the word located at *(AR2+2.0)
A I 2.0 : binary-and RLO with the input bit 2.0 (bit #0 of byte 2)
O Q 40.4 : binary-or RLO with the output bit 40.4
JU 15 : jump to “instruction address + 2*15”
T Z#6.0 : transfer ACCU1 to the third parameter
NOP 0
NOP 1
Interestingly, some instructions encode the type of the immediate operand (this allows for unambiguous STL code rendering). Below is a list of examples with the L instruction, which loads ACCU1 with an immediate value. Note that the immediates are encoded big-endian:
TYPE      INSTRUCTION            BYTECODE         IMMEDIATE (BE, 8-, 16- or 32-bit)
bin32     L 2#10101010           300200aa         0x00aa
dec16     L 1000                 300303e8         0x03e8
dec32     L L#1000000            3803000f4240     0x000f4240
hex8      L B#16#45              2845             0x45
hex16     L W#16#6677            30076677         0x6677
hex32     L DW#16#11223344       380711223344     0x11223344
float32   L 3.14                 38014048f5c3     0x4048f5c3
char1     L 'z'                  3005007a         0x007a
char2     L 'ab'                 30056162         0x6162
char4     L 'abcd'               380561626364     0x61626364
bytes2    L B#(3, 6)             30060306         0x0306
bytes4    L B#(3, 6, 7, 8)       380603060708     0x03060708
bcd       L C#345                30080345         0x0345
pointer   L P#100.2              380400000322     0x00000322 (area NOT specified)
pointer   L P#M 10000.0          380483013880     0x83013880 (area specified)
time      L T#10s31ms            38090000272f     0x0000272f
date      L D#2022-4-25          300a2e1a         0x2e1a
tod       L TOD#16:20:59.100     380b03821e5c     0x03821e5c
s5t       L S5T#1m40s            300c2100         0x2100
The types used in STL or MC7 are described in the next section.
Newcomers to STL may be baffled by this type of code:
// assume a new routine
A I 0.0 // 1. binary-and
A I 0.1 // 2. binary-and
= Q 1.0 // 3. assign the result (in RLO) to output bit 1.0
If "A <SRC>" means "RLO = RLO & <SRC>", what does line (1) do, and does it depend on the value of RLO at (1)? In general, the answer is no. A more precise translation of A would be:
if FC == 0:
    RLO = SRC
    FC = 1
else:
    RLO = RLO & SRC
If the FC flag is false, RLO takes the value of the source bit. What is the value of FC, then? At the beginning of a routine, it is false, because sub-routine dispatch instructions such as UC set it to 0. It is also set to false after an end-of-logic-string operation, such as = (assign the RLO to a destination).
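To make the role of FC concrete, here is a minimal Python sketch that models only what is described above: the A instruction, the first-check (FC) flag, and = ending the logic string. It is a toy model, not an MC7 emulator; other status-word bits (STA, OR, OS, etc.) are deliberately ignored, and the names `LogicString` and `q1_0` are hypothetical. It shows that the three-line example produces the same result whatever RLO value was left over before line (1).

```python
class LogicString:
    """Toy model of RLO and the first-check (FC) flag, for A and = only."""

    def __init__(self, stale_rlo=0):
        self.rlo = stale_rlo  # leftover RLO from earlier code
        self.fc = 0           # FC is 0 on routine entry (e.g. after UC)

    def a(self, src):
        # A <src>: the first check starts a new logic string,
        # subsequent checks AND into it.
        if self.fc == 0:
            self.rlo = src
            self.fc = 1
        else:
            self.rlo &= src

    def assign(self):
        # = <dst>: hand out the RLO and end the logic string (FC = 0).
        self.fc = 0
        return self.rlo


def q1_0(i0_0, i0_1, stale_rlo):
    """Runs: A I 0.0 / A I 0.1 / = Q 1.0, starting from a given stale RLO."""
    s = LogicString(stale_rlo)
    s.a(i0_0)
    s.a(i0_1)
    return s.assign()


for stale in (0, 1):
    print(stale, [q1_0(x, y, stale) for x in (0, 1) for y in (0, 1)])
# prints:
# 0 [0, 0, 0, 1]
# 1 [0, 0, 0, 1]
```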
Every block, code or data, has an interface that defines…
The interface of an FC block consists of at most 4 sections. The order matters.
IN: input parameters
RET: single return value
IN_OUT: input/output parameters
OUT: output parameters (any number of returned values)
The interface of an FB block consists of at most 4 sections (they are not the same as the FC’s, though). The order matters as well, since it determines the memory layout of the associated DB.
IN: input parameters
OUT: output parameters
IN_OUT: input/output parameters
STATIC: the static data (held by the associated instance DB, and laid out right after the parameter data, that is, IN/OUT/IN_OUT)
The interface of a logic block may also define a TEMP area, holding temporary local variables (area L). Note that the local storage, just like any other storage, may be accessed without being defined in an interface. Example:
L LB 3 ; load the byte at 0x3 in local storage into ACCU1
T QB 4 ; transfer ACCU1 to the output byte at 0x4
In practice, L-variables are going to be defined for most user-generated code. However, many synthetic statements generated by the compiler for behind-the-scenes operations use L-variables that are located after what is defined by the interface of a logic block.
The binary interfaces located in compiled blocks do not carry the names used when defining those interfaces.
The variables defined in an interface belong to three general categories:
=> Elementary types ("normal" types):
TYPE           BITSIZE   DESCRIPTION
BOOL           1         single bit stored on 1 byte
BYTE           8         unsigned integer
CHAR           8         ascii character
WORD           16        unsigned integer
INT            16        signed integer
DWORD          32        unsigned integer
DINT           32        signed integer
REAL           32        ieee-754 fp32 number
DATE           16        date (number of days since Jan 1 1990)
S5TIME         16        elapsed time in [0, 2h46m30s] (*)
TIME           32        elapsed time in ms, range +/- ~24d20h
TIME_OF_DAY    32        time of day in ms since midnight
=> Complex types ("normal" types, continued):
TYPE           BITSIZE   DESCRIPTION
DATE_AND_TIME  64        timestamp (*)
STRING[n]      var       strings, 16 to 2048 bits, n in [0,254] (*)
ARRAY          var       N-dimensional arrays (*)
STRUCT         var       structures
=> Parameter types ("special" types, used in IN/OUT/IN_OUT sections):
TYPE           BITSIZE   DESCRIPTION
POINTER        48        pointers (*)
ANY            80        pointers with size (*)
TIMER          16        timer number
COUNTER        16        counter number
BLOCK_FB       16        FB number
BLOCK_FC       16        FC number
BLOCK_DB       16        DB number
BLOCK_SDB      16        SDB number
(*) details follow
JEB generates equivalent native types. They carry the same names and may be examined with the Type Editor in the GUI (menu Native, handler Type Editor).
Most types are self-explanatory. A few types require additional information.
The S5TIME type is essentially a BCD (binary coded decimal) value ranging from 0 to 999 (in units of 10 ms), with a multiplier (the time base) of 1, 10, 100 or 1000, stored on a word. The maximum value is therefore 999 x 10 s = 9990 seconds, which is 2h46m30s.
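The S5TIME arithmetic above can be checked with a small Python decoder. It is a sketch that assumes the usual S5TIME bit layout (three BCD digits in bits 0-11, time base in bits 12-13, with 0 = 10 ms, 1 = 100 ms, 2 = 1 s, 3 = 10 s); the function names are hypothetical. The first call decodes the S5T#1m40s immediate (0x2100) from the table of L instructions above.

```python
# Assumed S5TIME layout: bits 12-13 = time base, bits 0-11 = 3 BCD digits.
TIME_BASE_MS = {0: 10, 1: 100, 2: 1000, 3: 10000}


def bcd_to_int(value, digits):
    """Decode `digits` packed-BCD nibbles (most significant first)."""
    result = 0
    for i in reversed(range(digits)):
        nibble = (value >> (4 * i)) & 0xF
        if nibble > 9:
            raise ValueError(f"invalid BCD digit in {value:#x}")
        result = result * 10 + nibble
    return result


def s5time_to_ms(word):
    """Convert an S5TIME word to milliseconds."""
    base = (word >> 12) & 0x3
    count = bcd_to_int(word & 0xFFF, 3)
    return count * TIME_BASE_MS[base]


print(s5time_to_ms(0x2100))  # 100000 (S5T#1m40s)
print(s5time_to_ms(0x3999))  # 9990000 (2h46m30s, the maximum)
```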
The DATE_AND_TIME type, also referred to as DT, holds a date/time value (similar to the S7TIME type described later, although S7TIME uses 6 bytes instead of 8). It is limited to years 1990 to 2089. Each component of a DT is BCD-coded:
Byte     Value     Description
0        Year      90-99 => 1990-1999, 00-89 => 2000-2089
1        Month     1 to 12
2        Day       1 to 31
3        Hour      0 to 23
4        Minute    0 to 59
5        Second    0 to 59
6 (hi)   Millis2   0 to 9 (*100)
6 (lo)   Millis1   0 to 9 (*10)
7 (hi)   Millis0   0 to 9
7 (lo)   DoW       1 to 7 (1=Sunday)
Examples of encodings:
90010100 00000002 : DT#1990-1-1-0:0:0.0
22031406 13281232 : DT#2022-3-14-6:13:28.123
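The byte table above translates directly into a decoder. This Python sketch (function names hypothetical) returns the DT components as integers rather than a formatted string, and reproduces the two example encodings.

```python
def bcd(byte):
    """Decode one packed-BCD byte (two decimal digits)."""
    hi, lo = byte >> 4, byte & 0xF
    if hi > 9 or lo > 9:
        raise ValueError(f"invalid BCD byte {byte:#04x}")
    return hi * 10 + lo


def decode_dt(raw):
    """Decode an 8-byte DATE_AND_TIME value into
    (year, month, day, hour, minute, second, millisecond, day_of_week)."""
    if len(raw) != 8:
        raise ValueError("DATE_AND_TIME is 8 bytes long")
    yy = bcd(raw[0])
    year = 1900 + yy if yy >= 90 else 2000 + yy  # 90-99 -> 199x, 00-89 -> 20xx
    month, day, hour, minute, second = (bcd(b) for b in raw[1:6])
    # Byte 6 holds the hundreds and tens of milliseconds, the high nibble of
    # byte 7 the units; the low nibble of byte 7 is the day of week (1 = Sunday).
    millis = (raw[6] >> 4) * 100 + (raw[6] & 0xF) * 10 + (raw[7] >> 4)
    dow = raw[7] & 0xF
    return year, month, day, hour, minute, second, millis, dow


print(decode_dt(bytes.fromhex("9001010000000002")))  # (1990, 1, 1, 0, 0, 0, 0, 2)
print(decode_dt(bytes.fromhex("2203140613281232")))  # (2022, 3, 14, 6, 13, 28, 123, 2)
```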
Array types are single- or multi-dimensional types whose element type may be any primitive or complex type, with the exception of ARRAY.
Note that it is common practice for PLC programmers to use non-zero-based arrays, e.g. ARRAY[1..10, 1..20] OF INT. The first element of this two-dimensional array would be [1,1]. Therefore, the translated code to access an element [x,y] in memory is slightly more elaborate than RowLength*x+y: the element index is RowLength*(x-1)+(y-1).
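As a worked example of that index arithmetic, the Python sketch below computes the byte offset of an element relative to the start of the array. It assumes row-major storage (the last index varies fastest) and byte-sized or larger elements (2-byte INT here); bit-packed BOOL arrays are not handled. The helper name is hypothetical.

```python
def element_offset(indices, lower_bounds, dims, elem_size):
    """Byte offset of an element in a multi-dimensional array with arbitrary
    (e.g. 1-based) lower bounds, assuming row-major layout."""
    linear = 0
    for idx, lo, n in zip(indices, lower_bounds, dims):
        if not lo <= idx < lo + n:
            raise IndexError(f"index {idx} outside [{lo}, {lo + n - 1}]")
        linear = linear * n + (idx - lo)  # RowLength*(x-1)+(y-1) for 2 dims
    return linear * elem_size


# ARRAY[1..10, 1..20] OF INT: RowLength = 20, INT = 2 bytes
print(element_offset((1, 1), (1, 1), (10, 20), 2))    # 0
print(element_offset((2, 1), (1, 1), (10, 20), 2))    # 40
print(element_offset((10, 20), (1, 1), (10, 20), 2))  # 398
```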
The string types are fixed-length arrays of single-byte characters. They can hold from 0 to 254 characters. The layout in memory is as follows:
M L A(0) ... A(n-1)
where:
M is a byte holding the maximum length
L is the current string length (L <= M)
A(i) are the string bytes
Example of a STRING[8]: 08 05 41 41 41 41 41 00 00 00 would be the 5-char string 'AAAAA', which can accommodate up to 8 characters.
The string types are STRING[0], STRING[1], STRING[2], …, STRING[254]. The STRING type is an alias for STRING[254].
Just like other complex types (arrays, structs, DT), string types are always word-aligned (16-bit aligned) in memory, that is, they start at an even byte offset.
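The string layout above can be decoded in a few lines of Python. This is a sketch: the function name is hypothetical, and characters are decoded as latin-1 only to keep a one-to-one byte/character mapping (the type table describes CHAR as an ASCII character).

```python
def decode_s7_string(raw):
    """Decode an S7 STRING[n] buffer: max length byte, current length byte,
    then n character bytes. Returns (max_len, text)."""
    if len(raw) < 2:
        raise ValueError("a STRING is at least 2 bytes long")
    max_len, cur_len = raw[0], raw[1]
    if cur_len > max_len or len(raw) < 2 + max_len:
        raise ValueError("inconsistent STRING header")
    # Bytes past the current length are unused and ignored.
    return max_len, raw[2:2 + cur_len].decode("latin-1")


print(decode_s7_string(bytes.fromhex("08054141414141000000")))  # (8, 'AAAAA')
```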
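The pointer type (referred to as MC7 pointer in this document) is used to reference the address of a variable. It is 6 bytes long, and made of two parts: a DB number (WORD, 0 when the pointer does not reference a data block), followed by an MC7 address (DWORD).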
An MC7 address has the following bit layout:
AAAAAAAA 00000BBB BBBBBBBB BBBBBXXX
where:
A is the area code
B the address in bytes [0,65535]
X the bit position in [0,7]
The area codes are as follows: (reference: S7.AreaType)
0x00: no area
0x81: I (digital input)
0x82: Q (digital output)
0x83: M (global memory)
0x84: DB (shared DB)
0x85: DI (instance DB)
0x86: L (local data, i.e. the stack)
0x87: V (previous local data, i.e. the caller's stack)
The diagram below summarizes the memory layout of a POINTER type.
The JEB native types associated with MC7 pointer types are:
MC7PTR_xxx
MC7P_xxx
Examples of encodings:
P#100.0 : 00000320 (MC7 address)
P#M 100.0 : 0000 83000320 (MC7 pointer)
P#I 0.7 : 0000 81000007 (MC7 pointer)
P#V 1.0 : 0000 87000008 (MC7 pointer)
P#DB8.DBX10.2 : 0008 84000052 (MC7 pointer)
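The bit layout and area codes listed above are enough to build MC7 addresses and pointers by hand. The Python sketch below does so; it assumes both fields are serialized big-endian (like MC7 immediates), and the function names are hypothetical.

```python
# Area codes from the list above (S7.AreaType).
AREA_CODES = {"": 0x00, "I": 0x81, "Q": 0x82, "M": 0x83,
              "DB": 0x84, "DI": 0x85, "L": 0x86, "V": 0x87}


def mc7_address(area, byte_offset, bit):
    """4-byte MC7 address: AAAAAAAA 00000BBB BBBBBBBB BBBBBXXX."""
    if not 0 <= byte_offset <= 0xFFFF or not 0 <= bit <= 7:
        raise ValueError("byte offset or bit position out of range")
    return (AREA_CODES[area] << 24) | (byte_offset << 3) | bit


def mc7_pointer(area, byte_offset, bit, db_number=0):
    """6-byte POINTER: DB number (WORD) followed by the MC7 address (DWORD),
    assumed big-endian."""
    addr = mc7_address(area, byte_offset, bit)
    return db_number.to_bytes(2, "big") + addr.to_bytes(4, "big")


print(hex(mc7_address("", 100, 0)))                 # 0x320
print(mc7_pointer("M", 100, 0).hex())               # 000083000320
print(mc7_pointer("V", 1, 0).hex())                 # 000087000008
print(mc7_pointer("DB", 10, 2, db_number=8).hex())  # 000884000052
```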
The ANY type, in its common form, is the combination of a pointer with a pointed non-special element type and a repetition count. It allows pointing an area of memory (including memory located in data blocks) with bounds, e.g. 7 DWORDs at memory address 100.0.
It is 10 bytes long.
Format of ANY for normal types:
10 CC RR RR, followed by a POINTER (see above)
where:
- CC is the data type code (see below)
- RR RR is the repetition count
The data type code may be one of: (refer to S7.DataType.getId())
0x01 BOOL
0x02 BYTE
0x03 CHAR
0x04 WORD
0x05 INT
0x06 DWORD
0x07 DINT
0x08 REAL
0x09 DATE
0x0A TIME_OF_DAY
0x0B TIME
0x0C S5TIME
0x0E DATE_AND_TIME
0x13 STRING
The diagram represents the ANY type layout for common types:
Examples of encodings:
P#M 50.0 BYTE 10 : 10 02 000A 0000 83000190
P#DB10.DBX10.0 S5TIME 5 : 10 0C 0005 000A 84000050
The ANY type is also used to provide or receive “any” data type. It is not just a “pointer with a pointed size”. That means that special types like counters, timers, or block numbers, may be specified as well. In this case, the format of ANY is different:
Format of ANY for special types:
10 CC 00 01 00 00 00 00 NN NN
where:
- CC is the data type code:
  0x17 BLOCK_FB
  0x18 BLOCK_FC
  0x19 BLOCK_DB
  0x1A BLOCK_SDB
  0x1C COUNTER
  0x1D TIMER
- NN NN is the block/timer/counter number
- note that the repetition count is set to 1: a single item may be provided by this type format
- note that there is no DB number or area/offset, as they do not apply to special types
The diagram below is another way to visualize the ANY type layout for special types:
Examples of encodings:
Passing FC9 to an ANY parameter : 10 18 0001 0000 00000009
Passing T2 to an ANY parameter : 10 1D 0001 0000 00000002
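Both ANY forms can be told apart by their data type code. The Python sketch below decodes a 10-byte ANY value using the type code tables above; the special-type branch follows the FC9 and T2 example encodings (number stored in the last word). Fields are read big-endian, and the function and dictionary key names are hypothetical.

```python
import struct

DATA_TYPES = {0x01: "BOOL", 0x02: "BYTE", 0x03: "CHAR", 0x04: "WORD",
              0x05: "INT", 0x06: "DWORD", 0x07: "DINT", 0x08: "REAL",
              0x09: "DATE", 0x0A: "TIME_OF_DAY", 0x0B: "TIME",
              0x0C: "S5TIME", 0x0E: "DATE_AND_TIME", 0x13: "STRING"}
SPECIAL_TYPES = {0x17: "BLOCK_FB", 0x18: "BLOCK_FC", 0x19: "BLOCK_DB",
                 0x1A: "BLOCK_SDB", 0x1C: "COUNTER", 0x1D: "TIMER"}


def decode_any(raw):
    """Decode a 10-byte ANY parameter (big-endian fields)."""
    if len(raw) != 10 or raw[0] != 0x10:
        raise ValueError("not an ANY value")
    # type code (byte), repetition count (word), DB number (word), address (dword)
    code, count, db, addr = struct.unpack(">BHHI", raw[1:])
    if code in SPECIAL_TYPES:
        # Special types: repetition count 1, no DB/area; number in the low word.
        return {"type": SPECIAL_TYPES[code], "number": addr & 0xFFFF}
    return {"type": DATA_TYPES.get(code, hex(code)), "count": count, "db": db,
            "area": addr >> 24, "byte": (addr >> 3) & 0xFFFF, "bit": addr & 7}


print(decode_any(bytes.fromhex("1002000A000083000190")))  # P#M 50.0 BYTE 10
print(decode_any(bytes.fromhex("10180001000000000009")))  # FC9
print(decode_any(bytes.fromhex("101D0001000000000002")))  # T2
```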
JEB Pro can be used to reverse one or several PLC blocks making up a full program.
Internally, Step 7 manipulates PLC blocks as binary blobs whose formats are officially undocumented. At least two formats appear to exist:
Both formats are supported by JEB (reference: interface IS7Block). Below are their binary specifications. Note the following:
s7time (6 bytes):
AA AA AA AA BB BB
where:
B: big-endian WORD, number of days since Jan 1 1984
A: big-endian DWORD, number of milliseconds in the day (range: 0 to 86399999)
example: 00 00 EA 60 00 01 represents the timestamp Jan 2 1984 00:01:00.000
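Decoding an s7time value as described above takes only the standard datetime module. A minimal sketch (function name hypothetical) that reproduces the example timestamp:

```python
from datetime import datetime, timedelta

S7_EPOCH = datetime(1984, 1, 1)


def decode_s7time(raw):
    """6-byte s7time: big-endian DWORD milliseconds in the day,
    then big-endian WORD days since Jan 1 1984."""
    if len(raw) != 6:
        raise ValueError("s7time is 6 bytes long")
    millis = int.from_bytes(raw[0:4], "big")
    days = int.from_bytes(raw[4:6], "big")
    return S7_EPOCH + timedelta(days=days, milliseconds=millis)


print(decode_s7time(bytes.fromhex("0000ea600001")))  # 1984-01-02 00:01:00
```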
The header is 0x4E bytes in length. There is no trailer. Integers are encoded little-endian.
The JEB native type for this type is S7_BLOCK1_HEADER.
offset type description
00 word source language id (see S7.LangType)
02 word block type id (see S7.BlockType)
04 word block number
06 word format and/or version (?)
08 dword total block size (=0x4E+S1+S2+S3)
0C dword S1= payload size in bytes (*)
10 dword S2= interface size in bytes
14 dword S3= ? size in bytes
18 word ?
1A s7time last modification of the block
20 s7time last modification of the interface
26 dword key
2A char[8] author name
32 char[8] family name
3A char[8] block name
42 byte block version (major.minor)
43 byte ?
44 word crc
46 word ?
48 word ?
4A word ?
4C word ?
4E byte[S1] payload
4E+S1 byte[S2] interface
4E+S1+S2 byte[S3] ?
4E+S1+S2+S3 -
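The table above maps directly onto a `struct` format string. The Python sketch below splits a format (1) block into header fields, payload and interface; it is an illustration only, with hypothetical names, and it keeps the s7time, version and unknown fields raw. Since no real block is available here, the usage example builds a synthetic block with arbitrary field values.

```python
import struct

# 0x4E-byte header of format (1), little-endian, following the table above:
# 4 words, 4 dwords, 1 word, 2 s7time (6 bytes each), key, 3 names,
# version, unknown byte, crc, 4 unknown words.
HEADER1 = struct.Struct("<4H4IH6s6sI8s8s8s2BH4H")
assert HEADER1.size == 0x4E


def parse_block1(blob):
    """Split a format (1) block into header fields, payload and interface."""
    (lang, btype, number, _fmt, total, s1, s2, s3, _unk, mod_block, mod_iface,
     key, author, family, name, version, _unk2, crc, *_rest) = HEADER1.unpack_from(blob, 0)
    if total != HEADER1.size + s1 + s2 + s3 or len(blob) < total:
        raise ValueError("inconsistent block sizes")
    start = HEADER1.size
    return {
        "lang": lang, "type": btype, "number": number, "key": key, "crc": crc,
        "author": author.rstrip(b"\0").decode("latin-1"),
        "family": family.rstrip(b"\0").decode("latin-1"),
        "name": name.rstrip(b"\0").decode("latin-1"),
        "version": version,              # raw major.minor byte
        "modified": mod_block,           # raw s7time bytes
        "interface_modified": mod_iface,
        "payload": blob[start:start + s1],
        "interface": blob[start + s1:start + s1 + s2],
    }


# Synthetic block with arbitrary (hypothetical) values: 3-byte payload, 2-byte interface.
payload, interface = b"\x70\x0b\x00", b"\x01\x02"
demo = HEADER1.pack(0, 0, 1000, 0, 0x4E + len(payload) + len(interface), len(payload),
                    len(interface), 0, 0, bytes(6), bytes(6), 0, b"AUTHOR", b"FAMILY",
                    b"BLOCK", 0x10, 0, 0, 0, 0, 0, 0) + payload + interface
info = parse_block1(demo)
print(info["number"], info["name"], info["payload"].hex())  # 1000 BLOCK 700b00
```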
The payload is:
Both the header and the trailer are 0x24 bytes in length. Integers are encoded big-endian.
The equivalent JEB native types are S7_BLOCK2_HEADER and S7_BLOCK2_TRAILER.
offset type description
00 word magic ('pp')
02 byte source language id (see S7.LangType)
03 byte block type id (see S7.BlockType)
04 word block number
08 dword total block size
0C dword key
10 s7time last modification of the block
16 s7time last modification of the interface
1C word interface size in bytes
1E word ? length
20 word ? length
22 word payload size in bytes
24 byte[] payload bytes
24+S1 byte[] interface bytes
24+S1+S2 - trailer, see below
The trailer is defined as:
offset type description
00 char[8] author name
08 char[8] family name
10 char[8] block name
18 byte block version (major.minor)
19 byte ?
1A word crc
1C word ?
1E word ?
20 word ?
22 word ?
24 -
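The same approach works for format (2). In this Python sketch (hypothetical names), the undocumented word at offset 06 is skipped, the trailer is located right after the interface bytes as the header table indicates, and the `total` field is not validated because its exact coverage is not documented above. The usage example is again a synthetic block.

```python
import struct

# Format (2): 0x24-byte header and 0x24-byte trailer, big-endian.
HEADER2 = struct.Struct(">2sBBH2xII6s6sHHHH")   # offsets 06-07 skipped
TRAILER2 = struct.Struct(">8s8s8sBBH4H")
assert HEADER2.size == TRAILER2.size == 0x24


def parse_block2(blob):
    (magic, lang, btype, number, _total, key, _mod_block, _mod_iface,
     iface_size, _len1, _len2, payload_size) = HEADER2.unpack_from(blob, 0)
    if magic != b"pp":
        raise ValueError("not a format (2) block")
    p0 = HEADER2.size
    i0 = p0 + payload_size
    t0 = i0 + iface_size          # trailer follows the interface bytes
    author, _family, name, _version, _unk, crc, *_rest = TRAILER2.unpack_from(blob, t0)
    return {
        "lang": lang, "type": btype, "number": number, "key": key, "crc": crc,
        "name": name.rstrip(b"\0").decode("latin-1"),
        "author": author.rstrip(b"\0").decode("latin-1"),
        "payload": blob[p0:i0], "interface": blob[i0:t0],
    }


# Synthetic block with hypothetical field values.
payload, interface = b"\x70\x0b\x00", b"\x01\x02"
body = payload + interface
total = HEADER2.size + len(body) + TRAILER2.size
demo = (HEADER2.pack(b"pp", 0, 0, 1000, total, 0, bytes(6), bytes(6),
                     len(interface), 0, 0, len(payload))
        + body
        + TRAILER2.pack(b"AUTHOR", b"FAMILY", b"BLOCK", 0x10, 0, 0, 0, 0, 0, 0))
print(parse_block2(demo)["name"])  # BLOCK
```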
JEB can acquire blocks of type (1), living in the memory of the Step 7 editor process. Fire up the Step 7 editor and upload blocks into your Step 7 project, then start JEB, open the File menu, and select the Acquire Simatic S7 Blocks handler.
The acquisition widget will show up. It will list binary blocks found in the Step 7 editor memory. You can save some or all of them as binary files or import them directly into a newly-created project.
Of course, PLC blocks may be collected by other third-party means, such as a network sniffer during upload/download, or by a memory scanner.
To create a project, either acquire blocks (as described in the above section) or use the File/Open handler in the GUI client to load up a block or archive of blocks:
IMPORTANT: To decompile a collection of blocks, zip them into an archive and give it the “.s7zip” extension.
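Preparing such an archive can be scripted with Python's standard zipfile module. This is a sketch under assumptions: the helper name and the block file names are hypothetical, and a flat archive layout (one entry per block file) is assumed to be acceptable.

```python
import zipfile
from pathlib import Path


def make_s7zip(block_files, archive_path):
    """Zip binary block files (flat layout) into an archive with the .s7zip extension."""
    archive_path = Path(archive_path).with_suffix(".s7zip")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for block in map(Path, block_files):
            zf.write(block, arcname=block.name)  # store under its bare file name
    return archive_path


# Example (hypothetical file names):
# make_s7zip(["OB1.bin", "FC1001.bin", "DB1001.bin"], "program")  -> program.s7zip
```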
A new project will display the following minimal node hierarchy:
The container unit, of type simatic_s7, holds the blocks, parses them and decides where their code and data will be mapped in the child unit of type simatic_mc7. Note that this way of processing blocks is not related to how blocks are processed by a PLC. It is simply the plugin’s way to organize the blocks into an entity that fits within JEB’s public interfaces and representation models of plugins adhering to the native code analysis framework.
As can be seen in the “Segments” view of the container unit:
.code_<BlockName>: MC7 code of logic blocks (where <BlockName> consists of the block type followed by the block number, e.g. DB1000, FC1100, OB85)
.data_<BlockName>: bytes of data blocks
.globals, .inputs, .outputs, .counters, .timers: the M, I, Q, C and T memory areas
Optional segments .blk_<BlockName>, holding the raw bytes of PLC blocks, may be created for informational purposes, but this option is disabled by default.
The base address used for mapping is 0x100000 (=BASE). In most cases, the MC7 code will be found at address BASE+0x10. The data blocks will be mapped at BASE+0x10000, BASE+0x20000, etc., since a data block contains at most 65536 addressable bytes. Other segments (for M, I, Q, C, T areas) are also 0x1000-aligned and mapped after the data blocks.
The image unit, whose default name is “simatic_mc7 image”, owns a virtual memory object mapping the various segments described in the previous section. Those segments represent different parts of blocks (MC7 bytecode, data block bytes, memory areas, etc.).
Each segment is prefixed with block metadata information for convenience (names, timestamps, versions, etc.). Keep in mind that most of this information is purely informative and should not be trusted as-is: an attacker may manually edit block headers and change, for example, authorship information or timestamps.
In the example below, we can look at the MC7 code of FC2, which was mapped in a segment “.code_FC2”. Most of the code is standard STL code. Some instructions and idioms are not (e.g. UC FC, param-access instructions); they will be discussed later.
The unified virtual memory also holds data block bytes. Below, one can see that DB888 was mapped at virtual address 0x10000 by the analyzer.
When creating a new project, parsing options will be presented to the user.
The currently available options are:
DisassembleCode: true to disassemble the code. Keep this option on unless code examination or decompilation is not necessary.
MapRawBlocksAtZero: true to map the raw bytes of blocks before mapping their payload (code or data). It may be useful to examine very specific bits not rendered as metadata in the various description strings present throughout the disassembly.
GenerateInterfaceDescriptionUnits: true to generate interface definition text units, false otherwise. The interface units are very useful to have a global look at the various fields that make up an interface, as well as (for data blocks) the default values and current values of those fields.
Example for a data block (DB 888):
MapActualBytesForDataBlocks: true to use the current (actual) bytes of a data block when mapping the block to VM, false to use the default values.
Readers are encouraged to go through the JEB Manual[6] pages related to Actions and Views to learn more about how to interact with the disassembly. Of particular interest, we recommend reviewing:
Most actions offered by the GUI client are located in the Action and Native menus.
The S7 plugin uses two custom calling conventions:
__FC_CC: for FC/SFC/OB blocks
__FB_CC: for FB/SFB blocks
You may see their details by opening the Calling Convention Manager widget (in the Native menu).
To understand why two conventions are required to represent calls to sub-routines, we need to detail how sub-routine calls are implemented in MC7.
The order of parameter indexing is important: IN, RET, OUT, IN_OUT.
Let’s assume FC 1001 with the following interface:
IN:
  0.0: WORD IN0
  2.0: DWORD IN1
RET:
  6.0: DWORD
  10.0: -
Note that this interface uses only primitives and does not have OUT or IN_OUT parameters.
In STL such an FC would be called, for example, like that:
L 3000
T #tmp
CALL FC 1001
IN0 :=#tmp // symbolic ref to a variable on the stack
IN1 :=DW#16#10002000 // literal immediate
RET_VAL:=MD100 // address in memory for a return value
Which a compiler may translate to this piece of MC7 code:
Note the following:
Reminder: MC7 address (4-byte): AAAAAAAA 00000XXX XXXXXXXX XXXXXBBB
where A is the area code, X the offset in bytes, B the bit position (0-7)
The area codes are as follows: (S7.AreaType)
With this laid out…
Because of how the MC7 VM deals with locals, it is simpler for JEB to not treat those parameters as stack parameters. Instead, they are assigned to individual synthetic registers named PAR0, PAR1, PAR2, ..., PARn (limited to 16 entries). Those registers can be seen in the calling convention definition for FC/SFC/OB, namely “__FC_CC”.
Let’s look at the code for FC 1001:
L #IN0
L #IN1
+D
T #RET_VAL
Which was compiled to:
First, note the signature and prototype assigned by JEB:
void __FC_CC func_FC1001(WORD*, DWORD*, DWORD*)
As said above, in this example, parameters were provided by reference. The order follows the interface definition’s: the first parameter matches the first IN; the second parameter matches the second IN; the last parameter matches RET_VAL.
What about other parameter types? Are all of them provided by reference? The answer is no. Some parameters are provided by value (obviously, they must be IN parameters as well). Others are provided by references to pointers or references to any variables.
Note that OB blocks are always assigned the following prototype:
void __FC_CC func_OBx()
The mode of invocation of FBs (Function Blocks) is different. A DB is provided along with the call. The DB (referred to as the FB’s DI, that is, its instance Data Block, in this context) will contain the call parameters (IN, OUT, IN_OUT), along with the rest of the block’s static data (referred to as STATIC).
The order is important: IN, OUT, IN_OUT, STATIC.
Let’s assume FB 1001 to have the following interface header (TEMP omitted):
IN:
  0.0: WORD x
  2.0: WORD y
OUT:
  4.0: WORD res
IN_OUT:
  6.0: WORD seed
STAT:
  8.0: DWORD
  12.0: BOOL
It is expected that the DB provided during a call has the same or a compatible interface. In this example, we will pass DB 1001.
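Since the declaration order determines the instance DB layout, the offsets of a simple interface can be recomputed. The Python sketch below is an approximation under stated assumptions: BOOLs are packed bit by bit, BYTE/CHAR take the next free byte, 16- and 32-bit types start on the next even byte, section boundaries get no special treatment, and complex types are not handled. The STAT field names `s0` and `s1` are hypothetical (the binary interface carries no names). With these rules it reproduces the offsets of the FB 1001 interface above.

```python
SIZES = {"BYTE": 1, "CHAR": 1, "WORD": 2, "INT": 2, "DATE": 2, "S5TIME": 2,
         "DWORD": 4, "DINT": 4, "REAL": 4, "TIME": 4, "TIME_OF_DAY": 4}


def layout(declarations):
    """Assign byte.bit offsets to (section, name, type) declarations, in order.
    Simplified rules (assumptions): BOOLs are packed bit by bit; BYTE/CHAR take
    the next free byte; 16/32-bit types start on the next even byte."""
    byte, bit = 0, 0
    result = []
    for section, name, typ in declarations:
        if typ == "BOOL":
            result.append((section, name, f"{byte}.{bit}"))
            bit += 1
            if bit == 8:
                byte, bit = byte + 1, 0
            continue
        if bit:                 # close a partially used byte
            byte, bit = byte + 1, 0
        size = SIZES[typ]
        if size > 1 and byte % 2:
            byte += 1           # word alignment
        result.append((section, name, f"{byte}.0"))
        byte += size
    return result


fb1001 = [("IN", "x", "WORD"), ("IN", "y", "WORD"), ("OUT", "res", "WORD"),
          ("IN_OUT", "seed", "WORD"), ("STAT", "s0", "DWORD"), ("STAT", "s1", "BOOL")]
for entry in layout(fb1001):
    print(*entry)
# IN x 0.0 / IN y 2.0 / OUT res 4.0 / IN_OUT seed 6.0 / STAT s0 8.0 / STAT s1 12.0
```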
In STL, the FB would be called like this:
CALL FB 1001 , DB1001
 x   :=W#16#7
 y   :=W#16#8
 res :=MW10
 seed:=MW14
The parameters will be copied into the provided block’s (DB 1001) actual slots. Compilation of this code:
.code_FB1:00000046 func_FB1003 proc
.code_FB1:00000046
.code_FB1:00000046 10 03 BLD 3
.code_FB1:00000048 41 60 00 04 = L 4.0
.code_FB1:0000004C FB 7C CDB ;1
.code_FB1:0000004E FB 79 03 E9 OPN DI 1001 ;2
.code_FB1:00000052 FE 6F 00 00 TAR2 LD 0 ;3
.code_FB1:00000056 30 03 00 07 L 7 ;4
.code_FB1:0000005A 7E 56 00 00 T DIW 0 ;...
.code_FB1:0000005E 30 03 00 08 L 8
.code_FB1:00000062 7E 56 00 02 T DIW 2
.code_FB1:00000066 12 0E L MW 14
.code_FB1:00000068 7E 56 00 06 T DIW 6
.code_FB1:0000006C FE 0B 84 00+ LAR2 P#DBX 0.0 ;5
.code_FB1:00000072 FB 72 03 E9 UC FB 1001 ;6
.code_FB1:00000076 FE 6B 00 00 LAR2 LD 0 ;7
.code_FB1:0000007A 7E 52 00 04 L DIW 4 ;8
.code_FB1:0000007E 13 0A T MW 10 ;...
.code_FB1:00000080 7E 52 00 06 L DIW 6
.code_FB1:00000084 13 0E T MW 14
.code_FB1:00000086 FB 7C CDB ;9
.code_FB1:00000088 10 04 BLD 4
.code_FB1:0000008A 65 00 BE
.code_FB1:0000008A
.code_FB1:0000008A func_FB1003 endp
Notes:
Unlike an FC call, the parameters are located in the instance data block. The transfer does not involve the local stack.
The prototype of FB methods uses the __FB_CC convention:
void __FB_CC func_FB1003(_DATA_FB1003*, DWORD)
They use two parameters:
OB1 may be the most important block of a Simatic program. While it adheres to the general structure of OB blocks (that is, a parameter-less version of FC blocks), OB1 has an important specificity to keep in mind: the first 20 (0x14) bytes of its local area are set up with important fields when the block is invoked.
off  type           name        description
00   BYTE           EV_CLASS    event class (0x11 = OB1 is active)
01   BYTE           SCAN_1      scan type (*)
02   BYTE           PRIORITY    priority class (?)
03   BYTE           OB_NUMBER   OB number (1)
04   BYTE           RESERVED_1  -
05   BYTE           RESERVED_2  -
06   INT            PREV_CYCLE  run time of previous cycle (ms)
08   INT            MIN_CYCLE   min cycle time since last start-up
0A   INT            MAX_CYCLE   max cycle time since last start-up
0C   DATE_AND_TIME  DATE_TIME   OB calling timestamp
(*) scan types:
1: completion of a warm restart
2: completion of a hot restart
3: completion of the main cycle
4: completion of a cold restart
5: first OB1 cycle of the new master CPU
Refer to the reference documentation for more details on scan types.
You can verify this by checking the interface of an OB1 block loaded in your analysis project. It is likely (although not guaranteed) that the interface TEMP data (locals) will start with 6 BYTE, 3 INT, and 1 DATE_AND_TIME fields.
The native structure used by JEB to represent this header is called OB1_HEADER. You may examine it using the native type editor widget (menu Native, Type Editor).
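For offline analysis, for instance of a memory dump of OB1's local area, the 20-byte header above can be unpacked with `struct`. This Python sketch assumes big-endian fields, like other MC7 data; the function name and the sample values are hypothetical, and the DATE_AND_TIME field is returned as raw BCD bytes.

```python
import struct

# Leading 20 bytes of OB1's local (L) area, per the table above.
# Assumption: fields are big-endian, like other MC7 data.
OB1_HEADER = struct.Struct(">6B3h8s")
assert OB1_HEADER.size == 0x14

SCAN_TYPES = {1: "warm restart completed", 2: "hot restart completed",
              3: "main cycle completed", 4: "cold restart completed",
              5: "first OB1 cycle of the new master CPU"}


def parse_ob1_header(local_area):
    (ev_class, scan_1, priority, ob_number, _res1, _res2,
     prev_cycle, min_cycle, max_cycle, date_time) = OB1_HEADER.unpack_from(local_area, 0)
    return {"EV_CLASS": ev_class, "SCAN_1": SCAN_TYPES.get(scan_1, scan_1),
            "PRIORITY": priority, "OB_NUMBER": ob_number,
            "PREV_CYCLE": prev_cycle, "MIN_CYCLE": min_cycle, "MAX_CYCLE": max_cycle,
            "DATE_TIME": date_time.hex()}   # raw DATE_AND_TIME (BCD) bytes


# Hypothetical snapshot of the OB1 locals.
sample = OB1_HEADER.pack(0x11, 3, 1, 1, 0, 0, 5, 2, 9, bytes.fromhex("2203140613281232"))
print(parse_ob1_header(sample)["SCAN_1"])  # main cycle completed
```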
Other OB blocks also receive parameters on their stack upon execution. Refer to the S7 programming manuals for details.
N-way conditional branching is implemented in MC7 with the JL instruction.
Example:
L MB 100 // load m[100] inside ACCU1LL (=x)
JL labx // default target (x>=5)
JU lab0 // target if x==0
JU lab1 // target if x==1
JU lab2 // target if x==2
JU lab1 // target if x==3
JU lab2 // target if x==4
labx: L 1
JU next
lab0: L W#16#10
JU next
lab1: L W#16#100
JU next
lab2: L W#16#1000
JU next
next: T #RET_VAL
This would get decompiled as something like:
...
switch(x) {
case 0: {
v0 = 0x10;
break;
}
case 1:
case 3: {
v0 = 0x100;
break;
}
case 2:
case 4: {
v0 = 0x1000;
break;
}
default: {
v0 = 1;
}
}
...
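The dispatch performed by JL in the example above can be modeled in a few lines of Python: the low byte of ACCU1 (ACCU1-LL) indexes the list of JU instructions that follow JL, and any value past the end of the list goes to the JL target label. This is a sketch with hypothetical names; running it over a few values yields the same mapping as the decompiled switch.

```python
def jl_dispatch(accu1_ll, jump_list, default):
    """Model of JL: the low byte of ACCU1 indexes the list of JU entries
    following JL; out-of-range values go to the JL target label."""
    index = accu1_ll & 0xFF
    return jump_list[index] if index < len(jump_list) else default


jumps = ["lab0", "lab1", "lab2", "lab1", "lab2"]   # the JU list in the example
values = {"labx": 1, "lab0": 0x10, "lab1": 0x100, "lab2": 0x1000}

for x in range(7):
    print(x, hex(values[jl_dispatch(x, jumps, "labx")]))
# 0 0x10 / 1 0x100 / 2 0x1000 / 3 0x100 / 4 0x1000 / 5 0x1 / 6 0x1
```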
The S7 decompiler plugin is a gendec[7] plugin. As such, the plugin adheres to the INativeDecompilerPlugin interface, and can itself be customized via INativeDecompilerExtension plugin extensions.
Decompilation works on a per-function basis. Select the function, then hit the TAB key (or menu Action, handler Decompile).
The decompiler generates a child unit of type “c“. It is represented by the client as pseudo-C code rendered in a separate fragment. (See an example below.) The pseudo-code unit, just like the disassembly code, has a flexible output actionable via the Action and Native menus. If you position the caret on a line of code and press TAB again, you will be brought back to the closest corresponding MC7 code in the disassembly view, matching the pseudo-C code.
The decompiler does not decompile to SCL. The output is not meant to be recompilable. It is meant to provide a higher-level representation of complicated, verbose, MC7 code, markable and analyzable for reverse-engineering and analysis purposes.
The decompiler may create the following custom operations (underlying IR: IEOperation with a FunctionOptype):
ExtractOff(mc7_address) -> byte_offset: extract the byte offset from a 4-byte MC7 address. This is equivalent to “(addr >> 3) & 0xFFFF”
ExtractBit(mc7_address) -> bit_position: extract the bit position from a 4-byte MC7 address. This is equivalent to “addr & 7”
ToNP(mc7_address) -> native_address: convert a 4-byte MC7 address to a native VM address
ToMC7P(native_address) -> mc7_address: convert a 32-bit native address to an MC7 address
ToMC7PPTR(native_address) -> mc7_address: convert a 32-bit native address to an MC7 address referring to an MC7 pointer
FPOP(fpval) -> result: the following floating point operations: FPOP = SQR, SQRT, EXP, LN, SIN, COS, TAN, ASIN, ACOS, ATAN
IntToBCD(int_value) -> bcd_value: convert an integer to a binary-coded decimal value
ReadTimer(timer_number) -> value
ReadCounter(counter_number) -> value
GetDBAddress(db_number) -> native_address
GetOBAddress, GetFBAddress, GetFCAddress, GetSFBAddress, GetSFCAddress
GetDBLength(db_number) -> block size
BitAddr(byte_offset, bit_position) -> pointer: a native pointer not referencing a byte (i.e. bit_position != 0)
As a reminder, for FC blocks, the prototypes should be converted to:
However, when generating native prototypes for FC blocks, the converter does not do that for primitive type arguments: the generated prototype uses native reference types instead of MC7 opaque references.
For example, a function (WORD,TIMER,STRING) will have its native prototype set to (WORD*,WORD,MC7P_MC7PTR_STRING) instead of (MC7P_WORD,WORD,MC7P_MC7PTR_STRING).
As for invocations: instead of rendering opaque MC7 references, such as func1(0x87000010, 0x84001000), the decompiler will attempt to replace them with native references wrapped in ToMC7P or ToMC7PPTR operators, e.g. func1(ToMC7P(&varY), ToMC7P(&varZ)).
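When an opaque MC7 reference does remain in the output, the ExtractOff and ExtractBit formulas listed above, combined with the area codes of the MC7 address layout, are enough to read it by hand. A Python sketch with hypothetical helper names, applied to the two opaque arguments of the func1 example:

```python
AREAS = {0x00: "", 0x81: "I", 0x82: "Q", 0x83: "M", 0x84: "DB",
         0x85: "DI", 0x86: "L", 0x87: "V"}


def extract_off(addr):
    """Equivalent of ExtractOff: byte offset of a 4-byte MC7 address."""
    return (addr >> 3) & 0xFFFF


def extract_bit(addr):
    """Equivalent of ExtractBit: bit position of a 4-byte MC7 address."""
    return addr & 7


def describe(addr):
    """Render an MC7 address as '<area> <byte>.<bit>'."""
    area = AREAS.get(addr >> 24, hex(addr >> 24))
    return f"{area} {extract_off(addr)}.{extract_bit(addr)}"


print(describe(0x87000010))  # V 2.0
print(describe(0x84001000))  # DB 512.0
```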
Below is a list of limitations, at the time of writing. Some limitations will disappear as the decompiler matures.
The nesting instructions A(, O(, ), etc. are currently not translated and will cause decompilation to fail.
Generally, decompilation of MC7 code presents challenges stemming from the execution environment of MC7 and the design of the MC7 virtual machine itself: multiple memory areas (no unified VM), unorthodox pointer structures, etc. While gendec deals with those constructs in a generic way and attempts to generate pseudo-C code best representing them, it will not succeed in producing the best or most readable code in many scenarios. Such issues will be ironed out by incremental upgrades. Power-users should also keep in mind that JEB offers an expansive API allowing them to craft all sorts of extensions, including decompiler IR optimizers or AST massagers.
While SFC and SFB blocks are reserved for system uses, the common convention is to reserve the low ranges of FC/FB block numbers for library code not classified as system code, such as utility routines whose interfaces were standardized by the IEC (International Electrotechnical Commission).
For a number of reasons, it may be inconvenient or impossible to include those blocks in your JEB project. How, then, would a call to a library FC or a system FC be rendered, since its prototype is theoretically unknown? While gendec has several ways to recover prototypes by heuristics, the S7 extension also ships with a database of library block types and numbers with their common names and interfaces.
Example: if a call to FC 9 is found, but no FC 9 exists in the project, the block library will be checked for a match. In this case, the block will be understood as being “EQ_DT”. Refer to the S7 system reference manuals for details on well-known library and system blocks.
Users may craft extensions, such as scripts and plugins, in Java or Python. The reference documentation for JEB public API is located at https://www.pnfsoftware.com/jeb/apidoc.
The public types (classes, interfaces, unions) specifically exposed by the S7 analysis modules are located in the com.pnfsoftware.jeb.core.units.code.simatic package.
A course on the general JEB API is outside the scope of this manual. Users are encouraged to visit
– https://www.pnfsoftware.com/jeb/manual/dev/introducing-jeb-extensions/
– https://github.com/pnfsoftware/jeb-samplecode/tree/master/scripts
as well as this blog to learn more on this topic.
This document’s original purpose was to be a usage manual for JEB S7 block analysis extensions.
It grew into a full-blown introduction to Simatic S7 PLC reverse engineering. While the first half is mostly tool-agnostic, the second half demonstrates how JEB can be used to speed up the analysis of S7-300/S7-400 PLC programs, from block acquisition to block analysis and code disassembly, interface recovery, and of course, decompilation.
This first draft will be updated and augmented in the future, as the extensions mature. Thank you for reading, and a big thank you to our users for your continued support!
—
Nicolas Falliere (nico at pnfsoftware dot com)
Twitter @jebdec, Slack @jebdecompiler