September 2012 – Guido Vranken

During the past couple of weeks I’ve been adding a set of features to myrrh that make it possible to access the emulator from external applications (including scripts written in languages such as Python and PHP) over a network connection. Through this interface myrrh exposes a number of its internal functions (such as setting breakpoints, running or stepping the emulator, reading the emulated RAM and much more) to these external programs. To do this, myrrh creates a server socket and listens for a client to connect. Once connected, server and client communicate over this connection by exchanging JSON-encoded data.
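The exact wire format is not described here, so the following is only a minimal sketch of what a client for such a server could look like. The "Function"/"Arguments" keys and the one-JSON-object-per-line framing are assumptions made for illustration; the "ReturnValues" key is the one the INT 0x21 script later in this post reads from the register dump. Arguments are sent as strings because the scripts in this post pass them that way (for example str(0x21)).

```python
import json
import socket

# Hypothetical message layout: the "Function"/"Arguments" keys and the
# newline-delimited framing are assumptions, not myrrh's documented protocol.


def encode_request(function, *args):
    """Serialize one command as a single newline-terminated JSON line."""
    message = {"Function": function, "Arguments": [str(a) for a in args]}
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_response(raw):
    """Parse one JSON reply and return its "ReturnValues" member."""
    reply = json.loads(raw.decode("utf-8"))
    return reply.get("ReturnValues", {})


class MyrrhClient(object):
    """Minimal blocking client for a JSON-over-TCP emulator server."""

    def __init__(self, host="localhost", port=5010):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile("rb")

    def call(self, function, *args):
        self.sock.sendall(encode_request(function, *args))
        # Assumes the server answers each request with exactly one line.
        return decode_response(self.reader.readline())

    def close(self):
        self.reader.close()
        self.sock.close()
```

With a server running, MyrrhClient("localhost", 5010).call("get_register_values") would return the register dictionary, provided the assumed framing matches what myrrh actually expects.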

I wanted to add the server/JSON functionality for two main reasons:

1. This way, the emulator becomes fully scriptable from any scripting language that supports both TCP/IP network connections and JSON encoding/decoding. The possible uses of a fully scriptable x86 emulator are numerous, and many of them are not obvious at first thought. More on this later.

2. By exposing the emulator’s internal functions through a combination of two very widely supported protocols (TCP/IP and JSON), the possibilities aren’t limited to scripting; for example, it would be fairly straightforward to create a graphical front-end which uses myrrh’s services, effectively creating a GUI-based x86 debugger. Such a tool could be very useful in computer science education.

More on scripting

There already exists at least one scriptable x86 emulator: PyEmu. It is a good and interesting initiative, but it seems to have a significant drawback. Since it was developed for examining and detecting malware and related software, it only seems to support emulation at the application level, which is far from full-system emulation. While PyEmu is probably useful for that goal, its range of possible uses is limited.

myrrh, on the other hand, is a very generic emulator by design and can therefore be used for a much wider range of applications. For example, one could emulate a full Linux installation and meticulously monitor its behaviour to gather statistics on memory access, disk access, efficiency and other data which might be useful to computer scientists.

Of course, myrrh will also support pure application-level emulation, in which an executable is run without an operating system being present. For this purpose I’m considering adding loaders for PE and ELF binaries, along with functionality to hook the system calls that each of these binary types expects (thus effectively simulating Windows and Linux environments, respectively).
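The Linux half of that plan could work roughly as in the sketch below: break on INT 0x80, look up the call number in EAX, and let a Python handler produce the return value instead of a kernel. The handler table, function names and state dictionary are hypothetical; only the i386 calling convention (number in EAX, arguments in EBX, ECX, EDX) and the call numbers 1 (exit) and 4 (write) are standard Linux facts.

```python
# Hypothetical system-call hooks for a Linux ELF binary running in myrrh.
# Assumes the i386 Linux convention: "int 0x80" with the call number in EAX
# and the first three arguments in EBX, ECX and EDX. Only two calls are
# modelled; a real loader would need many more.

ENOSYS = 38  # Linux errno for "function not implemented"


def sys_exit(regs, state):
    # exit(status): the status is in EBX
    state["exit_code"] = regs["EBX"]
    return 0


def sys_write(regs, state):
    # write(fd, buf, count): a real handler would read `count` bytes of
    # emulated memory at `buf`; this sketch only records the request.
    state["writes"].append((regs["EBX"], regs["ECX"], regs["EDX"]))
    return regs["EDX"]  # report that every byte was written


LINUX_I386_SYSCALLS = {
    1: sys_exit,   # __NR_exit
    4: sys_write,  # __NR_write
}


def dispatch_syscall(regs, state):
    """Run the handler for the call number in EAX; return the new EAX."""
    handler = LINUX_I386_SYSCALLS.get(regs["EAX"])
    if handler is None:
        return -ENOSYS
    return handler(regs, state)
```

Wired into myrrh, one could set a breakpoint with set_breakpoint_int on 0x80, pass the register values to dispatch_syscall, and write the result back with set_register_value, masking it with & 0xFFFFFFFF so that negative error codes become 32-bit values. How execution would then resume past the INT instruction is not covered by this sketch.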

myrrh as an educational tool

In computer science education, students at some point need to be taught about the internals and specifics of one of the most widely used CPU architectures: Intel’s x86 architecture. Besides learning the theory behind it, students benefit from being able to experiment with what they have learned. From my own experience as a programmer I can assert that practical experience is not merely beneficial but essential for learning anything thoroughly. This is why I will soon start work on a basic graphical front-end for myrrh. With this front-end, anyone with an interest in x86 internals can educate himself or herself about instructions, registers, memory and how they relate to each other. Since the environment is completely virtual, nothing can be broken.

Emulator scripting in practice

As I was gradually adding more functionality to the server part of myrrh, I wrote several Python scripts which use myrrh’s functions.

I first wrote a small script which hooks DOS interrupt 0x21 and prints a description of the requested function ID. It looks like this:

import myrrh
import json

# Create an instance of the myrrh class
m = myrrh.myrrh()

# Connect to the myrrh server
m.connect("localhost", 5010)

# Hook interrupt 0x21
m.f_set_breakpoint_int(str(0x21))

# Define the list of descriptions for the most commonly used INT 0x21
# requests
int21_str = {
   0x01: "Read character from STDIN",
   0x02: "Write character to STDOUT",
   0x05: "Write character to printer",
   0x06: "Console input/output",
   0x07: "Direct char read (STDIN), no echo",
   0x08: "Char read from STDIN, no echo",
   0x09: "Write string to STDOUT",
   0x0A: "Buffered input",
   0x0B: "Get STDIN status",
   0x0C: "Flush buffer for STDIN",
   0x0D: "Disk reset",
   0x0E: "Select default drive",
   0x19: "Get current default drive",
   0x25: "Set interrupt vector",
   0x2A: "Get system date",
   0x2B: "Set system date",
   0x2C: "Get system time",
   0x2D: "Set system time",
   0x2E: "Set verify flag",
   0x30: "Get DOS version",
   0x35: "Get interrupt vector",
   0x36: "Get free disk space",
   0x39: "Create subdirectory",
   0x3A: "Remove subdirectory",
   0x3B: "Set working directory",
   0x3C: "Create file",
   0x3D: "Open file",
   0x3E: "Close file",
   0x3F: "Read file",
   0x40: "Write file",
   0x41: "Delete file",
   0x42: "Seek file",
   0x43: "Get/set file attributes",
   0x47: "Get current directory",
   0x4C: "Exit program",
   0x4D: "Get return code",
   0x54: "Get verify flag",
   0x56: "Rename file",
   0x57: "Get/set file date",
}

# Intercept interrupt 0x21 800 times before exiting
for i in range(0, 800):

   # Run until the breakpoint occurs
   m.f_run()

   # At this point INT 0x21 was just executed and the breakpoint
   # has been triggered. Now request the current register values
   regs = json.loads(m.f_get_register_values())

   # Get and store value of AH (high byte of AX)
   AH = (regs["ReturnValues"]["EAX"] & 0xFF00) >> 8

   # Print the description associated with this interrupt 0x21
   # function
   print(int21_str.get(AH, ""))

print("")
print("Exiting..")

# Send exit command to server
m.f_exit()

You can watch a video demonstration of the above script here.

The above script performs only a simple task. The same result could probably have been achieved with existing tools, for example by modifying the source code of an existing emulator. For more complex tasks, however, a powerful scripting language such as Python is much better suited.
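To give an idea of a slightly more involved task, the sketch below tallies how often each INT 0x21 function is requested instead of printing every request. The helper names are hypothetical; only the AH extraction mirrors the original script.

```python
from collections import Counter


def ah_of(eax):
    """Return AH, the high byte of the 16-bit AX register."""
    return (eax >> 8) & 0xFF


def tally_int21(eax_values):
    """Count INT 0x21 function numbers from a sequence of EAX snapshots."""
    return Counter(ah_of(eax) for eax in eax_values)


def report(counts, names, top=5):
    """Format the most common functions, using `names` for known IDs."""
    lines = []
    for ah, n in counts.most_common(top):
        label = names.get(ah, "unknown")
        lines.append("%02Xh %-30s %d" % (ah, label, n))
    return lines
```

In the original loop, one would append regs["ReturnValues"]["EAX"] to a list after each m.f_run() and, after the loop, print each line of report(tally_int21(eax_values), int21_str).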

One more script that I wrote involves sending keystrokes from the script to the emulator. By using myrrh’s insert_keys function you can insert characters and whole strings into the emulated system. Internally myrrh generates a keyboard interrupt for each character.
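How myrrh turns each character into a keyboard interrupt internally is not described here. As an illustration only, the sketch below assumes the classic PC BIOS convention, where a buffered keystroke is a 16-bit word with the (set 1) scan code in the high byte and the ASCII code in the low byte, as returned by INT 0x16, AH=0x00. The table covers just digits, letters, space and Enter.

```python
# Scan code set 1 for the main keyboard rows; uppercase letters share the
# scan code of their lowercase key and differ only in the ASCII byte.
SCAN_CODES = {" ": 0x39, "\n": 0x1C}
for row, first in (("1234567890", 0x02), ("qwertyuiop", 0x10),
                   ("asdfghjkl", 0x1E), ("zxcvbnm", 0x2C)):
    for i, ch in enumerate(row):
        SCAN_CODES[ch] = first + i


def key_words(text):
    """Convert text into (scan code << 8) | ASCII words for the BIOS buffer."""
    words = []
    for ch in text:
        scan = SCAN_CODES.get(ch.lower())
        if scan is None:
            raise ValueError("no scan code for %r" % ch)
        ascii_code = 0x0D if ch == "\n" else ord(ch)  # Enter yields CR
        words.append((scan << 8) | ascii_code)
    return words
```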

The script performs a rather unusual task: the emulated system is booted and the DOS editor, EDIT.COM, is loaded. The script then retrieves the most recent tweets containing a certain search string from Twitter (using the Twitter API) and types this text into EDIT.COM by means of the insert_keys function.

import twitter
import time
import myrrh

m = myrrh.myrrh()

m.print_banner()

print("Connecting to server")
m.connect("localhost", 5000)

print("Executing 8000000 instructions")
x = m.f_run("8000000")

api = twitter.Api()

for n in range(0, 3):
   print("Searching for \"Amsterdam\"")
   statuses = api.GetSearch("Amsterdam")

   for s in statuses:
      # Use a name that does not shadow the built-in str()
      line = s.user.screen_name + " - " + s.text
      print(line)
      x = m.f_insert_keys(line + "\n")

   print("")

   if n < 2:
      print("Sleeping for 60 seconds")
      time.sleep(60)

print("Transferring control to emulator")
x = m.f_run("0")

Here is a video demonstration.
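The Twitter script passes tweet text straight to insert_keys, while EDIT.COM runs under DOS and knows nothing about Unicode. One possible pre-processing step, not part of the original script, is to reduce each status to printable ASCII first; the helper name to_dos_text is hypothetical.

```python
import unicodedata


def to_dos_text(text):
    """Strip accents and drop characters outside printable ASCII."""
    # NFKD splits accented letters into base letter + combining mark,
    # so encoding to ASCII with "ignore" keeps the base letter.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = []
    for ch in ascii_only:
        if ch in "\t\r\n":
            cleaned.append(" ")  # avoid extra Enter presses inside a tweet
        elif 32 <= ord(ch) < 127:
            cleaned.append(ch)
    return "".join(cleaned)
```

In the loop this would become m.f_insert_keys(to_dos_text(s.user.screen_name + " - " + s.text) + "\n").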

Although the idea is unusual and has no real-world purpose, it does demonstrate another interesting aspect of emulator scripting: any of the many existing Python modules can be combined with the emulator. For example, it would be fairly straightforward to import a module that generates JPG/PNG images and create a “call graph” that visualizes the code paths of a given piece of software.
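A minimal sketch of the call-graph idea, assuming a list of (caller, callee) address pairs has already been collected, for example by breaking on CALL instructions and reading EIP before and after a step; how best to gather these with myrrh is left open. Instead of an image library, it emits Graphviz DOT text, which the Graphviz command "dot -Tpng calls.dot -o calls.png" renders as a PNG. The function name is hypothetical.

```python
from collections import Counter


def call_graph_dot(calls, name="calls"):
    """Build a DOT digraph; edge labels give how often each call occurred."""
    counts = Counter(calls)
    lines = ["digraph %s {" % name]
    for (caller, callee), n in sorted(counts.items()):
        lines.append('    "%04X" -> "%04X" [label="%d"];' % (caller, callee, n))
    lines.append("}")
    return "\n".join(lines)
```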

List of supported functions

The following is a list of functions that the myrrh server exposes to external applications. Feature requests are welcome.

# Executes the specified number of instructions.
# A value of 0 means run indefinitely
run(num_instructions)

# Executes one instruction
step()

# Retrieves the current register values
get_register_values()

# Sets a certain register to a certain value
set_register_value(reg, value)

# Retrieves base64-encoded binary data from a memory address
get_memory(address, length)

# Writes data to the RAM memory
set_memory(data, address, length)

# Retrieves a list of currently loaded disks
enumerate_disks()

# Ejects a disk
eject_disk(disknum)

# Inserts data as a disk
insert_disk(disknum, data)

# Gets value of environment variable
get_variable(variable)

# Sets environment variable to value
set_variable(variable, value)

# Retrieves a list of currently active breakpoints
enumerate_breakpoints()

# Sets breakpoint on instruction
set_breakpoint_inst(instruction)

# Sets breakpoint on specified interrupt number
set_breakpoint_int(interrupt)

# Sets breakpoint on read within memory range
set_breakpoint_memory_read(address, length)

# Sets breakpoint on write within memory range
set_breakpoint_memory_write(address, length)

# Sets breakpoint on reg == value
set_breakpoint_reg_value(reg, value)

# Sets breakpoint on register value change
set_breakpoint_reg_change(reg)

# Executes the specified instruction and its operands
inst(instruction, operands)

# Disassembles a range of memory
disassemble(address, length)

# Searches the memory for certain data (base64-encoded)
memory_search(data)

# Insert a string into the emulator as a series of keystrokes
insert_keys(data)

# Exits and breaks the connection with the server
exit()
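Several of these functions exchange memory contents as base64. The helpers below sketch how a script might handle that data, for instance to print the "$"-terminated string that INT 0x21, AH=0x09 writes from DS:DX. Whether get_memory takes a linear address, and whether get_register_values reports DS, are assumptions; the helper names are hypothetical. In real mode, segment:offset maps to the linear address segment * 16 + offset.

```python
import base64


def real_mode_linear(segment, offset):
    """Convert a real-mode segment:offset pair to a linear address."""
    return (segment << 4) + offset


def decode_memory(b64_data):
    """Turn a base64 string (as returned by get_memory) into raw bytes."""
    return base64.b64decode(b64_data)


def encode_memory(raw):
    """Encode raw bytes as base64 text for set_memory or memory_search."""
    return base64.b64encode(raw).decode("ascii")


def dos_string(raw):
    """Return the text before the first '$', as printed by INT 0x21/AH=09h."""
    end = raw.find(b"$")
    if end < 0:
        end = len(raw)
    return raw[:end].decode("ascii", "replace")
```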

Conclusion

I hope to have emphasised the potential of emulator scripting and clarified a little how myrrh can be scripted in practice. myrrh’s scripting facilities are currently still at an immature stage of development. I will post more updates on the subject once development has progressed.


myrrh as a server