

Scriptable debugging proxy for unit testing and performance analysis

Development
Wed 10 January 2018


It can be important to diagnose the activity or data that websites and apps send, to ensure that code is reliable and performant. Many developers are familiar with tools such as Fiddler or Charles, but wouldn't it be good if we could automate some of this? Good news: we can.

Here we will walk through writing a plugin (script) for mitmproxy.

The sample webpage

Let's start with a simple webpage to test against, which also provides some context:

<!DOCTYPE html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
<script>
$(document).ready(function() {

  // defined inside the DOM-ready handler so the page's elements exist
  function loadSomethingElse() {
    $.get("/data.json", function(data) {
      $("#result").text("Data Loaded");
      $("#message").text(data.message);
    });
  }

  // run loadSomethingElse() once, 1000 ms after the DOM is ready
  setTimeout(loadSomethingElse, 1000);

});
</script>
</head>
<body>

<h2>Async Demo</h2>
<h4 id="result">Data Loading</h4>
<p id="message"></p>
</body>
</html>

As you can see, this is about as simple as a page can be. It uses jQuery's ajax helper to request a JSON file one second after the page is ready (setTimeout(loadSomethingElse, 1000)). Trivial as it is, this produces a small waterfall of dependent requests (first the page, then the data), which makes it a good example for processing the captured data.
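The page reads data.message from /data.json, but the post doesn't show that file. As a hypothetical stand-in (the message text below is a placeholder, not the author's data), a file with the expected shape could be generated next to example.html like this:

```python
import json
from pathlib import Path


def write_sample_data(directory: Path) -> Path:
    """Write a hypothetical data.json containing the `message` field the page reads."""
    # Placeholder content: the real file used in the post is not shown.
    payload = {"message": "Hello from data.json"}
    path = directory / "data.json"
    path.write_text(json.dumps(payload))
    return path


if __name__ == "__main__":
    print("Wrote", write_sample_data(Path(".")))
```

Served by the local HTTP server with a JSON content type, jQuery's $.get should parse it into an object, so data.message resolves to the placeholder string.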

The sample proxy plugin

We will also need a script for mitmproxy, written in Python, as below:

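"""
This script demonstrates how to use mitmproxy's filter pattern in scripts:
it times how long each client takes from requesting example.html until the
data.json response arrives.
"""
import datetime as dt

from mitmproxy import flowfilter
from mitmproxy import http

# Every timed session from every client, kept for later inspection.
all_sessions = []


class FlowContext:
    """Tracks the page-to-data timing for a single client address."""

    def __init__(self, client_address):
        self.initial_filter = flowfilter.parse("~u .*example.html")
        self.final_filter = flowfilter.parse("~u .*data.json")
        self.client_address = client_address
        # No session is open until the page itself has been requested.
        self.connection_session = None

    def request(self, flow: http.HTTPFlow) -> None:
        # Start the clock when the client requests the page.
        if flowfilter.match(self.initial_filter, flow):
            self.connection_session = {'address': self.client_address,
                                       'start_date': dt.datetime.now(),
                                       'duration': 0.0}
            all_sessions.append(self.connection_session)

    def response(self, flow: http.HTTPFlow) -> None:
        # Stop the clock when the data.json response arrives, but only if a
        # session is open (guards against stray or repeated data requests).
        if self.connection_session is not None and flowfilter.match(self.final_filter, flow):
            elapsed = dt.datetime.now() - self.connection_session['start_date']
            self.connection_session['duration'] = elapsed.total_seconds()
            print("{address} took {duration} seconds for a request starting {start_date}".format(**self.connection_session))
            self.connection_session = None


class Filter:
    """Dispatches each flow to a FlowContext keyed by client IP address."""

    def __init__(self):
        self.contexts = {}

    def request(self, flow: http.HTTPFlow) -> None:
        # As written for the mitmproxy version used in this post; later releases
        # store the address as a (host, port) tuple, so address[0] is needed there.
        client_address = flow.client_conn.address.host
        if client_address not in self.contexts:
            self.contexts[client_address] = FlowContext(client_address)
        self.contexts[client_address].request(flow)

    def response(self, flow: http.HTTPFlow) -> None:
        client_address = flow.client_conn.address.host
        if client_address in self.contexts:
            self.contexts[client_address].response(flow)


def start():
    # Script entry point for the mitmproxy version used here; newer releases
    # look for a module-level `addons = [Filter()]` list instead.
    return Filter()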

It took a little digging to understand exactly how mitmproxy plugins work: the documentation is there, but it is not extensive in terms of examples. Here we use a class, Filter, which implements:

  • def request(self, flow: http.HTTPFlow) -> None:
  • def response(self, flow: http.HTTPFlow) -> None:

The critical thing to notice is that these methods can also be provided procedurally at module level (i.e. without any classes; just omit the self parameter). When the script is loaded, mitmproxy either uses the object returned by start() or, if the hooks sit at module level, simply invokes those functions directly.
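A minimal sketch of the module-level style described above: plain request and response functions with no class and no start(). The seen_urls list is a hypothetical piece of state for illustration; flow.request.pretty_url and flow.response.status_code are standard HTTPFlow attributes. The script doesn't import mitmproxy at all, since it only touches attributes on the flow object it is handed.

```python
"""Module-level mitmproxy hooks: no class and no start() needed."""

# Hypothetical state: every URL this script has seen, in arrival order.
seen_urls = []


def request(flow) -> None:
    # Called by mitmproxy for each client request passing through the proxy.
    seen_urls.append(flow.request.pretty_url)


def response(flow) -> None:
    # Called once the server's response for this flow has arrived.
    print("{} -> {}".format(flow.request.pretty_url, flow.response.status_code))
```

Saved as, say, hooks.py (a hypothetical name), it would be loaded the same way as the class-based version, with mitmdump -s hooks.py.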

It is also important to note the main parameter the proxy passes to the plugin's request and response methods, flow: http.HTTPFlow. It acts as a catch-all for the attributes we are interested in: above, the salient member is flow.client_conn.address, and there are many more, such as flow.request and flow.response.
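To give a feel for what an HTTPFlow carries, here is a small helper that copies a few commonly used attributes into a plain dict. flow.request.method, flow.request.pretty_url, flow.response.status_code and flow.response.content are standard mitmproxy attributes; the summarize_flow and client_host names and the chosen fields are our own. Because the shape of client_conn.address has changed between mitmproxy releases, client_host accepts either the .host style used in the plugin or a (host, port) tuple.

```python
def client_host(flow) -> str:
    """Return the client's IP, whichever address shape mitmproxy provides."""
    address = flow.client_conn.address
    # Older releases (as in the plugin above) expose .host; later ones a tuple.
    return address.host if hasattr(address, "host") else address[0]


def summarize_flow(flow) -> dict:
    """Collect a few frequently useful HTTPFlow attributes into a dict."""
    summary = {
        "client": client_host(flow),
        "method": flow.request.method,
        "url": flow.request.pretty_url,
        "status": None,
        "size": None,
    }
    # flow.response is None until the server has answered (e.g. inside request()).
    if flow.response is not None:
        summary["status"] = flow.response.status_code
        summary["size"] = len(flow.response.content or b"")
    return summary
```

Calling summarize_flow(flow) from a response hook gives one compact record per request, which is easier to log or assert on than the full flow object.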

We add a second class, FlowContext, because connections from all hosts come through the same plugin. We therefore keep a dictionary mapping each host to its own context so the events can be separated, assuming each IP address is a unique host (for example, on a LAN without NAT or downstream proxies).
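Stripped of mitmproxy, the per-client bookkeeping is just state keyed on the client IP. A minimal sketch of that idea using collections.defaultdict, where PerClientLog and the (host, url) events are hypothetical:

```python
from collections import defaultdict


class PerClientLog:
    """Groups observed URLs by client IP, much like Filter.contexts does."""

    def __init__(self):
        # defaultdict creates an empty list the first time a host appears,
        # replacing the explicit "if client_address not in ..." check.
        self.by_client = defaultdict(list)

    def record(self, host: str, url: str) -> None:
        self.by_client[host].append(url)

    def urls_for(self, host: str) -> list:
        # .get does not trigger defaultdict's auto-insert for unknown hosts.
        return list(self.by_client.get(host, []))
```

defaultdict fits here because the default value doesn't depend on the key. FlowContext takes the address as a constructor argument, which is one reason the explicit membership check in Filter is a reasonable choice. The NAT caveat applies equally: clients sharing one public IP end up in the same list.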

Lastly, mitmproxy provides the flowfilter module, which implements a small filter expression language for checking things such as the request URL. There are many operators (short codes); here we use ~u, which takes a regex matched against the URL, to recognise either the initial HTML page or the JSON file (initial_filter and final_filter respectively). Checking whether an HTTPFlow (request or response) matches a filter is as easy as calling flowfilter.match(self.initial_filter, flow), which returns True if it matches.
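To sanity-check a pattern before loading it into the proxy, a plain re approximation of a ~u check can help. This is not mitmproxy's implementation, only a rough stand-in that searches for the regex anywhere in the URL; details such as case sensitivity may differ from the real flowfilter. url_matches is a hypothetical helper name.

```python
import re


def url_matches(pattern: str, url: str) -> bool:
    """Rough stand-in for a flowfilter "~u PATTERN" check on a URL string.

    Not mitmproxy's code: it simply searches for the regex anywhere in the URL.
    """
    return re.search(pattern, url) is not None


# The two patterns the plugin passes to flowfilter.parse("~u ...").
INITIAL_PATTERN = r".*example.html"
FINAL_PATTERN = r".*data.json"

if __name__ == "__main__":
    for url in ("http://127.0.0.1:8003/example.html",
                "http://127.0.0.1:8003/data.json",
                "http://127.0.0.1:8003/example-html"):
        print(url, url_matches(INITIAL_PATTERN, url), url_matches(FINAL_PATTERN, url))
```

The last URL shows a subtlety: an unescaped . matches any character, so .*example.html also accepts example-html. Escaping it (example\.html) makes the pattern stricter.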

Serve the content locally

For this example we will serve the sample page locally to keep things simple. Start a basic HTTP server with Python's built-in SimpleHTTPServer module (Python 2 only; on Python 3, which mitmproxy itself requires, the equivalent is python3 -m http.server 8003):

$ cd ~/my_project
$ python -m SimpleHTTPServer 8003
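For automated tests it can be more convenient to start the same kind of server from code than from a shell. A minimal sketch using Python 3's http.server module; serve_directory is a hypothetical helper name, and the directory keyword of SimpleHTTPRequestHandler needs Python 3.7 or later:

```python
import functools
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler


def serve_directory(directory: str, port: int = 8003) -> HTTPServer:
    """Serve `directory` over HTTP on 127.0.0.1 from a background thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = HTTPServer(("127.0.0.1", port), handler)
    # Daemon thread: it will not keep the interpreter alive on exit.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = serve_directory(".")
    print("Serving on http://127.0.0.1:%d/" % server.server_port)
    input("Press Enter to stop\n")
    server.shutdown()
```

Passing port=0 lets the OS pick a free port (read it back from server.server_port), which helps avoid clashes when tests run in parallel.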

Start the proxy with the plugin

The webpage can now be accessed at http://127.0.0.1:8003/example.html. We also need to start an instance of mitmproxy, which listens on port 8080 by default; you can then point a single browser such as Firefox at it by setting its HTTP proxy to 127.0.0.1, port 8080, without affecting the rest of the system. You can use either mitmproxy or mitmdump: the latter behaves more like a daemon and keeps no buffer other than the output file you may optionally specify (which lets you replay flows later). mitmproxy, by contrast, can run out of memory if it is left unattended during automated tests and never cleared. Both accept the -s option for loading scripts, so we shall use mitmdump:
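For fully automated runs a browser isn't strictly required: an HTTP client can talk to the proxy directly. With a plain (non-TLS) HTTP proxy, the client connects to the proxy and puts the absolute URL in the request line. A minimal sketch with the standard library's http.client, assuming mitmdump on 127.0.0.1:8080 and the local page server on port 8003; fetch_via_proxy is a hypothetical helper name:

```python
import http.client


def fetch_via_proxy(url: str, proxy_host: str = "127.0.0.1",
                    proxy_port: int = 8080, timeout: float = 10.0):
    """GET a plain-HTTP URL through an HTTP proxy; return (status, body)."""
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=timeout)
    try:
        # Proxy-style request: the absolute URL goes in the request line,
        # and http.client derives the Host header from that URL.
        conn.request("GET", url)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


if __name__ == "__main__":
    for path in ("example.html", "data.json"):
        status, body = fetch_via_proxy("http://127.0.0.1:8003/" + path)
        print(path, status, len(body))
```

This client does not execute the page's JavaScript, so the one-second setTimeout never happens. To exercise the plugin's timing, the test has to request data.json itself, as the loop above does, possibly after a deliberate pause.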

$ mitmdump -s flow.py
Loading script: flow.py
Proxy server listening at http://0.0.0.0:8080
127.0.0.1:65162: clientconnect
127.0.0.1:65162: GET http://127.0.0.1:8003/example.html
              << 200 OK 618b
127.0.0.1:65162: clientdisconnect
127.0.0.1:65164: clientconnect
127.0.0.1:65164: GET http://127.0.0.1:8003/jquery-3.2.1.min.js
              << 200 OK 84.63k
127.0.0.1:65164: clientdisconnect
127.0.0.1:65166: clientconnect
127.0.0.1 took 1.128691 seconds for a request starting 2017-12-22 14:55:37.419756
127.0.0.1:65166: GET http://127.0.0.1:8003/data.json
              << 200 OK 27b
127.0.0.1:65166: clientdisconnect

The key piece of information in this output is this line:

127.0.0.1 took 1.128691 seconds for a request starting 2017-12-22 14:55:37.419756
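Because the plugin prints a fixed sentence, an automated performance check can scrape it from mitmdump's output. A sketch, assuming that output has been captured to a string (for example, from a redirected log file); the TIMING_LINE regex and parse_timings name are our own:

```python
import re
from datetime import datetime

# Matches the sentence printed by FlowContext.response in the plugin.
TIMING_LINE = re.compile(
    r"^(?P<address>\S+) took (?P<duration>[\d.]+) seconds "
    r"for a request starting (?P<start>.+)$"
)


def parse_timings(output: str) -> list:
    """Return (address, seconds, start datetime) for each timing line found."""
    results = []
    for line in output.splitlines():
        match = TIMING_LINE.match(line.strip())
        if match:
            results.append((
                match.group("address"),
                float(match.group("duration")),
                # str(datetime) output; fromisoformat also copes with no microseconds.
                datetime.fromisoformat(match.group("start")),
            ))
    return results
```

A test could then assert that every duration stays under a budget of its choosing, e.g. all(seconds < 2.0 for _, seconds, _ in parse_timings(output)), where 2.0 is an arbitrary example. Keep in mind that the page's setTimeout alone accounts for about one second of the 1.128691 seconds shown above.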

That is our plugin's output, interleaved with the rest of mitmdump's log (in a real-world environment you would log to a database or a separate terminal instead). This is the most minimal example that highlights the process and lets us analyse the flow.
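As one possible take on logging to a database rather than the console, the session dicts the plugin builds could be written to SQLite with the standard sqlite3 module. The sessions table, its columns, and the open_db and save_session helpers are all hypothetical:

```python
import sqlite3


def open_db(path: str = "sessions.db") -> sqlite3.Connection:
    """Open (or create) a database with a simple sessions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        " address TEXT NOT NULL,"
        " start_date TEXT NOT NULL,"
        " duration REAL NOT NULL)"
    )
    return conn


def save_session(conn: sqlite3.Connection, session: dict) -> None:
    """Persist one dict shaped like FlowContext.connection_session."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO sessions (address, start_date, duration) VALUES (?, ?, ?)",
            (session["address"],
             session["start_date"].isoformat(sep=" "),
             session["duration"]),
        )
```

Inside FlowContext.response, a call such as save_session(conn, self.connection_session) could replace or sit alongside the print, as long as it runs before the session is reset to None.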

This was one of the most trivial examples possible, but the key concepts of capturing and querying HTTP data from mitmproxy enable many useful functions, especially for mobile app testing. On that note, if you are testing your own devices and need SSL support without ATS exceptions, you can install the mitmproxy root CA certificate on the device and trust it. If mitmproxy has trouble reaching upstream certificates because of upstream proxies, --no-upstream-cert stops it from connecting upstream to look up certificate details and --insecure disables verification of upstream certificates. To route through an upstream proxy in the first place (as in many corporate environments), use --upstream my.upstream.proxy:

mitmproxy --no-upstream-cert --insecure --upstream my.upstream.proxy