Hands-on Practice with Burp Suite and MLflow

We’ve looked at resources for learning the basic concepts and examined a hypothetical average AI library’s most likely vulnerabilities. Now let’s move into the real world, using the Burp Suite intercepting proxy against MLflow as a live example. We’ll cover a number of advanced Burp Suite technical tips, as well as the environment setup for MLflow, which is very similar to that of other MLOps tools.

Burp Suite Installation

Burp Suite Proxy is a tool for intercepting web requests between your browser or command line and the receiving server, as well as automatically scanning for vulnerabilities. We highly recommend buying the Professional version; at $450 per year, a single vulnerability bounty from Huntr more than pays for it. There is a free Community version as well, but you’ll miss out on a significant portion of Burp’s power.

Download and installation is as simple as grabbing the installer from portswigger.net and running it.

Intercepting Browser Requests

Burp intercepts browser requests, allowing you to see and modify the raw request. The easiest way to do this is to go to the Proxy tab > Open browser. This opens a browser that is already configured to proxy through Burp, with some useful extensions for DOM-based testing.

If you wish to use your normal browser instead, there are a few extra steps. Run Burp, visit http://burp in the browser, click CA Certificate, and install the certificate so Burp can intercept TLS traffic.

More detailed guides on troubleshooting this step can be found in Burp’s documentation here: https://portswigger.net/burp/documentation/desktop/external-browser-config/certificate

We recommend using a proxy extension in your browser to control when Burp intercepts requests. FoxyProxy for Firefox is a generally trusted choice. Install it and create a new proxy entry pointing at 127.0.0.1 on port 8080, the default address and port Burp listens on.

Most of the testing you’ll be doing for machine learning (ML) libraries will not be against external websites, but against servers you host locally. Add a pattern in FoxyProxy to capture localhost and loopback addresses so these requests from your browser back to your local machine are intercepted and saved.

Click the FoxyProxy extension icon in the top-right corner of Firefox, select the proxy configuration you just created, and you’re ready to begin!

Advanced Burp Suite Settings

Burp Suite allows for a significant amount of customization that many testers overlook. Out of the box, Burp Suite is a capable tool for finding low-hanging fruit, but once you tweak it a little you’ll have a major advantage over the competition.

Automated Scanner

This is where you get the most juice for the squeeze. The default automated scanner settings will cost you a significant number of quality findings and exploitable quirks in the application. Starting from the top of the settings and working our way down, we’ll show you the most important changes to maximize the surface area the automated scanner attacks. First, send a request to the automated scanner.

Head over to the Dashboard tab and click the cog icon in the top right of the automated scan task.

Create a new scan configuration.

Minimize false negatives. False negatives are legitimate issues that the scanner incorrectly identifies as not being issues. We want to minimize these so the scanner feeds us any and all suspicious application behaviors. They are the opposite of false positives: application behaviors the scanner flags as issues that are not, in fact, issues.

Handling application errors: this is a big one. The default settings dramatically limit the thoroughness of the automated scan, since they stop testing after just two errors. Set these to 99999 so the scan never stops. This will require you to monitor the scan manually to confirm you’re not being blocked by a web application firewall or automatically logged out, but it is absolutely worth it.

In Frequently Occurring Insertion Points, uncheck the boxes below. Sometimes a parameter like “filename=” may trigger multiple different code paths depending on the endpoint it’s sent to, so we want to make sure we’re testing them all.

Misc Insertion Points should also be set to 99999 so there’s no limit to the number of insertion points Burp Scanner uses. Remember that you can break up a request with hundreds of parameters by sending it to Intruder and selecting just a fraction of the parameters at a time.

Last step: click the setting labeled Current auditing configuration and delete it so we’re only scanning items with our new settings, then hit OK at the bottom.

That’s it: an excellent, thorough automated scan that will find you more bugs than your competition. At the bottom of the settings, next to the Save button, check the box that says Save to library, then click Save. You’ll be able to reuse these settings for later scans across projects.


Extensions

Burp Suite allows user-created extensions to be installed, which add to Burp’s functionality. Click the Extensions tab at the top, then the BApp Store subtab. In general, it’s a good idea to sort by popularity and grab a bunch of the most popular extensions. The ones with a black check mark next to them are the extensions we usually install before testing.

The extensions you should install will vary a little from project to project; this is a general list of the most useful ones. Most of these extensions simply add more automated security scanning tests, though a few are used for manual testing. Some of the more important extensions that require manual running are HTTP Request Smuggler, Turbo Intruder, and Autorize. You can find more information about these extensions and their usage online.

Attacking an ML Library

Now that Burp is all set up, let’s move into the setup and attack of a popular AI library: MLflow. MLflow is a great target because it’s very popular and its utility is based on storing the crown jewels of an organization’s ML pipeline: the models and data. Additionally, its setup and usage are very similar to many other ML libraries in the bounty.

Static Code Analysis

Step one is recon of the codebase. Download an Integrated Development Environment (IDE) such as PyCharm or VS Code; for this tutorial we’ll use PyCharm. Download and run the installer, open PyCharm, and click Get from VCS. Pop in the MLflow GitHub URL and the directory you want it downloaded to:

Now, from the menus, go to PyCharm > Settings > Plugins. Search for “Snyk” and install that plugin.

You should have a little icon near the bottom that says Snyk. Click that and then hit Run Scan.

A nice output of a bunch of potential vulnerabilities will be shown.

“Potential” is the key word here. The vast majority of the vulnerabilities that Snyk finds have some sort of mitigating context. For example, the XSS shown in the screenshot above is flagged because a user-controllable variable is returned in the web UI without any filtering. That’s largely irrelevant, though, because the value is returned as JSON with a Content-Type of application/json, meaning no modern browser will interpret the response as HTML.

Another extremely common example of a non-issue that Snyk reports is path traversal. Most of the path traversal issues Snyk finds are not in user-facing parts of the project; usually they’re in some kind of helper command-line utility. These helpers are generally not callable through remote interfaces like the web UI or API calls, so if you have the ability to run one of these scripts, you already have local access to the filesystem.

Snyk will find all kinds of issues that aren’t actually vulnerabilities, but its real value comes in pointing you to the parts of the codebase that are most likely to have some kind of real vulnerability. If you see a dozen XSS issues in Snyk in a single file, then that file is worth investigating manually. It’s also a worthwhile tool for picking which library in the huntr.mlsecops.com bounties is most likely to have vulnerabilities. Download all the bounty libraries, scan each with Snyk, and start your vulnerability hunting on the library with the most Snyk findings.

Manually searching the code for potentially unsafe functions is another fruitful endeavor that has directly led to discoveries of remote code execution in ML libraries within the bounties.

Hit Ctrl-Shift-F to do project-wide string searches for:

  • eval(
  • exec(
  • subprocess.
  • os.system
  • pickle.dumps
  • pickle.loads
  • shell=True
  • yaml.load
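To see why these functions are worth hunting for, here is a minimal sketch (our own illustration, not code from any bounty library) of how pickle.loads on attacker-supplied data executes attacker-chosen code. The Gadget class and the benign eval payload are purely illustrative; a real attacker would return something like os.system with a shell command:

```python
import pickle

class Gadget:
    # pickle calls __reduce__ when serializing; whatever callable/args
    # pair it returns is invoked during deserialization (pickle.loads).
    def __reduce__(self):
        return (eval, ("1 + 1",))

payload = pickle.dumps(Gadget())
# Deserializing the payload runs eval("1 + 1") on the loading machine.
print(pickle.loads(payload))  # → 2
```

This is why any endpoint that unpickles an uploaded model file is a prime target.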

MLflow Setup

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install MLflow:

pip install mlflow

Run MLflow:

mlflow ui

Browse to the web UI (http://127.0.0.1:5000 by default), make sure the requests are showing up in Burp, and you’re ready to get started.

Mapping Out The Application

Once you’ve opened MLflow’s web UI, start by mapping out the application. Click every link, upload different kinds of files, and really try to cover every single bit of functionality the application possesses. Burp has an automated crawler, but we rarely use it, as it often fills the sitemap with garbage and doesn’t get great coverage of the more complex functionality within the application.

Clicking around a default installation’s web UI will only get you so far, however, as the application hasn’t been populated with data. Additionally, many ML tools’ web applications have limited ability to trigger all the API calls that are actually allowed. To map out all the API calls, there are a few options:

  1. Pray the documentation of the library has a consumable specification file of all the API calls. These are usually called WSDLs, WADLs, or OpenAPI Specification documents. You may try browsing to http://<server>:<port>/docs to see if there is an OpenAPI specification there.

    1. In our experience, most of these tools do not have an API specification file but you may still be able to find one with Google searches such as, “Postman MLflow”.
    2. Alternatively, try asking ChatGPT to build you a specification file. There are certainly no guarantees this will be accurate but it can often save you time especially if you give it example calls from the documentation.

  2. Make the API calls programmatically.

    1. Use Python scripts to call the APIs available and pipe them through Burp.

To demonstrate option 2, below is a script adapted from the MLflow documentation that sets up a simple experiment and populates the web UI with more data. First, open a terminal and change into a working directory:

cd Desktop
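To route the script’s HTTP traffic through Burp, one common approach is exporting proxy environment variables, which Python’s requests library (used by the MLflow client) honors. The addresses below assume Burp’s default listener and MLflow’s default port; adjust them to your setup:

```shell
# Send the client's HTTP(S) traffic through Burp's listener (assumed 127.0.0.1:8080).
export HTTP_PROXY=http://127.0.0.1:8080
export HTTPS_PROXY=http://127.0.0.1:8080
# Point the MLflow client at the local tracking server (assumed default port 5000).
export MLFLOW_TRACKING_URI=http://127.0.0.1:5000
```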

Then install the required library in the virtual environment:

pip install scikit-learn

Copy the code below into a file named example.py

import mlflow
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Log to the tracking server started earlier (assumed default port 5000)
# and record the training run automatically.
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Create and train models.
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)

# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)

Now run that code

python example.py

This should populate your Burp history with more API requests and unlock some utility in the web UI as well. We recommend sorting through Burp’s HTTP history tab and sending each unique API endpoint request to the Repeater tab for future use, renaming each to the name of the API request.

Populating Burp with 100% of the valid API requests will require either the OpenAPI specification document or going through the tool’s API documentation and making the calls individually from a Python script. Note that sometimes not all API requests are documented, so reviewing the codebase is important for 100% coverage.
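As a sketch of making one of those calls individually from Python, the snippet below builds a request to MLflow’s experiments/get REST endpoint and routes it through Burp. The endpoint path, ports, and experiment id are assumptions based on default setups; verify them against your MLflow version’s API docs:

```python
import urllib.parse
import urllib.request

# Assumed defaults: MLflow server on port 5000, Burp listener on 8080.
BASE = "http://127.0.0.1:5000"
BURP = "http://127.0.0.1:8080"

# An opener that routes urllib traffic through Burp's proxy.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": BURP})
)

# MLflow's REST endpoint for fetching an experiment by id.
params = urllib.parse.urlencode({"experiment_id": "0"})
url = f"{BASE}/api/2.0/mlflow/experiments/get?{params}"
print(url)
# resp = opener.open(url)  # uncomment with the server and Burp running
```

Repeating this pattern for each documented endpoint fills Burp’s history with every call the web UI never triggers.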

Automatic Testing

Once we have populated Burp with API calls, we can start automatically scanning each one. In previous steps we have already set up a highly thorough automatic Burp Scan configuration so all we have to do now is right click each API request and click “Do Active Scan”.

While the active scans are running, we highly recommend you click the Logger tab at the top of Burp and watch the requests fly by. Look for responses with abnormally large or small lengths, or out-of-the-ordinary status codes. Burp is excellent at finding vulnerabilities, but it isn’t perfect, and there have been many instances where we identified vulnerabilities by watching the Logger window that Burp did not recognize.

Manual Testing

We usually wait for the automatic testing to complete before starting manual testing so there’s no interference between the two. Manual testing consists of injecting custom payload values into requests to suss out potential vulnerabilities. Generally, this means you’ll live in the Repeater and Intruder tabs of Burp.

Initially, we like to send all the requests to Intruder and payload them with a fuzzing list such as the Big List of Naughty Strings, adding certain strings based on our needs. For AI/ML bug hunting, an extremely common vulnerability is too much access to the filesystem, so one valuable addition is a file path to a sensitive location on your filesystem, such as “/home/dan/.ssh/” or “/home/dan/.ssh/id_rsa”. Then search the Logger tab for that string in responses to see if any requests stored that filepath server-side anywhere.

Once the request is in the Intruder tab, we must specify where we want the fuzzing string to be injected. Below is a screenshot of the general pattern we use when identifying these locations.

Injection points are any place wrapped in two § characters. You’ll notice that in addition to selecting each entire value as a spot to be replaced with our fuzzing strings, we also add a pair of § characters at the end of each valid value. This is because it’s not uncommon for an application to look for a valid value in a request parameter by matching on substrings or regexes: if the expected value doesn’t exist anywhere in the parameter, the request is rejected, but if it does exist, the request continues down the code path. For example, the application might use a regex to search for a UUID in the run_uuid parameter; since our second injection point comes after a valid UUID, the application considers the value valid and passes it down a different code path where our injection might cause an error.

[Table: Parameter | Value | Passes Application Validation]

At the top of the Intruder Positions tab we have Sniper as the attack type. Sniper means Burp injects each payload into one injection point at a time, so each request contains only a single payload. This is useful for pinpointing where errors occur. There are other attack types, but Sniper is generally our default.

Now we load our customized Big List of Naughty Strings fuzzing list and hit Start Attack in the top right corner.

Once the attack has completed, we once again review the response lengths and status codes in the results window for anomalies, and investigate anomalies and errors further by sending the offending request back to Repeater and making minor adjustments.

Auth Testing

Authentication is the ability to log in to an application; authorization is the ability to perform an action within the application once you’re authenticated. These are a vital part of an application’s security, despite the fact that a large number of AI/ML libraries have no authentication at all. As time goes on, the number of AI/ML libraries with authentication will increase, so this is important to know regardless.

In the latest version of MLflow, authentication is an optional argument when starting the server. Inside the virtual environment, run this command:

mlflow server --app-name basic-auth

Now when you reach the MLflow server in a browser, it will require a username and password. Once you create a low-privilege user, you put that user’s authentication token into Autorize, then browse the site as the high-privilege user. Autorize will automatically repeat every request the high-privilege user makes twice: once with the low-privilege user’s token, and once with no authentication. In MLflow’s case, the authentication token is a Basic Authentication header, so we just copy that into Autorize’s configuration tab.
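For reference, a Basic Authentication token is just the base64 encoding of username:password. A quick sketch (the credentials below are hypothetical placeholders for the low-privilege account you created):

```python
import base64

# Hypothetical low-privilege account created for Autorize testing.
username, password = "lowpriv", "hunter2"
token = base64.b64encode(f"{username}:{password}".encode()).decode()

# This is the header value you paste into Autorize's configuration tab.
print(f"Authorization: Basic {token}")
```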

We put the low-privilege user’s authentication token in the box in the configuration tab on the right. On the left we see the proxy and Autorize history of requests made by the high-privilege user. The Authz. Status column shows whether the low-privilege user was allowed to make each high-privilege API request, and the Unauthorized Status column shows whether a request with no authentication token was allowed. Since all requests are green with “Enforced!” text, we can tell MLflow is properly implementing authentication and authorization.

What Am I Looking For?

Love, peace, and a good meal probably. But you’ll have to find that elsewhere. In the AI/ML tool ecosystem, the most common pattern we’ve seen is far too much access to the underlying file system. These are the vulnerabilities you’re most likely to find in order of prevalence:

  1. Arbitrary File Overwrite
    1. Most common in API calls that export models and data from the AI/ML library.
    2. An arbitrary file overwrite will often lead directly to remote code execution if you can overwrite files such as .bashrc or SSH credentials.
  2. Local File Include
    1. API calls that are supposed to read or import models and data files are very susceptible to local file include. Gaining access to read sensitive files such as SSH or cloud keys is a high severity vulnerability.
    2. Most commonly found in API calls with a naming structure like GetArtifact or get-artifact
  3. Server-Side Request Forgery
    1. Look for API calls that fetch data from S3 buckets or any API call that takes a URL as a value.
    2. These can be used to make internal network calls to other services running on the host or to pull down internal cloud configuration data, such as the cloud provider’s instance metadata service (e.g., http://169.254.169.254 on AWS).
  4. Remote Code Execution
    1. From our research, these often stem from arbitrary file overwrites, but we’ve also seen multiple examples of tools that place user input directly into a command the server runs, such as:
     subprocess.Popen('bash -c <user input>')
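To illustrate that last pattern, here is a contrived vulnerable handler of our own (not code from MLflow or any bounty library) showing how shell=True lets shell metacharacters in user input smuggle in a second command:

```python
import subprocess

def load_model(name: str) -> str:
    # Vulnerable pattern: user input interpolated into a shell command.
    # With shell=True, characters like ';', '|' and '$()' are interpreted
    # by the shell rather than treated as part of the argument.
    result = subprocess.run(f"echo loading {name}", shell=True,
                            capture_output=True, text=True)
    return result.stdout

print(load_model("resnet50"))          # benign: prints "loading resnet50"
print(load_model("x; echo INJECTED"))  # the injected second command runs too
```

The fix is to avoid shell=True and pass arguments as a list so user input is never parsed by a shell.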

Get Started!

Sign up at huntr.com to get started on becoming an AI hacker and get paid to help secure the industry. Join the Discord channel to chat with other hackers.