How to Build a XSS Vulnerability Scanner in Python

Building a Python script that detects XSS vulnerability in web pages using requests and BeautifulSoup.
  · 5 min read · Updated sep 2022 · Ethical Hacking · Web Scraping

Before we get started, have you tried our new Python Code Assistant? It's like having an expert coder at your fingertips. Check it out!

Cross-site scripting (also known as XSS) is a web security vulnerability that allows an attacker to compromise users' interactions with a vulnerable web application. The attacker aims to execute scripts in the victim's web browser by including malicious code on a normal web page. These flaws that allow these types of attacks are widespread in web applications with user input.

In this tutorial, you will learn how you can write a Python script from scratch to detect this vulnerability.

RELATED: How to Build a SQL Injection Scanner in Python.

We gonna need to install these libraries:

pip3 install requests bs4

Alright, let's get started:

import requests
from pprint import pprint
from bs4 import BeautifulSoup as bs
from urllib.parse import urljoin

Get: Build 35+ Ethical Hacking Scripts & Tools with Python Book

Since this type of web vulnerabilities are exploited in user inputs and forms, as a result, we need to fill out any form we see by some javascript code. So, let's first make a function to get all the forms from the HTML content of any web page (grabbed from this tutorial):

def get_all_forms(url):
    """Given a `url`, it returns all forms from the HTML content"""
    soup = bs(requests.get(url).content, "html.parser")
    return soup.find_all("form")

Now this function returns a list of forms as soup objects, we need a way to extract every form details and attributes (such as action, method and various input attributes), the below function does exactly that:

def get_form_details(form):
    """
    This function extracts all possible useful information about an HTML `form`
    """
    details = {}
    # get the form action (target url)
    action = form.attrs.get("action", "").lower()
    # get the form method (POST, GET, etc.)
    method = form.attrs.get("method", "get").lower()
    # get all the input details such as type and name
    inputs = []
    for input_tag in form.find_all("input"):
        input_type = input_tag.attrs.get("type", "text")
        input_name = input_tag.attrs.get("name")
        inputs.append({"type": input_type, "name": input_name})
    # put everything to the resulting dictionary
    details["action"] = action
    details["method"] = method
    details["inputs"] = inputs
    return details

After we get the form details, we need another function to submit any given form:

def submit_form(form_details, url, value):
    """
    Submits a form given in `form_details`
    Params:
        form_details (list): a dictionary that contain form information
        url (str): the original URL that contain that form
        value (str): this will be replaced to all text and search inputs
    Returns the HTTP Response after form submission
    """
    # construct the full URL (if the url provided in action is relative)
    target_url = urljoin(url, form_details["action"])
    # get the inputs
    inputs = form_details["inputs"]
    data = {}
    for input in inputs:
        # replace all text and search values with `value`
        if input["type"] == "text" or input["type"] == "search":
            input["value"] = value
        input_name = input.get("name")
        input_value = input.get("value")
        if input_name and input_value:
            # if input name and value are not None, 
            # then add them to the data of form submission
            data[input_name] = input_value

    print(f"[+] Submitting malicious payload to {target_url}")
    print(f"[+] Data: {data}")
    if form_details["method"] == "post":
        return requests.post(target_url, data=data)
    else:
        # GET request
        return requests.get(target_url, params=data)

The above function takes form_details which is the output of the get_form_details() function we just wrote as an argument that contains all form details. It also accepts the url in which the original HTML form was put and the value that is set to every text or search input field.

After we extract the form information, we just submit the form using requests.get() or requests.post() methods (depending on the form method).

Now that we have ready functions to extract all form details from a web page and submit them, it is easy now to scan for the XSS vulnerability now:

def scan_xss(url):
    """
    Given a `url`, it prints all XSS vulnerable forms and 
    returns True if any is vulnerable, False otherwise
    """
    # get all the forms from the URL
    forms = get_all_forms(url)
    print(f"[+] Detected {len(forms)} forms on {url}.")
    js_script = "<Script>alert('hi')</scripT>"
    # returning value
    is_vulnerable = False
    # iterate over all forms
    for form in forms:
        form_details = get_form_details(form)
        content = submit_form(form_details, url, js_script).content.decode()
        if js_script in content:
            print(f"[+] XSS Detected on {url}")
            print(f"[*] Form details:")
            pprint(form_details)
            is_vulnerable = True
            # won't break because we want to print available vulnerable forms
    return is_vulnerable

Related: Build 24 Ethical Hacking Scripts & Tools with Python Book

Here is what the function does:

  • Given a URL, it grabs all the HTML forms and then prints the number of forms detected.
  • It then iterates all over the forms and submits them by putting the value of all text and search input fields with a Javascript code.
  • If the Javascript code is injected and successfully executed, then this is a clear sign that the web page is XSS vulnerable.

Let's try this out:

if __name__ == "__main__":
    url = "https://xss-game.appspot.com/level1/frame"
    print(scan_xss(url))

This is an intended XSS vulnerable website. Here is the result:

[+] Detected 1 forms on https://xss-game.appspot.com/level1/frame.
[+] XSS Detected on https://xss-game.appspot.com/level1/frame
[*] Form details:
{'action': '',
 'inputs': [{'name': 'query',
             'type': 'text',
             'value': "<Script>alert('hi')</scripT>"},
            {'name': None, 'type': 'submit'}],
 'method': 'get'}
True

As you may see, the XSS vulnerability is successfully detected. Now, this code isn't perfect for any XSS-vulnerable website. If you want to detect XSS for a specific website, you may need to refactor this code for your needs. This tutorial aims to make you aware of these kinds of attacks and to learn how to detect XSS vulnerabilities.

If you really want advanced tools to detect and even exploit XSS, there are a lot out there, XSStrike is such a great tool, and it is written purely in Python!

Alright, we are done with this tutorial; you can extend this code by extracting all website links and running the scanner on every link you find; this is a great challenge for you!

Finally, in our Ethical Hacking with Python Book, we build 36 hacking tools and scripts from scratch using Python. Make sure to check it out here if you're interested!

Learn also: How to Make a Port Scanner in Python using Socket Library.

Happy Coding ♥

Take the stress out of learning Python. Meet our Python Code Assistant – your new coding buddy. Give it a whirl!

View Full Code Switch My Framework
Sharing is caring!



Read Also



Comment panel

    Got a coding query or need some guidance before you comment? Check out this Python Code Assistant for expert advice and handy tips. It's like having a coding tutor right in your fingertips!