Sigma Rules Automation with n8n

Project Overview

This project automates the process of generating, validating, and storing Sigma rules for new Common Vulnerabilities and Exposures (CVEs) using the n8n automation platform. Sigma rules are standardized detection rules used in cybersecurity to identify threats in logs, and this project streamlines their creation for a given set of CVEs. The workflow does the following:

Input:

  • Takes a list of CVEs in JSON format, each containing details like CVE ID, description, vendor, product, and severity.

Processing:

  • Generates Sigma rules for each CVE using an AI model (Together AI Llama 3.3 70B Free).

  • Validates the generated Sigma rules for syntax, completeness, and logical accuracy using another AI model (DeepSeek R1 Distill Llama 70B Free).

  • Filters the results to keep only valid Sigma rules.

Output:

  • Appends the valid Sigma rules to a Google Sheet named “Sigma Rules Log” with columns for CVE ID, Sigma Rule, and Validation Reason.

  • Stores each Sigma rule as a .yml file in a GitHub repository named sigma-rules-repo under the sigma-rules/ directory, with filenames based on CVE IDs (e.g., CVE-2023-5852.yml).

Purpose

The goal is to automate the creation of Sigma rules, which would otherwise be a manual and time-consuming task for cybersecurity professionals. By leveraging AI and automation, this project reduces human effort, ensures consistency in rule generation, and provides a centralized storage solution for easy access and collaboration.

Tools and Services Used
  • n8n: The automation platform that orchestrates the workflow.

  • Together AI: Provides the Llama 3.3 70B model for generating Sigma rules and DeepSeek R1 Distill Llama 70B model for validating Sigma rules.

  • Google Sheets: Stores the Sigma rules in a spreadsheet for easy viewing and analysis.

  • GitHub: Hosts the Sigma rules as .yml files in a repository for version control and sharing.

Walkthrough: Step-by-Step Guide

Below is a detailed walkthrough of the n8n workflow, explaining each node’s purpose, configuration, and role in the process. Follow along to understand how the workflow operates and how to replicate it.

1- Schedule Trigger Node

Purpose

The "Schedule Trigger" node serves as the entry point for the workflow, initiating the entire process on a predefined schedule. This automation ensures that the Sigma rule generation and validation process runs daily without manual intervention, making it ideal for keeping up with new CVEs.

Configuration

  • Node Name: Schedule Trigger

  • Trigger Rules:

    • Trigger Interval: Days

    • Days Between Triggers: 1 (the workflow runs once every day)

    • Trigger at Hour: Midnight (the workflow triggers at 00:00)

    • Trigger at Minute: 0 (exactly at the start of the hour)

  • Output: The node doesn’t produce any specific data output; it simply triggers the next node in the workflow at the specified time.

The schedule can be adjusted based on your needs (e.g., run every 2 days or at a different time) by modifying the "Days Between Triggers" or "Trigger at Hour" settings.

If you prefer manual execution for testing, you can temporarily disable the schedule and use the "Execute Workflow" button in n8n.

2- Fetch Recent CVE Commits Node

Purpose

The "Fetch Recent CVE Commits" node retrieves a list of recent commits from the CVEProject/cvelistV5 GitHub repository. This repository contains CVE data in JSON format, and by fetching recent commits, the workflow identifies which CVE files have been updated in the last 24 hours. This ensures that the workflow processes only the latest or modified CVEs, keeping the Sigma rules up to date.

Configuration

  • Method: GET

  • URL: https://api.github.com/repos/CVEProject/cvelistV5/commits

  • Authentication:

    • Type: Predefined Credential Type (create the credential before you start building the workflow)

    • Credential Type: GitHub API

    • Credential Name: GitHub CVE Access (this credential uses a GitHub personal access token with the repo scope to access public repositories; no write access is needed here since this is a read-only operation).

  • Send Query Parameters:

    • Specify Query Parameters: Using Fields Below

    • Query Parameters:

      • Name: since

        • Value: {{ new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString() }}

        • This expression calculates the timestamp for 24 hours ago, ensuring the API returns commits made in the last day. For example, if the current time is 2025-05-01T03:01:13.146Z, this evaluates to 2025-04-30T03:01:13.146Z.

      • Name: per_page

        • Value: 100

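        • This sets the number of commits returned per page to 100, the maximum allowed by the GitHub API, to minimize pagination. If more than 100 commits land within the 24-hour window, only the first page is returned unless pagination is handled separately.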

  • Send Headers:

    • Specify Headers: Using Fields Below

    • Header Parameters:

      • Name: Accept

        • Value: application/vnd.github.v3+json

        • This header specifies that the API should return data in the GitHub API v3 JSON format.

  • Send Body: Disabled (no body is needed for a GET request).

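Output: The node returns a JSON array of commit objects, each containing details like the commit SHA, author, and commit message. The list endpoint does not include the files changed in each commit, which is why the next node works from the commit message.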

Authentication: GitHub does not require authentication to read public repositories, but unauthenticated requests are limited to 60 per hour, while authenticated requests allow 5,000 per hour, so the workflow authenticates to avoid hitting the rate limit. For read-only access to public repositories, the “GitHub CVE Access” token needs no more than the public_repo scope (which is included in the repo scope).
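
As a rough illustration of what this node sends, the sketch below builds the same request URL in plain JavaScript. The helper name buildCommitsUrl is hypothetical and not part of the workflow; it only mirrors the since and per_page parameters configured above.

```javascript
// Hypothetical helper (not part of the n8n workflow): builds the URL that the
// HTTP Request node effectively calls, given a reference time.
function buildCommitsUrl(now = new Date()) {
  const url = new URL('https://api.github.com/repos/CVEProject/cvelistV5/commits');
  // Same arithmetic as the n8n expression: 24 hours before "now", in ISO 8601.
  const since = new Date(now.getTime() - 24 * 60 * 60 * 1000).toISOString();
  url.searchParams.set('since', since);
  url.searchParams.set('per_page', '100');
  return url.toString();
}

console.log(buildCommitsUrl(new Date('2025-05-01T03:01:13.146Z')));
```

Passing the reference time as an argument makes the 24-hour window easy to check against a fixed timestamp.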

3- Parse Commit Messages for CVEs Node

Purpose

This node processes the commit messages retrieved by the "Fetch Recent CVE Commits" node to identify new and updated CVEs. The commit messages in the CVEProject/cvelistV5 repository follow a structured format where the second line lists new CVEs and the third line lists updated CVEs. This node extracts these lines, categorizes them as "new" or "updated," and prepares the data for further processing.

Configuration

  • Mode: Run Once for All Items (the code processes all commit items at once, rather than iterating over each item individually).

  • Language: JavaScript

  • Code:

return items.map(item => {
  const message = item.json.commit.message;
  if (typeof message !== 'string') {
    return { json: { error: 'Message is not a string', message } };
  }
  const lines = message.split('\n');
  const newCVEsLine = lines.length > 1 ? lines[1].trim() : '';
  const updatedCVEsLine = lines.length > 2 ? lines[2].trim() : '';
  const output = [];
  if (newCVEsLine) {
    output.push({
      json: {
        text: newCVEsLine,
        status: 'new', // CVEs added in this commit
        commitSha: item.json.sha
      }
    });
  }
  if (updatedCVEsLine) {
    output.push({
      json: {
        text: updatedCVEsLine,
        status: 'updated', // CVEs modified in this commit
        commitSha: item.json.sha
      }
    });
  }
  return output;
}).flat();

Explanation of the Code:

  • Input Handling: The node takes the list of commits from the previous node (items), where each item has a json.commit.message field containing the commit message.

  • Validation: Checks if the commit message is a string. If not, it returns an error object.

  • Message Parsing: Splits the commit message into lines using split('\n').

  • Line Extraction:

    • newCVEsLine: The second line (index 1) of the commit message, which lists new CVEs (if present).

    • updatedCVEsLine: The third line (index 2), which lists updated CVEs (if present).

  • Output Creation:

    • For each non-empty newCVEsLine, creates an item with status: 'new', the line text, and the commit SHA.

    • For each non-empty updatedCVEsLine, creates an item with status: 'updated', the line text, and the commit SHA.

  • Flattening: Uses flat() to combine the arrays of new and updated CVE items into a single list.

Output: A list of items, each representing a line of CVE data (new or updated).
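
To see the parsing in isolation, here is a minimal standalone sketch of the same logic for a single commit. The function name parseCommitMessage, the SHA value, and the sample message (including the CVE IDs in it) are made up for illustration; the sample only assumes the three-line message format described above.

```javascript
// Standalone version of the parsing logic for one commit (hypothetical helper).
function parseCommitMessage(message, sha) {
  if (typeof message !== 'string') {
    return [{ error: 'Message is not a string', message }];
  }
  const lines = message.split('\n');
  const candidates = [
    { status: 'new', text: (lines[1] || '').trim() },
    { status: 'updated', text: (lines[2] || '').trim() },
  ];
  // Keep only non-empty lines and attach the commit SHA.
  return candidates
    .filter(c => c.text !== '')
    .map(c => ({ text: c.text, status: c.status, commitSha: sha }));
}

// Assumed example of the commit message format; IDs are placeholders.
const sample = [
  '2 changes (1 new | 1 updated):',
  '  - 1 new CVEs:  CVE-2025-0001',
  '  - 1 updated CVEs: CVE-2024-12345',
].join('\n');

console.log(parseCommitMessage(sample, 'abc123'));
```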

4- Filter for Actual CVEs Node

Purpose

This node filters the output of the "Parse Commit Messages for CVEs" node to keep only the commit message lines that actually contain CVE updates. The previous node extracts lines that may indicate new or updated CVEs, but some lines might be empty or not contain actual CVE data (e.g., "0 new CVEs"). This node ensures that only lines with a non-zero number of new or updated CVEs are passed to the next step, avoiding unnecessary processing of irrelevant data.

Configuration

  • Conditions:

    • Condition 1:

      • Expression: {{ $json.status === 'new' ? parseInt($json.text.match(/(\d+) new CVEs/)?.[1] ?? '0', 10) : 0 }}

      • Operator: is greater than

      • Value: 0

      • Explanation: This condition checks if the item has a status of "new". If so, it uses a regex ((\d+) new CVEs) to extract the number of new CVEs from the text field (e.g., "5 new CVEs" extracts "5"), and parseInt converts it to an integer. The ?.[1] ?? '0' guard makes the expression evaluate to 0 instead of throwing when the text does not match the pattern. If the number is greater than 0, the item passes the filter. If the status isn’t "new", the expression evaluates to 0, failing this condition.

    • OR

    • Condition 2:

      • Expression: {{ $json.status === 'updated' ? parseInt($json.text.match(/(\d+) updated CVEs/)?.[1] ?? '0', 10) : 0 }}

      • Operator: is greater than

      • Value: 0

      • Explanation: This condition checks if the item has a status of "updated". If so, it uses a regex ((\d+) updated CVEs) to extract the number of updated CVEs from the text field (e.g., "3 updated CVEs" extracts "3"), and parseInt converts it to an integer. The ?.[1] ?? '0' guard makes the expression evaluate to 0 instead of throwing when the text does not match the pattern. If the number is greater than 0, the item passes the filter. If the status isn’t "updated", the expression evaluates to 0, failing this condition.

  • Output: A filtered list of items where either the "new CVEs" or "updated CVEs" count is greater than 0.
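
The two conditions can be read as a single predicate. The sketch below is one way to express it in plain JavaScript; cveCount and hasActualCves are hypothetical names, and the helper returns 0 rather than throwing when a line does not match the expected wording.

```javascript
// Hypothetical helper: returns the CVE count announced on a parsed line, or 0
// when the status is unknown or the line lacks the expected "<n> ... CVEs" text.
function cveCount(text, status) {
  let pattern;
  if (status === 'new') pattern = /(\d+) new CVEs/;
  else if (status === 'updated') pattern = /(\d+) updated CVEs/;
  else return 0;
  const match = String(text).match(pattern);
  return match ? parseInt(match[1], 10) : 0;
}

// Equivalent of "Condition 1 OR Condition 2" for one item.
function hasActualCves(item) {
  return cveCount(item.text, item.status) > 0;
}

console.log(hasActualCves({ text: '- 0 new CVEs:', status: 'new' }));
console.log(hasActualCves({ text: '- 3 updated CVEs: CVE-2024-12345', status: 'updated' }));
```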

5- Extract CVE IDs and File Paths Node

Purpose

This node processes the filtered commit message lines from the "Filter for Actual CVEs" node to extract individual CVE IDs and construct the file paths for their corresponding JSON files in the CVEProject/cvelistV5 repository. It identifies all CVE IDs mentioned in the text field (e.g., "CVE-2023-5852 CVE-2023-5854") and generates the file paths (e.g., cves/2023/5xxx/CVE-2023-5852.json) needed to fetch the CVE details in the next step.

Configuration

  • Mode: Run Once for All Items (the code processes all items at once, rather than iterating over each item individually).

  • Language: JavaScript

  • Code:

return items.map((item, index) => {
  const text = item.json.text;
  const status = item.json.status;
  if (!status) {
    return { json: { error: 'Status is missing', text, commitSha: item.json.commitSha } };
  }
  const cveMatches = text.match(/CVE-\d{4}-\d{4,7}/g) || [];
  if (cveMatches.length === 0) {
    return { json: { error: 'No CVE ID found', text, commitSha: item.json.commitSha, status } };
  }
  return cveMatches.map(cve => {
    const year = cve.split('-')[1];
    const idPart = cve.split('-')[2];
    const dir = idPart.slice(0, -3) + 'xxx';
    const filePath = `cves/${year}/${dir}/${cve}.json`;
    return {
      json: {
        cveId: cve,
        filePath: filePath,
        commitSha: item.json.commitSha,
        sourceIndex: index // Track the source item’s index
      }
    };
  });
}).flat();

Explanation of the Code:

  • Input Handling: The node takes the filtered items from the previous node (items), where each item has a json.text field (e.g., "CVE-2023-5852 CVE-2023-5854"), a json.status field ("new" or "updated"), and a json.commitSha.

  • Validation:

    • Checks if the status field exists. If not, returns an error item with the message "Status is missing".

    • Uses a regex (/CVE-\d{4}-\d{4,7}/g) to extract all CVE IDs from the text field. The pattern matches strings like "CVE-2023-5852" (CVE, year, 4-7 digit ID). The || [] ensures an empty array if no matches are found.

    • If no CVE IDs are found, returns an error item with the message "No CVE ID found".

  • CVE Processing:

    • For each matched CVE ID, extracts the year (e.g., "2023" from "CVE-2023-5852") and the ID part (e.g., "5852").

    • Constructs a directory name by dropping the last three digits of the ID part and appending "xxx" (e.g., "5xxx" for "5852", "12xxx" for "12345").

    • Builds the file path in the format cves/{year}/{dir}/{cve}.json (e.g., cves/2023/5xxx/CVE-2023-5852.json).

  • Output Creation:

    • Creates an item for each CVE ID with fields: cveId (the CVE ID), filePath (the constructed path), commitSha (from the input item), and sourceIndex (the index of the input item for tracking purposes).

  • Flattening: Uses flat() to combine the arrays of CVE items into a single list.

Output: A list of items, each representing a single CVE ID with its corresponding file path and metadata.
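
The path rule is easy to get wrong for IDs of different lengths, so here is a small sketch of it as a standalone function. The name cveFilePath is hypothetical; the logic follows the slice(0, -3) rule used in the node's code.

```javascript
// Hypothetical helper: maps a CVE ID to its path in CVEProject/cvelistV5.
// Returns null when the input does not look like a CVE ID.
function cveFilePath(cveId) {
  const match = /^CVE-(\d{4})-(\d{4,7})$/.exec(cveId);
  if (!match) return null;
  const [, year, idPart] = match;
  // Drop the last three digits of the numeric part and append "xxx".
  const dir = idPart.slice(0, -3) + 'xxx';
  return `cves/${year}/${dir}/${cveId}.json`;
}

console.log(cveFilePath('CVE-2023-5852'));  // cves/2023/5xxx/CVE-2023-5852.json
console.log(cveFilePath('CVE-2024-12345')); // cves/2024/12xxx/CVE-2024-12345.json
```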

6- Fetch CVE Files Node

Purpose

This node retrieves the CVE JSON files from the CVEProject/cvelistV5 GitHub repository using the file paths generated by the "Extract CVE IDs and File Paths" node. Each file contains detailed information about a specific CVE (e.g., description, vendor, product, severity), which is necessary for generating Sigma rules in the next steps.

Configuration

  • Method: GET

  • URL: https://api.github.com/repos/CVEProject/cvelistV5/contents/{{ $json.filePath }}

    • Explanation: The URL dynamically incorporates the filePath field from the previous node (e.g., cves/2023/5xxx/CVE-2023-5852.json), forming a complete API endpoint like https://api.github.com/repos/CVEProject/cvelistV5/contents/cves/2023/5xxx/CVE-2023-5852.json. This endpoint returns metadata about the file, including a download URL and the file’s content encoded as Base64.

  • Authentication:

    • Type: Predefined Credential Type

    • Credential Type: GitHub API

    • Credential Name: GitHub CVE Access (this credential uses the same GitHub personal access token as the "Fetch Recent CVE Commits" node, with the repo scope, which includes public_repo access for reading public repositories).

  • Send Query Parameters: Disabled (no query parameters are needed)

  • Send Headers:

    • Specify Headers: Using Fields Below

    • Header Parameters:

      • Name: Accept

        • Value: application/vnd.github.v3+json

        • Explanation: This header ensures the GitHub API returns data in the v3 JSON format.

  • Send Body: Disabled (no body is needed for a GET request).

  • Options: Response Format: JSON (the response is expected to be in JSON format).

Output: A list of items, each containing the GitHub API response for a CVE file. The response includes metadata about the file, such as its name and path, a download_url for the raw content, and a content field holding the file body as Base64 (decoded by the next node).

7- Content Decoding Node

Purpose

This node decodes the Base64-encoded content of the CVE JSON files fetched from the CVEProject/cvelistV5 repository, parses the content as JSON, and extracts key information such as the CVE ID, vendors, and products. This node prepares the CVE data in a structured format for Sigma rule generation in the subsequent steps.

Configuration

  • Mode: Run Once for All Items (the code processes all items at once, rather than iterating over each item individually).

  • Language: JavaScript

  • Code:

return items.map(item => {
  const encodedContent = item.json.content;

  // Check if content exists
  if (!encodedContent) {
    return { json: { error: 'No content found', filePath: item.json.path, cveId: item.json.cveId } };
  }

  // Decode Base64 content
  let decodedContent;
  try {
    decodedContent = Buffer.from(encodedContent, 'base64').toString('utf-8');
  } catch (e) {
    return { json: { error: 'Failed to decode Base64', filePath: item.json.path, cveId: item.json.cveId, errorMessage: e.message } };
  }

  // Parse JSON
  let cveData;
  try {
    cveData = JSON.parse(decodedContent);
  } catch (e) {
    return { json: { error: 'Failed to parse JSON', filePath: item.json.path, cveId: item.json.cveId, decodedContent, errorMessage: e.message } };
  }

  // Extract vendors and products
  const affected = cveData.containers?.cna?.affected || [];
  const vendorsAndProducts = affected.map(entry => ({
    vendor: entry.vendor || 'unknown',
    product: entry.product || 'unknown'
  }));

  // Return the processed data
  return {
    json: {
      cveId: cveData.cveMetadata?.cveId || item.json.cveId || 'unknown',
      filePath: item.json.path,
      commitSha: item.json.commitSha,
      vendorsAndProducts: vendorsAndProducts,
      cveData: cveData
    }
  };
});

Explanation of the Code:

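  • Input Handling: The node takes items from the previous node and reads json.content (Base64-encoded CVE JSON data) and json.path (file path) from each. It also reads json.cveId and json.commitSha when present. The GitHub contents response does not include these two fields by itself, so cveId normally comes from the parsed record (cveMetadata.cveId), and commitSha stays undefined unless it is merged in from an earlier node.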

  • Validation: Checks if the content field exists. If not, returns an error item with the message "No content found".

  • Base64 Decoding:

    • Decodes the Base64-encoded content using Buffer.from(encodedContent, 'base64').toString('utf-8').

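    • If decoding fails, returns an error item with the message "Failed to decode Base64" and the error details. In Node.js, Buffer.from with 'base64' skips invalid characters rather than throwing, so malformed content usually surfaces in the JSON parsing step instead.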

  • JSON Parsing:

    • Parses the decoded content into a JavaScript object using JSON.parse(decodedContent).

    • If parsing fails, returns an error item with the message "Failed to parse JSON", including the decoded content and error details.

  • Data Extraction:

    • Extracts the affected array from cveData.containers?.cna?.affected (if it exists, otherwise defaults to an empty array).

    • Maps the affected array into a list of vendor-product pairs, defaulting to "unknown" if vendor or product fields are missing.

  • Output Creation:

    • Returns a new item with fields: cveId (from the parsed data or fallback to the input cveId), filePath, commitSha, vendorsAndProducts (list of vendor-product pairs), and cveData (the full parsed CVE data).

Output: A list of items, each containing the decoded and parsed CVE data along with extracted metadata.
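
The decode-and-extract steps can be tried outside n8n with a fabricated response. In the sketch below, decodeCveFile is a hypothetical helper, and the record (including the vendor name "ExampleVendor") is made up purely to exercise it; only the field names follow the structure used in the node's code.

```javascript
// Hypothetical helper mirroring the node's steps for one GitHub "contents"
// API response body.
function decodeCveFile(response) {
  if (!response || !response.content) {
    return { error: 'No content found' };
  }
  // Whitespace and line breaks inside the Base64 payload are ignored by Buffer.
  const decoded = Buffer.from(response.content, 'base64').toString('utf-8');
  let cveData;
  try {
    cveData = JSON.parse(decoded);
  } catch (e) {
    return { error: 'Failed to parse JSON', errorMessage: e.message };
  }
  const affected = cveData.containers?.cna?.affected || [];
  return {
    cveId: cveData.cveMetadata?.cveId || 'unknown',
    vendorsAndProducts: affected.map(entry => ({
      vendor: entry.vendor || 'unknown',
      product: entry.product || 'unknown',
    })),
  };
}

// Made-up record used only to exercise the helper.
const record = {
  cveMetadata: { cveId: 'CVE-2023-5852' },
  containers: { cna: { affected: [{ vendor: 'ExampleVendor' }] } },
};
const fakeResponse = {
  content: Buffer.from(JSON.stringify(record)).toString('base64'),
};
console.log(decodeCveFile(fakeResponse));
```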

8- Filter by Vendor Node

Purpose

This node filters the CVE data to keep only CVEs associated with specific vendors of interest: Cisco, Fortinet, Palo Alto, and Google. This keeps the workflow focused on CVEs relevant to these vendors and avoids unnecessary processing of CVEs from other vendors.

Configuration

  • Conditions:

    • Condition 1:

      • Expression: {{ $json.vendorsAndProducts[0]?.vendor?.toLowerCase() || 'unknown' }}

      • Operator: is equal to

      • Value: cisco

      • Explanation: This condition checks the vendor field of the first entry in the vendorsAndProducts array (from the previous node). The ?. operator safely accesses nested fields, and toLowerCase() ensures case-insensitive comparison (e.g., "Cisco" or "CISCO" will match). If the field is missing, it defaults to 'unknown'. The item passes if the vendor matches "cisco".

    • OR

    • Condition 2:

      • Expression: {{ $json.vendorsAndProducts[0]?.vendor?.toLowerCase() || 'unknown' }}

      • Operator: is equal to

      • Value: fortinet

      • Explanation: Similar to the first condition, but checks if the vendor matches "fortinet".

    • OR

    • Condition 3:

      • Expression: {{ $json.vendorsAndProducts[0]?.vendor?.toLowerCase() || 'unknown' }}

      • Operator: is equal to

      • Value: paloalto

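      • Explanation: Similar to the first condition, but checks if the vendor matches "paloalto". Because this is an exact comparison, a longer vendor string such as "Palo Alto Networks" would not pass; adjust the value (or use a "contains" operator) to match how the vendor is named in your CVE records.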

    • OR

    • Condition 4:

      • Expression: {{ $json.vendorsAndProducts[0]?.vendor?.toLowerCase() || 'unknown' }}

      • Operator: is equal to

      • Value: google

      • Explanation: Similar to the first condition, but checks if the vendor matches "google".

  • Output: A filtered list of items where the first vendor in vendorsAndProducts matches one of the specified vendors (Cisco, Fortinet, Palo Alto, or Google).
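
If checking only the first vendor entry with an exact match turns out to be too strict, one possible alternative is a substring check across all entries. The sketch below is not what the node does; matchesVendor and VENDORS_OF_INTEREST are hypothetical names, and substring matching is a swapped-in technique that trades precision for recall (it can also match unrelated vendors whose names contain one of the strings).

```javascript
// Hypothetical alternative to the four OR conditions: checks every vendor
// entry (not only the first) against an allow-list of lowercase substrings.
const VENDORS_OF_INTEREST = ['cisco', 'fortinet', 'palo alto', 'google'];

function matchesVendor(vendorsAndProducts, allowList = VENDORS_OF_INTEREST) {
  return (vendorsAndProducts || []).some(entry => {
    const vendor = String(entry?.vendor ?? '').toLowerCase();
    return allowList.some(name => vendor.includes(name));
  });
}

console.log(matchesVendor([{ vendor: 'Cisco', product: 'IOS XE' }]));
console.log(matchesVendor([{ vendor: 'acme', product: 'widget' }]));
```

In an n8n Code node, a predicate like this could be applied with $input.all().filter(item => matchesVendor(item.json.vendorsAndProducts)).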

9- Log CVE Items (Debug) Node

Purpose

This node logs the input CVE items to the console for debugging purposes and passes the items through unchanged. This node serves as a debugging checkpoint to inspect the filtered CVE data before proceeding to the next steps, such as generating Sigma rules.

Configuration

  • Mode: Run Once for All Items.

  • Language: JavaScript

  • Code:

console.log('Input items:', JSON.stringify($input.all(), null, 2));
console.log('First item:', JSON.stringify($input.first(), null, 2));
return $input.all();

Explanation of the Code:

  • Logging:

    • console.log('Input items:', JSON.stringify($input.all(), null, 2)): Logs all input items in a formatted JSON string. $input.all() returns an array of all items passed to the node, and JSON.stringify(..., null, 2) formats the output with indentation for readability.

    • console.log('First item:', JSON.stringify($input.first(), null, 2)): Logs the first item in the input array in a formatted JSON string. $input.first() returns the first item, useful for quickly inspecting the structure of the data.

  • Output: return $input.all() passes all input items through unchanged, ensuring the workflow continues with the same data.

Output: The same list of items received as input, unchanged.

10- Preprocess CVE Data Node

Purpose

This node simplifies the CVE data by extracting and formatting key fields (cveId, description, vendor, product, and severity) into a streamlined structure. This node prepares the data in a concise format for the next steps, such as generating Sigma rules, by reducing the complexity of the raw CVE JSON data.

Configuration

  • Mode: Run Once for All Items.

  • Language: JavaScript

  • Code:

// Get all items from "Log CVE Items (Debug)"
const items = $input.all().map(item => item.json);

// Log the input for debugging
console.log('Input to Preprocess CVE Data:', JSON.stringify(items, null, 2));

// Process each item into the simplified format
const processedItems = items.map(item => {
  const cveId = item.cveId || "unknown";
  const description = item.cveData && item.cveData.containers && item.cveData.containers.cna && item.cveData.containers.cna.descriptions && item.cveData.containers.cna.descriptions[0] ? item.cveData.containers.cna.descriptions[0].value : "No description available";
  const vendor = item.vendorsAndProducts && item.vendorsAndProducts[0] ? item.vendorsAndProducts[0].vendor : "unknown";
  const product = item.vendorsAndProducts && item.vendorsAndProducts[0] ? item.vendorsAndProducts[0].product : "unknown";
  const severity = item.cveData && item.cveData.containers && item.cveData.containers.cna && item.cveData.containers.cna.metrics && item.cveData.containers.cna.metrics[0] && item.cveData.containers.cna.metrics[0].cvssV3_1 ? item.cveData.containers.cna.metrics[0].cvssV3_1.baseSeverity : "MEDIUM";

  return {
    json: {
      cveId: cveId,
      description: description,
      vendor: vendor,
      product: product,
      severity: severity
    }
  };
});

// Log the output for debugging
console.log('Output from Preprocess CVE Data:', JSON.stringify(processedItems, null, 2));

// Return the array of processed items
return processedItems;

Explanation of the Code:

  • Input Handling: Retrieves all items from the previous node ($input.all()) and extracts the json field from each item using map(item => item.json).

  • Debug Logging:

    • Logs the input data with console.log('Input to Preprocess CVE Data:', JSON.stringify(items, null, 2)) for debugging, formatting the JSON with indentation for readability.

  • Data Processing:

    • For each item, extracts key fields with fallback values:

      • cveId: Taken from item.cveId, defaults to "unknown".

      • description: Extracted from item.cveData.containers.cna.descriptions[0].value using nested checks, defaults to "No description available".

      • vendor: Taken from item.vendorsAndProducts[0].vendor (the first entry), defaults to "unknown".

      • product: Taken from item.vendorsAndProducts[0].product (the first entry), defaults to "unknown".

      • severity: Extracted from item.cveData.containers.cna.metrics[0].cvssV3_1.baseSeverity, defaults to "MEDIUM". Records whose first metrics entry uses a different CVSS version (e.g., cvssV3_0 or cvssV4_0) also fall back to "MEDIUM".

    • Creates a new item with a simplified structure containing only these fields.

  • Debug Logging: Logs the processed output with console.log('Output from Preprocess CVE Data:', JSON.stringify(processedItems, null, 2)).

  • Output Creation: Returns the array of processed items in the required n8n format (each item wrapped in a { json: {...} } object).

Output: A list of items, each containing a simplified version of the CVE data with only the key fields.
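
The nested && checks can also be written with optional chaining, which some readers find easier to follow. The sketch below is an equivalent-in-spirit rewrite, not the node's actual code; simplifyCve is a hypothetical name, and one small difference is that ?? also applies the default when a field exists but is null or undefined.

```javascript
// Hypothetical rewrite of the per-item mapping using optional chaining.
function simplifyCve(item) {
  const cna = item.cveData?.containers?.cna;
  const first = item.vendorsAndProducts?.[0];
  return {
    cveId: item.cveId || 'unknown',
    description: cna?.descriptions?.[0]?.value ?? 'No description available',
    vendor: first?.vendor ?? 'unknown',
    product: first?.product ?? 'unknown',
    // Only CVSS v3.1 is read here, matching the node's behaviour.
    severity: cna?.metrics?.[0]?.cvssV3_1?.baseSeverity ?? 'MEDIUM',
  };
}

console.log(simplifyCve({}));
```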

11- Loop Over CVEs Node

Purpose

This node iterates over the preprocessed CVE items one at a time, enabling individual processing of each CVE for Sigma rule generation. This node ensures that each CVE is handled independently, which is useful for making API calls (e.g., to an AI model for Sigma rule generation) while avoiding rate limits or ensuring isolated error handling.

Configuration

  • Batch Size: 1 (processes one CVE item at a time per iteration).

Output: Outputs one item at a time to the nodes within the loop. For example, in the first iteration:

[
  {
    "json": {
      "cveId": "CVE-2023-5852",
      "description": "A vulnerability in Google Chrome allows remote attackers to execute arbitrary code.",
      "vendor": "Google",
      "product": "Chrome",
      "severity": "HIGH"
    }
  }
]

This continues until all items from the "Preprocess CVE Data" node are processed.

12- Prepare Sigma Rule Request Body Node

Purpose

This node constructs the request body for an API call to Together AI’s LLaMA-3.3-70B model to generate a Sigma rule in YAML format for a single CVE. This node formats the CVE data into a structured prompt, includes detailed instructions for the AI model, and prepares the request for the next node (HTTP Request node) to send to the AI model.

Configuration

  • Mode: Run Once for All Items (the code processes all items at once, but since it’s inside the loop with a batch size of 1, it effectively processes one CVE item per iteration).

  • Language: JavaScript

  • Code:

return $input.all().map(item => {
  const cveData = item.json;
  // Validate required fields
  if (!cveData.cveId || !cveData.description || !cveData.vendor || !cveData.product || !cveData.severity) {
    console.error("Missing required fields in CVE data:", JSON.stringify(cveData, null, 2));
    throw new Error("Missing required fields in CVE data");
  }
  const requestBody = {
    model: "meta-llama/LLaMA-3.3-70B-Instruct-Turbo-Free",
    messages: [
      {
        role: "system",
        content: "You are a cybersecurity expert specialized in threat detection engineering. Your task is to generate a complete Sigma rule in YAML format based ONLY on the provided CVE data. Follow these instructions strictly:\n\n- Only use the information from the CVE.\n- Be accurate and do not invent data.\n- Write fields:\n  - title: Format as 'Detection of {cveId} - {brief description based on vulnerability type, e.g., Memory Consumption}' (infer the brief description from the CVE description).\n  - id: Format as 'cve-{cveId lowercase, replace 'CVE-' with 'cve-'}-detection' (e.g., cve-2025-46656-detection).\n  - description: Use the exact CVE description provided.\n  - status: Set to 'experimental'.\n  - author: Set to 'Generated by AI'.\n  - date: Set to the current date in 'YYYY-MM-DD' format (e.g., '2025-04-28'), which will be provided as {currentDate}.\n  - modified: Set to the same value as date (e.g., '{currentDate}').\n  - references: Include the provided reference URL as '- {referenceUrl}'. If no reference URL is provided, use '- https://nvd.nist.gov/vuln/detail/{cveId}'.\n  - logsource: Set category to 'windows' if the product or description mentions 'Windows', 'NTLM', or Microsoft products; use 'web' if the description mentions web-related terms (e.g., 'HTTP', 'web application'); use 'linux' if the product or description mentions 'Linux'; use 'database' if the product mentions database systems (e.g., 'MySQL', 'PostgreSQL'); otherwise, use 'application'. Set product to the CVE product (e.g., 'Windows 10 Version 1809') unless the description indicates a broader applicability, then use a general term (e.g., 'windows'). Set service to 'ntlm' if the description mentions 'NTLM'; use 'http' for web vulnerabilities; use 'sysmon' for Windows products if not NTLM; otherwise, omit the service field.\n  - detection: Define patterns based on the CVE description. For Windows/NTLM vulnerabilities, use 'event.category: \"authentication\"' and relevant 'message|contains' terms (e.g., 'message|contains: \"spoofing\"'). For web vulnerabilities, use fields like 'http.request.method'. Use multiple conditions if needed, but avoid requiring all terms unless explicitly related. Add a negative condition (e.g., 'event.code|not: \"4624\"' for Windows logon events, or 'message|contains|not: \"legitimate\"') to reduce false positives if applicable.\n  - condition: Set to 'selection'.\n  - falsepositives: List potential false positives based on the detection patterns (e.g., if 'message|contains: \"spoofing\"', use '- Legitimate network activity mentioning spoofing').\n  - level: Map CVE severity: LOW = low, MEDIUM = medium, HIGH = high, CRITICAL = critical.\n  - tags: Include the CVE ID as '- cve: {cveId}'. Additionally, infer a MITRE ATT&CK technique tag based on the description: use 'attack.t1557' for spoofing or man-in-the-middle attacks, 'attack.lateral_movement' for lateral movement, 'attack.privilege_escalation' for privilege escalation, 'attack.execution' for command execution; otherwise, omit additional tags.\n- Output a fully formatted YAML without any explanation or extra text."
      },
      {
        role: "user",
        content: `CVE Data:\ncveId: ${cveData.cveId}\ndescription: ${cveData.description}\nvendor: ${cveData.vendor}\nproduct: ${cveData.product}\nseverity: ${cveData.severity}\nreferenceUrl: https://github.com/CVEProject/cvelistV5/blob/main/cves/${cveData.cveId.split('-')[1]}/${cveData.cveId.split('-')[2].slice(0, -3)}xxx/${cveData.cveId}.json\ncurrentDate: ${$now.toFormat('yyyy-MM-dd')}\n\nGenerate the full Sigma rule now.`
      }
    ],
    max_tokens: 500
  };
  console.log("Request body for CVE", cveData.cveId, ":", JSON.stringify(requestBody, null, 2));
  return { json: requestBody };
});

Explanation of the Code:

  • Input Handling: The node takes the input from the loop ($input.all()), which contains a single CVE item per iteration due to the batch size of 1. It extracts the json field (item.json) containing the preprocessed CVE data (cveId, description, vendor, product, severity).

  • Validation:

    • Checks for the presence of required fields (cveId, description, vendor, product, severity).

    • If any field is missing, logs an error with the CVE data and throws an exception (throw new Error("Missing required fields in CVE data")), which fails the node for that item. Unless the node is configured to continue on error, this also stops the workflow execution.

  • Request Body Construction:

    • Creates a requestBody object for the Together AI API with the following structure:

      • model: Specifies the AI model (meta-llama/LLaMA-3.3-70B-Instruct-Turbo-Free).

      • messages: An array of two messages:

        • A system message with detailed instructions for the AI model on how to generate a Sigma rule in YAML format, including rules for fields like title, id, description, logsource, detection, and tags.

        • A user message containing the CVE data in a structured format, including a dynamically constructed referenceUrl that is meant to point at the CVE’s JSON file in the repository (for CVE-2023-5852, that file is https://github.com/CVEProject/cvelistV5/blob/main/cves/2023/5xxx/CVE-2023-5852.json) and the current date ($now.toFormat('yyyy-MM-dd'), e.g., 2025-05-01 when run on May 1, 2025).

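      • max_tokens: Limits the AI response to 500 tokens to keep the generated Sigma rule concise. A rule that needs more than 500 tokens is cut off, which can leave incomplete YAML, so raise the limit if rules come back truncated.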

  • Debug Logging: Logs the constructed requestBody for debugging with console.log("Request body for CVE", cveData.cveId, ":", JSON.stringify(requestBody, null, 2)).

  • Output Creation: Returns the requestBody wrapped in the required n8n format ({ json: requestBody }).

Output: A single item containing the request body for the Together AI API.
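
As a hedged illustration of the referenceUrl construction, the sketch below derives the directory by replacing the last three digits of the sequence number with xxx, which matches how the cvelistV5 repository groups records (for example 5xxx for CVE-2023-5852). The function name cveRecordUrl is hypothetical and is not part of the workflow.

```javascript
// Hypothetical helper (not part of the original workflow): builds the
// cvelistV5 record URL for a CVE ID such as "CVE-2023-5852".
function cveRecordUrl(cveId) {
  const match = /^CVE-(\d{4})-(\d{4,})$/.exec(cveId);
  if (!match) {
    throw new Error(`Unexpected CVE ID format: ${cveId}`);
  }
  const [, year, sequence] = match;
  // Assumption: records are grouped per thousand, so the last three digits
  // of the sequence number are replaced with "xxx" (5852 -> 5xxx).
  const bucket = `${sequence.slice(0, -3)}xxx`;
  return `https://github.com/CVEProject/cvelistV5/blob/main/cves/${year}/${bucket}/${cveId}.json`;
}

console.log(cveRecordUrl("CVE-2023-5852"));
```

Inside the Code node, the same logic could replace the two split('-') calls in the template string, and the explicit format check gives a clearer error for malformed IDs.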

13- Generate Sigma Rules via Together AI Node

Purpose

This node sends an API request to Together AI’s LLaMA-3.3-70B model to generate a Sigma rule in YAML format for a single CVE. It uses the request body prepared by the "Prepare Sigma Rule Request Body" node to make a POST request to the Together AI API and retrieves the AI-generated Sigma rule for further processing (e.g., validation and storage).

Configuration

  • Method: POST

  • URL: https://api.together.xyz/v1/chat/completions

  • Authentication: None (authentication is handled via the Authorization header).

  • Send Query Parameters: Disabled (no query parameters are needed).

  • Send Headers:

    • Specify Headers: Using Fields Below

    • Header Parameters:

      • Name: Authorization

        • Value: Bearer a*********************************

        • Explanation: This header provides the API key for Together AI, authenticating the request. The value is a placeholder (partially masked for security); replace it with your actual Together AI API key.

      • Name: Content-Type

        • Value: application/json

        • Explanation: This header specifies that the request body is in JSON format, as required by the Together AI API.

  • Send Body:

    • Body Content Type: JSON

    • Specify Body: Using JSON

    • JSON: {{ $json }}

      • Explanation: This expression dynamically inserts the JSON request body prepared by the previous node ("Prepare Sigma Rule Request Body"). The body includes the model name, messages (system and user prompts), and max tokens, formatted as shown in the previous node's output.

  • Options: No properties (default options are used, such as expecting a JSON response).

Output: A single item containing the API response from Together AI, which includes the generated Sigma rule in YAML format within the response.
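
For readers who want to reproduce the call outside n8n, the following is a minimal sketch of the request this configuration describes. The function names are hypothetical, the API key is passed in as a parameter rather than hard-coded, and the global fetch used in generateSigmaRule assumes Node.js 18 or later.

```javascript
// Sketch of the request the HTTP Request node is configured to send.
const TOGETHER_URL = "https://api.together.xyz/v1/chat/completions";

// Builds the URL and fetch options without sending anything.
function buildTogetherRequest(requestBody, apiKey) {
  return {
    url: TOGETHER_URL,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(requestBody),
    },
  };
}

// Sends the request and returns the generated text from the first choice.
async function generateSigmaRule(requestBody, apiKey) {
  const { url, options } = buildTogetherRequest(requestBody, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Together AI request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

Separating the request construction from the network call makes it easy to inspect the headers and body before anything is sent.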

14- Delay after Generation Node

Purpose

This node introduces a 10-second delay after generating a Sigma rule via the Together AI API. This delay helps manage API rate limits by ensuring that requests to the Together AI API (or subsequent APIs in the loop) are spaced out, reducing the risk of hitting rate limit errors (e.g., HTTP 429 Too Many Requests).

Configuration

  • Mode: Run Once for All Items.

  • Language: JavaScript

  • Code:

await new Promise(resolve => setTimeout(resolve, 10000)); // 10-second delay
return $input.all();

Explanation of the Code:

  • Delay Implementation: await new Promise(resolve => setTimeout(resolve, 10000)) creates a promise that resolves after 10,000 milliseconds (10 seconds), effectively pausing the execution of the node for that duration. The await keyword ensures the delay is asynchronous, allowing n8n to handle the pause without blocking other processes.

  • Output: return $input.all() passes all input items through unchanged after the delay. Since the node is inside a loop with a batch size of 1, it processes one item at a time, meaning the delay applies to each iteration of the loop.

Output: The same item received as input, unchanged, after a 10-second delay.

15- Prepare Sigma Rule Validation Inputs Node

Purpose

This node extracts and formats the CVE description and the generated Sigma rule for validation. It prepares a structured output containing the CVE description and the Sigma rule in YAML format, which a subsequent node uses to validate the Sigma rule with a different AI model.

Configuration

  • Mode: Run Once for All Items.

  • Language: JavaScript

  • Code:

return $input.all().map(item => {
  // Log the structure to confirm
  console.log("Item structure:", JSON.stringify(item, null, 2));

  // The output from Generate Sigma Rules (via Delay after generation)
  const sigmaRuleOutput = item.json;

  // The original input should be accessible; let's try to find it
  // If the original input is not under item.json.input, it might be elsewhere
  let originalInput = item.json.input || item.json; // Fallback to item.json if input is not defined

  // Ensure we can access cveData; add a fallback if it's missing
  const cveDescription = originalInput.cveData?.containers?.cna?.descriptions?.[0]?.value || "Description not found";

  return {
    json: {
      cve_description: cveDescription,
      sigma_rule_yaml: sigmaRuleOutput.choices[0].message.content
    }
  };
});

Explanation of the Code:

  • Input Handling: The node takes the input from the "Delay after Generation" node ($input.all()), which contains a single item per iteration due to the loop’s batch size of 1. The item includes the API response from Together AI, with the generated Sigma rule in item.json.choices[0].message.content.

  • Debug Logging: Logs the structure of the input item using console.log("Item structure:", JSON.stringify(item, null, 2)) to help with debugging and confirm the data structure.

  • Sigma Rule Extraction: Extracts the generated Sigma rule from item.json.choices[0].message.content, which contains the YAML-formatted Sigma rule generated by Together AI via the first model.

  • Original Input Access:

    • Attempts to access the original CVE data via item.json.input, which should contain the preprocessed CVE data from earlier nodes (e.g., "Preprocess CVE Data").

    • Falls back to item.json if item.json.input is not defined, indicating a potential issue with how the original input is passed through the loop.

  • CVE Description Extraction: Extracts the CVE description from originalInput.cveData?.containers?.cna?.descriptions?.[0]?.value using optional chaining to safely navigate the nested structure. If the description is not found, defaults to "Description not found".

  • Output Creation: Returns a new item with a simplified structure containing two fields:

    • cve_description: The extracted CVE description (or the fallback value).

    • sigma_rule_yaml: The generated Sigma rule in YAML format.

Output: A single item containing the CVE description and the Sigma rule, formatted for validation.
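
The mapping above reads choices[0] directly, so a response without choices fails with a generic TypeError. As a hedged alternative, the sketch below performs the same extraction but raises a descriptive error when the generated rule is missing. The function name toValidationInput and the error text are hypothetical.

```javascript
// Hypothetical variant of the mapping above with an explicit check
// for a missing or empty generated rule.
function toValidationInput(item) {
  const response = item.json || {};
  // Same fallback as the original code: use item.json when input is absent.
  const original = response.input || response;
  const sigmaRuleYaml = response.choices?.[0]?.message?.content;
  if (typeof sigmaRuleYaml !== "string" || sigmaRuleYaml.trim() === "") {
    throw new Error("Together AI response does not contain a generated rule");
  }
  return {
    json: {
      cve_description:
        original.cveData?.containers?.cna?.descriptions?.[0]?.value ||
        "Description not found",
      sigma_rule_yaml: sigmaRuleYaml,
    },
  };
}
```

In the Code node this could be used as return $input.all().map(toValidationInput);.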

16- Validate Sigma Rule via DeepSeek Node

Purpose

This node sends an API request to Together AI’s DeepSeek R1 Distill Llama 70B Free model to validate the Sigma rule generated for a single CVE. It uses the CVE description and Sigma rule prepared by the "Prepare Sigma Rule Validation Inputs" node to construct a prompt, asking the AI model to verify the rule’s syntax, completeness, and logical accuracy, ensuring the rule is usable for threat detection.

Configuration

  • Method: POST

  • URL: https://api.together.xyz/v1/chat/completions

  • Authentication: None (authentication is handled via the Authorization header).

  • Send Query Parameters: Disabled (no query parameters are needed).

  • Send Headers:

    • Specify Headers: Using Fields Below

    • Header Parameters:

      • Name: Authorization

        • Value: Bearer a*********************************

        • Explanation: This header provides the API key for Together AI, authenticating the request. The value is a placeholder (partially masked for security); replace it with your actual Together AI API key. This is the same API key used in the "Generate Sigma Rules via Together AI" node. For better security, store it in n8n's predefined credentials rather than pasting it into the header.

      • Name: Content-Type

        • Value: application/json

        • Explanation: This header specifies that the request body is in JSON format, as required by the Together AI API.

  • Send Body:

    • Body Content Type: JSON

    • Specify Body: Using JSON

    • JSON: {{ $json }}

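      • Explanation: This expression inserts the JSON produced by the previous node ("Prepare Sigma Rule Validation Inputs") as the request body. The Together AI chat completions endpoint expects a body with the model name, a system prompt instructing the AI to validate the Sigma rule, and a user prompt containing the CVE description and Sigma rule in YAML format. The code shown for the previous node returns only cve_description and sigma_rule_yaml, so make sure these fields are wrapped in such a body before the request is sent.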

  • Options: No properties (default options are used, such as expecting a JSON response).

Output: A single item containing the API response from Together AI, which includes the validation result for the Sigma rule.
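
To make the shape of the validation request concrete, here is a minimal sketch that wraps the cve_description and sigma_rule_yaml fields in a chat completions body. The function name, the model identifier, the prompt wording, and the type of the suggestions field are all assumptions for illustration; the workflow's actual prompt is not shown in this section.

```javascript
// Hypothetical sketch: wraps the two prepared fields in a request body.
// The model id and prompt text are assumptions, not taken from the workflow.
function buildValidationRequestBody({ cve_description, sigma_rule_yaml }) {
  return {
    model: "deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free", // assumed model id
    messages: [
      {
        role: "system",
        content:
          "You validate Sigma rules. Check syntax, completeness, and logical " +
          "accuracy against the CVE description. Reply with JSON only: " +
          '{"isValid": boolean, "reason": string, "suggestions": string[], ' +
          '"sigma_rule": string}.',
      },
      {
        role: "user",
        content: `CVE description:\n${cve_description}\n\nSigma rule:\n${sigma_rule_yaml}`,
      },
    ],
  };
}
```

Asking the model to echo the rule back in a sigma_rule field is one way to keep the rule available to the later storage nodes, which read $json.sigma_rule.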

17- Parse Sigma Rule Validation Output Node

Purpose

This node processes the validation response from the "Validate Sigma Rule via DeepSeek" node to extract a JSON object containing the validation result. It removes extraneous formatting (such as <think> tags and code block markers) from the DeepSeek API response, parses the embedded JSON string, and formats the result for further processing in the workflow.

Configuration

  • Mode: Run Once for All Items.

  • Language: JavaScript

  • Code:

return $input.all().map(item => {
  const rawContent = item.json.choices[0].message.content;
  // Extract the JSON object by removing <think> tags and code block markers
  const jsonStart = rawContent.indexOf('{');
  const jsonEnd = rawContent.lastIndexOf('}') + 1;
  const jsonString = rawContent.substring(jsonStart, jsonEnd);
  const validationResult = JSON.parse(jsonString);
  return {
    json: validationResult
  };
});

Explanation of the Code:

  • Input Handling: The node takes the input from the "Validate Sigma Rule via DeepSeek" node ($input.all()), which contains a single item per iteration due to the loop’s batch size of 1. The item includes the API response from Together AI, with the validation feedback in item.json.choices[0].message.content.

  • Raw Content Extraction: Extracts the raw content from item.json.choices[0].message.content, which contains the DeepSeek model’s response, expected to include a JSON object wrapped in additional text (e.g., <think> tags or code block markers like ```json).

  • JSON Extraction:

    • Finds the start of the JSON object using rawContent.indexOf('{') to locate the first opening brace.

    • Finds the end of the JSON object using rawContent.lastIndexOf('}') + 1 to locate the last closing brace (plus 1 to include the brace itself).

    • Extracts the JSON string using rawContent.substring(jsonStart, jsonEnd), which removes any text before the opening brace and after the closing brace (e.g., <think> tags or code block markers).

  • JSON Parsing: Parses the extracted JSON string into a JavaScript object using JSON.parse(jsonString).

  • Output Creation: Returns a new item with the parsed validation result in the json field.

Output: A single item containing the parsed validation result as a JSON object.
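
The brace-based extraction can pick up a stray { inside the model's reasoning text, and JSON.parse then fails without context. The sketch below is a hedged, more defensive variant: it removes any <think> block first and reports a descriptive error when no JSON object is found. The function name parseValidationContent and the sample response are hypothetical.

```javascript
// Hypothetical, more defensive variant of the parsing step.
function parseValidationContent(rawContent) {
  // Drop any <think>...</think> block so braces inside it are ignored.
  const withoutThinking = rawContent.replace(/<think>[\s\S]*?<\/think>/g, "");
  const start = withoutThinking.indexOf("{");
  const end = withoutThinking.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in validation response");
  }
  try {
    return JSON.parse(withoutThinking.slice(start, end + 1));
  } catch (error) {
    throw new Error(`Validation response is not valid JSON: ${error.message}`);
  }
}

// Made-up response: a reasoning block containing a brace, then fenced JSON.
const fence = "`".repeat(3);
const sampleResponse =
  "<think>Check {logsource} first.</think>\n" +
  `${fence}json\n{"isValid": true, "reason": "ok"}\n${fence}`;

console.log(parseValidationContent(sampleResponse));
```

With this sample, the simpler indexOf('{') approach would start at the brace inside the reasoning block and the parse would fail, while the variant above returns the intended object.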

18- Delay after Validation Node

Purpose

This node introduces a 5-second delay after parsing the Sigma rule validation output. This delay helps manage API rate limits by ensuring that requests to the Together AI API (used in the "Generate Sigma Rules via Together AI" and "Validate Sigma Rule via DeepSeek" nodes) are spaced out across loop iterations, reducing the risk of hitting rate limit errors (e.g., HTTP 429 Too Many Requests).

Configuration

  • Node Type: Wait

  • Resume: After Time Interval

  • Wait Amount: 5.00

  • Wait Unit: Seconds

Output: The node passes through the input items unchanged after the 5-second delay.

19- Filter Valid Sigma Rules Node

Purpose

This node filters the output of the "Loop Over CVEs for Sigma Rule Generation" loop to keep only the Sigma rules that were marked as valid during validation. It uses the isValid field from the validation result to determine which rules to retain, ensuring that only high-quality, validated Sigma rules proceed to the next steps (e.g., storage in Google Sheets and GitHub).

Configuration

  • Mode: Run Once for All Items.

  • Language: JavaScript

  • Code:

// Filter only the valid Sigma rules
return items.filter(item => item.json.isValid === true);

Explanation of the Code:

  • Input Handling: The node takes the output of the "Loop Over CVEs for Sigma Rule Generation" loop, which is a list of items where each item represents the validation result for a single CVE. Each item has a json field containing the parsed validation result (from the "Parse Sigma Rule Validation Output" node), with fields like isValid, reason, and suggestions.

  • Filtering Logic: The items.filter() method keeps only the items where item.json.isValid is true, discarding any Sigma rules that failed validation (i.e., where isValid is false).

  • Output Creation: Returns a new list containing only the items that pass the filter (i.e., valid Sigma rules).

Output: A filtered list of items, containing only the validation results for Sigma rules that are valid.
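
The strict comparison matters when the model returns the flag in a different type. The short sketch below applies the same predicate to made-up sample items; only the item whose isValid is the boolean true is kept.

```javascript
// Sample items shaped like the parser output; the values are made up.
const sampleItems = [
  { json: { isValid: true, reason: "Matches the CVE description" } },
  { json: { isValid: false, reason: "Missing logsource" } },
  { json: { isValid: "true", reason: "Boolean returned as a string" } },
];

// Same predicate as the node: strict comparison with boolean true.
const validItems = sampleItems.filter(item => item.json.isValid === true);

console.log(validItems.length); // 1
```

If the validator occasionally returns "true" as a string, those rules are silently dropped, so it may be worth logging the discarded items while testing.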

20- Append Valid Sigma Rules to Sheet Node

Purpose

This node appends the validated Sigma rules, along with their associated metadata, to an existing Google Sheet. It extracts the CVE ID from the Sigma rule, the Sigma rule itself, and the validation reason, then adds each rule as a new row in the specified sheet. This node ensures that the generated and validated Sigma rules are persistently stored for further analysis or use.

Configuration

  • Credential to Connect With: Google Sheets account

  • Resource: Sheet Within Document

  • Operation: Append Row

  • Document:

    • Selection Method: By ID

    • Document ID: 1cGc1IOpV-jWg49QdiOXxbEbRWbsj0nc_SA1TQTcf7UQ

      • Explanation: This is the unique ID of the Google Sheet, found in the sheet’s URL (e.g., https://docs.google.com/spreadsheets/d/1cGc1IOpV-jWg49QdiOXxbEbRWbsj0nc_SA1TQTcf7UQ/edit). It identifies the target spreadsheet for appending the Sigma rules.

  • Sheet:

    • Selection Method: By Name

    • Sheet Name: Sheet1

      • Explanation: This specifies the name of the sheet within the Google Sheet document where the rows will be appended. Ensure that "Sheet1" exists in the specified document and has columns labeled "CVE ID", "Sigma Rule", and "Validation Reason" to match the data being sent.

  • Mapping Column Mode: Map Each Column Manually

  • Values to Send:

    • CVE ID:

      • Expression: {{ $json.sigma_rule.match(/^title: Detection of (CVE-\d{4}-\d{4,5})/)[1] }}

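      • Explanation: Extracts the CVE ID (e.g., CVE-2023-5852) from the Sigma rule’s title using a regular expression. The Sigma rule is expected to be in item.json.sigma_rule, with a title line like title: Detection of CVE-2023-5852 - Remote Code Execution. The regex ^title: Detection of (CVE-\d{4}-\d{4,5}) captures the CVE ID in the first capturing group ([1]), matching the format CVE-YYYY-NNNN or CVE-YYYY-NNNNN. Two limitations follow from this: a sequence number longer than five digits is truncated to its first five digits, and if the rule does not begin with a title line in this exact form, match() returns null and the expression throws an error.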

    • Sigma Rule:

      • Expression: {{ $json.sigma_rule }}

      • Explanation: Provides the full Sigma rule in YAML format, as stored in item.json.sigma_rule. This field contains the rule generated by the "Generate Sigma Rules via Together AI" node.

    • Validation Reason:

      • Expression: {{ $json.reason }}

      • Explanation: Provides the validation reason from the DeepSeek model, stored in item.json.reason (from the "Parse Sigma Rule Validation Output" node), explaining why the Sigma rule was deemed valid.

  • Options: No properties (default options are used, such as not including a header row since the sheet is assumed to already have headers).

Output: The node appends one row per valid Sigma rule to the Google Sheet and returns the input items unchanged.

  • For example, for CVE-2023-5852 the node appends a row to "Sheet1" in the specified Google Sheet with the following values:

    • CVE ID: CVE-2023-5852 (extracted from the Sigma rule’s title)

    • Sigma Rule: The full YAML string (as shown above)

    • Validation Reason: The Sigma rule is syntactically correct and logically sound. The detection patterns align with the CVE description.
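
Because the CVE ID expression indexes the match() result directly, a rule whose title deviates from the expected form causes an error. The sketch below is a hedged helper that returns null instead and accepts sequence numbers of four or more digits. The name extractCveId is hypothetical, and the m flag is an addition so that the title line is found even when it is not the first line of the rule.

```javascript
// Hypothetical helper: returns the CVE ID from a rule title, or null when
// the title does not follow the "Detection of CVE-..." pattern.
function extractCveId(sigmaRule) {
  // The m flag lets ^ match at the start of any line, not only the first.
  const match = /^title: Detection of (CVE-\d{4}-\d{4,})/m.exec(sigmaRule || "");
  return match ? match[1] : null;
}

console.log(extractCveId("title: Detection of CVE-2023-5852 - Remote Code Execution"));
```

A null result can then be handled explicitly, for example by skipping the row or writing a placeholder value, rather than failing the whole execution.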

21- Create Sigma Rule Files in Repo Node

Purpose

This node creates a new file in a GitHub repository for each validated Sigma rule. It stores the Sigma rule as a YAML file in the specified repository, with the file path based on the CVE ID extracted from the Sigma rule’s title. This node ensures that the generated and validated Sigma rules are version-controlled in a GitHub repository, making them accessible for collaboration, auditing, and integration with SIEM platforms.

Configuration

  • Credential to Connect With: Create a new GitHub credential and enter the personal access token you generated. Make sure the token has access to the repository; you can grant this when generating the token or by editing it afterwards.

  • Resource: File

  • Operation: Create

  • Repository Owner:

    • Selection Method: By Name

    • Owner Name: your_username

      • Explanation: Specifies the GitHub user or organization that owns the target repository. Ensure that the authenticated GitHub account has write access to repositories owned by that user or organization.

  • Repository Name:

    • Selection Method: By Name

    • Repository Name: sigma-rules-repo

      • Explanation: Specifies the target repository where the Sigma rule files will be created. Ensure that the repository sigma-rules-repo exists under the owner and that the authenticated account has write permissions.

  • File Path:

    • Expression: {{ "sigma-rules/" + $json.sigma_rule.match(/^title: Detection of (CVE-\d{4}-\d{4,5})/)[1] + ".yml" }}

      • Explanation: Dynamically constructs the file path for the Sigma rule file. For example, if the Sigma rule’s title is title: Detection of CVE-2023-5852 - Remote Code Execution, the regex ^title: Detection of (CVE-\d{4}-\d{4,5}) extracts CVE-2023-5852. The file path becomes sigma-rules/CVE-2023-5852.yml, placing the file in a sigma-rules directory within the repository.

  • Binary File: Not used (the file content is provided as text, not binary data).

  • File Content:

    • Expression: {{ $json.sigma_rule }}

      • Explanation: Provides the full Sigma rule in YAML format, stored in item.json.sigma_rule. This content is written to the file in the repository.

  • Commit Message:

    • Expression: Add Sigma rule for {{ $json.sigma_rule.match(/^title: Detection of (CVE-\d{4}-\d{4,5})/)[1] }}

      • Explanation: Constructs a commit message based on the CVE ID extracted from the Sigma rule’s title. For example, if the CVE ID is CVE-2023-5852, the commit message will be Add Sigma rule for CVE-2023-5852.

Output: The node creates a new file in the GitHub repository for each Sigma rule and returns the input items unchanged.

  • For example, the node creates a file at sigma-rules/CVE-2023-5852.yml in the hla-7/sigma-rules-repo repository with the Sigma rule content and commits it with the message Add Sigma rule for CVE-2023-5852.
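
For context, creating a file through GitHub's REST API is a PUT to the repository contents endpoint with a commit message and Base64-encoded content. The sketch below builds such a request from the values this node uses. It is an illustration under the assumption that the n8n GitHub node performs an equivalent call; the function name and parameters are hypothetical.

```javascript
// Hypothetical sketch of the REST request behind a "create file" operation.
function buildCreateFileRequest(owner, repo, cveId, sigmaRule, token) {
  const path = `sigma-rules/${cveId}.yml`;
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/contents/${path}`,
    options: {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${token}`,
        Accept: "application/vnd.github+json",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        message: `Add Sigma rule for ${cveId}`,
        // The contents endpoint expects the file body Base64-encoded.
        content: Buffer.from(sigmaRule, "utf8").toString("base64"),
      }),
    },
  };
}
```

One consequence worth knowing: the contents endpoint requires the existing blob's sha to overwrite a file, so re-running the workflow for a CVE whose file already exists is likely to fail on this step unless an update operation is used instead.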

Key Achievements
  • Automation of Sigma Rule Generation: The workflow successfully automates the creation of Sigma rules for CVEs, reducing manual effort for cybersecurity professionals.

  • Robust Validation: By leveraging DeepSeek’s AI model, the workflow stores only the Sigma rules that the validator judges to be syntactically correct and logically sound, which improves their reliability for threat detection.

  • Dual Storage: Valid Sigma rules are stored in both Google Sheets and a GitHub repository, providing flexibility, redundancy, and accessibility for further analysis or integration.

  • Rate Limit Management: The inclusion of delay nodes ("Delay after Generation" and "Delay after Validation") reduces the likelihood of API rate limit errors, helping the workflow run smoothly even with a large number of CVEs.

  • Error Handling and Debugging: Throughout the workflow, notes on error handling, debugging, and data propagation (e.g., preserving the Sigma rule through the loop) help maintain robustness and transparency.
