What ArmorClaw protects against — and what it doesn't

This page is the threat model for ArmorClaw 0.3.0. It describes what the release addresses, what it partially addresses, and what is explicitly out of scope. No third-party security audit has been performed; this is self-assessed.

Addressed in 0.3.0

Eight protections that ship with the release. Each runs on every relevant tool call, every turn, or every audit log write — not opt-in.

Outbound tool-arg filter

Screens what the agent passes to tool calls. Instruction-override patterns, system-prompt references, and encoded payloads are blocked before the tool call executes. The agent-paused gate also runs here: while paused, every tool call is blocked regardless of content.

outbound-tool-arg-filter.ts

Inbound content classifier

Scores content the agent reads for prompt-injection risk on every turn. External content (web pages, email, file output, bash output) is wrapped in <external-content> framing with a provenance tag before reaching the model. High-scoring content triggers a system-level warning.

inbound-content-classifier.ts, source-tagger.ts

Permission manifests

Every skill declares a static permission manifest at load time. Skills cannot escalate to permissions they didn't declare. Hard-banned levels (system:root, files:global, system:exec) throw at load and are never executed.

permissions.ts, skill-registry.ts

File skill sandbox

The file skill is scoped to a single folder you choose during setup. Path traversal attempts are rejected and logged. The sandbox path cannot be a system directory or contain ~/.armorclaw/ itself.

secure-files/, validators.ts

Browser domain allowlist

Browser navigation is gated by a user-managed allowlist. Non-allowlisted domains are hard-blocked and logged. RFC 1918, loopback, and IPv6 link-local addresses are always blocked even if explicitly listed — DNS-rebinding defence. The allowlist is a navigation-URL filter, not a flow tracker: once at an allow-listed domain, server-driven redirects (HTTP 302, meta-refresh) can reach origins that are not on the list.

browser-allowlist.ts, browser-allowlist-filter.ts

Tamper-evident audit log

Every skill invocation is logged in NDJSON at ~/.armorclaw/audit.log. Each entry is signed with HMAC-SHA256 and chained by SHA-256 of the previous serialised line. Tampering produces a verification failure visible in the dashboard. Tamper-evidence holds against external file corruption and partial-state attacks; it does not hold against a process running as your user that can read the keychain or replace the log file (Cat-3).

audit-logger.ts, audit-verify.ts

Budget hard-stop

The model adapter refuses further API calls once monthly spend exceeds your cap. This is a hard stop, not a notification. The gate covers wrapper-mediated completions (the inbound classifier and future skills); OpenClaw's own model loop is bounded by downstream cost alerting, not a pre-call refusal. A before_prompt_build gate that blocks the main loop on the same flag is tracked for 0.3.1.

token-tracker/store.ts, model-adapter.ts

Approval gate with literal payload

Actions requiring approval are suspended until you confirm in the dashboard. The approval card shows the tool name and parameters as raw JSON — not a model-generated description. The agent cannot describe an action one way and execute another. Long payloads — a multi-paragraph email body, a large config dump — exceed the card's visible area and require scrolling inside the card to see the full content.

permissions.ts

Partially addressed

Threats with real but incomplete mitigations. We name them honestly so you know where the gaps are.

Memory poisoning

The memory.md file loads into the system prompt every session. A successful injection that triggers "remember that…" persists across sessions. Source-tagging and trust-gating are in place. No guarantee an adversarial memory write can be prevented in every scenario.

Cross-skill information flow

The model holds context across skill invocations and could pass sensitive data from one skill to another's tool arguments. No cross-skill information-flow control in v1. Mitigated in practice by permission manifests preventing skills from accessing each other's data stores, but not at the model-context layer.

Novel injection patterns

The outbound filter and inbound classifier handle common patterns. Injection using Unicode homoglyphs, non-English paraphrasing, or novel encoding may score below the classifier's reject threshold. Splitting an instruction-override phrase across separate tool arguments also bypasses the outbound filter — each argument is checked individually and the model only assembles them at prompt time. The inbound classifier is the layer that sees fully-assembled content. No filter is perfect at any sophistication level; we still recommend not pointing the agent at content from untrusted sources.

Approval payload visibility

The approval card shows every key and value precisely, but a crafted payload can mimic familiar field names — e.g. a decorative "draft_email_to" entry placed before the real "send_to" field. Each value is rendered correctly; the failure mode is visual, not data omission. Read the full payload before approving, especially for sends and deletions, rather than skimming for familiar keys.

Out of scope

Threats outside the architectural boundary. These are not oversights — they are constraints of running local software as your user account.

What ArmorClaw does not protect against

ArmorClaw is a local application running as your user. If malware is already running on your computer with the same permissions, ArmorClaw does not protect you. A process with your user permissions can read the audit log, write to the sandbox folder, and interact with the agent's local files. This is an architectural constraint of local software; it is not unique to ArmorClaw. Category 3 threat model (existing malware) is explicitly out of scope.

Audit status

No third-party audit. We have not had a third-party security audit and don't currently have the budget for one. This page is self-assessed. Before reopening signups, we will publish an internal red-team log documenting what we tested and what we found.