Data for AI Security

High-quality labeled datasets, adversarial samples, and structured threat intelligence — purpose-built for teams training the next generation of AI security models.

The Problem

AI security teams are data-starved

Building threat detection models requires massive amounts of labeled security data — the kind that's nearly impossible to collect, clean, and annotate at scale without specialized expertise.

Most teams resort to synthetic data, small academic benchmarks, or proprietary silos that don't generalize. ShieldSet solves this.

01

Scarce Ground Truth

Real-world malware, phishing, and intrusion samples are rare and fragmented across dozens of siloed repositories.

02

Poor Label Quality

Academic benchmarks contain errors, bias, and stale threat signatures that degrade model performance in production.

03

Adversarial Gaps

Models trained without adversarial samples fail the moment threat actors adapt. Defense requires knowing how attackers think.

Dataset Catalog

What We Offer

03 Categories

Dataset 01

Labeled Threat Intel

Structured, expert-labeled threat intelligence data spanning APT campaigns, IOCs, TTPs, and malware families — mapped to MITRE ATT&CK.

Dataset 02

Adversarial Samples

Carefully crafted adversarial examples for phishing, social engineering, and evasion attacks — battle-tested against production detection systems.

Dataset 03

Network Intrusion Logs

Normalized pcap-derived features, flow records, and labeled intrusion events — covering lateral movement, C2, exfiltration, and more.

Process

How It Works

From raw threat telemetry to model-ready datasets — ShieldSet handles the entire pipeline so your team can focus on building, not wrangling data.

01

Ingest & Normalize

We continuously collect threat telemetry from honeypots, dark web monitoring, partner feeds, and proprietary sensors, then normalize everything into a consistent schema.

STIX/TAXII · JSON · Parquet

02

Expert Annotation

Every sample is reviewed by a team of certified security researchers. Multi-pass labeling with adversarial disagreement resolution yields label accuracy above 99.4%.

MITRE ATT&CK · CVE · Multi-label

03

Versioned Delivery

Datasets are versioned, checksummed, and delivered via API or direct download. Every update includes a diff log so your pipelines stay reproducible.

REST API · S3 · Versioned

04

Continuous Updates

The threat landscape never sleeps. ShieldSet datasets are refreshed with new samples weekly — your models stay current without rebuilding from scratch.

Weekly Refresh · Delta Feeds

Coverage

Threat Categories Covered

01

Malware Classification

Ransomware, trojans, worms, rootkits — PE binaries with dynamic + static features.

8.2M samples

02

Phishing Detection

URLs, email headers, HTML content, and screenshot features for phishing identification.

12.4M samples

03

Intrusion Detection

Network flows and host logs from simulated and real-world intrusion events.

6.1M events

04

Vulnerability Intel

CVE-linked exploit samples, patch diffs, and severity context at machine scale.

3.8M records

05

Social Engineering

Spearphishing lures, pretexting scripts, and vishing transcripts with intent labels.

2.1M samples

06

Supply Chain

Dependency confusion, typosquatting, and package tampering indicators.

940K packages

07

C2 Traffic

Labeled command-and-control communication patterns across known RAT families.

4.4M flows

08

Threat Actor TTPs

MITRE ATT&CK-aligned TTP chains mapped to nation-state and eCrime groups.

6,200 groups

Dataset Catalog

Cybersecurity Datasets

Expert-labeled, production-ready datasets for AI security teams. Log in to view full metadata and download options.

Services

Malware & Threat Detection

Comprehensive labeled datasets for training malware detection and classification models — sourced from active honeypots, dark web monitoring, and proprietary sensors, continuously refreshed.

Malware Classification

PE Binary Analysis

Ransomware, trojans, worms, rootkits with dynamic and static feature extraction. 8.2M labeled PE binaries with behavioral tags.

8.2M samples

Adversarial Samples

Evasion-Tested Examples

Adversarial examples crafted to bypass production detection systems, enabling robust model training against adaptive threat actors.

22M+ samples

MITRE ATT&CK Mapped

TTP-Aligned Intelligence

Every record mapped to MITRE ATT&CK tactics and techniques, enabling precise model training for structured threat detection.

6,200 groups

Threat Intelligence

APT & IOC Feeds

Structured intelligence across APT campaigns, indicators of compromise, and nation-state TTP chains — refreshed weekly.

18M+ records

Services

Network Intrusion & Traffic

Normalized pcap-derived features, labeled flow records, and C2 traffic patterns for network anomaly detection models — covering lateral movement, exfiltration, and C2 beaconing.

Network Intrusion

Labeled Intrusion Events

Network flows and host logs from real and simulated intrusion events covering lateral movement, privilege escalation, and data exfiltration.

6.1M events

C2 Traffic Analysis

Command & Control Patterns

Labeled C2 communication flows across known RAT families — enabling detection of beaconing, tunneling, and encrypted C2 channels.

4.4M flows

Services

Phishing & Fraud Data

Multi-modal phishing datasets spanning URL analysis, email headers, HTML content, and screenshot features — plus social engineering lures for spearphishing and BEC campaigns.

Phishing Detection

Multi-Modal Phishing Data

URLs, email headers, raw HTML, and rendered screenshot features — enabling multi-modal phishing classifiers with 12.4M labeled samples.

12.4M samples

Social Engineering

Spearphishing & Pretexting

Spearphishing lures, pretexting scripts, and vishing transcripts with fine-grained intent labels for social engineering detection models.

2.1M samples

Services

Vulnerability & Exploit Data

CVE-linked exploit samples, patch diffs, and severity context at machine scale — enabling AI models that predict exploitability, prioritize patching, and detect active exploitation.

Vulnerability Intel

CVE-Linked Exploit Data

3.8M CVE-linked records with exploit proof-of-concept samples, patch diffs, and CVSS-enriched severity context for vulnerability prioritization models.

3.8M records

Services

Supply Chain Risk Data

Indicators of supply chain compromise spanning dependency confusion attacks, typosquatting packages, and package tampering events — enabling AI-driven software supply chain security.

Supply Chain

Package Risk Intelligence

Dependency confusion, typosquatting, and package tampering indicators across 940K malicious or suspicious packages — tagged by attack type and severity.

940K packages

Industries — Startups

Built for Teams Moving Fast

You're building the next generation of security tooling — but acquiring and labeling threat data shouldn't eat your runway. ShieldSet gives early-stage security companies immediate access to production-grade datasets so you can focus on building your product, not your data pipeline.

Ship faster

Skip months of collection

Get API access in minutes and integrate labeled threat data directly into your training pipeline. Skip the collection, cleaning, and annotation work entirely.

Stay lean

Pay-as-you-go pricing

Pay-as-you-go and Growth plans scale with your usage. No enterprise contracts, no procurement headaches, no minimums.

Stay current

Weekly dataset refreshes

Weekly dataset refreshes mean your models train on the latest threat signatures — without rebuilding your data infrastructure from scratch.

Industries — Academic / Research Labs

Rigorous Data for Serious Research

Academic benchmarks in cybersecurity are often outdated, small-scale, or contaminated with labeling errors. ShieldSet provides research labs with large-scale, reproducible datasets that reflect the current threat landscape.

Reproducibility

Versioned & checksummed

Every dataset is versioned and checksummed. Cite a specific version and other researchers can reproduce your exact experimental setup.

Label Quality

>99.4% accuracy

>99.4% label accuracy across all datasets — validated by certified security researchers with multi-pass review and disagreement resolution.

Documentation

Publication-ready

Datasets come with full schema documentation, label taxonomies, and provenance metadata — everything you need for a rigorous methods section.

Industries — Enterprise

Enterprise-Grade Data at Scale

Large security platforms need data that scales with their operations — custom schemas, high-frequency updates, SLA guarantees, and dedicated support. ShieldSet Enterprise is purpose-built for exactly this.

Customization

Custom datasets

Don't see exactly what you need? Our team builds custom datasets to your specification — mapped to your threat model and label taxonomy.

Freshness

15-minute updates

Enterprise customers receive data refreshes every 15 minutes — keeping detection models current against rapidly evolving threats.

Reliability

SLA guarantee

Uptime and delivery SLAs backed by contract. Your data pipelines are mission-critical — we treat them that way.

Support

Dedicated team

A named support team that knows your use case — not a ticket queue. A direct line to people who understand your data architecture.

Industries — Government

Data for National Security

Government agencies and defense contractors require data that meets strict provenance, compliance, and operational security requirements. ShieldSet works with public sector teams to deliver structured threat intelligence that meets federal standards.

Compliance

Provenance documentation

Full chain-of-custody documentation for every dataset — supporting compliance and audit requirements in regulated environments.

Interoperability

STIX/TAXII compatible

Data delivered in STIX/TAXII formats compatible with existing government threat intelligence platforms and SIEM integrations.

Delivery

Custom delivery options

Air-gapped delivery, on-premise licensing, and custom ingestion pipelines available for sensitive operational environments.

Developer

Quick Start

Get your first dataset in under 5 minutes. No setup required — just an API key and a few lines of code.

Step 1 — Get your API key

Sign up for a free account and copy your API key from the dashboard. No credit card required.

Step 2 — Install the SDK

pip install shieldset

Step 3 — Pull your first dataset

import shieldset as ss

# Initialize with your API key
client = ss.Client(api_key="your_api_key_here")

# List available datasets
datasets = client.datasets.list()

# Pull a dataset (the ID matches the "id" field returned by the list call)
df = client.datasets.pull(
    dataset_id="malware-clf",
    format="parquet",
    limit=10000
)

print(df.head())

REST API — curl

curl -X GET https://api.shieldset.com/v1/datasets \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"

Developer

Documentation

Complete reference for the ShieldSet REST API and Python SDK.

Authentication

All API requests must include a valid API key passed as a Bearer token in the Authorization header.

Authorization: Bearer ss_live_xxxxxxxxxxxxxxxx

Generate and manage API keys from your dashboard. Keys are prefixed by plan: ss_free_, ss_live_, or ss_ent_.

Never expose your API key in client-side code or public repositories. Rotate compromised keys immediately from your dashboard.
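
One way to keep the key out of source code is to read it from the environment and attach it as a Bearer token. This is a minimal sketch using only the Python standard library; the variable name SHIELDSET_API_KEY and the helper build_request are assumptions made for illustration and are not part of the ShieldSet SDK. Only the header format and base URL come from this page.

```python
import os
import urllib.request

API_BASE = "https://api.shieldset.com/v1"  # base URL as used in the curl examples


def build_request(path, api_key=None):
    """Return an unsent GET request carrying the Bearer token.

    SHIELDSET_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = api_key or os.environ.get("SHIELDSET_API_KEY")
    if not key:
        raise RuntimeError("No API key: set SHIELDSET_API_KEY or pass api_key")
    req = urllib.request.Request(API_BASE + path, method="GET")
    req.add_header("Authorization", "Bearer " + key)
    return req


# Build (but do not send) a request with a placeholder key.
req = build_request("/datasets", api_key="ss_live_xxxxxxxxxxxxxxxx")
print(req.full_url)
print(req.get_header("Authorization"))
```

The request is only constructed here; pass it to urllib.request.urlopen when you are ready to call the API.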

Key types

Prefix	Plan	Permissions
`ss_free_`	Free	Read — limited to 5 pulls, 1 dataset
`ss_live_`	Pay As You Go / Growth	Read — all datasets
`ss_ent_`	Enterprise	Read + custom dataset access + SLA
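
Because the prefix encodes the plan, a client can check which plan a key belongs to before making a request. The sketch below copies the mapping from the table above; the helper name plan_for_key is illustrative only.

```python
# Prefix-to-plan mapping copied from the key types table.
KEY_PREFIXES = {
    "ss_free_": "Free",
    "ss_live_": "Pay As You Go / Growth",
    "ss_ent_": "Enterprise",
}


def plan_for_key(api_key):
    """Return the plan name implied by the key prefix, or None if unknown."""
    for prefix, plan in KEY_PREFIXES.items():
        if api_key.startswith(prefix):
            return plan
    return None


print(plan_for_key("ss_live_xxxxxxxxxxxxxxxx"))
```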

Datasets API

The Datasets API is the core of ShieldSet. Use it to browse the catalog, retrieve metadata, and pull data into your pipeline.

List datasets

GET /v1/datasets

Returns a paginated list of datasets available on your plan.

Parameter	Type	Description
`page`	integer	Page number (default: 1)
`limit`	integer	Results per page, max 100 (default: 20)
`category`	string	Filter by threat category
`format`	string	Filter by available format

# Example response
{
  "data": [
    {
      "id": "malware-clf",
      "name": "Malware Classification Dataset v3",
      "category": "Malware & Threat",
      "records": 8200000,
      "version": "v3.1.0",
      "updated_at": "2026-04-28"
    }
  ],
  "pagination": { "page": 1, "limit": 20, "total": 9 }
}
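
As a rough illustration of the query parameters and response shape, the sketch below builds the request URL and reads dataset IDs from a trimmed copy of the example response. The helper list_datasets_url is hypothetical; it enforces the documented maximum of 100 for limit on the client side.

```python
import json
from urllib.parse import urlencode


def list_datasets_url(page=1, limit=20, category=None, fmt=None):
    """Build the GET /v1/datasets URL from the documented query parameters."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {"page": page, "limit": limit}
    if category is not None:
        params["category"] = category
    if fmt is not None:
        params["format"] = fmt
    return "https://api.shieldset.com/v1/datasets?" + urlencode(params)


# The example response, trimmed to the fields used here.
body = json.loads(
    '{"data": [{"id": "malware-clf", "records": 8200000, "version": "v3.1.0"}],'
    ' "pagination": {"page": 1, "limit": 20, "total": 9}}'
)
ids = [d["id"] for d in body["data"]]

print(list_datasets_url(limit=50, fmt="parquet"))
print(ids)
```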

Get dataset metadata

GET /v1/datasets/:dataset_id

Returns full metadata for a single dataset including schema, sample fields, and provenance information.

curl -X GET https://api.shieldset.com/v1/datasets/malware-clf \
  -H "Authorization: Bearer YOUR_API_KEY"

Pull dataset

POST /v1/datasets/:dataset_id/pull

Initiates a dataset pull. Returns a signed download URL or streams the data depending on format and size.

Body param	Type	Description
`format`	string	Output format: `parquet`, `json`, `csv`, `stix`, `pcap` (see Formats)
`limit`	integer	Max records to return (omit for full dataset)
`version`	string	Specific version to pull (default: latest)
`filters`	object	Field-level filters (see Filtering)

curl -X POST https://api.shieldset.com/v1/datasets/malware-clf/pull \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"format":"parquet","limit":50000}'

Get pull status

GET /v1/pulls/:pull_id

Check the status of an async pull job. Returns pending, processing, ready, or failed along with a download URL when ready.
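
An async pull is typically consumed with a polling loop. The sketch below shows one way to write it, under stated assumptions: the response is taken to carry a "status" field holding one of the four states above and a "download_url" field when ready (both field names are guesses, since this page does not show the response body), and get_status is a caller-supplied function that performs the HTTP call.

```python
import time


def wait_for_pull(get_status, pull_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the pull is ready or failed.

    get_status is a caller-supplied function (hypothetical here) that performs
    GET /v1/pulls/:pull_id and returns the decoded JSON body as a dict.
    """
    waited = 0.0
    while True:
        job = get_status(pull_id)
        status = job["status"]
        if status == "ready":
            return job
        if status == "failed":
            raise RuntimeError(f"pull {pull_id} failed")
        if waited >= timeout:
            raise TimeoutError(f"pull {pull_id} still {status} after {timeout}s")
        sleep(interval)
        waited += interval


# Stand-in for the real HTTP call so the sketch runs offline.
_responses = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "ready", "download_url": "https://example.com/signed"},
])
job = wait_for_pull(lambda pull_id: next(_responses), "pull_123", sleep=lambda s: None)
print(job["status"])
```

Passing the sleep function as a parameter keeps the loop testable without real delays.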

Pagination

All list endpoints return paginated results. Use the page and limit query parameters to navigate through large result sets.

GET /v1/datasets?page=2&limit=5

Every paginated response includes a pagination object:

{
  "data": [ ... ],
  "pagination": {
    "page": 2,
    "limit": 5,
    "total": 9,
    "total_pages": 2,
    "has_next": false,
    "has_prev": true
  }
}

Parameter	Type	Default	Max
`page`	integer	1	—
`limit`	integer	20	100
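
Walking every page comes down to following has_next until it is false. The sketch below shows the loop with a stand-in for the HTTP call; fetch_page is a hypothetical caller-supplied function, not an SDK method.

```python
def iter_all(fetch_page, limit=100):
    """Yield every item from a paginated list endpoint.

    fetch_page(page, limit) is a caller-supplied function (hypothetical) that
    returns the decoded JSON body, including the pagination object.
    """
    page = 1
    while True:
        body = fetch_page(page, limit)
        yield from body["data"]
        if not body["pagination"]["has_next"]:
            break
        page += 1


# Offline stand-in: 9 items served `limit` per page.
ITEMS = [f"dataset-{i}" for i in range(1, 10)]


def fake_fetch(page, limit):
    start = (page - 1) * limit
    chunk = ITEMS[start:start + limit]
    has_next = start + limit < len(ITEMS)
    return {"data": chunk, "pagination": {"page": page, "has_next": has_next}}


print(len(list(iter_all(fake_fetch, limit=5))))
```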

Cursor-based pagination (large pulls)

For dataset pulls exceeding 1M records, ShieldSet uses cursor-based pagination. The response includes a next_cursor token to pass in your next request:

POST /v1/datasets/malware-clf/pull

{
  "format": "parquet",
  "cursor": "eyJsYXN0X2lkIjoiYWJjMTIzIn0="
}
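
The cursor loop can be sketched as follows. Two assumptions are made that this page does not confirm: the final response omits next_cursor (or sets it to null), and each response carries its payload in a field called "chunk". post_pull stands in for the real HTTP call.

```python
def pull_all(post_pull, fmt="parquet"):
    """Collect every chunk of a large pull by following next_cursor.

    post_pull(body) is a caller-supplied function (hypothetical) that POSTs
    the body to /v1/datasets/:dataset_id/pull and returns the decoded JSON.
    """
    chunks = []
    body = {"format": fmt}
    while True:
        resp = post_pull(body)
        chunks.append(resp["chunk"])  # "chunk" is an assumed field name
        cursor = resp.get("next_cursor")
        if not cursor:
            return chunks
        # Treat the cursor as opaque: pass it back unchanged.
        body = {"format": fmt, "cursor": cursor}


# Offline stand-in that serves three chunks.
def fake_post(body):
    step = {None: 0, "c1": 1, "c2": 2}[body.get("cursor")]
    next_cursor = ["c1", "c2", None][step]
    return {"chunk": f"part-{step}", "next_cursor": next_cursor}


print(pull_all(fake_post))
```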

Filtering

Use the filters object in pull requests to retrieve a subset of records matching specific field values.

Basic filter

{
  "format": "parquet",
  "filters": {
    "malware_family": "ransomware"
  }
}

Operators

Operator	Description	Example
`eq`	Equals (default)	`"label": "malicious"`
`neq`	Not equals	`"label": {"neq": "benign"}`
`in`	Matches any in list	`"family": {"in": ["ransomware","trojan"]}`
`gte`	Greater than or equal	`"cvss_score": {"gte": 7.0}`
`lte`	Less than or equal	`"cvss_score": {"lte": 9.9}`
`contains`	String contains	`"description": {"contains": "lateral"}`
`date_range`	Date range (combine `gte` and `lte` on a date field)	`"collected_at": {"gte": "2026-01-01", "lte": "2026-04-30"}`

Combined filters

{
  "filters": {
    "malware_family": { "in": ["ransomware", "trojan"] },
    "collected_at": { "gte": "2026-01-01" },
    "label_confidence": { "gte": 0.95 }
  }
}

Available filter fields vary by dataset. Refer to each dataset's schema documentation, accessible via GET /v1/datasets/:dataset_id/schema, for the full list of filterable fields.
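
Since an unsupported operator is rejected by the server, a quick client-side check can catch typos before a request is sent. This is an illustrative sketch only: validate_filters is a hypothetical helper, and it assumes date ranges are written with gte and lte, as in the table's example.

```python
# Operators from the table above; a bare value means "eq".
# Assumption: date ranges are expressed with gte/lte rather than a keyword.
OPERATORS = {"eq", "neq", "in", "gte", "lte", "contains"}


def validate_filters(filters):
    """Return a list of problems found in a filters object (empty if none)."""
    problems = []
    for field, cond in filters.items():
        if not isinstance(cond, dict):
            continue  # bare value: implicit "eq"
        for op, value in cond.items():
            if op not in OPERATORS:
                problems.append(f"{field}: unknown operator '{op}'")
            elif op == "in" and not isinstance(value, list):
                problems.append(f"{field}: 'in' expects a list")
    return problems


filters = {
    "malware_family": {"in": ["ransomware", "trojan"]},
    "collected_at": {"gte": "2026-01-01"},
    "label_confidence": {"gte": 0.95},
}
print(validate_filters(filters))
print(validate_filters({"cvss_score": {"gt": 7.0}}))
```

This does not replace the per-dataset schema check: whether a given field is filterable is only known to the server.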

Formats

ShieldSet datasets are available in five output formats; availability depends on your plan. Specify the format in your pull request body.

Format	Value	Best for	Plans
Apache Parquet	`parquet`	Large-scale ML training pipelines, Spark, pandas	All
JSON Lines	`json`	Streaming ingestion, lightweight integrations	All
CSV	`csv`	Spreadsheet analysis, quick exploration	All
STIX 2.1	`stix`	Threat intel platforms, SIEM ingestion	Growth, Enterprise
PCAP	`pcap`	Raw packet replay for network datasets	Enterprise

Format recommendation

Use parquet for any dataset over 100K records: it is typically 5–10× smaller than CSV and loads significantly faster in pandas, Spark, and PyArrow.

import pandas as pd
import shieldset as ss

client = ss.Client(api_key="YOUR_API_KEY")

# Pull as Parquet — returns a pandas DataFrame directly
df = client.datasets.pull(
    dataset_id="malware-clf",
    format="parquet"
)
print(df.dtypes)

STIX delivery

STIX bundles are delivered as a single .json file conforming to STIX 2.1. Each record is represented as a STIX object — indicator, malware, attack-pattern, or threat-actor depending on the dataset.
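
A quick sanity check on a delivered bundle is to count its objects by type before ingestion. The sketch below uses a tiny hand-written bundle, not real ShieldSet data; its objects are trimmed to type and id and so are not complete STIX 2.1 objects.

```python
import json
from collections import Counter

# A tiny hand-written bundle for illustration only.
bundle_text = """
{
  "type": "bundle",
  "id": "bundle--00000000-0000-4000-8000-000000000001",
  "objects": [
    {"type": "indicator", "id": "indicator--00000000-0000-4000-8000-000000000002"},
    {"type": "malware", "id": "malware--00000000-0000-4000-8000-000000000003"},
    {"type": "indicator", "id": "indicator--00000000-0000-4000-8000-000000000004"}
  ]
}
"""

bundle = json.loads(bundle_text)
if bundle["type"] != "bundle":
    raise ValueError("not a STIX bundle")

# Count objects by STIX type.
counts = Counter(obj["type"] for obj in bundle["objects"])
print(dict(counts))
```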

Versioning

Every ShieldSet dataset is versioned using semantic versioning (MAJOR.MINOR.PATCH). By default, pull requests return the latest version. Pin a specific version for reproducible pipelines.

Pinning a version

POST /v1/datasets/malware-clf/pull

{
  "format": "parquet",
  "version": "v3.0.1"
}

List versions

GET /v1/datasets/:dataset_id/versions

Returns all available versions with release dates and change summaries.

{
  "versions": [
    { "version": "v3.1.0", "released": "2026-04-28", "records": 8200000, "latest": true },
    { "version": "v3.0.1", "released": "2026-03-10", "records": 7900000, "latest": false }
  ]
}
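
When a pipeline pins a version, it can be useful to compare the pin against the newest release numerically rather than as strings. The sketch below does this for the example response above; parse_version is a hypothetical helper that assumes tags always have the form vMAJOR.MINOR.PATCH.

```python
def parse_version(tag):
    """Turn 'v3.1.0' into (3, 1, 0) so versions compare numerically."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


# The "versions" array from the example response.
versions = [
    {"version": "v3.1.0", "released": "2026-04-28", "records": 8200000, "latest": True},
    {"version": "v3.0.1", "released": "2026-03-10", "records": 7900000, "latest": False},
]

pinned = "v3.0.1"
newest = max(versions, key=lambda v: parse_version(v["version"]))
pinned_records = next(v["records"] for v in versions if v["version"] == pinned)
added = newest["records"] - pinned_records

if parse_version(newest["version"]) > parse_version(pinned):
    print(f"{pinned} -> {newest['version']}: {added:+,} records")
```

Comparing tuples avoids the string-ordering trap where "v3.10.0" would sort before "v3.9.0".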

Version changelog

GET /v1/datasets/:dataset_id/versions/:version/changelog

Returns a structured diff of what changed between the specified version and its predecessor — new records added, records removed, label corrections, and schema changes.

We recommend pinning versions in production training pipelines and upgrading deliberately. Subscribe to dataset update webhooks to be notified of new versions automatically.

Rate Limits

Rate limits are applied per API key. Exceeding a limit returns a 429 Too Many Requests response with a Retry-After header.

Plan	Requests / min	Pulls / day	Concurrent pulls
Free	10	5 (lifetime)	1
Pay As You Go	60	Unlimited	2
Growth	300	Unlimited	5
Enterprise	Custom	Unlimited	Custom

Rate limit headers

Every API response includes rate limit headers so you can track consumption; Retry-After appears only on 429 responses:

X-RateLimit-Limit: 300
X-RateLimit-Remaining: 287
X-RateLimit-Reset: 1746451200
Retry-After: 12
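
These headers can drive a simple client-side throttle. The sketch below is a minimal illustration with a plain dict standing in for the response headers; it assumes X-RateLimit-Reset is a Unix timestamp in seconds, which the example value suggests but this page does not state. The helper name seconds_to_wait is hypothetical.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long to pause before the next request, in seconds.

    Prefers Retry-After; otherwise waits until X-RateLimit-Reset when the
    remaining quota is zero. Assumes the reset value is a Unix timestamp.
    """
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    if int(headers.get("X-RateLimit-Remaining", 1)) > 0:
        return 0.0
    now = time.time() if now is None else now
    return max(0.0, float(headers.get("X-RateLimit-Reset", now)) - now)


ok = {"X-RateLimit-Remaining": "287", "X-RateLimit-Reset": "1746451200"}
exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1746451200"}

print(seconds_to_wait(ok))
print(seconds_to_wait(exhausted, now=1746451170))
```

Real HTTP clients usually expose headers through a case-insensitive mapping, so the exact-case lookups here would need adapting.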

Handling 429 responses

import time

import shieldset as ss

client = ss.Client(api_key="YOUR_API_KEY")

try:
    df = client.datasets.pull("malware-clf")
except ss.RateLimitError as e:
    # Wait for the interval the server asked for, then retry once
    time.sleep(e.retry_after)
    df = client.datasets.pull("malware-clf")

SDKs

ShieldSet provides official SDKs for Python and JavaScript. Both SDKs handle authentication, retries, rate limiting, and streaming automatically.

Python SDK

pip install shieldset

import shieldset as ss

client = ss.Client(api_key="YOUR_API_KEY")

# List datasets
datasets = client.datasets.list(category="Network")

# Pull with filters
df = client.datasets.pull(
    dataset_id="phishing",
    format="parquet",
    filters={ "confidence": { "gte": 0.9 } },
    limit=100000
)

# Stream a large dataset
with client.datasets.stream("threat-intel") as stream:
    for batch in stream.iter_batches(size=10000):
        process(batch)

JavaScript / Node.js SDK

npm install @shieldset/sdk

import ShieldSet from '@shieldset/sdk';

const client = new ShieldSet({ apiKey: 'YOUR_API_KEY' });

// List datasets
const datasets = await client.datasets.list();

// Pull a dataset
const result = await client.datasets.pull('malware-clf', {
    format: 'json',
    limit: 5000
});
console.log(result.data);

Community SDKs

Community-maintained SDKs for R, Go, and Ruby are available. These are not officially supported — refer to their respective GitHub repositories for documentation.

Errors

ShieldSet uses standard HTTP status codes. All error responses include a JSON body with a machine-readable code and a human-readable message.

{
  "error": {
    "code": "DATASET_NOT_FOUND",
    "message": "Dataset 'xyz' does not exist or is not available on your plan.",
    "status": 404
  }
}

Status	Code	Description
`400`	`INVALID_REQUEST`	Missing or malformed request parameter
`401`	`UNAUTHORIZED`	Missing or invalid API key
`403`	`FORBIDDEN`	Dataset not available on your current plan
`404`	`DATASET_NOT_FOUND`	Dataset ID does not exist
`409`	`VERSION_NOT_FOUND`	Requested version does not exist
`422`	`FILTER_INVALID`	Filter field or operator not supported for this dataset
`429`	`RATE_LIMIT_EXCEEDED`	Too many requests — check `Retry-After` header
`500`	`INTERNAL_ERROR`	Server error — contact support@shieldset.com
`503`	`SERVICE_UNAVAILABLE`	Temporary outage — retry with exponential backoff
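
For clients that call the REST API directly, the error body can be parsed to decide whether a retry makes sense. The sketch below reads the example error shown above; classify_error is a hypothetical helper, and the retryable set reflects the 429 and 503 rows of the table.

```python
import json

# Status codes the table above describes as worth retrying.
RETRYABLE = {429, 503}


def classify_error(body_text):
    """Parse an error body and report (code, message, retryable)."""
    err = json.loads(body_text)["error"]
    return err["code"], err["message"], err["status"] in RETRYABLE


sample = """
{"error": {"code": "DATASET_NOT_FOUND",
           "message": "Dataset 'xyz' does not exist or is not available on your plan.",
           "status": 404}}
"""
code, message, retryable = classify_error(sample)
print(code, retryable)
```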

Retry strategy

For 429 and 503 errors, implement exponential backoff with jitter. The Python and JS SDKs handle this automatically.

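import random
import time

import shieldset as ss

def pull_with_retry(client, dataset_id, max_retries=4):
    for attempt in range(max_retries):
        try:
            return client.datasets.pull(dataset_id)
        except ss.RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of jitter
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)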

Pricing

Simple, Transparent Pricing

Start free, scale when you're ready. No surprises, no lock-in.

Free

$0

Forever

Instant API access, no credit card required. Perfect for exploring our datasets before committing.

5 dataset pulls
5 downloads
Access to 1 dataset
Instant API key on signup
Community support

Pay As You Go

$25

Per dataset pull or download

Only pay for what you use. No monthly commitment. Ideal for variable or unpredictable data needs.

Unlimited dataset pulls
Unlimited downloads
Access to all datasets
Instant API key
Email support

★ Most Popular

Growth

$599

Per month

Unlimited access at one flat monthly rate. Built for AI security teams that need fresh, reliable threat data every single day.

Unlimited dataset pulls
Unlimited downloads
Access to all datasets
Hourly data updates
Usage dashboard
Priority support

Enterprise

Custom

Pricing

Purpose-built data solutions for large security platforms. Custom datasets, volume pricing, and dedicated support.

Unlimited dataset pulls
Unlimited downloads
Custom datasets to spec
15-minute data updates
SLA guarantee
Dedicated support

All plans include access to ShieldSet's high-quality labeled cybersecurity datasets, adversarial samples, and structured threat intelligence — purpose-built for AI security teams.

Contact

Get In Touch

Whether you're evaluating ShieldSet for your team or ready to get started, we'd love to hear from you. We respond to all serious inquiries within one business day.

General

This is the one inbox for everything — questions, partnership opportunities, sales inquiries, technical support, press requests, or just wanting to chat.

We're a small team and we actually read every email that comes in. Whatever's on your mind, send it here and we'll get back to you within one business day.

hello@shieldset.com

Company

We Make Threat Data Work Harder

ShieldSet is a remote-first data engineering company on a mission to eliminate the data bottleneck holding back AI security teams. We continuously collect, annotate, and deliver production-grade cybersecurity datasets — so the teams building the next generation of threat detection can spend their time building, not wrangling data.

We're a small team with deep roots in both data engineering and offensive security. We've seen firsthand how poor data quality kills model performance in production, and we built ShieldSet to fix that — with expert annotation, rigorous versioning, and a relentless focus on keeping datasets current.

Our Mission

The definitive source for cybersecurity training data.

We exist to give every AI security team access to the reliable, up-to-date data they need to build models that work in production. We take data quality personally, and are proud to be the platform the industry turns to for cybersecurity AI data.

Our Values

Remote First

Outcomes, not offices

We believe great work happens when talented people have autonomy and flexibility. We hire for excellence and trust our team to deliver — regardless of where they're based. What matters is the work.

Learning & Growth

Always getting sharper

The threat landscape evolves constantly, and so do we. We invest in continuous learning, share knowledge openly across the team, and genuinely believe our collective expertise is our biggest competitive advantage.

Inclusion

Small team, mighty culture

We're a diverse team that treats everyone with respect and takes inclusion seriously — not as a policy, but as a foundation for building something we're all proud of. Every voice here matters.

Benefits

Health coverage

Health, dental, and vision insurance for you and your dependents.

Flexible time off

Paid time off, sick leave, and company holidays, with no pressure to leave them unused.

Remote-first

Fully remote roles. Work from wherever you do your best work.

Performance bonus

Bonus plan tied to individual and company goals — we share in the wins together.

ShieldSet swag

Quality gear for the team. Because working in security should feel like it.

Open Role

Sales Engineer

Remote · Full-time · $120k–$160k

Careers

Join the ShieldSet Team

We're a remote-first company hiring people who care deeply about data quality, security, and building things that work. If that's you, we want to hear from you.

Open Roles

Don't see your role?

We're always open to great people

Send us your resume and tell us how you'd contribute.

Blog

Insights from the Team

Perspectives on AI security, data quality, and the evolving threat landscape — written by the people building ShieldSet.

Partnership

Let's Build Something Together

ShieldSet partners with organizations at the intersection of cybersecurity and data. Whether you produce threat intelligence that complements our datasets, or you're a security platform looking to integrate our data, we want to talk.

Data Providers

Sell your data through ShieldSet

If you produce proprietary threat telemetry, malware samples, or network intelligence, we can help you structure, annotate, and distribute it to AI teams — with revenue sharing that works for you.

Technology Partners

Integrate ShieldSet into your platform

Security platforms, SOAR vendors, and AI tooling providers can integrate ShieldSet data directly via API or white-label licensing. We offer flexible arrangements for platform-level partnerships.

Research Partners

Collaborate on the hardest problems

We partner with research institutions and national labs on data collection, annotation methodology, and dataset benchmarking initiatives. Joint publications and dataset co-creation welcome.

Interested in partnering?

Tell us about your organization and what kind of partnership you have in mind. We respond to all serious inquiries within two business days.

Legal & Privacy

Privacy Policy

Last updated: June 1, 2025

ShieldSet, Inc. ("ShieldSet," "we," "our," or "us") is committed to protecting your privacy. This Privacy Policy explains how we collect, use, disclose, and safeguard information about you when you use our website and API services (the "Services").

1. Information We Collect

Information you provide directly

When you create an account, we collect your name, email address, company name, and password. Payment information is handled by our payment processor — we do not store full credit card numbers.

Usage and technical data

We automatically collect information about how you use our Services, including API request logs, IP addresses, browser type, operating system, and referring URLs.

2. How We Use Your Information

We use the information we collect to provide, maintain, and improve the Services; process transactions; respond to your comments and requests; send technical notices and security alerts; and monitor usage patterns.

3. Data Sharing

We do not sell your personal information. We share data only with trusted service providers bound by confidentiality obligations, or when required by law.

4. Data Retention

We retain account data for as long as your account is active or as needed to provide the Services. API request logs are retained for 90 days. You may request deletion by contacting privacy@shieldset.com.

5. Security

We implement industry-standard security measures including encryption in transit (TLS), encrypted storage of credentials, and access controls.

6. Your Rights

Depending on your jurisdiction, you may have the right to access, correct, delete, or restrict processing of your personal data. Contact privacy@shieldset.com to exercise these rights.

7. Contact

For privacy-related questions, contact privacy@shieldset.com.

Legal & Privacy

Terms of Service

Last updated: June 1, 2025

These Terms of Service ("Terms") govern your access to and use of the ShieldSet, Inc. website and API services. By creating an account or accessing the Services, you agree to be bound by these Terms.

1. Access and Use

Subject to your compliance with these Terms and payment of applicable fees, ShieldSet grants you a limited, non-exclusive, non-transferable right to access and use the Services for your internal business or research purposes. You may not resell or distribute ShieldSet datasets to third parties without express written permission.

2. Acceptable Use

You agree to use the Services only for lawful purposes. You may not use the Services to train models intended to facilitate offensive cyberattacks against unauthorized systems, infringe intellectual property rights, or violate any applicable law.

3. API Usage

Your use of the API is subject to rate limits and usage quotas specified in your plan. ShieldSet reserves the right to suspend API access if abuse is detected.

4. Intellectual Property

ShieldSet and its licensors retain all intellectual property rights in the Services and datasets. Dataset access grants a license to use the data for the purposes described in your plan — it does not transfer ownership.

5. Payment

Paid subscriptions are billed monthly in advance. All fees are non-refundable except as required by law.

6. Termination

Either party may terminate these Terms at any time. ShieldSet may suspend or terminate your access immediately for material breach of these Terms.

7. Disclaimer

The Services are provided "as is" without warranty. ShieldSet's liability is limited to fees paid in the 12 months preceding the claim.

8. Governing Law

These Terms are governed by the laws of the State of Florida.

Careers · Apply

Apply for Role

Fill out the form below. We review every application carefully and respond within 5 business days.

