# LLM Privacy Layer — Complete Research Synthesis
> Compiled from: Secludy website crawl, 4 Medium/blog articles, 10 GitHub repos, 2 deep research reports (85+ sources total), LinkedIn profiles, Google Scholar, web searches.
---
## 1. What Is Secludy Building (Exact Product)
### Core Product: Privacy-Safe Synthetic Data for AI Training
Secludy generates **synthetic replicas** of sensitive datasets using **differential privacy (DP)**, enabling organizations to train AI models without risking PII leakage. They position themselves as "release and approval infrastructure for blocked data."
### Two Data Types
**Unstructured Text Synthesis:**
- Input: Free-form text containing PII (emails, chat logs, legal docs, medical records)
- Output: Semantically equivalent text with PII replaced by fake but contextually appropriate data
- Example from their site:
- Original: "I live by the golden gate bridge in San Francisco. My SSN is 231-34-2495."
- Synthetic: "Our place is near the Bay Bridge. Here's my SSN 345-34-6975."
- Preserves: narrative structure, topics, sentiment, linguistic patterns
- Removes: all identifiable information
**Tabular Data Synthesis:**
- Input: Structured data (CSV, databases) with PII columns
- Output: Synthetic rows preserving statistical distributions
- Example from their site: Names, credit scores, SSNs all replaced while maintaining column correlations
### Technical Stack (Confirmed from GitHub + Deep Research)
| Component | Technology |
|-----------|-----------|
| PII Detection (NER) | BERT-based NER model |
| Text Replacement | Qwen-2.5-7B-Instruct (fine-tuned with DP-SGD via LoRA/PEFT) |
| Inference Engine | vLLM with BitsAndBytes 4-bit quantization |
| DP Training | fast-differential-privacy (AWS Labs fork) |
| Distributed Training | DeepSpeed |
| Fine-tuning Acceleration | Unsloth |
| Vector DB Integration | Milvus (for synthetic data quality validation) |
| Leak Detection | Aho-Corasick multi-pattern string matching |
| Deployment | Docker/Kubernetes (Helm charts), AWS SageMaker |
| Infrastructure | Self-hosted in customer VPC, no data leaves network |
### Product Features (From Website + Repos)
1. **Private Synthetic Data Generation** — Create synthetic replicas preserving statistical properties
2. **Secure Fine-Tuning Pipeline** — Set privacy budgets (epsilon/delta) for model training
3. **Leakage Detection Toolkit** — Canary PII injection + leak detection to verify zero leakage
4. **Bias & Diversity Measurement** — Reduce bias, improve dataset balance
5. **Fully Self-Hosted Deployment** — VPC/on-prem with AWS, Azure, GCP, Databricks
6. **Easy Enterprise Integration** — Plug into existing MLOps platforms
7. **"PrivateTransformer Agent"** — Natural language config setup (marketing name, no public technical docs)
### AWS Marketplace Products
1. **PII Leakage Detection and Monitor** — $299/month, Docker on ECS/EKS, 50+ PII categories
2. **PII-Free Synthetic Text Replicas** — $10/hr training (ml.g5.xlarge), $16/hr inference (ml.p3.2xlarge)
- Epsilon: 1.0-16.0 (default 8.0)
- Batch size: 1-128 (default 1)
- Epochs: 0.01-100.0 (default 0.01)
- Learning rate: 0.0001-0.1 (default 0.001)
- Max sequence length: 128-2048 tokens (default 512)
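
The tunable ranges listed above can be captured as a validated config object. This is a minimal sketch: the `TrainingConfig` class, its field names, and the `_RANGES` table are hypothetical, and only the ranges and defaults come from the marketplace listing.

```python
from dataclasses import dataclass

# Ranges copied from the listing summarized above; all names are hypothetical.
_RANGES = {
    "epsilon": (1.0, 16.0),
    "batch_size": (1, 128),
    "epochs": (0.01, 100.0),
    "learning_rate": (0.0001, 0.1),
    "max_seq_length": (128, 2048),
}


@dataclass(frozen=True)
class TrainingConfig:
    # Defaults mirror the listing's stated defaults.
    epsilon: float = 8.0
    batch_size: int = 1
    epochs: float = 0.01
    learning_rate: float = 0.001
    max_seq_length: int = 512

    def __post_init__(self):
        # Reject any value outside the documented range.
        for name, (low, high) in _RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")


default_config = TrainingConfig()
stricter_config = TrainingConfig(epsilon=2.0, batch_size=16, epochs=3.0)
```

Validating at construction time means an out-of-range epsilon fails before any training job is launched.
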
### How It Works (4-Step Pipeline from Website)
1. **Configure Privacy** — Select a privacy preset or customize epsilon/delta
2. **Generate Unstructured Data** — DP-guaranteed synthetic text retaining semantic meaning
3. **Generate Tabular Data** — DP-guaranteed synthetic data retaining statistical properties
4. **Safe Downstream AI Application** — Use for fine-tuning, testing, vendor sharing, simulations
### Compliance Claims
| Regulation | Claim |
|-----------|-------|
| GDPR | DP synthetic data is "fully anonymized" — out of scope for GDPR (NOTE: contested, see Critical Analysis) |
| HIPAA | Patient data safe for AI training via synthetic replicas |
| CCPA | No actual consumer records in synthetic data |
| Corporate IP | Locked-down datasets usable via synthetic replicas |
---
## 2. The Team
### Ben Cerchio — CEO & Co-founder
- **Education**: BBA, Lindner College of Business, University of Cincinnati
- **Career**:
- IT Risk Advisory @ Schneider Downs (Jan 2017 - Mar 2019)
- IT Internal Auditor @ TriNet (Mar 2019 - May 2020)
- Innovation Lab + Technology Auditor @ PayPal (May 2020 - Dec 2021)
- Product Privacy @ TikTok (Dec 2021 - Jun 2023)
- CEO @ Secludy (2024 - present)
- **Profile**: Privacy and compliance specialist without a technical ML background. Understands enterprise buyer pain points from audit and privacy roles. Active at IAPP conferences (DPI 2025, GPS 2025). Most viral LinkedIn post: "Another day, another AI agent leaks sensitive data" (2,089 reactions).
- **No academic publications.**
### Mingze He, Ph.D. — CTO & Co-founder
- **Education**:
- B.S. Biotechnology, South China University of Technology (2012)
- Ph.D. Bioinformatics & Computational Biology, Iowa State University (2018)
- Advisors: Dr. Carolyn J. Lawrence-Dill, Dr. Peng Liu
- ORCID: 0000-0002-8164-2480
- **Career**:
- Bioinformatics Researcher @ BGI-Shenzhen (2010-2013)
- Visiting Researcher @ UC-Berkeley (2012-2013)
- Graduate Researcher @ Iowa State University (2013-2018)
- ML Researcher @ SRI International (Feb 2019 - Jan 2020)
- AI/ML Lead (Sr. Data Scientist) @ Williams-Sonoma (Jan 2020 - Jul 2024)
- CTO @ Secludy (2024 - present)
- **Currently hiring**: Machine Learning Engineer (on-site, SF Bay Area)
- **Key milestone**: "Kicked off pilots with leading fintech and pharma companies" (2025-2026)
#### Mingze He's Complete Publication Record (9 Papers)
**CRITICAL FINDING: Zero publications on differential privacy, text anonymization, or synthetic data. All papers are in genomics/bioinformatics.**
| # | Title | Venue | Year | Citations | Role |
|---|-------|-------|------|-----------|------|
| 1 | Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA | **Nature** | 2014 | ~944 | Co-author (9th of 27) |
| 2 | Detection of clinically relevant genetic variants in autism spectrum disorder | Am. J. Human Genetics | 2013 | ~488 | Co-author |
| 3 | Whole-genome sequencing in an autism multiplex family | Molecular Autism | 2013 | ~80 | Co-author |
| 4 | An effort to use human-based exome capture methods for chimpanzee/macaque exomes | PLoS ONE | 2012 | ~29 | Co-author |
| 5 | An ontology approach to comparative phenomics in plants | Plant Methods | 2015 | ~26 | Co-author |
| 6 | A hypothesis-driven approach to RNA expression levels among gene groups | bioRxiv/Current Plant Biology | 2017 | ~6 | **Lead author** |
| 7 | Response to persistent ER stress in plants | The Plant Cell | 2018 | ~76 | Co-author |
| 8 | C-REx: Compare expression profiles for gene groups | J. Open Source Software | 2019 | ~0 | **Lead author** |
| 9 | Validation of computationally predicted synthetic lethal interactions for KRAS | Cancer Research (AACR) | 2021 | ~0 | Co-author (SRI International) |
**PhD Dissertation**: "Analysis of G-quadruplexes as environmental sensors" (Iowa State, 2018) — about maize gene expression under salt stress.
**Disambiguation Warning**: A different "Mingze He" at CUNY publishes on photonics/polaritons (Google Scholar iPWtyYMAAAAJ). Another "Mingze He" publishes on digital watermarking (DBLP pid/243/7441). Neither is the Secludy CTO.
#### Applied Privacy Research (Medium/Secludy — Not Peer-Reviewed)
| # | Title | Date | Key Finding |
|---|-------|------|-------------|
| 1 | Can Synthetic Data Be Reversed? | Sep 2024 | DP makes synthetic data mathematically improbable to reverse-engineer |
| 2 | How to Detect PII Leakage in Real-time | 2025 | Canary PII injection + Aho-Corasick detection tool |
| 3 | Fine-tuning LLMs on Sensitive Data → 19% PII Leakage | Jan 2025 | Headline finding: 19% leakage without DP, 0% with DP |
| 4 | How to Build Trust with Private Synthetic Data | Nov 2024 | Trust framework for synthetic data |
| 5 | Why Data Masking Doesn't Work in the World of LLMs | Jun 2025 | Contextual re-identification defeats masking |
#### What's Incorporable From His Background Into Our MVP
1. **Statistical comparison methodology (C-REx)** — Verify synthetic data preserves statistical distributions of original
2. **ML classification expertise (99.8% tumor classification)** — Build high-accuracy PII detection
3. **NLP + ontology structuring** — Categorize PII in unstructured text
4. **Pipeline engineering (Docker, AWS, HPC)** — Build the data processing pipeline
5. **Canary PII injection methodology** — Testing framework for validating privacy guarantees
6. **DP-SGD with epsilon tuning** — Privacy mechanism (0% canary leakage at epsilon 2, 4, and 8 in the Zagardo/He experiment)
7. **Aho-Corasick string matching** — Efficient multi-pattern PII scanning in outputs
### David Zagardo — Researcher/Engineer
- **Education**: M.S. Privacy Engineering, Carnegie Mellon University
- **Certification**: CIPT (Certified Information Privacy Technologist)
- **Key Research**: "Empirical Analysis of PII Leakage in LLM Training: A Case Study Using Differential Privacy"
- **GitHub**: github.com/dzagardo
- **Key Findings**:
- 19% PII leakage at injection rate 4 (non-DP training)
- 0% leakage with DP-SGD at epsilon 2.0, 4.0, 8.0
- Token frequency correlates strongly with leakage probability
- Bitcoin wallets and VINs leak most (distinctive patterns); SSNs leak least
### Company Details
- **Founded**: 2024, San Francisco, CA (Delaware incorporation, entity #6363226)
- **Employees**: ~3 (Ben, Mingze, David)
- **Pre-seed Round** (Sep 2024): Oversubscribed, est. $500K-$2M
- Lead: Forty Three ("43")
- Participants: Script Capital, Precursor Ventures, Hustle Fund, Karman Ventures
- **Seed Round**: Reportedly just closed (per interview Apr 2, 2026) — not yet public
- **Press Coverage**: Minimal — no TechCrunch, VentureBeat, etc. Self-generated content only.
- **Conference Activity**: IAPP GPS 2025, DPI 2025; Unstructured Data Meetup (Milvus integration demo)
---
## 3. Secludy's GitHub Repositories (Technical Intel)
| Repo | Type | Description | Key Tech |
|------|------|-------------|----------|
| pii-redaction | Fork (OpenPipe) | LLM-based PII detection. 20 PII types, tag/redact/replace modes | Transformers + vLLM backends |
| fast-differential-privacy | Fork (AWS Labs) | DP-SGD for PyTorch. RDP/GLW accountants | Core DP library |
| Secludy-PII-Free-Synthetic-Text-Replicas | Original | AWS Marketplace synthetic text generator | Epsilon control, S3 I/O |
| LLM_fine-tuning-leaking-PII | Original | PII leakage research experiments | Mistral-7B, LoRA, TRL |
| PII-inject-detect-tool | Original | Canary injection + Aho-Corasick detection | Docker, Apache 2.0 |
| Mask9Leak1 | Original | Data masking failure demo: BERT + regex + Qwen-2.5-7B pipeline still leaks ~10% | Jupyter/Python |
| Secludy_AutoPilot_Redacto_helm_charts | Original | K8s deployment (product name: "AutoPilot Redacto") | Helm/Smarty |
| unsloth_GPUs | Fork | LLM fine-tuning 2x faster, 70% less memory | Optimization |
| DeepSpeed | Fork | Distributed training | Scale-out |
| data-caterer | Fork | Test data management | Scala |
| supabase-mcp-server | Fork | Supabase via chat interface | Python |
---
## 4. Key Research Findings (From Blog Posts + Experiments)
### Why Data Masking Fails with LLMs
1. **Contextual re-identification**: "top scorer of the national bar exam in 2022" — identifiable even without name
2. **Pattern reconstruction**: LLMs infer masked values from surrounding context
3. **Unstructured text gaps**: PII in natural language doesn't follow fixed formats — masking tools miss it
4. **Measured leakage rates from masked data**: Driver's License 9.35%, VINs 7.32%, SSNs 6.5%, Bitcoin 5.47%
5. **Mask9Leak1 finding**: Even BERT + regex + LLM masking pipeline fails ~10% of the time
### Canary PII Injection Experiment (Zagardo/He)
- **Model**: Mistral-7B-Instruct-v0.3, fine-tuned with LoRA/PEFT (4-bit quantization)
- **Dataset**: 4,660 corporate email records (Costco-style), 19 categories
- **Canaries**: 100 unique PII per type (SSN, VIN, Driver's License, Bitcoin), at injection rates 1, 2, 4
- **Training**: Batch 4 (effective 64), LR 5e-4, 3 epochs, max seq 512, adamw_8bit
- **Generation**: vLLM, temperature 0.8, top-p 0.95, ~4,655 synthetic outputs
- **Detection**: Aho-Corasick multi-pattern matching
| Injection Rate | Non-DP Leakage | DP (epsilon=2) | DP (epsilon=4) | DP (epsilon=8) |
|:-:|:-:|:-:|:-:|:-:|
| 1 | 0% | 0% | 0% | 0% |
| 2 | 0% | 0% | 0% | 0% |
| 4 | **19% (306 instances)** | 0% | 0% | 0% |
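
The injection and counting steps of this experiment can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names and canary values are made up, canaries are simply appended to records, and plain substring counting stands in for the Aho-Corasick matcher the experiment used.

```python
import random


def inject_canaries(records, canaries, rate, seed=0):
    """Append each canary value to `rate` distinct, randomly chosen records."""
    rng = random.Random(seed)
    out = list(records)
    for values in canaries.values():
        for value in values:
            for idx in rng.sample(range(len(out)), rate):
                out[idx] = f"{out[idx]} {value}"
    return out


def leak_report(outputs, canaries):
    """Per category: canaries planted, distinct canaries leaked, total hits."""
    report = {}
    for category, values in canaries.items():
        hits = {v: sum(text.count(v) for text in outputs) for v in values}
        report[category] = {
            "canaries": len(values),
            "leaked": sum(1 for n in hits.values() if n > 0),
            "instances": sum(hits.values()),
        }
    return report


# Made-up canaries and records for illustration only.
canaries = {
    "ssn": ["900-12-3456", "900-65-4321"],
    "vin": ["1HGCM82633A004352"],
}
records = [f"Email body {i}" for i in range(10)]
training_set = inject_canaries(records, canaries, rate=4)

# Stand-in for text generated by the fine-tuned model.
generated = ["Please confirm 900-12-3456 on file.", "No identifiers here."]
report = leak_report(generated, canaries)
```

Reporting both distinct leaked canaries and total instances matters because one memorized canary can surface many times in the generated text.
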
### Broader Academic Findings Relevant to MVP Design
- **ICLR 2025**: Strict DP (epsilon=3) on text makes utility **worse than random** — this justifies our MVP's decision to use Faker-based replacement rather than DP-SGD (which requires GPU infrastructure and degrades text quality)
- **SynBench 2025**: Pre-training contamination means DP during fine-tuning doesn't protect against info memorized during pre-training — relevant because our proxy sends data to pre-trained LLMs that may already "know" things about the user's domain
- **IEEE S&P 2025**: Industry privacy metrics (similarity-based) are fundamentally broken — means we should not claim quantitative privacy guarantees, only "reduces PII exposure"
- **Secludy's Mask9Leak1**: Even a 3-layer pipeline (BERT + regex + LLM) fails ~10% — sets realistic expectations for our simpler spaCy + regex detector
---
## 5. Competitive Landscape
### Market Size
- **$2.5-9.7B by 2030** (31-42% CAGR depending on scope definition)
- Gartner predicts 75% of businesses will use generative AI for synthetic data by 2026
- 55% of organizations already invested in PETs; 36% planning to within 12-24 months
### Acquisition Wave (2022-2025)
| Company | Acquirer | Date | Amount/Context |
|---------|----------|------|---------------|
| Gretel AI | NVIDIA | Mar 2025 | $320M+ valuation, ~80 employees |
| Hazy | SAS | Nov 2024 | $11.3M raised, UCL spinout |
| YData | KPMG | Oct 2025 | $3.24M raised, Portuguese |
| Statice | Anonos | Nov 2022 | Berlin-based |
**Implication**: Synthetic data increasingly viewed as a feature for larger platforms, not a standalone category.
### Independent Competitors
| Company | Raised | Key Differentiator |
|---------|--------|-------------------|
| Tonic.ai | $45M | Broadest platform (Fabricate + Structural + **Textual**). Most direct competitor for text PII. |
| Mostly AI | $31.1M | Open-source SDK, European/GDPR-first positioning |
| Synthesized | $30.5M | UBS + Deutsche Bank backing |
| DataCebo | $8.5M | MIT-origin Synthetic Data Vault (SDV), open-source ecosystem |
### Market Failures (Cautionary Tales)
- **Datagen**: Raised $70M, shut down in 2024 despite having $20M cash reserves
- **AI.Reverie**: Acquired by Meta purely as talent acquisition (product abandoned)
- **Key insight** (Pebblous analysis): "Single-function synthetic data tools alone cannot sustain viable businesses" — survivors need multi-module platforms, deep workflow embedding, ecosystem partnerships
### Secludy's Differentiation
1. **Text-focused** (most competitors focus on tabular)
2. **Self-hosted** (data never leaves customer network)
3. **Dramatically underfunded** ($500K-$2M vs $30-65M for peers): a constraint rather than an advantage
---
## 6. Critical Analysis (From Deep Research)
### GDPR "Out of Scope" Claim — Contested
- No DPA has definitively approved DP synthetic data as "anonymized" under GDPR
- GDPR Recital 26 requires anonymization to be irreversible — still debated for synthetic data
- Canadian Privacy Commissioner has expressed skepticism
- **Accurate framing**: DP significantly reduces risk, but regulatory classification is not settled
### Academic Challenges to Synthetic Data
1. **DP-SGD outperforms synthetic data** — arXiv:2502.12976 (2025): Direct DP training is **385x cheaper** and strictly better for privacy than synthetic data generation
2. **Privacy metrics are broken** — IEEE S&P 2025 Distinguished Paper (Ganev & De Cristofaro): ReconSyn attack reconstructs **78-100% of outlier records** from data certified "truly anonymous" by industry metrics
3. **ALL records vulnerable, not just outliers** — USENIX Security 2024 (Annamalai et al., Oxford): Linear reconstruction attack works on **arbitrary records**, not only outliers
4. **Text DP is especially hard** — ICLR 2025: Text sanitization leaves **74% of information inferable**; strict DP (epsilon=3) makes utility **worse than random** (-0.46); coherence drops 36%
5. **Canary testing has limits** — ICML 2025 (Microsoft Research, "The Canary's Echo"): Canaries designed for model attacks are "sub-optimal for privacy auditing when only synthetic data is released"
6. **Pre-training contamination** — DP during fine-tuning doesn't protect against info memorized during pre-training (SynBench 2025)
7. **New attack classes emerging** — TAMIS (cheaper than MAMA-MIA), LLM-based re-identification from DP data, ensemble attacks improving success rates
8. **Masking pipelines fail ~10%** — Secludy's own Mask9Leak1 repo shows BERT + regex + Qwen-2.5-7B replacement still leaks ~10% of entities
### Secludy's Unsubstantiated Claims
- "99.99% privacy and IP leakage proof" — no published methodology
- "PrivateTransformer Agent" — no technical documentation exists
- "Self-improving algorithms" — marketing only
- Default epsilon 8.0 — weak by academic standards (NIST says >10 may not be meaningful)
---
## 7. What We're Building (Features for Our MVP)
### MVP Privacy Model — Critical Distinction
**Our MVP does NOT provide differential privacy guarantees.** It uses deterministic PII replacement (spaCy NER + regex detection, Faker-based synthetic generation). This is fundamentally different from Secludy's DP-SGD approach.
- **Our privacy protection**: Bounded by detection coverage (the recall of the PII detector), not a formal guarantee. Any PII the detector misses passes through unmodified.
- **Secludy's privacy guarantee**: Mathematical bound via epsilon/delta differential privacy.
- **Why this is the right MVP choice**: Strict DP on text (epsilon=3) makes utility worse than random (ICLR 2025). DP-SGD requires GPU infrastructure. Faker replacement is fast, deterministic, and produces readable output.
- **What this means for claims**: We should NOT claim mathematical privacy guarantees or regulatory compliance. We claim "significantly reduces PII exposure risk." Users should evaluate compliance with their own legal teams.
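
For reference, this is the standard definition behind an epsilon/delta guarantee. A randomized mechanism $\mathcal{M}$ is $(\varepsilon, \delta)$-differentially private if, for every pair of datasets $D$ and $D'$ that differ in one record and every set of outputs $S$:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
```

Smaller $\varepsilon$ and $\delta$ mean a single record can shift the output distribution less. Detector recall, by contrast, is an empirical measurement and yields no bound of this form.
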
### PII Detection Limitations (Known Weaknesses)
spaCy's `en_core_web_sm` model has known limitations that set realistic expectations:
- **Informal text**: Poor performance on chat messages, social media, non-standard formatting
- **Non-English names**: Lower recall on names from non-Western cultures
- **Domain-specific entities**: Medical record numbers, case IDs, and custom identifiers require additional regex patterns
- **Contextual PII**: Indirect identifiers ("top scorer of the bar exam in 2022") cannot be caught by NER or regex
- **Expected miss rate**: Even Secludy's 3-layer pipeline (BERT + regex + Qwen-2.5-7B) leaks ~10%. Our simpler detector will likely miss more; **15-25%** depending on text type is a working estimate, not a measured figure.
- **Mitigation**: The test suite should include recall measurement. Future improvements can add transformer-based NER or LLM-based detection.
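
A recall check for the test suite could look like the sketch below. It assumes exact-match scoring over `(start, end, label)` spans against a hand-labeled gold set; the function name and the sample spans are hypothetical.

```python
def span_metrics(gold, predicted):
    """Exact-match precision and recall over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    recall = true_positives / len(gold_set) if gold_set else 1.0
    precision = true_positives / len(pred_set) if pred_set else 1.0
    return {
        "recall": recall,
        "precision": precision,
        # Missed spans are the PII that would pass through unmodified.
        "missed": sorted(gold_set - pred_set),
    }


# Hypothetical labels for one test document.
gold = [(0, 8, "PERSON"), (20, 31, "SSN"), (40, 58, "EMAIL"), (60, 72, "PHONE")]
predicted = [(0, 8, "PERSON"), (20, 31, "SSN"), (40, 58, "EMAIL"), (80, 85, "ORG")]
metrics = span_metrics(gold, predicted)
```

Exact-match scoring is strict: a detector that catches only part of an identifier counts as a miss, which is the conservative choice when the missed characters would leak.
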
### Product A: Dataset Scrubbing Pipeline (Covers Secludy's core use case, without DP)
**Features**:
1. PII detection using spaCy NER + regex patterns (20+ PII categories)
2. Three processing modes: Replace (Faker synthetic), Redact (remove), Tag (XML markers)
3. Session consistency — same PII always maps to same fake value
4. Multi-format support: CSV, JSON, JSONL, plain text
5. Processing report: PII found by type, counts, replacement summary
6. Side-by-side comparison view (original vs scrubbed)
**PII Categories**: Person names, emails, phone numbers, SSNs, credit cards (Luhn validation), street addresses, dates of birth, IP addresses, organizations, medical conditions, VINs, Bitcoin wallets, driver's license numbers, passwords, bank accounts, demographics, custom regex patterns
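
The regex half of the detector and the three processing modes can be sketched with the standard library alone. This is a simplified illustration: the patterns cover only three categories, the spaCy NER pass is omitted, and a caller-supplied `fakes` dict stands in for Faker. All function names are hypothetical.

```python
import re

# Deliberately small pattern set; a real deployment would need many more.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number):
    """Luhn checksum: double every second digit from the right."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, digit in enumerate(reversed(digits)):
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def detect(text):
    """Return sorted (start, end, label) spans for every pattern hit."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if label == "CREDIT_CARD" and not luhn_valid(match.group()):
                continue  # digit run that fails the checksum
            found.append((match.start(), match.end(), label))
    return sorted(found)


def scrub(text, mode="redact", fakes=None):
    """Apply one of the three modes: replace, redact, or tag."""
    fakes = fakes or {}
    pieces, cursor = [], 0
    for start, end, label in detect(text):
        if start < cursor:
            continue  # skip spans overlapping one already handled
        original = text[start:end]
        if mode == "redact":
            piece = ""
        elif mode == "tag":
            piece = f"<{label}>{original}</{label}>"
        elif mode == "replace":
            piece = fakes.get(label, f"[{label}]")
        else:
            raise ValueError(f"unknown mode: {mode}")
        pieces += [text[cursor:start], piece]
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


sample = ("Card 4111 1111 1111 1111, SSN 231-34-2495, "
          "mail a.b@example.com, order 1234 5678 9012 3456.")
tagged = scrub(sample, mode="tag")
```

The Luhn check keeps the 16-digit order number in `sample` from being treated as a card number, which trades a small recall risk for fewer false positives.
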
### Product B: Real-Time LLM Privacy Proxy (Our additional product idea)
**Features**:
1. Reverse proxy between users and LLM APIs (OpenAI, Anthropic)
2. Intercept outgoing messages, detect and replace PII
3. Bidirectional mapping: forward (scrub) + reverse (re-inject) passes
4. Users see seamless experience with original PII in responses
5. Companies can host internally — data never leaves network
6. Session-scoped mapping tables for consistency
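
The forward and reverse passes with a session-scoped mapping table might be sketched as below. Everything here is an assumption for illustration: `SessionMapper` is a hypothetical class, the short `FAKE_NAMES` list stands in for Faker, and the detected PII strings are passed in by the caller.

```python
import hashlib

# Tiny stand-in for Faker output.
FAKE_NAMES = ["Alex Morgan", "Jamie Lee", "Taylor Reed", "Jordan Park"]


class SessionMapper:
    """Keeps forward and reverse PII mappings for one session."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.forward = {}  # original -> fake
        self.reverse = {}  # fake -> original

    def fake_for(self, original):
        if original not in self.forward:
            # Hash of session + value makes the choice repeatable.
            key = f"{self.session_id}:{original}".encode("utf-8")
            digest = hashlib.sha256(key).digest()
            base = FAKE_NAMES[digest[0] % len(FAKE_NAMES)]
            candidate, suffix = base, 1
            while candidate in self.reverse:  # keep the reverse map unambiguous
                suffix += 1
                candidate = f"{base} {suffix}"
            self.forward[original] = candidate
            self.reverse[candidate] = original
        return self.forward[original]

    def scrub(self, text, detected):
        """Forward pass: swap each detected value for its fake."""
        for original in sorted(set(detected), key=len, reverse=True):
            text = text.replace(original, self.fake_for(original))
        return text

    def restore(self, text):
        """Reverse pass: put originals back into the LLM response."""
        for fake in sorted(self.reverse, key=len, reverse=True):
            text = text.replace(fake, self.reverse[fake])
        return text


mapper = SessionMapper("session-1")
prompt = "Email Maria Gonzalez about Maria Gonzalez's claim."
outbound = mapper.scrub(prompt, ["Maria Gonzalez"])
restored = mapper.restore(outbound)
```

The reverse pass only works when the model echoes the fake value verbatim; if the response abbreviates or paraphrases it, the original cannot be re-injected by string replacement alone.
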
**This product has a larger TAM than dataset scrubbing** — it lets any company's employees safely use ChatGPT/Claude/etc. without risk of sending sensitive data to external APIs.
### Product C: PII Leak Detection (POST-MVP — Validates both products)
> Descoped from MVP per design spec. The `/api/v1/detect/scan` endpoint provides detection-only functionality. The full canary injection + leak testing framework below is a future enhancement.
**Post-MVP Features**:
1. Canary PII injection into datasets
2. Aho-Corasick multi-pattern matching for leak detection
3. Per-category leakage statistics
4. Detection report (JSON + visual)
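
For the multi-pattern matching step, a compact pure-Python Aho-Corasick automaton is sketched below. It is a textbook version written for illustration, not Secludy's tool; the class name is hypothetical, and a production scanner would more likely use an existing optimized library.

```python
from collections import deque


class AhoCorasick:
    """Finds all occurrences of many patterns in one pass over the text."""

    def __init__(self, patterns):
        self.goto = [{}]   # trie edges per node
        self.fail = [0]    # failure link per node
        self.out = [[]]    # patterns ending at each node
        for pattern in patterns:
            if pattern:
                self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        node = 0
        for ch in pattern:
            nxt = self.goto[node].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = nxt
            node = nxt
        self.out[node].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so shallower nodes are finished first.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def search(self, text):
        """Return (start_index, pattern) for every match."""
        node, hits = 0, []
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for pattern in self.out[node]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits


matcher = AhoCorasick(["he", "she", "his", "hers"])
hits = matcher.search("ushers")
```

Scan time grows with the text length plus the number of matches rather than with the number of canaries, which is why the approach suits scanning outputs for hundreds of planted values at once.
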
### Web Dashboard
**Pages**:
- Dashboard — overview stats, feature cards, system health
- Scrub — file upload, text input, mode selection, side-by-side results
- Proxy — provider config, chat interface with PII highlighting
- Results — history of all scrubbing jobs with download
### Tech Stack
| Component | Technology |
|-----------|-----------|
| Backend | Python 3.11+ / FastAPI |
| PII Detection | spaCy (en_core_web_sm) + regex |
| Synthetic Data | Faker |
| LLM Proxy | httpx (async HTTP) |
| Frontend | Next.js 15 / TypeScript / Tailwind CSS |
| Deployment | Docker / Docker Compose |
| Testing | pytest |
### API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| POST | /api/v1/scrub/text | Scrub PII from text |
| POST | /api/v1/scrub/file | Upload and scrub file |
| GET | /api/v1/scrub/result/{id} | Get scrub result |
| POST | /api/v1/proxy/chat | Proxy chat through PII scrubbing |
| POST | /api/v1/proxy/config | Configure proxy target |
| POST | /api/v1/detect/scan | Detect PII without replacing |
| GET | /api/v1/health | Health check |
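
A client call to the text scrubbing endpoint might be built as follows. The request body shape is not specified above, so the `text` and `mode` fields, the helper name, and the local base URL are all assumptions made for illustration.

```python
import json
import urllib.request


def build_scrub_request(base_url, text, mode="replace"):
    """Build (but do not send) a POST to the text scrubbing endpoint."""
    # Hypothetical payload: the field names are not defined by the spec above.
    payload = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/scrub/text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_scrub_request("http://localhost:8000", "My SSN is 231-34-2495.")

# Sending requires a running server:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```
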
---
## 8. Implementation Recommendations (From Deep Research)
1. **Adopt Secludy's canary injection methodology** as testing/validation — open-source, Apache 2.0, well-documented
2. **Build on established academic DP work**: DP-SGD (Abadi et al. CCS 2016), Yue et al. synthetic text recipe (ACL 2023), NIST SP 800-226 guidelines
3. **Target epsilon <= 4.0** for meaningful privacy guarantees in any future DP implementation. Secludy defaults to 8.0, which is weak by academic standards, and NIST notes that values above 10 may not be meaningful. This does not apply to the MVP, which uses Faker replacement.
4. **Layer defenses**: PII detection + synthetic replacement + canary validation + runtime monitoring
5. **Account for pre-training contamination** in any privacy claims
6. **For MVP**: spaCy + regex is sufficient for PII detection (no need for BERT/LLM-based detection yet)
7. **Faker-based replacement** is appropriate for MVP — deterministic seeding ensures session consistency
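
For the future DP work referenced in items 2 and 3, the core DP-SGD update from Abadi et al. (clip each per-example gradient, sum, add Gaussian noise, average) can be sketched in plain Python. The function name and parameters are hypothetical, and the sketch omits the privacy accountant that converts the noise multiplier and step count into an epsilon.

```python
import math
import random


def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_multiplier, rng=None):
    """One DP-SGD update over a batch of per-example gradients."""
    rng = rng or random.Random()
    summed = [0.0] * len(weights)
    for grad in per_example_grads:
        # Clip each example's gradient to L2 norm <= clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j, g in enumerate(grad):
            summed[j] += g * scale
    # Gaussian noise scaled to the clipping bound, added once per coordinate.
    sigma = noise_multiplier * clip_norm
    batch_size = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


# With noise disabled the step reduces to clipped-gradient averaging.
updated = dp_sgd_step(
    weights=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.3, 0.4]],
    lr=1.0,
    clip_norm=1.0,
    noise_multiplier=0.0,
)
```

Clipping bounds how much any single training record can move the weights, and the noise is calibrated to that bound; both pieces are needed before an accountant can report an epsilon.
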
### Key Academic Papers to Reference (for credibility)
| Paper | Venue | Year | Why It Matters |
|-------|-------|------|---------------|
| Abadi et al. — "Deep Learning with Differential Privacy" | CCS | 2016 | Foundational DP-SGD algorithm |
| Yue et al. — "Synthetic Text Generation with DP: A Simple Recipe" | ACL | 2023 | Practical playbook for DP text generation |
| Kurakin et al. — "Harnessing LLMs to Generate Private Synthetic Text" | Google | 2023 | Paper Mingze He explicitly cites in his Secludy article |
| Feyisetan et al. — "Privacy-Preserving Textual Analysis" | WSDM | 2020 | Word-level metric DP: embed → noise → decode |
| Igamberdiev & Habernal — "DP-BART" | ACL | 2023 | Document-level text rewriting under local DP |
| Carvalho et al. — "TEM: Truncated Exponential Mechanism" | SDM | 2023 | Higher-utility word perturbation |
| NIST SP 800-226 | NIST | 2024 | Official guidelines for evaluating DP guarantees |
| Ganev & De Cristofaro — "Inadequacy of Similarity-based Privacy Metrics" | IEEE S&P | 2025 | Why industry privacy metrics are broken |
---
## Sources
### Deep Research Reports (Full versions in ~/Documents/Research/)
- Secludy Company Analysis: ~/Documents/Research/Secludy_Company_20260404_49669314/ (406 lines, 38+ sources)
- Mingze He Research: ~/Documents/Research/Mingze_He_Research_20260404_BFD26A44/ (459 lines, 47 sources)
### Primary Sources
- secludy.com (full website crawl via Chrome extension)
- medium.com/secludy (4 blog posts)
- github.com/Secludy (10 repositories analyzed)
- LinkedIn: Ben Cerchio, Mingze He (profiles + activity feeds)
- Iowa State BCB: bcb.iastate.edu/people/mingze-he
- Google Scholar searches (confirmed disambiguation of 3 different "Mingze He" researchers)
- AWS Marketplace product listings
- Crunchbase, Tracxn, PitchBook company profiles
- David Zagardo's Medium articles and GitHub