---
title: Security and IAM for GenAI
date: 2026-02-18
tags:
- aws
- genai
- iam
- security
- encryption
- data-masking
- least-privilege
- certification
- AIP-C01
---
# Security and IAM for GenAI
**Related Notes:** [[Amazon S3]], [[Amazon Bedrock]], [[Amazon Bedrock Guardrails]], [[Responsible AI and Enterprise Integration]], [[AWS Lake Formation]], [[AWS Glue and Data Processing Services]], [[AWS GenAI Developer - First Class Notes]]
## The Big Picture
Security is the foundation of every production GenAI application. Before granting [[Amazon Bedrock]] access to data in [[Amazon S3]] or connecting a [[RAG Architecture|Knowledge Base]], you need to **control who can access what, protect sensitive data in transit and at rest, and ensure credentials and PII are never exposed**. AWS IAM, data masking, and cryptographic techniques form the **security layer** that wraps around your entire GenAI pipeline.
```mermaid
graph LR
subgraph "Identity Layer"
ROOT[Root Account<br/>Do NOT Use]
USERS[IAM Users]
GROUPS[IAM Groups]
ROLES[IAM Roles]
end
subgraph "Policy Layer"
POL[IAM Policies<br/>Least Privilege]
AA[IAM Access<br/>Analyzer]
end
subgraph "Data Protection Layer"
MASK[Data Masking<br/>Glue DataBrew / Redshift]
ANON[Anonymization<br/>Encrypt / Hash / Shuffle]
SALT[Key Salting<br/>Rainbow Table Defense]
end
USERS --> GROUPS
GROUPS --> POL
ROLES --> POL
AA --> POL
POL --> MASK
POL --> ANON
MASK --> SALT
ANON --> SALT
```
---
## Principle of Least Privilege
Grant **only the permissions required** for the task at hand -- nothing more. This is the single most important security principle in AWS.
### The Approach
1. **Start broad while developing** -- give yourself enough permissions to move fast during prototyping
2. **Lock down once services and operations are known** -- tighten policies to only what's actually used
3. Use ==IAM Access Analyzer== to **generate least-privilege policies** based on observed access activity
> [!tip] IAM Access Analyzer
> IAM Access Analyzer reviews the actions your roles and users actually performed, as recorded in AWS CloudTrail, then generates a **least-privilege policy** that grants only those actions. This removes the guesswork from writing tight policies; review the generated policy before attaching it.
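
The idea behind policy generation can be sketched in plain Python: collect the (action, resource) pairs that were actually observed and emit a policy that allows exactly those. This is only an illustration of the concept, not how the service is implemented. The `observed` list and the `build_policy` helper are hypothetical; the real service works from AWS CloudTrail activity rather than a hand-written list.

```python
import json
from collections import defaultdict


def build_policy(observed):
    """Group observed (action, resource) pairs into one Allow statement per resource."""
    actions_by_resource = defaultdict(set)
    for action, resource in observed:
        actions_by_resource[resource].add(action)
    statements = [
        {"Effect": "Allow", "Action": sorted(actions), "Resource": resource}
        for resource, actions in sorted(actions_by_resource.items())
    ]
    return {"Version": "2012-10-17", "Statement": statements}


# Hypothetical access history: duplicates collapse, unused actions never appear.
observed = [
    ("s3:GetObject", "arn:aws:s3:::my-genai-bucket/training-data/*"),
    ("s3:PutObject", "arn:aws:s3:::my-genai-bucket/training-data/*"),
    ("s3:GetObject", "arn:aws:s3:::my-genai-bucket/training-data/*"),
]
policy = build_policy(observed)
print(json.dumps(policy, indent=2))
```

Anything that was never observed (for example `s3:DeleteObject`) is simply absent from the result, which is the point of starting from activity instead of from a broad managed policy.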
### Example: Least-Privilege S3 Policy
A well-scoped policy targets a **specific bucket**, a **specific prefix**, and **specific actions only**:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-genai-bucket/training-data/*"
    }
  ]
}
```
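
To see what this scoping buys you, here is a deliberately simplified evaluator for the policy above. It is a sketch under stated assumptions: it handles only `Allow` statements with `*` wildcards, and ignores explicit `Deny`, conditions, principals, and IAM's case-insensitive action matching. The `is_allowed` helper is hypothetical and is not part of any AWS SDK.

```python
from fnmatch import fnmatchcase

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-genai-bucket/training-data/*",
        }
    ],
}


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, resource):
    """Return True if any Allow statement matches both the action and the resource."""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        action_ok = any(fnmatchcase(action, a) for a in _as_list(statement["Action"]))
        resource_ok = any(fnmatchcase(resource, r) for r in _as_list(statement["Resource"]))
        if action_ok and resource_ok:
            return True
    return False  # nothing matched: denied by default


base = "arn:aws:s3:::my-genai-bucket"
print(is_allowed(POLICY, "s3:GetObject", f"{base}/training-data/batch1.csv"))
print(is_allowed(POLICY, "s3:DeleteObject", f"{base}/training-data/batch1.csv"))
print(is_allowed(POLICY, "s3:GetObject", f"{base}/secrets/keys.txt"))
```

Only the first request matches; the delete and the read outside the `training-data/` prefix fall through to the default deny.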
> [!warning] Exam Insight
> If a question asks how to **determine the minimum permissions** needed for a service, the answer is ==IAM Access Analyzer==. It generates policies from actual access activity -- not guesswork.
---
## Data Masking and Anonymization
When dealing with **PII or sensitive data**, you must protect it before it enters your GenAI pipeline. There are two main strategies: **masking** and **anonymization**.
### Data Masking
Masking ==obfuscates data while preserving its format== -- the masked value keeps the shape of the original, but the hidden characters cannot be recovered from it.
| Example | Original | Masked |
|---------|----------|--------|
| **Credit card** | 4111-2233-4455-6677 | XXXX-XXXX-XXXX-6677 (last 4 digits only) |
| **Password** | MyS3cretP@ss | ************ |
| **SSN** | 123-45-6789 | XXX-XX-6789 |
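
The rows above can be reproduced with a few lines of Python. This is a minimal sketch of format-preserving masking, not the implementation used by any AWS service; the helper names `mask_keep_last` and `mask_all` are made up for the example.

```python
def mask_keep_last(value: str, keep: int = 4, mask_char: str = "X") -> str:
    """Mask every digit except the last `keep` digits; separators stay in place."""
    total_digits = sum(ch.isdigit() for ch in value)
    seen = 0
    masked = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            # Keep the digit only if it is among the final `keep` digits.
            masked.append(ch if seen > total_digits - keep else mask_char)
        else:
            masked.append(ch)
    return "".join(masked)


def mask_all(value: str, mask_char: str = "*") -> str:
    """Hide every character, keeping only the length."""
    return mask_char * len(value)


print(mask_keep_last("4111-2233-4455-6677"))
print(mask_keep_last("123-45-6789"))
print(mask_all("MyS3cretP@ss"))
```

Because the hidden digits are overwritten rather than encoded, nothing in the output lets you reconstruct them.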
**Supported AWS Services for Masking:**
| Service | How It Helps |
|---------|-------------|
| **AWS Glue DataBrew** | Built-in PII detection and masking transformations in data prep pipelines |
| **Amazon Redshift** | SQL-level masking policies applied at query time |
#### Redshift Masking Policy Example
```sql
-- Keep only the last four digits visible; everything before them becomes X.
CREATE MASKING POLICY mask_credit_card
WITH (credit_card VARCHAR)
USING ('XXXX-XXXX-XXXX-' || RIGHT(credit_card, 4));
```
> [!note] Masking vs. Anonymization
> **Masking** hides parts of data (you can still see the format). **Anonymization** transforms or removes data so the original cannot be recovered -- or, in the case of encryption, can be recovered only by someone holding the key.
### Anonymization Techniques
```mermaid
graph TD
PII[PII / Sensitive Data] --> REPLACE[Replace with<br/>Random Values]
PII --> SHUFFLE[Shuffle<br/>Within Column]
PII --> ENCRYPT[Encrypt<br/>Deterministic or<br/>Probabilistic]
PII --> HASH[Hash<br/>One-Way Transform]
PII --> DELETE[Delete / Don't<br/>Import at All]
ENCRYPT --> DET[Deterministic:<br/>Same input → Same output<br/>Allows equality queries]
ENCRYPT --> PROB[Probabilistic:<br/>Same input → Different output<br/>Stronger privacy]
```
| Technique | How It Works | Trade-off |
|-----------|-------------|-----------|
| **Replace with random** | Swap real values with fake but realistic ones | Preserves data distribution, loses real data |
| **Shuffle** | Randomly reorder values within a column | Preserves distribution, breaks row-level correlation |
| **Encrypt (deterministic)** | Same input always produces same ciphertext | Allows equality queries, less private |
| **Encrypt (probabilistic)** | Same input produces different ciphertext each time | Stronger privacy, no equality queries |
| **Hashing** | One-way transformation (e.g., SHA-256) | Irreversible, but vulnerable to rainbow tables without salting |
| **Delete / don't import** | Remove sensitive fields entirely | Simplest approach -- ==if you don't need it, don't store it== |
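
Three of the techniques in the table (replace, shuffle, hash) are simple enough to sketch with the standard library. The column values and helper names below are invented for illustration, and the fixed seed is there only so the sketch is repeatable; a real pipeline would not seed its randomness this way.

```python
import hashlib
import random

rng = random.Random(42)  # fixed seed for a repeatable demo only

emails = ["alice@example.com", "bob@example.com", "carol@example.com"]


def replace_with_random(values, rng):
    """Swap each value for a fake one of similar shape."""
    return [f"user{rng.randrange(10**6):06d}@example.invalid" for _ in values]


def shuffle_column(values, rng):
    """Reorder values within the column; the set of values is unchanged."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def hash_column(values):
    """One-way transform; unsalted, so equal inputs give equal digests."""
    return [hashlib.sha256(v.encode("utf-8")).hexdigest() for v in values]


print(replace_with_random(emails, rng))
print(shuffle_column(emails, rng))
print(hash_column(emails))
```

Shuffling keeps every real value in the dataset and only breaks its link to the row, so it is weaker than replacement when the values themselves are sensitive.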
> [!important] Exam Tip
> Know the difference between **deterministic** and **probabilistic** encryption:
> - ==Deterministic==: same plaintext always produces the same ciphertext -- allows `WHERE email = encrypted_value` queries
> - ==Probabilistic==: same plaintext produces different ciphertext each time -- stronger privacy but no equality matching
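
The difference comes down to where the nonce (the per-message starting value) comes from. The toy cipher below makes that visible using only the standard library: in deterministic mode the nonce is derived from the plaintext, in probabilistic mode it is random. This is a teaching sketch, not a vetted cipher, and every name in it is hypothetical; production systems should rely on AWS KMS or an audited library (for example AES-SIV for deterministic and AES-GCM with a random nonce for probabilistic encryption).

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Expand key + nonce into `length` pseudo-random bytes (HMAC in counter mode)."""
    blocks = []
    counter = 0
    while 32 * len(blocks) < length:
        msg = b"stream" + nonce + counter.to_bytes(4, "big")
        blocks.append(hmac.new(key, msg, hashlib.sha256).digest())
        counter += 1
    return b"".join(blocks)[:length]


def encrypt(key: bytes, plaintext: bytes, deterministic: bool) -> bytes:
    if deterministic:
        # Nonce derived from the plaintext: equal inputs give equal ciphertexts.
        nonce = hmac.new(key, b"nonce" + plaintext, hashlib.sha256).digest()[:NONCE_LEN]
    else:
        # Fresh random nonce: equal inputs give different ciphertexts.
        nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = secrets.token_bytes(32)
email = b"alice@example.com"
det_a = encrypt(key, email, deterministic=True)
det_b = encrypt(key, email, deterministic=True)
prob_a = encrypt(key, email, deterministic=False)
prob_b = encrypt(key, email, deterministic=False)
print(det_a == det_b, prob_a == prob_b)
```

Equality queries work on the deterministic column because `det_a == det_b`; the same property is what leaks that two rows hold the same value.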
---
## Key Salting
Salting is the practice of ==appending or prepending a random value ("salt") to data before hashing==. It defends against **pre-computed rainbow table attacks**.
### The Problem Without Salting
Without a salt, the **same input always produces the same hash**. An attacker with a pre-computed table of common passwords and their hashes can look up a stolen hash and recover the password without any brute-force work.
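
The attack is easy to demonstrate. In the sketch below a plain dictionary stands in for the pre-computed table (a real rainbow table uses a more compact chain structure, but the lookup effect is the same). The password list and the `stolen` records are made up for the example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Attacker's pre-computed table: hash -> password, built once and reused.
common_passwords = ["123456", "password", "MyPassword", "qwerty"]
lookup = {sha256_hex(p): p for p in common_passwords}

# Unsalted hashes as they might appear in a leaked user table.
stolen = {
    "alice": sha256_hex("MyPassword"),
    "bob": sha256_hex("MyPassword"),
}

# Cracking is a dictionary lookup per user, with no hashing at attack time.
cracked = {user: lookup.get(digest) for user, digest in stolen.items()}
print(cracked)
```

Note that Alice's and Bob's stored hashes are identical, so cracking one reveals the other.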
### How Salting Works
```mermaid
graph LR
PASS[Password:<br/>'MyPassword'] --> CONCAT["Concatenate:<br/>'MyPassword' + Salt"]
SALT[Unique Salt:<br/>'x9k2mQ'] --> CONCAT
CONCAT --> HASH["SHA-256 Hash"]
HASH --> STORED["Stored Hash:<br/>a7f3b2c1d4..."]
```
Because each user has a **unique salt**, even identical passwords produce **completely different hashes**.
### Example: Salting in Practice
| Username | Salt Value | String to Hash | SHA-256 Hashed Value |
|----------|-----------|----------------|---------------------|
| alice | `x9k2mQ` | `MyPasswordx9k2mQ` | `a7f3b2c1d4e5f6...` |
| bob | `p3nR7z` | `MyPasswordp3nR7z` | `8b1c4d2e9f0a3b...` |
| carol | `w5tL1j` | `Secret123w5tL1j` | `c3d2e1f4a5b6c7...` |
Alice and Bob use the **same password**, but their hashes are completely different because of unique salts.
### Best Practices for Key Salting
| Practice | Why It Matters |
|----------|---------------|
| **Use cryptographically secure random values** | Predictable salts defeat the purpose |
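| **Generate a fresh salt whenever a password changes** | An existing hash cannot be re-salted without the plaintext, so renewal happens at password reset -- never carry an old salt over to a new password |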
| **Each user gets a unique salt** | Prevents identical passwords from producing identical hashes |
| **Salt and hash before storing** | Never store plaintext -- always salt + hash first |
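
Putting the practices together, a store-and-verify routine might look like the sketch below. Note one deliberate swap: instead of the single SHA-256 pass shown earlier, it uses PBKDF2 from Python's `hashlib`, a deliberately slow key-derivation function that is the usual choice for password storage. The function names and the iteration count are assumptions for the example, not values taken from this note.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune to your own latency budget


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Create a unique random salt and derive a digest; store both, never the password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(
    password: str, salt: bytes, expected: bytes, iterations: int = ITERATIONS
) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

A caller would run `salt, digest = hash_password("MyPassword")` at sign-up, persist both values, and call `verify_password` with them at login.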
> [!warning] Exam Insight
> If a question mentions defending against **rainbow table attacks** or ensuring that **identical inputs don't produce identical hashes**, the answer is ==key salting==. Remember: salt is random, unique per user, and applied before hashing.
---
## IAM Users and Groups
IAM (Identity and Access Management) is a ==global service== -- it is not region-specific. It controls **who can access what** across your entire AWS account.
### Core Concepts
```mermaid
graph TD
ROOT[Root Account<br/>Created by default<br/>NEVER use or share] --> IAM[IAM Service<br/>Global Scope]
IAM --> U1[User: Alice]
IAM --> U2[User: Bob]
IAM --> U3[User: Carol]
IAM --> U4[User: Dave]
subgraph "Groups"
G1[Developers Group]
G2[Data Scientists Group]
end
U1 --> G1
U2 --> G1
U2 --> G2
U3 --> G2
U4 -.->|No group<br/>allowed but<br/>not best practice| NONE[ ]
G1 --> P1[Policy: S3 + Lambda Access]
G2 --> P2[Policy: Bedrock + SageMaker Access]
style ROOT fill:#ff6b6b,color:#fff
style NONE fill:transparent,stroke:none
```
### Key Rules
| Rule | Detail |
|------|--------|
| **Root account** | Created by default -- ==never use it, never share it== |
| **Users** | Represent people in your organization, each with their own credentials |
| **Groups** | Contain **only users** -- groups cannot contain other groups |
| **Multiple groups** | A user can belong to **multiple groups** (e.g., Bob is in both Developers and Data Scientists) |
| **No group** | A user can belong to **no group** -- allowed but not a best practice |
| **Global** | IAM is a global service -- users and groups are not region-specific |
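
These rules can be modeled in a few lines to show how group membership turns into permissions. This is a conceptual sketch only: the `Group` class and `effective_policies` function are invented for the example, policies are represented as plain labels, and policies attached directly to a user are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    policies: set = field(default_factory=set)
    members: set = field(default_factory=set)  # user names only; groups cannot nest

    def add_user(self, user_name: str) -> None:
        self.members.add(user_name)


def effective_policies(user_name: str, groups: list) -> set:
    """A user's permissions are the union of the policies of every group they belong to."""
    result = set()
    for group in groups:
        if user_name in group.members:
            result |= group.policies
    return result


developers = Group("Developers", {"S3 + Lambda Access"})
data_scientists = Group("Data Scientists", {"Bedrock + SageMaker Access"})

for name in ("alice", "bob"):
    developers.add_user(name)
for name in ("bob", "carol"):
    data_scientists.add_user(name)

groups = [developers, data_scientists]
for user in ("alice", "bob", "carol", "dave"):
    print(user, sorted(effective_policies(user, groups)))
```

Bob ends up with both policies because he is in both groups, while Dave, who is in no group, gets nothing from this mechanism.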
> [!tip] Best Practice
> Always assign permissions through **groups**, not directly to individual users. This makes it far easier to manage permissions at scale -- add a user to the "Bedrock Developers" group rather than attaching 15 individual policies.
> [!danger] Root Account Warning
> The root account has **unrestricted access** to everything. ==Never use it for day-to-day operations.== Create IAM users for all work, even administrative tasks. Enable MFA on the root account and lock it away.
---
## Summary: Key Takeaways for the Exam
> [!success] What to Remember
> 1. **Principle of Least Privilege** -- start broad during development, lock down for production. Use ==IAM Access Analyzer== to generate least-privilege policies from actual access activity
> 2. **Data Masking** obfuscates data while preserving format -- supported in **Glue DataBrew** and **Redshift** (CREATE MASKING POLICY)
> 3. **Anonymization techniques**: replace with random, shuffle, encrypt (deterministic vs probabilistic), hash, or delete entirely
> 4. **Deterministic encryption** allows equality queries; **probabilistic encryption** is stronger but prevents equality matching
> 5. **Key Salting** defends against rainbow table attacks -- append a unique random salt per user before hashing
> 6. **IAM is global** -- users represent people, groups contain only users (not other groups), users can belong to multiple groups
> 7. **Never use or share the root account** -- create IAM users for all operations
> 8. ==If you don't need sensitive data, don't store it== -- the simplest and most secure approach
---
## Related Notes
- [[Amazon S3]] -- Bucket policies, encryption at rest, least-privilege access
- [[Amazon Bedrock]] -- IAM roles for model invocation, cross-account access
- [[Amazon Bedrock Guardrails]] -- PII detection and redaction at the model layer
- [[Responsible AI and Enterprise Integration]] -- Cross-account IAM, Well-Architected security pillar
- [[AWS Lake Formation]] -- Fine-grained data access control for data lakes
- [[AWS Glue and Data Processing Services]] -- Glue DataBrew masking, data catalog security
- [[AWS GenAI Developer - First Class Notes]] -- Course overview and service map