# Farm Auth Service - Architecture Documentation

## 1. High-Level Overview

The Farm Auth Service is a Node.js + Express authentication and security service for a mobile application ecosystem in which users sign in with their phone numbers. It provides phone-based authentication using OTPs (One-Time Passwords) delivered via SMS, JWT access and refresh tokens, multi-level rate limiting, security hardening, and audit logging.

**Core Functionality:**
- Phone number-based authentication with OTP verification via SMS (Twilio)
- JWT access tokens (short-lived) and refresh tokens (long-lived) with rotation
- Device tracking and multi-device session management
- Comprehensive rate limiting at multiple levels (phone, IP, user)
- Security hardening: CORS validation, security headers, field-level encryption, timing attack protection, enumeration detection
- Audit logging with risk scoring and webhook alerting
- Admin dashboard for security event monitoring

**External Systems:**
- **PostgreSQL Database**: Stores users, OTP codes, refresh tokens, devices, and audit logs
- **Redis** (optional): Used for rate limiting counters and OTP tracking (falls back to in-memory store)
- **Twilio**: SMS provider for OTP delivery (optional - service works without it for development)
- **Webhook Endpoints**: For security alerts (Slack, Discord, or custom webhooks)

---

## 2. Architecture & Components

### 2.1 HTTP/API Layer

**Files:**
- `src/index.js` - Express server setup and middleware configuration
- `src/routes/authRoutes.js` - Authentication endpoints
- `src/routes/userRoutes.js` - User profile and device management endpoints
- `src/routes/adminRoutes.js` - Admin security dashboard endpoints

**Responsibilities:**
- Request routing and middleware orchestration
- Input validation and sanitization
- Response formatting
- Error handling

**Middleware Order (Critical):**
1. Trust proxy configuration (if behind reverse proxy)
2. CORS validation (startup and runtime)
3. JSON body parser
4. Security headers (global)
5. Route-specific middleware (validation, rate limiting, auth)

**Key Configuration:**
- `TRUST_PROXY`: Set to `'true'` if behind reverse proxy (nginx, load balancer)
- `CORS_ALLOWED_ORIGINS`: Comma-separated list of allowed origins (required in production)
- `ENABLE_ADMIN_DASHBOARD`: Set to `'true'` to enable admin routes

### 2.2 Authentication Core

**Files:**
- `src/services/otpService.js` - OTP generation, hashing (bcrypt), storage, and verification
- `src/services/tokenService.js` - JWT access/refresh token issuance, rotation, and validation
- `src/services/jwtKeys.js` - JWT key management with rotation support
- `src/middleware/authMiddleware.js` - JWT access token validation
- `src/middleware/stepUpAuth.js` - Step-up authentication for sensitive operations

**Responsibilities:**
- OTP generation (6-digit random codes)
- OTP hashing with bcrypt (10 rounds)
- OTP storage in database with expiry and attempt tracking
- JWT token signing with key rotation support
- Refresh token rotation and reuse detection
- Device fingerprinting and tracking

**Key Features:**
- **OTP Security**: Hashed with bcrypt, constant-time verification to prevent timing attacks
- **Token Rotation**: Refresh tokens rotate on each use, old tokens are revoked
- **Reuse Detection**: Detects if a refresh token is reused (theft indicator)
- **Step-Up Auth**: Requires recent OTP verification for sensitive operations
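
The OTP generation and validity rules listed above can be sketched as follows. This is a minimal illustration, not the code in `src/services/otpService.js`: the function names and the record shape (`expiresAt`, `attempts`) are hypothetical, and bcrypt hashing is omitted because it needs a third-party package.

```javascript
const { randomInt } = require('node:crypto');

// Generate a 6-digit numeric OTP from a CSPRNG.
// randomInt(min, max) excludes max, so the range is 000000..999999.
function generateOtp() {
  return randomInt(0, 1_000_000).toString().padStart(6, '0');
}

// True if an OTP record (hypothetical shape) may still be verified:
// it has not expired and has not used up its attempts.
function isOtpUsable(record, now = Date.now(), maxAttempts = 5) {
  return record.expiresAt > now && record.attempts < maxAttempts;
}
```

Padding with leading zeros keeps every code at exactly six digits, so codes such as `004217` are not shortened.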

### 2.3 Security Layer

**Files:**
- `src/middleware/rateLimitMiddleware.js` - OTP request/verification rate limiting
- `src/middleware/userRateLimit.js` - User route rate limiting (read/write/sensitive)
- `src/middleware/adminRateLimit.js` - Admin route rate limiting
- `src/middleware/securityHeaders.js` - Security headers (CSP, HSTS, X-Frame-Options, etc.)
- `src/utils/corsValidator.js` - CORS configuration validation
- `src/utils/timingProtection.js` - Timing attack protection for OTP flows
- `src/utils/enumerationDetection.js` - Phone number enumeration detection
- `src/services/riskScoring.js` - Risk scoring for login/refresh attempts
- `src/middleware/validation.js` - Input validation middleware

**Responsibilities:**
- Rate limiting at multiple levels (phone, IP, user, admin)
- Security headers enforcement
- CORS origin validation (startup and runtime)
- Timing attack mitigation (constant-time OTP verification)
- Enumeration detection and IP blocking
- Risk scoring based on IP/device changes
- Input validation and sanitization

**Key Features:**
- **Multi-Level Rate Limiting**: Phone-based, IP-based, and user-based limits
- **Enumeration Protection**: Detects and blocks IPs attempting phone number enumeration
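- **Timing Attack Protection**: OTP request and verification flows enforce a minimum response delay, and verification always runs `bcrypt.compare` (against a dummy hash when no OTP exists)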
- **Risk Scoring**: Calculates risk scores for suspicious login/refresh attempts

### 2.4 Persistence Layer

**Files:**
- `src/db.js` - PostgreSQL connection pool and query wrapper
- `src/middleware/dbAccessLogger.js` - Optional database access logging
- `src/utils/fieldEncryption.js` - Field-level encryption for PII (phone numbers)
- `src/utils/encryptedPhoneSearch.js` - Phone number search with encryption support

**Database Tables:**
- `users` - User accounts (phone number, name, role, user_type)
- `otp_codes` - OTP codes (hashed, with expiry and attempt tracking)
- `refresh_tokens` - Refresh tokens (hashed, with rotation tracking)
- `user_devices` - Device tracking (platform, model, OS, app version)
- `auth_audit` - Security audit logs (all authentication events)

**Responsibilities:**
- Database connection management
- Query execution with optional logging
- Field-level encryption for sensitive data (phone numbers)
- Database schema management (auto-creates tables if missing)

**Key Features:**
- **Field-Level Encryption**: Phone numbers encrypted at rest (AES-256-GCM)
- **Database Access Logging**: Optional logging of all DB queries (for security auditing)
- **Backward Compatibility**: Handles both encrypted and plaintext phone numbers during migration

### 2.5 Integration Layer

**Files:**
- `src/services/smsService.js` - Twilio SMS integration
- `src/services/auditLogger.js` - Audit logging with webhook alerting
- `src/services/redisClient.js` - Redis client with graceful fallback

**Responsibilities:**
- SMS delivery via Twilio (with fallback logging)
- Security event logging to database
- Webhook alerting for high-risk events
- Redis connection management (optional, falls back to in-memory)

**Key Features:**
- **Twilio Integration**: Sends OTP via SMS (optional; the service works without it in development)
- **Webhook Alerting**: Sends alerts to Slack/Discord/custom webhooks for SUSPICIOUS/HIGH_RISK events
- **Redis Fallback**: Gracefully falls back to in-memory store if Redis unavailable

---

## 3. Request Flows

### 3.1 OTP Login Flow

**Step-by-Step:**

1. **Client requests OTP** (`POST /auth/request-otp`)
   - Input validation (phone number format)
   - Check for active OTP (2-minute no-resend rule)
   - Rate limit by phone number (3 per 10 min, 10 per day)
   - Rate limit by IP address (20 per 10 min, 100 per day)
   - Check if IP is blocked (enumeration or CIDR ranges)
   - Enumeration detection (if suspicious, apply stricter limits)
   - Timing protection wrapper (constant-time execution)
   - Normalize phone number (E.164 format)
   - Generate 6-digit OTP code
   - Hash OTP with bcrypt (10 rounds)
   - Encrypt phone number (if encryption enabled)
   - Store OTP in database (delete old OTPs for same phone)
   - Mark OTP as active in Redis/memory (2-minute TTL)
   - Send SMS via Twilio (or log to console if not configured)
   - Log audit event (otp_request, INFO risk level)
   - Return success (even if SMS fails - OTP is generated)

2. **Client verifies OTP** (`POST /auth/verify-otp`)
   - Input validation (phone number, 6-digit code, device_id, device_info)
   - Rate limit failed verifications (10 per hour per phone)
   - Check if IP is blocked
   - Timing protection wrapper (constant-time execution)
   - Normalize phone number
   - Encrypt phone number for search
   - Query OTP from database (fall back to a dummy hash if not found, so the comparison still runs)
   - Check expiry and max attempts, then verify the code (`bcrypt.compare` always runs, so timing does not reveal which check failed)
   - If invalid: increment attempt count, log suspicious event, return generic error
   - If valid: delete OTP, find or create user, decrypt phone number
   - Update user last_login_at
   - Upsert device record (track platform, model, OS, app version)
   - Calculate risk score (IP change, device change, user agent change)
   - Log audit event (login, risk level based on score)
   - Check for anomalies (multiple failed attempts, high-risk IPs)
   - Issue access token (with high_assurance flag) and refresh token
   - Return user data, tokens, and device info

**Mermaid Sequence Diagram:**

```mermaid
sequenceDiagram
    participant Client
    participant API
    participant RateLimiter
    participant OTPService
    participant DB
    participant Twilio
    participant AuditLogger

    Client->>API: POST /auth/request-otp<br/>{phone_number}
    API->>API: Validate input
    API->>RateLimiter: Check active OTP (2-min rule)
    RateLimiter-->>API: No active OTP
    API->>RateLimiter: Rate limit by phone (3/10min)
    RateLimiter-->>API: Allowed
    API->>RateLimiter: Rate limit by IP (20/10min)
    RateLimiter-->>API: Allowed
    API->>API: Check IP blocking
    API->>OTPService: Generate OTP
    OTPService->>DB: Store hashed OTP
    OTPService->>RateLimiter: Mark active (2-min TTL)
    API->>Twilio: Send SMS
    Twilio-->>API: SMS sent (or error)
    API->>AuditLogger: Log otp_request event
    API-->>Client: {ok: true}

    Client->>API: POST /auth/verify-otp<br/>{phone_number, code, device_id}
    API->>API: Validate input
    API->>RateLimiter: Check failed attempts (10/hour)
    RateLimiter-->>API: Allowed
    API->>OTPService: Verify OTP (constant-time)
    OTPService->>DB: Query OTP (with dummy hash if not found)
    OTPService->>OTPService: bcrypt.compare (constant-time)
    alt OTP Valid
        OTPService->>DB: Delete OTP
        API->>DB: Find or create user
        API->>DB: Upsert device
        API->>API: Calculate risk score
        API->>AuditLogger: Log login (with risk level)
        API->>API: Issue access + refresh tokens
        API-->>Client: {user, access_token, refresh_token}
    else OTP Invalid
        OTPService->>DB: Increment attempt count
        API->>AuditLogger: Log suspicious attempt
        API-->>Client: {error: "OTP invalid or expired"}
    end
```
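
The per-phone limits in the request step (3 per 10 minutes, 10 per day) can be sketched with fixed-window counters. This is an illustration under stated assumptions: the `Map` stands in for the Redis or in-memory store, the key names are hypothetical, and the real middleware may use a different windowing strategy.

```javascript
// Fixed-window counters kept in a Map; a stand-in for the Redis-backed store.
const counters = new Map();

// Count one hit for `key` and report whether the count is still within `limit`.
function hit(key, limit, windowMs, now = Date.now()) {
  const entry = counters.get(key);
  if (!entry || now >= entry.resetAt) {
    counters.set(key, { count: 1, resetAt: now + windowMs });
    return true;
  }
  entry.count += 1;
  return entry.count <= limit;
}

// Apply both phone windows: 3 per 10 minutes and 10 per day.
// Both counters are always incremented, so rejected requests also count.
function allowOtpRequest(phone, now = Date.now()) {
  const tenMin = hit(`otp:phone:10m:${phone}`, 3, 10 * 60 * 1000, now);
  const day = hit(`otp:phone:1d:${phone}`, 10, 24 * 60 * 60 * 1000, now);
  return tenMin && day;
}
```

The same `hit` helper could back the per-IP limits by changing the key prefix and the limits.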

### 3.2 Token Refresh Flow

**Step-by-Step:**

1. **Client requests token refresh** (`POST /auth/refresh`)
   - Input validation (refresh_token)
   - Check if IP is blocked
   - Decode refresh token to get key ID
   - Verify refresh token signature (try all keys if key ID not found)
   - Validate JWT claims (iss, aud, exp, iat)
   - Query refresh token from database (by token_id)
   - Verify token hash matches (bcrypt.compare)
   - Check if token is revoked or expired
   - Check refresh token idle timeout (max idle minutes)
   - Calculate risk score (IP change, device change, user agent change)
   - If suspicious: log suspicious refresh event
   - If suspicious and REQUIRE_OTP_ON_SUSPICIOUS_REFRESH: return step_up_required error
   - Update token last_used_at
   - Revoke old refresh token
   - Issue new access token and new refresh token (rotation)
   - Update device last_seen_at
   - Log audit event (token_refresh, risk level based on score)
   - Return new tokens

**Mermaid Sequence Diagram:**

```mermaid
sequenceDiagram
    participant Client
    participant API
    participant TokenService
    participant JWTKeys
    participant DB
    participant RiskScoring
    participant AuditLogger

    Client->>API: POST /auth/refresh<br/>{refresh_token}
    API->>API: Validate input
    API->>API: Check IP blocking
    API->>TokenService: Verify refresh token
    TokenService->>JWTKeys: Get key secret (by key ID)
    JWTKeys-->>TokenService: Key secret
    TokenService->>TokenService: Verify JWT signature
    TokenService->>TokenService: Validate claims (iss, aud, exp)
    TokenService->>DB: Query refresh token (by token_id)
    DB-->>TokenService: Token record
    TokenService->>TokenService: Verify token hash (bcrypt)
    alt Token Valid
        TokenService->>TokenService: Check expiry & idle timeout
        API->>RiskScoring: Calculate risk score
        RiskScoring->>DB: Get previous auth info
        RiskScoring-->>API: Risk score & reasons
        alt Suspicious Refresh
            API->>AuditLogger: Log suspicious refresh
            alt Require OTP
                API-->>Client: {error: "step_up_required"}
            else Allow with Risk
                API->>TokenService: Rotate refresh token
                TokenService->>DB: Revoke old token
                TokenService->>DB: Store new token
                API->>AuditLogger: Log refresh (SUSPICIOUS/HIGH_RISK)
                API-->>Client: {access_token, refresh_token}
            end
        else Normal Refresh
            API->>TokenService: Rotate refresh token
            TokenService->>DB: Revoke old token
            TokenService->>DB: Store new token
            API->>DB: Update device last_seen_at
            API->>AuditLogger: Log refresh (INFO)
            API-->>Client: {access_token, refresh_token}
        end
    else Token Invalid
        API-->>Client: {error: "Invalid refresh token"}
    end
```
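
Rotation with reuse detection can be sketched against an in-memory table. The names here are hypothetical, and SHA-256 is swapped in for the bcrypt hashing the service uses so that the example needs only the standard library.

```javascript
const { randomBytes, randomUUID, createHash, timingSafeEqual } = require('node:crypto');

// In-memory stand-in for the refresh_tokens table: token_id -> record.
const tokens = new Map();

// SHA-256 digest as a Buffer (a swap-in for bcrypt in this sketch).
const sha256 = (value) => createHash('sha256').update(value).digest();

function issueRefreshToken(userId, deviceId) {
  const tokenId = randomUUID();
  const secret = randomBytes(32).toString('hex');
  tokens.set(tokenId, { userId, deviceId, hash: sha256(secret), revoked: false });
  return { tokenId, secret };
}

// Rotate: revoke the presented token and issue a replacement.
// Presenting an already-revoked token counts as reuse, and every token
// for that user + device is revoked.
function rotateRefreshToken(tokenId, secret) {
  const record = tokens.get(tokenId);
  if (!record || !timingSafeEqual(record.hash, sha256(secret))) {
    return { ok: false, reason: 'invalid' };
  }
  if (record.revoked) {
    for (const t of tokens.values()) {
      if (t.userId === record.userId && t.deviceId === record.deviceId) t.revoked = true;
    }
    return { ok: false, reason: 'reuse_detected' };
  }
  record.revoked = true;
  return { ok: true, ...issueRefreshToken(record.userId, record.deviceId) };
}
```

After a reuse event the replacement token is revoked as well, which forces the legitimate client back through OTP login.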

### 3.3 Logout Flow

**Step-by-Step:**

1. **Single-device logout** (`POST /auth/logout`)
   - Input validation (refresh_token)
   - Verify refresh token (same as refresh flow)
   - If token invalid/already revoked: return success (idempotent)
   - Revoke all refresh tokens for user + device
   - Log audit event (logout, INFO)
   - Return success

2. **Logout all other devices** (`POST /users/me/logout-all-other-devices`)
   - Requires authentication (access token)
   - Requires step-up auth (recent OTP or high_assurance token)
   - Rate limited (10 per hour per user)
   - Get current device_id from header or body
   - Mark all other devices as inactive
   - Revoke refresh tokens for all other devices
   - Log audit event (logout_all_other_devices, INFO)
   - Return count of revoked devices

3. **Logout from all devices** (`POST /users/me/logout-all-devices`)
   - Requires authentication (access token)
   - Requires step-up auth (recent OTP or high_assurance token)
   - Rate limited (10 per hour per user)
   - Revoke all refresh tokens for the user (all devices)
   - Mark all devices as inactive
   - Increment user's `token_version` to invalidate all existing access tokens
   - Log audit event (logout_all_devices, HIGH_RISK) - triggers security alert
   - Return success with revoked tokens count
   - **Security Note**: This is a critical security operation used when account compromise is suspected. All existing access tokens become invalid immediately, even if they haven't expired yet.

4. **Revoke specific device** (`DELETE /users/me/devices/:device_id`)
   - Requires authentication (access token)
   - Requires step-up auth (recent OTP or high_assurance token)
   - Rate limited (10 per hour per user)
   - Validate device_id parameter
   - Mark device as inactive
   - Revoke refresh tokens for device
   - Log audit event (device_revoked, INFO)
   - Return success

**Mermaid Sequence Diagram:**

```mermaid
sequenceDiagram
    participant Client
    participant API
    participant TokenService
    participant DB
    participant AuditLogger

    Note over Client,AuditLogger: Single Device Logout
    Client->>API: POST /auth/logout<br/>{refresh_token}
    API->>TokenService: Verify refresh token
    TokenService-->>API: Token info
    API->>TokenService: Revoke refresh token
    TokenService->>DB: Mark token revoked
    API->>AuditLogger: Log logout event
    API-->>Client: {ok: true}

    Note over Client,AuditLogger: Logout All Other Devices
    Client->>API: POST /users/me/logout-all-other-devices<br/>{current_device_id}
    API->>API: Verify access token
    API->>API: Check step-up auth
    API->>API: Rate limit check (10/hour)
    API->>DB: Mark other devices inactive
    API->>TokenService: Revoke tokens for other devices
    TokenService->>DB: Revoke tokens
    API->>AuditLogger: Log logout_all_other_devices
    API-->>Client: {ok: true, revoked_devices_count: N}

    Note over Client,AuditLogger: Logout All Devices (Global Logout)
    Client->>API: POST /users/me/logout-all-devices
    API->>API: Verify access token
    API->>API: Check step-up auth
    API->>API: Rate limit check (10/hour)
    API->>TokenService: Revoke all user tokens
    TokenService->>DB: Revoke all refresh tokens
    TokenService->>DB: Mark all devices inactive
    TokenService->>DB: Increment token_version
    API->>AuditLogger: Log logout_all_devices (HIGH_RISK)
    AuditLogger->>AuditLogger: Trigger security alert
    API-->>Client: {ok: true, revoked_tokens_count: N}
```
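
The effect of incrementing `token_version` during global logout can be illustrated with a small sketch. The in-memory `users` map and the use of a `sub` claim to identify the user are assumptions made for the example.

```javascript
// In-memory stand-in for the users table.
const users = new Map([['u1', { tokenVersion: 1 }]]);

// Runs after the JWT signature and standard claims have been verified:
// the token is current only if its version matches the stored one.
function isAccessTokenCurrent(claims) {
  const user = users.get(claims.sub);
  return Boolean(user) && claims.token_version === user.tokenVersion;
}

// Global logout: bumping the version invalidates every outstanding access token.
function logoutAllDevices(userId) {
  const user = users.get(userId);
  user.tokenVersion += 1;
  return user.tokenVersion;
}
```

A token issued with `token_version: 1` stops validating as soon as the stored version becomes 2, regardless of its `exp` claim.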

### 3.4 Admin Security Events Flow

**Step-by-Step:**

1. **Admin requests security events** (`GET /admin/security-events`)
   - Requires authentication (access token)
   - Requires admin role (security_admin)
   - Rate limited (100 per 15 minutes per admin)
   - Validate and sanitize query parameters (risk_level, limit, offset, search)
   - Build parameterized SQL query (prevent injection)
   - Query auth_audit table with filters
   - Mask phone numbers (keep last 4 digits)
   - Sanitize all output fields
   - Get total count for pagination
   - Get statistics (last 24 hours: total, high_risk, suspicious, info)
   - Log admin access event (admin_view_security_events, INFO)
   - Return events, pagination info, and statistics

**Mermaid Sequence Diagram:**

```mermaid
sequenceDiagram
    participant Admin
    participant API
    participant AuthMiddleware
    participant AdminAuth
    participant AdminRateLimit
    participant DB
    participant AuditLogger

    Admin->>API: GET /admin/security-events<br/>?risk_level=HIGH_RISK&limit=200
    API->>AuthMiddleware: Verify access token
    AuthMiddleware-->>API: User info
    API->>AdminAuth: Check admin role
    AdminAuth-->>API: Authorized
    API->>AdminRateLimit: Check rate limit (100/15min)
    AdminRateLimit-->>API: Allowed
    API->>API: Sanitize query params
    API->>DB: Query auth_audit (parameterized)
    DB-->>API: Events data
    API->>API: Mask phone numbers
    API->>API: Sanitize output
    API->>DB: Get total count
    API->>DB: Get statistics (24h)
    API->>AuditLogger: Log admin access
    API-->>Admin: {events, pagination, stats}
```
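
Two steps of this flow, parameter sanitization with a parameterized query and phone masking, can be sketched as below. The column names (`risk_level`, `created_at`), the default limit of 50, and the cap of 200 are assumptions; the `$1` placeholders follow the node-postgres convention.

```javascript
const RISK_LEVELS = new Set(['INFO', 'SUSPICIOUS', 'HIGH_RISK']);

// Build a parameterized query; user input never reaches the SQL text.
function buildEventsQuery({ risk_level, limit, offset } = {}) {
  const params = [];
  let sql = 'SELECT * FROM auth_audit';
  if (RISK_LEVELS.has(risk_level)) {
    params.push(risk_level);
    sql += ` WHERE risk_level = $${params.length}`;
  }
  // Clamp pagination values (assumed bounds: 1..200, default 50).
  const safeLimit = Math.min(Math.max(parseInt(limit, 10) || 50, 1), 200);
  const safeOffset = Math.max(parseInt(offset, 10) || 0, 0);
  params.push(safeLimit, safeOffset);
  sql += ` ORDER BY created_at DESC LIMIT $${params.length - 1} OFFSET $${params.length}`;
  return { sql, params };
}

// Mask everything except the last 4 characters of a phone number.
function maskPhone(phone) {
  const s = String(phone);
  if (s.length <= 4) return s;
  return '*'.repeat(s.length - 4) + s.slice(-4);
}
```

Unknown `risk_level` values are dropped rather than rejected in this sketch; returning a 400 instead would be an equally reasonable choice.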

---

## 4. Timeouts, Expiry, and Limits

| Name | ENV Variable / Config | Default Value | Defined In | What It Affects |
|------|----------------------|---------------|------------|-----------------|
| **OTP Expiry** | `OTP_TTL_SECONDS` | `120` (2 minutes) | `src/services/otpService.js:10` | OTP validity period |
| **OTP Resend Throttle** | (hardcoded) | `120` seconds | `src/middleware/rateLimitMiddleware.js:154` | Minimum time between OTP requests for same phone |
| **Max OTP Verification Attempts** | `OTP_VERIFY_MAX_ATTEMPTS` | `5` | `src/services/otpService.js:12` | Maximum attempts to verify an OTP before it's invalidated |
| **JWT Access Token Expiry** | `JWT_ACCESS_TTL` | `'15m'` (15 minutes) | `src/config.js:72` | Access token lifetime |
| **JWT Refresh Token Expiry** | `JWT_REFRESH_TTL` | `'7d'` (7 days) | `src/config.js:73` | Refresh token lifetime |
| **Refresh Token Max Idle** | `REFRESH_MAX_IDLE_MINUTES` | `4320` (3 days) | `src/config.js:58-60` | Maximum idle time before refresh token expires |
| **Step-Up Auth Window** | `STEP_UP_OTP_WINDOW_MINUTES` | `5` minutes | `src/middleware/stepUpAuth.js:26` | Time window for "recent" OTP verification for step-up auth |
| **OTP Request - Phone (10 min)** | `OTP_REQ_PHONE_10MIN_LIMIT` | `3` | `src/middleware/rateLimitMiddleware.js:24` | Max OTP requests per phone per 10 minutes |
| **OTP Request - Phone (24h)** | `OTP_REQ_PHONE_DAY_LIMIT` | `10` | `src/middleware/rateLimitMiddleware.js:25` | Max OTP requests per phone per 24 hours |
| **OTP Request - IP (10 min)** | `OTP_REQ_IP_10MIN_LIMIT` | `20` | `src/middleware/rateLimitMiddleware.js:26` | Max OTP requests per IP per 10 minutes |
| **OTP Request - IP (24h)** | `OTP_REQ_IP_DAY_LIMIT` | `100` | `src/middleware/rateLimitMiddleware.js:27` | Max OTP requests per IP per 24 hours |
| **OTP Verify Failed (1h)** | `OTP_VERIFY_FAILED_PER_HOUR_LIMIT` | `10` | `src/middleware/rateLimitMiddleware.js:31` | Max failed verification attempts per phone per hour |
| **Enumeration IP Block Duration** | `ENUMERATION_BLOCK_DURATION` | `3600` (1 hour) | `src/middleware/rateLimitMiddleware.js:40` | Duration IP is blocked after enumeration detection |
| **User Rate Limit - Read** | `USER_RATE_LIMIT_READ_MAX` | `100` | `src/middleware/userRateLimit.js:25` | Max read requests per user per 15 minutes |
| **User Rate Limit - Read Window** | `USER_RATE_LIMIT_READ_WINDOW` | `900` (15 min) | `src/middleware/userRateLimit.js:26` | Time window for read rate limit |
| **User Rate Limit - Write** | `USER_RATE_LIMIT_WRITE_MAX` | `20` | `src/middleware/userRateLimit.js:29` | Max write requests per user per 15 minutes |
| **User Rate Limit - Write Window** | `USER_RATE_LIMIT_WRITE_WINDOW` | `900` (15 min) | `src/middleware/userRateLimit.js:30` | Time window for write rate limit |
| **User Rate Limit - Sensitive** | `USER_RATE_LIMIT_SENSITIVE_MAX` | `10` | `src/middleware/userRateLimit.js:33` | Max sensitive requests per user per hour |
| **User Rate Limit - Sensitive Window** | `USER_RATE_LIMIT_SENSITIVE_WINDOW` | `3600` (1 hour) | `src/middleware/userRateLimit.js:34` | Time window for sensitive rate limit |
| **Admin Rate Limit** | `ADMIN_RATE_LIMIT_MAX` | `100` | `src/middleware/adminRateLimit.js:23` | Max admin requests per admin per 15 minutes |
| **Admin Rate Limit Window** | `ADMIN_RATE_LIMIT_WINDOW` | `900` (15 min) | `src/middleware/adminRateLimit.js:24` | Time window for admin rate limit |
| **Webhook HTTP Timeout** | (hardcoded) | `5000` ms | `src/services/auditLogger.js:459` | Webhook request timeout (also used for Twilio if configured) |
| **Webhook Retry Delay** | (hardcoded) | `3000` ms | `src/services/auditLogger.js:498` | Delay before retrying failed webhook alerts |
| **OTP Request Min Delay** | `OTP_REQUEST_MIN_DELAY` | `500` ms | `src/utils/timingProtection.js:26` | Minimum delay for OTP requests (timing attack protection) |
| **OTP Verify Min Delay** | `OTP_VERIFY_MIN_DELAY` | `300` ms | `src/utils/timingProtection.js:30` | Minimum delay for OTP verification (timing attack protection) |
| **Timing Max Jitter** | `TIMING_MAX_JITTER` | `100` ms | `src/utils/timingProtection.js:34` | Maximum random jitter added to delays |
| **Enumeration Max Phones/IP (10min)** | `ENUMERATION_MAX_PHONES_PER_IP_10MIN` | `5` | `src/utils/enumerationDetection.js:32` | Max unique phone numbers per IP in 10 minutes |
| **Enumeration Max Phones/IP (1h)** | `ENUMERATION_MAX_PHONES_PER_IP_HOUR` | `20` | `src/utils/enumerationDetection.js:33` | Max unique phone numbers per IP in 1 hour |
| **Enumeration Alert Threshold (10min)** | `ENUMERATION_ALERT_THRESHOLD_10MIN` | `10` | `src/utils/enumerationDetection.js:40` | Unique phones threshold for alert (10 min) |
| **Enumeration Alert Threshold (1h)** | `ENUMERATION_ALERT_THRESHOLD_HOUR` | `50` | `src/utils/enumerationDetection.js:41` | Unique phones threshold for alert (1 hour) |
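
Numeric settings from the table can be read with a small helper that falls back to the documented default. The helper name is hypothetical, and treating non-positive or non-numeric values as invalid is an assumption of this sketch.

```javascript
// Read a positive integer from the environment, or return the default.
function intFromEnv(env, name, fallback) {
  const parsed = Number.parseInt(env[name], 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Example: a few of the defaults listed in the table.
const limits = {
  otpTtlSeconds: intFromEnv(process.env, 'OTP_TTL_SECONDS', 120),
  otpVerifyMaxAttempts: intFromEnv(process.env, 'OTP_VERIFY_MAX_ATTEMPTS', 5),
  refreshMaxIdleMinutes: intFromEnv(process.env, 'REFRESH_MAX_IDLE_MINUTES', 4320),
};
```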

---

## 5. Security Features

### 5.1 CORS Behavior

**Configuration:**
- **Startup Validation**: CORS configuration is validated at startup (`src/index.js:29-34`)
- **Runtime Monitoring**: Runtime CORS checks log warnings for suspicious patterns (`src/index.js:58-63`)
- **Origin Whitelisting**: Only explicitly configured origins are allowed (never wildcard `*` when credentials are involved)
- **No-Origin Requests**: Requests without origin (mobile apps, Postman) are allowed

**Implementation:**
- `CORS_ALLOWED_ORIGINS`: Comma-separated list of allowed origins (required in production)
- Development mode: Allows all origins if no origins configured (with warning)
- Production mode: Throws error if `CORS_ALLOWED_ORIGINS` is empty
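
The startup and per-request rules above can be sketched as two functions. The function names are hypothetical, and using `NODE_ENV === 'production'` to detect production mode is an assumption.

```javascript
// Parse CORS_ALLOWED_ORIGINS into a Set; refuse to start in production if empty.
function loadAllowedOrigins(env = process.env) {
  const origins = new Set(
    (env.CORS_ALLOWED_ORIGINS || '')
      .split(',')
      .map((origin) => origin.trim())
      .filter(Boolean)
  );
  if (origins.size === 0 && env.NODE_ENV === 'production') {
    throw new Error('CORS_ALLOWED_ORIGINS must be set in production');
  }
  return origins;
}

// Decide whether a request's Origin header is acceptable.
function isOriginAllowed(origin, allowed, env = process.env) {
  if (!origin) return true; // no Origin header: mobile apps, Postman
  if (allowed.size === 0) return env.NODE_ENV !== 'production'; // dev: allow all
  return allowed.has(origin); // exact match only, never a wildcard
}
```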

**Files:**
- `src/index.js:36-86` - CORS middleware configuration
- `src/utils/corsValidator.js` - CORS validation utilities

### 5.2 Security Headers

**Headers Set Globally:**
- `X-Frame-Options: DENY` - Prevents clickjacking
- `X-Content-Type-Options: nosniff` - Prevents MIME type sniffing
- `X-XSS-Protection: 1; mode=block` - Enables XSS filter (legacy browsers)
- `Strict-Transport-Security` - HSTS (only in production, max-age=31536000, includeSubDomains, preload)
- `Content-Security-Policy` - CSP with nonce support for inline scripts/styles
- `Referrer-Policy: strict-origin-when-cross-origin` - Controls referrer information
- `Permissions-Policy` - Restricts browser features (geolocation, microphone, camera, etc.)
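
A sketch of how this header set could be assembled is shown below. The header names and the HSTS value come from the list above; the specific CSP and `Permissions-Policy` directive values are assumptions, since the actual policy lives in `src/middleware/securityHeaders.js`.

```javascript
const { randomBytes } = require('node:crypto');

// Build the global security headers; HSTS is added only in production.
function buildSecurityHeaders({ production = false, nonce } = {}) {
  const headers = {
    'X-Frame-Options': 'DENY',
    'X-Content-Type-Options': 'nosniff',
    'X-XSS-Protection': '1; mode=block',
    'Referrer-Policy': 'strict-origin-when-cross-origin',
    // Assumed directive values, for illustration only.
    'Permissions-Policy': 'geolocation=(), microphone=(), camera=()',
    'Content-Security-Policy':
      `default-src 'self'; script-src 'self' 'nonce-${nonce}'; style-src 'self' 'nonce-${nonce}'`,
  };
  if (production) {
    headers['Strict-Transport-Security'] = 'max-age=31536000; includeSubDomains; preload';
  }
  return headers;
}

// A fresh nonce should be generated for every response.
const exampleHeaders = buildSecurityHeaders({
  production: true,
  nonce: randomBytes(16).toString('base64'),
});
```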

**Files:**
- `src/middleware/securityHeaders.js` - Security headers middleware

### 5.3 Authentication & Authorization

**Authentication:**
- **OTP-Based**: Phone number + 6-digit OTP code
- **JWT Access Tokens**: Short-lived (15 minutes), signed with HS256, include `token_version` claim
- **JWT Refresh Tokens**: Long-lived (7 days), stored hashed in database, rotated on each use
- **Device Tracking**: Tracks device identifier, platform, model, OS version, app version
- **Token Versioning**: Access tokens include `token_version` claim that is validated against user's current version in database. When user logs out from all devices, `token_version` is incremented, invalidating all existing access tokens immediately.

**Authorization:**
- **Role-Based**: Admin routes require `role === 'security_admin'`
- **Step-Up Auth**: Sensitive operations require recent OTP verification or `high_assurance` token flag
- **Token Claims**: Validates `iss` (issuer), `aud` (audience), `exp` (expiration), `iat` (issued at), `token_version` (for access token invalidation)
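
The step-up rule (a `high_assurance` token or a recent OTP verification) reduces to a small predicate. The function name and the `lastOtpVerifiedAt` timestamp argument are hypothetical; the 5-minute default mirrors `STEP_UP_OTP_WINDOW_MINUTES`.

```javascript
// True if a sensitive operation may proceed without a fresh OTP.
function isStepUpSatisfied(claims, lastOtpVerifiedAt, now = Date.now(), windowMinutes = 5) {
  if (claims.high_assurance === true) return true;
  if (!lastOtpVerifiedAt) return false;
  return now - lastOtpVerifiedAt <= windowMinutes * 60 * 1000;
}
```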

**Files:**
- `src/middleware/authMiddleware.js` - Access token validation
- `src/middleware/adminAuth.js` - Admin role check
- `src/middleware/stepUpAuth.js` - Step-up authentication

### 5.4 Audit Logging

**Events Logged:**
- `otp_request` - OTP request (success/failed)
- `otp_verify` - OTP verification (success/failed)
- `login` - User login (success/blocked)
- `token_refresh` - Token refresh (success, with risk level)
- `logout` - User logout
- `device_revoked` - Device revocation
- `logout_all_other_devices` - Logout all other devices
- `logout_all_devices` - Logout from all devices (HIGH_RISK, triggers security alert)
- `admin_view_security_events` - Admin access to security dashboard

**Risk Levels:**
- `INFO` - Normal operations
- `SUSPICIOUS` - Unusual patterns (IP change, device change, multiple failures)
- `HIGH_RISK` - Blocked IPs, high risk scores (>=50), enumeration attempts

**Alerting:**
- **Webhook Integration**: Sends alerts to `SECURITY_ALERT_WEBHOOK_URL` for SUSPICIOUS/HIGH_RISK events
- **Anomaly Detection**: Detects patterns (multiple failed OTPs, multiple high-risk events from same IP)
- **Retry Logic**: Retries failed webhook alerts once after 3 seconds

**Files:**
- `src/services/auditLogger.js` - Audit logging and webhook alerting
- `src/services/riskScoring.js` - Risk score calculation

### 5.5 Data Protection

**Field-Level Encryption:**
- **Algorithm**: AES-256-GCM (authenticated encryption)
- **Fields Encrypted**: Phone numbers (before storing in database)
- **Key Management**: 32-byte key from `ENCRYPTION_KEY` (base64 encoded)
- **Backward Compatibility**: Handles both encrypted and plaintext data during migration
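
A minimal AES-256-GCM sketch using Node's built-in `crypto` module is shown below. The `iv:tag:ciphertext` storage format and the function names are assumptions; the actual format used by `src/utils/fieldEncryption.js` is not documented here.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require('node:crypto');

// Encrypt a string with AES-256-GCM; `key` must be a 32-byte Buffer,
// e.g. Buffer.from(process.env.ENCRYPTION_KEY, 'base64').
function encryptField(plaintext, key) {
  const iv = randomBytes(12); // 96-bit nonce, unique per encryption
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((part) => part.toString('base64')).join(':');
}

// Decrypt and authenticate; throws if the data or tag was tampered with.
function decryptField(encoded, key) {
  const [iv, tag, ciphertext] = encoded.split(':').map((part) => Buffer.from(part, 'base64'));
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
```

Because the IV is random, the same phone number encrypts to a different value each time, so a plain equality lookup on the encrypted column would not match.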

**Database Access Logging:**
- **Optional Feature**: Enabled with `DB_ACCESS_LOGGING_ENABLED=true`
- **Logs**: All database queries with context (user ID, IP, user agent)
- **Use Case**: Security auditing, compliance

**Files:**
- `src/utils/fieldEncryption.js` - Field-level encryption
- `src/middleware/dbAccessLogger.js` - Database access logging

### 5.6 Protection Against Attacks

**Brute-Force / Enumeration:**
- Rate limiting at multiple levels (phone, IP, user)
- Enumeration detection (tracks unique phone numbers per IP)
- IP blocking for enumeration attempts (1 hour block)
- Stricter rate limits when enumeration detected
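
Enumeration detection by counting unique phone numbers per IP can be sketched as follows. The in-memory structure and function name are hypothetical; the window and threshold defaults mirror `ENUMERATION_MAX_PHONES_PER_IP_10MIN`.

```javascript
// ip -> Map(phone -> last time that phone was requested from this IP)
const seenByIp = new Map();

// Record a request and report how many distinct phones this IP has
// touched inside the window.
function recordAndCheck(ip, phone, now = Date.now(), windowMs = 10 * 60 * 1000, maxPhones = 5) {
  const phones = seenByIp.get(ip) || new Map();
  phones.set(phone, now);
  // Drop phones that fell out of the window.
  for (const [p, ts] of phones) {
    if (now - ts >= windowMs) phones.delete(p);
  }
  seenByIp.set(ip, phones);
  return { uniquePhones: phones.size, suspicious: phones.size > maxPhones };
}
```

A caller could respond to `suspicious: true` by applying stricter limits or blocking the IP for `ENUMERATION_BLOCK_DURATION` seconds.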

**Timing Attacks:**
- Constant-time OTP verification (always performs bcrypt.compare, uses dummy hash if OTP not found)
- Timing protection wrappers for OTP request and verification flows
- Minimum delay enforcement to prevent timing leaks
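
Minimum delay enforcement can be sketched as a wrapper that pads every response to a floor plus random jitter. The wrapper name is hypothetical; the defaults mirror `OTP_VERIFY_MIN_DELAY` and `TIMING_MAX_JITTER`.

```javascript
const { randomInt } = require('node:crypto');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `fn`, then wait until at least minDelayMs (+ jitter) has elapsed,
// whether `fn` succeeded or threw.
async function withMinDelay(fn, minDelayMs = 300, maxJitterMs = 100) {
  const start = Date.now();
  const target = minDelayMs + randomInt(0, maxJitterMs + 1);
  try {
    return await fn();
  } finally {
    const remaining = target - (Date.now() - start);
    if (remaining > 0) await sleep(remaining);
  }
}
```

Padding success and failure paths alike keeps a fast rejection from being distinguishable from a slow bcrypt comparison.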

**Man-in-the-Middle:**
- HTTPS enforcement via HSTS header (production)
- Security headers (CSP, X-Frame-Options) limit related attacks such as content injection and clickjacking
- JWT token validation with signature verification

**Token Replay:**
- Refresh token rotation (new token issued, old token revoked)
- Reuse detection (if a rotated-out token is presented again, all tokens for the device are revoked)
- Access token short expiry (15 minutes) limits replay window
- Token versioning: Access tokens include `token_version` claim that is validated on each request. When user logs out from all devices, version is incremented, immediately invalidating all existing access tokens (even if not expired)

**Files:**
- `src/utils/timingProtection.js` - Timing attack protection
- `src/utils/enumerationDetection.js` - Enumeration detection
- `src/services/tokenService.js` - Token rotation and reuse detection

---

## 6. Error Handling & Failure Modes

### 6.1 OTP Sending Failures

**Behavior:**
- If Twilio is not configured: OTP is logged to console, request still succeeds
- If Twilio fails: Error is logged, OTP is still generated and stored, request succeeds
- **Rationale**: OTP generation should not fail when SMS delivery fails (in development, the OTP can be read from the logs)

**Error Response:**
- Success response returned even if SMS fails (for development/testing)
- Production recommendation: Return error if SMS fails (uncomment error return in `src/routes/authRoutes.js:213`)

**Files:**
- `src/services/smsService.js` - SMS sending with fallback logging

### 6.2 Database Failures

**Behavior:**
- Connection pool errors: Logged, process exits (`src/db.js:11-14`)
- Query errors: Propagated to route handler, return 500 error
- **No Retries**: Database queries are not retried automatically (application-level retries can be added)

**Error Response:**
- `500 Internal Server Error` with generic message: `{error: 'Internal server error'}`

**Files:**
- `src/db.js` - Database connection and query wrapper

### 6.3 JWT Validation Errors

**Behavior:**
- Invalid token format: `401 Unauthorized` - `{error: 'Invalid token format'}`
- Invalid/expired token: `401 Unauthorized` - `{error: 'Invalid or expired token'}`
- Invalid claims: `401 Unauthorized` - `{error: 'Invalid token claims'}`
- Missing Authorization header: `401 Unauthorized` - `{error: 'Missing Authorization header'}`

**Key Rotation:**
- If key ID not found: Tries all available keys (for rotation support)
- If no key matches: Returns `401 Unauthorized`
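
The key-rotation fallback can be sketched with a hand-rolled HS256 check built on Node's `crypto` module. This is an illustration only: the function names are hypothetical, claim validation (`iss`, `aud`, `exp`) is left out, and the service itself may rely on a JWT library instead.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

const b64url = (input) => Buffer.from(input).toString('base64url');

// Sign a payload with HS256, optionally tagging the header with a key ID.
function signHs256(payload, secret, kid) {
  const header = { alg: 'HS256', typ: 'JWT', ...(kid ? { kid } : {}) };
  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(payload))}`;
  const sig = createHmac('sha256', secret).update(signingInput).digest('base64url');
  return `${signingInput}.${sig}`;
}

function signatureMatches(signingInput, sig, secret) {
  const expected = createHmac('sha256', secret).update(signingInput).digest();
  const actual = Buffer.from(sig, 'base64url');
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

// keys: Map of key ID -> secret. Use the key named by `kid` when it is known,
// otherwise try every key. Returns the payload, or null if no key matches.
function verifyWithRotation(token, keys) {
  const [h, p, sig] = token.split('.');
  if (!h || !p || !sig) return null;
  let header;
  try {
    header = JSON.parse(Buffer.from(h, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  if (header.alg !== 'HS256') return null;
  const candidates = keys.has(header.kid) ? [keys.get(header.kid)] : [...keys.values()];
  const ok = candidates.some((secret) => signatureMatches(`${h}.${p}`, sig, secret));
  return ok ? JSON.parse(Buffer.from(p, 'base64url').toString('utf8')) : null;
}
```

A `null` result corresponds to the `401 Unauthorized` case above.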

**Files:**
- `src/middleware/authMiddleware.js` - JWT validation
- `src/services/tokenService.js` - Refresh token validation

### 6.4 Rate Limit Exceeded

**Behavior:**
- OTP request rate limit: `429 Too Many Requests` - `{success: false, message: 'Too many OTP requests...'}`
- OTP verify rate limit: `429 Too Many Requests` - `{success: false, message: 'Too many attempts...'}`
- User route rate limit: `429 Too Many Requests` - `{error: 'Too many requests', retry_after: seconds}`
- Admin route rate limit: `429 Too Many Requests` - `{error: 'Too many requests', retry_after: seconds}`

**Headers:**
- `X-RateLimit-Limit`: Maximum requests allowed
- `X-RateLimit-Remaining`: Remaining requests in window
- `X-RateLimit-Reset`: ISO timestamp when limit resets
- `X-RateLimit-Type`: Type of rate limit (read/write/sensitive/admin)
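
The headers and the 429 body for user and admin routes can be sketched as two helpers. The function names and the input shape (`limit`, `count`, `resetAtMs`, `type`) are hypothetical.

```javascript
// Build the X-RateLimit-* headers for the current window.
function rateLimitHeaders({ limit, count, resetAtMs, type }) {
  return {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(limit - count, 0)),
    'X-RateLimit-Reset': new Date(resetAtMs).toISOString(),
    'X-RateLimit-Type': type,
  };
}

// Body returned with 429 responses; retry_after is in whole seconds.
function tooManyRequestsBody(resetAtMs, now = Date.now()) {
  return {
    error: 'Too many requests',
    retry_after: Math.max(Math.ceil((resetAtMs - now) / 1000), 0),
  };
}
```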

**Files:**
- `src/middleware/rateLimitMiddleware.js` - OTP rate limiting
- `src/middleware/userRateLimit.js` - User route rate limiting
- `src/middleware/adminRateLimit.js` - Admin rate limiting
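
To make the header and body shapes above concrete, the sketch below derives them from a limiter's state. It is illustrative only: `buildRateLimitResponse` and the `state` shape (`limit`, `count`, `resetAt`, `type`) are hypothetical and do not mirror the middleware's internals.

```javascript
// Hypothetical helper: maps limiter state to the headers and 429 body listed above.
// `state.count` is the number of requests seen in the current window, including this one.
function buildRateLimitResponse(state, now = new Date()) {
  const remaining = Math.max(0, state.limit - state.count);
  const headers = {
    'X-RateLimit-Limit': String(state.limit),
    'X-RateLimit-Remaining': String(remaining),
    'X-RateLimit-Reset': state.resetAt.toISOString(),
    'X-RateLimit-Type': state.type,
  };
  if (state.count <= state.limit) {
    return { allowed: true, headers };
  }
  // retry_after is expressed in whole seconds until the window resets.
  const retryAfter = Math.max(0, Math.ceil((state.resetAt - now) / 1000));
  return {
    allowed: false,
    status: 429,
    headers,
    body: { error: 'Too many requests', retry_after: retryAfter },
  };
}
```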

### 6.5 Retries & Fallbacks

**Redis Fallback:**
- If Redis unavailable: Falls back to in-memory store (per-process, not shared)
- Rate limiting continues to work (with per-instance limits, not global)
- A warning is logged on the first failure; subsequent failures are not logged
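
The per-instance nature of the fallback is easiest to see with a toy store. The class below is a hypothetical fixed-window counter written for illustration; the actual fallback in `src/services/redisClient.js` may use a different structure or windowing strategy.

```javascript
// Hypothetical per-process fallback: fixed-window counters held in a Map.
class InMemoryCounterStore {
  constructor() {
    this.windows = new Map();
  }

  // Increments the counter for `key` and returns the count in the current window.
  increment(key, windowMs, now = Date.now()) {
    const entry = this.windows.get(key);
    if (!entry || entry.expiresAt <= now) {
      // First hit, or the previous window has ended: start a new window.
      this.windows.set(key, { count: 1, expiresAt: now + windowMs });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

// Two service instances each hold their own store, so their counts never merge.
const instanceA = new InMemoryCounterStore();
const instanceB = new InMemoryCounterStore();
instanceA.increment('otp:phone:+15550100', 600000);
instanceA.increment('otp:phone:+15550100', 600000);
instanceB.increment('otp:phone:+15550100', 600000);
```

After these three calls, instance A has counted two requests and instance B one, which is why a limit of 3 per phone is enforced separately on each instance while Redis is down.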

**Webhook Alerting:**
- If webhook fails: Retries once after 3 seconds
- If retry fails: Error logged, but main request flow continues (non-blocking)
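
The retry-once, non-blocking behavior can be sketched as follows. This is an illustration rather than the code in `src/services/auditLogger.js`: `sendAlertWithRetry` is a hypothetical name, and `send` is an injected async function (for example, an HTTP POST to the webhook URL).

```javascript
// Hypothetical sketch: one retry after a fixed delay, and failures are never thrown.
async function sendAlertWithRetry(send, payload, { retryDelayMs = 3000, log = console.error } = {}) {
  try {
    await send(payload);
    return true;
  } catch (firstError) {
    // First attempt failed: wait, then retry exactly once.
    await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    try {
      await send(payload);
      return true;
    } catch (retryError) {
      log('Security alert webhook failed after retry:', retryError.message);
      return false; // resolved, not rejected, so callers are unaffected
    }
  }
}
```

Calling this without `await` inside a route handler keeps the main request flow from waiting on the webhook.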

**Files:**
- `src/services/redisClient.js` - Redis client with graceful fallback
- `src/services/auditLogger.js:334-516` - Webhook alerting with retry

---

## 7. Configuration & Environment Variables

### 7.1 Required Variables

| Variable | Description | Example | Required |
|----------|-------------|---------|----------|
| `DATABASE_URL` | PostgreSQL connection string | `postgres://user:pass@localhost:5432/dbname` | ✅ Yes |
| `JWT_ACCESS_SECRET` | Secret for signing access tokens (min 32 chars) | `hex-string-32-chars-minimum` | ✅ Yes |
| `JWT_REFRESH_SECRET` | Secret for signing refresh tokens (min 32 chars) | `hex-string-32-chars-minimum` | ✅ Yes |
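
A startup check for these variables might look like the sketch below. `validateRequiredEnv` is a hypothetical helper, not a function from the codebase; it only encodes the constraints stated in the table.

```javascript
// Hypothetical startup check: returns a list of problems (empty when the config is valid).
function validateRequiredEnv(env = process.env) {
  const errors = [];
  if (!env.DATABASE_URL) {
    errors.push('DATABASE_URL is required');
  }
  for (const name of ['JWT_ACCESS_SECRET', 'JWT_REFRESH_SECRET']) {
    const value = env[name];
    if (!value) {
      errors.push(`${name} is required`);
    } else if (value.length < 32) {
      errors.push(`${name} must be at least 32 characters`);
    }
  }
  return errors;
}
```

A suitable secret can be generated with `crypto.randomBytes(32).toString('hex')`, which yields 64 hex characters.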

### 7.2 Optional Variables - Timeouts & Expiry

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `JWT_ACCESS_TTL` | Access token expiry | `15m` | `15m`, `1h` |
| `JWT_REFRESH_TTL` | Refresh token expiry | `7d` | `7d`, `30d` |
| `REFRESH_MAX_IDLE_MINUTES` | Refresh token max idle time | `4320` (3 days) | `4320` |
| `OTP_TTL_SECONDS` | OTP validity in seconds | `120` (2 min) | `120` |
| `STEP_UP_OTP_WINDOW_MINUTES` | Step-up auth window | `5` | `5` |
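
`JWT_REFRESH_TTL` and `REFRESH_MAX_IDLE_MINUTES` interact: a refresh token can be within its 7-day lifetime and still be rejected for sitting unused longer than the idle limit. The sketch below shows one way to express both checks; `checkRefreshToken` and the `expiresAt`/`lastUsedAt` field names are assumptions made for illustration.

```javascript
// Hypothetical check combining absolute expiry with the idle timeout.
// 4320 minutes = 72 hours = 3 days (the documented default).
function checkRefreshToken(token, now = new Date(), maxIdleMinutes = 4320) {
  if (token.expiresAt <= now) return 'expired';
  const idleMs = now - token.lastUsedAt;
  if (idleMs > maxIdleMinutes * 60 * 1000) return 'idle_timeout';
  return 'ok';
}

// A token issued and last used on 1 January, with the default 7-day lifetime.
const exampleToken = {
  expiresAt: new Date('2024-01-08T00:00:00Z'),
  lastUsedAt: new Date('2024-01-01T00:00:00Z'),
};
```

With these values, a refresh on 3 January passes, a refresh on 5 January fails the idle check (4 days unused), and a refresh on 9 January fails the expiry check.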

### 7.3 Optional Variables - Rate Limits

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `OTP_REQ_PHONE_10MIN_LIMIT` | Max OTP requests per phone (10 min) | `3` | `3` |
| `OTP_REQ_PHONE_DAY_LIMIT` | Max OTP requests per phone (24h) | `10` | `10` |
| `OTP_REQ_IP_10MIN_LIMIT` | Max OTP requests per IP (10 min) | `20` | `20` |
| `OTP_REQ_IP_DAY_LIMIT` | Max OTP requests per IP (24h) | `100` | `100` |
| `OTP_VERIFY_MAX_ATTEMPTS` | Max OTP verification attempts | `5` | `5` |
| `OTP_VERIFY_FAILED_PER_HOUR_LIMIT` | Max failed verifications per phone (1h) | `10` | `10` |
| `USER_RATE_LIMIT_READ_MAX` | Max read requests per user (15 min) | `100` | `100` |
| `USER_RATE_LIMIT_WRITE_MAX` | Max write requests per user (15 min) | `20` | `20` |
| `USER_RATE_LIMIT_SENSITIVE_MAX` | Max sensitive requests per user (1h) | `10` | `10` |
| `ADMIN_RATE_LIMIT_MAX` | Max admin requests per admin (15 min) | `100` | `100` |
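
Numeric limits like these are typically read once at startup with the documented default as a fallback. The helper below is a hedged sketch of that pattern; `readIntEnv` and `loadOtpRequestLimits` are hypothetical names, and rejecting non-positive values is an assumption rather than documented behavior.

```javascript
// Hypothetical helper: reads a positive integer from the environment or falls back to a default.
function readIntEnv(env, name, defaultValue) {
  const raw = env[name];
  if (raw === undefined || raw === '') return defaultValue;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Defaults below are the ones listed in the table above.
function loadOtpRequestLimits(env = process.env) {
  return {
    phonePer10Min: readIntEnv(env, 'OTP_REQ_PHONE_10MIN_LIMIT', 3),
    phonePerDay: readIntEnv(env, 'OTP_REQ_PHONE_DAY_LIMIT', 10),
    ipPer10Min: readIntEnv(env, 'OTP_REQ_IP_10MIN_LIMIT', 20),
    ipPerDay: readIntEnv(env, 'OTP_REQ_IP_DAY_LIMIT', 100),
  };
}
```

Failing fast on a malformed value avoids silently running with a limit of `NaN`, which would make every comparison false.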

### 7.4 Optional Variables - Security Features

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `ENCRYPTION_ENABLED` | Enable field-level encryption | `false` | `true` |
| `ENCRYPTION_KEY` | 32-byte encryption key (base64) | - | `base64-encoded-32-byte-key` |
| `DB_ACCESS_LOGGING_ENABLED` | Enable database access logging | `false` | `true` |
| `DB_ACCESS_LOG_LEVEL` | DB access log level ('all' or 'sensitive') | `sensitive` | `all`, `sensitive` |
| `CORS_ALLOWED_ORIGINS` | Comma-separated allowed origins | - | `https://app.example.com,https://api.example.com` |
| `ENUMERATION_MAX_PHONES_PER_IP_10MIN` | Max unique phones per IP (10 min) | `5` | `5` |
| `ENUMERATION_MAX_PHONES_PER_IP_HOUR` | Max unique phones per IP (1h) | `20` | `20` |
| `ENUMERATION_ALERT_THRESHOLD_10MIN` | Alert threshold for enumeration (10 min) | `10` | `10` |
| `ENUMERATION_ALERT_THRESHOLD_HOUR` | Alert threshold for enumeration (1h) | `50` | `50` |
| `OTP_REQUEST_MIN_DELAY` | Min delay for OTP requests (ms) | `500` | `500` |
| `OTP_VERIFY_MIN_DELAY` | Min delay for OTP verify (ms) | `300` | `300` |
| `TIMING_MAX_JITTER` | Max jitter for timing protection (ms) | `100` | `100` |
| `BLOCKED_IP_RANGES` | Comma-separated CIDR blocks | - | `10.0.0.0/8,172.16.0.0/12` |
| `REQUIRE_OTP_ON_SUSPICIOUS_REFRESH` | Require OTP on suspicious refresh | `false` | `true` |
| `SECURITY_ALERT_WEBHOOK_URL` | Webhook URL for security alerts | - | `https://hooks.slack.com/...` |
| `SECURITY_ALERT_MIN_LEVEL` | Minimum risk level for alerts | `HIGH_RISK` | `SUSPICIOUS`, `HIGH_RISK` |

### 7.5 Optional Variables - JWT Key Rotation

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `JWT_ACTIVE_KEY_ID` | Key ID for signing new tokens | `1` | `1`, `2` |
| `JWT_KEYS_JSON` | JSON mapping key IDs to secrets | - | `{"1":"secret1","2":"secret2"}` |
| `JWT_REFRESH_KEY_ID` | Key ID for refresh tokens | Same as active | `1` |
| `JWT_ISSUER` | JWT issuer claim | `farm-auth-service` | `farm-auth-service` |
| `JWT_AUDIENCE` | JWT audience claim | `mobile-app` | `mobile-app` |
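
The rotation variables fit together roughly as in this sketch. It is an illustration, not the logic in `src/services/jwtKeys.js`: `parseJwtKeyConfig` is a hypothetical name, and returning `null` when `JWT_KEYS_JSON` is unset (leaving the single-secret configuration in effect) is an assumption.

```javascript
// Hypothetical loader for the key-rotation variables in the table above.
function parseJwtKeyConfig(env = process.env) {
  if (!env.JWT_KEYS_JSON) return null; // assumption: rotation is not configured

  const keys = JSON.parse(env.JWT_KEYS_JSON); // throws on malformed JSON
  const activeKeyId = env.JWT_ACTIVE_KEY_ID || '1';
  const refreshKeyId = env.JWT_REFRESH_KEY_ID || activeKeyId; // "Same as active"

  // Both signing keys must be present; older keys may remain for verification only.
  for (const id of [activeKeyId, refreshKeyId]) {
    if (typeof keys[id] !== 'string' || keys[id].length === 0) {
      throw new Error(`JWT_KEYS_JSON has no secret for key ID "${id}"`);
    }
  }
  return { activeKeyId, refreshKeyId, keys };
}
```

During a rotation, the new secret is added to `JWT_KEYS_JSON` and `JWT_ACTIVE_KEY_ID` is switched to it, while the old entry stays in place until tokens signed with it have expired.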

### 7.6 Optional Variables - External Services

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `TWILIO_ACCOUNT_SID` | Twilio account SID | - | `ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` |
| `TWILIO_AUTH_TOKEN` | Twilio auth token | - | `your_auth_token` |
| `TWILIO_MESSAGING_SERVICE_SID` | Twilio messaging service SID | - | `MGxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` |
| `TWILIO_FROM_NUMBER` | Twilio phone number (E.164) | - | `+1234567890` |
| `REDIS_URL` | Redis connection URL | - | `redis://localhost:6379` |
| `REDIS_HOST` | Redis host | `localhost` | `localhost` |
| `REDIS_PORT` | Redis port | `6379` | `6379` |
| `REDIS_PASSWORD` | Redis password | - | `password` |

### 7.7 Optional Variables - Server Configuration

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `PORT` | Server port | `3000` | `3000` |
| `NODE_ENV` | Environment | - | `development`, `production` |
| `TRUST_PROXY` | Trust proxy headers | `false` | `true` |
| `ENABLE_ADMIN_DASHBOARD` | Enable admin routes | `false` | `true` |

---

## 8. Future Improvements / Notes

### 8.1 Planned Improvements (from TODOs in code)

1. **Secrets Manager Integration**
   - Load JWT keys from AWS Secrets Manager / HashiCorp Vault (instead of environment variables)
   - Load encryption keys from secrets manager
   - **File**: `src/services/jwtKeys.js:161-174` (TODO comment)

2. **Automated Key Rotation**
   - Implement automated JWT key rotation without downtime
   - Re-encrypt existing data when encryption keys are rotated
   - **File**: `src/services/jwtKeys.js` (key rotation support exists, but automation needed)

3. **SIEM Integration**
   - Integrate with SIEM systems (Splunk, ELK, etc.) for centralized log aggregation
   - Export audit logs to SIEM for advanced threat detection
   - **File**: `src/services/auditLogger.js` (webhook exists, but SIEM integration needed)

4. **CSP Nonces**
   - Fully implement CSP nonces for inline scripts/styles (currently allows `unsafe-inline` for compatibility)
   - **File**: `src/middleware/securityHeaders.js:28-29` (nonce support exists but not fully utilized)

5. **Database Connection Pooling Tuning**
   - Add configuration for connection pool size, timeout, etc.
   - **File**: `src/db.js` (basic pool, no tuning options)

6. **Rate Limiting Improvements**
   - Implement distributed rate limiting (currently per-instance if Redis unavailable)
   - Add rate limit headers to all rate-limited endpoints
   - **File**: `src/middleware/rateLimitMiddleware.js` (Redis fallback exists, but distributed limiting needed)

7. **OTP Delivery Alternatives**
   - Support multiple SMS providers (fallback if Twilio fails)
   - Support email OTP delivery
   - Support push notification OTP delivery
   - **File**: `src/services/smsService.js` (only Twilio supported)

8. **Advanced Risk Scoring**
   - Machine learning-based risk scoring
   - Geographic anomaly detection (unusual locations)
   - Device fingerprinting improvements
   - **File**: `src/services/riskScoring.js` (basic scoring exists)

### 8.2 Potential Risks & Technical Debt

1. **In-Memory Rate Limiting**
   - If Redis is unavailable, rate limiting uses in-memory store (per-instance, not shared)
   - **Risk**: Rate limits are per-instance, not global (with multiple instances, a client's effective limit scales with the number of instances it reaches)
   - **Mitigation**: Always use Redis in production, or implement distributed rate limiting

2. **OTP Storage**
   - OTPs are stored in database (not just Redis)
   - **Risk**: Database can become a bottleneck for high-volume OTP requests
   - **Mitigation**: Consider moving OTP storage to Redis entirely (with DB backup for audit)

3. **Phone Number Encryption Migration**
   - Handles both encrypted and plaintext phone numbers (backward compatibility)
   - **Risk**: Plaintext phone numbers remain in the database if encryption was enabled after data already existed
   - **Mitigation**: Implement migration script to encrypt all existing phone numbers

4. **Webhook Alerting**
   - Webhook failures are logged but don't block requests
   - **Risk**: Security alerts might be missed if webhook is down
   - **Mitigation**: Implement alert queue (Redis/RabbitMQ) with retry logic and dead-letter queue

5. **Database Access Logging**
   - Database access logging is optional and can impact performance
   - **Risk**: Performance degradation if enabled in high-traffic scenarios
   - **Mitigation**: Use async logging, batch writes, or separate logging database

6. **JWT Key Rotation**
   - Key rotation is supported, but the process is manual
   - **Risk**: Manual key rotation can cause downtime if not done correctly
   - **Mitigation**: Implement automated key rotation with gradual rollout

7. **CORS Configuration**
   - CORS configuration is validated at startup, but runtime checks only log warnings
   - **Risk**: Misconfiguration might not be caught until runtime
   - **Mitigation**: Add stricter runtime validation or fail-fast on suspicious patterns

8. **Error Messages**
   - Some error messages are generic to prevent information leakage
   - **Risk**: Generic errors can make debugging difficult
   - **Mitigation**: Log detailed errors server-side, return generic errors to clients

---

## Appendix: Database Schema

### Key Tables

**users**
- `id` (UUID, PK)
- `phone_number` (VARCHAR(20), UNIQUE, encrypted if ENCRYPTION_ENABLED)
- `name` (VARCHAR(255))
- `role` (enum: 'user', 'admin', 'moderator')
- `user_type` (enum: 'seller', 'buyer', 'service_provider')
- `token_version` (INT, DEFAULT 1) - Incremented on logout-all-devices to invalidate all access tokens
- `created_at`, `updated_at`, `last_login_at`
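
The `token_version` column invalidates access tokens by comparison: a token carries the version it was issued under, and a mismatch with the current row means it was issued before the last logout-all-devices. A minimal sketch, assuming the claim is also named `token_version` (the actual claim name is not stated in this document):

```javascript
// Hypothetical check: the token is current only if its version matches the user's row.
function isAccessTokenCurrent(claims, user) {
  return Number.isInteger(claims.token_version) && claims.token_version === user.token_version;
}

// Logout-all-devices increments the stored version, so older tokens stop matching.
const user = { id: 'example-user', token_version: 1 };
const issuedClaims = { sub: user.id, token_version: 1 };
const validBeforeLogout = isAccessTokenCurrent(issuedClaims, user);
user.token_version += 1;
const validAfterLogout = isAccessTokenCurrent(issuedClaims, user);
```

Here `validBeforeLogout` is `true` and `validAfterLogout` is `false`, without any per-token revocation list.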

**otp_codes**
- `id` (UUID, PK)
- `phone_number` (VARCHAR(20), encrypted if ENCRYPTION_ENABLED)
- `otp_hash` (VARCHAR(255), bcrypt hash)
- `expires_at` (TIMESTAMPTZ)
- `attempt_count` (INT)
- `created_at` (TIMESTAMPTZ)

**refresh_tokens**
- `id` (UUID, PK)
- `user_id` (UUID, FK)
- `token_id` (UUID, UNIQUE)
- `token_hash` (VARCHAR(255), bcrypt hash)
- `device_id` (VARCHAR(255))
- `user_agent` (TEXT)
- `ip_address` (VARCHAR(45))
- `expires_at` (TIMESTAMPTZ)
- `last_used_at` (TIMESTAMPTZ)
- `revoked_at` (TIMESTAMPTZ, NULL = active)
- `reuse_detected_at` (TIMESTAMPTZ)
- `rotated_from_id` (UUID, FK to refresh_tokens)

**user_devices**
- `id` (UUID, PK)
- `user_id` (UUID, FK)
- `device_identifier` (TEXT)
- `device_platform` (TEXT)
- `device_model` (TEXT)
- `os_version` (TEXT)
- `app_version` (TEXT)
- `language_code` (TEXT)
- `timezone` (TEXT)
- `first_seen_at` (TIMESTAMPTZ)
- `last_seen_at` (TIMESTAMPTZ)
- `is_active` (BOOLEAN)
- UNIQUE (user_id, device_identifier)

**auth_audit**
- `id` (UUID, PK)
- `user_id` (UUID, FK, nullable)
- `action` (VARCHAR(100))
- `status` (VARCHAR(50))
- `risk_level` (VARCHAR(20): 'INFO', 'SUSPICIOUS', 'HIGH_RISK')
- `ip_address` (VARCHAR(45))
- `user_agent` (TEXT)
- `device_id` (VARCHAR(255))
- `meta` (JSONB)
- `created_at` (TIMESTAMPTZ)

---

## Document Version

- **Version**: 1.0
- **Last Updated**: 2024
- **Author**: Architecture Documentation Generator
- **Maintained By**: Development Team