Farm Auth Service - Architecture Documentation

1. High-Level Overview

The Farm Auth Service is a Node.js + Express authentication and security service that provides phone-based authentication using OTP (One-Time Password) via SMS, JWT-based access and refresh tokens, comprehensive rate limiting, security hardening, and audit logging. The service is designed for a mobile application ecosystem where users authenticate using their phone numbers.

Core Functionality:

Phone number-based authentication with OTP verification via SMS (Twilio)
JWT access tokens (short-lived) and refresh tokens (long-lived) with rotation
Device tracking and multi-device session management
Comprehensive rate limiting at multiple levels (phone, IP, user)
Security hardening: CORS validation, security headers, field-level encryption, timing attack protection, enumeration detection
Audit logging with risk scoring and webhook alerting
Admin dashboard for security event monitoring

External Systems:

PostgreSQL Database: Stores users, OTP codes, refresh tokens, devices, and audit logs
Redis (optional): Used for rate limiting counters and OTP tracking (falls back to in-memory store)
Twilio: SMS provider for OTP delivery (optional - service works without it for development)
Webhook Endpoints: For security alerts (Slack, Discord, or custom webhooks)

2. Architecture & Components

2.1 HTTP/API Layer

Files:

src/index.js - Express server setup and middleware configuration
src/routes/authRoutes.js - Authentication endpoints
src/routes/userRoutes.js - User profile and device management endpoints
src/routes/adminRoutes.js - Admin security dashboard endpoints

Responsibilities:

Request routing and middleware orchestration
Input validation and sanitization
Response formatting
Error handling

Middleware Order (Critical):

Trust proxy configuration (if behind reverse proxy)
CORS validation (startup and runtime)
JSON body parser
Security headers (global)
Route-specific middleware (validation, rate limiting, auth)

Key Configuration:

TRUST_PROXY: Set to 'true' if the service runs behind a reverse proxy (nginx, load balancer)
CORS_ALLOWED_ORIGINS: Comma-separated list of allowed origins (required in production)
ENABLE_ADMIN_DASHBOARD: Set to 'true' to enable admin routes

2.2 Authentication Core

Files:

src/services/otpService.js - OTP generation, hashing (bcrypt), storage, and verification
src/services/tokenService.js - JWT access/refresh token issuance, rotation, and validation
src/services/jwtKeys.js - JWT key management with rotation support
src/middleware/authMiddleware.js - JWT access token validation
src/middleware/stepUpAuth.js - Step-up authentication for sensitive operations

Responsibilities:

OTP generation (6-digit random codes)
OTP hashing with bcrypt (10 rounds)
OTP storage in database with expiry and attempt tracking
JWT token signing with key rotation support
Refresh token rotation and reuse detection
Device fingerprinting and tracking

Key Features:

OTP Security: Hashed with bcrypt, constant-time verification to prevent timing attacks
Token Rotation: Refresh tokens rotate on each use, old tokens are revoked
Reuse Detection: Detects if a refresh token is reused (theft indicator)
Step-Up Auth: Requires recent OTP verification for sensitive operations
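
The OTP generation step listed above can be sketched as follows. This is a minimal sketch using only Node's built-in crypto module; the function name generateOtp is hypothetical and not taken from src/services/otpService.js, and the bcrypt hashing step is omitted because it depends on a third-party package.

```javascript
const crypto = require('crypto');

// Hypothetical helper: returns a numeric OTP of the given length as a string.
// crypto.randomInt draws from a CSPRNG and avoids modulo bias.
function generateOtp(digits = 6) {
  const upperBound = 10 ** digits; // exclusive upper bound, 1000000 for 6 digits
  const value = crypto.randomInt(0, upperBound);
  return String(value).padStart(digits, '0'); // keep leading zeros
}
```

Padding matters here: without it, roughly one code in ten would be shorter than six digits and fail the 6-digit input validation.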

2.3 Security Layer

Files:

src/middleware/rateLimitMiddleware.js - OTP request/verification rate limiting
src/middleware/userRateLimit.js - User route rate limiting (read/write/sensitive)
src/middleware/adminRateLimit.js - Admin route rate limiting
src/middleware/securityHeaders.js - Security headers (CSP, HSTS, X-Frame-Options, etc.)
src/utils/corsValidator.js - CORS configuration validation
src/utils/timingProtection.js - Timing attack protection for OTP flows
src/utils/enumerationDetection.js - Phone number enumeration detection
src/services/riskScoring.js - Risk scoring for login/refresh attempts
src/middleware/validation.js - Input validation middleware

Responsibilities:

Rate limiting at multiple levels (phone, IP, user, admin)
Security headers enforcement
CORS origin validation (startup and runtime)
Timing attack mitigation (constant-time OTP verification)
Enumeration detection and IP blocking
Risk scoring based on IP/device changes
Input validation and sanitization

Key Features:

Multi-Level Rate Limiting: Phone-based, IP-based, and user-based limits
Enumeration Protection: Detects and blocks IPs attempting phone number enumeration
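Timing Attack Protection: OTP request and verification flows enforce a minimum response delay with random jitter; verification always runs bcrypt.compare, using a dummy hash when no OTP exists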
Risk Scoring: Calculates risk scores for suspicious login/refresh attempts

2.4 Persistence Layer

Files:

src/db.js - PostgreSQL connection pool and query wrapper
src/middleware/dbAccessLogger.js - Optional database access logging
src/utils/fieldEncryption.js - Field-level encryption for PII (phone numbers)
src/utils/encryptedPhoneSearch.js - Phone number search with encryption support

Database Tables:

users - User accounts (phone number, name, role, user_type)
otp_codes - OTP codes (hashed, with expiry and attempt tracking)
refresh_tokens - Refresh tokens (hashed, with rotation tracking)
user_devices - Device tracking (platform, model, OS, app version)
auth_audit - Security audit logs (all authentication events)

Responsibilities:

Database connection management
Query execution with optional logging
Field-level encryption for sensitive data (phone numbers)
Database schema management (auto-creates tables if missing)

Key Features:

Field-Level Encryption: Phone numbers encrypted at rest (AES-256-GCM)
Database Access Logging: Optional logging of all DB queries (for security auditing)
Backward Compatibility: Handles both encrypted and plaintext phone numbers during migration

2.5 Integration Layer

Files:

src/services/smsService.js - Twilio SMS integration
src/services/auditLogger.js - Audit logging with webhook alerting
src/services/redisClient.js - Redis client with graceful fallback

Responsibilities:

SMS delivery via Twilio (with fallback logging)
Security event logging to database
Webhook alerting for high-risk events
Redis connection management (optional, falls back to in-memory)

Key Features:

Twilio Integration: Sends OTP via SMS (optional - works without for development)
Webhook Alerting: Sends alerts to Slack/Discord/custom webhooks for SUSPICIOUS/HIGH_RISK events
Redis Fallback: Gracefully falls back to in-memory store if Redis unavailable

3. Request Flows

3.1 OTP Authentication Flow

Step-by-Step:

Client requests OTP (POST /auth/request-otp)
- Input validation (phone number format)
- Check for active OTP (2-minute no-resend rule)
- Rate limit by phone number (3 per 10 min, 10 per day)
- Rate limit by IP address (20 per 10 min, 100 per day)
- Check if IP is blocked (enumeration or CIDR ranges)
- Enumeration detection (if suspicious, apply stricter limits)
- Timing protection wrapper (constant-time execution)
- Normalize phone number (E.164 format)
- Generate 6-digit OTP code
- Hash OTP with bcrypt (10 rounds)
- Encrypt phone number (if encryption enabled)
- Store OTP in database (delete old OTPs for same phone)
- Mark OTP as active in Redis/memory (2-minute TTL)
- Send SMS via Twilio (or log to console if not configured)
- Log audit event (otp_request, INFO risk level)
- Return success (even if SMS fails - OTP is generated)
Client verifies OTP (POST /auth/verify-otp)
- Input validation (phone number, 6-digit code, device_id, device_info)
- Rate limit failed verifications (10 per hour per phone)
- Check if IP is blocked
- Timing protection wrapper (constant-time execution)
- Normalize phone number
- Encrypt phone number for search
- Query OTP from database (substitute a dummy hash if no OTP is found)
- Check expiry and max attempts, then verify the code (bcrypt.compare always runs, so response time does not reveal whether an OTP exists)
- If invalid: increment attempt count, log suspicious event, return generic error
- If valid: delete OTP, find or create user, decrypt phone number
- Update user last_login_at
- Upsert device record (track platform, model, OS, app version)
- Calculate risk score (IP change, device change, user agent change)
- Log audit event (login, risk level based on score)
- Check for anomalies (multiple failed attempts, high-risk IPs)
- Issue access token (with high_assurance flag) and refresh token
- Return user data, tokens, and device info

Mermaid Sequence Diagram:

sequenceDiagram
    participant Client
    participant API
    participant RateLimiter
    participant OTPService
    participant DB
    participant Twilio
    participant AuditLogger

    Client->>API: POST /auth/request-otp<br/>{phone_number}
    API->>API: Validate input
    API->>RateLimiter: Check active OTP (2-min rule)
    RateLimiter-->>API: No active OTP
    API->>RateLimiter: Rate limit by phone (3/10min)
    RateLimiter-->>API: Allowed
    API->>RateLimiter: Rate limit by IP (20/10min)
    RateLimiter-->>API: Allowed
    API->>API: Check IP blocking
    API->>OTPService: Generate OTP
    OTPService->>DB: Store hashed OTP
    OTPService->>RateLimiter: Mark active (2-min TTL)
    API->>Twilio: Send SMS
    Twilio-->>API: SMS sent (or error)
    API->>AuditLogger: Log otp_request event
    API-->>Client: {ok: true}

    Client->>API: POST /auth/verify-otp<br/>{phone_number, code, device_id}
    API->>API: Validate input
    API->>RateLimiter: Check failed attempts (10/hour)
    RateLimiter-->>API: Allowed
    API->>OTPService: Verify OTP (constant-time)
    OTPService->>DB: Query OTP (with dummy hash if not found)
    OTPService->>OTPService: bcrypt.compare (constant-time)
    alt OTP Valid
        OTPService->>DB: Delete OTP
        API->>DB: Find or create user
        API->>DB: Upsert device
        API->>API: Calculate risk score
        API->>AuditLogger: Log login (with risk level)
        API->>API: Issue access + refresh tokens
        API-->>Client: {user, access_token, refresh_token}
    else OTP Invalid
        OTPService->>DB: Increment attempt count
        API->>AuditLogger: Log suspicious attempt
        API-->>Client: {error: "OTP invalid or expired"}
    end

3.2 Token Refresh Flow

Step-by-Step:

Client requests token refresh (POST /auth/refresh)
- Input validation (refresh_token)
- Check if IP is blocked
- Decode refresh token to get key ID
- Verify refresh token signature (try all keys if key ID not found)
- Validate JWT claims (iss, aud, exp, iat)
- Query refresh token from database (by token_id)
- Verify token hash matches (bcrypt.compare)
- Check if token is revoked or expired
- Check refresh token idle timeout (max idle minutes)
- Calculate risk score (IP change, device change, user agent change)
- If suspicious: log suspicious refresh event
- If suspicious and REQUIRE_OTP_ON_SUSPICIOUS_REFRESH: return step_up_required error
- Update token last_used_at
- Revoke old refresh token
- Issue new access token and new refresh token (rotation)
- Update device last_seen_at
- Log audit event (token_refresh, risk level based on score)
- Return new tokens
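
The rotation, reuse-detection, and idle-timeout steps above can be sketched in memory as follows. This is a sketch under stated assumptions, not the code in src/services/tokenService.js: a Map stands in for the refresh_tokens table, SHA-256 stands in for bcrypt so the example needs no third-party package, and all function names and error strings are hypothetical.

```javascript
const crypto = require('crypto');

// In-memory stand-in for the refresh_tokens table (assumption for this sketch).
const tokens = new Map(); // tokenId -> { hash, userId, deviceId, revoked, lastUsedAt }

// SHA-256 replaces bcrypt here purely to keep the sketch dependency-free.
const sha256 = (value) => crypto.createHash('sha256').update(value).digest();

function issueRefreshToken(userId, deviceId, now = Date.now()) {
  const tokenId = crypto.randomUUID();
  const secret = crypto.randomBytes(32).toString('hex');
  tokens.set(tokenId, { hash: sha256(secret), userId, deviceId, revoked: false, lastUsedAt: now });
  return { tokenId, secret };
}

function revokeDeviceTokens(userId, deviceId) {
  for (const record of tokens.values()) {
    if (record.userId === userId && record.deviceId === deviceId) record.revoked = true;
  }
}

// Returns a new token on success; throws on invalid, reused, or idle tokens.
function rotateRefreshToken({ tokenId, secret }, maxIdleMinutes = 4320, now = Date.now()) {
  const record = tokens.get(tokenId);
  if (!record || !crypto.timingSafeEqual(record.hash, sha256(secret))) {
    throw new Error('invalid_refresh_token');
  }
  if (record.revoked) {
    // Reuse of an already rotated token is treated as a theft indicator.
    revokeDeviceTokens(record.userId, record.deviceId);
    throw new Error('refresh_token_reused');
  }
  if (now - record.lastUsedAt > maxIdleMinutes * 60 * 1000) {
    throw new Error('refresh_token_idle');
  }
  record.revoked = true; // revoke the old token before issuing its replacement
  return issueRefreshToken(record.userId, record.deviceId, now);
}
```

Presenting a rotated token a second time revokes every token for that user and device, so a stolen token and its legitimate successor both stop working and the user has to log in again.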

Mermaid Sequence Diagram:

sequenceDiagram
    participant Client
    participant API
    participant TokenService
    participant JWTKeys
    participant DB
    participant RiskScoring
    participant AuditLogger

    Client->>API: POST /auth/refresh<br/>{refresh_token}
    API->>API: Validate input
    API->>API: Check IP blocking
    API->>TokenService: Verify refresh token
    TokenService->>JWTKeys: Get key secret (by key ID)
    JWTKeys-->>TokenService: Key secret
    TokenService->>TokenService: Verify JWT signature
    TokenService->>TokenService: Validate claims (iss, aud, exp)
    TokenService->>DB: Query refresh token (by token_id)
    DB-->>TokenService: Token record
    TokenService->>TokenService: Verify token hash (bcrypt)
    alt Token Valid
        TokenService->>TokenService: Check expiry & idle timeout
        API->>RiskScoring: Calculate risk score
        RiskScoring->>DB: Get previous auth info
        RiskScoring-->>API: Risk score & reasons
        alt Suspicious Refresh
            API->>AuditLogger: Log suspicious refresh
            alt Require OTP
                API-->>Client: {error: "step_up_required"}
            else Allow with Risk
                API->>TokenService: Rotate refresh token
                TokenService->>DB: Revoke old token
                TokenService->>DB: Store new token
                API->>AuditLogger: Log refresh (SUSPICIOUS/HIGH_RISK)
                API-->>Client: {access_token, refresh_token}
            end
        else Normal Refresh
            API->>TokenService: Rotate refresh token
            TokenService->>DB: Revoke old token
            TokenService->>DB: Store new token
            API->>DB: Update device last_seen_at
            API->>AuditLogger: Log refresh (INFO)
            API-->>Client: {access_token, refresh_token}
        end
    else Token Invalid
        API-->>Client: {error: "Invalid refresh token"}
    end

3.3 Logout Flow

Step-by-Step:

Single-device logout (POST /auth/logout)
- Input validation (refresh_token)
- Verify refresh token (same as refresh flow)
- If token invalid/already revoked: return success (idempotent)
- Revoke all refresh tokens for user + device
- Log audit event (logout, INFO)
- Return success
Logout all other devices (POST /users/me/logout-all-other-devices)
- Requires authentication (access token)
- Requires step-up auth (recent OTP or high_assurance token)
- Rate limited (10 per hour per user)
- Get current device_id from header or body
- Mark all other devices as inactive
- Revoke refresh tokens for all other devices
- Log audit event (logout_all_other_devices, INFO)
- Return count of revoked devices
Logout from all devices (POST /users/me/logout-all-devices)
- Requires authentication (access token)
- Requires step-up auth (recent OTP or high_assurance token)
- Rate limited (10 per hour per user)
- Revoke all refresh tokens for the user (all devices)
- Mark all devices as inactive
- Increment user's token_version to invalidate all existing access tokens
- Log audit event (logout_all_devices, HIGH_RISK) - triggers security alert
- Return success with revoked tokens count
- Security Note: This is a critical security operation used when account compromise is suspected. All existing access tokens become invalid immediately, even if they haven't expired yet.
Revoke specific device (DELETE /users/me/devices/:device_id)
- Requires authentication (access token)
- Requires step-up auth (recent OTP or high_assurance token)
- Rate limited (10 per hour per user)
- Validate device_id parameter
- Mark device as inactive
- Revoke refresh tokens for device
- Log audit event (device_revoked, INFO)
- Return success
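
The step-up requirement shared by the last three operations (recent OTP or high_assurance token) can be sketched as a pure check. The function and field names are hypothetical; the 5-minute default mirrors STEP_UP_OTP_WINDOW_MINUTES.

```javascript
// Hypothetical step-up check: passes when the access token carries the
// high_assurance flag or an OTP was verified within the configured window.
function isStepUpSatisfied({ highAssurance, lastOtpVerifiedAt }, windowMinutes = 5, now = Date.now()) {
  if (highAssurance === true) return true;
  if (typeof lastOtpVerifiedAt !== 'number') return false; // no OTP on record
  const ageMs = now - lastOtpVerifiedAt;
  return ageMs >= 0 && ageMs <= windowMinutes * 60 * 1000;
}
```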

Mermaid Sequence Diagram:

sequenceDiagram
    participant Client
    participant API
    participant TokenService
    participant DB
    participant AuditLogger

    Note over Client,AuditLogger: Single Device Logout
    Client->>API: POST /auth/logout<br/>{refresh_token}
    API->>TokenService: Verify refresh token
    TokenService-->>API: Token info
    API->>TokenService: Revoke refresh token
    TokenService->>DB: Mark token revoked
    API->>AuditLogger: Log logout event
    API-->>Client: {ok: true}

    Note over Client,AuditLogger: Logout All Other Devices
    Client->>API: POST /users/me/logout-all-other-devices<br/>{current_device_id}
    API->>API: Verify access token
    API->>API: Check step-up auth
    API->>API: Rate limit check (10/hour)
    API->>DB: Mark other devices inactive
    API->>TokenService: Revoke tokens for other devices
    TokenService->>DB: Revoke tokens
    API->>AuditLogger: Log logout_all_other_devices
    API-->>Client: {ok: true, revoked_devices_count: N}

    Note over Client,AuditLogger: Logout All Devices (Global Logout)
    Client->>API: POST /users/me/logout-all-devices
    API->>API: Verify access token
    API->>API: Check step-up auth
    API->>API: Rate limit check (10/hour)
    API->>TokenService: Revoke all user tokens
    TokenService->>DB: Revoke all refresh tokens
    TokenService->>DB: Mark all devices inactive
    TokenService->>DB: Increment token_version
    API->>AuditLogger: Log logout_all_devices (HIGH_RISK)
    AuditLogger->>AuditLogger: Trigger security alert
    API-->>Client: {ok: true, revoked_tokens_count: N}

3.4 Admin Security Events Flow

Step-by-Step:

Admin requests security events (GET /admin/security-events)
- Requires authentication (access token)
- Requires admin role (security_admin)
- Rate limited (100 per 15 minutes per admin)
- Validate and sanitize query parameters (risk_level, limit, offset, search)
- Build parameterized SQL query (prevent injection)
- Query auth_audit table with filters
- Mask phone numbers (keep last 4 digits)
- Sanitize all output fields
- Get total count for pagination
- Get statistics (last 24 hours: total, high_risk, suspicious, info)
- Log admin access event (admin_view_security_events, INFO)
- Return events, pagination info, and statistics
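
The parameter sanitization, parameterized query, and phone masking steps above can be sketched as follows. All helper names are hypothetical, and several details are assumptions rather than facts from src/routes/adminRoutes.js: the column names risk_level and created_at, the default limit of 50, and the maximum limit of 200.

```javascript
// Hypothetical helper: masks every digit except the last four.
function maskPhone(phone) {
  const totalDigits = (phone.match(/\d/g) || []).length;
  let seen = 0;
  return phone.replace(/\d/g, (digit) => {
    seen += 1;
    return seen > totalDigits - 4 ? digit : '*';
  });
}

// Hypothetical helper: coerces limit/offset query parameters into safe integers.
function sanitizePagination(query, maxLimit = 200) {
  const limit = Number.parseInt(query.limit, 10);
  const offset = Number.parseInt(query.offset, 10);
  return {
    limit: Number.isInteger(limit) && limit > 0 ? Math.min(limit, maxLimit) : 50,
    offset: Number.isInteger(offset) && offset >= 0 ? offset : 0,
  };
}

// Builds a query with $n placeholders so user input never reaches the SQL text.
function buildEventsQuery({ riskLevel }, { limit, offset }) {
  const params = [];
  let sql = 'SELECT * FROM auth_audit';
  if (riskLevel) {
    params.push(riskLevel);
    sql += ` WHERE risk_level = $${params.length}`;
  }
  params.push(limit, offset);
  sql += ` ORDER BY created_at DESC LIMIT $${params.length - 1} OFFSET $${params.length}`;
  return { sql, params };
}
```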

Mermaid Sequence Diagram:

sequenceDiagram
    participant Admin
    participant API
    participant AuthMiddleware
    participant AdminAuth
    participant AdminRateLimit
    participant DB
    participant AuditLogger

    Admin->>API: GET /admin/security-events<br/>?risk_level=HIGH_RISK&limit=200
    API->>AuthMiddleware: Verify access token
    AuthMiddleware-->>API: User info
    API->>AdminAuth: Check admin role
    AdminAuth-->>API: Authorized
    API->>AdminRateLimit: Check rate limit (100/15min)
    AdminRateLimit-->>API: Allowed
    API->>API: Sanitize query params
    API->>DB: Query auth_audit (parameterized)
    DB-->>API: Events data
    API->>API: Mask phone numbers
    API->>API: Sanitize output
    API->>DB: Get total count
    API->>DB: Get statistics (24h)
    API->>AuditLogger: Log admin access
    API-->>Admin: {events, pagination, stats}

4. Timeouts, Expiry, and Limits

Name	ENV Variable / Config	Default Value	Defined In	What It Affects
OTP Expiry	`OTP_TTL_SECONDS`	`120` (2 minutes)	`src/services/otpService.js:10`	OTP validity period
OTP Resend Throttle	(hardcoded)	`120` seconds	`src/middleware/rateLimitMiddleware.js:154`	Minimum time between OTP requests for same phone
Max OTP Verification Attempts	`OTP_VERIFY_MAX_ATTEMPTS`	`5`	`src/services/otpService.js:12`	Maximum attempts to verify an OTP before it's invalidated
JWT Access Token Expiry	`JWT_ACCESS_TTL`	`'15m'` (15 minutes)	`src/config.js:72`	Access token lifetime
JWT Refresh Token Expiry	`JWT_REFRESH_TTL`	`'7d'` (7 days)	`src/config.js:73`	Refresh token lifetime
Refresh Token Max Idle	`REFRESH_MAX_IDLE_MINUTES`	`4320` (3 days)	`src/config.js:58-60`	Maximum idle time before refresh token expires
Step-Up Auth Window	`STEP_UP_OTP_WINDOW_MINUTES`	`5` minutes	`src/middleware/stepUpAuth.js:26`	Time window for "recent" OTP verification for step-up auth
OTP Request - Phone (10 min)	`OTP_REQ_PHONE_10MIN_LIMIT`	`3`	`src/middleware/rateLimitMiddleware.js:24`	Max OTP requests per phone per 10 minutes
OTP Request - Phone (24h)	`OTP_REQ_PHONE_DAY_LIMIT`	`10`	`src/middleware/rateLimitMiddleware.js:25`	Max OTP requests per phone per 24 hours
OTP Request - IP (10 min)	`OTP_REQ_IP_10MIN_LIMIT`	`20`	`src/middleware/rateLimitMiddleware.js:26`	Max OTP requests per IP per 10 minutes
OTP Request - IP (24h)	`OTP_REQ_IP_DAY_LIMIT`	`100`	`src/middleware/rateLimitMiddleware.js:27`	Max OTP requests per IP per 24 hours
OTP Verify Failed (1h)	`OTP_VERIFY_FAILED_PER_HOUR_LIMIT`	`10`	`src/middleware/rateLimitMiddleware.js:31`	Max failed verification attempts per phone per hour
Enumeration IP Block Duration	`ENUMERATION_BLOCK_DURATION`	`3600` (1 hour)	`src/middleware/rateLimitMiddleware.js:40`	Duration IP is blocked after enumeration detection
User Rate Limit - Read	`USER_RATE_LIMIT_READ_MAX`	`100`	`src/middleware/userRateLimit.js:25`	Max read requests per user per 15 minutes
User Rate Limit - Read Window	`USER_RATE_LIMIT_READ_WINDOW`	`900` (15 min)	`src/middleware/userRateLimit.js:26`	Time window for read rate limit
User Rate Limit - Write	`USER_RATE_LIMIT_WRITE_MAX`	`20`	`src/middleware/userRateLimit.js:29`	Max write requests per user per 15 minutes
User Rate Limit - Write Window	`USER_RATE_LIMIT_WRITE_WINDOW`	`900` (15 min)	`src/middleware/userRateLimit.js:30`	Time window for write rate limit
User Rate Limit - Sensitive	`USER_RATE_LIMIT_SENSITIVE_MAX`	`10`	`src/middleware/userRateLimit.js:33`	Max sensitive requests per user per hour
User Rate Limit - Sensitive Window	`USER_RATE_LIMIT_SENSITIVE_WINDOW`	`3600` (1 hour)	`src/middleware/userRateLimit.js:34`	Time window for sensitive rate limit
Admin Rate Limit	`ADMIN_RATE_LIMIT_MAX`	`100`	`src/middleware/adminRateLimit.js:23`	Max admin requests per admin per 15 minutes
Admin Rate Limit Window	`ADMIN_RATE_LIMIT_WINDOW`	`900` (15 min)	`src/middleware/adminRateLimit.js:24`	Time window for admin rate limit
Webhook HTTP Timeout	(hardcoded)	`5000` ms	`src/services/auditLogger.js:459`	Webhook request timeout (also used for Twilio if configured)
Webhook Retry Delay	(hardcoded)	`3000` ms	`src/services/auditLogger.js:498`	Delay before retrying failed webhook alerts
OTP Request Min Delay	`OTP_REQUEST_MIN_DELAY`	`500` ms	`src/utils/timingProtection.js:26`	Minimum delay for OTP requests (timing attack protection)
OTP Verify Min Delay	`OTP_VERIFY_MIN_DELAY`	`300` ms	`src/utils/timingProtection.js:30`	Minimum delay for OTP verification (timing attack protection)
Timing Max Jitter	`TIMING_MAX_JITTER`	`100` ms	`src/utils/timingProtection.js:34`	Maximum random jitter added to delays
Enumeration Max Phones/IP (10min)	`ENUMERATION_MAX_PHONES_PER_IP_10MIN`	`5`	`src/utils/enumerationDetection.js:32`	Max unique phone numbers per IP in 10 minutes
Enumeration Max Phones/IP (1h)	`ENUMERATION_MAX_PHONES_PER_IP_HOUR`	`20`	`src/utils/enumerationDetection.js:33`	Max unique phone numbers per IP in 1 hour
Enumeration Alert Threshold (10min)	`ENUMERATION_ALERT_THRESHOLD_10MIN`	`10`	`src/utils/enumerationDetection.js:40`	Unique phones threshold for alert (10 min)
Enumeration Alert Threshold (1h)	`ENUMERATION_ALERT_THRESHOLD_HOUR`	`50`	`src/utils/enumerationDetection.js:41`	Unique phones threshold for alert (1 hour)

5. Security Features

5.1 CORS Behavior

Configuration:

Startup Validation: CORS configuration is validated at startup (src/index.js:29-34)
Runtime Monitoring: Runtime CORS checks log warnings for suspicious patterns (src/index.js:58-63)
Origin Whitelisting: Only explicitly configured origins are allowed (never wildcard * when credentials are involved)
No-Origin Requests: Requests without an Origin header (mobile apps, Postman) are allowed

Implementation:

CORS_ALLOWED_ORIGINS: Comma-separated list of allowed origins (required in production)
Development mode: Allows all origins if no origins configured (with warning)
Production mode: Throws error if CORS_ALLOWED_ORIGINS is empty
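
The origin rules above can be sketched as two pure functions. The function names are hypothetical, and treating NODE_ENV === 'production' as the production switch is an assumption; the actual checks live in src/index.js and src/utils/corsValidator.js.

```javascript
// Hypothetical parser for CORS_ALLOWED_ORIGINS (comma-separated list).
function parseAllowedOrigins(raw, nodeEnv) {
  const origins = (raw || '').split(',').map((o) => o.trim()).filter(Boolean);
  if (nodeEnv === 'production' && origins.length === 0) {
    throw new Error('CORS_ALLOWED_ORIGINS must be set in production');
  }
  return origins;
}

// Decides whether a request origin is allowed under the rules described above.
function isOriginAllowed(origin, allowedOrigins, nodeEnv) {
  if (!origin) return true; // no Origin header (mobile apps, Postman)
  if (allowedOrigins.length === 0) {
    return nodeEnv !== 'production'; // development: allow all (with a warning)
  }
  return allowedOrigins.includes(origin); // exact match only, never a wildcard
}
```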

Files:

src/index.js:36-86 - CORS middleware configuration
src/utils/corsValidator.js - CORS validation utilities

5.2 Security Headers

Headers Set Globally:

X-Frame-Options: DENY - Prevents clickjacking
X-Content-Type-Options: nosniff - Prevents MIME type sniffing
X-XSS-Protection: 1; mode=block - Enables XSS filter (legacy browsers)
Strict-Transport-Security - HSTS (only in production, max-age=31536000, includeSubDomains, preload)
Content-Security-Policy - CSP with nonce support for inline scripts/styles
Referrer-Policy: strict-origin-when-cross-origin - Controls referrer information
Permissions-Policy - Restricts browser features (geolocation, microphone, camera, etc.)

Files:

src/middleware/securityHeaders.js - Security headers middleware

5.3 Authentication & Authorization

Authentication:

OTP-Based: Phone number + 6-digit OTP code
JWT Access Tokens: Short-lived (15 minutes), signed with HS256, include token_version claim
JWT Refresh Tokens: Long-lived (7 days), stored hashed in database, rotated on each use
Device Tracking: Tracks device identifier, platform, model, OS version, app version
Token Versioning: Access tokens include token_version claim that is validated against user's current version in database. When user logs out from all devices, token_version is incremented, invalidating all existing access tokens immediately.

Authorization:

Role-Based: Admin routes require role === 'security_admin'
Step-Up Auth: Sensitive operations require recent OTP verification or high_assurance token flag
Token Claims: Validates iss (issuer), aud (audience), exp (expiration), iat (issued at), token_version (for access token invalidation)
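
The claim checks above, including token_version, can be sketched as a function that runs after signature verification. The function name and return shape are hypothetical; the default issuer and audience mirror JWT_ISSUER and JWT_AUDIENCE.

```javascript
// Hypothetical claim checks applied after the JWT signature has been verified.
function validateAccessClaims(claims, user, options = {}) {
  const {
    issuer = 'farm-auth-service',
    audience = 'mobile-app',
    nowSeconds = Math.floor(Date.now() / 1000),
  } = options;
  if (claims.iss !== issuer) return { ok: false, reason: 'iss' };
  if (claims.aud !== audience) return { ok: false, reason: 'aud' };
  if (typeof claims.exp !== 'number' || claims.exp <= nowSeconds) return { ok: false, reason: 'exp' };
  if (typeof claims.iat !== 'number' || claims.iat > nowSeconds) return { ok: false, reason: 'iat' };
  // A global logout increments the stored version, so older tokens stop matching.
  if (claims.token_version !== user.token_version) return { ok: false, reason: 'token_version' };
  return { ok: true };
}
```

The token_version comparison needs the user's current version on every request, which is the cost of being able to invalidate unexpired access tokens immediately.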

Files:

src/middleware/authMiddleware.js - Access token validation
src/middleware/adminAuth.js - Admin role check
src/middleware/stepUpAuth.js - Step-up authentication

5.4 Audit Logging

Events Logged:

otp_request - OTP request (success/failed)
otp_verify - OTP verification (success/failed)
login - User login (success/blocked)
token_refresh - Token refresh (success, with risk level)
logout - User logout
device_revoked - Device revocation
logout_all_other_devices - Logout all other devices
logout_all_devices - Logout from all devices (HIGH_RISK, triggers security alert)
admin_view_security_events - Admin access to security dashboard

Risk Levels:

INFO - Normal operations
SUSPICIOUS - Unusual patterns (IP change, device change, multiple failures)
HIGH_RISK - Blocked IPs, high risk scores (>=50), enumeration attempts

Alerting:

Webhook Integration: Sends alerts to SECURITY_ALERT_WEBHOOK_URL for events at or above SECURITY_ALERT_MIN_LEVEL (default HIGH_RISK; set to SUSPICIOUS to also alert on suspicious events)
Anomaly Detection: Detects patterns (multiple failed OTPs, multiple high-risk events from same IP)
Retry Logic: Retries failed webhook alerts once after 3 seconds
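
The alert threshold and single-retry behavior above can be sketched as follows. Both function names are hypothetical, and the injected send function is an assumption that keeps the sketch independent of any HTTP client; the actual logic lives in src/services/auditLogger.js.

```javascript
// Risk levels in ascending order of severity, as listed above.
const RISK_ORDER = ['INFO', 'SUSPICIOUS', 'HIGH_RISK'];

// Hypothetical helper: true when an event's level meets the configured minimum
// (SECURITY_ALERT_MIN_LEVEL, default HIGH_RISK).
function shouldSendAlert(eventLevel, minLevel = 'HIGH_RISK') {
  const eventRank = RISK_ORDER.indexOf(eventLevel);
  const minRank = RISK_ORDER.indexOf(minLevel);
  if (eventRank === -1 || minRank === -1) return false; // unknown levels never alert
  return eventRank >= minRank;
}

// Sends an alert and retries once after retryDelayMs if the first attempt fails.
// `send` is an injected async function (assumption for this sketch).
async function sendWithSingleRetry(send, payload, retryDelayMs = 3000) {
  try {
    await send(payload);
    return true;
  } catch (firstError) {
    await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    try {
      await send(payload);
      return true;
    } catch (secondError) {
      console.error('Security alert delivery failed:', secondError.message);
      return false; // never throws, so the main request flow is not blocked
    }
  }
}
```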

Files:

src/services/auditLogger.js - Audit logging and webhook alerting
src/services/riskScoring.js - Risk score calculation

5.5 Data Protection

Field-Level Encryption:

Algorithm: AES-256-GCM (authenticated encryption)
Fields Encrypted: Phone numbers (before storing in database)
Key Management: 32-byte key from ENCRYPTION_KEY (base64 encoded)
Backward Compatibility: Handles both encrypted and plaintext data during migration
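
A minimal AES-256-GCM round trip with Node's built-in crypto module looks roughly like this. The helper names and the stored format (iv:tag:ciphertext, each part base64) are assumptions for illustration and may differ from src/utils/fieldEncryption.js.

```javascript
const crypto = require('crypto');

// Hypothetical helper; the storage format is an assumption for this sketch.
function encryptField(plaintext, keyBase64) {
  const key = Buffer.from(keyBase64, 'base64');
  if (key.length !== 32) throw new Error('ENCRYPTION_KEY must decode to 32 bytes');
  const iv = crypto.randomBytes(12); // 96-bit nonce, unique per encryption
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((part) => part.toString('base64')).join(':');
}

function decryptField(stored, keyBase64) {
  const key = Buffer.from(keyBase64, 'base64');
  const [iv, tag, ciphertext] = stored.split(':').map((part) => Buffer.from(part, 'base64'));
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag); // decryption throws if the data or tag was modified
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
```

A suitable key can be produced with crypto.randomBytes(32).toString('base64').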

Database Access Logging:

Optional Feature: Enabled with DB_ACCESS_LOGGING_ENABLED=true
Logs: All database queries with context (user ID, IP, user agent)
Use Case: Security auditing, compliance

Files:

src/utils/fieldEncryption.js - Field-level encryption
src/middleware/dbAccessLogger.js - Database access logging

5.6 Protection Against Attacks

Brute-Force / Enumeration:

Rate limiting at multiple levels (phone, IP, user)
Enumeration detection (tracks unique phone numbers per IP)
IP blocking for enumeration attempts (1 hour block)
Stricter rate limits when enumeration detected
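
Tracking unique phone numbers per IP can be sketched with an in-memory structure. The class name is hypothetical and the sketch keeps state in one process only; the defaults mirror ENUMERATION_MAX_PHONES_PER_IP_10MIN.

```javascript
// Hypothetical in-memory tracker: unique phone numbers seen per IP in a sliding window.
class EnumerationTracker {
  constructor({ windowMs = 10 * 60 * 1000, maxPhones = 5 } = {}) {
    this.windowMs = windowMs;
    this.maxPhones = maxPhones;
    this.seen = new Map(); // ip -> Map(phone -> lastSeenTimestamp)
  }

  // Records the request and returns true when the IP exceeds the threshold.
  record(ip, phone, now = Date.now()) {
    const phones = this.seen.get(ip) || new Map();
    phones.set(phone, now);
    for (const [number, timestamp] of phones) {
      if (now - timestamp > this.windowMs) phones.delete(number); // drop stale entries
    }
    this.seen.set(ip, phones);
    return phones.size > this.maxPhones;
  }
}
```

Repeated requests for the same phone number do not raise the count, so a user retrying a single number is not mistaken for an enumeration attempt.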

Timing Attacks:

Constant-time OTP verification (always performs bcrypt.compare, uses dummy hash if OTP not found)
Timing protection wrappers for OTP request and verification flows
Minimum delay enforcement to prevent timing leaks
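
Minimum delay enforcement with jitter can be sketched as a wrapper around an async handler. The names are hypothetical; the defaults mirror OTP_VERIFY_MIN_DELAY and TIMING_MAX_JITTER. Note that this pads responses up to a floor rather than making them strictly constant-time: a handler slower than the floor still takes its natural duration.

```javascript
const crypto = require('crypto');

// Hypothetical helper: how long to wait so the total duration is at least
// minDelayMs plus a random jitter in [0, maxJitterMs].
function remainingDelay(elapsedMs, minDelayMs, maxJitterMs) {
  const jitter = maxJitterMs > 0 ? crypto.randomInt(0, maxJitterMs + 1) : 0;
  return Math.max(0, minDelayMs + jitter - elapsedMs);
}

// Wraps an async handler so fast and slow paths take a similar amount of time.
async function withTimingProtection(handler, { minDelayMs = 300, maxJitterMs = 100 } = {}) {
  const startedAt = Date.now();
  try {
    return await handler();
  } finally {
    // Runs on success and on error, so failures are padded as well.
    const waitMs = remainingDelay(Date.now() - startedAt, minDelayMs, maxJitterMs);
    if (waitMs > 0) await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```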

Man-in-the-Middle:

HTTPS enforcement via HSTS header (production)
Security headers (CSP, X-Frame-Options) limit the impact of injected content and clickjacking
JWT token validation with signature verification

Token Replay:

Refresh token rotation (new token issued, old token revoked)
Reuse detection (if old token is used, all tokens for device are revoked)
Access token short expiry (15 minutes) limits replay window
Token versioning: Access tokens include token_version claim that is validated on each request. When user logs out from all devices, version is incremented, immediately invalidating all existing access tokens (even if not expired)

Files:

src/utils/timingProtection.js - Timing attack protection
src/utils/enumerationDetection.js - Enumeration detection
src/services/tokenService.js - Token rotation and reuse detection

6. Error Handling & Failure Modes

6.1 OTP Sending Failures

Behavior:

If Twilio is not configured: OTP is logged to console, request still succeeds
If Twilio fails: Error is logged, OTP is still generated and stored, request succeeds
Rationale: OTP generation should not fail if SMS delivery fails (user can check logs in development)

Error Response:

Success response returned even if SMS fails (for development/testing)
Production recommendation: Return error if SMS fails (uncomment error return in src/routes/authRoutes.js:213)

Files:

src/services/smsService.js - SMS sending with fallback logging

6.2 Database Failures

Behavior:

Connection pool errors: Logged, process exits (src/db.js:11-14)
Query errors: Propagated to route handler, return 500 error
No Retries: Database queries are not retried automatically (application-level retries can be added)

Error Response:

500 Internal Server Error with generic message: {error: 'Internal server error'}

Files:

src/db.js - Database connection and query wrapper

6.3 JWT Validation Errors

Behavior:

Invalid token format: 401 Unauthorized - {error: 'Invalid token format'}
Invalid/expired token: 401 Unauthorized - {error: 'Invalid or expired token'}
Invalid claims: 401 Unauthorized - {error: 'Invalid token claims'}
Missing Authorization header: 401 Unauthorized - {error: 'Missing Authorization header'}

Key Rotation:

If key ID not found: Tries all available keys (for rotation support)
If no key matches: Returns 401 Unauthorized
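
The key lookup with fallback can be sketched with a hand-rolled HS256 check built on Node's crypto module. This is illustrative only: all function names are hypothetical, a production service would normally rely on a maintained JWT library, and the keys object is assumed to have the shape of JWT_KEYS_JSON.

```javascript
const crypto = require('crypto');

const b64url = (input) => Buffer.from(input).toString('base64url');
const hmac = (data, secret) => crypto.createHmac('sha256', secret).update(data).digest();

// Hypothetical HS256 signer, included only to exercise the verifier below.
function signHs256(payload, secret, kid) {
  const header = kid ? { alg: 'HS256', typ: 'JWT', kid } : { alg: 'HS256', typ: 'JWT' };
  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(payload))}`;
  return `${signingInput}.${hmac(signingInput, secret).toString('base64url')}`;
}

function signatureMatches(signingInput, signature, secret) {
  const expected = hmac(signingInput, secret);
  const actual = Buffer.from(signature, 'base64url');
  return actual.length === expected.length && crypto.timingSafeEqual(actual, expected);
}

// Uses the key named by `kid` when it exists in `keys`; otherwise tries every key.
// Returns the payload, or null when no key verifies the signature (maps to a 401).
function verifyWithRotation(token, keys) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [headerPart, payloadPart, signature] = parts;
  let header;
  try {
    header = JSON.parse(Buffer.from(headerPart, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  if (!header || header.alg !== 'HS256') return null;
  const candidates = keys[header.kid] ? [keys[header.kid]] : Object.values(keys);
  const signingInput = `${headerPart}.${payloadPart}`;
  const verified = candidates.some((secret) => signatureMatches(signingInput, signature, secret));
  return verified ? JSON.parse(Buffer.from(payloadPart, 'base64url').toString('utf8')) : null;
}
```

Claim validation (iss, aud, exp, iat) would still run on the returned payload; this sketch covers only key selection and signature verification.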

Files:

src/middleware/authMiddleware.js - JWT validation
src/services/tokenService.js - Refresh token validation

6.4 Rate Limit Exceeded

Behavior:

OTP request rate limit: 429 Too Many Requests - {success: false, message: 'Too many OTP requests...'}
OTP verify rate limit: 429 Too Many Requests - {success: false, message: 'Too many attempts...'}
User route rate limit: 429 Too Many Requests - {error: 'Too many requests', retry_after: seconds}
Admin route rate limit: 429 Too Many Requests - {error: 'Too many requests', retry_after: seconds}

Headers:

X-RateLimit-Limit: Maximum requests allowed
X-RateLimit-Remaining: Remaining requests in window
X-RateLimit-Reset: ISO timestamp when limit resets
X-RateLimit-Type: Type of rate limit (read/write/sensitive/admin)
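
The headers and 429 body for user and admin routes can be assembled roughly as follows. The function name and the shape of its input are hypothetical; only the header names and body fields come from the lists above.

```javascript
// Hypothetical helper: builds the rate-limit headers and, when the limit is
// exceeded, the 429 body described above for user and admin routes.
function buildRateLimitResult({ limit, count, windowStartMs, windowSeconds, type }, now = Date.now()) {
  const resetMs = windowStartMs + windowSeconds * 1000;
  const headers = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, limit - count)),
    'X-RateLimit-Reset': new Date(resetMs).toISOString(),
    'X-RateLimit-Type': type,
  };
  if (count <= limit) return { allowed: true, headers };
  const retryAfter = Math.max(0, Math.ceil((resetMs - now) / 1000)); // seconds
  return {
    allowed: false,
    status: 429,
    headers,
    body: { error: 'Too many requests', retry_after: retryAfter },
  };
}
```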

Files:

src/middleware/rateLimitMiddleware.js - OTP rate limiting
src/middleware/userRateLimit.js - User route rate limiting
src/middleware/adminRateLimit.js - Admin rate limiting

6.5 Retries & Fallbacks

Redis Fallback:

If Redis unavailable: Falls back to in-memory store (per-process, not shared)
Rate limiting continues to work (with per-instance limits, not global)
Warning logged on first failure, then silent
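
The in-memory fallback can be sketched as a small counter store with expiry. The class name, function name, and key format are hypothetical; the limit of 3 requests per 600 seconds mirrors OTP_REQ_PHONE_10MIN_LIMIT.

```javascript
// Hypothetical in-memory stand-in used when Redis is unavailable.
// Counters live in this process only, so limits are per instance, not global.
class MemoryCounterStore {
  constructor() {
    this.entries = new Map(); // key -> { count, expiresAt }
  }

  // Mirrors the common Redis pattern of INCR followed by EXPIRE on first write.
  increment(key, ttlSeconds, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= now) {
      this.entries.set(key, { count: 1, expiresAt: now + ttlSeconds * 1000 });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

// Example: phone limit of 3 OTP requests per 10 minutes.
function isPhoneAllowed(store, phone, now = Date.now()) {
  return store.increment(`otp:req:phone:${phone}`, 600, now) <= 3;
}
```

With N service instances and no Redis, a client spread across instances could make up to N times the configured number of requests, which is why the fallback suits development better than a multi-instance production deployment.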

Webhook Alerting:

If webhook fails: Retries once after 3 seconds
If retry fails: Error logged, but main request flow continues (non-blocking)

Files:

src/services/redisClient.js - Redis client with graceful fallback
src/services/auditLogger.js:334-516 - Webhook alerting with retry

7. Configuration & Environment Variables

7.1 Required Variables

Variable	Description	Example	Required
`DATABASE_URL`	PostgreSQL connection string	`postgres://user:pass@localhost:5432/dbname`	✅ Yes
`JWT_ACCESS_SECRET`	Secret for signing access tokens (min 32 chars)	`hex-string-32-chars-minimum`	✅ Yes
`JWT_REFRESH_SECRET`	Secret for signing refresh tokens (min 32 chars)	`hex-string-32-chars-minimum`	✅ Yes

7.2 Optional Variables - Timeouts & Expiry

Variable	Description	Default	Example
`JWT_ACCESS_TTL`	Access token expiry	`15m`	`15m`, `1h`
`JWT_REFRESH_TTL`	Refresh token expiry	`7d`	`7d`, `30d`
`REFRESH_MAX_IDLE_MINUTES`	Refresh token max idle time	`4320` (3 days)	`4320`
`OTP_TTL_SECONDS`	OTP validity in seconds	`120` (2 min)	`120`
`STEP_UP_OTP_WINDOW_MINUTES`	Step-up auth window	`5`	`5`

7.3 Optional Variables - Rate Limits

Variable	Description	Default	Example
`OTP_REQ_PHONE_10MIN_LIMIT`	Max OTP requests per phone (10 min)	`3`	`3`
`OTP_REQ_PHONE_DAY_LIMIT`	Max OTP requests per phone (24h)	`10`	`10`
`OTP_REQ_IP_10MIN_LIMIT`	Max OTP requests per IP (10 min)	`20`	`20`
`OTP_REQ_IP_DAY_LIMIT`	Max OTP requests per IP (24h)	`100`	`100`
`OTP_VERIFY_MAX_ATTEMPTS`	Max OTP verification attempts	`5`	`5`
`OTP_VERIFY_FAILED_PER_HOUR_LIMIT`	Max failed verifications per phone (1h)	`10`	`10`
`USER_RATE_LIMIT_READ_MAX`	Max read requests per user (15 min)	`100`	`100`
`USER_RATE_LIMIT_WRITE_MAX`	Max write requests per user (15 min)	`20`	`20`
`USER_RATE_LIMIT_SENSITIVE_MAX`	Max sensitive requests per user (1h)	`10`	`10`
`ADMIN_RATE_LIMIT_MAX`	Max admin requests per admin (15 min)	`100`	`100`
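
As an illustration of how these limits might be consumed, the sketch below reads the four OTP request limits with the defaults from the table and checks a set of counters against them. The helper names and the shape of `counts` are assumptions; the counters are taken to include the current request, so a count equal to the limit is still allowed.

```javascript
// Hypothetical helper: read a positive integer from the environment with a default.
function readIntEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Defaults mirror the table above.
function loadOtpRequestLimits(env = process.env) {
  return {
    phone10Min: readIntEnv(env, "OTP_REQ_PHONE_10MIN_LIMIT", 3),
    phoneDay: readIntEnv(env, "OTP_REQ_PHONE_DAY_LIMIT", 10),
    ip10Min: readIntEnv(env, "OTP_REQ_IP_10MIN_LIMIT", 20),
    ipDay: readIntEnv(env, "OTP_REQ_IP_DAY_LIMIT", 100),
  };
}

// `counts` holds the hits per window, including the current request.
// The request is allowed only if every counter is within its limit.
function isOtpRequestAllowed(counts, limits) {
  return Object.keys(limits).every((key) => (counts[key] ?? 0) <= limits[key]);
}
```

Failing fast on a malformed value avoids silently running with a limit of `NaN`, which would make every comparison false.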

7.4 Optional Variables - Security Features

Variable	Description	Default	Example
`ENCRYPTION_ENABLED`	Enable field-level encryption	`false`	`true`
`ENCRYPTION_KEY`	32-byte encryption key (base64)	-	`base64-encoded-32-byte-key`
`DB_ACCESS_LOGGING_ENABLED`	Enable database access logging	`false`	`true`
`DB_ACCESS_LOG_LEVEL`	DB access log level ('all' or 'sensitive')	`sensitive`	`all`, `sensitive`
`CORS_ALLOWED_ORIGINS`	Comma-separated allowed origins	-	`https://app.example.com,https://api.example.com`
`ENUMERATION_MAX_PHONES_PER_IP_10MIN`	Max unique phones per IP (10 min)	`5`	`5`
`ENUMERATION_MAX_PHONES_PER_IP_HOUR`	Max unique phones per IP (1h)	`20`	`20`
`ENUMERATION_ALERT_THRESHOLD_10MIN`	Alert threshold for enumeration (10 min)	`10`	`10`
`ENUMERATION_ALERT_THRESHOLD_HOUR`	Alert threshold for enumeration (1h)	`50`	`50`
`OTP_REQUEST_MIN_DELAY`	Min delay for OTP requests (ms)	`500`	`500`
`OTP_VERIFY_MIN_DELAY`	Min delay for OTP verify (ms)	`300`	`300`
`TIMING_MAX_JITTER`	Max jitter for timing protection (ms)	`100`	`100`
`BLOCKED_IP_RANGES`	Comma-separated CIDR blocks	-	`10.0.0.0/8,172.16.0.0/12`
`REQUIRE_OTP_ON_SUSPICIOUS_REFRESH`	Require OTP on suspicious refresh	`false`	`true`
`SECURITY_ALERT_WEBHOOK_URL`	Webhook URL for security alerts	-	`https://hooks.slack.com/...`
`SECURITY_ALERT_MIN_LEVEL`	Minimum risk level for alerts	`HIGH_RISK`	`SUSPICIOUS`, `HIGH_RISK`
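
The three timing variables (`OTP_REQUEST_MIN_DELAY`, `OTP_VERIFY_MIN_DELAY`, `TIMING_MAX_JITTER`) suggest a pad-to-minimum scheme. The sketch below shows one way that could work; it assumes the jitter is added on top of the minimum delay, and `computePadding` and `withMinimumDelay` are hypothetical names rather than the service's own functions.

```javascript
// Hypothetical timing protection: a handler never responds faster than
// minDelayMs, and a random jitter of 0..maxJitterMs is added on top (assumed).
function computePadding(elapsedMs, minDelayMs, maxJitterMs, random = Math.random) {
  const jitter = Math.floor(random() * (maxJitterMs + 1)); // integer in 0..maxJitterMs
  return Math.max(0, minDelayMs - elapsedMs) + jitter;
}

async function withMinimumDelay(work, { minDelayMs = 500, maxJitterMs = 100 } = {}) {
  const started = Date.now();
  let outcome;
  try {
    outcome = { value: await work() };
  } catch (error) {
    outcome = { error }; // pad failures too, so errors are not faster than successes
  }
  const padding = computePadding(Date.now() - started, minDelayMs, maxJitterMs);
  await new Promise((resolve) => setTimeout(resolve, padding));
  if (outcome.error) throw outcome.error;
  return outcome.value;
}
```

Padding both the success and the failure path is the point of the technique: a caller cannot tell from response time alone whether a phone number exists or an OTP was close to valid.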

7.5 Optional Variables - JWT Key Rotation

Variable	Description	Default	Example
`JWT_ACTIVE_KEY_ID`	Key ID for signing new tokens	`1`	`1`, `2`
`JWT_KEYS_JSON`	JSON mapping key IDs to secrets	-	`{"1":"secret1","2":"secret2"}`
`JWT_REFRESH_KEY_ID`	Key ID for refresh tokens	Same as active	`1`
`JWT_ISSUER`	JWT issuer claim	`farm-auth-service`	`farm-auth-service`
`JWT_AUDIENCE`	JWT audience claim	`mobile-app`	`mobile-app`
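
To show how a key ID lets old and new secrets coexist during rotation, here is a minimal sketch using only Node's built-in crypto module. It assumes HS256 (HMAC-SHA256) signing and a `kid` header, which this section does not confirm, and it only covers the case where `JWT_KEYS_JSON` is set. `loadJwtKeys`, `sign`, and `verify` are hypothetical helpers, not the API of src/services/jwtKeys.js; expiry, issuer, and audience checks and malformed-token handling are omitted, so a maintained JWT library should be used in practice.

```javascript
const crypto = require("node:crypto");

// Hypothetical loader: maps key IDs to secrets and resolves the signing key IDs.
function loadJwtKeys(env = process.env) {
  const keys = new Map(Object.entries(JSON.parse(env.JWT_KEYS_JSON || "{}")));
  const activeKeyId = env.JWT_ACTIVE_KEY_ID || "1";
  const refreshKeyId = env.JWT_REFRESH_KEY_ID || activeKeyId; // "Same as active"
  for (const id of [activeKeyId, refreshKeyId]) {
    if (!keys.has(id)) throw new Error(`No secret configured for key ID "${id}"`);
  }
  return { activeKeyId, refreshKeyId, secretFor: (kid) => keys.get(kid) ?? null };
}

const b64url = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");

// Signs with the given key and records its ID in the header (HS256 assumed).
function sign(payload, kid, secret) {
  const data = `${b64url({ alg: "HS256", typ: "JWT", kid })}.${b64url(payload)}`;
  const signature = crypto.createHmac("sha256", secret).update(data).digest("base64url");
  return `${data}.${signature}`;
}

// Verifies with whichever configured key the token's `kid` names.
// Returns the payload, or null if the key is unknown or the signature is wrong.
function verify(token, secretFor) {
  const [header, payload, signature] = token.split(".");
  const { kid } = JSON.parse(Buffer.from(header, "base64url").toString());
  const secret = secretFor(kid);
  if (!secret) return null; // key retired or never configured
  const expected = crypto.createHmac("sha256", secret).update(`${header}.${payload}`).digest();
  const actual = Buffer.from(signature, "base64url");
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) return null;
  return JSON.parse(Buffer.from(payload, "base64url").toString());
}
```

With `JWT_KEYS_JSON={"1":"secret1","2":"secret2"}` and `JWT_ACTIVE_KEY_ID=2`, new tokens are signed with key 2 while tokens already issued under key 1 keep verifying until key 1 is removed from the JSON.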

7.6 Optional Variables - External Services

Variable	Description	Default	Example
`TWILIO_ACCOUNT_SID`	Twilio account SID	-	`ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
`TWILIO_AUTH_TOKEN`	Twilio auth token	-	`your_auth_token`
`TWILIO_MESSAGING_SERVICE_SID`	Twilio messaging service SID	-	`MGxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
`TWILIO_FROM_NUMBER`	Twilio phone number (E.164)	-	`+1234567890`
`REDIS_URL`	Redis connection URL	-	`redis://localhost:6379`
`REDIS_HOST`	Redis host	`localhost`	`localhost`
`REDIS_PORT`	Redis port	`6379`	`6379`
`REDIS_PASSWORD`	Redis password	-	`password`

7.7 Optional Variables - Server Configuration

Variable	Description	Default	Example
`PORT`	Server port	`3000`	`3000`
`NODE_ENV`	Environment	-	`development`, `production`
`TRUST_PROXY`	Trust proxy headers	`false`	`true`
`ENABLE_ADMIN_DASHBOARD`	Enable admin routes	`false`	`true`

8. Future Improvements / Notes

8.1 Planned Improvements (from TODOs in code)

Secrets Manager Integration
- Load JWT keys from AWS Secrets Manager / HashiCorp Vault (instead of environment variables)
- Load encryption keys from secrets manager
- File: src/services/jwtKeys.js:161-174 (TODO comment)
Automated Key Rotation
- Implement automated JWT key rotation without downtime
- Re-encrypt existing data when encryption keys are rotated
- File: src/services/jwtKeys.js (key rotation support exists, but automation needed)
SIEM Integration
- Integrate with SIEM systems (Splunk, ELK, etc.) for centralized log aggregation
- Export audit logs to SIEM for advanced threat detection
- File: src/services/auditLogger.js (webhook exists, but SIEM integration needed)
CSP Nonces
- Fully implement CSP nonces for inline scripts/styles (currently allows unsafe-inline for compatibility)
- File: src/middleware/securityHeaders.js:28-29 (nonce support exists but not fully utilized)
Database Connection Pooling Tuning
- Add configuration for connection pool size, timeout, etc.
- File: src/db.js (basic pool, no tuning options)
Rate Limiting Improvements
- Implement distributed rate limiting (currently per-instance if Redis unavailable)
- Add rate limit headers to all rate-limited endpoints
- File: src/middleware/rateLimitMiddleware.js (Redis fallback exists, but distributed limiting needed)
OTP Delivery Alternatives
- Support multiple SMS providers (fallback if Twilio fails)
- Support email OTP delivery
- Support push notification OTP delivery
- File: src/services/smsService.js (only Twilio supported)
Advanced Risk Scoring
- Machine learning-based risk scoring
- Geographic anomaly detection (unusual locations)
- Device fingerprinting improvements
- File: src/services/riskScoring.js (basic scoring exists)

8.2 Potential Risks & Technical Debt

In-Memory Rate Limiting
- If Redis is unavailable, rate limiting uses in-memory store (per-instance, not shared)
- Risk: Rate limits are enforced per instance, not globally, so a client spread across N instances can make up to N times the configured number of requests
- Mitigation: Always use Redis in production, or implement distributed rate limiting
OTP Storage
- OTPs are stored in database (not just Redis)
- Risk: Database can become a bottleneck for high-volume OTP requests
- Mitigation: Consider moving OTP storage to Redis entirely (with DB backup for audit)
Phone Number Encryption Migration
- The service handles both encrypted and plaintext phone numbers (for backward compatibility)
- Risk: Plaintext phone numbers remain in the database if encryption was enabled after data already existed
- Mitigation: Implement migration script to encrypt all existing phone numbers
Webhook Alerting
- Webhook failures are logged but don't block requests
- Risk: Security alerts might be missed if webhook is down
- Mitigation: Implement alert queue (Redis/RabbitMQ) with retry logic and dead-letter queue
Database Access Logging
- Database access logging is optional and can impact performance
- Risk: Performance degradation if enabled in high-traffic scenarios
- Mitigation: Use async logging, batch writes, or separate logging database
JWT Key Rotation
- Key rotation is supported, but the process is manual
- Risk: Manual key rotation can cause downtime if not done correctly
- Mitigation: Implement automated key rotation with gradual rollout
CORS Configuration
- CORS configuration is validated at startup, but runtime checks only log warnings
- Risk: Misconfiguration might not be caught until runtime
- Mitigation: Add stricter runtime validation or fail-fast on suspicious patterns
Error Messages
- Some error messages are generic to prevent information leakage
- Risk: Generic errors can make debugging difficult
- Mitigation: Log detailed errors server-side, return generic errors to clients

Appendix: Database Schema

Key Tables

users

id (UUID, PK)
phone_number (VARCHAR(20), UNIQUE, encrypted if ENCRYPTION_ENABLED)
name (VARCHAR(255))
role (enum: 'user', 'admin', 'moderator')
user_type (enum: 'seller', 'buyer', 'service_provider')
token_version (INT, DEFAULT 1) - Incremented on logout-all-devices to invalidate all access tokens
created_at, updated_at, last_login_at
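
The `token_version` column can be used as follows to invalidate every outstanding access token at once. This is a sketch: it assumes the version is embedded in the access token under a claim called `tokenVersion`, which is a hypothetical name, and it uses plain objects in place of database rows.

```javascript
// Hypothetical check run after the JWT signature has already been verified.
// `claims.tokenVersion` is an assumed claim name for the embedded version.
function isAccessTokenCurrent(claims, user) {
  return claims.sub === user.id && claims.tokenVersion === user.token_version;
}

// Logout-all-devices: bumping the version makes every older token stale.
function logoutAllDevices(user) {
  return { ...user, token_version: user.token_version + 1 };
}
```

No per-token blocklist is needed with this approach; the cost is one lookup of the user's current `token_version` on each authenticated request.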

otp_codes

id (UUID, PK)
phone_number (VARCHAR(20), encrypted if ENCRYPTION_ENABLED)
otp_hash (VARCHAR(255), bcrypt hash)
expires_at (TIMESTAMPTZ)
attempt_count (INT)
created_at (TIMESTAMPTZ)

refresh_tokens

id (UUID, PK)
user_id (UUID, FK)
token_id (UUID, UNIQUE)
token_hash (VARCHAR(255), bcrypt hash)
device_id (VARCHAR(255))
user_agent (TEXT)
ip_address (VARCHAR(45))
expires_at (TIMESTAMPTZ)
last_used_at (TIMESTAMPTZ)
revoked_at (TIMESTAMPTZ, NULL = active)
reuse_detected_at (TIMESTAMPTZ)
rotated_from_id (UUID, FK to refresh_tokens)
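
The `revoked_at`, `reuse_detected_at`, and `rotated_from_id` columns fit a rotate-on-use scheme with reuse detection. The sketch below works on an in-memory array of rows shaped like this table. It assumes that presenting an already-revoked token revokes all of that user's active tokens, which the schema alone does not confirm, and it skips the `token_hash` comparison for brevity. `rotateRefreshToken` is a hypothetical name.

```javascript
const crypto = require("node:crypto");

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical rotation over an in-memory array standing in for refresh_tokens.
function rotateRefreshToken(rows, tokenId, now = new Date(), ttlMs = SEVEN_DAYS_MS) {
  const current = rows.find((row) => row.token_id === tokenId);
  if (!current || current.expires_at <= now) {
    return { ok: false, reason: "invalid_or_expired" };
  }
  if (current.revoked_at) {
    // A rotated token was presented again: treat it as possible theft (assumed policy)
    // and revoke every token the user still has active.
    current.reuse_detected_at = now;
    for (const row of rows) {
      if (row.user_id === current.user_id && !row.revoked_at) row.revoked_at = now;
    }
    return { ok: false, reason: "reuse_detected" };
  }
  // Normal path: retire the presented token and issue its successor.
  current.revoked_at = now;
  current.last_used_at = now;
  const next = {
    id: crypto.randomUUID(),
    user_id: current.user_id,
    token_id: crypto.randomUUID(),
    device_id: current.device_id,
    expires_at: new Date(now.getTime() + ttlMs),
    last_used_at: now,
    revoked_at: null,
    reuse_detected_at: null,
    rotated_from_id: current.id, // links the chain back to its predecessor
  };
  rows.push(next);
  return { ok: true, token: next };
}
```

In a real database the lookup, revocation, and insert would need to run in one transaction so that two concurrent refreshes cannot both succeed with the same token.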

user_devices

id (UUID, PK)
user_id (UUID, FK)
device_identifier (TEXT)
device_platform (TEXT)
device_model (TEXT)
os_version (TEXT)
app_version (TEXT)
language_code (TEXT)
timezone (TEXT)
first_seen_at (TIMESTAMPTZ)
last_seen_at (TIMESTAMPTZ)
is_active (BOOLEAN)
UNIQUE (user_id, device_identifier)

auth_audit

id (UUID, PK)
user_id (UUID, FK, nullable)
action (VARCHAR(100))
status (VARCHAR(50))
risk_level (VARCHAR(20): 'INFO', 'SUSPICIOUS', 'HIGH_RISK')
ip_address (VARCHAR(45))
user_agent (TEXT)
device_id (VARCHAR(255))
meta (JSONB)
created_at (TIMESTAMPTZ)
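
The `risk_level` values pair with `SECURITY_ALERT_MIN_LEVEL` to decide which audit events trigger a webhook alert. A minimal sketch of that comparison, assuming the levels are ordered INFO < SUSPICIOUS < HIGH_RISK (`shouldAlert` is a hypothetical name):

```javascript
// Assumed ordering of the risk levels stored in auth_audit.risk_level.
const RISK_ORDER = ["INFO", "SUSPICIOUS", "HIGH_RISK"];

// True when an event at `riskLevel` meets or exceeds the configured minimum.
function shouldAlert(riskLevel, minLevel = "HIGH_RISK") {
  const level = RISK_ORDER.indexOf(riskLevel);
  const min = RISK_ORDER.indexOf(minLevel);
  if (level === -1 || min === -1) {
    throw new Error(`Unknown risk level: ${level === -1 ? riskLevel : minLevel}`);
  }
  return level >= min;
}
```

With the default minimum of `HIGH_RISK`, only the most severe events alert; setting `SECURITY_ALERT_MIN_LEVEL=SUSPICIOUS` widens alerting to both `SUSPICIOUS` and `HIGH_RISK` events.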

Document Version

Version: 1.0
Last Updated: 2024
Author: Architecture Documentation Generator
Maintained By: Development Team
