Farm Auth Service - Architecture Documentation
1. High-Level Overview
The Farm Auth Service is a Node.js + Express authentication and security service that provides phone-based authentication using OTP (One-Time Password) via SMS, JWT-based access and refresh tokens, comprehensive rate limiting, security hardening, and audit logging. The service is designed for a mobile application ecosystem where users authenticate using their phone numbers.
Core Functionality:
- Phone number-based authentication with OTP verification via SMS (Twilio)
- JWT access tokens (short-lived) and refresh tokens (long-lived) with rotation
- Device tracking and multi-device session management
- Comprehensive rate limiting at multiple levels (phone, IP, user)
- Security hardening: CORS validation, security headers, field-level encryption, timing attack protection, enumeration detection
- Audit logging with risk scoring and webhook alerting
- Admin dashboard for security event monitoring
External Systems:
- PostgreSQL Database: Stores users, OTP codes, refresh tokens, devices, and audit logs
- Redis (optional): Used for rate limiting counters and OTP tracking (falls back to in-memory store)
- Twilio: SMS provider for OTP delivery (optional - service works without it for development)
- Webhook Endpoints: For security alerts (Slack, Discord, or custom webhooks)
2. Architecture & Components
2.1 HTTP/API Layer
Files:
- src/index.js - Express server setup and middleware configuration
- src/routes/authRoutes.js - Authentication endpoints
- src/routes/userRoutes.js - User profile and device management endpoints
- src/routes/adminRoutes.js - Admin security dashboard endpoints
Responsibilities:
- Request routing and middleware orchestration
- Input validation and sanitization
- Response formatting
- Error handling
Middleware Order (Critical):
- Trust proxy configuration (if behind reverse proxy)
- CORS validation (startup and runtime)
- JSON body parser
- Security headers (global)
- Route-specific middleware (validation, rate limiting, auth)
Key Configuration:
- TRUST_PROXY: Set to 'true' if behind a reverse proxy (nginx, load balancer)
- CORS_ALLOWED_ORIGINS: Comma-separated list of allowed origins (required in production)
- ENABLE_ADMIN_DASHBOARD: Set to 'true' to enable admin routes
2.2 Authentication Core
Files:
- src/services/otpService.js - OTP generation, hashing (bcrypt), storage, and verification
- src/services/tokenService.js - JWT access/refresh token issuance, rotation, and validation
- src/services/jwtKeys.js - JWT key management with rotation support
- src/middleware/authMiddleware.js - JWT access token validation
- src/middleware/stepUpAuth.js - Step-up authentication for sensitive operations
Responsibilities:
- OTP generation (6-digit random codes)
- OTP hashing with bcrypt (10 rounds)
- OTP storage in database with expiry and attempt tracking
- JWT token signing with key rotation support
- Refresh token rotation and reuse detection
- Device fingerprinting and tracking
Key Features:
- OTP Security: Hashed with bcrypt, constant-time verification to prevent timing attacks
- Token Rotation: Refresh tokens rotate on each use, old tokens are revoked
- Reuse Detection: Detects if a refresh token is reused (theft indicator)
- Step-Up Auth: Requires recent OTP verification for sensitive operations
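The OTP handling described above can be sketched with Node's built-in crypto module. This is a minimal illustration, not the service's code: the service hashes with bcrypt, while this sketch substitutes salted scrypt so that it runs without third-party packages. The function names and the "salt:hash" storage format are assumptions.

```javascript
const crypto = require('node:crypto');

// Generate a 6-digit code from a CSPRNG; padStart keeps leading zeros.
function generateOtp() {
  return crypto.randomInt(0, 1_000_000).toString().padStart(6, '0');
}

// Stand-in for bcrypt.hash: salted scrypt, stored as "salt:hash" in hex.
function hashOtp(code) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(code, salt, 32);
  return `${salt.toString('hex')}:${hash.toString('hex')}`;
}

// Hash of a throwaway value, used when no OTP row exists so that the
// "not found" path performs the same work as the "found" path.
const DUMMY_HASH = hashOtp(crypto.randomBytes(8).toString('hex'));

function verifyOtp(code, storedHash) {
  const [saltHex, hashHex] = (storedHash || DUMMY_HASH).split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = crypto.scryptSync(code, Buffer.from(saltHex, 'hex'), expected.length);
  const match = crypto.timingSafeEqual(actual, expected);
  // A match against the dummy hash must never count as success.
  return Boolean(storedHash) && match;
}
```

The comparison always runs, whether or not a stored hash was found, which is the property the dummy hash exists to provide.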
2.3 Security Layer
Files:
- src/middleware/rateLimitMiddleware.js - OTP request/verification rate limiting
- src/middleware/userRateLimit.js - User route rate limiting (read/write/sensitive)
- src/middleware/adminRateLimit.js - Admin route rate limiting
- src/middleware/securityHeaders.js - Security headers (CSP, HSTS, X-Frame-Options, etc.)
- src/utils/corsValidator.js - CORS configuration validation
- src/utils/timingProtection.js - Timing attack protection for OTP flows
- src/utils/enumerationDetection.js - Phone number enumeration detection
- src/services/riskScoring.js - Risk scoring for login/refresh attempts
- src/middleware/validation.js - Input validation middleware
Responsibilities:
- Rate limiting at multiple levels (phone, IP, user, admin)
- Security headers enforcement
- CORS origin validation (startup and runtime)
- Timing attack mitigation (constant-time OTP verification)
- Enumeration detection and IP blocking
- Risk scoring based on IP/device changes
- Input validation and sanitization
Key Features:
- Multi-Level Rate Limiting: Phone-based, IP-based, and user-based limits
- Enumeration Protection: Detects and blocks IPs attempting phone number enumeration
- Timing Attack Protection: All OTP operations use constant-time execution
- Risk Scoring: Calculates risk scores for suspicious login/refresh attempts
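As an illustration of multi-level limiting, the sketch below applies the documented OTP request defaults (3 per 10 minutes and 10 per day per phone, 20 per 10 minutes and 100 per day per IP) with in-memory fixed-window counters. The function names, key format, and choice of a fixed window are assumptions; the actual middleware may count differently.

```javascript
// In-memory fixed-window counters keyed by "scope:id:window".
// A shared store such as Redis is needed for limits to hold across processes.
const counters = new Map();

function hit(key, windowMs, now) {
  const entry = counters.get(key);
  if (!entry || now >= entry.resetAt) {
    const fresh = { count: 1, resetAt: now + windowMs };
    counters.set(key, fresh);
    return fresh;
  }
  entry.count += 1;
  return entry;
}

// Limits mirror the documented defaults for OTP requests.
const OTP_REQUEST_RULES = [
  { scope: 'phone', windowMs: 10 * 60 * 1000, limit: 3 },
  { scope: 'phone', windowMs: 24 * 60 * 60 * 1000, limit: 10 },
  { scope: 'ip', windowMs: 10 * 60 * 1000, limit: 20 },
  { scope: 'ip', windowMs: 24 * 60 * 60 * 1000, limit: 100 },
];

function checkOtpRequest({ phone, ip }, now = Date.now()) {
  const ids = { phone, ip };
  for (const rule of OTP_REQUEST_RULES) {
    const key = `${rule.scope}:${ids[rule.scope]}:${rule.windowMs}`;
    const { count, resetAt } = hit(key, rule.windowMs, now);
    if (count > rule.limit) {
      // Stop at the first exceeded rule and report when it resets.
      return { allowed: false, scope: rule.scope, retryAfterSec: Math.ceil((resetAt - now) / 1000) };
    }
  }
  return { allowed: true };
}
```

Passing `now` explicitly keeps the function deterministic in tests; callers in a request handler would rely on the default.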
2.4 Persistence Layer
Files:
- src/db.js - PostgreSQL connection pool and query wrapper
- src/middleware/dbAccessLogger.js - Optional database access logging
- src/utils/fieldEncryption.js - Field-level encryption for PII (phone numbers)
- src/utils/encryptedPhoneSearch.js - Phone number search with encryption support
Database Tables:
- users - User accounts (phone number, name, role, user_type)
- otp_codes - OTP codes (hashed, with expiry and attempt tracking)
- refresh_tokens - Refresh tokens (hashed, with rotation tracking)
- user_devices - Device tracking (platform, model, OS, app version)
- auth_audit - Security audit logs (all authentication events)
Responsibilities:
- Database connection management
- Query execution with optional logging
- Field-level encryption for sensitive data (phone numbers)
- Database schema management (auto-creates tables if missing)
Key Features:
- Field-Level Encryption: Phone numbers encrypted at rest (AES-256-GCM)
- Database Access Logging: Optional logging of all DB queries (for security auditing)
- Backward Compatibility: Handles both encrypted and plaintext phone numbers during migration
2.5 Integration Layer
Files:
- src/services/smsService.js - Twilio SMS integration
- src/services/auditLogger.js - Audit logging with webhook alerting
- src/services/redisClient.js - Redis client with graceful fallback
Responsibilities:
- SMS delivery via Twilio (with fallback logging)
- Security event logging to database
- Webhook alerting for high-risk events
- Redis connection management (optional, falls back to in-memory)
Key Features:
- Twilio Integration: Sends OTP via SMS (optional - works without for development)
- Webhook Alerting: Sends alerts to Slack/Discord/custom webhooks for SUSPICIOUS/HIGH_RISK events
- Redis Fallback: Gracefully falls back to in-memory store if Redis unavailable
3. Request Flows
3.1 OTP Login Flow
Step-by-Step:
- Client requests OTP (POST /auth/request-otp)
  - Input validation (phone number format)
  - Check for active OTP (2-minute no-resend rule)
  - Rate limit by phone number (3 per 10 min, 10 per day)
  - Rate limit by IP address (20 per 10 min, 100 per day)
  - Check if IP is blocked (enumeration or CIDR ranges)
  - Enumeration detection (if suspicious, apply stricter limits)
  - Timing protection wrapper (constant-time execution)
  - Normalize phone number (E.164 format)
  - Generate 6-digit OTP code
  - Hash OTP with bcrypt (10 rounds)
  - Encrypt phone number (if encryption enabled)
  - Store OTP in database (delete old OTPs for same phone)
  - Mark OTP as active in Redis/memory (2-minute TTL)
  - Send SMS via Twilio (or log to console if not configured)
  - Log audit event (otp_request, INFO risk level)
  - Return success (even if SMS fails - OTP is generated)
- Client verifies OTP (POST /auth/verify-otp)
  - Input validation (phone number, 6-digit code, device_id, device_info)
  - Rate limit failed verifications (10 per hour per phone)
  - Check if IP is blocked
  - Timing protection wrapper (constant-time execution)
  - Normalize phone number
  - Encrypt phone number for search
  - Query OTP from database (with constant-time dummy hash if not found)
  - Check expiry, max attempts, and verify code (all with constant-time bcrypt.compare)
  - If invalid: increment attempt count, log suspicious event, return generic error
  - If valid: delete OTP, find or create user, decrypt phone number
  - Update user last_login_at
  - Upsert device record (track platform, model, OS, app version)
  - Calculate risk score (IP change, device change, user agent change)
  - Log audit event (login, risk level based on score)
  - Check for anomalies (multiple failed attempts, high-risk IPs)
  - Issue access token (with high_assurance flag) and refresh token
  - Return user data, tokens, and device info
Mermaid Sequence Diagram:
sequenceDiagram
participant Client
participant API
participant RateLimiter
participant OTPService
participant DB
participant Twilio
participant AuditLogger
Client->>API: POST /auth/request-otp<br/>{phone_number}
API->>API: Validate input
API->>RateLimiter: Check active OTP (2-min rule)
RateLimiter-->>API: No active OTP
API->>RateLimiter: Rate limit by phone (3/10min)
RateLimiter-->>API: Allowed
API->>RateLimiter: Rate limit by IP (20/10min)
RateLimiter-->>API: Allowed
API->>API: Check IP blocking
API->>OTPService: Generate OTP
OTPService->>DB: Store hashed OTP
OTPService->>RateLimiter: Mark active (2-min TTL)
API->>Twilio: Send SMS
Twilio-->>API: SMS sent (or error)
API->>AuditLogger: Log otp_request event
API-->>Client: {ok: true}
Client->>API: POST /auth/verify-otp<br/>{phone_number, code, device_id}
API->>API: Validate input
API->>RateLimiter: Check failed attempts (10/hour)
RateLimiter-->>API: Allowed
API->>OTPService: Verify OTP (constant-time)
OTPService->>DB: Query OTP (with dummy hash if not found)
OTPService->>OTPService: bcrypt.compare (constant-time)
alt OTP Valid
OTPService->>DB: Delete OTP
API->>DB: Find or create user
API->>DB: Upsert device
API->>API: Calculate risk score
API->>AuditLogger: Log login (with risk level)
API->>API: Issue access + refresh tokens
API-->>Client: {user, access_token, refresh_token}
else OTP Invalid
OTPService->>DB: Increment attempt count
API->>AuditLogger: Log suspicious attempt
API-->>Client: {error: "OTP invalid or expired"}
end
3.2 Token Refresh Flow
Step-by-Step:
- Client requests token refresh (POST /auth/refresh)
  - Input validation (refresh_token)
  - Check if IP is blocked
  - Decode refresh token to get key ID
  - Verify refresh token signature (try all keys if key ID not found)
  - Validate JWT claims (iss, aud, exp, iat)
  - Query refresh token from database (by token_id)
  - Verify token hash matches (bcrypt.compare)
  - Check if token is revoked or expired
  - Check refresh token idle timeout (max idle minutes)
  - Calculate risk score (IP change, device change, user agent change)
  - If suspicious: log suspicious refresh event
  - If suspicious and REQUIRE_OTP_ON_SUSPICIOUS_REFRESH: return step_up_required error
  - Update token last_used_at
  - Revoke old refresh token
  - Issue new access token and new refresh token (rotation)
  - Update device last_seen_at
  - Log audit event (token_refresh, risk level based on score)
  - Return new tokens
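The rotation and reuse-detection steps can be sketched with an in-memory map standing in for the refresh_tokens table. The record shape and function names are hypothetical, and signature, hash, expiry, and idle-timeout checks are omitted to keep the focus on rotation.

```javascript
const crypto = require('node:crypto');

// token_id -> { userId, deviceId, revoked }; stands in for the refresh_tokens table.
const refreshTokens = new Map();

function issueRefreshToken(userId, deviceId) {
  const tokenId = crypto.randomUUID();
  refreshTokens.set(tokenId, { userId, deviceId, revoked: false });
  return tokenId;
}

function revokeDeviceTokens(userId, deviceId) {
  for (const record of refreshTokens.values()) {
    if (record.userId === userId && record.deviceId === deviceId) record.revoked = true;
  }
}

function rotateRefreshToken(tokenId) {
  const record = refreshTokens.get(tokenId);
  if (!record) return { ok: false, reason: 'unknown_token' };
  if (record.revoked) {
    // Presenting an already-rotated token suggests theft:
    // revoke every token for that user and device.
    revokeDeviceTokens(record.userId, record.deviceId);
    return { ok: false, reason: 'reuse_detected' };
  }
  record.revoked = true; // the old token is single-use
  return { ok: true, refreshToken: issueRefreshToken(record.userId, record.deviceId) };
}
```

After a reuse is detected, the newest token for that device is revoked as well, so the legitimate client has to authenticate again.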
Mermaid Sequence Diagram:
sequenceDiagram
participant Client
participant API
participant TokenService
participant JWTKeys
participant DB
participant RiskScoring
participant AuditLogger
Client->>API: POST /auth/refresh<br/>{refresh_token}
API->>API: Validate input
API->>API: Check IP blocking
API->>TokenService: Verify refresh token
TokenService->>JWTKeys: Get key secret (by key ID)
JWTKeys-->>TokenService: Key secret
TokenService->>TokenService: Verify JWT signature
TokenService->>TokenService: Validate claims (iss, aud, exp)
TokenService->>DB: Query refresh token (by token_id)
DB-->>TokenService: Token record
TokenService->>TokenService: Verify token hash (bcrypt)
alt Token Valid
TokenService->>TokenService: Check expiry & idle timeout
API->>RiskScoring: Calculate risk score
RiskScoring->>DB: Get previous auth info
RiskScoring-->>API: Risk score & reasons
alt Suspicious Refresh
API->>AuditLogger: Log suspicious refresh
alt Require OTP
API-->>Client: {error: "step_up_required"}
else Allow with Risk
API->>TokenService: Rotate refresh token
TokenService->>DB: Revoke old token
TokenService->>DB: Store new token
API->>AuditLogger: Log refresh (SUSPICIOUS/HIGH_RISK)
API-->>Client: {access_token, refresh_token}
end
else Normal Refresh
API->>TokenService: Rotate refresh token
TokenService->>DB: Revoke old token
TokenService->>DB: Store new token
API->>DB: Update device last_seen_at
API->>AuditLogger: Log refresh (INFO)
API-->>Client: {access_token, refresh_token}
end
else Token Invalid
API-->>Client: {error: "Invalid refresh token"}
end
3.3 Logout Flow
Step-by-Step:
- Single-device logout (POST /auth/logout)
  - Input validation (refresh_token)
  - Verify refresh token (same as refresh flow)
  - If token invalid/already revoked: return success (idempotent)
  - Revoke all refresh tokens for user + device
  - Log audit event (logout, INFO)
  - Return success
- Logout all other devices (POST /users/me/logout-all-other-devices)
  - Requires authentication (access token)
  - Requires step-up auth (recent OTP or high_assurance token)
  - Rate limited (10 per hour per user)
  - Get current device_id from header or body
  - Mark all other devices as inactive
  - Revoke refresh tokens for all other devices
  - Log audit event (logout_all_other_devices, INFO)
  - Return count of revoked devices
- Logout from all devices (POST /users/me/logout-all-devices)
  - Requires authentication (access token)
  - Requires step-up auth (recent OTP or high_assurance token)
  - Rate limited (10 per hour per user)
  - Revoke all refresh tokens for the user (all devices)
  - Mark all devices as inactive
  - Increment user's token_version to invalidate all existing access tokens
  - Log audit event (logout_all_devices, HIGH_RISK) - triggers security alert
  - Return success with revoked tokens count
  - Security Note: This is a critical security operation used when account compromise is suspected. All existing access tokens become invalid immediately, even if they haven't expired yet.
- Revoke specific device (DELETE /users/me/devices/:device_id)
  - Requires authentication (access token)
  - Requires step-up auth (recent OTP or high_assurance token)
  - Rate limited (10 per hour per user)
  - Validate device_id parameter
  - Mark device as inactive
  - Revoke refresh tokens for device
  - Log audit event (device_revoked, INFO)
  - Return success
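The effect of incrementing token_version during a global logout can be shown in a few lines. This is a sketch under assumptions: a map stands in for the users table, and the user ID is assumed to travel in the `sub` claim, which the document does not specify.

```javascript
// userId -> current token_version; stands in for the users table.
const tokenVersions = new Map([['user-1', 1]]);

// Global logout: bump the version so previously issued access tokens stop validating.
function logoutAllDevices(userId) {
  const next = (tokenVersions.get(userId) ?? 0) + 1;
  tokenVersions.set(userId, next);
  return next;
}

// Intended to run after signature and expiry checks have already passed.
function isAccessTokenCurrent(claims) {
  const current = tokenVersions.get(claims.sub);
  return current !== undefined && current === claims.token_version;
}
```

A token carrying version 1 validates before the logout and fails afterwards, even though its `exp` has not been reached.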
Mermaid Sequence Diagram:
sequenceDiagram
participant Client
participant API
participant TokenService
participant DB
participant AuditLogger
Note over Client,AuditLogger: Single Device Logout
Client->>API: POST /auth/logout<br/>{refresh_token}
API->>TokenService: Verify refresh token
TokenService-->>API: Token info
API->>TokenService: Revoke refresh token
TokenService->>DB: Mark token revoked
API->>AuditLogger: Log logout event
API-->>Client: {ok: true}
Note over Client,AuditLogger: Logout All Other Devices
Client->>API: POST /users/me/logout-all-other-devices<br/>{current_device_id}
API->>API: Verify access token
API->>API: Check step-up auth
API->>API: Rate limit check (10/hour)
API->>DB: Mark other devices inactive
API->>TokenService: Revoke tokens for other devices
TokenService->>DB: Revoke tokens
API->>AuditLogger: Log logout_all_other_devices
API-->>Client: {ok: true, revoked_devices_count: N}
Note over Client,AuditLogger: Logout All Devices (Global Logout)
Client->>API: POST /users/me/logout-all-devices
API->>API: Verify access token
API->>API: Check step-up auth
API->>API: Rate limit check (10/hour)
API->>TokenService: Revoke all user tokens
TokenService->>DB: Revoke all refresh tokens
TokenService->>DB: Mark all devices inactive
TokenService->>DB: Increment token_version
API->>AuditLogger: Log logout_all_devices (HIGH_RISK)
AuditLogger->>AuditLogger: Trigger security alert
API-->>Client: {ok: true, revoked_tokens_count: N}
3.4 Admin Security Events Flow
Step-by-Step:
- Admin requests security events (GET /admin/security-events)
  - Requires authentication (access token)
  - Requires admin role (security_admin)
  - Rate limited (100 per 15 minutes per admin)
  - Validate and sanitize query parameters (risk_level, limit, offset, search)
  - Build parameterized SQL query (prevent injection)
  - Query auth_audit table with filters
  - Mask phone numbers (keep last 4 digits)
  - Sanitize all output fields
  - Get total count for pagination
  - Get statistics (last 24 hours: total, high_risk, suspicious, info)
  - Log admin access event (admin_view_security_events, INFO)
  - Return events, pagination info, and statistics
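Two of the steps above, building a parameterized query and masking phone numbers, lend themselves to a short sketch. The column names (risk_level, created_at), the paging bounds, and the function names are assumptions; the `{ text, values }` shape with `$1`-style placeholders is what node-postgres accepts.

```javascript
const RISK_LEVELS = new Set(['INFO', 'SUSPICIOUS', 'HIGH_RISK']);

// User input only ever reaches the values array, never the SQL text.
function buildSecurityEventsQuery({ risk_level, limit, offset }) {
  const where = [];
  const values = [];
  if (RISK_LEVELS.has(risk_level)) {
    values.push(risk_level);
    where.push(`risk_level = $${values.length}`);
  }
  // Clamp paging inputs; the bounds here are illustrative.
  const safeLimit = Math.min(Math.max(Number.parseInt(limit, 10) || 50, 1), 200);
  const safeOffset = Math.max(Number.parseInt(offset, 10) || 0, 0);
  values.push(safeLimit, safeOffset);
  const text =
    'SELECT * FROM auth_audit' +
    (where.length ? ` WHERE ${where.join(' AND ')}` : '') +
    ` ORDER BY created_at DESC LIMIT $${values.length - 1} OFFSET $${values.length}`;
  return { text, values };
}

// Keep only the last 4 characters visible.
function maskPhone(phone) {
  const s = String(phone);
  return s.length <= 4 ? s : '*'.repeat(s.length - 4) + s.slice(-4);
}
```

An unrecognized risk_level is dropped rather than interpolated, so a malicious value cannot alter the statement.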
Mermaid Sequence Diagram:
sequenceDiagram
participant Admin
participant API
participant AuthMiddleware
participant AdminAuth
participant AdminRateLimit
participant DB
participant AuditLogger
Admin->>API: GET /admin/security-events<br/>?risk_level=HIGH_RISK&limit=200
API->>AuthMiddleware: Verify access token
AuthMiddleware-->>API: User info
API->>AdminAuth: Check admin role
AdminAuth-->>API: Authorized
API->>AdminRateLimit: Check rate limit (100/15min)
AdminRateLimit-->>API: Allowed
API->>API: Sanitize query params
API->>DB: Query auth_audit (parameterized)
DB-->>API: Events data
API->>API: Mask phone numbers
API->>API: Sanitize output
API->>DB: Get total count
API->>DB: Get statistics (24h)
API->>AuditLogger: Log admin access
API-->>Admin: {events, pagination, stats}
4. Timeouts, Expiry, and Limits
| Name | ENV Variable / Config | Default Value | Defined In | What It Affects |
|---|---|---|---|---|
| OTP Expiry | OTP_TTL_SECONDS | 120 (2 minutes) | src/services/otpService.js:10 | OTP validity period |
| OTP Resend Throttle | (hardcoded) | 120 seconds | src/middleware/rateLimitMiddleware.js:154 | Minimum time between OTP requests for same phone |
| Max OTP Verification Attempts | OTP_VERIFY_MAX_ATTEMPTS | 5 | src/services/otpService.js:12 | Maximum attempts to verify an OTP before it's invalidated |
| JWT Access Token Expiry | JWT_ACCESS_TTL | '15m' (15 minutes) | src/config.js:72 | Access token lifetime |
| JWT Refresh Token Expiry | JWT_REFRESH_TTL | '7d' (7 days) | src/config.js:73 | Refresh token lifetime |
| Refresh Token Max Idle | REFRESH_MAX_IDLE_MINUTES | 4320 (3 days) | src/config.js:58-60 | Maximum idle time before refresh token expires |
| Step-Up Auth Window | STEP_UP_OTP_WINDOW_MINUTES | 5 minutes | src/middleware/stepUpAuth.js:26 | Time window for "recent" OTP verification for step-up auth |
| OTP Request - Phone (10 min) | OTP_REQ_PHONE_10MIN_LIMIT | 3 | src/middleware/rateLimitMiddleware.js:24 | Max OTP requests per phone per 10 minutes |
| OTP Request - Phone (24h) | OTP_REQ_PHONE_DAY_LIMIT | 10 | src/middleware/rateLimitMiddleware.js:25 | Max OTP requests per phone per 24 hours |
| OTP Request - IP (10 min) | OTP_REQ_IP_10MIN_LIMIT | 20 | src/middleware/rateLimitMiddleware.js:26 | Max OTP requests per IP per 10 minutes |
| OTP Request - IP (24h) | OTP_REQ_IP_DAY_LIMIT | 100 | src/middleware/rateLimitMiddleware.js:27 | Max OTP requests per IP per 24 hours |
| OTP Verify Failed (1h) | OTP_VERIFY_FAILED_PER_HOUR_LIMIT | 10 | src/middleware/rateLimitMiddleware.js:31 | Max failed verification attempts per phone per hour |
| Enumeration IP Block Duration | ENUMERATION_BLOCK_DURATION | 3600 (1 hour) | src/middleware/rateLimitMiddleware.js:40 | Duration IP is blocked after enumeration detection |
| User Rate Limit - Read | USER_RATE_LIMIT_READ_MAX | 100 | src/middleware/userRateLimit.js:25 | Max read requests per user per 15 minutes |
| User Rate Limit - Read Window | USER_RATE_LIMIT_READ_WINDOW | 900 (15 min) | src/middleware/userRateLimit.js:26 | Time window for read rate limit |
| User Rate Limit - Write | USER_RATE_LIMIT_WRITE_MAX | 20 | src/middleware/userRateLimit.js:29 | Max write requests per user per 15 minutes |
| User Rate Limit - Write Window | USER_RATE_LIMIT_WRITE_WINDOW | 900 (15 min) | src/middleware/userRateLimit.js:30 | Time window for write rate limit |
| User Rate Limit - Sensitive | USER_RATE_LIMIT_SENSITIVE_MAX | 10 | src/middleware/userRateLimit.js:33 | Max sensitive requests per user per hour |
| User Rate Limit - Sensitive Window | USER_RATE_LIMIT_SENSITIVE_WINDOW | 3600 (1 hour) | src/middleware/userRateLimit.js:34 | Time window for sensitive rate limit |
| Admin Rate Limit | ADMIN_RATE_LIMIT_MAX | 100 | src/middleware/adminRateLimit.js:23 | Max admin requests per admin per 15 minutes |
| Admin Rate Limit Window | ADMIN_RATE_LIMIT_WINDOW | 900 (15 min) | src/middleware/adminRateLimit.js:24 | Time window for admin rate limit |
| Twilio HTTP Timeout | (hardcoded) | 5000 ms | src/services/auditLogger.js:459 | Webhook request timeout (also used for Twilio if configured) |
| Webhook Retry Delay | (hardcoded) | 3000 ms | src/services/auditLogger.js:498 | Delay before retrying failed webhook alerts |
| OTP Request Min Delay | OTP_REQUEST_MIN_DELAY | 500 ms | src/utils/timingProtection.js:26 | Minimum delay for OTP requests (timing attack protection) |
| OTP Verify Min Delay | OTP_VERIFY_MIN_DELAY | 300 ms | src/utils/timingProtection.js:30 | Minimum delay for OTP verification (timing attack protection) |
| Timing Max Jitter | TIMING_MAX_JITTER | 100 ms | src/utils/timingProtection.js:34 | Maximum random jitter added to delays |
| Enumeration Max Phones/IP (10min) | ENUMERATION_MAX_PHONES_PER_IP_10MIN | 5 | src/utils/enumerationDetection.js:32 | Max unique phone numbers per IP in 10 minutes |
| Enumeration Max Phones/IP (1h) | ENUMERATION_MAX_PHONES_PER_IP_HOUR | 20 | src/utils/enumerationDetection.js:33 | Max unique phone numbers per IP in 1 hour |
| Enumeration Alert Threshold (10min) | ENUMERATION_ALERT_THRESHOLD_10MIN | 10 | src/utils/enumerationDetection.js:40 | Unique phones threshold for alert (10 min) |
| Enumeration Alert Threshold (1h) | ENUMERATION_ALERT_THRESHOLD_HOUR | 50 | src/utils/enumerationDetection.js:41 | Unique phones threshold for alert (1 hour) |
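A small loader shows how a few of these settings could be read with their documented defaults. The helper names and the returned object shape are hypothetical; only the variable names and default values come from the table.

```javascript
// Read a positive integer from an env-like object, falling back to the default.
function intFromEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadLimits(env = process.env) {
  return {
    otpTtlSeconds: intFromEnv(env, 'OTP_TTL_SECONDS', 120),
    otpVerifyMaxAttempts: intFromEnv(env, 'OTP_VERIFY_MAX_ATTEMPTS', 5),
    refreshMaxIdleMinutes: intFromEnv(env, 'REFRESH_MAX_IDLE_MINUTES', 4320),
    stepUpWindowMinutes: intFromEnv(env, 'STEP_UP_OTP_WINDOW_MINUTES', 5),
  };
}
```

Failing fast on a malformed value at startup is a design choice of this sketch; silently falling back to the default would be the alternative.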
5. Security Features
5.1 CORS Behavior
Configuration:
- Startup Validation: CORS configuration is validated at startup (src/index.js:29-34)
- Runtime Monitoring: Runtime CORS checks log warnings for suspicious patterns (src/index.js:58-63)
- Origin Whitelisting: Only explicitly configured origins are allowed (never wildcard * when credentials are involved)
- No-Origin Requests: Requests without origin (mobile apps, Postman) are allowed
Implementation:
- CORS_ALLOWED_ORIGINS: Comma-separated list of allowed origins (required in production)
- Development mode: Allows all origins if no origins configured (with warning)
- Production mode: Throws error if CORS_ALLOWED_ORIGINS is empty
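The startup and runtime rules above can be expressed as two small functions. This is a sketch, not the contents of corsValidator.js: the function names are hypothetical, and it assumes the environment is distinguished by a NODE_ENV-style string.

```javascript
// Startup: parse the comma-separated list and refuse an empty list in production.
function parseAllowedOrigins(raw, nodeEnv) {
  const origins = (raw || '').split(',').map((o) => o.trim()).filter(Boolean);
  if (nodeEnv === 'production' && origins.length === 0) {
    throw new Error('CORS_ALLOWED_ORIGINS must be set in production');
  }
  return origins;
}

// Runtime: decide whether a request's Origin header is acceptable.
function isOriginAllowed(origin, allowed, nodeEnv) {
  if (!origin) return true; // no Origin header: mobile apps, Postman
  if (allowed.length === 0) return nodeEnv !== 'production'; // development fallback
  return allowed.includes(origin); // exact match only, never a wildcard
}
```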
Files:
- src/index.js:36-86 - CORS middleware configuration
- src/utils/corsValidator.js - CORS validation utilities
5.2 Security Headers
Headers Set Globally:
- X-Frame-Options: DENY - Prevents clickjacking
- X-Content-Type-Options: nosniff - Prevents MIME type sniffing
- X-XSS-Protection: 1; mode=block - Enables XSS filter (legacy browsers)
- Strict-Transport-Security - HSTS (only in production, max-age=31536000, includeSubDomains, preload)
- Content-Security-Policy - CSP with nonce support for inline scripts/styles
- Referrer-Policy: strict-origin-when-cross-origin - Controls referrer information
- Permissions-Policy - Restricts browser features (geolocation, microphone, camera, etc.)
Files:
- src/middleware/securityHeaders.js - Security headers middleware
5.3 Authentication & Authorization
Authentication:
- OTP-Based: Phone number + 6-digit OTP code
- JWT Access Tokens: Short-lived (15 minutes), signed with HS256, include a token_version claim
- JWT Refresh Tokens: Long-lived (7 days), stored hashed in the database, rotated on each use
- Device Tracking: Tracks device identifier, platform, model, OS version, app version
- Token Versioning: Access tokens include a token_version claim that is validated against the user's current version in the database. When a user logs out from all devices, token_version is incremented, invalidating all existing access tokens immediately.
Authorization:
- Role-Based: Admin routes require role === 'security_admin'
- Step-Up Auth: Sensitive operations require recent OTP verification or the high_assurance token flag
- Token Claims: Validates iss (issuer), aud (audience), exp (expiration), iat (issued at), token_version (for access token invalidation)
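The step-up rule reduces to a single predicate. The sketch below assumes the high_assurance flag is a boolean claim and that the time of the last successful OTP verification is available as a millisecond timestamp; both the function name and the input shape are hypothetical.

```javascript
const STEP_UP_WINDOW_MS = 5 * 60 * 1000; // STEP_UP_OTP_WINDOW_MINUTES default

function hasStepUp({ claims, lastOtpVerifiedAt }, now = Date.now()) {
  // A token issued directly from OTP verification carries the flag.
  if (claims.high_assurance === true) return true;
  if (lastOtpVerifiedAt == null) return false;
  // Otherwise the last OTP verification must fall inside the window.
  return now - lastOtpVerifiedAt <= STEP_UP_WINDOW_MS;
}
```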
Files:
- src/middleware/authMiddleware.js - Access token validation
- src/middleware/adminAuth.js - Admin role check
- src/middleware/stepUpAuth.js - Step-up authentication
5.4 Audit Logging
Events Logged:
- otp_request - OTP request (success/failed)
- otp_verify - OTP verification (success/failed)
- login - User login (success/blocked)
- token_refresh - Token refresh (success, with risk level)
- logout - User logout
- device_revoked - Device revocation
- logout_all_other_devices - Logout all other devices
- logout_all_devices - Logout from all devices (HIGH_RISK, triggers security alert)
- admin_view_security_events - Admin access to security dashboard
Risk Levels:
- INFO - Normal operations
- SUSPICIOUS - Unusual patterns (IP change, device change, multiple failures)
- HIGH_RISK - Blocked IPs, high risk scores (>=50), enumeration attempts
Alerting:
- Webhook Integration: Sends alerts to SECURITY_ALERT_WEBHOOK_URL for SUSPICIOUS/HIGH_RISK events
- Anomaly Detection: Detects patterns (multiple failed OTPs, multiple high-risk events from same IP)
- Retry Logic: Retries failed webhook alerts once after 3 seconds
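The retry-once behavior can be sketched independently of any HTTP client by passing the sender in as a function. `sendAlertWithRetry` and the `send` callback are hypothetical names; the 3-second default matches the documented retry delay.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// send: async function that resolves on success and rejects on failure.
async function sendAlertWithRetry(send, payload, retryDelayMs = 3000) {
  try {
    await send(payload);
    return { delivered: true, attempts: 1 };
  } catch (firstError) {
    await sleep(retryDelayMs);
    try {
      await send(payload);
      return { delivered: true, attempts: 2 };
    } catch (secondError) {
      // Log and give up; alerting should not fail the request that triggered it.
      console.error('security alert webhook failed:', secondError.message);
      return { delivered: false, attempts: 2 };
    }
  }
}
```

A caller that wants alerting to stay non-blocking would invoke this without awaiting it in the request path.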
Files:
- src/services/auditLogger.js - Audit logging and webhook alerting
- src/services/riskScoring.js - Risk score calculation
5.5 Data Protection
Field-Level Encryption:
- Algorithm: AES-256-GCM (authenticated encryption)
- Fields Encrypted: Phone numbers (before storing in database)
- Key Management: 32-byte key from ENCRYPTION_KEY (base64 encoded)
- Backward Compatibility: Handles both encrypted and plaintext data during migration
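For reference, AES-256-GCM field encryption with Node's crypto module looks roughly like the sketch below. The function names and the dot-separated storage layout are assumptions and may differ from what fieldEncryption.js actually stores.

```javascript
const crypto = require('node:crypto');

// key: 32 raw bytes, for example Buffer.from(process.env.ENCRYPTION_KEY, 'base64').
function encryptField(plaintext, key) {
  const iv = crypto.randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Assumed layout: base64(iv).base64(tag).base64(ciphertext)
  return [iv, tag, ciphertext].map((b) => b.toString('base64')).join('.');
}

function decryptField(stored, key) {
  const [iv, tag, ciphertext] = stored.split('.').map((p) => Buffer.from(p, 'base64'));
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not authenticate the ciphertext.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
```

Because the nonce is random, encrypting the same phone number twice yields different ciphertexts.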
Database Access Logging:
- Optional Feature: Enabled with DB_ACCESS_LOGGING_ENABLED=true
- Logs: All database queries with context (user ID, IP, user agent)
- Use Case: Security auditing, compliance
Files:
- src/utils/fieldEncryption.js - Field-level encryption
- src/middleware/dbAccessLogger.js - Database access logging
5.6 Protection Against Attacks
Brute-Force / Enumeration:
- Rate limiting at multiple levels (phone, IP, user)
- Enumeration detection (tracks unique phone numbers per IP)
- IP blocking for enumeration attempts (1 hour block)
- Stricter rate limits when enumeration detected
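Enumeration detection amounts to counting distinct phone numbers per IP inside a time window. The sketch below uses the documented 10-minute default of 5 phones and an in-memory map; the function name, the sliding-window bookkeeping, and the "more than 5 is suspicious" reading of the threshold are assumptions.

```javascript
const WINDOW_MS = 10 * 60 * 1000;
const MAX_PHONES_PER_IP = 5; // ENUMERATION_MAX_PHONES_PER_IP_10MIN default

// ip -> Map(phone -> timestamp of the most recent request)
const seenByIp = new Map();

function recordOtpRequest(ip, phone, now = Date.now()) {
  const phones = seenByIp.get(ip) ?? new Map();
  seenByIp.set(ip, phones);
  phones.set(phone, now);
  // Drop phones whose last request fell out of the window.
  for (const [p, ts] of phones) {
    if (now - ts >= WINDOW_MS) phones.delete(p);
  }
  return { uniquePhones: phones.size, suspicious: phones.size > MAX_PHONES_PER_IP };
}
```

Repeated requests for the same phone do not raise the count, so a user retrying their own number is not flagged.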
Timing Attacks:
- Constant-time OTP verification (always performs bcrypt.compare, uses dummy hash if OTP not found)
- Timing protection wrappers for OTP request and verification flows
- Minimum delay enforcement to prevent timing leaks
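A timing protection wrapper of the kind described can be sketched as follows: the wrapped handler always takes at least a minimum duration plus random jitter, whether it succeeds or throws. `withMinDuration` is a hypothetical name, and this is not necessarily how timingProtection.js is written.

```javascript
const crypto = require('node:crypto');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withMinDuration(fn, minDelayMs, maxJitterMs = 100) {
  const started = Date.now();
  // Target duration: the minimum delay plus 0..maxJitterMs of random jitter.
  const target = minDelayMs + crypto.randomInt(0, maxJitterMs + 1);
  let outcome;
  try {
    outcome = { value: await fn() };
  } catch (error) {
    outcome = { error };
  }
  // Pad fast paths so success and failure are not distinguishable by speed.
  const remaining = target - (Date.now() - started);
  if (remaining > 0) await sleep(remaining);
  if ('error' in outcome) throw outcome.error;
  return outcome.value;
}
```

With the documented defaults, an OTP request handler would be wrapped as `withMinDuration(handler, 500, 100)` and a verification handler as `withMinDuration(handler, 300, 100)`.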
Man-in-the-Middle:
- HTTPS enforcement via HSTS header (production)
- Security headers (CSP, X-Frame-Options) limit content injection and clickjacking; they complement HTTPS but do not protect against interception on their own
- JWT token validation with signature verification
Token Replay:
- Refresh token rotation (new token issued, old token revoked)
- Reuse detection (if old token is used, all tokens for device are revoked)
- Access token short expiry (15 minutes) limits replay window
- Token versioning: Access tokens include a token_version claim that is validated on each request. When a user logs out from all devices, the version is incremented, immediately invalidating all existing access tokens (even if not expired)
Files:
- src/utils/timingProtection.js - Timing attack protection
- src/utils/enumerationDetection.js - Enumeration detection
- src/services/tokenService.js - Token rotation and reuse detection
6. Error Handling & Failure Modes
6.1 OTP Sending Failures
Behavior:
- If Twilio is not configured: OTP is logged to console, request still succeeds
- If Twilio fails: Error is logged, OTP is still generated and stored, request succeeds
- Rationale: OTP generation should not fail if SMS delivery fails (user can check logs in development)
Error Response:
- Success response returned even if SMS fails (for development/testing)
- Production recommendation: Return error if SMS fails (uncomment error return in src/routes/authRoutes.js:213)
Files:
- src/services/smsService.js - SMS sending with fallback logging
6.2 Database Failures
Behavior:
- Connection pool errors: Logged, process exits (src/db.js:11-14)
- Query errors: Propagated to route handler, return 500 error
- No Retries: Database queries are not retried automatically (application-level retries can be added)
Error Response:
- 500 Internal Server Error with generic message: {error: 'Internal server error'}
Files:
- src/db.js - Database connection and query wrapper
6.3 JWT Validation Errors
Behavior:
- Invalid token format: 401 Unauthorized - {error: 'Invalid token format'}
- Invalid/expired token: 401 Unauthorized - {error: 'Invalid or expired token'}
- Invalid claims: 401 Unauthorized - {error: 'Invalid token claims'}
- Missing Authorization header: 401 Unauthorized - {error: 'Missing Authorization header'}
Key Rotation:
- If key ID not found: Tries all available keys (for rotation support)
- If no key matches: Returns 401 Unauthorized
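The key lookup with fallback can be illustrated with a hand-rolled HS256 check built on Node's crypto module. This is a sketch for explanation only: it assumes the key ID travels in the standard `kid` header, that keys have the `{ id: secret }` shape of JWT_KEYS_JSON, and it skips claim validation (iss, aud, exp). A real service would normally rely on a maintained JWT library.

```javascript
const crypto = require('node:crypto');

const b64url = (input) => Buffer.from(input).toString('base64url');

// Helper used here only to produce tokens for the example.
function signHs256(payload, kid, secret) {
  const head = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT', kid }));
  const body = b64url(JSON.stringify(payload));
  const sig = crypto.createHmac('sha256', secret).update(`${head}.${body}`).digest('base64url');
  return `${head}.${body}.${sig}`;
}

function signatureMatches(parts, secret) {
  const expected = crypto.createHmac('sha256', secret).update(`${parts[0]}.${parts[1]}`).digest();
  const actual = Buffer.from(parts[2], 'base64url');
  return actual.length === expected.length && crypto.timingSafeEqual(actual, expected);
}

// Returns the decoded payload, or null when no key verifies the signature.
function verifyWithRotation(token, keys) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  let kid;
  try {
    kid = JSON.parse(Buffer.from(parts[0], 'base64url').toString('utf8')).kid;
  } catch {
    return null;
  }
  // Known key ID: use that key only. Unknown key ID: try every configured key.
  const known = Object.prototype.hasOwnProperty.call(keys, kid);
  const candidates = known ? [keys[kid]] : Object.values(keys);
  for (const secret of candidates) {
    if (signatureMatches(parts, secret)) {
      return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
    }
  }
  return null;
}
```

A null result corresponds to the 401 Unauthorized case above.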
Files:
- src/middleware/authMiddleware.js - JWT validation
- src/services/tokenService.js - Refresh token validation
6.4 Rate Limit Exceeded
Behavior:
- OTP request rate limit: 429 Too Many Requests - {success: false, message: 'Too many OTP requests...'}
- OTP verify rate limit: 429 Too Many Requests - {success: false, message: 'Too many attempts...'}
- User route rate limit: 429 Too Many Requests - {error: 'Too many requests', retry_after: seconds}
- Admin route rate limit: 429 Too Many Requests - {error: 'Too many requests', retry_after: seconds}
Headers:
- X-RateLimit-Limit: Maximum requests allowed
- X-RateLimit-Remaining: Remaining requests in window
- X-RateLimit-Reset: ISO timestamp when limit resets
- X-RateLimit-Type: Type of rate limit (read/write/sensitive/admin)
Files:
- src/middleware/rateLimitMiddleware.js - OTP rate limiting
- src/middleware/userRateLimit.js - User route rate limiting
- src/middleware/adminRateLimit.js - Admin rate limiting
6.5 Retries & Fallbacks
Redis Fallback:
- If Redis unavailable: Falls back to in-memory store (per-process, not shared)
- Rate limiting continues to work (with per-instance limits, not global)
- Warning logged on first failure, then silent
Webhook Alerting:
- If webhook fails: Retries once after 3 seconds
- If retry fails: Error logged, but main request flow continues (non-blocking)
Files:
- src/services/redisClient.js - Redis client with graceful fallback
- src/services/auditLogger.js:334-516 - Webhook alerting with retry
7. Configuration & Environment Variables
7.1 Required Variables
| Variable | Description | Example | Required |
|---|---|---|---|
| DATABASE_URL | PostgreSQL connection string | postgres://user:pass@localhost:5432/dbname | ✅ Yes |
| JWT_ACCESS_SECRET | Secret for signing access tokens (min 32 chars) | hex-string-32-chars-minimum | ✅ Yes |
| JWT_REFRESH_SECRET | Secret for signing refresh tokens (min 32 chars) | hex-string-32-chars-minimum | ✅ Yes |
7.2 Optional Variables - Timeouts & Expiry
| Variable | Description | Default | Example |
|---|---|---|---|
| JWT_ACCESS_TTL | Access token expiry | 15m | 15m, 1h |
| JWT_REFRESH_TTL | Refresh token expiry | 7d | 7d, 30d |
| REFRESH_MAX_IDLE_MINUTES | Refresh token max idle time | 4320 (3 days) | 4320 |
| OTP_TTL_SECONDS | OTP validity in seconds | 120 (2 min) | 120 |
| STEP_UP_OTP_WINDOW_MINUTES | Step-up auth window | 5 | 5 |
7.3 Optional Variables - Rate Limits
| Variable | Description | Default | Example |
|---|---|---|---|
| OTP_REQ_PHONE_10MIN_LIMIT | Max OTP requests per phone (10 min) | 3 | 3 |
| OTP_REQ_PHONE_DAY_LIMIT | Max OTP requests per phone (24h) | 10 | 10 |
| OTP_REQ_IP_10MIN_LIMIT | Max OTP requests per IP (10 min) | 20 | 20 |
| OTP_REQ_IP_DAY_LIMIT | Max OTP requests per IP (24h) | 100 | 100 |
| OTP_VERIFY_MAX_ATTEMPTS | Max OTP verification attempts | 5 | 5 |
| OTP_VERIFY_FAILED_PER_HOUR_LIMIT | Max failed verifications per phone (1h) | 10 | 10 |
| USER_RATE_LIMIT_READ_MAX | Max read requests per user (15 min) | 100 | 100 |
| USER_RATE_LIMIT_WRITE_MAX | Max write requests per user (15 min) | 20 | 20 |
| USER_RATE_LIMIT_SENSITIVE_MAX | Max sensitive requests per user (1h) | 10 | 10 |
| ADMIN_RATE_LIMIT_MAX | Max admin requests per admin (15 min) | 100 | 100 |
7.4 Optional Variables - Security Features
| Variable | Description | Default | Example |
|---|---|---|---|
| `ENCRYPTION_ENABLED` | Enable field-level encryption | `false` | `true` |
| `ENCRYPTION_KEY` | 32-byte encryption key (base64) | - | `base64-encoded-32-byte-key` |
| `DB_ACCESS_LOGGING_ENABLED` | Enable database access logging | `false` | `true` |
| `DB_ACCESS_LOG_LEVEL` | DB access log level (`all` or `sensitive`) | `sensitive` | `all`, `sensitive` |
| `CORS_ALLOWED_ORIGINS` | Comma-separated allowed origins | - | `https://app.example.com,https://api.example.com` |
| `ENUMERATION_MAX_PHONES_PER_IP_10MIN` | Max unique phones per IP (10 min) | `5` | `5` |
| `ENUMERATION_MAX_PHONES_PER_IP_HOUR` | Max unique phones per IP (1h) | `20` | `20` |
| `ENUMERATION_ALERT_THRESHOLD_10MIN` | Alert threshold for enumeration (10 min) | `10` | `10` |
| `ENUMERATION_ALERT_THRESHOLD_HOUR` | Alert threshold for enumeration (1h) | `50` | `50` |
| `OTP_REQUEST_MIN_DELAY` | Min delay for OTP requests (ms) | `500` | `500` |
| `OTP_VERIFY_MIN_DELAY` | Min delay for OTP verify (ms) | `300` | `300` |
| `TIMING_MAX_JITTER` | Max jitter for timing protection (ms) | `100` | `100` |
| `BLOCKED_IP_RANGES` | Comma-separated CIDR blocks | - | `10.0.0.0/8,172.16.0.0/12` |
| `REQUIRE_OTP_ON_SUSPICIOUS_REFRESH` | Require OTP on suspicious refresh | `false` | `true` |
| `SECURITY_ALERT_WEBHOOK_URL` | Webhook URL for security alerts | - | `https://hooks.slack.com/...` |
| `SECURITY_ALERT_MIN_LEVEL` | Minimum risk level for alerts | `HIGH_RISK` | `SUSPICIOUS`, `HIGH_RISK` |
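
The three timing variables (`OTP_REQUEST_MIN_DELAY`, `OTP_VERIFY_MIN_DELAY`, `TIMING_MAX_JITTER`) suggest a pattern in which every response is padded to a minimum duration plus a random jitter, so that response time does not reveal which code path ran. The sketch below illustrates that pattern under this assumption. The function names are hypothetical and the service's real middleware may combine the values differently.

```javascript
// Target duration: a fixed floor plus a random jitter in [0, maxJitterMs].
// `random` is injectable so the calculation can be tested deterministically.
function computeTargetDelayMs(minDelayMs, maxJitterMs, random = Math.random) {
  return minDelayMs + Math.floor(random() * (maxJitterMs + 1));
}

// Runs `handler` and then waits until the target duration has elapsed,
// whether the handler succeeded or threw.
async function withTimingProtection(handler, minDelayMs, maxJitterMs) {
  const startedAt = Date.now();
  const targetMs = computeTargetDelayMs(minDelayMs, maxJitterMs);
  try {
    return await handler();
  } finally {
    const remainingMs = targetMs - (Date.now() - startedAt);
    if (remainingMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, remainingMs));
    }
  }
}

// Possible usage for an OTP request with the documented defaults:
//   await withTimingProtection(() => requestOtp(phone), 500, 100);
// (`requestOtp` is a placeholder for the real handler.)
```

Padding in the `finally` branch means failures take as long as successes, which is the property timing protection needs.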
7.5 Optional Variables - JWT Key Rotation
| Variable | Description | Default | Example |
|---|---|---|---|
| `JWT_ACTIVE_KEY_ID` | Key ID for signing new tokens | `1` | `1`, `2` |
| `JWT_KEYS_JSON` | JSON mapping key IDs to secrets | - | `{"1":"secret1","2":"secret2"}` |
| `JWT_REFRESH_KEY_ID` | Key ID for refresh tokens | Same as active | `1` |
| `JWT_ISSUER` | JWT issuer claim | `farm-auth-service` | `farm-auth-service` |
| `JWT_AUDIENCE` | JWT audience claim | `mobile-app` | `mobile-app` |
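
To make the relationship between these variables concrete, here is a rough sketch of how a key ring could be built from `JWT_KEYS_JSON`, `JWT_ACTIVE_KEY_ID`, and `JWT_REFRESH_KEY_ID`. It is not the code in `src/services/jwtKeys.js`: the function names are hypothetical, the sketch covers only the case where `JWT_KEYS_JSON` is set, and it assumes tokens carry their key ID (for example in the standard JWT `kid` header) so that older keys can still verify.

```javascript
// Builds a key ring from the rotation variables (hypothetical helper).
function loadJwtKeyRing(env) {
  const keys = env.JWT_KEYS_JSON ? JSON.parse(env.JWT_KEYS_JSON) : {};
  const activeKeyId = String(env.JWT_ACTIVE_KEY_ID || '1');
  // Documented default: the refresh key ID falls back to the active key ID.
  const refreshKeyId = String(env.JWT_REFRESH_KEY_ID || activeKeyId);
  for (const id of [activeKeyId, refreshKeyId]) {
    if (typeof keys[id] !== 'string' || keys[id] === '') {
      throw new Error(`No secret configured for key ID ${id}`);
    }
  }
  return { keys, activeKeyId, refreshKeyId };
}

// New tokens are signed with the active key; verification looks the secret up
// by the token's key ID, so tokens issued before a rotation remain valid.
function secretForVerification(keyRing, keyId) {
  const secret = keyRing.keys[keyId];
  if (!secret) {
    throw new Error(`Unknown key ID ${keyId}`);
  }
  return secret;
}
```

Under this scheme a rotation consists of adding a new entry to `JWT_KEYS_JSON`, switching `JWT_ACTIVE_KEY_ID` to it, and removing the old entry once every token signed with it has expired.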
7.6 Optional Variables - External Services
| Variable | Description | Default | Example |
|---|---|---|---|
| `TWILIO_ACCOUNT_SID` | Twilio account SID | - | `ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` |
| `TWILIO_AUTH_TOKEN` | Twilio auth token | - | `your_auth_token` |
| `TWILIO_MESSAGING_SERVICE_SID` | Twilio messaging service SID | - | `MGxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` |
| `TWILIO_FROM_NUMBER` | Twilio phone number (E.164) | - | `+1234567890` |
| `REDIS_URL` | Redis connection URL | - | `redis://localhost:6379` |
| `REDIS_HOST` | Redis host | `localhost` | `localhost` |
| `REDIS_PORT` | Redis port | `6379` | `6379` |
| `REDIS_PASSWORD` | Redis password | - | `password` |
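
Twilio accepts either a messaging service SID or a sender number when sending an SMS, which is why both variables appear in the table. The sketch below shows one way the choice could be made. The precedence (messaging service first) and the helper name `buildSmsParams` are assumptions rather than documented behavior of `src/services/smsService.js`; the returned object uses the field names that the Twilio Node helper's `messages.create` accepts.

```javascript
// Builds the parameters for an outgoing OTP message, or returns null when
// Twilio is not configured (e.g. in local development).
function buildSmsParams(env, to, body) {
  if (!env.TWILIO_ACCOUNT_SID || !env.TWILIO_AUTH_TOKEN) {
    return null; // no credentials: SMS delivery is disabled
  }
  if (env.TWILIO_MESSAGING_SERVICE_SID) {
    // Assumed precedence: prefer the messaging service when both are set.
    return { to, body, messagingServiceSid: env.TWILIO_MESSAGING_SERVICE_SID };
  }
  if (env.TWILIO_FROM_NUMBER) {
    return { to, body, from: env.TWILIO_FROM_NUMBER };
  }
  return null; // credentials present but no sender configured
}

// Possible usage with the Twilio Node helper library:
//   const params = buildSmsParams(process.env, '+15550001', 'Your code is 123456');
//   if (params) await client.messages.create(params);
```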
7.7 Optional Variables - Server Configuration
| Variable | Description | Default | Example |
|---|---|---|---|
| `PORT` | Server port | `3000` | `3000` |
| `NODE_ENV` | Environment | - | `development`, `production` |
| `TRUST_PROXY` | Trust proxy headers | `false` | `true` |
| `ENABLE_ADMIN_DASHBOARD` | Enable admin routes | `false` | `true` |
8. Future Improvements / Notes
8.1 Planned Improvements (from TODOs in code)
- Secrets Manager Integration
  - Load JWT keys from AWS Secrets Manager / HashiCorp Vault (instead of environment variables)
  - Load encryption keys from secrets manager
  - File: `src/services/jwtKeys.js:161-174` (TODO comment)
- Automated Key Rotation
  - Implement automated JWT key rotation without downtime
  - Re-encrypt existing data when encryption keys are rotated
  - File: `src/services/jwtKeys.js` (key rotation support exists, but automation needed)
- SIEM Integration
  - Integrate with SIEM systems (Splunk, ELK, etc.) for centralized log aggregation
  - Export audit logs to SIEM for advanced threat detection
  - File: `src/services/auditLogger.js` (webhook exists, but SIEM integration needed)
- CSP Nonces
  - Fully implement CSP nonces for inline scripts/styles (currently allows `unsafe-inline` for compatibility)
  - File: `src/middleware/securityHeaders.js:28-29` (nonce support exists but not fully utilized)
- Database Connection Pooling Tuning
  - Add configuration for connection pool size, timeout, etc.
  - File: `src/db.js` (basic pool, no tuning options)
- Rate Limiting Improvements
  - Implement distributed rate limiting (currently per-instance if Redis unavailable)
  - Add rate limit headers to all rate-limited endpoints
  - File: `src/middleware/rateLimitMiddleware.js` (Redis fallback exists, but distributed limiting needed)
- OTP Delivery Alternatives
  - Support multiple SMS providers (fallback if Twilio fails)
  - Support email OTP delivery
  - Support push notification OTP delivery
  - File: `src/services/smsService.js` (only Twilio supported)
- Advanced Risk Scoring
  - Machine learning-based risk scoring
  - Geographic anomaly detection (unusual locations)
  - Device fingerprinting improvements
  - File: `src/services/riskScoring.js` (basic scoring exists)
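
As one illustration of the rate limiting item in the list above, the planned rate limit headers could be derived from a limiter's state roughly as follows. This is a sketch only: the function name `rateLimitHeaders` and its input shape are hypothetical, and the `X-RateLimit-*` names are a widely used convention rather than something the codebase is confirmed to use.

```javascript
// Builds response headers describing the caller's current rate limit state.
// state: { limit, used, resetAtMs, rejected } (hypothetical shape)
function rateLimitHeaders(state, nowMs = Date.now()) {
  const remaining = Math.max(0, state.limit - state.used);
  const headers = {
    'X-RateLimit-Limit': String(state.limit),
    'X-RateLimit-Remaining': String(remaining),
    // Window reset time as Unix seconds.
    'X-RateLimit-Reset': String(Math.ceil(state.resetAtMs / 1000)),
  };
  if (state.rejected) {
    // Tell a blocked client how many seconds to wait before retrying.
    const waitSeconds = Math.max(0, Math.ceil((state.resetAtMs - nowMs) / 1000));
    headers['Retry-After'] = String(waitSeconds);
  }
  return headers;
}

// Possible usage inside Express middleware:
//   res.set(rateLimitHeaders(state));
//   if (state.rejected) return res.status(429).json({ error: 'Too many requests' });
```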
8.2 Potential Risks & Technical Debt
- In-Memory Rate Limiting
  - If Redis is unavailable, rate limiting falls back to an in-memory store that is not shared between instances
  - Risk: Limits apply per instance rather than globally, so they can be bypassed when multiple instances are running
  - Mitigation: Always use Redis in production, or implement distributed rate limiting
- OTP Storage
  - OTPs are stored in the database (not just Redis)
  - Risk: The database can become a bottleneck for high-volume OTP requests
  - Mitigation: Consider moving OTP storage to Redis entirely (with DB backup for audit)
- Phone Number Encryption Migration
  - The service handles both encrypted and plaintext phone numbers (backward compatibility)
  - Risk: Plaintext phone numbers remain in the database if encryption was enabled after data already existed
  - Mitigation: Implement a migration script to encrypt all existing phone numbers
- Webhook Alerting
  - Webhook failures are logged but don't block requests
  - Risk: Security alerts might be missed if the webhook is down
  - Mitigation: Implement an alert queue (Redis/RabbitMQ) with retry logic and a dead-letter queue
- Database Access Logging
  - Database access logging is optional and can impact performance
  - Risk: Performance degradation if enabled in high-traffic scenarios
  - Mitigation: Use async logging, batch writes, or a separate logging database
- JWT Key Rotation
  - Key rotation is supported, but the process is manual
  - Risk: Manual key rotation can cause downtime if not done correctly
  - Mitigation: Implement automated key rotation with gradual rollout
- CORS Configuration
  - CORS configuration is validated at startup, but runtime checks only log warnings
  - Risk: Misconfiguration might not be caught until runtime
  - Mitigation: Add stricter runtime validation or fail fast on suspicious patterns
- Error Messages
  - Some error messages are generic to prevent information leakage
  - Risk: Generic errors can make debugging difficult
  - Mitigation: Log detailed errors server-side, return generic errors to clients
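
The retry-and-dead-letter mitigation proposed for webhook alerting can be sketched as a small state transition. Everything here is hypothetical, including the attempt budget, the backoff base, the job shape, and the function names; a real implementation would also need durable storage (for example Redis or RabbitMQ) behind it.

```javascript
const MAX_ATTEMPTS = 5;     // hypothetical retry budget
const BASE_DELAY_MS = 1000; // hypothetical delay before the first retry

// Exponential backoff: 1s, 2s, 4s, 8s for attempts 1 to 4.
function retryDelayMs(attempt) {
  return BASE_DELAY_MS * 2 ** (attempt - 1);
}

// Decides what happens to an alert job after a failed delivery attempt:
// schedule another try, or park it in the dead-letter state for manual review.
function onDeliveryFailure(job, nowMs) {
  const attempts = job.attempts + 1;
  if (attempts >= MAX_ATTEMPTS) {
    return { ...job, attempts, state: 'dead_letter' };
  }
  return {
    ...job,
    attempts,
    state: 'retry',
    nextAttemptAt: nowMs + retryDelayMs(attempts),
  };
}
```

Keeping the decision as a pure function makes the retry policy easy to test independently of whichever queue backend is chosen.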
Appendix: Database Schema
Key Tables
users
- `id` (UUID, PK)
- `phone_number` (VARCHAR(20), UNIQUE, encrypted if ENCRYPTION_ENABLED)
- `name` (VARCHAR(255))
- `role` (enum: 'user', 'admin', 'moderator')
- `user_type` (enum: 'seller', 'buyer', 'service_provider')
- `token_version` (INT, DEFAULT 1) - Incremented on logout-all-devices to invalidate all access tokens
- `created_at`, `updated_at`, `last_login_at`
otp_codes
- `id` (UUID, PK)
- `phone_number` (VARCHAR(20), encrypted if ENCRYPTION_ENABLED)
- `otp_hash` (VARCHAR(255), bcrypt hash)
- `expires_at` (TIMESTAMPTZ)
- `attempt_count` (INT)
- `created_at` (TIMESTAMPTZ)
refresh_tokens
- `id` (UUID, PK)
- `user_id` (UUID, FK)
- `token_id` (UUID, UNIQUE)
- `token_hash` (VARCHAR(255), bcrypt hash)
- `device_id` (VARCHAR(255))
- `user_agent` (TEXT)
- `ip_address` (VARCHAR(45))
- `expires_at` (TIMESTAMPTZ)
- `last_used_at` (TIMESTAMPTZ)
- `revoked_at` (TIMESTAMPTZ, NULL = active)
- `reuse_detected_at` (TIMESTAMPTZ)
- `rotated_from_id` (UUID, FK to refresh_tokens)
user_devices
- `id` (UUID, PK)
- `user_id` (UUID, FK)
- `device_identifier` (TEXT)
- `device_platform` (TEXT)
- `device_model` (TEXT)
- `os_version` (TEXT)
- `app_version` (TEXT)
- `language_code` (TEXT)
- `timezone` (TEXT)
- `first_seen_at` (TIMESTAMPTZ)
- `last_seen_at` (TIMESTAMPTZ)
- `is_active` (BOOLEAN)
- UNIQUE (user_id, device_identifier)
auth_audit
- `id` (UUID, PK)
- `user_id` (UUID, FK, nullable)
- `action` (VARCHAR(100))
- `status` (VARCHAR(50))
- `risk_level` (VARCHAR(20): 'INFO', 'SUSPICIOUS', 'HIGH_RISK')
- `ip_address` (VARCHAR(45))
- `user_agent` (TEXT)
- `device_id` (VARCHAR(255))
- `meta` (JSONB)
- `created_at` (TIMESTAMPTZ)
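
To show how the `refresh_tokens` and `users` columns might be read together, the sketch below classifies a refresh token row and checks an access token against `token_version`. The function names, the order of the checks, and the `token_version` claim name are assumptions made for illustration; only the column names and the 4320-minute idle default come from this document.

```javascript
// Classifies a refresh_tokens row. Timestamps may be Date objects or ISO strings.
function refreshTokenStatus(row, nowMs, maxIdleMinutes = 4320) {
  if (row.reuse_detected_at) return 'reuse_detected';
  if (row.revoked_at) return 'revoked'; // NULL revoked_at means still active
  if (new Date(row.expires_at).getTime() <= nowMs) return 'expired';
  if (row.last_used_at) {
    const idleMs = nowMs - new Date(row.last_used_at).getTime();
    if (idleMs > maxIdleMinutes * 60 * 1000) return 'idle_timeout';
  }
  return 'active';
}

// An access token is stale once the user's token_version has been incremented
// (for example by logout-all-devices). The claim name is an assumption.
function isAccessTokenCurrent(claims, user) {
  return claims.token_version === user.token_version;
}
```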
Document Version
- Version: 1.0
- Last Updated: 2024
- Author: Architecture Documentation Generator
- Maintained By: Development Team