auth-service-rs

View the Project on GitHub zs-dima/auth-service-rs

Rust gRPC authentication service + cross-platform client.

~35 min read

How Modern Type Systems Enable Infrastructure You Can Trust

Authentication is the front door of your platform. Every request hits it, every incident starts or ends there, and every millisecond of latency multiplies across your entire product.

For years, backend engineers have faced painful trade-offs:

Rust eliminates these compromises.

By combining Rust with Tonic (gRPC), SQLx (async SQL), and Tower (middleware), you get a stack where:

This article dissects a production-grade authentication service that demonstrates why infrastructure-critical services are increasingly being built in Rust. We explore architectural innovations that are only possible with Rust’s type system:

Note on numbers: The memory/latency/cost figures below are typical “order of magnitude” comparisons. Your results will depend on dependencies, runtime flags, database, traffic shape, and deployment environment.


What This System Delivers

Live Demo: auth-demo.dmitrii.app
Source Code: Server (Rust)Client (Flutter)

TL;DR (For Skimmers)

Who This Article Is For

How to Read This (So You Don’t Get Lost)


Table of Contents

Part I: Why Rust Changes Everything

Part II: Architecture

Part III: Security & OWASP Compliance

Part IV: Authentication Flows

Part V: Production Operations

Part VI: Getting Started & Comparison

Wrap-Up

Appendix


Part I: Why Rust Changes Everything

The Resource Efficiency Revolution

Authentication services are high-traffic gateways — every request touches them. They must be fast, secure, and economical to run at scale.

The Economics of Memory

Runtime Idle Memory P99 Latency Cost @ 100 Replicas
Java/Spring Boot 250–500 MB 50–200 ms 25–50 GB requested
Node.js/Express 70–150 MB 20–80 ms 7–15 GB requested
Go/gRPC 20–40 MB 5–20 ms 2–4 GB requested
Rust/Tonic 5–15 MB 2–10 ms 0.5–1.5 GB

These ranges are illustrative, not guarantees. Baseline memory/latency depends heavily on framework choices, TLS, logging/metrics, connection pools, and workload. Use this as a mental model, then benchmark your own service.

In Kubernetes — where you often pay for requested resources — a smaller and more predictable baseline makes it easier to right-size requests and scale more granularly.

The real breakthrough isn’t just efficiency — it’s predictability. Without a garbage collector, there are no GC pauses — removing one common source of tail-latency spikes. P99 latencies remain consistently low under load, which is critical when every millisecond of authentication latency cascades across your platform.


Compile-Time SQL Verification

Most database bugs surface in production. A renamed column, a missing WHERE clause, or a type mismatch — each produces a runtime error that could have been prevented.

SQLx eliminates this entire class of bugs through compile-time verification.

How It Works

let user = sqlx::query_as!(
    User,
    "SELECT id, email FROM auth.users WHERE email = $1",
    email
)
.fetch_optional(&pool)
.await?;

During cargo build, SQLx:

  1. Connects to your database
  2. Parses and validates the SQL query
  3. Verifies column types match your Rust struct
  4. Fails compilation if anything doesn’t match

CI tip: SQLx also supports an offline workflow (via cargo sqlx prepare) so builds don’t need a live database connection. This keeps CI deterministic while preserving compile-time validation.

A renamed email column? The build breaks. This moves bugs from runtime (where they hurt users) to compile-time (where they only annoy developers).

The Impact

Before SQLx, you’d discover this in production:

ERROR: column "email" does not exist

With SQLx, you discover it during cargo build:

error: no such column: `email` in table `users`
 --> src/services/auth.rs:42:5

Rust’s macro ecosystem makes this kind of schema-aware checking unusually ergonomic. SQLx’s online/offline workflow is one of the most seamless ways to get compile-time confidence in real SQL.


Zero-Cost Async & Type-Driven Security

Zero-Cost Abstractions

Rust compiles async/await into state machines at compile time. There are no runtime allocations and no scheduler overhead. Futures remain lightweight (often stack-resident until you spawn them), while Tokio provides the scheduling and I/O layer with low, predictable overhead:

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = Config::init()?;
    telemetry::init(&config.telemetry)?;
    
    let (app, addr) = startup::build_app(&config).await?;
    let listener = TcpListener::bind(addr).await?;
    
    info!("Server listening on {addr}");
    axum::serve(listener, app)
        .with_graceful_shutdown(shutdown_signal())
        .await?;
    
    Ok(())
}

Every await becomes a state transition in that generated state machine. In practice, the runtime cost is often close to what you’d write by hand in a lower-level language—while still preserving memory safety.

Type-Driven Security

Rust’s type system enforces security patterns:

1. Secret Protection

use secrecy::SecretString;

pub struct Config {
    jwt_secret: SecretString,  // Can't accidentally log
    db_password: SecretString, // Debug trait redacts value
}

2. Lifetime Safety

// Compiler ensures database connections don't outlive transactions
pub async fn create_session<'a>(
    pool: &'a PgPool,
    params: CreateSessionParams<'a>,  // Borrows from caller
) -> Result<Session, AppError>

3. Thread-Safe Sharing

// Arc provides safe sharing without locks
let context = Arc::new(ServiceContext::new(db, email, s3));
let auth_service = AuthService::new(context.clone());
let user_service = UserService::new(context.clone());

During development, the compiler caught timing attack vulnerabilities that would have been subtle runtime bugs in other languages. For example, accidentally cloning a token hash instead of referencing it would create a timing side-channel — Rust’s borrow checker flagged this immediately.


Part II: Architecture

Architecture Overview

The service follows a layered architecture with clear separation of concerns:

graph TB
    subgraph Clients["📱 Client Layer"]
        Mobile["Mobile Apps<br/>(Native gRPC)"]
        Web["Web Apps<br/>(gRPC-Web)"]
        Browser["Browsers<br/>(REST)"]
    end

    subgraph Gateway["🚪 Gateway - Single Port"]
        Router["Axum Router<br/>(Protocol Detection)"]
    end

    subgraph Middleware["🛡️ Middleware Stack (Tower)"]
        MW["RequestID → Tracing → Auth → CORS → Timeout"]
    end

    subgraph Protocol["📡 Protocol Handlers"]
        GrpcWeb["gRPC + gRPC-Web"]
        RestAPI["REST API"]
    end

    subgraph Services["⚙️ Service Layer"]
        AuthSvc["AuthService"]
        UserSvc["UserService"]
        HealthSvc["HealthService"]
    end

    subgraph Context["🔧 ServiceContext (Shared Infrastructure)"]
        DB["Database Pool"]
        Email["Email Provider"]
        S3["S3 Storage"]
        GeoIP["GeoIP"]
        JWT["JWT Validator"]
        URLs["URL Builder"]
    end

    subgraph Data["💾 Data Layer"]
        Postgres[("PostgreSQL<br/>Atomic Functions")]
    end

    Mobile & Web & Browser --> Router
    Router --> MW
    MW --> GrpcWeb & RestAPI

    GrpcWeb --> AuthSvc & UserSvc & HealthSvc
    RestAPI --> HealthSvc

    AuthSvc --> DB & Email & GeoIP & JWT
    UserSvc --> DB & S3 & Email
    HealthSvc --> DB

    DB --> Postgres
    
    style Gateway fill:#4CAF50,color:#fff
    style Middleware fill:#2196F3,color:#fff
    style Services fill:#FF9800,color:#fff
    style Data fill:#FF5722,color:#fff

Key Architectural Principles

1. Single Responsibility via Crate Organization

The project uses Cargo workspace to separate concerns:

auth-service-rs/
├── src/                     # Service binary (wiring, routing)
│   ├── main.rs              # Entry point, telemetry setup
│   ├── startup.rs           # Server wiring, dependency injection
│   ├── routes.rs            # REST endpoints (health, verify-email)
│   ├── config.rs            # Environment config with validation
│   ├── services/            # gRPC services implementations
│   ├── middleware/          # Tower layers (auth, tracing, IP)
│   └── core/                # Service context, URLs, password utils
├── crates/
│   ├── proto/               # Protobuf definitions + codegen
│   ├── core/                # Domain logic (JWT, tokens, password, validation)
│   ├── db/                  # SQLx repositories, migrations, atomic SQL functions
│   ├── email/               # SMTP provider
│   ├── mailjet/             # Mailjet provider
│   ├── storage/             # S3-compatible storage
│   └── telemetry/           # OTel, Prometheus, Sentry glue
└── proto/                   # Raw .proto files (auth.proto, users.proto)

Benefits of this structure:

2. Dependency Injection via ServiceContext

3. Compile-Time Verification

4. Zero-Trust Security

5. Graceful Degradation


Protocol Multiplexing: One Port for Everything

Traditional microservices run gRPC and REST on separate ports, requiring:

The Innovation

This service handles gRPC, gRPC-Web, and REST on one port using Axum’s router composition:

pub async fn build_app(config: &Config) -> Result<(Router, SocketAddr)> {
    // gRPC services
    let grpc_routes = Routes::new(health_service)
        .add_service(auth_service)
        .add_service(user_service);

    // REST routes
    let rest_router = rest_routes(state);

    // gRPC-Web layer enables browsers (HTTP/1.1)
    let grpc_router = grpc_routes
        .into_axum_router()
        .layer(GrpcWebLayer::new());

    let app = rest_router.merge(grpc_router).layer(middleware);
    Ok((app, addr))
}

Protocol Detection

Requests are routed based on the Content-Type header:

The Impact

DevOps complexity: eliminated


Unified Tower Middleware

The service implements a unified middleware pipeline that works seamlessly for both gRPC and REST, eliminating the need for separate interceptor chains. This section covers two key innovations: perfect-hash route matching and the Tower middleware stack.

Perfect Hash Route Matching

Most auth middleware needs a “public route allowlist” (health checks, sign-up, recovery endpoints). Many implementations use HashMap or linear checks.

This service uses a compile-time perfect hash:

use phf::phf_set;

static PUBLIC_ROUTES: phf::Set<&'static str> = phf_set! {
    // gRPC methods
    "Authenticate", "SignUp", "RecoveryStart", "RefreshTokens",
    // REST endpoints
    "/health", "/metrics", "/verify-email",
};

fn is_public_route(path: &str) -> bool {
    PUBLIC_ROUTES.contains(path)  // O(1), zero allocation
}

phf_set! generates a perfect hash at compile time:

Protocol-Aware Error Responses

The same auth layer returns the right “shape” of error for gRPC vs REST:

fn build_error_response(err: &JwtError, is_grpc: bool) -> Response<Body> {
    if is_grpc {
        // gRPC typically uses HTTP 200 and carries status in `grpc-status` trailers
        Response::builder()
            .status(StatusCode::OK)
            .header("grpc-status", "16") // UNAUTHENTICATED
            .header("grpc-message", err.to_string())
            .body(Body::empty())
    } else {
        // REST uses standard HTTP status codes
        Response::builder()
            .status(StatusCode::UNAUTHORIZED)
            .header("www-authenticate", "Bearer")
            .body(json!({"error": err}).into())
    }
}

This enables one authentication layer for all protocols.

The Tower Middleware Stack

Here’s the complete middleware pipeline:

let middleware = ServiceBuilder::new()
    .layer(RequestIdLayer::new())                      // 1. Extract/generate request ID
    .layer(TraceLayer::new_for_http())                 // 2. Distributed tracing
    .layer(TimeoutLayer::new(Duration::from_secs(10))) // 3. Request timeout
    .layer(cors)                                       // 4. CORS handling
    .layer(AuthLayer::new(jwt_validator));             // 5. JWT authentication

Key insight: Tower’s middleware abstraction is protocol-agnostic — the same Layer trait works for gRPC and REST because both are HTTP-based.

Benefits of unified middleware:

  1. Single Implementation — Write authentication once; it applies everywhere
  2. Consistent Behavior — Same error handling, logging, and tracing for all protocols
  3. Composable Layers — Stack middleware like building blocks using Tower’s zero-cost abstraction
  4. Protocol-Aware Responses — Middleware detects Content-Type and formats errors appropriately

If you need gRPC-only behavior (custom metadata/trailers), Tonic interceptors are still available:

let server = AuthServiceServer::with_interceptor(
    auth_service,
    |req: Request<()>| {
        if let Some(_custom) = req.metadata().get("x-custom") {
            // gRPC-specific logic here
        }
        Ok(req)
    }
);

This architecture provides simplicity by default with power when needed — the best of both worlds.


Atomic Database Operations

Race conditions in authentication systems cause critical security bugs: double token use, account takeover, and privilege escalation. Traditional multi-query transactions leave windows where concurrent requests can corrupt state. This section demonstrates how to eliminate these vulnerabilities entirely.

PostgreSQL Functions

Instead of multi-step transactions in application code, critical operations are atomic database functions:

CREATE OR REPLACE FUNCTION auth.verify_email(p_token_hash BYTEA)
RETURNS auth.users
LANGUAGE plpgsql
AS $$
DECLARE
  v_user_id UUID;
  v_status auth.user_status;
BEGIN
  -- Step 1: Consume token atomically
  UPDATE auth.email_verification_tokens
  SET used_at = now()
  WHERE token_hash = p_token_hash
    AND used_at IS NULL
    AND expires_at > now()
  RETURNING id_user INTO v_user_id;

  IF v_user_id IS NULL THEN
    RAISE EXCEPTION 'TOKEN_INVALID';
  END IF;

  -- Step 2: Validate user status
  SELECT status INTO v_status
  FROM auth.users WHERE id = v_user_id;

  IF v_status IN ('suspended', 'deleted') THEN
    RAISE EXCEPTION 'ACCOUNT_SUSPENDED';
  END IF;

  -- Step 3: Verify + activate
  UPDATE auth.users
  SET email_verified = TRUE,
      status = CASE WHEN status = 'pending' THEN 'active' ELSE status END
  WHERE id = v_user_id
  RETURNING * INTO result;

  RETURN result;
END;
$$;

Why This Matters

Without atomic function (multi-query approach):

// ❌ Race condition possible between queries
let token = db.get_token(hash).await?;
if token.used_at.is_some() {
    return Err(AppError::TokenInvalid);
}
db.mark_token_used(token.id).await?;  // Another request could use token here
db.verify_user_email(token.user_id).await?;

With atomic function:

// ✅ Single atomic operation
let user = db.verify_email(hash).await?;

One call, one transaction, no race window.

Another Example: Sliding Session Expiration

Session Touch (Sliding Expiration)

CREATE FUNCTION auth.touch_session(p_token_hash BYTEA, p_extend_by INTERVAL)
RETURNS TABLE (id_user UUID, expires_at TIMESTAMPTZ)
AS $$
  UPDATE auth.sessions
  SET last_seen_at = now(),
      expires_at = GREATEST(expires_at, now() + p_extend_by),
      activity_count = activity_count + 1
  WHERE refresh_token = p_token_hash AND expires_at > now()
  RETURNING id_user, expires_at;
$$;

Every token refresh atomically validates, updates activity, and extends expiration — preventing session hijacking through precise timing attacks.

Database Schema (Safety + Performance)

The schema leverages PostgreSQL-specific features for safety and performance:

1. UUIDv7 for Time-Ordered IDs

CREATE TABLE auth.users (
  id UUID PRIMARY KEY DEFAULT uuidv7(),  -- Time-ordered, index-friendly, PostgreSQL 18+
  ...
);

2. Domain types for normalization + validation

CREATE DOMAIN email AS TEXT
  CHECK (
    VALUE = lower(VALUE) AND
    VALUE ~ '^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$' AND
    length(VALUE) <= 254
  );

CREATE DOMAIN phone_e164 AS TEXT
  CHECK (VALUE ~ '^\\+[1-9][0-9]{1,14}$');

3. Partial Unique Indexes (Soft Delete)

CREATE UNIQUE INDEX user_email_active_ux
  ON auth.users (email)
  WHERE deleted_at IS NULL AND email IS NOT NULL;

4. Constraint-Based Business Rules

CONSTRAINT user_auth_method_ck
  CHECK (status != 'active' OR email IS NOT NULL OR phone IS NOT NULL)

5. JSONB for Flexible Metadata

metadata JSONB DEFAULT '{}'::JSONB
CONSTRAINT role_permissions_valid_ck
  CHECK (jsonb_typeof(permissions) = 'object')

The ServiceContext Pattern

Rather than passing individual dependencies everywhere, a shared context encapsulates all infrastructure:

#[derive(Clone)]
pub struct ServiceContext {
    db: Database,
    email: Option<EmailProvider>,  // SMTP or Mailjet
    s3: Option<Arc<S3Storage>>,
    urls: UrlBuilder,
}

Benefits

1. Single Source of Truth

// Services share infrastructure via Arc
let context = Arc::new(ServiceContext::new(db, email, s3, urls));
let auth_service = AuthService::new(config, context.clone());
let user_service = UserService::new(config, context.clone());

2. Graceful Degradation

// Optional features don't break the service
impl ServiceContext {
    pub fn send_email(&self, /* params */) {
        let Some(email) = &self.email else {
            return;  // Email not configured — service still works
        };
        // Send email...
    }
}

3. Fire-and-Forget Background Tasks

pub fn send_welcome_email(&self, user_id: Uuid, email: String, /* ... */) {
    let db = self.db.clone();
    let email_provider = self.email.clone();
    
    tokio::spawn(async move {
        // Create verification token
        // Send email
        // Log result
    });  // User response is sent immediately; email delivery happens in background
}

Sign-up returns in milliseconds, while email delivery happens asynchronously. Failures are logged but never block users.


Embedded Validation with Protobuf

The service leverages compile-time validation through Protocol Buffers annotations. This approach catches invalid requests before they reach business logic, eliminates boilerplate validation code, and ensures consistent enforcement across all clients.

Validation Rules in Proto Files

message SignUpRequest {
  string email = 1 [
    (validate.rules).string.email = true,
    (validate.rules).string.max_len = 255
  ];
  
  string password = 2 [
    (validate.rules).string.min_len = 8,
    (validate.rules).string.max_len = 128
  ];
  
  string name = 3 [
    (validate.rules).string.min_len = 1,
    (validate.rules).string.max_len = 255
  ];
}

Automatic Code Generation

The prost-validate crate generates validation code at compile time:

// In crates/proto/build.rs
prost_validate_build::Builder::new()
    .configure(&mut config, proto_files, includes)
    .expect("Failed to configure prost-validate");

Every generated message gets a Validator trait implementation. The service uses an extension trait for ergonomic validation:

pub trait ValidateExt {
    fn validate_or_status(&self) -> Result<(), Status>;
}

impl<T: Validator> ValidateExt for T {
    fn validate_or_status(&self) -> Result<(), Status> {
        self.validate()
            .map_err(|e| Status::invalid_argument(e.to_string()))
    }
}

Usage in Handlers

pub async fn sign_up(
    &self,
    request: Request<SignUpRequest>,
) -> Result<Response<AuthResponse>, Status> {
    let req = request.into_inner();
    
    // Proto validation — one line, all rules checked
    req.validate_or_status()?;
    
    // Domain validation — business rules beyond proto
    domain::validate_password(&req.password)?;
    
    // All validation passed — proceed with business logic
    let user = self.ctx.db().create_user(&req).await?;
    // ...
}

Key benefits:

  1. Early Rejection — Invalid requests fail at the entry point, before any database queries execute
  2. Consistent Validation — The same rules are enforced across all clients: mobile, web, and CLI
  3. Self-Documenting API — Validation rules are visible in .proto files
  4. Client Generation — Clients can generate the same validation logic from proto files
  5. Performance — Validation code compiles into the binary with zero runtime overhead
  6. Nested Validation — Complex message hierarchies are validated recursively

Domain-Specific Validation

For business rules that can’t be expressed in proto annotations, the service provides domain validators:

pub mod domain {
    use tonic::Status;

    /// Validate password complexity beyond length (which proto handles)
    pub fn validate_password(password: &str) -> Result<(), Status> {
        let has_letter = password.chars().any(char::is_alphabetic);
        let has_digit = password.chars().any(|c| c.is_ascii_digit());
        
        if !has_letter || !has_digit {
            return Err(Status::invalid_argument(
                "Password must contain both letters and numbers"
            ));
        }
        Ok(())
    }

    /// Validate E.164 phone format
    pub fn validate_phone(phone: &str) -> Result<(), Status> {
        if !phone.starts_with('+') || phone.len() < 8 || phone.len() > 16 {
            return Err(Status::invalid_argument(
                "Phone must be in E.164 format (+1234567890)"
            ));
        }
        Ok(())
    }
}

This two-tier validation approach (proto + domain) ensures comprehensive input validation without code duplication. Proto annotations handle structural constraints such as length, format, and required fields, while domain validators handle business logic like password complexity and semantic format validation.


Streaming APIs with async-stream

The service implements server-side streaming for efficient data transfer, especially useful for admin operations listing thousands of users.

Streaming Implementation

use async_stream::try_stream;
use tokio_stream::StreamExt;

pub fn list_users(&self, req: ListUsersRequest) -> StreamResult<User> {
    let db = self.ctx.db().clone();
    
    let stream = try_stream! {
        let mut rows = db.users.stream_all_users();
        
        while let Some(result) = rows.next().await {
            let user = result.map_err(|e| {
                error!(error = %e, "Failed to stream users");
                Status::from(AppError::Unavailable(e.to_string()))
            })?;
            
            yield User::from(user);  // Stream one record at a time
        }
    };
    
    Box::pin(stream)
}

Why Streaming Matters

Without streaming (traditional approach):

// ❌ Load all users into memory
let users = db.get_all_users().await?;  // OOM with 100K users
Ok(Response::new(ListUsersResponse { users }))

With streaming:

// ✅ Constant memory regardless of user count
let stream = self.list_users(req);
Ok(Response::new(Box::pin(stream)))

Benefits:

  1. Constant Memory — Process millions of records with a fixed memory footprint
  2. Progressive Results — Clients receive data as it’s fetched, reducing time to first result
  3. Early Termination — Clients can close the stream when satisfied, saving server resources
  4. Backpressure — gRPC flow control prevents overwhelming slow clients
  5. Database Cursors — SQLx streams map directly to PostgreSQL cursors without buffering

Real-world impact — Admin dashboard listing 50,000 users:

The async_stream::try_stream! macro provides ergonomic syntax for building streams with ? operator support — much cleaner than manual Stream trait implementation.


Part III: Security & OWASP Compliance

Multi-Layer Security Architecture

The service implements the OWASP Authentication Cheat Sheet through multiple complementary layers:

Layer 1: Password Security

Argon2id Hashing

OWASP currently recommends Argon2id and provides baseline parameters you can start from (then calibrate on your hardware). A common starting point is ~19 MiB memory, 2 iterations, parallelism 1. (Source: OWASP Password Storage Cheat Sheet.)

/// Hash password using Argon2id with OWASP-recommended parameters
pub fn hash_password(password: &str) -> Result<String, AppError> {
    let salt = SaltString::generate(&mut OsRng);

    // OWASP baseline: 19 MiB memory, 2 iterations, 1 parallelism
    let params = Params::new(19 * 1024, 2, 1, None)
        .map_err(|e| AppError::internal(format!("argon2 params: {e}")))?;

    let argon2 = Argon2::new(Algorithm::Argon2id, Version::V0x13, params);

    argon2
        .hash_password(password.as_bytes(), &salt)
        .map(|hash| hash.to_string())
        .map_err(|e| AppError::internal(format!("argon2 hash: {e}")))
}

/// Verify password against stored hash (constant-time comparison)
pub fn verify_password(password: &str, hash: &str) -> bool {
    use password_hash::PasswordVerifier;
    
    let parsed_hash = match password_hash::PasswordHash::new(hash) {
        Ok(h) => h,
        Err(_) => return false,
    };

    Argon2::default()
        .verify_password(password.as_bytes(), &parsed_hash)
        .is_ok()
}

Argon2id is memory-hard — resistant to GPU/ASIC attacks. Cost parameters ensure cracking attempts are prohibitively expensive. The stored hash includes algorithm, parameters, and salt, making future parameter upgrades transparent.

Layer 2: Account Lockout

To prevent brute-force attacks, the service implements progressive account lockout:

pub async fn authenticate(&self, req: AuthenticateRequest) -> Result<AuthResponse> {
    let user = self.find_user(&req.identifier).await?;
    
    // Check if account locked
    if let Some(locked_until) = user.locked_until {
        if locked_until > Utc::now() {
            return Ok(AuthResponse::locked());
        }
    }
    
    // Verify password
    if !password::verify(&req.password, &user.password_hash) {
        let attempts = self.db.increment_failed_login(user.id).await?;
        
        // Lock account after N failures
        if attempts >= self.config.max_failed_attempts {
            self.db.lock_account(user.id, self.config.lockout_duration).await?;
            return Ok(AuthResponse::locked());
        }
        return Ok(AuthResponse::failed());
    }
    
    // Success: reset counter
    self.db.reset_failed_login(user.id).await?;
    Ok(self.create_session(user, req).await?)
}

Configuration via environment:

Layer 3: Generic Error Messages

OWASP requirement: Never reveal whether an email exists in the system.

fn failed_auth() -> AuthResponse {
    AuthResponse {
        status: AuthStatus::Failed,
        message: "Invalid credentials".to_string(),  // Generic message
        // No indication whether email exists
    }
}

Same response for:

This prevents user enumeration attacks, where attackers attempt to harvest valid email addresses.

Layer 4: JWT Token Security

Cached Validator with Strong Configuration

pub struct JwtValidator {
    encoding_key: Arc<EncodingKey>,  // Cached for performance
    decoding_key: Arc<DecodingKey>,
    validation: Validation,
}

impl JwtValidator {
    pub fn new(secret: &SecretString) -> Self {
        let mut validation = Validation::new(Algorithm::HS256);
        validation.set_audience(&["auth-service"]);
        validation.set_issuer(&["auth-service"]);
        validation.validate_exp = true;  // Expiration required
        validation.validate_nbf = true;  // Not-before required
        
        Self {
            encoding_key: Arc::new(EncodingKey::from_secret(secret.as_bytes())),
            decoding_key: Arc::new(DecodingKey::from_secret(secret.as_bytes())),
            validation,
        }
    }
}

Token Claims

struct Claims {
    sub: String,           // user_id
    email: String,
    name: String,
    role: String,
    device_id: String,
    installation_id: String,
    iat: i64,             // Issued at
    exp: i64,             // Expiration
    nbf: i64,             // Not before
    aud: String,          // Audience
    iss: String,          // Issuer
}

Layer 5: Refresh Token Rotation

Single-Use Tokens

pub async fn refresh_tokens(&self, refresh_token: String) -> Result<TokenPair> {
    let token_hash = TokenGenerator::hash_token(&refresh_token);
    
    // Atomically validate and consume token
    let session = self.db.touch_session(&token_hash).await?;
    
    // Generate new tokens
    let new_access = self.jwt.generate_access_token(&session.user, ...)?;
    let new_refresh = TokenGenerator::generate_secure_token();
    let new_hash = TokenGenerator::hash_token(&new_refresh);
    
    // Store new refresh token
    self.db.update_session_token(session.id, &new_hash).await?;
    
    Ok(TokenPair { access_token: new_access, refresh_token: new_refresh })
}

Key property: If an attacker steals a refresh token, they can only use it once before it’s replaced.

Layer 6: OAuth 2.0 Readiness

The service architecture is prepared for OAuth 2.0 social login, with the database schema and proto definitions already in place.

Database schema supports OAuth providers:

CREATE TYPE oauth_provider AS ENUM (
  'google', 'github', 'microsoft', 'apple', 'facebook'
);

CREATE TABLE oauth_accounts (
  id              UUID PRIMARY KEY DEFAULT uuidv7(),
  id_user         UUID NOT NULL REFERENCES users(id),
  provider        oauth_provider NOT NULL,
  provider_id     TEXT NOT NULL,  -- Provider's user ID
  provider_email  email,
  metadata        JSONB DEFAULT '{}'::JSONB,
  linked_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE (provider, provider_id)  -- One account per provider
);

Proto definitions ready:

service AuthService {
  rpc GetOAuthUrl(GetOAuthUrlRequest) returns (GetOAuthUrlResponse);
  rpc ExchangeOAuthCode(ExchangeOAuthCodeRequest) returns (AuthResponse);
  rpc LinkOAuthProvider(LinkOAuthProviderRequest) returns (Empty);
  rpc UnlinkOAuthProvider(UnlinkOAuthProviderRequest) returns (Empty);
  rpc ListLinkedProviders(ListLinkedProvidersRequest) returns (ListLinkedProvidersResponse);
}

Why this architecture matters:

  1. Schema Stability — Database supports OAuth from day one (no migration needed later)
  2. API Contract — Proto definitions let clients prepare for OAuth before implementation
  3. Graceful Degradation — Service works without OAuth; easy to add when needed
  4. Account Linking — Users can link multiple OAuth providers to one account
  5. Passwordless Accounts — Schema allows OAuth-only users (no password set)

Implementation path: When OAuth is needed, implementers only need to:

  1. Add OAuth client libraries (for example, the oauth2 crate)
  2. Implement the handler logic for redirect URLs and token exchange
  3. Remove the unimplemented status

No database migrations and no API changes are required — the architecture anticipated this need.

Token Refresh Flow

The service implements sliding session expiration with single-use refresh tokens, as illustrated in the following sequence:

sequenceDiagram
    participant C as Client
    participant A as AuthService
    participant DB as Database

    Note over C: Access token expired after 60 min
    
    C->>A: Request with expired token
    A-->>C: ❌ 401 Unauthenticated
    
    C->>A: RefreshTokens(refresh_token)
    A->>A: hash_token(refresh_token)
    
    A->>DB: touch_session(token_hash)
    
    alt Invalid/Expired Token
        DB-->>A: Token not found or expired
        A-->>C: ❌ Force re-login
    end
    
    DB-->>A: ✓ Session valid
    
    A->>A: Generate new tokens
    A->>DB: update_session(new_refresh_hash)
    Note over DB: Old token invalidated (single-use)
    
    A-->>C: ✅ New token pair
    C->>C: Store new tokens
    
    C->>A: Retry request with new access token
    A-->>C: ✓ Success

Security Properties:

  1. Single-Use Refresh Tokens — Each refresh operation invalidates the previous token
  2. Sliding Expiration — Active sessions remain alive for up to 90 days
  3. Atomic Token Rotation — A database function prevents race conditions
  4. Activity Tracking — The activity_count increments and last_seen_at updates with each refresh
  5. Forced Re-authentication — Required after 90 days of inactivity or manual session revocation

Implementation Detail: touch_session() uses GREATEST()

expires_at = GREATEST(expires_at, NOW() + $extend_by)

This ensures:


Type-Safe Error Handling

Rust’s Result type forces explicit error handling at every step. This project uses a structured approach that ensures errors are handled consistently:

Domain Errors with thiserror

#[derive(Debug, Error)]
pub enum AppError {
    #[error("Not found: {0}")]
    NotFound(String),
    
    #[error("Unauthenticated: {0}")]
    Unauthenticated(String),
    
    #[error("Permission denied: {0}")]
    PermissionDenied(String),
    
    #[error("Conflict: {0}")]
    Conflict(String),
    
    #[error("Internal: {0}")]
    Internal(String),
}

Automatic gRPC Conversion

impl From<AppError> for Status {
    fn from(error: AppError) -> Self {
        match error {
            AppError::NotFound(msg) => Status::not_found(msg),
            AppError::Unauthenticated(msg) => Status::unauthenticated(msg),
            AppError::Conflict(msg) => Status::already_exists(msg),
            AppError::Internal(msg) => {
                error!("Internal error: {}", msg);  // Log details
                Status::internal("Internal server error")  // Generic to client
            }
        }
    }
}

Critical: Internal errors are logged but never exposed to clients — preventing information leakage.

Extension Traits for Ergonomics

To simplify error handling in service methods, the project provides extension traits:

pub trait StatusExt<T> {
    fn status(self, msg: &'static str) -> Result<T, Status>;
}

impl<T, E: Display> StatusExt<T> for Result<T, E> {
    fn status(self, msg: &'static str) -> Result<T, Status> {
        self.map_err(|e| {
            error!("{}: {}", msg, e);
            Status::internal(msg)
        })
    }
}

Usage in handlers:

let user = self.db.get_user(id)
    .await
    .status("Failed to get user")?;  // Logs error, returns generic Status

The ? operator propagates errors up the call stack, with automatic conversion to Status at the gRPC boundary.


Session Management and Device Tracking

The service tracks sessions with rich device context, enabling users to manage their active sessions across multiple devices:

Session Model

pub struct SessionInfo {
    pub device_id: String,           // Unique device identifier
    pub device_name: String,         // "iPhone 15 Pro"
    pub device_type: String,         // mobile/tablet/desktop/web
    pub client_version: String,      // App version
    pub ip_address: IpNetwork,       // Last seen IP
    pub ip_country: String,          // ISO country code (GeoIP)
    pub created_at: DateTime<Utc>,
    pub last_seen_at: DateTime<Utc>,
    pub expires_at: DateTime<Utc>,
    pub activity_count: i32,         // Number of token refreshes
    pub is_current: bool,            // Is this the calling device?
}

User Experience Features Enabled

1. List All Sessions

rpc ListSessions(ListSessionsRequest) returns (ListSessionsResponse);

Users can view all active devices, including their locations and last activities.

2. Revoke Specific Sessions

rpc RevokeSession(RevokeSessionRequest) returns (Empty);

Allows actions like “Log out from iPhone” — revoking only that device’s session.

3. Revoke All Other Sessions

rpc RevokeOtherSessions(RevokeOtherSessionsRequest) returns (Empty);

Enables “Log out everywhere else” — keeping only the current device logged in.

Database Schema

CREATE TABLE auth.sessions (
  id              UUID PRIMARY KEY DEFAULT uuidv7(),
  id_user         UUID NOT NULL REFERENCES auth.users(id),
  refresh_token   BYTEA NOT NULL,  -- SHA-256 hash
  device_id       VARCHAR(255),
  device_name     VARCHAR(255),
  device_type     VARCHAR(50),
  client_version  VARCHAR(100),
  ip_address      INET,
  ip_country      VARCHAR(2),      -- ISO country code
  created_at      TIMESTAMPTZ DEFAULT now(),
  last_seen_at    TIMESTAMPTZ DEFAULT now(),
  expires_at      TIMESTAMPTZ NOT NULL,
  activity_count  INT DEFAULT 0,
  metadata        JSONB DEFAULT '{}'
);

-- Unique constraint: one session per user+device
CREATE UNIQUE INDEX session_user_device_ux 
  ON auth.sessions (id_user, device_id)
  WHERE device_id IS NOT NULL;

Key insight: The unique constraint on (id_user, device_id) ensures that refreshing tokens updates the existing session rather than creating duplicates.


Part IV: Authentication Flows

Authentication Flow

The authentication flow implements OWASP best practices, including account lockout, generic error messages, and secure session management:

sequenceDiagram
    participant C as Client
    participant A as AuthService
    participant DB as Database

    C->>A: Authenticate(email, password, device_info)
    A->>DB: get_user_by_email()
    
    alt Invalid Credentials or Locked
        DB-->>A: User not found / locked / wrong password
        A->>DB: increment_failed_attempts()
        A-->>C: ❌ Generic error: "Invalid credentials"
        Note over C: OWASP: No user enumeration
    end
    
    DB-->>A: ✓ User found
    A->>A: Argon2id verify password
    
    A->>DB: reset_failed_attempts()
    A->>A: Generate JWT tokens (access + refresh)
    A->>DB: create_session(tokens, device, geo)
    
    A-->>C: ✅ Success(access_token, refresh_token, user)

Security Highlights

Account Lockout — Progressive penalties prevent brute-force attacks:

Generic Error Messages — Prevent user enumeration:

Device-Based Sessions — Multi-device support:


Password Recovery Flow

Password recovery implements OWASP recommendations for secure reset workflows:

sequenceDiagram
    participant C as Client
    participant A as AuthService
    participant DB as Database
    participant E as Email

    Note over C,A: Step 1: Request Reset
    
    C->>A: RecoveryStart(email)
    A-->>C: ✓ OK (always succeeds)
    Note over A: OWASP: No user enumeration
    
    par Background
        A->>DB: Find user + create token
        A->>E: Send reset email
    end
    
    Note over C,E: Step 2: Click Email Link
    E->>C: User clicks link
    
    Note over C,A: Step 3: Submit New Password
    C->>A: RecoveryConfirm(token, new_password)
    
    A->>DB: consume_token(token_hash)
    
    alt Token Invalid/Expired
        DB-->>A: Token not found or used
        A-->>C: ❌ Invalid or expired token
    end
    
    DB-->>A: ✓ Token valid, user_id
    
    A->>A: Hash new password (Argon2id)
    A->>DB: update_password(user_id, hash)
    A->>DB: revoke_all_sessions(user_id)
    Note over DB: Force re-login everywhere
    
    A->>E: Send confirmation email
    A-->>C: ✅ Password changed
    
    C->>A: Sign in with new password

Security Features

1. No User Enumeration (OWASP Critical)

2. Atomic Token Management

3. Single-Use Tokens

4. Forced Session Revocation

5. Notification Email


Email Verification Flow

Email verification supports dual paths to accommodate different client types:

sequenceDiagram
    participant C as Client
    participant A as AuthService
    participant DB as Database
    participant E as Email

    Note over C,A: Step 1: Sign Up
    
    C->>A: SignUp(email, password, name)
    A->>A: Hash password
    A->>DB: create_user(status=pending)
    A->>DB: create_verification_token(24h)
    
    par Background
        A->>E: Send welcome email
    end
    
    A-->>C: ✓ Success (PENDING)
    
    Note over C,E: Step 2: Click Email Link
    
    E->>C: User clicks link
    C->>A: ConfirmVerification(token)
    A->>DB: auth.verify_email(token_hash)
    
    alt Token Invalid
        DB-->>A: Error
        A-->>C: ❌ Invalid or expired
    end
    
    DB-->>A: ✓ User verified
    A->>A: Generate tokens
    A->>DB: create_session()
    A-->>C: ✅ Tokens (auto-login)

Path Comparison

Feature gRPC (Primary) REST (Fallback)
Client Mobile, Desktop, Web SPA Email clients, old browsers
Auto-Login ✅ Yes (returns tokens) ❌ No (redirect only)
User Experience Seamless (one-click done) Manual login required
Deep Linking ✅ App handles URL ❌ Browser navigation
Implementation Same DB function Same DB function
Security JWT tokens via gRPC Redirect to frontend

Key Implementation Details

1. Atomic Database Function auth.verify_email()

CREATE OR REPLACE FUNCTION auth.verify_email(p_token_hash BYTEA)
RETURNS auth.users AS $$
DECLARE
  v_user_id UUID;
  v_status auth.user_status;
BEGIN
  -- Step 1: Consume token atomically
  UPDATE auth.email_verification_tokens
  SET used_at = now()
  WHERE token_hash = p_token_hash
    AND used_at IS NULL
    AND expires_at > now()
  RETURNING id_user INTO v_user_id;
  
  IF v_user_id IS NULL THEN
    RAISE EXCEPTION 'TOKEN_INVALID';
  END IF;
  
  -- Step 2: Check account status
  SELECT status INTO v_status FROM auth.users WHERE id = v_user_id;
  IF v_status IN ('suspended', 'deleted') THEN
    RAISE EXCEPTION 'ACCOUNT_SUSPENDED';
  END IF;
  
  -- Step 3: Verify email and activate
  UPDATE auth.users
  SET email_verified = TRUE,
      status = CASE WHEN status = 'pending' THEN 'active' ELSE status END
  WHERE id = v_user_id
  RETURNING * INTO result;
  
  RETURN result;
END;
$$ LANGUAGE plpgsql;

2. Graceful Client Fallback

3. Token Security


Part V: Production Operations

Observability from Day One

Production systems need visibility. This service integrates four complementary layers of observability:

Layer 1: Structured Logging (tracing)

Every request receives a trace ID that follows it through all operations. The tracing crate provides structured, machine-parseable logs:

use tracing::{info, error, instrument};

#[instrument(skip(self), fields(user_id))]
pub async fn authenticate(&self, req: AuthenticateRequest) -> Result<AuthResponse> {
    let user = self.find_user(&req.identifier).await?;
    
    // Record user_id in span for all subsequent logs
    Span::current().record("user_id", user.id.to_string());
    
    info!("User authenticated successfully");
    Ok(self.create_session(user, req).await?)
}

Every request gets a trace ID that follows it through all operations. Optionally, logs are output as structured JSON for easy ingestion by monitoring systems:

{
  "timestamp": "2026-01-23T15:42:17.123Z",
  "level": "INFO",
  "message": "User authenticated successfully",
  "request_id": "01JQBX...",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "span": "authenticate"
}

Layer 2: Prometheus Metrics

use metrics::{counter, histogram};

pub async fn authenticate(&self, req: AuthenticateRequest) -> Result<AuthResponse> {
    let start = Instant::now();
    
    let result = self.authenticate_impl(req).await;
    
    histogram!("auth_duration_ms").record(start.elapsed().as_millis() as f64);
    
    if result.is_ok() {
        counter!("auth_success_total").increment(1);
    } else {
        counter!("auth_failure_total").increment(1);
    }
    
    result
}

Metrics exposed at /metrics:

Example Prometheus queries:

# P99 authentication latency
histogram_quantile(0.99, rate(auth_duration_ms_bucket[5m]))

# Authentication failure rate
rate(auth_failure_total[5m]) / rate(auth_success_total[5m])

# Database query errors
rate(db_query_errors_total[5m])

Layer 3: Distributed Tracing (OpenTelemetry)

pub fn setup_telemetry(config: &TelemetryConfig) -> TelemetryGuard {
    // Initialize OpenTelemetry exporter
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint(&config.otlp_endpoint)
        )
        .install_batch(opentelemetry_sdk::runtime::Tokio)?;
    
    // Combine with console logging
    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .with(fmt::layer().json())
        .init();
    
    TelemetryGuard { tracer }
}

Sends traces to Jaeger, Tempo, or any OTLP collector for distributed request tracing across services.

Layer 4: Error Tracking (Sentry)

#[cfg(feature = "sentry")]
pub fn init_sentry(dsn: &str) -> ClientInitGuard {
    sentry::init((dsn, sentry::ClientOptions {
        release: Some(env!("CARGO_PKG_VERSION").into()),
        environment: Some(std::env::var("ENVIRONMENT").unwrap_or_else(|_| "development".into()).into()),
        ..Default::default()
    }))
}

Automatic error capture with context:


Cloud-Native Deployment

Docker: Optimized for Size and Security

The Dockerfile uses a multi-stage build with static linking to produce a minimal, secure image:

# Build stage
FROM rust:1.93-alpine AS build
RUN apk add --no-cache musl-dev protobuf-dev ca-certificates

COPY . /app
WORKDIR /app

RUN cargo build --release --locked && \
    cp target/release/auth-service /service

# Runtime stage
FROM scratch

COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /service /service

ENV RUST_LOG=info
EXPOSE 8080
ENTRYPOINT ["/service"]

Result: 15-25 MB image with a minimal OS-level attack surface (no shell, no package manager, no distro userland).

Multi-Architecture CI/CD: Native Builds at Scale

The Challenge: Cross-compilation with QEMU is painfully slow, often taking 30+ minutes for Rust projects. Most CI/CD pipelines suffer from this bottleneck.

The Solution: Native builds on platform-specific runners, where each architecture builds on its own hardware.

name: Build & Publish multi-arch image

jobs:
  # ARM64: Native build on ARM runner (GitHub hosted)
  build-arm64:
    runs-on: ubuntu-24.04-arm  # ARM64 runner
    steps:
      - uses: docker/build-push-action@v6
        with:
          platforms: linux/arm64
          push: true
          push-by-digest: true  # Push by digest for manifest merge
          cache-from: type=gha,scope=arm64
          cache-to: type=gha,scope=arm64,mode=max

  # AMD64: Native build on x86 runner
  build-amd64:
    runs-on: ubuntu-latest      # AMD64 runner
    steps:
      - uses: docker/build-push-action@v6
        with:
          platforms: linux/amd64
          push: true
          push-by-digest: true
          cache-from: type=gha,scope=amd64

  # Merge: Create multi-arch manifest
  manifest:
    needs: [build-arm64, build-amd64]
    steps:
      - name: Create and push manifest
        run: |
          docker buildx imagetools create \
            -t ghcr.io/org/auth-service:latest \
            ghcr.io/org/auth-service@sha256:$ARM64_DIGEST \
            ghcr.io/org/auth-service@sha256:$AMD64_DIGEST

Performance comparison:

Approach Build Time Cost (GitHub Actions)
QEMU cross-compilation 35-45 min High (single long-running job)
Native parallel builds 4-6 min Low (parallel, shorter duration)
Speedup ~8x faster 60% cost reduction

Key innovations:

  1. Parallel Execution — Both architectures build simultaneously
  2. Layer Caching — GitHub Actions cache (type=gha) persists Cargo dependencies across builds
  3. Digest-Based Push — Each build pushes by digest; the manifest job merges them
  4. SBOM and Provenance — Security metadata is included automatically

Aggressive Build Optimizations

The Cargo.toml profile maximizes release binary optimization:

[profile.release]
opt-level = 3           # Maximum optimization
lto = true              # Link-time optimization (whole-program)
codegen-units = 1       # Single codegen unit for better optimization
strip = true            # Strip symbols (reduces binary size 30%)

Impact on binary size:

Configuration Binary Size Image Size
Debug build 180 MB N/A
Release (no LTO) 45 MB 50 MB
Release (with LTO) 28 MB 32 MB
Release (LTO + strip) 12 MB 15 MB

LTO (Link-Time Optimization) enables:

Trade-off: Longer compile time (5-7 minutes) for 60% smaller binaries. In production, smaller images mean:

The Dockerfile uses static musl linking for a truly standalone binary:

RUN cargo build --release --locked && \
    cp target/release/auth-service /service

# Runtime: scratch (0 bytes base)
FROM scratch
COPY --from=build /service /service
ENTRYPOINT ["/service"]

Result: 15MB self-contained image with zero dependencies. Compare to:

Google Cloud Run Deployment

One-command deployment:

gcloud run deploy auth-service \
  --image ghcr.io/zs-dima/auth-service-rs:latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars PORT=8080 \
  --set-secrets=JWT_SECRET_KEY=jwt-secret:latest,DB_PASSWORD=db-password:latest

Cloud Run automatically:

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: auth-service
        image: ghcr.io/zs-dima/auth-service-rs:latest
        resources:
          requests:
            memory: "32Mi"   # Rust's small footprint
            cpu: "50m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        env:
        - name: PORT
          value: "8080"
        - name: JWT_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: auth-secrets
              key: jwt-secret
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5

Note the resource requirements: 32 MB request, 128 MB limit. Compare this to Java’s typical 512 MB–2 GB requirements.


The Cross-Platform Client

The Flutter client (auth-app) demonstrates one codebase for all platforms: Android, iOS, Windows, macOS, Linux, and Web.

Platform-Aware gRPC Channel

ClientChannelBase getChannel(String host, int port) {
  if (kIsWeb) {
    // Web: Use XHR-based gRPC-Web
    return GrpcWebClientChannel.xhr(
      Uri.parse('https://$host:$port'),
    );
  } else {
    // Native: Use HTTP/2 channel
    return ClientChannel(
      host,
      port: port,
      options: ChannelOptions(
        credentials: ChannelCredentials.secure(),
      ),
    );
  }
}

Key insight: The server’s protocol multiplexing enables this — browsers use gRPC-Web supporting server-side streaming, native apps use native gRPC, same backend.

Automatic Token Injection

class AuthInterceptor extends ClientInterceptor {
  final TokenStorage _tokens;
  
  @override
  ResponseFuture<R> interceptUnary<Q, R>(
    ClientMethod<Q, R> method,
    Q request,
    CallOptions options,
    ClientUnaryInvoker<Q, R> invoker,
  ) {
    final token = _tokens.accessToken;
    if (token != null) {
      options = options.mergedWith(
        CallOptions(metadata: {'authorization': 'Bearer $token'}),
      );
    }
    return invoker(method, request, options);
  }
}

Every gRPC call automatically includes the access token, eliminating manual header management.


Part VI: Getting Started & Comparison

Rust vs. Traditional Stacks

Having implemented authentication services in Go, Node.js, and Rust, here are the concrete improvements that Rust delivers:

1. Compile-Time Correctness vs Runtime Surprises

Go/Node.js: Database schema changes discovered in production

// Go: Compiles fine, fails at runtime if column renamed
user, err := db.Query("SELECT id, email FROM users WHERE email = ?", email)
// ERROR: column "email" does not exist (discovered in production!)

Rust: Database schema changes discovered during cargo build

// Rust: Fails compilation if column doesn't exist
let user = sqlx::query_as!(User, "SELECT id, email FROM users WHERE email = $1", email)
// error: no such column: `email` in table `users` (caught at compile time!)

Impact: Eliminated an entire class of database-related production incidents. Schema migrations now require corresponding code changes before deployment.

2. Memory Safety Without Runtime Overhead

Go: Garbage collector pauses cause latency spikes

Node.js: Worse GC characteristics

Rust: No garbage collector

Real-world metric: Authentication endpoint at 10,000 req/s:

3. Type Safety Prevents Vulnerability Classes

Go/Node.js: String manipulation errors cause security issues

// Go: Easy to accidentally log secrets
log.Printf("Config: %+v", config)  // Oops, logged JWT secret!

// Node.js: JSON serialization exposes secrets
res.json(config)  // Sent JWT secret to client!

Rust: Compiler enforces secret protection

use secrecy::SecretString;

pub struct Config {
    jwt_secret: SecretString,  // Debug trait redacts value
}

println!("{:?}", config);  // Output: Config { jwt_secret: "***" }

Using SecretString makes it harder to accidentally expose secrets through logging or serialization.

4. Zero-Cost Abstractions Enable Better Architecture

Go: Interface indirection has runtime cost

type Repository interface {
    GetUser(id string) (*User, error)
}

// Every call goes through vtable — measurable overhead at scale
user, err := repo.GetUser(id)

Rust: Trait dispatch optimized away at compile time

trait Repository {
    async fn get_user(&self, id: Uuid) -> Result<User, AppError>;
}

// Monomorphization means direct call — zero overhead
let user = repo.get_user(id).await?;

This enables layered architectures (repository → service → handler) without performance penalties. Rust’s design philosophy: abstractions should not cost.

5. Dependency Management and Build Reproducibility

Node.js: node_modules chaos

Go: Better, but still issues

Rust: Cargo ecosystem

6. Fearless Refactoring

Go/Node.js: Refactoring requires extensive testing

Rust: The compiler catches breaking changes

Real experience: Refactoring session management from ID-based to hash-based tokens:

7. Cross-Platform Deployment

Node.js: Different behavior across platforms

Rust: True “compile once, run anywhere”

Performance Summary Table

Metric Node.js (Express) Go (gRPC) Rust (Tonic)
Idle Memory 70-150 MB 20-40 MB 5-15 MB
P99 Latency 30-80 ms 15-30 ms 2-10 ms
Throughput (1 core) 5k req/s 15k req/s 25k req/s
Docker Image 200-400 MB 30-50 MB 15-25 MB
Cold Start 200-500 ms 50-150 ms 10-30 ms
Memory Safety Runtime (V8) Runtime Compile-time
SQL Verification Runtime Runtime Compile-time

Cost Implications at Scale

For a service handling 10 million authentications per day:

Node.js deployment (AWS ECS):

Go deployment (AWS ECS):

Rust deployment (AWS ECS):

Rust is 9x cheaper than Node.js, 3x cheaper than Go at the same traffic level.


What Makes This Implementation Unique?

This authentication service showcases several architectural innovations rarely seen together:

1. Protocol Multiplexing on a Single Port

2. Dual-Path Email Verification

3. Compile-Time Database Guarantees

4. Atomic Operations Prevent Race Conditions

5. Native Multi-Architecture CI/CD

6. Zero-Trust Security by Design

7. Graceful Degradation

8. Observable from Day One


Getting Started

Prerequisites

Quick Start (Local Development)

1. Clone and configure

git clone https://github.com/zs-dima/auth-service-rs
cd auth-service-rs
cp configs/.env.example configs/development.env

2. Edit configs/development.env

# Required
DB_URL=postgres://postgres:@localhost:5432/auth_dev
JWT_SECRET_KEY=your-32-character-secret-key-here

# Optional: Email (service works without it)
SMTP_URL=smtp://user@localhost:1025?tls=none  # MailHog for local testing
EMAIL_SENDER="Dev Auth <dev@localhost>"

3. Setup database

# Create database
createdb auth_dev

# Run migrations
make db-migrate

# Verify SQLx offline data (optional)
make db-prepare

4. Run the service

make run
# Or with auto-reload on file changes:
make watch

5. Test with grpcurl

# Install grpcurl
go install github.com/fullstorydev/grpcurl/cmd/grpcurl@latest

# Sign up
grpcurl -plaintext -d '{
  "email": "test@example.com",
  "password": "SecurePass123",
  "display_name": "Test User",
  "client_info": {
    "device_type": "web",
    "device_id": "browser-123",
    "ip_address": "127.0.0.1"
  }
}' localhost:8080 auth.AuthService/SignUp

# Authenticate
grpcurl -plaintext -d '{
  "identifier": "test@example.com",
  "password": "SecurePass123",
  "client_info": {
    "device_type": "web",
    "device_id": "browser-123"
  }
}' localhost:8080 auth.AuthService/Authenticate

Development Workflow

# Format code
make fmt

# Run linter
make lint

# Run tests
make test

# Run all pre-commit checks
make pre-commit

# Generate proto code (after modifying .proto files)
make proto

# Update database offline data (after schema changes)
make db-prepare

Docker Deployment

Build image:

make docker
# Result: 15-25 MB image based on scratch

Run container:

docker run -p 8080:8080 \
  -e DB_URL=postgres://... \
  -e JWT_SECRET_KEY=... \
  auth-service:latest

Docker Compose:

services:
  auth-service:
    image: ghcr.io/zs-dima/auth-service-rs:latest
    ports:
      - "8080:8080"
    environment:
      DB_URL: postgres://auth:password@db:5432/auth
      JWT_SECRET_KEY_FILE: /run/secrets/jwt_secret
    secrets:
      - jwt_secret
  
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: auth
      POSTGRES_USER: auth
      POSTGRES_PASSWORD: password
    volumes:
      - db-data:/var/lib/postgresql/data

secrets:
  jwt_secret:
    file: ./secrets/jwt_secret.txt

volumes:
  db-data:

Cloud Run Deployment

Prerequisites:

Deploy:

# Build and push image
gcloud builds submit --tag gcr.io/PROJECT_ID/auth-service

# Deploy to Cloud Run
gcloud run deploy auth-service \
  --image gcr.io/PROJECT_ID/auth-service \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars DOMAIN=api.example.com \
  --set-secrets JWT_SECRET_KEY=jwt-secret:latest,DB_PASSWORD=db-password:latest \
  --memory 512Mi \
  --cpu 1 \
  --max-instances 10

Cloud Run automatically:

Key Configuration Variables

Secrets (use _FILE variants in production):

JWT_SECRET_KEY=32-char-minimum-secret
DB_PASSWORD=strong-database-password
S3_SECRET_ACCESS_KEY=s3-secret-key
MAILJET_API_SECRET=mailjet-private-key

Token lifetimes:

ACCESS_TOKEN_TTL_MINUTES=60        # Short-lived (1 hour)
REFRESH_TOKEN_TTL_DAYS=90          # Long-lived (3 months)
PASSWORD_RESET_TTL_MINUTES=60      # Reset link expires in 1 hour
EMAIL_VERIFICATION_TTL_HOURS=24    # Verification link valid 24 hours

Security settings:

MAX_FAILED_LOGIN_ATTEMPTS=5        # Lock account after 5 failed attempts
LOCKOUT_DURATION_MINUTES=15        # Lock duration

Performance tuning:

DB_POOL_MIN=2                      # Minimum database connections
DB_POOL_MAX=10                     # Maximum database connections
CONCURRENCY_LIMIT=100              # Max concurrent gRPC requests

Monitoring and Observability

Prometheus metrics at /metrics:

# Authentication requests per second
rate(auth_requests_total[5m])

# P99 latency
histogram_quantile(0.99, rate(auth_duration_seconds_bucket[5m]))

# Error rate
rate(auth_errors_total[5m]) / rate(auth_requests_total[5m])

# Database connection pool usage
db_connections_active / db_connections_max

OpenTelemetry tracing:

Sentry error tracking:

Troubleshooting

“failed to verify access token: no ‘kid’ in token header”

“database connection failed”

“failed to compile protos”

“SQLx query verification failed”

Email not sending


Testing and CI/CD

The service implements compile-time validation at multiple layers:

Compile-Time Guarantees

SQLx Query Verification — SQLx validates every query against the database during build:

// This fails at compile-time if 'email' column doesn't exist or types mismatch
let user = sqlx::query_as!(User,
    "SELECT id, email, password FROM auth.users WHERE email = $1",
    email
).fetch_optional(&pool).await?;

What SQLx catches at compile time:

# Generate offline query data for CI (no DB needed at build time)
cargo sqlx prepare

# Verify queries match schema
cargo sqlx prepare --check

Additional compile-time checks:

CI/CD: Native Multi-Architecture Builds

Traditional Docker multi-platform builds use QEMU emulation (10x slower). This project uses native ARM64 runners:

graph LR
    subgraph "Build Phase (Parallel)"
        A["ARM64 Runner<br/>(ubuntu-24.04-arm)"]
        B["AMD64 Runner<br/>(ubuntu-latest)"]
    end
    
    subgraph "Manifest Creation"
        E["Multi-Arch Manifest"]
    end
    
    subgraph "Registry"
        F["ghcr.io/org/auth-service:latest"]
    end
    
    A -->|"4 min"| E
    B -->|"3 min"| E
    E --> F
    
    style A fill:#4CAF50
    style B fill:#2196F3
    style F fill:#FF5722
Build Method ARM64 AMD64 Total
Native runners 4 min 3 min 7 min
QEMU emulation 40 min 3 min 43 min
Speedup 10x Same 6x

Pre-commit workflow:

make pre-commit  # Runs: fmt, clippy, test, sqlx-check

Production Readiness Checklist

The following table summarizes the key production-ready features of this service:

Category Highlights
Security OWASP compliance, Argon2id hashing, JWT rotation, account lockout, generic errors, zero unsafe
Scalability Stateless (JWT), connection pooling, streaming APIs, atomic DB operations
Observability OpenTelemetry, Prometheus /metrics, Sentry, structured JSON logs
Reliability Compile-time SQL, graceful shutdown, timeout middleware, retry logic
Deployment 15-25 MB images, multi-arch (ARM64/AMD64), Cloud Run & K8s ready

Roadmap

Feature Status
OAuth 2.0 social login 🚧 In development
MFA / TOTP 📋 Planned
Passkeys / WebAuthn 📋 Planned
Rate limiting 📋 Planned

Want to contribute? PRs welcome for email providers (SendGrid, AWS SES), SMS providers (Twilio), Helm charts, and Terraform modules.


Conclusion

Building authentication services in Rust isn’t just about performance — it’s about confidence. The compiler catches mistakes that would become production incidents in other languages:

Key Takeaways

  1. Compile-Time Guarantees Win — SQL queries are validated at compile time, and protocol validation is embedded at build time. Entire classes of bugs are eliminated before deployment.
  2. Zero-Cost Abstractions Are Real — Tower middleware, async/await, and perfect-hash functions compile to code as efficient as hand-written C.
  3. Memory Safety Equals Production Reliability — No null pointer exceptions, no use-after-free, and no data races.
  4. Infrastructure Services Deserve Infrastructure Languages — Authentication is the critical path for every request.

This service demonstrates that infrastructure-grade reliability is achievable with:

The Rust ecosystem has matured to the point where building production services is not just feasible — it’s enjoyable. Tonic, SQLx, Tower, and Tokio form a cohesive stack that rivals any language for backend development while providing guarantees that others cannot match.

The future of backend development is compiled, type-safe, and fast. The future is Rust.


Next Steps

Explore the Code

Learn More

Connect

Questions? Feedback? Find me:


Built with ❤️ by Dmitrii Zusmanovich

If this article helped you, consider ⭐️ starring the repository!