Authentication is the front door of your platform. Every request hits it, every incident starts or ends there, and every millisecond of latency multiplies across your entire product.
For years, backend engineers have faced painful trade-offs between memory safety, raw performance, and developer productivity.
Rust eliminates these compromises.
By combining Rust with Tonic (gRPC), SQLx (async SQL), and Tower (middleware), you get a stack where type safety extends from SQL queries to the network boundary, and performance comes without a garbage collector.
This article dissects a production-grade authentication service that demonstrates why infrastructure-critical services are increasingly being built in Rust. We explore architectural patterns that Rust’s type system makes practical: compile-time SQL verification, zero-cost async, protocol-unified middleware, and atomic database operations.
Note on numbers: The memory/latency/cost figures below are typical “order of magnitude” comparisons. Your results will depend on dependencies, runtime flags, database, traffic shape, and deployment environment.
Live Demo: auth-demo.dmitrii.app
Source Code: Server (Rust) • Client (Flutter)
Part I: Why Rust Changes Everything
Part II: Architecture
Part III: Security & OWASP Compliance
Part IV: Authentication Flows
Part V: Production Operations
Part VI: Getting Started & Comparison
Wrap-Up
Appendix
Authentication services are high-traffic gateways — every request touches them. They must be fast, secure, and economical to run at scale.
| Runtime | Idle Memory | P99 Latency | Cost @ 100 Replicas |
|---|---|---|---|
| Java/Spring Boot | 250–500 MB | 50–200 ms | 25–50 GB requested |
| Node.js/Express | 70–150 MB | 20–80 ms | 7–15 GB requested |
| Go/gRPC | 20–40 MB | 5–20 ms | 2–4 GB requested |
| Rust/Tonic | 5–15 MB | 2–10 ms | 0.5–1.5 GB requested |
These ranges are illustrative, not guarantees. Baseline memory/latency depends heavily on framework choices, TLS, logging/metrics, connection pools, and workload. Use this as a mental model, then benchmark your own service.
In Kubernetes — where you often pay for requested resources — a smaller and more predictable baseline makes it easier to right-size requests and scale more granularly.
The real breakthrough isn’t just efficiency — it’s predictability. Without a garbage collector, there are no GC pauses — removing one common source of tail-latency spikes. P99 latencies remain consistently low under load, which is critical when every millisecond of authentication latency cascades across your platform.
Many database bugs surface only in production. A renamed column, a missing WHERE clause, or a type mismatch — each produces a runtime error that could have been prevented.
SQLx eliminates this entire class of bugs through compile-time verification.
let user = sqlx::query_as!(
User,
"SELECT id, email FROM auth.users WHERE email = $1",
email
)
.fetch_optional(&pool)
.await?;
During cargo build, SQLx connects to your database (or uses cached metadata), verifies that each query parses, that the referenced tables and columns exist, and that the SQL types map onto the Rust types you declared.
CI tip: SQLx also supports an offline workflow (via
cargo sqlx prepare) so builds don’t need a live database connection. This keeps CI deterministic while preserving compile-time validation.
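For reference, the offline workflow is roughly two commands (exact flags depend on your sqlx-cli version, so treat this as a sketch):

```shell
# One-time, with DATABASE_URL pointing at a dev database:
# record query metadata into the .sqlx/ directory and commit it.
cargo sqlx prepare --workspace

# In CI: build against the committed metadata, no database needed.
SQLX_OFFLINE=true cargo build
```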
A renamed email column? The build breaks. This moves bugs from runtime (where they hurt users) to compile-time (where they only annoy developers).
Before SQLx, you’d discover this in production:
ERROR: column "email" does not exist
With SQLx, you discover it during cargo build:
error: error returned from database: column "email" does not exist
--> src/services/auth.rs:42:5
Rust’s macro ecosystem makes this kind of schema-aware checking unusually ergonomic. SQLx’s online/offline workflow is one of the most seamless ways to get compile-time confidence in real SQL.
Rust compiles async/await into state machines at compile time. An await point adds no per-call heap allocation, and tasks need no per-thread stacks. Futures remain lightweight (often stack-resident until you spawn them), while Tokio provides the scheduling and I/O layer with low, predictable overhead:
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let config = Config::init()?;
telemetry::init(&config.telemetry)?;
let (app, addr) = startup::build_app(&config).await?;
let listener = TcpListener::bind(addr).await?;
info!("Server listening on {addr}");
axum::serve(listener, app)
.with_graceful_shutdown(shutdown_signal())
.await?;
Ok(())
}
Every await becomes a state transition in that generated state machine. In practice, the runtime cost is often close to what you’d write by hand in a lower-level language—while still preserving memory safety.
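To make the “state machine” claim concrete, here is a hand-written future resembling what the compiler conceptually generates for a two-step async block. This is a std-only illustration driven once with a no-op waker, not the actual generated code:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hand-written equivalent of `async { let a = n + 1; a * 2 }`:
// each suspension point becomes a variant of this enum.
enum AddMachine {
    Start(i32),
    Halfway(i32),
    Done,
}

impl Future for AddMachine {
    type Output = i32;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<i32> {
        loop {
            match *self {
                AddMachine::Start(n) => *self = AddMachine::Halfway(n + 1),
                AddMachine::Halfway(n) => {
                    *self = AddMachine::Done;
                    return Poll::Ready(n * 2);
                }
                AddMachine::Done => panic!("polled after completion"),
            }
        }
    }
}

// Minimal no-op waker so we can poll without a runtime.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // No heap allocation: the whole "task" lives in this stack variable.
    let mut fut = AddMachine::Start(20);
    match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(v) => println!("{v}"),
        Poll::Pending => unreachable!(),
    }
}
```

The real generated state machines also capture borrows across await points, which is why the borrow checker participates in async correctness.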
Rust’s type system enforces security patterns:
1. Secret Protection
use secrecy::SecretString;
pub struct Config {
jwt_secret: SecretString, // Can't accidentally log
db_password: SecretString, // Debug trait redacts value
}
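Here is a minimal std-only sketch of the idea behind `SecretString`. The real `secrecy` crate additionally zeroizes memory on drop and gates access behind its `ExposeSecret` trait; this toy version only shows the Debug-redaction property:

```rust
use std::fmt;

// Wrap the value and redact it in Debug output so it can't leak via logs.
struct Secret(String);

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret([REDACTED])")
    }
}

impl Secret {
    // Access requires an explicit call, making every use easy to audit.
    fn expose(&self) -> &str {
        &self.0
    }
}

fn main() {
    let jwt_secret = Secret("super-secret".into());
    // Logging the config can no longer leak the value:
    println!("{jwt_secret:?}");
    assert_eq!(jwt_secret.expose(), "super-secret");
}
```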
2. Lifetime Safety
// Compiler ensures database connections don't outlive transactions
pub async fn create_session<'a>(
pool: &'a PgPool,
params: CreateSessionParams<'a>, // Borrows from caller
) -> Result<Session, AppError>
3. Thread-Safe Sharing
// Arc provides thread-safe shared ownership of read-only state — no locks needed
let context = Arc::new(ServiceContext::new(db, email, s3));
let auth_service = AuthService::new(context.clone());
let user_service = UserService::new(context.clone());
During development, the type system surfaced issues that would have been subtle runtime bugs in other languages. For example, token hashes are passed by reference into a constant-time verification API; ownership rules make an ad-hoc clone-and-`==` comparison awkward enough that the easy path is also the one without a timing side-channel.
The service follows a layered architecture with clear separation of concerns:
graph TB
subgraph Clients["📱 Client Layer"]
Mobile["Mobile Apps<br/>(Native gRPC)"]
Web["Web Apps<br/>(gRPC-Web)"]
Browser["Browsers<br/>(REST)"]
end
subgraph Gateway["🚪 Gateway - Single Port"]
Router["Axum Router<br/>(Protocol Detection)"]
end
subgraph Middleware["🛡️ Middleware Stack (Tower)"]
MW["RequestID → Tracing → Auth → CORS → Timeout"]
end
subgraph Protocol["📡 Protocol Handlers"]
GrpcWeb["gRPC + gRPC-Web"]
RestAPI["REST API"]
end
subgraph Services["⚙️ Service Layer"]
AuthSvc["AuthService"]
UserSvc["UserService"]
HealthSvc["HealthService"]
end
subgraph Context["🔧 ServiceContext (Shared Infrastructure)"]
DB["Database Pool"]
Email["Email Provider"]
S3["S3 Storage"]
GeoIP["GeoIP"]
JWT["JWT Validator"]
URLs["URL Builder"]
end
subgraph Data["💾 Data Layer"]
Postgres[("PostgreSQL<br/>Atomic Functions")]
end
Mobile & Web & Browser --> Router
Router --> MW
MW --> GrpcWeb & RestAPI
GrpcWeb --> AuthSvc & UserSvc & HealthSvc
RestAPI --> HealthSvc
AuthSvc --> DB & Email & GeoIP & JWT
UserSvc --> DB & S3 & Email
HealthSvc --> DB
DB --> Postgres
style Gateway fill:#4CAF50,color:#fff
style Middleware fill:#2196F3,color:#fff
style Services fill:#FF9800,color:#fff
style Data fill:#FF5722,color:#fff
1. Single Responsibility via Crate Organization
The project uses Cargo workspace to separate concerns:
auth-service-rs/
├── src/ # Service binary (wiring, routing)
│ ├── main.rs # Entry point, telemetry setup
│ ├── startup.rs # Server wiring, dependency injection
│ ├── routes.rs # REST endpoints (health, verify-email)
│ ├── config.rs # Environment config with validation
│ ├── services/ # gRPC services implementations
│ ├── middleware/ # Tower layers (auth, tracing, IP)
│ └── core/ # Service context, URLs, password utils
├── crates/
│ ├── proto/ # Protobuf definitions + codegen
│ ├── core/ # Domain logic (JWT, tokens, password, validation)
│ ├── db/ # SQLx repositories, migrations, atomic SQL functions
│ ├── email/ # SMTP provider
│ ├── mailjet/ # Mailjet provider
│ ├── storage/ # S3-compatible storage
│ └── telemetry/ # OTel, Prometheus, Sentry glue
└── proto/ # Raw .proto files (auth.proto, users.proto)
Benefits of this structure:
- Changes in `auth-email` don’t require recompiling `auth-db`
- `auth-core` can be imported by other Rust services

2. Dependency Injection via ServiceContext
- All services share infrastructure through a single `Arc<ServiceContext>`

3. Compile-Time Verification
- SQLx queries and proto validation rules are checked at build time
4. Zero-Trust Security
- Authenticated identity travels with each request via the `AuthInfo` extension
- Secrets are wrapped in the `SecretString` type

5. Graceful Degradation
- Optional providers (email, S3) can be absent without breaking the service
Traditional microservices run gRPC and REST on separate ports, which means duplicated load-balancer rules, TLS configuration, firewall entries, and monitoring.
This service handles gRPC, gRPC-Web, and REST on one port using Axum’s router composition:
pub async fn build_app(config: &Config) -> Result<(Router, SocketAddr)> {
// gRPC services
let grpc_routes = Routes::new(health_service)
.add_service(auth_service)
.add_service(user_service);
// REST routes
let rest_router = rest_routes(state);
// gRPC-Web layer enables browsers (HTTP/1.1)
let grpc_router = grpc_routes
.into_axum_router()
.layer(GrpcWebLayer::new());
let app = rest_router.merge(grpc_router).layer(middleware);
Ok((app, addr))
}
Requests are routed based on the Content-Type header:
- `application/grpc` → native gRPC (mobile and desktop clients)
- `application/grpc-web` → gRPC-Web (browser clients)
- `application/json` → REST (health checks, webhooks, and callbacks)

DevOps complexity: eliminated.
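The dispatch rule can be sketched as a plain function (names are illustrative; the real routing happens inside the Tower/Axum stack). Note that `grpc-web` must be tested before `grpc`, since the latter is a prefix of the former:

```rust
// Illustrative Content-Type dispatch, mirroring the routing table above.
fn route(content_type: &str) -> &'static str {
    if content_type.starts_with("application/grpc-web") {
        "grpc-web" // browser clients over HTTP/1.1
    } else if content_type.starts_with("application/grpc") {
        "grpc" // native gRPC over HTTP/2
    } else {
        "rest" // health checks, webhooks, callbacks
    }
}

fn main() {
    assert_eq!(route("application/grpc"), "grpc");
    assert_eq!(route("application/grpc-web+proto"), "grpc-web");
    assert_eq!(route("application/json"), "rest");
    println!("ok");
}
```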
The service implements a unified middleware pipeline that works seamlessly for both gRPC and REST, eliminating the need for separate interceptor chains. This section covers two key innovations: perfect-hash route matching and the Tower middleware stack.
Most auth middleware needs a “public route allowlist” (health checks, sign-up, recovery endpoints). Many implementations use HashMap or linear checks.
This service uses a compile-time perfect hash:
use phf::phf_set;
static PUBLIC_ROUTES: phf::Set<&'static str> = phf_set! {
// gRPC methods
"Authenticate", "SignUp", "RecoveryStart", "RefreshTokens",
// REST endpoints
"/health", "/metrics", "/verify-email",
};
fn is_public_route(path: &str) -> bool {
PUBLIC_ROUTES.contains(path) // O(1), zero allocation
}
phf_set! generates a perfect hash function at compile time, so lookups are O(1) with zero allocations and no runtime initialization.
The same auth layer returns the right “shape” of error for gRPC vs REST:
fn build_error_response(err: &JwtError, is_grpc: bool) -> Response<Body> {
    if is_grpc {
        // gRPC typically uses HTTP 200 and carries status in `grpc-status` trailers
        Response::builder()
            .status(StatusCode::OK)
            .header("grpc-status", "16") // UNAUTHENTICATED
            .header("grpc-message", err.to_string())
            .body(Body::empty())
            .expect("static response parts are valid")
    } else {
        // REST uses standard HTTP status codes
        Response::builder()
            .status(StatusCode::UNAUTHORIZED)
            .header("www-authenticate", "Bearer")
            .body(Body::from(json!({ "error": err.to_string() }).to_string()))
            .expect("static response parts are valid")
    }
}
This enables one authentication layer for all protocols.
Here’s the complete middleware pipeline:
let middleware = ServiceBuilder::new()
.layer(RequestIdLayer::new()) // 1. Extract/generate request ID
.layer(TraceLayer::new_for_http()) // 2. Distributed tracing
.layer(TimeoutLayer::new(Duration::from_secs(10))) // 3. Request timeout
.layer(cors) // 4. CORS handling
.layer(AuthLayer::new(jwt_validator)); // 5. JWT authentication
Key insight: Tower’s middleware abstraction is protocol-agnostic — the same Layer trait works for gRPC and REST because both are HTTP-based.
Benefits of unified middleware:
- A single layer detects the protocol from `Content-Type` and formats errors appropriately

If you need gRPC-only behavior (custom metadata/trailers), Tonic interceptors are still available:
let server = AuthServiceServer::with_interceptor(
auth_service,
|req: Request<()>| {
if let Some(_custom) = req.metadata().get("x-custom") {
// gRPC-specific logic here
}
Ok(req)
}
);
This architecture provides simplicity by default with power when needed — the best of both worlds.
Race conditions in authentication systems cause critical security bugs: double token use, account takeover, and privilege escalation. Traditional multi-query transactions leave windows where concurrent requests can corrupt state. This section demonstrates how to eliminate these vulnerabilities entirely.
Instead of multi-step transactions in application code, critical operations are atomic database functions:
CREATE OR REPLACE FUNCTION auth.verify_email(p_token_hash BYTEA)
RETURNS auth.users
LANGUAGE plpgsql
AS $$
DECLARE
  v_user_id UUID;
  v_status auth.user_status;
  result auth.users; -- holds the row returned to the caller
BEGIN
  -- Step 1: Consume token atomically
  UPDATE auth.email_verification_tokens
  SET used_at = now()
  WHERE token_hash = p_token_hash
    AND used_at IS NULL
    AND expires_at > now()
  RETURNING id_user INTO v_user_id;

  IF v_user_id IS NULL THEN
    RAISE EXCEPTION 'TOKEN_INVALID';
  END IF;

  -- Step 2: Validate user status
  SELECT status INTO v_status
  FROM auth.users WHERE id = v_user_id;

  IF v_status IN ('suspended', 'deleted') THEN
    RAISE EXCEPTION 'ACCOUNT_SUSPENDED';
  END IF;

  -- Step 3: Verify + activate
  UPDATE auth.users
  SET email_verified = TRUE,
      status = CASE WHEN status = 'pending' THEN 'active' ELSE status END
  WHERE id = v_user_id
  RETURNING * INTO result;

  RETURN result;
END;
$$;
Without atomic function (multi-query approach):
// ❌ Race condition possible between queries
let token = db.get_token(hash).await?;
if token.used_at.is_some() {
return Err(AppError::TokenInvalid);
}
db.mark_token_used(token.id).await?; // Another request could use token here
db.verify_user_email(token.user_id).await?;
With atomic function:
// ✅ Single atomic operation
let user = db.verify_email(hash).await?;
One call, one transaction, no race window.
Session Touch (Sliding Expiration)
CREATE FUNCTION auth.touch_session(p_token_hash BYTEA, p_extend_by INTERVAL)
RETURNS TABLE (id_user UUID, expires_at TIMESTAMPTZ)
LANGUAGE sql
AS $$
  UPDATE auth.sessions s
  SET last_seen_at = now(),
      expires_at = GREATEST(s.expires_at, now() + p_extend_by),
      activity_count = s.activity_count + 1
  WHERE s.refresh_token = p_token_hash AND s.expires_at > now()
  RETURNING s.id_user, s.expires_at;
$$;
Every token refresh atomically validates the token, updates activity, and extends expiration in a single statement, leaving no window in which two concurrent refreshes can both succeed.
The schema leverages PostgreSQL-specific features for safety and performance:
1. UUIDv7 for Time-Ordered IDs
CREATE TABLE auth.users (
id UUID PRIMARY KEY DEFAULT uuidv7(), -- Time-ordered, index-friendly, PostgreSQL 18+
...
);
- Time-ordered IDs keep B-tree indexes compact (inserts append rather than scatter)
- IDs sort by creation time, simplifying `created_at`-style queries

2. Domain types for normalization + validation
CREATE DOMAIN email AS TEXT
CHECK (
VALUE = lower(VALUE) AND
VALUE ~ '^[^@\s]+@[^@\s]+\.[^@\s]+$' AND
length(VALUE) <= 254
);
CREATE DOMAIN phone_e164 AS TEXT
CHECK (VALUE ~ '^\+[1-9][0-9]{1,14}$');
- `email` and `phone_e164` are distinct types enforced by the database itself

3. Partial Unique Indexes (Soft Delete)
CREATE UNIQUE INDEX user_email_active_ux
ON auth.users (email)
WHERE deleted_at IS NULL AND email IS NOT NULL;
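A quick sketch of what the partial index buys you, assuming the `auth.users` schema above (values are placeholders):

```sql
-- Soft-deleting a row frees the email for a fresh registration,
-- while the historical row is preserved for auditing.
UPDATE auth.users SET deleted_at = now() WHERE email = 'a@example.com';

-- Allowed: the unique index only covers rows WHERE deleted_at IS NULL.
INSERT INTO auth.users (email, status) VALUES ('a@example.com', 'pending');
```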
4. Constraint-Based Business Rules
CONSTRAINT user_auth_method_ck
CHECK (status != 'active' OR email IS NOT NULL OR phone IS NOT NULL)
5. JSONB for Flexible Metadata
metadata JSONB DEFAULT '{}'::JSONB
CONSTRAINT role_permissions_valid_ck
CHECK (jsonb_typeof(permissions) = 'object')
Rather than passing individual dependencies everywhere, a shared context encapsulates all infrastructure:
#[derive(Clone)]
pub struct ServiceContext {
db: Database,
email: Option<EmailProvider>, // SMTP or Mailjet
s3: Option<Arc<S3Storage>>,
urls: UrlBuilder,
}
1. Single Source of Truth
// Services share infrastructure via Arc
let context = Arc::new(ServiceContext::new(db, email, s3, urls));
let auth_service = AuthService::new(config, context.clone());
let user_service = UserService::new(config, context.clone());
2. Graceful Degradation
// Optional features don't break the service
impl ServiceContext {
pub fn send_email(&self, /* params */) {
let Some(email) = &self.email else {
return; // Email not configured — service still works
};
// Send email...
}
}
3. Fire-and-Forget Background Tasks
pub fn send_welcome_email(&self, user_id: Uuid, email: String, /* ... */) {
let db = self.db.clone();
let email_provider = self.email.clone();
tokio::spawn(async move {
// Create verification token
// Send email
// Log result
}); // User response is sent immediately; email delivery happens in background
}
Sign-up returns in milliseconds, while email delivery happens asynchronously. Failures are logged but never block users.
The service leverages compile-time validation through Protocol Buffers annotations. This approach catches invalid requests before they reach business logic, eliminates boilerplate validation code, and ensures consistent enforcement across all clients.
message SignUpRequest {
string email = 1 [
(validate.rules).string.email = true,
(validate.rules).string.max_len = 255
];
string password = 2 [
(validate.rules).string.min_len = 8,
(validate.rules).string.max_len = 128
];
string name = 3 [
(validate.rules).string.min_len = 1,
(validate.rules).string.max_len = 255
];
}
The prost-validate crate generates validation code at compile time:
// In crates/proto/build.rs
prost_validate_build::Builder::new()
.configure(&mut config, proto_files, includes)
.expect("Failed to configure prost-validate");
Every generated message gets a Validator trait implementation. The service uses an extension trait for ergonomic validation:
pub trait ValidateExt {
fn validate_or_status(&self) -> Result<(), Status>;
}
impl<T: Validator> ValidateExt for T {
fn validate_or_status(&self) -> Result<(), Status> {
self.validate()
.map_err(|e| Status::invalid_argument(e.to_string()))
}
}
pub async fn sign_up(
&self,
request: Request<SignUpRequest>,
) -> Result<Response<AuthResponse>, Status> {
let req = request.into_inner();
// Proto validation — one line, all rules checked
req.validate_or_status()?;
// Domain validation — business rules beyond proto
domain::validate_password(&req.password)?;
// All validation passed — proceed with business logic
let user = self.ctx.db().create_user(&req).await?;
// ...
}
Key benefits:
- Validation rules live next to the API contract in `.proto` files

For business rules that can’t be expressed in proto annotations, the service provides domain validators:
pub mod domain {
use tonic::Status;
/// Validate password complexity beyond length (which proto handles)
pub fn validate_password(password: &str) -> Result<(), Status> {
let has_letter = password.chars().any(char::is_alphabetic);
let has_digit = password.chars().any(|c| c.is_ascii_digit());
if !has_letter || !has_digit {
return Err(Status::invalid_argument(
"Password must contain both letters and numbers"
));
}
Ok(())
}
/// Validate E.164 phone format
pub fn validate_phone(phone: &str) -> Result<(), Status> {
if !phone.starts_with('+') || phone.len() < 8 || phone.len() > 16 {
return Err(Status::invalid_argument(
"Phone must be in E.164 format (+1234567890)"
));
}
Ok(())
}
}
This two-tier validation approach (proto + domain) ensures comprehensive input validation without code duplication. Proto annotations handle structural constraints such as length, format, and required fields, while domain validators handle business logic like password complexity and semantic format validation.
The service implements server-side streaming for efficient data transfer, especially useful for admin operations listing thousands of users.
use async_stream::try_stream;
use tokio_stream::StreamExt;
pub fn list_users(&self, req: ListUsersRequest) -> StreamResult<User> {
let db = self.ctx.db().clone();
let stream = try_stream! {
let mut rows = db.users.stream_all_users();
while let Some(result) = rows.next().await {
let user = result.map_err(|e| {
error!(error = %e, "Failed to stream users");
Status::from(AppError::Unavailable(e.to_string()))
})?;
yield User::from(user); // Stream one record at a time
}
};
Box::pin(stream)
}
Without streaming (traditional approach):
// ❌ Load all users into memory
let users = db.get_all_users().await?; // OOM with 100K users
Ok(Response::new(ListUsersResponse { users }))
With streaming:
// ✅ Constant memory regardless of user count
let stream = self.list_users(req);
Ok(Response::new(Box::pin(stream)))
Benefits:
- Constant memory usage regardless of result-set size
- First rows reach the client immediately (low time-to-first-byte)
- Backpressure: the consumer’s pace throttles the database read

Real-world impact: an admin dashboard listing 50,000 users streams them with flat server memory instead of materializing the full list.
The async_stream::try_stream! macro provides ergonomic syntax for building streams with ? operator support — much cleaner than manual Stream trait implementation.
The service implements the OWASP Authentication Cheat Sheet through multiple complementary layers:
Argon2id Hashing
OWASP currently recommends Argon2id and provides baseline parameters you can start from (then calibrate on your hardware). A common starting point is ~19 MiB memory, 2 iterations, parallelism 1. (Source: OWASP Password Storage Cheat Sheet.)
/// Hash password using Argon2id with OWASP-recommended parameters
pub fn hash_password(password: &str) -> Result<String, AppError> {
let salt = SaltString::generate(&mut OsRng);
// OWASP baseline: 19 MiB memory, 2 iterations, 1 parallelism
let params = Params::new(19 * 1024, 2, 1, None)
.map_err(|e| AppError::internal(format!("argon2 params: {e}")))?;
let argon2 = Argon2::new(Algorithm::Argon2id, Version::V0x13, params);
argon2
.hash_password(password.as_bytes(), &salt)
.map(|hash| hash.to_string())
.map_err(|e| AppError::internal(format!("argon2 hash: {e}")))
}
/// Verify password against stored hash (constant-time comparison)
pub fn verify_password(password: &str, hash: &str) -> bool {
use password_hash::PasswordVerifier;
let parsed_hash = match password_hash::PasswordHash::new(hash) {
Ok(h) => h,
Err(_) => return false,
};
Argon2::default()
.verify_password(password.as_bytes(), &parsed_hash)
.is_ok()
}
Argon2id is memory-hard — resistant to GPU/ASIC attacks. Cost parameters ensure cracking attempts are prohibitively expensive. The stored hash includes algorithm, parameters, and salt, making future parameter upgrades transparent.
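The “parameters travel with the hash” property is visible in the PHC string format itself. This sketch parses an illustrative (not real) hash to show where each piece lives:

```rust
fn main() {
    // Illustrative PHC-format string, as stored in the password_hash column.
    let phc = "$argon2id$v=19$m=19456,t=2,p=1$c2FsdHNhbHQ$aGFzaGJ5dGVz";
    let parts: Vec<&str> = phc.split('$').filter(|s| !s.is_empty()).collect();

    assert_eq!(parts[0], "argon2id");        // algorithm
    assert_eq!(parts[1], "v=19");            // algorithm version
    assert_eq!(parts[2], "m=19456,t=2,p=1"); // cost params: 19 MiB, 2 iters, p=1
    // parts[3] = base64 salt, parts[4] = base64 hash bytes

    println!("params travel with the hash: upgrades need no migration");
}
```

Because verification reads the parameters out of each stored hash, you can raise the cost for new hashes while old ones keep verifying, then rehash on next successful login.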
To prevent brute-force attacks, the service implements progressive account lockout:
pub async fn authenticate(&self, req: AuthenticateRequest) -> Result<AuthResponse> {
let user = self.find_user(&req.identifier).await?;
// Check if account locked
if let Some(locked_until) = user.locked_until {
if locked_until > Utc::now() {
return Ok(AuthResponse::locked());
}
}
// Verify password
if !password::verify(&req.password, &user.password_hash) {
let attempts = self.db.increment_failed_login(user.id).await?;
// Lock account after N failures
if attempts >= self.config.max_failed_attempts {
self.db.lock_account(user.id, self.config.lockout_duration).await?;
return Ok(AuthResponse::locked());
}
return Ok(AuthResponse::failed());
}
// Success: reset counter
self.db.reset_failed_login(user.id).await?;
Ok(self.create_session(user, req).await?)
}
Configuration via environment:
- `MAX_FAILED_LOGIN_ATTEMPTS=5`
- `LOCKOUT_DURATION_MINUTES=15`

OWASP requirement: Never reveal whether an email exists in the system.
fn failed_auth() -> AuthResponse {
AuthResponse {
status: AuthStatus::Failed,
message: "Invalid credentials".to_string(), // Generic message
// No indication whether email exists
}
}
Same response for:
- Non-existent email
- Wrong password
- Suspended or locked account
This prevents user enumeration attacks, where attackers attempt to harvest valid email addresses.
Cached Validator with Strong Configuration
pub struct JwtValidator {
encoding_key: Arc<EncodingKey>, // Cached for performance
decoding_key: Arc<DecodingKey>,
validation: Validation,
}
impl JwtValidator {
pub fn new(secret: &SecretString) -> Self {
let mut validation = Validation::new(Algorithm::HS256);
validation.set_audience(&["auth-service"]);
validation.set_issuer(&["auth-service"]);
validation.validate_exp = true; // Expiration required
validation.validate_nbf = true; // Not-before required
Self {
encoding_key: Arc::new(EncodingKey::from_secret(secret.as_bytes())),
decoding_key: Arc::new(DecodingKey::from_secret(secret.as_bytes())),
validation,
}
}
}
Token Claims
struct Claims {
sub: String, // user_id
email: String,
name: String,
role: String,
device_id: String,
installation_id: String,
iat: i64, // Issued at
exp: i64, // Expiration
nbf: i64, // Not before
aud: String, // Audience
iss: String, // Issuer
}
Single-Use Tokens
pub async fn refresh_tokens(&self, refresh_token: String) -> Result<TokenPair> {
let token_hash = TokenGenerator::hash_token(&refresh_token);
// Atomically validate and consume token
let session = self.db.touch_session(&token_hash).await?;
// Generate new tokens
let new_access = self.jwt.generate_access_token(&session.user, ...)?;
let new_refresh = TokenGenerator::generate_secure_token();
let new_hash = TokenGenerator::hash_token(&new_refresh);
// Store new refresh token
self.db.update_session_token(session.id, &new_hash).await?;
Ok(TokenPair { access_token: new_access, refresh_token: new_refresh })
}
Key property: If an attacker steals a refresh token, they can only use it once before it’s replaced.
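The single-use property can be sketched with a toy in-memory store. The real service keys sessions by a SHA-256 hash of the token in PostgreSQL; the `u64` “hashes” here are stand-ins:

```rust
use std::collections::HashMap;

// Toy session store mapping token hash -> user id.
struct SessionStore {
    sessions: HashMap<u64, u32>,
}

impl SessionStore {
    // Remove-then-insert: consuming the presented token and storing the
    // replacement in one step means a replayed old token finds nothing.
    fn refresh(&mut self, presented: u64, next: u64) -> Option<u32> {
        let user = self.sessions.remove(&presented)?;
        self.sessions.insert(next, user);
        Some(user)
    }
}

fn main() {
    let mut store = SessionStore {
        sessions: HashMap::from([(111, 42)]),
    };
    assert_eq!(store.refresh(111, 222), Some(42)); // first use succeeds
    assert_eq!(store.refresh(111, 333), None);     // replay of old token fails
    assert_eq!(store.refresh(222, 333), Some(42)); // rotated token works
    println!("ok");
}
```

In production the remove-and-replace happens inside a single database transaction, so the property holds even across concurrent requests.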
The service architecture is prepared for OAuth 2.0 social login, with the database schema and proto definitions already in place.
Database schema supports OAuth providers:
CREATE TYPE oauth_provider AS ENUM (
'google', 'github', 'microsoft', 'apple', 'facebook'
);
CREATE TABLE oauth_accounts (
id UUID PRIMARY KEY DEFAULT uuidv7(),
id_user UUID NOT NULL REFERENCES users(id),
provider oauth_provider NOT NULL,
provider_id TEXT NOT NULL, -- Provider's user ID
provider_email email,
metadata JSONB DEFAULT '{}'::JSONB,
linked_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (provider, provider_id) -- One account per provider
);
Proto definitions ready:
service AuthService {
rpc GetOAuthUrl(GetOAuthUrlRequest) returns (GetOAuthUrlResponse);
rpc ExchangeOAuthCode(ExchangeOAuthCodeRequest) returns (AuthResponse);
rpc LinkOAuthProvider(LinkOAuthProviderRequest) returns (Empty);
rpc UnlinkOAuthProvider(UnlinkOAuthProviderRequest) returns (Empty);
rpc ListLinkedProviders(ListLinkedProvidersRequest) returns (ListLinkedProvidersResponse);
}
Why this architecture matters:
Implementation path: When OAuth is needed, implementers only need to:
oauth2 crate)unimplemented statusNo database migrations and no API changes are required — the architecture anticipated this need.
The service implements sliding session expiration with single-use refresh tokens, as illustrated in the following sequence:
sequenceDiagram
participant C as Client
participant A as AuthService
participant DB as Database
Note over C: Access token expired after 60 min
C->>A: Request with expired token
A-->>C: ❌ 401 Unauthenticated
C->>A: RefreshTokens(refresh_token)
A->>A: hash_token(refresh_token)
A->>DB: touch_session(token_hash)
alt Invalid/Expired Token
DB-->>A: Token not found or expired
A-->>C: ❌ Force re-login
end
DB-->>A: ✓ Session valid
A->>A: Generate new tokens
A->>DB: update_session(new_refresh_hash)
Note over DB: Old token invalidated (single-use)
A-->>C: ✅ New token pair
C->>C: Store new tokens
C->>A: Retry request with new access token
A-->>C: ✓ Success
Security Properties:
- `activity_count` increments and `last_seen_at` updates with each refresh

Implementation Detail: `touch_session()` uses `GREATEST()`:
expires_at = GREATEST(expires_at, now() + p_extend_by)
This ensures a refresh can only extend a session, never shorten it: even if an older or out-of-order request carries a smaller extension, the later expiry always wins.
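The same rule in plain Rust, with abstract timestamps (a sketch of the semantics, not the service’s code):

```rust
// GREATEST(expires_at, now + extend_by): the new expiry is never
// earlier than the current one.
fn slide(current_expiry: u64, now: u64, extend_by: u64) -> u64 {
    current_expiry.max(now + extend_by)
}

fn main() {
    // A refresh near expiry extends the session...
    assert_eq!(slide(100, 95, 30), 125);
    // ...but a refresh early in the window never shortens it.
    assert_eq!(slide(100, 10, 30), 100);
    println!("ok");
}
```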
Rust’s Result type forces explicit error handling at every step. This project uses a structured approach that ensures errors are handled consistently:
#[derive(Debug, Error)]
pub enum AppError {
    #[error("Not found: {0}")]
    NotFound(String),
    #[error("Unauthenticated: {0}")]
    Unauthenticated(String),
    #[error("Permission denied: {0}")]
    PermissionDenied(String),
    #[error("Conflict: {0}")]
    Conflict(String),
    #[error("Unavailable: {0}")]
    Unavailable(String),
    #[error("Internal: {0}")]
    Internal(String),
}

impl From<AppError> for Status {
    fn from(error: AppError) -> Self {
        match error {
            AppError::NotFound(msg) => Status::not_found(msg),
            AppError::Unauthenticated(msg) => Status::unauthenticated(msg),
            AppError::PermissionDenied(msg) => Status::permission_denied(msg),
            AppError::Conflict(msg) => Status::already_exists(msg),
            AppError::Unavailable(msg) => Status::unavailable(msg),
            AppError::Internal(msg) => {
                error!("Internal error: {}", msg); // Log details
                Status::internal("Internal server error") // Generic to client
            }
        }
    }
}
Critical: Internal errors are logged but never exposed to clients — preventing information leakage.
To simplify error handling in service methods, the project provides extension traits:
pub trait StatusExt<T> {
fn status(self, msg: &'static str) -> Result<T, Status>;
}
impl<T, E: Display> StatusExt<T> for Result<T, E> {
fn status(self, msg: &'static str) -> Result<T, Status> {
self.map_err(|e| {
error!("{}: {}", msg, e);
Status::internal(msg)
})
}
}
Usage in handlers:
let user = self.db.get_user(id)
.await
.status("Failed to get user")?; // Logs error, returns generic Status
The ? operator propagates errors up the call stack, with automatic conversion to Status at the gRPC boundary.
The service tracks sessions with rich device context, enabling users to manage their active sessions across multiple devices:
pub struct SessionInfo {
pub device_id: String, // Unique device identifier
pub device_name: String, // "iPhone 15 Pro"
pub device_type: String, // mobile/tablet/desktop/web
pub client_version: String, // App version
pub ip_address: IpNetwork, // Last seen IP
pub ip_country: String, // ISO country code (GeoIP)
pub created_at: DateTime<Utc>,
pub last_seen_at: DateTime<Utc>,
pub expires_at: DateTime<Utc>,
pub activity_count: i32, // Number of token refreshes
pub is_current: bool, // Is this the calling device?
}
1. List All Sessions
rpc ListSessions(ListSessionsRequest) returns (ListSessionsResponse);
Users can view every active device, including its location and last activity.
2. Revoke Specific Sessions
rpc RevokeSession(RevokeSessionRequest) returns (Empty);
Allows actions like “Log out from iPhone” — revoking only that device’s session.
3. Revoke All Other Sessions
rpc RevokeOtherSessions(RevokeOtherSessionsRequest) returns (Empty);
Enables “Log out everywhere else” — keeping only the current device logged in.
CREATE TABLE auth.sessions (
id UUID PRIMARY KEY DEFAULT uuidv7(),
id_user UUID NOT NULL REFERENCES auth.users(id),
refresh_token BYTEA NOT NULL, -- SHA-256 hash
device_id VARCHAR(255),
device_name VARCHAR(255),
device_type VARCHAR(50),
client_version VARCHAR(100),
ip_address INET,
ip_country VARCHAR(2), -- ISO country code
created_at TIMESTAMPTZ DEFAULT now(),
last_seen_at TIMESTAMPTZ DEFAULT now(),
expires_at TIMESTAMPTZ NOT NULL,
activity_count INT DEFAULT 0,
metadata JSONB DEFAULT '{}'
);
-- Unique constraint: one session per user+device
CREATE UNIQUE INDEX session_user_device_ux
ON auth.sessions (id_user, device_id)
WHERE device_id IS NOT NULL;
Key insight: The unique constraint on (id_user, device_id) ensures that refreshing tokens updates the existing session rather than creating duplicates.
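A session-creation upsert against that partial index might look like this (a sketch; column names follow the schema above, parameter values are placeholders):

```sql
-- Creating a session for a (user, device) pair that already has one
-- replaces the existing row instead of accumulating duplicates.
INSERT INTO auth.sessions (id_user, device_id, refresh_token, expires_at)
VALUES ($1, $2, $3, now() + interval '30 days')
ON CONFLICT (id_user, device_id) WHERE device_id IS NOT NULL
DO UPDATE SET refresh_token = EXCLUDED.refresh_token,
              expires_at    = EXCLUDED.expires_at,
              last_seen_at  = now();
```

The `WHERE device_id IS NOT NULL` clause lets PostgreSQL infer the partial unique index as the conflict target.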
The authentication flow implements OWASP best practices, including account lockout, generic error messages, and secure session management:
sequenceDiagram
participant C as Client
participant A as AuthService
participant DB as Database
C->>A: Authenticate(email, password, device_info)
A->>DB: get_user_by_email()
alt Invalid Credentials or Locked
DB-->>A: User not found / locked / wrong password
A->>DB: increment_failed_attempts()
A-->>C: ❌ Generic error: "Invalid credentials"
Note over C: OWASP: No user enumeration
end
DB-->>A: ✓ User found
A->>A: Argon2id verify password
A->>DB: reset_failed_attempts()
A->>A: Generate JWT tokens (access + refresh)
A->>DB: create_session(tokens, device, geo)
A-->>C: ✅ Success(access_token, refresh_token, user)
Account Lockout — Progressive penalties prevent brute-force attacks:
- Locked responses include a `retry_after` value in seconds

Generic Error Messages — Prevent user enumeration:
Device-Based Sessions — Multi-device support:
- One active session per `device_id`

Password recovery implements OWASP recommendations for secure reset workflows:
sequenceDiagram
participant C as Client
participant A as AuthService
participant DB as Database
participant E as Email
Note over C,A: Step 1: Request Reset
C->>A: RecoveryStart(email)
A-->>C: ✓ OK (always succeeds)
Note over A: OWASP: No user enumeration
par Background
A->>DB: Find user + create token
A->>E: Send reset email
end
Note over C,E: Step 2: Click Email Link
E->>C: User clicks link
Note over C,A: Step 3: Submit New Password
C->>A: RecoveryConfirm(token, new_password)
A->>DB: consume_token(token_hash)
alt Token Invalid/Expired
DB-->>A: Token not found or used
A-->>C: ❌ Invalid or expired token
end
DB-->>A: ✓ Token valid, user_id
A->>A: Hash new password (Argon2id)
A->>DB: update_password(user_id, hash)
A->>DB: revoke_all_sessions(user_id)
Note over DB: Force re-login everywhere
A->>E: Send confirmation email
A-->>C: ✅ Password changed
C->>A: Sign in with new password
1. No User Enumeration (OWASP Critical)
- RecoveryStart always returns success, even for non-existent emails

2. Atomic Token Management
- Token lookup and consumption happen in one statement (`consume_token`)
3. Single-Use Tokens
- A `used_at` timestamp marks the token as consumed
- Consumption filters `WHERE used_at IS NULL`, so a token works exactly once

4. Forced Session Revocation
- All existing sessions are revoked after the password change, forcing re-login everywhere
5. Notification Email
- The user is notified by email that their password was changed
Email verification supports dual paths to accommodate different client types:
sequenceDiagram
participant C as Client
participant A as AuthService
participant DB as Database
participant E as Email
Note over C,A: Step 1: Sign Up
C->>A: SignUp(email, password, name)
A->>A: Hash password
A->>DB: create_user(status=pending)
A->>DB: create_verification_token(24h)
par Background
A->>E: Send welcome email
end
A-->>C: ✓ Success (PENDING)
Note over C,E: Step 2: Click Email Link
E->>C: User clicks link
C->>A: ConfirmVerification(token)
A->>DB: auth.verify_email(token_hash)
alt Token Invalid
DB-->>A: Error
A-->>C: ❌ Invalid or expired
end
DB-->>A: ✓ User verified
A->>A: Generate tokens
A->>DB: create_session()
A-->>C: ✅ Tokens (auto-login)
| Feature | gRPC (Primary) | REST (Fallback) |
|---|---|---|
| Client | Mobile, Desktop, Web SPA | Email clients, old browsers |
| Auto-Login | ✅ Yes (returns tokens) | ❌ No (redirect only) |
| User Experience | Seamless (one-click done) | Manual login required |
| Deep Linking | ✅ App handles URL | ❌ Browser navigation |
| Implementation | Same DB function | Same DB function |
| Security | JWT tokens via gRPC | Redirect to frontend |
1. Atomic Database Function auth.verify_email()
CREATE OR REPLACE FUNCTION auth.verify_email(p_token_hash BYTEA)
RETURNS auth.users AS $$
DECLARE
v_user_id UUID;
v_status auth.user_status;
result auth.users;
BEGIN
-- Step 1: Consume token atomically
UPDATE auth.email_verification_tokens
SET used_at = now()
WHERE token_hash = p_token_hash
AND used_at IS NULL
AND expires_at > now()
RETURNING id_user INTO v_user_id;
IF v_user_id IS NULL THEN
RAISE EXCEPTION 'TOKEN_INVALID';
END IF;
-- Step 2: Check account status
SELECT status INTO v_status FROM auth.users WHERE id = v_user_id;
IF v_status IN ('suspended', 'deleted') THEN
RAISE EXCEPTION 'ACCOUNT_SUSPENDED';
END IF;
-- Step 3: Verify email and activate
UPDATE auth.users
SET email_verified = TRUE,
status = CASE WHEN status = 'pending' THEN 'active' ELSE status END
WHERE id = v_user_id
RETURNING * INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql;
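On the Rust side, the `RAISE EXCEPTION` codes surface as database errors that the handler translates into client-facing statuses. A hedged sketch of that mapping (the function name and exact status choices here are assumptions; the real handler would build a `tonic::Status`):

```rust
/// Hypothetical mapping from the SQL function's exception codes to gRPC
/// status names; illustrative only.
pub fn map_verify_email_error(db_message: &str) -> &'static str {
    if db_message.contains("TOKEN_INVALID") {
        "INVALID_ARGUMENT" // token missing, expired, or already used
    } else if db_message.contains("ACCOUNT_SUSPENDED") {
        "PERMISSION_DENIED" // account suspended or deleted
    } else {
        "INTERNAL" // anything unexpected stays generic
    }
}
```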
2. Graceful Client Fallback
3. Token Security
- Single-use enforcement via the `used_at` column

Production systems need visibility. This service integrates four complementary layers of observability:
Every request receives a trace ID that follows it through all operations. The tracing crate provides structured, machine-parseable logs:
use tracing::{info, instrument, Span};
#[instrument(skip(self), fields(user_id))]
pub async fn authenticate(&self, req: AuthenticateRequest) -> Result<AuthResponse> {
let user = self.find_user(&req.identifier).await?;
// Record user_id in span for all subsequent logs
Span::current().record("user_id", user.id.to_string());
info!("User authenticated successfully");
Ok(self.create_session(user, req).await?)
}
Logs can optionally be emitted as structured JSON for easy ingestion by monitoring systems:
{
"timestamp": "2026-01-23T15:42:17.123Z",
"level": "INFO",
"message": "User authenticated successfully",
"request_id": "01JQBX...",
"user_id": "550e8400-e29b-41d4-a716-446655440000",
"span": "authenticate"
}
use metrics::{counter, histogram};
use std::time::Instant;
pub async fn authenticate(&self, req: AuthenticateRequest) -> Result<AuthResponse> {
let start = Instant::now();
let result = self.authenticate_impl(req).await;
histogram!("auth_duration_ms").record(start.elapsed().as_millis() as f64);
if result.is_ok() {
counter!("auth_success_total").increment(1);
} else {
counter!("auth_failure_total").increment(1);
}
result
}
Metrics exposed at /metrics:
- `auth_duration_ms` — Authentication latency histogram
- `auth_success_total` — Successful authentications
- `auth_failure_total` — Failed authentications
- `db_query_duration_ms` — Database query latencies
- `grpc_request_total` — Request counts by method
- `http_requests_total` — REST endpoint hits
- `active_sessions` — Current active session count

Example Prometheus queries:
# P99 authentication latency
histogram_quantile(0.99, rate(auth_duration_ms_bucket[5m]))
# Authentication failure ratio (failed / all attempts)
rate(auth_failure_total[5m]) / (rate(auth_failure_total[5m]) + rate(auth_success_total[5m]))
# Database query errors
rate(db_query_errors_total[5m])
pub fn setup_telemetry(config: &TelemetryConfig) -> TelemetryGuard {
// Initialize OpenTelemetry exporter
let tracer = opentelemetry_otlp::new_pipeline()
.tracing()
.with_exporter(
opentelemetry_otlp::new_exporter()
.tonic()
.with_endpoint(&config.otlp_endpoint)
)
.install_batch(opentelemetry_sdk::runtime::Tokio)
.expect("failed to install OTLP tracer");
// Combine with console logging
tracing_subscriber::registry()
.with(tracing_opentelemetry::layer().with_tracer(tracer))
.with(fmt::layer().json())
.init();
TelemetryGuard { tracer }
}
Sends traces to Jaeger, Tempo, or any OTLP collector for distributed request tracing across services.
#[cfg(feature = "sentry")]
pub fn init_sentry(dsn: &str) -> ClientInitGuard {
sentry::init((dsn, sentry::ClientOptions {
release: Some(env!("CARGO_PKG_VERSION").into()),
environment: Some(std::env::var("ENVIRONMENT").unwrap_or_else(|_| "development".into()).into()),
..Default::default()
}))
}
Errors are captured automatically, tagged with the release version and environment.
The Dockerfile uses a multi-stage build with static linking to produce a minimal, secure image:
# Build stage
FROM rust:1.93-alpine AS build
RUN apk add --no-cache musl-dev protobuf-dev ca-certificates
COPY . /app
WORKDIR /app
RUN cargo build --release --locked && \
cp target/release/auth-service /service
# Runtime stage
FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /service /service
ENV RUST_LOG=info
EXPOSE 8080
ENTRYPOINT ["/service"]
Result: 15-25 MB image with a minimal OS-level attack surface (no shell, no package manager, no distro userland).
The Challenge: Cross-compilation with QEMU is painfully slow, often taking 30+ minutes for Rust projects. Most CI/CD pipelines suffer from this bottleneck.
The Solution: Native builds on platform-specific runners, where each architecture builds on its own hardware.
name: Build & Publish multi-arch image
jobs:
# ARM64: Native build on ARM runner (GitHub hosted)
build-arm64:
runs-on: ubuntu-24.04-arm # ARM64 runner
steps:
- uses: docker/build-push-action@v6
with:
platforms: linux/arm64
push: true
push-by-digest: true # Push by digest for manifest merge
cache-from: type=gha,scope=arm64
cache-to: type=gha,scope=arm64,mode=max
# AMD64: Native build on x86 runner
build-amd64:
runs-on: ubuntu-latest # AMD64 runner
steps:
- uses: docker/build-push-action@v6
with:
platforms: linux/amd64
push: true
push-by-digest: true
cache-from: type=gha,scope=amd64
cache-to: type=gha,scope=amd64,mode=max
# Merge: Create multi-arch manifest
manifest:
needs: [build-arm64, build-amd64]
steps:
- name: Create and push manifest
run: |
docker buildx imagetools create \
-t ghcr.io/org/auth-service:latest \
ghcr.io/org/auth-service@sha256:$ARM64_DIGEST \
ghcr.io/org/auth-service@sha256:$AMD64_DIGEST
Performance comparison:
| Approach | Build Time | Cost (GitHub Actions) |
|---|---|---|
| QEMU cross-compilation | 35-45 min | High (single long-running job) |
| Native parallel builds | 4-6 min | Low (parallel, shorter duration) |
| Speedup | ~8x faster | 60% cost reduction |
Key innovations:
- GitHub Actions cache (`type=gha`) persists Cargo dependencies across builds

The Cargo.toml profile maximizes release binary optimization:
[profile.release]
opt-level = 3 # Maximum optimization
lto = true # Link-time optimization (whole-program)
codegen-units = 1 # Single codegen unit for better optimization
strip = true # Strip symbols (reduces binary size 30%)
Impact on binary size:
| Configuration | Binary Size | Image Size |
|---|---|---|
| Debug build | 180 MB | N/A |
| Release (no LTO) | 45 MB | 50 MB |
| Release (with LTO) | 28 MB | 32 MB |
| Release (LTO + strip) | 12 MB | 15 MB |
LTO (Link-Time Optimization) enables cross-crate inlining and whole-program dead-code elimination.
Trade-off: longer compile time (5-7 minutes) for ~60% smaller binaries. In production, smaller images mean faster pulls, faster cold starts, and a smaller attack surface.
The Dockerfile uses static musl linking for a truly standalone binary:
RUN cargo build --release --locked && \
cp target/release/auth-service /service
# Runtime: scratch (0 bytes base)
FROM scratch
COPY --from=build /service /service
ENTRYPOINT ["/service"]
Result: a ~15 MB self-contained image with zero runtime dependencies. Compare that to typical 200-400 MB Node.js or 30-50 MB Go images.
One-command deployment:
gcloud run deploy auth-service \
--image ghcr.io/zs-dima/auth-service-rs:latest \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars PORT=8080 \
--set-secrets=JWT_SECRET_KEY=jwt-secret:latest,DB_PASSWORD=db-password:latest
Cloud Run automatically:
- Provides the `PORT` environment variable

apiVersion: apps/v1
kind: Deployment
metadata:
name: auth-service
spec:
replicas: 3
template:
spec:
containers:
- name: auth-service
image: ghcr.io/zs-dima/auth-service-rs:latest
resources:
requests:
memory: "32Mi" # Rust's small footprint
cpu: "50m"
limits:
memory: "128Mi"
cpu: "500m"
env:
- name: PORT
value: "8080"
- name: JWT_SECRET_KEY
valueFrom:
secretKeyRef:
name: auth-secrets
key: jwt-secret
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
Note the resource requirements: 32 MB request, 128 MB limit. Compare this to Java’s typical 512 MB–2 GB requirements.
The Flutter client (auth-app) demonstrates one codebase for all platforms: Android, iOS, Windows, macOS, Linux, and Web.
ClientChannelBase getChannel(String host, int port) {
if (kIsWeb) {
// Web: Use XHR-based gRPC-Web
return GrpcWebClientChannel.xhr(
Uri.parse('https://$host:$port'),
);
} else {
// Native: Use HTTP/2 channel
return ClientChannel(
host,
port: port,
options: ChannelOptions(
credentials: ChannelCredentials.secure(),
),
);
}
}
Key insight: the server's protocol multiplexing enables this. Browsers use gRPC-Web (which supports server-side streaming), native apps use native gRPC, and both talk to the same backend.
class AuthInterceptor extends ClientInterceptor {
final TokenStorage _tokens;
AuthInterceptor(this._tokens);
@override
ResponseFuture<R> interceptUnary<Q, R>(
ClientMethod<Q, R> method,
Q request,
CallOptions options,
ClientUnaryInvoker<Q, R> invoker,
) {
final token = _tokens.accessToken;
if (token != null) {
options = options.mergedWith(
CallOptions(metadata: {'authorization': 'Bearer $token'}),
);
}
return invoker(method, request, options);
}
}
Every gRPC call automatically includes the access token, eliminating manual header management.
Having implemented authentication services in Go, Node.js, and Rust, here are the concrete improvements that Rust delivers:
Go/Node.js: Database schema changes discovered in production
// Go: Compiles fine, fails at runtime if column renamed
user, err := db.Query("SELECT id, email FROM users WHERE email = ?", email)
// ERROR: column "email" does not exist (discovered in production!)
Rust: Database schema changes discovered during cargo build
// Rust: Fails compilation if column doesn't exist
let user = sqlx::query_as!(User, "SELECT id, email FROM users WHERE email = $1", email)
// error: no such column: `email` in table `users` (caught at compile time!)
Impact: Eliminated an entire class of database-related production incidents. Schema migrations now require corresponding code changes before deployment.
Go: Garbage collector pauses cause latency spikes
Node.js: Worse GC characteristics
Rust: No garbage collector
Real-world metric: Authentication endpoint at 10,000 req/s:
Go/Node.js: String manipulation errors cause security issues
// Go: Easy to accidentally log secrets
log.Printf("Config: %+v", config) // Oops, logged JWT secret!
// Node.js: JSON serialization exposes secrets
res.json(config) // Sent JWT secret to client!
Rust: Compiler enforces secret protection
use secrecy::SecretString;

#[derive(Debug)]
pub struct Config {
jwt_secret: SecretString, // the Debug impl redacts the value
}

println!("{:?}", config); // jwt_secret prints redacted, never the secret
Using SecretString makes it harder to accidentally expose secrets through logging or serialization.
Go: Interface indirection has runtime cost
type Repository interface {
GetUser(id string) (*User, error)
}
// Every call goes through vtable — measurable overhead at scale
user, err := repo.GetUser(id)
Rust: Trait dispatch optimized away at compile time
trait Repository {
async fn get_user(&self, id: Uuid) -> Result<User, AppError>;
}
// Monomorphization means direct call — zero overhead
let user = repo.get_user(id).await?;
This enables layered architectures (repository → service → handler) without performance penalties. Rust’s design philosophy: abstractions should not cost.
Node.js: node_modules chaos
npm audit reports 50+ vulnerabilities regularlyGo: Better, but still issues
Rust: Cargo ecosystem
- `Cargo.lock` ensures reproducible builds
- `cargo audit` checks for CVEs automatically
- `cargo-deny` enforces license and security policies

Go/Node.js: Refactoring requires extensive testing
Rust: The compiler catches breaking changes
- Every fallible path must be handled with `?` or `.unwrap()`

Real experience: Refactoring session management from ID-based to hash-based tokens meant the service simply would not compile until every affected call site was updated.
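Why such a refactor is compile-checked can be seen with a toy newtype (names here are illustrative, not the service's actual types): once the session lookup takes a `TokenHash` instead of a raw ID, every stale call site becomes a compile error, so the compiler enumerates exactly the code the refactor must touch.

```rust
/// Toy newtype standing in for a hashed session token.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenHash(pub Vec<u8>);

/// Stand-in for a database lookup keyed by token hash; callers that still
/// pass a raw string ID no longer type-check.
pub fn find_session(hash: &TokenHash) -> Option<&'static str> {
    if hash.0.is_empty() { None } else { Some("session-row") }
}
```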
Node.js: Different behavior across platforms
Rust: True “compile once, run anywhere”
- Static binaries run on `scratch` with no OS dependencies

| Metric | Node.js (Express) | Go (gRPC) | Rust (Tonic) |
|---|---|---|---|
| Idle Memory | 70-150 MB | 20-40 MB | 5-15 MB |
| P99 Latency | 30-80 ms | 15-30 ms | 2-10 ms |
| Throughput (1 core) | 5k req/s | 15k req/s | 25k req/s |
| Docker Image | 200-400 MB | 30-50 MB | 15-25 MB |
| Cold Start | 200-500 ms | 50-150 ms | 10-30 ms |
| Memory Safety | Runtime (V8) | Runtime | Compile-time |
| SQL Verification | Runtime | Runtime | Compile-time |
For a service handling 10 million authentications per day:
Node.js deployment (AWS ECS):
Go deployment (AWS ECS):
Rust deployment (AWS ECS):
Rust is 9x cheaper than Node.js, 3x cheaper than Go at the same traffic level.
This authentication service showcases several architectural innovations rarely seen together:
- Secrets wrapped in the `SecretString` type

Install the SQLx CLI (used for migrations and offline query verification):

cargo install sqlx-cli --features postgres

1. Clone and configure
git clone https://github.com/zs-dima/auth-service-rs
cd auth-service-rs
cp configs/.env.example configs/development.env
2. Edit configs/development.env
# Required
DB_URL=postgres://postgres:@localhost:5432/auth_dev
JWT_SECRET_KEY=your-32-character-secret-key-here
# Optional: Email (service works without it)
SMTP_URL=smtp://user@localhost:1025?tls=none # MailHog for local testing
EMAIL_SENDER="Dev Auth <dev@localhost>"
3. Setup database
# Create database
createdb auth_dev
# Run migrations
make db-migrate
# Verify SQLx offline data (optional)
make db-prepare
4. Run the service
make run
# Or with auto-reload on file changes:
make watch
5. Test with grpcurl
# Install grpcurl
go install github.com/fullstorydev/grpcurl/cmd/grpcurl@latest
# Sign up
grpcurl -plaintext -d '{
"email": "test@example.com",
"password": "SecurePass123",
"display_name": "Test User",
"client_info": {
"device_type": "web",
"device_id": "browser-123",
"ip_address": "127.0.0.1"
}
}' localhost:8080 auth.AuthService/SignUp
# Authenticate
grpcurl -plaintext -d '{
"identifier": "test@example.com",
"password": "SecurePass123",
"client_info": {
"device_type": "web",
"device_id": "browser-123"
}
}' localhost:8080 auth.AuthService/Authenticate
# Format code
make fmt
# Run linter
make lint
# Run tests
make test
# Run all pre-commit checks
make pre-commit
# Generate proto code (after modifying .proto files)
make proto
# Update database offline data (after schema changes)
make db-prepare
Build image:
make docker
# Result: 15-25 MB image based on scratch
Run container:
docker run -p 8080:8080 \
-e DB_URL=postgres://... \
-e JWT_SECRET_KEY=... \
auth-service:latest
Docker Compose:
services:
auth-service:
image: ghcr.io/zs-dima/auth-service-rs:latest
ports:
- "8080:8080"
environment:
DB_URL: postgres://auth:password@db:5432/auth
JWT_SECRET_KEY_FILE: /run/secrets/jwt_secret
secrets:
- jwt_secret
db:
image: postgres:16-alpine
environment:
POSTGRES_DB: auth
POSTGRES_USER: auth
POSTGRES_PASSWORD: password
volumes:
- db-data:/var/lib/postgresql/data
secrets:
jwt_secret:
file: ./secrets/jwt_secret.txt
volumes:
db-data:
Prerequisites:
gcloud CLI installedDeploy:
# Build and push image
gcloud builds submit --tag gcr.io/PROJECT_ID/auth-service
# Deploy to Cloud Run
gcloud run deploy auth-service \
--image gcr.io/PROJECT_ID/auth-service \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars DOMAIN=api.example.com \
--set-secrets JWT_SECRET_KEY=jwt-secret:latest,DB_PASSWORD=db-password:latest \
--memory 512Mi \
--cpu 1 \
--max-instances 10
Cloud Run automatically:
- Provides the `PORT` environment variable

Secrets (use `_FILE` variants in production):
JWT_SECRET_KEY=32-char-minimum-secret
DB_PASSWORD=strong-database-password
S3_SECRET_ACCESS_KEY=s3-secret-key
MAILJET_API_SECRET=mailjet-private-key
Token lifetimes:
ACCESS_TOKEN_TTL_MINUTES=60 # Short-lived (1 hour)
REFRESH_TOKEN_TTL_DAYS=90 # Long-lived (3 months)
PASSWORD_RESET_TTL_MINUTES=60 # Reset link expires in 1 hour
EMAIL_VERIFICATION_TTL_HOURS=24 # Verification link valid 24 hours
Security settings:
MAX_FAILED_LOGIN_ATTEMPTS=5 # Lock account after 5 failed attempts
LOCKOUT_DURATION_MINUTES=15 # Lock duration
Performance tuning:
DB_POOL_MIN=2 # Minimum database connections
DB_POOL_MAX=10 # Maximum database connections
CONCURRENCY_LIMIT=100 # Max concurrent gRPC requests
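A tiny helper like the following is enough to read such numeric knobs with sane defaults (hypothetical; the service's actual config loader may differ):

```rust
/// Parse a numeric tuning knob, falling back to a default when the
/// variable is unset or not a number.
/// Usage sketch: `parse_knob(std::env::var("DB_POOL_MAX").ok(), 10)`.
pub fn parse_knob(raw: Option<String>, default: u32) -> u32 {
    raw.and_then(|v| v.parse().ok()).unwrap_or(default)
}
```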
Prometheus metrics at /metrics:
# Authentication requests per second
rate(auth_requests_total[5m])
# P99 latency
histogram_quantile(0.99, rate(auth_duration_seconds_bucket[5m]))
# Error rate
rate(auth_errors_total[5m]) / rate(auth_requests_total[5m])
# Database connection pool usage
db_connections_active / db_connections_max
OpenTelemetry tracing:
- Set `OTLP_ENDPOINT` to send traces to Jaeger, Tempo, or any OTLP collector

Sentry error tracking:
- Set `SENTRY_DSN` to capture unhandled errors

“failed to verify access token: no ‘kid’ in token header”
“database connection failed”
- Check the `DB_URL` format: `postgres://user:password@host:port/database`

“failed to compile protos”
- Install protoc: `brew install protobuf` (macOS) or `apt install protobuf-compiler` (Linux)

“SQLx query verification failed”
- Run `make db-migrate`
- Run `make db-prepare`

Email not sending
- Check that `EMAIL_PROVIDER` is set correctly (`smtp` or `mailjet`)

The service implements compile-time validation at multiple layers:
SQLx Query Verification — SQLx validates every query against the database during build:
// This fails at compile-time if 'email' column doesn't exist or types mismatch
let user = sqlx::query_as!(User,
"SELECT id, email, password FROM auth.users WHERE email = $1",
email
).fetch_optional(&pool).await?;
What SQLx catches at compile time:
- Misspelled column names (`emial` vs `email`)
- Type mismatches (code expects `String`, column is `i32`)

# Generate offline query data for CI (no DB needed at build time)
cargo sqlx prepare
# Verify queries match schema
cargo sqlx prepare --check
Additional compile-time checks:
- Pedantic Clippy lints (`#![warn(clippy::pedantic)]`)
- No `unsafe` code enforced (`#![forbid(unsafe_code)]`)

Traditional Docker multi-platform builds use QEMU emulation (10x slower). This project uses native ARM64 runners:
graph LR
subgraph "Build Phase (Parallel)"
A["ARM64 Runner<br/>(ubuntu-24.04-arm)"]
B["AMD64 Runner<br/>(ubuntu-latest)"]
end
subgraph "Manifest Creation"
E["Multi-Arch Manifest"]
end
subgraph "Registry"
F["ghcr.io/org/auth-service:latest"]
end
A -->|"4 min"| E
B -->|"3 min"| E
E --> F
style A fill:#4CAF50
style B fill:#2196F3
style F fill:#FF5722
| Build Method | ARM64 | AMD64 | Total |
|---|---|---|---|
| Native runners | 4 min | 3 min | 7 min |
| QEMU emulation | 40 min | 3 min | 43 min |
| Speedup | 10x | Same | 6x |
Pre-commit workflow:
make pre-commit # Runs: fmt, clippy, test, sqlx-check
The following table summarizes the key production-ready features of this service:
| Category | Highlights |
|---|---|
| Security | OWASP compliance, Argon2id hashing, JWT rotation, account lockout, generic errors, zero unsafe |
| Scalability | Stateless (JWT), connection pooling, streaming APIs, atomic DB operations |
| Observability | OpenTelemetry, Prometheus /metrics, Sentry, structured JSON logs |
| Reliability | Compile-time SQL, graceful shutdown, timeout middleware, retry logic |
| Deployment | 15-25 MB images, multi-arch (ARM64/AMD64), Cloud Run & K8s ready |
| Feature | Status |
|---|---|
| OAuth 2.0 social login | 🚧 In development |
| MFA / TOTP | 📋 Planned |
| Passkeys / WebAuthn | 📋 Planned |
| Rate limiting | 📋 Planned |
Want to contribute? PRs welcome for email providers (SendGrid, AWS SES), SMS providers (Twilio), Helm charts, and Terraform modules.
Building authentication services in Rust isn’t just about performance — it’s about confidence. The compiler catches mistakes that would become production incidents in other languages:
This service demonstrates that infrastructure-grade reliability is achievable with:
The Rust ecosystem has matured to the point where building production services is not just feasible — it’s enjoyable. Tonic, SQLx, Tower, and Tokio form a cohesive stack that rivals any language for backend development while providing guarantees that others cannot match.
The future of backend development is compiled, type-safe, and fast. The future is Rust.
Questions? Feedback? Find me:
Built with ❤️ by Dmitrii Zusmanovich
If this article helped you, consider ⭐️ starring the repository!