Platform

Dokumen connects document sources, AI inference, AWS storage, billing, and edge security into one operating layer for turning PDFs into structured data.

Connected flow

Documents arrive from teams and systems
OCR and extraction models process each file
Structured data returns to projects and workflows
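
The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Document` class, the stage functions, and every field name are hypothetical, and the OCR and extraction bodies are placeholders rather than Dokumen's models.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A PDF moving through the flow; all field names are hypothetical."""
    name: str
    source: str
    text: str = ""
    fields: dict = field(default_factory=dict)


def run_ocr(doc: Document) -> Document:
    # Placeholder for the OCR step; a real model would read the PDF bytes.
    doc.text = f"<text recognized from {doc.name}>"
    return doc


def extract_fields(doc: Document) -> Document:
    # Placeholder for the extraction step; a real model would parse doc.text.
    doc.fields = {"source": doc.source, "characters": len(doc.text)}
    return doc


def process(doc: Document) -> dict:
    """Run the stages in order and return the structured result."""
    for stage in (run_ocr, extract_fields):
        doc = stage(doc)
    return {"document": doc.name, "fields": doc.fields}


result = process(Document(name="invoice.pdf", source="upload"))
```

Keeping each stage as a plain function that takes and returns a document makes it easy to hand the same steps to a task queue later.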

Architecture

The public web app, the FastAPI service, workers, databases, storage, and GPU services stay separated so each part can scale independently around its own responsibility.

1. Experience

Public web app

TanStack React Start app served by Node.js for public pages, auth, projects, billing, demos, and document UI.

FastAPI

Python API boundary for upload, avatar, PDF utility, OCR, extraction, telemetry, and health routes.

2. Processing

Workers

Celery orchestration for document workflows, source watching, text extraction, and entity workflows.

Runpod GPU services

Accelerated AI/ML inference and training workloads for document-heavy OCR and model serving.

3. AWS state

PostgreSQL and Redis

RDS (PostgreSQL) stores application data, while ElastiCache (Redis) provides managed cache capacity.

S3 and Secrets Manager

S3 stores PDFs while Secrets Manager keeps deployment and integration secrets out of application code.

4. Edges

Cloudflare

DNS, SSL, DDoS protection, load balancing, rate limits, WAF, and Tunnel in front of the app.

Optional integrations

Google, Microsoft, OpenAI, and Anthropic are optional for auth, AI inference, and source connectivity.
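
As a rough way to keep the four layers and their components in one place, the layout above could be written down as data. The `Component` class and the helper functions are hypothetical names introduced for this sketch; the layer names, component names, and roles are taken from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    layer: str
    role: str


# Transcribed from the architecture section; the structure is illustrative.
COMPONENTS = [
    Component("Public web app", "Experience", "public pages, auth, projects, billing, demos, document UI"),
    Component("FastAPI", "Experience", "upload, avatar, PDF utility, OCR, extraction, telemetry, health routes"),
    Component("Workers", "Processing", "document workflows, source watching, text extraction, entity workflows"),
    Component("Runpod GPU services", "Processing", "AI/ML inference and training for OCR and model serving"),
    Component("PostgreSQL and Redis", "AWS state", "application data and managed cache"),
    Component("S3 and Secrets Manager", "AWS state", "PDF storage and secrets"),
    Component("Cloudflare", "Edges", "DNS, SSL, DDoS protection, load balancing, rate limits, WAF, Tunnel"),
    Component("Optional integrations", "Edges", "auth, AI inference, source connectivity"),
]


def layers() -> list:
    """Return layer names in the order they first appear."""
    seen = []
    for component in COMPONENTS:
        if component.layer not in seen:
            seen.append(component.layer)
    return seen


def components_in(layer: str) -> list:
    """Return the components that belong to one layer."""
    return [c for c in COMPONENTS if c.layer == layer]
```

For example, `components_in("Processing")` returns the Workers and Runpod GPU services entries.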

External providers

Dokumen relies on five required providers: Stripe, Cloudflare, GitHub, AWS, and Runpod. Google, Microsoft, OpenAI, and Anthropic are optional integrations.

Mandatory providers

AWS

VPC, EC2, ECR, RDS, ElastiCache, S3, SES, and Secrets Manager.

Cloudflare

DNS, SSL, DDoS, load balancing, rate limits, and WAF.

GitHub

Source control and CI/CD through GitHub Actions.

Runpod

GPU capacity for OCR and model-serving workloads.

Stripe

Payments, billing setup, and customer payment methods.

Optional providers

Google

Optional authentication, Google DeepMind inference, Drive, GCS, and Gmail.

Microsoft

Optional authentication, OneDrive, Azure Blob, and Outlook integrations.

OpenAI

Optional AI inference provider through Bring Your Own Key.

Anthropic

Optional AI inference provider through Bring Your Own Key.

Required services

Dokumen limits service providers and external APIs to reduce security risk. The platform uses only five required providers: Stripe, Cloudflare, GitHub, AWS, and Runpod.

Service                  Category
Stripe                   Payments
Cloudflare Networking    DNS, SSL, DDoS, load balancing, rate limits
Cloudflare WAF           Edge security firewall
AWS VPC                  Full-stack networking
GitHub                   Source control
GitHub Actions           CI/CD
AWS ECR                  Container image registries
AWS Secrets Manager      Secrets management
AWS EC2                  CPU compute
Runpod.io                GPU compute
AWS RDS                  Managed database
AWS ElastiCache          Managed cache
AWS S3                   PDF storage
AWS SES                  Emails
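
The services in the table above roll up to the five required providers. A minimal sketch of that mapping follows, with a check for which providers a deployment has not yet set up. The `missing_providers` helper and the idea of a "configured" list are assumptions made for illustration, not part of Dokumen.

```python
# Service -> provider, transcribed from the required services table.
SERVICE_PROVIDER = {
    "Stripe": "Stripe",
    "Cloudflare Networking": "Cloudflare",
    "Cloudflare WAF": "Cloudflare",
    "AWS VPC": "AWS",
    "GitHub": "GitHub",
    "GitHub Actions": "GitHub",
    "AWS ECR": "AWS",
    "AWS Secrets Manager": "AWS",
    "AWS EC2": "AWS",
    "Runpod.io": "Runpod",
    "AWS RDS": "AWS",
    "AWS ElastiCache": "AWS",
    "AWS S3": "AWS",
    "AWS SES": "AWS",
}

# The distinct providers behind those services, in alphabetical order.
REQUIRED_PROVIDERS = sorted(set(SERVICE_PROVIDER.values()))


def missing_providers(configured):
    """Return required providers absent from a (hypothetical) deployment config."""
    present = set(configured)
    return [p for p in REQUIRED_PROVIDERS if p not in present]
```

Calling `missing_providers(["AWS", "Stripe"])` would list Cloudflare, GitHub, and Runpod as still to be configured.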

Capability coverage

Capability        Providers
Authentication    Dokumen, Google, Microsoft
AI inference      Dokumen, OpenAI, Anthropic, Google DeepMind
PDF storage       Dokumen, external AWS S3, Google Drive, GCS, Gmail, Microsoft OneDrive, Azure Blob, Outlook
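
The coverage above can also be read in the other direction, from a provider to the capabilities it covers. The lookup below is a small sketch; the dictionary is transcribed from the table, while the `capabilities_of` helper is a hypothetical name.

```python
# Capability -> providers, transcribed from the capability coverage table.
CAPABILITY_PROVIDERS = {
    "Authentication": ["Dokumen", "Google", "Microsoft"],
    "AI inference": ["Dokumen", "OpenAI", "Anthropic", "Google DeepMind"],
    "PDF storage": [
        "Dokumen", "external AWS S3", "Google Drive", "GCS", "Gmail",
        "Microsoft OneDrive", "Azure Blob", "Outlook",
    ],
}


def capabilities_of(provider: str) -> list:
    """Return the capabilities that list this provider by its exact name."""
    return [
        capability
        for capability, providers in CAPABILITY_PROVIDERS.items()
        if provider in providers
    ]
```

Because Dokumen appears in every row, `capabilities_of("Dokumen")` returns all three capabilities, which matches the idea that the optional providers only extend what the platform already covers.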

Dokumen supports Bring Your Own Key (BYOK) for AI inference. The web app does not yet allow users to upload keys, but you can contact us to provide keys for your organization through a separate secure method.
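
One way to picture BYOK is as a per-organization choice between a supplied key and Dokumen's own inference. The sketch below is an assumption about how such a choice could be made, not a description of Dokumen's implementation: the `choose_inference` function, the `org_keys` mapping, and the rule of falling back to Dokumen are all hypothetical.

```python
from typing import Optional

# Providers the page describes as Bring Your Own Key options.
BYOK_PROVIDERS = ("OpenAI", "Anthropic")


def choose_inference(org_keys: dict, preferred: Optional[str] = None) -> tuple:
    """Pick an inference provider for one organization.

    org_keys maps provider name -> API key supplied by the organization.
    Returns (provider, key); key is None when Dokumen's own inference is used.
    """
    # With a preference, only that provider is considered; otherwise try each in order.
    candidates = [preferred] if preferred else list(BYOK_PROVIDERS)
    for provider in candidates:
        key = org_keys.get(provider)
        if provider in BYOK_PROVIDERS and key:
            return provider, key
    # Assumed fallback: no usable key means Dokumen's built-in inference.
    return "Dokumen", None
```

An organization with no keys on file gets `("Dokumen", None)`, while one that has supplied an Anthropic key gets that provider and key back.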

Runtime and dependencies

Dokumen keeps the production runtime explicit: Node.js serves the web app, Python runs the API and workers, and Docker Compose coordinates the host services.

Dependency               Category
Docker + Compose         Container orchestration
PostgreSQL               Database
Redis                    Cache
OpenTelemetry            Observability
TypeScript               Frontend language
Node.js                  Frontend runtime
Bun                      Frontend package manager
TanStack React Start     Web app framework
Cloudflare Tunnel        HTTPS reverse proxy
PDFium                   PDF engine for frontend and backend
CPython                  Backend language
Uvicorn                  Backend runtime (ASGI server)
FastAPI                  API framework
Celery                   Workers framework

Document sources

PDF storage can use Dokumen storage or optional external sources. External PDF storage integrations are listed in the infrastructure knowledge base as planned but not yet complete.

Source                  Identifier
External AWS S3         external_aws_s3
Google Drive            google_drive
Google Cloud Storage    gcs
Gmail                   gmail
Microsoft OneDrive      microsoft_onedrive
Azure Blob Storage      azure_blob
Outlook                 outlook
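
The source identifiers above lend themselves to an enumeration, so that an unknown value is rejected early. This is a minimal sketch: the identifier strings come from the list above, while the `DocumentSource` class and `parse_source` helper are hypothetical names, and nothing here implies the integrations are already available.

```python
from enum import Enum


class DocumentSource(str, Enum):
    """External PDF sources, keyed by the identifiers listed above."""
    EXTERNAL_AWS_S3 = "external_aws_s3"
    GOOGLE_DRIVE = "google_drive"
    GCS = "gcs"
    GMAIL = "gmail"
    MICROSOFT_ONEDRIVE = "microsoft_onedrive"
    AZURE_BLOB = "azure_blob"
    OUTLOOK = "outlook"


def parse_source(identifier: str) -> DocumentSource:
    """Convert an identifier string to a DocumentSource or raise ValueError."""
    try:
        return DocumentSource(identifier)
    except ValueError:
        raise ValueError(f"unknown document source: {identifier!r}") from None
```

For example, `parse_source("gcs")` returns `DocumentSource.GCS`, and a misspelled identifier raises a `ValueError` instead of passing silently into a workflow.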