Platform
Dokumen connects document sources, AI inference, AWS storage, billing, and edge security into one operating layer for turning PDFs into structured data.
Connected flow
Architecture
The public web app, FastAPI service, workers, databases, storage, and GPU services run as separate components, so each one can scale independently with its own workload.
Experience
Public web app
TanStack Start (React) app served by Node.js for public pages, auth, projects, billing, demos, and document UI.
FastAPI
Python API boundary for upload, avatar, PDF utility, OCR, extraction, telemetry, and health routes.
Processing
Workers
Celery orchestration for document workflows, source watching, text extraction, and entity workflows.
Runpod GPU services
Accelerated AI/ML inference and training workloads for document-heavy OCR and model serving.
AWS state
PostgreSQL and Redis
RDS stores application data in PostgreSQL, while ElastiCache provides managed Redis cache capacity.
S3 and Secrets Manager
S3 stores PDFs while Secrets Manager keeps deployment and integration secrets out of application code.
Edges
Cloudflare
DNS, SSL, DDoS protection, load balancing, rate limits, WAF, and Cloudflare Tunnel in front of the app.
Optional integrations
Google, Microsoft, OpenAI, and Anthropic are optional for auth, AI inference, and source connectivity.
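The split between the API boundary and the workers can be illustrated with a minimal sketch, assuming a job queue sits between them: the API only records an upload and enqueues work, and a worker processes it later. Here an in-process `queue.Queue` stands in for Celery and Redis, and a dict stands in for PostgreSQL. The names `Job`, `handle_upload`, and `run_worker`, and the step names, are hypothetical and not Dokumen's actual code.

```python
from dataclasses import dataclass, field
from queue import Empty, Queue


@dataclass
class Job:
    """Hypothetical document-processing job handed from the API to a worker."""
    document_id: str
    steps: list[str] = field(
        default_factory=lambda: ["ocr", "extract_text", "extract_entities"]
    )


# Stand-in for the message broker (Redis in the described stack).
job_queue: "Queue[Job]" = Queue()
# Stand-in for PostgreSQL: document_id -> processing status.
statuses: dict[str, str] = {}


def handle_upload(document_id: str) -> dict[str, str]:
    """API side: record the upload, enqueue the work, and return immediately."""
    statuses[document_id] = "queued"
    job_queue.put(Job(document_id))
    return {"document_id": document_id, "status": "queued"}


def run_worker() -> list[str]:
    """Worker side: drain the queue and run each job's steps in order."""
    log = []
    while True:
        try:
            job = job_queue.get_nowait()
        except Empty:
            break
        for step in job.steps:
            # GPU-heavy steps such as OCR would be sent to the GPU service here.
            log.append(f"{job.document_id}:{step}")
        statuses[job.document_id] = "done"
        job_queue.task_done()
    return log


print(handle_upload("doc-1"))  # {'document_id': 'doc-1', 'status': 'queued'}
print(run_worker())  # ['doc-1:ocr', 'doc-1:extract_text', 'doc-1:extract_entities']
print(statuses["doc-1"])  # done
```

Because the API never waits for OCR or extraction, the web-facing tier and the processing tier can be sized separately.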
External providers
Dokumen relies on 5 required providers: Stripe, Cloudflare, GitHub, AWS, and Runpod. Google, Microsoft, OpenAI, and Anthropic are optional integrations.
Mandatory providers
AWS
VPC, EC2, ECR, RDS, ElastiCache, S3, SES, and Secrets Manager.
Cloudflare
DNS, SSL, DDoS, load balancing, rate limits, and WAF.
GitHub
Source control and CI/CD through GitHub Actions.
Runpod
GPU capacity for OCR and model-serving workloads.
Stripe
Payments, billing setup, and customer payment methods.
Optional providers
Google
Optional authentication, Google DeepMind inference, Google Drive, GCS, and Gmail integrations.
Microsoft
Optional authentication, OneDrive, Azure Blob, and Outlook integrations.
OpenAI
Optional AI inference provider through Bring Your Own Key.
Anthropic
Optional AI inference provider through Bring Your Own Key.
Required services
Dokumen limits the number of service providers and external APIs it depends on to reduce security risk. The platform uses only 5 required providers: Stripe, Cloudflare, GitHub, AWS, and Runpod.
| Service | Category |
|---|---|
| Stripe | Payments |
| Cloudflare Networking | DNS, SSL, DDoS, load balancing, rate limits |
| Cloudflare WAF | Edge security firewall |
| AWS VPC | Full-stack networking |
| GitHub | Source control |
| GitHub Actions | CI/CD |
| AWS ECR | Container image registry |
| AWS Secrets Manager | Secrets management |
| AWS EC2 | CPU compute |
| Runpod.io | GPU compute |
| AWS RDS | Managed database |
| AWS ElastiCache | Managed cache |
| AWS S3 | PDF storage |
| AWS SES | Emails |
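The five-provider limit can be checked mechanically against the services table. This minimal sketch groups the listed services by provider and reports any provider outside the allowlist. It assumes, hypothetically, that the provider is the first word of each service name (with the `.io` suffix dropped); the helper names are illustrative only.

```python
from collections import defaultdict

# Rows from the required-services table: (service, category).
SERVICES = [
    ("Stripe", "Payments"),
    ("Cloudflare Networking", "DNS, SSL, DDoS, load balancing, rate limits"),
    ("Cloudflare WAF", "Edge security firewall"),
    ("AWS VPC", "Full-stack networking"),
    ("GitHub", "Source control"),
    ("GitHub Actions", "CI/CD"),
    ("AWS ECR", "Container image registry"),
    ("AWS Secrets Manager", "Secrets management"),
    ("AWS EC2", "CPU compute"),
    ("Runpod.io", "GPU compute"),
    ("AWS RDS", "Managed database"),
    ("AWS ElastiCache", "Managed cache"),
    ("AWS S3", "PDF storage"),
    ("AWS SES", "Emails"),
]

REQUIRED_PROVIDERS = {"Stripe", "Cloudflare", "GitHub", "AWS", "Runpod"}


def provider_of(service: str) -> str:
    """Hypothetical naming rule: the provider is the first word, minus '.io'."""
    return service.split()[0].removesuffix(".io")


def group_by_provider(services: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each provider to the services it supplies."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for service, _category in services:
        grouped[provider_of(service)].append(service)
    return dict(grouped)


def check_allowlist(services, allowed=REQUIRED_PROVIDERS) -> list[str]:
    """Return providers used by the inventory that are not on the allowlist."""
    return sorted(set(group_by_provider(services)) - allowed)


grouped = group_by_provider(SERVICES)
print(sorted(grouped))  # ['AWS', 'Cloudflare', 'GitHub', 'Runpod', 'Stripe']
print(len(grouped["AWS"]))  # 8
print(check_allowlist(SERVICES))  # []
```

Running a check like this in CI would flag a new dependency on an unlisted provider before it reaches production.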
Capability coverage
| Capability | Providers |
|---|---|
| Authentication | Dokumen, Google, Microsoft |
| AI inference | Dokumen, OpenAI, Anthropic, Google DeepMind |
| PDF storage | Dokumen, external AWS S3, Google Drive, GCS, Gmail, Microsoft OneDrive, Azure Blob, Outlook |
Dokumen supports Bring Your Own Key (BYOK) for AI inference. The web app does not yet allow users to upload keys, but you can contact us to provide keys for your organization through a separate secure method.
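One way to picture BYOK selection is as a per-organization lookup with Dokumen's own inference as the fallback. This is a minimal sketch under stated assumptions: keys are provisioned server-side through the separate secure channel, each organization names at most one preferred provider, and the `OrgConfig` type, provider identifiers, and fallback rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical identifiers for the BYOK inference providers in the capability table.
BYOK_PROVIDERS = ("openai", "anthropic", "google_deepmind")
DEFAULT_PROVIDER = "dokumen"


@dataclass
class OrgConfig:
    """Hypothetical per-organization settings; keys are provisioned out of band."""
    preferred_provider: Optional[str] = None
    api_keys: dict[str, str] = field(default_factory=dict)


def resolve_inference_provider(org: OrgConfig) -> str:
    """Use the preferred BYOK provider only if a key exists for it; else use Dokumen."""
    preferred = org.preferred_provider
    if preferred in BYOK_PROVIDERS and org.api_keys.get(preferred):
        return preferred
    return DEFAULT_PROVIDER


print(resolve_inference_provider(OrgConfig()))  # dokumen
print(resolve_inference_provider(OrgConfig("anthropic", {"anthropic": "placeholder-key"})))  # anthropic
print(resolve_inference_provider(OrgConfig("openai")))  # dokumen
```

Falling back to Dokumen when a key is missing keeps extraction working for organizations that have not yet supplied keys.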
Runtime and dependencies
Dokumen keeps the production runtime explicit: Node.js serves the web app, Python runs the API and workers, and Docker Compose coordinates the host services.
| Dependencies | Category |
|---|---|
| Docker + Compose | Container orchestration |
| PostgreSQL | Database |
| Redis | Cache |
| OpenTelemetry | Observability |
| TypeScript | Frontend language |
| Node.js | Frontend runtime |
| Bun | Frontend package manager |
| TanStack Start (React) | Web app framework |
| Cloudflare Tunnel | HTTPS reverse proxy |
| PDFium | PDF engine for frontend and backend |
| CPython | Backend language |
| Uvicorn | ASGI server (backend runtime) |
| FastAPI | API framework |
| Celery | Workers framework |
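Docker Compose starts services in the order implied by each service's `depends_on` list. This minimal sketch reproduces that ordering with Python's standard `graphlib`, assuming a hypothetical service layout (for example a local setup where PostgreSQL and Redis also run as containers). The service names and dependencies are illustrative, not Dokumen's actual compose file.

```python
from graphlib import TopologicalSorter

# Hypothetical compose services mapped to the services they depend on,
# mirroring what each service's `depends_on` would declare.
COMPOSE_DEPS: dict[str, set[str]] = {
    "postgres": set(),
    "redis": set(),
    "api": {"postgres", "redis"},
    "worker": {"postgres", "redis"},
    "web": {"api"},
    "tunnel": {"web"},
}


def startup_order(services: dict[str, set[str]]) -> list[str]:
    """Return one valid start order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(services).static_order())


print(startup_order(COMPOSE_DEPS))
```

By default `depends_on` only orders container startup; it does not wait for a dependency to be ready unless the dependency defines a healthcheck and the dependent uses `condition: service_healthy`.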
Document sources
PDF storage can use built-in Dokumen storage or optional external sources. According to the infrastructure knowledge base, the external PDF storage integrations are planned but not yet complete.
| Source | Identifier |
|---|---|
| External AWS S3 | `external_aws_s3` |
| Google Drive | `google_drive` |
| Google Cloud Storage | `gcs` |
| Gmail | `gmail` |
| Microsoft OneDrive | `microsoft_onedrive` |
| Azure Blob Storage | `azure_blob` |
| Outlook | `outlook` |
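The source identifiers lend themselves to an enum that validates incoming values. This minimal sketch uses the identifiers from the source list and, because the external integrations are planned but not yet complete, only treats built-in storage as available. The `dokumen` identifier, `AVAILABLE` set, and `parse_source` helper are hypothetical.

```python
from enum import Enum


class DocumentSource(str, Enum):
    """Document source identifiers from the source list."""
    DOKUMEN = "dokumen"  # hypothetical identifier for built-in Dokumen storage
    EXTERNAL_AWS_S3 = "external_aws_s3"
    GOOGLE_DRIVE = "google_drive"
    GCS = "gcs"
    GMAIL = "gmail"
    MICROSOFT_ONEDRIVE = "microsoft_onedrive"
    AZURE_BLOB = "azure_blob"
    OUTLOOK = "outlook"


# External integrations are not yet complete, so only built-in storage is enabled.
AVAILABLE = {DocumentSource.DOKUMEN}


def parse_source(value: str) -> DocumentSource:
    """Validate an identifier, rejecting unknown and not-yet-available sources."""
    try:
        source = DocumentSource(value)
    except ValueError:
        raise ValueError(f"unknown document source: {value!r}") from None
    if source not in AVAILABLE:
        raise ValueError(f"document source {value!r} is planned but not yet available")
    return source


print(parse_source("dokumen").value)  # dokumen
```

When an integration ships, enabling it would only mean adding its member to `AVAILABLE`.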