Technical architecture
Live in days. Flat through every add-on. Ready for what you deploy next.
Architecture overview
Where the Foundry sits in your data infrastructure.
MTN Data Foundry is an adaptive transition layer. It detects schemas automatically, maps them to canonical concepts with confidence-based governance, and self-heals when sources change.
Whether you use medallion architecture, data mesh, or your own layered approach, the transition from raw source data to governed, consistent output is traditionally the most fragile point. MTN Data Foundry operates at this boundary: it detects structure automatically, maps it to a canonical layer, and adapts when source schemas change.
Medallion terminology provides a familiar reference point. The Foundry itself is architecture-agnostic and works with any pipeline that requires resilient schema mapping.
What it does
- Ingests bronze-layer data from any source
- Maps to silver-layer semantic definitions
- Adapts when source schemas change
- Outputs governed, consistent data downstream
What it does not do
- Replace your silver or gold layers
- Compete with your warehouse or BI tools
- Require changes to source systems
- Store data long-term (it's a transition layer)
How the Foundry delivers.
Three promises, mapped to the architecture below.
Live in days: schema detection, payload fingerprinting, suggested mappings with confidence scoring.
Flat through every add-on: self-healing on schema changes, immutable mapping versions, forward-only adaptation.
Ready for what you deploy next: canonical concept layer, model-agnostic outputs, BAA-aware lineage.
Ingestion and schema detection
MTN Data Foundry ingests data from heterogeneous sources without requiring upfront schema definitions. Incoming payloads are fingerprinted to identify structure and detect changes.
- Supports JSON, HL7, X12, CSV, and structured database connections
- Schema signatures are computed from payload structure, not just field names
- New schema versions trigger mapping workflows automatically
- Malformed payloads are quarantined with detailed error context
Output endpoints respect downstream contracts: SQL, REST, event streams, and dbt-compatible models.
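The structural fingerprinting described in this section can be illustrated with a minimal sketch. It is not the Foundry's actual algorithm: the function names `structure_of` and `schema_fingerprint`, the choice of SHA-256, and the restriction to JSON-like payloads are all assumptions made for the example. The idea it shows is that a signature is derived from keys and value types rather than from the data itself, so two payloads with the same shape share a signature and a type change produces a new one.

```python
import hashlib
import json


def structure_of(value):
    """Reduce a JSON-like value to its shape: keys and type names, no data."""
    if isinstance(value, dict):
        return {key: structure_of(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # A list is described by the distinct shapes of its elements,
        # so its length does not affect the signature.
        shapes = {json.dumps(structure_of(item), sort_keys=True) for item in value}
        return ["list", sorted(shapes)]
    if value is None:
        return "null"
    return type(value).__name__


def schema_fingerprint(payload):
    """Hash the payload's shape so equal structures give equal signatures."""
    canonical = json.dumps(structure_of(payload), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical payloads: same shape, different values and key order.
first = {"patient_id": "123", "visits": [{"code": "A1"}]}
second = {"visits": [{"code": "B2"}, {"code": "C3"}], "patient_id": "999"}
# Same field names, but patient_id changed from string to integer.
drifted = {"patient_id": 123, "visits": [{"code": "A1"}]}

print(schema_fingerprint(first) == schema_fingerprint(second))   # True
print(schema_fingerprint(first) == schema_fingerprint(drifted))  # False
```

Because the signature includes value types, the third payload is treated as a new schema version even though its field names are unchanged, which is the distinction between structure and field names drawn above.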
Semantic mapping and governance
Data is mapped to a canonical concept layer that represents healthcare entities consistently across sources. Mapping decisions are governed by confidence thresholds and human approval.
A canonical concept is a stable, healthcare-aware definition: patient, encounter, claim, medication, schedule, provider. Each concept has a target schema, vocabulary bindings (SNOMED CT, ICD-10, LOINC, RxNorm, CPT, NPI), and identity-resolution rules. The Foundry annotates source schemas first; canonical definitions emerge from the patterns across annotated sources, or anchor to an external target like FHIR, USCDI, or OMOP when one applies. The canonical layer grows with the portfolio: new concepts can be added, and existing concepts can be refined without breaking downstream consumers.
- Mappings are suggested based on field values, patterns, and healthcare standards
- Confidence scores determine whether mappings auto-apply or require review
- Human-in-the-loop workflows route uncertain cases to domain experts
- Mapping versions are immutable; changes create new versions
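As a rough illustration of confidence-based routing, the sketch below sends a suggested mapping either to automatic application or to human review. The `SuggestedMapping` type, the `route` function, the field names, and the 0.95 default threshold are hypothetical; the source only states that thresholds are configurable and that uncertain cases go to domain experts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestedMapping:
    source_field: str
    canonical_field: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def route(mapping, auto_apply_threshold=0.95):
    """Return 'auto_apply' or 'human_review' for a suggested mapping."""
    if not 0.0 <= mapping.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {mapping.confidence}")
    if mapping.confidence >= auto_apply_threshold:
        return "auto_apply"
    return "human_review"


# Hypothetical suggestions for two source fields.
suggestions = [
    SuggestedMapping("pt_dob", "patient.birth_date", 0.99),
    SuggestedMapping("adm_dt", "encounter.start", 0.72),
]
for suggestion in suggestions:
    print(suggestion.source_field, "->", route(suggestion))
```

Raising or lowering `auto_apply_threshold` shifts the balance between automation and review without touching the mappings themselves.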
Governance model
- Configurable thresholds control automation vs. human review
- Every decision is logged for compliance and debugging
- Mappings are versioned; rollback is always available
- Backward compatibility guaranteed
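One way to picture immutable mapping versions with rollback is an append-only history, sketched below. The class and method names (`MappingVersion`, `MappingHistory`, `publish`, `rollback`) are hypothetical, and treating a rollback as a republish of older rules under a new version number is an assumption chosen to match the forward-only adaptation mentioned earlier, not a description of the product's internals.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class MappingVersion:
    number: int
    rules: MappingProxyType  # read-only view of source field -> canonical field
    note: str


class MappingHistory:
    """Append-only list of mapping versions; nothing is edited in place."""

    def __init__(self):
        self._versions = []

    def publish(self, rules, note):
        version = MappingVersion(
            number=len(self._versions) + 1,
            rules=MappingProxyType(dict(rules)),  # copy, then freeze
            note=note,
        )
        self._versions.append(version)
        return version

    def get(self, number):
        if not 1 <= number <= len(self._versions):
            raise KeyError(f"no mapping version {number}")
        return self._versions[number - 1]

    def current(self):
        return self.get(len(self._versions))

    def rollback(self, number):
        # Rolling back republishes old rules as a new version,
        # so the history itself is never rewritten.
        return self.publish(self.get(number).rules, f"rollback to v{number}")


history = MappingHistory()
history.publish({"pt_dob": "patient.birth_date"}, "initial")
history.publish(
    {"pt_dob": "patient.birth_date", "adm_dt": "encounter.start"},
    "add admission date",
)
restored = history.rollback(1)
print(restored.number, dict(restored.rules))  # 3 {'pt_dob': 'patient.birth_date'}
```

Versions 1 and 2 remain readable after the rollback, which is what keeps every earlier decision available for audit.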
Change detection and self-healing
When source schemas change, the Foundry detects the change, evaluates impact, and either adapts automatically or routes for review. Historical data remains stable.
- Schema changes detected through payload fingerprinting
- High-confidence changes apply automatically with audit logging
- Low-confidence changes pause and notify for human decision
- Historical data is never retroactively remapped unless explicitly requested
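The detect, evaluate, then adapt-or-route flow can be sketched with a simple schema diff. Everything here is illustrative: the `diff_schema` and `decide` functions are hypothetical, schemas are reduced to a flat field-to-type dictionary, and the policy that purely additive changes count as high confidence while removed or retyped fields pause for review is an assumption standing in for the Foundry's confidence scoring.

```python
def diff_schema(old, new):
    """Compare two flat schemas given as {field_name: type_name} dicts."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(name for name in shared if old[name] != new[name]),
    }


def decide(change):
    """Illustrative policy: only purely additive changes apply automatically."""
    if change["removed"] or change["retyped"]:
        return "pause_for_review"
    if change["added"]:
        return "auto_apply"
    return "no_change"


# Hypothetical schema versions for one source.
old = {"patient_id": "str", "dob": "str"}
additive = {"patient_id": "str", "dob": "str", "email": "str"}
breaking = {"patient_id": "int", "email": "str"}

print(decide(diff_schema(old, additive)))  # auto_apply
print(decide(diff_schema(old, breaking)))  # pause_for_review
```

In both cases the decision applies to new data only; nothing in the sketch touches records that were already mapped.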
Model-agnostic by design
Whatever you deploy next, it reads from the same canonical layer.
- Models you deploy: Claude, GPT, Gemini, open-weight models, fine-tuned models, your own. Same data, no per-vendor pipeline.
- Retrieval-friendly: canonical concepts are vector-embedding-ready and integrate with existing vector stores.
- Tool-friendly: outputs structured data that downstream agents can call as tools without re-mapping.
- BAA-aware: every model query lands inside the compliance perimeter; vendor BAAs flow through subprocessor relationships.
- Auditable: every model interaction is logged at the canonical layer, not at each vendor.
Monitoring and resilience
Continuous monitoring of data transmission, structure, and quality. Problems are surfaced before they impact downstream systems.
- Transmission health monitoring tracks source connectivity and data flow
- Alerting triggers on schema drift, volume anomalies, and quality degradation
- Downstream systems are protected from bad data through validation gates
- Full observability through structured logs and metrics export
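A volume-anomaly alert of the kind listed above could be approximated with a z-score check against recent history. This is a sketch under stated assumptions: the `volume_anomaly` function, the threshold of 3 standard deviations, and the sample counts are all hypothetical, and a production monitor would likely account for seasonality and trends.

```python
from statistics import mean, stdev


def volume_anomaly(history, today, z_threshold=3.0):
    """Flag today's record count if it is far from the recent average."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        return today != average
    return abs(today - average) / spread > z_threshold


# Hypothetical daily record counts for one source.
recent_counts = [1000, 1020, 980, 1010, 990]

print(volume_anomaly(recent_counts, 1005))  # False
print(volume_anomaly(recent_counts, 200))   # True
```

A check like this can sit in front of a validation gate so that a sudden drop in volume raises an alert before the partial data reaches downstream systems.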
Compliance and security.
Designed for healthcare's regulatory perimeter, not retrofitted onto a generic platform.
HIPAA and the BAA chain
BAA-ready by default. The BAA chain extends through subprocessors, infrastructure providers, and model vendors, so PHI flowing into and out of the Foundry stays under one continuous chain of liability. Documentation artifacts align with HHS OCR audit expectations. Minimum-necessary access is enforced at the canonical layer, not bolted on at output.
42 CFR Part 2 and redisclosure
Substance-use-disorder data is tagged at ingestion and tracked through every downstream consumer. Redisclosure restrictions are encoded in the lineage layer; a query that would violate Part 2 returns an error, not the data. Consent metadata travels with the record, not in a separate consent system.
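The behavior described here, where a disallowed query returns an error rather than the data, can be sketched as a release check that reads consent from the record itself. The `Record` type, the `release` function, the `RedisclosureError` exception, and the recipient names are hypothetical, and the consent model is heavily simplified compared with what 42 CFR Part 2 actually requires.

```python
from dataclasses import dataclass


class RedisclosureError(PermissionError):
    """Raised instead of returning data when release is not permitted."""


@dataclass(frozen=True)
class Record:
    record_id: str
    payload: dict
    part2: bool = False  # tagged at ingestion in this sketch
    consented_recipients: frozenset = frozenset()  # consent travels with the record


def release(record, recipient):
    """Return the payload only if the recipient is covered by consent."""
    if record.part2 and recipient not in record.consented_recipients:
        raise RedisclosureError(
            f"record {record.record_id} may not be disclosed to {recipient}"
        )
    return record.payload


# Hypothetical record and recipients.
sud_record = Record(
    record_id="r-1",
    payload={"program": "outpatient"},
    part2=True,
    consented_recipients=frozenset({"care_team"}),
)

print(release(sud_record, "care_team"))  # {'program': 'outpatient'}
try:
    release(sud_record, "analytics_vendor")
except RedisclosureError as error:
    print("blocked:", error)
```

Keeping the consent set on the record means the check needs no lookup in a separate consent system, which mirrors the design choice stated above.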
State and federal regimes
Aligned controls for California (CCPA, CPRA, CMIA, AB-3129), Washington (My Health My Data Act), Texas (Texas Data Privacy and Security Act), and the multi-state landscape (Colorado, Connecticut, Virginia, and others as they take effect). ONC information-blocking rules are respected, and 21st Century Cures Act data-exchange requirements are supported out of the box. Controls for new regimes are updated once at the canonical layer, not in every pipeline.
Controls and audit
SOC 2 Type II aligned controls. End-to-end encryption (TLS 1.3 in transit, AES-256 at rest). Role-based access with purpose-of-use metadata on every query. Comprehensive audit logging covers every mapping change, query, and model interaction, with retention tunable per regulation. Customer-managed keys available on cloud and on-premises deployments.
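To make "purpose-of-use metadata on every query" concrete, the sketch below builds a structured audit record and refuses a query that carries no recognized purpose. The `audit_entry` function, the field names, and the three-value purpose vocabulary are assumptions for illustration; the actual log schema is not specified in this document.

```python
import json
from datetime import datetime, timezone

# Hypothetical purpose-of-use vocabulary; a real deployment would define its own.
ALLOWED_PURPOSES = frozenset({"treatment", "payment", "operations"})


def audit_entry(user, role, purpose_of_use, query, now=None):
    """Build one structured audit record, refusing queries with no valid purpose."""
    if purpose_of_use not in ALLOWED_PURPOSES:
        raise ValueError(f"unknown purpose of use: {purpose_of_use!r}")
    timestamp = now if now is not None else datetime.now(timezone.utc)
    return json.dumps(
        {
            "timestamp": timestamp.isoformat(),
            "user": user,
            "role": role,
            "purpose_of_use": purpose_of_use,
            "query": query,
        },
        sort_keys=True,
    )


print(audit_entry("jdoe", "care_manager", "treatment", "encounters for patient p-1"))
```

Emitting each entry as a single JSON line keeps the log easy to export to whatever retention store a given regulation requires.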
Deployment options
Flexible deployment to fit your security and infrastructure requirements.
Cloud-hosted
Managed deployment in your preferred cloud environment, with SOC 2 Type II aligned controls and BAA support. The BAA is executed at deployment, with no separate negotiation per model vendor.
- AWS, Azure, or GCP
- Single-tenant isolation
- Managed updates and monitoring
On-premises
Deploy within your existing infrastructure for complete data control, including air-gapped environments. No model traffic leaves your network unless you configure it to.
- Kubernetes or VM deployment
- No external data transmission
- Your security policies apply
For deployment partners.
If you're a forward-deployed engineer or SI technical lead, here's what's exposed.
- API documentation and OpenAPI specs
- Sandbox environments for pre-engagement schema introspection
- Mapping export, replay, and review endpoints
- Subprocessor BAA inheritance
- White-label deployment available
Ready to discuss architecture?
We'll walk through how this fits your stack, your deployment plans, and your security review process, and answer technical questions in detail.