MarcoPolo
Governed conversational access to distributed enterprise data — query across sources in plain English without dismantling the governance model that keeps those sources safe.
What MarcoPolo does
Enterprise data is fragmented by necessity: structured tables in relational databases, unstructured documents in NoSQL stores, flat files in object storage, spreadsheets in shared drives. Bringing these together for analysis normally requires either a data engineer and a week of pipeline work, or a governance compromise that exposes more than it should.
MarcoPolo's answer: govern the access, not just the query. Natural language questions are translated into validated query plans. Read-only execution is enforced at the engine level, not just in the user interface. DuckDB stitches cross-source results in memory within configurable row and memory limits. Every answer is attributable to the sources it came from.
Dashboard creation follows the same rules. When a user pins a query result as a dashboard and refreshes it a week later, the refresh runs through the same RBAC policies and datasource allowlists as the original query, not through a cached, permissive shortcut.
Core capabilities
Multi-source natural language query
Query PostgreSQL, MongoDB, S3-compatible object storage, JSON files, and Excel or CSV files from a single natural language question, without writing SQL or building connectors.
Read-only execution enforcement
Query plans are validated before execution. No writes, no schema changes, no privilege escalation. Read-only access is enforced at the engine level, not left to trust.
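As a rough illustration of pre-execution validation, the sketch below assumes a plan is a single SQL string and rejects anything that is not a lone SELECT or WITH statement. The function name `is_read_only` and the keyword deny-list are hypothetical and are not MarcoPolo's actual validator. A keyword scan like this errs on the side of rejection, and it would complement, not replace, read-only roles or connections at the database itself.

```python
import re

# Hypothetical deny-list for illustration only. A production validator
# would parse the statement instead of scanning for keywords.
FORBIDDEN = {
    "insert", "update", "delete", "drop", "alter", "create",
    "truncate", "grant", "revoke", "merge", "copy", "attach",
}


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement with no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    # Reject empty input and multi-statement batches.
    if not statement or ";" in statement:
        return False
    # Identifiers such as created_at stay whole because "_" is part of a token.
    tokens = re.findall(r"[a-z_]+", statement.lower())
    if not tokens or tokens[0] not in {"select", "with"}:
        return False
    return FORBIDDEN.isdisjoint(tokens)


print(is_read_only("SELECT id FROM orders WHERE total > 10;"))
print(is_read_only("SELECT 1; DELETE FROM orders"))
```

Because the check is conservative, a legitimate query that contains a forbidden word inside a string literal is also rejected; that trade-off favours safety over convenience.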
DuckDB-powered cross-source joins
Results from multiple sources are joined in memory by DuckDB within strict row count and memory limits. No permanent intermediate tables; no data movement outside the defined execution boundary.
Persistent governed dashboards
Pin any query result as a persistent dashboard. Dashboard refreshes re-run the underlying query through the same RBAC and allowlist policies as the original — no governance bypass at refresh time.
Workspace isolation & RBAC
Team-scoped workspaces with role-based access control. Each user sees only the datasources on their allowlist. Cross-workspace access is not possible without explicit permission grants.
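The allowlist rule can be pictured with a small sketch. It assumes a hypothetical `User` record carrying a workspace and a set of permitted datasource names, and a mapping from datasource name to owning workspace; none of these names come from MarcoPolo itself. Explicit cross-workspace grants are left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    # Hypothetical shape of a user record, for illustration only.
    name: str
    workspace: str
    allowlist: frozenset = field(default_factory=frozenset)


def permitted_sources(user: User, candidates: dict) -> list:
    """Keep only datasources in the user's own workspace and on their allowlist.

    `candidates` maps datasource name -> owning workspace.
    """
    return sorted(
        name
        for name, workspace in candidates.items()
        if workspace == user.workspace and name in user.allowlist
    )


alice = User("alice", "finance", frozenset({"pg_orders", "s3_invoices"}))
sources = {
    "pg_orders": "finance",
    "s3_invoices": "finance",
    "pg_ledger": "finance",  # same workspace, but not on alice's allowlist
    "mongo_hr": "hr",        # different workspace
}
print(permitted_sources(alice, sources))
```

The default is deny: a datasource is usable only when both the workspace check and the allowlist check pass.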
Full query audit trail
Every query and dashboard refresh is logged — who asked, what sources were accessed, what rows were returned, and which policies were applied. Audit records are protected from modification.
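The page does not say how audit records are protected from modification. One common technique, shown here purely as an assumption, is a hash chain in which each entry's hash covers its own content plus the previous entry's hash, so editing any earlier record invalidates everything after it. The function names and record fields below are illustrative.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Serialise with sorted keys so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(log: list, record: dict) -> dict:
    """Append a record whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_record(audit_log, {"user": "alice", "sources": ["pg_orders"], "rows_returned": 42})
append_record(audit_log, {"user": "bob", "sources": ["mongo_hr"], "rows_returned": 7})
print(verify(audit_log))
```

A chain like this detects edits but not removal of the most recent entries, so real deployments typically also anchor the latest hash somewhere outside the log.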
How it works
- Natural language query intake
  The user submits a question in plain English from within their permitted workspace. MarcoPolo identifies the relevant datasources based on the question's semantics and the user's allowlist.
- Query plan generation and validation
  A structured query plan is generated for each relevant datasource. Plans are validated against a schema of permitted operations before any execution begins: read-only operations only, bounded by row and column limits.
- Per-source execution
  Validated plans execute against each permitted datasource. Results are returned as bounded datasets: rows beyond the limit are truncated, and users are told whenever a limit applied, so nothing is dropped silently.
- Cross-source stitching with DuckDB
  Where the query spans multiple sources, DuckDB joins the result sets in memory within the configured execution limits. No intermediate data is written to persistent storage during this step.
- Answer delivery and audit log write
  The final answer is presented in natural language with source attribution. The complete execution record (query plan, sources accessed, rows returned, user identity, policies applied) is written to the audit log.
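The truncation behaviour in the per-source execution step can be sketched as follows. The example assumes rows arrive as any Python iterable and that the limit is a plain integer; `BoundedResult` and `fetch_bounded` are hypothetical names, not part of MarcoPolo.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable

_END = object()  # sentinel marking an exhausted source


@dataclass
class BoundedResult:
    rows: list
    truncated: bool  # True when the source had more rows than the limit
    limit: int


def fetch_bounded(rows: Iterable, limit: int) -> BoundedResult:
    """Read at most `limit` rows and record whether more were available."""
    iterator = iter(rows)
    kept = list(islice(iterator, limit))
    # Pull one extra row to detect truncation without reading the whole source.
    truncated = next(iterator, _END) is not _END
    return BoundedResult(rows=kept, truncated=truncated, limit=limit)


result = fetch_bounded(range(10), limit=3)
if result.truncated:
    print(f"Showing the first {result.limit} rows; more rows matched the query.")
```

Carrying the `truncated` flag alongside the rows is what lets the answer tell the user that a limit applied, instead of presenting a partial result as complete.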
Supported data sources
| Source type | Details |
|---|---|
| PostgreSQL | Standard relational queries; column and row-level permissions honoured |
| MongoDB | Document queries with field projection; collection allowlists enforced |
| S3-compatible storage | AWS S3, Cloudflare R2, MinIO; object and prefix allowlists |
| JSON files | Structured and newline-delimited JSON; path-based access controls |
| Excel / CSV | Workbooks and flat files; sheet and column allowlists |
Get the MarcoPolo white paper
Query architecture, governance model, execution limits, RBAC design, and deployment guide — available on request.