Aqfer Insights
By Dan Jaye, CTO
There’s a common assumption in technical circles: if data starts out structured, then SQL and well-designed schemas should be enough to unlock its value. Embeddings, NLP, and other unstructured-data techniques are seen as tools for text, images, or documents, not for databases.
That assumption holds only when the data is clean, complete, and of limited complexity. It breaks down the moment data encounters the realities of production systems.
As systems grow and interoperate with more sources and channels, and as tools age or evolve, structured data inevitably becomes complex and messy. Sometimes the mess is subtle: one source reports age as 31–40, another uses 35–44; one defines income as $25–50k, another as $20–40k. All technically “structured,” but no longer structurally compatible. And when this happens, methods designed for unstructured inputs often provide more value than traditional database approaches.
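The band mismatch above is easy to see concretely. This minimal sketch (the band values mirror the example; the helper function is illustrative, not from any real pipeline) shows why no clean one-to-one mapping exists between the two sources’ age bands:

```python
# Hypothetical example: two "structured" sources whose age bands don't align.
source_a_bands = {"31-40": (31, 40), "41-50": (41, 50)}
source_b_bands = {"35-44": (35, 44), "45-54": (45, 54)}

def overlap_years(band_a, band_b):
    """Number of whole years two inclusive age bands share."""
    lo = max(band_a[0], band_b[0])
    hi = min(band_a[1], band_b[1])
    return max(0, hi - lo + 1)

# A's 31-40 straddles B's 35-44 (6 shared years) AND B's implicit
# younger band, so neither source's band maps cleanly onto the other's.
print(overlap_years(source_a_bands["31-40"], source_b_bands["35-44"]))  # 6
```

Every such partial overlap forces a lossy guess at join time, which is exactly the kind of mess a schema cannot express.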
Nothing crashes when this happens. Tables still validate. Dashboards still render. Pipelines still run. On the surface, everything looks fine, yet underneath, the structure begins to slow insight rather than support it.
Data can remain neatly arranged in rows and columns and still need to be treated as unstructured input. It may carry remnants of past decisions, mix formats, or arrive from sources that never fully align. Manual fixes become routine. Every fix quietly signals that the schema no longer matches the work the business is trying to do with it.
Seeing that isn’t a setback. It’s often the first sign that something better is possible.
Structured tools still have value, but they often no longer suffice on their own. Unstructured approaches help surface the parts of the data that the schema no longer expresses.
This isn’t a rejection of structured systems. It’s a way to restore meaning when structure has stopped doing its job.
We’ve found that a modest set of embeddings can address a surprising number of structured-data problems.
With embeddings of around 256 dimensions, we can retain many of the patterns and relationships of an original file with 10,000 dimensions that are sparsely and inconsistently populated. The issue isn’t just sparse data. The “Curse of Dimensionality” dilutes the potency of what should be rich, powerful signals.
Abstracting that complexity, inconsistency, and absence into a moderate set of embeddings often results in broader utility and far greater reliability.
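As a rough illustration of how 10,000 sparse dimensions can collapse to ~256 while preserving relationships, here is a minimal sketch using a Gaussian random projection (a stand-in for a learned embedding; the sparsity rate and sizes are assumptions, not figures from any real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
n_records, n_dims, n_components = 1000, 10_000, 256

# Sparse, inconsistently populated one-hot-style matrix: ~1% of cells set.
X = (rng.random((n_records, n_dims)) < 0.01).astype(np.float32)

# Gaussian random projection: pairwise distances are approximately
# preserved (Johnson-Lindenstrauss), so 256 dims keep much of the structure.
W = rng.normal(0.0, 1.0 / np.sqrt(n_components),
               (n_dims, n_components)).astype(np.float32)
E = X @ W  # (1000, 256) dense embeddings

print(E.shape)
```

A trained autoencoder or two-tower model would capture far more than a random projection does, but even this crude baseline shows that the 10,000-dimensional representation is mostly empty space.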
“SQL and feature selection for structured data. Embeddings and ML for unstructured data.”
This is a common assumption, but it can unnecessarily handcuff your data.
Often, explainability is the primary objection. Unstructured techniques seem to obscure the raw features people are used to seeing.
Using embeddings for structured data doesn’t mean abandoning explainability—it simply changes how we achieve it. While embeddings transform explicit fields and values into abstract vector representations, several complementary techniques maintain interpretability:
Attribution Methods:
Tools like SHAP and LIME trace predictions back to the original structured features that influenced them. Even after embedding, these methods reveal which customer attributes, transaction patterns, or behavioral signals drove a recommendation or decision.
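SHAP and LIME are full libraries; as a dependency-free sketch of the same idea, permutation importance traces a model’s behavior back to individual structured fields by measuring how much accuracy drops when one field is shuffled (the data, field names, and toy model here are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy structured data: the "model" depends on income, not on age.
n = 500
age = rng.integers(18, 80, n).astype(float)
income = rng.normal(50.0, 15.0, n)
X = np.column_stack([age, income])
y = (income > 50.0).astype(float)  # outcome driven by income only

def model(X):
    return (X[:, 1] > 50.0).astype(float)  # stands in for a trained classifier

def permutation_importance(model, X, y, col, rng):
    """Accuracy drop when one column is shuffled: a simple attribution score."""
    base = (model(X) == y).mean()
    Xp = X.copy()
    Xp[:, col] = rng.permutation(Xp[:, col])
    return base - (model(Xp) == y).mean()

for name, col in [("age", 0), ("income", 1)]:
    print(name, round(permutation_importance(model, X, y, col, rng), 3))
# income shows a large accuracy drop; age shows ~0
```

SHAP and LIME refine this idea with per-prediction, locally faithful attributions, but the principle is the same: the explanation lands on the original structured features, even when the model itself operates on embeddings.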
Embedding Space Analysis:
Visualization techniques such as t-SNE or UMAP, along with simple distance metrics, show how the model organizes information. For structured data, this validates that customers with similar purchase histories cluster together, or that accounts with comparable risk profiles map to nearby points in the embedding space.
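Beyond full t-SNE/UMAP plots, the simplest such validation is a distance check: records that should be similar ought to sit closer in the embedding space than records that shouldn’t. A minimal sketch, with hand-made 4-dimensional vectors standing in for real learned embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1.0 means identical direction, 0.0 unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings; real ones would come from a trained model.
heavy_buyer_a = np.array([0.9, 0.8, 0.1, 0.0])
heavy_buyer_b = np.array([0.8, 0.9, 0.0, 0.1])
dormant_acct  = np.array([0.0, 0.1, 0.9, 0.8])

# Sanity check the geometry: similar purchase histories sit closer together.
assert cosine(heavy_buyer_a, heavy_buyer_b) > cosine(heavy_buyer_a, dormant_acct)
```

Running checks like this over known cohorts (high-LTV customers, comparable risk profiles) is a cheap, continuous way to confirm the embedding space still means what you think it means.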
Hybrid Approaches:
Many production systems maintain both the original structured representation and the embedding. Queries execute efficiently in the vector space, while explanations reference the underlying structured attributes. This “best of both worlds” approach preserves performance and interpretability.
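The hybrid pattern can be sketched in a few lines: keep the vector and the structured attributes side by side, run the similarity query in vector space, and return the match with its raw fields attached so the explanation can reference them (record contents and field names here are invented for illustration):

```python
import numpy as np

# Hypothetical records keeping both representations side by side.
records = [
    {"id": "c1", "attrs": {"region": "NE", "ltv": "high"},
     "vec": np.array([0.9, 0.1, 0.2])},
    {"id": "c2", "attrs": {"region": "NE", "ltv": "high"},
     "vec": np.array([0.8, 0.2, 0.1])},
    {"id": "c3", "attrs": {"region": "SW", "ltv": "low"},
     "vec": np.array([0.1, 0.9, 0.8])},
]

def nearest(query_vec, records):
    """Query executes in vector space; the match returns with its
    structured attributes so the explanation can cite raw fields."""
    return min(records, key=lambda r: np.linalg.norm(r["vec"] - query_vec))

match = nearest(np.array([0.88, 0.12, 0.18]), records)
print(match["id"], match["attrs"])  # nearest neighbor plus explainable fields
```

A production system would swap the linear scan for a vector index (FAISS, pgvector, or similar), but the shape of the answer — an embedding-space match explained through structured attributes — stays the same.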
The key insight: embeddings don’t eliminate explainability; they shift it from field-by-field matching to geometric reasoning. Instead of explaining “these records matched on demographics AND purchase_category,” you explain “these customers are similar across a learned representation of their complete behavioral profile.”
If you build or maintain systems at scale, you already know this isn’t a debate about AI versus SQL. It’s about timing.
Structure reflects early assumptions. Those assumptions eventually become boundaries.
Avoiding that outcome doesn’t require starting over. It requires sequencing the work so the data tells you when to change it.
And when it does, you won’t need to force it … you’ll already see it coming.
About the Author
Chief Technology Officer
Dan has provided strategic, tactical and technology advisory services to a wide range of marketing technology and big data companies. Clients have included Altiscale, ShareThis, Ghostery, OwnerIQ, Netezza, Akamai, and Tremor Media. Dan was the founder and CEO of Korrelate, a leading automotive marketing attribution company, purchased by J.D. Power in 2014. Dan is the former president of TACODA, bought by AOL in 2007, and was the founder and CTO of Permissus, an enterprise privacy compliance technology provider. He was the Founder and CTO of Engage and served as the acting CTO of CMGI. Prior to Engage, he was the director of High Performance Computing at Fidelity Investments and worked at Epsilon and Accenture (formerly Andersen Consulting).
Dan graduated magna cum laude with a BA in Astronomy and Astrophysics and Physics from Harvard University.