Data Discovery at Scale

Organizations face a data discovery problem when their analysts spend more time finding relevant data than analyzing it. This problem has become common as: i) data is stored across multiple storage systems, from databases to data lakes; ii) data scientists do not operate within the limits of well-defined schemas, instead they want to find data across their organization to answer increasingly complex business questions.

We have built Aurum, a system to tackle data discovery problems at large. Aurum introduces a new discovery algebra, R2QL, that permits users to declare their intuition of what is relevant through a set of data primitives that expose the relations of the underlying data. The algebra relies on a metaschema graph to answer queries in human-scale latencies. Aurum is scalable: it builds the metaschema graph in linear time, despite the complexity of extracting complex relationships among thousands of data sources.

For more info, contact