Data Engineering Podcast

    This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

  • The Art of Database Selection and Evolution

  • Bridging Code and UI in Data Orchestration with Kestra

  • Streaming Data Into The Lakehouse With Iceberg And Trino At Going

  • An Opinionated Look At End-to-end Code Only Analytical Workflows With Bruin

  • Feldera: Bridging Batch and Streaming with Incremental Computation

  • Accelerate Migration Of Your Data Warehouse with Datafold's AI Powered Migration Agent

  • Bring Vector Search And Storage To The Data Lake With Lance

  • The Role of Python in Shaping the Future of Data Platforms with DLT

  • Build Your Data Transformations Faster And Safer With SDF

  • Scaling Airbyte: Challenges and Milestones on the Road to 1.0

  • Enhancing Data Accessibility and Governance with Gravitino

  • The Evolution of DataOps: Insights from DataKitchen's CEO

  • Achieving Data Reliability: The Role of Data Contracts in Modern Data Management

  • How Generative AI Is Impacting Data Engineering Teams

  • The Role of Product Managers in Data-Centric Organizations

  • Neon: A Serverless And Developer Friendly Postgres

  • Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

  • Stitching Together Enterprise Analytics With Microsoft Fabric

  • Being Data Driven At Stripe With Trino And Iceberg

  • X-Ray Vision For Your Flink Stream Processing With Datorios

  • Practical First Steps In Data Governance For Long Term Success

  • Data Migration Strategies For Large Scale Systems

  • Zenlytic Is Building You A Better Coworker With AI Agents

  • Release Management For Data Platform Services And Logic

  • Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

  • Build Your Second Brain One Piece At A Time

  • Making Email Better With AI At Shortwave

  • Designing A Non-Relational Database Engine

  • Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

  • Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

  • Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

  • Reconciling The Data In Your Databases With Datafold

  • Version Your Data Lakehouse Like Your Software With Nessie

  • When And How To Conduct An AI Program

  • Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

  • Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

  • Data Sharing Across Business And Platform Boundaries

  • Tackling Real Time Streaming Data With SQL Using RisingWave

  • Build A Data Lake For Your Security Logs With Scanner

  • Modern Customer Data Platform Principles

  • Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

  • Designing Data Platforms For Fintech Companies

  • Troubleshooting Kafka In Production

  • Adding An Easy Mode For The Modern Data Stack With 5X

  • Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

  • Designing Data Transfer Systems That Scale

  • Addressing The Challenges Of Component Integration In Data Platform Architectures

  • Unlocking Your dbt Projects With Practical Advice For Practitioners

  • Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

  • Shining Some Light In The Black Box Of PostgreSQL Performance

  • Surveying The Market Of Database Products

  • Defining A Strategy For Your Data Products

  • Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable

  • Using Data To Illuminate The Intentionally Opaque Insurance Industry

  • Building ETL Pipelines With Generative AI

  • Powering Vector Search With Real Time And Incremental Vector Indexes

  • Building Linked Data Products With JSON-LD

  • An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem

  • Eliminate The Overhead In Your Data Integration With The Open Source dlt Library

  • Building An Internal Database As A Service Platform At Cloudflare

  • Harnessing Generative AI For Creating Educational Content With Illumidesk

  • Unpacking The Seven Principles Of Modern Data Pipelines

  • Quantifying The Return On Investment For Your Data Team

  • Strategies For A Successful Data Platform Migration

  • Build Real Time Applications With Operational Simplicity Using Dozer

  • Datapreneurs - How Todays Business Leaders Are Using Data To Define The Future

  • Reduce Friction In Your Business Analytics Through Entity Centric Data Modeling

  • How Data Engineering Teams Power Machine Learning With Feature Platforms

  • Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

  • How Column-Aware Development Tooling Yields Better Data Models

  • Build Better Tests For Your dbt Projects With Datafold And data-diff

  • Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service

  • A Roadmap To Bootstrapping The Data Team At Your Startup

  • Keep Your Data Lake Fresh With Real Time Streams Using Estuary

  • What Happens When The Abstractions Leak On Your Data

  • Use Consistent And Up To Date Customer Profiles To Power Your Business With Segment Unify

  • Realtime Data Applications Made Easier With Meroxa

  • Building Self Serve Business Intelligence With AI And Semantic Modeling At Zenlytic

  • An Exploration Of The Composable Customer Data Platform

  • Mapping The Data Infrastructure Landscape As A Venture Capitalist

  • Unlocking The Potential Of Streaming Data Applications Without The Operational Headache At Grainite

  • Aligning Data Security With Business Productivity To Deploy Analytics Safely And At Speed

  • Use Your Data Warehouse To Power Your Product Analytics With NetSpring

  • Exploring The Nuances Of Building An Intentional Data Culture

  • Building A Data Mesh Platform At PayPal

  • The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

  • Let The Whole Team Participate In Data With The Quilt Versioned Data Hub

  • Reflecting On The Past 6 Years Of Data Engineering

  • Let Your Business Intelligence Platform Build The Models Automatically With Omni Analytics

  • Safely Test Your Applications And Analytics With Production Quality Data Using Tonic AI

  • Building Applications With Data As Code On The DataOS

  • Automate Your Pipeline Creation For Streaming Data Transformations With SQLake

  • Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI

  • Using Product Driven Development To Improve The Productivity And Effectiveness Of Your Data Teams

  • An Exploration Of Tobias' Experience In Building A Data Lakehouse From Scratch

  • Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems

  • Making Sense Of The Technical And Organizational Considerations Of Data Contracts

  • Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle

  • Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee

  • Run Your Applications Worldwide Without Worrying About The Database With Planetscale

  • Business Intelligence In The Palm Of Your Hand With Zing Data

  • Adopting Real-Time Data At Organizations Of Every Size

  • Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data

  • Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase

  • A Look At The Data Systems Behind The Gameplay For League Of Legends

  • Tame The Entropy In Your Data Stack And Prevent Failures With Sifflet

  • Taking A Look Under The Hood At CreditKarma's Data Platform

  • Build Data Products Without A Data Team Using AgileData

  • Build Better Data Products By Creating Data, Not Consuming It

  • Clean Up Your Data Using Scalable Entity Resolution And Data Mastering With Zingg

  • Expanding The Reach of Business Intelligence Through Ubiquitous Embedded Analytics With Sisense

  • Analytics Engineering Without The Friction Of Complex Pipeline Development With Optimus and dbt

  • How To Bring Agile Practices To Your Data Projects

  • Going From Transactional To Analytical And Self-managed To Cloud On One Database With MariaDB

  • An Exploration Of The Open Data Lakehouse And Dremio's Contribution To The Ecosystem

  • Speeding Up The Time To Insight For Supply Chains And Logistics With The Pathway Database That Thinks

  • Making The Open Data Lakehouse Affordable Without The Overhead At Iomete

  • Investing In Understanding The Customer Journey At American Express

  • Gain Visibility And Insight Into Your Supply Chains Through Operational Analytics Powered By Roambee

  • Make Data Lineage A Ubiquitous Part Of Your Work By Simplifying Its Implementation With Alvin

  • Power Your Real-Time Analytics Without The Headache Using Fivetran's Change Data Capture Integrations

  • Build A Common Understanding Of Your Data Reliability Rules With Soda Core and Soda Checks Language

  • Operational Analytics To Increase Efficiency For Multi-Location Businesses With OpsAnalitica

  • Building A Shared Understanding Of Data Assets In A Business Through A Single Pane Of Glass With Workstream

  • Build Confidence In Your Data Platform With Schema Compatibility Reports That Span Systems And Domains Using Schemata

  • Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

  • A Reflection On Data Observability As It Reaches Broader Adoption

  • Introduce Climate Analytics Into Your Data Platform Without The Heavy Lifting Using Sust Global

  • An Exploration Of What Data Automation Can Provide To Data Engineers And Ascend's Journey To Make It A Reality

  • Alumni Of AirBnB's Early Years Reflect On What They Learned About Building Data Driven Organizations

  • An Exploration Of The Expectations, Ecosystem, and Realities Of Real-Time Data Applications

  • Understanding The Role Of The Chief Data Officer

  • Bringing Automation To Data Labeling For Machine Learning With Watchful

  • Collecting And Retaining Contextual Metadata For Powerful And Effective Data Discovery

  • Useful Lessons And Repeatable Patterns Learned From Data Mesh Implementations At AgileLab

  • Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus

  • Interactive Exploratory Data Analysis On Petabyte Scale Data Sets With Arkouda

  • What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta

  • Re-Bundling The Data Stack With Data Orchestration And Software Defined Assets Using Dagster

  • Writing The Book That Offers A Single Reference For The Fundamentals Of Data Engineering

  • Making The Total Cost Of Ownership For External Data Manageable With Crux

  • Joe Reis Flips The Script And Interviews Tobias Macey About The Data Engineering Podcast

  • Charting the Path of Riskified's Data Platform Journey

  • Maintain Your Data Engineers' Sanity By Embracing Automation

  • Be Confident In Your Data Integration By Quickly Validating Matching Records With data-diff

  • The View From The Lakehouse Of Architectural Patterns For Your Data Platform

  • Bring Geospatial Analytics Across Disparate Datasets Into Your Toolkit With The Unfolded Platform

  • Strategies And Tactics For A Successful Master Data Management Implementation

  • Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

  • Level Up Your Data Platform With Active Metadata

  • Discover And De-Clutter Your Unstructured Data With Aparavi

  • Hire And Scale Your Data Team With Intention

  • Simplify Data Security For Sensitive Information With The Skyflow Data Privacy Vault

  • Bringing The Modern Data Stack To Everyone With Y42

  • Data Cloud Cost Optimization With Bluesky Data

  • A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore

  • Unlocking The Value Of Data Across The Organization Through User Friendly Data Tools With Prophecy

  • Cloud Native Data Orchestration For Machine Learning And Data Engineering With Flyte

  • Designing And Deploying IoT Analytics For Industrial Applications At Vopak

  • Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way

  • Scaling Analysis of Connected Data And Modeling Complex Relationships With The TigerGraph Graph Database

  • Exploring The Insights And Impact Of Dan Delorey's Distinguished Career In Data

  • Leading The Charge For The ELT Data Integration Pattern For Cloud Data Warehouses At Matillion

  • Evolving And Scaling The Data Platform at Yotpo

  • Operational Analytics At Speed With Minimal Busy Work Using Incorta

  • Gain Visibility Into Your Entire Machine Learning System Using Data Logging With WhyLogs

  • Connecting To The Next Frontier Of Computing With Quantum Networks

  • What Does It Really Mean To Do MLOps And What Is The Data Engineer's Role?

  • DataOps As A Service For Your Data Integration Workflows With Rivery

  • Synthetic Data As A Service For Simplifying Privacy Engineering With Gretel

  • Accelerate Development Of Enterprise Analytics With The Coalesce Visual Workflow Builder

  • Repeatable Patterns For Designing Data Platforms And When To Customize Them

  • Eliminate The Bottlenecks In Your Key/Value Storage With SpeeDB

  • Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

  • Exploring Incident Management Strategies For Data Teams

  • Accelerate Your Embedded Analytics With Apache Pinot

  • Accelerating Adoption Of The Modern Data Stack At 5X Data

  • Taking A Multidimensional Approach To Data Observability At Acceldata

  • Move Your Database To The Data And Speed Up Your Analytics With DuckDB

  • Developer Friendly Application Persistence That Is Fast And Scalable With HarperDB

  • Reflections On Designing A Data Platform From Scratch

  • Manage Your Unstructured Data Assets Across Cloud And Hybrid Environments With Komprise

  • Build Your Python Data Processing Your Way And Run It Anywhere With Fugue

  • Understanding The Immune System With Data At ImmunAI

  • Bring Your Code To Your Streaming And Static Data Without Effort With The Deephaven Real Time Query Engine

  • Build Your Own End To End Customer Data Platform With Rudderstack

  • Scale Your Spatial Analysis By Building It In SQL With Syntax Extensions

  • Scalable Strategies For Protecting Data Privacy In Your Shared Data Sets

  • A Reflection On Learning A Lot More Than 97 Things Every Data Engineer Should Know

  • Effective Pandas Patterns For Data Engineering

  • The Importance Of Data Contracts As The Interface For Data Integration With Abhi Sivasailam

  • Building And Managing Data Teams And Data Platforms In Large Organizations With Ashish Mrig

  • Automated Data Quality Management Through Machine Learning With Anomalo

  • An Introduction To Data And Analytics Engineering For Non-Programmers

  • Open Source Reverse ETL For Everyone With Grouparoo

  • Data Observability Out Of The Box With Metaplane

  • Creating Shared Context For Your Data Warehouse With A Controlled Vocabulary

  • A Reflection On The Data Ecosystem For The Year 2021

  • Exploring The Evolving Role Of Data Engineers

  • Revisiting The Technical And Social Benefits Of The Data Mesh

  • Fast And Flexible Headless Data Analytics With Cube.JS

  • Building A System Of Record For Your Organization's Data Ecosystem At Metaphor

  • Building Auditable Spark Pipelines At Capital One

  • Deliver Personal Experiences In Your Applications With The Unomi Open Source Customer Data Platform

  • Data Driven Hiring For Data Professionals With Alooba

  • Experimentation and A/B Testing For Modern Data Teams With Eppo

  • Creating A Unified Experience For The Modern Data Stack At Mozart Data

  • Doing DataOps For External Data Sources As A Service at Demyst

  • Exploring Processing Patterns For Streaming Data Integration In Your Data Lake

  • Laying The Foundation Of Your Data Platform For The Era Of Big Complexity With Dagster

  • Data Quality Starts At The Source

  • Eliminate Friction In Your Data Platform Through Unified Metadata Using OpenMetadata

  • Business Intelligence Beyond The Dashboard With ClicData

  • Exploring The Evolution And Adoption of Customer Data Platforms and Reverse ETL

  • Removing The Barrier To Exploratory Analytics with Activity Schema and Narrator

  • Streaming Data Pipelines Made SQL With Decodable

  • Data Exploration For Business Users Powered By Analytics Engineering With Lightdash

  • Completing The Feedback Loop Of Data Through Operational Analytics With Census

  • Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At Acryl Data

  • How And Why To Become Data Driven As A Business

  • Make Your Business Metrics Reusable With Open Source Headless BI Using Metriql

  • Adding Support For Distributed Transactions To The Redpanda Streaming Engine

  • Building Real-Time Data Platforms For Large Volumes Of Information With Aerospike

  • Delivering Your Personal Data Cloud With Prifina

  • Digging Into Data Reliability Engineering

  • Massively Parallel Data Processing In Python Without The Effort Using Bodo

  • Declarative Machine Learning Without The Operational Overhead Using Continual

  • An Exploration Of The Data Engineering Requirements For Bioinformatics

  • Setting The Stage For The Next Chapter Of The Cassandra Database

  • A View From The Round Table Of Gartner's Cool Vendors

  • Designing And Building Data Platforms As A Product

  • Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

  • Do Away With Data Integration Through A Dataware Architecture With Cinchy

  • Decoupling Data Operations From Data Infrastructure Using Nexla

  • Let Your Analysts Build A Data Lakehouse With Cuelake

  • Migrate And Modify Your Data Platform Confidently With Compilerworks

  • Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop

  • Build Trust In Your Data By Understanding Where It Comes From And How It Is Used With Stemma

  • Data Discovery From Dashboards To Databases With Castor

  • Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

  • Adding Context And Comprehension To Your Analytics Through Data Discovery With SelectStar

  • Building a Multi-Tenant Managed Platform For Streaming Data With Pulsar at Datastax

  • Bringing The Metrics Layer To The Masses With Transform

  • Strategies For Proactive Data Quality Management

  • Low Code And High Quality Data Engineering For The Whole Organization With Prophecy

  • Exploring The Design And Benefits Of The Modern Data Stack

  • Democratize Data Cleaning Across Your Organization With Trifacta

  • Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager

  • Leveling Up Open Source Data Integration With Meltano Hub And The Singer SDK

  • A Candid Exploration Of Timeseries Data Analysis With InfluxDB

  • Lessons Learned From The Pipeline Data Engineering Academy

  • Make Database Performance Optimization A Playful Experience With OtterTune

  • Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk

  • Accelerating ML Training And Delivery With In-Database Machine Learning

  • Taking A Tour Of The Google Cloud Platform For Data And Analytics

  • Make Sure Your Records Are Reliable With The BookKeeper Distributed Storage Layer

  • Build Your Analytics With A Collaborative And Expressive SQL IDE Using Querybook

  • Making Data Pipelines Self-Serve For Everyone With Shipyard

  • Paving The Road For Fast Analytics On Distributed Clouds With The Yellowbrick Data Warehouse

  • Easily Build Advanced Similarity Search With The Pinecone Vector Database

  • A Holistic Approach To Data Governance Through Self Reflection At Collibra

  • Unlocking The Power of Data Lineage In Your Platform with OpenLineage

  • Building Your Data Warehouse On Top Of PostgreSQL

  • Making Analytical APIs Fast With Tinybird

  • Making Spark Cloud Native At Data Mechanics

  • The Grand Vision And Present Reality of DataOps

  • Self Service Data Exploration And Dashboarding With Superset

  • Moving Machine Learning Into The Data Pipeline at Cherre

  • Exploring The Expanding Landscape Of Data Professions with Josh Benamram of Databand

  • Put Your Whole Data Team On The Same Page With Atlan

  • Data Quality Management For The Whole Team With Soda Data

  • Real World Change Data Capture At Datacoral

  • Managing The DoorDash Data Platform

  • Leave Your Data Where It Is And Automate Feature Extraction With Molecula

  • Bridging The Gap Between Machine Learning And Operations At Iguazio

  • Self Service Open Source Data Integration With AirByte

  • Building The Foundations For Data Driven Businesses at 5xData

  • How Shopify Is Building Their Production Data Warehouse Using DBT

  • System Observability For The Cloud Native Era With Chronosphere

  • Making It Easier To Stick B2B Data Integration Pipelines Together With Hotglue

  • Using Your Data Warehouse As The Source Of Truth For Customer Data With Hightouch

  • Enabling Version Controlled Data Collaboration With TerminusDB

  • Bringing Feature Stores and MLOps to the Enterprise at Tecton

  • Off The Shelf Data Governance With Satori

  • Low Friction Data Governance With Immuta

  • Building A Self Service Data Platform For Alternative Data Analytics At YipitData

  • Proven Patterns For Building Successful Data Teams

  • Streaming Data Integration Without The Code at Equalum

  • Keeping A Bigeye On The Data Quality Market

  • Self Service Data Management From Ingest To Insights With Isima

  • Building A Cost Effective Data Catalog With Tree Schema

  • Add Version Control To Your Data Lake With LakeFS

  • Cloud Native Data Security As Code With Cyral

  • Better Data Quality Through Observability With Monte Carlo

  • Rapid Delivery Of Business Intelligence Using Power BI

  • Self Service Real Time Data Integration Without The Headaches With Meroxa

  • Speed Up And Simplify Your Streaming Data Workloads With Red Panda

  • Cutting Through The Noise And Focusing On The Fundamentals Of Data Engineering With The Data Janitor

  • Distributed In Memory Processing And Streaming With Hazelcast

  • Simplify Your Data Architecture With The Presto Distributed SQL Engine

  • Building A Better Data Warehouse For The Cloud At Firebolt

  • Metadata Management And Integration At LinkedIn With DataHub

  • Exploring The TileDB Universal Data Engine

  • Closing The Loop On Event Data Collection With Iteratively

  • A Practical Introduction To Graph Data Applications

  • Build More Reliable Distributed Systems By Breaking Them With Jepsen

  • Making Wind Energy More Efficient With Data At Turbit Systems

  • Open Source Production Grade Data Integration With Meltano

  • DataOps For Streaming Systems With Lenses.io

  • Data Collection And Management To Power Sound Recognition At Audio Analytic

  • Bringing Business Analytics To End Users With GoodData

  • Accelerate Your Machine Learning With The StreamSQL Feature Store

  • Data Management Trends From An Investor Perspective

  • Building A Data Lake For The Database Administrator At Upsolver

  • Mapping The Customer Journey For B2B Companies At Dreamdata

  • Power Up Your PostgreSQL Analytics With Swarm64

  • StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

  • Enterprise Data Operations And Orchestration At Infoworks

  • Taming Complexity In Your Data Driven Organization With DataOps

  • Building Real Time Applications On Streaming Data With Eventador

  • Making Data Collection In Your Code Easy With Rookout

  • Building A Knowledge Graph Of Commercial Real Estate At Cherre

  • The Life Of A Non-Profit Data Professional

  • Behind The Scenes Of The Linode Object Storage Service

  • Building A New Foundation For CouchDB

  • Scaling Data Governance For Global Businesses With A Data Hub Architecture

  • Easier Stream Processing On Kafka With ksqlDB

  • Shining A Light on Shadow IT In Data And Analytics

  • Data Infrastructure Automation For Private SaaS At Snowplow

  • Data Modeling That Evolves With Your Business Using Data Vault

  • The Benefits And Challenges Of Building A Data Trust

  • Pay Down Technical Debt In Your Data Pipeline With Great Expectations

  • Replatforming Production Dataflows

  • Planet Scale SQL For The New Generation Of Applications With YugabyteDB

  • Change Data Capture For All Of Your Databases With Debezium

  • Building The DataDog Platform For Processing Timeseries Data At Massive Scale

  • Building The Materialize Engine For Interactive Streaming Analytics In SQL

  • Solving Data Lineage Tracking And Data Discovery At WeWork

  • SnowflakeDB: The Data Warehouse Built For The Cloud

  • Organizing And Empowering Data Engineers At Citadel

  • Building A Real Time Event Data Warehouse For Sentry

  • Escaping Analysis Paralysis For Your Data Platform With Data Virtualization

  • Designing For Data Protection

  • Automating Your Production Dataflows On Spark

  • Build Maintainable And Testable Data Applications With Dagster

  • Data Orchestration For Hybrid Cloud Analytics

  • Keeping Your Data Warehouse In Order With DataForm

  • Fast Analytics On Semi-Structured And Structured Data In The Cloud

  • Ship Faster With An Opinionated Data Pipeline Framework

  • Open Source Object Storage For All Of Your Data

  • Navigating Boundless Data Streams With The Swim Kernel

  • Building A Reliable And Performant Router For Observability Data

  • Building A Community For Data Professionals at Data Council

  • Building Tools And Platforms For Data Analytics

  • A High Performance Platform For The Full Big Data Lifecycle

  • Digging Into Data Replication At Fivetran

  • Solving Data Discovery At Lyft

  • Simplifying Data Integration Through Eventual Connectivity

  • Straining Your Data Lake Through A Data Mesh

  • Data Labeling That You Can Feel Good About With CloudFactory

  • Scale Your Analytics On The Clickhouse Data Warehouse

  • Stress Testing Kafka And Cassandra For Real-Time Anomaly Detection

  • The Workflow Engine For Data Engineers And Data Scientists

  • Maintaining Your Data Lake At Scale With Spark

  • Managing The Machine Learning Lifecycle

  • Evolving An ETL Pipeline For Better Productivity

  • Data Lineage For Your Pipelines

  • Build Your Data Analytics Like An Engineer With DBT

  • Using FoundationDB As The Bedrock For Your Distributed Systems

  • Running Your Database On Kubernetes With KubeDB

  • Unpacking Fauna: A Global Scale Cloud Native Database

  • Index Your Big Data With Pilosa For Faster Analytics

  • Serverless Data Pipelines On DataCoral

  • Why Analytics Projects Fail And What To Do About It

  • Building An Enterprise Data Fabric At CluedIn

  • A DataOps vs DevOps Cookoff In The Data Kitchen

  • Customer Analytics At Scale With Segment

  • Deep Learning For Data Engineers

  • Speed Up Your Analytics With The Alluxio Distributed Storage System

  • Machine Learning In The Enterprise

  • Cleaning And Curating Open Data For Archaeology

  • Managing Database Access Control For Teams With strongDM

  • Building Enterprise Big Data Systems At LEGO

  • TimescaleDB: The Timeseries Database Built For SQL And Scale - Episode 65

  • Performing Fast Data Analytics Using Apache Kudu - Episode 64

  • Simplifying Continuous Data Processing Using Stream Native Storage In Pravega with Tom Kaitchuck - Episode 63

  • Continuously Query Your Time-Series Data Using PipelineDB with Derek Nelson and Usman Masood - Episode 62

  • Advice On Scaling Your Data Pipeline Alongside Your Business with Christian Heinzmann - Episode 61

  • Putting Apache Spark Into Action with Jean Georges Perrin - Episode 60

  • Apache Zookeeper As A Building Block For Distributed Systems with Patrick Hunt - Episode 59

  • Set Up Your Own Data-as-a-Service Platform On Dremio with Tomer Shiran - Episode 58

  • Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57

  • How Upsolver Is Building A Data Lake Platform In The Cloud with Yoni Iny - Episode 56

  • Self Service Business Intelligence And Data Sharing Using Looker with Daniel Mintz - Episode 55

  • Using Notebooks As The Unifying Layer For Data Roles At Netflix with Matthew Seal - Episode 54

  • Of Checklists, Ethics, and Data with Emily Miller and Peter Bull (Cross Post from Podcast.__init__) - Episode 53

  • Improving The Performance Of Cloud-Native Big Data At Netflix Using The Iceberg Table Format with Ryan Blue - Episode 52

  • Combining Transactional And Analytical Workloads On MemSQL with Nikita Shamgunov

  • Building A Knowledge Graph From Public Data At Enigma With Chris Groskopf - Episode 50

  • A Primer On Enterprise Data Curation with Todd Walter - Episode 49

  • Take Control Of Your Web Analytics Using Snowplow With Alexander Dean - Episode 48

  • Keep Your Data And Query It Too Using Chaos Search with Thomas Hazel and Pete Cheslock - Episode 47

  • An Agile Approach To Master Data Management with Mark Marinelli - Episode 46

  • Protecting Your Data In Use At Enveil with Ellison Anne Williams - Episode 45

  • Graph Databases In Production At Scale Using DGraph with Manish Jain - Episode 44

  • Putting Airflow Into Production With James Meickle - Episode 43

  • Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42

  • Mobile Data Collection And Analysis Using Ona And Canopy With Peter Lubell-Doughtie - Episode 41

  • Ceph: A Reliable And Scalable Distributed Filesystem with Sage Weil - Episode 40

  • Building Data Flows In Apache NiFi With Kevin Doran and Andy LoPresto - Episode 39

  • Leveraging Human Intelligence For Better AI At Alegion With Cheryl Martin - Episode 38

  • Package Management And Distribution For Your Data Using Quilt with Kevin Moore - Episode 37

  • User Analytics In Depth At Heap with Dan Robinson - Episode 36

  • CockroachDB In Depth with Peter Mattis - Episode 35

  • ArangoDB: Fast, Scalable, and Multi-Model Data Storage with Jan Steeman and Jan Stücke - Episode 34

  • The Alooma Data Pipeline With CTO Yair Weinberger - Episode 33

  • PrestoDB and Starburst Data with Kamil Bajda-Pawlikowski - Episode 32

  • Brief Conversations From The Open Data Science Conference: Part 2 - Episode 31

  • Brief Conversations From The Open Data Science Conference: Part 1 - Episode 30

  • Metabase Self Service Business Intelligence with Sameer Al-Sakran - Episode 29

  • Octopai: Metadata Management for Better Business Intelligence with Amnon Drori - Episode 28

  • Data Engineering Weekly with Joe Crobak - Episode 27

  • Defining DataOps with Chris Bergh - Episode 26

  • ThreatStack: Data Driven Cloud Security with Pete Cheslock and Patrick Cable - Episode 25

  • MarketStore: Managing Timeseries Financial Data with Hitoshi Harada and Christopher Ryan - Episode 24

  • Stretching The Elastic Stack with Philipp Krenn - Episode 23

  • Database Refactoring Patterns with Pramod Sadalage - Episode 22

  • The Future Data Economy with Roger Chen - Episode 21

  • Honeycomb Data Infrastructure with Sam Stokes - Episode 20

  • Data Teams with Will McGinnis - Episode 19

  • TimescaleDB: Fast And Scalable Timeseries with Ajay Kulkarni and Mike Freedman - Episode 18

  • Pulsar: Fast And Scalable Messaging with Rajan Dhabalia and Matteo Merli - Episode 17

  • Dat: Distributed Versioned Data Sharing with Danielle Robinson and Joe Hand - Episode 16

  • Snorkel: Extracting Value From Dark Data with Alex Ratner - Episode 15

  • CRDTs and Distributed Consensus with Christopher Meiklejohn - Episode 14

  • Citus Data: Distributed PostGreSQL for Big Data with Ozgun Erdogan and Craig Kerstiens - Episode 13

  • Wallaroo with Sean T. Allen - Episode 12

  • SiriDB: Scalable Open Source Timeseries Database with Jeroen van der Heijden - Episode 11

  • Confluent Schema Registry with Ewen Cheslack-Postava - Episode 10

  • data.world with Bryon Jacob - Episode 9

  • Data Serialization Formats with Doug Cutting and Julien Le Dem - Episode 8

  • Buzzfeed Data Infrastructure with Walter Menendez - Episode 7

  • Astronomer with Ry Walker - Episode 6

  • Rebuilding Yelp's Data Pipeline with Justin Cunningham - Episode 5

  • ScyllaDB with Eyal Gutkind - Episode 4

  • Defining Data Engineering with Maxime Beauchemin - Episode 3

  • Dask with Matthew Rocklin - Episode 2

  • Pachyderm with Daniel Whitenack - Episode 1

  • Introducing The Show
