Why Databricks will surpass Snowflake to become the biggest software IPO in history
Enterprises are looking to monetize the vast amounts of data generated every minute. Snowflake and Databricks help customers manage their data infrastructure in the cloud. Let’s look at Databricks’ timeline and analyse the contrasting philosophies of Databricks and Snowflake.
2013 – Raised $13.9M from a16z venture firm
2016 – Launches a free community edition of its cloud-based data platform
2017 – Databricks becomes a first-party service on Microsoft Azure, not just software listed in the Microsoft Azure marketplace
2019 – Databricks open-sources key software (Delta Lake)
2020 – Introduces SQL Analytics for analysts
2021 – 5000 customers; raised $1B at $28B valuation
Open-source-led growth versus product-led growth
The Databricks product is based on open-source Apache Spark. Because Spark is open, it is continuously put to the test by the open-source community, whereas Snowflake is developed as closed source. Databricks is also behind other open-source projects such as MLflow and Delta Lake, which accept contributions from other software companies and so develop faster.
Built for ML workloads versus business intelligence
Historically, enterprises stored structured data in on-premises data warehouses from Oracle, Teradata, and IBM. Snowflake built a data warehouse in the cloud for structured data, with first-class support for analysts (business intelligence).
Today, unstructured data is generated every minute, and enterprises are looking for centralized platforms that handle all their data workloads, including ML, streaming analytics, and business intelligence. Databricks is built for data processing and ML, and it relies on continuous open-source innovation.
Snowflake best supports structured data analysis, while Databricks is stronger with unstructured data, although the two platforms are slowly converging on a similar suite of services. The data technology landscape is changing quickly, and relying on open source helps enterprises stay ahead of the curve (Databricks > Snowflake).