Couchbase 7.0 is a major milestone for databases.
In the latest release, there are over 33 major innovations and 100 minor innovations, all culminating in a product that is both groundbreaking and yet familiar to developers and DevOps engineers.
With this release, for the first time, you can interleave full SQL transactions with high-performance, high-scale operations, all in a single database. This eliminates the need for developers to choose one system for transactions and a different one for high speed, low latency and scale.
I am going to talk about only two of these innovations in this article. In the coming days and weeks, you’ll see several other blog posts from our brilliant technical team about the other details of the Couchbase 7.0 release.
Couchbase 7.0 introduces for the first time multi-statement SQL transactions on a flexible-schema document database. Further, we have introduced the concept of Scopes and Collections in 7.0.
These two innovations – multi-statement transactions and data organization – offer the familiar concepts of relational databases on a NoSQL system. Now it’s easier than ever for developers to migrate their monolithic applications of the past into modern microservices that meet the current and future demands of enterprise organizations.
The Limitations of Relational Databases
From a database evolution perspective, there were two major limitations that relational database management systems (RDBMS) posed. One was scale and the other was schema rigidity.
The vertically scaling nature of the relational systems that were built in the 80s reflected the infrastructure reality of those times. Networks – LANs even – were scarcely available, let alone WANs and the internet. So bringing a network in the middle of compute was not possible unless you built the entire hardware stack and managed it all in your own proprietary way.
As a result, transactional databases essentially became single-server databases. But modern databases require horizontal scaling systems that expand and contract hardware usage based on resource utilization. This is one of the first problems NoSQL solved.
NoSQL databases also gave the world schema flexibility, which is the primary reason these systems are so popular. But we – as a technology ecosystem – forgot one of the most important lessons that 40 years of database research and innovation taught us.
That was the lesson of SQL.
The Strengths of SQL
Hundreds of databases have come and gone, with each proclaiming its own programming paradigm, but only SQL has withstood the test of time.
The reason for this endurance is twofold: First is the math underpinning the relational model (known as relational calculus), and second is the simplicity of the programming language. Structured Query Language (SQL) is functional, and it’s based on English.
These two strengths of SQL allow you to express complex set theory concepts in a syntax that is close to how you would express it in English and hide all the programming complexities and other data manipulation idiosyncrasies within the database system. This declarative simplicity is what led to the popularity of relational databases and allowed developer teams to build some of the most sophisticated enterprise applications of the last 40-odd years.
When it came to developing the Couchbase 7.0 release, it wasn’t an easy proposition bringing that relational calculus and programming simplicity to a flexible-schema database. It’s been a six-year journey.
With the N1QL query language, most of the strengths of SQL are already covered. N1QL offers the “opposable thumb” of relational systems – the JOIN – as well as subqueries and window functions. It’s the syntactical equivalent of a relational database's SQL, on a flexible-schema document database.
Innovation #1: Multi-Statement SQL Transactions in N1QL
Couchbase has offered ACID properties on documents for years. That offering has matured from single-document ACID (in Couchbase 2.0) to multi-document distributed ACID (in 6.5), and now to SQL multi-statement transactions in 7.0.
This is a major step in the evolution of modern databases.
The document data model removes the impedance mismatch between application developers, who think in terms of objects, and databases, which store data in terms of rows and columns.
For Couchbase, JSON is the data model. JSON provides objects, hierarchy and arrays. The N1QL query language and the query engines help you query, transform and manipulate JSON declaratively. There’s no need to write long pipelines that are hard to write and maintain. No need to manually optimize. N1QL is SQL for JSON. You write; Couchbase optimizes.
SQL is many things. It’s a query language for reporting, but it’s also a query language for transaction processing. Couchbase Server 7.0 now provides multi-statement transactions in N1QL, and it offers ACID transactions just like an RDBMS.
Distributed transactions like this have been around for a few decades now. However, they never scaled since they ran into what are popularly known as the blocking and cloggage problems.
Couchbase is a multi-model, multi-access database. You can access and manipulate the JSON document via the key-value API and via the N1QL query language. In introducing multi-statement transactions, our first step was support for key-value data access. This step was included in the 6.5 release, with an architecture for multi-document transactions that does not rely on a central coordinator.
The 7.0 release extends this support to multi-statement SQL transactions on N1QL. This required a novel, patent-pending approach that exploits the scale-out architecture and optimistic concurrency of Couchbase to implement distributed transactions.
Scale-out is achieved by avoiding a central coordinator for each transaction. Details of each transaction are kept in multiple active transaction records and are used by the commit protocol. The distributed commit protocol runs on multiple instances in parallel. This makes the system more scalable and cost efficient.
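To make the idea of optimistic concurrency concrete, here is a minimal toy sketch in Python. It is not Couchbase's commit protocol, and `ToyStore`, `CasMismatch` and `add_to_balance` are hypothetical names invented for this illustration. It only shows the general pattern: each document carries a version token (a CAS value), writers never take locks, and a write is rejected and retried if the document changed since it was read.

```python
import itertools


class CasMismatch(Exception):
    """Raised when a document changed after the caller read it."""


class ToyStore:
    """In-memory stand-in for a key-value store with per-document CAS values.

    Purely illustrative; this is not a Couchbase API.
    """

    def __init__(self):
        self._docs = {}                       # key -> (cas, value)
        self._next_cas = itertools.count(1)   # monotonically increasing tokens

    def get(self, key):
        return self._docs[key]                # returns (cas, value)

    def upsert(self, key, value):
        cas = next(self._next_cas)
        self._docs[key] = (cas, value)
        return cas

    def replace(self, key, value, cas):
        current_cas, _ = self._docs[key]
        if current_cas != cas:                # someone else wrote in between
            raise CasMismatch(key)
        return self.upsert(key, value)


def add_to_balance(store, key, amount, max_retries=5):
    """Read, modify and write back without locks, retrying on conflict."""
    for _ in range(max_retries):
        cas, doc = store.get(key)
        updated = {**doc, "balance": doc["balance"] + amount}
        try:
            store.replace(key, updated, cas)
            return updated
        except CasMismatch:
            continue                          # re-read and try again
    raise RuntimeError("too many conflicting writers")
```

Because no writer holds a lock while it works, conflicts are detected only at write time, which is one reason this style of concurrency control pairs well with a scale-out design.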
Here is how a SQL multi-statement transaction looks in a traditional relational database:
```sql
START TRANSACTION;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
UPDATE customer SET balance = balance + 100 WHERE cid = 4872;
SELECT cid, name, balance FROM customer;
SAVEPOINT s1;
UPDATE customer SET balance = balance - 100 WHERE cid = 1924;
SELECT cid, name, balance FROM customer;
ROLLBACK WORK TO SAVEPOINT s1;
SELECT cid, name, balance FROM customer;
COMMIT;
```
This same transaction works in Couchbase 7.0 without any modification.
We support BEGIN, COMMIT, ROLLBACK and SAVEPOINT to control the transaction. ACID semantics such as statement-level atomicity and read-your-own-writes are supported for each DML statement within a transaction.
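If you want to see the savepoint semantics in action without a cluster at hand, the following sketch replays a similar sequence in Python against an in-memory SQLite database. SQLite is only a stand-in here, not Couchbase. The customer names and starting balances are made up, and the isolation-level statement and the `WORK` keyword are dropped because SQLite does not accept them.

```python
import sqlite3


def run_transfer_demo():
    """Replay the savepoint sequence on in-memory SQLite (a stand-in only)."""
    # isolation_level=None puts the driver in autocommit mode, so the
    # explicit BEGIN / COMMIT below control the transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT, balance INTEGER)"
    )
    # Illustrative rows; the names and balances are assumptions.
    cur.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(4872, "Ada", 500), (1924, "Lin", 300)],
    )

    cur.execute("BEGIN")
    cur.execute("UPDATE customer SET balance = balance + 100 WHERE cid = 4872")
    cur.execute("SAVEPOINT s1")
    cur.execute("UPDATE customer SET balance = balance - 100 WHERE cid = 1924")
    # Read-your-own-writes: both uncommitted updates are visible here.
    inside = dict(cur.execute("SELECT cid, balance FROM customer").fetchall())
    # Undo only the work done after the savepoint.
    cur.execute("ROLLBACK TO SAVEPOINT s1")
    cur.execute("COMMIT")

    final = dict(cur.execute("SELECT cid, balance FROM customer").fetchall())
    conn.close()
    return inside, final
```

Calling `run_transfer_demo()` returns the balances seen inside the transaction and the balances after commit: the first update survives, while the update made after `s1` is rolled back.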
These multi-statement transactions are the first major innovation I referred to earlier. This innovation gives developers the familiar transaction protection on a database that tackles all the complexities of a flexible schema and elastic data distribution.
Innovation #2: Data Organization
The other major innovation introduced in the Couchbase 7.0 release is data organization within the database. This is where Scopes and Collections come in.
RDBMSs have database -> schema -> table -> rows -> columns. Now Couchbase has Bucket -> Scope -> Collection -> documents -> fields. With the introduction of Scopes and Collections, there is a direct match between the ontology of a traditional relational database and a NoSQL database.
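As a rough illustration of that mapping, the Python sketch below models a Collection's fully qualified name as `bucket.scope.collection`, which is how a N1QL query addresses a Collection. The `Keyspace` class, the `select_all` helper and the example names (`travel`, `inventory`, `airline`) are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

# The correspondence described above, written out as data.
RELATIONAL_TO_COUCHBASE = {
    "database": "Bucket",
    "schema": "Scope",
    "table": "Collection",
    "row": "document",
    "column": "field",
}


@dataclass(frozen=True)
class Keyspace:
    """Hypothetical helper naming one Collection inside a Scope and Bucket."""

    bucket: str
    scope: str
    collection: str

    def path(self) -> str:
        # Backticks quote each identifier, so names with dashes also work.
        parts = (self.bucket, self.scope, self.collection)
        return ".".join(f"`{part}`" for part in parts)


def select_all(keyspace: Keyspace, limit: int) -> str:
    """Build a simple query string against the given Collection."""
    return f"SELECT * FROM {keyspace.path()} LIMIT {int(limit)}"
```

For example, `select_all(Keyspace("travel", "inventory", "airline"), 5)` produces a query against a single Collection, much as a relational query names `schema.table`.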
Bringing It All Together
The combination of these two innovations establishes equivalence to a relational database system at both the structural level of data organization (with Scopes and Collections) and the functional level (with multi-statement transactions).
Let’s take a closer look at what these two innovations mean for a single transaction.
A schema-flexible database means that you don’t have to predefine a schema before storing data in it. Couchbase is a JSON database, so as long as you store JSON data, you don’t need to worry about defining a schema, because JSON is self-describing.
We all know there is no such thing as a schema-less application. The difference with NoSQL is that the database doesn’t impose its own schema (data model) over and beyond the schema (object model) imposed and implied by the application. As you can imagine, this JSON is nothing but the data portion of the object. Schema evolution is nothing but changes to the JSON itself.
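A small Python sketch shows what schema evolution looks like from the application's side. The two documents and the `contacts` field are invented for illustration. The older document simply lacks the newer field, and the application code copes with both shapes without any migration step in the database.

```python
import json

# Two versions of a customer document living side by side (illustrative data).
OLD_DOC = '{"cid": 4872, "name": "Ada", "balance": 500}'
NEW_DOC = (
    '{"cid": 1924, "name": "Lin", "balance": 300, '
    '"contacts": [{"type": "email", "value": "lin@example.com"}]}'
)


def contact_values(raw: str) -> list:
    """Return contact values, treating a missing 'contacts' field as empty."""
    doc = json.loads(raw)
    return [contact["value"] for contact in doc.get("contacts", [])]
```

Here the schema lives in the application's object model: adding `contacts` changed the JSON that new writes produce, and older documents remain valid as they are.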
The introduction of Scopes and Collections gives you the ability to namespace the JSON documents and organize them in a meaningful manner. Being a flexible-schema database, Couchbase preserves the ability to create, update and modify Scopes and Collections, as well as to create indexes and other associated schema artifacts. Even further, it allows for moving data between Collections, handling elastic capacity scaling and adding or removing nodes – all while a transaction is in flight.
To put it differently, the ACID semantics are guaranteed, despite all this data movement, from physical to logical reorganization across the whole cluster. You don’t have to do any specific administrative structuring of data into appropriate nodes or shards to make this work. When you issue a multi-statement SQL transaction, Couchbase is precisely designed to handle this fluidity of data movement beneath the surface.
This is the flexibility that the NoSQL movement promised. It just wasn’t fully realized until now.
Traditional relational databases make you pay the full ACID price for each and every operation in the database. RDBMSs were built at a time when applications went to the database only for transactional, ledger-like workloads, running on a mainframe with a limited number of users.
This full-ACID approach has proven so costly for today’s applications that in the last 10 years, solution architects have had to choose at the outset of designing a new microservice: Does this microservice require transactional semantics? If yes, then use a relational database. If not, thank god, we can use a scalable, performant, low-latency database where we can develop quickly and scale easily. This choice – which had to be made at the beginning of every new project, application or microservice – led to developer frustration, database sprawl, data inconsistency, data insecurity and data governance issues.
That dilemma is now one for the history books.
Now, What Will You Build?
Developers can now use a single database for scale, low latency, performance and schema flexibility as well as SQL and full-ACID guarantees without compromising on any of their needs. You pay the cost of transactions only when required, and you otherwise liberate the system to perform at scale.
Couchbase 7.0 is a database where you can perform your key-value operations, your SQL queries, your full ACID transactions, your tokenized searches, your event streaming, or your ad-hoc analytical queries – in the cloud or at the edge. Bringing all these engines to bear on the JSON documents that you write once into the database enables your developers to build a new class of applications that have not yet been imagined.
I am excited about what the Couchbase 7.0 release offers, but I’m even more excited when I hear of an application you built and I ask in disbelief: “Wait, you built that on Couchbase?!”
I’m looking forward to that,
Ravi