Simplifying Query & Index with Collections

Couchbase Collections available with the upcoming version 7 brings Couchbase JSON database closer to that of other RDBMS. Collections will make it easier for customers who are looking to migrate their current RDBMS data model to Couchbase document model. This allows customers to benefit from Couchbase distributed shared-nothing database architecture, bringing high availability, and horizontal scalability. In addition to the flexibility of the data model, Couchbase N1QL is the only NoSQL database that provides N1QL, and extended SQL language, for querying its database.

What is a SCOPE

A bucket scope is equivalent to a schema in RDBMS. It is a logical container for Couchbase collections. Every Couchbase bucket contains a default scope. You can use these default containers directly, or define your own.

What is a COLLECTION

A collection is analogous to a table in RDBMS. Every scope has a default collection. While a collection can be used to store similar types of records, similar to RDBMS tables, there is no schema restriction on what you can store in a collection.

System keyspaces

Keyspaces	Description
system:all_scopes	A list of all scopes, including system objects, such as the default scopes
system:scopes	All available scopes except the system scopes
system:all_collections	A list of all collections, including system objects, such as the default collections
system:collections	All available collections except system collections

Key benefits for N1QL Collections

From the N1QL language perspective, both the Data Definition Language(DDL) and Data Manipulation Language (DML) will be simpler as you no longer need a type field on every document.

Query will be simpler and more intuitive

Bucket model

Collection model

SELECT  a.name
  FROM `travel-sample` a  
WHERE a.type='airport'
   AND a.city='London'

SELECT a.name

FROM `travel-sample` a

WHERE a.type='airport'

AND a.city='London'

SELECT a.name 
 FROM  airport a
WHERE a.city='London'

SELECT a.name

FROM airport a

WHERE a.city='London'

The collection model provides you with the equivalent of the relational table, bridging the similarity between the logical and the physical data model, that RDBMS users have accustomed to.

Simpler N1QL Join condition

Bucket model

Collection model

SELECT h.name, 
       a.faa
 FROM `travel-sample` h
   JOIN `travel-sample` a  
      ON h.city=a.city AND a.type='airport'
 WHERE h.type='hotel'
   AND a.city='London'

SELECT h.name,

a.faa

FROM `travel-sample` h

JOIN `travel-sample` a

ON h.city=a.city AND a.type='airport'

WHERE h.type='hotel'

AND a.city='London'

SELECT h.name, 
       a.faa
 FROM hotel h
   JOIN airport a 
     ON h.city=a.city
 WHERE a.city='London'

SELECT h.name,

a.faa

FROM hotel h

JOIN airport a

ON h.city=a.city

WHERE a.city='London'

The need to use the type field to restrict to a specific table of documents with the bucket model for the JOIN syntax is not longer required in the collection model.

Backward compatibility with Couchbase bucket model

Bucket model

Collection model

SELECT a.faa 
 FROM `travel-sample` a
WHERE a.type='airport'

SELECT a.faa

FROM `travel-sample` a

WHERE a.type='airport'

You can either use:

SELECT a.faa
FROM `travel-sample`
WHERE a.type='airport'

OR

SELECT a.faa
FROM default:`travel-sample`._default._default a
WHERE a.type='airport'

You can either use:

SELECT a.faa

FROM `travel-sample`

WHERE a.type='airport'

SELECT a.faa

FROM default:`travel-sample`._default._default a

WHERE a.type='airport'

The bucket model is still supported as before. However, the bucket data is actually stored in the default scope and collection. Please note that the namespace prefix is required, and should have the value default: for all references to the scope or collection.

N1QL Query with Collections

With the Collection model, the query engine needs to be aware of the full path of the collection name. This is because a collection name does not need to be unique in a bucket, but only in its own scope. A fully qualified collection name has the following format.

Format: namespace:bucket.scope.collection
Example: namespace:`travel-sample`.booking.hotel

1 2	Format: namespace:bucket.scope.collection Example: namespace:`travel-sample`.booking.hotel

However, you could reference a collection with its relative path by setting the query_context

Query Workbench UI allows you to set the query context by selecting the bucket and scope from the drop-down box.

The query context is also supported in SDKs, REST, and cbq.

Do you need to migrate to the Collections?

You do not need to migrate if you don’t want to.

Data – All existing data remains in the same bucket. You can reference your documents using the bucket query syntax, or using the new default scope and collection.
Queries – N1QL query syntax for the DDL and DML continue to support the bucket model.
Indexes – The existing indexes will remain at the bucket level, and will continue to be available to all of your queries as before.

Migrating from Bucket to Collections

For this section, I will use the travel-sample bucket as an example of to how to migrate from the bucket model to the collection model.

Data Migration

If your documents already have a field to identify their groups, then use that to create the collection. For the travel-sample, we can use the type field as the collection name. In addition, I will also create an inventory scope for all the travel samples collectioms

CREATE SCOPE `travel-sample`.inventory;
 
CREATE COLLECTION `travel-sample`.inventory.route;

CREATE COLLECTION `travel-sample.inventory.landmark;

CREATE COLLECTION `travel-sample`.inventory.airline;

CREATE COLLECTION `travel-sample`.inventory.hotel;

CREATE COLLECTION `travel-sample`.inventory.airport;

CREATE SCOPE `travel-sample`.inventory;

CREATE COLLECTION `travel-sample`.inventory.route;

CREATE COLLECTION `travel-sample.inventory.landmark;

CREATE COLLECTION `travel-sample`.inventory.airline;

CREATE COLLECTION `travel-sample`.inventory.hotel;

CREATE COLLECTION `travel-sample`.inventory.airport;

In the above example, we have created an ‘inventory’ scope and added new collections in the same travel-sample bucket.

Document Key

Document key needs to be unique. The document key that you currently have for your bucket should already be unique because they all exist in the same bucket. For that reason, the same document key should be adequate for use as a collection document key.

Copy the data

Here I use INSERT SELECT to copy the data from the bucket into the individual collection. I also use the existing META().id bucket key for the collection key.

INSERT INTO `travel-sample`.inventory.landmark (KEY k, VALUE val)
 SELECT META().id k, t val FROM `travel-sample` t WHERE t.type='landmark' ;

INSERT INTO `travel-sample`.inventory.airline (KEY k, VALUE val)
 SELECT META().id k, t val FROM `travel-sample` t WHERE t.type='airline' ;

INSERT INTO `travel-sample`.inventory.hotel (KEY k, VALUE val)
 ELECT META().id k, t val FROM `travel-sample` t WHERE t.type='hotel' ;

INSERT INTO `travel-sample`.inventory.airport (KEY k, VALUE val)&nbsp;
 SELECT META().id k, t val FROM `travel-sample` t WHERE t.type='airport' ;

INSERT INTO `travel-sample`.inventory.route (KEY k, VALUE val)&nbsp;
 SELECT META().id k, t val FROM `travel-sample` t WHERE t.type='route' ;

INSERT INTO `travel-sample`.inventory.landmark (KEY k, VALUE val)

SELECT META().id k, t val FROM `travel-sample` t WHERE t.type='landmark' ;

INSERT INTO `travel-sample`.inventory.airline (KEY k, VALUE val)

SELECT META().id k, t val FROM `travel-sample` t WHERE t.type='airline' ;

INSERT INTO `travel-sample`.inventory.hotel (KEY k, VALUE val)

ELECT META().id k, t val FROM `travel-sample` t WHERE t.type='hotel' ;

INSERT INTO `travel-sample`.inventory.airport (KEY k, VALUE val) 

SELECT META().id k, t val FROM `travel-sample` t WHERE t.type='airport' ;

INSERT INTO `travel-sample`.inventory.route (KEY k, VALUE val) 

SELECT META().id k, t val FROM `travel-sample` t WHERE t.type='route' ;

Index conversion

It is highly likely that you will need to modify your existing bucket indexes for them to be effective for the collection model. This section lists out the common patterns for bucket indexes, and recommends the steps to convert them to a collection based index.

Bucket Index with `type` predicate – i.e. partial index

For a partial index with a specific type field filter, you can simply re-create the new index on the specific collection for the type.

Bucket Model

Collection Model

CREATE INDEX `def_route_src_dst_day` 
ON `travel-sample`(
      `sourceairport`,
      `destinationairport`,
       (distinct (array (`v`.`day`) 
         for `v` in `schedule` end))) 
WHERE (`type` = "route")

CREATE INDEX `def_route_src_dst_day`

ON `travel-sample`(

`sourceairport`,

`destinationairport`,

(distinct (array (`v`.`day`)

for `v` in `schedule` end)))

WHERE (`type` = "route")

CREATE INDEX `def_route_src_dst_day` 
ON `travel-sample`.inventory.route(
      `sourceairport`,
      `destinationairport`,
       (distinct (array (`v`.`day`) 
         for `v` in `schedule` end)))

CREATE INDEX `def_route_src_dst_day`

ON `travel-sample`.inventory.route(

`sourceairport`,

`destinationairport`,

(distinct (array (`v`.`day`)

for `v` in `schedule` end)))

Bucket Index without type predicate

A Global Secondary Index (GSI) index can be created for fields that may or may not exist in the document. For instance, an index may include the ‘icao’ field, but not every document may have the ‘icao’ field. For such an index, the indexer will only include the document that has the field. For this kind of index, you may need to be more specific, and create such an index for the collection where the field exists.

Bucket Model

Collection Model

CREATE INDEX `def_icao` ON `travel-sample`(`icao`)

1	CREATE INDEX `def_icao` ON `travel-sample`(`icao`)

CREATE INDEX `def_icao` 
ON `travel-sample`.inventory.airport(`icao`)

1 2	CREATE INDEX `def_icao` ON `travel-sample`.inventory.airport(`icao`)

Bucket index for common field

A bucket index can also be created without specifying the specific type, if the field exists in multiple document types. Consider the example where the field city exists in multiple document types, i.e. airport, landmark and hotel. But the bucket model only has a single def_city index, which can cover all three document types. For this index, you will now need to create a separate index for each collection.

Bucket Model

Collection Model

CREATE INDEX `def_city` ON `travel-sample`(`city`)

1	CREATE INDEX `def_city` ON `travel-sample`(`city`)

CREATE INDEX `def_airport_city` 
ON `travel-sample`.inventory.airport(`city`);

CREATE INDEX `def_hotel_city` 
ON `travel-sample`.inventory.hotel(`city`);

CREATE INDEX `def_landmark_city` 
ON `travel-sample`.inventory.landmark(`city`);

CREATE INDEX `def_airport_city`

ON `travel-sample`.inventory.airport(`city`);

CREATE INDEX `def_hotel_city`

ON `travel-sample`.inventory.hotel(`city`);

CREATE INDEX `def_landmark_city`

ON `travel-sample`.inventory.landmark(`city`);

Query conversion

Since the underlying data model has changed from a shared bucket to an individual collection based. The queries that you have will also need to be modified. Furthermore, after you have modified the query, you will also need to ensure that the queries are using the new collection based indexes.

Bucket Model

Collection Model

SELECT airline.name, 
       airport.airportname, 
       COUNT(1)
FROM `travel-sample` airport
INNER JOIN `travel-sample` route 
  ON airport.faa = route.sourceairport
  AND route.type = "route"
INNER JOIN `travel-sample` airline 
  ON route.airline = airline.iata
  AND airline.type = "airline"
WHERE airport.type = "airport"
  AND airport.city = "San Jose"
  AND airport.country = "United States"
  AND ANY x in route.schedule 
            SATISFIES x.day =0 END
GROUP BY airline.name, airport.airportname

SELECT airline.name,

airport.airportname,

COUNT(1)

FROM `travel-sample` airport

INNER JOIN `travel-sample` route

ON airport.faa = route.sourceairport

AND route.type = "route"

INNER JOIN `travel-sample` airline

ON route.airline = airline.iata

AND airline.type = "airline"

WHERE airport.type = "airport"

AND airport.city = "San Jose"

AND airport.country = "United States"

AND ANY x in route.schedule

SATISFIES x.day =0 END

GROUP BY airline.name, airport.airportname

SELECT airline.name, 
       airport.airportname,
       COUNT(1)
FROM <del>`travel-sample`</del> airport
INNER JOIN <del>`travel-sample`</del> route 
  ON airport.faa = route.sourceairport
  <del>AND route.type = "route"</del>
INNER JOIN <del>`travel-sample`</del> airline 
  ON route.airline = airline.iata
  <del>AND airline.type = "airline"</del>
WHERE <del>airport.type = "airport"
  AND</del> airport.city = "San Jose"
  AND airport.country = "United States"
  AND ANY x in route.schedule 
          SATISFIES x.day =0 END
GROUP BY airline.name, airport.airportname

SELECT airline.name,

airport.airportname,

COUNT(1)

FROM <del>`travel-sample`</del> airport

INNER JOIN <del>`travel-sample`</del> route

ON airport.faa = route.sourceairport

<del>AND route.type = "route"</del>

INNER JOIN <del>`travel-sample`</del> airline

ON route.airline = airline.iata

<del>AND airline.type = "airline"</del>

WHERE <del>airport.type = "airport"

AND</del> airport.city = "San Jose"

AND airport.country = "United States"

AND ANY x in route.schedule

SATISFIES x.day =0 END

GROUP BY airline.name, airport.airportname

The re-written query without the type filters.

SELECT airline.name, 
       airport.airportname, 
       COUNT(1)
FROM airport
INNER JOIN `travel-sample` route 
  ON airport.faa = route.sourceairport
INNER JOIN `travel-sample` airline 
  ON route.airline = airline.iata
WHERE airport.city = "San Jose"
  AND airport.country = "United States"
  AND ANY x in route.schedule 
            SATISFIES x.day =0 END
GROUP BY airline.name, airport.airportname

The re-written query without the type filters.

SELECT airline.name,

airport.airportname,

COUNT(1)

FROM airport

INNER JOIN `travel-sample` route

ON airport.faa = route.sourceairport

INNER JOIN `travel-sample` airline

ON route.airline = airline.iata

WHERE airport.city = "San Jose"

AND airport.country = "United States"

AND ANY x in route.schedule

SATISFIES x.day =0 END

GROUP BY airline.name, airport.airportname

Documentation

Author

james.oquendo

Couchbase team member

View all posts

Spread the love

Platform

Why Couchbase?

SEE HOW CAPELLA STACKS UP

By Use Case

By industry

By Application need

Popular Docs

By Developer Role

Capella Playground

Start a Capella session

Resource Center

Education

CERTIFICATION EXAMS 2023

Get Couchbase certified

About

Partnerships

Our Services

PARTNERS: REGISTER A DEAL

Ready to register a deal with Couchbase?

Marriott

Simplifying Query & Index with Collections

What is a SCOPE

What is a COLLECTION

System keyspaces

Key benefits for N1QL Collections

Query will be simpler and more intuitive

Simpler N1QL Join condition

Backward compatibility with Couchbase bucket model

N1QL Query with Collections

Do you need to migrate to the Collections?

Migrating from Bucket to Collections

Data Migration

Document Key

Copy the data

Index conversion

Bucket Index with type predicate – i.e. partial index

Bucket Index without type predicate

Bucket index for common field

Query conversion

Documentation

Author

Author

Posted by james.oquendo

Leave a reply Cancel reply

Bucket Index with `type` predicate – i.e. partial index