Lower TCO With SQL++ FLATTEN_KEYS For JSON Arrays

An RDBMS with a well-established schema serves well when application functions remain static. While this rigid structure can guarantee stability, it doesn’t lend itself to fast-changing business requirements. Organizations today must consider modernizing their application infrastructure and moving from RDBMS to NoSQL.

This article looks into some benefits of the NoSQL document model, and how data and indexes can be organized to optimize query performance and index configuration. I will also discuss the index strategy that you can consider when migrating your RDBMS data model to the Couchbase NoSQL database, and how the Couchbase 7.1 FLATTEN_KEYS feature can help improve query performance and reduce the number of indexes.

An effective index strategy is one of the most important factors in operating a database. It helps us get the right balance between query performance and resource management. The database should not only efficiently capture data, it should also provide the most optimal access to that data. NoSQL databases are no different from RDBMS when it comes to an effective index strategy.

An RDBMS use case

Let’s consider the hotel object in the Couchbase travel-sample dataset and assume that we use the RDBMS model to capture this information. The relational model would look like this:

The hotel object has all the detailed information about the hotel. There is also a hotel_like object that records the names of customers who have clicked on like on the hotel’s social media page. There is also a hotel_review object that records all hotel reviews, including comments and detailed ratings for the different services and amenities.

Business requirement

Create a query to allow users to get a summary view of all the hotels in a particular city that they plan to visit. The summary should include the review ratings of each hotel, and how many people have liked the hotel. To narrow down the list, the query should focus on the recent reviews (2015 for this data set) and the review ratings should be 4 or above (5 is the highest rating).

The relational model query

SELECT  h.name hotel,
  count(distinct l.name) liked_by, 
  count(distinct r.author)  review_author,
  avg(r.ratings.Overall) avg_ratings
FROM  hotel h
  INNER JOIN hotel_review r ON r.hotel_id = h.id
  INNER JOIN hotel_like l ON l.hotel_id = h.id
WHERE h.city='London'
  AND   r.ratings.Overall >= 4
  AND   r.date > '2015-01-01'
  GROUP BY h.name
  ORDER BY avg(r.ratings.Overall) DESC

SELECT h.name hotel,

count(distinct l.name) liked_by,

count(distinct r.author) review_author,

avg(r.ratings.Overall) avg_ratings

FROM hotel h

INNER JOIN hotel_review r ON r.hotel_id = h.id

INNER JOIN hotel_like l ON l.hotel_id = h.id

WHERE h.city='London'

AND r.ratings.Overall >= 4

AND r.date > '2015-01-01'

GROUP BY h.name

ORDER BY avg(r.ratings.Overall) DESC

The index for a relational model query

hotel: Since there is a filter on city and name, an index should exist. Note that the hotel id is appended to the index because it can help with the JOIN.

CREATE INDEX adv_city_name_id ON `hotel`(`city`,`name`,`id`)

1	CREATE INDEX adv_city_name_id ON `hotel`(`city`,`name`,`id`)

hotel_review: There are two filters on the hotel reviews: the review date and the review ratings. There is also a need to eliminate double counting, because there is no explicit relationship between hotel_like and hotel_review, so the review author is added here for that purpose. The hotel_id field is also added to the index because it is the foreign key and can help with the JOIN.

CREATE INDEX adv_ratings_Overall_date_hotel_id_author ON `hotel_review`(`ratings`.`Overall`,`date`,`hotel_id`,`author`)

1	CREATE INDEX adv_ratings_Overall_date_hotel_id_author ON `hotel_review`(`ratings`.`Overall`,`date`,`hotel_id`,`author`)

hotel_review: While there are no filters on hotel_like, there is a need to eliminate double counting of likes, because there is no relationship between review and like. The hotel_id field is added to the index because it is the foreign key and can help with the JOIN.

CREATE INDEX adv_hotel_id_name ON `hotel_like`(`hotel_id`,`name`)

1	CREATE INDEX adv_hotel_id_name ON `hotel_like`(`hotel_id`,`name`)

The execution plan for the relational model query

Notice the need to perform:

1. Two JOINs
2. Three index scans (one for each object)

The Document Model view

For the document model, both the hotel_like and the hotel_review objects are stored as arrays in the hotel object. There is no strict rule that you should always include child objects as an array in the parent object, but it does make sense to do so in this case because these objects are always accessed together.

The denormalized document model query

SELECT h.name hotel,
    COUNT(distinct l) liked_by,
    COUNT(distinct r.author)  review_author,
    AVG(r.ratings.Overall) avg_ratings
  FROM  hotel h
    UNNEST h.reviews AS r 
    UNNEST  h.public_likes AS l 
  WHERE h.city='London'
    AND   r.ratings.Overall >= 0 
    AND   r.date > '2015-01-01'
  GROUP BY h.name
  ORDER BY avg(r.ratings.Overall) DESC

SELECT h.name hotel,

COUNT(distinct l) liked_by,

COUNT(distinct r.author) review_author,

AVG(r.ratings.Overall) avg_ratings

FROM hotel h

UNNEST h.reviews AS r

UNNEST h.public_likes AS l

WHERE h.city='London'

AND r.ratings.Overall >= 0

AND r.date > '2015-01-01'

GROUP BY h.name

ORDER BY avg(r.ratings.Overall) DESC

The index for the denormalized document model

CREATE INDEX adv_ALL_reviews_ratings_Overall_date_city 
 ON `hotel`(
             ALL ARRAY
                FLATTEN_KEYS(`r`.`ratings`.`Overall`,`r`.`date` DESC) 
                  FOR r IN `reviews` END,
            `city`)

CREATE INDEX adv_ALL_reviews_ratings_Overall_date_city

ON `hotel`(

ALL ARRAY

FLATTEN_KEYS(`r`.`ratings`.`Overall`,`r`.`date` DESC)

FOR r IN `reviews` END,

`city`)

A few points to note:

1. The SQL++ query references a single hotel object, so there is no need to perform any explicit JOIN between the parent hotel object and the child like or review
2. The query uses a single index that covers all the query predicates, i.e. the hotel.city, the reviews.ratings, and the reviews.date
3. When an array is indexed, the index can only be on a single key. FLATTEN_KEYS() allows composite fields from the array, thus allowing the predicates to be on multiple array fields.

The execution plan for the document model query

Notice the need to perform

1. Two UNNEST (no JOINs are needed as it is in single document)
2. Only ONE index scan

Summary

When a relational model is denormalized into a single object in the document model:

1. The SQL++ query is simpler because there is no need to perform any JOINs
2. Multiple relational model indexes could be combined into a single document model
3. The Couchbase 7.1 FLATTEN_KEYS feature combines multiple array element predicates into a single index.

This is just one of the new features we have delivered recently – read more about What’s New in Couchbase Server 7.1.

Madhuram Gupta

Platform

Self-Managed

Services

Capabilities

Why Couchbase?

Vector Search

By Use Case

By Industry

By Application Need

Popular Docs

By Developer Role

Capella Playground

Start A Free Capella Trial

Resource Center

Education

Compare

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

All Posts