Introduction to remote links
Couchbase is excited to announce its new Remote Links Analytics Service feature in the latest Couchbase Server 6.6 release. Remote links enable real-time operational analytics to obtain and analyze data from multiple Couchbase data clusters and datacenters in a separate cluster dedicated to the Analytics Service.
Customer use case
Prior to the 6.6 release, the Analytics Service was available within one cluster, but the service and its analyses were tied to that cluster. Several of our retail, lifestyle, and travel customers were performing analytics for their business lines (e.g., e-commerce, marketing, supply chain) in separate Couchbase clusters. They expressed a desire to unify data from various operational applications into a centralized analytics cluster. This motivated our engineering and product teams to address this customer need. You can read more about other Analytics use cases here.
How do remote links work?
Remote links allow data to be ingested from the Data Service of a remote Couchbase cluster into an Analytics cluster. This is achieved in three simple steps:
- Set up a remote link by using a REST API call or the command-line interface (CLI)
- Create a dataset in the Analytics cluster on the remote link configured above
- Query the dataset using SQL++ (or your favorite BI tool)
Let’s walk through a simple example. iWorks, an e-commerce company, sells iPhone accessories online. The order data is stored in one Couchbase cluster in a bucket called “ecommerce” with docType “order”. The customer data is stored in a second Couchbase cluster in a bucket called “customer360” with docType “customer”. iWorks would like to use the Analytics Service to combine and analyze order data along with customer data to determine the top 3 customers by sales. The illustration directly below shows the setup before remote links are configured:
Sample customer data:
[
  {
    "custid": "C31",
    "name": "D. Pitts",
    "docType": "customer",
    "address": {
      "street": "360 Mountain Ave.",
      "city": "St. Louis, MO",
      "zipcode": "63101"
    }
  },
  {
    "custid": "C35",
    "name": "F. Robert",
    "docType": "customer",
    "address": {
      "street": "420 Green St.",
      "city": "Boston, MA",
      "zipcode": "02115"
    },
    "rating": 565
  }
]
Sample order data:
[
  {
    "orderno": 1004,
    "custid": "C35",
    "docType": "order",
    "order_date": "2020-07-10",
    "ship_date": "2020-07-15",
    "items": [
      { "itemno": 680, "qty": 6, "price": 10.00 },
      { "itemno": 195, "qty": 4, "price": 20.00 }
    ]
  },
  {
    "orderno": 1050,
    "custid": "C31",
    "docType": "order",
    "order_date": "2020-06-05",
    "ship_date": "2020-06-12",
    "items": [
      { "itemno": 680, "qty": 4, "price": 10.00 },
      { "itemno": 195, "qty": 2, "price": 20.00 }
    ]
  }
]
Let’s follow the three steps from above with sample setup code along with a SQL++ query.
Step 1: Set up remote links
We’ll create two remote links on a new Analytics cluster using a REST API call. (Alternatively, you can use the CLI to create remote links.) Let’s first set up the “order” remote link. We will need to provide:
- Analytics cluster hostname
- Analytics user credentials
- Remote link name (in this case remoteOrders)
- Dataverse name (if different from default)
- Link type as couchbase
- Order cluster hostname
- Order user credentials
- Encryption type (in this case none)
$ curl -u <username>:<pwd> -X POST "http://<analytics_hostname>/analytics/link" \
    -d dataverse=Default \
    -d name=remoteOrders \
    -d type=couchbase \
    -d hostname=<orders_hostname> \
    -d username=<orders_username> \
    -d password=<orders_password> \
    -d encryption=none
Let’s now set up the “customer” remote link on the Analytics cluster. This step is similar to the one listed above, except we have to provide a new remote link name (in this case remoteCustomers) along with customer cluster host details and credentials. In this case we choose “full” as the encryption type (for illustration purposes) and we include the required certificate parameter.
$ curl -u <username>:<pwd> -X POST "http://<analytics_hostname>/analytics/link" \
    -d dataverse=Default \
    -d name=remoteCustomers \
    -d type=couchbase \
    -d hostname=<customer_hostname> \
    -d username=<customer_username> \
    -d password=<password> \
    -d encryption=full \
    --data-urlencode "certificate=$(cat ./targetClusterRootCert.pem)"
The certificate in targetClusterRootCert.pem can be retrieved from the web console of the target cluster.
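The two link-creation calls differ only in their form fields, so they lend themselves to scripting. The following is a minimal Python sketch, not an official client: it reuses the /analytics/link path and the field names from the curl commands above, while the function name build_link_request and any host names passed to it are hypothetical. It only builds the request and leaves sending it to the caller.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_link_request(analytics_host, admin_user, admin_password,
                       name, remote_host, remote_user, remote_password,
                       encryption="none", certificate=None,
                       dataverse="Default"):
    """Build (but do not send) the POST request that creates a remote link.

    Field names mirror the curl examples in this article; everything else
    here is an illustrative assumption.
    """
    fields = {
        "dataverse": dataverse,
        "name": name,
        "type": "couchbase",
        "hostname": remote_host,
        "username": remote_user,
        "password": remote_password,
        "encryption": encryption,
    }
    if encryption == "full":
        # The article notes that full encryption requires a certificate.
        if certificate is None:
            raise ValueError("encryption=full requires a certificate")
        fields["certificate"] = certificate

    # urlencode plays the role of curl's -d / --data-urlencode flags.
    body = urlencode(fields).encode("utf-8")
    request = Request(f"http://{analytics_host}/analytics/link",
                      data=body, method="POST")

    # Equivalent of curl's -u <username>:<pwd> (HTTP basic authentication).
    token = base64.b64encode(
        f"{admin_user}:{admin_password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the returned object to urllib.request.urlopen would send it. For the “customer” link, the certificate argument would hold the text of targetClusterRootCert.pem.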
The illustration below is after both remote links are set up:
Step 2: Create datasets and connect remote links
Using the Analytics workbench, we’ll now create two datasets named “orders” and “customers” on the two remote links we created above:
CREATE DATASET orders
ON `ecommerce` AT remoteOrders
WHERE docType = 'order';
CREATE DATASET customers
ON `customer360` AT remoteCustomers
WHERE docType = 'customer';
Next, we’ll go ahead and connect both the remoteOrders and remoteCustomers links to allow data ingestion to take place from the Orders and Customers data clusters to the Analytics cluster. This demonstrates the powerful NoETL feature of JSON analytics. To be clear, no ETL is needed to move our NoSQL JSON data from one system to another before being able to analyze it. This saves time and processing power, enabling us to analyze the data right away and in its natural (application) form on the Analytics cluster.
CONNECT LINK remoteOrders;

CONNECT LINK remoteCustomers;
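When more clusters are added later, the same two statement shapes repeat for every source. As a rough sketch under that assumption, the Python helper below generates the statements shown in this step from a small table of sources. The Source type and setup_statements function are hypothetical names, and the string formatting is deliberately naive: it assumes trusted, simple identifiers and does no escaping.

```python
from typing import NamedTuple


class Source(NamedTuple):
    dataset: str   # dataset name on the Analytics cluster
    bucket: str    # bucket on the remote data cluster
    link: str      # remote link created in Step 1
    doc_type: str  # value of docType to filter on


SOURCES = [
    Source("orders", "ecommerce", "remoteOrders", "order"),
    Source("customers", "customer360", "remoteCustomers", "customer"),
]


def setup_statements(sources):
    """Return all CREATE DATASET statements, then one CONNECT LINK per link.

    This keeps the order used in the article: define every dataset first,
    then connect the links so ingestion can start.
    """
    creates = [
        f"CREATE DATASET {s.dataset} ON `{s.bucket}` AT {s.link} "
        f"WHERE docType = '{s.doc_type}';"
        for s in sources
    ]
    # dict.fromkeys removes duplicate link names while preserving order.
    links = list(dict.fromkeys(s.link for s in sources))
    connects = [f"CONNECT LINK {link};" for link in links]
    return creates + connects
```

Joining the returned list with newlines produces a script that can be pasted into the Analytics workbench.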
Step 3: Query using SQL++
As the last step, we can now run the SQL++ query listed below (looks exactly like SQL :)) to join orders and customers to get the top 3 customers with the highest sales.
SELECT c.name Customer, SUM(i.qty * i.price) Sales
FROM orders o, o.items i, customers c
WHERE o.custid = c.custid
GROUP BY c.name
ORDER BY Sales DESC
LIMIT 3;
Here are the JSON query results:
[
  { "Customer": "D. Pitts", "Sales": 19005.31 },
  { "Customer": "F. Robert", "Sales": 13036.8 },
  { "Customer": "S. Weaver", "Sales": 4639.92 }
]
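To make the query’s logic concrete, here is a small Python sketch that mimics its steps (unnest the items, join on custid, group by name, sum qty * price, sort, limit) over only the two sample customers and two sample orders shown earlier. It is an illustration of the semantics, not how the Analytics Service executes the query, and the function name top_customers is hypothetical. The figures in the results above come from iWorks’ full dataset, so the numbers here differ.

```python
from collections import defaultdict

# Trimmed copies of the sample documents from earlier in this article.
customers = [
    {"custid": "C31", "name": "D. Pitts", "docType": "customer"},
    {"custid": "C35", "name": "F. Robert", "docType": "customer"},
]
orders = [
    {"orderno": 1004, "custid": "C35", "docType": "order",
     "items": [{"itemno": 680, "qty": 6, "price": 10.00},
               {"itemno": 195, "qty": 4, "price": 20.00}]},
    {"orderno": 1050, "custid": "C31", "docType": "order",
     "items": [{"itemno": 680, "qty": 4, "price": 10.00},
               {"itemno": 195, "qty": 2, "price": 20.00}]},
]


def top_customers(orders, customers, limit=3):
    """Mimic the join, unnest, GROUP BY, ORDER BY, and LIMIT of the query."""
    # Assumes custid is unique per customer document.
    name_by_id = {c["custid"]: c["name"] for c in customers}

    sales = defaultdict(float)
    for order in orders:
        name = name_by_id.get(order["custid"])
        if name is None:
            continue  # inner join: orders without a matching customer drop out
        for item in order["items"]:  # FROM orders o, o.items i
            sales[name] += item["qty"] * item["price"]  # SUM(i.qty * i.price)

    ranked = sorted(sales.items(), key=lambda pair: pair[1], reverse=True)
    return [{"Customer": name, "Sales": total} for name, total in ranked[:limit]]


print(top_customers(orders, customers))
```

On the sample documents alone, F. Robert ranks first with 6 * 10 + 4 * 20 = 140 and D. Pitts second with 4 * 10 + 2 * 20 = 80.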
Woohoo! Remote links worked and we are now able to combine and analyze customer and order data together. Users can now develop a variety of complex ad hoc queries for further data exploration, answer new business questions, and bring in additional Couchbase data sources.
Benefits
Here are key benefits that come from using remote links:
- Extend Analytics’ reach. Ingesting data from multiple clusters enables more data to be consolidated. Use cases include combining and correlating data from multiple locations or multiple applications, as we have just seen.
- Lower Analytics’ total cost of ownership. The possibility of an independent Analytics cluster can reduce or eliminate the need for Analytics nodes to be included in each individual cluster, again as we have seen in the example above.
- Enable even faster time to insight. Customers can gain more insight immediately by performing correlations across different datasets without requiring the data of interest to first be published to a data warehouse. Notice how few steps were needed to enable us to analyze our data; no ETL was involved and the data was immediately available.
Summary
Remote links help lower TCO, improve resource utilization, and enable hybrid transactional/analytical processing (HTAP) for NoSQL solution development and deployments, as is often needed in modern applications. Remote links allow users to bring more data together in a single place, which enables organizations to gather more insights and do more correlation-style analyses across different datasets drawn from different clusters.
You can learn more about Remote Links here. Register here for our upcoming “What’s new in release 6.6” webinar.
Explore Couchbase Server 6.6 resources
Co-author
Idris Motiwala, Principal Product Manager
Idris is a Principal Product Manager, Analytics at Couchbase with 20+ years of experience in the design, development, and execution of software products at both Fortune 500 companies and startups, leading teams in digital transformation, cloud, and analytics. Idris holds an MS in Technology Management and certifications in product management.