Introduction

One of our exciting new features in Couchbase Server 7.0 Beta is support for User-Defined Functions (UDFs) in the Analytics Service. UDFs are reusable and parameterizable N1QL queries that can be used to modularize queries and increase code reuse.

UDF’s Definition

UDFs have two parts:

  1. Function Signature: The signature consists of the function name and number of parameters. A function can take zero, one, or more parameters. Each function belongs to a dataverse and must have a unique signature within that dataverse. This means a user can create two functions in the same dataverse that have the same function name but a different number of parameters
  2. Function Body: A function body can be either an expression or a subquery and can refer to existing datasets or to other functions in the same or different dataverses.

Here are two examples showing the signature (my_dataverse.GetOrders(…)) and the body (everything between the curly braces):

If the CREATE ANALYTICS FUNCTION statement doesn’t provide a dataverse name, then the function is created in the active dataverse. If a dataverse is not specified, then the function will be created in the “Default” dataverse. A dataverse can be provided via a “USE <<dataverse>>” statement in the query or via a query_context REST API parameter.

UDF example in Analytics Service

Before we show the full lifecycle of an Analytics UDF, let’s first establish some context. Diagram A below represents sample data that could be used by a travel business to manage airline and hotel bookings. We have a bucket named travel (similar to a database in an RDBMS), with a scope called inventory (similar to a schema in an RDBMS), and four collections named airline, airport, hotel, and route (similar to tables in an RDBMS). For each of these collections, we create four corresponding Analytics collections (aka datasets) and organize them within the travel.inventory scope (aka dataverse).

Diagram A

Sample use case

For this use case, assume your analytics team wants to query frequently which airlines within a given country fly the most routes and what their percentile ranks are. Let’s see how we can use UDFs to help out the team.

Manage your UDF’s 

For this example, we create the UDF in the Analytics scope or dataverse designated as travel.inventory. Here’s the DDL statement to create a function that will render the formulation  of our query much more succinct:

The UDF takes the parameter in_country as input for the country name, and it is used in the query as a filter. The query performs a join between airline and route collections and performs aggregations for each airline in order to:

  1. Count number of routes 
  2. Calculate the percentile rank of route counts using the PERCENT_RANK function with OVER clause. (The OVER clause retrieves a specific set of rows relative to the current row and performs an aggregation over the id field. You can learn more about window functions in this blog post)

Evaluate your UDF’s

The UDF we created can now be used in the following  query that determines the top 3 airlines for a given country (“United States” in this case):

The result of evaluating this query shows the top 3 airlines flying with most routes along with their percentile ranks:

Drop your UDFs

Once the UDF in our example is not needed anymore it can be dropped using this DDL statement:

Summary

Reusable UDFs are helpful to streamline and modularize your code while also providing more flexibility for your system.

Next Steps

I hope you are excited about this new functionality of UDF’s for Analytics from Couchbase. We’d love to hear your thoughts with your comments below about the new UDF feature and how you’re using it. Below is a list of resources for you to get started and we look forward to your feedback on Couchbase Forums

Documentation 

What’s new

Release notes

Get the Beta

Download

Author

Spread the love

Author

Posted by james.oquendo

Couchbase team member

Leave a reply