N1QL Now Supports User-Defined Functions

Declarative query languages have been a major gear shift in the world of database engines.

The SQL++ query language (formerly N1QL) is primarily a declarative query language. You tell the query what to get and N1QL works out the rest of the details of how to make it happen.

Yet, the ability to programmatically instruct your database query comes in handy in several situations. After all, you know your own business logic – N1QL does not.

Enter user-defined functions (UDFs). UDFs give you more control over a particular query and let you instruct the query language on how certain tasks are done. The Couchbase Server 7.0 release includes user-defined functions for the N1QL query language.

N1QL is rather agnostic as far as underlying programming languages go. Rather than specifying its own procedural language, N1QL employs a Language Manager, meaning that it is already designed to support a multitude of guest languages.

For now, the languages supported are inline, an internal language that allows you to code any valid N1QL expression (including subqueries) and JavaScript. Let’s begin by comparing these two supported languages for N1QL UDFs.

N1QL UDF Example: Basic Usage

In order to add your business logic, you have to create functions, like the inline example below:

CREATE FUNCTION add(arg1, arg2) { arg1 + arg2 }

1	CREATE FUNCTION add(arg1, arg2) { arg1 + arg2 }

You can then use your business logic freely wherever an expression is allowed, or directly through the EXECUTE FUNCTION statement:

select add(1, 2)
EXECUTE FUNCTION add(1, 2)

1 2	select add(1, 2) EXECUTE FUNCTION add(1, 2)

And when you no longer have a use for it, you just drop it:

DROP FUNCTION add

1	DROP FUNCTION add

N1QL UDF Example: JavaScript

Creating and dropping JavaScript user-defined functions is slightly more convoluted, for reasons that will become apparent in the next section.

JavaScript UDFs are technically external functions. That’s because they’re written in a different language and are executed in process different from the N1QL Query Service.

Your first step is to create the JavaScript code:

curl -v -X POST http://localhost:8093/evaluator/v1/libraries/math -u Administrator:password -d 'function add(a, b) { let data = a + b; return data; }'

1	curl -v -X POST http://localhost:8093/evaluator/v1/libraries/math -u Administrator:password -d 'function add(a, b) { let data = a + b; return data; }'

Next, create the N1QL function:

CREATE FUNCTION javascriptAdd(a, b) LANGUAGE javascript AS "add" AT "math"

1	CREATE FUNCTION javascriptAdd(a, b) LANGUAGE javascript AS "add" AT "math"

Once created, you can create and drop JavaScript UDFs exactly like inline ones. When you no longer need the library, you can delete it using this command:

curl -v -X DELETE http://localhost:8093/evaluator/v1/libraries/math -u Administrator:password

1	curl -v -X DELETE http://localhost:8093/evaluator/v1/libraries/math -u Administrator:password

New in Couchbase 7.0: User-Defined Functions for Scopes & Collections

One of the major features of the Couchbase Server 7.0 release is the introduction of Scopes and Collections to further help you organize your JSON documents. Collections are storage units for a group of similar documents, and Scopes are larger storage units holding multiple Collections, allowing you to partition applications.

With Scopes and Collections,
Couchbase Server – a document database – now offers all the strengths of the relational model – schema, tables, columns – without any of its weaknesses.

Quick side-step: In order to allow backward compatibility, the Query Service now has a query_context REST API parameter that indicates which Bucket and Scope should be used to resolve relative keyspace names, for example:

cbq> \set -query_context "default:travel-sample.scope1";
cbq> select * from airlines;

1 2	cbq> \set -query_context "default:travel-sample.scope1"; cbq> select * from airlines;

In the statement above, the SELECT statement resolves airlines to the Collection named default:travel-sample.scope1.airlines.

Okay, now back to user-defined functions:

The logic behind Scopes and Collections with UDFs is that you could have a development, pre-production and production environments of the same application in three different Scopes, where three copies of the same application are deployed with the same logic. Conversely, you could have Scopes contain different deployments of the same application where different business logic is required (e.g., online shops where different discount or delivery structures are applied, or an accountancy application, where different taxation rules are applied).

In the first case, having a global definition for individual functions suffices because the business logic is the same. But in order to support the second case, UDFs had to be partitioned across Scopes. Each Scope can have its own instance of the same UDF, each with different logic.

It’s also worth noting that Couchbase Server 7.0 does not force you to switch over to Scopes and Collections, so it was important that UDFs not be tied to the Collections feature.

A Note on Global UDFs

Global UDFs are not dependent on Scopes, meaning that they are backward compatible with UDFs introduced in 6.5.

If you are not using Collections, you don’t have to do anything special to use them.
Global UDFs are referenced using a two-part fully qualified name, as below:

CREATE FUNCTION default:bucket1.scope1.func1()  { 0 };
EXECUTE FUNCTION default:bucket1.scope1.func1();

1 2	CREATE FUNCTION default:bucket1.scope1.func1() { 0 }; EXECUTE FUNCTION default:bucket1.scope1.func1();

The Importance of the query_context Parameter

“Okay, now I am confused,” I hear you say. “Which functions were you using then in the previous examples? The names weren’t fully qualified.”

The answer lies in the query_context REST API parameter setting: if the query_context is unset, the N1QL parser resolves the name to global functions. If the query_context is set, the parser uses the query_context value to resolve the names to the relevant Scope functions, much like it did with non-qualified keyspace names.

You can seamlessly deploy your application against Scopes or against Buckets just by changing the setting of the query_context REST API parameter.

Object Resolution Inside Functions

When user-defined functions reference fully qualified objects, there’s no ambiguity as to which object they mean:

CREATE FUNCTION func { (SELECT * FROM default:bucket1.scope1.collection1) };

1	CREATE FUNCTION func { (SELECT * FROM default:bucket1.scope1.collection1) };

The next question is, how do functions resolve objects referenced inside them? For instance:

CREATE FUNCTION func1() { func2() };
CREATE FUNCTION default:func3 { (SELECT * FROM keyspace1) };

1 2	CREATE FUNCTION func1() { func2() }; CREATE FUNCTION default:func3 { (SELECT * FROM keyspace1) };

The key here is the principle of least surprise: during execution, user-defined functions switch query context to the path under which they were created and always reference objects inside that path.

It does not matter if you called them with a relative or a fully qualified path, a function called with the same parameters always returns the same results, which is taken from the same objects.

In the first example above, func2() would resolve to a global or a Scope function depending on the query_context setting at creation time.

For func3(), keyspace1 would be the Bucket keyspace1.

Mixing Global vs. Scope Functions

There’s nothing to stop you from using global UDFs alongside the Collections feature: just use the fully qualified names.

Similarly, if you want to use a Scope function created in a different Scope, all you have to do is reference it directly.

JavaScript Functions, Revisited

Here’s why JavaScript functions should be split into two parts – i.e., a body and a separate function definition:

- When you create the body separately from the function definition, you can reuse the same body in multiple places. The same function in multiple places could reference the same body, and when the body is edited or redefined, then that change automatically applies across all related function definitions.
- Similarly, this pattern is also important when it comes to dropping or deleting functions. When you need to drop one instance of a function, the body needs to stay in place if it’s used by other instances.

Tips & Tricks for Using N1QL User-Defined Functions in Couchbase 7.0

Naming Your UDFs

UDF names are identifiers and cannot match predefined function names. If the names do match, then the predefined function takes precedence, meaning your UDF doesn’t get used.

If you want to use a predefined function name (for some reason), you have to fully qualify it when you reference it. For example:

CREATE FUNCTION length(arg) { 0 };
SELECT default:length(type) FROM `travel-sample`;

1 2	CREATE FUNCTION length(arg) { 0 }; SELECT default:length(type) FROM `travel-sample`;

UDF Parameters

N1QL user-defined functions support three types of parameter lists:

Empty parameters

CREATE FUNCTION func1() { 0 }

1

CREATE FUNCTION func1() { 0 }

The function takes no parameters. Any parameter passed results in an error.
Variadic parameters

CREATE FUNCTION func1(...) { array_length(args) }

1

CREATE FUNCTION func1(...) { array_length(args) }

Three dots denote a variadic function. It can take any number of parameters of any type. The parameters are contained in an array names args.
Named parameters

CREATE FUNCTION func1(arg1) { arg1 }

1

CREATE FUNCTION func1(arg1) { arg1 }

The parameters are not typed, but the number of arguments passed is enforced.

Some Tips on Overloading & Type Handling

Untyped parameters and variadic functions are the closest that Couchbase will get to function overloading: where the same function is defined multiple times with a different parameter list – or different parameter types – in order to be able to operate differently depending on the input.

I would recommend this N1QL strategy instead: Have a single function which checks the arguments received and acts accordingly.

An example of a variadic function follows:

CREATE FUNCTION variadic(...) { CASE WHEN array_length(args) != 1 THEN "wrong args: " || to_string(array_length(args))  WHEN type(args[0]) = "string" THEN args[0] ELSE "wrong type " || type(args[0]) || ": " || to_string(args[0]) END }

1	CREATE FUNCTION variadic(...) { CASE WHEN array_length(args) != 1 THEN "wrong args: " \|\| to_string(array_length(args)) WHEN type(args[0]) = "string" THEN args[0] ELSE "wrong type " \|\| type(args[0]) \|\| ": " \|\| to_string(args[0]) END }

And a non-variadic function:

CREATE FUNCTION twoargs(arg1, arg2) { CASE WHEN type(arg1) != "string" THEN "wrong arg1 " || type(arg1) || ": " || to_string(arg1) WHEN type(arg2) != "string" THEN "wrong arg2 " || type(arg2) || ": " || to_string(arg2) ELSE arg1 || arg2 END }

1	CREATE FUNCTION twoargs(arg1, arg2) { CASE WHEN type(arg1) != "string" THEN "wrong arg1 " \|\| type(arg1) \|\| ": " \|\| to_string(arg1) WHEN type(arg2) != "string" THEN "wrong arg2 " \|\| type(arg2) \|\| ": " \|\| to_string(arg2) ELSE arg1 \|\| arg2 END }

Parameter Names vs. Document Fields

Consider the following function:

CREATE FUNCTION docsOfType(type) { (SELECT * from `travel-sample` WHERE type=type) }

1	CREATE FUNCTION docsOfType(type) { (SELECT * from `travel-sample` WHERE type=type) }

Notice how the function declared a parameter that happens to have the same name as a document field? Clearly that function doesn’t achieve its intent – as there’s no way to distinguish in between the two.

Here’s how you use parameters names to get around this problem: Parameter names override document fields (either way, the above query returns all documents in the travel-sample dataset).

In order to access document fields, you have two choices. Either refer to the document field with their fully qualified name, as you can see below:

CREATE FUNCTION docsOfType(type) { (SELECT * from `travel-sample` WHERE travel-sample.type=type) }

1	CREATE FUNCTION docsOfType(type) { (SELECT * from `travel-sample` WHERE travel-sample.type=type) }

Or, remove the ambiguity by renaming the parameter, as below:

CREATE FUNCTION docsOfType(vType) { (SELECT * from `travel-sample` WHERE type=vType) }

1	CREATE FUNCTION docsOfType(vType) { (SELECT * from `travel-sample` WHERE type=vType) }

Return Values with UDFs

User-defined functions only return one value of any type. If you need to return more than one value, return an array or an object.

But watch your return types! Remember that SELECT statements executed inside UDFs return arrays, so even if your SELECT returns just one document, it is a one-element array. Instead, return the first element of the array.

For example, this function below doesn’t do what you want:

CREATE FUNCTION tsample() { (SELECT * FROM `travel-sample` LIMIT 1) }

1	CREATE FUNCTION tsample() { (SELECT * FROM `travel-sample` LIMIT 1) }

But this one does:

CREATE FUNCTION tsample1st() { (SELECT * FROM `travel-sample` LIMIT 1)[0] }

1	CREATE FUNCTION tsample1st() { (SELECT * FROM `travel-sample` LIMIT 1)[0] }

Using UDF return values as data to query

At the opposite side of the spectrum, since UDFs can return arrays, they can be gainfully used to dynamically generate data to be queried, much like your would do in Relational engines with concepts like Table Functions or Collection Derived Tables, as in the following example

SELECT * FROM tsample() sample

1	SELECT * FROM tsample() sample

Although the example function selects data from a bucket, any valid way to construct an array can be used to generate the data.

When You Need to Update an Existing Function

Sometimes you need to redefine a function. The OR REPLACE clause of the CREATE FUNCTION statement allows you to do just that in a single step:

CREATE OR REPLACE FUNCTION tsample1st() { (SELECT * FROM `travel-sample` LIMIT 1)[0] }

1	CREATE OR REPLACE FUNCTION tsample1st() { (SELECT * FROM `travel-sample` LIMIT 1)[0] }

User Access Privileges

Any N1QL statement executed inside a UDF is executed with the same privileges as the user who submitted the request. As such, your user needs to have appropriate privileges to access all of the objects that the given UDF references.

On top of all this, the user also needs to have the privilege to execute functions. Different privilege levels exist for internal, external, global and Scope-level functions. In order to create and drop functions, the user needs to have been granted permission to manage functions. For example:

GRANT query_manage_global_functions TO user1;
GRANT query_execute_external_functions ON default:test.scope1 TO user1;

1 2	GRANT query_manage_global_functions TO user1; GRANT query_execute_external_functions ON default:test.scope1 TO user1;

Conclusion

I hope you found this post helpful in understanding when and how to leverage N1QL user-defined functions with Couchbase 7.0. Check out the documentation for more information on user-defined functions, context on N1QL objects and query_context and security roles for UDFs.

Itching to try out UDFs for yourself?
Take Couchbase 7 for a spin

Marco Greco, Software Architect, Couchbase

Platform

Self-Managed

Services

Capabilities

Why Couchbase?

Migrate to Capella

By Use Case

By Industry

By Application Need

Popular Docs

By Developer Role

Quickstart

Resource Center

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

All Posts