Introduction

Couchbase Server 6.5 introduced User Defined Functions as a Developer Preview feature.

N1QL is primarily a declarative language – where you ask what to get, and the language works out for you the actual method.

While declarative languages have been the major gear shift in the world of database engines, the ability to programmatically instruct your query comes handy in several situations: after all you know of your business logic, N1QL does not.

Enter UDFs – your path to instructing the language as to how certain things are done.

N1QL is rather agnostic as far as underlying programming languages go, and rather than specifying its own procedural language, it employs a Language Manager, meaning that it is already designed to support a multitude of guest languages.

For now the languages supported are Inline, an internal language that allows you to code any valid N1QL expression (including subqueries) and Javascript.

Basic usage

In order to add your business logic, you have to create functions, like the inline example below:

You can then use your business logic freely wherever an expression is allowed, or directly through the EXECUTE FUNCTION statement:

And when you no longer have a use for it, you just drop it:

Javascript

Creating and dropping javascript UDFs is slightly more convoluted, for reasons that will become apparent in the next section.

Javascript functions are technically external functions, in that they written in a different language and exist and are executed in processes different to the N1QL service.

This requires first creating the javascript code:

Next step, creating the N1QL function:

Once created, javascript UDFs are used and dropped exactly as inline ones.

When the function body is no longer needed, it can be deleted with

New for 7.0: UDFs and collections

One of the major features getting past Developer Preview in 7.0 is Collections – Couchbase Server, a document database, now has the ability to group documents of a similar nature in storage units called Collections.

In turn Collections can be grouped in larger storage units called Scopes, thereby allowing to partition applications.

Quick side-step: in order to allow backwards compatibility, the Query Service now has a query_context REST API parameter that indicates which bucket and scope should be used to resolve relative keyspace names, for example

The SELECT statement resolves airlines to collection default:travel-sample.scope1.airlines.

Back to UDFs:

The general idea is that you could have a development, pre-production and production environments of the same application in three different scopes, where three copies of the same application are deployed with the same logic, or, conversely, you could have scopes contain different deployments of the same application where different business logic is required (think about online shops where different discount or delivery structures are applied, or different companies for an accountancy application, where different taxation rules are applied).

While for the first case, having a global definition for individual functions suffices, after all the business logic is the same, the only way in which we can seamlessly support the second case is if we partition UDFs across scopes, so that each scope could have their own instance of the same UDF, each with different logic.

The other side of the coin is that Couchbase Server 7.0 does not force you to switch to collections – hence there is a need for UDFs that are not tied to the Collections feature.

Two types of UDFs

Global UDFs

These are not dependent on scopes, meaning that they are backwards compatible with UDFs introduced in 6.5.

If you are not using collections, you don’t have to do anything special to use them.

Global UDFs are referenced using a two part fully qualified name, eg

Scope UDFs

These are the new breed of UDFs and their definition depends on the scope under which they are created.

Scope UDFs are referenced using a four part fully qualified name which points to the bucket and scope they have been created under.

Basic usage revisited

“OK, so now I am confused”, I hear you say, “Which functions was I using in the previous examples? The names weren’t fully qualified”.

The answer lies again in the query_context REST API parameter setting: if unset, the N1QL parser resolves the name to global functions, if set, it uses the query_context value to resolve them to the relevant scope functions, much like it did with non qualified keyspace names.

You can seamlessly deploy tour application against scopes or buckets by just changing the setting of the query_context REST API parameter.

Object resolution inside functions

When function reference fully qualified objects, there’s no ambiguity as to what object they mean:

The next question is, how do functions resolve objects referenced inside them? For instance:

The key here is the principle of least surprise: during execution, functions switch query context to the path under which they were created and always reference objects inside that path.

It does not matter if you called them with a relative or a fully qualified path, a function called with the same parameters will always return the same results, which will have been taken from the same objects.

So, for instance, for the first example above, func2() would resolve to a global or a scope function depending on the query_context setting at creation time.

For func3(), keyspace1 would be bucket keyspace1.

And what if I want to mix up things?

There’s nothing to stop you global functions along side the Collections feature: just use the fully qualified names.

Similarly, if you want to use a scope function created in a different scope, all you have to do is reference it directly.

Javascript, take 2

It now makes sense that creating javascript functions is a two step process:

  • by creating the body separately from the function definition, we are allowing to reuse the same body in multiple places, eg the same function in multiple could use the same body, and when the body is redefined, the change is automatically applied across all function definitions
  • similarly, it makes sense to split dropping definition and body, because when dropping one instance of a function, the body needs to stay in place if used by other instances

Tips and tricks

UDF names

These are identifiers, and cannot match predefined function names.

If they do, the predefined function takes precedence, meaning, the UDF does not get used.

If you really, really want to use a predefined function name, you have to fully qualify it when you reference it. As an example

Parameters

N1QL UDFs support three types of parameter lists:

  • empty

The function takes no parameters. Any parameter passed will result in an error.

  • variadic

Three dots denote a variadic function – it can take any number of parameters of any type.

The parameters are contained in an array names args.

  • named

The parameters are not typed, but the number of arguments passed is enforced.

Overloading and type handling

Untyped parameters and variadic functions are the closest that we get to function overloading, where the same function is defined multiple times with a different parameter list, or different parameter types in order to be able to operate differently depending on the input.

The strategy in N1QL is to instead have a single function which checks the arguments received and acts accordingly.

An example of a variadic function follows:

And non variadic:

Parameter names vs document fields

Consider the following function:

Notice how we declared a parameter that has happens to be named in the same way as a document field?

Clearly that function does not achieve its intent – as there’s no way to distinguish in between the two.

So here’s how it works – parameter names override document fields (either way, the above in query would return all documents in travel-sample): to access fields either you refer them with their fully qualified name

or remove the ambiguity by renaming the parameter

Return values

Functions only return one value, of any type.

If you need to return more than one value, return an array or an object.

Watch your return types

Remember that SELECT statements executed inside UDFs return arrays, so even if your SELECT returns just one document, it is a one element array.

Return the first element of the array instead.

This function doesn’t do what you want:

This does:

CREATE OR REPLACE

There are times in which you want to redefine a function – the ‘OR REPLACE’ clause of the CREATE FUNCTION statement allows you to do just that in a single step:

Privileges

N1QL executed inside UDFs is executed as the user that has submitted the request: as such, the user needs to have appropriate privileges to access all the object the UDF references.

On top of this, the user also needs to have assigned the privilege to execute functions.

Different privileges exist for internal and external functions, for global functions and for scope functions under each scope.

In order to create and drop functions, the user needs to have been granted permission to manage functions.

Again, there are different privileges for internal vs external, global vs scope, etc, for example:

Limitations

For the beta, javascript only supports the vanilla syntax, meaning that executing N1QL statements and logging are disallowed.

In time, the javascript runner will have the same language capabilities as in the Eventing service.

It also misses a UI for creating functions and debugging them, and the only way to manage function bodies is through the REST API, which, again, will be made available at a later time.

Currently, functional indexes cannot use UDFs.

Conclusion

Turns out that telling a query language how to do things is not as pointless as it originally seemed.

This blog briefly shows you how to do it.

References

User Defined Functions

N1QL objects and query_context

UDF roles

 

Author

Spread the love

Author

Posted by james.oquendo

Couchbase team member

Leave a reply