New in Couchbase Server 7.0: N1QL per request memory quota

Introduction

Unlike other services, the N1QL service so far hasn’t offered the ability to size its memory footprint.

The principal reason for this disparity boils down to one simple fact: while the bulk of the memory consumption for services like Data or Index is caches, and when new documents come in, there is always something that can be evicted should space be in high demand, the bulk of the N1QL service operation relies on transient values (either fetched documents, or computed values) which start their life in one stage or another of an individual request, and and expire before the request ends.

In N1QL there is nothing to evict and replace, there’s no balancing act to keep the clock ticking – if resources are not available, the only option is failure.

Add to this that there are parts of the Eventing, Index and FTS code running inside the N1QL service – they use N1QL memory resources, but N1QL has no control over them – and you get the picture that implementing a N1QL service per node quota is a near impossible task.

Still, while in general terms N1QL memory consumption is not an issue, requests quickly load and discard documents, and the world is a happy place, from time to time the odd greedy request comes along and spoils the game for everyone.

This is an issue.

But, before we proceed any further, let’s discard for a moment the components N1QL has no control of, and let’s consider if a node wide transient value pool would even be desirable.

The operation of such a device would roughly go as follows: whenever a request needs a value, it allocates the corresponding size from the global pool, and as soon as it has finished with it, it returns it to the pool. When memory runs out, all allocations fail until enough memory is freed.

Now enter our greedy request, which grabs as much as it can and doesn’t release it. What’s the fate of all the frugal requests? Recall that there’s no evictions possible, and the only option is failure: the other requests will end one by one, in error and with an error, until the culprit finally fails.

This is akin to the teacher sending the whole class to the principal’s office after being hit with some chalk, rather than investigating and sending just the culprit marching.

Enter per request memory quota

N1QL has grown eyes on the back and can see to whom the chalk goes.

When request quota is turned on, each request gets its own pool. Memory tracking operates as usual, but now when the pool is exhausted, it’s only the culprit that fails.

“I hear what you are saying” I hear you say, “but a node wide setting would be much more practical!”

We’ve implemented one by stealth – the N1QL service allows a fixed number of requests running at any one time: this is controlled by the servicers setting, and defaults to 4 times the number of cores on the query node. The overall node memory quota amounts to the number of servicers times the per request quota.

The two quotas are intimately intertwined – we’ve chosen to make the per request quota explicit because we wanted to be clear that’s individual requests that are being tracked, not the node in its entirety.

How do I use it?

Settings

There’s two settings: the /admin/settings node REST parameter memory-quota, and the /query/service request REST parameter memory_quota

They express in megabytes the maximum amount of memory a request can use at any one time.

The default memory-quota is 0, meaning that memory quota is turned off. Memory_quota overrides the node wide setting, provided that the requested value does not exceed it.

A couple of examples:

curl https://localhost:8093/admin/settings -u Administrator:password -H "content-type: application/json" -d '{"memory-quota": 10, "distribute": true}'

1	curl https://localhost:8093/admin/settings -u Administrator:password -H "content-type: application/json" -d '{"memory-quota": 10, "distribute": true}'

sets memory quota to 10MB for the whole node and replicates the setting to all other nodes

curl https://localhost:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&amp;memory_quota=10&amp;pretty=true'

1	curl https://localhost:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&memory_quota=10&pretty=true'

sets memory quota to 10MB for the single request

cbq&gt; \set -memory_quota 10;

1	cbq> \set -memory_quota 10;

sets memory quota to 10MB for the duration of the cbq session

What about the UI?

Sorry – a service wide UI memory quota setting hasn’t made it in time for the beta….

Responses

If memory quota is set by whatever means, then several N1QL responses may contain additional information.

Metrics

The metrics section of the response will contain a usedMemory field showing the amount of document memory used to execute the request.

   "metrics": {
       "elapsedTime": "1.651329ms",
       "executionTime": "1.439388ms",
       "resultCount": 1,
       "resultSize": 153,
       "serviceLoad": 2,
       "usedMemory": 63
   }

"metrics": {

"elapsedTime": "1.651329ms",

"executionTime": "1.439388ms",

"resultCount": 1,

"resultSize": 153,

"serviceLoad": 2,

"usedMemory": 63

}

If no document memory is used, this metric may be omitted, much like mutations or errorCount are.

Controls

The controls section of the response will also report the memory quota set

   "controls": {
       "scan_consistency": "unbounded",
       "use_cbo": "true",
       "memoryQuota": "10",
       "stmtType": "SELECT"
   }

"controls": {

"scan_consistency": "unbounded",

"use_cbo": "true",

"memoryQuota": "10",

"stmtType": "SELECT"

}

system:active_requests and system:completed_requests

usedMemory and memoryQuota show up here as well

       "active_requests": {
           "elapsedTime": "2.604037201s",
           "executionTime": "2.603863968s",
           "memoryQuota": 10,
           "node": "127.0.0.1:8091",
           "phaseCounts": {
               "fetch": 15936,
                "primaryScan": 16362
            },
           "phaseOperators": {
               "authorize": 1,
               "fetch": 1,
               "primaryScan": 1
           },
           "phaseTimes": {
               "authorize": "721.012µs",
               "fetch": "588.220088ms",
               "instantiate": "11.547µs",
               "parse": "1.317113ms",
               "plan": "167.599824ms",
               "primaryScan": "29.589176ms"
           },
           "remoteAddr": "127.0.0.1:55084",
           "requestId": "da0d5ed0-a3d9-4ad2-86af-ae483fdadbce",
           "requestTime": "2020-11-19T15:45:27.368Z",
           "scanConsistency": "unbounded",
           "state": "running",
           "statement": "select * from `travel-sample`",
           "useCBO": true,
           "usedMemory": 3187392,
           "userAgent": "curl/7.58.0",
           "users": "Administrator"
       }

"active_requests": {

"elapsedTime": "2.604037201s",

"executionTime": "2.603863968s",

"memoryQuota": 10,

"node": "127.0.0.1:8091",

"phaseCounts": {

"fetch": 15936,

"primaryScan": 16362

"phaseOperators": {

"authorize": 1,

"fetch": 1,

"primaryScan": 1

"phaseTimes": {

"authorize": "721.012µs",

"fetch": "588.220088ms",

"instantiate": "11.547µs",

"parse": "1.317113ms",

"plan": "167.599824ms",

"primaryScan": "29.589176ms"

"remoteAddr": "127.0.0.1:55084",

"requestId": "da0d5ed0-a3d9-4ad2-86af-ae483fdadbce",

"requestTime": "2020-11-19T15:45:27.368Z",

"scanConsistency": "unbounded",

"state": "running",

"statement": "select * from `travel-sample`",

"useCBO": true,

"usedMemory": 3187392,

"userAgent": "curl/7.58.0",

"users": "Administrator"

}

Under the hood

How is memory used, anyway?

Before we delve into some of the mechanics of the memory quota operation, we should probably learn a little bit about how a request uses memory.

As you have already sussed, the usedMemory metrics field has been introduce to gauge the memory requirements of an individual statement before it is let loose on the field. So let’s do a couple of experiments and see how it behaves.

$ curl https://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&amp;memory_quota=10&amp;pretty=true'
...
    "status": "success",
    "metrics": {
        "elapsedTime": "6.353013176s",
        "executionTime": "6.352882377s",
        "resultCount": 31591,
        "resultSize": 95381286,
        "serviceLoad": 2,
        "usedMemory": 3264905
    }
}

$ curl https://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&memory_quota=10&pretty=true'

...

"status": "success",

"metrics": {

"elapsedTime": "6.353013176s",

"executionTime": "6.352882377s",

"resultCount": 31591,

"resultSize": 95381286,

"serviceLoad": 2,

"usedMemory": 3264905

}

Clearly, the used memory is not the size of the result set.

Let’s try again, but this time without formatting, so that the size of the result set is as close as possible to the size of the data in storage:

$ curl https://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&amp;memory_quota=10&amp;pretty=false'
...
"status": "success",
"metrics": {"elapsedTime": "1.559014343s","executionTime": "1.558937894s","resultCount": 31591,"resultSize": 36754134,"serviceLoad": 2,"usedMemory": 588514}
}

$ curl https://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&memory_quota=10&pretty=false'

...

"status": "success",

"metrics": {"elapsedTime": "1.559014343s","executionTime": "1.558937894s","resultCount": 31591,"resultSize": 36754134,"serviceLoad": 2,"usedMemory": 588514}

}

It’s also not the size of the data fetched.

Let’s try to remove the cost of displaying the results to the screen:

$ curl https://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&amp;memory_quota=10&amp;pretty=false'&gt;/tmp/res 2&gt;&amp;-; tail /tmp/res
...
"status": "success",
"metrics": {"elapsedTime": "854.674127ms","executionTime": "854.575802ms","resultCount": 31591,"resultSize": 36754134,"serviceLoad": 2,"usedMemory": 188626}
}

$ curl https://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&memory_quota=10&pretty=false'>/tmp/res 2>&-; tail /tmp/res

...

"status": "success",

"metrics": {"elapsedTime": "854.674127ms","executionTime": "854.575802ms","resultCount": 31591,"resultSize": 36754134,"serviceLoad": 2,"usedMemory": 188626}

}

Same query, same format, different storage, less memory used.

What can we derive from this? At the very least that for some types of statements, the memory consumption is more a function of the circumstances of that particular run than the statement itself.

Request execution phase operation

In simple terms, the execution phase of a request employs a pipeline of operators executing in parallel, each receiving values from the previous stage, processing it, and sending it to the next.

The infrastructure connecting operators sports a values queue so that each operator is not blocked by the previous or the next (the execution engine is actually more complicated than that – some operators are inlined into others and some others only exist to carry out orchestration work, so value queues are not always involved, but still).

For example, a simple

SELECT * FROM bucket WHERE field = constant

1	SELECT * FROM bucket WHERE field = constant

uses an Index scan to produce keys, which are sent to a Fetch to retrieve documents from the KV, which are sent to a Filter to exclude documents that do not apply, and those that do are sent to a Projection to extract fields and marshal them into JSON (if necessary) and finally passed to a Stream which writes them back to the client.

Values that complete the course are eventually disposed of by the garbage collector.

If there were available cores to execute all these operators in parallel, and all operators executed at exactly the same speed, for the example above there would never be more than five documents traversing the pipeline at any one time, even though the request might process any number of documents.

Of course, a Scan might produce keys much faster that a Fetch could gather document, marshalling could be expensive, and sending results over the wire back to the client might be slow, so even if there are cores available, the queues described above will be used as buffers for values waiting to be processed along the line, which has the effect of temporarily increasing the amount of memory a request needs to process the sequence of incoming values.

This explains why both making the Projection more efficient (pretty=false), or Stream (sending to a file rather than the terminal) has a beneficial effect on memory consumption: faster operators mean fewer values stuck in the intra-operator queues.

With request load increasing, the N1QL kernel will have more operators to schedule, meaning that while they are not run, the value queue for the previous operator will increase in size, meaning even more memory is required to process individual requests: loaded nodes will use more memory than those with little activity.

Memory quota operation

For the purpose of the previous discussion, I have ignored all those cases in which memory grows without values being exchanges: hash joins, ORDER BYs and GROUP BYs spring to mind.

Those particular cases are handled by the first mode of operation of Memory Quota: sort, aggregate or hash buffer grows beyond a specific threshold, memory quota throws and error, and the request fails.

However, as we have seen, there are a number of circumstances which cause memory consumption grow without fault on the request part.

In these cases, the Memory Quota feature employs techniques to try to control memory usage and help requests complete without needing excessive resources.

Consumer heartbeat

A pipeline works well if both producers and consumers proceed at the same pace.

Should the producer not execute, the request just stalls, but if the consumer doesn’t, not only the request stalls, but the producer’s value queue also increases in size.

To counter this possibility, the consumer operator is equipped with a heartbeat, which is monitored by the producer – when the consumer is waiting but does not attempt to receive values after a set number of successful send operations on the producer’s part, the producer will yield until the consumer manages to execute.

This is not exact science, as unluckily the language used to develop N1QL does not permit to yield to specific operators, but this works as a cooperative effort: if enough producers yield, all consumers will have a fair shot at having kernel time, which means that memory usage should naturally decrease.

Per operator quota

Since yielding is not an exact science, it could very well be that individual operators accrue a substantial memory usage even when consumers manage to run from time to time, because individual consumers still get less kernel time than their producers.

To address this, N1QL also has a per producer memory pool. A producer yields (and does not fail) also in case of this smaller pool being exhausted, resuming operation when the consumer receives a value.

This will cause prior producers to exhaust their own pool and yield, thereby allowing the whole request to progress without consuming the whole request pool, possibly (but by no means necessarily) at the expense of throughput.

Miscellaneous tricks

So far N1QL has also relied on the garbage collector to return value memory to the heap, and the memory manager to allocate value structures.

As part of the memory tracking effort we have introduce techniques to mark the memory as unused before the garbage collector itself gets CPU time an manages to process all the pending unused values.

We also have small, ad hoc pools to store some unused value structures, already allocated and available for reuse, so that the garbage collector doesn’t have to be exercised over and over again for specific types of dynamic memory allocation and is instead free to process memory that matters.

Conclusion

N1QL had a history of being a bit laissez faire with requests memory usage. It has now set aside some carrots and sticks.

Authors

James oquendo, Senior Software Engineer, Couchbase

Couchbase team member

View all posts
shyamraj

Shyam Raj is the Director of the Platform and Security Product Management Group for Couchbase and lives in the United Kingdom. His dedicated team is responsible for the Reliability, Availability, Serviceability and Security architecture of Couchbase Server and the SaaS Database, Capella. This team also own cloud-native platforms like the Couchbase Kubernetes Autonomous Operator. Ian has a vast range of experience as a Software Engineer, Technical Support Engineer, Quality Assurance Engineer and Systems Administrator. Ian has led global technical teams for the majority of his 20 year professional career and holds several patents in the areas of information security, virtualisation and hardware design. https://www.linkedin.com/in/ianmccloy/

View all posts

Spread the love

Products

Why Couchbase?

See How Capella Stacks Up

By Industry

By Use Case

Why NoSQL

What is NoSQL and why choose it?

Popular Docs

By Developer Role

Developer Playground

Start a Capella session

Resource Center

Education

Certification Exams 2023

Get Couchbase certified

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

New in Couchbase Server 7.0: N1QL per request memory quota

Introduction

Enter per request memory quota

How do I use it?

Settings

What about the UI?

Responses

Metrics

Controls

system:active_requests and system:completed_requests

Under the hood

How is memory used, anyway?

Request execution phase operation

Memory quota operation

Consumer heartbeat

Per operator quota

Miscellaneous tricks

Conclusion

Authors

Author

Posted by James oquendo, Senior Software Engineer, Couchbase

Posted by shyamraj

Leave a reply Cancel reply