
Massive Nextcloud log file quickly analysed using Python

I ran into a problem with quite a buggy Nextcloud instance on a host with limited quota. The Nextcloud log file would balloon at a crazy rate. So at one point, I snatched a 700 MB sample (yeah, that took maybe an hour or so) and wondered: what’s wrong?

So, first things first: Nextcloud’s log files are JSON files. Which makes them excruciatingly difficult to read. Okay, better than binary, but still not an eye-pleaser. They wouldn’t be easy to grep either. So, Python to the rescue, as it has the json module*.

First, using head, I grabbed just the first 10 lines. Why? Because I had no idea how this little script of mine would perform, and I wanted to check it out on a small sample first.

head -n 10 nextcloud.log > nextcloud.log.10

Because these logs are scattered with user and directory names and specifics of that particular Nextcloud instance (it’ll be NC from here on), I won’t share any of them here. Sorry. But if you have NC yourself, just get it from the /data/ directory of your NC instance.

I found each line to contain one JSON object (enclosed in curly brackets); this format is commonly known as JSON Lines. So, let’s read the file line by line and feed each line into Python’s JSON parser:

import json

# parse each log line as one standalone JSON object
with open("nextcloud.log.10", "r") as fh:
    for line in fh:
        data = json.loads(line)

At this point, you can already get an idea of how long it takes to process each line. If you’re using a Jupyter notebook, you can place the with statement into its own cell and simply use the %%timeit cell magic for a good first impression. On my machine it says

592 µs ± 7.65 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

which is okay: roughly 60 µs per line.
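For reference, the timed cell is nothing more than the parsing loop under the cell magic (a minimal sketch; it assumes the import json from above has already been run in an earlier cell):

%%timeit
with open("nextcloud.log.10", "r") as fh:
    for line in fh:
        data = json.loads(line)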

Next, I wanted to inspect a few lines and make reading easier: pretty print, or pprint as its module is called, to the rescue!

from pprint import pprint

pprint(data)

This pretty prints the last line. If you want to access all 10 lines, create for instance an empty list data_lines first and do data_lines.append(data) inside the for loop; a sketch follows after the output below.

{'reqId': '<redacted>',
 'level': 2,
 'time': '2025-02-06<redacted>',
 'remoteAddr': '<redacted>',
 'user': '<redacted>',
 'app': 'no app in context',
 'method': 'GET',
 'url': '/<redacted>/apps/user_status/api/<redacted>?format=json',
 'message': 'Temporary directory /www/htdocs/<redacted>/tmp/ is not present or writable',
 'userAgent': 'Mozilla/5.0 (Linux) <redacted> (Nextcloud, <redacted>)',
 'version': '<redacted>',
 'data': []}
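If you want to keep all ten parsed lines around for inspection, the variant mentioned above could look like this (a minimal sketch; data_lines collects one dict per log line):

import json
from pprint import pprint

data_lines = []  # one parsed dict per log line

with open("nextcloud.log.10", "r") as fh:
    for line in fh:
        data_lines.append(json.loads(line))

pprint(data_lines[0])  # inspect the first line instead of the last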

Okay, there is a message which might be interesting, but I found another one:

{'reqId': '',
 'level': 0,
 'time': '2025-02-06T',
 'remoteAddr': '',
 'user': '',
 'app': 'no app in context',
 'method': 'PROPFIND',
 'url': '//',
 'message': 'Calling without parameters is deprecated and will throw soon.',
 'userAgent': 'Mozilla/5.0 (Linux) (Nextcloud, 4)',
 'version': '',
 'exception': {'Exception': 'Exception',
               'Message': 'No parameters in call to ',
               …

Now, this is much more interesting: it contains an exception key with a message and a long traceback below.

I simply want to know:

  • How many of these exceptions are there?
  • How many unique messages are there?

In other words: Is this a clusterfuck, or can I get this thing silent by fixing a handful of things?

So, the idea is simple:

  1. Read each line.
  2. Check whether the parsed object contains an exception key.
  3. In that case, count it and…
  4. … append the corresponding message to a list.
  5. Finally, convert that list into a set.

And here is how this looks in Python:

import json
from pprint import pprint

lines = 0
exceptions = 0
ex_messages = []

with open("nextcloud.log", "r") as fh:
    for line in fh:
        lines += 1
        data = json.loads(line)

        # entries with an "exception" key carry a message and a traceback
        if "exception" in data:
            exceptions += 1
            msg = data["exception"]["Message"]
            ex_messages.append(msg)

print(f"{lines:d} read, {exceptions:d} exceptions.")

# a set keeps only the unique messages
s_ex_msg = set(ex_messages)
print(f"{len(s_ex_msg):d} unique message types.")

pprint(s_ex_msg)

For my sample, this printed

37460 read, 32537 exceptions.
22 unique message types.

That’s a lot of exceptions but a surprisingly small number of unique messages, i.e. possible individual causes.

In my case, it mainly showed me what I knew beforehand: The database was a total mess.

But see what you find.

Exercise: See how you need to modify the script to count how many out of the 32537 exceptions correspond to each of the 22 unique messages. And toot about it.
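Hint: collections.Counter, fed with the ex_messages list from the script above, does most of the heavy lifting. A minimal sketch:

from collections import Counter

# count how often each unique message occurs
msg_counts = Counter(ex_messages)

# print the counts, most frequent message first
for msg, count in msg_counts.most_common():
    print(f"{count:6d}  {msg}")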

*) I wonder if people will come and propose to use simplejson, as I’ve read in the wild, because “it’s faster!!!”. Use %%timeit to find out. Anything else is Mumpitz (forum voodoo).
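If you want to test that claim yourself, the comparison is a single cell away (assuming simplejson is installed; it exposes the same loads() API as the standard json module):

%%timeit
import simplejson
with open("nextcloud.log.10", "r") as fh:
    for line in fh:
        data = simplejson.loads(line)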