I ran into a problem with a rather buggy Nextcloud instance on a host with limited quota. The Nextcloud log file would balloon at a crazy rate. So at one point, I snatched a 700 MB sample (yeah, that took maybe an hour or so) and wondered: what’s wrong?
So, first things first: Nextcloud’s log files are JSON files. Which makes them excruciatingly difficult to read. Okay, better than binary, but still, not an eye pleaser. They wouldn’t be easy to grep either. So, Python to the rescue, as it has the json module*.
First, using head, I looked at the first 10 lines only. Why? Because I had no idea of the performance of this little script of mine and I wanted to check it out first.
head -n 10 nextcloud.log > nextcloud.log.10
Because these logs are scattered with user and directory names and specifics of that particular Nextcloud instance (it’ll be NC from here on), I won’t share any of them here. Sorry. But if you have NC yourself, just get it from the /data/ directory of your NC instance.
I found each line to contain one JSON object (enclosed in curly brackets). So, let’s read the file line by line and feed each line into Python’s JSON parser:
import json

with open("nextcloud.log.10", "r") as fh:
    for line in fh:
        data = json.loads(line)
At this point, you can already get an idea of how long each line takes to process. If you’re using Jupyter Notebook, you can place the with statement into its own cell and simply use the %%timeit cell magic for a good first impression. On my machine it says
592 µs ± 7.65 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
which is okay: roughly 60 µs per line.
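For reference, the timed cell would look like this (a minimal sketch, assuming json is already imported in an earlier cell and the 10-line sample file sits next to the notebook):

%%timeit
with open("nextcloud.log.10", "r") as fh:
    for line in fh:
        data = json.loads(line)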
Next, I wanted to inspect a few lines and make reading easier: pretty print, or pprint as its module is called, to the rescue!
from pprint import pprint
pprint(data)
This pretty prints the last line. If you want to access all 10 lines, create for instance an empty list data_lines first and do data_lines.append(data) inside the for loop, as sketched below.
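Put together, a minimal sketch of that variant (data_lines is just an illustrative name):

import json
from pprint import pprint

data_lines = []
with open("nextcloud.log.10", "r") as fh:
    for line in fh:
        # Collect each parsed record instead of overwriting it.
        data_lines.append(json.loads(line))

# data_lines[0] … data_lines[9] are now individual records.
pprint(data_lines[-1])  # prints the same last record as pprint(data) above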
{'reqId': '<redacted>',
'level': 2,
'time': '2025-02-06<redacted>',
'remoteAddr': '<redacted>',
'user': '<redacted>',
'app': 'no app in context',
'method': 'GET',
'url': '/<redacted>/apps/user_status/api/<redacted>?format=json',
'message': 'Temporary directory /www/htdocs/<redacted>/tmp/ is not present or writable',
'userAgent': 'Mozilla/5.0 (Linux) <redacted> (Nextcloud, <redacted>)',
'version': '<redacted>',
'data': []}
Okay, there is a message which might be interesting, but I found another one:
{'reqId': '',
'level': 0,
'time': '2025-02-06T',
'remoteAddr': '',
'user': '',
'app': 'no app in context',
'method': 'PROPFIND',
'url': '//',
'message': 'Calling without parameters is deprecated and will throw soon.',
'userAgent': 'Mozilla/5.0 (Linux) (Nextcloud, 4)',
'version': '',
'exception': {'Exception': 'Exception',
'Message': 'No parameters in call to ',
…
Now, this is much more interesting: it contains a key exception with a message and a long traceback below.
I simply want to know:
- How many of these exceptions are there?
- How many unique messages are there?
In other words: Is this a clusterfuck, or can I silence this thing by fixing a handful of things?
So, the idea is simple:
- Read each line.
- Check if the line contains an exception keyword.
- In that case, count it and…
- … append the corresponding message to a list.
- Finally, convert that list into a set.
And here is how this looks in Python:
import json
from pprint import pprint

lines = 0
exceptions = 0
ex_messages = []

with open("nextcloud.log", "r") as fh:
    for line in fh:
        lines += 1
        data = json.loads(line)
        if "exception" in data.keys():
            exceptions += 1
            msg = data["exception"]["Message"]
            ex_messages.append(msg)

print(f"{lines:d} read, {exceptions:d} exceptions.")

s_ex_msg = set(ex_messages)
print(f"{len(s_ex_msg):d} unique message types.")
pprint(s_ex_msg)
I had
37460 read, 32537 exceptions.
22 unique message types.
That’s a lot of exceptions, but a surprisingly small number of unique messages, i.e. possible individual causes.
In my case, it mainly showed me what I knew beforehand: The database was a total mess.
But see what you find.
Exercise: See how you need to modify the script to count how many out of the 32537 exceptions correspond to each of the 22 unique messages. And toot about it.
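If you want a shortcut, collections.Counter from the standard library does exactly this kind of tallying; a minimal sketch, reusing the ex_messages list from the script above:

from collections import Counter

# Tally how often each unique message occurs, most frequent first.
counts = Counter(ex_messages)
for msg, n in counts.most_common():
    print(f"{n:6d}  {msg}")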
*) I wonder if people will come and propose to use simplejson, as I’ve read in the wild, because “it’s faster!!!”. Use %%timeit to find out. Anything else is Mumpitz (forum voodoo).
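A quick way to check, assuming simplejson is installed (pip install simplejson): it mirrors the stdlib API, so only the import changes, and you can compare the numbers against the json cell above.

%%timeit
# simplejson.loads mirrors json.loads; the import is cached
# after the first run, so its cost is negligible here.
import simplejson
with open("nextcloud.log.10", "r") as fh:
    for line in fh:
        data = simplejson.loads(line)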