Categories
Linux Python

Massive Nextcloud log file quickly analysed using Python

I ran into a problem with quite a buggy Nextcloud instance on a host with limited quota. The Nextcloud log file would balloon at a crazy rate. So at one point, I snatched a 700 MB sample (yeah, that took maybe an hour or so) and wondered: what’s wrong?

So, first things first: Nextcloud’s log files are JSON files. Which makes them excruciatingly difficult to read. Okay, better than binary, but still, not an eye pleaser. They wouldn’t be easy to grep either. So, Python to the rescue as it has the json module*.

First, using head I looked at the first 10 lines only. Why? Because I had no idea of the performance of this little script of mine and I wanted to check it out first.

head -n 10 nextcloud.log > nextcloud.log.10

Because these logs are scattered with user and directory names and specifics of that particular Nextcloud instance (it’ll be NC from here on), I won’t share any of them here. Sorry. But if you have NC yourself, just get it from the /data/ directory of your NC instance.

I found each line to contain one JSON object (enclosed in curly brackets). So, let’s read this line-by-line and feed it into Python’s JSON parser:

import json

with open("nextcloud.log.10", "r") as fh:
    for line in fh:
        data = json.loads(line)

At this point, you can already get an idea of how long it takes to process each line. If you’re using a Jupyter Notebook, you can place the with statement into its own cell and simply use the %%timeit cell magic for a good first impression. On my machine it says

592 µs ± 7.65 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

which is okay: roughly 60 µs per line.
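Outside a notebook, the same measurement works with time.perf_counter. A minimal sketch; the inline sample here stands in for nextcloud.log.10, since I’m not sharing the real log:

```python
import json
import time

# Ten synthetic log lines standing in for nextcloud.log.10.
sample = "\n".join('{"level": 2, "message": "test"}' for _ in range(10))

start = time.perf_counter()
for line in sample.splitlines():
    data = json.loads(line)
elapsed = time.perf_counter() - start

print(f"{elapsed / 10 * 1e6:.1f} µs per line")
```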

Next, I wanted to inspect a few lines and make reading easier: pretty print, or pprint as its module is called, to the rescue!

from pprint import pprint

pprint(data)

This pretty prints the last line. If you want to access all 10 lines, create an empty list data_lines first and do data_lines.append(data) inside the for loop.
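That could look like the following; here I feed in two inline sample lines instead of the real file:

```python
import json
from pprint import pprint

# Two stand-in log lines; in practice, iterate over the log file instead.
log_lines = [
    '{"level": 2, "message": "first"}',
    '{"level": 0, "message": "second"}',
]

data_lines = []
for line in log_lines:
    data_lines.append(json.loads(line))

pprint(data_lines[0])  # access any of the parsed lines by index
```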

{'reqId': '<redacted>',
 'level': 2,
 'time': '2025-02-06<redacted>',
 'remoteAddr': '<redacted>',
 'user': '<redacted>',
 'app': 'no app in context',
 'method': 'GET',
 'url': '/<redacted>/apps/user_status/api/<redacted>?format=json',
 'message': 'Temporary directory /www/htdocs/<redacted>/tmp/ is not present or writable',
 'userAgent': 'Mozilla/5.0 (Linux) <redacted> (Nextcloud, <redacted>)',
 'version': '<redacted>',
 'data': []}

Okay, there is a message which might be interesting, but I found another one:

{'reqId': '',
 'level': 0,
 'time': '2025-02-06T',
 'remoteAddr': '',
 'user': '',
 'app': 'no app in context',
 'method': 'PROPFIND',
 'url': '//',
 'message': 'Calling without parameters is deprecated and will throw soon.',
 'userAgent': 'Mozilla/5.0 (Linux) (Nextcloud, 4)',
 'version': '',
 'exception': {'Exception': 'Exception',
               'Message': 'No parameters in call to ',
               …

Now, this is much more interesting: It contains a key exception with a message and a long traceback below.

I simply want to know:

  • How many of these exceptions are there?
  • How many unique messages are there?

In other words: Is this a clusterfuck, or can I get this thing silent by fixing a handful of things?

So, the idea is simple:

  1. Read each line.
  2. Check if the line contains an exception keyword.
  3. In that case, count it and…
  4. … append the corresponding message to a list.
  5. Finally, convert that list into a set.

And here is how this looks in Python:

import json
from pprint import pprint

lines = 0
exceptions = 0
ex_messages = []

with open("nextcloud.log", "r") as fh:
    for line in fh:
        lines += 1
        data = json.loads(line)
        
        if "exception" in data:
            exceptions += 1
            msg = data["exception"]["Message"]
            ex_messages.append(msg)

print(f"{lines:d} read, {exceptions:d} exceptions.")

s_ex_msg = set(ex_messages)
print(f"{len(s_ex_msg):d} unique message types.")

pprint(s_ex_msg)

I had

37460 read, 32537 exceptions.
22 unique message types.

That’s a lot of exceptions but a surprisingly small number of unique messages, i.e. possible individual causes.

In my case, it mainly showed me what I knew beforehand: The database was a total mess.

But see what you find.

Exercise: See how you need to modify the script to count how many out of the 32537 exceptions correspond to each of the 22 unique messages. And toot about it.
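If you want to check your solution afterwards: one possible sketch uses collections.Counter, which does all the bookkeeping for you (the inline lines again stand in for the real log):

```python
import json
from collections import Counter

# Stand-in log lines; in practice, iterate over nextcloud.log.
log_lines = [
    '{"exception": {"Message": "db error"}}',
    '{"exception": {"Message": "db error"}}',
    '{"level": 2, "message": "no exception here"}',
    '{"exception": {"Message": "tmp dir missing"}}',
]

counts = Counter()
for line in log_lines:
    data = json.loads(line)
    if "exception" in data:
        counts[data["exception"]["Message"]] += 1

# Most frequent message first.
for msg, n in counts.most_common():
    print(f"{n:5d}  {msg}")
```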

*) I wonder if people will come along and propose simplejson, as I’ve read in the wild, because “it’s faster!!!”. Use %%timeit to find out. Anything else is Mumpitz (forum voodoo).

Categories
Embedded Engineering Linux Python

Red Pitaya using only pyVISA

The Red Pitaya boards offer an SCPI server over a TCP/IP socket connection. The makers describe how to use it. But instead of using plain pyVISA, they provide their own SCPI class.

That’s fine, because that class also provides handy functions to control the various built-in applications (the signal generator and the like).

But it is unnecessarily complicated for a blinky example. And in my case, where I only needed some scriptable DIOs, it was quite cumbersome.

So, here is the blinky re-written in plain pyVISA:

import pyvisa as visa
from time import sleep

rm = visa.ResourceManager()
rp = rm.open_resource("TCPIP::169.254.XXX.XXX::5000::SOCKET",
                      read_termination="\r\n",
                      write_termination="\r\n")

print(rp.query("*IDN?"))

while True:
    rp.write("DIG:PIN LED0,1")
    sleep(.5)
    rp.write("DIG:PIN LED0,0")
    sleep(.5)

The magic lies in the read and write terminations. They have to be set to '\r\n' (in that order), or else the communication simply won’t work and will time out.
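If you want to script more than a blinky, it can help to separate command generation from I/O. Here is a small helper (blink_commands is my own hypothetical name, not part of any Red Pitaya API) that only builds the SCPI write sequence, so the command strings can be unit-tested without hardware:

```python
def blink_commands(pin="LED0", period=1.0):
    # Build the (command, delay) steps for one blink cycle of `pin`.
    half = period / 2
    return [
        (f"DIG:PIN {pin},1", half),  # switch on, wait half a period
        (f"DIG:PIN {pin},0", half),  # switch off, wait half a period
    ]
```

The main loop then reduces to iterating over the steps: for cmd, delay in blink_commands(): rp.write(cmd); sleep(delay).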

Make sure you install a reasonably recent pyVISA and pyVISA-py (from pip) or libvisa (from your distro’s repository) before you start. For me (Ubuntu) this works as follows:

pip install -U pyvisa pyvisa-py
sudo apt install libvisa

This integrates nicely with existing instrument command structures and allows for quick testing.

Categories
SymPy

Quick Transfer Functions with SymPy

SymPy allows quick and (relatively 😜) easy manipulation of algebraic expressions. But it can do so much more! Using the sympy.physics.control toolbox, you can inspect linear time-invariant systems in a very comfortable way.

I was looking at a paper (Janiszowski, 1993) which features three transfer functions with different properties and wanted to look at their pole-zero diagrams as well as their Bode plots over real frequency. The transfer functions were the following:

(1)    \begin{align*} G_1(s) &= \frac{ 2 + 42s }{ (1 + 2s)(1 + 40s) } \\ G_2(s) &= \frac{ 5 - 60s }{ (1 + 4s)(1 + 40s) } \\ G_3(s) &= \frac{ 4(1 + s) }{ 1 + 4s + 8s^2 } \end{align*}

Drawing the pole-zero diagrams from this representation is easy:

  • Zeros: Set numerator to zero
  • Poles: Set denominator (or its individual terms) to zero
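For G1(s), SymPy can do that hand calculation, too; sy.solve on numerator and denominator gives the zero at −1/21 and the poles at −1/2 and −1/40:

```python
import sympy as sy

s = sy.symbols("s", complex=True)
G1 = (2 + 42*s) / ((1 + 2*s) * (1 + 40*s))

num, den = G1.as_numer_denom()
zeros = sy.solve(num, s)  # numerator = 0
poles = sy.solve(den, s)  # denominator = 0
print(zeros, poles)
```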

Drawing the Bode diagram however would’ve involved some basic programming. But neither is necessary.

import sympy as sy
from sympy.physics.control.lti import TransferFunction as TF
from sympy.physics.control import bode_plot, pole_zero_plot

s = sy.symbols("s", complex=True)

G1 = (2 + 42*s)/((1 + 2*s)*(1 + 40*s))
tf = TF(*G1.as_numer_denom(), s)
bode_plot(tf)
pole_zero_plot(tf)
Bode plot of the transfer function G1(s)
Pole-Zero diagram of the transfer function G1(s)

Well, that was easy. TransferFunction (abbreviated to TF in my examples) requires the numerator and denominator to be passed as separate arguments. sympy_expr.as_numer_denom() is convenient, as it returns a (numerator, denominator) tuple, and the asterisk unpacks that tuple into the two arguments.

Now, what else can we do with this? Look at RLC resonators, for example:

A parallel resonator of resistance R, inductance L and capacitance C

The reactance of the inductor and capacitor are given by

(2)    \begin{equation*} X_C = \frac{1}{sC} \quad \text{and} \quad X_L = sL \end{equation*}

where s is basically the complex frequency (we’re looking at this with Laplace eyes). Now, off to SymPy we go:

R, C, L = sy.symbols("R, C, L", real=True, positive=True)
Xc, Xl = 1/(s*C), s*L

def par(a, b):
    # Shorthand for a parallel circuit
    return a*b/(a+b)

G = par(Xc, par(R, Xl)).ratsimp()
G = G.subs({R: 1e3, L: 1e-6, C: 1e-6})
tf = TF(*G.as_numer_denom(), s)
bode_plot(tf, 5, 7)  # exponents of 10: plot from 1e5 to 1e7 rad/s
pole_zero_plot(tf)

For visualisation we choose R = 1 kΩ, L = 1 µH and C = 1 µF.
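As a quick sanity check: the resonance of the parallel LC part is ω0 = 1/√(LC), which with these values lands at 10⁶ rad/s (about 159 kHz), right in the middle of the plotted range:

```python
import math

Lval, Cval = 1e-6, 1e-6          # chosen component values
w0 = 1 / math.sqrt(Lval * Cval)  # resonance in rad/s
f0 = w0 / (2 * math.pi)          # the same in Hz
print(f"{w0:.3g} rad/s = {f0:.3g} Hz")
```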

Bode plot of the parallel RLC circuit with R = 1 kΩ, L = 1 µH and C = 1 µF, and hence a resonance at 10⁶ rad/s (about 159 kHz)
And its pole-zero plot just for the lolz.

Why is this useful? Well, because it is quick and robust and once you have your transfer function typed out, you get the impulse and step responses for free:

import sympy as sy
from sympy.physics.control.lti import TransferFunction as TF
from sympy.physics.control import bode_plot, pole_zero_plot
from sympy.physics.control import impulse_response_plot
from sympy.physics.control import step_response_plot

s = sy.symbols("s", complex=True)
G1 = (2+42*s)/((1+2*s)*(1+40*s))
G2 = (5-60*s)/((1+4*s)*(1+40*s))
G3 = 4*(1+s)/(1+4*s+8*s**2)

for G in [G1, G2, G3]:
    tf = TF(*G.as_numer_denom(), s)
    bode_plot(tf)
    pole_zero_plot(tf)
    step_response_plot(tf, upper_limit=60)
    impulse_response_plot(tf, upper_limit=60)

Note: SymPy’s adaptive plotting is not particularly good at plotting oscillations, so the impulse response of the RLC circuit above will look ugly.