Rsyslog Multiline Messages: Why They Stick & How To Fix

Hey everyone, ever faced that head-scratcher where your Rsyslog messages seem to stick together, showing up like "message1#012message2"? It's like your logs are playing a game of 'two-for-one', and trust me, it's not fun when you're trying to parse or monitor them. This common issue, often caused by the sneaky #012 character (which is just a fancy way of saying "newline"), can throw a wrench into your logging setup. But don't sweat it, guys! We're diving deep into why this happens with Rsyslog and, more importantly, how to fix it so your logs are clean, crisp, and exactly what you expect. We'll explore configurations, best practices, and some clever Rsyslog tricks to get your messages flowing smoothly.

The heart of the problem lies in how Rsyslog ingests incoming data, especially from non-standard log sources. When a sender packs multiple logical log entries into a single UDP datagram or TCP payload, separated by a newline (\n), Rsyslog can end up treating the whole chunk as one message. imudp always treats one datagram as exactly one message, and imtcp only splits the stream on the frame delimiter it expects; neither goes hunting for additional syslog headers hidden inside what it has already decided is a single message body. So if a device sends SyslogHeader1 MsgPart1\nSyslogHeader2 MsgPart2 as one blob over UDP, Rsyslog parses SyslogHeader1 MsgPart1 correctly and then appends \nSyslogHeader2 MsgPart2 to the msg property of that first event. The #012 you see in your raw_message or msg output is simply the visible (escaped) representation of that embedded newline. This isn't a bug in Rsyslog so much as a mismatch between how your log source frames its data and how Rsyslog is configured to interpret it, and understanding that distinction is crucial for picking the right fix: adjust the sender, tweak Rsyslog's input handling, or split and re-parse after reception. This hits people in all sorts of environments, including Docker, where log sources often don't adhere to strict syslog framing. The key takeaway: the network packet may be a single unit, but the application-layer payload inside it contains multiple logical entries that Rsyslog needs to intelligently disentangle, and that's exactly what we'll set up below.

Understanding the #012 Menace: What's Happening to Your Rsyslog Messages?

Alright, let's break down this whole #012 character mystery. So, you're seeing your logs concatenate, right? Like message1#012message2? Well, the #012 isn't some random gibberish; it's the octal representation of the ASCII newline character (\n). Yep, it's literally a line break. Rsyslog renders it that way on purpose: by default it escapes control characters in received messages, replacing each one with a "#" prefix plus the character's three-digit octal code, so a raw line break becomes the text #012. The reason it's plastered between your supposed individual log entries is usually that your log source sent multiple log lines as one big chunk of data, and Rsyslog, in its initial processing, didn't split them into separate events like you'd expect. Think of it like this: Rsyslog is expecting neatly packaged individual letters, but it's getting an entire paragraph, and it tries to fit the whole paragraph into the space meant for one letter, with the newlines simply becoming part of that "letter's" content.
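These are the global switches behind that escaping behavior, shown with their defaults. Treat this as a reference sketch: double-check the exact parameter names against the documentation for your 8.2102 build before changing anything.

# With escaping on, an embedded LF is rendered as the text "#012" inside $msg;
# turning it off keeps the raw newline byte instead.
global(parser.escapeControlCharactersOnReceive="on")
global(parser.controlCharacterEscapePrefix="#")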

The problem really starts during the ingestion phase. When Rsyslog uses input modules like imudp (for UDP traffic) or imtcp (for TCP traffic), it typically expects each UDP packet or TCP line to contain a single, properly formatted syslog message. However, some applications or devices don't play by these rules. They might shove several distinct log messages, each with its own timestamp and source info, into a single UDP packet or send a multi-line string over a single TCP connection, all delimited by \n. For example, your tcpdump might show one UDP packet being sent to the server. But when Rsyslog receives that single packet, if its payload contains SYSLOG_HDR_1 message_A\nSYSLOG_HDR_2 message_B, Rsyslog's default parser might only fully process SYSLOG_HDR_1 message_A, and then treat \nSYSLOG_HDR_2 message_B as just part of the message body (msg property) of the first event. This is especially common with older or custom log senders that aren't strictly adhering to RFC 3164 or RFC 5424 framing. They essentially put a newline inside what Rsyslog considers a single message's payload.

The implications of this "message sticking" are pretty huge, guys. First off, your log parsing efforts go out the window. If you're using tools to extract fields or patterns, they'll likely only see the first part of the concatenated message or get confused by the embedded newline and the subsequent "stuck" message. Secondly, monitoring and alerting become a nightmare. An alert trigger based on message_A might never fire if it's buried within a larger, unparsed message_A#012message_B string. You're effectively losing valuable log visibility and fidelity. Imagine searching for a specific error message, but it's always bundled with another, irrelevant one; your search results become noisy and less actionable. Furthermore, if you're sending these logs to destinations like Kafka (omkafka) or another syslog server (omfwd), they're going to receive this single, concatenated message. Your downstream systems will inherit the same parsing headaches, potentially leading to incorrect data analysis, delayed incident response, or even data loss if those systems simply drop improperly formatted entries. The raw_message property, when inspected in your output (like Kafka), clearly shows this message1 #012message2 format because your template is likely taking the msg property, which already contains this concatenated string, and outputting it as-is. So, while tcpdump confirms the data arrived as a single network unit, Rsyslog treated its content as a single application-layer message, rather than splitting it into two distinct events at the point of reception. This is what we need to address! This phenomenon is not limited to specific Rsyslog versions but rather stems from how data is initially structured by the log sender and subsequently interpreted by Rsyslog's input modules. Our goal is to ensure each independent log event, even if initially part of a larger payload, is correctly identified, parsed, and processed for accurate logging and analysis down the line.

Your Rsyslog Configuration Deep Dive: Unpacking the Issue

Let's take a closer look at your Rsyslog configuration to really understand where the messages might be getting stuck. You've got a pretty standard setup for forwarding logs to Kafka and another syslog server, which is cool. Your inputs are imudp on port 6514 and imtcp on port 7514, both routing to a ruleset named tokafka. These input modules are where Rsyslog first receives data. Normally, imudp and imtcp are designed to parse incoming network streams for syslog messages. If they receive a properly framed syslog message (either RFC 3164 or RFC 5424), they extract its components – timestamp, hostname, application, and the message body (msg).
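For reference, the input side you describe would look roughly like this (ports taken from your description, everything else stock syntax):

module(load="imudp")
module(load="imtcp")

# Both listeners hand every received message to the same ruleset
input(type="imudp" port="6514" ruleset="tokafka")
input(type="imtcp" port="7514" ruleset="tokafka")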

Your template(name="outfmt" type="list" option.jsonf="on") is configured to format these extracted properties into a JSON output. You're pulling @timestamp from timereported, host from hostname, message from msg, and so on. This is all standard stuff. The critical part here is property(outname="message" name="msg" format="jsonf"). This line tells Rsyslog to take whatever is in its internal msg property and put it into the JSON field named "message". When you see message1#012message2 in your Kafka output, it means that at the point your outfmt template is applied, the Rsyslog internal $msg property already contains this concatenated string. This is the smoking gun! It indicates that Rsyslog's default parser, when it first received the data via imudp or imtcp, interpreted the entire message1\nmessage2 payload as one single message body. It did not, as might be desired, split it into two separate log events.
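Put concretely, the template being described is presumably something close to this sketch (field list inferred from your description; adjust to your real config):

template(name="outfmt" type="list" option.jsonf="on") {
    property(outname="@timestamp" name="timereported" dateFormat="rfc3339" format="jsonf")
    property(outname="host" name="hostname" format="jsonf")
    property(outname="message" name="msg" format="jsonf")
}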

The environment plays a role too. You're running Rsyslog 8.2102.0-15.el8 in a Docker container. While Docker itself doesn't directly cause this issue, it means your Rsyslog instance is isolated, so the way the source application delivers logs to the container's imudp or imtcp ports is key. Your tcpdump observation (a single UDP packet containing the message1#012message2 payload) confirms that the entire string, including the newline, arrived as one unit at the network layer. Rsyslog then ingested that single network unit and processed it as one event. The omkafka and omfwd actions simply take this already-processed, concatenated $msg property and send it downstream; they aren't going to split the message, they just forward what they receive from the tokafka ruleset.

Essentially, the problem isn't with your output queues (LinkedList), worker threads, or resubmitOnFailure settings. Those are for ensuring reliable delivery after the message has been processed internally. The core challenge is upstream: how Rsyslog is initially parsing the incoming raw data. The fact that the second part of your example RP/0/RSP0/CPU0:Dec 4 01:37:05.252 UTC: isis[1009]: %ROUTING-ISIS-5-ADJCHANGE : Adjacency to xxxxx (xcxcxcxc) (L2) Up, Restarted #012<190>1 2025-12-04T01:37:05+00:00 UTC+0000 another-host - - - %LOG_LOCAL7-6-SYSTEM_MSG [login,session][info][subj-[uni/userext]/sess-xxxxxx] From-1.1.1.1-client-type-REST-Failure actually contains a second, distinct syslog header (<190>1 2025-12-04T01:37:05+00:00) further complicates things. This means the sender isn't just putting a newline within a single message; it's concatenating two entirely separate syslog messages into one network payload. Rsyslog's default parser, encountering the \n within what it assumed was the body of the first message, didn't trigger a new parsing cycle for the subsequent header. This requires us to manually intervene and teach Rsyslog how to break apart and re-process these combined giants. The configuration for omkafka and omfwd is robust for delivery, but it's crucial to understand that their role starts after Rsyslog has determined what constitutes a single log event. Our objective here is to modify Rsyslog's behavior before it hands off the messages to these output modules, ensuring each distinct log entry is treated as a separate event.
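A quick way to confirm all of this for yourself is to temporarily dump what rsyslog actually stores per event, before any output formatting. A small sketch (file path and template name are just examples):

# Temporary debug output: writes the untouched payload of every event to a local file,
# so you can see the "#012" (or the raw newline) sitting inside a single event.
template(name="rawdump" type="string" string="%rawmsg%\n")

# Drop this near the top of ruleset(name="tokafka") { ... } while troubleshooting:
action(type="omfile" file="/var/log/rsyslog_cus/raw-debug.log" template="rawdump")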

The Fixes: Taming Multiline Messages in Rsyslog

Alright, guys, now for the good stuff: how to fix this multiline message mess! Since we've pinpointed that your log source is sending multiple, distinct syslog messages (or at least lines that should be distinct messages) crammed into a single payload, and Rsyslog's default parser isn't separating them, we need to step in. We've got a few options, ranging from ideal (if possible) to practical Rsyslog-based solutions.

Option 1: Fix the Source (The Gold Standard)

Honestly, the best solution, the true gold standard, is to address the problem at its root: fix the log source. If the device or application sending these logs can be configured to send each syslog message as a separate UDP packet or TCP line, adhering to standard syslog protocols (RFC 3164 or RFC 5424 framing), then all your Rsyslog headaches disappear. This ensures that Rsyslog's imudp and imtcp modules can correctly identify and parse each log event individually from the get-go.

Why is this the best? Because it reduces complexity in your Rsyslog configuration, improves parsing accuracy, and ensures your logs are properly formatted from the very beginning, with no hoops to jump through in rulesets or extra modules. Always check the configuration options of your log-generating devices or applications first; sometimes a simple setting like "syslog format" or "message delimiter" solves everything. If the source is a custom application, updating its logging logic to emit properly framed syslog messages is an investment that pays off across the entire pipeline: each log event arrives at Rsyslog as a distinct, self-contained entity, downstream processing stays simple, and the #012 concatenation never appears in the first place. Even if fixing the source looks hard, it's worth exploring first, because the long-term payoff in maintainability and observability is substantial.
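If the sender can at least be switched to TCP with one properly framed syslog message per line (or RFC 6587 octet-counted framing), the stock imtcp listener already does the splitting for you and no extra logic is needed. A small sketch, reusing the TCP port from your setup:

# imtcp separates messages on the LF frame delimiter by default and also accepts
# octet-counted framing from senders that use it.
module(load="imtcp")
input(type="imtcp" port="7514" ruleset="tokafka" supportOctetCountedFraming="on")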

Option 2: Rsyslog-Based Splitting and Re-parsing (When You Can't Fix the Source)

When tweaking the log source isn't an option (and let's be real, that's often the case in large, heterogeneous environments!), Rsyslog can still help. We need logic within Rsyslog that intercepts these combined messages, splits them, and then re-processes each part as an independent log event. The catch: stock Rsyslog has no built-in way to turn one received message into several new ones inside a ruleset, so the approach below leans on a splitting message-modification module, referred to here as mmsplittxt, combined with careful re-parsing of each piece.

A splitting module of this kind takes a property (like $msg or $rawmsg) and emits one new log message per delimiter-separated chunk. Before copying anything, verify that your Rsyslog build actually provides such a module; it is not part of the standard 8.2102 module set, and if it is missing, the loopback re-parse trick described near the end of this article achieves the same result with only stock modules. Also note the delimiter subtlety: with the default control-character escaping, the "newline" inside $msg is actually the literal text #012, so split on whichever form your configuration preserves.

Here's a step-by-step approach and the necessary Rsyslog configuration:

Step 2.1: Load the Required Modules

First, make sure the modules you plan to use are loaded: the splitting module itself and, optionally, mmnormalize if you'd rather re-parse the split lines with a liblognorm rulebase instead of the built-in regex functions.

module(load="mmnormalize")   # optional: only needed for liblognorm-based re-parsing
# NOTE: verify that your rsyslog build actually ships mmsplittxt before relying on it;
# it is not part of the stock 8.2102 module set, so this whole approach stands or
# falls with its availability (see the loopback alternative at the end of the article).
module(load="mmsplittxt")
# Ensure imudp/imtcp are already loaded from your config
module(load="imudp")
module(load="imtcp")

Step 2.2: Implement the Splitting Logic in Your Ruleset

You'll modify your tokafka ruleset to first check for newlines. If a newline is present, you'll use mmsplittxt to split the message and send the newly created individual lines to a separate ruleset for further processing. The original message, if successfully split, should then be stopped to prevent duplicate processing.

ruleset(name="tokafka") {
    # Preserve the original message content for splitting (or just use $msg directly)
    set $.originalMsg = $msg;
    set $.hadMultiline = 0;

    # Check for an embedded newline. With the default control-character escaping,
    # the newline arrives as the literal text "#012", so test for both forms.
    if ($.originalMsg contains "\n") or ($.originalMsg contains "#012") then {
        set $.hadMultiline = 1;
    }

    if $.hadMultiline == 1 then {
        # Split the original payload into separate lines. mmsplittxt (if your build
        # provides it) queues one new message event per split line and hands each to
        # the 'handleSplitLines' ruleset. The parameter names shown here are taken
        # from the description above -- check them against the module's documentation.
        action(type="mmsplittxt"
               ruleset="handleSplitLines"
               split.property="$.originalMsg"  # the property to split
               split.delimiter="\n"            # or "#012" if escaping is left on
               split.maxLines="100")           # safety cap per original message

        # IMPORTANT: stop here so the original concatenated message is not also
        # processed (and forwarded) as a single event by the actions below.
        stop;
    } else {
        # If no newline, or if mmsplittxt fails for some reason,
        # process the message as a single log event as usual.
        action(
            name="kafka-action"
            type="omkafka"
            template="outfmt"
            confParam=["compression.codec=lz4", "queue.buffering.max.messages=400000"]
            topic="topic-testxxxx"
            broker="brokerlistxxxxx"
            resubmitOnFailure="on"
            keepFailedMessages="on"
            queue.spoolDirectory="/var/log/rsyslog_cus"
            queue.filename="kafka-queue"
            queue.type="LinkedList"
            queue.size="360000"
            queue.saveonshutdown="on"
            queue.discardmark="350000"
            queue.discardseverity="4"
            queue.workerThreads="4"
            queue.maxdiskspace="2g"
            queue.maxfilesize="100M"
            queue.timeoutenqueue="0"
            queue.dequeuebatchsize="4096"
            action.resumeRetryCount="-1"
            action.resumeInterval="10"
            action.reportSuspension="on"
        )
        action(
            name="fwd-action"
            type="omfwd"
            Target="target"
            Port="514"
            Protocol="udp"
            Template="RSYSLOG_TraditionalFileFormat"
            queue.spoolDirectory="/var/log/rsyslog_cus"
            queue.filename="xxxxxx-queue"
            queue.type="LinkedList"
            queue.size="360000"
            queue.saveonshutdown="on"
            queue.discardmark="350000"
            queue.discardseverity="4"
            queue.workerThreads="2"
            queue.maxdiskspace="2g"
            queue.maxfilesize="100M"
            queue.timeoutenqueue="0"
            queue.dequeuebatchsize="4096"
            action.resumeRetryCount="-1"
            action.resumeInterval="10"
            action.reportSuspension="on"
        )
    }
}

Step 2.3: Create a Ruleset to Handle Each Split Line

Now, for the really important part: what happens in handleSplitLines? When mmsplittxt sends a message to this ruleset, the $msg property of the new event will contain one of your split lines. However, the other properties (like hostname, timereported, app-name) will be inherited from the original multiline message. This is problematic if each split line is actually a full syslog message with its own header, as your example suggests (<190>1 2025-12-04T01:37:05...).

To correctly process these, you'll need to re-extract the syslog components from each split line. In practice that means either a liblognorm rulebase loaded through mmnormalize, or RainerScript's built-in re_match() and re_extract() functions, used to pull severity/facility, timestamp, hostname, and the actual message content out of $msg for every individual line. This can get fiddly, because syslog formats vary.

Advanced handleSplitLines (If Each Line is a Full Syslog Message - HIGHLY RECOMMENDED for your case): Given your example (RP/0/... #012<190>1...), the second line is a full syslog message. This means the default property inheritance is not what you want. You need to tell Rsyslog to re-parse each $msg as a new, independent syslog message. This is much more robust.

Rsyslog doesn't have a reparse() function, and read-only properties such as $rawmsg, $hostname, or $timereported cannot simply be reassigned with set. Realistically you have two routes: extract the pieces of each line yourself (with re_match()/re_extract() or an mmnormalize rulebase) into message variables, or feed the still-intact multi-line payload back into a fresh Rsyslog input so the full parser chain runs again, for example by forwarding it over TCP to a listener on 127.0.0.1 (a minimal sketch of that loopback appears at the end of this section). For the sake of keeping this article readable and actionable, let's illustrate the extraction route, assuming a reasonably consistent pattern.

ruleset(name="handleSplitLines") {
    # $msg now contains one split line, e.g. "<190>1 2025-12-04T01:37:05..."
    # Standard properties ($hostname, $timereported, $syslogseverity, ...) cannot be
    # overwritten with set(), so each line's own header fields are pulled into
    # message variables ($!...) that a dedicated template can emit instead.
    set $.tempMsg = $msg;

    # RFC 5424-style line: <PRI>VERSION TIMESTAMP HOSTNAME ... MSG
    # (POSIX ERE, deliberately simplified -- adjust to the formats you actually see)
    if re_match($.tempMsg, "^<[0-9]+>[0-9] [^ ]+ [^ ]+ ") then {
        set $!pri       = re_extract($.tempMsg, "^<([0-9]+)>", 0, 1, "");
        set $!timestamp = re_extract($.tempMsg, "^<[0-9]+>[0-9] ([^ ]+) ", 0, 1, "");
        set $!host      = re_extract($.tempMsg, "^<[0-9]+>[0-9] [^ ]+ ([^ ]+) ", 0, 1, "");
        set $!message   = re_extract($.tempMsg, "^<[0-9]+>[0-9] [^ ]+ [^ ]+ (.*)$", 0, 1, "");
    } else if re_match($.tempMsg, "^<[0-9]+>[A-Za-z]{3} +[0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2} ") then {
        # RFC 3164-style line: <PRI>MMM DD HH:MM:SS HOSTNAME MESSAGE
        set $!pri       = re_extract($.tempMsg, "^<([0-9]+)>", 0, 1, "");
        set $!timestamp = re_extract($.tempMsg, "^<[0-9]+>([A-Za-z]{3} +[0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2}) ", 0, 1, "");
        set $!host      = re_extract($.tempMsg, "^<[0-9]+>[A-Za-z]{3} +[0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2} ([^ ]+) ", 0, 1, "");
        set $!message   = re_extract($.tempMsg, "^<[0-9]+>[A-Za-z]{3} +[0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2} [^ ]+ (.*)$", 0, 1, "");
    } else {
        # No recognizable syslog header: treat the whole line as message content and
        # fall back to the properties inherited from the original (first) message.
        set $!message   = $.tempMsg;
        set $!host      = $hostname;
        set $!timestamp = $timereported;
    }

    # Now forward the properly separated individual messages. Point these actions at
    # a template that reads the extracted variables (see the 'outfmt-reparsed' sketch
    # after this ruleset); the stock properties still describe the original event.
    action(
        name="kafka-action-reparsed"
        type="omkafka"
        template="outfmt-reparsed"  # hypothetical variant of outfmt that reads the $!... variables (sketched below)
        confParam=["compression.codec=lz4", "queue.buffering.max.messages=400000"]
        topic="topic-testxxxx"
        broker="brokerlistxxxxx"
        resubmitOnFailure="on"
        keepFailedMessages="on"
        queue.spoolDirectory="/var/log/rsyslog_cus"
        queue.filename="kafka-queue-reparsed"
        queue.type="LinkedList"
        queue.size="360000"
        queue.saveonshutdown="on"
        queue.discardmark="350000"
        queue.discardseverity="4"
        queue.workerThreads="4"
        queue.maxdiskspace="2g"
        queue.maxfilesize="100M"
        queue.timeoutenqueue="0"
        queue.dequeuebatchsize="4096"
        action.resumeRetryCount="-1"
        action.resumeInterval="10"
        action.reportSuspension="on"
    )
    action(
        name="fwd-action-reparsed"
        type="omfwd"
        Target="target"
        Port="514"
        Protocol="udp"
        Template="RSYSLOG_TraditionalFileFormat" # note: this stock template still emits the inherited standard properties, not the re-extracted $!... fields
        queue.spoolDirectory="/var/log/rsyslog_cus"
        queue.filename="xxxxxx-queue-reparsed"
        queue.type="LinkedList"
        queue.size="360000"
        queue.saveonshutdown="on"
        queue.discardmark="350000"
        queue.discardseverity="4"
        queue.workerThreads="2"
        queue.maxdiskspace="2g"
        queue.maxfilesize="100M"
        queue.timeoutenqueue="0"
        queue.dequeuebatchsize="4096"
        action.resumeRetryCount="-1"
        action.resumeInterval="10"
        action.reportSuspension="on"
    )
}
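Because handleSplitLines stores the re-extracted pieces in message variables rather than the read-only standard properties, the actions in that ruleset need a template that reads those variables. Here's a hypothetical companion to outfmt; define it at the top level of your config, next to outfmt itself, and note that the variable names simply match what the ruleset above sets:

template(name="outfmt-reparsed" type="list" option.jsonf="on") {
    property(outname="@timestamp" name="$!timestamp" format="jsonf")
    property(outname="host" name="$!host" format="jsonf")
    property(outname="message" name="$!message" format="jsonf")
}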

Important Considerations for Re-parsing:

  • The regexes provided above are illustrative and need to be refined based on the exact format of your split lines. Syslog parsing can be very nuanced. You might have to create multiple if...then...else if blocks to match different log formats that appear in your split lines.
  • If you have many different syslog formats, this regex-based approach can become very long and hard to maintain. At that point, a liblognorm rulebase loaded through mmnormalize (which is built exactly for this kind of structured parsing) is usually the cleaner tool, or consider simplifying the input at the source.
  • For very complex multi-line re-parsing, advanced Rsyslog users sometimes feed the combined payload back into a fresh Rsyslog input so that the full syslog parser runs again on each line automatically, for example by forwarding it over TCP to a listener on 127.0.0.1 (note that Rsyslog has no impipe input module, so a named pipe is not an option here). This goes beyond a beginner-friendly setup but demonstrates the flexibility; a minimal sketch follows below.
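Here is a minimal sketch of that loopback idea, under some explicit assumptions: control-character escaping is switched off so the raw newline survives inside $rawmsg, port 10514 on localhost is free, and the ruleset and template names are purely illustrative. The re-injected stream is split on LF by imtcp, and each line then runs through the normal parser chain, so per-line properties (timestamp, hostname, severity) come out right and your existing outfmt template can be reused unchanged:

# Keep the raw newline instead of the "#012" escape so the forwarded payload still
# contains real line breaks (verify this setting doesn't disturb your other inputs).
global(parser.escapeControlCharactersOnReceive="off")

# Loopback listener: every LF-delimited line it receives is parsed as a fresh message.
input(type="imtcp" port="10514" ruleset="reparsed")

# Pass-through template: forward the payload exactly as it was received.
template(name="rawpass" type="string" string="%rawmsg%")

ruleset(name="tokafka") {
    if $msg contains "\n" then {
        # Hand the multi-line payload back to ourselves; imtcp's LF framing splits it.
        action(type="omfwd" Target="127.0.0.1" Port="10514" Protocol="tcp" template="rawpass")
        stop;
    }
    # ... existing omkafka / omfwd actions for normal single-line messages ...
}

ruleset(name="reparsed") {
    # Each event here is one correctly parsed line; reuse the normal outputs.
    action(type="omkafka" template="outfmt" topic="topic-testxxxx" broker="brokerlistxxxxx")
}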

Final Thoughts: Test, Monitor, and Refine

No matter which option you choose, always remember to test thoroughly. Deploy changes in a staging environment first. Use logger to simulate your problematic log messages and check the output in Kafka or the forwarded syslog stream. Monitor your Rsyslog instance for any errors or performance issues after implementing these changes. This journey from message1#012message2 to two perfectly separated log events might take a little effort, but with these tips, you'll get your Rsyslog setup purring like a kitten! Keep those logs clean, guys! Proper parsing is crucial for reliable monitoring, faster debugging, and accurate data analysis. By tackling these concatenated messages head-on, you're not just fixing a bug; you're significantly improving the quality and usability of your entire logging infrastructure. Remember, small iterative changes, coupled with careful observation, are key to successful Rsyslog configuration management. Good luck!
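For the "monitor" part, rsyslog's own impstats module is worth loading: it periodically emits counters for every queue and action, which makes it easy to spot a stuck or discarding queue after a configuration change. A small sketch (interval and file path are just examples):

# Emit internal counters every 60 seconds to a local file instead of the syslog stream.
module(load="impstats"
       interval="60"
       severity="7"
       log.syslog="off"
       log.file="/var/log/rsyslog_cus/rsyslog-stats.log")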