Description
The logging module, in most common configurations, is vulnerable to log injection attacks.
For example:
import logging
logging.basicConfig(format='%(asctime)s %(message)s')
logging.warning('message\n2022-06-17 15:15:15,123 was logged.message')
results in
2022-06-16 14:03:06,858 message
2022-06-17 15:15:15,123 was logged.message
All available log formatters in the standard library should provide a straightforward way to tell the difference between log message contents and log file format framing. For example, if your output format is newline-delimited, then it cannot allow raw newlines in messages and should "sanitize" by quoting them somehow.
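Twisted deals with this by following each embedded newline with a tab, so that continuation lines are indented and cannot be mistaken for the start of a new record. For example, the following code: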
from twisted.logger import globalLogBeginner, textFileLogObserver, Logger
import sys
globalLogBeginner.beginLoggingTo(
[textFileLogObserver(sys.stdout)], redirectStandardIO=False
)
log = Logger()
log.info("regular log message\nhaha i tricked you this isn't a log message")
log.info("second log message")
Produces this output:
2022-06-17T15:35:13-0700 [__main__#info] regular log message
	haha i tricked you this isn't a log message
2022-06-17T15:35:13-0700 [__main__#info] second log message
I'd suggest that the stdlib do basically the same thing.
One alternate solution is just documenting that no application or framework is ever allowed to log a newline without doing this manually themselves (and unfortunately this seems to be where the Java world has ended up, see for example spring-projects/spring-framework@e9083d7 ). But putting the responsibility on individual projects means making app and library authors predict every possible Formatter that might be applied to their messages, then avoid any framing character that Formatter might use to indicate a message boundary. Today the most popular default formatter uses newlines. But what if some framework were to try to make parsing easier by using RFC 2822? Now every application has to start avoiding colons as well as newlines. CSV? Better make sure you don't use commas. Et cetera, et cetera.
Pushing this up to the app or framework means that every library that wants to log anything derived from user data can't log the data in a straightforward structured way that will be useful to sophisticated application consumers, because they have to mangle their output in a way which won't trigger any log-parsing issues with the naive formats. In practice this really just means newlines, but if we make newlines part of the contract here, that also hems in any future Formatter improvements the stdlib might want to make.
I suspect that the best place to handle this would be logging.Formatter.format; there's even some precedent for this, since it already has a tiny bit of special-cased handling of newlines (albeit only when logging exceptions).
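To make the suggestion concrete, here is a minimal sketch of what handling this at the Formatter level could look like. It is not a proposed patch: IndentingFormatter and the logger name are hypothetical names of my own, and the tab-indent convention is borrowed from the Twisted output above. A side effect is that traceback lines get indented too.

```python
import io
import logging


class IndentingFormatter(logging.Formatter):
    """Hypothetical formatter that indents continuation lines with a tab."""

    def format(self, record):
        text = super().format(record)
        # splitlines() also breaks on \r, \x0b, \x0c and the Unicode line
        # separators, so none of them can start an unindented line.
        return "\n\t".join(text.splitlines())


# Capture output in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(IndentingFormatter("%(asctime)s %(message)s"))

logger = logging.getLogger("injection_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False

logger.warning("message\n2022-06-17 15:15:15,123 was logged.message")
output = stream.getvalue()
print(output, end="")
```

With this in place the forged timestamp line starts with a tab, so a line-oriented reader can tell it belongs to the previous record.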
(The right thing to do to avoid log injection robustly is to emit all your logs as JSON, dump them into CloudWatch or Honeycomb or something like that, and skip this problem entirely. The more that the standard logging framework can encourage users to get onto that happy path quickly, the less Python needs to worry about trying to support scraping stuff out of text files with no schema as an interesting compatibility API surface, but this is a really big problem that I think spans more than one bug report.)
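For reference, a rough sketch of that JSON path using only the stdlib. JsonLinesFormatter, the field names and the logger name are my own illustrative choices, not an existing stdlib API. The point is that json.dumps escapes newlines and other control characters, so each record stays on one physical line whatever the message contains.

```python
import io
import json
import logging


class JsonLinesFormatter(logging.Formatter):
    """Hypothetical formatter that emits one JSON object per record."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # json.dumps escapes embedded newlines as \n inside the string.
        return json.dumps(payload)


json_stream = io.StringIO()
json_handler = logging.StreamHandler(json_stream)
json_handler.setFormatter(JsonLinesFormatter())

json_logger = logging.getLogger("json_demo")  # hypothetical logger name
json_logger.addHandler(json_handler)
json_logger.propagate = False

json_logger.warning("message\n2022-06-17 15:15:15,123 was logged.message")
json_line = json_stream.getvalue()
print(json_line, end="")
```

A consumer that parses each line with json.loads gets the original message back intact, embedded newline included.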