-
-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Security feature: Add feature to sanitize strings against CSV injection (aka Formula Injection). #326
Comments
First of all, thank you for reporting the issue. I will have to read this with thought. But my first thought is that this seems related to some specific use cases; and that the suggestion cannot be a general rule for all usage. It sounds like this would specifically be against injection when loading documents in Excel -- a common use case to be sure, but not at all the only use case. And as such it cannot dictate general processing: it would not make any sense to -- for example -- remove leading minus sign from negative numbers. It would likely make sense to add new |
You're right. It's certainly application specific. As is the vulnerability. Though, I do feel that "export to csv" like functionality is within the domain of this project. That said, it seems I didn't consider all the constraints. The negative number like problem seems hard to solve without inspecting the entire string. On the performance front, perhaps we can qualify which strings we apply the sanitize operation to by checking if the first character is in the set of offending characters. If not, we can omit the sanitize operation. As for the operation itself, I'll try and think about edge cases and ways to save on time. On the naming front, SANITIZE_FOR_EXCEL is better than what I originally suggested, but may be too narrow. Formulae are used by every major spreadsheet app, including Google Sheets, OfficeLibre, Zoho Sheets, Apple Numbers on MacOS, etc. All of them use the four characters mentioned above. A better name for the feature might be SANITIZE_FOR_CONSUMER. |
Yes, I agree that improvement(s) to allow safe(r) operation are important and useful, I hope it did not sound I suggest otherwise. While I agree that Now, one practical question: it seemed to me that:
is this true? (2) is esp. relevant since ideally I'd prefer not mutating actual values -- but obviously if it won't make any difference then... something needs to be done. And come to think of that, it may even be necessary to offer some sort of handlers for deciding how to replace possibly problematic characteres (drop/change). |
I can't see the security issue here - the security risk is on the consumer side (the Excel user) as opposed to the producer side. The Excel user shouldn't be loading untrusted files. I'm not aware of any CSV libs that have a feature like this. If you can provide some links to equivalent support in other CSV tools, I'd be interested in seeing them. If you want to export data using so that Excel users can load it up - why not produce an xlsx file (using Apache POI or something like that)? Then you get to control whether a cell is a text cell or a formula cell. |
Summary
We'd like to add a feature to
CsvGenerator
that will enable sanitizing string values against CSV Injection.CsvGenerator.Feature.SANITIZE_STRING_VALUES
The current method of protecting against this vulnerability is not reliable. Specifically, we request that for all string values, all matches that match the regular expression
^[-\=\+@]+
should be replaced by an empty string before the string value is written.Background
https://www.veracode.com/blog/secure-development/data-extraction-command-execution-csv-injection
Rationale
The existing feature that is most-often used to protect against CSV injection is
CsvGenerator.Feature.ALWAYS_QUOTE_STRINGS
. There are two problems with this approach.Many developers are attempting to address the issue by using reflection to manually "sanitize" the declared string fields on their POJOs prior to serializing them as a CSV row. The risk is that they are doing it improperly, allowing exposure to still exist in some form. For example, one user on stack overflow decided to prepend malicious string with an apostrophe (
'
), instead of just removing the offending characters entirely.Clearly, a more targeted feature is needed to protect the end user.
Mitigating the Risk
The risk is mitigated by ensuring that the following characters do not appear at the start of the string:
=
)+
)-
)@
)Acceptance criteria
Strings should be transformed:
ALWAYS_QUOTE_STRINGS
is applied.The following transformations can be used as a guide for feature.
The text was updated successfully, but these errors were encountered: