Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: Add new quality query to detect String#replaceAll with non-regex first argument #19115

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

owen-mc
Copy link
Contributor

@owen-mc owen-mc commented Mar 25, 2025

No description provided.

@Copilot Copilot bot review requested due to automatic review settings March 25, 2025 14:31
@owen-mc owen-mc requested a review from a team as a code owner March 25, 2025 14:31
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new quality query to detect cases where String#replaceAll is used with a non-regular expression as the first argument, recommending the use of String#replace for better performance.

  • Introduces a test case to validate the new query.
  • Adds a change note to document the new quality query.
  • Provides a performance guideline detailing the issue and recommendation.

Reviewed Changes

Copilot reviewed 3 out of 6 changed files in this pull request and generated no comments.

File Description
java/ql/test/query-tests/StringReplaceAllWithNonRegex/Test.java Added test cases demonstrating compliant and non-compliant usage.
java/ql/src/change-notes/2025-03-25-string-replace-all-with-non-regex.md Added change note for the new quality query.
java/ql/src/Performance/StringReplaceAllWithNonRegex.md Added performance guideline explaining why String#replace is preferable when a non-regex literal is used.
Files not reviewed (3)
  • java/ql/src/Performance/StringReplaceAllWithNonRegex.ql: Language not supported
  • java/ql/test/query-tests/StringReplaceAllWithNonRegex/StringReplaceAllWithNonRegex.expected: Language not supported
  • java/ql/test/query-tests/StringReplaceAllWithNonRegex/StringReplaceAllWithNonRegex.qlref: Language not supported

Tip: Copilot code review supports C#, Go, Java, JavaScript, Markdown, Python, Ruby and TypeScript, with more languages coming soon. Learn more

smowton
smowton previously approved these changes Mar 25, 2025
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks plausible

## References

- Java SE Documentation: [String.replaceAll](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/String.html#replaceAll(java.lang.String,java.lang.String)).
- Common Weakness Enumeration: [CWE-1176](https://cwe.mitre.org/data/definitions/1176.html)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: a . is missing from the end of line.

//only contains characters that could be a simple string
firstArg.getValue().regexpMatch("^[a-zA-Z0-9]+$")
select replaceAllCall,
"This call to 'replaceAll' should be a call `replace` as its $@ is not a regular expression.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ` work in the alert message? replaceAll and replace should probably be quoted the same way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In VS Code, ` doesn't seem to do anything, so I've changed it to '.

@owen-mc
Copy link
Contributor Author

owen-mc commented Mar 26, 2025

I ran MRVA. There are 417 results in the top 1,000 repos. 114 repos had at least one result. The most in one repo is 86 but that's an outlier - the next highest is 15. All the results look like TPs to me.

tamasvajk
tamasvajk previously approved these changes Mar 27, 2025
@owen-mc
Copy link
Contributor Author

owen-mc commented Mar 27, 2025

I have validated autofixes. As I expected, they are all good (though we tend to fix multiple instances at once rather than just the specified one). So I think this is ready to merge now.

@owen-mc
Copy link
Contributor Author

owen-mc commented Mar 27, 2025

@knewbury01 for awareness.

Copy link
Contributor

@tamasvajk tamasvajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this query be added to the ccr.qls code-quality.qls?

@owen-mc owen-mc added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Mar 27, 2025
@mchammer01 mchammer01 self-requested a review March 31, 2025 06:07
@mchammer01
Copy link
Contributor

I will review this for Docs today.

mchammer01
mchammer01 previously approved these changes Mar 31, 2025
Copy link
Contributor

@mchammer01 mchammer01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@owen-mc 👋🏻 - Approving on behalf of Docs ✨
Highlighted minor things + asked some questions for my own understanding.


## Overview

The underlying implementation of `String#replaceAll` uses `Pattern#compile` and expects a regular expression as its first argument. However in cases where the argument could be represented by just a plain `String` that does not represent an interesting regular expression, a call to `String#replace` may be more performant as it does not need to compile the regular expression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own understanding: what is an "interesting" regular expression? 🤔
Also the clause inside "however" is a bit long. Would it be possible to shorten/simplify it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, "interesting" means "uses any of the special syntax allowed in regular expressions to do more than just match text verbatim". I will think about how to improve.


## References

- Java SE Documentation: [String.replaceAll](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/String.html#replaceAll(java.lang.String,java.lang.String)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor. I don't see String.replaceAll mentioned at all in the link provided. This seems to link to the String class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should go to the replaceAll method in the page on the String class. It seems to work for me.

/**
* @id java/string-replace-all-with-non-regex
* @name Use of `String#replaceAll` with a first argument which is not a regular expression
* @description Using `String#replaceAll` is less performant than `String#replace` when the first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you know, I am not a developer, so I would be grateful if you could explain what the problem is with this being less performant. Does it take more time to execute?

Would something like be correct/acceptable?
(Sorry I hadn't hear of the adjective "performant" before. Discovered since that it is indeed a word in the engineering/computing area 😅 )

Using String#replaceAll instead of String#replace when the first argument is not a regular expression causes performance issues.

Feel free to ignore if you think that developers will understand what this query does with the description as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Java ready-for-doc-review This PR requires and is ready for review from the GitHub docs team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants