Skip to content

Overhaul Apache RAT#4569

Open
dsmiley wants to merge 7 commits into
apache:mainfrom
dsmiley:rat-overhaul
Open

Overhaul Apache RAT#4569
dsmiley wants to merge 7 commits into
apache:mainfrom
dsmiley:rat-overhaul

Conversation

@dsmiley

@dsmiley dsmiley commented Jun 29, 2026

Copy link
Copy Markdown
Contributor
  • Upgrade from 0.15 to 0.18
  • Use a config file
  • Remove custom license patterns that match nothing anymore
  • Ensure works even without a git working tree (e.g. smoketester source release)
  • Add a canary to assert that header detection is actually working each run
  • Synchronize RatTask executions to avoid shared Ant task state issues under parallel Gradle execution
  • Consolidate all exclude patterns into a single include-everything-then-exclude approach

Note: requires the PR that excludes dev-docs and dot-files from the source release, in order for RAT to pass on smoketest release

* Upgrade from 0.15 to 0.18
* Use a config file
* Remove custom license patterns that match nothing anymore
* Ensure works even without a git working tree (e.g. smoketester source release)
* Add a canary to assert that header detection is actually working each run
* Synchronize RatTask executions to avoid shared Ant task state issues
  under parallel Gradle execution
* Consolidate all exclude patterns into a single include-everything-then-exclude approach
@dsmiley dsmiley requested a review from janhoy June 29, 2026 16:17

@dsmiley dsmiley left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chickened out and didn't pull out jgit... but was tempted.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of tying this to RAT, the needs are too simple/basic to warrant that coupling.

exclude "**/.*" // e.g. .gradle, .github, .junie, .gitignore, etc.
} else {
// Source release (no git): scan everything, exclude generated/ignored paths manually.
include "**"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key change here, esp. for the source release validation (smoketester). Include all by default.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude had a comment on this change:

.gradle/ is no longer excluded in the no-git path (the old code excluded .gradle/** unconditionally; the new **/.* exclude only lives inside the if (trackedFiles != null) branch). In an extracted source release the running Gradle invocation itself creates .gradle/ and buildSrc/.gradle with files like checksums.lock and file-system.probe that the extension-based excludes don't catch — RAT could flag them during smoketesting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The no-git path is is the source release, tested by the smoketester. I was tempted to make that the only path for RAT, albeit it'd suck to get no PR time feedback on an oversight. So for the source release... we can ensure that the smoketester runs RAT by itself before any other source checking... thus nothing is generated yet. But I suppose .gradle is a concern... hmm; I will add an exclusion rule for that.

// TODO: SOLR-15601: Some of these should carry the license, perhaps?
exclude "**/*.html"
exclude subprojects.collect {
projectDir.toPath().relativize(it.projectDir.toPath()).toString().replace('\\', '/')

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was overly simple before, thus would exclude any "core" dir anywhere for example

@dsmiley

dsmiley commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

I'll merge this weekend if no feedback.

I did some testing of course... this is also what motivated the "canary" aspect. I kinda tested smoketester but had some local edits in relation to local/snapshot releases so I can't guarantee bug-free

@janhoy janhoy mentioned this pull request Jul 2, 2026
1 task

@janhoy janhoy left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for caring for boring stuff like this. Added some comments for stuff flagged by claude

Comment thread gradle/validation/rat-sources.gradle
exclude "**/.*" // e.g. .gradle, .github, .junie, .gitignore, etc.
} else {
// Source release (no git): scan everything, exclude generated/ignored paths manually.
include "**"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude had a comment on this change:

.gradle/ is no longer excluded in the no-git path (the old code excluded .gradle/** unconditionally; the new **/.* exclude only lives inside the if (trackedFiles != null) branch). In an extracted source release the running Gradle invocation itself creates .gradle/ and buildSrc/.gradle with files like checksums.lock and file-system.probe that the extension-based excludes don't catch — RAT could flag them during smoketesting.

<!-- Machine-generated source files that carry no license header -->
<license family="GEN" id="GEN" name="Generated">
<any>
<!-- none so far -->

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why include it if there are no rules?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There used to be some but all were old or part of Lucene. I thought it best to leave this here ready to easily add one if needed; it's minor. Of course, we actually do have generated source but I haven't looked at why we don't need this. For some I known we exclude categorically because it's not committed. For committed like a query parser or two, I think it's intermixed with human authored stuff so it gets a license. I think.

// Detects whether a comment block contains Apache license header text.
def isLicense = { matcher ->
def content = matcher.group(1)
content.contains('Licensed to the Apache Software Foundation') ||

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment by claude that I have not been able to validate but here it is:

The new string-based isLicense only recognizes Apache header text, while the old RAT default matchers also caught MIT/BSD-style javadoc headers. Coverage narrows slightly; probably an acceptable trade-off

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed; acceptable trade-off. Prefer to keep things simple.

dsmiley added 4 commits July 4, 2026 01:13
# Conflicts:
#	gradle.lockfile
#	gradle/libs.versions.toml
This option (PR-apache#2685, ported from a similar Lucene feature) added
complexity: parallel lists of alt-JDK homes/runners/versions
threaded through the Java config, a closure to dedupe per-JDK test runs,
and per-JDK copies of the unpacked binary distribution. It was never
well exercised -- it merged via lazy consensus with the sole tester
noting they hadn't actually tried it with a different JDK version -- and
it isn't invoked anywhere in the documented release process
(releaseWizard.yaml never passes it). It has caused at least two real
bugs, including a TypeError crash when no alt JDK is given at all.

Testing with another JDK is just as easy by setting JAVA_HOME and
re-running the script; the modest cost of redoing some work is a fair
trade for a much simpler script.
@dsmiley

dsmiley commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

I'm going to incorporate here an explicit step to the smoketester to invoke RAT. The only issue with my attempts right now to do so is that the smoketester writes the output of these gradle invocations to log files in the present directory, which means RAT sees it and sees that the log file has no license. I'll put it into build/ instead.

* move logs to under build/
* move no-daemon to an env var, off of CLI
@dsmiley

dsmiley commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

Including a removal of the alt-java thing... but that is a separate PR and should be done first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants