Overhaul Apache RAT#4569
Conversation
* Upgrade from 0.15 to 0.18 * Use a config file * Remove custom license patterns that match nothing anymore * Ensure works even without a git working tree (e.g. smoketester source release) * Add a canary to assert that header detection is actually working each run * Synchronize RatTask executions to avoid shared Ant task state issues under parallel Gradle execution * Consolidate all exclude patterns into a single include-everything-then-exclude approach
dsmiley
left a comment
There was a problem hiding this comment.
I chickened out and didn't pull out jgit... but was tempted.
There was a problem hiding this comment.
Instead of tying this to RAT, the needs are too simple/basic to warrant that coupling.
| exclude "**/.*" // e.g. .gradle, .github, .junie, .gitignore, etc. | ||
| } else { | ||
| // Source release (no git): scan everything, exclude generated/ignored paths manually. | ||
| include "**" |
There was a problem hiding this comment.
key change here, esp. for the source release validation (smoketester). Include all by default.
There was a problem hiding this comment.
Claude had a comment on this change:
.gradle/ is no longer excluded in the no-git path (the old code excluded .gradle/** unconditionally; the new **/.* exclude only lives inside the if (trackedFiles != null) branch). In an extracted source release the running Gradle invocation itself creates .gradle/ and buildSrc/.gradle with files like checksums.lock and file-system.probe that the extension-based excludes don't catch — RAT could flag them during smoketesting.
There was a problem hiding this comment.
The no-git path is is the source release, tested by the smoketester. I was tempted to make that the only path for RAT, albeit it'd suck to get no PR time feedback on an oversight. So for the source release... we can ensure that the smoketester runs RAT by itself before any other source checking... thus nothing is generated yet. But I suppose .gradle is a concern... hmm; I will add an exclusion rule for that.
| // TODO: SOLR-15601: Some of these should carry the license, perhaps? | ||
| exclude "**/*.html" | ||
| exclude subprojects.collect { | ||
| projectDir.toPath().relativize(it.projectDir.toPath()).toString().replace('\\', '/') |
There was a problem hiding this comment.
was overly simple before, thus would exclude any "core" dir anywhere for example
|
I'll merge this weekend if no feedback. I did some testing of course... this is also what motivated the "canary" aspect. I kinda tested smoketester but had some local edits in relation to local/snapshot releases so I can't guarantee bug-free |
janhoy
left a comment
There was a problem hiding this comment.
Thanks for caring for boring stuff like this. Added some comments for stuff flagged by claude
| exclude "**/.*" // e.g. .gradle, .github, .junie, .gitignore, etc. | ||
| } else { | ||
| // Source release (no git): scan everything, exclude generated/ignored paths manually. | ||
| include "**" |
There was a problem hiding this comment.
Claude had a comment on this change:
.gradle/ is no longer excluded in the no-git path (the old code excluded .gradle/** unconditionally; the new **/.* exclude only lives inside the if (trackedFiles != null) branch). In an extracted source release the running Gradle invocation itself creates .gradle/ and buildSrc/.gradle with files like checksums.lock and file-system.probe that the extension-based excludes don't catch — RAT could flag them during smoketesting.
| <!-- Machine-generated source files that carry no license header --> | ||
| <license family="GEN" id="GEN" name="Generated"> | ||
| <any> | ||
| <!-- none so far --> |
There was a problem hiding this comment.
Why include it if there are no rules?
There was a problem hiding this comment.
There used to be some but all were old or part of Lucene. I thought it best to leave this here ready to easily add one if needed; it's minor. Of course, we actually do have generated source but I haven't looked at why we don't need this. For some I known we exclude categorically because it's not committed. For committed like a query parser or two, I think it's intermixed with human authored stuff so it gets a license. I think.
| // Detects whether a comment block contains Apache license header text. | ||
| def isLicense = { matcher -> | ||
| def content = matcher.group(1) | ||
| content.contains('Licensed to the Apache Software Foundation') || |
There was a problem hiding this comment.
A comment by claude that I have not been able to validate but here it is:
The new string-based isLicense only recognizes Apache header text, while the old RAT default matchers also caught MIT/BSD-style javadoc headers. Coverage narrows slightly; probably an acceptable trade-off
There was a problem hiding this comment.
indeed; acceptable trade-off. Prefer to keep things simple.
# Conflicts: # gradle.lockfile # gradle/libs.versions.toml
This option (PR-apache#2685, ported from a similar Lucene feature) added complexity: parallel lists of alt-JDK homes/runners/versions threaded through the Java config, a closure to dedupe per-JDK test runs, and per-JDK copies of the unpacked binary distribution. It was never well exercised -- it merged via lazy consensus with the sole tester noting they hadn't actually tried it with a different JDK version -- and it isn't invoked anywhere in the documented release process (releaseWizard.yaml never passes it). It has caused at least two real bugs, including a TypeError crash when no alt JDK is given at all. Testing with another JDK is just as easy by setting JAVA_HOME and re-running the script; the modest cost of redoing some work is a fair trade for a much simpler script.
|
I'm going to incorporate here an explicit step to the smoketester to invoke RAT. The only issue with my attempts right now to do so is that the smoketester writes the output of these gradle invocations to log files in the present directory, which means RAT sees it and sees that the log file has no license. I'll put it into build/ instead. |
* move logs to under build/ * move no-daemon to an env var, off of CLI
|
Including a removal of the alt-java thing... but that is a separate PR and should be done first. |
Note: requires the PR that excludes dev-docs and dot-files from the source release, in order for RAT to pass on smoketest release