feat(scripts): Add dependency version scanner tool #16867
Open
chalmerlowe wants to merge 38 commits into main from feat/add-version-scanner
+1,614
−0
38 commits
f446ff7 feat(scripts): Add dependency version scanner tool (chalmerlowe)
256b048 perf(search): Apply bot suggestions for regex optimization and imports (chalmerlowe)
1010399 refactor(benchmark): Use tempfile for unique names and safe cleanup (chalmerlowe)
68f61ee refactor(benchmark): Remove redundant directory check (chalmerlowe)
cc960b4 test(integration): Check exit code of subprocess in integration test (chalmerlowe)
a4ad9ce test(unit): Remove redundant and brittle test_regex_patterns (chalmerlowe)
2743957 test(unit): Move import yaml to top of file (chalmerlowe)
47450bb refactor(benchmark): Remove redundant directory check in main (chalmerlowe)
c777e44 test(unit): Remove duplicate import yaml from function (chalmerlowe)
8aab801 feat(version_scanner): handle invalid format strings in config and ad… (chalmerlowe)
f63053c feat(version_scanner): handle PermissionError when reading config fil… (chalmerlowe)
2af97b3 feat(version_scanner): extract read_package_file and handle file errors (chalmerlowe)
cb29438 refactor(version_scanner): simplify target resolution and remove dupl… (chalmerlowe)
ea0e8be feat(version_scanner): add format_match_for_csv helper and tests (chalmerlowe)
a8824af feat(version_scanner): integrate GitHub link generation into CSV report (chalmerlowe)
baafb74 feat(version_scanner): default output to results directory (chalmerlowe)
a1cc08e feat(version_scanner): ignore version_scanner directory during scan (chalmerlowe)
3ceea9b feat(version_scanner): broaden version regex and add case insensitivity (chalmerlowe)
d756c07 feat(version_scanner): strip newlines from matched strings (chalmerlowe)
075d04b feat(version_scanner): add word boundaries and truncate long context … (chalmerlowe)
85e9ff5 feat(version_scanner): add console summary table (chalmerlowe)
5c8f673 feat(version_scanner): add .scannerignore file support (chalmerlowe)
efb3331 feat(version_scanner): move ignore defaults to .scannerignore file (chalmerlowe)
bf39072 docs(version_scanner): add README.md (chalmerlowe)
9d9ce22 docs(version_scanner): update README options and CLI help strings (chalmerlowe)
14e4dcc feat(version_scanner): set default for --github-repo (chalmerlowe)
7fc03ca feat(version_scanner): default config path to script directory (chalmerlowe)
f64eac4 feat(version_scanner): support case-insensitive file ignores and add … (chalmerlowe)
fc47dd6 feat(version_scanner): update small package list for demos (chalmerlowe)
95f6f19 Merge remote-tracking branch 'origin/main' into feat/add-version-scanner (chalmerlowe)
761def6 Merge branch 'origin/main' into feat/add-version-scanner (chalmerlowe)
9289c8c feat(version_scanner): add combined_version_string rule and use word … (chalmerlowe)
d771258 feat(scanner): add ability to detect ignore pragma (chalmerlowe)
bafae70 feat(scanner): move .scannerignore to script directory and update loo… (chalmerlowe)
94174bb chore(scanner): ignore repositories.bzl in scanner (chalmerlowe)
d652dbf feat(scanner): add filename scanning support (chalmerlowe)
a1188c8 docs(scanner): update README with known issues and add binary ignores… (chalmerlowe)
0a6ae92 docs(version-scanner): merge migration guide into README.md (chalmerlowe)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| .conductor/ | ||
| scanner_report.csv |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| # Directories and files to ignore by the version scanner | ||
| .git | ||
| __pycache__ | ||
| .tox | ||
| .nox | ||
| venv | ||
| .venv | ||
| .conductor | ||
| version_scanner | ||
| docs | ||
| samples | ||
| changelog.md | ||
| .librarian | ||
| goldens | ||
| # Ignore pandoc references in repositories.bzl | ||
| repositories.bzl | ||
|
|
||
| # Ignore binary media files | ||
| *.jpg | ||
| *.png | ||
| *.gif | ||
| *.ico |
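
The ignore file above mixes directory names, file names, and `*.ext` globs, and the README in this PR lists nested binary files as a known gap. A minimal sketch of one way such patterns could be matched is below. It is not the scanner's actual logic: `is_ignored` is a hypothetical helper, and it assumes every pattern is compared case-insensitively against each component of a relative POSIX-style path.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Iterable


def is_ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    """Return True if any component of rel_path matches any ignore pattern.

    Hypothetical helper for illustration only. Blank lines and lines that
    start with '#' are skipped, mirroring the comment lines in the file above.
    """
    parts = [part.lower() for part in PurePosixPath(rel_path).parts]
    for raw in patterns:
        pattern = raw.strip().lower()
        if not pattern or pattern.startswith("#"):
            continue
        # Checking every component lets "docs" match a nested directory and
        # "*.jpg" match a file at any depth.
        if any(fnmatch.fnmatchcase(part, pattern) for part in parts):
            return True
    return False


patterns = ["# comment", ".git", "docs", "changelog.md", "*.jpg"]
nested_image = is_ignored("packages/foo/assets/img/Logo.JPG", patterns)
nested_docs = is_ignored("packages/foo/docs/index.rst", patterns)
source_file = is_ignored("packages/foo/setup.py", patterns)
```

Matching per component keeps the pattern syntax simple, at the cost of not supporting path-anchored patterns such as `packages/foo/docs`.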
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,160 @@ | ||
| # Automated Dependency Version Scanner | ||
|
|
||
| This tool scans the repository for hardcoded references to specific dependency versions (like Python 3.7) that need to be upgraded or removed. | ||
|
|
||
| ## Usage | ||
|
|
||
| Run the script from the repository root: | ||
|
|
||
| ```bash | ||
| python3 scripts/version_scanner/version_scanner.py -d <dependency> -v <version> [options] | ||
| ``` | ||
|
|
||
| ### Options | ||
|
|
||
| * `-d`, `--dependency`: Name of the dependency (e.g., python, protobuf) | ||
| * `-v`, `--version`: Specific version to search for (e.g., 3.7, 4.25.8) | ||
| * `-p`, `--path`: Root directory to scan (defaults to current directory) | ||
| * `--package`: Specific subdirectory filter (useful for monorepos) | ||
| * `--package-file`: Path to a file containing a list of package directories to scan | ||
| * `--config`: Path to the regex configuration file (defaults to scripts/version_scanner/regex_config.yaml) | ||
| * `-o`, `--output`: Path to the output CSV file (defaults to <dependency>-<version>-<timestamp>.csv) | ||
| * `--github-repo`: GitHub repository URL base (defaults to https://github.com/googleapis/google-cloud-python) | ||
| * `--branch`: GitHub branch for links (defaults to main) | ||
|
|
||
| ## Configuration | ||
|
|
||
| The scanner uses a YAML configuration file (`regex_config.yaml`) to define rules and regex patterns. | ||
|
|
||
| ## Ignoring Directories | ||
|
|
||
| You can create a `.scannerignore` file in the directory you are scanning (usually the repo root) to list directories to skip, one per line. | ||
|
|
||
| ## Known Issues & Future Investigations | ||
| - **Binary Ignores in `.scannerignore`**: Recursive wildcard ignores (e.g., `*.jpg`) currently do not effectively ignore deeply nested binary files. The scanner logic should be investigated to support robust globbing or full-path suffix matching. | ||
|
|
||
| --- | ||
|
|
||
| ## Universal Prompt for EOL Runtime & Dependency Migration | ||
|
|
||
| ### Context & Overview | ||
|
|
||
| #### Overview | ||
| This plan outlines the approach to update Python packages to drop support for end-of-life Python runtimes (3.7, 3.8, 3.9) OR for deprecated dependencies, and ensure the packages are configured for modern Python. | ||
|
|
||
| #### High-Level Strategy | ||
| - **One Branch Per Package**: To keep PRs manageable and isolated, we suggest a dedicated worktree and branch for each package, named `feat/drop-<dependency>-<version>-<package-name>` (e.g., `feat/drop-protobuf-4.25.8-google-cloud-bigquery`). | ||
|
Contributor
This is only for hand-written packages, right? I assume others would get their updates through the generator? Should we recommend doing a generator update first, to clean up most of the packages? |
||
| - **Small & Reversible Commits**: Group changes into logical commits (Metadata, Nox, Docs, Cleanup, Tests) following Conventional Commits. | ||
|
|
||
| --- | ||
|
|
||
| ### Per-Package Workflow | ||
|
|
||
| Follow these steps for each package in the target list. Context and warnings are provided inline before the steps where they apply. | ||
|
|
||
| #### Step 1: Sync & Branch | ||
| 1. Ensure `main` branch is up to date. | ||
| 2. Create the feature branch: `git checkout -b feat/drop-<dependency>-<version>-<package-name>`. | ||
|
|
||
| #### Step 2: Scan (Baseline) | ||
| 1. Run the `version_scanner` for the package to get a list of all occurrences of the dependency and version. | ||
| > [!TIP] | ||
| > Use `# version-scanner: ignore` or `ignore-next-line` in code to silence true false-positives and maintain clean reports. | ||
|
|
||
| --- | ||
|
|
||
| #### 💡 Context for Step 3: Standards & Cleanup | ||
| *Before applying changes, review these standards to ensure consistency:* | ||
|
|
||
| ##### Runtime Version Checks | ||
| - **Standard**: Use `sys.version_info < (X, Y)`. | ||
| - **Rationale**: Python compares tuples lexicographically, making this robust. | ||
| - **Avoid**: `sys.version_info.minor < Y` or string conversions. | ||
|
|
||
| ##### Pytest Skips | ||
| - **Standard**: `@pytest.mark.skipif(sys.version_info < (X, Y), reason="Requires Python X.Y+")`. | ||
| - **Avoid**: String-based conditions like `@pytest.mark.skipif("sys.version_info < ...")`. | ||
|
|
||
| ##### Noxfile Version Matches | ||
| - **Standard**: `session.python == "X.Y"` (Nox uses strings). | ||
| - **Avoid**: `float(session.python) < X.Y` (fails for `3.10`). | ||
|
|
||
| ##### Cleanup Rules | ||
| - **Polyfills**: Remove dead `try/except` blocks guarding polyfills for features now standard in 3.10+. | ||
| - **Obsolete Skips**: Remove pytest skips for features now universally available. | ||
|
|
||
| ##### Dependency Specific rules | ||
| - Use idiomatic python references to detect dependency versions and to compare against the target version. | ||
|
|
||
| --- | ||
|
|
||
| #### 💡 Context for Step 3: Disposition Rules | ||
| *Every reference to the dependency version found by the scanner must be dispositioned in one of these ways:* | ||
|
|
||
| 1. **Update**: Update the reference if still necessary (e.g., changing `3.9` to `3.10` in support files). | ||
| 2. **Delete**: Delete if no longer relevant (dead code, obsolete comments). | ||
| 3. **Pragma Ignore**: Use `# version-scanner: ignore` or `# version-scanner: ignore-next-line` but ONLY for immutable historical facts or true false positives. Do NOT use for things that might change in future upgrades. | ||
|
|
||
| #### Step 3: Apply Changes | ||
| 1. Update `setup.py` or `pyproject.toml` metadata and `requires-python`. | ||
| 2. Update `noxfile.py` to remove old versions from sessions. | ||
| 3. Update `README.rst` and `CONTRIBUTING.rst` documentation. | ||
| 4. Remove compatibility code and skips based on the standards above. | ||
| 5. **Sync Documentation**: If the package has a `docs` folder containing a `README.rst`, copy the updated top-level `README.rst` to overwrite it (unless it is a symlink). | ||
| 6. Continue with the update process until all rows from the scan have been properly dispositioned. | ||
|
|
||
| --- | ||
|
|
||
| #### Step 4: Verify (Post-Scan) | ||
| 1. Run the `version_scanner` again. The result should be 0 matches (or only valid ignores). | ||
|
|
||
| --- | ||
|
|
||
| #### 💡 Context for Step 5: Constraints & Conflicts | ||
| *Review these lessons learned when dealing with constraints:* | ||
|
|
||
| - **Lowest Runtime Constraints**: The file for the lowest accepted runtime (e.g., `constraints-3.10.txt`) must have pins matching the lowest acceptable versions in `setup.py` or `pyproject.toml`. | ||
| - **Philosophy on Warnings**: Do not simply block warnings (like `six` or `pkg_resources`) to make tests pass. **Bump the lower bounds** of dependencies to versions that don't trigger warnings on the current lowest acceptable runtime. This protects customers who use strict warning filters. | ||
| - **SQLAlchemy Transition**: For libraries supporting both 1.4 and 2.0, use `SQLALCHEMY_SILENCE_UBER_WARNING=1` in specific legacy Nox sessions rather than silencing globally. | ||
|
|
||
| --- | ||
|
|
||
| #### Step 5: Local Test | ||
| 1. Run unit tests using Nox (e.g., `nox -s unit`). | ||
| > [!TIP] | ||
| > Use `nox -s unit-3.10` to save time when debugging specific runtime failures. | ||
| 2. Run `blacken` and `lint` sessions. | ||
|
|
||
| #### Step 6: Push & PR | ||
| 1. Push the branch and create the PR using the template in the Appendix. | ||
|
|
||
| --- | ||
|
|
||
| ## Appendix | ||
|
|
||
| ### PR Template [^1] | ||
| ```text | ||
| This PR updates `<dependency>` to establish version x.y.z as the minimum supported version. | ||
|
|
||
| ### Changes | ||
| * Configuration: Updated `setup.py` and `noxfile.py` to require <dependency> <version> and remove references to older versions. | ||
| * Cleanup: Removed dead code and polyfills no longer needed. | ||
|
|
||
| Fixes internal issue: http://b/482126936 🦕 | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Candidates for `.conductor` or `gemini.md` | ||
|
|
||
| *The following guidelines are universal for AI assistants working in this repo and should be moved to `.conductor` files or Gemini memories:* | ||
|
|
||
| 1. **AI & LLM Guidelines for Verification**: | ||
| - Use Git Worktrees to scan branches without switching. | ||
| - Run scanner from main branch pointing to worktree. | ||
| - Bypass env artifacts by worktree only checking out tracked files. | ||
| 2. **Automated Bisection**: | ||
| - Use `version_bisector.py` to find lowest workable versions. | ||
| - Abort tests early as soon as collection succeeds to save time. | ||
|
|
||
| [^1]: Adapted from the standard PR template used in this repository. | ||
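
The README's "Runtime Version Checks" and "Noxfile Version Matches" standards both come down to comparing versions as integer tuples rather than as floats or raw strings. The sketch below illustrates why, using only the standard library. `parse_version` is a hypothetical helper introduced here for illustration and is not part of the scanner.

```python
import sys
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "3.10" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# float() collapses "3.10" to 3.1, so 3.10 wrongly appears older than 3.9.
float_says_older = float("3.10") < 3.9

# Tuples compare element by element, so (3, 10) sorts after (3, 9).
tuple_says_newer = parse_version("3.10") > parse_version("3.9")

# The same tuple comparison works directly on the running interpreter.
running_on_py3 = sys.version_info >= (3, 0)
```

This is the same comparison the README recommends for `sys.version_info < (X, Y)`. For Nox sessions, where `session.python` is a string, the README's equality check against `"X.Y"` avoids the need to parse at all.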
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,166 @@ | ||
| import argparse | ||
| import os | ||
| import random | ||
| import subprocess | ||
| import sys | ||
| import tempfile | ||
| import time | ||
| from typing import List, Dict | ||
|
|
||
| def get_package_subset(packages_dir: str, count: int) -> List[str]: | ||
| """ | ||
| Get a randomized subset of package names from the specified directory. | ||
|
|
||
| Args: | ||
| packages_dir: Path to the directory containing packages. | ||
| count: Number of packages to return. | ||
|
|
||
| Returns: | ||
| A list of package directory names. | ||
| """ | ||
| all_packages = [d for d in os.listdir(packages_dir) if os.path.isdir(os.path.join(packages_dir, d))] | ||
|
|
||
| if count >= len(all_packages): | ||
| return all_packages | ||
|
|
||
| return random.sample(all_packages, count) | ||
|
|
||
| def run_benchmark( | ||
| scanner_path: str, | ||
| root_path: str, | ||
| package_file: str, | ||
| dependency: str, | ||
| version: str | ||
| ) -> float: | ||
| """ | ||
| Run the scanner and return the duration in seconds. | ||
| """ | ||
| cmd = [ | ||
| "python3", scanner_path, | ||
| "-d", dependency, | ||
| "-v", version, | ||
| "-p", root_path, | ||
| "--package-file", package_file | ||
| ] | ||
|
|
||
| start_time = time.perf_counter() | ||
|
|
||
| try: | ||
| result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) | ||
| except subprocess.CalledProcessError as e: | ||
| print(f"Error running benchmark: {e}") | ||
| return -1.0 | ||
|
|
||
| duration = time.perf_counter() - start_time | ||
| return duration | ||
|
|
||
| def run_benchmarks( | ||
| scanner_path: str, | ||
| root_path: str, | ||
| packages_dir: str, | ||
| counts: List[int], | ||
| dependency: str, | ||
| version: str | ||
| ) -> Dict[int, float]: | ||
| """Runs benchmarks for specified counts and returns a dict of results.""" | ||
| results = {} | ||
|
|
||
| for count in counts: | ||
| subset = get_package_subset(packages_dir, count) | ||
| print(f" Testing {len(subset)} packages (e.g., {subset[:3]}...)") | ||
|
|
||
| # Create temp package file | ||
| with tempfile.NamedTemporaryFile(mode='w', delete=False) as f: | ||
| for pkg in subset: | ||
| f.write(f"packages/{pkg}\n") | ||
| pkg_file = f.name | ||
|
|
||
| try: | ||
| duration = run_benchmark(scanner_path, root_path, pkg_file, dependency, version) | ||
| results[count] = duration | ||
| finally: | ||
| # Clean up | ||
| if os.path.exists(pkg_file): | ||
| os.remove(pkg_file) | ||
|
|
||
| return results | ||
|
|
||
| def main(): | ||
| parser = argparse.ArgumentParser(description="Benchmark the version scanner.") | ||
|
|
||
| parser.add_argument( | ||
| "-s", "--scanner-path", | ||
| default="version_scanner.py", | ||
| help="Path to version_scanner.py" | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "-r", "--root-path", | ||
| required=True, | ||
| help="Path to the monorepo root directory" | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "-p", "--packages-dir", | ||
| help="Path to packages directory (defaults to <root-path>/packages)" | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "-d", "--dependency", | ||
| default="python", | ||
| help="Dependency to search for" | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "-v", "--version", | ||
| default="3.7", | ||
| help="Version to search for" | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "-c", "--counts", | ||
| default="1,10,50", | ||
| help="Comma-separated list of package counts to test" | ||
| ) | ||
|
|
||
| args = parser.parse_args() | ||
|
|
||
| packages_dir = args.packages_dir or os.path.join(args.root_path, "packages") | ||
|
|
||
| if not os.path.exists(packages_dir): | ||
| print(f"Error: Packages directory not found: {packages_dir}", file=sys.stderr) | ||
| sys.exit(1) | ||
|
|
||
| counts = [int(c) for c in args.counts.split(',')] | ||
|
|
||
| all_packages = [d for d in os.listdir(packages_dir) if os.path.isdir(os.path.join(packages_dir, d))] | ||
|
|
||
| total_packages = len(all_packages) | ||
|
|
||
| print(f"Found {total_packages} packages in {packages_dir}") | ||
|
|
||
| # Filter counts that are greater than total packages | ||
| counts = [c for c in counts if c <= total_packages] | ||
| # Add total if not already there | ||
| if total_packages not in counts: | ||
| counts.append(total_packages) | ||
|
|
||
| print(f"Running benchmarks for counts: {counts}") | ||
|
|
||
| results = run_benchmarks( | ||
| scanner_path=args.scanner_path, | ||
| root_path=args.root_path, | ||
| packages_dir=packages_dir, | ||
| counts=counts, | ||
| dependency=args.dependency, | ||
| version=args.version | ||
| ) | ||
|
|
||
| print("\nBenchmark Results:") | ||
| print(f"{'Packages':<10} | {'Time (seconds)':<15}") | ||
| print("-" * 30) | ||
| for count, duration in results.items(): | ||
| print(f"{count:<10} | {duration:<15.4f}") | ||
|
|
||
| if __name__ == "__main__": | ||
| main() |
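
In the benchmark script above, `run_benchmark` launches the scanner with a hardcoded `python3`, captures stderr without printing it, and returns `-1.0` on failure, which then appears in the results table as a duration. The sketch below shows one possible variation, offered as a suggestion rather than the author's design: `timed_run` is a hypothetical name, it assumes callers can handle `None` for a failed run, and it uses `sys.executable` so the child process runs under the same interpreter (and installed packages) as the benchmark.

```python
import subprocess
import sys
import time
from typing import List, Optional


def timed_run(cmd: List[str]) -> Optional[float]:
    """Run cmd and return its wall-clock duration, or None if it failed.

    Hypothetical helper for illustration only. The captured stderr is
    echoed on failure so the underlying error is visible.
    """
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True)
    duration = time.perf_counter() - start
    if result.returncode != 0:
        print(
            f"Command failed with exit code {result.returncode}:\n{result.stderr}",
            file=sys.stderr,
        )
        return None
    return duration


# Trivial commands stand in for the scanner invocation.
ok = timed_run([sys.executable, "-c", "pass"])
failed = timed_run([sys.executable, "-c", "import sys; sys.exit(3)"])
```

Returning `None` instead of a sentinel float keeps failed runs from being formatted as timings in the results table.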
When I ran this, I got a ModuleNotFoundError. Is there a requirements.txt or anything that captures the dependencies?
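
The comment above does not say which module was missing. Commit titles in this PR mention `import yaml`, so PyYAML is one candidate, but that is an assumption. Until the dependencies are captured somewhere, a small preflight check along these lines could report missing modules before the script runs. `missing_modules` is a hypothetical helper, and the module names passed to it are placeholders.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import.

    Hypothetical helper for illustration only. Pass top-level names
    (for example "yaml"), not dotted submodule paths.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Standard-library modules are always found; the second name is a
# deliberately nonexistent placeholder.
present = missing_modules(["sys", "os"])
absent = missing_modules(["definitely_not_a_real_module_xyz"])
```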