Skip to content

refactor: Refactor CompileParseRules.cc for type safety, modern C++, and documentation#13196

Open
grahamsedman wants to merge 1 commit into
apache:masterfrom
grahamsedman:fix/src-tscore-compile-parse-rules-errors
Open

refactor: Refactor CompileParseRules.cc for type safety, modern C++, and documentation#13196
grahamsedman wants to merge 1 commit into
apache:masterfrom
grahamsedman:fix/src-tscore-compile-parse-rules-errors

Conversation

@grahamsedman
Copy link
Copy Markdown

📌 Summary

This PR refactors src/tscore/CompileParseRules.cc to address compilation errors, type safety issues, modern C++ best practices, and documentation gaps. The changes ensure compatibility across 32-bit and 64-bit targets, fix narrowing conversion warnings, and improve code clarity and maintainability.


✅ Problems Addressed

Problem Root Cause Solution
Narrowing conversion errors char is signed on x86, causing negative values (e.g., -128) for ASCII >127, which cannot be narrowed to unsigned char on 32-bit targets. Use uint8_t for character arrays and explicitly cast to unsigned when writing to files.
Unused header #include "tscore/ink_string.h" is included but never used. Removed the unused header.
Poor header organization Headers were split into two sections, reducing readability. Consolidated all headers at the top of the file.
Missing documentation No Doxygen comments for the file, functions, or variables. Added comprehensive Doxygen documentation.
Non-thread-safe uint_to_binary The function used a static char buf[33], making it unsafe for concurrent use. Refactored to return std::string (RAII-managed, thread-safe).
C-style I/O Used FILE* and fprintf, which are less type-safe and require manual resource management. Replaced with std::ofstream (RAII-based, type-safe).
Poor Doxygen practice @section license causes warnings due to duplicate labels across files. Removed @section and kept the license text directly in the file header.
Unused macro #define COMPILE_PARSE_RULES is defined but never used. Removed the macro.
Outdated data types Used unsigned int and char instead of fixed-width types (uint8_t, uint32_t). Replaced with <cstdint> types for portability and clarity.

🔧 Key Changes

1. File Header & Includes

  • Removed: #define COMPILE_PARSE_RULES, #include "tscore/ink_string.h".
  • Added: <cstdint>, <fstream>, <iomanip>, <string>.
  • Refactored: Consolidated all includes at the top of the file.
  • Documentation: Added a detailed Doxygen file header explaining the purpose, generated files, and modern C++ features used.

2. Data Types

  • Bitmask arrays: Changed from unsigned int to uint32_t for explicit size and future-proofing.
  • Character arrays: Changed from char to uint8_t to guarantee an unsigned range (0-255) and avoid sign-extension issues.
  • Loop variables: Changed from int to uint16_t or uint8_t for semantic clarity.

3. uint_to_binary Function

  • Before: Used a static char buf[33] (non-thread-safe).
  • After: Returns std::string (thread-safe, RAII-managed).
  • Parameter: Changed from unsigned int to uint32_t for explicit size.

4. File I/O

  • Before: Used FILE* and fprintf (C-style, manual resource management).
  • After: Uses std::ofstream (RAII-based, type-safe, modern C++).
  • Format specifiers: Replaced %d with static_cast<unsigned> to avoid sign-extension issues when printing uint8_t values.

5. Doxygen Documentation

  • Added detailed comments for:
    • File purpose and generated outputs.
    • Each global array (parseRulesCType, tparseRulesCType, etc.).
    • The uint_to_binary function.
    • The main function (including its steps and classification functions).

🧪 Testing

  • Compilation: Verified on x86 (signed char) and ARM (unsigned char) targets.
  • Output: Confirmed that generated files (ParseRulesCType, ParseRulesCTypeToUpper, ParseRulesCTypeToLower) contain correct values (no negative numbers for ASCII >127).
  • Thread Safety: The refactored uint_to_binary is now thread-safe.

- Replace C-style file I/O (fopen, fprintf, fclose) with std::ofstream
for RAII-based file handling
- Use fixed-width integer types (uint32_t, uint8_t) instead of unsigned
int and char for portability and clarity
- Refactor uint_to_binary function to use std::string instead of static
buffer for thread safety
- Add comprehensive Doxygen documentation for the file, functions, and
arrays
- Remove @section license License from file header to prevent Doxygen
warnings about multiple use of section label, as @section is intended
for major structured documentation sections, not repetitive boilerplate
- Remove obsolete COMPILE_PARSE_RULES macro and ink_string.h dependency
- Improve output formatting using std::setw, std::setfill, and std::hex
for consistent alignment
- Replace int loop variables with uint16_t for better type safety
- Add static_cast for explicit type conversions
@grahamsedman grahamsedman changed the title feat: Refactor CompileParseRules.cc for type safety, modern C++, and documentation refactor: Refactor CompileParseRules.cc for type safety, modern C++, and documentation May 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant