fix(scraper): guard against AttributeError when img has no parent in process_image #1995

Open
devteamaegis wants to merge 1 commit into unclecode:main from devteamaegis:fix/process-image-guard-null-parent

@devteamaegis

Bug

process_image calls img.getparent() and immediately accesses parent.tag without checking whether parent is None:

parent = img.getparent()
if parent.tag in ["button", "input"]:   # ← AttributeError if parent is None
    return None
parent_classes = parent.get("class", "").split()   # ← same crash

In lxml, getparent() returns None for an element that has no parent, such as a tree root or a detached element. This can happen when remove_empty_elements_fast or other tree modifications detach nodes while <img> elements are still referenced. Any such image passed to process_image will crash:

AttributeError: 'NoneType' object has no attribute 'tag'
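
The failure mode can be reproduced outside the scraper. The following is a minimal sketch, assuming lxml is installed; the markup and variable names are illustrative and not taken from the project:

```python
from lxml import etree

# Build a tiny tree, then detach the <img> while keeping a reference to it.
root = etree.fromstring('<div><img src="a.png"/></div>')
img = root[0]
root.remove(img)

# A detached element has no parent.
parent = img.getparent()

# Accessing an attribute on None raises the error reported above.
try:
    parent.tag
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Note that the image only loses its parent when the image itself is detached; this sketch does not attempt to mirror how the scraper's own tree modifications reach that state.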

Root cause

The companion removal path at the call site (line ~328) already guards this correctly:

parent = img.getparent()
if parent is not None:
    parent.remove(img)

But process_image itself, where the crash actually occurs, does not.

Fix

Add a None check before accessing .tag / .get(). When the image has no parent, treat parent_classes as empty and let the image proceed to normal scoring. There is no parent context to filter on, so the safest default is to include the image.

parent = img.getparent()
if parent is None:
    parent_classes = []
else:
    if parent.tag in ["button", "input"]:
        return None
    parent_classes = parent.get("class", "").split()
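
The three cases the guard distinguishes can be exercised in isolation. This is only a sketch of the branching logic, not the project's code: parent_filter and StubElement are hypothetical names introduced here, and the stub stands in for an lxml element so the example runs without lxml.

```python
def parent_filter(img):
    """Hypothetical helper: return (skip, parent_classes) for an <img>-like element."""
    parent = img.getparent()
    if parent is None:
        # Detached image: no parent context to filter on, so keep it.
        return False, []
    if parent.tag in ("button", "input"):
        # Images inside buttons or inputs are skipped.
        return True, []
    return False, parent.get("class", "").split()


class StubElement:
    """Minimal stand-in for an lxml element, for illustration only."""

    def __init__(self, tag, attrib=None, parent=None):
        self.tag = tag
        self.attrib = attrib or {}
        self._parent = parent

    def getparent(self):
        return self._parent

    def get(self, key, default=None):
        return self.attrib.get(key, default)


detached = StubElement("img")
in_button = StubElement("img", parent=StubElement("button"))
in_div = StubElement("img", parent=StubElement("div", {"class": "hero wide"}))

results = [parent_filter(e) for e in (detached, in_button, in_div)]
print(results)
```

The detached image is kept with an empty class list, the image inside a button is skipped, and the image inside a div carries its parent's classes forward.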
