
NoParser

A no-op parser that returns raw response content without any processing.

This is useful when you only need the raw response data and don't require HTML parsing, link extraction, or content selection functionality.
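
A minimal sketch of the no-op idea follows. `FakeResponse` and `RawBodyParser` are hypothetical stand-ins, not classes from the library documented here, and the real `parse` method's signature may differ. For example, it may be asynchronous or take a different response type. The point is only that the body comes back untouched and interpreting it is the caller's job.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only the raw body is modelled."""

    body: bytes


class RawBodyParser:
    """Illustrative no-op parser: returns the body exactly as received."""

    def parse(self, response: FakeResponse) -> bytes:
        # No decoding and no DOM building: the raw bytes are the "parsed" result.
        return response.body


content = RawBodyParser().parse(FakeResponse(body=b'{"status": "ok"}'))
# Interpreting the bytes (here as JSON) is left entirely to the caller.
data = json.loads(content)
```

This fits responses such as JSON APIs or binary downloads, where building an HTML tree would be wasted work.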

Methods

find_links

  • find_links(parsed_content, selector): Iterable[str]

    Find all links in the parsed content that match the selector.

    Parameters

    • parsed_content: TParseResult

      Parsed HTTP response, i.e. the result of the parse method.

    • selector: str

      A string defining the matching pattern used to find links.

    Returns Iterable[str]

    An iterable of strings, one per link found.
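
The return contract, an iterable of link strings, can be sketched with a toy implementation for a parser that does read HTML. Here `parsed_content` is assumed to be the HTML text and `selector` a bare tag name rather than a full CSS selector. Both are simplifications made for the sketch, and a no-op parser, which skips link extraction, would not produce links this way.

```python
from collections.abc import Iterable
from html.parser import HTMLParser
from typing import Optional


class _HrefCollector(HTMLParser):
    """Collects href values from start tags whose name equals `tag`."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag.lower()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        # html.parser lowercases tag and attribute names before calling this.
        if tag == self.tag:
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_links(parsed_content: str, selector: str) -> Iterable[str]:
    """Toy find_links: `selector` is treated as a bare tag name, not full CSS."""
    collector = _HrefCollector(selector)
    collector.feed(parsed_content)
    collector.close()
    return collector.hrefs
```

Elements without an `href` are skipped, so only actual link targets reach the caller.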

is_blocked

  • is_blocked(parsed_content): BlockedInfo

    Detect whether the response was blocked and return a BlockedInfo object with additional information.

    This default implementation relies on the abstract is_matching_selector method being implemented. Override this method if your parser detects blockage in a different way.

    Parameters

    • parsed_content: TParseResult

      Parsed HTTP response, i.e. the result of the parse method.

    Returns BlockedInfo

    A BlockedInfo object whose reason is a non-empty description if blockage was detected. An empty reason string means no blockage was detected.
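
The default strategy described above, delegating to is_matching_selector and reporting a reason only on a match, can be sketched as follows. `BlockedInfo` here is a minimal stand-in with just a `reason` field. `BLOCKING_SELECTORS` and the substring-based `naive_is_matching_selector` are hypothetical placeholders for whatever selectors and matcher a real parser uses. Passing the matcher in as a function stands in for the abstract method a concrete parser would override.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockedInfo:
    """Stand-in for BlockedInfo: an empty `reason` means no blockage was detected."""

    reason: str


# Hypothetical selectors that would indicate a block page (illustrative only).
BLOCKING_SELECTORS = ("#captcha", ".access-denied")


def naive_is_matching_selector(parsed_content: str, selector: str) -> bool:
    """Toy matcher: looks for the selector name (minus '#' or '.') as a substring."""
    return selector.lstrip("#.") in parsed_content


def is_blocked(
    parsed_content: str,
    is_matching_selector: Callable[[str, str], bool] = naive_is_matching_selector,
    selectors: Iterable[str] = BLOCKING_SELECTORS,
) -> BlockedInfo:
    """Default-style detection: report every blocking selector that matches."""
    matched = [s for s in selectors if is_matching_selector(parsed_content, s)]
    if matched:
        return BlockedInfo(reason=f"Blocking selector(s) matched: {', '.join(matched)}")
    return BlockedInfo(reason="")


print(is_blocked('<div id="captcha">Prove you are human</div>').reason)
# Blocking selector(s) matched: #captcha
print(repr(is_blocked("<p>Welcome</p>").reason))
# ''
```

One consequence of this design: a parser whose matcher never reports a match will always get an empty reason, i.e. "not blocked".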

is_matching_selector

  • is_matching_selector(parsed_content, selector): bool

    Check whether the selector matches anything in the parsed content.

    Parameters

    • parsed_content: TParseResult

      Parsed HTTP response, i.e. the result of the parse method.

    • selector: str

      A string defining the matching pattern.

    Returns bool

    True if the selector matches anything in the parsed content.
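
The "has a match" contract can be illustrated with a toy matcher built on the standard library's `html.parser`. The supported selector subset (`tag`, `#id`, `.class`) is an assumption made to keep the sketch short. It is not full CSS and not the matching engine of any parser documented here. A no-op parser that keeps the body as raw bytes has nothing comparable.

```python
from html.parser import HTMLParser
from typing import Optional


class _SelectorProbe(HTMLParser):
    """Records whether any start tag matches a single simple selector."""

    def __init__(self, selector: str) -> None:
        super().__init__()
        self.selector = selector
        self.matched = False

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        attributes = dict(attrs)
        if self.selector.startswith("#"):  # id selector, e.g. "#captcha"
            hit = attributes.get("id") == self.selector[1:]
        elif self.selector.startswith("."):  # class selector, e.g. ".warning"
            hit = self.selector[1:] in (attributes.get("class") or "").split()
        else:  # bare tag name, e.g. "form"
            hit = tag == self.selector.lower()
        self.matched = self.matched or hit


def is_matching_selector(parsed_content: str, selector: str) -> bool:
    """Toy matcher: supports only `tag`, `#id` and `.class` selectors."""
    probe = _SelectorProbe(selector)
    probe.feed(parsed_content)
    probe.close()
    return probe.matched
```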

parse

parse_text

select

  • select(parsed_content, selector): Sequence[TSelectResult]

    Use a CSS selector to select page elements and return them.

    Parameters

    • parsed_content: TParseResult

      Content in which the page elements are located.

    • selector: str

      CSS selector used to locate the desired HTML elements.

    Returns Sequence[TSelectResult]

    The selected elements.
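
A toy `select` makes the `Sequence[TSelectResult]` return type concrete. The sketch assumes that `parsed_content` is HTML text, that `selector` is a bare tag name instead of a CSS selector, and that each selected element is represented by its attribute dictionary. Under those assumptions, `TSelectResult` is `dict[str, Optional[str]]`. Raw bytes returned by a no-op parser would first need real parsing before anything could be selected from them.

```python
from collections.abc import Sequence
from html.parser import HTMLParser
from typing import Optional


class _TagCollector(HTMLParser):
    """Collects the attributes of every start tag with the requested name."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag.lower()
        self.found: list[dict[str, Optional[str]]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        # Tag names arrive lowercased, so "<IMG>" and "<img>" both match "img".
        if tag == self.tag:
            self.found.append(dict(attrs))


def select(parsed_content: str, selector: str) -> Sequence[dict[str, Optional[str]]]:
    """Toy select: `selector` is a bare tag name; each match is its attribute dict."""
    collector = _TagCollector(selector)
    collector.feed(parsed_content)
    collector.close()
    return collector.found
```

Returning a sequence rather than a single element lets callers handle zero, one, or many matches uniformly.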