archaeo_super_prompt.modeling.entity_extractor.ne_selector

source module archaeo_super_prompt.modeling.entity_extractor.ne_selector

Module for the Named Entities Selector class, which uses fuzzy matching against a thesaurus.

Classes

  • NeSelector Filter chunks according to the wanted strings among the entities.

source class NeSelector(field_name: str, compatible_entities: set[NerXXLEntities], wanted_matches: ThesaurusProvider, keep_chunks_without_identified_values=False)

Bases : BaseTransformer

Filter chunks according to the wanted strings among the entities.

Initialize the Named Entity Selector from the data about the field.

Parameters

  • field_name : str a label describing the entities to be extracted

  • compatible_entities : set[NerXXLEntities] a set of entity types to consider for selecting the chunks

  • wanted_matches : ThesaurusProvider a frozen function that returns, at runtime, the list of wanted matches (which can be huge)

  • keep_chunks_without_identified_values if True, chunks containing entities of the desired entity types are always kept, even if no thesaurus term has been identified among these entities. If False, these chunks are kept only if there is no chunk in which a thesaurus term has been identified.

Returns

  • A Transformer that selects only the chunks in which named thesaurus terms occur.
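
The effect of keep_chunks_without_identified_values can be sketched in plain Python. This is a minimal illustration of the rule described in the parameter list, not the class's actual implementation: the function select_chunks, the dictionary keys has_compatible_entity and matches, and the assumption that the fallback is evaluated over the whole list passed in are all hypothetical.

```python
def select_chunks(chunks, keep_chunks_without_identified_values=False):
    """Apply the keep/fallback rule to a list of chunk dicts (hypothetical layout).

    Each chunk has:
      - "has_compatible_entity": True if it holds an entity of a wanted type
      - "matches": thesaurus terms identified among those entities
    """
    # Chunks without any compatible entity are never kept.
    candidates = [c for c in chunks if c["has_compatible_entity"]]
    with_matches = [c for c in candidates if c["matches"]]

    if keep_chunks_without_identified_values:
        # Always keep every candidate, matched or not.
        return candidates
    # Otherwise keep unmatched candidates only when nothing matched at all.
    return with_matches if with_matches else candidates


chunks = [
    {"id": "a", "has_compatible_entity": True, "matches": ["necropoli"]},
    {"id": "b", "has_compatible_entity": True, "matches": []},
    {"id": "c", "has_compatible_entity": False, "matches": []},
]
strict = select_chunks(chunks)
lenient = select_chunks(chunks, keep_chunks_without_identified_values=True)
fallback = select_chunks(chunks[1:])  # no chunk has a thesaurus match
```

With the default value, only chunk "a" survives; with the flag set, "a" and "b" are both kept; and when no chunk has a match, the unmatched candidate "b" is kept as a fallback.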

Methods

  • transform Filter the identified named entities and filter the chunks.

source method NeSelector.transform(X: DataFrame[ChunksWithEntities]) → DataFrame[ChunksWithThesaurus]

Filter the identified named entities and filter the chunks.

According to the information about the field to be extracted, filter the named entities for each chunk and keep only chunks with a non-empty filtered named-entities list.
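
The per-chunk filtering described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the library's code: plain dictionaries stand in for the ChunksWithEntities and ChunksWithThesaurus frames, the entity type labels ("SITE", "PERIOD", "LOCATION") are made up rather than taken from NerXXLEntities, the fuzzy matching is done with the standard library's difflib instead of whatever matcher the module uses, and the keep_chunks_without_identified_values fallback is left out.

```python
from difflib import get_close_matches


def filter_entities(chunks, compatible_types, thesaurus, cutoff=0.8):
    """Keep, per chunk, the thesaurus terms fuzzy-matched by compatible entities.

    chunks: list of {"text": str, "entities": [(entity_text, entity_type), ...]}
    Returns only the chunks whose filtered match list is non-empty.
    """
    selected = []
    for chunk in chunks:
        matches = []
        for entity_text, entity_type in chunk["entities"]:
            if entity_type not in compatible_types:
                continue  # wrong entity type for this field
            # Best thesaurus candidate above the similarity cutoff, if any.
            close = get_close_matches(entity_text.lower(), thesaurus, n=1, cutoff=cutoff)
            if close:
                matches.append(close[0])
        if matches:  # drop chunks with an empty filtered list
            selected.append({"text": chunk["text"], "matches": matches})
    return selected


chunks = [
    {"text": "Scavo di una necropoli romana.",
     "entities": [("necropoli", "SITE"), ("romana", "PERIOD")]},
    {"text": "Relazione firmata a Firenze.",
     "entities": [("Firenze", "LOCATION")]},
    {"text": "Resti di una vila rustica.",
     "entities": [("vila rustica", "SITE")]},
]
thesaurus = ["necropoli", "villa rustica", "insediamento"]
result = filter_entities(chunks, {"SITE"}, thesaurus)
```

Here the second chunk is dropped because it has no entity of a compatible type, while the misspelled "vila rustica" in the third chunk still resolves to the thesaurus term "villa rustica".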