archaeo_super_prompt.modeling.entity_extractor.ne_selector
module archaeo_super_prompt.modeling.entity_extractor.ne_selector
Module for the Named Entity Selector class, with thesaurus fuzzy matching.
Classes
- NeSelector: Filter of chunks according to wanted strings among the entities.
class NeSelector(field_name: str, compatible_entities: set[NerXXLEntities], wanted_matches: ThesaurusProvider, keep_chunks_without_identified_values=False)
Bases : BaseTransformer
Filter of chunks according to wanted strings among the entities.
Initialize the Named Entity Selector from the data about the field.
Parameters
- field_name : str: a label describing the entities to be extracted
- compatible_entities : set[NerXXLEntities]: the set of entity types to consider when selecting the chunks
- wanted_matches : ThesaurusProvider: a frozen function that returns, at runtime, the list of wanted matches (which can be huge)
- keep_chunks_without_identified_values: if True, chunks containing entities of the desired entity types are always kept, even if no thesaurus term has been identified among these entities. If False, these chunks are kept only when no chunk at all contains an identified thesaurus term.
Returns
- A Transformer that selects only the chunks in which thesaurus terms occur.
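The effect of keep_chunks_without_identified_values can be illustrated with a minimal sketch. It is not the NeSelector implementation: the Chunk dataclass and the apply_keep_policy function are hypothetical names introduced here, and the sketch assumes that the compatible entities and the thesaurus matches of each chunk have already been computed.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for one chunk row (not the library's schema)."""
    text: str
    # Entities whose type belongs to the compatible entity types.
    compatible_entities: list[str] = field(default_factory=list)
    # Subset of the compatible entities that matched a thesaurus term.
    thesaurus_matches: list[str] = field(default_factory=list)


def apply_keep_policy(
    chunks: list[Chunk],
    keep_chunks_without_identified_values: bool = False,
) -> list[Chunk]:
    """Select chunks following the documented behaviour of the flag."""
    with_compatible = [c for c in chunks if c.compatible_entities]
    with_match = [c for c in with_compatible if c.thesaurus_matches]
    if keep_chunks_without_identified_values:
        # Always keep chunks with compatible entities, matched or not.
        return with_compatible
    # Otherwise fall back to unmatched chunks only if nothing matched at all.
    return with_match if with_match else with_compatible


chunks = [
    Chunk("a", ["Roma"], ["Roma"]),
    Chunk("b", ["Xyz"], []),
    Chunk("c"),
]
print([c.text for c in apply_keep_policy(chunks)])
print([c.text for c in apply_keep_policy(chunks, True)])
```

With the flag left at False, the unmatched chunk "b" is returned only when no other chunk has a thesaurus match, which is the fallback case described for the parameter.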
Methods
- transform: Filter the identified named entities and filter the chunks.
method NeSelector.transform(X: DataFrame[ChunksWithEntities]) → DataFrame[ChunksWithThesaurus]
Filter the identified named entities and filter the chunks.
According to the information about the field to be extracted, filter the named entities for each chunk and keep only chunks with a non-empty filtered named-entities list.
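The two filtering steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the method's actual code: the real transform works on DataFrames, whereas this sketch uses lists of tuples; filter_entities and filter_chunks are hypothetical helpers; the "LOC" and "PER" labels are placeholders for NerXXLEntities values; and the standard-library difflib stands in for whatever fuzzy matcher the module uses.

```python
from difflib import get_close_matches

# (entity_type, surface_text); a simplified, assumed representation.
Entity = tuple[str, str]


def filter_entities(
    entities: list[Entity],
    compatible_types: set[str],
    thesaurus: list[str],
    cutoff: float = 0.8,  # assumed similarity threshold
) -> list[str]:
    """Return the thesaurus terms fuzzily matched by compatible entities."""
    lowered = {term.lower(): term for term in thesaurus}
    found = []
    for entity_type, surface in entities:
        if entity_type not in compatible_types:
            continue  # ignore entities of an unwanted type
        close = get_close_matches(surface.lower(), list(lowered), n=1, cutoff=cutoff)
        if close:
            found.append(lowered[close[0]])
    return found


def filter_chunks(
    chunks: list[tuple[str, list[Entity]]],
    compatible_types: set[str],
    thesaurus: list[str],
) -> list[tuple[str, list[str]]]:
    """Keep only the chunks whose filtered entity list is non-empty."""
    kept = []
    for text, entities in chunks:
        matches = filter_entities(entities, compatible_types, thesaurus)
        if matches:
            kept.append((text, matches))
    return kept


chunks = [
    ("Scavo a Firenze", [("LOC", "Firenz")]),  # misspelt, still close enough
    ("Testo", [("PER", "Firenze")]),           # wrong entity type
    ("Altro", [("LOC", "Milano")]),            # not in the thesaurus
]
print(filter_chunks(chunks, {"LOC"}, ["Firenze", "Roma"]))
```

Only the first chunk survives here: its entity has a compatible type and is close enough to a thesaurus term, while the other two fail on the entity type and on the thesaurus match respectively.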