archaeo_super_prompt.modeling.entity_extractor.model
module archaeo_super_prompt.modeling.entity_extractor.model
Core functions for inferring and filtering named entities in chunks.
Functions
- fetch_entities: Run inference on the remote NER model to find the named entities in each chunk.
- gatherEntityChunks: Gather the entity-output chunks produced from one text chunk.
- postrocess_entities: Return the set of entities that occurred in each chunk.
- filter_entities: For each text chunk, keep only the entities whose type is in the given set of allowed entity types.
fetch_entities(chunks: list[str])
Run inference on the remote NER model to find the named entities in each chunk.
gatherEntityChunks(entity_chunks: list[NerOutput], confidence_treshold: float)
Gather the entity-output chunks produced from one text chunk.
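The one-line description above does not say how the gathering is done. The sketch below shows one plausible reading, assuming the model returns an entity in several fragments that must be stitched back together before the confidence threshold is applied. The `Fragment` class and `gather_fragments` function are hypothetical stand-ins, not the module's `NerOutput` type or its `gatherEntityChunks` implementation, and the merge rule (same label, touching or one character apart) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical stand-in for NerOutput: one raw span returned by the model."""
    label: str    # entity type, e.g. "LOCATION"
    text: str     # surface text of the span
    start: int    # character offset (inclusive) in the chunk
    end: int      # character offset (exclusive) in the chunk
    score: float  # model confidence in [0, 1]


def gather_fragments(fragments: list[Fragment], confidence_threshold: float) -> list[Fragment]:
    """Merge touching same-label fragments, then drop low-confidence results."""
    merged: list[Fragment] = []
    for frag in sorted(fragments, key=lambda f: f.start):
        last = merged[-1] if merged else None
        if last is not None and last.label == frag.label and frag.start <= last.end + 1:
            # Touching, or separated by one character (assumed to be a space):
            # extend the previous span instead of starting a new one.
            gap = " " * max(0, frag.start - last.end)
            merged[-1] = Fragment(
                label=last.label,
                text=last.text + gap + frag.text,
                start=last.start,
                end=max(last.end, frag.end),
                score=min(last.score, frag.score),  # the weakest fragment decides
            )
        else:
            merged.append(frag)
    # Assumption: an entity is kept when its score is at or above the threshold.
    return [f for f in merged if f.score >= confidence_threshold]


raw = [
    Fragment("LOCATION", "San", 0, 3, 0.95),
    Fragment("LOCATION", "Gimignano", 4, 13, 0.90),
    Fragment("DATE", "1998", 20, 24, 0.40),
]
print(gather_fragments(raw, confidence_threshold=0.5))
```

Taking the minimum score of the merged fragments is a conservative choice made for this sketch; averaging would be an equally reasonable alternative.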
postrocess_entities(entitiesPerTextChunk: list[list[NerOutput]], confidence_treshold: float)
Return the set of entities that occurred in each chunk.
Parameters
- entitiesPerTextChunk : list[list[NerOutput]]: for each chunk, the list of its retrieved entities, ordered by their occurrence in the chunk's text content
- confidence_treshold : float: a threshold between 0 and 1; entities whose confidence falls below it are discarded
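To illustrate how a confidence threshold and per-chunk deduplication can combine, here is a minimal sketch. It does not reproduce the module's code: `Entity` and `unique_entities_per_chunk` are hypothetical names, the `(label, text)` pair used to identify an entity is an assumption, and so is the choice of `>=` for the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for NerOutput."""
    label: str    # entity type
    text: str     # surface text
    score: float  # model confidence in [0, 1]


def unique_entities_per_chunk(
    entities_per_chunk: list[list[Entity]], confidence_threshold: float
) -> list[list[tuple[str, str]]]:
    """For each chunk, keep each confident (label, text) pair once, in first-seen order."""
    result: list[list[tuple[str, str]]] = []
    for entities in entities_per_chunk:
        seen: dict[tuple[str, str], None] = {}
        for ent in entities:
            if ent.score >= confidence_threshold:
                # A dict preserves insertion order, so duplicates collapse
                # onto their first occurrence in the chunk.
                seen.setdefault((ent.label, ent.text), None)
        result.append(list(seen))
    return result


chunks = [
    [Entity("LOCATION", "Pisa", 0.9), Entity("LOCATION", "Pisa", 0.8), Entity("DATE", "1998", 0.3)],
    [],  # a chunk with no entities still yields an (empty) entry
]
print(unique_entities_per_chunk(chunks, confidence_threshold=0.5))
```

The output keeps one entry per input chunk, so results can still be matched to chunks by position.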
filter_entities(complete_entity_sets: list[list[CompleteEntity]], allowed_entities: set[NerXXLEntities]) → list[list[CompleteEntity]]
For each text chunk, keep only the entities included in the given group of allowed entity types.
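The filtering described above can be sketched as a nested comprehension. This is an illustration under assumptions rather than the module's implementation: `Entity` is a hypothetical stand-in for `CompleteEntity`, plain strings stand in for `NerXXLEntities` values, and `keep_allowed` is an invented name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for CompleteEntity."""
    label: str  # stand-in for a NerXXLEntities value
    text: str


def keep_allowed(entity_sets: list[list[Entity]], allowed: set[str]) -> list[list[Entity]]:
    """Keep, per chunk, only the entities whose label is in the allowed set."""
    # The outer list keeps one entry per chunk, even when a chunk ends up empty.
    return [[e for e in entities if e.label in allowed] for entities in entity_sets]


entity_sets = [
    [Entity("LOCATION", "Pisa"), Entity("PERSON", "Rossi")],
    [Entity("DATE", "1998")],
]
print(keep_allowed(entity_sets, allowed={"LOCATION", "DATE"}))
```

Keeping empty inner lists, instead of dropping them, preserves the positional correspondence between chunks and their entities.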