archaeo_super_prompt.dataset

source package archaeo_super_prompt.dataset

Subpackage for loading all the data.

This covers the metadata tables of Magoh's records, as well as the sets of thesauri related to some fields.

Classes

  • MagohDataset Class to interact with the general training/evaluation dataset.

  • SamplingParams Parameters for sampling records in the training dataset.

Modules

  • thesauri Code for loading thesaurus sets from data files.

source class MagohDataset(params: IdSet | SamplingParams)

Class to interact with the general training/evaluation dataset.

On initialisation, fetches the data from the cache, or from the remote dataset if needed.

Fetches intervention records from Magoh's training database.

Parameters

  • params : IdSet | SamplingParams either a set of intervention identifiers to fetch, or a group of sampling parameters used to fetch intervention records at random
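The two accepted forms of `params` can be pictured with the sketch below. It is only an illustration of the "explicit ids or random sampling" idea: `SamplingParamsSketch`, its `size` and `seed` fields, the integer ids, and `resolve_ids` are all hypothetical stand-ins, since the real fields of `SamplingParams` and the internals of `MagohDataset` are not documented on this page.

```python
import random
from typing import NamedTuple, Set, Union

InterventionId = int  # assumption: ids are modelled as plain integers here


class SamplingParamsSketch(NamedTuple):
    """Hypothetical stand-in for SamplingParams (real fields are not documented)."""

    size: int  # hypothetical: number of records to draw
    seed: int  # hypothetical: seed making the draw reproducible


def resolve_ids(
    params: Union[Set[InterventionId], SamplingParamsSketch],
    known_ids: Set[InterventionId],
) -> Set[InterventionId]:
    """Return the ids to fetch, from explicit ids or from sampling parameters."""
    if isinstance(params, SamplingParamsSketch):
        # Sampling case: draw reproducibly from the ids known to the dataset.
        rng = random.Random(params.seed)
        size = min(params.size, len(known_ids))
        return set(rng.sample(sorted(known_ids), size))
    # Explicit case: the caller already chose the ids.
    return set(params)
```

With this sketch, `resolve_ids({2, 9}, known_ids)` returns the given ids unchanged, while `resolve_ids(SamplingParamsSketch(size=3, seed=0), known_ids)` draws three ids and yields the same draw on every call with the same seed.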

Attributes

  • intervention_data A DataFrame with the truth metadata of registered records in Magoh.

  • legacy_intervention_data : DataFrame[OutputStructuredDataSchema] The intervention data in the old schema for the legacy model.

  • findings Return a dataframe with the fetched findings data.

  • files Return all the files with their related intervention id.

Methods

source property MagohDataset.intervention_data

A DataFrame with the truth metadata of registered records in Magoh.

source property MagohDataset.legacy_intervention_data: DataFrame[OutputStructuredDataSchema]

The intervention data in the old schema for the legacy model.

source method MagohDataset.get_answer(id_: InterventionId) → ExtractedStructuredDataSeries

Return the metadata of a Magoh record with the given id.

Raises

  • Exception

source method MagohDataset.filter_good_records_for_training(ids: set[InterventionId], condition: Callable[[DataFrame[FeaturedOutputStructureDataSchema]], Series[bool]]) → set[InterventionId]

Return only the ids for which the intervention records match a given condition.

Parameters

  • ids : set[InterventionId] the set of interventions to select

  • condition : Callable[[DataFrame[FeaturedOutputStructureDataSchema]], Series[bool]] a function that takes the training metadata dataframe and returns a boolean series, used as a mask to filter out the records with unusable values
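A `condition` callable of this shape might look like the sketch below. It uses plain pandas rather than the package's typed `DataFrame[FeaturedOutputStructureDataSchema]`, and the column names `id`, `site_name`, and `start_year` are hypothetical, chosen only for illustration. `filter_ids` is likewise a hypothetical stand-in showing how such a mask could be applied; it is not the actual implementation of `filter_good_records_for_training`.

```python
import pandas as pd


def has_usable_values(df: pd.DataFrame) -> pd.Series:
    """Boolean mask: True for rows whose (hypothetical) fields are usable."""
    return df["site_name"].notna() & (df["start_year"] > 0)


def filter_ids(df: pd.DataFrame, ids: set, condition) -> set:
    """Keep only the requested ids whose rows satisfy the condition."""
    mask = condition(df)
    kept = {int(i) for i in df.loc[mask, "id"]}
    return kept & ids


# Toy metadata table with hypothetical columns.
records = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "site_name": ["Pisa", None, "Lucca"],
        "start_year": [1990, 2001, 0],
    }
)
good_ids = filter_ids(records, {1, 2, 3}, has_usable_values)
```

Here only record 1 has both a site name and a positive start year, so it is the only id kept.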

source method MagohDataset.get_answers(ids: set[InterventionId])

Return the answers for each of the asked interventions.

Raises

  • Exception

source property MagohDataset.findings

Return a dataframe with the fetched findings data.

source method MagohDataset.get_files_for_batch(ids: set[InterventionId])

Return only the files related to the given intervention ids.
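Restricting a files table to a batch of ids can be sketched as follows, assuming the table is a pandas DataFrame with one row per file. The column names `id` and `filepath` and the function `files_for_batch` are hypothetical and do not describe the actual layout returned by `MagohDataset.files`.

```python
import pandas as pd


def files_for_batch(files: pd.DataFrame, ids: set) -> pd.DataFrame:
    """Return the rows of `files` whose intervention id is in `ids`."""
    return files[files["id"].isin(list(ids))]


# Toy files table with hypothetical columns.
all_files = pd.DataFrame(
    {
        "id": [1, 1, 2, 3],
        "filepath": ["a.pdf", "b.pdf", "c.pdf", "d.pdf"],
    }
)
batch = files_for_batch(all_files, {1, 3})
```

An intervention with several files keeps all of its rows, so the batch for ids 1 and 3 contains three files.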

source property MagohDataset.files

Return all the files with their related intervention id.

source class SamplingParams()

Bases : NamedTuple

Parameters for sampling records in the training dataset.

source package thesauri

Code for loading thesaurus sets from data files.