archaeo_super_prompt.dataset
package archaeo_super_prompt.dataset
Subpackage for loading all the data.
This covers the metadata tables of Magoh's records, as well as the thesaurus sets related to some fields.
Classes
- MagohDataset — Class to interact with the general training/evaluation dataset.
- SamplingParams — Parameters for sampling records in the training dataset.
Modules
- thesauri — Code for loading thesaurus sets from data files.
class MagohDataset(params: IdSet | SamplingParams)
Class to interact with the general training/evaluation dataset.
On initialisation, it fetches intervention records from Magoh's training database, reading them from the cache or, if needed, from the remote dataset.
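The cache-or-remote loading described above follows a common pattern, sketched here under stated assumptions. This is not the package's implementation: the JSON cache file, the `load_with_cache` name and the `fetch_remote` callable are hypothetical stand-ins chosen only to illustrate the idea.

```python
import json
from pathlib import Path
from typing import Callable


def load_with_cache(
    cache_file: Path,
    fetch_remote: Callable[[], list[dict]],
) -> list[dict]:
    """Return records from a local cache file, fetching them remotely on a miss.

    Hypothetical sketch: `cache_file` and `fetch_remote` are illustrative
    stand-ins, not part of archaeo_super_prompt.dataset.
    """
    if cache_file.exists():
        # Cache hit: reuse the records stored by a previous run.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    # Cache miss: query the remote source once, then persist the result.
    records = fetch_remote()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(records), encoding="utf-8")
    return records
```

On a second call with the same `cache_file`, the remote source is not contacted again.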
Parameters
- params : IdSet | SamplingParams — either a set of intervention identifiers to fetch, or a group of sampling parameters used to fetch a random selection of intervention records
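Because `params` accepts either an id set or sampling parameters, the constructor has to branch on the argument's type. The sketch below shows one plausible way to do that. The `IdSet` alias, the `sample_size` and `seed` fields of this stand-in `SamplingParams`, the `resolve_ids` helper and the `all_ids` pool are all hypothetical, since the real fields are not documented here.

```python
from __future__ import annotations

import random
from typing import NamedTuple

# Hypothetical alias: the real IdSet type is not documented here.
IdSet = set[int]


class SamplingParams(NamedTuple):
    """Hypothetical stand-in; the real SamplingParams fields are not documented."""

    sample_size: int
    seed: int = 0


def resolve_ids(params: IdSet | SamplingParams, all_ids: IdSet) -> IdSet:
    """Turn the constructor argument into a concrete set of intervention ids."""
    if isinstance(params, SamplingParams):
        # Sampling mode: draw a reproducible random subset of the available ids.
        rng = random.Random(params.seed)
        pool = sorted(all_ids)  # random.sample needs a sequence, not a set
        return set(rng.sample(pool, min(params.sample_size, len(pool))))
    # Explicit mode: keep only the requested ids that actually exist.
    return params & all_ids
```

Sorting the pool before sampling makes the draw depend only on the seed and the ids themselves, not on set iteration order.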
Attributes
- intervention_data — A DataFrame with the ground-truth metadata of registered records in Magoh.
- legacy_intervention_data : DataFrame[OutputStructuredDataSchema] — The intervention data in the old schema, for the legacy model.
- findings — A DataFrame with the fetched findings data.
- files — All the files, each with its related intervention id.
Methods
- get_answer — Return the metadata of the Magoh record with the given id.
- filter_good_records_for_training — Return only the ids whose intervention records match a given condition.
- get_answers — Return the answers for each of the requested interventions.
- get_files_for_batch — Return only the files related to the given intervention ids.
property MagohDataset.intervention_data
A DataFrame with the ground-truth metadata of registered records in Magoh.
property MagohDataset.legacy_intervention_data: DataFrame[OutputStructuredDataSchema]
The intervention data in the old schema, for the legacy model.
method MagohDataset.get_answer(id_: InterventionId) → ExtractedStructuredDataSeries
Return the metadata of the Magoh record with the given id.
Raises
- Exception
method MagohDataset.filter_good_records_for_training(ids: set[InterventionId], condition: Callable[[DataFrame[FeaturedOutputStructureDataSchema]], Series[bool]]) → set[InterventionId]
Return only the ids whose intervention records match a given condition.
Parameters
- ids : set[InterventionId] — the set of intervention ids to select from
- condition : Callable[[DataFrame[FeaturedOutputStructureDataSchema]], Series[bool]] — a function that takes the training metadata DataFrame and returns a boolean Series used to filter out records with unusable values
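The `condition` argument is an ordinary callable from a metadata DataFrame to a boolean Series. The hedged sketch below uses plain pandas: the `id` and `year_start` columns, the `has_usable_dates` rule and the `filter_ids` helper are hypothetical, and `filter_ids` only mimics the documented contract (ids in, matching ids out) rather than reproducing the method's code.

```python
from collections.abc import Callable

import pandas as pd


def has_usable_dates(df: pd.DataFrame) -> pd.Series:
    """Hypothetical condition: keep records whose start year is known and positive."""
    return df["year_start"].notna() & (df["year_start"] > 0)


def filter_ids(
    df: pd.DataFrame,
    ids: set[int],
    condition: Callable[[pd.DataFrame], pd.Series],
) -> set[int]:
    """Keep the requested ids whose rows satisfy `condition` (illustrative only)."""
    # Restrict to the requested interventions first, then apply the condition.
    selected = df[df["id"].isin(ids)]
    mask = condition(selected)
    return set(selected.loc[mask, "id"].tolist())
```

Keeping the condition as a separate function lets different training runs swap in different notions of a "good" record without touching the selection logic.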
method MagohDataset.get_answers(ids: set[InterventionId])
Return the answers for each of the requested interventions.
Raises
- Exception
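The relationship between `get_answer` and `get_answers` can be pictured with a small stand-in class. This sketch rests on assumptions: the records are plain dicts keyed by id, `AnswerStore` is a hypothetical name, and the documented `Exception` is assumed here to be a `KeyError` for an unknown id, which the documentation does not state.

```python
class AnswerStore:
    """Hypothetical stand-in for the answer-lookup part of MagohDataset."""

    def __init__(self, records: dict[int, dict]):
        self._records = records

    def get_answer(self, id_: int) -> dict:
        # Assumption: an unknown id raises, mirroring the documented "Raises: Exception".
        if id_ not in self._records:
            raise KeyError(f"no intervention record with id {id_}")
        return self._records[id_]

    def get_answers(self, ids: set[int]) -> dict[int, dict]:
        # One lookup per requested id; an unknown id propagates the error.
        return {id_: self.get_answer(id_) for id_ in ids}
```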
property MagohDataset.findings
A DataFrame with the fetched findings data.
method MagohDataset.get_files_for_batch(ids: set[InterventionId])
Return only the files related to the given intervention ids.
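Selecting the files of a batch amounts to a membership filter on the intervention id. In the sketch below, `FileEntry`, its fields and `files_for_batch` are hypothetical; the documented `files` property may use a different structure, for instance a DataFrame.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical record pairing a file with the intervention it belongs to."""

    intervention_id: int
    path: Path


def files_for_batch(files: list[FileEntry], ids: set[int]) -> list[FileEntry]:
    """Keep only the files whose intervention id is in the requested batch."""
    # Set membership keeps the check O(1) per file, preserving the input order.
    return [f for f in files if f.intervention_id in ids]
```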
property MagohDataset.files
All the files, each with its related intervention id.
class SamplingParams()
Bases: NamedTuple
Parameters for sampling records in the training dataset.
module archaeo_super_prompt.dataset.thesauri
Code for loading thesaurus sets from data files.
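The `thesauri` module is only summarised here. As a rough idea of what loading a thesaurus from a data file can involve, the sketch below parses a CSV with hypothetical `id` and `label` columns into a lookup table; the `load_thesaurus` name and the file format are assumptions, not the module's documented API.

```python
import csv
from typing import TextIO


def load_thesaurus(stream: TextIO) -> dict[int, str]:
    """Read a hypothetical two-column CSV (id,label) into an id -> label mapping."""
    reader = csv.DictReader(stream)
    # Strip stray whitespace so labels compare cleanly against field values.
    return {int(row["id"]): row["label"].strip() for row in reader}
```

Accepting a text stream rather than a path keeps the parser easy to test with an in-memory buffer.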