Installation
This section describes all the steps needed to set up a development workspace for working with the extraction.
1. Environment requirements
Your environment needs the following dependencies (see the next section for a rootless, isolated installation of this environment):
- python (>= 3.13)
- poetry (>= 2.1)
- graphviz
- jupyter-notebook or jupyter-lab, with the python package poetry-kernel
- just, for running the main commands
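As a quick sanity check of the requirements above, the following minimal sketch reports which tools are missing from the PATH. The executable names (`dot` for graphviz, `jupyter` for the notebook front end) are assumptions about how these packages expose their binaries, and the sketch does not verify the poetry version or the presence of poetry-kernel.

```python
import shutil
import sys

# Assumed executable names; the requirement list names packages, not binaries.
REQUIRED_TOOLS = ["poetry", "dot", "jupyter", "just"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_recent_enough(version_info=sys.version_info, minimum=(3, 13)):
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python >= 3.13:", python_is_recent_enough())
    print("Missing tools:", missing_tools(REQUIRED_TOOLS))
```

The `which` parameter is only there so the lookup can be replaced in tests; in normal use the defaults are enough.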
Recommended setup with conda
The easiest way to get this environment in your user session is to create a conda environment from either:
- the environment.yml file:
$CONDA env create --prefix ./arch-env -f environment.yml
- the following command, to get the latest versions of the dependencies:
$CONDA create --prefix ./arch-env python=3.13 poetry graphviz jupyterlab \
poetry-kernel just nbdime nbstripout
Then work inside this environment with the following command:
$CONDA activate ./arch-env
2. Project's dependencies
Install the project's dependencies with the following rule:
just install
To develop with the notebooks, also run this rule (it strips the cells' outputs from the git diffs and enables nbdiff):
just init-nb-git-workspace
3. Required data
Remote databases
- A PostgreSQL database with the following tables must be available:
  - featured_intervention_data
  - findings
- A MinIO file store must also be available.
To set the credentials for these databases, fill in a .env file based on the .env.example file.
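To check that the .env file covers everything listed in .env.example, a comparison of the variable names can help. The sketch below assumes both files use plain `KEY=value` lines with `#` comments; the helper names `env_keys` and `missing_env_keys` are hypothetical and not part of the project.

```python
from pathlib import Path


def env_keys(text):
    """Collect the variable names from KEY=value lines, skipping comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def missing_env_keys(example_text, env_text):
    """Return the names present in the example file but absent from .env."""
    return sorted(env_keys(example_text) - env_keys(env_text))


if __name__ == "__main__":
    example, env = Path(".env.example"), Path(".env")
    if example.exists() and env.exists():
        print("Missing keys:", missing_env_keys(example.read_text(), env.read_text()))
```

This only compares names; it does not check that the values are valid credentials.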
Local csv files
The following CSV files must be present in the data/raw directory:
data/raw/
└── thesaurus
    ├── comune.csv
    └── provincie.csv
These files must contain at least the following keys (columns):
- comune.csv: id (int), id_com (int), nome (str), provincia (int)
- provincie.csv: id_prov (int), nome (str), sigla (char[2])
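The column requirements above can be checked before running the extraction. This is a minimal sketch, assuming the files are comma-delimited with a header row; the delimiter and the helper name `missing_columns` are assumptions, and the column types are not validated.

```python
import csv
from pathlib import Path

# Required columns, taken from the list above.
REQUIRED_COLUMNS = {
    "comune.csv": {"id", "id_com", "nome", "provincia"},
    "provincie.csv": {"id_prov", "nome", "sigla"},
}


def missing_columns(csv_file, required):
    """Return the required column names absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    return sorted(required - set(reader.fieldnames or []))


if __name__ == "__main__":
    base = Path("data/raw/thesaurus")
    for name, required in REQUIRED_COLUMNS.items():
        path = base / name
        if not path.exists():
            print(f"{path}: file not found")
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            print(f"{path}: missing columns {missing_columns(handle, required)}")
```

Extra columns are accepted, since the files only need to contain "at least" the listed keys.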
4. Remote AI models
Depending on the components you use in the extraction model, some AI models must run in separate processes. The program queries them through HTTP requests.
See the models/README.md file to set up those models.
Once you can connect to them, set the following environment variables as needed:
- NER_MODEL_HOST_URL
- VLM_HOST_URL
- VLLM_SERVER_BASE_URL or OLLAMA_SERVER_BASE_URL or OPENAI_API_KEY
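To see at a glance which of these variables are set in the current session, a small report like the one below can be used. It is only a sketch: it assumes the last three variables are alternatives of which one is enough, and since which variables are actually required depends on the components in use, it reports rather than fails. The function name `report_model_settings` is hypothetical.

```python
import os

MODEL_URL_VARS = ["NER_MODEL_HOST_URL", "VLM_HOST_URL"]
# Assumption: these three are alternatives, and setting any one of them is enough.
LLM_BACKEND_VARS = ["VLLM_SERVER_BASE_URL", "OLLAMA_SERVER_BASE_URL", "OPENAI_API_KEY"]


def report_model_settings(environ=os.environ):
    """Summarize which model-related variables are set to a non-empty value."""
    all_vars = MODEL_URL_VARS + LLM_BACKEND_VARS
    return {
        "set": [name for name in all_vars if environ.get(name)],
        "unset": [name for name in all_vars if not environ.get(name)],
        "llm_backend_configured": any(environ.get(name) for name in LLM_BACKEND_VARS),
    }


if __name__ == "__main__":
    for key, value in report_model_settings().items():
        print(f"{key}: {value}")
```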