Ingest

This guide explains how to ingest data from the Covim Biosample metadata form into the instance.

import lamindb as ln
import bionty as bt
import lnschema_covim as cv
import pandas as pd
 connected lamindb: anonymous/testdata

Load the Excel form

# We only use the first row for demonstration purposes
df = pd.read_excel("../../CovimBiosampleForm.xlsx", sheet_name="Example").head(1)
df
institute_abbreviation name internal_id files study_id study_name description methodology comments affiliations doi authors sample_id sample_type sample_type_comments date_of_sample_collection timepoint donor_id sex age data_modality experiment_type instrument_type protocol_details sequencing_platform alignment_pipeline reference_genome specificity experiment_description group primary_diagnosis pcr_test_result date_of_pcr_test days_post_symptom_onset days_post_icu_admission severity maximal_severity severity_criteria disease_phase vaccination_status date_of_vaccination immunocompromised_class immunocompromised_condition immunocompromised_category year_last_transplant secondary_infections comorbidities medications status_on_release
0 Site_A Mock 1 1206 Bonn_Mockfile_1.fcs\nBonn_Mockfile_2.fcs\nBonn... COVID-19-IMMUNE-02 A COVID-19 study on smoked mice. This study examines the effect of smoking on m... interventional study The study was done with our new mice. DZNE 10.1038/s41576-023-00586-w Einstein, A.; Theis J., F. SAMPLE-COVID-19-SMOKE-001 plasma Has a bunch of weird tissue. 2025-01-03 00:00:00 2 SMO-420 male 2 single-cell sequencing RNA sequencing iSeq 100 Standard 10X 10X Chromium BWA GRCh38 MALAT1 Just some experiment where we put mice subject... CON U07.2 negative 2024-02-01 20 2.0 1 2 NIH convalescent 1 2024-02-06 immunodeficiency_transplant kidney transplantation Z94.0 2014 A01.0\nA02.1 U07.1\nI51.9\nK83.9 A01AC02\nA01AC03 recovered
cv.list_files_from_biosample_form(df)
✗ Couldn't find the following 3 files:
   /home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata/Bonn_Mockfile_1.fcs
   /home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata/Bonn_Mockfile_2.fcs
   /home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata/Bonn_Mockfile_3.fcs

   → please pass the correct `basedir`
   → or modify the `files` column in biosample form!

Pass the correct basedir if relative paths are specified in the biosample form:

basedir = "../../lnschema_covim/datasets"
cv.list_files_from_biosample_form(df, basedir=basedir)
['../../lnschema_covim/datasets/Bonn_Mockfile_1.fcs',
 '../../lnschema_covim/datasets/Bonn_Mockfile_2.fcs',
 '../../lnschema_covim/datasets/Bonn_Mockfile_3.fcs']
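
The `files` column of the form holds one file name per line, and relative names are resolved against `basedir`. The sketch below illustrates that idea with the standard library only. `resolve_files` is a hypothetical helper written for this guide; it is not the implementation of `cv.list_files_from_biosample_form`, whose internals may differ.

```python
from pathlib import Path


def resolve_files(files_cell: str, basedir: str = ".") -> tuple[list[Path], list[Path]]:
    """Split a newline-separated `files` cell and resolve each name against basedir.

    Hypothetical helper for illustration; returns (found, missing) paths.
    """
    found, missing = [], []
    for name in files_cell.splitlines():
        name = name.strip()
        if not name:
            continue  # skip empty lines in the cell
        # Joining with pathlib keeps absolute names as they are.
        path = Path(basedir) / name
        (found if path.is_file() else missing).append(path)
    return found, missing
```

If `missing` is non-empty, either `basedir` or the `files` column needs to be corrected, which is what the error message above asks for.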

Perform ingestion

Here we show how to ingest files from a mock biosample and link them to the metadata fields.

# Track the current notebook
# Run ln.track() to generate the tracking id
ln.track()
 created Transform('Y4f1ITkdkxo80000'), started new Run('uIdOIwaM...') at 2025-06-20 11:35:32 UTC
 notebook imports: bionty==1.5.0 lamindb==1.6.2 lnschema_covim==0.1.3 pandas==2.3.0
 recommendation: to identify the notebook across renames, pass the uid: ln.track("Y4f1ITkdkxo8")
# Save any new values that are not in the registry yet
cv.Comorbidity(ontology_id="U07.2", name="COVID-19, virus not identified").save()
Comorbidity(uid='3587tO5t', ontology_id='U07.2', name='COVID-19, virus not identified', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:33 UTC)
# optionally: pass a custom curate_function
biosample = cv.ingest_markers(
    meta=df,
    basedir=basedir,
)
biosample
 creating relationship records...
! calling anonymously, will miss private instances
 source added!
! record with similar name exists! did you mean to load it?
uid ontology_id name space_id source_id run_id created_at created_by_id _aux branch_id
id
1 3587tO5t U07.2 COVID-19, virus not identified 1 None 1 2025-06-20 11:35:33.272000+00:00 1 None 1
 creating curated files records...
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
    4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
    → curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
 returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
    4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
    → curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
 returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
    4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
    → curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
 returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
 creating biosample record...
Biosample(uid='38gDDpfmHy9B', name='Mock 1', internal_id='1206', institute_abbreviation='Site_A', sample_id='hidden', sample_type='plasma', sample_type_comments='Has a bunch of weird tissue.', data_modality='single-cell sequencing', experiment_type='RNA sequencing', instrument_type='iSeq 100', protocol_details='Standard 10X', sequencing_platform='10X Chromium', alignment_pipeline='BWA', reference_genome='GRCh38', detection_method='not applicable', specificity='MALAT1', experiment_description='Just some experiment where we put mice subject to smoke.', donor_uid='7LjRnBdYD0BA', donor_id='hidden', sex='male', age=2, immunocompromised_class='immunodeficiency_transplant', immunocompromised_condition='kidney transplantation', immunocompromised_category=Comorbidity(uid='5ab5fyj4', ontology_id='Z94.0', name='Kidney transplant status', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC), year_last_transplant='2014', date_of_sample_collection=2025-01-03, timepoint=2, group='CON', pcr_test_result='negative', date_of_pcr_test=2024-02-01, days_post_symptom_onset=20, days_post_icu_admission=2, severity=1, maximal_severity=2, severity_criteria='NIH', disease_phase='convalescent', vaccination_status=1, date_of_vaccination=2024-02-06, status_on_release='recovered', branch_id=1, space_id=1, created_by_id=1, run_id=1, primary_diagnosis_id=1, organism_id=1, created_at=2025-06-20 11:35:38 UTC)
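
The warnings in the log above report four marker names that did not validate but have known synonyms, and they suggest `.standardize("marker")`. Conceptually, standardization maps each synonym to its registered name and leaves every other value untouched. The sketch below shows this with a plain dictionary that uses the four mappings from the log. The `standardize` function here is a hypothetical stand-in for illustration, not the LaminDB method.

```python
# Synonym -> registered name, copied from the warning messages above.
SYNONYMS = {"CD14": "Cd14", "CD19": "Cd19", "CD4": "Cd4", "HLA-DR": "HLADR"}


def standardize(markers: list[str], synonyms: dict[str, str] = SYNONYMS) -> list[str]:
    """Replace known synonyms by their registered name; keep other values unchanged."""
    return [synonyms.get(marker, marker) for marker in markers]


print(standardize(["CD14", "CD3", "HLA-DR"]))  # ['Cd14', 'CD3', 'HLADR']
```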

Congratulations, you have ingested your data into the database and you are done! 🎉

To share metadata with others via the remote immunohub instance, transfer the artifacts as follows (the data itself stays local):

# make sure you restart your notebook session
!lamin load covim/immunohub

import lamindb as ln
import lnschema_covim as cv

artifact = ln.Artifact.filter(...).one()
# or loop over artifacts associated with a biosample
# for artifact in biosample.artifacts.all():
cv.transfer_artifact_to_immunohub(artifact)

Confirm data is correctly ingested

You can now check whether the data was ingested correctly.

To learn more about interacting with this database, see the general LaminDB guide.

Check files linked to the biosample:

biosample.artifacts.df()
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux branch_id
id
2 z6nhhqqsAGyJv7DW0000 curated/Bonn_Mockfile_1.h5ad None .h5ad dataset AnnData 6878592 w1tcBwOshHfaAbxq-_5iNA None 52780 md5 True False 1 1 2 None True 1 2025-06-20 11:35:38.027000+00:00 1 None 1
3 8FnCsUY8yhEuxYWt0000 curated/Bonn_Mockfile_2.h5ad None .h5ad dataset AnnData 9705912 mj-XlUwEYYfMokMdhtWBdQ None 74845 md5 True False 1 1 2 None True 1 2025-06-20 11:35:38.505000+00:00 1 None 1
4 olgXgqRpQLa3hhAx0000 curated/Bonn_Mockfile_3.h5ad None .h5ad dataset AnnData 2598016 a6iFbjqFSqvTYom5ZVX3MA None 19338 md5 True False 1 1 2 None True 1 2025-06-20 11:35:38.927000+00:00 1 None 1

Check linked metadata records:

biosample.primary_diagnosis
Comorbidity(uid='3587tO5t', ontology_id='U07.2', name='COVID-19, virus not identified', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:33 UTC)
biosample.secondary_infections.list()
[Infection(uid='4xDuGFBE', ontology_id='A01.0', name='Typhoid fever', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:34 UTC),
 Infection(uid='2IY6EfBB', ontology_id='A02.1', name='Salmonella sepsis', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:34 UTC)]
biosample.comorbidities.list()
[Comorbidity(uid='7aBlR4dA', ontology_id='U07.1', name='COVID-19', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:34 UTC),
 Comorbidity(uid='3L1EqfoP', ontology_id='I51.9', name='Heart disease, unspecified', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:35 UTC),
 Comorbidity(uid='1ergEUAW', ontology_id='K83.9', name='Disease of biliary tract, unspecified', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:35 UTC)]
biosample.medications.list()
[Medication(uid='1mDLWRT4', ontology_id='A01AC02', name='dexamethasone', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:36 UTC),
 Medication(uid='6gEjoyp8', ontology_id='A01AC03', name='hydrocortisone', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)]

Query the database

Query biosamples

biosample = cv.Biosample.filter(name="Mock 1").one()
biosample
Biosample(uid='38gDDpfmHy9B', name='Mock 1', internal_id='1206', institute_abbreviation='Site_A', sample_id='hidden', sample_type='plasma', sample_type_comments='Has a bunch of weird tissue.', data_modality='single-cell sequencing', experiment_type='RNA sequencing', instrument_type='iSeq 100', protocol_details='Standard 10X', sequencing_platform='10X Chromium', alignment_pipeline='BWA', reference_genome='GRCh38', detection_method='not applicable', specificity='MALAT1', experiment_description='Just some experiment where we put mice subject to smoke.', donor_uid='7LjRnBdYD0BA', donor_id='hidden', sex='male', age=2, immunocompromised_class='immunodeficiency_transplant', immunocompromised_condition='kidney transplantation', immunocompromised_category=Comorbidity(uid='5ab5fyj4', ontology_id='Z94.0', name='Kidney transplant status', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC), year_last_transplant='2014', date_of_sample_collection=2025-01-03, timepoint=2, group='CON', pcr_test_result='negative', date_of_pcr_test=2024-02-01, days_post_symptom_onset=20, days_post_icu_admission=2, severity=1, maximal_severity=2, severity_criteria='NIH', disease_phase='convalescent', vaccination_status=1, date_of_vaccination=2024-02-06, status_on_release='recovered', branch_id=1, space_id=1, created_by_id=1, run_id=1, primary_diagnosis_id=1, organism_id=1, created_at=2025-06-20 11:35:38 UTC)

Query data objects

Let’s first query for all the ingested files that are curated:

ln.Artifact.filter(key__startswith="curated/").df()
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux branch_id
id
2 z6nhhqqsAGyJv7DW0000 curated/Bonn_Mockfile_1.h5ad None .h5ad dataset AnnData 6878592 w1tcBwOshHfaAbxq-_5iNA None 52780 md5 True False 1 1 2 None True 1 2025-06-20 11:35:38.027000+00:00 1 None 1
3 8FnCsUY8yhEuxYWt0000 curated/Bonn_Mockfile_2.h5ad None .h5ad dataset AnnData 9705912 mj-XlUwEYYfMokMdhtWBdQ None 74845 md5 True False 1 1 2 None True 1 2025-06-20 11:35:38.505000+00:00 1 None 1
4 olgXgqRpQLa3hhAx0000 curated/Bonn_Mockfile_3.h5ad None .h5ad dataset AnnData 2598016 a6iFbjqFSqvTYom5ZVX3MA None 19338 md5 True False 1 1 2 None True 1 2025-06-20 11:35:38.927000+00:00 1 None 1

Query for a single file by key:

artifact = ln.Artifact.filter(key="curated/Bonn_Mockfile_1.h5ad").one()
artifact
Artifact(uid='z6nhhqqsAGyJv7DW0000', is_latest=True, key='curated/Bonn_Mockfile_1.h5ad', suffix='.h5ad', kind='dataset', otype='AnnData', size=6878592, hash='w1tcBwOshHfaAbxq-_5iNA', n_observations=52780, branch_id=1, space_id=1, storage_id=1, run_id=1, schema_id=2, created_by_id=1, created_at=2025-06-20 11:35:38 UTC)
artifact.describe()
Artifact .h5ad/AnnData
├── General
│   ├── .uid = 'z6nhhqqsAGyJv7DW0000'
│   ├── .key = 'curated/Bonn_Mockfile_1.h5ad'
│   ├── .size = 6878592
│   ├── .hash = 'w1tcBwOshHfaAbxq-_5iNA'
│   ├── .n_observations = 52780
│   ├── .path = 
│   │   /home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata/.lamindb/z6nhhqqsAGyJv7DW0000.h5ad
│   ├── .created_by = anonymous
│   ├── .created_at = 2025-06-20 11:35:38
│   └── .transform = 'Ingest'
├── Dataset features
│   └── var1                     [Feature]                                                           
│       marker                  cat[bionty.CellMarker]     CD11c, CD16, CD1c, CD203c, CD3, CD45, CD…
└── Labels
    └── .biosamples                 covim.Biosample            Mock 1                                   
        .cell_markers               bionty.CellMarker          Cd14, CD66b, Cd19, CD1c, CD203c, CD8, CD…

Access the biosample record linked to this file:

biosample = artifact.biosamples.first()
biosample
Biosample(uid='38gDDpfmHy9B', name='Mock 1', internal_id='1206', institute_abbreviation='Site_A', sample_id='hidden', sample_type='plasma', sample_type_comments='Has a bunch of weird tissue.', data_modality='single-cell sequencing', experiment_type='RNA sequencing', instrument_type='iSeq 100', protocol_details='Standard 10X', sequencing_platform='10X Chromium', alignment_pipeline='BWA', reference_genome='GRCh38', detection_method='not applicable', specificity='MALAT1', experiment_description='Just some experiment where we put mice subject to smoke.', donor_uid='7LjRnBdYD0BA', donor_id='hidden', sex='male', age=2, immunocompromised_class='immunodeficiency_transplant', immunocompromised_condition='kidney transplantation', immunocompromised_category=Comorbidity(uid='5ab5fyj4', ontology_id='Z94.0', name='Kidney transplant status', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC), year_last_transplant='2014', date_of_sample_collection=2025-01-03, timepoint=2, group='CON', pcr_test_result='negative', date_of_pcr_test=2024-02-01, days_post_symptom_onset=20, days_post_icu_admission=2, severity=1, maximal_severity=2, severity_criteria='NIH', disease_phase='convalescent', vaccination_status=1, date_of_vaccination=2024-02-06, status_on_release='recovered', branch_id=1, space_id=1, created_by_id=1, run_id=1, primary_diagnosis_id=1, organism_id=1, created_at=2025-06-20 11:35:38 UTC)

Read in data

Load the file into memory as an AnnData object:

Note

load uses readfcs.read under the hood.

adata = artifact.load()
adata
AnnData object with n_obs × n_vars = 52780 × 22
    var: 'n', 'channel', 'marker', 'PnB', 'PnR', 'PnG', 'PnE'
    uns: 'meta'

Query data based on cell markers

cell_markers = bt.CellMarker.lookup()
ln.Artifact.filter(feature_sets__cell_markers=cell_markers.cd8).list()
[]

Update existing biosample records

To update an existing biosample, rerun cv.ingest_markers with update=True:

# here we have new versions of the files in another directory
# you can also modify the files column to include the new keys without passing a new basedir: Bonn_Mockfile_v2/Bonn_Mockfile_1.fcs ...
basedir = "../../lnschema_covim/datasets/Bonn_Mockfile_v2"

biosample = cv.ingest_markers(df, basedir=basedir, update=True)
 updating relationship records...
 returning existing Study record with same name: 'A COVID-19 study on smoked mice.'
 returning existing Reference record with same name: 'COVIM study by Einstein, A.; Theis J., F.'
 updating curated files records...
 returning existing Feature record with same name: 'marker'
 returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
 returning existing schema with same hash: Schema(uid='xDhIaviqoU0AW9Fq', n=-1, is_type=False, itype='Composite', otype='AnnData', dtype='num', hash='W7nL3a2jzF7VnbHONKD5CQ', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
    4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
    → curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
 creating new artifact version for key='curated/Bonn_Mockfile_1.h5ad' (storage: '/home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata')
 returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
    4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
    → curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
 creating new artifact version for key='curated/Bonn_Mockfile_2.h5ad' (storage: '/home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata')
 returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
    4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
    → curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
 creating new artifact version for key='curated/Bonn_Mockfile_3.h5ad' (storage: '/home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata')
 returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
 updating biosample record...

Files with the same filenames are assigned a new version:

biosample.artifacts.df()
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux branch_id
id
2 z6nhhqqsAGyJv7DW0000 curated/Bonn_Mockfile_1.h5ad None .h5ad dataset AnnData 6878592 w1tcBwOshHfaAbxq-_5iNA None 52780 md5 True False 1 1 2 None False 1 2025-06-20 11:35:38.027000+00:00 1 None 1
3 8FnCsUY8yhEuxYWt0000 curated/Bonn_Mockfile_2.h5ad None .h5ad dataset AnnData 9705912 mj-XlUwEYYfMokMdhtWBdQ None 74845 md5 True False 1 1 2 None False 1 2025-06-20 11:35:38.505000+00:00 1 None 1
4 olgXgqRpQLa3hhAx0000 curated/Bonn_Mockfile_3.h5ad None .h5ad dataset AnnData 2598016 a6iFbjqFSqvTYom5ZVX3MA None 19338 md5 True False 1 1 2 None False 1 2025-06-20 11:35:38.927000+00:00 1 None 1
5 z6nhhqqsAGyJv7DW0001 curated/Bonn_Mockfile_1.h5ad None .h5ad dataset AnnData 6878592 NxpH6lNn4oGyyTtoAIWBxw None 52780 md5 True False 1 1 2 None True 1 2025-06-20 11:35:40.108000+00:00 1 None 1
6 8FnCsUY8yhEuxYWt0001 curated/Bonn_Mockfile_2.h5ad None .h5ad dataset AnnData 9705912 V0OTnlrrsy7uAAn70d999A None 74845 md5 True False 1 1 2 None True 1 2025-06-20 11:35:40.596000+00:00 1 None 1
7 olgXgqRpQLa3hhAx0001 curated/Bonn_Mockfile_3.h5ad None .h5ad dataset AnnData 2598016 iRtwuHPK3rQ5O0XfcxAclw None 19338 md5 True False 1 1 2 None True 1 2025-06-20 11:35:41.033000+00:00 1 None 1

Query both versions:

biosample.artifacts.filter(key__endswith="Bonn_Mockfile_1.h5ad").df()
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux branch_id
id
2 z6nhhqqsAGyJv7DW0000 curated/Bonn_Mockfile_1.h5ad None .h5ad dataset AnnData 6878592 w1tcBwOshHfaAbxq-_5iNA None 52780 md5 True False 1 1 2 None False 1 2025-06-20 11:35:38.027000+00:00 1 None 1
5 z6nhhqqsAGyJv7DW0001 curated/Bonn_Mockfile_1.h5ad None .h5ad dataset AnnData 6878592 NxpH6lNn4oGyyTtoAIWBxw None 52780 md5 True False 1 1 2 None True 1 2025-06-20 11:35:40.108000+00:00 1 None 1

Get latest version of each artifact:

biosample.artifacts.all().latest_version().df()
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux branch_id
id
5 z6nhhqqsAGyJv7DW0001 curated/Bonn_Mockfile_1.h5ad None .h5ad dataset AnnData 6878592 NxpH6lNn4oGyyTtoAIWBxw None 52780 md5 True False 1 1 2 None True 1 2025-06-20 11:35:40.108000+00:00 1 None 1
6 8FnCsUY8yhEuxYWt0001 curated/Bonn_Mockfile_2.h5ad None .h5ad dataset AnnData 9705912 V0OTnlrrsy7uAAn70d999A None 74845 md5 True False 1 1 2 None True 1 2025-06-20 11:35:40.596000+00:00 1 None 1
7 olgXgqRpQLa3hhAx0001 curated/Bonn_Mockfile_3.h5ad None .h5ad dataset AnnData 2598016 iRtwuHPK3rQ5O0XfcxAclw None 19338 md5 True False 1 1 2 None True 1 2025-06-20 11:35:41.033000+00:00 1 None 1
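
In the tables above, both versions of a file share the same key, their uids differ only in the last four characters (`0000` vs. `0001`), and exactly one row per key has `is_latest` set to True. The sketch below reproduces that selection on plain dictionaries built from the first two files shown above. `latest_per_key` is a hypothetical helper for illustration and does not reflect how `latest_version()` is implemented.

```python
# Rows copied from the artifact table above (uid, key and is_latest only).
records = [
    {"uid": "z6nhhqqsAGyJv7DW0000", "key": "curated/Bonn_Mockfile_1.h5ad", "is_latest": False},
    {"uid": "8FnCsUY8yhEuxYWt0000", "key": "curated/Bonn_Mockfile_2.h5ad", "is_latest": False},
    {"uid": "z6nhhqqsAGyJv7DW0001", "key": "curated/Bonn_Mockfile_1.h5ad", "is_latest": True},
    {"uid": "8FnCsUY8yhEuxYWt0001", "key": "curated/Bonn_Mockfile_2.h5ad", "is_latest": True},
]


def latest_per_key(rows: list[dict]) -> dict[str, str]:
    """Map each key to the uid of the row flagged as latest."""
    return {row["key"]: row["uid"] for row in rows if row["is_latest"]}


latest = latest_per_key(records)
```

Here `latest` maps each of the two keys to the uid ending in `0001`.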

Get the latest version of an artifact:

artifact = biosample.artifacts.filter(key__endswith="Bonn_Mockfile_1.h5ad")
artifact.latest_version()
<ArtifactBasicQuerySet [Artifact(uid='z6nhhqqsAGyJv7DW0001', is_latest=True, key='curated/Bonn_Mockfile_1.h5ad', suffix='.h5ad', kind='dataset', otype='AnnData', size=6878592, hash='NxpH6lNn4oGyyTtoAIWBxw', n_observations=52780, branch_id=1, space_id=1, storage_id=1, run_id=1, schema_id=2, created_by_id=1, created_at=2025-06-20 11:35:40 UTC)]>