Ingest
This guide explains how to ingest data from the Covim Biosample metadata form into a LaminDB instance.
import lamindb as ln
import bionty as bt
import lnschema_covim as cv
import pandas as pd
→ connected lamindb: anonymous/testdata
Load the Excel form
# We only use the first row for demonstration purposes
df = pd.read_excel("../../CovimBiosampleForm.xlsx", sheet_name="Example").head(1)
df
| | institute_abbreviation | name | internal_id | files | study_id | study_name | description | methodology | comments | affiliations | doi | authors | sample_id | sample_type | sample_type_comments | date_of_sample_collection | timepoint | donor_id | sex | age | data_modality | experiment_type | instrument_type | protocol_details | sequencing_platform | alignment_pipeline | reference_genome | specificity | experiment_description | group | primary_diagnosis | pcr_test_result | date_of_pcr_test | days_post_symptom_onset | days_post_icu_admission | severity | maximal_severity | severity_criteria | disease_phase | vaccination_status | date_of_vaccination | immunocompromised_class | immunocompromised_condition | immunocompromised_category | year_last_transplant | secondary_infections | comorbidities | medications | status_on_release |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Site_A | Mock 1 | 1206 | Bonn_Mockfile_1.fcs\nBonn_Mockfile_2.fcs\nBonn... | COVID-19-IMMUNE-02 | A COVID-19 study on smoked mice. | This study examines the effect of smoking on m... | interventional study | The study was done with our new mice. | DZNE | 10.1038/s41576-023-00586-w | Einstein, A.; Theis J., F. | SAMPLE-COVID-19-SMOKE-001 | plasma | Has a bunch of weird tissue. | 2025-01-03 00:00:00 | 2 | SMO-420 | male | 2 | single-cell sequencing | RNA sequencing | iSeq 100 | Standard 10X | 10X Chromium | BWA | GRCh38 | MALAT1 | Just some experiment where we put mice subject... | CON | U07.2 | negative | 2024-02-01 | 20 | 2.0 | 1 | 2 | NIH | convalescent | 1 | 2024-02-06 | immunodeficiency_transplant | kidney transplantation | Z94.0 | 2014 | A01.0\nA02.1 | U07.1\nI51.9\nK83.9 | A01AC02\nA01AC03 | recovered |
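Several columns in the form (`files`, `secondary_infections`, `comorbidities`, `medications`) hold more than one value per cell, separated by newlines. The following is a minimal sketch of splitting such cells into lists with plain Python. `split_multivalue` is a hypothetical helper for illustration, not part of `lnschema_covim`, and the `row` dict only mirrors a few values from the example row above.

```python
def split_multivalue(cell):
    """Split a newline-separated form cell into a list of stripped values."""
    # Treat None and NaN (the only value not equal to itself) as empty cells.
    if cell is None or cell != cell:
        return []
    return [part.strip() for part in str(cell).split("\n") if part.strip()]


# A few cells copied from the example row above (illustrative subset).
row = {
    "secondary_infections": "A01.0\nA02.1",
    "comorbidities": "U07.1\nI51.9\nK83.9",
    "medications": "A01AC02\nA01AC03",
}

parsed = {column: split_multivalue(value) for column, value in row.items()}
print(parsed["comorbidities"])
```

Each resulting list element corresponds to one ontology code or file name.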
cv.list_files_from_biosample_form(df)
✗ Couldn't find the following 3 files:
/home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata/Bonn_Mockfile_1.fcs
/home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata/Bonn_Mockfile_2.fcs
/home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata/Bonn_Mockfile_3.fcs
→ please pass the correct `basedir`
→ or modify the `files` column in biosample form!
Pass the correct basedir if relative paths are specified in the biosample form:
basedir = "../../lnschema_covim/datasets"
cv.list_files_from_biosample_form(df, basedir=basedir)
['../../lnschema_covim/datasets/Bonn_Mockfile_1.fcs',
'../../lnschema_covim/datasets/Bonn_Mockfile_2.fcs',
'../../lnschema_covim/datasets/Bonn_Mockfile_3.fcs']
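The two calls above differ only in where the relative file names are looked up. As a rough illustration of that lookup (not the actual implementation of `cv.list_files_from_biosample_form`), the hypothetical helper `resolve_files` below joins each name onto a base directory and separates found from missing paths.

```python
from pathlib import Path
import tempfile


def resolve_files(names, basedir="."):
    """Join file names onto basedir; return (found, missing) path lists."""
    found, missing = [], []
    for name in names:
        path = Path(basedir) / name
        (found if path.is_file() else missing).append(str(path))
    return found, missing


# Demonstrate with a throwaway directory holding two of three expected files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Bonn_Mockfile_1.fcs", "Bonn_Mockfile_2.fcs"):
        (Path(tmp) / name).touch()
    names = ["Bonn_Mockfile_1.fcs", "Bonn_Mockfile_2.fcs", "Bonn_Mockfile_3.fcs"]
    found, missing = resolve_files(names, basedir=tmp)

print(len(found), "found;", len(missing), "missing")
```

Checking for missing paths before ingestion makes a wrong `basedir` visible early, which is what the error message above is reporting.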
Perform ingestion
Here we show how to ingest files from a mock biosample and link them to the metadata fields.
# Track the current notebook
# Run ln.track() to generate the tracking id
ln.track()
→ created Transform('Y4f1ITkdkxo80000'), started new Run('uIdOIwaM...') at 2025-06-20 11:35:32 UTC
→ notebook imports: bionty==1.5.0 lamindb==1.6.2 lnschema_covim==0.1.3 pandas==2.3.0
• recommendation: to identify the notebook across renames, pass the uid: ln.track("Y4f1ITkdkxo8")
# Save any new values that are not in the registry yet
cv.Comorbidity(ontology_id="U07.2", name="COVID-19, virus not identified").save()
Comorbidity(uid='3587tO5t', ontology_id='U07.2', name='COVID-19, virus not identified', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:33 UTC)
# optionally: pass a custom curate_function
biosample = cv.ingest_markers(
    meta=df,
    basedir=basedir,
)
biosample
→ creating relationship records...
! calling anonymously, will miss private instances
→ source added!
! record with similar name exists! did you mean to load it?
| id | uid | ontology_id | name | space_id | source_id | run_id | created_at | created_by_id | _aux | branch_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3587tO5t | U07.2 | COVID-19, virus not identified | 1 | None | 1 | 2025-06-20 11:35:33.272000+00:00 | 1 | None | 1 |
→ creating curated files records...
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
→ curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
→ returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
→ curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
→ returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
→ curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
→ returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
→ creating biosample record...
Biosample(uid='38gDDpfmHy9B', name='Mock 1', internal_id='1206', institute_abbreviation='Site_A', sample_id='hidden', sample_type='plasma', sample_type_comments='Has a bunch of weird tissue.', data_modality='single-cell sequencing', experiment_type='RNA sequencing', instrument_type='iSeq 100', protocol_details='Standard 10X', sequencing_platform='10X Chromium', alignment_pipeline='BWA', reference_genome='GRCh38', detection_method='not applicable', specificity='MALAT1', experiment_description='Just some experiment where we put mice subject to smoke.', donor_uid='7LjRnBdYD0BA', donor_id='hidden', sex='male', age=2, immunocompromised_class='immunodeficiency_transplant', immunocompromised_condition='kidney transplantation', immunocompromised_category=Comorbidity(uid='5ab5fyj4', ontology_id='Z94.0', name='Kidney transplant status', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC), year_last_transplant='2014', date_of_sample_collection=2025-01-03, timepoint=2, group='CON', pcr_test_result='negative', date_of_pcr_test=2024-02-01, days_post_symptom_onset=20, days_post_icu_admission=2, severity=1, maximal_severity=2, severity_criteria='NIH', disease_phase='convalescent', vaccination_status=1, date_of_vaccination=2024-02-06, status_on_release='recovered', branch_id=1, space_id=1, created_by_id=1, run_id=1, primary_diagnosis_id=1, organism_id=1, created_at=2025-06-20 11:35:38 UTC)
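The warnings in the log above report four marker names that were not validated but match synonyms of registered names. The sketch below illustrates the kind of mapping that the suggested `.standardize("marker")` step is for, using the four pairs printed in the log. `standardize_markers` is a hypothetical stand-in for illustration, not the LaminDB implementation.

```python
# Synonym pairs exactly as printed in the ingestion log.
SYNONYMS = {"CD14": "Cd14", "CD19": "Cd19", "CD4": "Cd4", "HLA-DR": "HLADR"}


def standardize_markers(markers, synonyms=SYNONYMS):
    """Replace known synonyms with their registered names; keep the rest."""
    return [synonyms.get(marker, marker) for marker in markers]


markers = ["CD3", "CD4", "CD14", "HLA-DR"]
standardized = standardize_markers(markers)
print(standardized)
```

Names without a synonym entry, such as `CD3` here, pass through unchanged.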
Congratulations, you have ingested your data into the database and you are done! 🎉
To share metadata with others via the remote immunohub instance, transfer the artifacts as shown here (the data itself is kept locally):
# make sure you restart your notebook session
!lamin load covim/immunohub
import lamindb as ln
import lnschema_covim as cv
artifact = ln.Artifact.filter(...).one()
# or loop over artifacts associated with a biosample
# for artifact in biosample.artifacts.all():
cv.transfer_artifact_to_immunohub(artifact)
Confirm data is correctly ingested
You can now check whether the data is correctly ingested.
To learn more about interacting with this database, see the general LaminDB guide.
Check files linked to the biosample:
biosample.artifacts.df()
| id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | z6nhhqqsAGyJv7DW0000 | curated/Bonn_Mockfile_1.h5ad | None | .h5ad | dataset | AnnData | 6878592 | w1tcBwOshHfaAbxq-_5iNA | None | 52780 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:38.027000+00:00 | 1 | None | 1 |
| 3 | 8FnCsUY8yhEuxYWt0000 | curated/Bonn_Mockfile_2.h5ad | None | .h5ad | dataset | AnnData | 9705912 | mj-XlUwEYYfMokMdhtWBdQ | None | 74845 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:38.505000+00:00 | 1 | None | 1 |
| 4 | olgXgqRpQLa3hhAx0000 | curated/Bonn_Mockfile_3.h5ad | None | .h5ad | dataset | AnnData | 2598016 | a6iFbjqFSqvTYom5ZVX3MA | None | 19338 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:38.927000+00:00 | 1 | None | 1 |
Check linked metadata records:
biosample.primary_diagnosis
Comorbidity(uid='3587tO5t', ontology_id='U07.2', name='COVID-19, virus not identified', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:33 UTC)
biosample.secondary_infections.list()
[Infection(uid='4xDuGFBE', ontology_id='A01.0', name='Typhoid fever', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:34 UTC),
Infection(uid='2IY6EfBB', ontology_id='A02.1', name='Salmonella sepsis', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:34 UTC)]
biosample.comorbidities.list()
[Comorbidity(uid='7aBlR4dA', ontology_id='U07.1', name='COVID-19', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:34 UTC),
Comorbidity(uid='3L1EqfoP', ontology_id='I51.9', name='Heart disease, unspecified', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:35 UTC),
Comorbidity(uid='1ergEUAW', ontology_id='K83.9', name='Disease of biliary tract, unspecified', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:35 UTC)]
biosample.medications.list()
[Medication(uid='1mDLWRT4', ontology_id='A01AC02', name='dexamethasone', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:36 UTC),
Medication(uid='6gEjoyp8', ontology_id='A01AC03', name='hydrocortisone', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)]
Query the database
Query biosamples
biosample = cv.Biosample.filter(name="Mock 1").one()
biosample
Biosample(uid='38gDDpfmHy9B', name='Mock 1', internal_id='1206', institute_abbreviation='Site_A', sample_id='hidden', sample_type='plasma', sample_type_comments='Has a bunch of weird tissue.', data_modality='single-cell sequencing', experiment_type='RNA sequencing', instrument_type='iSeq 100', protocol_details='Standard 10X', sequencing_platform='10X Chromium', alignment_pipeline='BWA', reference_genome='GRCh38', detection_method='not applicable', specificity='MALAT1', experiment_description='Just some experiment where we put mice subject to smoke.', donor_uid='7LjRnBdYD0BA', donor_id='hidden', sex='male', age=2, immunocompromised_class='immunodeficiency_transplant', immunocompromised_condition='kidney transplantation', immunocompromised_category=Comorbidity(uid='5ab5fyj4', ontology_id='Z94.0', name='Kidney transplant status', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC), year_last_transplant='2014', date_of_sample_collection=2025-01-03, timepoint=2, group='CON', pcr_test_result='negative', date_of_pcr_test=2024-02-01, days_post_symptom_onset=20, days_post_icu_admission=2, severity=1, maximal_severity=2, severity_criteria='NIH', disease_phase='convalescent', vaccination_status=1, date_of_vaccination=2024-02-06, status_on_release='recovered', branch_id=1, space_id=1, created_by_id=1, run_id=1, primary_diagnosis_id=1, organism_id=1, created_at=2025-06-20 11:35:38 UTC)
Query data objects
Let’s first query for all the ingested files that are curated:
ln.Artifact.filter(key__startswith="curated/").df()
| id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | z6nhhqqsAGyJv7DW0000 | curated/Bonn_Mockfile_1.h5ad | None | .h5ad | dataset | AnnData | 6878592 | w1tcBwOshHfaAbxq-_5iNA | None | 52780 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:38.027000+00:00 | 1 | None | 1 |
| 3 | 8FnCsUY8yhEuxYWt0000 | curated/Bonn_Mockfile_2.h5ad | None | .h5ad | dataset | AnnData | 9705912 | mj-XlUwEYYfMokMdhtWBdQ | None | 74845 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:38.505000+00:00 | 1 | None | 1 |
| 4 | olgXgqRpQLa3hhAx0000 | curated/Bonn_Mockfile_3.h5ad | None | .h5ad | dataset | AnnData | 2598016 | a6iFbjqFSqvTYom5ZVX3MA | None | 19338 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:38.927000+00:00 | 1 | None | 1 |
Query for a single file by key:
artifact = ln.Artifact.filter(key="curated/Bonn_Mockfile_1.h5ad").one()
artifact
Artifact(uid='z6nhhqqsAGyJv7DW0000', is_latest=True, key='curated/Bonn_Mockfile_1.h5ad', suffix='.h5ad', kind='dataset', otype='AnnData', size=6878592, hash='w1tcBwOshHfaAbxq-_5iNA', n_observations=52780, branch_id=1, space_id=1, storage_id=1, run_id=1, schema_id=2, created_by_id=1, created_at=2025-06-20 11:35:38 UTC)
artifact.describe()
Artifact .h5ad/AnnData
├── General
│   ├── .uid = 'z6nhhqqsAGyJv7DW0000'
│   ├── .key = 'curated/Bonn_Mockfile_1.h5ad'
│   ├── .size = 6878592
│   ├── .hash = 'w1tcBwOshHfaAbxq-_5iNA'
│   ├── .n_observations = 52780
│   ├── .path =
│   │   /home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata/.lamindb/z6nhhqqsAGyJv7DW0000.h5ad
│   ├── .created_by = anonymous
│   ├── .created_at = 2025-06-20 11:35:38
│   └── .transform = 'Ingest'
├── Dataset features
│   └── var • 1 [Feature]
│       marker cat[bionty.CellMarker] CD11c, CD16, CD1c, CD203c, CD3, CD45, CD…
└── Labels
    └── .biosamples covim.Biosample Mock 1
        .cell_markers bionty.CellMarker Cd14, CD66b, Cd19, CD1c, CD203c, CD8, CD…
Access the biosample record linked to this file:
biosample = artifact.biosamples.first()
biosample
Biosample(uid='38gDDpfmHy9B', name='Mock 1', internal_id='1206', institute_abbreviation='Site_A', sample_id='hidden', sample_type='plasma', sample_type_comments='Has a bunch of weird tissue.', data_modality='single-cell sequencing', experiment_type='RNA sequencing', instrument_type='iSeq 100', protocol_details='Standard 10X', sequencing_platform='10X Chromium', alignment_pipeline='BWA', reference_genome='GRCh38', detection_method='not applicable', specificity='MALAT1', experiment_description='Just some experiment where we put mice subject to smoke.', donor_uid='7LjRnBdYD0BA', donor_id='hidden', sex='male', age=2, immunocompromised_class='immunodeficiency_transplant', immunocompromised_condition='kidney transplantation', immunocompromised_category=Comorbidity(uid='5ab5fyj4', ontology_id='Z94.0', name='Kidney transplant status', branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC), year_last_transplant='2014', date_of_sample_collection=2025-01-03, timepoint=2, group='CON', pcr_test_result='negative', date_of_pcr_test=2024-02-01, days_post_symptom_onset=20, days_post_icu_admission=2, severity=1, maximal_severity=2, severity_criteria='NIH', disease_phase='convalescent', vaccination_status=1, date_of_vaccination=2024-02-06, status_on_release='recovered', branch_id=1, space_id=1, created_by_id=1, run_id=1, primary_diagnosis_id=1, organism_id=1, created_at=2025-06-20 11:35:38 UTC)
Read in data
Load it into memory as an AnnData object:
Note
load uses readfcs.read under the hood.
adata = artifact.load()
adata
AnnData object with n_obs × n_vars = 52780 × 22
var: 'n', 'channel', 'marker', 'PnB', 'PnR', 'PnG', 'PnE'
uns: 'meta'
Query data based on cell markers
cell_markers = bt.CellMarker.lookup()
ln.Artifact.filter(feature_sets__cell_markers=cell_markers.cd8).list()
[]
Update existing biosample records
To update an existing biosample, rerun cv.ingest_markers with update=True:
# here we have new versions of the files in another directory
# you can also modify the files column to include the new keys without passing a new basedir: Bonn_Mockfile_v2/Bonn_Mockfile_1.fcs ...
basedir = "../../lnschema_covim/datasets/Bonn_Mockfile_v2"
biosample = cv.ingest_markers(df, basedir=basedir, update=True)
→ updating relationship records...
→ returning existing Study record with same name: 'A COVID-19 study on smoked mice.'
→ returning existing Reference record with same name: 'COVIM study by Einstein, A.; Theis J., F.'
→ updating curated files records...
→ returning existing Feature record with same name: 'marker'
→ returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
→ returning existing schema with same hash: Schema(uid='xDhIaviqoU0AW9Fq', n=-1, is_type=False, itype='Composite', otype='AnnData', dtype='num', hash='W7nL3a2jzF7VnbHONKD5CQ', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
→ curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
→ creating new artifact version for key='curated/Bonn_Mockfile_1.h5ad' (storage: '/home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata')
→ returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
→ curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
→ creating new artifact version for key='curated/Bonn_Mockfile_2.h5ad' (storage: '/home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata')
→ returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
! using default organism = human
! using default organism = human
! using default organism = human
! 4 terms not validated in feature 'marker' in slot 'var': 'CD14', 'CD19', 'CD4', 'HLA-DR'
4 synonyms found: "CD14" → "Cd14", "CD19" → "Cd19", "CD4" → "Cd4", "HLA-DR" → "HLADR"
→ curate synonyms via: .standardize("marker")
! using default organism = human
! using default organism = human
! using default organism = human
→ creating new artifact version for key='curated/Bonn_Mockfile_3.h5ad' (storage: '/home/runner/work/lnschema-covim/lnschema-covim/docs/guide/testdata')
→ returning existing schema with same hash: Schema(uid='7Be8O9pNRZNdgzsM', n=1, is_type=False, itype='Feature', hash='LaIQq6vJLW1jLqWNdjfMIw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-06-20 11:35:37 UTC)
→ updating biosample record...
Files with the same filename are assigned a new version:
biosample.artifacts.df()
| id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | z6nhhqqsAGyJv7DW0000 | curated/Bonn_Mockfile_1.h5ad | None | .h5ad | dataset | AnnData | 6878592 | w1tcBwOshHfaAbxq-_5iNA | None | 52780 | md5 | True | False | 1 | 1 | 2 | None | False | 1 | 2025-06-20 11:35:38.027000+00:00 | 1 | None | 1 |
| 3 | 8FnCsUY8yhEuxYWt0000 | curated/Bonn_Mockfile_2.h5ad | None | .h5ad | dataset | AnnData | 9705912 | mj-XlUwEYYfMokMdhtWBdQ | None | 74845 | md5 | True | False | 1 | 1 | 2 | None | False | 1 | 2025-06-20 11:35:38.505000+00:00 | 1 | None | 1 |
| 4 | olgXgqRpQLa3hhAx0000 | curated/Bonn_Mockfile_3.h5ad | None | .h5ad | dataset | AnnData | 2598016 | a6iFbjqFSqvTYom5ZVX3MA | None | 19338 | md5 | True | False | 1 | 1 | 2 | None | False | 1 | 2025-06-20 11:35:38.927000+00:00 | 1 | None | 1 |
| 5 | z6nhhqqsAGyJv7DW0001 | curated/Bonn_Mockfile_1.h5ad | None | .h5ad | dataset | AnnData | 6878592 | NxpH6lNn4oGyyTtoAIWBxw | None | 52780 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:40.108000+00:00 | 1 | None | 1 |
| 6 | 8FnCsUY8yhEuxYWt0001 | curated/Bonn_Mockfile_2.h5ad | None | .h5ad | dataset | AnnData | 9705912 | V0OTnlrrsy7uAAn70d999A | None | 74845 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:40.596000+00:00 | 1 | None | 1 |
| 7 | olgXgqRpQLa3hhAx0001 | curated/Bonn_Mockfile_3.h5ad | None | .h5ad | dataset | AnnData | 2598016 | iRtwuHPK3rQ5O0XfcxAclw | None | 19338 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:41.033000+00:00 | 1 | None | 1 |
Query both versions:
biosample.artifacts.filter(key__endswith="Bonn_Mockfile_1.h5ad").df()
| id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | z6nhhqqsAGyJv7DW0000 | curated/Bonn_Mockfile_1.h5ad | None | .h5ad | dataset | AnnData | 6878592 | w1tcBwOshHfaAbxq-_5iNA | None | 52780 | md5 | True | False | 1 | 1 | 2 | None | False | 1 | 2025-06-20 11:35:38.027000+00:00 | 1 | None | 1 |
| 5 | z6nhhqqsAGyJv7DW0001 | curated/Bonn_Mockfile_1.h5ad | None | .h5ad | dataset | AnnData | 6878592 | NxpH6lNn4oGyyTtoAIWBxw | None | 52780 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:40.108000+00:00 | 1 | None | 1 |
Get the latest version of each artifact:
biosample.artifacts.all().latest_version().df()
| id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | z6nhhqqsAGyJv7DW0001 | curated/Bonn_Mockfile_1.h5ad | None | .h5ad | dataset | AnnData | 6878592 | NxpH6lNn4oGyyTtoAIWBxw | None | 52780 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:40.108000+00:00 | 1 | None | 1 |
| 6 | 8FnCsUY8yhEuxYWt0001 | curated/Bonn_Mockfile_2.h5ad | None | .h5ad | dataset | AnnData | 9705912 | V0OTnlrrsy7uAAn70d999A | None | 74845 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:40.596000+00:00 | 1 | None | 1 |
| 7 | olgXgqRpQLa3hhAx0001 | curated/Bonn_Mockfile_3.h5ad | None | .h5ad | dataset | AnnData | 2598016 | iRtwuHPK3rQ5O0XfcxAclw | None | 19338 | md5 | True | False | 1 | 1 | 2 | None | True | 1 | 2025-06-20 11:35:41.033000+00:00 | 1 | None | 1 |
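In the tables above, each new version keeps the same `key` while `is_latest` flips from the old record to the new one. As a plain-Python illustration of that selection (a sketch over hand-copied rows, not how `latest_version()` is implemented), the hypothetical `latest_per_key` below keeps one record per key.

```python
# A few (uid, key, is_latest) rows copied from the tables above.
records = [
    {"uid": "z6nhhqqsAGyJv7DW0000", "key": "curated/Bonn_Mockfile_1.h5ad", "is_latest": False},
    {"uid": "8FnCsUY8yhEuxYWt0000", "key": "curated/Bonn_Mockfile_2.h5ad", "is_latest": False},
    {"uid": "z6nhhqqsAGyJv7DW0001", "key": "curated/Bonn_Mockfile_1.h5ad", "is_latest": True},
    {"uid": "8FnCsUY8yhEuxYWt0001", "key": "curated/Bonn_Mockfile_2.h5ad", "is_latest": True},
]


def latest_per_key(rows):
    """Return {key: uid} for the rows flagged as the latest version."""
    return {row["key"]: row["uid"] for row in rows if row["is_latest"]}


latest = latest_per_key(records)
print(latest["curated/Bonn_Mockfile_1.h5ad"])
```

Relying on the `is_latest` flag rather than on creation time keeps the selection independent of record ordering.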
Get the latest version of an artifact:
artifact = biosample.artifacts.filter(key__endswith="Bonn_Mockfile_1.h5ad")
artifact.latest_version()
<ArtifactBasicQuerySet [Artifact(uid='z6nhhqqsAGyJv7DW0001', is_latest=True, key='curated/Bonn_Mockfile_1.h5ad', suffix='.h5ad', kind='dataset', otype='AnnData', size=6878592, hash='NxpH6lNn4oGyyTtoAIWBxw', n_observations=52780, branch_id=1, space_id=1, storage_id=1, run_id=1, schema_id=2, created_by_id=1, created_at=2025-06-20 11:35:40 UTC)]>