If your question is not answered here, please don’t hesitate to contact us: aqqua@geomar.de
What kind of data are you looking for?
We’re gathering images of marine and freshwater zooplankton and phytoplankton. All kinds of labels/identification are welcome but optional, as we’re using self-supervised learning to train our foundation model, which does not require labels.
I have millions of images. Do you want them all?
Yes, we try to gather all existing plankton images, as the foundation model requires as much image data from diverse regions and imaging devices as possible.
What is a foundation model?
A foundation model is a machine-learning model trained at scale, usually with self-supervised methods on broad, multimodal data, that can be adapted to carry out diverse downstream tasks (Bommasani et al. 2022). AqQua is a foundation model for plankton computer vision that will be trained using state-of-the-art vision transformers on billions of plankton images from diverse imaging devices. This model will be fine-tuned for the downstream tasks of plankton identification, classification, trait detection, outlier detection and global interpolation of plankton distribution.
How can I transfer my data?
If your data is on EcoTaxa, you can provide us with view access (see next question for details). If you use web hosting services (Azure, Globus, ...), you can share your dataset via these. We can also download image data from IFCB dashboards, or you could send us a hard drive. Other options (e.g. FTP) are also possible. Please tell us your preferred method via the Data Sharing Form.
My data is on EcoTaxa. How can I share it with you?
Please fill in the Data Sharing Form and provide view access to the aqqua@geomar.de user on EcoTaxa. This enables us to download your data. We will inform you once the download is complete, so that you can revoke access if you wish.
I have many projects on EcoTaxa that I would like to share. Is there something quicker than adding the AqQua user manually?
You can download these Python scripts that use the EcoTaxa API to generate a list of your EcoTaxa projects. You can also change the access rights for multiple projects in bulk via the EcoTaxa API at any time.
How is AqQua funded?
AqQua is funded via the Helmholtz Foundation Model Initiative. It is a one-shot endeavour to collect the data and build the foundation model. The project is funded for three years.
What will happen to the data that is shared with you?
We will build the AqQua Dataset by bringing together data from thousands of individual sources, a suite of different imaging devices, and diverse habitats. The AqQua Dataset will be published under an open-access license in July 2027 at the earliest. Every data contribution will be duly acknowledged and every data contributor will be a co-author on a joint dataset paper. Using the AqQua Dataset, we will train a foundation model and fine-tune it for multiple downstream tasks, including classification, trait extraction, and global interpolation of plankton and particle distribution. The developed code, models, and tools will be made open source and shared with the plankton imaging community to help with plankton image recognition tasks and to support further method development. For example, this could include contributing a generalist image recognition model to EcoTaxa.
How will global interpolation work in detail?
You can explicitly choose whether you would like to share your data for global interpolation studies within AqQua. If you do, we will also need the volume sampled per image acquisition. We will use boosted regression trees, and possibly other machine learning algorithms, to learn the global plankton or particle distribution and associated process rates from the AqQua image data. Please see Drago et al. (2020) and Clements et al. (2022, 2023) for further details.
What do I gain from sharing data with you?
By sharing data with us for model development, you contribute to the diversity of the AqQua Dataset and increase the chances that the developed model will be particularly useful for the kind of data that you are working with. Every data contributor will be a co-author on a joint dataset paper and will be invited to contribute to further publications.
Can I only contribute data with validated annotations?
No. All kinds of labels/identification are welcome but optional, as we’re using self-supervised learning to train our foundation model, which does not require labels.
I am the contact person of a project, but it is not my decision whether the data can be shared. How do I proceed?
You don’t have to make the decision yourself. Check with the principal investigator, data owner, or other relevant stakeholders before proceeding. Then, let us know.
Also, if your data is hosted on EcoTaxa, please make sure that you are correctly listed as the contact person of a project. If not, select the correct person in the EcoTaxa project settings:
- In the menu, select “Project / Edit project settings”.
- In the “Privileges” tab, select the correct person as contact.
- Click “Save”.