We're collecting data!

FAQs

If your question is not answered here, please don’t hesitate to contact us: aqqua@geomar.de

Information on AqQua
What is a foundation model? A foundation model is a machine-learning model trained at scale, usually with self-supervised methods on broad, multimodal data, that can be adapted to carry out diverse downstream tasks (Bommasani et al. 2022). AqQua is a foundational model for plankton computer vision that will be trained using state-of-the-art vision transformers on billions of plankton images from diverse imaging devices. This model will be fine-tuned for the downstream tasks of plankton identification, classification, trait detection, outlier detection, and global interpolation of plankton distribution.
How will global interpolation work in detail? You can explicitly choose whether you would like to share your data for global interpolation studies within AqQua. If you do, we will also need the volume sampled per image acquisition. We will use boosted regression trees and possibly other machine learning algorithms to learn the global plankton or particle distribution and associated process rates from the AqQua image data. Please see Drago et al. (2020) and Clements et al. (2022, 2023) for further details.
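As a rough illustration of why the sampled volume matters: a raw count of organisms in a set of images only becomes comparable across devices and stations once it is normalised by the volume of water that was imaged. The sketch below is not AqQua's pipeline; the function name, the choice of litres as the input unit, and the example numbers are assumptions made for illustration only.

```python
def concentration_per_m3(count: int, volume_sampled_litres: float) -> float:
    """Convert a raw organism count into a concentration (individuals per cubic metre).

    Illustrative helper (hypothetical name and units), assuming the sampled
    volume is reported in litres.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if volume_sampled_litres <= 0:
        raise ValueError("sampled volume must be positive")
    # 1 cubic metre = 1000 litres
    return count * 1000.0 / volume_sampled_litres


# Made-up example: 12 organisms detected across 3 images,
# each image covering 0.5 litres of water.
total_volume_litres = 3 * 0.5
print(concentration_per_m3(12, total_volume_litres))
```

Without the volume, the same 12 organisms could correspond to a dense or a sparse community, which is why the count alone is not enough for distribution modelling.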
How is AqQua funded? AqQua is funded via the Helmholtz Foundation Model Initiative. It is a one-shot endeavour to collect the data and build the foundation model. The project is funded for three years.
Data Collection
What kind of data are you looking for? We’re gathering images of marine and freshwater zooplankton and phytoplankton. All kinds of labels/identification are welcome but optional, as we’re using self-supervised learning for training our foundational model, which does not require labels.
What do I gain from sharing data with you? By sharing data with us for model development, you contribute to the diversity of the AqQua dataset and increase the chances that the developed model will be particularly useful for the kind of data that you are working with. Every data contributor will be a co-author on a joint dataset paper and invited to contribute to further publications.
I have millions of images, do you want them all? Yes, we try to gather all existing plankton images, as the foundation model requires as much image data from diverse regions and imaging devices as possible.
What will happen to the data that is shared with you? We will build the AqQua Dataset by bringing together data from thousands of individual sources, a suite of different imaging devices, and diverse habitats. The AqQua Dataset will be published under an open-access license in July 2027 at the earliest. Every data contribution will be duly acknowledged and every data contributor will be a co-author on a joint dataset paper. Using the AqQua Dataset, we will train a foundational model and fine-tune it for multiple downstream tasks, including classification, trait extraction, and global interpolation of plankton and particle distribution. The developed code, models, and tools will be made open source and shared with the plankton imaging community to help with plankton image recognition tasks and to support further method development. For example, this could include contributing a generalist image recognition model to EcoTaxa.
Are you only interested in data with validated annotations? No! Annotations are welcome but strictly optional as we’re using self-supervised learning for training our foundational model. This does not require labels.
Although I am the contact person of a project, it is not my decision to make if the data can be shared. How do I proceed?

You don’t have to make the decision yourself. Check with the principal investigator, data owner, or other relevant stakeholders before proceeding. Then, let us know.
Also, if your data is hosted on EcoTaxa, please make sure that you are correctly listed as the contact person of a project. If not, select the correct person in the EcoTaxa project settings:

  • In the menu, select “Project / Edit project settings”.
  • In the “Privileges” tab, select the correct person as contact.
  • Click “Save”.

Data Transfer
How can I transfer my data? We support a number of different transfer methods. If you are unsure, please contact us and we will work with you to determine the best option for your data. The optimal method depends largely on the size of the data. If the data is already externally accessible, you can simply give us access to the existing location. Please tell us your preferred method when you submit the data sharing form. The suggestions below are only meant to support you in your choice; other options are always possible.
My data is larger than ~200GB. For such large datasets, we recommend Globus or an FTP server. Please contact your IT department to find out whether your institute provides a Globus instance and how to set up a data share. Once set up, Globus allows for easy upload and download of terabyte-scale datasets. If Globus is not available, we recommend using an FTP server. If you don't have one available, please contact us for access to our own FTP server. Transfers to an FTP server can be resumed after an interruption without having to start from scratch. Sending us a physical hard drive is also a valid option.
My data is larger than ~20GB and smaller than ~200GB. For datasets of this size, we suggest GigaMove. This service lets you upload files of up to 100GB and share access via a simple link.
My data is smaller than ~20GB. For datasets of this size, we suggest either using one of the options listed above or a cloud-based storage system such as Google Drive, Dropbox, or Nextcloud.
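The size-based suggestions above can be summarised as a small lookup. This is only a sketch for orientation: the function name is hypothetical, the thresholds in the FAQ are approximate ("~"), and treating exactly 20GB or 200GB as belonging to the smaller category is an assumption made here.

```python
def suggest_transfer_methods(size_gb: float) -> list[str]:
    """Return the transfer options suggested in this FAQ for a dataset size in GB.

    Hypothetical helper; the boundaries are approximate in the FAQ, so
    datasets close to 20 GB or 200 GB may reasonably use either category.
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    if size_gb > 200:
        # Large datasets: Globus first, FTP as fallback, hard drive as last resort.
        return ["Globus", "FTP server", "physical hard drive"]
    if size_gb > 20:
        # GigaMove accepts files of up to 100 GB, so larger datasets
        # in this range would need to be split into several files.
        return ["GigaMove"]
    # Small datasets: cloud storage, or any of the options for larger data.
    return ["cloud storage", "GigaMove", "Globus", "FTP server"]


for size in (5, 50, 500):
    print(size, "GB:", ", ".join(suggest_transfer_methods(size)))
```

Data that is already externally accessible, or already hosted on EcoTaxa, does not need any of these methods; in that case, granting access to the existing location is enough.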
My data is already on EcoTaxa, how can I share it with you? If your data is already on EcoTaxa, you can share it with us by simply adding the aqqua@geomar.de user with view permissions to your project. This will enable us to download your data. We will inform you once we have downloaded your data, so that you can revoke access, if you would like to.
I have many projects on EcoTaxa that I would like to share. Is there something quicker than adding the AqQua user manually? You can download these Python scripts that use the EcoTaxa API to access your EcoTaxa projects. There are two scripts. The first one generates a list of all your projects. You can use this list when submitting the data sharing form. The second script helps you add the AqQua user to a subset of your projects. You can change the access rights for multiple projects in bulk via the EcoTaxa API at any time.
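To give a feel for the bookkeeping the two scripts take care of, here is a minimal offline sketch. It does not call the EcoTaxa API and is not the code of the downloadable scripts: the record layout (`id`, `title`, `viewers`) and both function names are hypothetical and chosen only for illustration. In practice, the project records would come from the EcoTaxa API.

```python
import csv
import io

AQQUA_USER = "aqqua@geomar.de"


def projects_to_csv(projects: list[dict]) -> str:
    """Render a project list as CSV text, e.g. to attach to the data sharing form."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["project_id", "title"])
    for project in projects:
        writer.writerow([project["id"], project["title"]])
    return buffer.getvalue()


def projects_missing_viewer(projects: list[dict], selected_ids, user: str = AQQUA_USER) -> list[int]:
    """Return IDs of selected projects where `user` is not yet listed as a viewer."""
    selected = set(selected_ids)
    return [
        project["id"]
        for project in projects
        if project["id"] in selected and user not in project["viewers"]
    ]


# Made-up project records for illustration only.
my_projects = [
    {"id": 101, "title": "Cruise A, UVP5", "viewers": []},
    {"id": 102, "title": "Cruise B, ZooScan", "viewers": [AQQUA_USER]},
    {"id": 103, "title": "Lake survey, FlowCam", "viewers": []},
]

print(projects_to_csv(my_projects))
print(projects_missing_viewer(my_projects, selected_ids=[101, 102]))
```

Separating "list everything" from "decide which projects still need the AqQua user" mirrors the two-script split described above and makes it easy to review the selection before any access rights are changed.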