Add your own computation as a process
Even if a process is running, it may be deleted during platform maintenance. We strongly recommend saving the process configuration at creation time using the “Save” button, to facilitate a restart in case of an outage, and saving your code to a remote repository or to your “My Files” space. The process content uses resources from your temporary storage, not the permanent storage backing the content of “My Files”.
The following content is a copy of the Process playground README.
Process helm charts playground
What is a process?
In order to ensure broad compatibility, we chose to follow standards, and in the case of computation the OGC API - Processes standard corresponded to our needs. Its definition of a process is the following: “The OGC API - Processes standard supports the wrapping of computational tasks into executable processes that can be offered by a server through a Web API and be invoked by a client application. The standard specifies a processing interface to communicate over a RESTful protocol using JavaScript Object Notation (JSON) encodings. Typically, these processes execute well-defined algorithms that ingest vector and/or coverage data to produce new datasets.”
Contribution steps
To add any process to our platform, you will have to make a merge request to this repository.
⚠️ This is a development repository: a chart older than a week will be automatically deleted. To move your process into production, please see Create a merge request.
This tutorial provides guidelines to create your own Helm chart in order to deploy your process on the EDITO datalab. Do not hesitate to look at other charts for inspiration. Our example uses Python, but any language can be used to create a process; please adapt the Python references to your language if needed.
First things first, you will need a container image hosted on a public repository. This image should run a container that reads environment variables to determine input and output locations, if they are needed. We strongly recommend following the 12-factor app methodology.
In this tutorial, we will take the Coral bleaching detection process as an example. For the time being, it only has one input parameter, determining whether the process is launched as a small demonstration (requiring only small resources to run).
As you can see, this project satisfies the minimal requirement for hosting a process on the datalab: having a public container image available, with environment variables if an input is needed.
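To illustrate, such an image can be exercised locally by passing the input through an environment variable (the image reference below is hypothetical):

```bash
# Run the containerized process locally with its input set via the environment
docker run -e SMALL_DEMO=true registry.example.org/bleaching:latest
```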
Containerizing your computation code
In order to containerize our computation code, we used Docker. If needed, please read the documentation to learn more about getting started with Docker. In case you are using Python together with a micromamba environment, the micromamba quick start can be of help.
As a further example, here is one of our Dockerfiles, using Python with a micromamba environment:
```dockerfile
FROM mambaorg/micromamba:1.4.1-kinetic
COPY --chown=$MAMBA_USER:$MAMBA_USER ./conda_environment_bleaching.yaml /tmp/env.yaml
RUN micromamba install -y -n base -f /tmp/env.yaml && \
    micromamba clean --all --yes
COPY ./bleaching.py /bleaching.py
CMD [ "python", "/bleaching.py" ]
```

The important point is that the Dockerfile must reproduce the steps you would otherwise perform manually, so that your code is able to run. All dependencies must be installed, whether via an environment manager, a dependency manager, or one by one. Furthermore, the input must be made available via an environment variable, either as its whole content or as a URL to access it, and used accordingly in your code. Finally, you must launch your code, as done in the example with `CMD [ "python", "/bleaching.py" ]`.
Output written to user storage
The process used as a template writes the output data directly to the user personal storage. The code inside the container needs to write its output to a specific path determined by the environment variable EDITO_INFRA_OUTPUT. Thanks to this environment variable, the template process contains an additional step that copies the content of this path to the user personal storage. Please go to Writing directly to the user personal storage to know more about it.
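For instance, in Python the output path could be used like this (a minimal sketch; the result.csv file name and its content are illustrative):

```python
import os

# EDITO_INFRA_OUTPUT points to the directory whose content is later
# copied to the user's personal storage by the copy-output step.
output_dir = os.environ["EDITO_INFRA_OUTPUT"]
with open(os.path.join(output_dir, "result.csv"), "w") as f:
    f.write("station,bleaching_index\n")  # illustrative output
```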
Python use of environment variables

In order to use the environment variables inside the Python code, only the built-in os library is needed. In this project the input is a boolean named SMALL_DEMO, used inside the Python code to limit the quantity of data used for the demonstration.

```python
import os

# Environment variables arrive as strings, so convert the flag explicitly
small_demo = os.environ.get("SMALL_DEMO", "false").lower() == "true"
```

Clone the repository
Once you have containerized your process, you can clone the repository to start your contribution.
```bash
git clone https://gitlab.mercator-ocean.fr/pub/edito-infra/process-playground.git
```

Create your own chart folder
You can start by copying the content of the coral-bleaching folder into your own folder.

```bash
cp -r coral-bleaching my-process
```

If you know what you are doing, you can also start from scratch, with an empty Helm chart.
```bash
helm create my-process
```

Update the chart configuration
If you copied the coral-bleaching folder, you will then need to adjust some files.
Edit the Chart.yaml file
Change the following fields and leave the others unchanged:
- name (the name of your process. This name must only consist of lower case alphanumeric characters, start with an alphabetic character, and end with an alphanumeric character. Hyphens (-) are allowed, but are known to be a little trickier to work with in Helm templates. The directory that contains a chart MUST have the same name as the chart)
- description (a brief description of your process)
- home (a page to learn more about your process; generates a “Learn more” button on the process tile)
- icon (an image that represents the underlying process)
- keywords (a list of useful keywords that can be used to retrieve your process from the datalab search bar)
- version (the version of the chart; start at 1.0.0 and update it later when you make changes)
- appVersion (the version of the process running inside your Docker container, typically a version of your computation code from the repository where it is versioned)
All of these attributes are mandatory; please provide an icon, even a generic one, to illustrate your process. A hypothetical example is sketched below.
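As an illustration, a Chart.yaml for a hypothetical my-process chart could look like this (all values are placeholders):

```yaml
apiVersion: v2
name: my-process                             # must match the chart directory name
description: Detects X events in ocean data  # placeholder description
home: https://example.org/my-process
icon: https://example.org/my-process/icon.png
keywords:
  - ocean
  - demo
version: 1.0.0      # chart version
appVersion: "0.1.0" # version of the containerized computation
```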
Edit the templates/NOTES.txt file
The content will be rendered and displayed in a pop-up window while the process is being launched. This text targets any user discovering your process; if you have an estimate of the time needed for the process to complete, it could be interesting information to add. If you keep the original notes, the name of your process will be indicated by default. You have access to Helm values such as {{ .Chart }}, {{ .Release }}, etc., as you can see in our example, and you can use other Helm values in this template file. Please take a look at the official Helm documentation to learn more about it.
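As an illustration, a minimal NOTES.txt could look like this (the wording and the duration estimate are hypothetical):

```
Your process {{ .Chart.Name }} (release {{ .Release.Name }}) is starting.
A typical run completes in about 10 minutes.
The results will be copied to your personal storage when the job finishes.
```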
Edit the values.yaml file
Replace the input environment variable smallDemo by your own one with its default value, or add new ones. If you need an output environment variable, add a “processOutputs” section at the bottom with its name (a sketch follows the example below). As you can see, this variable name is not exactly the same as the one in the Python code; the correspondence will be made in a later step.
```yaml
...
demo:
  smallDemo: true
...
```
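If your process declares an output environment variable, the extra section could look like this (a hedged sketch; only the processOutputs name comes from the template, the myOutput key is purely illustrative):

```yaml
processOutputs:
  myOutput: ""
```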
Edit the values.schema.json file

Replace the input information in the file with your own, and add any entries necessary to match the environment variables you put into values.yaml.
```json
{
  ...
  "demo": {
    "description": "Process inputs",
    "type": "object",
    "properties": {
      "smallDemo": {
        "type": "boolean",
        "description": "To run a small demo of the process",
        "default": true
      }
    }
  }
  ...
}
```

Edit the templates/job.yaml file
Replace the container image reference image: docker.mercator-ocean.fr/moi-docker/bleaching:... by your own. Replace or add the environment variables, specifying their names and values. The easiest way is to use the Helm values to inject the value of the environment variable directly, in this way: "{{ .Values.demo.smallDemo }}". As you can see, the link between the environment variables and the names used inside the values.yaml file is made here.
```yaml
env:
  - name: SMALL_DEMO
    value: "{{ .Values.demo.smallDemo }}"
```

When you push your branch, your charts will automatically be published and accessible on the EDITO datalab process playground (there may be a 5-minute refresh delay).
Note: in the job.yaml file, the following lines must not be touched:
```yaml
metadata:
  name: {{ .Release.Name }}
```

Create a merge request
Once you think your chart is ready to be published, you can:
- Make sure the metadata are complete in the Chart.yaml and README.md files.
- Please provide a point of contact so that users can reach you.
- Pick the catalog category in which your contribution fits best.
- Create a merge request on the repository and ping @pub/edito-infra/codeowners in the description to catch our attention.
If everything is good, we will migrate your chart to the category you provided, and you will be granted access to maintain it (bug fixes, new versions, etc.).
Additional information
Writing directly to the user personal storage
Thanks to the EDITO_INFRA_OUTPUT environment variable, the template process contains an additional step that copies the content of this path to the user personal storage. This step can be found in the job.yaml file, as a container named copy-output.
The needed credentials are added as environment variables in the container with:
```yaml
envFrom:
  {{- if .Values.s3.enabled }}
  - secretRef:
      name: {{ include "library-chart.secretNameS3" . }}
  {{- end }}
```

This container needs access to specific environment variables to be able to reach the user S3 bucket:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN
- AWS_S3_ENDPOINT
- AWS_DEFAULT_REGION
These variables are exported thanks to the s3 section of values.schema.json and the presence of the secret-s3.yaml file.
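If your code needs to talk to the bucket itself, rather than relying on the copy-output step, here is a minimal sketch using boto3 (assuming boto3 is installed in your image, and that AWS_S3_ENDPOINT holds a host name without a scheme):

```python
import os

import boto3  # assumed to be available in your container image

s3 = boto3.client(
    "s3",
    # the endpoint variable is assumed to hold the host name only
    endpoint_url="https://" + os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    aws_session_token=os.environ["AWS_SESSION_TOKEN"],
    region_name=os.environ["AWS_DEFAULT_REGION"],
)
s3.list_buckets()  # simple connectivity check
```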
Include Copernicus Marine Service credentials
It is possible to load Copernicus Marine Service credentials as environment variables in the process. The following configuration will automatically import the credentials configured in the user’s My Account.
First, to automatically load Copernicus Marine Service credentials into the service configuration, add the following property in the values.schema.json file:
```json
{
  "properties": {
    ...
    "copernicusMarine": {
      "x-onyxia": {
        "overwriteSchemaWith": "copernicusMarine.json"
      }
    },
    ...
  }
}
```

Add the following properties in the values.yaml file:
```yaml
copernicusMarine:
  enabled: false
  username: ""
  password: ""
```

Then create a secret-copernicusmarine.yaml file inside the templates folder with the following content:
```yaml
{{- define "library-chart.secretNameCopernicusMarine" -}}
{{- if .Values.copernicusMarine.enabled }}
{{- $name := (printf "%s-secretcopernicusmarine" (include "library-chart.fullname" .)) }}
{{- default $name .Values.copernicusMarine.secretName }}
{{- else }}
{{- default "default" .Values.copernicusMarine.secretName }}
{{- end }}
{{- end }}

{{- if .Values.copernicusMarine.enabled -}}
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "library-chart.secretNameCopernicusMarine" . }}
  labels:
    {{- include "library-chart.labels" . | nindent 4 }}
stringData:
  COPERNICUSMARINE_SERVICE_USERNAME: "{{ .Values.copernicusMarine.username }}"
  COPERNICUSMARINE_SERVICE_PASSWORD: "{{ .Values.copernicusMarine.password }}"
{{- end }}
```

Finally, load the secret values as environment variables in the container:
```yaml
envFrom:
  {{- if .Values.copernicusMarine.enabled }}
  - secretRef:
      name: {{ include "library-chart.secretNameCopernicusMarine" . }}
  {{- end }}
```
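With the secret loaded, the copernicusmarine Python toolbox can pick the credentials up from the environment (a sketch, assuming the copernicusmarine package is installed in your image; the dataset id is illustrative):

```python
import copernicusmarine  # assumed to be installed in your container image

# COPERNICUSMARINE_SERVICE_USERNAME / _PASSWORD are read from the
# environment, so no explicit login step should be needed here.
ds = copernicusmarine.open_dataset(
    dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m",  # illustrative id
)
print(ds)
```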
GPU-based processes

If the process relies on a GPU card, the container must include CUDA support; for example, it could derive from a micromamba image integrating CUDA:
```dockerfile
# Dockerfile
FROM mambaorg/micromamba:1.5.6-focal-cuda-12.1.1
```
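As a quick runtime check that the GPU is visible from inside the container, here is a minimal sketch (assuming a CUDA-enabled PyTorch build is part of your environment):

```python
import torch  # assumes a CUDA-enabled PyTorch build is installed

# Fail fast if the job was scheduled without a usable GPU
assert torch.cuda.is_available(), "No CUDA device visible inside the container"
print(torch.cuda.get_device_name(0))
```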
In addition, some changes must be applied to values.schema.json and templates/job.yaml:
In values.schema.json, use ide/resources-gpu.json instead of ide/resources.json:
```json
...
"properties": {
  "resources": {
    "x-onyxia": {
      "overwriteSchemaWith": "ide/resources-gpu.json"
    }
  },
...
```
In templates/job.yaml, add the following tolerations:
```yaml
...
spec:
  template:
    spec:
      tolerations:
        - effect: NoSchedule
          key: node.cloudferro.com/type
          operator: Equal
          value: gpu
...
```