Understanding EDITO resources

This documentation details how the EDITO cloud infrastructure resources work. It covers the behavior, current configuration and quotas of both the internal computing cluster and the storage.

Computing cluster

Elastic cluster

EDITO hosts its own cloud computing cluster, allowing users to run services and processes without having to think much about the underlying infrastructure (virtual machines, CPUs, GPUs, etc.). The cluster is configured to optimize computational resource usage at all times by automatically scaling the number of computation nodes up and down. For example, when a user starts a service, it runs by default on one of these nodes. The system automatically chooses a node that has enough free resources (CPU, RAM, disk storage) to host the new service. If no active node has enough resources, a new node is automatically provisioned to host the service. When a node has to be provisioned dynamically, users may experience some latency before their new service is up and running.
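The placement behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the actual scheduling policy of EDITO is not documented here, so a simple "first node that fits" rule is assumed, and the names `Node`, `NODE_SHAPE`, `MAX_NODES` and `place_service` are hypothetical. The default node size mirrors the CPU node figures listed under "Current node configurations".

```python
from dataclasses import dataclass


@dataclass
class Node:
    free_cpu_m: int      # free CPU, in mCPU
    free_ram_gb: float   # free RAM, in GB
    free_disk_gb: float  # free disk, in GB


# Assumed shape of a fresh CPU node: 8 vCores, 32 GB RAM, 128 GB SSD.
NODE_SHAPE = (8000, 32.0, 128.0)
MAX_NODES = 30  # assumed upper bound on the node count


def fits(node, cpu_m, ram_gb, disk_gb):
    """True if the node has enough free CPU, RAM and disk for the service."""
    return (node.free_cpu_m >= cpu_m
            and node.free_ram_gb >= ram_gb
            and node.free_disk_gb >= disk_gb)


def place_service(nodes, cpu_m, ram_gb, disk_gb):
    """Place a service; return (node_index, provisioned_new_node)."""
    provisioned = False
    # Assumption: take the first active node with enough free resources.
    index = next((i for i, n in enumerate(nodes)
                  if fits(n, cpu_m, ram_gb, disk_gb)), None)
    if index is None:
        fresh = Node(*NODE_SHAPE)
        if not fits(fresh, cpu_m, ram_gb, disk_gb):
            raise ValueError("request is larger than a whole node")
        if len(nodes) >= MAX_NODES:
            raise RuntimeError("cluster is at its maximum node count")
        nodes.append(fresh)  # scale up: this is where latency would appear
        index = len(nodes) - 1
        provisioned = True
    node = nodes[index]
    node.free_cpu_m -= cpu_m
    node.free_ram_gb -= ram_gb
    node.free_disk_gb -= disk_gb
    return index, provisioned
```

Starting from a single node, a request that no longer fits on it triggers the provisioning branch, while a later, smaller request can still land on the first node.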

Understanding resources requests and limits

Requests are the minimum guaranteed amount of a resource that is reserved for a service/process.

Limits, on the other hand, are the maximum amount of a resource that a service/process may use. This means that the service/process can never consume more than the indicated amount of memory or CPU.

It is possible for a service/process to use more of a resource than its request, but never more than its limit. For CPU, the limit acts as a threshold: the service/process is throttled when it reaches it. For memory, when a service/process tries to consume more than the allowed amount, the system kernel terminates the process that attempted the allocation with an out-of-memory (OOM) error.
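The difference between hitting a CPU limit and hitting a memory limit can be summarized in a few lines. This is a simplified model, not the actual enforcement code: the function name `enforce_limits` and the status strings are hypothetical, chosen only for illustration.

```python
def enforce_limits(cpu_wanted_m, mem_wanted_gb, cpu_limit_m, mem_limit_gb):
    """Return (cpu_granted_m, status) for one service/process.

    Simplified model: exceeding the memory limit terminates the process,
    exceeding the CPU limit only caps (throttles) it.
    """
    if mem_wanted_gb > mem_limit_gb:
        # Memory cannot be throttled: the allocation fails with an OOM error.
        return 0, "oom_killed"
    if cpu_wanted_m > cpu_limit_m:
        # CPU is compressible: the process keeps running, but slower.
        return cpu_limit_m, "throttled"
    return cpu_wanted_m, "running"
```

For example, a process that wants 1500 mCPU under a 1000 mCPU limit keeps running at 1000 mCPU, whereas a process that wants 5 GB under a 4 GB memory limit is terminated.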

Virtual CPUs, milli CPUs

The computational resources a service needs to run on the cluster are expressed in mCPU (milli CPU), a unit used in cloud environments to express the amount of CPU time a service requests. 1000 mCPU corresponds to 1 vCPU (virtual CPU; users can consider a vCPU to be a CPU), so requesting 500 mCPU means the service requests 50% of the time of one CPU. Because the amount is expressed as CPU time rather than as a number of cores, all the CPU’s cores can still be used at any time, allowing computation parallelization/multi-threading. In other words, requesting less than 1000 mCPU does not prevent you from running a multi-threaded service.
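The arithmetic behind mCPU can be made concrete with a short sketch. It assumes that CPU time is accounted per fixed scheduling period and uses 100 ms as that period, which is a common Linux default but is not confirmed for EDITO; the function names are hypothetical.

```python
def mcpu_to_vcpu(mcpu):
    """Convert milli CPUs to vCPUs (1000 mCPU = 1 vCPU)."""
    return mcpu / 1000


def cpu_time_budget_ms(mcpu, period_ms=100):
    """CPU time available per scheduling period (period length is assumed)."""
    return mcpu_to_vcpu(mcpu) * period_ms


def wall_time_before_throttle_ms(mcpu, threads, period_ms=100):
    """Wall-clock time that `threads` busy threads can run in one period
    before the shared CPU time budget is used up."""
    return min(period_ms, cpu_time_budget_ms(mcpu, period_ms) / threads)
```

Under these assumptions, 500 mCPU gives 50 ms of CPU time per 100 ms period. A single busy thread can run for 50 ms of each period, while four busy threads running in parallel share the same budget and use it up in 12.5 ms. Multi-threading is therefore possible below 1000 mCPU; the total CPU time is what is bounded.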

CPU nodes and GPU nodes

Currently, two types of computational nodes are available in the EDITO cloud computing cluster: CPU nodes and GPU nodes. GPU nodes have specific resources, such as VideoRAM. To use a GPU, a user needs to start a service or process that is configured for GPUs.

Current node configurations

The following table summarizes the current cloud computing cluster configuration:

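| Node type | vCores | RAM (GB) | Web Disk Storage SSD (GB) | VideoRAM | min. node count | max. node count |
|-----------|--------|----------|---------------------------|----------|-----------------|-----------------|
| CPU       | 8      | 32       | 128                       | -        | 1               | 30              |
| GPU       | 16     | 112      | 320                       | 48       | 1               | 4               |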

This configuration is arbitrary and does not reflect the capacity of the platform once fully operational. Please contact support if you need nodes with higher capabilities.
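To get a feel for what the node counts mean in aggregate, the figures from the table can be multiplied out. The sketch below simply encodes the table values; the dictionary layout and the `capacity` helper are hypothetical, and the totals are raw node capacity, not what a single user can obtain (quotas apply).

```python
# Values copied from the node configuration table above.
NODE_TYPES = {
    "CPU": {"vcores": 8, "ram_gb": 32, "ssd_gb": 128,
            "min_nodes": 1, "max_nodes": 30},
    "GPU": {"vcores": 16, "ram_gb": 112, "ssd_gb": 320,
            "min_nodes": 1, "max_nodes": 4},
}


def capacity(node_type, node_count):
    """Aggregate vCores, RAM and SSD for `node_count` nodes of one type."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= node_count <= spec["max_nodes"]:
        raise ValueError("node count outside the configured range")
    return {key: spec[key] * node_count
            for key in ("vcores", "ram_gb", "ssd_gb")}
```

With all 30 CPU nodes provisioned, this gives 240 vCores, 960 GB of RAM and 3840 GB of SSD; with all 4 GPU nodes, 64 vCores, 448 GB of RAM and 1280 GB of SSD.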

Distributed computing frameworks

The EDITO architecture supports distributed computing frameworks, such as Dask or Spark, that services or processes can rely on. Please contact support if you need access to a particular framework.

Quotas

EDITO is public and shares resources funded by the European Commission. To avoid abuse, users are subject to usage quotas. The following table summarizes the current quotas for personal and group projects:

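| Project kind     | CPU vCores | RAM (GB) | Web Disk Storage SSD (GB) | GPU vCores | max. pod count |
|------------------|------------|----------|---------------------------|------------|----------------|
| Personal project | 8          | 32       | 50                        | 1          | 10             |
| Group project    | 16         | 64       | 100                       | 1          | 100            |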

Note: “pods” are the entities in which services and processes run. For simplicity, one can consider that each service or process needs one pod to run.

If your services or processes never launch, face performance issues, or are stopped without any explicit action on your part, the root cause might be these restrictions. Please contact support if you need higher quotas.
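When a service does not start, it can help to add up what the project is already asking for and compare it with the quota table. The following is a rough sketch: the `QUOTAS` dictionary copies the table values, while the key names, the `exceeded_quotas` helper and the "one pod per service" simplification are assumptions. Whether the platform counts requests or limits against the quota is not stated here.

```python
# Values copied from the quota table above.
QUOTAS = {
    "personal": {"cpu_vcores": 8, "ram_gb": 32, "ssd_gb": 50,
                 "gpu_vcores": 1, "pods": 10},
    "group": {"cpu_vcores": 16, "ram_gb": 64, "ssd_gb": 100,
              "gpu_vcores": 1, "pods": 100},
}


def exceeded_quotas(project_kind, services):
    """Return the sorted names of the quotas the services would exceed.

    `services` is a list of dicts using the same resource keys as QUOTAS;
    missing keys count as 0, and each service is assumed to use one pod.
    """
    quota = QUOTAS[project_kind]
    totals = {key: sum(service.get(key, 0) for service in services)
              for key in quota if key != "pods"}
    totals["pods"] = len(services)
    return sorted(key for key, used in totals.items() if used > quota[key])
```

For instance, three services of 4 vCores, 16 GB of RAM and 20 GB of disk each exceed the CPU, RAM and disk quotas of a personal project, but fit within those of a group project.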

Data lake

The EDITO data lake is not a data lake in the classical sense; it is rather the “data access” component of EDITO, composed of both the EDITO data storage and the EDITO data catalog.

Data storage

EDITO provides elastic cloud object storage that lets users store personal, group and public data. As a user, you have access to your personal storage, which only you manage (you can make part of it publicly accessible). There is also group storage, managed by the group members, and a “public” storage, managed by us, in which everything is public.

Although it does not run on Amazon Web Services, the object storage is compatible with the AWS S3 API:

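| Storage kind     | Technology         | Governance / Management / Ownership |
|------------------|--------------------|-------------------------------------|
| Personal project | One S3 bucket      | User                                |
| Group project    | One S3 bucket      | Group members                       |
| Public           | Several S3 buckets | Administrators (for now)            |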

Owners of personal or group project storage can decide on the visibility of their storage content; they can share data or make it publicly available. Learn more about interacting with your storage here.
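Because the storage speaks the S3 API, a standard S3 client can in principle be pointed at it by overriding the endpoint. The sketch below uses the third-party `boto3` library as one possible client. The endpoint URL, bucket name and credentials are placeholders that must come from your own project settings; none of them are given in this documentation, and the helper names are hypothetical.

```python
def s3_client_kwargs(endpoint_url, access_key, secret_key, session_token=None):
    """Build the keyword arguments for boto3.client("s3", ...)."""
    kwargs = {
        "endpoint_url": endpoint_url,          # non-AWS, S3-compatible endpoint
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    if session_token:
        # Only needed when the credentials are temporary.
        kwargs["aws_session_token"] = session_token
    return kwargs


def list_keys(bucket, prefix="", **client_kwargs):
    """List object keys under a prefix (first page only, up to 1000 keys)."""
    import boto3  # third-party dependency: pip install boto3

    client = boto3.client("s3", **client_kwargs)
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]
```

A call would then look like `list_keys("my-bucket", **s3_client_kwargs("https://s3.example.invalid", "ACCESS_KEY", "SECRET_KEY"))`, with every value replaced by the real one for your storage.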

Quotas

EDITO is public and shares resources funded by the European Commission. To avoid abuse, users are subject to usage quotas. The following table summarizes the current quotas for personal and group projects, as well as for the public storage.

| Storage kind     | Volume amount (GB) |
|------------------|--------------------|
| Personal project | 20                 |
| Group project    | 20                 |
| Public           | N/A                |

Please contact support if you need higher quotas.
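A quick way to see how close a project is to its storage quota is to sum the sizes of its objects. This is a minimal sketch: the quota values come from the table above, while the helper name and the choice of 1 GB = 10^9 bytes are assumptions (the platform may count in GiB instead).

```python
# Values copied from the storage quota table above.
STORAGE_QUOTA_GB = {"personal": 20, "group": 20}
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def remaining_storage_gb(project_kind, object_sizes_bytes):
    """Quota left in GB; a negative value means the quota is exceeded."""
    used_gb = sum(object_sizes_bytes) / BYTES_PER_GB
    return STORAGE_QUOTA_GB[project_kind] - used_gb
```

The object sizes could come, for example, from the `Size` field returned by an S3 listing call.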

External data storage

External S3-compatible storage can be configured in the project settings, allowing you to work with it seamlessly alongside (or instead of) the EDITO data storage. Learn more about connecting to external storage here.

Data catalog

Referencing data

The EDITO data catalog can reference data stored inside the EDITO data storage as well as external data. For example, it references Copernicus Marine data that are actually stored and managed by the Copernicus Marine Service. It also references the ARCO versions of EMODnet data, which are stored in the public data storage of EDITO. You can also reference data (hosted on your personal storage or on any external service, platform or project) by interacting with the Data API.

Browsing/searching data

You can browse and search for data in the EDITO data catalog graphically with the EDITO viewer or programmatically with the Data API.

Location

Currently, the main EDITO cloud resources are provided by CloudFerro in the Warsaw region (WAW3-1).