1. Introduction and Goals
Shepard is a multi-database storage system for highly heterogeneous research data. It provides a consistent API for depositing and accessing any supported type of data and serves as a platform for working with experiment data towards publication.
With the expansion of Shepard, we are creating a more accessible data management platform for research data in order to enable excellent, data-based research at DLR and beyond.
We are increasing the usability, scalability and customizability of the platform to further expand its use, continue to build the community and promote research according to the FAIR principles.
1.1. Quality Goals
Quality goals are prioritized from top to bottom.
- Usability: The interface should be intuitive to use and support users in their work as well as possible. Using the interface should be fun, and users must be able to find the data they need easily.
- Reliability: Data transferred during an experiment must not be lost. Software updates must not break existing data.
- Maintainability: Changes and extensions to the software should be possible efficiently and cost-effectively.
- Performance: The system must be able to handle large volumes of data efficiently.
- Operability: It should be easy to get the system up and running, and equally easy to configure and update it.
2. Architecture Constraints
- On-Premises: The system must be operational without accessing the internet. No cloud services are allowed.
- Respect FAIR principles: Data must meet the following principles: findability, accessibility, interoperability and reusability. The actual implementation in the project still needs to be clarified.
- Integration ability: While shepard holds the data, analysis is done by different tools. Shepard therefore has to provide a REST interface for accessing the data.
- Open Source: In general, only software licences are allowed that comply with the Open Source definition; in brief, they allow software to be freely used, modified and shared.
- Existing data must continue to be usable: Data must not be lost and must remain accessible, especially after software updates. Breaking changes are allowed, but there must be a migration strategy if necessary.
- Software must be operational in the DLR environment:
  - no access to the internet
  - DNS is not available everywhere
- Responsiveness:
  - shepard works well on desktop screens from 14 to 24 inches at 1080p resolution, including at half window size
  - shepard works well on tablets
  - shepard is not optimized for mobile devices
- Browser support: shepard supports at least Firefox ESR and the latest version of Microsoft Edge.
- Accessibility: Basic features such as high contrast and large font sizes should be implemented. No special features such as screen reader support are needed.
3. System Scope and Context
This chapter is still under construction.
3.3. Users and Roles
User/Role | Description |
---|---|
Administrator | The administrator sets up and configures a shepard instance. |
Researcher | Researchers use the system as a data sink for external data sources. They run experiments and link data that belongs together, and they use the data for further analysis. |
3.4. Use Cases
3.4.1. Create Collection
Collections are meant as the root or container of your entire structure. All information, data, files, permissions, etc. are related to exactly one collection. Usually, a collection is the first thing you create.
3.4.2. Create Project Structure
In order to organize your project, you use Data Objects. They are very generic and can be used to organize your work as you wish. You can use them to create a tree based on experiments, lots, process steps or whatever you want.
3.4.3. Create Container
There are different types of containers available, e.g. for files, structured data, timeseries or semantic repositories. You can use them to group things together. If you want to store images from a video camera, you can create a file container and upload them. If you need to store the coordinates of a robot movement, you can create a timeseries container and store the data there.
4. Solution Strategy
This chapter is still under construction.
Quality Goal | Scenario | Solution Approach | Link to Details |
---|---|---|---|
Usability | | | |
Reliability | | | |
Maintainability | | | |
Performance | | | |
Operability | | | |
4.1. Backend (Quarkus)
4.1.1. Modularization
Previously, the modules were organized along their technical purpose: there was one package for all endpoints, one package for all neo4j entities, one package for all DAOs, and so on. As a result, logic and domain objects that are closely coupled ended up far apart, changes were spread widely across the code base, and the code of a single feature was hard to grasp. To mitigate this, we decided on a new modularization strategy that is applied with each update or refactoring touching a part of the code.
The first example is the new timeseries module.
The backend is split into modules by functionality. That means there is, for example, a module for timeseries (including managing containers and timeseries data), a module for managing files, a module for collections & data objects, and so on.
Each of those modules contains all technical components needed to fulfill its purpose, including endpoints, services, domain model objects, DAOs, entities and repositories.
Additionally, there may be modules for special functionalities like authentication.
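As an illustration, a module organized this way might be laid out roughly as follows (the file names are hypothetical and only mirror the naming convention described for the lab journal module later in this document, not the actual shepard code):
timeseries/
  TimeseriesContainerRest.java      REST endpoints
  TimeseriesContainerIO.java        data transfer object
  TimeseriesContainerService.java   business logic
  TimeseriesContainer.java          domain entity
  TimeseriesContainerDAO.java       database access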
4.1.2. REST Endpoints
We use quarkus-rest to define REST endpoints.
For authentication and general request validation, we use filters following the Jakarta REST way.
The filters can be found at filters/.
In general, all requests need to be made by authenticated users. The JWTFilter and the UserFilter take care of validating authentication.
Some endpoints should be public, for example /healthz and /versionz. To make this possible we use the PublicEndpointRegistry class, in which all public endpoints are registered in a static string array. The authentication filters are bypassed for endpoints in this array. Since the /healthz endpoint is automatically public thanks to the SmallRye Health extension, it does not need to be added to the PublicEndpointRegistry.
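A minimal sketch of what such a registry might look like (everything except the class name and the /versionz entry is an assumption, not the actual implementation):
public final class PublicEndpointRegistry {

  // Paths listed here bypass the authentication filters.
  private static final String[] PUBLIC_ENDPOINTS = { "/versionz" };

  private PublicEndpointRegistry() {}

  public static boolean isPublic(String path) {
    for (String endpoint : PUBLIC_ENDPOINTS) {
      if (path.startsWith(endpoint)) {
        return true;
      }
    }
    return false;
  }
}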
4.2. Frontend (Nuxt)
4.2.1. Structure
Each route is defined in pages/. The root file of a route should not itself contain a lot of logic; instead it should invoke one or more components.
All components are stored in components/. These are grouped in folders by domain, e.g. components/collections.
Stateful logic can be extracted into composables, which are stored in composables/.
Stateless utility functions should be stored under utils/.
Routing
We aim to make routes as understandable as possible and try to give each resource view a unique URL in order to be able to link directly to resources. This means that, for example, the collections list view is at /collections. One collection can be found at /collections/:id. Since data objects belong to a collection and share a common sidebar menu with their collection, they can be found at /collections/:id/dataobjects/:id.
Links
In order to navigate users to another page we aim to avoid:
- JavaScript-only navigations like router.push(…), so that users can see where they will be redirected
- standard href links, to avoid rerendering the whole page
Instead, we use NuxtLink as much as possible. This enables a URL display on hover as well as client-side navigation in a hydrated page.
4.2.2. Backend Interaction
For the interaction with the backend we check a generated OpenAPI client into the repository; see here for more information. In order to directly instantiate the API clients with our default configuration, we use the createApiInstance utility.
5. Building Block View
This chapter is still under construction.
5.1. Whitebox Overall System
The backend is designed to be modular and expandable.
One of the main tasks of the backend is the management of metadata.
The data structure is used to manage the attached data.
This structure as well as the corresponding references is managed by the core package and stored in Neo4j.
Contained Blackboxes:
Building Block | Responsibility |
---|---|
Backend | Handles incoming requests from actors like the shepard Timeseries Collector, web frontend and users. Allows these actors to upload, search and retrieve different kinds of data. |
Webfrontend (UI) | Allows users to interact (upload, search, retrieve) with the stored data via a web app. |
External Services & Scripts | External services that interact with a shepard instance, for example the shepard Timeseries Collector (sTC). It is one of many tools in the shepard ecosystem and aims to collect data from different sources and store them as timeseries data in shepard. |
5.2. Level 2
Multiple databases are used to enable CRUD operations for many types of data. The user can access these databases via the REST API.
Each database integration has to create its own data structure as needed.
The general structure with Entities, References and IOEntities is available to all integrations.
In order to create a new database integration, one needs to create a new package.
This package has to contain at least one database connector instance, the necessary data objects and a service class.
In addition, the corresponding REST endpoints and the respective references must be implemented.
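A minimal sketch of the pieces such a package might contain (all names are hypothetical, not actual shepard classes):
// Handles the connection to the new database.
public class MyDatabaseConnector { /* ... */ }

// Provides a higher-level view on the database operations.
public class MyDataService { /* ... */ }

// REST endpoints and the corresponding reference type complete the integration.
@Path("/myData")
public class MyDataRest { /* ... */ }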
Contained Blackboxes:
Building Block | Responsibility |
---|---|
Authorization | This module handles user permissions and roles. Most of the endpoints are protected and can only be used by authenticated users or with a valid API key. |
Collections & DataObjects | Manages metadata and references of organizational elements. |
Timeseries (InfluxDB) | Manages timeseries data. Consists of a connector to handle the database and a service to provide a higher-level view on the database operations. |
Timeseries (TimescaleDB) | Manages timeseries data. Uses TimescaleDB instead of InfluxDB. Currently in an experimental state; will replace the old InfluxDB timeseries module. |
Structured Data & Files | Manages structured data and file uploads. Consists of a connector to handle the database connection and two services to provide a higher-level view on the database operations for structured data and files. |
Status | Contains a health and version endpoint that is accessible via REST. It is easily extensible to provide status information about the backend, such as the current state of database connections. |
5.2.1. Level 3
Lab Journal
The Lab Journal module allows users to create, edit and delete journal entries.
Lab Journal Entries can be used for documentation purposes.
This feature is intended to be used via the frontend but can also be used directly via the REST interface if needed.
Lab Journal entries are stored in the neo4j database.
This module has a dependency on the Collections & DataObjects module because lab journal entries are always linked to a DataObject. It also has a dependency on the Authorization module because the user needs the correct permissions to see and edit lab journal entries.
Following our solution strategy, we have the following classes (a rough sketch follows the list):
- LabJournalEntryRest contains the REST endpoints
- LabJournalEntryIO is the data transfer object used in the REST interface
- LabJournalEntryService contains the business logic
- LabJournalEntry is the main business entity containing the content as an HTML string
- LabJournalEntryDAO is used for communication with the neo4j database
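A rough sketch of how these classes might fit together (method names are assumptions, not the actual implementation):
import jakarta.inject.Inject;

public class LabJournalEntryService {

  @Inject
  LabJournalEntryDAO labJournalEntryDAO;

  // Converts the incoming DTO to the business entity and persists it,
  // linked to the DataObject it always belongs to.
  public LabJournalEntry createEntry(LabJournalEntryIO io, long dataObjectId) {
    LabJournalEntry entry = new LabJournalEntry();
    entry.setContent(io.getContent()); // content is an HTML string
    return labJournalEntryDAO.createOrUpdate(entry, dataObjectId);
  }
}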
Timeseries (TimescaleDB)
This module will replace the old implementation using InfluxDB. For decisions and reasoning, check out ADR-008 Database Target Architecture, ADR-010 Postgres/Timescaledb Image and ADR-011 Timescale database schema.
The new module handles persisting timeseries data in a TimescaleDB. It includes all relevant endpoints and services. The database schema includes two tables:
- A timeseries table containing the metadata for each timeseries (measurement, field, etc.), similar to the metadata of an Influx timeseries.
- A hypertable containing the data points.
The schema for both tables is defined by the database migrations in src/main/java/resources/db/migration.
The timeseries table is managed in the code using hibernate entities.
The data point table is managed directly using custom queries, since we want to make full use of TimescaleDB features and performance.
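A condensed sketch of this split (entity fields, table and column names, and the query are illustrative assumptions, not the actual schema):
import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

@Entity
@Table(name = "timeseries")
class TimeseriesEntity {
  @Id
  @GeneratedValue
  Long id;
  String measurement;
  String field;
  // further metadata columns ...
}

class DataPointDao {
  EntityManager entityManager;

  // Data points bypass Hibernate and are written via a native query,
  // so TimescaleDB-specific features and performance can be used directly.
  void insert(long timeseriesId, java.time.Instant time, double value) {
    entityManager
      .createNativeQuery("INSERT INTO data_points (timeseries_id, time, value) VALUES (?1, ?2, ?3)")
      .setParameter(1, timeseriesId)
      .setParameter(2, time)
      .setParameter(3, value)
      .executeUpdate();
  }
}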
8. Cross-cutting Concepts
8.1. Documentation Concept
8.1.1. Target Groups & Needs
Needs marked with (!) are not yet fulfilled, but will be taken into account in the future.
Target Group | Needs |
---|---|
Researchers | |
Integrators | |
Project Managers | |
Administrators | |
Backend/Frontend developers | |
Maintainers | |
8.1.2. Documentation Artifacts
The following artifacts are provided as documentation of shepard:
Artifact | Notes | Link |
---|---|---|
Architectural Documentation | | |
Wiki (Consumer Documentation) | Explains basic concepts relevant for using shepard. Also includes examples of how to interact with shepard. | |
Release Notes | Contains information for each release of shepard. | |
OpenAPI Spec | The OpenAPI spec describes the REST API of shepard. | |
Administrator Documentation | Contains all relevant information for administrators to successfully operate a shepard instance. | |
CONTRIBUTING.md | Contains all relevant information on how to contribute to shepard. | |
GitLab Issues | GitLab issues are used to track bugs, feature requests and todos for developers, including relevant discussions. | |
8.2. Authentication
We decided to rely on external identity providers with shepard. This allows us to use existing user databases such as Active Directories. In addition, we do not have to implement our own user database. Most shepard instances use keycloak as their identity provider. However, we want to be compatible with the OIDC specification so that other OIDC identity providers could work with shepard.
The JWTFilter, which filters every incoming request, implements authentication by validating the submitted JWT. For this purpose, the JWT is decoded with a statically configured public key. OIDC allows the key to be obtained dynamically from the identity provider. However, we decided that a static configuration is more secure and has practically no disadvantages. The attack vector we are trying to mitigate here is that an attacker gains access to the infrastructure and somehow injects their own public key, which shepard would accept from that point on.
If configured, the system also checks whether certain roles are present in the JWT's realm_access.roles attribute. This can be done by configuring the variable OIDC_ROLES for the backend. The backend then only accepts JWTs with the specified role. This enables the reuse of existing identity providers for different shepard instances, each of which can be accessed by different user groups. For example, if someone uses an Active Directory for Keycloak to fetch users from, then Keycloak could add specific roles to people based on the AD groups they belong to.
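A compressed sketch of how this validation could look with the jjwt library pinned in our dependencies (the surrounding code, variable names and the role-check details are assumptions, not the actual filter implementation):
import io.jsonwebtoken.Claims;
import io.jsonwebtoken.Jwts;
import jakarta.ws.rs.NotAuthorizedException;
import java.util.List;
import java.util.Map;

// staticPublicKey is the statically configured key; it is never fetched via OIDC discovery.
Claims claims = Jwts.parserBuilder()
  .setSigningKey(staticPublicKey)
  .build()
  .parseClaimsJws(token)
  .getBody();

// Optional role check driven by the OIDC_ROLES configuration
Map<String, Object> realmAccess = claims.get("realm_access", Map.class);
List<String> roles = (List<String>) realmAccess.get("roles");
if (requiredRole != null && !roles.contains(requiredRole)) {
  throw new NotAuthorizedException("JWT is missing the required role");
}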
In addition to OIDC, we also allow authentication via API keys. Shepard generates these keys itself and stores them in our internal database. Although the API keys are also JWTs, we have to check whether the specified key can be found in our database. Otherwise, we would continue to accept keys that have already been deleted, which is not the intended behavior.
8.2.1. Using Nuxt-Auth
To be able to authenticate in the frontend and acquire a JWT token, we use @sidebase/nuxt-auth as our authentication module in the frontend.
Adjust Nuxt config
The file nuxt.config.ts holds the configuration of the application. This is where we need to add the @sidebase/nuxt-auth module to the modules array. Then, in the same config object, we populate the auth configuration. The auth config holds details about our authentication provider and session refresh management.
Add the @sidebase/nuxt-auth module to the modules array and enable the auth configuration in nuxt.config.ts:
export default defineNuxtConfig({
modules: [
"@sidebase/nuxt-auth",
...,
],
auth: {
isEnabled: true,
provider: {
type: "authjs",
...,
},
sessionRefresh: {...},
},
...,
})
Details about the auth config attributes can be found in the docs.
Add environment variables
A couple of env variables are needed for this to work. These variables are documented in the setup of the frontend.
To be able to make use of them, we should list them in the runtimeConfig:
export default defineNuxtConfig({ runtimeConfig: {
authSecret: "",
oidcClientId: "",
oidcIssuer: "",
}
})
Configure the authentication provider
After the configuration adjustment mentioned previously, an automatically generated auth path is created at /api/auth. This is where we should create our OIDC provider config, under /src/server/api/auth/[…].ts.
.
export default NuxtAuthHandler({
secret: runtimeConfig.authSecret,
providers: [
{
id: "oidc",
name: "OIDC",
type: "oauth",
...,
},
],
})
Details about the provider config can be found in the NextAuth docs
After this setup we should be able to authenticate using the specified OIDC provider.
To handle token and session refresh, we can use the jwt() and session() callbacks to control the behavior in the same NuxtAuthHandler.
8.3. Authorization
8.3.1. Requirements
- The backend can verify the identity of users
- Users are uniquely identified in the backend by usernames
- The backend can easily verify whether a user has permissions for a particular object
- This check is quick and easy to perform, so there is no noticeable delay
- Current records can still be used
Owner
- Objects have a unique owner
- Objects without owners belong to everyone (backward compatibility)
- Owners can be changed later
- Owners automatically have all permissions on the object
- Owners automatically have all permissions on all subordinate objects (inheritance)
- Newly created objects belong to the creator unless otherwise specified
Permissions
- There are different permissions for readability, writability and manageability
- Permissions can be set only for collections and containers, but apply to all subordinate objects
- For each object, there is a list of users who are allowed to read/write/manage the object
- The different permissions build upon each other (read < write < manage)
- Permissions can be edited by all users with manage permissions
- Collections and containers can be created by all users with access
- Newly created objects can be read and written by everyone with access
Long living access tokens (Api Keys)
- Api Keys are used to authenticate and authorize a client for a specific task
- Api Keys belong to one user
- Api Keys can only authorize something as long as the user is allowed to do so
- If a user no longer exists, their Api Keys are automatically invalidated
Payload databases
- Creation of new data is allowed for any logged-in user
- Integrated databases contain payload containers represented by a container object in the data model
- Users can create payload containers via the root endpoints
- Containers can be populated with data via the type/container_id/ URL (e.g. /files/<id>/, /timeseries/<id>/)
- Containers can be restricted by the permission system mentioned above
- A reference contains the specific ID of the uploaded data inside the container
- Multiple references can point to one and the same data, or narrow it down further
8.3.2. Implementation
Users
- Users are stored in Neo4j
- A user also has the following attributes (arrows → indicate relationships):
  - owned_by → List of entities, references and containers (n:1)
  - readable_by → List of entities (n:m)
  - writable_by → List of entities (n:m)
  - managable_by → List of entities (n:m)
Endpoints
- An endpoint /…/<id>/permissions can be used to manage the permissions of an object
- Allowed methods are GET and PUT
- Permissions follow this format:
{
"readableBy": [
<usernames>
],
"writableBy": [
<usernames>
],
"managableBy": [
<usernames>
],
"ownedBy": <username>
}
Api Keys
- Api Keys are stored in Neo4j
- Each time an AccessToken is accessed, it must be checked that the owner of this token also has the corresponding authorization
- Api Keys have the following attributes:
  - uid: UUID
  - name: String
  - created_at: Date
  - jws: Hex String (will never be delivered again after creation)
  - belongs_to: User (n:1)
8.4. User Information
Shepard needs to know certain information about the current user, such as the first and last name and e-mail address.
We can retrieve some information from the given JWT, as Keycloak usually adds some information there.
However, most of the fields are not required by the specification, so we have to use other measures to get the required information.
OIDC specifies a UserinfoEndpoint which can be used to retrieve some data about the current user. We have implemented a UserinfoService to access this data. Each time a user sends a request, the UserFilter fetches the relevant user information from the identity provider and updates the internally stored data if necessary.
To reduce the number of requests, we have implemented a grace period during which no new data is retrieved.
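A simplified sketch of such a grace period check (names and the concrete duration are assumptions, not the actual implementation):
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class UserinfoCache {

  private static final Duration GRACE_PERIOD = Duration.ofMinutes(5); // assumed value

  private final Map<String, Instant> lastFetched = new ConcurrentHashMap<>();

  // Only contact the identity provider again once the grace period has expired.
  boolean shouldRefresh(String username) {
    Instant last = lastFetched.get(username);
    return last == null || last.plus(GRACE_PERIOD).isBefore(Instant.now());
  }

  void markFetched(String username) {
    lastFetched.put(username, Instant.now());
  }
}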
8.5. Dependency Updates
Dependencies of shepard are updated regularly. To automate most of this, we use renovate in GitLab. The configuration for the repository is located at renovate.json. For the config to be active, it has to be present on the default branch of the repository (main). The renovate runner is located in this (private) repository: https://gitlab.com/dlr-shepard/renovate-runner.
The developer team is responsible for regularly handling the merge requests opened by renovate. This should happen once a month, directly after creating a monthly release. As a reminder, monthly update tickets are part of the sprints.
8.5.1. Performing Updates
We handle the merge requests opened by renovate by performing the following steps for each update:
- reading the change logs of the dependency
- testing whether everything still works
- applying necessary changes if they are not too much effort
- merging the branch or suspending the update
Also, the dependencies in package-lock.json should be updated. This is done by running npm update in the top-level directory.
8.5.2. Suspending an Update
In case we could not perform the update it should be suspended and documented in the list of suspended updates. The reason can either be too much effort (we create a new story for that update) or that the update is not possible or feasible right now.
This can be done by excluding the library or specific version in the renovate config. Afterwards the config needs to be merged to main with the following commands:
git checkout main
git cherry-pick <commit-hash>
Afterwards the merge request can be closed.
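A hedged sketch of what such an exclusion could look like in renovate.json (package name and version range are purely illustrative):
{
  "packageRules": [
    {
      "description": "Suspended: v5 introduces breaking changes, see list of suspended updates",
      "matchPackageNames": ["some-library"],
      "allowedVersions": "<5.0.0"
    }
  ]
}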
8.5.3. Abandoned dependency updates
Sometimes, when the configuration changes or dependency updates were done without renovate, the bot might abandon a merge request. In this case the merge request is not automatically closed and has to be closed manually. The corresponding branch must also be deleted manually to keep things clean.
Use Only LTS updates for Quarkus
In our technology choices we decided to rely only on LTS releases of Quarkus. Currently, there is no way to tell the renovate bot to respect only LTS releases of Quarkus. Therefore, we have to manually check that a Quarkus update is always the latest LTS release. We do not want to update to non-LTS versions of Quarkus. A list of current Quarkus releases can be found here: Quarkus releases.
Suspended Updates
Package and version | Issue that blocks an update |
---|---|
tomcat<11 | v11 is still a pre-release |
influxdb<=1.8 | V2 introduces major breaking changes. Since we want to move to TimescaleDB anyway, we disregard any new updates that require some kind of migration effort. |
chronograf<1.10 | The container cannot be started with v1.10. We expect to move away from InfluxDB in the future, so we will stick with v1.9 for the time being. |
neo4j<5 | V5 introduces major breaking changes |
mongo<5 | No real reason; there are some major changes, but nothing serious |
vue<3 | Not compatible with bootstrap v4 |
vue-router<4 | Not compatible with vue v2 |
vuex<4 | Not compatible with vue v2 |
bootstrap<5 | v5 has no vue integration |
portal-vue<3 | Needed for bootstrap-vue |
typescript<5 | Not compatible with vue v2 |
@vue/tsconfig<2 | Not compatible with vue v2 |
vue-tsc<2.0.24 | |
eslint<9 | |
@vue/eslint-config-prettier<10 | Has peer dependency to current version of eslint |
@vue/eslint-config-typescript<14 | Has peer dependency to current version of eslint |
neo4j-ogm<4 | v5 is not compatible with neo4j v4 |
jjwt<0.12 | v0.12.x introduces a series of breaking changes in preparation for v1.0. It is recommended to stay on v0.11 until v1.0 is finished to fix all changes at once. |
junit-jupiter<5.11 | Not possible at the moment because parametrized tests in combination with CsvSource no longer work. We will wait for the next version. |
@vueuse/core<12 (old frontend) | v12 drops support for Vue v2 |
vite<6 (old frontend) | Peer dependency to @vueuse/core v12 (reason in the line above) |
versions-maven-plugin<2.18 | Maven report fails to generate in pipeline job |
license-maven-plugin<2.5 | Maven report fails to generate in pipeline job |
8.6. Export Collections
The export feature exports an entire collection, including all data objects, references and referenced payloads, to a zip file. Metadata is added in the form of a ro-crate-metadata.json file as per the Research Object Crate specification.
{
"@context": "https://w3id.org/ro/crate/1.1/context",
"@graph": [
{
"name": "Research Object Crate",
"description": "Research Object Crate representing the shepard Collection",
"@id": "./",
"@type": "Dataset",
"hasPart": [
...
]
},
{
"about": {
"@id": "./"
},
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.1"
},
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork"
},
...
]
}
The zip file contains all files at the top level. This conforms to both the RO-Crate specification and our internal structure. Relationships between elements are written down as metadata.
<RO-Crate>/
| ro-crate-metadata.json
| DataObject1
| Reference2
| Payload3
| ...
Organizational elements are added as JSON files, as they would also be returned via the REST API. These files are named according to their shepard ID, which ensures that the file names are unique. Payloads are added as they are; time series are exported in the corresponding CSV format. For each exported part there is an object in the ro-crate-metadata.json file with additional metadata. We use the field additionalType to specify the respective data type of an organizational element.
{
"name": "DataObject 1",
"encodingFormat": "application/json",
"dateCreated": "2024-07-02T06:41:19.813",
"additionalType": "DataObject",
"@id": "123.json",
"author": {
"@id": "haas_tb"
},
"@type": "File"
}
Shepard also adds the respective authors to the metadata.
{
"@id": "haas_tb",
"email": "tobias.haase@dlr.de",
"givenName": "Tobias",
"familyName": "Haase",
"@type": "Person"
}
8.7. Ontologies
8.7.1. Registries
BARTOC knows about terminology registries, including itself. Registries also provide access to full terminologies either via an API (terminology service) or by other means (terminology repository).
Typical "interfaces":
- sparql
- jskos
- ontoportal
- webservice
- ols
- skosmos
(others could include eclass or IEEE iris)
8.7.2. Semantic Repository
- GET, POST …/semanticRepository/
- GET, PUT, DELETE …/semanticRepository/{containerId}
{
"id": 123,
"name": "Ontobee",
"sparql-endpoint": "http://www.ontobee.org/sparql"
}
8.7.3. Semantic Annotation
- GET, POST …/collections/{collectionId}/annotations/
- GET, PUT, DELETE …/collections/{collectionId}/annotations/{annotationId}
- GET, POST …/collections/{collectionId}/dataObjects/{dataObjectId}/annotations/
- GET, PUT, DELETE …/collections/{collectionId}/dataObjects/{dataObjectId}/annotations/{annotationId}
- GET, POST …/collections/{collectionId}/dataObjects/{dataObjectId}/references/{referenceId}/annotations/
- GET, PUT, DELETE …/collections/{collectionId}/dataObjects/{dataObjectId}/references/{referenceId}/annotations/{annotationId}
{
"id": 456,
"propertyRepositoryId": 123,
"property": "http://purl.obolibrary.org/obo/UO_0000012",
"valueRepositoryId": 123,
"value": "http://purl.obolibrary.org/obo/RO_0002536"
}
8.7.4. Ideas
- Look into federated queries regarding SPARQL: https://www.w3.org/TR/2013/REC-sparql11-federated-query-20130321/
Ontologies of interest
- Semantic Sensor Network Ontology: https://www.w3.org/TR/vocab-ssn/#intro
- Units of Measure: http://www.ontology-of-units-of-measure.org/
References / Examples of semantic annotation in other systems
<annotation>
<propertyURI label="is about">http://purl.obolibrary.org/obo/IAO_0000136</propertyURI>
<valueURI label="grassland biome">http://purl.obolibrary.org/obo/ENVO_01000177</valueURI>
</annotation>
- CATIA Nomagic2022: https://docs.nomagic.com/display/MCM2022x/Working+with+annotations
8.8. Search Concept
8.8.1. Structured Data
Query documents using native mongoDB mechanics
- Receiving the search query via POST request:
  { "scopes": [ { "collectionId": 123, "dataObjectId": 456, "traversalRules": ["children"] } ], "search": { "query": { "query": "{ status: 'A', qty: { $lt: 30 } }" }, "queryType": "structuredData" } }
- Find all relevant references (children of the dataObject with ID 456)
- Find the containers of these references
- Build the query (see the sketch after this list):
  db.inventory.find({ "_id": { $in: [ <list of containers from step 3> ] }, <user query> })   (the comma implies an AND)
- Query mongoDB (step 4)
- Return results:
  { "resultSet": [ { "collectionId": 123, "dataObjectId": 456, "referenceId": 789 } ], "search": { "query": { "query": "{ status: 'A', qty: { $lt: 30 } }" }, "queryType": "structuredData" } }
8.8.5. Organizational Elements
Query collections, data objects and references
Query objects
The query object consists of logical objects and matching objects. Matching objects can contain the following attributes:
-
name
(String) -
description
(String) -
createdAt
(Date) -
createdBy
(String) -
updatedAt
(Date) -
updatedBy
(String) -
attributes
(Map[String, String])
The following logical objects are supported:
- not (has one clause)
- and (has a list of clauses)
- or (has a list of clauses)
- xor (has a list of clauses)
- gt (greater than, has value)
- lt (lower than, has value)
- ge (greater or equal, has value)
- le (lower or equal, has value)
- eq (equals, has value)
- contains (contains, has value)
- in (in, has a list of values)
{
"AND": [
{
"property": "name",
"value": "MyName",
"operator": "eq"
},
{
"property": "number",
"value": 123,
"operator": "le"
},
{
"property": "createdBy",
"value": "haas_tb",
"operator": "eq"
},
{
"property": "attributes.a",
"value": [1, 2, 3],
"operator": "in"
},
{
"OR": [
{
"property": "createdAt",
"value": "2021-05-12",
"operator": "gt"
},
{
"property": "attributes.b",
"value": "abc",
"operator": "contains"
}
]
},
{
"NOT": {
"property": "attributes.b",
"value": "abc",
"operator": "contains"
}
}
]
}
Procedure
- Receiving the search query via POST request:
  { "scopes": [ { "collectionId": 123, "dataObjectId": 456, "traversalRules": ["children"] } ], "search": { "query": { "query": "<json formatted query string (see above)>" }, "queryType": "organizational" } }
- Find all relevant elements (here the nodes with IDs 1, 2 and 3)
- Build the query:
  MATCH (n)-[:createdBy]-(c:User) WHERE ID(n) in [1,2,3] AND c.username = "haas_tb" AND n.name = "MyName" AND n.description CONTAINS "Hallo Welt" AND n.`attributes.a` = "b" AND ( n.createdAt > date("2021-05-12") OR n.`attributes.b` CONTAINS "abc" ) RETURN n
- Query neo4j (step 3)
- Return results:
  { "resultSet": [ { "collectionId": 123, "dataObjectId": 456, "referenceId": null } ], "search": { "query": { "query": "<>" }, "queryType": "organizational" } }
8.8.6. User
- Receiving the search query via GET request: /search/users
- Possible query parameters are username, firstName, lastName, and email
- Build the query to enable regular expressions:
  MATCH (u:User) WHERE u.firstName =~ "John" AND u.lastName =~ "Doe" RETURN u
- Query neo4j (step 2)
- Return results:
  [ { "username": "string", "firstName": "string", "lastName": "string", "email": "string", "subscriptionIds": [0], "apiKeyIds": ["3fa85f64-5717-4562-b3fc-2c963f66afa6"] } ]
8.8.7. OpenAPI Spec
openapi: 3.0.2
info:
title: FastAPI
version: 0.1.0
paths:
/search/:
post:
summary: Search
operationId: search_search__post
requestBody:
content:
application/json:
schema:
$ref: "#/components/schemas/SearchRequest"
required: true
responses:
"200":
description: Successful Response
content:
application/json:
schema:
$ref: "#/components/schemas/SearchResult"
"422":
description: Validation Error
content:
application/json:
schema:
$ref: "#/components/schemas/HTTPValidationError"
components:
schemas:
HTTPValidationError:
title: HTTPValidationError
type: object
properties:
detail:
title: Detail
type: array
items:
$ref: "#/components/schemas/ValidationError"
Query:
title: Query
required:
- query
type: object
properties:
query:
title: Query
type: string
QueryType:
title: QueryType
enum:
- structuredData
- timeseries
- file
type: string
description: An enumeration.
Result:
title: Result
required:
- collectionId
- dataObjectId
- referenceId
type: object
properties:
collectionId:
title: Collectionid
type: integer
dataObjectId:
title: Dataobjectid
type: integer
referenceId:
title: Referenceid
type: integer
Scope:
title: Scope
required:
- collectionId
- traversalRules
type: object
properties:
collectionId:
title: Collectionid
type: integer
dataObjectId:
title: Dataobjectid
type: integer
traversalRules:
type: array
items:
$ref: "#/components/schemas/TraversalRule"
SearchEntity:
title: SearchEntity
required:
- query
- queryType
type: object
properties:
query:
$ref: "#/components/schemas/Query"
queryType:
$ref: "#/components/schemas/QueryType"
SearchRequest:
title: SearchRequest
required:
- scopes
- search
type: object
properties:
scopes:
title: Scopes
type: array
items:
$ref: "#/components/schemas/Scope"
search:
$ref: "#/components/schemas/SearchEntity"
SearchResult:
title: SearchResult
required:
- resultSet
- search
type: object
properties:
resultSet:
title: Resultset
type: array
items:
$ref: "#/components/schemas/Result"
search:
$ref: "#/components/schemas/SearchEntity"
TraversalRule:
title: TraversalRule
enum:
- children
- parent
- predecessors
- successors
type: string
description: An enumeration.
ValidationError:
title: ValidationError
required:
- loc
- msg
- type
type: object
properties:
loc:
title: Location
type: array
items:
type: string
msg:
title: Message
type: string
type:
title: Error Type
type: string
8.9. Release Process
A shepard release consists of a new version number, build artifacts (containers, clients, etc.), a release tag on main, and release notes.
8.9.1. Release frequency
Usually a new shepard version is released on the first Monday of the month. However, this date is not fixed and can be postponed by a few days if necessary. This monthly release increases the release version number.
We use semantic versioning, meaning that the version number consists of a major, minor and patch number in the format MAJOR.MINOR.PATCH. Minor is the default version increase for a release. Breaking changes imply a major release, and hotfixes or patches are released as a patch release (see the hotfix process below).
Currently there are two workflows for releases: one for minor/major releases and one for patch releases. Both release types are explained step by step below.
8.9.2. Performing releases
These steps describe a regular (monthly) release for shepard but can also be used to release an unplanned patch release.
Furthermore, there are two ways to create an unplanned patch/hotfix release.
The first option is the more classical hotfix approach: it only brings the changes from merge requests containing hotfixes from the develop branch to the main branch. The steps needed for this option are explained below in the section Performing a hotfix release.
The second option is to create an MR containing the needed patch changes on the develop branch, then merge the develop branch into main and create a new minor release. This creates an out-of-cycle release containing the patch and all changes from develop since the last release. This option follows the same procedure as a regular release, which is described right below.
- Finish development and make sure the develop branch is stable, the pipeline is successful and no code reviews are open
- Optional: Merge the main branch into develop in order to reapply any cherry-picked commits
- Merge the develop branch into the main branch
- Prepare an official release by using the shepard release script
- To set up the release script, follow the steps listed in the Scripts README.md
- Run the following command: poetry run cli release ./token.txt
- The script will ask whether the release is Patch, Minor or Major and calculates the new version accordingly. The script automatically uses Major if the previous changes contain breaking changes.
- Verify the listed merged merge requests
- Verify the release notes created by the script (an editor opens automatically)
- Suggest a release title that will be appended to the version number
- Confirm the generated details
- Verify that everything was successfully created (GitLab release, release notes, etc.)
8.9.3. Performing a hotfix release
Hotfixes are changes to the main branch outside the regular releases to fix urgent bugs or make small changes that need to be applied to the default branch of the project. The steps below describe how one can release a single hotfix MR without having to merge the develop branch into main. This means that the other changes on the develop branch are only merged when a new regular release is created.
- As usual, a merge request with the hotfix must be created, reviewed, and merged to develop
- The resulting MR commit must be cherry-picked from develop to main:
  git checkout main
  git cherry-pick <commit-hash>
  git push
- The shepard release script needs to be run in order to create a new hotfix release
- To set up the release script, follow the steps listed in the Scripts README.md
- Run the following command: poetry run cli release ./token.txt
- The script will ask whether the release is Patch, Minor or Major and calculates the new version accordingly. Here you should select a Patch version, since you only want to release a hotfix/patch.
- Verify the listed merged merge request
- Verify the release notes created by the script (an editor opens automatically)
- Suggest a release title that will be appended to the version number
- Confirm the generated details
- Verify that everything was successfully created (GitLab release, release notes, etc.)
8.9.4. Actions done by the release script in the background
- Collecting all previous merge requests from the last version until now.
- Analyzing whether the previous changes contain breaking changes.
- A GitLab release including release notes directed at administrators and users is created:
  - The title is the title given by the user, concatenated with the version tag
  - A short paragraph describes the most important changes
  - Breaking changes are listed in a prominent way
  - Other changes besides dependency updates are listed below
- A release tag <version number> on main is created:
  - The script automatically uses a Major version increase if the previous changes contain breaking changes.
- The script asks the user whether it should automatically create an 'Update Dependencies' issue for the current milestone after performing a successful release. This is done since we agreed on updating all dependencies after performing a release.
8.10. Configuration
8.10.1. Backend (Quarkus)
This section is a short summary of this page.
Application Properties
Setting Properties
Quarkus reads configuration properties from several sources. More information on the sources and how they override each other can be found here.
We define a standard value for most properties under src/main/resources/application.properties. For the dev and test environments, we provide properties with a %dev, %test or %integration prefix overriding the default value. Additionally, properties can be overridden locally using a .env file. We use this for configuration that differs between developers, e.g. the OIDC config. In a dockerized setup they can be overridden by providing environment variables to the service. To support administrators, relevant configuration options are documented in infrastructure/.env.example and infrastructure/README.md.
Reading Properties
Properties can either be injected or accessed programmatically.
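Both ways, sketched with an example property (the property name and class are only illustrative):
import org.eclipse.microprofile.config.ConfigProvider;
import org.eclipse.microprofile.config.inject.ConfigProperty;

public class SomeService {

  // Option 1: injection via MicroProfile Config
  @ConfigProperty(name = "shepard.some.property", defaultValue = "false")
  boolean someProperty;

  // Option 2: programmatic access to the same property
  boolean readProgrammatically() {
    return ConfigProvider.getConfig().getOptionalValue("shepard.some.property", Boolean.class).orElse(false);
  }
}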
Feature Toggles
With feature toggles we want to conditionally build shepard images with or without a certain feature. This is especially useful for features under development.
To define a feature toggle, we add the property to configuration.feature.toggles.FeatureToggleHelper and create a class in configuration.feature.toggles that contains the name of the property, an isEnabled method as well as the method ID of isEnabled.
An example could look like this:
package de.dlr.shepard.configuration.feature.toggles;
public class ExperimentalTimeseriesFeatureToggle {
public static final String TOGGLE_PROPERTY = "shepard.experimental-timeseries.enabled";
public static final String IS_ENABLED_METHOD_ID =
"de.dlr.shepard.configuration.feature.toggles.ExperimentalTimeseriesFeatureToggle#isEnabled";
public static boolean isEnabled() {
return FeatureToggleHelper.isToggleEnabled(TOGGLE_PROPERTY);
}
}
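The FeatureToggleHelper itself is not shown in this document; a minimal sketch of what it might do (an assumption, not the actual implementation):
package de.dlr.shepard.configuration.feature.toggles;

import org.eclipse.microprofile.config.ConfigProvider;

public class FeatureToggleHelper {

  // Reads the toggle property and treats a missing property as "disabled".
  public static boolean isToggleEnabled(String property) {
    return ConfigProvider.getConfig().getOptionalValue(property, Boolean.class).orElse(false);
  }
}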
We can then use this feature toggle in multiple ways:
Conditionally Excluding Beans at Buildtime
Quarkus provides us with a mechanism to conditionally exclude beans at buildtime. For example, the endpoints of an experimental feature can be enabled or disabled at build time to be included in dev builds but excluded in release builds.
For example, the ExperimentalTimeseriesRest
can have a @IfBuildProperty
annotation like this:
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
@Path(Constants.EXPERIMENTAL_TIMESERIES_CONTAINERS)
@RequestScoped
@IfBuildProperty(name = ExperimentalTimeseriesFeatureToggle.TOGGLE_PROPERTY, stringValue = "true")
public class ExperimentalTimeseriesRest {
...
}
In this example the endpoints are only available if shepard.experimental-timeseries.enabled was true at build time.
Note: The @IfBuildProperty annotations are evaluated at build time. Make sure to add the property to the application.properties file during the build, so that the build artifact has the same value at runtime.
See here for more information.
Connect further Configuration with a Feature Toggle
For a feature toggle, we want a single property to control it.
In case we need to adapt further configuration based on the feature toggle (e.g. disabling hibernate), we can reference the property like this:
shepard.experimental-timeseries.enabled=false
quarkus.hibernate-orm.active=${shepard.experimental-timeseries.enabled}
To reenable a feature for a dev or test profile, we can then activate the toggle for these profiles.
In order to execute tests conditionally based on a toggle, we use the isEnabled method and its method ID from the feature toggle class.
In a test class or method, we can then add an annotation like this:
@QuarkusTest
@EnabledIf(ExperimentalTimeseriesFeatureToggle.IS_ENABLED_METHOD_ID)
public class ExperimentalTimeseriesContainerServiceTest {
...
}
Feature toggles in the pipeline
Don’t mistake the build profiles dev and prod for the profiles of the dev and prod images. To keep our dev environment as close to the production environment as possible, the dev images are also built using the prod profile. In order to enable a feature for a dev or prod build, we provide the feature toggle in the get-version pipeline job for dev or prod.
Note: Make sure to provide the toggle for all pipelines and adapt it in application.properties before building. Otherwise the value of the test profile is used, which can lead to errors.
8.10.2. Frontend (Nuxt)
This section is a short summary of this page.
Setting properties
We define environment variables in the Nuxt config like this:
export default defineNuxtConfig({ runtimeConfig: {
// A value that should only be available on server-side
apiSecret: '123',
// Values that should be available also on client side
public: {
apiBase: '/api'
}
}})
These values can be overridden by a .env file like this:
NUXT_API_SECRET=api_secret_token
NUXT_PUBLIC_API_BASE=https://nuxtjs.org
In order to ease the configuration, we provide a .env.example file with all relevant variables. That file can be copied to .env and filled with the appropriate values.
8.11. Versioning
8.11.1. Introduction
As a shepard user I want to be able to use different versions of data sets to facilitate collaboration in a research project and lay the groundwork for future features like branching, visualization of differences or restore functionality.
I can define a version of a collection to mark a milestone in the project, in order to freeze the current state of the data set via the API.
There is always one active version called HEAD, which is the working copy that can be edited on shepard; in addition there can be n versions on shepard that are read-only. If I never define a version as a user, nothing changes in functionality for me.
Versions are identified by a UUID. Versioning applies to organizational elements, not to payload data. A version always covers a whole collection; data objects and references inherit the version from their enclosing collection. Versioning is explicit: users have to create versions actively.
8.11.2. Behavior
The following image displays a collection with references with no manually created versions.
After the creation of a new version, the data will look like this:
Semantic Annotations are copied when creating a version, just like the collection, data objects and references.
Permissions are collection-wide and across all versions.
8.11.3. Endpoints
Endpoint | Description | Request Body | Response |
---|---|---|---|
POST /collections | create first collection | | |
POST /collections | create second collection | | |
POST /collections/cid1/versions | create first version of first collection | | |
GET /collection/cid1/versions | get versions of first collection | | |
POST /collections/cid1/dataObjects | create first dataObject in first collection | | |
POST /collections/cid1/dataObjects | create second dataObject in first collection with first dataObject in first collection as parent | | |
GET /collections/cid1/dataObjects?versionUID=collection1version1uid | there are no dataObjects in the first version of collection1 | | |
8.11.4. Edge Case: CollectionReferences and DataObjectReferences
When we create a new version of a referenced collection, the reference will move with the HEAD
and the old collection will not be referenced anymore:
When we referenced an old version of a collection and a new version is created, the reference stays unchanged:
Endpoint | Description | Request Body | Response |
---|---|---|---|
POST /collections/cid2/dataObjects | create first dataObject in collection 2 | | |
POST /collections/cid1/versions | create first version of collection 2 | | |
POST /collections/cid1/dataObjects/c1did1/dataObjectReferences | create dataObjectReference from first dataObject in collection1 to first dataObject in collection2 without version | | |
POST /collections/cid1/dataObjects/c1did1/dataObjectReferences | create dataObjectReference from first dataObject in collection1 to first dataObject in collection2 with version | | |
GET /collections/cid2/dataObjects/c2did1 | fetch referenced dataObject with incoming counter | | |
POST /collections/cid1/versions | create second version of collection 2 | | |
GET /collections/cid2/dataObjects/c2did1 | fetch referenced dataObject from HEAD, incoming is still the same | | |
GET /collections/cid2/dataObjects/c2did1?versionUID=collection2version2uid | fetch referenced dataObject from version 2, incoming is now empty | | |
8.12. OpenAPI Specification
Quarkus provides the Smallrye OpenAPI extension.
Documentation on OpenAPI and swagger can be found here.
The generated OpenAPI spec is available at /shepard/doc/swagger-ui of a running shepard backend.
8.12.1. Enhancing the Schema with Filters
The generated schemas can be adapted using filters. For example, we use this to adapt the paths to match the root path of our api.
More information on this can be found here.
8.12.2. Path Parameter Order
Quarkus sorts the list of path parameters in the OpenAPI spec alphabetically instead of by occurrence in the path.
For example, the following endpoint:
@DELETE
@Path("/{" + Constants.APIKEY_UID + "}")
@Tag(name = Constants.APIKEY)
@Operation(description = "Delete api key")
@APIResponse(description = "deleted", responseCode = "204")
@APIResponse(description = "not found", responseCode = "404")
public Response deleteApiKey(
@PathParam(Constants.USERNAME) String username,
@PathParam(Constants.APIKEY_UID) String apiKeyUid
) {
// Some code
}
will lead to the following OpenAPI spec:
delete:
tags:
- apikey
description: Delete api key
operationId: deleteApiKey
parameters:
- name: apikeyUid
in: path
required: true
schema:
type: string
- name: username
in: path
required: true
schema:
type: string
responses:
"204":
description: deleted
"404":
description: not found
Since we want the parameters to be sorted by their occurrence in the path, this behavior is not intended and can lead to issues in generated clients.
To fix this, we define the order of path and query parameters in the OpenAPI spec manually using @Parameter annotations. We do this for all path and query parameters.
For the above example, the result would look like this:
@DELETE
@Path("/{" + Constants.APIKEY_UID + "}")
@Tag(name = Constants.APIKEY)
@Operation(description = "Delete api key")
@APIResponse(description = "deleted", responseCode = "204")
@APIResponse(description = "not found", responseCode = "404")
@Parameter(name = Constants.USERNAME)
@Parameter(name = Constants.APIKEY_UID)
public Response deleteApiKey(
@PathParam(Constants.USERNAME) String username,
@PathParam(Constants.APIKEY_UID) String apiKeyUid
) {
// Some code
}
8.12.3. Format specifier for Datetime
When using Java’s old date API, the generated OpenAPI specification interprets the Date object as a date-only field. The code snippet below:
import java.util.Date;
public class SomeClass {
@JsonFormat(shape = JsonFormat.Shape.STRING)
private Date createdAt;
}
generates the following openapi yaml snippet:
SomeClass:
type: object
properties:
createdAt:
format: date
type: string
example: 2024-08-15
However, the Date object stores both date and time and should be treated by the OpenAPI specification as an object that handles both. In the example, a modification of the createdAt field is needed to explicitly specify that the generated format should be the datetime format. The code snippet below adds a @Schema annotation, which specifies the format field:
import java.util.Date;
public class SomeClass {
@JsonFormat(shape = JsonFormat.Shape.STRING)
@Schema(format = "date-time", example = "2024-08-15T11:18:44.632+00:00")
private Date createdAt;
}
This annotation then results in the following OpenAPI specification containing a date-time format:
SomeClass:
type: object
properties:
createdAt:
format: date-time
type: string
example: 2024-08-15T11:18:44.632+00:00
So, when using Java’s Date object from the old java.util.Date API, we need to explicitly specify the format in the schema annotation to generate a datetime object in the OpenAPI specification.
8.12.4. Correct multipart file upload for Swagger UI
To enable multipart file uploads in Swagger, the following schema is expected (Source):
requestBody:
content:
multipart/form-data:
schema:
type: object
properties:
filename:
type: array
items:
type: string
format: binary
Until now we weren’t able to reproduce this exact schema together with a proper file upload in the Swagger UI, especially since we require the filename property to be not-null and required. So with annotations only, we could not reproduce this schema in Quarkus.
However, with the following construct of interfaces and classes, using the implementation field of the @Schema annotation, we were able to achieve a working file upload in the Swagger UI and a proper OpenAPI schema.
@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
public Response createFile(
MultipartBodyFileUpload body
) {
// ... file handling code ...
}
@Schema(implementation = UploadFormSchema.class)
public static class MultipartBodyFileUpload {
@RestForm(Constants.FILE)
public FileUpload fileUpload;
}
public class UploadFormSchema {
@Schema(required = true)
public UploadItemSchema file;
}
@Schema(type = SchemaType.STRING, format = "binary")
public interface UploadItemSchema {}
This generates the following openapi specification:
paths:
/examplepath:
post:
requestBody:
content:
multipart/form-data:
schema:
$ref: "#/components/schemas/MultipartBodyFileUpload"
components:
schemas:
MultipartBodyFileUpload:
$ref: "#/components/schemas/UploadFormSchema"
UploadFormSchema:
required:
- file
type: object
properties:
file:
$ref: "#/components/schemas/UploadItemSchema"
UploadItemSchema:
format: binary
type: string
This specification is rather complex and nested, but it allows, for example, adding a required field to the UploadItemSchema schema, which is then generated as a required field in the Swagger UI.
One problem of this approach is that this construct of MultipartBodyFileUpload, UploadFormSchema and UploadItemSchema is needed for every REST endpoint that uses a multipart file upload.
The solution is a combination of these two resources:
8.13. Generated Backend Clients
In order to ease the usage of the backend API, we maintain and publish generated backend clients. They are generated using the OpenAPI Generator.
We currently build and publish clients for Java, Python and TypeScript as part of our release process. In addition to the OpenAPI diff job there are jobs to check if there are changes in the generated clients for the TypeScript and Python client.
In the past, a python-legacy client and a Cpp client were published; they have been discontinued.
8.13.1. Backend Client for shepard Frontend
In order to support concurrent development of frontend and backend we decided to put the generated client for the frontend under version control (ADR-007 Client Generation for Frontend).
The client can be found under backend-client. Its exported members can be imported in frontend files like this:
import { SemanticRepositoryApi } from "@dlr-shepard/backend-client";
import { getConfiguration } from "./serviceHelper";
export default class SemanticRepositoryService {
static createSemanticRepository(params: CreateSemanticRepositoryRequest) {
const api = new SemanticRepositoryApi(getConfiguration());
return api.createSemanticRepository(params);
}
}
(Re)generating the Client
In case the API changed or a new version of the OpenAPI generator shall be used, the client has to be regenerated. This can be done by running the following command in the top level directory. Be aware that a local Java installation is required for the command to run successfully.
npm run client-codegen
The script will also persist the OpenAPI specification used for generation. Afterwards, the frontend code may have to be adjusted.
In order to check whether the client is up to date, the generator version as well as the current OpenAPI specification are compared with the ones used for generation in a pipeline job.
8.14. Testing Strategy
To automatically test shepard, several strategies are used. They are described in this section.
8.14.1. Unit Tests
We use junit5 for unit testing parts of our backend code. We aim to cover everything except endpoints by our unit tests.
@QuarkusTest with Running Databases
For special cases, we use tests with the @QuarkusTest annotation to test beans of a running Quarkus instance against running databases. This is especially used for behaviour strongly coupled to databases, in order to reduce the need for mocking and get more precise test results. These tests are executed in a separate pipeline job in order to provide the needed databases.
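A bare-bones sketch of such a test (the service under test and its methods are hypothetical):
import static org.junit.jupiter.api.Assertions.assertNotNull;

import io.quarkus.test.junit.QuarkusTest;
import jakarta.inject.Inject;
import org.junit.jupiter.api.Test;

@QuarkusTest
class SomeContainerServiceTest {

  @Inject
  SomeContainerService service; // real bean, backed by the databases started for this pipeline job

  @Test
  void createsContainer() {
    var container = service.createContainer("my-container");
    assertNotNull(container.getId());
  }
}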
8.14.2. Integration Tests
To test the overall functionality of the backend, we test our HTTP endpoints with integration tests using @QuarkusIntegrationTest.
In the pipeline, the tests are executed on a quarkus instance based on the build artifact built in the pipeline.
Integration Tests utilizing External REST APIs
Some of the integration tests in the backend rely on an external REST API.
One example is the set of integration tests that utilize semantic annotations, such as the CollectionSearcherIT. This test includes creating a semantic repository with an external ontology service and executing requests against this endpoint.
This introduces an external dependency to our integration tests, which we cannot control. If the external service is not available, the HTTP connection results in a timeout and the whole integration test fails, even though this is not related to our backend code. Since we want to test if the health check to the external service works, we cannot replace the health check function with a mocked version of this function. By introducing WireMock, it is possible to mock the HTTP response itself.
WireMock is a testing framework to map HTTP responses to specific HTTP requests.
WireMock acts as a simple HTTP server in the background that allows defining rules to match on HTTP requests with pre-defined HTTP responses.
For example, this code snippet in WireMockResource.java mocks the behavior of the health check against an external ontology service:
wireMockServer.stubFor(
// stub for health check on: https://dbpedia.org/sparql/
get(urlPathEqualTo("/sparql"))
.withQueryParam("query", equalTo("ASK { ?x ?y ?z }"))
.willReturn(aResponse().withStatus(200).withBody("{ \"head\": {\"link\": [] }, \"boolean\": true }"))
);
The rules in this snippet are defined as follows: for every GET request to localhost:PORT/sparql with the query parameter query=ASK { ?x ?y ?z }, the WireMock HTTP server returns an HTTP response with status code 200 that contains the JSON string { "head": {"link": [] }, "boolean": true } in its body.
Since we are using Quarkus as our backend framework, we utilize the Quarkus WireMock extension. This extension allows easier integration into an existing Quarkus application. It directly supports injecting a WireMock server into an integration test. Generally, injection is our preferred way to initialize objects.
However, injection is not used in our current implementation of WireMock mocking due to limitations of our concrete scenario, i.e. usage in static functions, where injection is not possible. Therefore, we utilize WireMock in a static approach.
A proper way to integrate WireMock into a Quarkus integration test is described in the extension’s introduction page and also by the official Quarkus guide for test resources.
WireMock is a powerful tool and provides many options to mock complex web services. To name only a few possibilities, WireMock supports response templates, proxying to forward specific requests, and multiple protocols and technologies like gRPC, GraphQL, HTTPS and JWT.
8.14.3. Load and Performance Tests
We are using Grafana k6 for load and performance tests.
They are located in the load-tests directory and are written in TypeScript.
Tests can only be triggered on a local development computer but can be configured to use the local or the dev environment.
Configuration
- Create a file under load-tests/mount/settings.json
- Copy the contents from load-tests/mount/settings.example.json
- Adapt the configuration settings as needed
- Run npm install in load-tests/
Execute tests
There is a shell script run-load-test.sh that can be used to execute load tests. It takes the test file to execute as its first parameter.
./run-load-test.sh src/collections/smoke-test.ts
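For orientation, the following is a minimal sketch of what such a smoke test could look like. The endpoint URL is a placeholder; the real tests read the host and further settings from mount/settings.json.
import http from "k6/http";
import { check } from "k6";

// A smoke test only needs a single virtual user for a short period of time.
export const options = {
  vus: 1,
  duration: "30s",
};

export default function () {
  // Placeholder URL; the actual host comes from the settings file.
  const res = http.get("https://shepard.example.com/shepard/api/collections");
  check(res, { "status is 200": (r) => r.status === 200 });
}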
Good to know
- Webpack is used for bundling all dependencies into the test file.
- Webpack uses ts-loader to transpile TypeScript to JavaScript.
- k6 does not use a Node environment. Therefore, some functionality is not available.
- webpack.config.js identifies entry points (tests) dynamically. All *.ts files are added automatically if they are not located in the utils folder.
- k6 pushes some metrics to Prometheus after test execution.
- To run the tests against a locally running backend on Linux, you need to put the IP address into settings.json.
8.15. Shepard Exceptions
When an exception is to be thrown, in most cases it should be of the abstract type ShepardException.
The ShepardExceptionMapper is able to handle such exceptions and in turn informs the user about the exception in a human-readable way.
Currently, there are four different sub-types of ShepardException:
- InvalidAuthException: The InvalidAuthException is to be thrown when a resource is accessed without sufficient permissions.
- InvalidRequestException: The InvalidRequestException is used when the request is missing required information or is otherwise invalid.
- ShepardParserException: The ShepardParserException is used by the search package to indicate that the search query could not be parsed.
- ShepardProcessingException: The ShepardProcessingException indicates an arbitrary issue while processing the request.
8.16. Subscription Feature
The subscriptions feature allows users to react to certain events. Shepard defines some REST endpoints as subscribable. Users can then subscribe to requests handled by these endpoints. In addition to a specific endpoint, users have to specify a regular expression, which is matched against the requested URL, as well as a callback endpoint. The callback endpoint is called by shepard when an endpoint triggers a subscription and the regular expression matches. The callback contains the respective subscription, the actually called URL as well as the ID of the affected object. The callback itself is executed asynchronously to avoid slowing down the response to the request in question.
A common use case for this feature is the automatic conversion of certain data types. For example, if a user wants to know about every file uploaded to a specific container, they would create a subscription in the following form:
{
"name": "My Subscription",
"callbackURL": "https://my.callback.com",
"subscribedURL": ".*/files/123/payload",
"requestMethod": "POST"
}
Once shepard has received a matching request, it sends the following POST request to the specified callback URL https://my.callback.com:
{
"subscription": {
"name": "My Subscription",
"callbackURL": "https://my.callback.com",
"subscribedURL": ".*/files/123/payload",
"requestMethod": "POST"
},
"subscribedObject": {
"uniqueId": "123abc"
},
"url": "https://my.shepard.com/shepard/api/files/123/payload",
"requestMethod": "POST"
}
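To illustrate how such a callback could be consumed, the following is a minimal sketch of a hypothetical receiver service in TypeScript for Node.js. The interface mirrors the payload shown above; the port and the reaction to the event are assumptions for illustration only.
import { createServer } from "node:http";

// Shape of the callback payload shown above.
interface SubscriptionCallback {
  subscription: {
    name: string;
    callbackURL: string;
    subscribedURL: string;
    requestMethod: string;
  };
  subscribedObject: { uniqueId: string };
  url: string;
  requestMethod: string;
}

// Hypothetical receiver behind https://my.callback.com
createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const callback: SubscriptionCallback = JSON.parse(body);
    // React to the event, e.g. start a conversion of the uploaded file.
    console.log(`Object ${callback.subscribedObject.uniqueId} was affected by ${callback.url}`);
    res.writeHead(204);
    res.end();
  });
}).listen(8085);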
8.17. Theming
We are using Vuetify as our component library, so we followed its theme configuration guide.
8.17.1. Global definitions and overrides
Global definitions like the font we use and typography settings are defined as Sass variables under 'nuxtend/styles/settings.scss'. Global overrides of component-specific properties are also defined there.
8.17.2. Theme colors
The theme itself, which mainly contains colors, is defined in 'nuxtend/plugins/vuetify.ts'. The colors are taken from the style guide that resides in Figma.
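A minimal sketch of such a theme definition might look like the following; the theme name and the color values are placeholders, the actual values come from the Figma style guide.
// Hypothetical excerpt of nuxtend/plugins/vuetify.ts
import { defineNuxtPlugin } from "#app";
import { createVuetify } from "vuetify";

// Placeholder colors; the real values are defined in the Figma style guide.
const shepardLight = {
  dark: false,
  colors: {
    primary: "#00658e",
    secondary: "#4d616c",
    error: "#ba1a1a",
  },
};

export default defineNuxtPlugin((nuxtApp) => {
  const vuetify = createVuetify({
    theme: {
      defaultTheme: "shepardLight",
      themes: { shepardLight },
    },
  });
  nuxtApp.vueApp.use(vuetify);
});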
8.17.3. Styling individual components
There are multiple possibilities to style Vue components. We agreed on the following order when styling components.
- Use global overrides with Sass variables if all components of the same type are affected.
- Use properties of the components if they exist, e.g. VButton has a 'color' property.
- Use the class property of components to use predefined CSS helper classes. In the Vuetify documentation there is a list of utility classes under 'Styles and animations'.
- Use the <style> tag to override CSS classes directly.
8.18. Session Management
As soon as a user authenticates, a session is created. We use the session mainly to store the tokens and some user information. We DO NOT persist the session anywhere on the server. As soon as the server restarts or the session ends, the information is lost.
In order to store user-specific data like favorites or user selections, we make use of the browser storage.
8.18.1. Local storage
In order to access the browser's local storage we make use of VueUse. It provides a function called useStorage which gives us access to the local storage. With a key we can access the storage and fetch already stored data. If no data is found, it falls back to the default value, which can be provided in the function parameters.
const state = useStorage('my-store', {hello: 'hi', greeting: 'Hello' })
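Building on this, a hypothetical composable for persisting favorites could look like the following sketch; the storage key, the stored shape and the function names are assumptions for illustration.
import { useStorage } from "@vueuse/core";

// Hypothetical composable; key and stored shape are illustrative assumptions.
export function useFavorites() {
  // Falls back to an empty array if nothing is stored under the key yet.
  const favorites = useStorage<number[]>("shepard-favorites", []);

  function toggleFavorite(collectionId: number) {
    const index = favorites.value.indexOf(collectionId);
    if (index === -1) favorites.value.push(collectionId);
    else favorites.value.splice(index, 1);
  }

  return { favorites, toggleFavorite };
}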
9. Architecture Decisions
9.1. ADR-000 API V2
26.08.2024: With changes implemented in this MR, some of the endpoint definitions went through minor changes. |
Date |
2021 |
---|---|
Status |
Done |
Context |
1. Ideas
|
Solution |
1. Endpoints
Organisational Entities
User
Database Integrations
The following endpoints exist optionally for each kind of data: Structured Data Container
File Container
Timeseries Container
2. Filtering
Some filter options can be implemented:
3. Behaviour
When a generated API client is used and existing objects are modified, only explicitly modified properties should be changed. Example:
In this example, only
5. Example Structures
The following structures are examples that demonstrate the user’s view of entities. Collection
DataObject
BasicReference
CollectionReference(BasicReference)
DataObjectReference(BasicReference)
URIReference(BasicReference)
TimeseriesReference(BasicReference)
TimeseriesContainer
TimeseriesPayload
FileReference(BasicReference)
FileContainer
FilePayload
There is no such thing as a file payload, since a file is always treated as a binary stream.
StructuredDataReference(BasicReference)
|
9.2. ADR-001 Monorepository
Date |
12.06.2024 |
---|---|
Status |
Done |
Context |
Currently the project is spread across multiple repositories for architecture work, backend, deployment, documentation, frontend, publication, releases and further tools of the ecosystem (shepard Timeseries Collector). This means increased effort when working with the repositories, especially for feature development concerning both the backend and the frontend. Also the documentation is not as close to the code as it could (reasonably) be. |
Possible Alternatives |
|
Decision |
We decide for migrating all repos except the shepard Timeseries Collector. The commit history, open issues, wikis and pipelines should be migrated to the monorepo. |
Consequences |
Monorepo has to be set up and previous projects have to be migrated |
9.3. ADR-002 Backend Technology
Date |
02.07.2024 |
---|---|
Status |
Done |
Context |
The purpose of shepard's backend is to provide a generic REST interface for its frontend or external communication partners to retrieve and store data in different formats. This data includes different research data (e.g. timeseries data, structured documents, files) and a connecting metadata structure (graph structure). To persist the data it uses Neo4j, MongoDB and InfluxDB databases. The backend of shepard is implemented as a basic Jakarta EE application using Jakarta Servlet and Tomcat. Further libraries are selected individually, added on top, and their interoperability is checked manually. There is no dependency injection. Due to its purpose the backend does not contain a lot of business logic; it rather functions as an adapter for the data types. To replace the current approach, a framework should be chosen that provides more structure and a robust, future-proof architecture. The tender for the extension of shepard listed the following requirements for a new framework:
As the databases in use might change in the near future, not too much time should be spent on migrating the code concerned. |
Possible Alternatives |
The comparison of alternatives can be found in the appendix. |
Decision |
We decide to go with Option Quarkus, because it uses established standards as opposed to Spring Boot defining its own standards. It also feels more modern while still being in broad use. |
Consequences |
|
9.3.1. Appendix
Keep current setup | Spring Boot | Quarkus | Javalin | Micronaut | Non-Java Backend | |
---|---|---|---|---|---|---|
Migration effort |
No effort |
(+) Medium Effort |
(+) Medium Effort |
(-) Large Effort (Most things have to be added manually) |
Medium Effort |
(-) Huge effort (everything has to be rewritten) |
Migration benefit |
(-) No benefit |
(+) Big Benefit, Batteries included (Hibernate integration, Security tools, Dependency Injection out of the box) |
(+) Big Benefit, Batteries included (Hibernate integration, Security tools, Dependency Injection out of the box) |
(-) Low benefit, most things still have to be manually integrated (e.g. database clients & hibernate connection) |
- |
- |
Rest API migration effort |
- |
(+) Medium Effort |
(+) Medium Effort |
Medium Effort |
- |
- |
Broad use and active community |
- |
(++)Widely used, Huge Community |
(+) In productive use, e.g. Keycloak. Medium but growing community |
Medium but growing community |
(-) Small, growing community, small Project |
- |
Detailed and Up-To-Date Documentation |
- |
(++) Detailed docs, lots of questions on stackoverflow (some of them may be outdated) |
(+) Tutorials & Guides provided by Quarkus, some resources on stackoverflow |
(+) |
- |
- |
Good Integration for REST Interface, Neo4j, MongoDB, InfluxDB, potentially PostgreSQL |
- |
(+)
|
(+)
|
|
(+)
|
- |
Developer Experience |
- |
(+) Great |
(++) Best |
Lots of boiler plate code, lots of integrations you have to write yourself |
- |
- |
Easy dev tooling |
- |
(+) Fully integrated with IntelliJ Ultimate Support for Eclipse, VSCode, etc. |
(+) Fully integrated with IntelliJ Ultimate Support for Eclipse, VSCode, etc. |
(+) Standard Support for IntelliJ and Eclipse, no extra functionality e.g. Testing, Modelling, … |
- |
- |
Testability |
- |
(+) Very flexible out of the box tools for Unit and Integration tests. However, “real” e2e tests need a framework like Spock, Cucumber or Cypress |
(+) Extensive support for different testing mechanisms. Also expandable with other testing tools/frameworks. Real e2e tests may probably need a separate framework |
|
- |
- |
Scalability |
- |
(+) Established support for kubernetes |
(++) Great support for containerization, Kubernetes and microservices. Startup time is very fast (e.g. for autoscaling). |
- |
- |
|
Performance |
- |
(+) Similar to Quarkus |
(+) Similar to Spring Boot. Extremely fast because of graalvm, native image support and small footprint |
Small codebase and completely customizable |
- |
- |
Ease of Updates |
- |
(+) Provides ways to analyze potentially breaking changes & diagnoses the Project |
(+) Provides ways to analyze potentially breaking changes & diagnoses the Project |
- |
- |
|
HTTP Endpoint Standard |
- |
(+) Spring |
(++) JAX-RS (same standard as currently in use) |
- |
- |
|
No Vendor Lock |
- |
(!) “Beans architecture” is used in other software too. Using Spring Boot includes using its features. So some “lock-in” will be there. But it is an open source framework. |
(!) Using Quarkus includes using its features. So some “lock-in” will be there. But it is an open source framework. |
- |
- |
|
Frontend easily integratable |
- |
(+) A REST or GraphQL API can be provided e.g. for a VueJS App. |
(+) A REST or GraphQL API can be provided e.g. for a VueJS App. |
- |
- |
|
Dependency Injection Pattern |
- |
(+) included |
(+) included |
(-) Only if self implemented |
- |
- |
Singleton Pattern |
- |
(+) Beans |
(+) |
- |
- |
|
OIDC support for external ID provider |
- |
(+) There is an OAuth2 client for spring boot |
(+) There is an oidc plugin |
- |
- |
|
OIDC support with integrated ID provider |
- |
(+) Yes, with spring authorization server |
(!) All resources on using OIDC with Quarkus expect a separate OIDC provider. |
|||
Experience in the DEV Team |
(!) Limited experience in small projects |
(!) Limited experience in small projects |
||||
Gut Feeling |
More modern, maybe less technical debt inside |
9.4. ADR-003 All-in-One Image
Date |
06.08.2024 |
---|---|
Status |
Done |
Context |
Currently, shepard's front- and backend are built, published and run as two separate containers. This leads to effort for administrators because they have to maintain two services in their docker compose file. Even with an integrated image, administrators still need to maintain a docker-compose file for the databases and reverse proxy. Exposing two images of basically the same implementation exposes an implementation detail of shepard to users. Backend and frontend always have the same version number as they share a release process. This could be mitigated by adding a variable to the docker-compose file. Both services have similarities in their configuration, e.g. they both need the OIDC authority. The frontend receives the backend URL (which the backend could also use, e.g. for generating an OpenAPI spec with the base URL). Usually, Docker containers should follow the single responsibility principle and have one process per container. From https://docs.docker.com/config/containers/multi-service_container/:
The frontend does not have its own process apart from nginx, since it only consists of static HTML, CSS and JavaScript files. Scaling is easier with separate images. Since there is not a lot of server-side load in the current frontend, individual scaling is not important. Building an integrated image involves more effort than publishing two separate images following the best practices of their frameworks. If future frontend developments add separate UIs, additional efforts for administrators or efforts in integration are necessary. As a full stack developer, I want the current version of the frontend to develop vertical features. |
Possible Alternatives |
|
Decision |
We keep the separate images for now and will revisit the topic when we work on facilitating deployment. By then, we expect to have a new frontend setup, so that we also save duplicate efforts by postponing the topic for now. |
Consequences |
|
9.5. ADR-004 Prefer Constructor Injection
Date |
19.08.2024 |
---|---|
Status |
Done |
Context |
Quarkus supports dependency injection, which we want to use. It supports loosely coupled components, providing better flexibility, modifiability and testability. In general, there are two possibilities to use DI: constructor and member injection. |
Decision |
We decided to use constructor injection.
|
Possible Alternatives |
Using the @Inject annotation on non-private members.
|
Consequences |
|
9.6. ADR-005 Frontend Tech Stack
Date |
30.08.2024 |
---|---|
Status |
Done |
Context |
shepard provides a frontend. Until now, the frontend basically provides a UI for the backend API. In the future, the frontend will provide useful features easing the interaction with shepard, especially for non-tech-savvy users. The application may also plot timeseries data. The available data may contain a lot more data points than the amount required for a rendered graphic. We don’t necessarily need SEO or server-side rendering. We also want to achieve benefits for API users when developing server-side code for the frontend. Since there are not a lot of developers working on the project, maintainability is very important. For OIDC, authentication with client secrets may be needed to operate with more OIDC providers. We want an easy to understand and maintainable structure for the frontend. The current frontend is written in Vue.js 2. Vue.js 2 reached end of life at the end of 2023. It already uses the composition API to ease migration to Vue.js 3. It is not possible to update to TypeScript 5 due to incompatibilities. When updating to Vue.js 3, vue router and Vuex have to be updated or replaced. Because of the already existing frontend and the experience in the dev team, we want to stay in the Vue ecosystem. As a UI library, BootstrapVue (based on Bootstrap 4) is used. Bootstrap 4 reached end of life at the beginning of 2023. BootstrapVue is incompatible with Bootstrap 5 and cannot be updated. |
Possible Alternatives |
|
Decision |
We decide to use Nuxt as a JavaScript framework because of its broad use and opinionated defaults & structure while still being open for extension, e.g. to choose the best UI library available. As a UI library we choose Vuetify based on its versatility and broad use. |
Consequences |
|
9.6.1. Appendix
Vue.js 3 | Vue.js 3 + Nuxt.js | Vue.js 3 + Quasar | |
---|---|---|---|
Short description |
Vue.js is an open-source front end JavaScript framework for building user interfaces and single-page applications. |
Nuxt is a free and open source JavaScript framework based on Vue.js, Nitro and Vite. The framework is advertised as a "Meta-framework for universal applications". It allows server-side rendering and server-side code in API routes. |
Quasar is an open-source Vue.js based framework for building apps with a single codebase. It can be deployed on the web as an SPA, PWA or SSR app, as a mobile app using Cordova for iOS & Android, and as a desktop app using Electron for Mac, Windows and Linux. |
Setup & Migration effort |
Folder structure from Vue2 can probably be reused. Setting up a new project with Vue3 recommended defaults is possible with official Vue tools. Well documented migration path for switching from Vue2 to Vue3. |
Setup probably easier than Vue alone because Nuxt comes with a lot of defaults/recommendations for folder structure, routing, state etc. The migration effort may be a little bit higher because the defaults & recommendations may differ from the current application. |
Quasar brings its own CLI tooling. Therefore, the initial setup is easily done. Migration is probably harder, since Quasar uses its own UI framework and we might have to use that. |
Dev Complexity |
Freedom of choice for many project decisions. Allows flexibility when creating applications, but has the risk of making the wrong decisions or implementing features in a non-optimal way (i.e., project structure). If you are already familiar with Vue, there is no need to learn a new framework. |
Added complexity because it’s not just JavaScript on the browser anymore, we have to think about code running on the server and on the client. API routes & middleware may be handy, but provide a second place to implement server-side functionality. |
Quasar offers some functionality over plain Vuejs. Therefore, the complexity might be a little higher. On the other hand, everything comes out of one box, so there is less confusion to find answers to potential questions. |
Dev Joy (awesome tooling) |
New projects should use Vite, which integrates well with Vue (same author). Vue provides its own Browser DevTools and IDE support. With vue-tsc TypeScript typechecking is possible in SFCs. Vue is a well documented framework with a large community and many community tools. |
Integrated tooling and clear structure do spark joy. |
There is only one documentation to be familiar with. However, a potential vendor lock-in might reduce the dev experience. |
Application Structure provided by the framework (Opinionated Architecture) |
Vue does not restrict on how to structure your code, but enforces high-level principles and general recommendations. |
Nuxt comes with a lot of defaults/recommendations for folder structure, routing, state etc. It’s also easier to keep the app consistent with this structure in mind. We have to document fewer things ourselves when we follow the recommended structure. |
Quasar offers a default structure and recommendations, but without implications for routing. |
OIDC with client secret |
Vue itself does not provide any authentication or OIDC mechanisms. You’d have to rely on external libraries and tools. Those tools probably cannot use a client secret as all code is delivered to the client. |
Can work, probably with nuxt-oidc-auth or authjs. |
Quasar offers no special functionality for authentication. |
Stable, backed by big Community |
According to the StackOverflow survey from 2024, Vue is currently the 8th most popular frontend framework (source). It has a large community and many sponsors. Since Vue.js 3 is the third iteration of Vue.js, it has improved a lot over the years and has solved many previous problems. |
According to stateofjs Nuxt is among the most used meta frameworks. They seem to have learned to provide good major update experience and try to make the next major update to version 4 as pleasant as possible. |
Quasar is well-known and has some well-known sponsors. |
License / free to use |
MIT License |
MIT License |
MIT License |
Server Resource Need |
Even though Vue has support for SSR, its main focus is often on SPA. Therefore, depending on how exactly the frontend is implemented, the server resources may be lower than the resource need of Nuxt or Quasar. |
More resources needed than for hosting an SPA. May need to be scaled individually in bigger setups. |
Probably same as nuxt. Quasar is designed with performance in mind. |
Administration Complexity |
Nothing special |
Can probably run just as well as the frontend in a docker compose setup, as long as it doesn’t need to be scaled. |
Nothing special |
Experience in the DEV Team |
Already developed the old frontend in Vue.js 2, Vite, composition API and script setup. Experience with some Vue.js 3 component development. |
Played around with Nuxt a little bit. Previous experience with Next.js and modern JavaScript meta framework approaches. |
Only known from documentation. |
Gut Feeling |
Nuxt integrates on a rather low level, gives us a structure we can follow and integrate into, is in broad use. |
||
Further Notes |
|
|
|
Further resources:
Bootstrap 5 | primevue | vuetify | Nuxt UI | tailwind | |
---|---|---|---|---|---|
Links: |
|||||
Migration effort |
high |
high |
high |
high |
high |
Easy to use / Versatility |
No wrapper library for vue. StackOverflow suggests to use bootstrap components directly in vue templates without wrapper lib. This is not the vue-way to get things working. |
There are many components available and PrimeVue has a direct Vite and Nuxt integration. The tutorials imply that it is extremely easy to create a beautiful webpage. However, it is not so clear how far one can get without paying for pre-defined layouts and UI building blocks. |
Vuetify seems to be extremely versatile and provides a lot of options and a comprehensive documentation. |
There are quite some components available, theming and customization seems reasonable. |
We would need to define our own UI library, so it’s probably too much effort. |
Theming (setting colors, spacing & global style overrides) |
Bootstrap has predefined themes that can be bought. |
Has styled and unstyled mode for components. Styled mode utilizes pre-skinned components and a default primary color. Unstyled mode allows complete control over CSS properties for components or integration of, e.g., Tailwind CSS. |
NuxtUI enables setting colors and style overrides in the Nuxt config (see here). |
||
Custom CSS Styling for Components |
A lot of things can be customized via Sass and CSS. |
See unstyled mode above. |
Class attribute can be set on components. Vuetify uses sass and has a lot of utility classes. |
NuxtUI allows to set a class attribute on components to add classes as well as setting a ui prop to define custom styles. |
|
Effort to adapt to potential style guide (consult with UE) |
Can opt-in for styled mode, meaning components come pre-skinned after the Aura style guide (can be changed). |
||||
Backed by large community / future proof |
Bootstrap itself is still popular. However, the BootstrapVue library still struggles with vue3 and bootstrap 4. There are plans to support bootstrap 5, but they are delayed as of now: Roadmap |
||||
License |
MIT License |
MIT License for OpenSource parts (PrimeVue, core, icons, themes, nuxt-module, auto-import-resolver, metadata) |
MIT License |
MIT License (for Nuxt UI without Pro) |
|
Free to use |
Bootstrap is free to use. Predefined Themes can be bought. |
Not all components are free to use. Single PrimeBlocks (UI building blocks) licenses cost $99 per developer; for small teams it is $349. This allows access to the Figma UI toolkit and Vue UI Blocks (UI building blocks). Single layout templates can be purchased on their own. |
Yes |
Not all components. There is a set of components only available in Nuxt UI Pro, especially for dashboards, Layouts etc. Nuxt UI Pro also contains templates. |
|
figma or sketch UI kit available |
There is a Bootstrap 5 UI kit including Bootstrap Icons |
Not for free |
There is a figma ui kit available here. |
||
Gut Feeling |
There are better alternatives available. |
Vuetify is very popular and seems to support a lot of stuff and has extensive documentation. |
Not well known yet; may not be mature enough. |
||
Further Notes |
Vuetify also has a plugin for a quick integration into Nuxt, see here. |
9.7. ADR-006 Removing Cpp client from repository
Date |
26.08.2024 |
---|---|
Status |
Done |
Context |
The shepard backend code generates an OpenAPI specification document that represents the REST API with all possible requests, responses and parameters. Using this OpenAPI specification we are able to generate clients that follow this definition of requests. These clients are automatically generated and are able to communicate with the shepard REST API. The clients are generated by an external tool called OpenAPI Generator. This tool allows generating these clients in multiple programming languages. Until now, we supported, maintained and provided clients for Java, Python, TypeScript and C++. The reasoning for the choice of these clients is provided in Appendix A. |
Decision |
We decided to remove the Cpp client from the shepard repository. This takes effect immediately. The last valid Cpp client package is the one from the 2024.08.05 release; future releases no longer provide a working Cpp client. This decision was made for two major reasons. First, the general usage of the Cpp client is low, since the Cpp client was introduced for a few specific use-cases. This means that the client is rarely used and has less importance than the other clients. Second, the amount of work to maintain the Cpp client has gotten too large. It is hard to maintain and easy to break. We encountered problems with the client generation due to changes in the OpenAPI specification. For all clients, these specification changes resulted in breaking changes on the clients. For the other clients (Java, Python, TypeScript) these breaking changes can be documented and fixed by end-users to keep a working version of the client. Even more important, the client building/compiling is not affected by the OpenAPI changes. So the behavior of these clients changed, but the clients themselves still work and can be built. The OpenAPI changes, however, have a different impact on the Cpp client: the compilation of the Cpp client fails, which renders it useless for now. The rest of this section provides a technical overview of the specific problems that occur when compiling the Cpp client.
The main problem here is the implementation of enum types in the OpenAPI generator.
The following snippet shows how older versions of the backend (pre-Quarkus) generated enum types: Previous OpenAPI Enum Declaration
In the OpenAPI specification that is generated by the new Quarkus backend, most enum types have their own type and are defined like this: Quarkus OpenAPI Enum Declaration
Even though these two OpenAPI specifications are semantically the same, the Cpp client build fails, because the OpenAPI generator is not able to implement certain methods. Fixing the client compilation by manually patching the client is possible. However, this patching requires a larger amount of work to maintain the patch for every release of shepard. |
Possible Alternatives |
It is possible for end-users who still want to use the Cpp client to come up with their own patches. Basically there are three approaches:
|
Consequences |
|
Appendix A: Generated Clients
Client Language |
OpenAPI Generator |
Reason for Usage |
Python |
Python is a well-known and accepted programming language in the scientific community. Many researchers have experience in Python. Furthermore, the Python ecosystem and its community is well established and provide many resources to learn Python. Additionally, Python is a programming language that allows fast prototyping, since it is an interpreted language and typing is optional. |
|
TypeScript |
TypeScript is a typed superset of the JavaScript programming language. One of its main purposes is to create web-oriented applications. In the shepard project the TypeScript client is used as a library in the shepard frontend, which implements functions and types needed for communicating with the backend REST API. This saves time and effort when developing the shepard frontend, since every change of the REST API is automatically reflected in this client through the OpenAPI specification. |
|
Java |
Can be seen as an alternative to Python. It is widely adopted and used. Many people have experience in Java. It has a large ecosystem and is a good fit for standalone applications. |
|
Cpp/ C++ |
Generally, C++ enables creating performant standalone applications. In this project, the Cpp client was used for some specific use cases and not selected based on other general factors. |
9.8. ADR-007 Client Generation for Frontend
Date |
30.08.2024 |
---|---|
Status |
Done |
Context |
Until now, the frontend used the published TypeScript client to interact with the backend. With the change to the monorepo, we could make atomic changes to the backend and frontend in one commit. This behaviour is not possible though, because the frontend needs the updated client package. In order to mitigate this, we want to make the TypeScript client used by the frontend more dynamic. For the generation of the client, either Docker, Podman or Java is required. |
Possible Alternatives |
Provide a script to (re)generate the client based on a local or remote backend instance and
|
Decision |
We decide to go with option 2 and put the generated client under version control. |
Consequences |
|
9.9. ADR-008 Database Target Architecture
Date |
17.09.2024 |
---|---|
Status |
Done |
Context |
Current state
At the moment shepard uses three different databases:
What was the reason for choosing different databases?
Known issues
|
Possible Solutions |
|
Decisions |
Decision 1: Leave it as it is
This is not an option because of known issues with InfluxDB. We have to find a solution at least for that database.
Decision 2: Metadata in Neo4j or Postgres
On a green field, Postgres might be the better option, with less maintenance effort and its big ecosystem. In the context of shepard we already have Neo4j, we would need to migrate data, and the team's experience with Neo4j is bigger. All in all, we decide to continue with Neo4j.
Decision 3: Database for Timeseries & Spatial Data
As MongoDB does not seem to perform well for timeseries and spatial data, we decide to store timeseries (and in the future spatial data) in Postgres with TimescaleDB and PostGIS.
Decision 4: Database for Files & Structured Data
Postgres supports two ways of storing binary data (bytea column and Large Object API). For large files we have to use the Large Object API. But in both cases the data is stored in a single table, and Postgres has a per-table size limit of 32 TB. If we want to store multiple projects in one shepard instance, we might exceed this limit. So we are not able to store large objects in Postgres. The decision is to stay with MongoDB for files and structured data. |
Consequences |
|
9.10. ADR-009 Nuxt OIDC Library
Date |
17.09.2024 |
---|---|
Status |
Done |
Context |
We want to implement authentication using OIDC in the new Nuxt-based frontend. We expect to authenticate with an existing Keycloak instance, similar to the old frontend. In the future authentication with client secrets may be needed to operate with more OIDC providers. For Nuxt 2 there was an auth module that is not yet available for Nuxt 3. |
Possible Alternatives |
|
Decision |
We decide to go with @sidebase/nuxt-auth for its support of multiple built-in providers, including Keycloak, and its superior documentation and community support. |
Consequences |
|
9.11. ADR-010 Postgres/Timescaledb Image
Date |
07.10.2024 |
---|---|
Status |
Done |
Context |
We need to deploy a Postgres image with the TimescaleDB extension. |
Possible Alternatives |
|
Decision |
We decide to use the timescale Docker image due to its simplicity. |
Consequences |
We may need to adapt the setup in the future in case we need additional plugins. |
9.12. ADR-011 Timescale database schema
9.13. ADR-012 Lab Journal feature
9.14. ADR-013 Editor Library
Date |
26.11.2024 |
---|---|
Status |
Done |
Context |
For the lab journal feature, we need an editor in the new frontend. In the old frontend, tiptap was used to edit & render descriptions of collections and data objects. Editing lab journals needs more features than the descriptions in the old frontend, e.g. tables & images. In the new frontend, we want one editor for both descriptions and lab journal entries for consistency. |
Possible Alternatives |
|
Decision |
We decide to go with tiptap since
|
Consequences |
|
11. Risks and Technical Debts
11.1. Risks
Probability: very improbable, improbable, unsure, probable, very probable
Costs: critical, expensive, unsure, cheap, negligible
ID | Name | Description | Possible Actions | Probability | Costs | Priority |
---|---|---|---|---|---|---|
11.2. Technical Debt
Software systems are prone to the build up of cruft - deficiencies in internal quality that make it harder than it would ideally be to modify and extend the system further. Technical Debt is a metaphor, coined by Ward Cunningham, that frames how to think about dealing with this cruft […] - 21 May 2019, Martin Fowler (https://martinfowler.com/bliki/TechnicalDebt.html)
11.2.1. Technical Debt Process
The following process handles all technical debt except for dependency updates, which have their own process. Besides dependency updates, every other technical debt is to be documented in the form of an issue in the backlog.
If a technical debt has been identified and a corresponding issue has been created in the backlog, the usual planning and prioritization process takes care of this debt. This makes the backlog the single source of truth for all known and unresolved technical debt.
Usually, the technical debt can be resolved in this way. In rare cases, it can happen that we want to keep this debt or decide that the debt is not really a problem for us. In these cases, the situation needs to be described in the table below and the corresponding issue can then be closed.
ID | Name | Description | Solution Strategy | Priority |
---|---|---|---|---|
1 |
Missing permissions |
Back when we started developing shepard, there was no authorization implemented. Therefore, not all Collections or Containers are guaranteed to have permissions attached. There is a fallback solution implemented in shepard to take care of such situations. We decided that |
A database migration could be implemented to add empty permissions to all entities. However, we should not make any assumptions about the actual permissions to avoid breaking existing datasets. |
Low |
2 |
Cryptic usernames |
Depending on how the OIDC identity provider is configured, shepard sometimes generates very cryptic user names. The username is retrieved from the |
Since the username is used to identify users in shepard, it is not so easy to change it. A migration would be possible if shepard could fetch all available users from the identity provider and then migrate all users at once. However, this is not possible with the current configuration. Keycloak adds a |
High |
12. Glossary
Term (EN) | Term (DE) | Definition | Sources |
---|---|---|---|
AFP |
AFP |
Automated Fiber Placement |
|
Collection |
Collection |
A collection consists of multiple Data Objects. |
|
Container |
Containers allow users to store data. There are different types of containers, e.g. TimeseriesContainer, StructuredDataContainer and FileContainer. |
||
Context |
Context |
The context defines which Data Objects belongs together and are related to an experiment. |
|
Data Management Plan |
Datenmanagementplan |
||
Data Object |
Data Object |
Represents one piece of information. A DataObject belongs to exactly one Collection. DataObjects can have multiple attributes describing the information. A DataObject may have predecessors, successors or children. Only one parent is allowed. |
|
End effector |
Endeffektor |
The device at the end of a robotic arm |
|
Entities |
Entities are used to manage connections between payloads. It is an abstract term. The concrete instances are Collections and DataObjects. |
||
Experiment |
Experiment |
An Experiment is a period in time where data is collected and stored for further investigation. |
|
FAIR data principles |
FAIR Prinzipien |
FAIR data are data which meet principles of findability, accessibility, interoperability and reusability. The FAIR principles emphasize machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data. |
|
NDT: Non destructive testing |
NDT: Non destructive testing |
Tests with ultrasound, for example, that do not destroy the component |
|
Ply |
Schicht |
One layer of tapes that lie next to each other. |
|
Reference |
Reference |
A reference connects a Data Object to a concrete value type like documents, urls, timeseries, etc. A Data Object can have multiple References. |
|
Shepard |
Shepard |
Acronym for: Storage for HEterogeneous Product And Research Data |
|
Organizational Element |
Organisationselement |
Describes a group of elements that help organize and structure the uploaded data. These elements are Collections, Data Objects and References.