About
This cookbook guides the reader through the different use cases for registering and updating metadata/schema documents.
Target audience
This cookbook is targeted at:
- People who need to know how this service can be used for managing metadata/schema documents.
About us
Data Exploitation Methods (DEM) is part of the Steinbuch Centre for Computing (SCC), located at Karlsruhe Institute of Technology (KIT).
MetaStore
MetaStore is a metadata repository for managing a large number of metadata documents. When metadata documents are registered, they are formally quality-controlled, persistently stored and then accessible via a unique identifier. In addition, MetaStore allows versioning of metadata documents. A schema formally defines the structure of a metadata document. The internal schema registry (supporting XML and JSON) manages these schemas by registering, persistently storing and (if necessary) versioning them. All metadata documents are linked to a schema that is used for validation during ingest. For content search, an additional indexing service is available that transforms the metadata documents to make them ready for Elasticsearch. MetaStore also provides an easy-to-use GUI for creating and editing documents.
When to use MetaStore
Use MetaStore if at least the first point applies:
- You want to manage XML/JSON documents built on a given XSD/JSON Schema.
- You want to manage huge amounts (up to millions) of documents.
- You want to share (some of) your (schema) documents with others.
- You want to make (some of) your (schema) documents publicly available.
- You want to make them referenceable.
- You want to update your (schema) documents with or without simple versioning.
- You want to use your own (extended) schema.
When NOT to use MetaStore
Don't use MetaStore if
- you want to use plain documents not based on a schema.
- you want to use JSON-LD (take a look at Coscine, provided by RWTH Aachen University).
At a Glance
MetaStore is a general purpose metadata repository and schema registry service.
It allows you to
- register an (XML/JSON) schema
- update an (XML/JSON) schema
- add metadata documents linked with a registered schema
- validate metadata documents against a registered schema
- update/add metadata documents online via a GUI
MetaStore (Technical View)
MetaStore provides several interfaces:
- REST interface for managing documents
- OAI-PMH for harvesting metadata documents for given schemas
- WebUI for managing documents without any further requirements.
Cookbook
This cookbook gives you an overview of how MetaStore helps you manage your metadata documents. To keep the guide lightweight and easy to follow, a very simple schema is used to focus on the possibilities provided by MetaStore.
Ingredients
- (local) MetaStore instance
- Web Browser
Work Steps
- Working with schema( document)s
- Working with metadata documents
- Managing Access Rights
The following steps will be explained in detail on the next pages:
First of all we have to register a schema which defines the structure of the metadata documents.
Working with schema( document)s
First of all we have to register at least one schema which defines the structure of the metadata documents.
Ingredients
- (local) MetaStore instance
- Web Browser
Recipes
The following steps will be explained in detail on the next pages:
Select/create a Schema
Before ingesting metadata documents into MetaStore, an appropriate schema should be selected. To do this, some fundamental decisions must be made.
Target Audience
- Data Scientists
Ingredients
- (local) MetaStore instance
- Web Browser
Work Steps (Summary)
Select/create a Schema
- Step 1: Which metadata do I need?
- Step 2: Search for existing metadata standards
- Step 3: Create your own schema (optional)
Step 1: Which metadata do I need?
Metadata should describe the associated data in such a way that it can be interpreted without further information and also reproduced if necessary. As a first step, the metadata entries identified as necessary should be collected.
Step 2: Search for existing metadata standards
In most cases there is already at least one schema document that fulfills your needs and contains all entries collected beforehand.
But how to find this schema?
The Metadata Standards Catalog may be a good starting point. There exist several instances:
There may be other catalogs related to a specific domain.
NOTE: Not all schemas listed there are suitable for MetaStore, as MetaStore only supports JSON Schema and XML Schema.
If there is no schema fulfilling your needs, look for the schema document that fits best and extend it. (See 'Extend an existing schema document')
Step 3: Create your own schema (optional)
If there is really no fitting schema, a new schema has to be created.
There are two possibilities: JSON Schema and XML Schema.
If you are already familiar with one of them, you should use it. Otherwise, you should use JSON Schema, as it is easier to create and is supported out of the box by the GUI delivered with MetaStore.
There are several tools for creating a schema from scratch:
- JSON Schema
- XML Schema
Let's start with a simple JSON Schema containing
- author
- title
- creation date
The JSON document may look like this:
{
"author": "Last Name, First Name",
"title": "My first document",
"creation_date": "2022-04-11"
}
This example results in the following schema using the first link given above:
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "http://example.com/example.json",
"type": "object",
"title": "The root schema",
"description": "The root schema comprises the entire JSON document.",
"default": {},
"examples": [
{
"author": "Last Name, First Name",
"title": "My first document",
"creation_date": "2022-04-11"
}
],
"required": [
"author",
"title",
"creation_date"
],
"properties": {
"author": {
"$id": "#/properties/author",
"type": "string",
"title": "Author:",
"description": "The person who wrote the document.",
"default": "",
"examples": [
"Last Name, First Name"
]
},
"title": {
"$id": "#/properties/title",
"type": "string",
"title": "Title:",
"description": "The title of the document.",
"default": "",
"examples": [
"My first document"
]
},
"creation_date": {
"$id": "#/properties/creation_date",
"type": "string",
"title": "Creation Date:",
"description": "Creation date in the format 'YYYY-MM-DD'.",
"default": "2022-01-01",
"examples": [
"2022-04-11"
]
}
},
"additionalProperties": false
}
Note: The following JSON Schema specifications are supported by the library used:
- Draft v7
- Draft v2019-09
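To make the effect of 'required' and 'additionalProperties' more tangible, the following minimal sketch checks a document against a shortened copy of the example schema. It is not a JSON Schema validator and not the library used by MetaStore; the function name check_document and the reduced set of checks are assumptions made for illustration only.

```python
import json

# Shortened copy of the example schema: only the keywords checked below are kept.
SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["author", "title", "creation_date"],
  "properties": {
    "author": {"type": "string"},
    "title": {"type": "string"},
    "creation_date": {"type": "string"}
  },
  "additionalProperties": false
}
""")


def check_document(document, schema):
    """Return a list of problems; an empty list means the simplified check passed."""
    problems = []
    properties = schema.get("properties", {})
    # Every entry listed in 'required' must be present.
    for name in schema.get("required", []):
        if name not in document:
            problems.append(f"missing required entry: {name}")
    for name, value in document.items():
        if name not in properties:
            # Unknown entries are only an error if 'additionalProperties' is false.
            if schema.get("additionalProperties", True) is False:
                problems.append(f"unexpected entry: {name}")
        elif properties[name].get("type") == "string" and not isinstance(value, str):
            problems.append(f"entry is not a string: {name}")
    return problems


valid = {
    "author": "Last Name, First Name",
    "title": "My first document",
    "creation_date": "2022-04-11",
}
invalid = {"author": "Last Name, First Name", "keywords": ["a", "b"]}

print(check_document(valid, SCHEMA))    # no problems reported
print(check_document(invalid, SCHEMA))  # two missing entries and one unexpected entry
```

A real validation additionally covers all other keywords of the supported drafts, so this sketch should only be used to understand the two keywords, not to pre-check documents.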
Recommendations for JSON Schemas
If scientists will later use the web interface to store their metadata documents, a few hints make their work a little easier. As the web interface uses the entries of the schema to create a form, some fields should be filled with meaningful values. Each property of the schema may contain some optional entries:
- title - The text of the title will be displayed above the input field.
- description - The text of the description will be displayed below the input field and should contain some additional information.
- default - The input field will be prefilled with the default value. This might be helpful if a field mostly contains a fixed value.
- enum - Defines an array of possible values which is displayed as a dropdown menu.
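The four optional entries listed above can be sketched as a small helper that assembles one schema property. The helper make_property and the 'language' property are hypothetical and not part of the example schema; they only illustrate where 'title', 'description', 'default' and 'enum' end up.

```python
import json


def make_property(title, description, default=None, enum=None):
    """Build one string property with the optional entries used by the form."""
    prop = {"type": "string", "title": title, "description": description}
    if default is not None:
        prop["default"] = default
    if enum is not None:
        prop["enum"] = list(enum)
    return prop


# Hypothetical 'language' property, not part of the example schema.
language = make_property(
    title="Language:",
    description="Language the document is written in.",
    default="en",
    enum=["en", "de", "fr"],
)
print(json.dumps(language, indent=2))
```

The resulting JSON object can be placed under 'properties' in a schema in the same way as 'author' or 'title'.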
The entry 'additionalProperties' at the end of the schema should always be 'false' to forbid custom entries.
The example schema from Step 3 results in the following form:
Register a new schema document
Once a schema is chosen, it may be registered in MetaStore (if not already done).
Target Audience
- Data Scientists
Ingredients
- (local) MetaStore instance
- Web Browser
- Selected schema from the previous step stored on local disc
Work Steps (Summary)
Register a new schema document
- Step 1: Check if schema already exists?
- Step 2: Create new schema
Step 1: Check if schema already exists?
Right now it is difficult to find and/or compare schemas. It is therefore strongly recommended to use the standard label of a schema when registering it. For example, if you select a schema from the Metadata Standards Catalog, use the short name or abbreviation of the schema as identifier (e.g.: Dublin Core -> schemaID = DublinCore).
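As a rough sketch of this naming recommendation, an identifier can be derived from the label of a standard by dropping spaces and special characters. The function name and the set of characters kept are assumptions; which characters MetaStore actually accepts in a schema identifier is not stated here.

```python
import re


def schema_id_from_label(label):
    """Derive an identifier from a standard's label by keeping only
    letters, digits, '_' and '-' (this character set is an assumption)."""
    return re.sub(r"[^A-Za-z0-9_-]", "", label)


print(schema_id_from_label("Dublin Core"))  # DublinCore
```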
Step 2: Create new schema
Register the schema if it does not already exist.
Click on 'Register new Schema' to register a new schema. Fill the form with the metadata for the schema document. The following values are mandatory:
- SchemaId
- Mime Type: application/json or application/xml
- Type: JSON or XML
- Choose schema file via file chooser dialog.
Afterwards the form should look like this:
Note: If authentication is not enabled, all users are registered as 'SELF'. Otherwise, the user who registers the schema is granted administration rights.
After clicking 'CREATE' the new schema will be listed inside the table:
Congratulations, you successfully registered your first schema!
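The mandatory values of the form can also be thought of as one small record that accompanies the schema file. The following sketch only assembles and checks such a record; the key names 'schemaId', 'mimeType' and 'type' are assumptions modelled on the form labels, and whether the REST interface expects exactly this structure should be checked against its documentation.

```python
import json

# Allowed combinations of 'Type' and 'Mime Type' as listed in the form.
ALLOWED = {"JSON": "application/json", "XML": "application/xml"}


def build_schema_record(schema_id, schema_type):
    """Collect the mandatory form values in one record (key names are assumptions)."""
    if not schema_id:
        raise ValueError("SchemaId must not be empty")
    if schema_type not in ALLOWED:
        raise ValueError(f"Type must be one of {sorted(ALLOWED)}")
    return {
        "schemaId": schema_id,
        "mimeType": ALLOWED[schema_type],
        "type": schema_type,
    }


record = build_schema_record("My_first_JSON_Schema", "JSON")
print(json.dumps(record, indent=2))
```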
Update an existing schema document
Sometimes a registered schema contains typos, or you want to extend it with optional fields. In these cases the schema may be updated, which increments its version. Nevertheless, all metadata documents already linked to the old version remain valid, as the version is part of the reference. The metadata documents and their metadata stay untouched.
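Why old documents stay valid can be illustrated with a tiny in-memory model of a versioned registry. The class VersionedRegistry is purely hypothetical and says nothing about how MetaStore stores data internally; it only shows that a reference consisting of identifier and version keeps pointing to the old content after an update.

```python
class VersionedRegistry:
    """In-memory stand-in for a registry that keeps every version of an entry."""

    def __init__(self):
        self._versions = {}  # identifier -> list of contents, index 0 is version 1

    def register(self, identifier, content):
        if identifier in self._versions:
            raise KeyError(f"{identifier} is already registered")
        self._versions[identifier] = [content]
        return 1

    def update(self, identifier, content):
        self._versions[identifier].append(content)
        return len(self._versions[identifier])  # new version number

    def get(self, identifier, version=None):
        versions = self._versions[identifier]
        # Without an explicit version the latest one is returned.
        return versions[-1] if version is None else versions[version - 1]


registry = VersionedRegistry()
registry.register("My_first_JSON_Schema", {"required": ["author", "title", "creation_date"]})
new_version = registry.update(
    "My_first_JSON_Schema",
    {"required": ["author", "title", "creation_date"], "optional": ["note"]},
)

# A document stores the schema identifier together with the version it was validated against.
document_reference = ("My_first_JSON_Schema", 1)
print(new_version)                        # 2
print(registry.get(*document_reference))  # still the content of version 1
```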
Target Audience
- Data Scientists
Ingredients
- (local) MetaStore instance
- Web Browser
- Updated schema stored on local disc
Work Steps (Summary)
Update an existing schema document
- Step 1: Download existing schema
- Step 2: Update existing schema locally
- Step 3: Upload updated schema
Step 1: Download schema
Right now, editing schemas online is not supported. If the schema is no longer available locally, it has to be downloaded from MetaStore as follows:
- Press the 'View' icon of the wanted schema in the schema list.
- Copy 'Schema Document Uri' to the address bar of your browser.
- Right click page and select 'Save page as...' in the context menu.
Step 2: Update existing schema locally
Extend the existing schema by an optional field containing notes. The updated schema will look like this:
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "http://example.com/example.json",
"type": "object",
"title": "The root schema",
"description": "The root schema comprises the entire JSON document.",
"default": {},
"examples": [
{
"author": "Last Name, First Name",
"title": "My first document",
"creation_date": "2022-04-11"
}
],
"required": [
"author",
"title",
"creation_date"
],
"properties": {
"author": {
"$id": "#/properties/author",
"type": "string",
"title": "Author:",
"description": "The person who wrote the document.",
"default": "",
"examples": [
"Last Name, First Name"
]
},
"title": {
"$id": "#/properties/title",
"type": "string",
"title": "Title:",
"description": "The title of the document.",
"default": "",
"examples": [
"My first document"
]
},
"creation_date": {
"$id": "#/properties/creation_date",
"type": "string",
"title": "Creation Date:",
"description": "Creation date in the format 'YYYY-MM-DD'.",
"default": "2022-01-01",
"examples": [
"2022-04-11"
]
},
"note": {
"$id": "#/properties/note",
"type": "string",
"title": "Additional note:",
"description": "Add some additional notes here. (optional)",
"default": ""
}
},
"additionalProperties": false
}
As the entry 'note' is not listed in the 'required' array, it is optional. Store the new schema in a file named 'my_first_json_schema_version_2.json'.
Step 3: Upload updated schema
Click the pencil of the appropriate schema to open a form.
After selecting the updated schema file, press 'Update Schema'. The version of the schema is now incremented.
The new form for the metadata documents now looks like this:
Congratulations, you successfully updated your schema!
Extend an existing schema document
In case a suitable (JSON/XML) schema has been found but some important information is missing, it is strongly recommended not to extend this schema by means of an update. Since metadata documents that are valid against the old version would no longer be valid against a new version with additional mandatory entries, a new schema should be registered based on the existing one.
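The difference between an update (adding the optional 'note') and an extension (adding the mandatory 'abstract') can be sketched as a simple compatibility check. The function is_backward_compatible and its rule are simplifying assumptions: only 'required' and 'properties' are compared, all other schema keywords are ignored.

```python
def is_backward_compatible(old_schema, new_schema):
    """True if documents valid against old_schema are expected to stay valid.

    Simplified rule: no new required entries and no property removed.
    Other keywords (types, formats, enum, ...) are ignored.
    """
    old_required = set(old_schema.get("required", []))
    new_required = set(new_schema.get("required", []))
    old_properties = set(old_schema.get("properties", {}))
    new_properties = set(new_schema.get("properties", {}))
    return new_required <= old_required and old_properties <= new_properties


base = {
    "required": ["author", "title", "creation_date"],
    "properties": {"author": {}, "title": {}, "creation_date": {}},
}
# Update: adds the optional 'note' property.
with_note = {
    "required": ["author", "title", "creation_date"],
    "properties": {"author": {}, "title": {}, "creation_date": {}, "note": {}},
}
# Extension: additionally makes 'abstract' mandatory.
with_abstract = {
    "required": ["author", "title", "abstract", "creation_date"],
    "properties": {"author": {}, "title": {}, "abstract": {}, "creation_date": {}, "note": {}},
}

print(is_backward_compatible(base, with_note))          # True
print(is_backward_compatible(with_note, with_abstract))  # False
```

When such a check fails, registering the result under a new schema identifier avoids mixing incompatible versions.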
Target Audience
- Data Scientists
Ingredients
- (local) MetaStore instance
- Web Browser
Work Steps (Summary)
Extend an existing schema document
- Step 1: Fetch selected schema
- Step 2: Extend schema
- Step 3: Register new schema
Step 1: Fetch selected schema
First of all, the chosen schema has to be stored locally. If the schema is already registered in MetaStore, please refer to 'Step 1: Download schema'.
Step 2: Extend schema
Add the missing fields to the schema and mark them as 'required'. Extending the previous schema with a mandatory 'abstract' entry would look like this:
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "http://example.com/example.json",
"type": "object",
"title": "The root schema",
"description": "The root schema comprises the entire JSON document.",
"default": {},
"examples": [
{
"author": "Last Name, First Name",
"title": "My first document",
"creation_date": "2022-04-11"
}
],
"required": [
"author",
"title",
"abstract",
"creation_date"
],
"properties": {
"author": {
"$id": "#/properties/author",
"type": "string",
"title": "Author:",
"description": "The person who wrote the document.",
"default": "",
"examples": [
"Last Name, First Name"
]
},
"title": {
"$id": "#/properties/title",
"type": "string",
"title": "Title:",
"description": "The title of the document.",
"default": "",
"examples": [
"My first document"
]
},
"abstract": {
"$id": "#/properties/abstract",
"type": "string",
"title": "Abstract:",
"description": "Abstract of the document. (< 2000 characters)",
"default": ""
},
"creation_date": {
"$id": "#/properties/creation_date",
"type": "string",
"title": "Creation Date:",
"description": "Creation date in the format 'YYYY-MM-DD'.",
"default": "2022-01-01",
"examples": [
"2022-04-11"
]
},
"note": {
"$id": "#/properties/note",
"type": "string",
"title": "Additional note:",
"description": "Add some additional notes here. (optional)",
"default": ""
}
},
"additionalProperties": false
}
Step 3: Register new schema
Register the new schema.
Click on 'Register new Schema' to register a new schema. Fill the form with the metadata for the schema document. The following values are mandatory:
- SchemaId
- Mime Type: application/json or application/xml
- Type: JSON or XML
- Choose schema file via file chooser dialog.
Afterwards the form should look like this:
Note: If authentication is not enabled, all users are registered as 'SELF'. Otherwise, the user who registers the schema is granted administration rights.
After clicking 'CREATE' the new schema will be listed inside the table:
Metadata Management
Once at least one schema is registered, metadata documents can be managed by MetaStore. All metadata documents stored in MetaStore have been successfully validated against a registered schema.
Ingredients
- (local) MetaStore instance with at least one registered schema introduced in section 'Schema Management'
- Web Browser
Recipes
The following steps will be explained in detail on the next pages:
Ingest a metadata document
MetaStore does not support plain text files. As only JSON and XML are supported, metadata documents have to conform to a given (JSON/XML) schema. The GUI fully supports JSON documents.
Target Audience
- (Data) Scientists
Ingredients
- (local) MetaStore instance
- Web Browser
- Already registered schema (see 'Schema Management')
- Data related to the metadata
Work Steps (Summary)
Ingest a metadata document
- Step 1: Check if schema is already registered?
- Step 2: Select registered schema
- Step 3: Create metadata document
Step 1: Check if schema is already registered?
If the schema your metadata document should use is not yet registered, please have a look at chapter 'Select/create a Schema'.
Step 2: Select registered schema
Note the 'Schema Record Identifier' of the selected schema or its URL.
Step 3: Create metadata document
Fortunately, the GUI of MetaStore supports users in entering metadata documents.
- Select 'Metadata Management'
- Click on 'Register new metadata document'
- Fill in the form as shown below
Notes:
- If authentication is not enabled, all users are registered as 'SELF'. Otherwise, the user who registers the metadata document is granted administration rights.
- The identifier is optional; if given, it has to be unique. (If no identifier is specified, it will be set to a UUID.)
- It's recommended to use a URL as identifier for the related resource.
- It's recommended to use the previously noted 'Schema Record Identifier' as identifier of the schema. 'INTERNAL' must then be selected as 'Identifier Type'.
- In this step the first version of the schema will be selected. Therefore the 'note' entry is not yet available.
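The identifier rule from the notes above (optional, unique, UUID as fallback) can be sketched as follows. The function resolve_identifier is hypothetical, and the use of a random (version 4) UUID is an assumption; which kind of UUID MetaStore generates is not stated here.

```python
import uuid


def resolve_identifier(identifier, existing_identifiers):
    """Return the identifier to use for a new metadata document."""
    if identifier is None:
        # No identifier given: fall back to a random UUID (assumed variant).
        return str(uuid.uuid4())
    if identifier in existing_identifiers:
        raise ValueError(f"identifier already in use: {identifier}")
    return identifier


existing = {"my_first_document"}
print(resolve_identifier("my_second_document", existing))  # my_second_document
print(resolve_identifier(None, existing))                  # a freshly generated UUID
```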
After clicking 'Show Input Form' the GUI creates an input form which queries all entries. The filled form may look like this:
The new metadata document will be listed inside the table:
Congratulations, you successfully registered your first metadata document!
Read a metadata document
If a metadata document has been ingested, it can also be viewed via the browser, provided the appropriate rights have been granted.
Target Audience
- Data Scientists
Ingredients
- (local) MetaStore instance
- Web Browser
Work Steps (Summary)
Read a metadata document
- Step 1: Search for metadata document
- Step 2: View metadata document (form)
Step 1: Search for metadata document
The entries in the metadata document list can be sorted in ascending or descending order by selecting the respective column. If you want to list only entries of a specific schema, you may
- add '?id=My_first_JSON_Schema' to the URL in the address bar
or
- go to tab 'Schema Management'
- search for the wanted schema
- click
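The first option, appending '?id=...' to the address, can be sketched with the Python standard library. The page address used below is a hypothetical placeholder for the metadata document list of a local instance; only the query parameter 'id' is taken from the text above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_schema_filter(page_url, schema_id):
    """Return page_url with the query '?id=<schema_id>' set."""
    parts = urlsplit(page_url)
    return urlunsplit(parts._replace(query=urlencode({"id": schema_id})))


# Hypothetical address of the metadata document list of a local instance.
url = with_schema_filter("http://localhost:8080/metadata-list", "My_first_JSON_Schema")
print(url)  # http://localhost:8080/metadata-list?id=My_first_JSON_Schema
```

Using urlencode ensures that schema identifiers containing special characters are escaped correctly.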
Step 2: View metadata document (form)
To view the record and its document, select the corresponding icon of the entry. The record of the entry will be shown:
Clicking on 'Metadata Document' will show the form filled in beforehand:
Update an existing metadata document
Sometimes an ingested metadata document contains typos, or you want to extend it with optional fields of a newer version of a schema. In these cases the metadata document may be updated, which increments its version. Nevertheless, the old version of the metadata document is still available.
Target Audience
- (Data) Scientists
Ingredients
- (local) MetaStore instance
- Web Browser
- Registered schema
- Metadata document stored in MetaStore
Work Steps (Summary)
Update an existing metadata document
- Step 1: Update metadata record
- Step 2: Update metadata document
Note:
- Right now record and document cannot be updated in one step if you want to use an updated schema.
Step 1: Update metadata record
Click on the edit icon to get a form with the metadata record. Change it as shown below:
Click on 'Show Metadata Document' and then click 'Update' to update the metadata record.
Note: The version remains the same, but 'Schema version' has been updated.
Step 2: Update metadata document
Click on the edit icon once more to get the form with the metadata record. As the schema version has already been changed, you can directly click on 'Show Metadata Document'. The new form will be shown:
Add some notes and click 'Update'.
Congratulations, you successfully updated your metadata document!
Access Rights Management
MetaStore can be used with or without authentication. If authentication is enabled, the creator of the documents is automatically granted administration rights. Otherwise, everyone (SELF) who has access also has administration rights. Each user has a SID (unique identifier) that identifies them. If no credentials are available, access is automatically granted as an anonymous user (SID: anonymousUser).
There are four levels of access rights:
ACCESS LEVEL | Description |
---|---|
NONE | No access is allowed. (This is the standard level for all SIDs not mentioned in the access control lists (ACLs).) |
READ | User is allowed to read records and documents but not to change them. |
WRITE | Extends the 'READ' access level. The user can additionally make changes, except for changing the access rights. |
ADMINISTRATE | Extends the 'WRITE' access level. The user may also change access rights. |
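Since each level extends the previous one, the four access levels can be modelled as an ordered enumeration. The following sketch assumes an ACL represented as a mapping from SID to level; the names AccessLevel, level_of and may as well as the SIDs are hypothetical and do not describe MetaStore's internal implementation.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Access levels in ascending order; each level includes the lower ones."""
    NONE = 0
    READ = 1
    WRITE = 2
    ADMINISTRATE = 3


def level_of(acl, sid):
    """SIDs that are not mentioned in the ACL get NONE."""
    return acl.get(sid, AccessLevel.NONE)


def may(acl, sid, required):
    """True if the level granted to sid is at least the required level."""
    return level_of(acl, sid) >= required


# 'alice', 'bob' and 'carol' are hypothetical SIDs.
acl = {"alice": AccessLevel.ADMINISTRATE, "bob": AccessLevel.READ}
print(may(acl, "bob", AccessLevel.READ))    # True
print(may(acl, "bob", AccessLevel.WRITE))   # False
print(may(acl, "carol", AccessLevel.READ))  # False, carol is not in the ACL
```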
Ingredients
- (local) MetaStore instance with at least one registered schema and related metadata document.
- Web Browser
Recipes
The following steps will be explained in detail on the next pages:
Make Document available for other Users
By default, only the creator has access to registered schemas/documents. To allow other users to access your documents, you need to update the access rights.
Target Audience
- (Data) Scientists
Ingredients
- (local) MetaStore instance with at least one registered schema and related metadata document.
- Web Browser
- SID of all users you want to grant access.
Work Steps (Summary)
Make Document available for other Users
- Step 1: Check for SID of user(s)
- Step 2: Update record for Digital Object
Step 1: Check for SID of user(s)
First of all, the users who are to be granted access to your document must be asked for their SID.
Step 2: Update record for Digital Object
Click on the edit icon to get a form with the schema/metadata record. Click 'ADD' to add another entry to the ACL, insert the correct SID and grant the appropriate permission. Repeat this for all users you want to add.
This may look like this:
Click on 'Update Schema' or, in case of a metadata document, click on 'Show Metadata Document' and afterwards 'Update' to update the schema/metadata record.
That's it! The users you added can now also access the selected document (depending on their permissions).
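What the form does to the ACL can be sketched as adding one entry per user. The structure of an entry ('sid' and 'permission') and the function grant are assumptions for illustration; the SIDs are hypothetical.

```python
PERMISSIONS = ("NONE", "READ", "WRITE", "ADMINISTRATE")


def grant(acl, sid, permission):
    """Return a copy of the ACL in which sid has the given permission."""
    if permission not in PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    # Drop an existing entry for the same SID so that each SID appears only once.
    updated = [dict(entry) for entry in acl if entry["sid"] != sid]
    updated.append({"sid": sid, "permission": permission})
    return updated


# 'alice', 'bob' and 'carol' are hypothetical SIDs.
acl = [{"sid": "alice", "permission": "ADMINISTRATE"}]
for sid, permission in [("bob", "READ"), ("carol", "WRITE")]:
    acl = grant(acl, sid, permission)
print(acl)
```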
Make Document publicly available
By default, unauthenticated users do not have access to a document. If you want to make it publicly available, you must update the access rights.
Target Audience
- (Data) Scientists
Ingredients
- (local) MetaStore instance with at least one registered schema and related metadata document.
- Web Browser
Work Steps (Summary)
Make Document publicly available
- Step 1: Update record for Digital Object
Step 1: Update record for Digital Object
Click on the edit icon to get a form with the schema/metadata record. Click 'ADD' to add another entry to the ACL, insert 'anonymousUser' as SID and grant the appropriate permission.
This may look like this:
Note:
- You should never grant higher permissions than 'READ' to unauthenticated users.
Click on 'Update Schema' or, in case of a metadata document, click on 'Show Metadata Document' and afterwards 'Update' to update the schema/metadata record.
That's it! All users can now also access the selected document (depending on their permissions).
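The rule that unauthenticated users should never get more than 'READ' can be expressed as a small guard. The sketch below assumes an ACL represented as a mapping from SID to permission; the function names and the SID 'alice' are hypothetical, while 'anonymousUser' is the SID named in this section.

```python
ANONYMOUS_SID = "anonymousUser"


def make_public(acl):
    """Return a copy of the ACL (SID -> permission) with READ for anonymousUser."""
    updated = dict(acl)
    updated[ANONYMOUS_SID] = "READ"
    return updated


def check_public_permissions(acl):
    """Raise if unauthenticated users would get more than READ."""
    if acl.get(ANONYMOUS_SID, "NONE") not in ("NONE", "READ"):
        raise ValueError("anonymousUser must not get more than READ")


# 'alice' is a hypothetical SID.
acl = make_public({"alice": "ADMINISTRATE"})
check_public_permissions(acl)  # passes silently
print(acl[ANONYMOUS_SID])      # READ
```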
Revoke public access to a Document
By default, unauthenticated users do not have access to a document; it only becomes publicly available once the access rights are updated. If you have already granted unauthenticated users access to a document and want to withdraw it, set their permission to 'NONE'.
Target Audience
- (Data) Scientists
Ingredients
- (local) MetaStore instance with at least one registered schema and related metadata document.
- Web Browser
Work Steps (Summary)
Revoke public access to a Document
- Step 1: Update record for Digital Object
Step 1: Update record for Digital Object
Click on the edit icon to get a form with the schema/metadata record. Set the permission for 'anonymousUser' to 'NONE'.
This may look like this:
Click on 'Update Schema' or, in case of a metadata document, click on 'Show Metadata Document' and afterwards 'Update' to update the schema/metadata record.
That's it! Unauthenticated users are no longer allowed to access the selected document.