Metacat

Introduction

Metacat is a metadata service that makes data and metadata easy to discover, process, and manage. It supports many data sources as backends.

Datasheet

Status: 14.06.2022

Homepage: https://knb.ecoinformatics.org/knb/docs/
Description: https://knb.ecoinformatics.org/knb/docs/intro.html
Code: https://github.com/NCEAS/metacat
Communities: DataONE
Version: 2.18.0 (released on 19.05.2022)

Features

Status: 15.02.2022

Supported Schema(s): DTD
Supported Format(s): XML
Interface(s): REST/Thrift interface (several implementations are available)
Open Source: yes
License: GPL 2.0
Versioning: yes (history of documents)
AAI: yes (internal password file or LDAP)
External Storage: yes (supports many storage systems as backends: Amazon S3 (via Hive), Druid, Elasticsearch, Redshift, Snowflake, and MySQL)
Referencable: yes (DOI)

Description

  • Register Schema:
    • Support for arbitrary schemas of a specific format (e.g. JSON Schema, XSD)
    • The schema should at least be referencable by a unique identifier.
  • Update Schema:
    • Possibility to
      • work on different versions of a schema
      • adapt schemas over time
  • Validate Schema:
    • Check schema for correct syntax
  • Ingest Metadata:
    • Store metadata (document) in repository
      • Ideally with prior validation
  • Update Metadata:
    • Possibility to update already ingested metadata (documents).
  • Validate Metadata:
    • Possibility to validate documents on the basis of registered schemas.
  • Search by Administrative MD:
    • Search documents by their metadata (e.g. ingest date, ingester, ...)
  • Search by Content:
    • Search documents by their content
  • Persistent Identifier:
    • Support for Persistent Identifiers (e.g. DOI, Handle)
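
The criteria above can be made concrete with a small sketch. The following is a hypothetical in-memory repository, not Metacat's API or code: the class name `MetadataRepository`, its methods, and the `Record` fields are all assumptions chosen for illustration. It covers ingest, update with version history, and the two search criteria (administrative metadata and content).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    """One stored version of a document plus its administrative metadata."""
    content: str
    ingester: str
    ingested_at: datetime


@dataclass
class MetadataRepository:
    """Hypothetical in-memory store; unrelated to Metacat's actual code."""
    _versions: dict = field(default_factory=dict)  # doc_id -> list of Record

    def ingest(self, doc_id: str, content: str, ingester: str) -> int:
        """Store a new document and return its version number (starts at 1)."""
        if doc_id in self._versions:
            raise KeyError(f"{doc_id} already exists; use update()")
        self._versions[doc_id] = []
        return self._append(doc_id, content, ingester)

    def update(self, doc_id: str, content: str, ingester: str) -> int:
        """Add a new version while keeping all earlier ones."""
        if doc_id not in self._versions:
            raise KeyError(f"{doc_id} has not been ingested")
        return self._append(doc_id, content, ingester)

    def _append(self, doc_id: str, content: str, ingester: str) -> int:
        record = Record(content, ingester, datetime.now(timezone.utc))
        self._versions[doc_id].append(record)
        return len(self._versions[doc_id])

    def history(self, doc_id: str) -> list:
        """Return every stored version of a document, oldest first."""
        return list(self._versions[doc_id])

    def search_by_ingester(self, ingester: str) -> list:
        """Administrative search: IDs whose latest version was written by ingester."""
        return sorted(d for d, v in self._versions.items()
                      if v[-1].ingester == ingester)

    def search_by_content(self, text: str) -> list:
        """Content search: IDs whose latest version contains the given text."""
        return sorted(d for d, v in self._versions.items()
                      if text in v[-1].content)
```

A call such as `repo.update("doc-1", new_xml, "bob")` returns the new version number, and `repo.history("doc-1")` then lists both versions. Schema registration, validation, and persistent identifiers are left out to keep the sketch short.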

Additional Features

Status: 25.02.2022

  • Support OAI-PMH (oai_dc, EML)
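
As an illustration of what OAI-PMH support means for a harvester, the sketch below builds a `ListRecords` request URL. The verb and the `metadataPrefix`, `from`, and `until` arguments come from the OAI-PMH protocol; the base URL is a placeholder, and the function name `build_list_records_url` is an assumption. The exact prefix string a given server uses for EML is not stated here; a harvester would typically look it up with the `ListMetadataFormats` verb.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                           **optional: str) -> str:
    """Build an OAI-PMH ListRecords URL; optional args such as from/until are passed through."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    params.update(optional)
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint; replace with the OAI-PMH base URL of a real deployment.
BASE_URL = "https://metacat.example.org/oai"

# "from" is a Python keyword, so it is passed via dict unpacking.
url = build_list_records_url(BASE_URL, "oai_dc", **{"from": "2022-01-01"})
print(url)
```

Fetching that URL from a real endpoint returns an XML response containing the records, possibly with a resumption token for paging.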

Functionality

Status: 15.02.2022

Function | Supported | Remarks
Register Schema | o | Store DTD(s) as package
Update Schema | o |
Validate Schema | - |
Ingest Metadata | + |
Update Metadata | + |
Validate Metadata | + | Provide DTD(s)/package
Search by Administrative MD | + | filter
Search by Content | + | pathquery (similar to XPath); since version 2.1, Solr is used for indexing (DataONE out of the box, but also own documents by configuration)
Persistent Identifier | + | DOI using the EZID service
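
To show what a path-based content search looks like in principle, the sketch below uses the XPath subset in Python's standard library. It is not Metacat's pathquery syntax, and the sample documents, element names, and the helper `search_by_path` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; element names are illustrative only.
DOCUMENTS = {
    "doc-1": "<dataset><title>Lake survey</title>"
             "<creator><surName>Smith</surName></creator></dataset>",
    "doc-2": "<dataset><title>Forest plots</title>"
             "<creator><surName>Jones</surName></creator></dataset>",
}


def search_by_path(documents: dict, path: str, value: str) -> list:
    """Return IDs of documents where an element matching path has text equal to value."""
    hits = []
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        # findall() evaluates the path relative to the root element.
        if any((el.text or "").strip() == value for el in root.findall(path)):
            hits.append(doc_id)
    return hits


print(search_by_path(DOCUMENTS, "creator/surName", "Smith"))  # prints ['doc-1']
```

A production system would query an index rather than parse every document on each search, which is the role the Solr index plays in the table above.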

Remarks

At a higher level, Metacat features can be categorized as follows:

  • Data abstraction and interoperability
  • Business and user-defined metadata storage
  • Data discovery
  • Data change auditing and notifications
  • Hive metastore optimizations