Questions & Todo:
- Discuss how Annotations should be implemented in HashStore
- What format should we use to store annotation content in `/hashstore/metadata`? JSON-LD or EML?
- What is HashStore's responsibility when storing annotations?
- Is the EML document already formed at this point?
- Where is the content coming from?
- Who currently creates the EML documents to be stored?
- Summarize issue discussion into substorage design document
Initial Proposal to kickstart the conversation (the content below is not final, and will likely change):
- A dataset that is represented by an EML document can be broken down into 2 components:
  - Attributes that describe the dataset (ex. title, author, method, keywordSet, etc.)
  - Attributes that represent the tables associated with the dataset (ex. dataTable, otherEntity, etc.)
- A `HashStore annotation` is a mapping document that should consist of a single parent member and a list that represents the child members
  - This document's location in `hashstore/metadata` is formed by calculating the SHA-256 hex digest of a given `pid` and `formatId`
  - The parent member's value is the id (location) of the parent metadata document in `hashstore/metadata`
    - The id/location/address of this document is formed by calculating the SHA-256 hex digest of a given `pid`, `formatId` and the string "parent". Ex. `sha-256(pid + formatId + "parent")`
    - This document is composed of the attributes/content that describe the dataset (ex. title, author, method, keywordSet, etc.)
  - The List/HashMap of child members is represented with a number as the key, and the id (location) of the child's metadata document in `hashstore/metadata` as the value
    - The id/address of each child is formed by calculating the SHA-256 hex digest of a given `pid`, `formatId` and `(int) key`. Ex. `sha-256(pid + formatId + 0)` where 0 is the first table in the dataset
    - Each child represents a data table in the dataset, or chunk of data that belongs to the dataset
- Note: The format of the parent/child metadata documents to be stored/chunked requires further discussion/clarification
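To make the addressing scheme in the proposal concrete, here is a minimal Python sketch of how the parent and child ids could be derived. It assumes plain string concatenation with no separator (as written in the `sha-256(pid + formatId + ...)` examples), UTF-8 encoding, and the child key rendered as a decimal string. The helper name `annotation_member_id` and the sample `pid` and `formatId` values are hypothetical.

```python
import hashlib
from typing import Union


def annotation_member_id(pid: str, format_id: str, suffix: Union[str, int]) -> str:
    """Return the SHA-256 hex digest of pid + format_id + suffix."""
    # Assumption: plain concatenation, no separator, encoded as UTF-8
    payload = f"{pid}{format_id}{suffix}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical sample values, for illustration only
pid = "doi:10.1234/example"
format_id = "https://eml.ecoinformatics.org/eml-2.2.0"

# Id of the parent metadata document
parent_id = annotation_member_id(pid, format_id, "parent")

# Ids of the first three child (table) documents, keyed by position
child_ids = {key: annotation_member_id(pid, format_id, key) for key in range(3)}
```

One thing the sketch surfaces: without a separator, different `pid`/`formatId` pairs can concatenate to the same string, so a delimiter between the parts may be worth discussing.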
---
title: HashStoreAnnotation Class
---
classDiagram
direction RL
class HashStoreAnnotation{
+String Parent
+List~Dict/KVP~ Children
+setParent(string)
+setChildren(List)
+getContent()
+setContent()
+getChildrenTotal()
}
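As a starting point for discussion, the class in the diagram could be sketched in Python roughly as follows. The storage format is still undecided (JSON-LD or EML), so this sketch assumes plain JSON purely for illustration. It also assumes that `setContent` accepts the serialized document as a string, which the diagram does not specify. Method names follow the diagram.

```python
import json
from typing import Dict, List


class HashStoreAnnotation:
    """Mapping document: one parent id plus a list of {key: child id} entries."""

    def __init__(self) -> None:
        self.parent: str = ""
        self.children: List[Dict[int, str]] = []

    def setParent(self, parent: str) -> None:
        self.parent = parent

    def setChildren(self, children: List[Dict[int, str]]) -> None:
        self.children = list(children)

    def getChildrenTotal(self) -> int:
        return len(self.children)

    def getContent(self) -> str:
        # Assumption: plain JSON output; JSON object keys must be strings
        doc = {
            "parent": self.parent,
            "children": [
                {str(key): value for key, value in child.items()}
                for child in self.children
            ],
        }
        return json.dumps(doc, indent=2)

    def setContent(self, content: str) -> None:
        # Assumption: content is the JSON string produced by getContent()
        doc = json.loads(content)
        self.parent = doc["parent"]
        self.children = [
            {int(key): value for key, value in child.items()}
            for child in doc["children"]
        ]


# Round trip: serialize one annotation and load it into another
annotation = HashStoreAnnotation()
annotation.setParent("parent-id")
annotation.setChildren([{0: "child-0"}, {1: "child-1"}])

restored = HashStoreAnnotation()
restored.setContent(annotation.getContent())
```

The integer keys are converted to strings on write and back to integers on read, because JSON does not allow non-string object keys.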
Example/flow to store an annotation document:
hs_annotation = HashStoreAnnotation()
// Get and store parent content
// Get and store children content
// Get parent location
dataset_parent = sha-256(pid + formatId + "parent")
// Create child list
dataset_children = [
{0: sha-256(pid + formatId + 0)},
{1: sha-256(pid + formatId + 1)},
...
]
hs_annotation.setParent(dataset_parent)
hs_annotation.setChildren(dataset_children)
// getContent() will format the document to be written based on the chosen format
hs_annotation_content = hs_annotation.getContent()
hashstore.store_metadata(pid, hs_annotation_content, formatId)
Example/flow to work with/retrieve an annotation document:
// Retrieve the mapping document
hs_annotation_stream = hashstore.retrieve_metadata(pid, formatId)
hs_annotation = HashStoreAnnotation()
hs_annotation.setContent(hs_annotation_stream)
hsa_parent = hs_annotation.parent
hsa_children = hs_annotation.children
// Iterate over up to the first 1000 table items
for i in range(0, min(1000, hs_annotation.getChildrenTotal())):
    // Each child entry is a {key: id} pair, so look up the id by its key
    rel_path = shard(hsa_children[i][i])
    location = "/hashstore/metadata/" + rel_path
    // ... Do what we will with each child element
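The `shard()` helper used in the retrieval flow is not defined in this issue. A minimal Python sketch of what it might do is shown below, assuming a directory depth of 3 and a width of 2 characters per directory. Both values and the function signature are assumptions for illustration, not settled parts of the design.

```python
def shard(digest: str, depth: int = 3, width: int = 2) -> str:
    """Split a hex digest into `depth` directories of `width` characters each.

    The remaining characters form the file name at the end of the path.
    """
    if len(digest) <= depth * width:
        raise ValueError("digest is too short for the requested sharding")
    # Leading directory tokens, e.g. "ab", "cd", "ef"
    tokens = [digest[i * width:(i + 1) * width] for i in range(depth)]
    # Remainder of the digest becomes the file name
    tokens.append(digest[depth * width:])
    return "/".join(tokens)


rel_path = shard("abcdef0123456789")
# rel_path == "ab/cd/ef/0123456789"
```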