While deposit structure is highly flexible in order to accommodate diverse workflows, MorphoSource has preferences and recommendations about which "raw" and "derivative" data to include and which to leave out. These recommendations aim to maximize meaningful reproducibility and reuse potential while minimizing deposit storage requirements. When contributors pay for their data storage, MorphoSource is more open to including large but rarely critical raw files, though we may still ask contributors to think about whether it is in their best interest to use their storage space on them. Whether or not contributors are paying for storage, when special conditions require either larger, more extensive deposits or more limited, derivative deposits, MorphoSource will accept these if the contributors make a strong case for deviating from the preferred structures described below.
There is a lot of information on this page. If you are feeling overwhelmed, you may want to first review the page on Example Deposit Structures.
We describe preferences here in terms of an anticipated derivative chain while being somewhat ambiguous about whether data are technically "raw" or "derived".
...
Modality | Non-preferred PRIMARY data | Rationale for non-preferred PRIMARY data | Preferred PRIMARY data | Non-preferred SECONDARY data | Preferred/acceptable SECONDARY data | TERTIARY data |
---|---|---|---|---|---|---|
CT/MRI scans | Too raw: Too derived: Other: | Too raw: (1) These files are very large and may be 2-5 times the size of the "preferred" image stacks; (2) raw scanner output often includes multiple specimens that were scanned together for efficiency; (3) scanner raw data cannot be visualized in three dimensions without further processing, and the critical metadata values needed for successful processing are not reliably available; (4) our user communities do not request these files or (as far as we know) work with them aside from when they first process scanner output into image stacks. Too derived: These files have been processed too much to (1) effectively communicate the quality or limitations of the raw data and more primary derivatives, or (2) have very good reuse potential. Other: (1) Reconstructed image stacks that include lots of "empty space" waste server space and should be cropped to minimize this prior to upload. (2) Reconstructed image stacks that include multiple specimens break the MorphoSource data model and are forbidden. (3) Proprietary formats have poor preservation and reuse potential. Their use also deepens inequities between users who can afford expensive software and those who cannot. | | Too derived: Other: | | |
Photogrammetry | Too raw: Too derived: | Too raw: Raw-format, uncropped digital photographs may take up two orders of magnitude more space than "preferred" compressed, cropped images, and compressing and cropping has no significant effect on the quality of the resulting 3D models. Too derived: These files have been processed too much to (1) effectively communicate the quality or limitations of the raw data and more primary derivatives, or (2) have very good reuse potential, including regenerating 3D models from photo collections with updated algorithms or simply checking reproducibility. | | Too derived: Other: | | |
Structured light / laser surface scans | | Proprietary formats have poor preservation and reuse potential. Their use also deepens inequities between users who can afford expensive software and those who cannot. | | Too derived: Other: | | |
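
The table above asks that reconstructed image stacks be cropped to minimize "empty space" before upload. As a rough illustration of what that can involve, the sketch below finds the smallest box containing all voxels brighter than a background threshold and crops a small volume to it. This is not a MorphoSource tool: the names `content_bounds` and `crop` are hypothetical, the threshold value is an assumption that depends on your scanner and reconstruction settings, and plain Python lists are used only to keep the example dependency-free (a real workflow would more likely use an image-processing package).

```python
from typing import List, Optional, Tuple

# A volume is a stack of 2D slices, indexed as volume[z][y][x].
Volume = List[List[List[int]]]
# Half-open (start, stop) bounds along z, y, and x.
Bounds = Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int]]


def content_bounds(volume: Volume, threshold: int = 0) -> Optional[Bounds]:
    """Return the smallest box holding every voxel above `threshold`.

    Returns None when no voxel exceeds the threshold.
    The threshold is an assumed background cutoff, not a recommended value.
    """
    zs, ys, xs = [], [], []
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value > threshold:
                    zs.append(z)
                    ys.append(y)
                    xs.append(x)
    if not zs:
        return None
    return (
        (min(zs), max(zs) + 1),
        (min(ys), max(ys) + 1),
        (min(xs), max(xs) + 1),
    )


def crop(volume: Volume, bounds: Bounds) -> Volume:
    """Return a copy of `volume` restricted to the given bounds."""
    (z0, z1), (y0, y1), (x0, x1) = bounds
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in volume[z0:z1]]


# Toy example: a 3 x 4 x 4 volume that is empty except for two bright voxels.
volume = [[[0] * 4 for _ in range(4)] for _ in range(3)]
volume[1][1][2] = 200
volume[1][2][2] = 180

bounds = content_bounds(volume, threshold=50)
cropped = crop(volume, bounds)
print(bounds)
print(cropped)
```

In practice you would likely leave a small margin around the computed bounds so that the specimen is not clipped, and check the cropped stack visually before uploading.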