Harvesting

The harvesting stage collects the available "raw" information so that it can be cleaned, structured, and validated, and then distributed on the network.

Metadata

Metadata is collected through models that each user defines according to their own requirements. These models follow the SEP-001 specification, the standard on which Nucleus bases its metadata management, which leaves room for different use cases.

Model validation and schema generation are handled by pydantic, so fields can be defined with Python standard library types.

In the example below, we have a model called Nucleus that extends the Model class. It includes fields such as name, description, and contributors, each defined with their respective types (str and List[str]).

from typing import List

from nucleus.sdk.harvest import Model

class Nucleus(Model):
    name: str
    description: str
    contributors: List[str]

To create an instance of the Nucleus model and populate it with data, you can do the following:

nucleus = Nucleus(
    name="Nucleus the SDK",
    description="Building block for multimedia decentralization",
    contributors=["Jacob", "Geo", "Dennis", "Mark"],
)
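Because validation is delegated to pydantic, a model should reject data that does not match its declared types. The following is a minimal sketch of that behavior, not the SDK's own code: it uses pydantic's `BaseModel` as a stand-in for `Model` (an assumption, so that the example runs without the SDK installed), and `validation_errors` is a hypothetical helper introduced only for illustration.

```python
from typing import List

from pydantic import BaseModel, ValidationError


# Stand-in for nucleus.sdk.harvest.Model (assumption: it validates
# fields the same way a plain pydantic model does).
class Nucleus(BaseModel):
    name: str
    description: str
    contributors: List[str]


def validation_errors(**fields):
    """Hypothetical helper: return the names of the fields that failed validation."""
    try:
        Nucleus(**fields)
    except ValidationError as exc:
        return [str(err["loc"][0]) for err in exc.errors()]
    return []


# A plain string is not a list of strings, so 'contributors' is rejected.
bad = validation_errors(
    name="Nucleus the SDK",
    description="Building block for multimedia decentralization",
    contributors="Jacob",
)
print(bad)
```

Catching `ValidationError` at harvest time keeps malformed metadata from reaching later stages of the pipeline.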

Media

Let's explore multimedia resources and how to collect them using the SDK's built-in types.

To handle multimedia resources such as images, videos, music, and text properly, collect and categorize them using the types provided by the SDK. These types make the resources easy to identify and process in later stages of the pipeline.

Here's an example of how to collect media using the built-in media types:

from pathlib import Path

import nucleus.sdk.harvest as harvest

# harvest image and video using built-in types
image = harvest.image(path=Path("/local/path/image.jpg"))
video = harvest.video(path=Path("/local/path/video.mp4"))
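When files arrive without a known category, it can help to decide which built-in type applies before collecting them. The sketch below uses only the Python standard library; `pick_media_kind` is a hypothetical helper and not part of the SDK, and it only looks at the file extension, not the file contents.

```python
import mimetypes
from pathlib import Path
from typing import Optional


def pick_media_kind(path: Path) -> Optional[str]:
    """Hypothetical helper: return 'image' or 'video' from the file extension, else None."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None:
        return None
    # The MIME type has the form "<kind>/<subtype>", e.g. "image/jpeg".
    kind = mime.split("/", 1)[0]
    return kind if kind in ("image", "video") else None


print(pick_media_kind(Path("/local/path/image.jpg")))
print(pick_media_kind(Path("/local/path/video.mp4")))
```

The returned kind could then be used to choose between `harvest.image` and `harvest.video`; files that return `None` would need a different type or manual review.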

Note

You can also define your own media type, as long as it is accompanied by an engine that handles its processing. See the built-in media types and built-in engines for reference.