MinIO Managed Storage for Your Projects


MinIO is a highly scalable and secure object storage system that's designed for the cloud-native world. It's built to handle massive amounts of data with ease.

With MinIO, you can store and manage data with a high degree of scalability and flexibility. This makes it an ideal choice for projects that require large amounts of storage.

One of the key benefits of MinIO is its ability to scale horizontally, which means you can add more nodes to your storage cluster as your project grows. This ensures that your storage needs are always met.

MinIO also supports a wide range of use cases, including AI/ML, IoT, and edge computing.

Getting Started

MinIO is a High Performance Object Storage released under GNU Affero General Public License v3.0. You can use it to build high performance infrastructure for machine learning, analytics, and application data workloads.

It's API compatible with Amazon S3 cloud storage service, which is a big plus if you're already using S3. This means you can easily integrate MinIO into your existing workflow.

MinIO provides quickstart instructions for running it on bare metal hardware as well as in containers. You can find these instructions in the README file.

For Kubernetes environments, you'll want to use the MinIO Kubernetes Operator instead.

Installation Options


MinIO can be installed in several ways, and the right option for you depends on your needs. For early development and evaluation, a standalone MinIO server can be run as a container using a specific set of commands.

This approach is best suited for small-scale projects and testing environments. However, if you're looking to deploy MinIO with features like versioning, object locking, and bucket replication, you'll need to use a distributed deployment with Erasure Coding, which requires a minimum of 4 drives per MinIO server.

You can also download and run a standalone MinIO server on macOS using a single command, replacing /data with the path to the drive or directory where you want MinIO to store data.


Container Installation

For early development and evaluation, you can run a standalone MinIO server as a container.

Certain features, however, require a distributed deployment with Erasure Coding: versioning, object locking, and bucket replication all depend on it. Erasure Coding requires a minimum of 4 drives per MinIO server, and deploying with it enabled is recommended for extended development and production.


GNU/Linux


Installing MinIO on GNU/Linux is a straightforward process. You'll need to run a command to download and run the MinIO server.

The command to run a standalone MinIO server on Linux hosts is as follows:

```bash
./minio server /data
```

Replace /data with the path to the drive or directory in which you want MinIO to store data.

To run the command, you'll first need to download the MinIO server binary. The download URL depends on your Linux host's architecture: MinIO publishes builds at https://dl.min.io/server/minio/release/<arch>/minio, where <arch> is one of linux-amd64, linux-arm64, linux-ppc64le, or linux-s390x.

Keep in mind that standalone MinIO servers are best suited for early development and evaluation. For extended development and production, deploy MinIO with Erasure Coding enabled.

Testing and Validation

Testing and validation are crucial steps in ensuring that your MinIO setup is working correctly. You can access the workbench associated with your Data Connection and run a series of commands to test the connection.

To test the connection, you'll need to install and import the MinIO Python Client SDK. This can be done by running `pip install minio` and then importing the necessary modules. Note that the MinIO client rejects URLs that include a protocol/scheme, so pass `yourendpoint.com` rather than `https://yourendpoint.com`.


The specific steps are listed under Test Connectivity below. If everything is working correctly, you should see a list of buckets and their creation dates. If you encounter any errors, you can use the `S3Error` exception to catch and handle the issue.
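As a minimal sketch, a helper like the following (the name `normalize_endpoint` is ours, not part of the SDK) strips the scheme before handing the endpoint to the client:

```python
from urllib.parse import urlparse


def normalize_endpoint(url: str) -> str:
    """Return host[:port] only, since the MinIO client rejects scheme-prefixed URLs."""
    # urlparse() only fills netloc when the string contains "//".
    parsed = urlparse(url if "//" in url else "//" + url)
    return parsed.netloc
```

For example, `normalize_endpoint("https://yourendpoint.com")` returns `"yourendpoint.com"`, and a bare `"minio.example.com:9000"` passes through unchanged.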

Test Connectivity

Testing connectivity is a crucial step in ensuring that your data connection is working correctly. You can test connectivity using the MinIO Client (`mc`), which provides a modern alternative to UNIX commands.

To test if everything is working correctly, access the workbench associated with your Data Connection and run the following commands in a Jupyter notebook. First, install the MinIO Python Client SDK with `pip install minio`, then import `Minio`, `S3Error`, `os`, and `datetime`.

Next, access the Data Connection properties as environment variables: read AWS_S3_ENDPOINT, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_S3_BUCKET using os.getenv(). The MinIO client doesn't accept URLs with a protocol/scheme, so use `yourendpoint.com` instead of `https://yourendpoint.com`.



Here's a step-by-step guide to testing connectivity:

  1. Install the MinIO Python Client SDK (`pip install minio`) and import `Minio`, `S3Error`, `os`, and `datetime`.
  2. Access Data Connection properties as environment variables using os.getenv().
  3. Create the MinIO client using Minio(AWS_S3_ENDPOINT, access_key=AWS_ACCESS_KEY_ID, secret_key=AWS_SECRET_ACCESS_KEY, secure=True).
  4. Test the connection by listing all buckets using client.list_buckets().
  5. Create a sample local file and upload it to MinIO using client.fput_object().
  6. Download a file from MinIO using client.fget_object().
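The steps above can be sketched as follows, assuming the `minio` package is installed and the Data Connection's `AWS_*` environment variables are set; the helper names are illustrative, not part of the SDK:

```python
import datetime
import os
import tempfile


def write_sample_file(directory: str) -> str:
    """Create a small local file to round-trip through MinIO."""
    path = os.path.join(directory, "sample.txt")
    with open(path, "w") as f:
        f.write(f"created {datetime.datetime.now().isoformat()}\n")
    return path


def run_connectivity_checks() -> None:
    # Imported lazily so the helper above stays free of third-party imports.
    from minio import Minio
    from minio.error import S3Error

    client = Minio(
        os.getenv("AWS_S3_ENDPOINT"),  # host[:port] only, no https://
        access_key=os.getenv("AWS_ACCESS_KEY_ID"),
        secret_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
        secure=True,
    )
    bucket = os.getenv("AWS_S3_BUCKET")
    try:
        # Step 4: list all buckets with their creation dates.
        for b in client.list_buckets():
            print(b.name, b.creation_date)
        # Steps 5-6: upload a sample file, then download it back.
        local = write_sample_file(tempfile.mkdtemp())
        client.fput_object(bucket, "sample.txt", local)
        client.fget_object(bucket, "sample.txt", local + ".downloaded")
    except S3Error as err:
        print("Connectivity check failed:", err)
```

The network calls are wrapped in a function so you can trigger the checks from a notebook cell with `run_connectivity_checks()`.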

Test Using Console

You can test if your server has started successfully by pointing your web browser to http://127.0.0.1:9000. MinIO Server comes with an embedded web-based object browser.

MinIO runs its console on a random port by default, so you might need to pin it down if you want reliable access. Use the `--console-address` option (for example, `--console-address ":9001"`) to pick a specific interface and port.


Upgrading and Maintenance

Upgrading MinIO is a breeze, thanks to its non-disruptive upgrade process that requires zero downtime. This means you can upgrade all servers simultaneously without worrying about any disruptions to your MinIO service.

MinIO upgrades are atomic, ensuring that all transactions are processed safely and reliably. You can upgrade directly from https://dl.min.io, or host your own mirrors at a URL of your choice, such as https://my-artifactory.example.com/minio/.



If you've installed the MinIO server binary by hand, you can use the command `mc admin update` to initiate the upgrade process. This is a quick and easy way to get your MinIO service up to date.

For deployments without external internet access, you'll need to download the MinIO binary from https://dl.min.io and replace the existing binary. Be sure to apply executable permissions with `chmod +x /opt/bin/minio` before proceeding with the upgrade.

If you're running MinIO as a systemd service, you can upgrade via RPM/DEB packages in parallel on all servers, or replace the binary on all nodes. Remember to apply executable permissions with `chmod +x /opt/bin/minio` before restarting the service with `mc admin service restart alias/`.

Here are the steps to upgrade MinIO in different scenarios:

  • Hand-installed binary with internet access: run `mc admin update`.
  • Deployments without external internet access: download the binary from https://dl.min.io, replace the existing binary, and apply `chmod +x`.
  • Systemd or package-based installs: upgrade via RPM/DEB packages in parallel on all servers (or replace the binary on all nodes), then run `mc admin service restart alias/`.

By following these straightforward steps, you'll be able to upgrade your MinIO service with minimal fuss and disruption.

Understanding MinIO

MinIO is an open-source object storage system designed to be highly scalable and performance-oriented. It implements the Amazon S3 API, making it a great choice for applications that already use S3.

MinIO is highly performant and designed to sustain very high request rates, making it suitable for high-traffic applications. This is due in part to its ability to scale horizontally, allowing it to adapt easily to changing workloads.


Contribute to Project


To contribute to the MinIO project, you need a good understanding of its architecture. MinIO is an open-source object storage server that can be deployed in a distributed manner.

You can contribute to MinIO by reporting bugs or suggesting new features on its GitHub page. MinIO has a large community of contributors and users who actively participate in its development.

MinIO is built using the Go programming language, which makes it highly scalable and performant. To contribute code, you'll need to have a good grasp of Go and the MinIO codebase.

The MinIO codebase is well-documented, making it easier for new contributors to get started. You can start by reviewing the code and familiarizing yourself with its architecture.

MinIO uses a distributed design, which allows it to scale horizontally and handle large amounts of data. This design also makes it highly available and fault-tolerant.

To contribute to MinIO, you'll need to create a pull request on its GitHub page. This involves submitting your code changes and waiting for review and feedback from the community.

What Is It?


MinIO is a high-performance object store that's compatible with S3, and it can be deployed on a wide variety of platforms, giving you flexibility. You can choose from multiple flavors of MinIO to suit your needs.

MinIO can be managed and hosted using a real-time UI, or a developer-friendly CLI and API, which makes it easy to automate tasks and scale your storage as needed. With Northflank cloud or your own cloud account, you can deploy, monitor, back up, and scale MinIO with ease.

IAM Permissions

To set up MinIO with an Amazon S3-compatible object store, you'll need to grant the right IAM permissions. This ensures that your Destination can interact with the object store without any issues.

To write to an object store, you'll need three specific permissions: s3:ListBucket, s3:GetBucketLocation, and s3:PutObject. These permissions are the foundation for any object store interaction.

If your Destination performs multipart uploads to the object store (as it typically will for large files), you'll also need two KMS permissions: kms:GenerateDataKey and kms:Decrypt.

Here are the required IAM permissions for MinIO:

  • s3:ListBucket
  • s3:GetBucketLocation
  • s3:PutObject
  • kms:GenerateDataKey
  • kms:Decrypt
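As a sketch, these permissions translate into an IAM policy along these lines; the bucket name `mybucket` and the wildcard KMS resource are placeholders that you'd scope to your own ARNs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::mybucket"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mybucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
      "Resource": "*"
    }
  ]
}
```

Note that the bucket-level actions apply to the bucket ARN itself, while `s3:PutObject` applies to the objects inside it (`/*`).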

Configuration and Settings


A few Parquet settings are worth understanding before you write files. Data page version can be set to either V1 or V2, with V2 being the default; V2 improves compression but may not be compatible with older Parquet readers. Group row limit is set to 10,000 rows by default, but you can adjust this value to suit your needs. Log invalid rows can be toggled on to output up to 20 unique rows that were skipped due to a data format mismatch, though this requires the log level to be set to debug.

Parquet Settings

If you need to write out Parquet files, you can use the Cribl Stream CLI's `parquet` command on Linux to view a Parquet file, its metadata, or its schema. Cribl Edge Workers support Parquet only when running on Linux, not on Windows.



To avoid problems such as data mismatches, see Working with Parquet for pointers.

You can toggle on automatic schema generation to automatically generate a Parquet schema based on the events of each Parquet file that Cribl Stream writes.

Enabling this exposes an additional field. If you need to modify a schema or add a new one, you can follow the instructions in our Parquet Schemas topic.

Advanced Settings

In the advanced settings, you can customize the user interface to your liking, including the theme, font size, and layout. Some users prefer a dark mode for easier reading at night, and you can enable it here.

The advanced settings also let you set up automatic backups of your data on a daily, weekly, or monthly schedule, so your important files are safe in case something goes wrong. To optimize performance, you can adjust the cache settings, which can help speed up your system and make it more responsive.

What Composes File Names


File names in MinIO are composed of several components. The bucket name is always the first part of the file path.

The key prefix and partitioning expression are combined to form the next part of the file path. This is often displayed as a folder in MinIO, but it's actually just part of the overall key name.

The partitioning expression can include variables like ${host} and ${sourcetype}, which are replaced with actual values when the file is created.

The file name prefix is a fixed string that's added to the file path. In the example provided, the file name prefix is the default CriblOut.

The filename itself is a unique identifier, often including a random string to ensure uniqueness. The file extension indicates the data format, such as json in the example provided.

The combination of these components results in a full file path that's displayed in MinIO.
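To make the composition concrete, here's an illustrative sketch; the function name and defaults are ours, and only `CriblOut` and the `${host}`/`${sourcetype}` variables come from the example above:

```python
import random
import string


def compose_path(bucket, key_prefix, partition_expr, variables,
                 file_prefix="CriblOut", extension="json"):
    """Assemble a full object path from the components described above."""
    # The partitioning expression uses ${name} placeholders, which
    # string.Template substitutes with actual event values.
    partition = string.Template(partition_expr).safe_substitute(variables)
    # A short random string keeps each file name unique.
    unique = "".join(random.choices("0123456789abcdef", k=8))
    return f"{bucket}/{key_prefix}/{partition}/{file_prefix}-{unique}.{extension}"


path = compose_path("mybucket", "logs", "${host}/${sourcetype}",
                    {"host": "web01", "sourcetype": "syslog"})
print(path)  # e.g. mybucket/logs/web01/syslog/CriblOut-1a2b3c4d.json
```

Even though MinIO displays the partition segments as folders, the whole string after the bucket name is a single flat key.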

Deployment and Management

MinIO's deployment and management features make it a great choice for cloud-native applications. It supports Kubernetes, which allows for easy scaling and management of MinIO clusters.


To deploy MinIO, you can use a Docker container or a binary installation. This gives you flexibility in how you choose to run MinIO.

MinIO's management features include a web-based console for monitoring and configuring your cluster. This makes it easy to keep an eye on performance and make adjustments as needed.

Use Cases

When you're looking to deploy and manage MinIO, it's essential to consider its use cases. MinIO can replace traditional S3 storage, making it a cost-effective solution for your cloud infrastructure.

MinIO is particularly well-suited for storing static files and other data that is accessed infrequently, while its high-performance, low-latency design keeps that data quick to retrieve when needed.

For logging and metrics, MinIO provides a scalable and efficient solution, making it perfect for storing large amounts of data. This is especially useful for machine learning and analytics, where data is often stored and processed in large quantities.



If you're looking for a secondary back-up storage solution, MinIO is a great option, providing a secure and reliable way to store your data. Disaster recovery is also a key use case for MinIO, allowing you to quickly restore your data in the event of a disaster.

MinIO can also be used for archiving data, providing a long-term storage solution for your files.

Here are some of the key use cases for MinIO:

  • S3 Replacement
  • Static files and file storage
  • Log and metric storage
  • Machine learning and analytics
  • Secondary back-up storage
  • Disaster recovery
  • Archiving

Deploy in Project

Deploying MinIO in your project is a straightforward process. You can deploy it by clicking on the "+" button labeled "Import YAML" in your project.

To deploy MinIO, you'll need to paste a specific YAML configuration into the box. By default, the size of the storage is 20 GB, but you can change it if you need to (see line 11 of the YAML configuration).

The YAML configuration also allows you to edit the default user and password (lines 21-22). Make sure to press "Create" after you've made any necessary changes.



After deploying MinIO, you should see a running MinIO pod and two MinIO routes: one for programmatic access to MinIO and one for browser-based access.

Here's a quick summary of the steps to deploy MinIO:

  1. Click on the "+" button labeled "Import YAML" in your project.
  2. Paste the YAML configuration into the box and make any necessary changes.
  3. Press "Create" to deploy MinIO.

Now that MinIO is deployed, you can create a bucket in it to make it useful.
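As a hedged sketch using the MinIO Python SDK's `bucket_exists` and `make_bucket` calls (the helper names and the naming-rule check are ours, not part of the SDK):

```python
import re


def valid_bucket_name(name: str) -> bool:
    """Rough S3-style check: 3-63 chars of lowercase letters, digits, dots, hyphens."""
    return bool(re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name))


def ensure_bucket(client, name: str) -> None:
    """Create the bucket on the given minio.Minio client if it doesn't exist yet."""
    if not valid_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name}")
    if not client.bucket_exists(name):
        client.make_bucket(name)
```

Called as `ensure_bucket(client, "mybucket")` with a client configured as in the Test Connectivity section, this makes bucket creation idempotent.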

Create a Matching Data Connection

Creating a matching data connection for your MinIO instance is a crucial step in integrating it with your data science project. To do this, click on Add data connection in RHOAI, inside your Data Science Project.

Fill out the required fields to match your newly-deployed MinIO Object Storage. This will create a data connection that maps to the mybucket bucket in your MinIO instance.

You now have a Data Connection that your workbenches and pipelines can use to store and manage data in an organized way.


Deploying on OpenShift


Deploying on OpenShift is a straightforward process that leverages the power of Kubernetes to manage containerized applications.

You can create a new OpenShift project, which is essentially a namespace for your application, using the `oc new-project` command.

OpenShift supports a wide range of applications, including Java, Python, and Node.js, making it a versatile platform for development.

To deploy an application on OpenShift, you'll need to create a Docker image and push it to a registry like Docker Hub or Quay.

The `oc new-app` command allows you to create a new application from a Docker image, making the deployment process even easier.

OpenShift provides a robust set of tools for monitoring and logging, including Prometheus and Grafana, to help you troubleshoot and optimize your application.

You can also use the `oc expose` command to create a route for your application, making it accessible from outside the cluster.

OpenShift's built-in scaling and resource management features ensure that your application can handle changes in traffic and resource demands.

The `oc scale` command allows you to scale your application up or down, depending on your needs, and `oc adm top` helps you monitor resource utilization.


Kubernetes with Northflank


Deploying MinIO onto Kubernetes can be a complex task, but Northflank offers a comprehensive stateful workload solution.

With Northflank, you can run a highly scalable and performant MinIO deployment in your AWS, GCP, and Azure accounts using Kubernetes.

You'll no longer need to hand-roll MinIO Helm charts, YAML, StatefulSets, ReplicaSets, Services, persistent volumes, ingress, Horizontal Pod Autoscaling, Pod Disruption Budgets, Prometheus metrics, certificates, and logging.

This can save you a significant amount of time and effort, allowing you to focus on other important tasks.

Frequently Asked Questions

Is MinIO still free?

Yes, MinIO is 100% free and open source. It's licensed under the AGPL v3.0, so there are no licensing fees, though the AGPL's copyleft obligations apply if you modify and distribute it.

Is MinIO the same as S3?

MinIO and AWS S3 are both object storage solutions, but they have different approaches to scalability and management. While MinIO is a self-managed, high-performance option, AWS S3 is a fully managed service with pay-as-you-go pricing.

Gilbert Deckow

Senior Writer

Gilbert Deckow is a seasoned writer with a knack for breaking down complex technical topics into engaging and accessible content. With a focus on the ever-evolving world of cloud computing, Gilbert has established himself as a go-to expert on Azure Storage Options and related topics. Gilbert's writing style is characterized by clarity, precision, and a dash of humor, making even the most intricate concepts feel approachable and enjoyable to read.
