Red Hat OpenShift Data Science is a cloud-native platform that integrates popular data science tools and libraries so that data scientists can work in one consistent environment.
With OpenShift Data Science, data science teams can collaborate in a shared environment. The platform integrates with tools such as Jupyter Notebooks, TensorFlow, and scikit-learn.
It also provides scalable infrastructure for large datasets and complex computations, so you can focus on developing your models rather than on the underlying infrastructure.
Together, these features streamline the data science workflow and shorten time-to-insight.
Project Management
Organizing work into projects matters for any data science effort on Red Hat OpenShift. In OpenShift, a project is a namespace that groups related resources and controls who can access them; it is not a task tracker.
In OpenShift Data Science, a data science project collects the workbenches and other resources that belong to one piece of work. This is especially useful for large-scale data science projects that involve multiple stakeholders.
Keeping a project's resources in one place makes the workflow easier to follow and reduces the errors that come from scattered environments.
Working in an Environment
Before you start, make sure that you have logged in to Red Hat OpenShift Data Science and to the OpenShift Dedicated web console.
To access the OpenShift Data Science dashboard, OpenShift Data Science must be installed on your OpenShift Dedicated or Red Hat OpenShift Service on Amazon Web Services (ROSA) cluster.
You'll also need to add at least one user to the user group for OpenShift Data Science, as described in Adding users for OpenShift Data Science.
To log in to the dashboard, follow these steps:
- Log in to the OpenShift web console.
- Click the application launcher.
- Right-click on Red Hat OpenShift Data Science and copy the URL for your OpenShift Data Science instance.
- Provide this instance URL to your users to let them log in to OpenShift Data Science.
Confirm that you and your users can log in to OpenShift Data Science by using the instance URL.
Project Workbench
To create a project workbench, you must first log in to Red Hat OpenShift Data Science and create a data science project. A workbench lets you create a new Jupyter notebook from an existing notebook container image and access its resources and properties.
You create a workbench from the project details page and configure its properties as part of the process:
- Click on the name of the project you want to add the workbench to from the Data Science Projects page.
- Click on the "Create workbench" button in the Workbenches section of the project details page.
- Configure the properties of the workbench you are creating.
- Click on the "Create workbench" button to complete the process.
The workbench is listed in the Workbenches section of the project details page, along with its status. The status is "Starting" while the workbench server is starting and "Running" once the workbench has started successfully.
Operator
In OpenShift, an Operator packages, deploys, and manages an application on the cluster. In the context of OpenShift Data Science, Operators automate tasks such as deploying and managing data science pipelines.
To uninstall an Operator, you need to be part of the cluster-admins user group in OpenShift Dedicated and have installed or configured the service on your OpenShift Dedicated cluster.
Uninstalling an Operator stops it from running and receiving updates, but it does not remove the Operator's custom resource definitions or managed resources. These still exist and must be cleaned up manually.
To uninstall an Operator, follow these steps:
- Change into the Administrator perspective.
- Change into the redhat-ods-applications project.
- Click Operators → Installed Operators.
- Click on the Operator that you want to uninstall.
- Delete any Operator resources or instances.
- Click the Actions drop-down menu and select Uninstall Operator.
- Select Uninstall to uninstall the Operator, Operator deployments, and pods.
The Operator is uninstalled from its target clusters, and it no longer appears on the Installed Operators page. The disabled application is no longer available for your data scientists to use, and is marked as Disabled on the Enabled page of the OpenShift Data Science dashboard.
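Because custom resource definitions are left behind after an uninstall, it can help to list the ones that remain before cleaning up. The following is a minimal sketch, not an official procedure: it assumes the oc CLI is installed and logged in to the cluster, and the keyword you pass in (shown as the hypothetical "example.com") is something you choose yourself to identify the Operator's definitions.

```python
import subprocess


def filter_crds(oc_output: str, keyword: str) -> list[str]:
    """Return CRD names from `oc get crd -o name` output that contain keyword."""
    names = []
    for line in oc_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # `-o name` prints entries in the form "<resource-type>/<crd-name>";
        # keep only the part after the slash.
        name = line.split("/", 1)[-1]
        if keyword.lower() in name.lower():
            names.append(name)
    return names


def list_leftover_crds(keyword: str) -> list[str]:
    """Ask the cluster for all CRDs and keep those whose name matches keyword.

    Assumes `oc` is on PATH and already logged in; raises if the command fails.
    """
    result = subprocess.run(
        ["oc", "get", "crd", "-o", "name"],
        capture_output=True,
        text=True,
        check=True,
    )
    return filter_crds(result.stdout, keyword)
```

Calling list_leftover_crds("example.com") would return the matching names for review. The sketch only lists definitions and deliberately deletes nothing, so you can confirm what belongs to the uninstalled Operator before removing it by hand.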
Enabling Services Connected to OpenShift Data Science
Enabling services connected to OpenShift Data Science makes additional tools available to your data scientists from the dashboard. Typically, you install or enable a service using one of the following methods:
- Enabling the service from the Explore page on the OpenShift Data Science dashboard.
- Installing the Operator for the service from OperatorHub.
- Installing the Operator for the service from Red Hat Marketplace.
- Installing the service as an Add-on to your OpenShift Dedicated cluster.
Some services, like Jupyter, provide a service endpoint available on the tile for the service on the Enabled page of OpenShift Data Science. However, certain services, such as OpenVINO and Anaconda, provide notebook images for use in Jupyter and do not provide an endpoint link from their tile.
To help you get started quickly, you can access the service’s learning resources and documentation on the Resources page, or by clicking the relevant link on the tile for the service on the Enabled page.
Here are the steps to enable a service connected to OpenShift Data Science:
1. On the OpenShift Data Science home page, click Explore.
2. Click the card of the service that you want to enable.
3. Click Enable on the drawer for the service.
4. If prompted, enter the service’s key and click Connect.
5. Click Enable to confirm that you are enabling the service.
The service that you enabled appears on the Enabled page, and the service endpoint is displayed on the tile for the service on the Enabled page.
Customization and Configuration
Red Hat OpenShift Data Science allows you to customize and configure your environment to suit your specific needs. You can choose from a variety of components to include in your environment, such as Jupyter notebooks, Apache Spark, and TensorFlow.
You can also scale your environment up or down as needed, so you can focus on your data science work rather than on the underlying infrastructure.
You can also integrate your environment with other tools and services, such as Git, to streamline your workflow and improve collaboration.
Supported Packages
The Red Hat OpenShift Data Science notebook server images come with Python installed by default and include a set of supported packages at specific versions.
You can find the complete list of packages and versions in the Options for notebook server environments table.
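To compare a running notebook server against that table, you can ask Python which versions are actually installed. This is a minimal sketch using only the standard library; the package names in the example call are illustrative and are not taken from the table.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Illustrative names only; replace them with the packages you care about.
    for name, version in installed_versions(["numpy", "pandas", "scikit-learn"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Run in a notebook cell, this prints one line per package, which makes it easy to spot a package that is missing from the image you selected.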
To install additional packages, you'll need to check if the required binaries are included on the notebook server image you want to use.
If the required binaries are not included, you should contact Red Hat Support to request that the binary be considered for inclusion.
You can install packages on a temporary basis by using the pip install command, which is a convenient way to try out a new package.
To install several packages at once, list them in a requirements.txt file and pass that file to the pip install command.
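As a rough illustration of the requirements.txt approach, the sketch below runs pip from Python for the interpreter that backs the current notebook. The helper names are hypothetical, and the file name is only a default; in a notebook cell you could equally run pip install -r requirements.txt directly.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(requirements_file="requirements.txt"):
    """Build the pip command for the interpreter that runs this notebook."""
    # Using sys.executable keeps the install tied to the active Python environment.
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements_file)]


def install_requirements(requirements_file="requirements.txt"):
    """Install every package listed in the requirements file."""
    path = Path(requirements_file)
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist")
    # check=True raises CalledProcessError if pip reports a failure.
    subprocess.run(pip_install_command(path), check=True)
```

Calling install_requirements() with no arguments looks for requirements.txt in the current working directory. Because these installs are temporary, keeping the file alongside your notebooks makes it simple to repeat them.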
Accessing Tutorials
You can view and access the learning resources for Red Hat OpenShift Data Science and supported applications. If you are new to Red Hat OpenShift Data Science, these resources are a good place to start with the basics.
Enabling GPU Support
To enable GPU support in OpenShift Data Science, you need to install the NVIDIA GPU Operator. The Node Feature Discovery Operator is a prerequisite, so install it first.
You can find more information on how to install these Operators in the GPU Operator on OpenShift documentation. The NVIDIA GPU Add-on is no longer supported. If you previously installed it, use OpenShift Cluster Manager to uninstall it from your cluster before you install the NVIDIA GPU Operator.
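Once the Operators are installed, a quick check from a notebook can confirm whether a GPU is visible. The sketch below is one possible check rather than a documented procedure: it assumes the nvidia-smi tool is available on the notebook image's PATH, which may not be true for every image, and the function name is hypothetical.

```python
import shutil
import subprocess


def visible_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    # Assumption: nvidia-smi is on PATH when a GPU is attached to the notebook.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

An empty list means either that no GPU is attached or that the tool is missing from the image, so treat it as a prompt to investigate rather than as proof that GPU support is not enabled.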
Frequently Asked Questions
What is Red Hat OpenShift used for?
Red Hat OpenShift is a Kubernetes-based container application platform for building, deploying, and managing applications. With its data science tooling added, it gives data scientists and developers a collaborative environment to create AI and ML models and to build, deploy, and manage AI-powered applications.
What is the difference between OpenShift and OpenShift AI?
OpenShift is a container application platform, whereas OpenShift AI, the current name for OpenShift Data Science, is a specialized platform that adds artificial intelligence and machine learning capabilities on top of OpenShift. This distinction enables OpenShift AI to streamline the AI/ML lifecycle with tools and features not found in the standard OpenShift platform.
Sources
- https://almogelfassy.medium.com/journey-into-the-future-with-openshift-data-science-68e365e12f7a
- https://docs.redhat.com/en/documentation/red_hat_openshift_data_science/1/html-single/getting_started_with_red_hat_openshift_data_science/index
- https://thenewstack.io/red-hat-fills-a-gap-with-openshift-data-science/
- https://docs.redhat.com/en/documentation/red_hat_openshift_data_science/1/html-single/installing_openshift_data_science/index
- https://www.infoworld.com/article/3603517/red-hat-openshift-ai-unveils-model-registry-data-drift-detection.html