MLRun

Mahendra S. Chouhan
3 min read · Dec 6, 2023


MLRun is an open-source MLOps framework that provides seamless and efficient management of your machine learning pipeline, from early development to full production deployment.

Key benefits provided by the MLRun framework include:

  • Rapid development of code from early stage to production.
  • Elastic scaling of batch and real-time workloads.
  • Feature management — preparation and monitoring of logs.
  • Works anywhere — IDE, multi-cloud, etc.

MLRun is composed of several convenient abstraction layers that provide features across a wide variety of technologies, such as automating the build process, execution, data movement, scaling, versioning, parameterization, output tracking, and more. In every ML experiment, we ideally want to save our code, configuration, results, logs, inputs, and outputs so that we can reproduce the experiment in different development environments. MLRun helps to manage, save, and reproduce experiments without any hassle.

MLRun is composed of the following layers:

  • Feature and Artifact Store: handles the ingestion, processing, metadata, and storage of data and features across multiple repositories and technologies.
  • Elastic Serverless Runtimes: converts simple code into scalable and managed microservices with workload-specific runtime engines (such as Kubernetes jobs, Nuclio, Dask, Spark, and Horovod).
  • ML Pipeline Automation: automates data preparation, model training and testing, deployment of real-time production pipelines, and end-to-end monitoring.
  • Central Management: provides a unified portal for managing the entire MLOps workflow. The portal includes a UI, a CLI, and an SDK, which are accessible from anywhere.

The architecture of the MLRun framework

The architecture consists of several basic components. Combining these components creates a pipeline.


To install MLRun on your device, run the following command in your terminal:

pip install mlrun
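After installation, it can be useful to confirm that the package is visible to the Python environment you are working in. The sketch below uses only the standard library; `installed_version` is a helper name introduced here for illustration and is not part of MLRun.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("mlrun")
if version:
    print(f"mlrun {version} is installed")
else:
    print("mlrun is not installed in this environment")
```

If the second message appears, check that `pip` and your interpreter belong to the same environment.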

Let’s discuss some of the main components of MLRun with examples.

1. Project

A project is a container for all your source code, metadata, artifacts, logs, models, etc. It helps organize all of the activities related to your ML experiment.

You can define the project name and then pass it to mlrun.set_environment, which sets it as the active project and returns the project name together with the artifact path.

from os import path
import mlrun

project_name_base = 'Project_name'  # Mention Your Project Name Here

# Set the active project; returns the project name and the artifact path
project_name, artifact_path = mlrun.set_environment(project=project_name_base, user_project=True)

print(f'Project name: {project_name}')

Output: Project name: Project_name-<username> (with user_project=True, MLRun appends the current username to the project name)
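The user_project=True flag makes the project name unique per user by adding the current username to it. The sketch below only approximates that naming so you can predict the resulting name; `user_project_name` is a hypothetical helper, not MLRun's own code, and MLRun may normalize the username further.

```python
import getpass


def user_project_name(base_name, username=None):
    """Approximate the project name used when user_project=True (assumption)."""
    if username is None:
        try:
            username = getpass.getuser()
        except Exception:
            # Some sandboxed environments have no resolvable user
            username = "unknown"
    return f"{base_name}-{username.lower()}"


print(user_project_name("Project_name", username="Alice"))
# prints: Project_name-alice
```

This is also why per-user folders such as project_name-username show up in the artifact directory.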

2. Function

Functions are small packages that we write to execute the individual steps of our pipeline. These steps include, but are not limited to, fetching data, transforming data, training multiple models, and testing. Below is a simple example of a function that fetches data from MongoDB Atlas.

A function can be created in four different ways:

  • mlrun.new_function
  • mlrun.code_to_function
  • mlrun.import_function
  • mlrun.function_to_module

We define a simple Python function. We can store this function in a source file and use mlrun.code_to_function to create a function object from it.

from os import path

import pandas as pd
import pymongo
from mlrun.datastore import DataItem
from mlrun.execution import MLClientCtx


def fetch_data(context: MLClientCtx, data_path: DataItem):
    context.logger.info('Reading data from {}'.format(data_path))
    # Connect to MongoDB Atlas, then select the database and the collection
    m_client = pymongo.MongoClient("Mention The Link of Your MongoDB Client Here")
    m_db = m_client["DB_name"]
    db_cm = m_db["DB_name"]
    # Load every document in the collection into a DataFrame
    df = pd.DataFrame.from_records(db_cm.find())
    suicide_dataset = df
    target_path = path.join(context.artifact_path, 'data')
    context.logger.info('Saving datasets to {} ...'.format(target_path))
    # Store the data sets in your artifacts database
    context.log_dataset('suicide_dataset', df=suicide_dataset, format='csv',
                        index=False, artifact_path=target_path)
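A handler like this only touches a few members of the context: the logger, artifact_path, and log_dataset. That makes it possible to exercise the logging and artifact steps without MLRun or MongoDB by passing a stand-in object. The following is a minimal sketch under that assumption; `StubLogger`, `StubContext`, and `save_records` are hypothetical names introduced for illustration, and a plain list of dicts stands in for the DataFrame.

```python
import os


class StubLogger:
    """Collects log messages instead of printing them."""

    def __init__(self):
        self.messages = []

    def info(self, message):
        self.messages.append(message)


class StubContext:
    """Hypothetical stand-in exposing only the members the handler uses."""

    def __init__(self, artifact_path):
        self.artifact_path = artifact_path
        self.logger = StubLogger()
        self.logged = {}

    def log_dataset(self, key, df, **kwargs):
        # Record the call so a test can inspect what would have been stored
        self.logged[key] = {"df": df, **kwargs}


def save_records(context, records):
    """Simplified handler: same logging and artifact steps, no database."""
    target_path = os.path.join(context.artifact_path, "data")
    context.logger.info("Saving datasets to {} ...".format(target_path))
    context.log_dataset("dataset", df=records, format="csv",
                        index=False, artifact_path=target_path)


ctx = StubContext(artifact_path="artifacts")
save_records(ctx, [{"year": 2000, "count": 3}])
print(ctx.logged["dataset"]["artifact_path"])
```

Keeping the database access separate from the logging steps in this way makes each part easier to test on its own.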

3. Run

When a function is executed, all information about the execution is stored in an object known as the run object. This object is created whenever you run a function, and it stores the function attributes (such as arguments, inputs, and outputs), the results, and the logs of the executed function.

We first define the function object. This function object can be used to execute any of the functions defined in the source code:

func_obj = mlrun.code_to_function(name='f_obj', kind='job', filename='Path of the Source code')

fetch_data_run_obj = func_obj.run(handler='fetch_data', inputs={'data_path': 'Mention Path of the DATA CSV'}, local=True)

We use this object to run our function. In handler we pass the name of the function to execute, and in inputs we pass the function's arguments.

fetch_data_run_obj.outputs

This returns the outputs of the run (its results and the paths of its logged artifacts), in this case the fetched dataset.
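Besides outputs, a run object carries its name and state. The sketch below gathers these into one dictionary, assuming the run object exposes metadata.name, status.state, and outputs; `summarize_run` is a hypothetical helper, and the stand-in object with placeholder values exists only so the example runs without MLRun.

```python
from types import SimpleNamespace


def summarize_run(run):
    """Collect the fields of a run object that are most often inspected."""
    return {
        "name": run.metadata.name,
        "state": run.status.state,
        "outputs": dict(run.outputs),
    }


# Stand-in with the same attribute layout; all values are placeholders
fake_run = SimpleNamespace(
    metadata=SimpleNamespace(name="example-run"),
    status=SimpleNamespace(state="completed"),
    outputs={"suicide_dataset": "<artifact path>"},
)
print(summarize_run(fake_run))
```

With a real run, you would pass fetch_data_run_obj instead of the stand-in.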

4. Artifact

Artifacts are data objects (such as data sets, graphs, pickle files, and models) that are produced or used by functions, runs, and workflows. We pass an artifact directory name, which is the directory where you want to store your data. The directory structure is given below:

─── Artifact directory
    ├── Data
    ├── data (all your datasets)
    ├── model (saved model and model config)
    ├── artifacts/project_name-username (contains all your artifact data)
    ├── functions/project_name-username (contains all your function data)
    └── runs/project_name-username (contains all your run object data)
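To check what a run actually wrote under your artifact directory, you can walk it with the standard library. This is a minimal sketch; `list_tree` is a helper introduced here for illustration, and the folder and file names in the demo are placeholders created in a temporary directory.

```python
import os
import tempfile


def list_tree(root):
    """Return sorted relative paths of all directories and files under root."""
    entries = []
    for current, dirs, files in os.walk(root):
        for name in dirs + files:
            full = os.path.join(current, name)
            entries.append(os.path.relpath(full, root))
    return sorted(entries)


# Demo on a throwaway directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data"))
    os.makedirs(os.path.join(root, "model"))
    open(os.path.join(root, "data", "dataset.csv"), "w").close()
    tree = list_tree(root)
    for entry in tree:
        print(entry)
```

Pointing list_tree at your own artifact directory shows the datasets, models, and run data stored there.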
