Sagemaker simplifies the deployment of machine learning models. Once a model is trained, you call Sagemaker with some input data to get predictions. Sagemaker provides four inference options, each with different cost and limit trade-offs: real-time, serverless, asynchronous, and batch inference.
SageMaker Batch Transform is a batch inference service used to run ML models on existing datasets, saving results directly to S3. It’s suitable for scenarios that don’t require low latency and need to process large volumes of data in batches.
Using Batch Transform eliminates the need to maintain a persistent endpoint; it is a good fit when you only need to run inference on a dataset at specific times. Sagemaker Batch Transform is a serverless, scalable, and cost-effective way to run batch inference on large datasets.
=> Sagemaker Batch Transform = ECS + S3
Running one inference job for one model:
The minimum configuration to create a Sagemaker Batch Transform job is as follows:
import boto3

sagemaker_client = boto3.client("sagemaker")

request = {
    "TransformJobName": batch_job_name,
    "ModelName": model_name,
    "MaxPayloadInMB": payload_size,
    "BatchStrategy": "MultiRecord",
    "MaxConcurrentTransforms": max_concurrent_transforms,
    "Environment": environment_variables,
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_path,
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",
        "CompressionType": "None",
    },
    "TransformOutput": {
        "S3OutputPath": output_s3_path,
    },
    "TransformResources": {
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
    },
}

sagemaker_client.create_transform_job(**request)
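create_transform_job returns immediately; the job itself runs asynchronously. A minimal polling helper might look like the sketch below, built on describe_transform_job (the client and job name are the ones assumed above):

```python
import time

def wait_for_transform_job(client, job_name, poll_seconds=30):
    """Poll a batch transform job until it reaches a terminal state."""
    while True:
        desc = client.describe_transform_job(TransformJobName=job_name)
        status = desc["TransformJobStatus"]
        if status in ("Completed", "Failed", "Stopped"):
            # FailureReason is only present when the job failed.
            return status, desc.get("FailureReason")
        time.sleep(poll_seconds)
```

Usage: `status, reason = wait_for_transform_job(sagemaker_client, batch_job_name)`. For production code, boto3 waiters are an alternative to hand-rolled polling.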
The configuration specifies, among other things, the BatchStrategy, which can be "SingleRecord" (one record per request) or "MultiRecord" (as many records as fit within MaxPayloadInMB per request).
Below are the steps that AWS Sagemaker performs during the batch transform process:
Initialize EC2
Sagemaker launches EC2 instances according to the parameters specified in TransformResources.
Set up environment using Docker image
The instances are configured with the Docker image that the Model object references.
Load environment variables
Sagemaker loads the environment variables from the Environment parameter into the instance. The Model object may also define its own environment variables; if the same variable is set both on the Model and in the create_transform_job request, the value from create_transform_job takes precedence.
Health check
Sagemaker sends ping requests to the container's /ping endpoint to check that the instance is ready.
Determine input data
Through the TransformInput parameter, Sagemaker determines the data to be processed.
Perform inference
Based on the MaxConcurrentTransforms, BatchStrategy, and MaxPayloadInMB parameters, Sagemaker starts sending input data to the /invocations endpoint of the instance to perform predictions. If these parameters are not set in create_transform_job(), Sagemaker tries to read them from the /execution-parameters endpoint of the container.
Save prediction results
All results are saved according to the configuration in the TransformOutput parameter.
Stop instance
After completion, Sagemaker stops the instances.
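The container contract implied by these steps can be sketched with the standard library alone. The "model" below is a stub that returns 0.5 for every input line; it stands in for real inference logic and is not a Sagemaker API:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Sagemaker health check: answer 200 on /ping once the model is ready.
        self.send_response(200 if self.path == "/ping" else 404)
        self.end_headers()

    def do_POST(self):
        # Sagemaker sends each mini-batch to /invocations.
        if self.path != "/invocations":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers["Content-Length"])
        body = self.rfile.read(length).decode("utf-8")
        # Stub model: emit one prediction line per input CSV line.
        payload = "\n".join("0.5" for _ in body.splitlines()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        # Silence per-request logging.
        pass

def serve(port=8080):
    """Block forever serving the Sagemaker container contract."""
    HTTPServer(("", port), InferenceHandler).serve_forever()
```

In a real container this server would run behind the image's entrypoint; frameworks such as Flask or FastAPI are more common choices than http.server.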
Sagemaker splits input data blindly, without considering its structure. This can cause errors if the data has a header row: only the first split retains the header while the rest lose it, which makes the Batch Transform job fail.
Solutions:
Avoid using feature names during inference; just keep the column order identical to the training data.
Manually split the data into multiple files, each with its own header, then provide these files to Batch Transform (each input file produces a separate output file).
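The second workaround can be sketched as follows; split_csv_with_header is a hypothetical helper, not a Sagemaker utility:

```python
import os

def split_csv_with_header(src_path, out_dir, rows_per_file):
    """Split a CSV into several files, repeating the header row in each."""
    os.makedirs(out_dir, exist_ok=True)
    with open(src_path) as src:
        header = src.readline()
        rows = src.readlines()
    written = []
    for i in range(0, len(rows), rows_per_file):
        part_path = os.path.join(out_dir, f"part-{i // rows_per_file:05d}.csv")
        with open(part_path, "w") as part:
            part.write(header)           # every part keeps the header
            part.writelines(rows[i:i + rows_per_file])
        written.append(part_path)
    return written
```

Upload the resulting files under a common S3 prefix and point S3Uri at that prefix; Batch Transform writes one output file per input file.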
Batch Transform does not support CSV input that contains newline characters inside a cell.
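A quick pre-flight check for that limitation, sketched with the standard csv module (the helper name is illustrative):

```python
import csv

def has_embedded_newlines(path):
    """Return True if any CSV cell in the file contains a newline."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if any("\n" in cell or "\r" in cell for cell in row):
                return True
    return False
```

Run it before uploading to S3 and clean or drop the offending rows, since the job would otherwise fail at inference time.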
You can adjust mini-batch size by configuring BatchStrategy and MaxPayloadInMB parameters.
The MaxPayloadInMB value must not exceed 100 MB.
If you also set the MaxConcurrentTransforms parameter, make sure that:
MaxConcurrentTransforms × MaxPayloadInMB ≤ 100 MB
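That constraint can be checked before submitting the job; this helper is illustrative, not part of the Sagemaker SDK:

```python
def validate_transform_limits(max_payload_mb, max_concurrent_transforms):
    """Raise ValueError if the batch transform payload limits are violated."""
    if max_payload_mb > 100:
        raise ValueError("MaxPayloadInMB must not exceed 100 MB")
    total = max_payload_mb * max_concurrent_transforms
    if total > 100:
        raise ValueError(
            f"MaxConcurrentTransforms x MaxPayloadInMB = {total} MB exceeds 100 MB"
        )
```

Calling it with the values destined for the create_transform_job request fails fast locally instead of waiting for the job to error out.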