I am trying to follow this notebook to deploy an image-processing model to a SageMaker endpoint on an ml.g4dn.xlarge instance, and found that adding image preprocessing via an entry-point script is much slower. Please consider the two cases below. In both cases I am using the same TensorFlow SavedModel and the same b64-encoded image(s).
Setup 1:
- SageMaker Notebook instance ml.g4dn.xlarge. Load the model using
reconstructed_model = keras.models.load_model()
- Decode the jpeg image and do some preprocessing to produce numpy arrays
- Call
reconstructed_model.predict(). This call returns in ~300-400 ms.
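The fast local path above can be sketched roughly as follows. The decode/preprocess details (resize target, scaling to [0, 1]) are assumptions for illustration; only the `keras.models.load_model()` / `predict()` calls come from the issue:

```python
import base64
import io

import numpy as np
from PIL import Image  # used here to decode the b64-encoded JPEG


def preprocess(b64_jpeg, target_size=(224, 224)):
    """Decode a b64-encoded JPEG into a batched float32 array.

    target_size and the [0, 1] scaling are illustrative assumptions;
    the real model's expected input shape may differ.
    """
    img = Image.open(io.BytesIO(base64.b64decode(b64_jpeg)))
    img = img.convert("RGB").resize(target_size)
    arr = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
    return np.expand_dims(arr, axis=0)  # add a batch dimension


# In the notebook (as in the issue):
# reconstructed_model = keras.models.load_model("path/to/saved_model")
# batch = preprocess(b64_image)
# preds = reconstructed_model.predict(batch)  # ~300-400 ms on ml.g4dn.xlarge
```

Because the model stays resident in the notebook process, each `predict()` call pays only the preprocessing plus GPU inference cost.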
Setup 2:
- SageMaker Notebook instance ml.g4dn.xlarge. Upload the model artifacts to S3
- Create inference.py to decode the jpeg image and do some preprocessing to produce numpy arrays
- Create the model
sm_model = TensorFlowModel(model_data=model_data, entry_point='inference.py', source_dir='src', framework_version="2.4.1", env={"SAGEMAKER_REQUIREMENTS": "requirements.txt"}, role=role)
uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge', endpoint_name='g4dn-xlarge-endpoint')
- Call predict
uncompiled_predictor.predict(). This takes ~11-12 seconds to return.
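For reference, the inference.py used in Setup 2 follows the handler pair the SageMaker TensorFlow Serving container invokes. The `input_handler(data, context)` / `output_handler(data, context)` names and signatures are the container's documented interface; the preprocessing body itself is a sketch, not the exact script from this issue:

```python
# src/inference.py -- sketch of the pre/post-processing handlers.
import base64
import io
import json

import numpy as np
from PIL import Image


def input_handler(data, context):
    """Decode the b64 JPEG payload into the JSON TF Serving expects.

    `data` is a file-like request body; the {"b64": ...} payload key is
    an assumption about the request format.
    """
    payload = json.loads(data.read().decode("utf-8"))
    img = Image.open(io.BytesIO(base64.b64decode(payload["b64"])))
    arr = np.asarray(img.convert("RGB"), dtype=np.float32) / 255.0
    # tolist() serializes the whole array into the JSON request body,
    # which can be large for full-resolution images.
    return json.dumps({"instances": [arr.tolist()]})


def output_handler(data, context):
    """Pass the model server's JSON response through unchanged."""
    return data.content, context.accept_header
```

The container calls input_handler, forwards its return value to the TensorFlow Serving REST endpoint, and then calls output_handler on the response, so the gap measured below falls in that middle hop.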
From the CloudWatch logs, the majority of the time (~8 seconds) is spent after input_handler returns and before output_handler is invoked. The logs also indicate that the GPU is being used.
Screenshots or logs

System information
A description of your system. Please provide:
- Toolkit version: 11.0
- Framework version: 2.4.1
- Python version: 3.7
- CPU or GPU: GPU
- Custom Docker image (Y/N): N