I am trying to follow this notebook to deploy an image-processing model to a SageMaker endpoint on an ml.g4dn.xlarge instance, and found that adding image preprocessing via an entry-point script is much slower. Please consider the two cases below. In both cases I am using the same TensorFlow SavedModel and the same b64-encoded image(s).
Setup 1:
- SageMaker Notebook instance ml.g4dn.xlarge. Load the model using
reconstructed_model = keras.models.load_model()
- Decode the jpeg image and do some preprocessing to produce numpy arrays
- Call
reconstructed_model.predict(). This call returns in ~300-400 ms.
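The fast local path above can be sketched roughly as follows. The decode/preprocess details (resize target, scaling to [0, 1]) are assumptions for illustration; only the `keras.models.load_model()` / `predict()` calls come from the issue:

```python
import base64
import io

import numpy as np
from PIL import Image  # used here to decode the b64-encoded JPEG


def preprocess(b64_jpeg, target_size=(224, 224)):
    """Decode a b64-encoded JPEG into a batched float32 array.

    target_size and the [0, 1] scaling are illustrative assumptions;
    the real model's expected input shape may differ.
    """
    img = Image.open(io.BytesIO(base64.b64decode(b64_jpeg)))
    img = img.convert("RGB").resize(target_size)
    arr = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
    return np.expand_dims(arr, axis=0)  # add a batch dimension


# In the notebook (as in the issue):
# reconstructed_model = keras.models.load_model("path/to/saved_model")
# batch = preprocess(b64_image)
# preds = reconstructed_model.predict(batch)  # ~300-400 ms on ml.g4dn.xlarge
```

Because the model stays resident in the notebook process, each `predict()` call pays only the preprocessing plus GPU inference cost.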
Setup 2:
- SageMaker Notebook instance ml.g4dn.xlarge. Upload the model artifacts to S3
- Create inference.py to decode the jpeg image and do some preprocessing to produce numpy arrays
- Create the model
sm_model = TensorFlowModel(model_data=model_data, entry_point='inference.py', source_dir='src', framework_version="2.4.1", env={"SAGEMAKER_REQUIREMENTS": "requirements.txt"}, role=role)
uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge', endpoint_name='g4dn-xlarge-endpoint')
- Call predict
uncompiled_predictor.predict(). This takes ~11-12 seconds to return.
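For reference, the inference.py used in Setup 2 follows the handler pair the SageMaker TensorFlow Serving container invokes. The `input_handler(data, context)` / `output_handler(data, context)` names and signatures are the container's documented interface; the preprocessing body itself is a sketch, not the exact script from this issue:

```python
# src/inference.py -- sketch of the pre/post-processing handlers.
import base64
import io
import json

import numpy as np
from PIL import Image


def input_handler(data, context):
    """Decode the b64 JPEG payload into the JSON TF Serving expects.

    `data` is a file-like request body; the {"b64": ...} payload key is
    an assumption about the request format.
    """
    payload = json.loads(data.read().decode("utf-8"))
    img = Image.open(io.BytesIO(base64.b64decode(payload["b64"])))
    arr = np.asarray(img.convert("RGB"), dtype=np.float32) / 255.0
    # tolist() serializes the whole array into the JSON request body,
    # which can be large for full-resolution images.
    return json.dumps({"instances": [arr.tolist()]})


def output_handler(data, context):
    """Pass the model server's JSON response through unchanged."""
    return data.content, context.accept_header
```

The container calls input_handler, forwards its return value to the TensorFlow Serving REST endpoint, and then calls output_handler on the response, so the gap measured below falls in that middle hop.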
From the CloudWatch logs, the majority of the time (~8 seconds) is spent after input_handler returns and before output_handler is invoked. The logs also indicate that the GPU is being used.
Screenshots or logs

System information
A description of your system. Please provide:
- Toolkit version: 11.0
- Framework version: 2.4.1
- Python version: 3.7
- CPU or GPU: GPU
- Custom Docker image (Y/N): N