Skip to content

Response Types

custom response

https://fastapi.tiangolo.com/advanced/custom-response/

FileResponse

will save the file on disk and return a path.

JSONResponse

**caveat**: For file larger than 250MB, when using pd.read_json will get error: Could not reserve memory block

Option 1: default

The default one useing jsonable_encoder (return list of jsons) is at least 5x slower than df.to_json.

resp = df.fillna('').to_dict(orient='records')                        #default response
resp = JSONResponse(jsonable_encoder(df.fillna('').to_dict(orient='records'))) #default equvalent

Option 2: df.to_json

5 - 10x faster than default

content = df.fillna('').to_json(orient='records', date_format='iso', date_unit='s') #df.to_json
resp = Response(content, media_type="application/json")

resp.headers['Accept-Encoding'] = 'gzip'  #seems not required for gzip compression!

Performance for json

content = df.fillna('').to_json(orient='records', date_format='iso', date_unit='s')
resp = Response(content, media_type="application/json")

headers = {"Accept":"application/json"}
df = pd.read_json(url, storage_options=headers)                                # 8.996
df = pd.DataFrame.from_dict(requests.get(url, headers=headers).json())         #12.830
df = pd.read_json(io.BytesIO(requests.get(url, headers=headers).content))      #13.907
df = pd.read_json(requests.get(url, headers=headers).content.decode('utf-8'))  #15.657

Response

Response is designed to handle complete responses that are generated in one go. Therefore, Response doesn't have a built-in way to directly accept an io.BytesIO object.

Use Response if cache is required (much faster than StreamingResponse)

  • Response only supports string, json data or bytes data

  • io.BytesIO.getvalue() gets the file like object (in memory) content as bytes

Performance for parquet

bio = io.BytesIO(df.to_parquet(compression='brotli')).getvalue()
bio = io.BytesIO(df.to_parquet(compression=None)).getvalue() #compared to compression, faster but slower for cached data
resp = Response(bio, media_type="bytes/parquet")             #Response only support string or bytes
resp.headers["Content-Disposition"] = 'attachment; filename=data.parquet'

headers = {"Accept": "bytes/parquet"}
df = pd.read_parquet(url, storage_options=headers)                                #1.120
df = pq.read_pandas(io.BytesIO(requests.get(url, headers=headers).content))       #2.125
df = pd.read_parquet(pa.BufferReader(requests.get(url, headers=headers).content)) #2.273
df = pd.read_parquet(io.BytesIO(requests.get(url, headers=headers).content))      #2.550

StreamingResponse

caveat cannot directly work with aiocache import cached

When send a large amount of data, e.g., 50 MB, through API, we might get timeout, other network issues for downloading such a data from the server.

Streaming response will ensure the data being downloaded chunk by chunk to avoid these issues.

io.BytesIO is a class that allows you to create an in-memory buffer for binary data, which can be used to read or write data in chunks. This makes it a natural fit for use with StreamingResponse.

requires an iterator object to send the results in chunks.

https://cloudbytes.dev/snippets/received-return-a-file-from-in-memory-buffer-using-fastapi

return a parquet file (similar performance to json)

Note: StreamingResponse is very slow compared to Response.

Due to the content BytesIO not typing.AsyncIterable? https://github.com/tiangolo/fastapi/issues/2302

#bio = io.BytesIO()
#df.to_parquet(bio)
#bio.seek(0) #can use .getvalue() .getbuffer().nbytes
bio = io.BytesIO(df.to_parquet(compression='brotli'))
resp = StreamingResponse(bio, media_type="bytes/parquet")
resp.headers["Content-Disposition"] = 'attachment; filename=data.parquet'
df = pd.read_parquet(io.BytesIO(req.content))

Solution: Convert the content to async now ony a little slower than using Response.

async def iterfile():
    with bio as f:
        while True:
            chunk = f.read(1024) #chunk_size is 1024 bytes (1 KB to 1 MB good enough)
            if not chunk:
                break
            yield chunk
resp = StreamingResponse(iterfile(), media_type="bytes/parquet")

return an image

https://stackoverflow.com/questions/55873174/how-do-i-return-an-image-in-fastapi

byte_im = BytesIO()
image.save(byte_im, 'JPEG')
byte_im.seek(0)   #get byte_im size: byte_im.getbuffer().nbytes

return StreamingResponse(byte_im, media_type='image/jpeg')

Another solution: faster?

https://stackoverflow.com/questions/66223811/how-to-increase-transfer-speed-when-posting-an-image-to-a-rest-api

data = {'shape': image.shape, 'img': base64.b64encode(image.tobytes())}
image = np.frombuffer(base64.b64decode(data.img)).reshape(data.shape)

api get

https://stackoverflow.com/questions/73564771/fastapi-is-very-slow-in-returning-a-large-amount-of-json-data

To prevent browser show large amount of data

  • set Content-Disposition header to Response using the attachment parameter and passing a filename

Performance

  • parquet (Response) is 2-4x faster than json (Response)

  • json (Response) is 2x (5MB, 1x 50MB) faster than parquet (StreamingResponse, defaul chunk size), will return bytes

  • json is 8x faster than csv. Why csv is slow - not well compressed compared to json and require more time to parse?

  • another test indicates that json has similar or slower performance than csv

import io
import requests
import pandas as pd

@router.get("/get_df",
    status_code=status.HTTP_200_OK,
    description='Test api get',
)
async def get_df(
    request: Request,
):
    df = pd.DataFrame([['i',1],['j', 2]], columns=['k', 'v'])

    if request.headers.get('Accept') == 'text/csv':
        content = io.StringIO(df.to_csv(index=False))
        resp = StreamingResponse(content, media_type='text/csv')
        resp.headers['Content-Disposition'] = 'attachment; filename=data.csv'
    elif request.headers.get('Accept') == 'bytes/parquet':
        content = io.BytesIO(df.to_parquet(compression='brotli'))
        resp = StreamingResponse(content, media_type='bytes/parquet')
        resp.headers['Content-Disposition'] = 'attachment; filename=data.parquet'
    else:
        content = df.fillna('').to_json(orient='records', date_format='iso', date_unit='s')
        resp = Response(content, media_type='application/json')
    return resp

def url_to_df(url, header_type):
    if header_type == 'csv':
        return pd.read_csv(url, storage_options={'Accept':'text/csv'})
    elif header_type == 'parquet':
        return pd.read_parquet(url, storage_options={'Accept':'bytes/parquets'})
    else: #json
        return pd.read_json(url, storage_options={'Accept':'application/json'})

def req_to_df(url, header_type):
    if header_type == 'csv':
        req = requests.get(url, headers={"Accept":"text/csv"})
        return pd.read_csv(io.StringIO(req.content.decode('utf-8')))
    elif header_type == 'parquet':
        req = requests.get(url, headers={"Accept":"bytes/parquet"})
        return pd.read_parquet(io.BytesIO(req.content))
    else: #json
        req = requests.get(url, headers={'Accept':'application/json'})
        return pd.DataFrame.from_dict(req.json()))

def api_get(url, header_type='json', faster=True):
    if faster: #30% faster
        df = url_to_df(url, header_type)
    else:
        df = req_to_df(url, header_type)
    return df