```dockerfile
# Copy your application code to the container
COPY . /app

# Set the GPU environment variables
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# set USTC mirror for apt
RUN sed -i 's@//.*archive.ubuntu.com@//mirrors.ustc.edu.cn@g' /etc/apt/sources.list
# update sources
RUN apt-get update
# install git for model download, and iputils-ping for ping tests
RUN apt-get install -y iputils-ping git
# update pip
RUN python -m pip install --upgrade pip
# set up PyPI mirror
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# install the dependencies from the requirements file listed above
RUN pip install -r requirements.txt
RUN pip install "ray[serve]" requests torch diffusers
# install huggingface transformers
RUN pip install git+https://github.com/huggingface/transformers

# Set the entry point command (modify as per your needs)
CMD ["bash"]
```
Then build the image from the Dockerfile:
```bash
docker build -t ray_test_image .
```
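Before wiring up the cluster, it can be worth checking that containers from this image can actually see the GPUs. A quick hedged check, assuming the NVIDIA Container Toolkit is installed on the host (with `utility` in the driver capabilities, `nvidia-smi` is mounted into the container automatically):

```bash
# one-off container: should print the GPU table, then exit
docker run --rm --gpus all ray_test_image nvidia-smi
```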
Start head and workers:
```bash
# Start head node
docker run --name head -d -t -p 6379:6379 -p 8265:8265 --network ray-network ray_test_image

# Fetch head node IP
HEAD_IP=`docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' head`

# Start Ray on the head node and attach the workers
docker exec head sh -c "ray start --head --num-gpus=0 --num-cpus=12"
docker exec worker0 sh -c "ray start --address=\"$HEAD_IP:6379\""
docker exec worker1 sh -c "ray start --address=\"$HEAD_IP:6379\""
docker exec worker2 sh -c "ray start --address=\"$HEAD_IP:6379\""
docker exec worker3 sh -c "ray start --address=\"$HEAD_IP:6379\""
```
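The commands above assume that the `ray-network` bridge network and the worker containers already exist. If you are starting from scratch, a minimal sketch of creating them (the names `worker0`..`worker3` are taken from the `docker exec` calls above; `--gpus all` is an assumption, adjust to your hardware):

```bash
# create the user-defined bridge network the containers attach to
docker network create ray-network

# start four worker containers on the same network
for i in 0 1 2 3; do
    docker run --name worker$i -d -t --gpus all --network ray-network ray_test_image
done
```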
Port 6379 is the Ray cluster (GCS) address that workers and clients connect to, and port 8265 serves the Ray dashboard (open http://localhost:8265 in a browser).
After the head node and the worker nodes are started in their containers, we can check the Ray cluster status from the host (since port 6379 was published earlier, the host can reach the cluster directly):
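For example, assuming Ray is also installed on the host:

```bash
# query the cluster through the published GCS port
ray status --address 127.0.0.1:6379
```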
A refresh script might be helpful when the head node is down:
```bash
# Stop Ray on every node
docker exec head sh -c "ray stop"
docker exec worker0 sh -c "ray stop"
docker exec worker1 sh -c "ray stop"
docker exec worker2 sh -c "ray stop"
docker exec worker3 sh -c "ray stop"

# Fetch head node IP
HEAD_IP=`docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' head`

# Restart Ray on the head node and re-attach the workers
docker exec head sh -c "ray start --head --num-gpus=0 --num-cpus=12"
docker exec worker0 sh -c "ray start --address=\"$HEAD_IP:6379\""
docker exec worker1 sh -c "ray start --address=\"$HEAD_IP:6379\""
docker exec worker2 sh -c "ray start --address=\"$HEAD_IP:6379\""
docker exec worker3 sh -c "ray start --address=\"$HEAD_IP:6379\""
```
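If you want this to happen automatically, a hypothetical watchdog loop on the host could save the commands above as `refresh.sh` and re-run them whenever the head stops responding (the 60-second interval is arbitrary):

```bash
# hypothetical watchdog: re-run refresh.sh whenever the head node is unreachable
while true; do
    if ! docker exec head ray status > /dev/null 2>&1; then
        bash refresh.sh
    fi
    sleep 60
done
```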
## Serve Models and Send Requests
Here is an example adapted from the official Ray docs: we use a simple Stable Diffusion service to show how Ray Serve works.
First we copy the serve script (stable.py) and the request script (request.py) to the head container; the exact commands are shown at the end of this section.
The deployment configuration is the most important part; its two key fields are described below, and a sketch follows the list:

- `ray_actor_options` lists the required resources for each replica. In addition to the number of GPUs, you can also set the number of CPUs, memory, accelerator type, and more. For more information, refer to the Ray Actor Options documentation.
- `autoscaling_config` specifies the autoscaling rules for Ray Serve. For more information, refer to the Autoscaling config documentation.
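For reference, here is a minimal sketch of what `stable.py` might look like, loosely following the Stable Diffusion example in the Ray Serve docs and assuming a recent Ray version where awaiting a deployment handle call returns the result directly; the model id, replica bounds, and resource numbers are illustrative, not prescriptive:

```python
from io import BytesIO

from fastapi import FastAPI
from fastapi.responses import Response
from ray import serve

app = FastAPI()


@serve.deployment(num_replicas=1)
@serve.ingress(app)
class APIIngress:
    def __init__(self, diffusion_handle):
        self.handle = diffusion_handle

    @app.get("/imagine")
    async def imagine(self, prompt: str) -> Response:
        # forward the prompt to a StableDiffusion replica
        png_bytes = await self.handle.generate.remote(prompt)
        return Response(content=png_bytes, media_type="image/png")


@serve.deployment(
    ray_actor_options={"num_gpus": 1},  # required resources per replica
    autoscaling_config={"min_replicas": 1, "max_replicas": 4},
)
class StableDiffusion:
    def __init__(self):
        import torch
        from diffusers import StableDiffusionPipeline

        # illustrative checkpoint; swap in whichever model you downloaded
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
        ).to("cuda")

    def generate(self, prompt: str) -> bytes:
        image = self.pipe(prompt).images[0]
        buf = BytesIO()
        image.save(buf, format="PNG")
        return buf.getvalue()


# `serve run stable:entrypoint` below refers to this bound application
entrypoint = APIIngress.bind(StableDiffusion.bind())
```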
The request script (request.py) fires a batch of concurrent requests at the endpoint:

```python
import asyncio

import requests


async def get_image(id):
    prompt = "a cute cat is dancing on the grass."
    input = "%20".join(prompt.split(" "))
    resp = requests.get(f"http://127.0.0.1:8000/imagine?prompt={input}")
    with open(f"output{id}.png", 'wb') as f:
        f.write(resp.content)


async def main():
    tasks = []
    for i in range(50):
        task = asyncio.create_task(get_image(i))
        tasks.append(task)
    responses = await asyncio.gather(*tasks)
    print(responses)


asyncio.run(main())
```
```bash
# copy files to head node
docker cp stable.py head:/app
docker cp request.py head:/app
docker exec -it head bash
# the following lines run inside the head container
cd /app
serve run stable:entrypoint
```
[!note] Do not terminate the server process above. Use tmux or other background tools to keep this process alive.
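Once the application is up, the request script can be run from a second shell on the host; a sketch, assuming the paths used above:

```bash
# in a second terminal on the host: run the client inside the head container
docker exec -it head sh -c "cd /app && python request.py"
# copy one of the generated images back to the host to inspect it
docker cp head:/app/output0.png .
```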