Can't start any two docker-compose environments after update to Windows 20H2 #140

Open
stephen-turner opened this issue Jul 28, 2021 · 54 comments
Assignees
Labels
bug Something isn't working Networking Connectivity and network infrastructure

Comments

@stephen-turner

This was originally filed by user @marcusschroeder as docker/for-win#9999, but as it can be reliably reproduced on Windows 20H2 (maybe 2004?) but not 1903, and as it only happens with Windows containers, we believe that it is a bug in the Windows container code.

Original report:

Expected behavior
On my computer I use a couple of docker-compose environments with Windows containers in parallel. Before the Windows update to 2004 or 20H2 it was no problem to start several environments with docker-compose up, either manually or programmatically. It didn't matter whether it was the same environment under a different name or a completely different one.

Actual behavior
Now, when starting any two docker compose environments, the second one gets stuck in start up until the first is stopped.

Information
Is it reproducible? Yes, even on another computer.
Is the problem new? Yes, it appeared after the Windows update from 1903 to 20H2
Did the problem appear with an update? Yes
Windows Version: Windows 10 Pro 20H2
Docker Desktop Version: 2.5.0.1 / 2.4.0.0 / 2.3.0.4 / 2.5.1.0 (experimental) / 3.0.0.0

The problem appeared after a Windows update from 1903 to 2004 or 20H2 respectively.

I have tried:

different Docker Desktop versions (see above) to no avail.
WSL2 and LCOW

Steps to reproduce the behavior
Use the following docker-compose.yml:

version: "2.4"


services:
  service_a:
    image: mcr.microsoft.com/windows/servercore/iis
    ports:
      - 80


  service_b:
    image: mcr.microsoft.com/windows/servercore/iis
    ports:
      - 80

1. Use Docker Desktop with the Windows container backend.
2. Start the first environment with the above docker-compose.yml: docker-compose -p first up -d
3. Wait until the environment is up (5-10 seconds).
4. Start the second environment: docker-compose -p second up -d
   At this point docker-compose appears to be stuck and one of the services shows status "created" in docker ps -a.
5. Stop the first environment in a separate terminal: docker-compose -p first down
6. Now the command from step 4 continues, the environment becomes healthy, and the service status is "running".

Furthermore:

If I remove the "ports" section from the docker-compose.yml, this issue does not occur.
Rolling back to Windows 1903 also fixed the issue, until the Windows update is applied again.
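
The reproduction steps above can be scripted. The following is a minimal Python sketch, not part of the original report. It assumes docker-compose is on PATH and that the docker-compose.yml shown above is in the current directory. The helper names (compose_cmd, run_with_timeout, reproduce), the 120-second timeout, and the 10-second wait are illustrative assumptions. A timeout on the second "up" is treated as the hang described in step 4.

```python
import subprocess
import time
from typing import List, Optional


def compose_cmd(project: str, *args: str) -> List[str]:
    """Build a docker-compose command line for a named project (-p)."""
    return ["docker-compose", "-p", project, *args]


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it did not finish in time."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None


def reproduce(timeout_s: float = 120.0) -> bool:
    """Return True if the second environment hangs while the first is up."""
    if run_with_timeout(compose_cmd("first", "up", "-d"), timeout_s) != 0:
        raise RuntimeError("first environment did not start cleanly")
    time.sleep(10)  # step 3: give the first environment time to come up
    try:
        second = run_with_timeout(compose_cmd("second", "up", "-d"), timeout_s)
        return second is None
    finally:
        # Tear down both projects so the host is left clean.
        for project in ("second", "first"):
            subprocess.run(compose_cmd(project, "down"))
```

Calling reproduce() on an affected host would be expected to return True; on Windows 1903, where the report says the problem does not occur, it should return False.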

My colleague @StefanScherer has reproduced it without docker compose, but with a second nat network as follows:

$ docker network create -d nat first
10851710aef0c9645393c78f5480cc9d8c2309b079e4d8d82f70c9c6f1ee064f
$ docker run -d --network first -p 8004:80 mcr.microsoft.com/windows/servercore/iis
e89d12002a50b22b3628ddaca1ac06e4b34728ea803c7d5250d84721ddf993bf
$ docker run -d --network first -p 8005:80 mcr.microsoft.com/windows/servercore/iis
93adddb1f6213d21d0b8138b3fd47784938c238fa75256c31f7291c8713c57aa

$ docker network create -d nat second
e30d9c556d386408137db45222dc47d989e4c0d49a7f3a051f56ee93fa18c912
$ docker run -d --network second -p 8006:80 mcr.microsoft.com/windows/servercore/iis
f05233268a2e826f79f323659bf1e3a15333cd5d440117e053ccf04ceaf1a2c8
$ docker run -d --network second -p 8007:80 mcr.microsoft.com/windows/servercore/iis
3ea1546125d05b655e89913b5dd6677e3aa58d2e2d6b4f0fe93b1ff091ebb08e

The Docker CLI for the last container does not return to the shell prompt, and in Docker Dashboard the fourth container is in the CREATED state.
When I kill one of the first containers (e89d12), the Docker CLI shows this error message:

docker: Error response from daemon: failed to create endpoint serene_bhabha on network second: failed during hnsCallRawResponse: hnsCall failed in Win32: The specified port already exists. (0x803b0013).

Port 8007 was not in use before.
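
For anyone collecting these failures from logs, the daemon error quoted above can be split into its parts. This is a minimal Python sketch written against that one quoted message only. The regular expression, the function name parse_hns_error, and the field names are assumptions for illustration; other HNS errors may be worded differently.

```python
import re
from typing import Dict, Optional

# Pattern modelled on the single error message quoted in this issue.
HNS_ERROR = re.compile(
    r"failed to create endpoint (?P<endpoint>\S+) on network (?P<network>\S+): "
    r".*?Win32: (?P<reason>.+?)\s*\((?P<code>0x[0-9a-fA-F]+)\)"
)


def parse_hns_error(message: str) -> Optional[Dict[str, str]]:
    """Extract endpoint, network, reason and error code, or None if no match."""
    match = HNS_ERROR.search(message)
    return match.groupdict() if match else None


message = (
    "docker: Error response from daemon: failed to create endpoint "
    "serene_bhabha on network second: failed during hnsCallRawResponse: "
    "hnsCall failed in Win32: The specified port already exists. (0x803b0013)."
)
info = parse_hns_error(message)
```

Here info["network"] is "second" and info["code"] is "0x803b0013", which makes it easy to group failures by network when several environments are started in parallel.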

@ghost ghost added the triage New and needs attention label Jul 28, 2021
@vrapolinario vrapolinario added bug Something isn't working and removed triage New and needs attention labels Jul 29, 2021
@vrapolinario
Contributor

I'll get this issue assigned to the proper team soon.

@ghost

ghost commented Aug 28, 2021

This issue has been open for 30 days with no updates. Please provide an update or close this issue.

@ghost

ghost commented Jun 18, 2022

This issue has been open for 30 days with no updates.
@vrapolinario, please provide an update or close this issue.

@fady-azmy-msft
Contributor

Sorry for the delay @stephen-turner, is this still an issue you're facing?

@stephen-turner
Author

Thanks for reaching out. I assume it's still present unless some effort has been made to fix it, but I'm not actually the original reporter; I just transferred it from docker/for-win#9999. The original reporter is @marcusschroeder.

@ghost

ghost commented Jul 30, 2022

This issue has been open for 30 days with no updates.
@vrapolinario, please provide an update or close this issue.

@vrapolinario
Contributor

I don't know what the status of this is since I moved teams. Changed the assignment to @fady-azmy-msft.

@ghost

ghost commented Sep 30, 2022

This issue has been open for 30 days with no updates.
@fady-azmy-msft, please provide an update or close this issue.

@fady-azmy-msft
Contributor

Apologies for the radio silence. We have been able to reproduce the issue and are looking into it. I've also created an internal ticket (41698313) to track this.

@ghost

ghost commented Dec 3, 2022

This issue has been open for 30 days with no updates.
@fady-azmy-msft, please provide an update or close this issue.

@microsoft-github-policy-service
Contributor

This issue has been open for 30 days with no updates.
@MikeZappa87, please provide an update or close this issue.

@ntrappe-msft
Contributor

Good news coming soon 🎉

Contributor

This issue has been open for 30 days with no updates.
@MikeZappa87, please provide an update or close this issue.

@jag-eagle-technology

@ntrappe-msft super excited to hear that good news - any updates?
We're keen to get this resolved as it's a blocker for our current CI/CD process & for the clients using our compose orchestrated software.

@ntrappe-msft
Contributor

Hi, I totally understand how important this is to fix asap. @MikeZappa87 do you have any updates? You know more about the status of this Issue than I do.

Contributor

This issue has been open for 30 days with no updates.
@MikeZappa87, please provide an update or close this issue.

@ezw21

ezw21 commented Jan 31, 2024

I'm also very keen to get this fixed. Has there been any progress? This is pretty core functionality.

Contributor

This issue has been open for 30 days with no updates.
@MikeZappa87, please provide an update or close this issue.

@ntrappe-msft ntrappe-msft added the triage New and needs attention label Apr 2, 2024
@ntrappe-msft
Contributor

Reassigning Issue to another dev.

@ntrappe-msft ntrappe-msft removed the triage New and needs attention label Apr 17, 2024
Contributor

This issue has been open for 30 days with no updates.
@grcusanz, @adrianm-msft, please provide an update or close this issue.

@sikhness

Good news coming soon 🎉

@ntrappe-msft, has there been any progress on this? It's been over half a year since "good news" was supposedly coming soon.

@adrianm-msft

@sikhness, we're actively working on this. It’s taking a bit longer than expected, will post any updates here.

@sikhness

Thanks @adrianm-msft, we're eagerly awaiting news!
Do you have any workarounds in the meantime? This feature is pretty fundamental.

@adrianm-msft

@sikhness, it looks like rolling back to Windows 1903 temporarily fixes the issue until a Windows update is applied again.

@sikhness

Thanks @adrianm-msft, this won't be possible in my environment as my host is running Windows Server 2022 and I don't think it can roll back to a version that old.
Eagerly awaiting the fix for this.

@Jens-G

Jens-G commented Jun 26, 2024

Using 2019 also helps. Not that I would recommend that, but indeed it still works there.
Being hit by it myself after moving my environment to Server 2022, so a fix would really be appreciated.

@marceliwac

marceliwac commented Jul 9, 2024

I would love to see this issue resolved as well. I've spent countless hours trying to debug the application logic and the docker setup on my Windows Server 2022 machine only to find that the problem is in fact the above.

For those looking for a temporary workaround (and I appreciate that this won't apply to more complex setups!), simplifying the networking stack in the compose file resolved all my issues with the containers sporadically hanging during startup, having to restart the docker service and, in some cases, the entire system. Specifically, I removed the custom networks and customised the default one instead, to avoid having multiple networks (the default network was being created automatically). I'm only mentioning this here to help those who have struggled with the same problem, in the hope that it gets more visibility online.
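
To make the workaround above concrete: a compose file can declare several top-level networks, or it can customise only the one named "default". The sketch below is a hedged Python illustration, not marceliwac's actual configuration. It assumes the compose file has already been loaded into a dict (for example with a YAML parser, not shown); the function name extra_networks, the service name, and the network names are hypothetical.

```python
from typing import Any, Dict, List


def extra_networks(compose: Dict[str, Any]) -> List[str]:
    """Return the names of top-level networks other than 'default', sorted."""
    networks = compose.get("networks") or {}
    return sorted(name for name in networks if name != "default")


# Hypothetical compose file with two custom networks.
before = {
    "services": {
        "web": {
            "image": "mcr.microsoft.com/windows/servercore/iis",
            "networks": ["front", "back"],
        }
    },
    "networks": {"front": {"driver": "nat"}, "back": {"driver": "nat"}},
}

# Same service, but only the default network is customised, so the
# project ends up with a single network.
after = {
    "services": {"web": {"image": "mcr.microsoft.com/windows/servercore/iis"}},
    "networks": {"default": {"driver": "nat"}},
}
```

A check like extra_networks(before) flags the compose files that would create more than one network per project, which is the situation the workaround tries to avoid.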

@sikhness

Hey @marceliwac,
Would you be able to provide an example of the workaround that you created? What I'm currently doing is using the nat network as an external network in my compose file, so that it attaches to the default nat network created by Windows Containers. The problem with that, though, is that when you restart your machine, Windows seems to recreate the default nat network each time, so its internal ID changes. Because of that, any compose files (and thus their already deployed containers) pointing to the ID from before the restart fail to start. So I currently have to run a script on each startup that brings down all of my compose configs and then recreates them (to pick up the new network ID on each boot).
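
The startup script described above is not shown in the thread. A minimal Python sketch of the same idea follows, under stated assumptions: docker-compose is on PATH, the list of compose files is supplied by the caller, and the function names recreate_commands and recreate_all are hypothetical. It brings every project down first and then brings each one up again, so that the containers attach to the freshly created nat network.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def recreate_commands(compose_files: Iterable[Path]) -> List[List[str]]:
    """Return all 'down' commands followed by all 'up -d' commands."""
    files = list(compose_files)
    downs = [["docker-compose", "-f", str(f), "down"] for f in files]
    ups = [["docker-compose", "-f", str(f), "up", "-d"] for f in files]
    return downs + ups


def recreate_all(compose_files: Iterable[Path]) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in recreate_commands(compose_files):
        subprocess.run(cmd, check=True)
```

Such a script could be registered to run at boot (for example as a scheduled task) and called as recreate_all([Path("a.yml"), Path("b.yml")]), where the file names are placeholders.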

@marceliwac

@sikhness I'm afraid the workaround I use wouldn't be of help in this case. All I do is change the network block of docker-compose.yml to customise the default network, rather than create a new one.

If I recall correctly, when looking for the solution to this issue I stumbled upon a few threads which mentioned the same (or adjacent) issues you are facing. The solution involved setting a static MAC address for the Hyper-V NAT virtual adapter. I'm sorry I cannot give you anything more concrete, but here are a few links that might be worth looking at. They are not exactly answers to your question but might give you some ideas:

https://superuser.com/questions/1701567/how-to-add-a-static-ip-adress-to-a-virtual-machine-in-hyper-v-to-stop-changing-t
https://superuser.com/questions/1815670/change-mac-adress-virtual-switch-windows-server-with-hyper-v

Contributor

This issue has been open for 30 days with no updates.
@grcusanz, @adrianm-msft, please provide an update or close this issue.

@Jens-G

Jens-G commented Sep 2, 2024

please provide an update

That would indeed be awesome. It seems that this breaks a lot of stuff for quite a few people.
