-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Cloudstack 9339: Virtual Routers do not handle Multiple Public Interfaces #1519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@dsclose I think it is better to split this PR into some isolated PRs, as the issues are isolated. |
|
@ustcweizhou How would you recommend I separate this? I can imagine separating the issues broadly into two parts:
I'm reticent to do this because the first PR would not allow multiple subnets to work on RvR setups. Do you agree with this separation and if so how should it be handled? |
| self.check_is_up() | ||
| self.set_mark() | ||
|
|
||
| if self.dnum != '0': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.dnum is hex, so this line always return true.
you can use ( self.dev != 'eth0') instead, or other
|
@dsclose rebase against master, squash changes to a single commit, thanks tag:needlove |
|
@rhtyd is there a reason you want him to rebase to master? We still support 4.7, so all fixes to that branch will be forward merged to 4.8 and master. I think this PR can stay open against 4.7. |
|
I am also fine with these being separate commits as they are functionally separate. |
|
@swill agree for keeping it against 4.7; but it would be great if @dsclose can squash the changes to a single commit as all of them solve for Cloudstack-9339 issue |
|
Currently I'm investigating @ustcweizhou suggestions above. He's quite correct about not adding the mark for eth0 and I think I've reproduced the problem he reported on NICs higher than eth3. Once we trigger the creation of eth4, the routing table Table_eth4 gets added multiple times - but only once with the fwmark limitation. This prevents any traffic being routed correctly on the VR - even to the point of not being able to connect to it from the HV. @swill @rhtyd - I'll squash and force push this soon. I want to incorporate the snippets given by @ustcweizhou and verify that they work first. |
|
Thank you for working on fixing this. 👍 |
|
That worked a treat. The suggestions made by @ustcweizhou resulted in a very clean set of IP rules and I was able to add IPs on eth4 and eth5 without breaking the router. I'll do a bit more testing tomorrow before squashing and force pushing. |
|
Has anyone tested this with VPC VRs as of yet? |
|
@kiwiflyer I have not. But it won't be worth the trouble until I incorporate the suggestions made by @ustcweizhou into the PR. |
bf285e1 to
17144a4
Compare
|
Squashed and force pushed. Tasks remaining:
Regarding the automated tests for the routers, where should I look for these? If I see some examples I should be able to adjust/add to them. |
|
Looks like the build environment isn't sufficient on jenkins-test-a20: Doing another force push. |
17144a4 to
48055e5
Compare
|
I rebased against the latest 4.7 before force pushing. Has an error been introduced along the way? |
|
There is an issue currently on master which I am trying to get sorted out, but there should not be a problem on |
|
@swill ok, thanks, good luck with the master. |
df1dc8d to
66d3039
Compare
CI RESULTSSummary of the problem(s): Associated Uploads
Uploads will be available until Comment created by |
|
@dsclose we have merge conflicts on this one now. Also, prior to merging the PRs that caused the conflicts, I ran the above CI. You will probably want to review the results of that CI run to fix some things. |
|
@dsclose if you can fix the merge conflicts I can run this again and see what is outstanding. Thanks... |
|
@swill taking a look now. |
66d3039 to
e7a63be
Compare
|
@dsclose sorry to do this to you. Can you close and reopen again to kick off the jobs again? Thanks... |
|
@swill no worries. Happy to do that - the CI output was vast and I wasn't looking forward to combing through it. |
|
I will retest this one. Thanks for kicking it off again, we are green now. :) |
|
I actually had a test running overnight for this one and it had similar results. I think this one will need some work still. I will run it again to see what we get as a result and I will post the next run so you know what the latest issues are. |
|
@swill agreed; in particular, my note concerning the merge of lines 299 and 300 of CsAddress.py - i've not even tested that locally - not had a chance to even figure out what it's doing! |
|
@dsclose we installed your updated commits d582358 and bf285e1 ONLY in our environment to solve the RvR issue (VR network services intermittent hang due to public interface is up on backup VR), however, it's not working as expected, the status of eth2 is still up on backup VR. Do we need install all the commits ? or it's probably our environment issue ? Besides, if we install all the commits, the port forwarding and VPN seems getting broken. We are investigating more. |
|
@luhaijiao I'd recommend trying (edit) c970a04 Port forwarding works for me. I've not tried the VPN functionality. |
e7a63be to
c970a04
Compare
ACS CI BVT RunSumarry: Link to logs Folder (search by build_no): https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0 Failed tests:
Skipped tests: Passed test suits: |
|
I'd recommend merging this if the tests are passing. There is an issue with hairpin NATs which can be solved by adding a route to the guest subnet (scope link) in each routing table. This is easy to do on Redundant routers (as routes are handled in one spot in the transition to master) but needs a bit more work on standalone routers - it looks like the guest subnet and device need to be passed down from CsIP.process all the way to CsAddress.post_change_config. Unfortunately I won't be available to do further work on this PR. |
|
We should get this tested and merged for 4.9 /cc @swill |
|
I tried hard to get this one in, and I just don't see it happening... |
|
@blueorangutan package |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-222 |
|
@murali-reddy @abhinandanprateek ping, please review this one as well, thanks. |
|
@lgtm on code review. |
|
Thanks @dsclose can you help review PR #1659 and see that all of your changes are ported too, in which you may close your PR. |
…non_vpc CLOUDSTACK-9339 Virtual Routers don't handle Multiple Public Interfaces correctlyAs pointed out in CLOUDSTACK-9339, in case of multiple public IP's from different public IP ranges are associated with VR, VR functionality is broken from 4.6. Below are the brief list of problems specific to non-VPC networks addressed in the PR. This PR handles both VPC and non-VPC scenarios. - reverse traffic for the connections accepted on the eth3 and above public interfaces are getting blocked. Need a rule for e.g "-A FORWARD -i eth3 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT" in the FORWARD chain of filter table to permit reverse path traffic for established connections. - outbound public traffic from eth0 to eth3 (or for interfaces above like eth4 eth5 etc) needs rule to run through FW_OUTBOUND chain in the filter table - network stats on public interfaces eth3 are getting gathered - default gateway is missing in the device specific routing table, resulting in traffic to be looked up in main routing table - creating a device specific route table is generating "from all lookup Table_eth3" in the ip rules, resulting in rest of the traffic getting blocked. Picked few commits from #1519 from dsclose (#1519) submitted for 4.7 Marvin tests are added to test below - Static NAT works on the public interfaces above eth2, in case non-vpc networks - Portforwarding works on the public interfaces above eth2, in case non-vpc networks - Route tables are configured as expected for the device specific table for the public interfaces above eth2, in case non-vpc networks - IP tables rules are as expected for the traffic from and to the public interfaces above eth2, in case non-vpc networks * pr/1659: CLOUDSTACK-9339 Virtual Routers don't handle Multiple Public Interfaces correctly Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
|
@murali-reddy I haven't worked on Cloudstack for many months but one thing I do recall; without d582358 and bf285e1, networks with redundant virtual routers will simply not work. Whether the commits are appropriate for the current codebase, however, is unclear. I have no way of testing them so I shall not be confusing matters by raising a speculative PR. I shall close this PR now. |
This PR addresses CLOUDSTACK-9339 and may need a code review from someone familiar with the System VM scripts. In particular, this PR has not been tested in a VPC RvR context. Only standalone routers and RvR routers have been demonstrated.
I expect that a number of tests may need to be modified/written as part of this PR. Any feedback or pointers would be useful as initially I'll be relying on the CI failures to tell me where to look.