@QuiteClose
Contributor

This PR addresses CLOUDSTACK-9339 and may need a code review from someone familiar with the System VM scripts. In particular, this PR has not been tested in a VPC RvR context. Only standalone routers and RvR routers have been demonstrated.

  • d582358: Leave public interfaces down in backup redundant routers. Previously backup routers were bringing all interfaces up and thus arping public IPs away from the master router.
  • 9ee1eb6: Add the default gateway to the main routing table when interfaces are configured. The gateway for the first public IP was always being added to the main routing table. Sometimes a router would consequently add the gateway for an IP other than the default source-NAT IP. This would prevent outbound connectivity for guest VMs.
  • ad9d72f: Add default gateway to device-specific routing tables. Link-level routes were being put into the device-specific routing tables (accessed via firewall marks) but these are unnecessary. Instead, the default gateway is needed to allow the kernel to make an appropriate routing decision.
  • 8db879e: Only mark guest connections when they are part of a static-NAT. Guest connections were being marked with a zero. This added no functionality and prevented static-NAT rules from routing outbound traffic properly as device-specific routing tables would not be used. Instead, all traffic would be routed out via the default public interface.
  • 788b1be: Allow forwarding and collect network stats on any public interface. Forwarding rules and network stats were limited to eth2 on RvR networks. This needed to be decoupled from eth2 and reapplied to whichever interface was under consideration.
  • b19e8aa: Ensure that CONNMARK --restore-mark only appears once. This is a bit of a hack and can do with being improved. The CONNMARK rule was not being picked up by the de-duplication logic in CsNetfilter and was being added twice. This caused checksum errors on packets traversing NAT.
  • bf285e1: Transition to master state should add all necessary routes. Now that backup routers keep their interfaces down, the route logic executed at configuration-time cannot be applied. Instead, once the interface is brought up during a transition to master, routers must re-evaluate what routes are needed and add them. Unfortunately I couldn't see a way to re-use the existing route logic with the variables that I had in scope so there is some duplication. In some cases, routers did not successfully arp IPs away from the old master so some arp logic was added. During a failover most connections with guest VMs will be maintained with only minor packet loss. SSH sessions implemented via port-forwarding rules on an interface other than the source-NAT interface consistently get dropped, however, so the failover isn't quite seamless. It's possible that there's an easy fix for that.
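
To make the routing changes in ad9d72f and 8db879e more concrete, here is a minimal sketch (not the code in this PR) of the commands a device-specific table needs: one `ip rule` that selects the table by firewall mark and one default route inside that table. The function name, its parameters, the example gateway address and the choice of the hex device number as the mark are assumptions made for illustration. The `Table_<dev>` naming follows the tables mentioned later in this thread and is assumed to be registered in the router's `rt_tables` file already.

```python
def device_table_commands(dev, gateway, mark):
    """Build the iproute2 commands for one public interface.

    `dev`, `gateway` and `mark` are hypothetical inputs, not names from the PR.
    """
    table = "Table_%s" % dev
    return [
        # Packets carrying this firewall mark are looked up in the device's table.
        "ip rule add fwmark %s table %s" % (mark, table),
        # The table needs a default gateway so the kernel can route outbound traffic.
        "ip route add default via %s dev %s table %s" % (gateway, dev, table),
    ]


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address, used here only as a placeholder.
    for command in device_table_commands("eth3", "192.0.2.1", hex(3)):
        print(command)
```

The sketch only builds command strings, so it can be inspected without touching a router's routing state.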

I expect that a number of tests may need to be modified/written as part of this PR. Any feedback or pointers would be useful as initially I'll be relying on the CI failures to tell me where to look.

@ustcweizhou
Contributor

@dsclose I think it is better to split this PR into several separate PRs, as the issues are isolated.
To be honest, some commits look good to me (as we have a similar fix in our production), others need testing.

@QuiteClose
Contributor Author

@ustcweizhou How would you recommend I separate this? I can imagine separating the issues broadly into two parts:

  1. Routing tables and iptables rules should be properly configured for multiple public NICs. This PR should allow all features to work on standalone virtual routers. To achieve this, we'd need commits 9ee1eb6, ad9d72f, 8db879e, 788b1be and b19e8aa.
  2. Public Interfaces should be down on backup RvRs. This PR would also need to ensure that the transition to master restores the proper routing tables and iptables rules. Therefore it would need commits d582358 and bf285e1.

I'm reluctant to do this because the first PR would not allow multiple subnets to work on RvR setups. Do you agree with this separation and, if so, how should it be handled?

self.check_is_up()
self.set_mark()

if self.dnum != '0':
Contributor

self.dnum is a hex string, so this condition is always true.
You can use (self.dev != 'eth0') instead, or another check.
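
The point above can be checked in isolation. The sketch below assumes that `dnum` is produced by applying `hex()` to the numeric suffix of the device name; that derivation and the helper name are assumptions for illustration, not a quotation of CsAddress.py.

```python
def device_number(dev):
    # Assumption: dnum comes from hex() of the interface's numeric suffix.
    return hex(int(dev[3:]))


dnum = device_number("eth0")

# hex(0) is the string '0x0', which never equals the string '0'.
print(dnum)
print(dnum != '0')

# Two checks that do distinguish eth0 from the other interfaces:
print(int(dnum, 16) != 0)
print("eth0" != "eth0")
```

Comparing the device name directly, as suggested, avoids depending on how the number is formatted.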

@rohityadavcloud
Member

@dsclose rebase against master, squash changes to a single commit, thanks

tag:needlove

@swill
Contributor

swill commented May 2, 2016

@rhtyd is there a reason you want him to rebase to master? We still support 4.7, so all fixes to that branch will be forward merged to 4.8 and master. I think this PR can stay open against 4.7.

@swill
Contributor

swill commented May 2, 2016

I am also fine with these being separate commits as they are functionally separate.

@rohityadavcloud
Member

@swill agree for keeping it against 4.7; but it would be great if @dsclose can squash the changes to a single commit as all of them solve for Cloudstack-9339 issue

@QuiteClose
Contributor Author

Currently I'm investigating @ustcweizhou's suggestions above. He's quite correct about not adding the mark for eth0 and I think I've reproduced the problem he reported on NICs higher than eth3.

Once we trigger the creation of eth4, the routing table Table_eth4 gets added multiple times - but only once with the fwmark limitation. This prevents any traffic being routed correctly on the VR - even to the point of not being able to connect to it from the HV.

@swill @rhtyd - I'll squash and force push this soon. I want to incorporate the snippets given by @ustcweizhou and verify that they work first.

@swill
Contributor

swill commented May 5, 2016

Thank you for working on fixing this. 👍

@QuiteClose
Contributor Author

That worked a treat. The suggestions made by @ustcweizhou resulted in a very clean set of IP rules and I was able to add IPs on eth4 and eth5 without breaking the router. I'll do a bit more testing tomorrow before squashing and force pushing.

@kiwiflyer
Contributor

Has anyone tested this with VPC VRs as of yet?

@QuiteClose
Contributor Author

QuiteClose commented May 5, 2016

@kiwiflyer I have not. But it won't be worth the trouble until I incorporate the suggestions made by @ustcweizhou into the PR.

@QuiteClose
Contributor Author

QuiteClose commented May 6, 2016

Squashed and force pushed. Tasks remaining:

  • Pass CI
  • Situational testing on VPC RvR
  • Some automated tests

Regarding the automated tests for the routers, where should I look for these? If I see some examples I should be able to adjust/add to them.

@QuiteClose QuiteClose changed the title Cloudstack 9339: Virtual Routers don't handle Multiple Public Interfaces Cloudstack 9339: Virtual Routers do not handle Multiple Public Interfaces May 6, 2016
@QuiteClose
Contributor Author

Looks like the build environment isn't sufficient on jenkins-test-a20:

[ERROR] Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared memory file:
   30583
Try using the -Djava.io.tmpdir= option to select an alternate temp location.

Doing another force push.

@QuiteClose
Contributor Author

I rebased against the latest 4.7 before force pushing. Has an error been introduced along the way?

@swill
Contributor

swill commented May 6, 2016

There is an issue currently on master which I am trying to get sorted out, but there should not be a problem on 4.7. Looking at that error, it seems Jenkins may be out of disk space again.

@QuiteClose
Contributor Author

@swill ok, thanks, good luck with the master.

@QuiteClose QuiteClose force-pushed the CLOUDSTACK-9339 branch 2 times, most recently from df1dc8d to 66d3039 Compare May 9, 2016 10:31
@swill
Contributor

swill commented May 12, 2016

CI RESULTS

Tests Run: 85
  Skipped: 0
   Failed: 9
   Errors: 10
 Duration: 10h 09m 15s

Summary of the problem(s):

ERROR: Create a redundant VPC with 1 Tier, 1 VM, 1 ACL, 1 PF and test Network GC Nics
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 281, in tearDown
    raise Exception("Warning: Exception during cleanup : %s" % e)
Exception: Warning: Exception during cleanup : Execute cmd: deletenetworkoffering failed, due to: errorCode: 431, errorText:Can't delete network offering 22 as its used by 1 networks. To make the network offering unavaiable, disable it
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
ERROR: Test iptables default INPUT/FORWARD policies on VPC router
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_routers_iptables_default_policy.py", line 302, in test_01_single_VPC_iptables_policies
    self.entity_manager.do_vpc_test()
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_routers_iptables_default_policy.py", line 490, in do_vpc_test
    self.check_ssh_into_vm(vm.get_vm(), vm.get_ip())
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_routers_iptables_default_policy.py", line 525, in check_ssh_into_vm
    raise Exception("Failed to SSH into VM - %s" % (public_ip.ipaddress.ipaddress))
Exception: Failed to SSH into VM - 192.168.23.9
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
ERROR: Test redundant router internals
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_routers_network_ops.py", line 510, in test_03_RVR_Network_check_router_state
    zoneid=self.zone.id
  File "/usr/lib/python2.7/site-packages/marvin/lib/base.py", line 2780, in create
    return Network(apiclient.createNetwork(cmd).__dict__)
  File "/usr/lib/python2.7/site-packages/marvin/cloudstackAPI/cloudstackAPIClient.py", line 1887, in createNetwork
    response = self.connection.marvinRequest(command, response_type=response, method=method)
  File "/usr/lib/python2.7/site-packages/marvin/cloudstackConnection.py", line 379, in marvinRequest
    raise e
CloudstackAPIException: Execute cmd: createnetwork failed, due to: errorCode: 530, errorText:Failed to implement persistent guest network
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
ERROR: Test to verify access to loadbalancer haproxy admin stats page
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 763, in test_03_vpc_internallb_haproxy_stats_on_all_interfaces
    self.execute_internallb_haproxy_tests(vpc_offering)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 838, in execute_internallb_haproxy_tests
    applb.sourceipaddress, self.get_ssh_client(vm, 5), settings)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 497, in get_ssh_client
    self.fail("Unable to create ssh connection: " % e)
TypeError: not all arguments converted during string formatting
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
ERROR: Test to verify access to loadbalancer haproxy admin stats page
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 784, in test_04_rvpc_internallb_haproxy_stats_on_all_interfaces
    self.execute_internallb_haproxy_tests(redundant_vpc_offering)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 838, in execute_internallb_haproxy_tests
    applb.sourceipaddress, self.get_ssh_client(vm, 5), settings)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 497, in get_ssh_client
    self.fail("Unable to create ssh connection: " % e)
TypeError: not all arguments converted during string formatting
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
ERROR: Test Site 2 Site VPN Across redundant VPCs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_vpn.py", line 1154, in test_01_redundant_vpc_site2site_vpn
    ssh_client = self._get_ssh_client(vm2, self.services, 10)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_vpn.py", line 898, in _get_ssh_client
    self.fail("Unable to create ssh connection: " % e)
TypeError: not all arguments converted during string formatting
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
ERROR: Test Site 2 Site VPN Across VPCs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_vpn.py", line 787, in test_01_vpc_site2site_vpn
    ssh_client = self._get_ssh_client(vm2, self.services, 10)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_vpn.py", line 497, in _get_ssh_client
    self.fail("Unable to create ssh connection: " % e)
TypeError: not all arguments converted during string formatting
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
ERROR: test_02_vpc_privategw_static_routes (integration.smoke.test_privategw_acl.TestPrivateGwACL)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 253, in test_02_vpc_privategw_static_routes
    self.performVPCTests(vpc_off)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 324, in performVPCTests
    self.check_pvt_gw_connectivity(vm1, public_ip_1, vm2.nic[0].ipaddress)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 553, in check_pvt_gw_connectivity
    (vmObj.get_ip(), e)
NameError: global name 'vmObj' is not defined
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
ERROR: test_03_vpc_privategw_restart_vpc_cleanup (integration.smoke.test_privategw_acl.TestPrivateGwACL)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 265, in test_03_vpc_privategw_restart_vpc_cleanup
    self.performVPCTests(vpc_off, True)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 324, in performVPCTests
    self.check_pvt_gw_connectivity(vm1, public_ip_1, vm2.nic[0].ipaddress)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 553, in check_pvt_gw_connectivity
    (vmObj.get_ip(), e)
NameError: global name 'vmObj' is not defined
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
ERROR: test_04_rvpc_privategw_static_routes (integration.smoke.test_privategw_acl.TestPrivateGwACL)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 277, in test_04_rvpc_privategw_static_routes
    self.performVPCTests(vpc_off)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 324, in performVPCTests
    self.check_pvt_gw_connectivity(vm1, public_ip_1, vm2.nic[0].ipaddress)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 553, in check_pvt_gw_connectivity
    (vmObj.get_ip(), e)
NameError: global name 'vmObj' is not defined
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
FAIL: Create a redundant VPC with two networks with two VMs in each network
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 534, in test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL
    self.do_vpc_test(False)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 679, in do_vpc_test
    self.check_ssh_into_vm(vm.get_vm(), vm.get_ip(), expectFail=expectFail, retries=retries)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 523, in check_ssh_into_vm
    self.fail("Failed to SSH into VM - %s" % (public_ip.ipaddress.ipaddress))
AssertionError: Failed to SSH into VM - 192.168.23.5
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
FAIL: Create a redundant VPC with two networks with two VMs in each network and check default routes
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 559, in test_02_redundant_VPC_default_routes
    self.do_default_routes_test()
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 701, in do_default_routes_test
    (vmObj.get_ip(), e)
AssertionError: SSH Access failed for <marvin.lib.base.PublicIPAddress instance at 0x28606c8>: SSH connection has Failed. Waited 600s. Error is SSH Connection Failed
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
FAIL: Create a redundant VPC with two networks with two VMs in each network
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 569, in test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers
    self.do_vpc_test(False)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 679, in do_vpc_test
    self.check_ssh_into_vm(vm.get_vm(), vm.get_ip(), expectFail=expectFail, retries=retries)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 523, in check_ssh_into_vm
    self.fail("Failed to SSH into VM - %s" % (public_ip.ipaddress.ipaddress))
AssertionError: Failed to SSH into VM - 192.168.23.5
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
FAIL: Create a redundant VPC with 1 Tier, 1 VM, 1 ACL, 1 PF and test Network GC Nics
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 587, in test_04_rvpc_network_garbage_collector_nics
    self.do_vpc_test(False)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 679, in do_vpc_test
    self.check_ssh_into_vm(vm.get_vm(), vm.get_ip(), expectFail=expectFail, retries=retries)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 523, in check_ssh_into_vm
    self.fail("Failed to SSH into VM - %s" % (public_ip.ipaddress.ipaddress))
AssertionError: Failed to SSH into VM - 192.168.23.5
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
FAIL: Create a redundant VPC with 1 Tier, 1 VM, 1 ACL, 1 PF and test Network GC Nics
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 622, in test_05_rvpc_multi_tiers
    self.do_vpc_test(False)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 679, in do_vpc_test
    self.check_ssh_into_vm(vm.get_vm(), vm.get_ip(), expectFail=expectFail, retries=retries)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_redundant.py", line 523, in check_ssh_into_vm
    self.fail("Failed to SSH into VM - %s" % (public_ip.ipaddress.ipaddress))
AssertionError: Failed to SSH into VM - 192.168.23.5
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
FAIL: Create a VPC with two networks with one VM in each network and test nics after destroy
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_router_nics.py", line 390, in test_01_VPC_nics_after_destroy
    self.check_ssh_into_vm()
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_router_nics.py", line 448, in check_ssh_into_vm
    self.fail("Failed to SSH into VM - %s" % (public_ip.ipaddress.ipaddress))
AssertionError: Failed to SSH into VM - 192.168.23.9
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
FAIL: Create a VPC with two networks with one VM in each network and test default routes
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_router_nics.py", line 414, in test_02_VPC_default_routes
    self.do_default_routes_test()
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_vpc_router_nics.py", line 470, in do_default_routes_test
    (vmObj.get_ip(), e)
AssertionError: SSH Access failed for <marvin.lib.base.PublicIPAddress instance at 0x3239248>: SSH connection has Failed. Waited 600s. Error is SSH Connection Failed
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
FAIL: Test create, assign, remove of an Internal LB with roundrobin http traffic to 3 vm's in a Single VPC
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 599, in test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80
    self.execute_internallb_roundrobin_tests(vpc_offering)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 668, in execute_internallb_roundrobin_tests
    self.setup_http_daemon(vm)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 519, in setup_http_daemon
    self.fail("Failed to ssh into vm: %s due to %s" % (vm, e))
AssertionError: Failed to ssh into vm: <marvin.lib.base.VirtualMachine instance at 0x328e200> due to not all arguments converted during string formatting
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
FAIL: Test create, assign, remove of an Internal LB with roundrobin http traffic to 3 vm's in a Redundant VPC
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 617, in test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80
    self.execute_internallb_roundrobin_tests(redundant_vpc_offering)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 668, in execute_internallb_roundrobin_tests
    self.setup_http_daemon(vm)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_internal_lb.py", line 519, in setup_http_daemon
    self.fail("Failed to ssh into vm: %s due to %s" % (vm, e))
AssertionError: Failed to ssh into vm: <marvin.lib.base.VirtualMachine instance at 0x34c5560> due to not all arguments converted during string formatting
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_8UJKVV/results.txt
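
Several of the tracebacks above end in `TypeError: not all arguments converted during string formatting`, raised from lines of the form `self.fail("Unable to create ssh connection: " % e)`. That error comes from the test helper's message formatting and hides the underlying SSH failure. The following minimal sketch reproduces it outside Marvin and shows one likely fix; the function names are hypothetical and the exception text is a placeholder.

```python
def broken_message(e):
    # There is no %s placeholder, so the % operator has nowhere to put `e`.
    return "Unable to create ssh connection: " % e


def fixed_message(e):
    # Adding the placeholder lets the original error reach the test report.
    return "Unable to create ssh connection: %s" % e


err = Exception("timed out")  # placeholder for the real SSH exception

try:
    broken_message(err)
except TypeError as exc:
    print("TypeError: %s" % exc)

print(fixed_message(err))
```

The `NameError: global name 'vmObj' is not defined` entries are a similar kind of problem: the failure path of the helper refers to a name that does not exist in that scope, so the real connectivity error is never reported.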

Associated Uploads

/tmp/MarvinLogs/DeployDataCenter__May_12_2016_06_05_04_5V5GEZ:

/tmp/MarvinLogs/test_network_8UJKVV:

/tmp/MarvinLogs/test_vpc_routers_0OKTB0:

Uploads will be available until 2016-07-12 02:00:00 +0200 CEST

Comment created by upr comment.

@swill
Contributor

swill commented May 12, 2016

@dsclose we have merge conflicts on this one now. Also, prior to merging the PRs that caused the conflicts, I ran the above CI. You will probably want to review the results of that CI run to fix some things.

@luhaijiao

@dsclose I think this PR is important for RvR environments too, particularly commits d582358 and bf285e1. It would be very nice to have it in 4.9 if you can review the CI results and fix the issues.

thanks!

@swill
Contributor

swill commented May 24, 2016

@dsclose if you can fix the merge conflicts I can run this again and see what is outstanding. Thanks...

@QuiteClose
Contributor Author

@swill taking a look now.

@QuiteClose QuiteClose closed this May 26, 2016
@QuiteClose QuiteClose reopened this May 26, 2016
@swill
Contributor

swill commented May 26, 2016

@dsclose sorry to do this to you. Can you close and reopen again to kick off the jobs again? Thanks...

@QuiteClose QuiteClose closed this May 26, 2016
@QuiteClose QuiteClose reopened this May 26, 2016
@QuiteClose
Contributor Author

@swill no worries. Happy to do that - the CI output was vast and I wasn't looking forward to combing through it.

@swill
Contributor

swill commented May 26, 2016

I will retest this one. Thanks for kicking it off again, we are green now. :)

@swill
Contributor

swill commented May 26, 2016

I actually had a test running overnight for this one and it had similar results.

Tests Run: 85
  Skipped: 0
   Failed: 14
   Errors: 6
 Duration: 9h 36m 41s

I think this one will need some work still. I will run it again to see what we get as a result and I will post the next run so you know what the latest issues are.

@QuiteClose
Contributor Author

QuiteClose commented May 27, 2016

@swill agreed; in particular, see my note concerning the merge of lines 299 and 300 of CsAddress.py. I've not even tested that locally, and I've not had a chance to figure out what it's doing!

@luhaijiao

@dsclose we installed ONLY your updated commits d582358 and bf285e1 in our environment to solve the RvR issue (VR network services hang intermittently because the public interface is up on the backup VR). However, it's not working as expected: eth2 is still up on the backup VR. Do we need to install all the commits, or is it probably an issue with our environment?

Besides, if we install all the commits, port forwarding and VPN seem to break. We are investigating further.

@QuiteClose
Contributor Author

QuiteClose commented May 27, 2016

@luhaijiao I'd recommend trying (edit) c970a04

Port forwarding works for me. I've not tried the VPN functionality.

@bvbharatk
Contributor

ACS CI BVT Run

Summary:
Build Number 182
Hypervisor xenserver
NetworkType Advanced
Passed=67
Failed=6
Skipped=3

Link to logs Folder (search by build_no): https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0

Failed tests:

  • test_vpc_vpn.py
    • ContextSuite context=TestRVPCSite2SiteVpn>:setup Failing since 18 runs
    • ContextSuite context=TestVpcRemoteAccessVpn>:setup Failing since 18 runs
    • ContextSuite context=TestVpcSite2SiteVpn>:setup Failing since 18 runs
  • test_routers_iptables_default_policy.py
    • test_01_single_VPC_iptables_policies Failed
  • test_volumes.py
    • test_06_download_detached_volume Failed
  • test_vm_life_cycle.py
    • test_10_attachAndDetach_iso Failed

Skipped tests:
test_vm_nic_adapter_vmxnet3
test_static_role_account_acls
test_deploy_vgpu_enabled_vm

Passed test suites:
test_deploy_vm_with_userdata.py
test_affinity_groups_projects.py
test_portable_publicip.py
test_over_provisioning.py
test_global_settings.py
test_scale_vm.py
test_service_offerings.py
test_routers.py
test_reset_vm_on_reboot.py
test_snapshots.py
test_deploy_vms_with_varied_deploymentplanners.py
test_login.py
test_list_ids_parameter.py
test_public_ip_range.py
test_multipleips_per_nic.py
test_regions.py
test_affinity_groups.py
test_network_acl.py
test_pvlan.py
test_nic.py
test_deploy_vm_root_resize.py
test_resource_detail.py
test_secondary_storage.py
test_disk_offerings.py

@QuiteClose
Contributor Author

I'd recommend merging this if the tests are passing. There is an issue with hairpin NATs which can be solved by adding a route to the guest subnet (scope link) in each routing table. This is easy to do on Redundant routers (as routes are handled in one spot in the transition to master) but needs a bit more work on standalone routers - it looks like the guest subnet and device need to be passed down from CsIP.process all the way to CsAddress.post_change_config.

Unfortunately I won't be available to do further work on this PR.
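
For anyone picking this up, the hairpin NAT fix described above amounts to adding a link-scope route for the guest subnet to every device-specific routing table. The sketch below only builds the corresponding iproute2 command strings. The function name, its parameters, the example subnet and the table names are illustrative assumptions, and it does not show how the subnet and device would be passed down from CsIP.process to CsAddress.post_change_config.

```python
def guest_link_routes(guest_cidr, guest_dev, tables):
    """Build one scope-link route to the guest subnet per routing table.

    All three parameters are hypothetical inputs chosen for this sketch.
    """
    return [
        "ip route add %s dev %s scope link table %s" % (guest_cidr, guest_dev, table)
        for table in tables
    ]


if __name__ == "__main__":
    # 10.1.1.0/24 and the table names are placeholders for illustration.
    for command in guest_link_routes("10.1.1.0/24", "eth0", ["Table_eth2", "Table_eth3"]):
        print(command)
```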

@rohityadavcloud
Member

We should get this tested and merged for 4.9 /cc @swill
@abhinandanprateek @murali-reddy @JayapalUradi can you help review this and advise any further changes, thanks.

@swill
Contributor

swill commented Jul 25, 2016

I tried hard to get this one in, and I just don't see it happening...

@rohityadavcloud
Member

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-222

@rohityadavcloud
Member

@murali-reddy @abhinandanprateek ping, please review this one as well, thanks.

@abhinandanprateek
Contributor

@lgtm on code review.

@QuiteClose
Contributor Author

PR #1659 appears to have superseded this PR. As such I'm concerned about recent activity on this PR.

Is anyone available to clarify what should be done? I'd anticipated closing this PR when PR #1659 matured.

@rohityadavcloud
Member

Thanks @dsclose. Can you help review PR #1659 and check that all of your changes are ported too? In that case you may close your PR.

asfgit pushed a commit that referenced this pull request Dec 7, 2016
…non_vpc

CLOUDSTACK-9339 Virtual Routers don't handle Multiple Public Interfaces correctly

As pointed out in CLOUDSTACK-9339, when multiple public IPs from different public IP ranges are associated with a VR, VR functionality has been broken since 4.6. Below is a brief list of the problems specific to non-VPC networks addressed in the PR. This PR handles both VPC and non-VPC scenarios.
- reverse traffic for the connections accepted on the eth3 and above public interfaces are getting blocked. Need a rule for e.g "-A FORWARD -i  eth3 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT" in the FORWARD chain of filter table to permit reverse path traffic for established connections.
- outbound public traffic from eth0 to eth3 (or for interfaces above like eth4 eth5 etc) needs rule to run through FW_OUTBOUND chain in the filter table
- network stats on public interfaces eth3 and above are not getting gathered
- default gateway is missing in the device specific routing table, resulting in traffic to be looked up in main routing table
- creating a device specific route table is generating "from all lookup Table_eth3" in the
  ip rules, resulting in rest of the traffic getting blocked.

Picked a few commits from dsclose's #1519, which was submitted for 4.7

Marvin tests are added to verify the following:
- Static NAT works on public interfaces above eth2 on non-VPC networks
- Port forwarding works on public interfaces above eth2 on non-VPC networks
- Route tables are configured as expected in the device-specific table for public interfaces above eth2 on non-VPC networks
- iptables rules are as expected for traffic to and from public interfaces above eth2 on non-VPC networks

* pr/1659:
  CLOUDSTACK-9339 Virtual Routers don't handle Multiple Public Interfaces correctly

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
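
The FORWARD-chain items in the commit message above can be sketched as a small rule generator. The first rule string is taken from the commit message itself. The FW_OUTBOUND rule is an assumed form of "run outbound traffic through the FW_OUTBOUND chain", and the function name and parameters are hypothetical. This is an illustration, not the code merged in #1659.

```python
def forward_rules(public_devs, guest_dev="eth0"):
    """Build filter-table FORWARD rules for each public interface.

    `public_devs` and `guest_dev` are hypothetical parameters for this sketch.
    """
    rules = []
    for dev in public_devs:
        # Permit reverse-path traffic for connections that are already established.
        rules.append(
            "-A FORWARD -i %s -o %s -m state --state RELATED,ESTABLISHED -j ACCEPT"
            % (dev, guest_dev)
        )
        # Assumed form: send outbound guest traffic through the FW_OUTBOUND chain.
        rules.append("-A FORWARD -i %s -o %s -j FW_OUTBOUND" % (guest_dev, dev))
    return rules


if __name__ == "__main__":
    for rule in forward_rules(["eth3", "eth4"]):
        print(rule)
```

Generating the rules per interface, rather than hard-coding eth2, mirrors the decoupling described in commit 788b1be earlier in this thread.
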
@murali-reddy
Contributor

@dsclose #1659 is merged, so you can close this PR if you wish. However, the redundant VR issues fixed by commits d582358 and bf285e1 in this PR are not addressed by #1659 and are still open.

@QuiteClose
Contributor Author

@murali-reddy I haven't worked on Cloudstack for many months, but one thing I do recall: without d582358 and bf285e1, networks with redundant virtual routers will simply not work.

Whether the commits are appropriate for the current codebase, however, is unclear. I have no way of testing them so I shall not be confusing matters by raising a speculative PR.

I shall close this PR now.

@QuiteClose QuiteClose closed this Dec 16, 2016