Skip to content

Conversation

@RodrigoDLopez
Copy link
Contributor

@RodrigoDLopez RodrigoDLopez commented Aug 24, 2020

Description

On PR#2848 the author added the method preStorageMigrationStateCheck that will check tags when the operator runs migrateVirtualMachine command. We do not need this kind of check, because when running migrations, the operator must have the ability to migrate volumes to the destination pool, and assuming the risk/responsibilities of migrations to different QoS primary storage systems as proposed and implemented in Fix limitation on tag matching in 'migrateVolume' with disk offering replacement
Therefore, this PR removes the unnecessary method and permits the operator to run migrate commands without any kind of tags check, as it should be.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

To test this, I used the cloudmonkey to run migrateVirtualMachine command

@RodrigoDLopez
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@RodrigoDLopez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-1814

@RodrigoDLopez RodrigoDLopez deleted the removes_unnecessary_validation_when_migrating_virtual_machine branch August 30, 2020 01:52
@RodrigoDLopez RodrigoDLopez restored the removes_unnecessary_validation_when_migrating_virtual_machine branch August 30, 2020 14:42
@RodrigoDLopez RodrigoDLopez reopened this Aug 30, 2020
Copy link
Contributor

@shwstppr shwstppr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RodrigoDLopez this removes tags checking altogether? This will allow migrating volumes to storages which are not even suitable wrt tags?

@rafaelweingartner
Copy link
Member

@RodrigoDLopez this removes tags checking altogether? This will allow migrating volumes to storages which are not even suitable wrt tags?

Exactly that. This was discussed and developed within a series of PRs I created some time ago, and it was broken while merging forward. Therefore, @RodrigoDLopez is fixing part of the broken code with this PR. These are the following PRs that were used to deliver this feature:

The implementation was broken down into several different PRs to easy development and testing. Moreover, the feature was already discussed, approved, and merged a long time ago and there are already people using it. What @RodrigoDLopez is doing is just fixing this feature that was broken during a merge forward from 4.11 that was executed on 4.12.

@DaanHoogland
Copy link
Contributor

@rafaelweingartner what branch should this go on (since it was broken in 4.12 you say)

@RodrigoDLopez
Copy link
Contributor Author

Hi @DaanHoogland
I think it's okay to merge this branch just on the master.

Copy link
Contributor

@shwstppr shwstppr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaanHoogland maybe we need to do some manual testing here. My concern is this puts too much responsibility on the operator if we are skipping tag matching. I would suggest adding a new parameter in migrateVolume API (forced or something) which would bypass the matching tags.
Method preStorageMigrationStateCheck also changes VM state to Migrating and without it that might break.
Bypassing tag matching for volumes will need changes at other places as well such as in the flow of migrateVirtualMachineWithVolume API. In that API whole migration at times depend on suitable storage pools for the volume. Will we change that behaviour as well in future?

cc @RodrigoDLopez @rafaelweingartner @rhtyd

@rafaelweingartner
Copy link
Member

@shwstppr to me it is very simple. It is a feature to provide flexibility and power to operators. It was already discussed with the community, accepted, merged, and released to people. Moreover, there are CloudStack deployments that take advantage of that. Operators must be responsible and aware of the actions they take. Therefore, ACS must fix it (which is what @RodrigoDLopez is doing). If we want to remove this feature, we need a new discussion, and this will break API compatibility

@DaanHoogland I do not have any preference on what version this fix must go in. However, I see it as I explained. It is a feature that was proposed, discussed, and accepted. People use it. And therefore, we should maintain it as designed and sold to the community and documented.

@rohityadavcloud rohityadavcloud requested review from sureshanaparti and removed request for PaulAngus October 20, 2020 11:25
@DaanHoogland
Copy link
Contributor

@sureshanaparti @shwstppr @rhtyd can we merge? Or do you want to re-start the discussion?

@RodrigoDLopez
Copy link
Contributor Author

@rhtyd, @DaanHoogland, @PaulAngus, @shwstppr

Hello guys, happy new year.

Is there something missing here? Everything seems to be ok with the patch. it has no errors, and tests are passing.

@PaulAngus
Copy link
Member

@RodrigoDLopez

  1. the tests above are regression tests. They therefore cannot check whether the feature works when used as they can only look backwards. They are also only one small part of the testing rigour that should be applied to any new feature.

  2. the community is working RCs for the 4.15 release, and certainly, I have not heard that master is available to add new features.

  3. The tag validations are there to stop the service offerings from being broken - if you allow anyone to move VMs or storage contrary to the applied tags they are highly likely to pollute the database with resources in places that their service offerings say that they can't be.

  4. If the problem is with moving between conflicting tags, then there needs to be a mechanism to facilitate changing the service offerings applied to a VM/volume, where the VM/storage is automatically migrated to matching hosts/storage; keeping the integrity of the database intact.

@rafaelweingartner
Copy link
Member

@PaulAngus it is not that complicated. @RodrigoDLopez is not introducing a feature, but rather fixing a bug. The feature was introduced via #2636. It was first released on ACS 4.12. If you check the title of the PR that introduced it "Fix limitation on tag matching in 'migrateVolume' with disk offering replacement", and its code, you will see that there is a mechanism to normalize/update the disk offering in the DB to match the target storage. You can even check this option in the API documentation (https://cloudstack.apache.org/api/apidocs-4.14/apis/migrateVolume.html).

The feature was introduced to allow operators to have means to move the volume around, even if the tags do not match. This allows tasks such as troubleshooting performance problems, or validating if a storage tier upgrade would improve application performance. There was a series of PRs that I created to address these needs two years ago or so. They all have been reviewed, approved and merged, and were released. However, it happens to be that this feature was broken at some point in time, and @RodrigoDLopez is fixing this.

Also, there was an e-mail today stating "[ANNOUNCE] master and 4.15 branches open for merge". Therefore, we expected that to be open for everybody, especially for a bug fix such as this one, that was not even raised as a blocker. We know people using this feature, which required some internal patching to make it work on 4.13 for instance.

Copy link
Member

@GabrielBrascher GabrielBrascher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both code and feature are good, API documentation also describes behavior that matches with this PR fix proposal.

LGTM based on code review.

Side-note: this fixes a regression bug, as we can see on the presented commit history. Should this PR also be marked as blocker and added in time for RC4?

@PaulAngus
Copy link
Member

thank you I didn't see the email this morning.

I still disagree with the premise; and having the facility to change the disk offering at the same time gives no guarantee that the disk offering and the new storage placement will match.

I cannot see how an API that has the ability to break the integrity of the database is justifiable.

@rafaelweingartner
Copy link
Member

rafaelweingartner commented Jan 4, 2021

Please, check the code, and the cited PRs, you can check there in the PR I created there is a method called doesTargetStorageSupportNewDiskOffering (

if (!doesTargetStorageSupportDiskOffering(destPool, diskOffering)) {
). I have not checked if somebody else also broke this check on master, but the code I created (please, refer to the PR, you can see the code there) checks if the new disk offering matches the target storage. However, that is not mandatory, as I said, it can be used as a means to facilitate operational tasks, and it must be used wisely by operators. Moreover, it has been merged, and it is in use by people. I really do not see why so much resistance from you guys here.

@PaulAngus
Copy link
Member

@rafaelweingartner
In your response, you say that the disk offering check is not mandatory, therefore the database integrity problem still exists.

I can't speak for anyone else, but my objection is the perpetuation of the database integrity issue. I only comment now because I wasn't pinged on the other PRs, I was on this one. If I'd seen this before, I would have objected before.

Also, there is no double jeopardy, there is nothing to stop any piece of code being revisited at a later date.

@rafaelweingartner
Copy link
Member

@PaulAngus I disagree. Your concern is only valid if (and only if) the operator is careless, which they should not be. This API is only available (should only be) to root admin.

If we revise this piece of code, we break API compatibility, which has already been broken as a matter of fact. Right now, it is considered as a bug, but if you want to remove this feature, then this has many other implications such as versioning.

@rafaelweingartner
Copy link
Member

One alternative would then be to forbid the override placement. And then, we allow changing the target storage pool, as long as the new disk offering matches it (or the current disk offering, if there is no new disk offering).

What do you think? Then, the DB would always be consistent with the storage where the volume resides.

@PaulAngus
Copy link
Member

There are many ways an admin can wreck their environment which are out of our control. This is within our control. And i don't believe there is another API that allows the database to be corrupted in this way.

Thats my objection. IMO these checks are not unnecessary, but vital.

I haven't -1'd this.
So if there enough others who aren't worried, I'm not stopping anyone.

@rafaelweingartner
Copy link
Member

rafaelweingartner commented Jan 4, 2021

just one thing needs to be clear, it does not corrupt the DB. The volume table shows the correct storage pool. The only thing is that this storage pool might not have have tags of the volume's disk offering (in case of disk placement override by the operator). However, this can be fixed if the operator uses the parameter newDiskOffering.

No DB will ever be corrupted by this feature. There is only the possibility to apply some override in case of an emergency or some necessity by the operator. Nothing will break! Tags on pools and volume are merely logical constraints applied by ACS, and nothing more.

@PaulAngus
Copy link
Member

Unless I've read the thread incorrectly (and I hope that I have):

The aim of the feature is to allow an admin to move a volume from storage pool A (with tag A) to storage pool B (with tag B), despite the fact that the volume's disk offering tag (A) doesn't match the destination tag (B).

If this is done, then the DB says that the volume should be in a pool with tag A when it's actually on storage with tag B.
Not only is the database then inconsistent with reality, but it will also corrupt the billing of the volume.

If I've understood this correctly, then...

This could be fixed by making it mandatory to supply a 'correct' new disk offering if the migration would otherwise be blocked by the tag mismatch. The disk offering change should only be made upon successful migration.

@rafaelweingartner
Copy link
Member

@PaulAngus it is your opinion. In my opinion, if the root wants to do a disk placement override, she/he should be able to do so. That is how it was presented for the community, when this feature was first introduced via #2425, then we had a series of other patches to improve this feature, one of the PRs introduced the new option in the API to update the DB (#2486), if the root admin wants to make a permanent change. The feature was presented to the community as a method to empower/enable root admins to do disk placement override in their daily tasks. Then, to normalize this situation, the root admin can use the option to replace the disk offering.

I really see no harm in this API.

@RodrigoDLopez
Copy link
Contributor Author

@rhtyd @shwstppr @sureshanaparti @PaulAngus

Hello to everyone, can I have more reviews/tests here?
I think this is a good one. And I'm just fixing something that at some point was discussed with the community and it was working as @rafaelweingartner said and showed in previous comments and that can also be seen in my description of this PR.
I don't know why is taking so long to analyze and merge this PR

@shwstppr
Copy link
Contributor

@RodrigoDLopez I understand this could be a use-case for operators as explained in previous comments but I cannot agree to it in its current form due to the reasons highlighted in Paul's comment. Also, completely doing away tag matching for a particular use-case is not appropriate in my opinion. If this is done on one hand we will have migrateVirtualMachineWithVolumes API which depends on storage tag matching for finding suitable storage pools for volume (when pools for a volume is not passed by the user) and on the other migrateVirtualMachine API which completely ignores storage tags
https://github.com/apache/cloudstack/blob/master/engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java#L2824

@weizhouapache
Copy link
Member

for vms, we use "cmk find hostsformigration" to list all hosts, it also mentions if the hosts are suitable or not for migration.
if users try to migrate vm to a unsuitable host, cloudstack still allows it.

for volumes, it looks like "find storagepoolsformigration" also mark storage pools with unmatched tag as "unsuitable".
https://github.com/apache/cloudstack/blob/master/engine/storage/src/main/java/org/apache/cloudstack/storage/allocator/ZoneWideStoragePoolAllocator.java#L72
https://github.com/apache/cloudstack/blob/master/engine/storage/src/main/java/org/apache/cloudstack/storage/allocator/ClusterScopeStoragePoolAllocator.java#L81
https://github.com/apache/cloudstack/blob/master/engine/storage/src/main/java/org/apache/cloudstack/storage/allocator/LocalStoragePoolAllocator.java#L105

@rafaelweingartner @RodrigoDLopez could you please confirm it ?
if so, I think we could remove the check.

@RodrigoDLopez
Copy link
Contributor Author

for vms, we use "cmk find hostsformigration" to list all hosts, it also mentions if the hosts are suitable or not for migration.
if users try to migrate vm to a unsuitable host, cloudstack still allows it.

for volumes, it looks like "find storagepoolsformigration" also mark storage pools with unmatched tag as "unsuitable".
https://github.com/apache/cloudstack/blob/master/engine/storage/src/main/java/org/apache/cloudstack/storage/allocator/ZoneWideStoragePoolAllocator.java#L72
https://github.com/apache/cloudstack/blob/master/engine/storage/src/main/java/org/apache/cloudstack/storage/allocator/ClusterScopeStoragePoolAllocator.java#L81
https://github.com/apache/cloudstack/blob/master/engine/storage/src/main/java/org/apache/cloudstack/storage/allocator/LocalStoragePoolAllocator.java#L105

@rafaelweingartner @RodrigoDLopez could you please confirm it ?
if so, I think we could remove the check.

@weizhouapache Exactly! Moreover, even though the override is a possibility, if the operator defines the "newDiskOffering" option, the tags are checked (tags of the new target storage pool with the new disk offering being set). The only situation that the tags are not checked is during a disk placement override that operators might execute in case of emergency or maybe for testing purposes.
The goal here is to enable/empower/provide flexibility for root Admins. They are the root admins of the cloud infrastructure, and they probably know best (better than me at least) how to use/manage their cloud infrastructure.

@GabrielBrascher
Copy link
Member

@RodrigoDLopez can you please resolve the conflicts?

@GabrielBrascher
Copy link
Member

GabrielBrascher commented Mar 5, 2021

The way I see there is no risk. The migration command is clear about overriding offerings.

  1. if the ADMIN does not want to override the tags, validation will happen as expected.
  2. otherwise, if the ADMIN wants to do such migration, the offerings are updated accordingly to one of the offerings that comply with the target storage pool. Thus, allowing the volume migration to the target pool with no harm.

NOTE: I would like to stress how important it is to allow migrating volumes between storage pools as well as overriding disk offerings. We have some annoying issues in production due to such restrictions. For now, we have to manually run MySQL queries any time that such a migration operation is necessary; thus, betting that there will be no human mistakes compromising the DB consistency.

@RodrigoDLopez
Copy link
Contributor Author

@RodrigoDLopez can you please resolve the conflicts?

Thanks for the review.
I carried out the correction of conflicts as requested.

@PaulAngus
Copy link
Member

PaulAngus commented Jun 11, 2021 via email

@RodrigoDLopez RodrigoDLopez deleted the removes_unnecessary_validation_when_migrating_virtual_machine branch September 9, 2021 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants