Stop using backend to delete cascade nodes #2969

Closed
stellanl opened this issue Feb 13, 2019 · 2 comments

@stellanl
Contributor

Part of #2694.

@stellanl
Contributor Author

stellanl commented Feb 13, 2019

/app_worker/dataprocessor?action=deleteCascadeNodes
is called from three places:
com.gallatinsystems.survey.dao.CascadeNodeDao.deleteRecursive()
com.gallatinsystems.survey.dao.CascadeResourceDao.delete()
org.waterforpeople.mapping.app.web.DataProcessorRestServlet.scheduleCascadeNodeDeletion()

Deleting tree nodes is done in a nicely recursive way.
A fundamental problem is that the timing is not predictable: the cascade may split into an arbitrary number of branches at any level. Call that number n.
The current algorithm fetches all nodes at the current level (1 datastore lookup) and then for each of them checks if it has children (n lookups). Parents are recursively scheduled on the backend task queue. Those without children (0-100% of n) are deleted (1 mass datastore deletion).
I assume the time taken is dominated by datastore ops, which would make the worst case (all children) f(2n+1).
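
To make the per-task cost concrete, here is a minimal in-memory sketch of one task as described above. It is not the actual `CascadeNodeDao` code: the class name, the `TaskCost` counters and the `Map` standing in for the datastore are all hypothetical, and counting enqueued tasks separately from datastore ops is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of one deletion task; not the real DAO implementation.
class CascadeDeleteCostSketch {

    /** Counters for a single task run. */
    static final class TaskCost {
        int datastoreOps;   // level fetch, child checks and the batch delete
        int enqueuedTasks;  // follow-up tasks scheduled for nodes that have children
    }

    /**
     * Models one task over the direct children of parentId.
     * childrenByParent is an in-memory stand-in for the datastore (assumption).
     */
    static TaskCost runOneTask(Map<Long, List<Long>> childrenByParent, long parentId) {
        TaskCost cost = new TaskCost();

        // 1 lookup: fetch all nodes at the current level
        List<Long> level = childrenByParent.getOrDefault(parentId, Collections.emptyList());
        cost.datastoreOps++;

        List<Long> leaves = new ArrayList<>();
        for (long nodeId : level) {
            // 1 lookup per node: does it have children?
            cost.datastoreOps++;
            boolean hasChildren =
                    !childrenByParent.getOrDefault(nodeId, Collections.emptyList()).isEmpty();
            if (hasChildren) {
                cost.enqueuedTasks++;   // parent: gets its own task later
            } else {
                leaves.add(nodeId);     // leaf: deleted in this task
            }
        }

        // 1 batch delete, only if there is at least one leaf
        if (!leaves.isEmpty()) {
            cost.datastoreOps++;
        }
        return cost;
    }
}
```

In this model a level with n nodes costs between n+1 and n+2 datastore ops plus up to n enqueues. If each enqueue were also counted as one operation, the case where every node is a parent would come to 2n+1; whether that is the reading intended above is a guess.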

Unless the task was sent to the backend "just in case", this must be what can overrun the task queue's 600s limit. I see two ways to cut it down:

  1. Schedule all deletions on the task queue, regardless of leaf or not. Then there are always just 2 datastore ops per task. Will take longer.
  2. Fetch the entire cascade into memory and calculate the list of nodes to be deleted. Only 2 datastore ops in total, but the memory required may be excessive.
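
Option 2 could look roughly like the sketch below: given every node of the cascade (the result of the single fetch), compute the ids under a given node in memory, then hand the list to one batch delete. The `Node` class and its `id`/`parentId` fields are hypothetical stand-ins rather than the real entity, and the sketch assumes the nodes form a proper tree with no cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of option 2; not the real DAO implementation.
class CascadeDeletePlanSketch {

    /** Minimal stand-in for a cascade node: its own id and its parent's id. */
    static final class Node {
        final long id;
        final long parentId;

        Node(long id, long parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    /**
     * Returns the ids of every node below rootId (rootId itself is not included).
     * allNodes is assumed to hold the whole cascade, fetched in one datastore op.
     */
    static List<Long> descendantsOf(long rootId, List<Node> allNodes) {
        // Index the flat node list by parent id
        Map<Long, List<Long>> childrenByParent = new HashMap<>();
        for (Node n : allNodes) {
            childrenByParent.computeIfAbsent(n.parentId, k -> new ArrayList<>()).add(n.id);
        }

        // Iterative traversal, so a deep cascade cannot overflow the call stack
        List<Long> toDelete = new ArrayList<>();
        Deque<Long> pending = new ArrayDeque<>();
        pending.push(rootId);
        while (!pending.isEmpty()) {
            long current = pending.pop();
            for (long childId : childrenByParent.getOrDefault(current, Collections.emptyList())) {
                toDelete.add(childId);
                pending.push(childId);
            }
        }
        return toDelete;
    }
}
```

The memory concern from option 2 shows up here as `allNodes` plus the `childrenByParent` index, both of which grow with the size of the whole cascade rather than with the subtree being deleted.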

@stellanl
Contributor Author

stellanl commented Feb 13, 2019

After a few tests I believe nothing needs to be optimised. I deleted a huge cascade, and none of the tasks took more than 300 ms, not even the last ones, which should have had most of the leaves. Most took around 100 ms.
I also created a single-level cascade with 1000 leaves. Deleting that node took just 250ms.
To be clear, though, this was when alone on the akvoflowsandbox instance.

stellanl added a commit that referenced this issue Feb 13, 2019
stellanl added a commit that referenced this issue Feb 15, 2019
@muloem muloem added this to the 1.9.44 Q... Q... milestone Mar 18, 2019
muloem added a commit that referenced this issue Mar 19, 2019
@muloem muloem closed this as completed Mar 22, 2019
finnfiddle added a commit that referenced this issue Mar 29, 2019
* develop: (48 commits)
  [#3026]Stop updating the backend.
  [#3024] release notes
  Update bootstrap-deploy.sh
  [#3022]Avoid NPE if no user found.
  [#3018] release notes
  [#3018]Initial version of release notes.
  [#2969] Remove whitespace
  [#2971]Extract constants. Remove trailing whitespace.
  [#3017] Add NSS package
  [#3017] Add NSS package
  [#3017] Add NSS package
  [#3017] Add NSS package
  [#2970]Remove trailing whitespace.
  [#2970]Formatting...
  [#2802] unused function
  [#2802] do it the ember way
  [#2694] Remove backends configuration file
  [#2696] Refactor cascade node deletion
  [#2694] Remove backends deployment code
  [#2971]Stop calling taks on backend. Remove unused stuff.
  ...