
[feat] Optimize HSTU training and sampling process #93

Open · wants to merge 11 commits into base: master

Conversation

iWelkin-coder (Collaborator)

No description provided.

@@ -40,3 +40,12 @@ docs/source/intro.md
docs/source/proto.html

.vscode/
graphlearn*

remove these

@@ -201,8 +201,8 @@ def _get_dataloader(
dataloader = DataLoader(
dataset=dataset,
batch_size=None,
pin_memory=data_config.pin_memory if mode != Mode.PREDICT else False,
collate_fn=lambda x: x,
# pin_memory=data_config.pin_memory if mode != Mode.PREDICT else False,

remove the commented-out lines
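For context, the line being commented out selects pinned host memory per run mode. A minimal sketch of that selection logic (`Mode` and the `pin_memory` flag follow the diff; `resolve_pin_memory` is a hypothetical helper for illustration only):

```python
from enum import Enum

class Mode(Enum):
    TRAIN = 1
    EVAL = 2
    PREDICT = 3

def resolve_pin_memory(mode: Mode, pin_memory: bool) -> bool:
    # Pinned (page-locked) host memory speeds up host-to-device copies
    # during training/evaluation; prediction skips it.
    return pin_memory if mode != Mode.PREDICT else False
```

Restoring the original expression (rather than keeping it as a comment) preserves faster GPU transfers during training.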

@@ -237,6 +240,7 @@ def launch_sampler_cluster(
multival_sep=self._fg_encoded_multival_sep
if self._fg_mode == data_pb2.FgMode.FG_NONE
else chr(29),
seq_str_delim=self._seq_str_delim,

seq_str_delim -> item_id_delim; it should be a param of sampler_config
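A minimal sketch of the suggested change, assuming a dataclass-style config (`SamplerConfig` and the default delimiter are hypothetical; the project's real sampler config is defined elsewhere):

```python
from dataclasses import dataclass

@dataclass
class SamplerConfig:
    # Suggested rename: expose the delimiter that separates item ids
    # on the sampler config instead of a dataset-level seq_str_delim.
    item_id_delim: str = ";"
```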

features = self._parse_nodes(nodes)
result_dict = dict(zip(self._attr_names, features))
return result_dict
# ids = np.pad(ids, (0, self._batch_size - len(ids)), "edge")

revert it

@@ -338,6 +338,8 @@ def _train_and_evaluate(
ckpt_path: Optional[str] = None,
eval_result_filename: str = "train_eval_result.txt",
) -> None:
torch.backends.cuda.matmul.allow_tf32 = True

why should we allow TF32?
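For background on the question: TF32 keeps FP32's 8 exponent bits but truncates the mantissa from 23 to 10 bits, trading precision for much faster matmuls on Ampere-and-newer GPUs. A pure-Python sketch of that mantissa truncation (`to_tf32` is illustrative only, not the hardware path):

```python
import struct

def to_tf32(x: float) -> float:
    # Round-trip through float32 bits and zero the low 13 mantissa
    # bits, leaving the 10 explicit mantissa bits that TF32 retains.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Values whose mantissa fits in 10 bits survive unchanged; finer increments are dropped, which is why the setting deserves a justifying comment in the code.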

self._loss_collection, self.item_tower.group_variational_dropout_loss
)
batch_sparse_features = batch.sparse_features["__BASE__"]
nonzero_indices = torch.where(

why do we need nonzero_indices?

)[0]
default_value = torch.tensor([-1]).to(nonzero_indices.device)
batch_size = torch.cat([nonzero_indices, default_value]).max() + 1
neg_sample_size = batch_sparse_features.lengths()[-1] - 1

add comments that explain the design in detail
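A plain-Python sketch of what such a comment could document, mirroring the torch.where / default `-1` logic in the diff above (`infer_sizes` is illustrative; the real code operates on tensor lengths):

```python
def infer_sizes(lengths):
    # Rows with at least one sparse value; the highest such row index
    # plus one recovers the effective batch size even when trailing
    # rows are empty.
    nonzero = [i for i, n in enumerate(lengths) if n > 0]
    # The -1 default makes an all-empty batch yield size 0.
    batch_size = (max(nonzero) if nonzero else -1) + 1
    # The last row holds the positive item plus its negatives.
    neg_sample_size = lengths[-1] - 1
    return batch_size, neg_sample_size
```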

@@ -181,18 +181,28 @@ def sim(
user_emb: torch.Tensor,
item_emb: torch.Tensor,
neg_for_each_sample: bool = False,
is_hstu: bool = False,

do not modify sim; override it in hstu.py instead, and explain the logic in comments
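A sketch of the suggested structure: override `sim` in an HSTU subclass rather than branching on an `is_hstu` flag in the shared implementation (class names and the temperature scaling are hypothetical stand-ins for the real model code):

```python
class MatchModel:
    def sim(self, user_emb, item_emb):
        # Base similarity: per-pair dot product.
        return [
            sum(u * i for u, i in zip(ue, ie))
            for ue, ie in zip(user_emb, item_emb)
        ]

class HSTUMatch(MatchModel):
    temperature = 0.5  # hypothetical HSTU-specific scaling

    def sim(self, user_emb, item_emb):
        # HSTU-specific behaviour lives here, with its own explanatory
        # comments, leaving the shared sim signature untouched.
        return [s / self.temperature for s in super().sim(user_emb, item_emb)]
```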

@@ -317,6 +321,92 @@ def _build_batch(self, input_data: Dict[str, pa.Array]) -> Batch:
input_data = _expand_tdm_sample(
input_data, pos_sampled, neg_sampled, self._data_config
)
elif self._enable_hstu:
seq_attr = self._sampler._item_id_field
if pa.types.is_string(input_data[seq_attr].type):

Make lines 326-376 a function and move it to datasets/utils.py. Add comments to explain the logic, and add a unit test for the function.
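A sketch of the requested shape: the sequence-splitting logic extracted into a standalone helper plus a unit test (the `chr(29)` delimiter follows the diff context; `split_id_sequence`, its module placement, and the test class name are hypothetical):

```python
import unittest

def split_id_sequence(value, delim=chr(29)):
    # Split an encoded item-id sequence into individual ids;
    # an empty string yields an empty list rather than [""].
    return value.split(delim) if value else []

class SplitIdSequenceTest(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(split_id_sequence("a\x1db\x1dc"), ["a", "b", "c"])

    def test_empty(self):
        self.assertEqual(split_id_sequence(""), [])

if __name__ == "__main__":
    unittest.main()
```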

pa.array(input_data_k_split.offsets.to_numpy()[1:] - 1)
)
sampled = self._sampler.get(input_data)
for k, v in sampled.items():

Make lines 377-409 a function and move it to datasets/utils.py. Add comments to explain the logic, and add a unit test for the function.
