Problem:
When an Excel (XLSX) file is processed, the workbook data is not passed to the LLM; only the sheet name is used.
Analysis:
The document loader 'DocumentLoaderSpreadSheet' produces a page whose 'content' contains only the sheet name. The workbook data is stored under the 'data' key of the page_dict ('document_loader_spreadsheet.py', lines 88...94).
When Extractor.extract(...) is invoked with that document loader, the 'data' information is lost in Extractor._map_to_universal_format(...), which is called from extractor.py lines 276 ... 277:
    loaded_content = loader.load(source)
    unified_content = self._map_to_universal_format(loaded_content, vision)
The function Extractor._map_to_universal_format does not use the 'data' key (of the parsed spreadsheet) when building the unified 'content'. Support for the 'is_spreadsheet' flag is probably missing when page information is processed.
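To illustrate the kind of handling that seems to be missing, here is a minimal, self-contained sketch. It is not the library's code: the helper names 'page_to_text' and 'map_pages_to_text' are hypothetical, and the page shape (a dict with 'content', 'is_spreadsheet', and 'data' as a list of rows) is an assumption based on the description above.

```python
from typing import Any, Dict, List


def page_to_text(page: Dict[str, Any]) -> str:
    """Render one loader page as text, including spreadsheet rows when present."""
    text = str(page.get("content", ""))
    # Assumption: spreadsheet pages carry 'is_spreadsheet': True and keep
    # their rows under 'data' as a list of lists of cell values.
    if page.get("is_spreadsheet") and page.get("data"):
        rows = [
            "\t".join("" if cell is None else str(cell) for cell in row)
            for row in page["data"]
        ]
        text = f"Sheet: {text}\n" + "\n".join(rows)
    return text


def map_pages_to_text(pages: List[Dict[str, Any]]) -> str:
    """Join all pages into a single text block that could be sent to the LLM."""
    return "\n\n".join(page_to_text(page) for page in pages)


# Hypothetical page shaped like the one described in the analysis.
sample_pages = [
    {
        "content": "Invoices",
        "is_spreadsheet": True,
        "data": [["id", "amount"], [1, 9.5], [2, None]],
    }
]
unified = map_pages_to_text(sample_pages)
print(unified)
```

With a branch like this, the cell values end up in the unified text instead of only the sheet name. Pages without the flag fall back to their plain 'content', so other loaders would not be affected.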
How to reproduce the problem:
Use the 'DocumentLoaderSpreadSheet' document loader with an Excel file and extract data with Extractor.