… of new apps and increase of the field limit
…, texts with language, and flexible date handling
… of references
… and expansion of supported types (confproc, full_text, etc.)
… for search, utilities, and Wagtail hooks
…, utilities, and Wagtail hooks
… OMML to MathML
… for inference, tasks, and Wagtail hooks
… for data processing
…ial.py and removal of intermediate migrations
… of Django and translation of verbose_name to English
Fixes the exception type so that a 404 is returned when the record does not exist.
… Reduces log noise and keeps the function focused on its return value.
Improves readability and follows good error-handling practices.
… reference prompt. Adds quotes to text fields and fixes commas/keys to avoid prompt parsing errors.
Allows translation of 'Mixed Citation' and 'Rating from 1 to 10'.
…eference status' (includes migrations)
- function_llama became LlamaInputSettings in llama.py
- generic_llama became llama.py with LlamaService
Pull Request Overview
This PR implements a merge of markup functionality, introducing AI-powered document processing capabilities for converting DOCX files to structured XML format. The changes integrate LLM services (both Llama and Gemini) for metadata extraction, reference parsing, and content labeling.
Key changes:
- Adds new `model_ai` and `markup_doc` applications for AI model management and document processing
- Integrates Google Generative AI and python-docx libraries
- Refactors reference processing to use the `data_utils` module instead of tasks
- Renames classes for consistency (e.g., `ReferenceAdmin` → `ReferenceModelViewSet`)
Reviewed Changes
Copilot reviewed 55 out of 66 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| requirements/base.txt | Adds dependencies for AI/ML (google-generativeai, langid) and document processing (python-docx) |
| reference/wagtail_hooks.py | Refactors import paths and renames admin class for consistency |
| reference/models.py | Adds ReferenceStatus enum and replaces raw integer field with structured status |
| reference/marker.py | Updates import path from llama3.generic_llama to model_ai.llama |
| reference/data_utils.py | Updates to use ReferenceStatus enum instead of raw integer |
| reference/api/v1/views.py | Updates to use ReferenceStatus enum and new import path |
| model_ai/* | New application for managing AI models (Llama/Gemini) with download functionality |
| markup_doc/* | New application for DOCX-to-XML conversion with AI-powered content extraction |
| markuplib/* | Adds DOCX processing utilities and OMML-to-MathML transformation |
    ],
    "date": c2005,
    "source": "Advanced practice nursing: an integrative approach",
    "edition: "3rd ed",
The 'edition' key is missing its closing quote. It should be "edition": "3rd ed".
    "date": 1995,
    "source": "Inflammatory bowel disease",
    "chapter": "The epidemiology of idiopathic inflammatory bowel disease.",
    "edition: "4th",
The 'edition' key is missing its closing quote. It should be "edition": "4th".
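The effect of the missing quote can be checked on a reduced fragment. This is only a sketch: the `broken` and `fixed` strings are cut-down stand-ins for the prompt examples, and the `parses` helper is hypothetical. The full prompt examples are not necessarily strict JSON (for instance `"date": c2005`), so the check below illustrates the quoting problem only.

```python
import json

# Reduced stand-ins for the few-shot example in the prompt (assumption:
# only the fields needed to show the quoting problem are kept).
broken = '{"source": "Inflammatory bowel disease", "edition: "4th"}'
fixed = '{"source": "Inflammatory bowel disease", "edition": "4th"}'


def parses(text: str) -> bool:
    """Return True if `text` is valid JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print("broken parses:", parses(broken))
    print("fixed parses:", parses(fixed))
```

With the closing quote missing, the parser reads `edition: ` as the key and then fails on the text that follows, so a model shown this example may learn a malformed pattern.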
    'uri': {'type': 'string'},
    'access_date': {'type': 'string'},
    'version': {'type': 'string'},
    "full_text": {"type": "integer"},
The 'full_text' field should have type 'string', not 'integer', because it contains citation text.
Suggested change:
    - "full_text": {"type": "integer"},
    + "full_text": {"type": "string"},
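To see why the declared type matters, here is a minimal sketch of a type check over a schema of this shape. It assumes the schema is used to validate or constrain model output; the `type_errors` helper and `TYPE_MAP` are hypothetical and written with the standard library only, not the project's actual validation code.

```python
# Map the schema's type names to Python types (only the two used here).
TYPE_MAP = {"string": str, "integer": int}


def type_errors(record: dict, schema: dict) -> list:
    """Return the names of fields whose value does not match the declared type."""
    errors = []
    for field, spec in schema.items():
        if field in record and not isinstance(record[field], TYPE_MAP[spec["type"]]):
            errors.append(field)
    return errors


# Schema fragments before and after the suggested change.
schema_before = {"version": {"type": "string"}, "full_text": {"type": "integer"}}
schema_after = {"version": {"type": "string"}, "full_text": {"type": "string"}}

# Illustrative record: full_text carries citation text, as the review notes.
record = {"version": "2", "full_text": "Inflammatory bowel disease. 4th."}

if __name__ == "__main__":
    print(type_errors(record, schema_before))
    print(type_errors(record, schema_after))
```

Under the original schema every well-formed citation string would be flagged, which is the mismatch the review comment points out.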
Pull Request Overview
Copilot reviewed 55 out of 66 changed files in this pull request and generated 5 comments.
      marked_xml=etree.tostring(get_xml(i), pretty_print=True, encoding='unicode')
      )
    - obj_reference.estatus = 2
    + obj_reference.estatus = ReferenceStatus.READY
The code uses obj_reference.estatus, but the model field was renamed to status in reference/models.py. Assigning to the old name does not raise an error: it sets a plain instance attribute that is never saved, so the status update is silently lost. Change estatus to status.
Suggested change:
    - obj_reference.estatus = ReferenceStatus.READY
    + obj_reference.status = ReferenceStatus.READY
      new_reference = Reference.objects.create(
      mixed_citation=post_reference,
    - estatus=1,
    + estatus=ReferenceStatus.CREATING,
The field name estatus should be status to match the renamed field in the Reference model.
Suggested change:
    - estatus=ReferenceStatus.CREATING,
    + status=ReferenceStatus.CREATING,
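The two changes discussed here (integers replaced by an enum, and estatus renamed to status) can be illustrated with a plain-Python stand-in. This is a sketch under assumptions: the values 1 and 2 are taken from the integers the diff replaces, any other members of the real `ReferenceStatus` are not shown in this excerpt, the real enum is likely a Django choices class, and `Record` is a hypothetical class, not the `Reference` model.

```python
from enum import IntEnum


class ReferenceStatus(IntEnum):
    # Values inferred from the diff: estatus=1 -> CREATING, estatus=2 -> READY.
    CREATING = 1
    READY = 2


class Record:
    """Plain-Python stand-in for a model instance (not a Django model)."""

    def __init__(self, status: ReferenceStatus):
        self.status = status


rec = Record(status=ReferenceStatus.CREATING)

# Assigning to the old name raises nothing: Python just creates a new
# attribute, and the real `status` value is left unchanged.
rec.estatus = ReferenceStatus.READY

if __name__ == "__main__":
    print(rec.status.name)   # still CREATING
    print(rec.estatus.name)  # READY, stored on an attribute nothing else reads
```

On a Django model the same assignment would also go unsaved, since only declared fields are written to the database. Passing an unknown keyword such as estatus= to `Reference.objects.create(...)` is different: assuming the field really was renamed, Django's model constructor rejects it with a TypeError.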
    'uri': {'type': 'string'},
    'access_date': {'type': 'string'},
    'version': {'type': 'string'},
    "full_text": {"type": "integer"},
The full_text field is defined with type 'integer' but should be 'string', since it contains text content, not numeric values.
Suggested change:
    - "full_text": {"type": "integer"},
    + "full_text": {"type": "string"},
    def form_valid(self, form):
    self.object = form.save_all(self.request.user)
    self.object.estatus = ProcessStatus.PROCESSING
The field name estatus should be status, based on the pattern of renaming fields from Spanish to English throughout the codebase.
Suggested change:
    - self.object.estatus = ProcessStatus.PROCESSING
    + self.object.status = ProcessStatus.PROCESSING
    instance.content_body = stream_data_body
    # Save the XML in the `file_xml` field
    #archive_xml = ContentFile(xml) # Creates a temporary in-memory file
    instance.estatus = ProcessStatus.PROCESSED
The field name estatus should be status to maintain consistency with the model field naming convention.
Suggested change:
    - instance.estatus = ProcessStatus.PROCESSED
    + instance.status = ProcessStatus.PROCESSED
In production, the environment variable LLAMA_ENABLED=True must be set.
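How the project reads this variable is not shown in the conversation. As a sketch only, a settings module could parse it along these lines; the `env_bool` helper and the accepted spellings are assumptions, not the project's actual code.

```python
import os


def env_bool(name: str, default: bool = False, environ=os.environ) -> bool:
    """Read a boolean flag from the environment.

    Accepts common truthy spellings; anything else counts as False.
    `environ` is injectable so the helper can be tested without touching
    the real process environment.
    """
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical settings entry: off unless explicitly enabled.
LLAMA_ENABLED = env_bool("LLAMA_ENABLED")

if __name__ == "__main__":
    print("LLAMA_ENABLED =", LLAMA_ENABLED)
```

Defaulting to False keeps development machines from trying to load the model unless the flag is set explicitly, which matches the note that production must opt in.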