Adiciona módulo Markup (Adds Markup module) #35

Open
pitangainnovare wants to merge 44 commits into scieloorg:main from pitangainnovare:eduranm-merge_markup

pitangainnovare commented Oct 20, 2025

In production, the following environment variable must be set:

LLAMA_ENABLED=True
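
The PR does not show how the application reads this variable. A minimal sketch, assuming a Django-style settings module that parses it from the environment (the helper name `env_flag` and the default value are hypothetical):

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean flag (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Accept the common spellings of "true"; anything else counts as False.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Assumed settings entry; the real project may read the variable differently.
LLAMA_ENABLED = env_flag("LLAMA_ENABLED")
```

Parsing the string explicitly matters because `bool("False")` is `True` in Python: any non-empty string is truthy.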

eduranm and others added 30 commits September 26, 2025 10:15
… of new apps and increase of the field limit
…s, language-tagged texts, and flexible date handling
…s and expansion of supported types (confproc, full_text, etc.)
…istas for search, utilities, and Wagtail hooks
…ial.py and removal of intermediate migrations
…n of Django and translation of verbose_name into English
Fixes the exception type so that a 404 is returned when the record does not exist.
…nlaces

Reduces log noise and keeps the function focused on its return value.
Improves readability and error-handling practices.
…a reference prompt

Adds quotes to text fields and fixes commas/keys to avoid prompt parsing errors.
Allows translation of 'Mixed Citation' and 'Rating from 1 to 10'.
Copilot AI review requested due to automatic review settings October 20, 2025 20:20

Copilot AI left a comment

Pull Request Overview

This PR merges the markup functionality, introducing AI-powered document processing that converts DOCX files to a structured XML format. The changes integrate LLM services (both Llama and Gemini) for metadata extraction, reference parsing, and content labeling.

Key changes:

  • Adds new model_ai and markup_doc applications for AI model management and document processing
  • Integrates Google Generative AI and python-docx libraries
  • Refactors reference processing to use data_utils module instead of tasks
  • Renames classes for consistency (e.g., ReferenceAdmin to ReferenceModelViewSet)

Reviewed Changes

Copilot reviewed 55 out of 66 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| requirements/base.txt | Adds dependencies for AI/ML (google-generativeai, langid) and document processing (python-docx) |
| reference/wagtail_hooks.py | Refactors import paths and renames admin class for consistency |
| reference/models.py | Adds ReferenceStatus enum and replaces raw integer field with structured status |
| reference/marker.py | Updates import path from llama3.generic_llama to model_ai.llama |
| reference/data_utils.py | Updates to use ReferenceStatus enum instead of raw integer |
| reference/api/v1/views.py | Updates to use ReferenceStatus enum and new import path |
| model_ai/* | New application for managing AI models (Llama/Gemini) with download functionality |
| markup_doc/* | New application for DOCX-to-XML conversion with AI-powered content extraction |
| markuplib/* | Adds DOCX processing utilities and OMML-to-MathML transformation |


],
"date": c2005,
"source": "Advanced practice nursing: an integrative approach",
"edition: "3rd ed",

Copilot AI Oct 20, 2025

The 'edition' key is missing its closing quote. It should be "edition": "3rd ed".
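
A quick way to see the effect of the missing quote is to run both variants through a JSON parser. This is only an illustration: the fragments below are wrapped in braces to form complete objects, and the surrounding prompt text is not necessarily strict JSON (the unquoted `c2005` value in the same example would also be rejected). The helper name `is_valid_json` is hypothetical.

```python
import json

# Fragments modelled on the reviewed lines, wrapped in braces to form objects.
broken = '{"source": "Advanced practice nursing: an integrative approach", "edition: "3rd ed"}'
fixed = '{"source": "Advanced practice nursing: an integrative approach", "edition": "3rd ed"}'


def is_valid_json(text: str) -> bool:
    """Return True when text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(broken))  # False
print(is_valid_json(fixed))   # True
```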

"date": 1995,
"source": "Inflammatory bowel disease",
"chapter": "The epidemiology of idiopathic inflammatory bowel disease.",
"edition: "4th",

Copilot AI Oct 20, 2025

The 'edition' key is missing its closing quote. It should be "edition": "4th".

Comment thread reference/config.py
'uri': {'type': 'string'},
'access_date': {'type': 'string'},
'version': {'type': 'string'},
"full_text": {"type": "integer"},

Copilot AI Oct 20, 2025

The 'full_text' field should have type 'string', not 'integer', because it contains citation text.

Suggested change
- "full_text": {"type": "integer"},
+ "full_text": {"type": "string"},

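
To illustrate why the declared type matters, here is a minimal hand-rolled check that mimics only the `type` keyword of JSON Schema. It is a sketch, not the validator the project uses: the function `type_errors`, the `PYTHON_TYPES` mapping, and the sample record are all hypothetical.

```python
# Excerpt of the schema properties discussed above, with the suggested fix applied.
properties = {
    "uri": {"type": "string"},
    "access_date": {"type": "string"},
    "version": {"type": "string"},
    "full_text": {"type": "string"},
}

# Map JSON Schema type names to Python types (only the two needed here).
PYTHON_TYPES = {"string": str, "integer": int}


def type_errors(record: dict, props: dict) -> list[str]:
    """List the fields in record whose value does not match the declared type."""
    errors = []
    for field, value in record.items():
        expected = props.get(field, {}).get("type")
        if expected in PYTHON_TYPES and not isinstance(value, PYTHON_TYPES[expected]):
            errors.append(field)
    return errors


# Hypothetical record: full_text carries citation text, so it is a string.
record = {"full_text": "example citation text"}

print(type_errors(record, properties))                               # []
print(type_errors(record, {"full_text": {"type": "integer"}}))       # ['full_text']
```

With the original `integer` declaration, any record carrying citation text in `full_text` would be flagged as invalid.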
robertatakenaka changed the title from "Eduranm merge markup" to "Adiciona módulo Markup" on Oct 20, 2025

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 55 out of 66 changed files in this pull request and generated 5 comments.



Comment thread reference/data_utils.py
marked_xml=etree.tostring(get_xml(i), pretty_print=True, encoding='unicode')
)
- obj_reference.estatus = 2
+ obj_reference.estatus = ReferenceStatus.READY

Copilot AI Oct 21, 2025

The code assigns to obj_reference.estatus, but the model field was renamed to status in reference/models.py. Assigning to the old name does not update the status field, so the new status would not be saved. Change estatus to status.

Suggested change
- obj_reference.estatus = ReferenceStatus.READY
+ obj_reference.status = ReferenceStatus.READY
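
It is worth checking how a stale attribute name actually fails. On an ordinary Python object, and for plain attribute assignment on a Django model instance as well, assigning to an unknown name creates a new attribute rather than raising. A minimal sketch using only the standard library: the enum values are inferred from the diff (1 replaced by CREATING, 2 by READY), the real enum may have more members, and `FakeReference` is a hypothetical stand-in for the model.

```python
from enum import IntEnum


class ReferenceStatus(IntEnum):
    # Values inferred from the diff; the real enum may define more members.
    CREATING = 1
    READY = 2


class FakeReference:
    """Plain-Python stand-in for the model; only `status` is a real field here."""

    def __init__(self):
        self.status = ReferenceStatus.CREATING


obj_reference = FakeReference()
obj_reference.estatus = ReferenceStatus.READY  # no error: creates a new attribute
print(obj_reference.status.name)  # CREATING

obj_reference.status = ReferenceStatus.READY
print(obj_reference.status.name)  # READY
```

The stale name therefore tends to show up as a status that never changes rather than as an exception.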

Comment thread reference/api/v1/views.py
new_reference = Reference.objects.create(
mixed_citation=post_reference,
- estatus=1,
+ estatus=ReferenceStatus.CREATING,

Copilot AI Oct 21, 2025

The field name estatus should be status to match the renamed field in the Reference model.

Suggested change
- estatus=ReferenceStatus.CREATING,
+ status=ReferenceStatus.CREATING,
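
Passing a stale field name as a keyword argument fails immediately at construction time. A minimal sketch with a dataclass standing in for the model: the `Reference` class below is hypothetical and its fields are assumptions, though Django's model constructor likewise rejects unexpected keyword arguments with a `TypeError`.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """Stand-in for the model after the rename; field names are assumptions."""

    mixed_citation: str
    status: int


try:
    # The old keyword no longer matches any field, so construction fails.
    Reference(mixed_citation="example citation", estatus=1)
except TypeError as exc:
    print(type(exc).__name__)  # TypeError

# With the renamed keyword the object is created normally.
new_reference = Reference(mixed_citation="example citation", status=1)
```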

Comment thread reference/config.py
'uri': {'type': 'string'},
'access_date': {'type': 'string'},
'version': {'type': 'string'},
"full_text": {"type": "integer"},

Copilot AI Oct 21, 2025

The full_text field is defined with type 'integer' but should be 'string' since it contains text content, not numeric values.

Suggested change
- "full_text": {"type": "integer"},
+ "full_text": {"type": "string"},


def form_valid(self, form):
self.object = form.save_all(self.request.user)
self.object.estatus = ProcessStatus.PROCESSING

Copilot AI Oct 21, 2025

The field name estatus should be status based on the pattern of renaming fields from Spanish to English throughout the codebase.

Suggested change
- self.object.estatus = ProcessStatus.PROCESSING
+ self.object.status = ProcessStatus.PROCESSING

Comment thread markup_doc/tasks.py
instance.content_body = stream_data_body
# Save the XML in the `file_xml` field
#archive_xml = ContentFile(xml) # Creates a temporary in-memory file
instance.estatus = ProcessStatus.PROCESSED

Copilot AI Oct 21, 2025

The field name estatus should be status to maintain consistency with the model field naming convention.

Suggested change
- instance.estatus = ProcessStatus.PROCESSED
+ instance.status = ProcessStatus.PROCESSED


3 participants