A repository for code and issues related to efforts to develop and benchmark a successor for reasoner-pydantic to be used in the new Translator architecture.
The time to beat in the original implementation was 0.007s Small, 0.05s Med, and 1.1s Large. Additional implementation may lengthen these, but it should be kept close.
- Fix performance issues caused by
Attribute,RetrievalSource, and (less so)Qualifier - Write pytests to check that parsing and dumping works
- Go through and add examples/etc. using pydantic field annotations
- Link all concepts together as appropriate
- Create an
__init__.pyso all items may be imported directly - Add hash methods for everything which needs to be hashable
- Add utility methods to parity with reasoner-pydantic
- Create a utility mixin so each object can be parsed/dumped without creating a new type adapter
- Create advanced validation methods for higher levels of validation
- Add regex validation where it's used in the spec
- Add advanced validation and utility methods for CURIEs and biolink interactions
- Beta phase: implement in Retriever and find any friction points/missing functionality/etc.
- Get feedback on design decisions
- Bring test coverage up after initial feedback
- Create a PyPI package
In some sections (such as knowledge_type), only certain values are used by existing systems, despite the field being an open string. In these cases I have added the implictly defined enum so in TOM it is explicit, as this may help to catch early validation errors. This does technically bring TOM out of sync with TRAPI, so it's worth group consideration.
I avoided doing this in areas where an enum is implied, but am unsure if Translator-wide use respects (e.g. Response.status)
Python Literals are slightly faster for serialization, so internally, literals are used. Enums are still provided for ease of use.
Some properties are non-required, but default to an empty list. I'm setting these as None, to save serialized space. This is in-line with intended TRAPI 2.0 changes, and doesn't break interoperability.
Hashing is used in several cases, including KG normalization. We use stablehash to ensure hashes are stable, and output hashes as unpadded base64url, with 120 bits truncation, by default. This offers a nice tradeoff of collision safety and hash shortening; all hashes are exactly 20 characters long. It does reduce the immediate recognizeability and readability of hashes to use base64, though, so I thought to raise it for discussion.
- Extra fields do not contribute to hashes
- Knowledge Node hash does not take
attributesorcategoriesinto account - MetaAttribute hash does not take into account name fields
- Message does not auto-normalize, and results do not auto-merge. You have to manually call the appropriate methods.
- BiolinkEntity, BiolinkPredicate, and BiolinkQualifier are now sub-types on the Biolink utility class.
- This causes one issue: BiolinkPredicate and BiolinkEntity don't show up the JsonSchema generated from these models (but the patterns are preserved )
There's initial tooling for semantic validation. It's not complete, but the general idea is:
- Import
from translator_tom import semantic_validate - Call
semantic_validate(<some model>) - Get back a list of warnings and a list of errors, with clear and specific descriptions and location tuples
This may end up being seen as redundant given the Reasoner Validator, but it was quick to prototype and could serve slightly-tanget use-cases (or become a core which the validator wraps around).