Better handling of attempted charges #3

@jeancochrane

There are two primary ways in which attempted charges are represented in our messy data:

  1. An (att) suffix is appended to the charge, like 430-6/5-2a1 (att)
  2. A separate charge code, 720-5/8-4, is prepended to the charge, like 720-5/8-4 720-570/402

Pattern 1) is relatively simple to detect, and we have a dedicated label for it in the ILCS parser. Pattern 2) is harder to detect because it looks like two entirely separate charges. Having two patterns also makes it difficult to match against the canonical set, since the canonical set can hold only one representation of an attempted charge.

I can think of two ways to approach this:

  • Data preprocessing: Check for the code 720-5/8-4 in the messy data and replace it with the (att) suffix. This should work well for cases of pattern 2) where there are two separate charges, but sometimes the attempt code is the only recorded charge, and I'm not sure how those cases will behave. It'll also be tricky because the instances of 720-5/8-4 are not all formatted the same way.
  • Update the parser to catch the charge prefix: Get the parser to assign 720-5/8-4 its own label indicating an attempted charge. This is potentially the most semantically correct approach, but it also seems difficult from a parsing perspective.
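
To make the preprocessing option concrete, here is a rough sketch of what it might look like. The function name `normalize_attempt` and both regexes are hypothetical, and the formatting variants the pattern tolerates (extra spaces, an optional single-character subsection like `(a)`) are guesses, since I haven't catalogued how 720-5/8-4 actually appears in the data. When the attempt code is the only recorded charge, the sketch leaves the string untouched rather than guessing.

```python
import re

# Assumption: 720-5/8-4 may show up with loose spacing or a one-character
# subsection suffix, e.g. "720 5/8-4" or "720-5/8-4(a)". Illustrative only.
ATTEMPT_CODE = re.compile(
    r"\b720[\s-]*5\s*/\s*8\s*-\s*4(?:\([a-z0-9]\))?(?![\w.])",
    re.IGNORECASE,
)
# An existing (att) suffix, so we don't end up adding a second one.
ATTEMPT_SUFFIX = re.compile(r"\(\s*att\s*\)", re.IGNORECASE)


def normalize_attempt(charge: str) -> str:
    """Rewrite a separate 720-5/8-4 code (pattern 2) as an (att) suffix (pattern 1)."""
    if not ATTEMPT_CODE.search(charge):
        # No attempt code present: nothing to rewrite.
        return charge.strip()

    # Drop the attempt code and any existing suffix, then collapse whitespace.
    remainder = ATTEMPT_CODE.sub(" ", charge)
    remainder = ATTEMPT_SUFFIX.sub(" ", remainder)
    remainder = " ".join(remainder.split())

    if not remainder:
        # The attempt code is the only recorded charge; leave it as-is.
        return charge.strip()
    return f"{remainder} (att)"
```

With this, `normalize_attempt("720-5/8-4 720-570/402")` gives `720-570/402 (att)`, while `430-6/5-2a1 (att)` and a bare `720-5/8-4` pass through unchanged. The open question of what to do with the bare code is deliberately left unresolved here.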

What do you think @fgregg?
