Skip to content
regetz edited this page Oct 13, 2025 · 7 revisions

Concept names

This applies to both plants and communities

  • What is special about the name attached directly to the concept record?
  • Given that most concepts have multiple usages, and each usage has a name, what is the expected relationship between the (directly linked) concept name and the usage names?

Concept status

This applies to both plants and communities

  • Presumably at the time of (new) concept upload, we only need to support one status record per (new) concept
  • Do we need to provide a way to create a new status record? Who and when? What happens to the old one?
  • In the case of multiple status records, do we ever need to return more than one? Is just the most recent one sufficient?
  • What if it's a "historical" query, i.e. could there be a desire to known what the status was sometime in the past, like at the time of observation record creation?

Community correlations

The commCorrelation table (currently 34,377 records) documents a particular relationship between two community concepts. Questions:

  • Why does commCorrelation link commConcept<-->commStatus, rather than commConcept<-->commConcept?
    • Note: In all current cases, the referenced commStatus points to a different commConcept than the one referenced directly by the commCorrelation record (which makes sense, but isn't enforced by the DB). Moreover, there are no cases where
  • Does it matter which one is linked via commConcept and which as commStatus?
  • What do we want to store here?
  • Do we need to support adding new correlations independently of adding new concepts, or is sufficient to expose this only at the time of adding new concepts (which can, via the loader tables, be correlated either to existing concepts in VegBank or to other new concepts being loaded).

How much can we simplify what we party information we load and store?

  • Ditch partyMember table. It has zero records.
  • Do we need all of the columns of party (which currently has 4,038 records)?
    • organizationname, surname, givenname, and maybe middlename all seem reasonable
    • email is populated for 324 parties
    • currentname_id is always null
    • salutation is null in all but 1 case
    • contactinstructions is null in all by 4 cases, 3 of which are organization URLs and 1 is uninformative
    • partytype is "person" in 102 cases, and null otherwise
    • What do we want to do with partyPublic?
    • Add ORCID field?
    • Add ROR field?
  • What address information do we want to capture, if any?
    • Current schema has separate address table with 622 entries
    • No parties have more than one address, which means current data could be a captured in the party table
    • 268 parties have no email in the party table, but do have email in the address table. (Plus 13 records with email in both tables; in all cases, the two email addresses are identical)
    • phonetype is usually "work"/"Work", less often "Not specified" and "Cell" in 2 cases
  • What telephone information do we want to capture, if any?
    • Current schema has separate telephone table with 404 entries
    • No parties have more than one telephone, which means current data could be a field in the party table
    • phonetype is usually "work"/"Work", less often "Not specified" and "Cell" in 2 cases

Denorm / calculated fields

The original VegBank application maintains a number of denormalized and calculated fields for performance reasons, beyond what is documented in the ERD. We need to either live without them, or introduce logic somewhere for keeping them updated.

  • Embargo info
    • plot.emb_plot
    • observation.emb_observation
    • disturbanceobs.emb_disturbanceobs
    • soilobs.emb_soilobs
    • commclass.emb_commclass
    • classcontributor.emb_classcontributor
    • comminterpretation.emb_comminterpretation
    • taxonobservation.emb_taxonobservation
    • taxonimportance.emb_taxonimportance
    • stemcount.emb_stemcount
    • stemlocation.emb_stemlocation
    • taxoninterpretation.emb_taxoninterpretation
    • taxonalt.emb_taxonalt
  • Plant concept info
    • plantconcept.plantname (set from plantname)
    • plantconcept.d_currentaccepted (set from plantstatus where (stopdate is null OR stopdate>now()) AND lower(plantconceptstatus)='accepted'))
    • plantusage.plantconcept_id (set from plantstatus)
    • plantusage.plantname (set from plantname)
    • plantusage.party_id (set from plantstatus)
  • Plant concept info
    • commconcept.commname (set from commname)
    • commconcept.d_currentaccepted (set from commstatus where (stopdate is null OR stopdate>now()) AND lower(commconceptstatus)='accepted'))
    • commusage.commconcept_id (set from commstatus)
    • commusage.commname (set from commname)
    • commusage.party_id (set from commstatus)
  • Observation denorms
    • comminterpretation.commname (set from commname)
    • plot.country: set to min(placename) from namedplace via place where placesystem='area|country|territory'
    • plot.stateProvince: set to min(placename) from namedplace via place where placesystem='region|state|province'
    • stratum.stratummethod_id: set to stratummethod_id from stratumtype
    • stratum.stratumname: set to stratumname from stratumtype
    • taxonimportance.stratumheight: set to stratumheight from stratum
    • taxonimportance.stratumbase: set to stratumbase from stratum
  • Observation counts and one date
  • Taxon observation plant interpretations
    • taxonobservation.int_origplantconcept_id set to min(plantconcept_ID) from taxoninterpretation
    • taxonobservation.int_origPlantSciFull set to sciname from temptbl_std_plantnames based on int_origplantconcept_id (classsystem 'Scientific', party_id 511, usage stop is NULL)
    • taxonobservation.int_currPlantSciNameNoAuth set to scinamenoauth from temptbl_std_plantnames based on int_origplantconcept_id (classsystem 'Scientific without authors', party_id 511, usage stop is NULL)
    • taxonobservation.int_currPlantCode set to code from temptbl_std_plantnames based on int_origplantconcept_id (classsystem 'Code', party_id 511, usage stop is NULL)
    • taxonobservation.int_currPlantCommon set to common from temptbl_std_plantnames based on int_origplantconcept_id (classsystem 'English Common', party_id 511, usage stop is NULL)
  • Observation original comm interpretations (first one created)
    • observation.interp_orig_ci_ID set to comminterpretation.comminterpretation_ID via commclass with min(classStartDate)
    • observation.interp_orig_cc_ID set to commconcept_ID from comminterpretation using interp_orig_ci_ID
    • observation.interp_orig_sciname set to sciname from temptbl_std_commnames using interp_orig_ci_ID
    • observation.interp_orig_code set to code from temptbl_std_commnames using interp_orig_ci_ID
    • observation.interp_orig_party_ID set to party_ID from classcontributor via comminterpretation using using interp_orig_ci_ID LIMIT 1
    • observation.interp_orig_partyName set to party_id_transl from view_party_transl using interp_orig_ci_ID
  • Observation current comm interpretations (last one created based on commclass.classstartdate)
    • observation.interp_current_ci_ID set to comminterpretation.comminterpretation_ID via commclass with max(classStartDate)
    • If interp_current_ci_ID == interp_orig_ci_ID, set curr to what was just set for orig everywhere (cc_ID, sciname, code, party_ID, partyname), else repeat the 5 updates above using interp_current_ci_ID instead of interp_orig_ci_ID
    • observation.interp_bestfit_ci_ID set to comminterpretation_ID from [view_comminterp_more] LIMIT 1 ordered by classfitnum (asc), classconfidencenum (desc)
    • If interp_bestfit_ci_ID == interp_orig_ci_ID, set curr to what was just set for orig everywhere (cc_ID, sciname, code, party_ID, partyname), else if interp_bestfit_ci_ID == interp_current_ci_ID use what was just set for curr everywhere, else repeat the 5 updates above using interp_current_ci_ID
  • Observation top taxa
    • observation.topTaxon2Name set to coalesce(int_currplantscinamenoauth, authorplantName) from view_taxonobs_withmaxcover ordered by maxcover limit 1 offset 1
    • observation.topTaxon3Name set to coalesce(int_currplantscinamenoauth, authorplantName) from view_taxonobs_withmaxcover ordered by maxcover limit 1 offset 2
    • observation.topTaxon4Name set to coalesce(int_currplantscinamenoauth, authorplantName) from view_taxonobs_withmaxcover ordered by maxcover limit 1 offset 3
    • observation.topTaxon5Name set to coalesce(int_currplantscinamenoauth, authorplantName) from view_taxonobs_withmaxcover ordered by maxcover limit 1 offset 4
    • observation.topTaxon1Name set to coalesce(int_currplantscinamenoauth, authorplantName) from view_taxonobs_withmaxcover ordered by maxcover limit 1 offset 0
    • observation.latitude potentially round reallatitude depending on confidentialitystatus (see here)
    • observation.longitude potentially round reallongitude depending on confidentialitystatus (see here)
  • (more TBD? -- see starting here)

Clone this wiki locally