Document created on: July 5, 1996

A Description of the Pacific Fisheries Information Network (PacFIN) 1981-1996

6.1 Introduction

Since 1974, the Pacific States Marine Fisheries Commission (PSMFC) has worked actively with its member States and Federal fisheries agencies to improve the quality and timeliness of fisheries data collection, processing, and analysis, and to produce the regionally coherent data summaries required for regional conservation and management purposes. This effort had its formal inception in recommendations from albacore fishing industry leaders that Pacific Coast fisheries agencies organize and consolidate West Coast fish landings and effort data, and information concerning vessel characteristics, into coastwide coherent form for all Pacific Coast fisheries. Those leaders recognized that simple summation of separate State summaries can lead to serious misconceptions, because of major differences in definitions and methods, and can produce misleading conclusions for highly mobile fisheries.

This coastwide data coordination and consolidation effort received major impetus from enactment of the Magnuson Fisheries Conservation and Management Act (MFCMA) of 1976, which established Regional Fishery Management Councils charged with management of regional fishery resources as units throughout the geographical range of the species on the basis of the best available scientific and statistical information. It was clear that regionally comprehensive and coherent fisheries data were needed on a timely basis to provide the information base required by these Regional Fishery Management Councils.

Regional fisheries data coordination requires effective cooperation and mutually supportive interactions among the State fisheries agencies, which on the Pacific Coast collect all commercial catch statistics from domestic fishers who land their catch at shoreside ports in the United States, and the Pacific area National Marine Fisheries Service (NMFS) Regions and Centers, which are responsible for collecting all data for fisheries that operate in the MFCMA Exclusive Economic Zone (EEZ). To assure effective communication and cooperation among those State and Federal entities, the Pacific area has been served since 1974 by a sequence of regional coordinating committees composed of representatives from the participating agencies. First there was the Albacore Coordination Committee and its Data System Task Group, which was superseded under NMFS sponsorship by the Coastwide Data Task Force, then by the Committee on Goals and Guidelines for Regional Fisheries Data Collection, which was restructured in 1980 as the Pacific Coast Fisheries Data Committee (PCFDC).

The PCFDC consists of 13 members appointed by the directors of the participating agencies. The participating agencies are: the Alaska Department of Fish and Game; the California, Idaho, Oregon, and Washington Departments of Fish and Wildlife; the six Centers and Regions of the NMFS; the Pacific and North Pacific Fishery Management Councils (PFMC and NPFMC); and the PSMFC. The member appointed by the Alaska Fisheries Science Center (AFSC) represents both the AFSC and the Northwest Fisheries Science Center.

This Data Committee was chartered in 1980 with four stated goals:

  1. To implement and manage a Pacific Fisheries Information Network that aggregates detailed and summarized State and Federal fisheries data for use by fishery managers and associated agencies;
  2. To provide data management consultation and technical advice to the Councils' Management Teams and participating agencies upon request;
  3. To establish priorities and coordinate plans to improve the efficiency, effectiveness and timeliness of the data acquisition and delivery with a minimum of unnecessary duplication; and
  4. To promote the development and implementation of coastwide data collection standards to facilitate the merging of fisheries data into the PacFIN system.

Another very important agreement specified in this PCFDC charter was that the NMFS would pay the necessary travel and per diem costs for members not supported by other Federal funds to attend meetings, as well as the salary, travel, and other normal costs necessary to support any designated staff and consultants. One of the vehicles used to implement this funding support of the PCFDC and PacFIN by NMFS was a Memorandum of Understanding (MOU) drawn up in 1981 between the Northwest and Alaska Fisheries Center (NWAFC) and the PCFDC. This MOU laid out the responsibilities of the NWAFC in terms of providing office space for the PacFIN staff and PacFIN's use of the NMFS computer system(s) in Seattle. Since the NWAFC was separated into two agencies, the AFSC has continued to honor the terms of this original MOU.

6.2 PacFIN 1981 Through 1987

6.2.1 The Initial System for PFMC Groundfish

In February of 1981 the PCFDC hired the System Designer/Manager to design and implement the PacFIN system. Prior to this the PCFDC had met approximately eight times over two years, producing an initial requirements document that became the starting point for system development. One requirement was that the system be operational within six months. Another was that input data be provided to the central database monthly, on the 15th of each month; the data for the month ending 15 days earlier were to be 90% complete, and all earlier months were to be more complete than the most recent month. Confidentiality of data was discussed at the February 1981 PCFDC meeting, and the issue was viewed as formidable. The consensus of the committee was to avoid the confidentiality issue by specifying a system that required only data aggregated, or reduced, to some reasonable and useful higher level. Specifically, individual fish-tickets and vessel registration records were ruled out of consideration as input to the central database. The PFMC's Groundfish Management Team (GMT), originally called the Plan Team, specified two initial reports as their primary retrieval requirements: one report displaying monthly catch by species by INPFC area, and another displaying monthly catch by species by data source, including foreign countries and joint-ventures. A system specification was produced by May, and a number of specification review meetings were subsequently held, resulting in modifications to the original specification. Development proceeded, and the initial implementation of the PacFIN system was operational in October 1981.

The system was developed on the Burroughs B7800 system owned, operated, and maintained by the Office of Fishery Information Systems (OFIS) of the NWAFC. The DMSII database management system and the Algol programming language were the primary tools used to build this 1981 system. When the system went on-line in October, it included a single input transaction type containing the following data elements: trans-type, input-aggregation-level, year, month, day, species, area, gear, port/country/JV, weight-of-catch, number-of-landings, number-of-fish, and dollar-value. The transactions provided by WDFW, ODFW, and CDFW (W-O-C) were daily aggregates, while those provided by the NWAFC were weekly aggregates. Trans-type, originally intended to take the values +, *, and - (indicating add, change, and delete operations), was changed almost immediately to & and -, indicating that only two operations would be performed: change-or-add and delete.
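To illustrate, the 1981 input transaction and its two update operations can be sketched as follows. This is a hypothetical Python rendering: the field names come from the list above, but the types, the record layout, and the keying scheme are illustrative assumptions (the actual system used DMSII records and Algol on the B7800).

```python
from dataclasses import dataclass

# Hypothetical sketch of the 1981 aggregated-catch input transaction.
# Field names follow the document; types are illustrative only.
@dataclass
class AgcTransaction:
    trans_type: str        # '&' = change-or-add, '-' = delete
    agg_level: str         # input aggregation level (e.g. daily or weekly)
    year: int
    month: int
    day: int
    species: str
    area: str
    gear: str
    port_country_jv: str   # port of landing, foreign country, or joint venture
    weight_of_catch: float
    number_of_landings: int
    number_of_fish: int
    dollar_value: float

def apply_transaction(db: dict, t: AgcTransaction) -> None:
    """Change-or-add on '&', delete on '-', keyed by the classification fields."""
    key = (t.year, t.month, t.day, t.species, t.area, t.gear, t.port_country_jv)
    if t.trans_type == '&':
        db[key] = t            # replace if present, add if not
    elif t.trans_type == '-':
        db.pop(key, None)      # delete, ignoring records not present
```

The change-or-add semantics mean a data source can resubmit a corrected aggregate without first issuing a delete, which is presumably why the three-operation scheme was collapsed to two.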

The data elements species, area, gear, and port/country/JV allowed the data providers to specify a particular species (or collection of fish), a particular catch area, a particular gear, and either the port-of-landing where the fish were delivered or the foreign country or joint-venture enterprise to whose vessels the catch was delivered. One of the very important efforts during this initial six-month process was the establishment of a set of coastwide (i.e. systemwide) PacFIN codes for species, areas, gears, and ports/countries/joint-ventures. Since each data source had its own coding system, it was deemed critical that the PacFIN system be based on a set of codes that would have the same meaning throughout the geographical range that PacFIN was intended to address, as well as across all time periods (i.e. annual data sets) that were envisioned to some day be included in the PacFIN database.

The PFMC GMT's reporting requirement for reports displaying catch for each groundfish species by each INPFC area and data source was soon expanded to include species-by-gear, species-by-port, and species-by-month reporting requirements. It became very apparent that in order to achieve this kind of reporting, data would need to be summarized as it was received from the data sources and stored in an on-line summary table. This summary-catch table included the following data elements: year, aggregation-level, period, species-id (spid), area-id (arid), gear-id (grid), port/country/joint-venture (pcid), pounds, pounds-that-were-priced, and estimated revenue of pounds. The summary-catch table is essentially a five-dimensional array that allows for the storage and retrieval of any combination of period, spid, arid, grid, and pcid. The earliest summary-catch tables contained about 600,000 rows per year. In this summary structure a period could be any month or the year. A spid could be a species, a species complex, or a management group. An arid could be a PSMFC area, an INPFC area, or all PFMC areas combined. A grid could be a gear, a gear group, or all gears combined. And a pcid could be a port, a port group, all ports for a state combined, all ports for all states combined, a foreign country, all foreign countries combined, all joint-ventures combined, all foreign countries and joint-ventures combined, or all ports for all states and all foreign countries and joint-ventures combined. The first two reporting programs, rpt/area and rpt/source, used this summary-catch table to produce the first reports in October 1981: report #001, PFMC Groundfish by INPFC Areas; and report #002, PFMC Groundfish by Source.
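The summary-catch structure described above can be sketched as follows. This is an illustrative Python rendering: the five-part key mirrors the period:spid:arid:grid:pcid scheme, but the 'ALL' rollup code, the example codes, and the numbers are hypothetical, and the real table supported many more aggregation levels than the two rollups shown.

```python
from collections import defaultdict

# Illustrative sketch of a summary-catch table keyed by
# (period, spid, arid, grid, pcid), accumulating pounds and revenue.
# 'ALL' is an assumed rollup code, not an actual PacFIN code.
def summarize(detail_rows):
    summary = defaultdict(lambda: [0.0, 0.0])  # key -> [pounds, revenue]
    for period, spid, arid, grid, pcid, pounds, revenue in detail_rows:
        # accumulate the fully specific cell plus gear and port rollups
        for g in (grid, 'ALL'):
            for p in (pcid, 'ALL'):
                cell = summary[(period, spid, arid, g, p)]
                cell[0] += pounds
                cell[1] += revenue
    return summary

rows = [
    ('1981-10', 'SABL', 'COL', 'TWL', 'AST', 1000.0, 250.0),
    ('1981-10', 'SABL', 'COL', 'HKL', 'AST',  400.0, 120.0),
]
s = summarize(rows)
# retrieval of any combination, e.g. all gears combined at one port:
total = s[('1981-10', 'SABL', 'COL', 'ALL', 'AST')]
```

Because every rollup combination is materialized as its own row, a reporting program only ever reads single cells, which is what made the species-by-gear, species-by-port, and species-by-month reports simple extensions of the first two.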

In addition to the rpt/area and rpt/source programs, the rpt/gear, rpt/port, and rpt/period programs were developed by October 1981. These three additional programs provided the PFMC GMT with report #009 Groundfish by Gear Group, report #010 Groundfish by Port Group, and report #013 Groundfish by Month. All of these first reports (001, 002, 009, 010, and 013) were coastwide reports that retrieved and displayed coastwide, or PFMC-wide, information. Since the database contains summaries for essentially every combination of period:spid:arid:grid:pcid, it was a natural and relatively simple extension to enhance the reporting system to produce similar reports specific to each data source. These enhancements were made and production and distribution of agency specific reports ensued. These agency specific reports were originally intended as feedback to PacFIN agency coordinators so that they would be able to compare PacFIN compiled summaries with their own agency generated statistics. But almost immediately these state specific reports became the primary information source for some agency managers, biologists, and economists.

This set of five programs became the primary PacFIN reporting system. The capabilities of this retrieval system continued to be enhanced as new functions and features were suggested by PacFIN clients or were deduced as a result of day-to-day interactions with those users. Many of the extensions and enhancements that were made in the first few years were a direct result of suggestions and requests made by ODFW personnel.

In January of 1982 the PFMC GMT requested that a count of fish-tickets classified as groundfish, pink shrimp, etc. be included in the central database as there were no other measures of effort that were readily available and that were comprehensive in scope (i.e. W-O-C coastwide). This count of fish-receiving tickets became known as deliveries and it was specified that these deliveries would be aggregated by each data source by management group, area, gear, and port. Some of the data sources were able to develop the requisite software rather quickly and the PacFIN central processing system was enhanced to handle this additional data. However, coastwide reporting of deliveries information did not commence until March of 1987 when all PFMC data sources were able to provide groundfish deliveries to the central processing system. Nevertheless, deliveries statistics by management group became an integral part of the PacFIN system starting in 1982, even though reporting of this information was relegated to agency specific reports for the first five years.

6.2.2 Foreign and Joint-venture Data

In October of 1982 an initiative was started to include the foreign and joint-venture catch for the NPFMC. The data source for this data was the NWAFC. This effort had its roots in the fact that some fishery scientists who participated in the PFMC fishery management process were also involved with NPFMC fishery management. These users wanted the same kinds of reports for the NPFMC that they were used to receiving for the PFMC. In addition, the PCFDC charter includes the NPFMC as one of the two councils that PacFIN was intended to support. It was known at the start of this effort that ADFG had elected not to participate, but since the vast majority of the groundfish was being harvested in the foreign and joint-venture fisheries, it was decided to proceed without ADFG's involvement. The data provided by the NWAFC consisted of weekly aggregates of landed catch by species, area, gear, and foreign country or joint venture. This expansion was completed by the end of November 1982 and included data starting in 1981.

6.2.3 Incorporating ADFG Data

In April of 1983, efforts to include ADFG data in the central database, in order to round out the NPFMC portion of the database, were under way. Again, this exercise was truly an expansion, since most of the effort went into developing new codes for species, areas, ports (actually sub-regions), and gears; only minor software changes were required. ADFG provided its data in April of 1984: a complete set of aggregated catch transactions for 1981 through 1983. The most significant deviation from the W-O-C data was that, since ADFG did not include the port-of-landing in their fish ticket data file, the ADFG sub-regional office in which the ticket was received and entered was used in lieu of port-of-landing. ADFG provided essentially the same data elements as the W-O-C data sources, except that their transactions were monthly aggregates instead of daily.

The incorporation of ADFG aggregated catch data within the PacFIN central database was an important event for NPFMC fishery biologists, but it soon became apparent that this event was even more important for fishery economists. The PacFIN central database had essentially become the primary, and sometimes the only, source of economic information for the PFMC and NPFMC fisheries. The PacFIN database now included catch statistics for all fish harvested in U.S. waters (0-200 miles) from California to Alaska.

6.2.4 Data Series Merger

In February of 1983 an analysis was performed to determine whether the PSMFC Data Series and the PacFIN system were both necessary. The Data Series consisted of a set of tables (hard copy), going back to the 1950s, that included catch for groundfish and shrimp species. This data was submitted annually by the Canadian Department of Fisheries & Oceans (DFO) and the four west coast U.S. marine fishery agencies. The analysis indicated that with certain enhancements to the PacFIN system, the PSMFC Data Series system could be eliminated. The required enhancements were: groundfish trawl-hours by month by PSMFC area; logbook-adjusted catch by month by PSMFC area; logbook-adjusted species composition; and the inclusion of DFO data in the PacFIN database. The first specification to incorporate these changes was drawn up in April of 1984. This proposal was presented to the Technical Sub-committee (TSC) of the U.S.-Canada Groundfish Committee at their annual meeting in June of 1984. During 1984 both the PCFDC and the TSC endorsed this major improvement to the PacFIN system. Additional meetings were held in September of 1984 and in April of 1985 to work out the details of this specification change. At the close of the April 1985 meeting, all U.S. data source agencies agreed to the enhanced specification, which by this time included grade (or size) and condition. One agency estimated that these enhancements could be implemented and the data provided by May of 1986; other agencies estimated November of 1986 for compliance; and one agency estimated no earlier than November of 1986, and most likely much later.

With this agreement finalized, the DFO was invited to participate in the PacFIN as a data source and as a user. The DFO enthusiastically agreed to participate and the first DFO data (1981 through 1984) was incorporated into the PacFIN system in May of 1987. A schedule for submitting annual, and eventually semi-annual, data feeds was established with the DFO PacFIN coordinator at that time and DFO has continued to provide the agreed-to data semi-annually.

By May of 1987, only two of the five agencies involved were able to provide logbook-adjusted catch data and trawl hours by PSMFC area and month as agreed. The Data Series Merger project was never completed in its original agreed-to form; it was the first real failure for the PacFIN project. Although the project was an overall failure, a few important successes did result: DFO became a PacFIN data source and user; starting in 1988 the PacFIN system began providing an annual report to the TSC consisting of landed-catch of groundfish species by INPFC areas for the entire U.S.-Canada domestic harvest; and this failure was one of the main events that set the stage for the conceptualization and implementation of the re-definition project.

6.2.5 PacFIN Salmon

At the direction of the PCFDC, in February of 1983 the first PacFIN system specification to address salmon was distributed to members of the PCFDC, the PFMC Salmon Team, data source agencies, and other potential users. This proposal to incorporate W-O-C salmon data into the central PacFIN database met with stiff resistance. The very first meeting with the PFMC Salmon Team on this subject had occurred in October of 1981; the first proposed specification was presented to the PCFDC in December of 1981; and during the first part of 1982 the Inseason Salmon subsystem within the PacFIN Office was developed and put into production use. So the PCFDC and PacFIN were no strangers to the enormous expenditure of effort needed to accomplish anything within the salmon arena. Nevertheless, members of the PCFDC were surprised by this resistance. No real progress was made on this project until it was decided to establish users other than the PFMC Salmon Team as the targeted users of the future salmon-enhanced PacFIN system. Re-packaging the project as an "historical" salmon database, so that there was no longer any reference to "inseason", was instrumental as well. Likewise, abandoning a full-blown system, including recreational catch data, to support PFMC salmon activities, and instead settling on the minimum extensions necessary to the existing PacFIN groundfish system, proved beneficial to the process.

The project then got on track, and in September of 1984 agreement was reached by the PCFDC's salmon subcommittee on an enhanced specification to incorporate salmon data into the PacFIN central database. The W-O-C agencies agreed to provide one year of data (1983) by January 1985 for testing purposes. This enhanced specification called for the same kind of daily aggregated catch (AGC) transaction as was already in place for groundfish, with the addition of size/grade, condition, and participant-group (Indian commercial and non-Indian commercial). Days-fished was added to the aggregated effort (AGE) transaction, which already included deliveries and trawl-hours. This project overlapped with the Data Series Merger project, and so the two newly augmented specifications evolved into a single expanded specification that satisfied both projects. By October of 1985 all three W-O-C agencies had provided the agreed-to 1983 AGC salmon transactions, and two of the W-O-C agencies had provided the required AGE transactions. At the October 1985 subcommittee meeting, after reports displaying the new 1983 salmon data were reviewed with much discussion, all potential users present viewed the 1983 pilot project as successful, and it was agreed by all parties to proceed with full implementation (1981-1985). By March of 1987 two of the W-O-C data sources had provided completed salmon data sets for 1981-1986, and these were successfully incorporated into the central PacFIN database. The remaining data source did provide its data, but it was subsequently discovered that the number-of-fish statistic had been computed incorrectly. This incorrect computation of number-of-fish plagued the PacFIN project for many years to come, and still does to a certain extent.

Incorporating salmon data into the PacFIN central database turned out to be a very difficult and expensive proposition. However, there were benefits. While the first PacFIN system was operational (OCT81 through MAR95; data years 81-94; also known as FINDB), the PacFIN Office was able to respond to many requests for coastwide (W-O-C) salmon catch data. In fact, it became known far and wide (mostly by economists) that PacFIN was the place to go for comprehensive W-O-C commercially caught salmon pounds and revenue by state, port, gear, species, and/or month.

6.2.6 The Quota Species Monitoring (QSM) Subsystem

The QSM subsystem had its beginnings in December of 1983, following the PFMC meeting held the previous month. At the November 1983 PFMC meeting, individuals from the fishing industry requested estimates of how much cumulative catch had already been recorded toward the annual quota, or guideline, for widow rockfish, sablefish, and other quota species. The industry asked that this best estimate of catch be updated monthly or possibly weekly; their primary purpose in requesting this capability was their own production planning. Although the PacFIN system had been in place and routine reports were being produced and distributed monthly, the total cumulative catch contained in those reports was not current enough for industry's needs. Of the three W-O-C data sources, the agency with the most current data lagged by 15 to 45 days, depending on the day of the month the data was retrieved from the central database; the agency with the least timely data had a lag of anywhere from 4 to 5 months. The data input frequency to the PacFIN central database was (and still is) once per month, nominally on the 14th of the month.

So out of this December 1983 meeting came an agreement that one member of the PFMC GMT would compile weekly catch reports for widow rockfish, POP, yellowtail rockfish, the sebastes complex, and sablefish, phoned in by one individual for each agency. On a weekly cycle the GMT volunteer combined the weekly catch reports with data from the PacFIN database to produce the "Best Estimate of Catch" report through the previous Saturday. The weekly catch reports were multiplied by a correction factor based on the assumption that the catch collected by each port biologist was something less than the actual catch for that week. In the beginning the correction factors were quite rough, but as a base of weekly catch reports accumulated, improved correction factors became available. Eventually correction factors were computed by comparing annual summations of weekly reported catches to comparable data in the PacFIN central database. This manual method continued in this manner from January 1984 through October 1985.
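The correction-factor arithmetic described above amounts to the following minimal sketch. The numbers are made up for illustration; the actual factors were developed per species and agency.

```python
# Hedged sketch of the QSM correction-factor idea: the ratio of the
# finalized annual catch in the central database to the annual sum of
# the weekly phoned-in reports gives a multiplier applied to new
# weekly reports to produce a "best estimate" of actual catch.
def correction_factor(annual_pacfin_catch: float,
                      annual_weekly_reported: float) -> float:
    return annual_pacfin_catch / annual_weekly_reported

def best_estimate(weekly_report: float, factor: float) -> float:
    return weekly_report * factor

# e.g. if port samplers captured about 80% of actual catch last year:
f = correction_factor(annual_pacfin_catch=1_000_000.0,
                      annual_weekly_reported=800_000.0)   # 1.25
estimate = best_estimate(weekly_report=40_000.0, factor=f)  # 50,000 lb
```

The reason the early factors were rough follows directly from this formulation: until a full year of weekly reports had accumulated, there was no annual summation to compare against the finalized database totals.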

In August of 1985 it was decided that the time was right for an automated implementation of this QSM subsystem. The QSM software development commenced in September of 1985 and was completed by the end of October. The first reports from the automated system were produced in November of 1985. The very short development time for this project was a direct result of the system having been implemented as a manual system and operated in this manner for more than one year. The greatest benefit of this "manual-before-automated" approach is that the processes are then well understood and the resulting specification for the automated system is very accurate on the first attempt. The QSM subsystem has become the single most important PacFIN software tool for the PFMC's inseason fisheries management. The original report from 1985 included only six species-area combinations for the three W-O-C agencies. By 1996 it has grown to 18 species-area combinations and other ancillary information has been added to and deleted from the report over the years.

Modifications and enhancements to this subsystem have become an annual event in order to incorporate changes, when possible, that reflect the ever-changing management regime of the PFMC. Some of the most significant changes/additions over the years have been:

  1. In June 1987, all sablefish unknown-gear catch from the central database was apportioned based on proportions implicitly existing in the PacFIN database;
  2. In November 1990, Dover sole and thornyheads were added;
  3. In February 1993, coastwide thornyheads was changed to Columbia, Eureka, and Monterey only; coastwide sablefish became Vancouver, Columbia, Eureka, and Monterey only; and Dover sole for the Columbia area was added in addition to coastwide;
  4. In December 1993, the Columbia area was divided into north and south for yellowtail rockfish only, creating the Vancouver and North Columbia and the Eureka and South Columbia management areas for yellowtail; and
  5. In December 1994, Columbia, Eureka, and Monterey thornyheads were replaced by coastwide longspine and shortspine thornyheads.

6.2.7 NMFS/AKR as a Data Source

During the second quarter of 1986, domestic at-sea processors began to operate in the EEZ off Alaska in significant numbers. Previously, only one or two at-sea processors operated in the NPFMC EEZ, and the catch data for those vessels was provided to PacFIN via WDFW since the catch (i.e. processed product) was off-loaded at the port of Seattle. The NPFMC groundfishery, which had originally been primarily a foreign fishery, had evolved into a joint-venture fishery and with this event was moving very quickly toward a completely domestic fishery. This new wave of at-sea processors was not off-loading product at any domestic U.S. port but was instead shipping the processed product from the at-sea vessel directly to the foreign market. Due to these circumstances, the NMFS' Alaska Regional Office (AKR) became a data collection agency for domestic catch data. This was the first time that a west coast NMFS agency had become responsible for primary data collection of a domestic commercial fishery, although NMFS agencies had been responsible for data collection and/or catch estimation in the earlier and co-existing foreign and joint-venture fisheries. One very interesting aspect of this new arrangement was that the economic value (i.e. revenue) of the fish caught and processed was not made a requirement within the federal regulations that authorized this federal data collection, despite the fact that the MFCMA requires that all Fishery Management Plans (FMPs) receive appropriate review from the social and economic viewpoints, and even though this fishery, long targeted for total domestication, has been estimated to be valued at over $500 million per year in ex-vessel, unprocessed revenue.

The PacFIN Office was first alerted to the potential need to include the NMFS/AKR as a PacFIN data source when one of the prolific AFSC users of PacFIN data informed the PacFIN Office that the PacFIN reports being published at the time were essentially incorrect and misleading, since they did not include this major component of the NPFMC fishery. And so the scramble was on to figure out what to do about the "missing" catch data. The solution was to accept a weekly data file from the NMFS/AKR and then transform the data into the standard AGC PacFIN transaction format so that the data could be incorporated into the PacFIN central database using existing software. Sometime around July of that year, NMFS/AKR became a regular PacFIN data source. Incorporating NMFS/AKR as a data source included: developing the transformation software; adding new areas; minimally modifying the reporting software; and developing a new ex-vessel, unprocessed revenue estimating algorithm. This NPFMC revenue estimating algorithm was necessary because the input data contained no revenue; all estimates of revenue for the at-sea processor fishery were made using average prices derived from the ADFG component of the database.
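The revenue-estimating approach can be sketched as follows. This is an illustrative Python rendering: the species code, poundages, and prices are invented, and the actual algorithm presumably stratified prices more finely than this simple species-level average.

```python
# Sketch of the ex-vessel revenue estimating idea described above:
# since the at-sea processor reports carried no dollar value, revenue
# was estimated from average prices implicit in the ADFG component of
# the database. All codes and numbers here are illustrative.
def average_prices(adfg_rows):
    """adfg_rows: (species, pounds, revenue) tuples -> {species: price/lb}."""
    totals = {}
    for species, pounds, revenue in adfg_rows:
        lb, rev = totals.get(species, (0.0, 0.0))
        totals[species] = (lb + pounds, rev + revenue)
    return {sp: rev / lb for sp, (lb, rev) in totals.items() if lb > 0}

def estimate_revenue(atsea_rows, prices):
    """atsea_rows: (species, pounds) tuples -> estimated ex-vessel revenue."""
    return sum(pounds * prices.get(species, 0.0)
               for species, pounds in atsea_rows)

# hypothetical example: ADFG landings imply a price of $0.10/lb
prices = average_prices([('PLCK', 2_000_000.0, 200_000.0)])
rev = estimate_revenue([('PLCK', 500_000.0)], prices)
```

A species with no priced ADFG landings falls back to zero here, which is one reason estimates built this way carry more uncertainty than directly reported revenue.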

After this initial start-up event for NMFS/AKR, subareas and new species categories were added to the system at least annually. Over the years the only transaction type used by the NMFS/AKR data source has been the aggregated-catch (AGC), which allows the data source to input landed-catch by week for species or species assemblages, by catch-area and gear.

6.3 PacFIN After 1987

6.3.1 Re-definition Project - Specification Process

By the time 1988 arrived, a number of concurrent events pointed toward the need for a new and improved PacFIN system. First of all, the ability to provide input data to the central database at the PSMFC area level was never fully achieved on a coastwide (including Canada) basis. At least one agency, which was able to compute the necessary ACM and SCM proportions on a monthly frequency, was unable to combine this distribution of catch-to-area and catch-to-species data with the fish-ticket data in order to produce the aggregated catch transactions for input to the PacFIN system. Another shortcoming was the inconsistency of the PacFIN central database with the NMFS/SWR's Research Database (RDB). The RDB was, at the time, the only W-O-C coastwide data system that contained individual fish-ticket and vessel data. Researchers and fishery management professionals would attempt to perform analyses using data from both the PacFIN database and the SWR's RDB, and these attempts to use both data systems in a single analysis were often unsuccessful. One of the main inconsistencies was that the RDB contained only landed-weight, with no provision for a round-weight equivalent expansion factor, while the PacFIN central database contained only round-weight equivalent catch data. A third major problem was that for many years many PacFIN users were disappointed that rockfish market-categories were not included in the PacFIN database. It had been obvious for some time that although most users were more than happy with the performance of the PacFIN system, these same users were becoming more and more disappointed by the content of the PacFIN central database. And finally, the PFMC made a specific request to the PCFDC to include in the PacFIN database all species that are commercially harvested. This PFMC request was initiated primarily by economists, who need the whole picture when attempting to describe or analyze the economics of a particular harvesting sector.

All of these events led to the first proposal, in July of 1988, that the PacFIN system specification be re-defined such that individual fish-tickets and catch-by-area (ACM) and species-composition (SCM) proportions be specified as input data for W-O-C data sources, replacing the existing aggregated-catch transaction. This issue was first taken up by the PCFDC in December 1988. The PCFDC appointed a subcommittee to study and refine the proposal, and this subcommittee first met in March of 1989. At that first re-definition meeting many new ideas were proposed and discussed, but the two most significant proposals that expanded the scope of the project were that vessel registration files would be included and that the fish-ticket base data would contain "unreduced fish-ticket data." This "unreduced fish-ticket data" meant there would be no combining of individual fish-ticket-lines by the data sources, and it necessitated the incorporation of the agency-code-list table, which translates W-O-C agency codes for species, areas, gears, and ports to coastwide PacFIN codes. The proposal to include vessel registration files brought the age-old issue of confidentiality squarely into focus once again. After much discussion of this issue, over many months, it was decided that any confidentiality problems could be overcome, but to this day the confidentiality issue has never been entirely resolved for all W-O-C data sources. The proposal to include vessel registration files along with the fish-ticket data was essentially a proposal to merge the NMFS/SWR's Research Database project into the PacFIN system as part of this re-definition.
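The agency-code-list table amounts to a translation lookup of roughly the following shape. This is a minimal sketch; every code shown is a hypothetical example, not an actual PacFIN or agency code.

```python
# Minimal sketch of the agency-code-list idea: a table translating each
# W-O-C agency's own codes for species, areas, gears, and ports into
# coastwide PacFIN codes. All codes below are invented examples.
AGENCY_CODE_LIST = {
    # (agency, code_type, agency_code) -> pacfin_code
    ('ODFW', 'species', '603'): 'SABL',
    ('WDFW', 'species', '224'): 'SABL',
    ('ODFW', 'gear',    '50'):  'TWL',
}

def to_pacfin(agency: str, code_type: str, agency_code: str) -> str:
    """Translate one agency code to its coastwide PacFIN equivalent."""
    try:
        return AGENCY_CODE_LIST[(agency, code_type, agency_code)]
    except KeyError:
        # an unmapped code is a data problem, not a silent pass-through
        raise ValueError(
            f"no PacFIN translation for {agency} {code_type} code {agency_code}")
```

Centralizing the translation this way is what allows "unreduced" fish-ticket-lines, still carrying agency codes, to be loaded without each agency pre-converting its own data.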

From April 1989 until October 1990 a number of revisions of the re-defined system specification were distributed far and wide, suggestions were analyzed and incorporated where appropriate, and a couple of meetings of the re-definition subcommittee were convened. Finally, in October 1990 an agreement was reached by all parties concerned on the specification for the new system. Included in this agreement was a statement of the scope of the project: each W-O-C data source would contribute data for all commercially harvested marine and anadromous species, and ACM and SCM proportions would be provided by all W-O-C data sources starting with the 1987 data year. The exclusion of ACM and SCM proportions for 1981-1986 meant that the agency-provided aggregated catch data for those years would be retained as the base, detail data for the best estimates of rockfish catch by area and species for 1981-1986; conversely, the combining of ACM and SCM data with fish-ticket data would occur only for 1987 and later. With agreement on the re-defined system specification accomplished, the full PCFDC authorized the implementation to proceed.

During the October 1990 re-definition subcommittee meeting where the final negotiated settlement was forged, the subcommittee addressed two significant questions: "Why is the re-defined system needed?" and "Why should all commercially harvested species be included?" The answers to these questions as developed at that meeting in October of 1990 are included here.

Why the Re-defined System is Needed

  1. The need for WDFW groundfish catch data by PSMFC area. This level of detail is currently not available from either the central database or the SWR-RDB.
  2. Fish-ticket data available on an in-season basis. Among other benefits this will allow the GMT to perform trip frequency analysis during the season. This capability is not available via either database.
  3. Availability of historical fish-ticket data. Although the SWR-RDB does contain fish-ticket data for 1981-1988, implementation of the re-defined system would make this data available to users with less delay.
  4. Ad-hoc retrievals of fish-ticket data are currently not available, but would be in the redefined system.
  5. Lowest level species data. Although the SWR-RDB contains considerably more detailed fish-ticket data than the central database in Seattle, neither database contains unreduced fish-ticket data.
  6. Single data feed from data sources (W-O-C) to a single central PacFIN database. The data providers consider this very desirable, and the user community would no longer have the problem of resolving the inconsistencies inherent in retrieving data from two different databases which are assumed to be consistent.
  7. The lack of a round-weight conversion factor within the SWR-RDB makes it difficult to use data from the SWR-RDB and the MDB in concert.
  8. Catch for individual rockfish species is required. W-O-C combine species composition data with fish-ticket data producing catch for individual rockfish species which are then input to the central MDB in Seattle. The SWR-RDB does not include species composition data.
  9. The existing fisheries information resource consisting of two databases is an artificial split which was a result of confidentiality rules established in 1981. Those confidentiality rules have been revised and therefore this artificial separation is no longer necessary.
  10. Groundfish catch by PSMFC area is also required. The SWR-RDB does not contain catch by area. The re-defined system will contain catch by area.

Why all Commercially Harvested Species should be Included

  1. The recent PFMC limited entry proposal required analysis of vessels participating in the groundfish and other fisheries.
  2. The PFMC does not manage fish directly. Instead the PFMC manages harvest units, and therefore a complete view of all harvest units within the PFMC region is required.
  3. An all-species database is required in order to perform regulatory impact reviews and economic analyses to meet the requirements of the MFCMA, Executive Order 12291, NEPA, and the Regulatory Flexibility Act.
  4. Future projections of effort shift due to management measures require an all-species database.
  5. Species other than groundfish are harvested with groundfish gear. The harvest of these other species is needed for analysis of various issues such as consistency of federal-state regulations.

The final agreement, including the scope of the project, did not please everyone; this was to be expected given the make-up of the re-definition subcommittee. All participants agreed, however, that the scope as settled was large enough, since one of the objectives was to complete the implementation as quickly as possible. A non-exhaustive, non-prioritized list of additional desirable PacFIN datasets or subsystems was therefore developed by the subcommittee and augmented by the full committee a month later. That list included:

  1. Joint-venture tow (logbook) data for the W-O-C
  2. ADFG vessel, fish-ticket, and composition data
  3. NMFS/AKR catcher-processor data
  4. Offshore processor data for the PFMC
  5. Port sampling data for WDFW, ODFW, and CDFW
  6. Domestic log book data for WDFW, ODFW, and CDFW
  7. Vessel summaries for W-O-C

Of these seven items, only the vessel summaries for W-O-C have been implemented to date.

6.3.2 Re-definition Project - Development

With the two-year effort to re-define the PacFIN database concluding in the PCFDC's authorization to proceed with implementation, work began immediately on the development process. The PacFIN office, in conjunction with PSMFC management, decided to employ an outside software developer to aid in this major project. Development began in earnest in January 1991 and was the main focus of the PacFIN office for the next two and one-half years. Of course, operation and maintenance of the existing system continued concurrently while this development moved forward.

The first data to test the new re-defined system began to arrive in June of 1991. As would be expected, there were a few specification changes, but by April of 1992 the new system was able to correctly process all ten of the new transaction types. The module that processed the fish-ticket-line (ftl) transaction type required the most development time; part of this effort was the implementation of a new algorithm for estimating prices for those ftl transactions input without a price. This algorithm used actual prices exclusively and took into account price differences for different grades, landing conditions, dispositions, and other ftl characteristics. Loading all data back to 1981 commenced in April of 1992, and by April of 1993 all data for 1981 through 1991 had been incorporated into the new "Redef" database.
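The price-estimation step can be illustrated with a simplified sketch. The real PacFIN algorithm is more elaborate; here prices are assumed to be averaged, weighted by pounds, within strata defined by species, grade, condition, and disposition, and all field names (spid, grade, condition, disposition, price, pounds, estimated) are illustrative rather than the actual schema.

```python
from collections import defaultdict

def estimate_missing_prices(ftl_rows):
    """Fill in missing prices on fish-ticket-line rows using actual prices.

    Simplified sketch of the PacFIN approach: only actual (reported)
    prices are used, averaged within strata defined by ftl
    characteristics such as species, grade, condition, and disposition.
    Field names here are illustrative, not the real PacFIN schema.
    """
    # Accumulate revenue and pounds for rows that carry an actual price.
    totals = defaultdict(lambda: [0.0, 0.0])  # stratum -> [revenue, pounds]
    for row in ftl_rows:
        if row.get("price") is not None:
            key = (row["spid"], row["grade"], row["condition"], row["disposition"])
            totals[key][0] += row["price"] * row["pounds"]
            totals[key][1] += row["pounds"]

    # Apply the pounds-weighted average price to rows lacking a price.
    for row in ftl_rows:
        if row.get("price") is None:
            key = (row["spid"], row["grade"], row["condition"], row["disposition"])
            revenue, pounds = totals.get(key, (0.0, 0.0))
            if pounds > 0:
                row["price"] = revenue / pounds
                row["estimated"] = True  # flag the value as derived, not reported
    return ftl_rows
```

For example, two priced landings of 100 lbs at $0.50 and 300 lbs at $0.70 in the same stratum yield a weighted average of $0.65 for an unpriced landing in that stratum.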

With the transaction processing portion of the system essentially complete, the focus turned to developing software to summarize ftl data and to combine that data with ACM and SCM data with the resulting data being stored in the summary-catch (SC) table. In January of 1993 the software developer delivered the first reasonable version of the summarization module that could be tested. Testing/debugging commenced and by May of 1993, the FTL-ACM-SCM summarize/combine software had been completely checked-out for two of the W-O-C agencies using 1987 data and all problems had been resolved. Summarizing 1988 through 1992 data commenced for these two agencies and was completed in October of 1993. The entire W-O-C 1987-1992 summary-catch data was finally available from the PacFIN central database in October of 1994.

While the FTL summarize software was being developed, the suite of programs that produced all of the standard reports was re-worked to operate using the new database. In addition new software was developed to generate various data extracts as specified by certain PacFIN users. One of the first significant uses made of the new system was by the NMFS/NWR's Groundfish Permit Office. Starting in February of 1993 and continuing on until the end of the year the staff of the NWR verified the fishing history for at least 950 groundfish permit applications using the new re-defined database.

6.3.3 Vessel Summaries

A number of other useful applications based on the new re-defined system were implemented long before the new system was completed in its entirety. One of these was the vessel summary subsystem. As the re-definition project was being developed, the west-coast economists requested that a set of vessel summaries be generated for each year starting with 1981. This request, initially made by the NMFS/SWR economists in June of 1991, was reviewed and endorsed by all west-coast economists. These vessel summaries were intended to replace the vessel summaries that were part of the SWR's RDB, and in addition to being a replacement the newly specified summaries provided considerably more detail. Due to a lack of staff resources, the project languished on the "to do" list until October of 1992, when development started. A request from the U.S. Coast Guard (USCG) provided additional impetus. The USCG request was similar in that summaries by individual vessel-id were requested, but it was considerably smaller in scope since management groups were requested instead of individual species. The USCG project was accomplished in less than a month and helped move the larger project toward full development the next spring. The summary files are of two kinds: a vessel summary file, containing aggregates of pounds and revenue for each vessel-id and for 13 other dimensions, and a trip-principal file, containing derived vessel characteristics such as principal port, principal gear, and principal species. The project was re-started in April of 1993 and the first annual vessel summaries were made available to a few economists by June. The monthly and weekly vessel summaries were being distributed by September of 1993. After a few enhancements had been incorporated based on feedback from economists and others, the vessel summaries project concluded in February of 1994, with little or no change since then.
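The derivation of the trip-principal characteristics can be sketched as follows. This is an assumption-laden illustration: the "principal" value of each attribute is taken here to be the one accounting for the greatest revenue for the vessel, and the field names are hypothetical, not the actual vessel-summary layout.

```python
from collections import defaultdict

def principal_attributes(ftl_rows):
    """Derive principal port, gear, and species for each vessel.

    Sketch only: for each vessel-id, the principal value of an
    attribute (port, gear, species) is assumed to be the one with the
    greatest total revenue.  The real PacFIN rules may differ.
    """
    # vessel -> attribute name -> attribute value -> accumulated revenue
    revenue = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for row in ftl_rows:
        for attr in ("port", "gear", "spid"):
            revenue[row["vessel"]][attr][row[attr]] += row["revenue"]

    # Pick the highest-revenue value for each attribute of each vessel.
    principals = {}
    for vessel, attrs in revenue.items():
        principals[vessel] = {
            attr: max(values, key=values.get) for attr, values in attrs.items()
        }
    return principals
```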

6.3.4 Re-development on Unix/Oracle

In May of 1993 the National Marine Fisheries Service (NMFS) concluded a ten-year project to replace all of its primary computing resources with homogeneous hardware and software by awarding a contract to install servers running the Unix operating system and the Oracle relational database management system (DBMS). For the PacFIN project this meant migrating the system from the Unisys B7900. The new system (Orca) was first made available to users such as the PacFIN Office in February of 1994, but because of various difficulties with configuring the Oracle RDBMS, effective use of Oracle did not begin until September of 1994. From March through June 1994 and from April through June 1995, all members of the PacFIN staff received training in one or more of these operational or software development tools: the Unix operating system; the Oracle RDBMS; the 'C' programming language; and the X-Window user-interface network software. This training was provided by the NMFS as part of the IT-95 contract award.

In order to assist in the re-development of the PacFIN system in the Unix/Oracle environment an outside Oracle developer was employed. This developer assisted in all phases of the re-development from October 1993 until February of 1996.

The PacFIN system resided on the same computer system from February 1981 until March of 1995, when the B7900 system was shut down permanently. Prior to this event the system had been re-designed for the Oracle environment, all necessary tables had been created, and all data had been unloaded, transferred to the Unix system, and loaded into the Oracle PacFIN tables. Not all functions and features of the system had yet been implemented, but all existing data was available for retrieval. Development and testing continued during 1995: the QSM system came on-line in June, transaction processing was operational in August, the AGC summarize portion of the system was functioning by January of 1996, and the FTL-ACM-SCM summarize/apportionment capability was completed in March of 1996. This last event brought the system to a reasonable level of completion.

As of this writing there are two major subsystems still in the process of being converted to operate on the Orca system: the vessel summary subsystem and the *_rpt suite of retrieval programs that generate standard reports. Both of these subsystems, once completed, will include capabilities otherwise not available to the PacFIN user community. While these two subsystems have been in development, retrieval routines employing the Oracle SQL*Plus query/reporting language have provided many of the capabilities included in these two subsystems. Because of these retrieval routines the PacFIN users with online access to the Orca system have had nearly all of the capabilities of the system available to them since April of 1996.

6.3.5 Limited Entry Permit Tables

In August of 1995 the PacFIN Office accepted the task of incorporating limited entry groundfish permit data into the PacFIN database. The groundfish permit data is collected from each applicant by the staff of the Permit Office of the NMFS/NWR and is stored in a computer system developed and maintained by that office. After meetings with the appropriate Permit Office staff in the fall of 1995, the necessary PacFIN tables were designed and populated with the existing limited entry permit (LEP) data. Since the entire LEP history file is relatively small (less than 3,000 rows), it was decided to use a refresh technique rather than an add/change/delete transaction method. This means that the entire file of limited entry permits maintained by the NMFS/NWR is forwarded to the PacFIN Office monthly; all existing LEP data are deleted and the new data file is inserted in its place. This refresh process requires only minutes to complete. Although the refresh process includes some validation to catch erroneous data, the accuracy and quality of this data is entirely the responsibility of the NMFS/NWR.
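The refresh technique amounts to a delete-all-then-insert-all replacement within a single transaction. The sketch below uses Python's sqlite3 module as a stand-in for the Oracle DBMS; the lep table and its columns (permit, vessel_id, length) are illustrative only, not the actual PacFIN layout.

```python
import sqlite3

def refresh_lep(conn, lep_rows):
    """Replace the entire limited-entry-permit table with a fresh copy.

    Sketch of the monthly refresh technique: because the LEP history is
    small (under 3,000 rows), all existing rows are deleted and the new
    file is inserted in its place, rather than applying
    add/change/delete transactions.  sqlite3 stands in for Oracle here;
    the column layout is illustrative.
    """
    cur = conn.cursor()
    cur.execute("DELETE FROM lep")  # discard the old copy entirely
    cur.executemany(
        "INSERT INTO lep (permit, vessel_id, length) VALUES (?, ?, ?)",
        lep_rows,
    )
    conn.commit()  # the delete and insert take effect together
```

Running the refresh twice leaves exactly one copy of the most recent file in the table, which is the property that makes the method simpler than transaction-based maintenance.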

6.4 The PacFIN System

6.4.1 Overall Data Flow

The PacFIN system receives all of its data from four state fishery agencies, three NMFS offices, the U.S. Coast Guard (USCG), and the Department of Fisheries and Oceans, Canada (DFO). The chart "PacFIN Data Flow 21JUN96" describes the complete PacFIN data flow from data sources to central database to data users. There are ten different transaction types and two kinds of data files used by these nine data sources to supply all of the data to the central database. The following table summarizes the relation between data source and input transaction or data file.

Transaction Type or Data File       CDFW ODFW WDFW ADFG AKR  AFSC DFO  USCG NWR
Agency-code-list (ACL)               X    X    X
Fish-ticket (FT)                     X    X    X
Fish-ticket-line (FTL)               X    X    X
State-Vessel (SV)                    X    X    X
Species-composition (SCM)            X    X    X
Catch-by-area-composition (ACM)           X    X
Average-weight (AW)                       X
Effort-by-area-composition (ECM)          X
Aggregated-effort (AGE)                        X    X    X    X
Aggregated-catch (AGC)                         X    X    X    X
Merchant Vessels of the U.S.                                            X
Limited Entry Permit (LEP) History                                           X

All data destined for the PacFIN central database arrives on Orca via one of five methods: internet FTP directly to the Orca system, initiated at either the sending or receiving end; a 9600 BPS (or faster) Kermit file transfer from a data source bulletin board to a PacFIN Office computer system, followed by an FTP to Orca; a diskette containing MS-DOS ASCII file(s) sent via the U.S. mail, or hand carried, with the file(s) subsequently moved to Orca using FTP; an 8mm Unix tape containing an ASCII file, with the transfer performed by the Orca operations staff; or a 9-track tape, which has not been used for some time.

The frequency of update varies by data source. For the PFMC state agency data sources the update frequency is monthly, while for the AFSC, which is also a PFMC data source, the update frequency is weekly during the very short at-sea-processor Pacific whiting fishery. The ADFG and AKR data sources are weekly data providers. The DFO is scheduled to provide a preliminary data feed each May for the previous calendar year, with a final update due in November. The NWR's LEP history file is provided monthly, and the USCG's vessel data file is obtained annually. These are the agreed-upon update frequencies.

The update frequencies mentioned above are nominal, and for some agencies the actual update frequencies diverge considerably from those expected. The three W-O-C data sources are very consistent, providing their monthly data feeds on, or about, the 14th of each month, and the NMFS/AKR continues to contribute its data on a regular weekly or biweekly basis. In contrast to this excellent record of complying with the agreed-upon update frequencies, the ADFG and DFO data sources represent the other extreme. The final 1993 DFO data feed was delivered in March of 1995 (due in November 1994); the first DFO 1994 data feed was provided on schedule in May of 1995, but the final 1994 data feed had not been received by the end of June 1996, and likewise the preliminary 1995 data feed due in May of 1996. Final data feeds for 1994 and 1995 were received from ADFG in March of 1996, but no data for 1996 had been received from ADFG as of the end of June 1996.

Data completeness of the PFMC PacFIN data varies from ODFW, which is normally 90-95% complete 15 days after the end of each month, to CDFW, which is usually about 90% complete three and one-half months (105 days) after the end of each month. These completion estimates are based on the total groundfish catch added each month and do not take into account any changes in the ACM and/or SCM data, or the possible total absence of ACM and/or SCM data.

6.4.2 PacFIN Data

All of the data supplied by the PacFIN data sources is validated, to some degree, and then stored in database tables. The diagram "PacFIN Database Structure Chart 24JUN96" lists all of the PacFIN central database tables and describes the basic relationships between and among them. The table titled "PacFIN Report: Column Descriptions" contains a description of each column in the PacFIN database; the reader is referred to it for detailed column descriptions. The following paragraphs describe each table in a more general sense.

6.4.2.1 Tables: sp, ar, gr, pc, ag, cl

The PacFIN code-list tables, species (sp), areas (ar), gears (gr), ports-jvs-cntry (pc), agencies (ag), and code-list (cl), are all populated with data originating from within the PacFIN Office. The sp table contains one entry for each PacFIN species code, or identifier (spid). The other columns, or attributes, in the sp table are used to group, order, or further describe each entry. This table is used extensively by the validation, update, and retrieval portions of the system. For example, when each ACL and AGC transaction is added to the system, this table is inspected to ensure that each specified spid exists in the sp table. During the update process the summarized column determines whether summarized data will reside in the summary-catch (sc) table for that spid; also during the update process, the spid, complex, and mgrp columns determine the spid(s) that are maintained in the sc table. To one extent or another all of the columns in the sp table are used during one or more of the retrieval processes; however, the "...order" columns and the "...flag" columns are used exclusively for retrieval purposes. The "...order" columns are used as aids in ordering rows in specific reports, and the "...flag" columns are a means for selecting particular spids for inclusion in certain reports. A complete list of all PacFIN species codes is produced on the Orca system using the "list_sp" sqlplus retrieval routine.

The ar table is structured very much like the sp table in that its columns are used for validation, update, retrieval, and general description purposes. There is, however, one very significant difference: each entry in the ar table contains only the larger group (argroup) to which the arid belongs, whereas each sp row contains all of the groups or complexes in which the particular spid participates. For example, the sp row where spid = DBRK contains complex = ROCK, mgrp = GRND, and complex2 = SBTS, while the ar row where arid = 2C contains argroup = CL, the ar row where arid = CL contains argroup = PC, and the ar row where arid = PC contains argroup = ALL. Another very important column in the ar table is type, which is used to group area ids within the same area system together, or to specify subareas within a system. For example, the ar row where arid = AS contains type = 5, while the ar row with arid = 2A contains type = 1. Type = 1 specifies that the area is a PSMFC groundfish area, while type = 5 specifies that the area is a salmon area.
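Because each ar row carries only its immediate argroup, recovering the full hierarchy for an arid requires walking a chain of lookups, as in the 2C to CL to PC to ALL example above. A minimal sketch, modeling the ar table as a dictionary from arid to argroup:

```python
def area_chain(ar, arid):
    """Walk the argroup chain for an area id.

    Each ar row carries only its immediate parent group, so the full
    hierarchy is recovered by repeated lookup (cf. 2C -> CL -> PC -> ALL
    in the text).  The ar table is modeled here as a plain dict.
    """
    chain = [arid]
    while arid in ar:          # stop when the id has no recorded parent
        arid = ar[arid]
        chain.append(arid)
    return chain
```

This contrasts with the sp table, where a single row already lists every group (complex, mgrp, complex2) in which the spid participates, so no chain walk is needed.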

The gr table is similar to the ar table in that each row contains a grgroup column like the argroup in table ar. The pc table is structured like the ar table as well. The gr and pc tables are also similar to the sp and ar tables in that they are used in the validation, update, and retrieval portions of the system.

The ag table is a very short table containing one entry for each data source, or potential data source. The 90%-completion-estimate date (i.e. month) is maintained within this table.

The cl table contains 20 code lists. Each code list corresponds (more or less) to a specific column in the database; all of the possible codes for columns such as condition, disposition, and grade are contained in this table. What characterizes these code lists, and why they are all maintained in the same table, is that their primary use is for validation during update, and they have no grouping attributes like the sp, ar, gr, and pc tables. In addition, a description of each code is contained in this table, providing some of the most important documentation of the data in this database.

Nearly all of the data residing in PacFIN tables either originated with one of the data sources or was added for data management purposes. The data management category includes the elements ulid and modified; an inspection of the description of the tables populated from input transactions will show that ulid and modified are contained in each of them.

The agency-code-list (ACL) tables consist of five tables that contain all of the relations that exist between state agency codes and PacFIN codes. The data source codes in these tables are those that reside, or potentially may reside, in the ft and/or ftl tables. At this time only the three W-O-C data sources supply PacFIN with ft/ftl data, and therefore the ACL tables contain only W-O-C species, area, gear, port, and processor codes. All five tables are structured similarly, so a description of the asp table should serve to describe all five.

6.4.2.2 Tables: asp, aar, agr, apr, apc

Each row of the asp table contains an agency ticket-category and a PacFIN species code (i.e. spid). In addition each row includes a description of the code and the agency-id, which is used to qualify the ticket-category.
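An asp lookup can be sketched as a simple keyed translation, with the table modeled as a dictionary from an (agency-id, ticket-category) pair to a PacFIN spid. The codes used in the test below ("O", "203", DOVR) are hypothetical examples, not actual agency categories.

```python
def to_pacfin_spid(asp, agency, category):
    """Translate a state agency ticket-category to a PacFIN species code.

    Sketch of an asp-table lookup: each asp row relates an agency-id
    and ticket-category to a PacFIN spid.  The table is modeled as a
    dict keyed on (agency, category); an unknown pair is treated here
    as a validation error.
    """
    try:
        return asp[(agency, category)]
    except KeyError:
        raise ValueError(
            f"no asp entry for agency {agency!r}, category {category!r}"
        )
```

The aar, agr, apr, and apc lookups would follow the same pattern for areas, gears, ports, and processors.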

A note about the naming of columns and the meaning of species, species-code, market-category, ticket-category, and category is appropriate at this point. Although each W-O-C data source uses the terminology "species code" to refer to the identifier for the biological mass that is the main subject of each ftl row, "species code" is really a misnomer. Only in some cases does the "species code" contained in the ftl row actually refer to a specific, scientifically identifiable species (e.g., Microstomus pacificus); most state-agency-supplied "species codes" refer to some collection of harvested marine animals. The primary force determining the manner in which these assemblages are harvested is the market, and so these collections of species have come to be known as market categories. But even that handle misses the mark, because a market category is also determined by the condition of the fish at landing, the disposition (i.e. market), the grade or size, and possibly other attributes. And so the column containing this state "species code" in the asp table was given the name category during the re-definition process, meaning simply a category of fish/animals.

As mentioned, the aar, agr, apr, and apc tables are structured essentially the same as the asp table. And thank goodness we don't have the same problem with the meaning of columns area, gear, port, and processor that we have with category!

There is one last anomaly and it has to do with the apc table. The apc table is the only agency-code-list table that does not contain the relationship between a data source code and a PacFIN code. A set, or domain, of PacFIN processor codes has never been established. The set of processor codes, with each code's qualifying agency-identifier is in essence the set of unique coastwide W-O-C processor codes.

6.4.2.3 Tables: ft, ftl, sv

The ft and ftl tables are easily the largest tables that contain data supplied by data providers. The total number of rows, in thousands of rows, contained in these 32 tables is included below.

Number of Rows in the ft and ftl Tables as of June 26, 1996

Units = Thousands of rows

FT FTL
Year WDFW ODFW CDFW Total WDFW ODFW CDFW Total
Total 1,830 833 1,947 4,610 3,388 2,017 3,774 9,179
1981 138 67 144 349 281 158 285 724
1982 131 61 152 344 238 138 312 688
1983 110 48 113 271 210 114 231 555
1984 98 40 107 245 185 90 217 492
1985 113 54 117 284 225 121 232 578
1986 128 63 122 313 230 141 250 622
1987 146 70 129 345 283 176 255 714
1988 163 87 147 396 294 191 283 768
1989 150 77 132 360 280 174 257 710
1990 138 60 135 333 239 134 255 629
1991 122 50 131 303 226 115 244 586
1992 108 43 124 275 197 119 228 544
1993 103 37 123 263 191 120 221 532
1994 87 31 115 232 143 94 208 446
1995 77 34 119 230 136 98 228 462
1996 19 12 36 67 28 34 67 129

Although both the ft and ftl tables are logically one table each, the design that was considered optimum for this implementation called for a set of annually partitioned tables. So the logical ft and ftl tables consist of 16 physical tables each. The ft table contains one row for each landing receipt recorded by the fishery agencies of the states of California, Oregon, and Washington. Each ft row contains data that occurs at most once on each physical, hardcopy fish-ticket. Another way of characterizing the data contained in each ft row is that all of the data elements are trip specific, i.e. each data element describes some attribute of a fishing trip.

On the other hand, each row in the ftl table contains one entry for each market-category line that was recorded as part of each fish-ticket document. All of the data elements included in this table are essentially attributes of the ticket-category that was harvested. The estimated and worst_est columns in the ftl table are unique in that they are not data input by the data sources and they are not data management attributes. These two elements can be classified as "derived" attributes since the values for these columns are derived as a result of the update process for ftl transactions. The drvid column (meaning derived vessel id) in the ft table is developed in a similar fashion using the contents of columns veid and vesseltype in conjunction with data in the sv table.

The sv table nominally contains one row for each vessel registered to harvest fish commercially. There is one row for each vessel registered by each W-O-C agency for each year. If a vessel was registered by one state fishery agency each year then there would be 16 entries for that vessel, one for each year. If a vessel was licensed to fish commercially in all three W-O-C jurisdictions each year then there would be 48 entries in the sv table for this vessel-id.

It is known that for some of the earlier years the set of CDFW "registered vessels" is not strictly the set of vessels registered in that year, but rather that set plus vessels registered in the subsequent calendar year but not in the year specified. The sv table is used primarily during the update process, but is also used when vessel attributes, such as length and name, are requested. The following table contains the number of rows in the sv table by year and agency.

Number of rows in table sv as of June 26, 1996

YEAR WDFW ODFW CDFW TOTAL
Totals 61,075 55,257 124,326 240,658
1981 5,768 4,865 9,947 20,580
1982 5,488 4,408 9,947 19,843
1983 5,264 4,130 9,713 19,107
1984 3,440 2,581 9,123 15,144
1985 4,455 3,146 8,437 16,038
1986 4,313 3,225 7,931 15,469
1987 4,384 3,898 7,151 15,433
1988 4,129 3,870 7,478 15,477
1989 4,137 3,774 7,476 15,387
1990 4,156 3,622 7,420 15,198
1991 3,755 3,567 7,614 14,936
1992 3,384 3,535 7,362 14,281
1993 3,212 3,092 6,801 13,105
1994 2,392 2,760 6,343 11,495
1995 2,142 2,563 6,136 10,841
1996 656 2,221 5,447 8,324

6.4.2.4 Tables: acm, scm, ecm, aw

The acm, scm, ecm, and aw tables are referred to collectively as proportion tables, or simply proportions. Even though the aw table contains average weights, which are rates rather than proportions, it is included in this group since it functions in much the same manner. The data contained in these tables, along with the ft and ftl data, is the part of the system that distinguishes the "re-defined" PacFIN from the earlier PacFIN system. All four of these tables include the columns coeffvar and samples. The coeffvar is the coefficient of variation and was included in the specification with the intention of documenting the quality of the estimated proportion. It was soon learned that not all data sources could provide the coeffvar, so the column samples was added; samples is simply the number of samples, or observations, used to develop the estimated proportion. All proportion tables include year, month, day, and period. These four items are referred to collectively as "time-period" in the subsequent paragraphs.

The catch-by-area proportion (acm) table contains proportions that are used to distribute catch for a particular time-period, comptype, pcid, grid, spid, grade, and triptype to the fishing areas specified in column arid. Of these seven attributes that describe a stratum, only grade and triptype may be empty, or null. Grade is typically used only for sablefish and salmon species, and triptype is used exclusively for salmon. Acm transactions are used by both WDFW and ODFW to distribute catch of groundfish species categories, and by ODFW to distribute catch of salmon species in a similar manner. The acm transaction was originally proposed in the context of distributing groundfish catch to PSMFC areas, where individual vessel logbooks contain the basic, or lowest-level, data used to develop these proportions. The inclusion of acm proportions for salmon species was a result of the specification review process, which uncovered the need for these proportions in order to produce the "best available data" given individual fish tickets as the lowest-level data. Below is a table that displays the number of acm rows by year and data source.

Number of Rows in Table acm as of June 26, 1996

YEAR    WDFW    ODFW    CDFW    TOTAL
Totals  26,574  51,495          78,069
1987    2,260   6,968           9,228
1988    3,150   6,754           9,904
1989    3,229   6,895           10,124
1990    3,282   6,067           9,349
1991    3,332   5,500           8,832
1992    3,460   4,762           8,222
1993    3,173   5,652           8,825
1994    2,515   3,251           5,766
1995    2,173   4,414           6,587
1996            1,232           1,232
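The mechanics of an acm-style apportionment can be sketched as follows. This is an illustrative sketch, not PacFIN source code: the function name, the area codes, and the tolerance check are assumptions, but it shows how a stratum's catch is split across the areas named in column arid by the stored proportions.

```python
# Illustrative sketch (not PacFIN code): distribute one stratum's catch to
# fishing areas using acm-style proportions keyed by arid.

def apportion_catch(pounds, proportions):
    """Split a catch total across areas; proportions is a dict keyed by arid."""
    total = sum(proportions.values())
    if abs(total - 1.0) > 0.001:          # proportions for a stratum should sum to 1
        raise ValueError("acm proportions for a stratum must sum to 1")
    return {arid: pounds * p for arid, p in proportions.items()}

# Example: 10,000 lbs of a groundfish category landed at one port in one
# month, distributed to three hypothetical PSMFC areas.
areas = apportion_catch(10000.0, {"2A": 0.25, "2B": 0.50, "2C": 0.25})
```

Because each original stratum aggregate is replaced by its apportioned pieces, the total pounds are unchanged and no catch is double counted.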

The species-composition (scm) table contains proportions that are used to distribute catch for a particular time-period, pcid, arid, grid, and unspid (unspecified species assemblage) to the species specified in column spid. Of these five attributes that describe a "stratum", only the pcid or the arid may be empty, but not both. Scm transactions are used by all three W-O-C data sources to distribute rockfish assemblages to individual species. These proportions are developed from rockfish samples taken at selected west coast ports and/or from trawl logbooks. Below is a table that displays the number of scm rows by year and data source.

Number of Rows in Table scm as of June 26, 1996

YEAR    WDFW    ODFW    CDFW    TOTAL
Totals  29,517  21,911  30,577  82,005
1987    462     2,501   2,193   5,156
1988    1,335   2,468   1,860   5,663
1989    2,347   2,320   1,731   6,398
1990    2,108   1,940   1,029   5,077
1991    2,421   2,200   1,624   6,245
1992    2,205   2,271   2,547   7,023
1993    6,162   3,663   4,978   14,803
1994    6,357   2,013   7,058   15,428
1995    6,120   1,956   5,483   13,559
1996            579     2,074   2,653

The effort-by-area proportion (ecm) table contains proportions that are used to distribute effort for a particular time-period, comptype, pcid, grid, mgrp, and triptype to fishing areas specified in column arid. This table parallels the acm table except that spid is replaced by mgrp, the subject of the table is effort instead of catch, and there are two possible measures of effort that are being apportioned: deliveries and days-fished. This table is used exclusively by ODFW for both the groundfish and salmon management groups.

Number of Rows in Table ecm as of June 26, 1996

YEAR    WDFW    ODFW    CDFW    TOTAL
Totals          13,652          13,652
1987            1,956           1,956
1988            2,048           2,048
1989            1,899           1,899
1990            1,689           1,689
1991            1,384           1,384
1992            1,049           1,049
1993            1,454           1,454
1994            835             835
1995            1,065           1,065
1996            273             273

The average-weight (aw) table contains average-weights that are used to compute an estimated number-of-fish for a particular time-period, pcid, arid, grid, spid, grade, and condition. This table is used exclusively by ODFW for salmon species and Columbia River sturgeon and shad.

Number of Rows in Table aw as of June 26, 1996

YEAR    WDFW    ODFW    CDFW    TOTAL
Totals          21,115          21,115
1987            5,163           5,163
1988            3,145           3,145
1989            3,213           3,213
1990            2,229           2,229
1991            2,104           2,104
1992            1,654           1,654
1993            1,523           1,523
1994            781             781
1995            1,097           1,097
1996            206             206

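The use of an aw-style average weight can be sketched briefly. This is an illustrative sketch, not PacFIN source code; the function name and the sample values are hypothetical, but the computation (estimated number-of-fish = round-weight pounds divided by average weight) follows the table's stated purpose.

```python
# Illustrative sketch (not PacFIN code): use an aw-style average weight to
# estimate number-of-fish from landed round-weight pounds.

def estimate_fish_count(rwt_lbs, avg_weight_lbs):
    """Estimated number of fish = round-weight pounds / average weight."""
    if avg_weight_lbs <= 0:
        raise ValueError("average weight must be positive")
    return round(rwt_lbs / avg_weight_lbs)

# Example: 1,250 lbs of a salmon grade with a hypothetical 12.5 lb average weight.
n_fish = estimate_fish_count(1250.0, 12.5)
```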
6.4.2.5 Tables: sc, dc, se, de

The summary-catch (sc) and detail-catch (dc) tables date back to the original implementation in 1981. The logical sc table is an annually partitioned table, like the ft and ftl tables, while the dc table is physically a single table. As stated previously in this document (6.2.1), the primary purpose of the sc table at that time was to aid in the speedy retrieval of summarized catch statistics, and that is still its sole purpose. In fact, every statistic contained in the sc table can be derived from other data residing in the PacFIN database. Most, if not all, of the relationships between and among time-period, spid, arid, grid, and pcid are contained in this table. Summarized data for all PacFIN data sources are included in this table.

The dc table has a direct relationship with the sc table. In the early years all input data were supplied using a single transaction type: the aggregated-catch (AGC) transaction. The data contained in these AGC input records were stored in the dc table, and to this day the ADFG, AKR, AFSC, and DFO data sources continue to use this transaction type to submit their data. Now that the W-O-C data sources submit ft, ftl, acm, and scm transactions in lieu of aggregated-catch transactions, internally generated AGC transactions are produced for the W-O-C data sources and stored in the dc table. The dc and sc tables are consistent with each other at all times, meaning that as each dc row is added, changed, or deleted, the necessary sc rows are added, changed, or deleted. For 1981 through 1986 the dc table still contains the original daily AGC records for the W-O-C data sources, and for rockfish species this is the best available data. For 1987 and later the dc table contains internally generated monthly aggregates for the W-O-C spid codes that are maintained in the sc table. Not all spid codes found in the PacFIN sp table are found in the sc table; the summarize column in the sp table determines which spid codes are maintained in the sc table. Summaries for all groundfish, salmon, and many other species are maintained in this table.

The following tables contain the sc records by year and the dc record counts by year and agency.

Number of Rows in Table dc as of June 27, 1996

YR      WDFW     ODFW     CDFW     ADFG    AKR     AFSC   NWAFC   DFO     TOTAL
Totals  355,940  359,101  478,208  31,621  35,620  626    26,443  42,415  1,329,974
1981    48,415   31,119   48,741   434                    4,890   1,652   135,251
1982    45,116   33,081   43,969   461                    4,774   1,738   129,139
1983    45,652   60,887   44,957   518                    4,913   1,853   158,780
1984    38,900   53,832   64,588   778                    4,670   1,456   164,224
1985    43,257   59,834   72,145   1,149                  2,760   2,261   181,406
1986    41,795   51,267   59,829   1,487   690            2,086   2,314   159,468
1987    10,549   7,409    13,523   1,922   1,540          1,130   2,613   38,686
1988    10,560   7,164    13,400   2,162   2,490          639     2,913   39,328
1989    10,781   7,356    13,681   1,887   3,320          305     3,296   40,626
1990    9,694    6,832    13,235   2,705   4,156   13     276     3,664   40,575
1991    9,621    7,530    14,333   3,181   4,330   292            3,872   43,159
1992    9,033    7,790    14,791   3,863   6,871   99             4,489   46,936
1993    10,797   9,287    12,741   3,354   3,422   36             4,752   44,389
1994    9,586    6,645    20,742   3,366   3,053   111            5,542   49,045
1995    10,330   6,853    21,091   4,354   3,523   75                     46,226
1996    1,854    2,215    6,442            2,225                          12,736

Number of Rows in the Summary-Catch (sc) Table as of June 27, 1996

YEAR # of rows
Total 18,740,346
1981 652,153
1982 670,189
1983 703,821
1984 759,894
1985 784,751
1986 786,932
1987 1,437,640
1988 1,429,835
1989 1,423,633
1990 1,480,563
1991 1,567,079
1992 1,677,352
1993 1,591,630
1994 1,687,996
1995 1,583,218
1996 503,660

The summary-effort (se) and detail-effort (de) tables have been part of the system since about the second year. These tables potentially contain three measures: deliveries, trawl-hours, and days-fished. Deliveries is simply a count of fish-tickets (i.e. a count of ft rows) that meet certain criteria. Trawl-hours is an estimated value for the number of hours a fishing vessel is actually engaged in the act of fishing with its net in the water. Days-fished is computed, or derived, from the ft table. Days-fished has been reported to the central system primarily for the salmon fishery, while trawl-hours is exclusive to the groundfish trawl fishery. The deliveries statistic is included in this table as an attempt to provide a comprehensive, consistent coastwide W-O-C measure. It is acknowledged, however, that number-of-deliveries is not a measure of effort, but rather an "index to effort", or possibly just a measure that only has meaning in the context of a particular user's application.

The se and de tables are structured in the same manner as the sc and dc tables. The biggest difference is that the se and de tables have data element mgrp (management group) instead of the spid found in the sc and dc tables. As a result, the se and de tables contain only about 10% of the number of rows that reside in the sc and dc tables. The set of mgrp codes is a subset of all spid codes. The fact that mgrp resides in the se and de tables while spid is a column in the sc and dc tables is often overlooked by users of the PacFIN system. The PacFIN system at this time contains statistics for deliveries, trawl-hours, and days-fished for certain management groups but NOT for individual species. Example: while the sc table contains catch statistics for sablefish, there are no deliveries or trawl-hours statistics for sablefish, but there are deliveries and trawl-hours for the groundfish management group. Another example: while there are catch statistics for chinook salmon, there are no days-fished statistics for chinook, but there are days-fished for the salmon management group.

As part of the transition from the original system to the "re-defined" system, the responsibility for computing number-of-deliveries and days-fished shifted from the W-O-C agencies to the PacFIN Office. So for 1987 through 1994 the number-of-deliveries and days-fished statistics that do reside in the de table were computed by the central processing system, while the 1981 through 1986 data for these two statistics are retained from each W-O-C agency's original input. Deliveries and days-fished statistics for 1995 and 1996 do not reside in either the de or the se tables at this time, since the software to generate these statistics has not yet been developed.

6.4.2.6 Tables: cg, nv, ul, dl

The U.S. Coast Guard (USCG) vessel data (cg) table contains selected attributes from the USCG's Merchant Vessels of the U.S. data file. At this time all entries, or rows, in this data file are entered into the cg table. The attributes selected include columns such as gross weight, length, horsepower, and the year the vessel was built. This table is available to those users who need to join ft, sv, or lep_src rows with cg rows in order to "pick up" additional attributes. This table is not, however, an integral part of the PacFIN central processing, in that there are no specific standard reporting applications that require this data. The USCG will not provide this data file directly to the PacFIN Office, so the NMFS/SWR acts as an intermediary, requesting the data file and forwarding it to the PacFIN Office. There are currently two "editions" of this data file contained in table cg, distinguished by column "pubyr": one data set was "published" in 1991 and another in 1995. Column "latest" allows a user to select the latest entry for any one particular vessel-id.

The non-vessel (nv) table is an ancillary table which is a by-product of translating state fishery agency vessel plate numbers to either a USCG vessel-id or a state marine board id using the sv table. Other "non-vessel" identifiers that occur in the veid column of the ft table and plate numbers that are not found in the sv table are given a special "vessel-id" so that each ft row will have a unique and correct entry in column drvid. Many of these entries in the nv table are a result of tribal identifiers that populate the veid column of table ft.

The update_log (ul) and detail-log (dl) tables are PacFIN data management tables. These two tables hold data about each data feed processed by the system. Included is a unique identifier for each update to the system, which allows one to determine when any particular datum entered the system, or when any particular row that corresponds to an input transaction was most recently modified. In addition statistics about the amount of data that enters the system during each update process is saved in the dl table. This information is one of the data sets used to determine the data completeness for each PacFIN data source.

6.4.3 Central Processing - Update

The update portion of the central processing part of PacFIN has recently been implemented in a Unix/Oracle software environment. All of the software routines that comprise the suite of "update" software were developed using one or more of these programming, or software development, languages: Oracle's PL/SQL; Oracle's SQL*Plus; the 'C' programming language; and Oracle's SQL*Loader. In addition, the Unix shell programming language was used to integrate all update modules into a single "production update job". Not all update modules are needed for all data sources. All of these update routines validate the input data to some degree. In some cases where a data value is found to be in error the input transaction is rejected, while in other cases a warning is generated and the record is accepted. Rejected transactions and generated warnings are reviewed by each data source's PacFIN coordinator, who takes the appropriate action.

Although the central processing system does attempt to validate input data, it should be noted that these validation exercises are offered as a service to the agencies that provide PacFIN input data. It should also be noted that the content of each data file (i.e. the value of each data element) is solely the responsibility of the data source. So although certain data are not allowed entry into the database, other invalid or incorrect data may gain entry, since the suite of central processing validation routines is not absolutely comprehensive. More extensive central processing validation routines can be, and are, incorporated as the need arises.

The following table lists each module along with the data sources whose data are processed by the module and the software used to develop the routine.

Module Data Sources Development Language
SQL*Loader Control File All SQL*Loader Utility Program
ul_update All PL/SQL, SQL*Plus
acl_update W-O-C PL/SQL, SQL*Plus
acm_update W-O-C PL/SQL, SQL*Plus
scm_update W-O-C PL/SQL, SQL*Plus
age_update W-O-C PL/SQL, SQL*Plus
aw_update ODFW PL/SQL, SQL*Plus
ecm_update ODFW PL/SQL, SQL*Plus
sv_update W-O-C PL/SQL, SQL*Plus
ft_gen_trans WDFW PL/SQL, SQL*Plus
ftl_gen_trans WDFW PL/SQL, SQL*Plus
ft_update W-O-C PL/SQL, SQL*Plus
ftl_update W-O-C PL/SQL, SQL*Plus
ftl_bld_actual_prices W-O-C 'C' with embedded Oracle SQL
ftl_estimate_prices W-O-C PL/SQL, SQL*Plus
ftl_summarize W-O-C PL/SQL, SQL*Plus
agc_gen_deletes ADFG, AKR, AFSC, DFO PL/SQL, SQL*Plus
agc_update All PL/SQL, SQL*Plus
agc_bld_actual_prices ADFG, AKR, AFSC 'C' with embedded Oracle SQL
agc_estimate_prices ADFG, AKR, AFSC PL/SQL, SQL*Plus
agc_summarize All PL/SQL, SQL*Plus

The update process gets started when the SQL*Loader utility is used to load all data from a particular input data file into one or more Oracle tables. The SQL*Loader control file serves as a type of software module in that it specifies to the SQL*Loader program how to load each datum from the input ASCII Unix file into the temporary Oracle tables. These tables are designated as temporary since the data loaded into them are only retained for the duration of the update process. One of the interesting features of the PL/SQL programming language is that it can only operate on data in Oracle tables: it can read from and write to an Oracle table, but it cannot read or write directly from or to a Unix data file. All of the temporary data table names are prefixed with "df_". Prior to the beginning of each update process, all "df_" tables are purged. For the most part the value of each datum is moved "as is" to the temporary table, but in a few cases certain "translations" are performed. As an example, the item "day" occurs in many tables; in some tables it is a required item, while in others it is not required and may be input by the data source as "null". For those tables where day is allowed to contain the null value it is translated to zero. This is apparently required by Oracle since the day column is used as part of a larger key field. Readers interested in the details of this SQL*Loader control file should contact the PacFIN Office.

The first module that is executed for all data sources is the ul_update module. This module reads the single row contained in the df_report table creating the permanent entry in table ul, or in the event of a continuation data feed verifies that the report record exists in table ul. In addition certain data specific to the update that is in progress is saved in the df_report row. In the following paragraphs whenever the phrase "inserts into <table name>" is used it implies updating and deleting rows as well.

The acl_update inserts agency codes into the appropriate agency-code-list table (asp, aar, agr, apr, and apc). A reject could occur if the transaction contained an spid not found in the sp table where type = 1. And the same would be true for arid, grid, and pcid.

The acm_update module inserts acm transactions into the acm table. There are a number of ways a reject could occur, but the most common would be an invalid PacFIN spid, arid, grid, or pcid.

The scm_update module inserts scm transactions into the scm table. A common reason for a rejected transaction would be an invalid PacFIN spid, unspid, arid, grid, or pcid.

The age_update module inserts aggregated-effort (nominally trawl-hours) into the de table. An invalid mgrp, arid, grid, or pcid would cause a reject.

The aw_update and the ecm_update modules insert average-weight and ecm transactions into the aw and ecm tables. And once again all columns are validated to the extent possible and rejects and warnings are produced as appropriate.

The sv_update module inserts state-vessel transactions into the sv table. In this case a warning would be caused if the value contained in items length, weight, or horsepower failed the range check that is performed on each of these items.

The ft_gen_trans and ftl_gen_trans modules are only used to process WDFW ft and ftl input records. WDFW no longer attempts to generate add, change, and delete transactions for these two transaction types. Instead they submit all ft and ftl records using the "&" operator, which means that a record may be an add, it may be a change, or it may already be in the database (i.e. an ignore record). This kind of input data has come to be known as "& all", since all records for a single calendar year must be provided in order to determine delete transactions. These two modules therefore determine the necessary add, change, and delete transactions, as well as the input records that can be ignored.
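The "& all" comparison can be sketched as a set difference between the full-year feed and the rows already in the database. This is an illustrative sketch, not PacFIN source code: the record keys and values are hypothetical, but the logic (new key = add, changed values = change, identical = ignore, database key absent from the feed = delete) matches the description above.

```python
# Illustrative sketch (not PacFIN code): derive add/change/delete/ignore
# actions from an "& all" feed by comparing it against existing rows.

def diff_and_all(incoming, existing):
    """incoming/existing: dicts mapping a unique record key to its values."""
    actions = {"add": [], "change": [], "delete": [], "ignore": []}
    for key, values in incoming.items():
        if key not in existing:
            actions["add"].append(key)
        elif existing[key] != values:
            actions["change"].append(key)
        else:
            actions["ignore"].append(key)
    # Any database row absent from the complete-year feed must be deleted.
    actions["delete"] = [k for k in existing if k not in incoming]
    return actions

db = {"t1": 100, "t2": 200, "t3": 300}      # rows already in the database
feed = {"t1": 100, "t2": 250, "t4": 400}    # complete "& all" feed for the year
acts = diff_and_all(feed, db)
```

Note that delete transactions can only be inferred because the feed is guaranteed to be complete for the calendar year; a partial feed would make every absent row look like a delete.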

The ft_update module inserts ft transactions into the appropriate ft table (ex: ft95). There are a number of situations that could cause a reject to occur, and one of these involves port-of-landing. The unique key for each ft record is: agid-ftid-year-month-day-pargrp. If there is an attempt to enter the same exact ft row (i.e. same unique key) with a different port code the record will be rejected. This kind of reject has been a fairly common occurrence. In addition, this module is different from any of the other "_update" modules in that it derives a value for column "drvid" (derived vessel identifier). This task of deriving a vessel-id involves searching the sv table and/or the nv table and possibly inserting a new entry into the nv table. This exercise is necessary since both CDFW and WDFW use a plate number, instead of a USCG or State Marine Board #, to identify the vessel that was used to harvest the catch, if a vessel was used. The veid column contains this plate number, if vesseltype = 3, or possibly other "things" - see vid-type in attached table "General Code Lists". If vesseltype = 3 then veid is used to search the sv table to find the actual coastwide vessel-id contained in column svid. If the plate number is found then the corresponding svid is retrieved and stored in drvid. If the plate number is not found then a search of the nv table is performed, attempting to match the year, agid, vesseltype, and veid of the ft transaction to the year, agid, idtype, and veid in the nv table. If a match is found then the vessel-id contained in artvid (artificial vessel identifier) is retrieved and stored in drvid. If a match is not found then a new entry is inserted into table nv with a unique artvid for the year, agid, vesseltype, and veid. This new artvid is then stored in column drvid.
The whole purpose of this exercise is to ensure that ft rows with unknown vessel identifiers or non-vessels in the veid column are translated to an identifier that will not assign catch to a vessel incorrectly.
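The drvid derivation described above can be sketched as a two-stage lookup with a fallback insert. This is an illustrative sketch, not PacFIN source code: the table contents, the artvid numbering scheme, and the function name are assumptions, but the flow (sv lookup for plate numbers, then nv lookup, then a new artificial id) follows the text.

```python
# Illustrative sketch (not PacFIN code): derive drvid for an ft row by
# searching an sv-style table, then an nv-style table, then assigning a
# new artificial vessel-id. The "ZZ" numbering scheme is hypothetical.

def derive_vessel_id(veid, vesseltype, year, agid, sv_table, nv_table):
    """Return a drvid, adding an nv entry for unmatched identifiers."""
    if vesseltype == 3 and veid in sv_table:   # plate number with a known match
        return sv_table[veid]                  # coastwide svid
    key = (year, agid, vesseltype, veid)
    if key not in nv_table:                    # assign a new artificial id
        nv_table[key] = "ZZ%04d" % (len(nv_table) + 1)
    return nv_table[key]

sv = {"PL123": "USCG567890"}                   # plate number -> coastwide id
nv = {}                                        # non-vessel table, initially empty
assert derive_vessel_id("PL123", 3, 1995, "W", sv, nv) == "USCG567890"
artvid = derive_vessel_id("TRIBAL01", 9, 1995, "W", sv, nv)
# the same unmatched identifier reuses the artvid created for it
assert derive_vessel_id("TRIBAL01", 9, 1995, "W", sv, nv) == artvid
```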

The ftl_update module processes ftl transactions and inserts rows into the appropriate ftl annual partition. There are many data items within this transaction and all elements are validated to the extent possible. Rejects can occur if the category, grade, condition, disposition, area, gear, or par-group are determined to be invalid, but rejects can also occur for many other reasons as well. One feature that separates this routine from other "_update" routines is that if the ftl transaction contains a price-per-pound that is null the transaction is saved in a separate table for later processing by the ftl_bld_actual_prices and ftl_estimate_prices routines. This temporary table is called the null_dollar table. Null_dollar ftl transactions are processed last to ensure that the best actual prices reside in the ftl table when estimates are developed.

The ftl_bld_actual_prices routine builds a temporary table of actual prices. This routine inspects the null_dollar table in order to develop a list of ticket categories so that this process does not compute actual prices unnecessarily. The actual prices table contains the following dimensions: category, condition, disposition, grade, month, pcid, and grid. For each cell within this seven-dimension array the total pounds and revenue are computed and stored. This table is built using only ftl rows that contain actual prices and it is built for all months in the current and previous years.

The ftl_estimate_prices module uses the actual prices contained in the temporary table in conjunction with a particular search algorithm to determine the best estimated price for each ftl row contained in the null_dollar table. Once the estimated price has been found the ftl transaction is stored in the ftl table with data items ppp set equal to the estimated price and "estimated" set equal to true. There is a possibility that the search algorithm will not find an actual price. In these few cases a default worst-estimated price found in the sp table is used and the ftl attribute "worst-est" is set to true.
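The price-estimation step can be sketched as a search over actual-price aggregates with a final fallback. This is an illustrative sketch, not the PacFIN search algorithm: the search order (exact stratum, then progressively coarser keys) and the field layout are assumptions, but the fallback to a default worst-estimated price with a flag follows the text.

```python
# Illustrative sketch (not PacFIN code): estimate a missing price-per-pound
# from actual-price aggregates, falling back to a default "worst" estimate.

def estimate_price(stratum, actual_prices, worst_estimate):
    """actual_prices maps a stratum tuple to (total_lbs, total_revenue)."""
    # Assumed search order: exact stratum first, then progressively coarser keys.
    for key in (stratum, stratum[:-1], stratum[:-2]):
        if key in actual_prices:
            lbs, revenue = actual_prices[key]
            if lbs > 0:
                return revenue / lbs, False    # estimated price; not worst-est
    return worst_estimate, True                # worst-est flag set

# Hypothetical actual-price cell: 2,000 lbs of sablefish priced at $3,000.
prices = {("SABL", 6, "AST"): (2000.0, 3000.0)}
ppp, worst = estimate_price(("SABL", 6, "AST"), prices, 0.10)
```

Building the actual-price table only for the categories present in the null_dollar table, as ftl_bld_actual_prices does, avoids aggregating prices that would never be consulted.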

In recent years nearly all groundfish and salmon catch has been reported by the W-O-C data sources with actual prices. However, this method for estimating ex-vessel prices continues for species other than groundfish and salmon species. As an example, the June 14th, 1996 data feed provided by CDFW contained 33,697 ftl transactions and 1,322 of these required that an estimated price be determined.

The ftl_summarize process is the part of the system that corresponds to the "re-definition" more than any other part. Its purpose is to summarize ftl data into monthly aggregates, applying the acm and scm proportions, and to produce agc transactions that will subsequently be processed by the agc_update and agc_summarize modules that maintain the dc and sc tables. The ftl_summarize process can be logically divided into four parts: 1. create monthly ftl aggregates; 2. apply catch-by-area proportions; 3. apply species-composition proportions; and 4. generate aggregated-catch transactions.

The computation of monthly aggregates from ftl data gets started by determining which months need to be summarized. This is determined by finding each month that occurs at least once in the set of ftl, acm, and scm transactions that have just been processed. These monthly aggregates are summarized by month, spid, arid, grid, and pcid, and for each cell in this five-dimensional array the round-weight equivalent pounds (rwt-lbs), number-of-landings, number-of-fish, pounds that were actually priced (lbs-priced), and estimated revenue are computed and stored. It should be noted that all ftl rows for the selected months participate in this aggregation exercise, not just those that will be subsequently apportioned.
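The aggregation step can be sketched as a group-by over the five key columns. This is an illustrative sketch, not PacFIN source code: the row layout is hypothetical and only rwt-lbs is accumulated here, while the real process also accumulates landings, fish counts, priced pounds, and revenue per cell.

```python
# Illustrative sketch (not PacFIN code): sum ftl-style line items into
# monthly aggregates keyed by (month, spid, arid, grid, pcid).

from collections import defaultdict

def monthly_aggregates(ftl_rows):
    cells = defaultdict(float)
    for row in ftl_rows:
        key = (row["month"], row["spid"], row["arid"], row["grid"], row["pcid"])
        cells[key] += row["rwt_lbs"]
    return dict(cells)

rows = [
    {"month": 5, "spid": "DOVR", "arid": "2C", "grid": "TWL", "pcid": "AST", "rwt_lbs": 400.0},
    {"month": 5, "spid": "DOVR", "arid": "2C", "grid": "TWL", "pcid": "AST", "rwt_lbs": 600.0},
    {"month": 5, "spid": "SABL", "arid": "2C", "grid": "LGL", "pcid": "AST", "rwt_lbs": 250.0},
]
agg = monthly_aggregates(rows)
```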

These monthly aggregates are then apportioned to area of catch for only those cells that have corresponding proportions in the acm table. Many of these ftl-produced aggregates will not need apportionment by acm proportions. Here is where comp-type comes into play. Those aggregates that do have corresponding acm proportions are further summed by month, spid, grid, and pcid and by either Pacific Ocean or Puget Sound. Acm proportions are then applied based on comp-type. Each month-spid-grid-pcid aggregate that is apportioned will produce one or more month-spid-grid-pcid-arid aggregates. As a result of the data contained in the acm table, this process currently only apportions into PSMFC areas. At present there are only three groundfish comp-types specified: two for WDFW that separate ocean areas from Puget Sound, and one for ODFW that essentially specifies that only those ftl aggregates with arid = unknown will be apportioned. These newly generated aggregates replace only the original aggregates that were apportioned, so that there is no double counting and the total pounds remain the same.

The set of monthly aggregates that were apportioned to area and those that were not apportioned are all passed through the scm apportionment process. There are two special cases that need to be handled. Scm rows input as quarterly are expanded to the corresponding monthly scm rows and appended to the existing monthly rows. Also, the sample port group that populates the pcid of CDFW's scm rows is translated so that those rows will match the correct pcid contained in the monthly aggregates; column sgroup in table pc affects this translation. Once again there are many aggregates in the set that will not have any corresponding rows in the scm table. Example: there will be monthly aggregates for Dover sole but there will not be any scm rows for Dover sole, while there will be scm rows for unspecified rockfish and, hopefully, corresponding rows for unspecified rockfish in the set of monthly aggregates. The scm proportions are applied by matching month, unspid, grid, arid, and/or pcid in the scm table to corresponding columns in the set of original ftl aggregates and acm-apportioned aggregates, and the necessary statistics are apportioned by multiplying each by the proportion in the "matched" scm row. The rwt-lbs, number-of-fish, lbs-priced, and estimated revenue are all apportioned, while the number-of-landings is set to null for all aggregates resulting from either an acm or scm apportionment.

The last step is to generate only the necessary agc transactions. Since this ftl_summarize process is accomplished on a monthly basis there are potentially monthly aggregates resulting from the above three steps that may not be needed any longer. This is determined by comparing the monthly aggregates produced so far to the monthly aggregates residing in the dc table. From this process only the necessary add, change, and delete agc transactions are determined and then stored in table df_agc for subsequent processing by the agc_update and agc_summarize routines.

The last segment of the update process has to do with processing agc transactions. Agc transactions can be one of two types: those generated by a data source or internally generated agc transactions like those generated by the ftl_summarize process. In either case agc transactions are loaded into the df_agc table and agc processing begins with that data.

The agc_gen_deletes routine is only used to process ADFG, AKR, AFSC, and DFO data feeds. None of these four agencies provides delete agc transactions; instead they provide all agc "&" transactions for one complete calendar year. The "&" indicates that the input transaction is either an add or a change. Given this complete set of "& all" agc transactions, the necessary delete transactions can be determined by comparing all rows in the input data feed to rows in the dc table for the particular year and agency. This routine does just that and inserts any generated agc delete transactions into table df_agc with the "&" transactions that already reside there.

The agc_update module is used to process both varieties of agc transactions: data source generated or internally generated. This routine basically inserts rows into the dc table after certain validations are performed. In addition, as each add, change, or delete transaction is inserted into, deleted from, or used to update a dc row, a copy of the agc transaction is saved in a special table called df_agc_to_sum for later processing by agc_summarize. If the agc transaction is a change transaction, then the difference, or delta, for each statistic is computed and stored in the df_agc_to_sum row; so the values in the df_agc_to_sum row can be either positive or negative. The agc transaction includes revenue of landed catch. If the revenue, or estimated value, is null then the agc transaction is not inserted into the dc table, but is instead saved in an agc null-dollar table for later processing by agc_bld_actual_prices and agc_estimate_prices. The DFO data source does not provide any economic data, so in this case agc transactions with null revenue are left as is with no attempt to estimate the revenue. The AKR and AFSC data sources also do not provide any economic information as provided for within the agc transaction, but all AKR and AFSC transactions receive a value for estimated revenue. So all agc transactions for AKR and AFSC are inserted into the agc null-dollar table during this process.

The agc_bld_actual_prices program creates a temporary table containing actual prices based on data in the dc table. A list of spid codes is developed by inspecting the agc null-dollar table so that actual prices are computed and stored for only the necessary spid codes. This table contains five dimensions: spid, month, pcid, grid, and arid. The actual pounds that were priced and the actual revenue of those pounds are computed for each cell in this five-dimensional array and stored in the appropriate row in the table. This table includes actual prices for a minimum of 13 months and a maximum of 36 months depending on the calendar year being processed and the months that reside in table dc for that calendar year.

The agc_estimate_prices routine uses a particular search algorithm to find an estimated price in the agc actual prices temporary table for each row in the agc null-dollar table. The estimated price along with the total pounds are used to compute an estimated revenue which is stored in item "estval". The agc row is then inserted into the dc table and a copy is added to the df_agc_to_sum table. For ADFG null-dollar transactions, the search algorithm uses only actual prices derived from ADFG dc rows. For AKR price estimating, only a subset of the ADFG actual prices are used (deliveries such as bait deliveries are excluded). And for AFSC price estimating, the agc actual prices table is not used at all and the sc table is used instead, where the average round-weight equivalent estimated price resides in each row. The AFSC search algorithm only looks for average prices where pcid = 'ALP' (all domestic).

The agc_summarize module is the final step in the update process. This module maintains the summary-catch (sc) table by inserting, updating, or deleting the necessary rows. The rows contained in table df_agc_to_sum are the rows that are summarized. Therefore modules agc_update, agc_bld_actual_prices, agc_estimate_prices, and agc_summarize must be executed consecutively. They essentially form a single unit. Of course, for W-O-C processing, only agc_update and agc_summarize need be executed; since all W-O-C estimated prices are now developed by the ftl_estimate_prices routine and therefore all internally generated agc transactions contain an estimated, or actual, value for revenue.

This summarization process gets started when a row is read from the df_agc_to_sum table. Five vectors are developed, one each for: period, spid, arid, grid, and pcid. The values for each of these vectors are determined by the month, spid, arid, grid, and pcid contained in the dc row being summarized. As an example, if the dc row contained: month = 5, spid = DOVR, arid = 2C, grid = LGL, and pcid = AST, then the vectors would be: prdvec = M5, Y1; spvec = DOVR, FLAT, GRND; arvec = 2C, CL, PC, ALL; grvec = LGL, HKL, ALL; and pcvec = AST, CLO, AOR, ALP, ALL. Period = Y1 designates the annual period. For all possible combinations (360 for this example), rwt-lbs, number-of-landings, number-of-fish, estimated revenue, and lbs-priced are either inserted as new rows, deleted from the table, or used to update existing rows. Of course the length of some of the vectors will vary depending on the particular codes being processed from the agc row. The period and grid vectors always contain 2 and 3 elements, respectively, while the spid vector can have 3 to 5 elements, the arid vector 3 to 7 elements, and the pcid vector either 4 or 5 elements.

A simplified version of this summarize process using "Structured English" or a pseudo-programming language might look like this:

  1. read a row from the df_agc_to_sum table containing:
    • month=5
    • spid=DOVR
    • arid=2C
    • grid=LGL
    • pcid=AST
    • rwt-lbs=n1
    • num-landings=n2
    • num-fish=n3
    • estval=n4
    • lbs-priced=n5
  2. For prdvec = M5, Y1
    • do for spvec = DOVR, FLAT, GRND
    • do for arvec = 2C, CL, PC, ALL
    • do for grvec = LGL, HKL, ALL
    • do for pcvec = AST, CLO, AOR, ALP, ALL
  3. do one of the following three operations:
    1. insert a sc row with period=prdvec, spid=spvec, arid=arvec, grid=grvec, pcid=pcvec, lbs=n1, nlndgs=n2, nfish=n3, estval=n4, and lbspriced=n5
    2. update the sc row where period=prdvec and spid=spvec and arid=arvec and grid=grvec and pcid=pcvec with lbs=lbs+n1, nlndgs=nlndgs+n2, nfish=nfish+n3, estval=estval+n4, and lbspriced=lbspriced+n5
    3. delete the sc row where period=prdvec and spid=spvec and arid=arvec and grid=grvec and pcid=pcvec
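The nested loop above can be put in runnable form as follows. This is a sketch under stated assumptions: a Python dict stands in for the Oracle sc table, and the vector contents follow the DOVR / 2C / LGL / AST example in the text.

```python
from itertools import product

# Sketch of summary-catch maintenance: apply one df_agc_to_sum row's values
# to every combination of the five code vectors. An in-memory dict stands
# in for the Oracle sc table (illustrative, not the actual implementation).

def summarize(sc, deltas, vectors):
    """deltas = (rwt-lbs, num-landings, num-fish, estval, lbs-priced)."""
    for key in product(*vectors):
        if key in sc:
            # update: add the row's values to the existing sc row
            sc[key] = tuple(a + b for a, b in zip(sc[key], deltas))
        else:
            # insert: create a new sc row
            sc[key] = deltas
        if all(v == 0 for v in sc[key]):
            del sc[key]  # delete sc rows whose totals fall to zero

# Vectors for the month = 5, DOVR, 2C, LGL, AST example from the text.
vectors = (("M5", "Y1"),
           ("DOVR", "FLAT", "GRND"),
           ("2C", "CL", "PC", "ALL"),
           ("LGL", "HKL", "ALL"),
           ("AST", "CLO", "AOR", "ALP", "ALL"))
sc = {}
summarize(sc, (500.0, 1, 0, 210.0, 500.0), vectors)
len(sc)  # 2 * 3 * 4 * 3 * 5 = 360 combinations
```

Note how a negative-valued transaction (a deletion feed) subtracts from the same 360 combinations, and any sc row whose totals reach zero is removed, matching the insert/update/delete alternatives in the pseudo-code.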

The above simplified version of the summarization process, or summary-catch maintenance, must be modified considerably before it becomes useful; mainly, there are a number of exceptions to the general rule specified in the pseudo-code. As an example, for many years now summary-catch rows for individual rockfish for WDFW ports and port groups have not been maintained in the sc tables. Another exception is that no summary-catch rows are maintained where arid is a PSMFC area and pcid is either 'ALP' or 'ALL'.

6.4.4 Central Processing - Retrieval

There are at least two methods for retrieving data from the central database: SQL*Plus routines and the *_rpt suite of programs. The SQL*Plus script files may be developed by the PacFIN Office or by PacFIN users who have access to the Orca system, while the *_rpt suite of programs is developed by the PacFIN Office and can be executed by PacFIN users or PacFIN staff.

The SQL*Plus script files replace the extract and ext/pacfin programs that were part of the retrieval system during the Unisys B7900 days. Hundreds of SQL*Plus script files have been developed by the PacFIN Office. These routines can be grouped into certain "classes" based on the retrieval functions that they perform. A partial list of these script file groups, or classes, includes: list_*, rpt_*, ann_*, sum_*, and pcs_*. This notation is Unix shorthand for the group of files that begins with "list_", "rpt_", etc. The list_* routines produce various lists from the sp, ar, gr, pc, and other "list"-type tables. The rpt_* and ann_* classes of routines retrieve selected rows from the summary-catch table (one or more sc tables); the ann_* files retrieve only annual values, while the rpt_* class allows for more general selections that include individual months.

The sum_* and pcs_* routines operate against the ftl, ft, and possibly other tables, but specifically do not select any data from the summary-catch tables. Both of these groups compute landed-weight pounds, round-weight pounds, revenue, ex-vessel price-per-pound, percentage of pounds priced, number-of-landings by PacFIN spid, and possibly other statistics. The main difference between these two groups of routines is that the pcs_* routines select only data for the PFMC (pcs = PFMC sums), while the sum_* group produces sums without any area-of-catch restriction. The main reason these two classes of summation/retrieval routines exist is that not all data residing in the ftl and ft tables are summarized and saved in the sc tables.

The reader is referred to a document, available upon request from the PacFIN Office, titled "Using Unix & Oracle to Access PacFIN Data", which gives an introduction to these script files along with other new-user orientation information. All of these script files were developed primarily to be used "as is" to retrieve selected data, but they also serve as "how to" models or templates for users who need to develop their own custom retrievals. All Orca users are encouraged to make use of these central processing SQL*Plus script files to the extent they are useful, including copying any of these files to use as "starting points".

The other retrieval mechanism consists of the area_rpt, source_rpt, gear_rpt, port_rpt, species_rpt, and activity_rpt programs. These programs are exact replacements for the RPT/= suite of programs of similar names that were part of the system when it ran in the Unisys B7900 DMS-II Algol environment. The products of these programs have become known as the "Standard PacFIN Reports". For examples of these reports, the reader is directed to the PSMFC Homepage (http://www.psmfc.org) or, for Orca users, to the ~pacfin/rpts/<year>/pfmc Unix subdirectory, where <year> can be 1996, 1995, etc. As of this writing, this *_rpt retrieval subsystem is still in development. When development is complete, Orca users will be able to generate their own standard reports, which number in the thousands. Until then, selected standard reports will be produced by the PacFIN Office and made available as described above.

6.4.5 Quota Species Monitoring (QSM) Subsystem

The QSM subsystem is the part of the PacFIN system intended to provide the PFMC's GMT with the best estimates of total commercial catch of certain species from certain PFMC-managed areas. The QSM was first put into operation in 1985 and has been expanded many times over the last eleven years to include additional species and ocean-area combinations. In addition, this QSM subsystem was rebuilt during the first half of 1995 in the Unix/Oracle environment of the Orca system.

The main concept behind the QSM subsystem is that one can obtain a reasonably good estimate of total catch for the current year by combining hard data from the main database with soft data, i.e., state-reported catches for recent weeks multiplied by a correction factor. For the QSM subsystem the best estimate of total catch is defined as:

Best Estimate of Total Catch = hard data + soft data

hard data = catch summaries derived from state agency provided fish ticket data and/or rockfish species composition proportions derived from port sampling, and/or catch-by-area proportions derived from logbook data; these catch summaries are maintained in the summary-catch table

soft data = weekly reported catches multiplied by a correction factor

weekly reported catches = catches reported by each state agency (W-O-C); each agency reports each week's catch within six days

correction factor = hard data for the most recent 12-month period divided by the sum of reported catches for the same 12-month period

The hard data used by QSM is determined by each agency's 90% data completion estimate. The most recent 12-month period used in the correction factor computation is also controlled by this data completion estimate date. The 90% completion estimate for each state agency is the month through which 90% or more of the data has been input to the PacFIN system, with the implication that data for all earlier months are more complete. Although referred to as "hard", this data is also an estimate, since it is based on data provided by the state agencies and is subject to change (usually additions). Little change in the hard data totals is expected after 12 months; changes and additions usually occur in the most recent months.
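The best-estimate formula above can be worked through with illustrative numbers; the catch figures below are assumptions for demonstration only, not actual PacFIN values.

```python
# Worked sketch of the QSM best-estimate formula:
#   Best Estimate of Total Catch = hard data + soft data
# where soft data = weekly reported catches * correction factor, and the
# correction factor = hard data / reported catches over the same 12 months.
# All figures are illustrative.

def best_estimate(hard, weekly_reported, hard_12mo, reported_12mo):
    correction = hard_12mo / reported_12mo   # correction factor
    soft = weekly_reported * correction      # soft data
    return hard + soft

# e.g. 5000 mt of hard data, 400 mt reported in recent weeks, and a
# 12-month hard-to-reported ratio of 5000 / 4000 = 1.25:
best_estimate(5000.0, 400.0, 5000.0, 4000.0)  # 5000 + 400 * 1.25 = 5500.0
```

A correction factor above 1.0, as here, reflects weekly reports that historically understate the catch later confirmed by fish tickets; a factor below 1.0 would scale the soft data down instead.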

6.4.6 Build Vessel Summaries Subsystem

The build vessel summaries (bld_vsums) subsystem was originally developed at the request of West Coast economists. The primary users continue to be economists, but researchers other than economists have made use of this summarized data. Although this summarized data is similar in concept to the data residing in the summary-catch tables, it is developed in an entirely different manner. The vessel summaries are built in a "batch" fashion, meaning that all six vessel summary files for each year are built with a single "pass" through the ftl and ft tables. These vessel summaries reside in Unix ASCII files rather than Oracle tables since most users want to transfer entire sets of weekly, monthly, or annual vessel summaries to their local computer systems.

The bld_vsums program produces two kinds of summary files: a vessel-summary file and a vessel-trip-principal file. The vessel summary file consists of 16 items: year, vessel-id, time-period, pcid, spid, grid, arid, agid, processor-id, grade, condition, disposition, participant-group, landed-weight, round-weight, and revenue. The first 13 items serve as the unique key for the table and values for the last three items are computed for each combination of the 13 key items. The vessel-trip-principal file consists of 14 items: year, vessel-id, time-period, principal-port, principal-species, principal-gear, agency, principal-processor, first-day-fishing, last-day-fishing, number-of-trips, total days-fished, total round-weight, and total revenue. For vessel-trip-principal records, the first eight items form the unique key and for each unique key the bld_vsums program determines the values of each of the last four items.
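The single-pass accumulation under the 13-item key can be sketched as follows; the record field names and values are illustrative stand-ins, not the actual ftl/ft column names.

```python
from collections import defaultdict

# Sketch of the vessel-summary build described above: one pass over
# fish-ticket line records, accumulating landed weight, round weight, and
# revenue under the 13-item key. Field names are illustrative.

KEY_FIELDS = ("year", "vessel_id", "period", "pcid", "spid", "grid", "arid",
              "agid", "proc_id", "grade", "cond", "disp", "pgroup")

def build_vsums(records):
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # landed-wt, round-wt, revenue
    for rec in records:
        key = tuple(rec[f] for f in KEY_FIELDS)
        sums[key][0] += rec["landed_wt"]
        sums[key][1] += rec["round_wt"]
        sums[key][2] += rec["revenue"]
    return dict(sums)

# Two records with identical key values collapse into one summary record.
base = dict(year=1996, vessel_id="V123", period="M5", pcid="AST", spid="DOVR",
            grid="LGL", arid="2C", agid="O", proc_id="P1", grade="A",
            cond="F", disp="H", pgroup="G")
recs = [dict(base, landed_wt=900.0, round_wt=1000.0, revenue=420.0),
        dict(base, landed_wt=450.0, round_wt=500.0, revenue=210.0)]
vsums = build_vsums(recs)
```

The same pass, keyed on the first eight vessel-trip-principal items with min/max/count logic in place of simple sums, would yield the companion vtp file.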

The process of building vessel summaries (vsums) in 13 dimensions creates rather large files. As an example the 1988 bld_vsums process produced 456,661 weekly vsums, 277,631 monthly vsums, and 170,287 annual vsums for a total of 904,579 vessel summary records. In addition during the same process 165,294 weekly, 63,465 monthly, and 15,427 annual vessel-trip-principal (vtp) summary records were produced for a total of 244,186 vtp summary records.

As of this writing, the program that builds these vessel summaries in the Unix/Oracle environment is still under development. While this subsystem remains incomplete, some of the vessel summary statistics are being produced using SQL*Plus script routines. Annual, monthly, and weekly vessel summaries are available for 1981 through 1996, and annual, monthly, and weekly vessel-trip-principal summaries are available for 1981 through 1994.

6.4.7 Data Completeness

Data completeness for each PacFIN data source is determined using a variety of indicators. One indicator is the data captured about each data feed and saved in the dl table, which includes the total groundfish pounds added or deleted for each month by the transactions in each data feed.

Another method used to help determine data completeness is historical comparison of similar statistics from the summary-catch table. An example of this kind of comparison is selecting the total reported catch of Dover sole for the month of May for 1991 through 1996, for all gears and all PFMC areas combined, for the W-O-C agencies. Reviewing the tables included below, one can deduce that ODFW's data is probably at least 90% complete for May 1996, while both CDFW's data and WDFW's data appear to be less than 90% complete for May 1996.

From PacFIN database (summary-catch) as of June 28, 1996

YEAR MONTH SPID ARID GRID PCID MTONS
1991 5 DOVR PC ALL ACA 589.6
1992 5 DOVR PC ALL ACA 631.6
1993 5 DOVR PC ALL ACA 595.7
1994 5 DOVR PC ALL ACA 386.1
1995 5 DOVR PC ALL ACA 519.3
1996 5 DOVR PC ALL ACA 157.9
1991 5 DOVR PC ALL AOR 824.8
1992 5 DOVR PC ALL AOR 488.9
1993 5 DOVR PC ALL AOR 463.8
1994 5 DOVR PC ALL AOR 391.2
1995 5 DOVR PC ALL AOR 323.1
1996 5 DOVR PC ALL AOR 437.2

From PacFIN database (summary-catch) as of June 28, 1996

YEAR MONTH SPID ARID GRID PCID MTONS
1991 5 DOVR PC ALL AWA 154.4
1992 5 DOVR PC ALL AWA 118.3
1993 5 DOVR PC ALL AWA 116.9
1994 5 DOVR PC ALL AWA 113.0
1995 5 DOVR PC ALL AWA 94.8
1996 5 DOVR PC ALL AWA 58.5

Dover sole is one of the best spid codes to use for this type of data completeness indicator.
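The deduction drawn from the tables above can be made concrete by comparing the current May total to the mean of the five prior Mays. The ratio function below is an illustrative sketch of this indicator, not a PacFIN routine, and the 90% reading remains a judgment rather than a fixed threshold.

```python
# Completeness indicator sketch: current-year monthly total divided by the
# mean of the same month in prior years, using the May Dover sole figures
# (metric tons) from the tables above.

def completeness_ratio(current, prior_years):
    return current / (sum(prior_years) / len(prior_years))

# ODFW (AOR): May 1996 vs May 1991-1995
odfw = completeness_ratio(437.2, [824.8, 488.9, 463.8, 391.2, 323.1])
# WDFW (AWA): May 1996 vs May 1991-1995
wdfw = completeness_ratio(58.5, [154.4, 118.3, 116.9, 113.0, 94.8])
# odfw comes out near 0.9 while wdfw comes out near 0.5, consistent with
# the completeness deductions in the text
```

Because year-to-year trends also move these ratios, the indicator works best for a species like Dover sole whose monthly landings are relatively stable.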

6.4.8 Confidentiality of Data

The PacFIN database in its entirety is a confidential database: the economic history of individual fishing vessels and fish processors can be determined from the contents of the ft and ftl tables. Access to the PacFIN confidential data follows the rules set by the NMFS in NOAA administrative orders. In essence, these rules state that only statistics that do not reveal the economic activity of individuals or corporations can be made public. To adhere to these NOAA rules, individuals who are given access to the confidential part of the PacFIN database are required to sign a "Certificate of Non-disclosure of Confidential Fisheries Data". Only individuals who have a demonstrated need for access to confidential data are considered; the primary criterion for demonstrated need is that the individual must be participating in council activities that require the confidential data. Only employees of the NMFS and other PCFDC member agencies are considered for on-line access. Certain individuals who would be classified as independent consultants, after having signed the same "Certificate of Non-disclosure of Confidential Fisheries Data", are given electronic copies of confidential data specific to the PFMC study project they have contracted to complete. These independent consultants are required to destroy the confidential data once their study project has concluded.